Re: [R] Is there a way to vectorize this? [with correction]

2008-11-01 Thread Duncan Temple Lang



Nutter, Benjamin wrote:

> ** Sorry to repost.  I forgot to include a function necessary to make
> the example work **
>
> I apologize up front for this being a little long. I hope it's
> understandable.  Please let me know if I need to clarify anything.
>
> Several months ago I wrote a series of functions to help me take my R
> analyses and build custom reports in html files.  Each function either
> builds or modifies a string of html code that can then be written to a
> file to produce the desired output.
>
> To make modifications in the html code, I've placed 'markers' around
> certain characteristics that I might want to change.  For instance, the
> alignment characteristics have an 'algnmark' on either side of them.
> When I wish to change the alignment, I can find where these markers are,
> determine their location, and replace the contents between them.
>
> I've been using the functions for a few months now, and am pleased with
> the utility.  Unfortunately, as I was writing these, I wasn't very
> strong with my vectorization skills and relied on for loops (lots of for
> loops) to get through the work.  So while I'm pleased with the utility,
> I've been trying to optimize the functions by vectorizing the for loops.
>
> At this point, I've hit a small snag.  I have a situation where I can't
> seem to figure out how to vectorize the loop.  Part of me wonders if it
> is even possible.
>
> The scenario is this:  I run a string of code through the loop.  On each
> pass, the section of code in need of modification is identified and the
> changes are made.  When this is done, however, the length of the string
> changes.  The change in length needs to be recognized in the next pass
> through the loop.

At a quick glance, it seems you are merely trying to transform each instance of

  algnmark  align=left algnmark

to

  algnmark  align=right algnmark

If so, you are going about this in an unnecessarily complicated manner.

html.text = function(text, new.align)
 gsub("algnmark  align=[a-z]+ algnmark",
   paste("algnmark  align=", new.align, " algnmark", sep = ""),
   text)

would be much more explicit about what you are doing.
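A small sketch to check this on a character vector. The function name html.align.gsub and the sample strings are made up for illustration; the body is the one-liner above. gsub() is vectorized over its text argument and replaces every match in every element, so neither a loop nor any position bookkeeping is needed:

```r
# Hypothetical name; same body as the gsub() one-liner above.
html.align.gsub <- function(text, new.align)
  gsub("algnmark  align=[a-z]+ algnmark",
       paste("algnmark  align=", new.align, " algnmark", sep = ""),
       text)

# Made-up input: the first string holds two marked paragraphs, the second one.
code <- c(paste0("<p algnmark  align=left algnmark>a</p>",
                 "<p algnmark  align=center algnmark>b</p>"),
          "<p algnmark  align=left algnmark>c</p>")

res <- html.align.gsub(code, "right")
print(res)  # every marked alignment, in every element, is now "right"
```

Because the regex engine handles each match in a single pass over the string, the change in string length never has to be tracked by hand.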

You actually want to be more specific about this and
replace only within <...>, i.e. inside html elements.
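One possible way to do that, as a sketch (html.align.in.tags is a hypothetical name): require a '<' before the marked text and a '>' after it, with no '>' in between, so a marked string that happens to appear in ordinary text is left alone. This assumes no attribute value inside a tag contains a literal '>':

```r
# Hypothetical helper. [^>]* cannot cross a '>', so the markers must sit
# between a '<' and the next '>' (i.e. inside a tag) to be matched.
html.align.in.tags <- function(text, new.align)
  gsub("(<[^>]*algnmark  align=)[a-z]+( algnmark[^>]*>)",
       paste("\\1", new.align, "\\2", sep = ""),
       text)

# Made-up input: the same marked text appears in the tag and in the content.
x <- "<p algnmark  align=left algnmark>Text mentioning algnmark  align=left algnmark</p>"
res <- html.align.in.tags(x, "right")
print(res)  # only the copy inside <p ...> is changed
```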


You might benefit from a package like R2HTML and use that
to generate the content.
Building reports by pasting together strings of markup and
content/text seems simple and is easy to get started with, but it
quickly becomes complicated.  You might look at Sweave, or alternatively
build the document directly yourself using tools designed for
creating HTML (or XML).
You can use xmlParse(), newXMLNode() and friends in the XML package
to read an empty template document and then add new nodes, etc.
With this approach, you can access individual nodes and
change them without having to work with the entire content.
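A rough sketch of that approach, assuming the XML package is installed. The template string stands in for a template file, and the node names and text are made up. The alignment is changed by addressing each <p> node's attribute directly, so no markers or character offsets are involved:

```r
library(XML)

# Stand-in for an empty template document (made up for this sketch).
template <- "<html><head><title>Report</title></head><body></body></html>"
doc <- xmlParse(template, asText = TRUE)
body <- getNodeSet(doc, "//body")[[1]]

# Add paragraphs as nodes rather than by pasting strings together.
invisible(newXMLNode("p", "First result", attrs = c(align = "left"), parent = body))
invisible(newXMLNode("p", "Second result", attrs = c(align = "left"), parent = body))

# Later, change the alignment of every <p> node by replacing its
# 'align' attribute; the rest of the document is untouched.
for (p in getNodeSet(doc, "//p")) {
  removeAttributes(p, "align")
  addAttributes(p, align = "right")
}

cat(saveXML(doc))
```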

Alternatively, in a few weeks, we'll release some tools for working
directly with .docx and .xlsx files and modifying their content.

 D.






Re: [R] Is there a way to vectorize this? [with correction]

2008-11-01 Thread Gabor Grothendieck
Here is a function whose arguments are similar to those of gsub.  The
first is the pattern, in which the portion to actually be replaced should
be enclosed in parentheses; the others are the replacement string and the text:

library(gsubfn)
replace.in.context <- function(pattern, replacement, x, ...) {
  gsubfn(pattern, m + b ~ sub(b, replacement, m), x, backref = 1, ...)
}

txt <- "algnmark  align=left algnmark"
new.align <- "left"
replace.in.context("algnmark  align=([a-z]+) algnmark", new.align, txt)


On Sat, Nov 1, 2008 at 12:20 PM, Duncan Temple Lang
[EMAIL PROTECTED] wrote:


> Nutter, Benjamin wrote:
>
>> ** Sorry to repost.  I forgot to include a function necessary to make
>> the example work **
>>
>> I apologize up front for this being a little long. I hope it's
>> understandable.  Please let me know if I need to clarify anything.
>>
>> Several months ago I wrote a series of functions to help me take my R
>> analyses and build custom reports in html files.  Each function either
>> builds or modifies a string of html code that can then be written to a
>> file to produce the desired output.
>>
>> To make modifications in the html code, I've placed 'markers' around
>> certain characteristics that I might want to change.  For instance, the
>> alignment characteristics have an 'algnmark' on either side of them.
>> When I wish to change the alignment, I can find where these markers are,
>> determine their location, and replace the contents between them.
>>
>> I've been using the functions for a few months now, and am pleased with
>> the utility.  Unfortunately, as I was writing these, I wasn't very
>> strong with my vectorization skills and relied on for loops (lots of for
>> loops) to get through the work.  So while I'm pleased with the utility,
>> I've been trying to optimize the functions by vectorizing the for loops.
>>
>> At this point, I've hit a small snag.  I have a situation where I can't
>> seem to figure out how to vectorize the loop.  Part of me wonders if it
>> is even possible.
>>
>> The scenario is this:  I run a string of code through the loop.  On each
>> pass, the section of code in need of modification is identified and the
>> changes are made.  When this is done, however, the length of the string
>> changes.  The change in length needs to be recognized in the next pass
>> through the loop.
>
> At a quick glance, it seems you are merely trying to transform each instance of
>
>   algnmark  align=left algnmark
>
> to
>
>   algnmark  align=right algnmark
>
> If so, you are going about this in an unnecessarily complicated manner.
>
> html.text = function(text, new.align)
>  gsub("algnmark  align=[a-z]+ algnmark",
>    paste("algnmark  align=", new.align, " algnmark", sep = ""),
>    text)

Here are a few alternatives.  For all of them we assume:

txt <- "algnmark  align=right algnmark"
new.align <- "left"

Their main advantage is that the context need not be
written out twice, which might help avoid errors:

1. This solution avoids repeating the context explicitly:

gsub("(algnmark  align=)[a-z]+( algnmark)",
  paste("\\1", new.align, "\\2", sep = ""), txt)

2. Zero-width Perl regexps (a lookbehind and a lookahead) could be used here:

gsub("(?<=algnmark  align=)[a-z]+(?= algnmark)", new.align, txt, perl = TRUE)

This has the advantage that the replacement string is just new.align but
does require marking up the regexp slightly more.

3. Another possibility is to use the gsubfn package.  gsubfn is like
gsub except that the replacement can be a function.  The portion of the
regular expression in parentheses is known as the back reference, and
the entire string matched by the regular expression is called the match.
backref = 1 says to pass the match and 1 back reference to the function.
gsubfn accepts a formula notation for functions (or ordinary notation),
and using that we define the function to use sub to replace the back
reference with new.align in the match:

gsubfn("algnmark  align=([a-z]+) algnmark", m + b ~ sub(b, new.align, m),
  txt, backref = 1)

This gives a regexp which is nearly as simple as Thomas' while
avoiding explicit repetition of the context in the replacement.
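A quick base-R check, as a sketch on made-up input, that alternatives 1 and 2 give the same result on a character vector (alternative 3 needs the gsubfn package and is left out here). The lookbehind in alternative 2 has a fixed length, which PCRE requires:

```r
# Made-up input vector; new.align as assumed above.
txt <- c("algnmark  align=right algnmark",
         "<p algnmark  align=center algnmark>x</p>")
new.align <- "left"

# Alternative 1: capture the context and put it back with \\1 and \\2.
alt1 <- gsub("(algnmark  align=)[a-z]+( algnmark)",
             paste("\\1", new.align, "\\2", sep = ""), txt)

# Alternative 2: lookbehind/lookahead are zero-width, so only the
# alignment word is consumed and the replacement is just new.align.
alt2 <- gsub("(?<=algnmark  align=)[a-z]+(?= algnmark)", new.align, txt,
             perl = TRUE)

identical(alt1, alt2)
```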

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there a way to vectorize this? [with correction]

2008-10-31 Thread Nutter, Benjamin
** Sorry to repost.  I forgot to include a function necessary to make
the example work **

I apologize up front for this being a little long. I hope it's
understandable.  Please let me know if I need to clarify anything.

Several months ago I wrote a series of functions to help me take my R
analyses and build custom reports in html files.  Each function either
builds or modifies a string of html code that can then be written to a
file to produce the desired output.

To make modifications in the html code, I've placed 'markers' around
certain characteristics that I might want to change.  For instance, the
alignment characteristics have an 'algnmark' on either side of them.
When I wish to change the alignment, I can find where these markers are,
determine their location, and replace the contents between them. 
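To make the marker arithmetic concrete, here is a sketch on a made-up string in the style html.text produces. gregexpr() gives the start of each marker; because 'algnmark' is 8 characters, start + 9 skips the marker and the space after it. The last lines show an alternative: capturing the text between a pair of markers with a regex group instead of computing offsets:

```r
# Made-up string in the style produced by html.text().
code <- "<p class=MsoNormal algnmark align=left algnmark><span>a</span></p>"

# Start position of every marker (gregexpr gives -1 when there is none).
starts <- as.integer(gregexpr("algnmark", code, fixed = TRUE)[[1]])

# Offsets in the style of the original loop: skip the 8-character marker
# plus one space, and stop one character before the closing marker.
old <- substr(code, starts[1] + 9, starts[2] - 1)

# Alternative: capture what lies between each pair of markers directly,
# using a non-greedy match so each pair is matched separately.
pairs <- regmatches(code, gregexpr("algnmark.*?algnmark", code, perl = TRUE))[[1]]
inner <- sub("^algnmark(.*)algnmark$", "\\1", pairs)
```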

I've been using the functions for a few months now, and am pleased with
the utility.  Unfortunately, as I was writing these, I wasn't very
strong with my vectorization skills and relied on for loops (lots of for
loops) to get through the work.  So while I'm pleased with the utility,
I've been trying to optimize the functions by vectorizing the for loops.

At this point, I've hit a small snag.  I have a situation where I can't
seem to figure out how to vectorize the loop.  Part of me wonders if it
is even possible. 

The scenario is this:  I run a string of code through the loop.  On each
pass, the section of code in need of modification is identified and the
changes are made.  When this is done, however, the length of the string
changes.  The change in length needs to be recognized in the next pass
through the loop.
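If positions really have to be used, one common way around the shifting lengths (a swapped-in technique, not something from this thread) is to find all marker positions once and then replace from the last pair to the first: each replacement only moves text to its right, which has already been handled. A minimal sketch; replace.between.markers is a hypothetical helper that assumes markers come in pairs and that code is a single string:

```r
# Hypothetical helper, not from the original post.
replace.between.markers <- function(code, new, marker = "algnmark") {
  pos <- as.integer(gregexpr(marker, code, fixed = TRUE)[[1]])
  if (pos[1] < 0) return(code)                  # no markers at all
  first <- pos[c(TRUE, FALSE)] + nchar(marker)  # char after each opening marker
  last  <- pos[c(FALSE, TRUE)] - 1              # char before each closing marker
  # Right to left: positions computed once up front stay valid because
  # nothing to the left of the current pair has been modified yet.
  for (i in rev(seq_along(first))) {
    code <- paste0(substr(code, 1, first[i] - 1), new,
                   substr(code, last[i] + 1, nchar(code)))
  }
  code
}

# Made-up input with two marked paragraphs.
code <- paste0("<p algnmark align=left algnmark>a</p>",
               "<p algnmark align=center algnmark>b</p>")
res <- replace.between.markers(code, " align=right ")
```

For a fixed replacement such as this one, a single gsub() call over the whole string gives the same result without any bookkeeping.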

Okay, some code to illustrate what I mean.  This first function formats
the html file.  I only include it because it is needed to illustrate
what the function is doing.  I am eliminating all comments and spacing
from the code for brevity.

#*** Start of html.file.start
'html.file.start' <- function(title, size=11, font="Times New Roman"){
  size <- format(floor(size),nsmall=1)
  code <- paste("
<html xmlns:o='urn:schemas-microsoft-com:office:office'
  xmlns:w='urn:schemas-microsoft-com:office:word'
  xmlns='http://www.w3.org/TR/REC-html40'>
  <head>
<meta http-equiv=Content-Type content='text/html; charset=windows-1252'>
<meta name=ProgId content=Word.Document>
<meta name=Generator content='Microsoft Word 11'>
<meta name=Originator content='Microsoft Word 11'>
   <style>
  <!--
/* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
 p.MsoEndnoteText, li.MsoEndnoteText, div.MsoEndnoteText
  {margin-top:2.0pt;
  margin-right:0in;
  margin-bottom:0in;
  margin-left:.15in;
  margin-bottom:.0001pt;
  text-indent:-.15in;
  mso-pagination:none;
  font-size:9.0pt;
  mso-bidi-font-size:10.0pt;
  font-family:'Times New Roman';
  mso-fareast-font-family:'Times New Roman';}
   p.Textbody, li.Textbody, div.Textbody-->
   </style>
",
"<title>",title,"</title>
  </head>
  <body lang=EN-US style='tab-interval:.5in;",
   "textmark; font-size:",size,"pt; textmark;",
   " fontmark; font-family:",font,"; fontmark;'>", sep="")
  return(code)
} # End of html.file.start


# Start of html.text
'html.text' <- function(text, size=11, font="Times New Roman",
                        align="left", title){
  size <- format(floor(size),nsmall=1)
  if(missing(title)) title <- "" else title <- paste("<br/>",title)
  title <- paste("<b>",title,"</b><br/>\n",sep="")
  code <- paste("
<p class=MsoNormal ",
 "algnmark align=",align," algnmark>
  <span class=GramE style='",
 "textmark; font-size:",size,"pt; textmark;",
 " fontmark; font-family:",font,"; fontmark;",
 " stylemark; font-weight:normal; font-style:normal;",
 " text-decoration:none; stylemark;'>",
  title,text,"
  </span>
</p>",sep="")
  return(code)
} #** End of html.text


So here is the function I'm trying to vectorize.

#*** Start of html.align
html.align <- function(code,new.align="left"){
  #* Create a string to replace the current alignment setting.
  align <- paste(" align=",new.align," ",sep="")

  #* Function to pass to sapply.  This is handy when 'code'
  #*  is a vector.
  #* NB: align must be passed in explicitly; a default of align=align
  #*  refers to itself and fails if it is ever evaluated.
  f1 <- function(code,align=align){
    mark <- unlist(gregexpr("algnmark",code))  #* Get positions of markers
    if(mark[1]>0){
      odd <- seq(1,length(mark),by=2)  #* odd elements are starting marker
      evn <- seq(2,length(mark),by=2)  #* even elements are ending marker

      mark[odd] <- mark[odd]+9  #* These two lines determine the starting
      mark[evn] <- mark[evn]-1  #* and ending elements of the substring to
                                #* be replaced

      for(i in 1:length(odd)){

        l.old <- nchar(code)  #* store the length of the code segment.

        old.align <- substr(code,mark[odd[i]],mark[evn[i]])