Re: [R] Reducing the size of a large script to speed onset of execution

2010-01-09 Thread Dennis Fisher

Professor Ripley,

Thanks for your suggestions.  I will look into the package approach.

As for the source() speed issue: you suggested that the problem may
relate to guessing encodings, so I added

options(encoding="UTF-8")

at the beginning of the code (was this the correct approach to the
problem?).  That did not make any obvious difference to the time
needed to source the script.  Do you have any specific suggestions
that might speed the process?
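A minimal sketch, assuming the script really is saved as UTF-8: source() reads and parses the whole file before evaluating any of it, so an options(encoding = ...) call inside the script cannot change how that same file is read. The encoding can instead be passed to source(), or the option can be set before source() is called. The temporary file and answer() are hypothetical stand-ins for the real script.

```r
# Hypothetical stand-in for the real 18000-line script
script <- tempfile(fileext = ".R")
writeLines("answer <- function() 42", script)

# Pass the encoding to source() itself; an options() call inside the
# script would only run after source() had already read the file
source(script, encoding = "UTF-8")

# Alternatively, set the option in the session before sourcing, since
# source() defaults to encoding = getOption("encoding"); restore it after
old <- options(encoding = "UTF-8")
source(script)
options(old)

answer()
```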


Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

On Jan 9, 2010, at 8:53 AM, Prof Brian Ripley wrote:

Please just make a package; then all the effort of parsing the
code is done at install time, and you can use lazy-loading.  Or, if
you are for some reason averse to that, source the code into an
environment, save that, and simply attach() its save file next time.
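The second suggestion (source into an environment, save it, attach the saved file next time) might look roughly like the sketch below. The file names and the functions square() and cube() are hypothetical stand-ins for the real script; sys.source(), save() and attach() are base R, and attach() accepts the path of a file written by save().

```r
# Hypothetical stand-in for the script of function definitions
script <- tempfile(fileext = ".R")
writeLines(c("square <- function(x) x^2",
             "cube <- function(x) x^3"), script)

# One-off step: evaluate the definitions into a fresh environment
defs <- new.env()
sys.source(script, envir = defs)

# Save every object in that environment to a binary image file
image <- tempfile(fileext = ".RData")
save(list = ls(defs, all.names = TRUE), envir = defs, file = image)

# Later sessions: attach the saved image instead of re-parsing the script
attach(image, name = "bigscript_defs")
square(3)
```

In later sessions only the attach() step is needed, so the script is not parsed again.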


Packages of that size load in a few milliseconds (as you see each  
time you start R:  stats is 27000 lines).
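For the package route, utils::package.skeleton() can build a package source tree (DESCRIPTION, NAMESPACE, R/ and man/) around an existing script. A minimal sketch, in which the package name bigscript and the two stand-in functions are hypothetical; the generated help pages are only templates that need filling in.

```r
# Hypothetical stand-in for the script of function definitions
script <- tempfile(fileext = ".R")
writeLines(c("square <- function(x) x^2",
             "cube <- function(x) x^3"), script)

# Create the package source tree in a fresh directory
pkg_parent <- tempfile("pkgs")
dir.create(pkg_parent)
utils::package.skeleton(name = "bigscript", path = pkg_parent,
                        code_files = script)

# Install once from a shell with:  R CMD INSTALL <pkg_parent>/bigscript
# after which each session only needs:  library(bigscript)
list.files(file.path(pkg_parent, "bigscript"))
```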


source() is doing more work to allow it to guess encodings, keep
references to the original sources, and back out code if the whole
script does not parse.
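One rough way to see how much of the load time this extra work costs is to time source() with and without keep.source, and against calling parse() and eval() directly. The generated script of 3000 small functions is a hypothetical stand-in, and the timings will depend on the machine and R version.

```r
# Hypothetical stand-in script: 3000 small function definitions
script <- tempfile(fileext = ".R")
writeLines(sprintf("f%d <- function(x) {\n  x + %d\n}", 1:3000, 1:3000),
           script)

# source() keeping references to the original text (the interactive default)
with_src <- system.time(source(script, keep.source = TRUE))

# source() without keeping the source references
without_src <- system.time(source(script, keep.source = FALSE))

# Parse once, then evaluate each top-level expression, bypassing source()
parse_eval <- system.time({
  exprs <- parse(script, keep.source = FALSE)
  for (e in exprs) eval(e, envir = globalenv())
})

# Elapsed seconds for each approach
timings <- c(with_src    = with_src[["elapsed"]],
             without_src = without_src[["elapsed"]],
             parse_eval  = parse_eval[["elapsed"]])
print(timings)
```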


On Sat, 9 Jan 2010, Dennis Fisher wrote:


Colleagues,

(R 2.10 on all platforms)

I have a lengthy script (18000 lines) that runs within a graphical
interface.  The script consists of hundreds of functions followed by
a single command that calls these functions (execution depends on a
number of environment variables passed to the script).  As a
result, nothing is executed until the final line of code is read.
It takes 15-20 seconds to load the code, and I would like to speed
that process up.  Two questions:


1.  The code contains numerous large blocks that are executed under  
only one set of conditions (which are known when the code is  
called).  For example, there might be code such as:

if (CONDITION)
{
... (hundreds of lines of code, including embedded curly brackets)
} else invisible()
if (!CONDITION)
{
... (hundreds of lines of code, including embedded curly brackets)
}

I assume that I could speed loading appreciably if I set up two
scripts, each of which excluded "irrelevant" code depending on
CONDITION.  For example, if I knew that CONDITION was false, I
would exclude the first block of code above; conversely, if I knew
that CONDITION was true, I would exclude the second block.


I would like to write code in R (or in sed, the UNIX stream editor)
to create these two new scripts.  However, the regular expressions
that would be needed are beyond me, and I would appreciate help from
this forum.  Specifically, I would like to search for

if (CONDITION
or
if (!CONDITION

as the start of the block, and for the matching closing curly
bracket (}) at the end of the block, and then remove those lines
from the code.  These text entries are always on a line by
themselves.  Finding the "if (CONDITION" line should be relatively
easy.  The difficulty for me is identifying the matching curly
bracket, because there are often paired brackets within the block
of code:


if (CONDITION)
{
...
if (SOMETHINGELSE)  {   }
if (YETANOTHER)
{
}
}   <-  this is the bracket that I need to match

There are also instances in which the entire block occurs on one  
line:

if (CONDITION)  { ...} else invisible()
or
if (CONDITION ... else invisible()

Of note, I can remove the "else invisible()" statements if they
complicate a solution.
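Arbitrarily nested braces are beyond what a sed regular expression can match, so one simpler alternative is to count braces line by line. The sketch below is hypothetical throughout (the function names drop_blocks() and brace_delta(), the header patterns and the sample lines). It assumes the layouts shown above, where a header line either holds the whole block or is followed by a line starting with "{", and it counts every brace naively, so braces inside strings or comments, or an unbraced body on the line after the header, would throw it off.

```r
# Net change in brace depth on one line: number of "{" minus number of "}".
# Naive: braces inside strings or comments are counted too.
brace_delta <- function(line) {
  opens  <- nchar(line) - nchar(gsub("{", "", line, fixed = TRUE))
  closes <- nchar(line) - nchar(gsub("}", "", line, fixed = TRUE))
  opens - closes
}

# Drop every block whose first line matches `header` (a Perl-style regex),
# up to and including the line holding the matching closing brace.
drop_blocks <- function(lines, header) {
  keep <- rep(TRUE, length(lines))
  n <- length(lines)
  i <- 1L
  while (i <= n) {
    if (!grepl(header, lines[i], perl = TRUE)) {
      i <- i + 1L
      next
    }
    keep[i] <- FALSE
    depth <- brace_delta(lines[i])
    opened <- grepl("{", lines[i], fixed = TRUE)
    # Layout "if (CONDITION)" followed by a line that starts with "{"
    if (!opened && i < n && grepl("^\\s*\\{", lines[i + 1L], perl = TRUE)) {
      i <- i + 1L
      keep[i] <- FALSE
      opened <- TRUE
      depth <- brace_delta(lines[i])
    }
    # Consume lines until the braces opened by this block balance again
    while (opened && depth > 0 && i < n) {
      i <- i + 1L
      keep[i] <- FALSE
      depth <- depth + brace_delta(lines[i])
    }
    i <- i + 1L
  }
  lines[keep]
}

# Hypothetical header patterns for the two kinds of block
pos_header <- "^\\s*if\\s*\\(\\s*CONDITION\\b"
neg_header <- "^\\s*if\\s*\\(\\s*!\\s*CONDITION\\b"

# Small sample in the layouts described above
src <- c(
  "f <- function() 1",
  "if (CONDITION)",
  "{",
  "  a <- 1",
  "  if (SOMETHINGELSE)  {   }",
  "  if (YETANOTHER)",
  "  {",
  "  }",
  "} else invisible()",
  "if (!CONDITION)",
  "{",
  "  b <- 2",
  "}",
  "if (CONDITION)  { x <- 1 } else invisible()",
  "g <- function() 2"
)

when_false <- drop_blocks(src, pos_header)  # script for CONDITION == FALSE
when_true  <- drop_blocks(src, neg_header)  # script for CONDITION == TRUE
```

On the real file, one of the two reduced scripts could then be written with something like writeLines(drop_blocks(readLines("bigscript.R"), pos_header), "bigscript_false.R"), where both file names are hypothetical.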


2.  A related issue concerns loading in the graphical interface vs.
loading at the command line (OS X).  In the graphical interface the
code loads in 15-20 seconds, with the interface sending code as
rapidly as it can.  In contrast, at the command line the code is
source()'d, and that takes 30-40 seconds.  I would have expected the
latter approach to be as fast or faster, because R would accept code
as fast as it could.


Does anyone have an explanation for this behavior?  Also, any ideas
on how to speed up the process at the command line would be
appreciated.  Thanks for any suggestions.


Dennis




Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of

Re: [R] Reducing the size of a large script to speed onset of execution

2010-01-09 Thread Prof Brian Ripley
Please just make a package; then all the effort of parsing the
code is done at install time, and you can use lazy-loading.  Or, if you
are for some reason averse to that, source the code into an
environment, save that, and simply attach() its save file next time.


Packages of that size load in a few milliseconds (as you see each time 
you start R:  stats is 27000 lines).


source() is doing more work to allow it to guess encodings, keep
references to the original sources, and back out code if the whole
script does not parse.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595


