Re: [Rd] Speeding up library loading

2005-04-26 Thread Duncan Murdoch
Ali - wrote:

Lazy loading just converts an object into a small instruction to load 
the object. If the object was already small, there's no advantage to 
that.  It's mainly designed to avoid memory use (some rarely used 
objects can be gigantic).

From a design point of view, the reason is that this isn't the problem
lazy loading is trying to solve. We didn't have a problem with
packages that have a huge number of small objects, but we did have a
problem with packages that had a moderate number of moderately large
objects.

In addition, trying to optimize performance is not usually a good idea 
unless you can measure the performance of different implementations on 
real applications, and we didn't have applications like that.

Assume 100 C++ classes, each having 100 member functions. After
wrapping these classes into R, if the wrapping design is class-oriented
we end up with about 100 objects. At the same time, if the wrapping
design is function-oriented we end up with about 10,000 objects, which
seems to be too many small objects for lazy loading to help with.

I have tried wrapping exactly the same classes with R.oo (which is based
on S3), and the resulting package was much faster in both installation
and loading. The package became slow once I tried it with S4. I guess
R.oo makes the package more class-oriented, while S4 object orientation
is really function-oriented, causing all this friction in installation
and loading.

Is there any way to ask R to lazy-load each object as a 'bundle of S4 
methods with the same class'?
I don't think so.  There are ways to load a bundle of objects all at
once (put them in an environment, attach the environment), but S4
methods aren't self-contained; they need to be registered with the
system.  You could probably write a function to load and register
them all at once, but I don't think it exists now.
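
If you wanted to experiment, such a helper might look roughly like this
(untested sketch; the function name and the layout of the method table
are invented):

registerClassMethods <- function(defs, where = topenv(parent.frame())) {
    ## 'defs' is a list of lists, each holding a generic name, a
    ## signature and a method definition; setMethod() does the actual
    ## registration with the dispatch system
    for (d in defs)
        setMethod(d$generic, signature = d$signature,
                  definition = d$fun, where = where)
    invisible(length(defs))
}

The per-class method tables could themselves live in lazy-loaded
objects, so they would only be fetched and registered the first time a
class is actually used.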

Duncan Murdoch


Re: [Rd] Speeding up library loading

2005-04-26 Thread Ali -

Assume 100 C++ classes, each having 100 member functions. After wrapping
these classes into R, if the wrapping design is class-oriented we end up
with about 100 objects. At the same time, if the wrapping design is
function-oriented we end up with about 10,000 objects, which seems to be
too many small objects for lazy loading to help with.

I have tried wrapping exactly the same classes with R.oo (which is based
on S3), and the resulting package was much faster in both installation
and loading. The package became slow once I tried it with S4. I guess
R.oo makes the package more class-oriented, while S4 object orientation
is really function-oriented, causing all this friction in installation
and loading.

Is there any way to ask R to lazy-load each object as a 'bundle of S4 
methods with the same class'?
I don't think so.  There are ways to load a bundle of objects all at once
(put them in an environment, attach the environment), but S4 methods
aren't self-contained; they need to be registered with the system.  You
could probably write a function to load and register them all at once,
but I don't think it exists now.

Duncan Murdoch
(1) What is the difference between loading and registering objects in R?
(2) You are talking about 'loading and registering at once'. Isn't this 'at 
once' the cause of slow loading?

(3) Doesn't having many environments mean a loss of efficiency again?


Re: [Rd] Speeding up library loading

2005-04-26 Thread Duncan Murdoch
Ali - wrote:

Assume 100 C++ classes, each having 100 member functions. After
wrapping these classes into R, if the wrapping design is
class-oriented we end up with about 100 objects. At the same time, if
the wrapping design is function-oriented we end up with about 10,000
objects, which seems to be too many small objects for lazy loading to
help with.

I have tried wrapping exactly the same classes with R.oo (which is
based on S3), and the resulting package was much faster in both
installation and loading. The package became slow once I tried it with
S4. I guess R.oo makes the package more class-oriented, while S4 object
orientation is really function-oriented, causing all this friction in
installation and loading.

Is there any way to ask R to lazy-load each object as a 'bundle of S4 
methods with the same class'?

I don't think so.  There are ways to load a bundle of objects all at
once (put them in an environment, attach the environment), but S4
methods aren't self-contained; they need to be registered with the
system.  You could probably write a function to load and register them
all at once, but I don't think it exists now.

Duncan Murdoch

(1) What is the difference between loading and registering objects in R?
Loading just creates the object.  Registering it is what setMethod() and 
such calls do.  They allow the system to know that it should call that 
function in response to a call to the generic with a certain signature, 
and so on.
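
A toy example of the distinction (the class here is made up):

setClass("Foo", representation(x = "numeric"))
showFoo <- function(object) cat("A Foo with x =", object@x, "\n")
## at this point showFoo merely exists as an object ("loaded");
## printing new("Foo", x = 1) would still use the default show method
setMethod("show", "Foo", showFoo)
## only now is it registered, so the generic dispatches to it:
new("Foo", x = 1)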
(2) You are talking about 'loading and registering at once'. Isn't this 
'at once' the cause of slow loading?
I haven't done any profiling, but I would guess the registering is the 
slow part.

(3) Doesn't having many environments mean a loss of efficiency again?
Yes, I'd guess that looking things up in a chain of 100 environments is 
slower than looking them up in one gigantic environment.  Again, I 
haven't done any profiling, but I'd guess it would come close to being 
100 times worse, i.e. in practice order N time instead of order 1 time 
(but I'm sure these aren't the theoretical limits).
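
For instance (a crude illustration, not a real benchmark):

## attach 100 environments; each lookup then has to walk the search path
for (i in 1:100) {
    e <- new.env()
    assign(paste("fun", i, sep = ""), function() i, envir = e)
    attach(e, name = paste("env", i, sep = ""))
}
length(search())                              # ~100 entries longer than usual
system.time(for (k in 1:10000) get("fun1"))   # fun1 sits at the far end
for (i in 1:100) detach(paste("env", i, sep = ""), character.only = TRUE)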

But you were asking about delayed loading, so I was assuming that in
most cases you would only load a small subset of those 100 environments.
I haven't tried any big problems like yours, but I would be willing to
guess that registering is slower than O(N), so cutting down on the
number of things you register will give a big improvement in loading speed.

But you do have to remember the two pieces of advice you've been given 
in this thread:

  - nobody else has written a package with ten thousand methods, so 
you're likely to find things out that nobody else knows about.

  - The S4 object model is quite different from that of C++, so it 
probably doesn't make sense to have a direct correspondence between C++ 
classes and methods and R classes and methods.  There are probably much 
more efficient ways to get access to the functionality of your C++ library.
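
For example, rather than one R generic per C++ member function, a
single class holding an external pointer could route every member call
through one C entry point. Just an untested sketch, with all names
invented:

setClass("cppObject",
         representation(ptr = "externalptr", cppclass = "character"))

setMethod("$", "cppObject", function(x, name) {
    ## return a closure, so that obj$member(arg1, arg2) ends up as a
    ## single .Call into the wrapper library (entry point is invented)
    function(...) .Call("call_cpp_method", x@ptr, x@cppclass, name,
                        list(...), PACKAGE = "mypkg")
})

## usage would then look like
##   m <- new("cppObject", ptr = somePtr, cppclass = "cxxMatrix")
##   m$transpose()

That way there is one method to register per wrapped class instead of
one per member function.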

Duncan Murdoch


Re: [Rd] Speeding up library loading

2005-04-25 Thread Uwe Ligges
Ali - wrote:
(1) When R tries to load a library, does it load 'everything' in the 
library at once?
No, see ?lazyLoad
(2) Are there any options to 'load as you go'?
Well, this is the way R does it.
Uwe Ligges



Re: [Rd] Speeding up library loading

2005-04-25 Thread Ali -

UweL> Ali - wrote:
(1) When R tries to load a library, does it load 'everything' in the
library at once?

UweL> No, see ?lazyLoad
Are you sure Ali is talking about *package*s?
He did use the word library though, and most of us (including
Uwe!) know the difference...
(2) Are there any options to 'load as you go'?
UweL> Well, this is the way R does it
For packages yes, because of lazy loading, as Uwe mentioned above.
For libraries (you know: the things you get from compiling and
linking C code ...), it may be a bit different.
What do you really mean, packages or libraries, Ali?
Well, the terminology used here is a bit confusing. ?library shows
something like 'library(package)', and that's why I used the term
'library' for loading packages. The package does load some DLLs, but
what I meant by 'library' was actually 'package'.

The package I am working on currently has one big R file (~ 4 Mb) and this
causes at least two problems:

(1) Things are slow:
   (a) Installation with (LazyLoad = Yes) is slow. Then when the library is
loaded into R, the loading is slow too. So LazyLoad is not of much help.

   (b) Installation with (SaveImage = Yes) is -extremely- slow. To give you
some idea, compiling the associated C++ code takes around 10 minutes while
saving the R image takes more than 40 minutes (the package is a wrapper for
some C++ libraries; all the R functions do is call .Call, as in the sketch
at the end of this message). This doesn't improve the loading speed either.

   (c) Installation with (LazyLoad = Yes) AND (SaveImage = Yes) causes this
error:

   preparing package package_name for lazy loading
   make: *** [lazyload] Error 1
   *** Installation of package_name failed ***

   It is likely that this happens because of some memory problems.

(2) In the end, when the package is loaded, not surprisingly, a lot of
memory is taken. It seems that the whole (huge) file is loaded into R at
once, and turning LazyLoad on or off doesn't make a difference when the
package is big.
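
To make (1)(b) concrete, each of the roughly ten thousand generated
entries in that R file looks more or less like this (names invented for
illustration):

## one generic plus one method per C++ member function; the body is a
## one-line .Call into the compiled wrapper
setGeneric("transpose", function(x, ...) standardGeneric("transpose"))
setMethod("transpose", "cxxMatrix",
          function(x, ...) .Call("cxxMatrix_transpose", x@ptr,
                                 PACKAGE = "mypkg"))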



Re: [Rd] Speeding up library loading

2005-04-25 Thread Uwe Ligges
Ali - wrote:

UweL> Ali - wrote:
(1) When R tries to load a library, does it load 'everything' in the
library at once?

UweL> No, see ?lazyLoad
Are you sure Ali is talking about *package*s?
He did use the word library though, and most of us (including
Uwe!) know the difference...
(2) Are there any options to 'load as you go'?
UweL> Well, this is the way R does it
For packages yes, because of lazy loading, as Uwe mentioned above.
For libraries (you know: the things you get from compiling and
linking C code ...), it may be a bit different.
What do you really mean, packages or libraries, Ali?

Well, the terminology used here is a bit confusing. ?library shows
something like 'library(package)', and that's why I used the term
'library' for loading packages. The package does load some DLLs, but
what I meant by 'library' was actually 'package'.

The package I am working on currently has one big R file (~ 4 Mb) and
this causes at least two problems:

(1) Things are slow:
   (a) Installation with (LazyLoad = Yes) is slow. Then when the library
is loaded into R, the loading is slow too. So LazyLoad is not of much help.

   (b) Installation with (SaveImage = Yes) is -extremely- slow. To give
you some idea, compiling the associated C++ code takes around 10 minutes
while saving the R image takes more than 40 minutes (the package is a
wrapper for some C++ libraries; all the R functions do is call .Call).
This doesn't improve the loading speed either.

   (c) Installation with (LazyLoad = Yes) AND (SaveImage = Yes) causes
this error:

   preparing package package_name for lazy loading
   make: *** [lazyload] Error 1
   *** Installation of package_name failed ***

   It is likely that this happens because of some memory problems.

(2) In the end, when the package is loaded, not surprisingly, a lot of
memory is taken. It seems that the whole (huge) file is loaded into R at
once, and turning LazyLoad on or off doesn't make a difference when the
package is big.


A 4 Mb R file just containing .Call()s? I've never seen anything like that.
If these are all very small functions, lazy loading won't be of much
advantage, because you have to load the index file anyway.

You know, R including all base and recommended packages has just ~ 6Mb 
of R code. Are you really sure about your code?

Uwe Ligges


Re: [Rd] Speeding up library loading

2005-04-25 Thread Ali -

A 4 Mb R file just containing .Call()s? I've never seen anything like that.
If these are all very small functions, lazy loading won't be of much
advantage, because you have to load the index file anyway.

You know, R including all base and recommended packages has just ~ 6Mb of R 
code. Are you really sure about your code?
Positively. The wrapped library is actually much bigger than R; it brings a
few hundred new classes to R. The library has already been wrapped for other
languages like Java, and the loading speed in those languages is quite
reasonable. I cannot see any reason why this cannot be done with R too; as a
computational application, R is supposed to be efficient in all ways.

It seems that, so far, no packages as big as this one have been created for 
R. I would appreciate any clues from the development team for improving the 
performance of big packages in R.



Re: [Rd] Speeding up library loading

2005-04-25 Thread Roger D. Peng
Is it possible to break the package into multiple parts, perhaps 
like a bundle?  Then you could only load the parts that you need 
at any particular time.

-roger
Ali - wrote:

A 4 Mb R file just containing .Call()s? I've never seen anything like that.
If these are all very small functions, lazy loading won't be of much
advantage, because you have to load the index file anyway.

You know, R including all base and recommended packages has just ~ 6Mb 
of R code. Are you really sure about your code?

Positively. The wrapped library is actually much bigger than R; it
brings a few hundred new classes to R. The library has already been
wrapped for other languages like Java, and the loading speed in those
languages is quite reasonable. I cannot see any reason why this cannot
be done with R too; as a computational application, R is supposed to be
efficient in all ways.

It seems that, so far, no packages as big as this one have been created 
for R. I would appreciate any clues from the development team for 
improving the performance of big packages in R.

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/


Re: [Rd] Speeding up library loading

2005-04-25 Thread Ali -

Is it possible to break the package into multiple parts, perhaps like a 
bundle?  Then you could only load the parts that you need at any particular 
time.

It could be done, but the question is: what if one of the packages in the
bundle depends on all of the rest? And the bigger question is: why is lazy
loading not efficient when it comes to many small functions?



Re: [Rd] Speeding up library loading

2005-04-25 Thread Roger D. Peng
I think the reason, as Uwe already said, is that you have to load 
the lazyload index file, and in your case that file is likely to 
be as large as the R file itself.
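
For what it's worth, here is a small sketch of what gets read (using the
installed stats package only as a convenient example):

fb <- file.path(system.file("R", package = "stats"), "stats")
file.exists(paste(fb, c("rdb", "rdx"), sep = "."))  # the values and the index
e <- new.env()
lazyLoad(fb, envir = e)  # reads the whole index, makes one promise per object
length(ls(e))            # every name is known before anything is evaluated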

-roger
Ali - wrote:

Is it possible to break the package into multiple parts, perhaps like 
a bundle?  Then you could only load the parts that you need at any 
particular time.

It could be done, but the question is: what if one of the packages in
the bundle depends on all of the rest? And the bigger question is: why
is lazy loading not efficient when it comes to many small functions?



--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/


Re: [Rd] Speeding up library loading

2005-04-25 Thread Duncan Murdoch
Ali - wrote:

Is it possible to break the package into multiple parts, perhaps like 
a bundle?  Then you could only load the parts that you need at any 
particular time.

It could be done, but the question is: what if one of the packages in
the bundle depends on all of the rest? And the bigger question is: why
is lazy loading not efficient when it comes to many small functions?
Lazy loading just converts an object into a small instruction to load 
the object. If the object was already small, there's no advantage to 
that.  It's mainly designed to avoid memory use (some rarely used 
objects can be gigantic).
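
The same idea in miniature, using delayedAssign() (just an illustration):

## a promise stores the recipe for the value, not the value itself
delayedAssign("big", { Sys.sleep(2); rnorm(1e6) })
ls()     # "big" is visible, but nothing has been computed or stored yet
big[1]   # the first use pays the cost; after that the value is kept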

Duncan Murdoch


Re: [Rd] Speeding up library loading

2005-04-25 Thread Thomas Lumley
On Mon, 25 Apr 2005, Duncan Murdoch wrote:
Ali - wrote:

Is it possible to break the package into multiple parts, perhaps like a 
bundle?  Then you could only load the parts that you need at any 
particular time.

It could be done, but the question is: what if one of the packages in the
bundle depends on all of the rest? And the bigger question is: why is lazy
loading not efficient when it comes to many small functions?
Lazy loading just converts an object into a small instruction to load the 
object. If the object was already small, there's no advantage to that.  It's 
mainly designed to avoid memory use (some rarely used objects can be 
gigantic).

From a design point of view, the reason is that this isn't the problem lazy
loading is trying to solve. We didn't have a problem with packages that
have a huge number of small objects, but we did have a problem with packages
that had a moderate number of moderately large objects.

In addition, trying to optimize performance is not usually a good idea 
unless you can measure the performance of different implementations on 
real applications, and we didn't have applications like that.

-thomas


Re: [Rd] Speeding up library loading

2005-04-25 Thread Ali -

Lazy loading just converts an object into a small instruction to load the 
object. If the object was already small, there's no advantage to that.  
It's mainly designed to avoid memory use (some rarely used objects can be 
gigantic).
From a design point of view, the reason is that this isn't the problem lazy
loading is trying to solve. We didn't have a problem with packages that
have a huge number of small objects, but we did have a problem with packages
that had a moderate number of moderately large objects.

In addition, trying to optimize performance is not usually a good idea 
unless you can measure the performance of different implementations on real 
applications, and we didn't have applications like that.
Assume 100 C++ classes, each having 100 member functions. After wrapping
these classes into R, if the wrapping design is class-oriented we end up
with about 100 objects. At the same time, if the wrapping design is
function-oriented we end up with about 10,000 objects, which seems to be
too many small objects for lazy loading to help with.

I have tried wrapping exactly the same classes with R.oo (which is based
on S3), and the resulting package was much faster in both installation and
loading. The package became slow once I tried it with S4. I guess R.oo
makes the package more class-oriented, while S4 object orientation is
really function-oriented, causing all this friction in installation and
loading.

Is there any way to ask R to lazy-load each object as a 'bundle of S4 
methods with the same class'?
