Re: [Rd] Speeding up library loading
Ali - wrote:

> > Lazy loading just converts an object into a small instruction to load
> > the object. If the object was already small, there's no advantage to
> > that. [...] We didn't have a problem with packages that have a huge
> > number of small objects, but we did have a problem with packages that
> > had a moderate number of moderately large objects. [...]
>
> Assume 100 C++ classes, each class having 100 member functions. After
> wrapping these classes into R, if the wrapping design is class-oriented
> we end up with about 100 objects. If the wrapping design is
> function-oriented, we end up with about 10,000 objects, which is too
> many for lazy loading to handle efficiently. I have tried wrapping
> exactly the same classes with R.oo (based on S3), and the resulting
> package was much faster in both installation and loading. The package
> became slow once I tried it with S4. I guess R.oo makes the package more
> class-oriented, while S4 object-orientation is really function-oriented,
> causing all this friction in installation and loading. Is there any way
> to ask R to lazy-load each object as a 'bundle of S4 methods with the
> same class'?

I don't think so. There are ways to load a bundle of objects all at once
(put them in an environment, attach the environment), but S4 methods
aren't self-contained; they need to be registered with the system. You
could probably write a function to load them and register them all at
once, but I don't think it exists now.

Duncan Murdoch

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
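Duncan's "put them in an environment, attach the environment" suggestion
can be sketched as follows. The function names ('Matrix.add' and so on)
are hypothetical stand-ins for wrapper functions:

```r
## A minimal sketch of bundle loading: collect a class's wrapper
## functions into one environment, then attach it to the search path
## so a single operation makes the whole group visible.
bundle <- new.env()
assign("Matrix.add",   function(a, b) a + b, envir = bundle)
assign("Matrix.scale", function(a, k) a * k, envir = bundle)

attach(bundle, name = "wrapper:Matrix")

Matrix.add(1, 2)      # functions are now found via the search path
Matrix.scale(3, 10)

detach("wrapper:Matrix")
```

This works for plain functions precisely because they are self-contained;
as Duncan notes, S4 methods would additionally need to be registered with
the dispatch system.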
Re: [Rd] Speeding up library loading
> > Assume 100 C++ classes, each class having 100 member functions. [...]
> > Is there any way to ask R to lazy-load each object as a 'bundle of S4
> > methods with the same class'?
>
> I don't think so. There are ways to load a bundle of objects all at once
> (put them in an environment, attach the environment), but S4 methods
> aren't self-contained; they need to be registered with the system. You
> could probably write a function to load them and register them all at
> once, but I don't think it exists now.
>
> Duncan Murdoch

(1) What is the difference between loading and registering objects in R?

(2) You are talking about 'loading and registering at once'. Isn't this
'at once' the cause of slow loading?

(3) Doesn't having many environments mean a loss of efficiency again?
Re: [Rd] Speeding up library loading
Ali - wrote:

> > Assume 100 C++ classes, each class having 100 member functions. [...]
> > Is there any way to ask R to lazy-load each object as a 'bundle of S4
> > methods with the same class'?
>
> I don't think so. There are ways to load a bundle of objects all at
> once [...] but I don't think it exists now.
>
> (1) What is the difference between loading and registering objects in R?

Loading just creates the object. Registering it is what setMethod() and
similar calls do: they let the system know that it should call that
function in response to a call to the generic with a certain signature,
and so on.

> (2) You are talking about 'loading and registering at once'. Isn't this
> 'at once' the cause of slow loading?

I haven't done any profiling, but I would guess the registering is the
slow part.

> (3) Doesn't having many environments mean a loss of efficiency again?

Yes, I'd guess that looking things up in a chain of 100 environments is
slower than looking them up in one gigantic environment. Again, I
haven't done any profiling, but I'd guess it would come close to being
100 times worse, i.e. in practice order N time instead of order 1 time
(but I'm sure these aren't the theoretical limits).
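Duncan's loading/registering distinction can be made concrete with a
small sketch: defining a class and a function merely creates objects,
while setGeneric()/setMethod() register with the S4 dispatch system. The
class and generic names here are hypothetical:

```r
library(methods)  # S4 machinery

## Loading: this merely creates objects (a class definition).
setClass("Matrix2", slots = c(data = "numeric"))

## Registering: setGeneric()/setMethod() tell the dispatch system to
## call this function for a scaleBy() call whose first argument is a
## "Matrix2" object.
setGeneric("scaleBy", function(x, k) standardGeneric("scaleBy"))
setMethod("scaleBy", "Matrix2", function(x, k) {
  new("Matrix2", data = x@data * k)
})

m <- new("Matrix2", data = c(1, 2, 3))
scaleBy(m, 10)@data   # dispatch finds the registered method
```

In a package that wraps 10,000 member functions one-to-one, each of
these registrations has to run, which is where the cost discussed in
this thread comes from.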
But you were asking about delayed loading, so I was assuming that in
most cases you would only load a small subset of those 100 environments.
I haven't tried any big problems like yours, but I would be willing to
guess that registering is slower than O(N), so cutting down on the
number of things you register will give a big improvement in loading
speed.

But you do have to remember the two pieces of advice you've been given
in this thread:

- Nobody else has written a package with ten thousand methods, so you're
likely to find things out that nobody else knows about.

- The S4 object model is quite different from that of C++, so it
probably doesn't make sense to have a direct correspondence between C++
classes and methods and R classes and methods. There are probably much
more efficient ways to get access to the functionality of your C++
library.

Duncan Murdoch
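One class-oriented alternative, in the spirit Ali describes, is a sketch
like the following: one constructor per C++ class that returns its
member functions as closures, so only ~100 top-level names exist and
nothing is registered with S4. All names here are hypothetical, and a
real wrapper would call .Call() inside the closures:

```r
## One object per class instead of one registered method per member
## function: the constructor bundles the "member functions" as closures.
makeVector3 <- function(x, y, z) {
  self <- list(x = x, y = y, z = z)
  self$norm  <- function() sqrt(self$x^2 + self$y^2 + self$z^2)
  self$scale <- function(k) makeVector3(self$x * k, self$y * k, self$z * k)
  self
}

v <- makeVector3(3, 4, 0)
v$norm()            # 5
v$scale(2)$norm()   # 10
```

The trade-off is that such objects don't participate in S4 dispatch at
all; they simply avoid the per-method registration cost.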
Re: [Rd] Speeding up library loading
Ali - wrote:

> (1) When R tries to load a library, does it load 'everything' in the
> library at once?

No, see ?lazyLoad.

> (2) Are there any options to 'load as you go'?

Well, this is the way R does it.

Uwe Ligges
Re: [Rd] Speeding up library loading
> UweL> Ali - wrote:
> UweL> (1) When R tries to load a library, does it load 'everything' in
> UweL> the library at once?
>
> UweL> No, see ?lazyLoad
>
> Are you sure Ali is talking about *package*s? He did use the word
> library, though, and most of us (including Uwe!) know the difference...
>
> UweL> (2) Are there any options to 'load as you go'?
>
> UweL> Well, this is the way R does it
>
> For packages yes, because of lazy loading, as Uwe mentioned above. For
> libraries (you know: the things you get from compiling and linking C
> code...), it may be a bit different. What do you really mean, packages
> or libraries, Ali?

Well, the terminology used here is a bit confusing. ?library shows
something like 'library(package)', and that's why I used the term
'library' for loading packages. The package does load some DLLs, but
what I meant by 'library' was actually a package.

The package I am working on currently has one big R file (~4 Mb), and
this causes at least two troubles:

(1) Things are slow:

(a) Installation with (LazyLoad = Yes) is slow. Then when the package is
loaded into R, the loading is slow too. So LazyLoad is not of big help.

(b) Installation with (SaveImage = Yes) is -extremely- slow. To give you
some idea, compiling the associated C++ code takes around 10 minutes,
while saving the R images takes more than 40 minutes (the package is a
wrapper for some C++ libraries; all the R functions do is call .Call).
This doesn't improve the loading speed either.

(c) Installation with (LazyLoad = Yes) AND (SaveImage = Yes) causes this
error:

preparing package package_name for lazy loading
make: *** [lazyload] Error 1
*** Installation of package_name failed ***

It is likely that this happens because of some memory problems.

(2) After all, when the package is loaded, not surprisingly, loads of
memory is taken. It seems that the whole (huge) file is loaded into R at
once, and turning LazyLoad on or off doesn't make a difference when the
package is big.
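For reference, the switches Ali mentions are fields in the package's
DESCRIPTION file. A minimal sketch (the package name and other fields
are hypothetical, and as point (c) shows, combining LazyLoad and
SaveImage is problematic):

```
Package: bigwrapper
Version: 0.1
Title: Wrapper for a Large C++ Library
License: GPL-2
LazyLoad: yes
SaveImage: no
```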
Re: [Rd] Speeding up library loading
Ali - wrote:

> Well, the terminology used here is a bit confusing. [...] The package I
> am working on currently has one big R file (~4 Mb) and this causes at
> least two troubles:
>
> (1) Things are slow: installation with (LazyLoad = Yes) is slow, and so
> is loading; installation with (SaveImage = Yes) is -extremely- slow
> (saving the R images takes more than 40 mins); and (LazyLoad = Yes) AND
> (SaveImage = Yes) together cause an installation error.
>
> (2) After all, when the package is loaded, not surprisingly, loads of
> memory is taken.
4 Mb of R code just containing .Call()s? I've never seen something like
that. If these are all very small functions, lazy loading won't be of
much advantage, because you have to load the index file anyway. You
know, R including all base and recommended packages has just ~6 Mb of R
code. Are you really sure about your code?

Uwe Ligges
Re: [Rd] Speeding up library loading
> 4 Mb of R code just containing .Call()s? I've never seen something
> like that. If these are all very small functions, lazy loading won't be
> of much advantage, because you have to load the index file anyway. You
> know, R including all base and recommended packages has just ~6 Mb of R
> code. Are you really sure about your code?

Positively. The wrapped library is actually much bigger than R; it
brings a few hundred new classes to R. The library has already been
wrapped for other languages like Java, and the loading speed for those
languages is quite reasonable. I cannot see any reason why this cannot
be done with R too -- as a computational application, R is supposed to
be efficient in all ways. It seems that, so far, no packages as big as
this one have been created for R. I would appreciate any clues from the
development team for improving the performance of big packages in R.
Re: [Rd] Speeding up library loading
Is it possible to break the package into multiple parts, perhaps like a
bundle? Then you could only load the parts that you need at any
particular time.

-roger

Ali - wrote:

> Positively. The wrapped library is actually much bigger than R; it
> brings a few hundred new classes to R. [...] I would appreciate any
> clues from the development team for improving the performance of big
> packages in R.

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
Re: [Rd] Speeding up library loading
> Is it possible to break the package into multiple parts, perhaps like a
> bundle? Then you could only load the parts that you need at any
> particular time.

It could be done, but the question is: what if one of the packages in
the bundle depends on all of the rest? And the bigger question is: why
is lazy loading not efficient when it comes to many small functions?
Re: [Rd] Speeding up library loading
I think the reason, as Uwe already said, is that you have to load the
lazyload index file, and in your case that file is likely to be as large
as the R file itself.

-roger

Ali - wrote:

> It could be done, but the question is: what if one of the packages in
> the bundle depends on all of the rest? And the bigger question is: why
> is lazy loading not efficient when it comes to many small functions?

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
Re: [Rd] Speeding up library loading
Ali - wrote:

> > Is it possible to break the package into multiple parts, perhaps like
> > a bundle? Then you could only load the parts that you need at any
> > particular time.
>
> It could be done, but the question is: what if one of the packages in
> the bundle depends on all of the rest? And the bigger question is: why
> is lazy loading not efficient when it comes to many small functions?

Lazy loading just converts an object into a small instruction to load
the object. If the object was already small, there's no advantage to
that. It's mainly designed to avoid memory use (some rarely used objects
can be gigantic).

Duncan Murdoch
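The "small instruction" is essentially a promise; delayedAssign() gives
a user-level sketch of the same mechanism that lazy loading applies per
object (the function and variable names here are illustrative):

```r
## A rough illustration of lazy loading's per-object mechanism: replace
## the value with a promise that fetches it only on first use.
make_big_object <- function() {
  message("loading the big object now")  # runs only when forced
  seq_len(1e6)
}

delayedAssign("x", make_big_object())  # 'x' is now just an instruction

## Nothing large is in memory yet; the first access forces the promise:
length(x)   # triggers make_big_object(), then returns 1000000
```

For an object that is itself tiny, the promise costs about as much as
the object it replaces, which is Duncan's point.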
Re: [Rd] Speeding up library loading
On Mon, 25 Apr 2005, Duncan Murdoch wrote:

> Ali - wrote:
>
> > It could be done, but the question is: what if one of the packages in
> > the bundle depends on all of the rest? And the bigger question is:
> > why is lazy loading not efficient when it comes to many small
> > functions?
>
> Lazy loading just converts an object into a small instruction to load
> the object. If the object was already small, there's no advantage to
> that. It's mainly designed to avoid memory use (some rarely used
> objects can be gigantic).

From a design point of view, the reason is that this isn't the problem
lazy loading is trying to solve. We didn't have a problem with packages
that have a huge number of small objects, but we did have a problem with
packages that had a moderate number of moderately large objects. In
addition, trying to optimize performance is not usually a good idea
unless you can measure the performance of different implementations on
real applications, and we didn't have applications like that.

-thomas
Re: [Rd] Speeding up library loading
> Lazy loading just converts an object into a small instruction to load
> the object. [...]
>
> From a design point of view, the reason is that this isn't the problem
> lazy loading is trying to solve. [...]

Assume 100 C++ classes, each class having 100 member functions. After
wrapping these classes into R, if the wrapping design is class-oriented
we end up with about 100 objects. If the wrapping design is
function-oriented, we end up with about 10,000 objects, which is too
many for lazy loading to handle efficiently. I have tried wrapping
exactly the same classes with R.oo (based on S3), and the resulting
package was much faster in both installation and loading. The package
became slow once I tried it with S4. I guess R.oo makes the package more
class-oriented, while S4 object-orientation is really function-oriented,
causing all this friction in installation and loading.

Is there any way to ask R to lazy-load each object as a 'bundle of S4
methods with the same class'?