[Rd] R installation glitches.
Dear list, I installed R 2.3.1 on a solaris machine. The first time, I used the wrong option --enable-shlib, the installation went smoothly (glitch 1), but of course did not generate libR.so. So, I ran ./configure again with --enable-R-shlib, make and make install went well but still there was no libR.so generated (glitch 2). Finally, I removed the build directory and started fresh, and everything went well. I should have used the correct options, or should have used make distclean afterwards, but the fact that configure/make did not complain may be considered as bugs. Regards, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Provide both shlib and standard versions of R?
> Why guess? There are accurate statements in the R-admin manual, I read the manual 2 years ago, and the info I got was still correct. > and the RH RPM change was discussed on this list in 2006: > https://stat.ethz.ch/pipermail/r-devel/2006-January/036118.html I simply do not know RPMs have been built with this option on, and there is no definite place/word I can be assured about this. > No, for the reasons given in the R-admin manual. Even the reason I gave in my first email was not up to date (most binary distributions are compiled with this option), my suggest is still valid: why not provide two binaries? Most users can use standard R without sacrificing performance and embedding applications use libR.so and, if needed, a separate binary R_embed. Cheers, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Provide both shlib and standard versions of R?
> then either build your own with correct options or talk to your > distribution's packaging team. It seems that my knowledge about this option is outdated. When I first encountered this problem two years ago, the R/rpm distribution came with no libR.so. I was told that --enable-R-shlib would lead to 10% - 20% performance loss, and I had to re-compile R if I need to embed it. So I guess performance is no longer an issue and shared libraries are provided as default on all platforms now? I certainly welcome this change and I apologize for my unfounded accusation to R. BTW, shouldn't --enable-R-shlib be yes by default during ./configure?. Cheers, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Provide both shlib and standard versions of R?
> That is not true, almost all binaries come with R as shared library - > it is in fact the default on Mac OS X and Windows. Most Linux > distributions provide a shared library binary as well. This would be good news. But at least, under linux, ./configure --help --enable-R-shlibbuild the shared/dynamic library 'libR' [no] This option is not enabled by default. Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Provide both shlib and standard versions of R?
Dear list, To operate R from python via a Python package rpy, R has to be compiled with --enable-R-shlib. This is troublesome since none of the binary distributions (except for windows?) is built with this option so rpy users have to build R from source. This can be quite a challenge, especially on platforms like macOSX. Is it possible to provide both standard and shlib version of R by default? This may not be too hard for R (link twice? I am not quite sure.) but will benefit all applications that embed R. Cheers, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R plot display problem under windows when using python rpy module.
> I was wondering > what is the difference between rpy and RSPython. The main difference is that rpy does one-way communication, is simpler than RSPython, than is easier to use. > So I have a question for you - why rpy instead of RSPython? Have you > tried both, and can you give a comparison of pros and cons? I am not an expert on RSPython. I tried RSPython and rpy and chose rpy for the following reasons: 1. rpy is in active maintenance. As you can see from rpy webpage, rpy supports all versions of R till 2.2.1, python 2.4 and provides binary installers like rpy-0.4.6-R-2.0.0-to-2.2.1-.xxx . On the contrary, a windows installer for RSPython is for R-1.4.0, python 2.2.0. 2. RSPython uses mainly its RS.call function. This is troublesome and is the main reason why I use rpy. For example, in RSPython: RS.call('rnorn', 10) in rpy: r.rnorm(10) RSPython does provide similar usage now (maybe after I became a rpy user) but RSPython: from RS import R R.rnorm(10) # works R.dev.off() # does not work Rpy solves this problem by (and the mechanism is clearly described in the rpy manual): from rpy import * r.rnorm(10) r.dev_off() rpy also provides r('''arbitrary R piece of code''') which is immensely useful to run big trunk of R code. rpy also claims that the performance of rpy is better but I have no comparison data here. > there is nothing wrong with having a bias, > but the goals listed imply that it might be "better" in those areas... > which may or may not be true. I agree. > (To answer one question you might have, I was trying to invoke Python > code from inside R and was doing it the opposite direction from you). Exactly. We are doing different things so while you have to use RSPython, I have a choice between RSPython and rpy. In my case, all the real computations are done in C/C++, wrapped by Python. I could have wrapped my C/C++ code in R but R is not good at wrapping C++ class hierarchy because of the different OOP mechanisms. When I need the statistical analysis and plotting capacity of R, I use rpy. As a matter of fact, since Python is a powerful programming language than can handle string, text file etc better than R, I usually prepare my data in python and pass them to R using rpy. Cheers, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R plot display problem under windows when using python rpy module.
On 1/9/06, Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > How is Rpy calling R? Presumably R is running single-threaded, and the > problem is likely to be that Rpy is using blocking I/O on the R process > and hence blocking the GUI callbacks that drive the window. > > The not-so-simple answer is not to do it that way. It might be well > sufficient to turn windows() buffering off -- see its help page. Using >>> from rpy import * >>> r.options(windowsBuffered=False) >>> r.plot(0) solves the problem. Thank you very much! I will suggest that rpy developers address this problem. Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] R plot display problem under windows when using python rpy module.
Dear list, Rpy is a python module that provides python interface to R. The following simple commands >>> from rpy import * >>> r.plot(0) is supposed to create a window that displays the result of plot(0). However, we observe that 1. Under *nix, rpy+R+python work as expected 2. Under windows, python + pythonWin32 (a python GUI provided by the pywin32 module), work as expected 3. Under windows, if we run the commands from command line or IDLE (a simple python IDE), a window will be created, but the figure will not be drawn. Then, if we run r.plot(2), the result of plot(0) will be drawn. plot(2) will be displayed when the next drawing command is executed. Since R works well in most cases, I do not think this is a R problem. However, can anyone tell me what *might* block the figures from being displayed? In other word, what might PythonWin have provided to enable correct rendering of the figures? If I have to trace to the sources, what portion of the R code should I have a look at? (It is good that python/R/rpy are all open source). Many thanks in advance. Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.Date() , feature or bug?
This problem was brought up by Xu Neng <[EMAIL PROTECTED]> in the rpy-list. Remember the old times when computer guys thought the year 2000 was too far away to be worried? In my case, the dates like "-06-6" and "-09-09" are used as special missing codings for dates. If R cannot handle dates later than the year 5000, it is fine. Please just let the world knows about it. What is really annoying is the fact that the default date "1970-01-01" is silently used instead of raising exception. If you are not very careful, you may not even notice this trick. This is the bug I'm referring to. Hopefully, the R experts can handle this bug seriously. Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.Date() , feature or bug?
> Well, bug, if you really want to call it a bug that you cannot represent > the year . ;-) So I guess we need a warning message and a line in help(as.Date)? Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] as.Date() , feature or bug?
Under linux and windows, > as.Date("-06-06") [1] "-06-06" > as.Date("-07-07") [1] "1970-01-01" > Feature? Bug? help(as.Date) does not mention this case. Bo. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] efficiency of sample() with prob.
> You chose to report just one extremely favourable example, < ignore> > I do think you are being `careless with the truth'. I chose to report whatever I got and whatever I felt the result was. It was not a scientific report and it was up to you (the R-team) to validate my result and make further investigations. When I was asked for a more thorough research in this field (and thus take the responsibility to my results), I said no since I did not have enough expertise and time. After all, I could have chosen not to report anything at all. If this mailing list only accepts serious contributions from field professionals, it is time for me to quit. > Hmm. A sample every 100ns and generating samples several times faster > than tabulating them is something to be impressed by. > Not in our tests. Did you try my 5 out of 10,000 non-zero probablilties > distribution, for example? No. I did not try. Intuitively, Walker's method may be slow because of the memory allocation and table setup stuff. It should be used with large sample and small prob vector. Bisection method should be quicker when the prob vector is large (so linear search is slow) and the sample size is relatively small. I thought bisection would be uniformly quicker than the linear search method since there is no additional cost other than a few more variables. If you have done enough experiments and know the fastest method in each case, it should be straightforward to implement a hybrid method where the best method is chosen based on the lengths of sample and prob vector. I might have been reporting partial results, but the 80 times speed-up in some cases was not faked. Cheers, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] efficiency of sample() with prob.
> It seems the distribution used for your tests is maximally favourable to > your proposal (not uncommon in papers, but not very honest). I did not have time to do a thorough test. I was simply reporting what I had done so any implied dishonesty is unfair to me. > Changing how this is done will break the reproducibility of past programs, > and we don't really want to introduce yet more options. So it seems only > worth doing when there are substantial speed gains. I agree that Walker's method may be slow at some cases but there is nothing to brag about R's current method either. The bottom line is: a bisection method will be uniformly better than the linear search method that R uses right now. This does not qualify as 'substantial speed gain' though. Cheers, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How difficult is it to wrap a large C++ library with R?
> There was a couple of posts about this recently: > > https://stat.ethz.ch/pipermail/r-devel/2005-April/subject.html > I am too new to this list to see these discussions. They are quite interesting! Certainly I was not aware of the big differences between C++/Python OOP and S3/S4 OOP. This explains why I skipped the OOP part of the red S-Plus manual when I was learning S-Plus: I was not able to get the ideas although I was an experienced C++ programmer. Regardless of the question of 'which OOP is better', I certainly agree with Nathan's opinion that R has lost many users/developers who only know C++/Python OOP. I think I would better continue to use Python to wrap my C++ code (until Ali find a good way to automatically wrap C++ code, or SWIG adds R support). It is simply much easier and more efficient. Destructing my C++ classes into C pieces and reconstruct them in S4 way would be tedious, and the performance penalty will most possibly be disastrous for this simulation environment. Cheers, Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How difficult is it to wrap a large C++ library with R?
> SWIG is a very nice idea. ... SWIG is a wonderful tool that makes wrapping C++ code super-easy. I can write pure C++ code and even test them by writing a main() function. Then, using a simple interface file, all my C++ classes becomes Python (shadow) classes like magic. It will take much more time to develop simuPOP if I had to modify interface files manually for every change to the C++ classes. > I hope to have a prototype of a general mechanism for generating > bindings to other code for R sometime over the summer. > It has been hatching for some time and it is primarily a matter of > getting the time to implement the existing ideas. Since you already know how to wrap C++ classes to R classes, maybe it is a good idea to ask SWIG developers how to use SWIG to bridge them. After all, SWIG has a mature set of tools to parse C++ code and generate interfaces to more than a dozen scripting languages. Personally, if I have to write interfaces manually for my 40+ classes and a lot more functions, I would rather devote time in adding R support for SWIG. Thanks. Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How difficult is it to wrap a large C++ library with R?
> > * Wrap C++ class hierarchy. Virtual functions need to be supported. > > (SWIG can generate Python shadow classes that behave almost exactly > > like the underlying C++ classes) > > This is hard to do in R, because the R object model is quite different > from that of C++ (whereas Python's is more similar). I guess I will have to learn R OOP in detail first. It would be terrible if I will have to re-write every C++ class in R (and without inheritance)! > > * Direct access to C++ data structure. For example, an object may keep > > a C array internally. I will need a method (maybe wrap this data into > > a R object) to read/write this array directly. > > That's not too hard provided you use C++ code to do the actual access. > That is, you write an R function that calls C++ code to do the work. > It's a lot harder if you want to keep it all in R, because it doesn't > understand C++ type definitions, alignment conventions, etc. So this has to be done through functions. In Python, I can create an object to wrap C array that can be accessed just like regular list. I guess it would be difficult to create a new data type in R. Anyway, where can I find the most complete information about implementation of R/OOP at C/C++ level? (Or should I write classes in R and call C++ functions for every member function?) I have not read http://cran.r-project.org/doc/manuals/R-exts.pdf in detail but my impression is that almost everything is done at a function (not object) level using .C(). Thanks. Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] How difficult is it to wrap a large C++ library with R?
Dear list, I have developed a forward-time population genetics simulation environment simuPOP, which is a set of C++ (template) classes/functions wrapped by SWIG as Python libraries. R is used extensively as plotting and statistical analysis engine through RPy package. I use Python to wrap simuPOP since most the following can be easily done using SWIG or Python C API. However, since Python is less used by bioinformaticists and geneticists, I am asked again and again why I do not wrap simuPOP with R (R is also OOP etc...). Although I am a long time R user, I am not familiar with package writing in R. From what I read from R website, it is easy to wrap individual C/C++ functions but not C++ classes. Can anyone take the time to review the following and tell me if they can be done (easily or need effort) using R? If most of the answers are yes, it may be a good idea to switch to R. * Wrap C++ class hierarchy. Virtual functions need to be supported. (SWIG can generate Python shadow classes that behave almost exactly like the underlying C++ classes) * Be able to do this: evolve(ops=c(obj1, obj2, obj3)) Internally, evolve will call virtual function fun() of obj1, obj2, .etc. obj1, obj2, ... are objects derived from the same base class. * Direct access to C++ data structure. For example, an object may keep a C array internally. I will need a method (maybe wrap this data into a R object) to read/write this array directly. * create and read/write a R list at C++ level. (Need API for R/list read/write) * pass a user defined R function to C++ function. Call it and obtain result. * evaluate an R expression (passed as string) from within C++ function and obtain result. * pass C++ exceptions to R * be able to use C++ features like template, STL. * organize manual by objects, not functions. (I notice that manuals of R libraries are simple function references.) Many thanks in advance. -- Bo Peng Department of Statistics Rice University http://bp6.stat.rice.edu:8080/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] efficiency of sample() with prob.
On 6/24/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > `Research' involves looking at all the competitor methods, devising a > near-optimal strategy and selecting amongst methods according to that > strategy. It is not a quick fix we are looking for but something that > will be good for the long term. > I am sorry but I am afraid that I do not have enough time and background knowledge to do a thorough research in this area. I have tried bisection search method and the alias method, the latter has greatly improved the performance of my bioinformatics application. Since this method is the only one mentioned in Knuth's book, I have no idea about other alternatives. Attached is a slightly improved version of the alias method. It may be helpful to people having similar problems. Thanks. -- Bo Peng Department of Statistics Rice University. http://bp6.stat.rice.edu:8080/ /* replace probSampleReplace in src/main/random.c */ static void ProbSampleReplace(int n, double *p, int *perm, int nans, int *ans) { /* allocate memory for a, p, H and L is the second half of a */ double * q = Calloc(n, double); int * a = Calloc(2*n, int); int * LL = a+n, * HH = a+2*n-1; /* start of H and L vector */ int * L = LL, * H = HH; /* end of H and L vector */ int i, j, k; double rU; /* U[0,1)*n */ /* set up alias table */ /* initialize q with n*p0,...n*p_n-1 */ for(i=0; i= 1.) *H-- = i; else *L++ = i; } while( L != LL && H != HH ) { j = *(L-1); k = *(H+1); a[j] = k; q[k] += q[j] - 1; L--; /* remove j from L */ if( q[k] < 1. ) { *L++ = k; /* add k to L */ ++H; /* remove k */ } } /* generate sample */ for (i = 0; i < nans; ++i) { rU = unif_rand() * n; k = (int)(rU); rU -= k; /* rU becomes rU-[rU] */ if( rU < q[k] ) ans[i] = k+1; else ans[i] = a[k]+1; } Free(a); Free(q); } __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] efficiency of sample() with prob.
> We suggest that you take up your own suggestion, research this area and > offer the R project the results of your research for consideration as your > contribution. I implemented Walker's alias method and re-compiled R. Here is what I did: 1. replace function ProcSampleReplace in R-2.1.0/src/main/random.c with the following one static void ProbSampleReplace(int n, double *p, int *perm, int nans, int *ans) { /* allocate memory for a, p and HL */ double * q = Calloc(n, double); int * a = Calloc(n, int); int * HL = Calloc(n, int); int * H = HL; int * L = HL+n-1; int i, j, k; double rU; /* U[0,1)*n */ /* set up alias table */ /* initialize q with n*p0,...n*p_n-1 */ for(i=0; i= 1.) *H++ = i; else *L-- = i; } while( H != HL && L != HL+n-1) { j = *(L+1); k = *(H-1); a[j] = k; q[k] += q[j] - 1; L++; /* remove j from L */ if( q[k] < 1. ) { *L-- = k; /* add k to L */ --H; /* remove k */ } } /* generate sample */ for (i = 0; i < nans; ++i) { rU = unif_rand() * n; k = (int)(rU); rU -= k; /* rU becomes rU-[rU] */ if( rU < q[k] ) ans[i] = k+1; else ans[i] = a[k]+1; } Free(HL); Free(a); Free(q); } 2. make and make install 3. test the new sample function by code like > b=sample(seq(1,100), prob=seq(1,100), replace=TRUE, size=100) > table(b)/100*sum(seq(1,100)) 4. run the following code in current R 2.1.0 and updated R. for(prob in seq(1,4)){ for(sample in seq(1,4)){ x = seq(1:(10^prob)) # short to long x p = abs(rnorm(length(x))) # prob vector times = 10^(6-prob) # run shorter cases more times Rprof(paste("sample_", prob, "_", sample, ".prof", sep='')) for(t in seq(1,times)){ sample(x, prob=p, size=10^sample, replace=TRUE ) } Rprof(NULL) } } Basically, I tried to test the performance of sample(replace=TRUE, prob=..) with different length of x and size. 5. process the profiles and here is the result. p: length of prob and x size: size of sample cell: execution time of old/updated sample() size\p10 10^210^3 10^4 10 2.4/1.6 0.32/0.22 0.20/0.08 0.24/0.06 10^2 3.1/2.6 0.48/0.28 0.28/0.06 0.30/0.06 10^3 11.8/11.11.84/1.14 0.94/0.18 0.96/0.08 10^4 96.8/96.615.34/9.68 7.54/1.06 7.48/0.16 run: 1100010010 times We can see that the alias method is quicker than the linear search method in all cases. The performance difference is greatest (>50 times) when the original algorithm need to search in a long prob vector. I have not thoroughly tested the new function. I will do so if you (the developers) think that this has the potential to be incorporated into R. Thanks. Bo Peng Department of Statistics Rice University __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] efficiency of sample() with prob.
On 6/21/05, Vadim Ogranovich <[EMAIL PROTECTED]> wrote: > In his "Introduction to Probability Models" Sheldon Ross describes (sec > 11.4.1, 8th edition) the alias method for such weighted sampling. > It is based on some decomposition of the original distribution (the > weights) into a mixture of two-point distributions. This sounds like Walker's alias method for weighted sampling. I looked through Knoth's 'the art of computer programming' and find this algorithm. I implemented this one but it is just as efficient as the bisection lookup method in my case. The reason is that the setup of this algorithm is complicated so it is suited for getting large sample from short weighted sequences. Anyway, I do suggest R developers try this algorithm for sample with replacement. A sample code can be found at http://statistik.wu-wien.ac.at/arvag/monograph/arvag-src/algo03_03.c . BTW, does anyone know a quicker algorithm to set up the internal table of the alias method? Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] efficiency of sample() with prob.
Dear list, A while ago, Vadim asked opinions on improving efficiency of sample() with prob, e.g. sample with replacement with weight. ( https://stat.ethz.ch/pipermail/r-devel/2004-September/030844.html ) He did not post what he ended up with this problem though. I am having exactly the same problem. I need to sample with replacement from a population of size 10 million with fitness values for each individual. sample() is too slow for this purpose. I implement a bisection search algorithm. It is about 30% faster than the linear search one but still not good enough to me. (I can post the function if needed). Does anybody have some good ideas? The only other idea I have is using a faster (but worse) random number generator just for this application. Thanks. Bo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel