[Rd] R installation glitches.

2006-07-06 Thread Bo Peng
Dear list,

I installed R 2.3.1 on a solaris machine. The first time, I used the
wrong option --enable-shlib,  the installation went smoothly (glitch
1), but of course did not generate libR.so. So, I ran ./configure
again with --enable-R-shlib, make and make install went well but still
there was no libR.so generated (glitch 2). Finally, I removed the
build directory and started fresh, and everything went well.

I should have used the correct options, or should have used make
distclean afterwards, but the fact that configure/make did not
complain may be considered as bugs.

Regards,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Provide both shlib and standard versions of R?

2006-01-16 Thread Bo Peng
> Why guess?  There are accurate statements in the R-admin manual,

I read the manual 2 years ago, and the info I got was still correct.

> and the RH RPM change was discussed on this list in 2006:
> https://stat.ethz.ch/pipermail/r-devel/2006-January/036118.html

I simply do not know RPMs have been built with this option on, and
there is no definite place/word I can be assured about this.

> No, for the reasons given in the R-admin manual.

Even the reason I gave in my first email was not up to date (most
binary distributions are compiled with this option), my suggest is
still valid: why not provide two binaries? Most users can use standard
R without sacrificing performance  and embedding applications use
libR.so and, if needed, a separate binary R_embed.

Cheers,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Provide both shlib and standard versions of R?

2006-01-15 Thread Bo Peng
> then either build your own with correct options or talk to your
> distribution's packaging team.

It seems that my knowledge about this option is outdated.  When I
first encountered this problem two years ago, the R/rpm distribution
came with no libR.so. I was told that --enable-R-shlib would lead to
10% - 20% performance loss, and I had to re-compile R if I need to
embed it.

So I guess performance is no longer an issue and shared libraries are
provided as default on all platforms now? I certainly welcome this
change and I apologize for my unfounded accusation to R.

BTW, shouldn't --enable-R-shlib be yes by default during ./configure?.

Cheers,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Provide both shlib and standard versions of R?

2006-01-15 Thread Bo Peng
> That is not true, almost all binaries come with R as shared library -
> it is in fact the default on Mac OS X and Windows. Most Linux
> distributions provide a shared library binary as well.

This would be good news. But at least, under linux,

./configure --help
  --enable-R-shlibbuild the shared/dynamic library 'libR' [no]

This option is not enabled by default.

Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Provide both shlib and standard versions of R?

2006-01-15 Thread Bo Peng
Dear list,

To operate R from python via a Python package rpy, R has to be
compiled with --enable-R-shlib.  This is troublesome since none of the
binary distributions (except for windows?) is built with this option
so rpy users have to build R from source. This can be quite a
challenge, especially on platforms like macOSX.

Is it possible to provide both standard and shlib version of R by
default? This may not be too hard for R (link twice? I am not quite
sure.) but will benefit all applications that embed R.

Cheers,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R plot display problem under windows when using python rpy module.

2006-01-09 Thread Bo Peng
> I was wondering
> what is the difference between rpy and RSPython.

The main difference is that rpy does one-way communication, is simpler
than RSPython, than is easier to use.

> So I have a question for you - why rpy instead of RSPython? Have you
> tried both, and can you give a comparison of pros and cons?

I am not an expert on RSPython. I tried RSPython and rpy and chose rpy
for the following reasons:

1. rpy is in active maintenance. As you can see from rpy webpage, rpy
supports all versions of R till 2.2.1, python 2.4 and provides binary
installers like rpy-0.4.6-R-2.0.0-to-2.2.1-.xxx . On the contrary,
a windows installer for RSPython is for R-1.4.0, python 2.2.0.

2. RSPython uses mainly its RS.call function. This is troublesome and
is the main reason why I use rpy. For example,

in RSPython:
  RS.call('rnorn', 10)
in rpy:
  r.rnorm(10)

RSPython does provide similar usage now (maybe after I became a rpy user) but
RSPython:
  from RS import R
  R.rnorm(10)   # works
  R.dev.off() # does not work

Rpy solves this problem by (and the mechanism is clearly described in
the rpy manual):
  from rpy import *
  r.rnorm(10)
  r.dev_off()

rpy also provides
  r('''arbitrary R piece of code''')
which is immensely useful to run big trunk of R code.

rpy also claims that the performance of rpy is better but I have no
comparison data here.

>  there is nothing wrong with having a bias,
> but the goals listed imply that it might be "better" in those areas...
> which may or may not be true.

I agree.

> (To answer one question you might have, I was trying to invoke Python
> code from inside R and was doing it the opposite direction from you).

Exactly. We are doing different things so while you have to use
RSPython, I have a choice between RSPython and rpy. In my case, all
the real computations are done in C/C++, wrapped by Python. I could
have wrapped my C/C++ code in R but R is not good at wrapping C++
class hierarchy because of the different OOP mechanisms. When I need
the statistical analysis and plotting capacity of R, I use rpy.

As a matter of fact, since Python is a powerful programming language
than can handle string, text file etc better than R, I usually prepare
my data in python  and pass them to R using rpy.

Cheers,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R plot display problem under windows when using python rpy module.

2006-01-09 Thread Bo Peng
On 1/9/06, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> How is Rpy calling R?  Presumably R is running single-threaded, and the
> problem is likely to be that Rpy is using blocking I/O on the R process
> and hence blocking the GUI callbacks that drive the window.
>
> The not-so-simple answer is not to do it that way.  It might be well
> sufficient to turn windows() buffering off -- see its help page.

Using
>>> from rpy import *
>>> r.options(windowsBuffered=False)
>>> r.plot(0)
solves the problem.

Thank you very much! I will suggest that rpy developers address this problem.

Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R plot display problem under windows when using python rpy module.

2006-01-09 Thread Bo Peng
Dear list,

Rpy is a python module that provides python interface to R. The
following simple commands

>>> from rpy import *
>>> r.plot(0)

is supposed to create a window that displays the result of plot(0).

However, we observe that
1. Under *nix, rpy+R+python  work as expected
2. Under windows, python + pythonWin32 (a python GUI provided by the
pywin32 module), work as expected
3. Under windows, if we run the commands from command line or IDLE (a
simple python IDE), a window will be created, but the figure will not
be drawn. Then, if we run r.plot(2), the result of plot(0) will be
drawn. plot(2) will be displayed when the next drawing command is
executed.

Since R works well in most cases, I do not think this is a R problem.
However, can anyone tell me what *might* block the figures from being
displayed? In other word, what might PythonWin have provided to enable
correct rendering of the figures? If I have to trace to the sources,
what portion of the R code should I have a look at? (It is good that
python/R/rpy are all open source).

Many thanks in advance.
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.Date() , feature or bug?

2005-09-15 Thread Bo Peng
This problem was brought up by Xu Neng <[EMAIL PROTECTED]> in the rpy-list. 


Remember the old times when computer guys thought the year 2000 was
too far away to be worried? In my case, the dates like "-06-6" and
"-09-09" are used as special missing codings for dates.

If R cannot handle dates later than the year 5000, it is fine. Please
just let the world knows about it. What is really annoying is the fact
that the default date "1970-01-01" is silently used instead of raising
exception. If you are not very careful, you may not even notice this
trick. This is the bug I'm referring to.

Hopefully, the R experts can handle this bug seriously.


Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.Date() , feature or bug?

2005-09-14 Thread Bo Peng
> Well, bug, if you really want to call it a bug that you cannot represent
> the year . ;-)

So I guess we need a warning message and a line in help(as.Date)?

Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] as.Date() , feature or bug?

2005-09-13 Thread Bo Peng
Under linux and windows,

> as.Date("-06-06")
[1] "-06-06"
> as.Date("-07-07")
[1] "1970-01-01"
> 

Feature? Bug? help(as.Date) does not mention this case.

Bo.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] efficiency of sample() with prob.

2005-08-30 Thread Bo Peng
> You chose to report just one extremely favourable example, < ignore>
> I do think you are being `careless with the truth'.

I chose to report whatever I got and whatever I felt the result was. It was not
a scientific report and it was up to you (the R-team) to validate my result and
make further investigations. When I was asked for a more thorough research
in this field (and thus take the responsibility to my results), I said
no since I did
not have enough expertise and time. 

After all, I could have chosen not to report anything at all. If this
mailing list
only accepts serious contributions from field professionals, it is time for me 
to quit.

> Hmm.  A sample every 100ns and generating samples several times faster
> than tabulating them is something to be impressed by. 
> Not in our tests.  Did you try my 5 out of 10,000 non-zero probablilties
> distribution, for example?

No. I did not try. Intuitively, Walker's method may be slow because  of
the memory allocation and table setup stuff. It should be used with 
large sample and small prob vector. Bisection method should be
quicker when the prob vector is large (so linear search is slow) and the 
sample size is relatively small. I thought bisection would be uniformly quicker 
than the linear search method since there is no additional cost other than 
a few more variables.

If you have done enough experiments and know the fastest method in 
each case, it should be straightforward to implement a hybrid method 
where the best method is chosen based on the lengths of sample and 
prob vector. I might have been reporting partial results, but the 80 times 
speed-up in some cases was not faked.

Cheers,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] efficiency of sample() with prob.

2005-08-29 Thread Bo Peng
> It seems the distribution used for your tests is maximally favourable to 
> your proposal (not uncommon in papers, but not very honest).  

I did not have time to do a thorough test. I was simply reporting what
I had done so any implied dishonesty is unfair to me.

> Changing how this is done will break the reproducibility of past programs,
> and we don't really want to introduce yet more options.  So it seems only
> worth doing when there are substantial speed gains. 

I agree that Walker's method may be slow at some cases but there is
nothing to brag about R's current method either. The bottom line is: a
bisection method will be uniformly better than the linear search
method that R uses right now. This does not qualify as 'substantial
speed gain' though.

Cheers,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How difficult is it to wrap a large C++ library with R?

2005-07-04 Thread Bo Peng
> There was a couple of posts about this recently:
> 
> https://stat.ethz.ch/pipermail/r-devel/2005-April/subject.html
> 

I am too new to this list to see these discussions. They are quite interesting!

Certainly I was not aware of the big differences between C++/Python
OOP and S3/S4 OOP. This explains why I skipped the OOP part of the red
S-Plus manual when I was learning S-Plus: I was not able to get the
ideas although I was an experienced C++ programmer. Regardless of the
question of 'which OOP is better', I certainly agree with Nathan's
opinion that R has lost many users/developers who only know C++/Python
OOP.

I think I would better continue to use Python to wrap my C++ code
(until Ali find a good way to automatically wrap C++ code, or SWIG
adds R support). It is simply much easier and more efficient.
Destructing my C++ classes into C pieces and reconstruct them in S4
way would be tedious, and the performance penalty will most possibly
be disastrous for this simulation environment.

Cheers,
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How difficult is it to wrap a large C++ library with R?

2005-07-04 Thread Bo Peng
> SWIG is a very nice idea. ...

SWIG is a wonderful tool that makes wrapping C++ code super-easy. I
can write pure C++ code and even test them by writing a main()
function. Then, using a simple interface file, all my  C++ classes
becomes Python (shadow) classes like magic. It will take much more
time to develop simuPOP if I had to modify interface files manually
for every change to the  C++ classes.

> I hope to have a prototype of a general mechanism for generating
> bindings  to other code for R sometime over the summer.
> It has been hatching for some time and it is primarily a matter of
> getting the time to implement the existing ideas.

Since you already know how to wrap C++ classes to R classes, maybe it
is a good idea to ask SWIG developers how to use SWIG to bridge them.
After all, SWIG has a mature set of tools to parse C++ code and
generate interfaces to more than a dozen scripting languages.

Personally, if I have to write interfaces manually for my 40+ classes
and a lot more functions, I would rather devote time in adding R
support for SWIG.

Thanks.
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How difficult is it to wrap a large C++ library with R?

2005-07-04 Thread Bo Peng
> > * Wrap C++ class hierarchy. Virtual functions need to be supported.
> > (SWIG can generate Python shadow classes that behave almost exactly
> > like the underlying C++ classes)
> 
> This is hard to do in R, because the R object model is quite different
> from that of C++ (whereas Python's is more similar).

I guess I will have to learn R OOP in detail first. It would be
terrible if I will have to re-write every C++ class in R (and without
inheritance)!

> > * Direct access to C++ data structure. For example, an object may keep
> > a C array internally. I will need a method (maybe wrap this data into
> > a R object) to read/write this array directly.
> 
> That's not too hard provided you use C++ code to do the actual access.
> That is, you write an R function that calls C++ code to do the work.
> It's a lot harder if you want to keep it all in R, because it doesn't
> understand C++ type definitions, alignment conventions, etc.

So this has to be done through functions. In Python, I can create an
object to wrap C array that can be accessed just like regular list. I
guess it would be difficult to create a new data type in R.

Anyway, where can I find the most complete information about
implementation of R/OOP at C/C++ level? (Or should I write classes in
R and call C++ functions for every member function?) I have not read
http://cran.r-project.org/doc/manuals/R-exts.pdf in detail but my
impression is that almost everything is done at a function (not
object) level using .C().

Thanks.
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] How difficult is it to wrap a large C++ library with R?

2005-07-04 Thread Bo Peng
Dear list,

I have developed a forward-time population genetics simulation
environment simuPOP, which is a set of C++ (template)
classes/functions wrapped by SWIG as Python libraries. R is used
extensively as plotting and statistical analysis engine through RPy
package.

I use Python to wrap simuPOP since most the following can be easily
done using SWIG or Python C API. However, since Python is less used by
bioinformaticists and geneticists, I am asked again and again why I do
not wrap simuPOP with R (R is also OOP etc...). Although I am a long
time R user, I am not familiar with package writing in R. From what I
read from R website, it is easy to wrap individual C/C++ functions but
not C++ classes. Can anyone take the time to review the following and
tell me if they can be done (easily or need effort) using R? If most
of the answers are yes, it may be a good idea to switch to R.

* Wrap C++ class hierarchy. Virtual functions need to be supported.
(SWIG can generate Python shadow classes that behave almost exactly
like the underlying C++ classes)

* Be able to do this:
evolve(ops=c(obj1, obj2, obj3))
  Internally, evolve will call virtual function fun() of obj1, obj2, .etc.
  obj1, obj2, ... are objects derived from the same base class. 

* Direct access to C++ data structure. For example, an object may keep
a C array internally. I will need a method (maybe wrap this data into
a R object) to read/write this array directly.

* create and read/write a R list at C++ level. (Need API for R/list read/write)

* pass a user defined R function to C++ function. Call it and obtain result.

* evaluate an R expression (passed as string) from within C++ function
and obtain result.

* pass C++ exceptions to R

* be able to use C++ features like template, STL.

* organize manual by objects, not functions. (I notice that manuals of
R libraries are simple function references.)


Many thanks in advance.

--
Bo Peng
Department of Statistics
Rice University
http://bp6.stat.rice.edu:8080/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] efficiency of sample() with prob.

2005-06-24 Thread Bo Peng
On 6/24/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> `Research' involves looking at all the competitor methods, devising a
> near-optimal strategy and selecting amongst methods according to that
> strategy.  It is not a quick fix we are looking for but something that
> will be good for the long term.
> 

I am sorry but I am afraid that I do not have enough time and
background knowledge
to do a thorough research in this area. I have tried bisection search
method and the alias
method, the latter has greatly improved the performance of my bioinformatics
application. Since this method is the only one mentioned in Knuth's
book, I have no
idea about other alternatives. 

Attached is a slightly improved version of the alias method. It may be
helpful to people
having similar problems.

Thanks.

--
Bo Peng
Department of Statistics
Rice University.
http://bp6.stat.rice.edu:8080/

/* replace probSampleReplace in src/main/random.c */
static void ProbSampleReplace(int n, double *p, int *perm, int nans, int *ans)
{
/* allocate memory for a, p, H and L is the second half of a  */
double * q = Calloc(n, double);
int * a = Calloc(2*n, int);
int * LL = a+n, * HH = a+2*n-1;  /* start of H and L vector */
int * L = LL, * H = HH;  /* end of H and L vector */
int i, j, k;
double rU;  /* U[0,1)*n */

/* set up alias table */
/* initialize q with n*p0,...n*p_n-1 */
for(i=0; i= 1.)
*H-- = i;
else
*L++ = i;
}

while( L != LL && H != HH ) {
j = *(L-1);
k = *(H+1);
a[j] = k;
q[k] += q[j] - 1;
L--;  /* remove j from L */
if( q[k] < 1. ) {
*L++ = k; /* add k to L */
++H;  /* remove k */
}
}

/* generate sample */
for (i = 0; i < nans; ++i) {
rU = unif_rand() * n;

k = (int)(rU);
rU -= k;  /* rU becomes rU-[rU] */

if( rU < q[k] )
ans[i] = k+1;
else
ans[i] = a[k]+1;
}
Free(a);
Free(q); 
}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] efficiency of sample() with prob.

2005-06-23 Thread Bo Peng
> We suggest that you take up your own suggestion, research this area and
> offer the R project the results of your research for consideration as your
> contribution.

I implemented Walker's alias method and re-compiled R. Here is what
I did:

1. replace function ProcSampleReplace in R-2.1.0/src/main/random.c
  with the following one 
  
static void ProbSampleReplace(int n, double *p, int *perm, int nans, int *ans)
{
/* allocate memory for a, p and HL */
double * q = Calloc(n, double);
int * a = Calloc(n, int);
int * HL = Calloc(n, int); 
int * H = HL;   
int * L = HL+n-1;  
int i, j, k;
double rU;  /* U[0,1)*n */

/* set up alias table */
/* initialize q with n*p0,...n*p_n-1 */
for(i=0; i= 1.)
*H++ = i;
else
*L-- = i;
}

while( H != HL && L != HL+n-1) {
j = *(L+1);
k = *(H-1);
a[j] = k;
q[k] += q[j] - 1;
L++;  /* remove j from L */
if( q[k] < 1. ) {
*L-- = k; /* add k to L */
--H;  /* remove k */
}
}

/* generate sample */
for (i = 0; i < nans; ++i) {
rU = unif_rand() * n;

k = (int)(rU);
rU -= k;  /* rU becomes rU-[rU] */

if( rU < q[k] )
ans[i] = k+1;
else
ans[i] = a[k]+1;
}
Free(HL);
Free(a);
Free(q); 
}

2. make and make install

3. test the new sample function by code like

> b=sample(seq(1,100), prob=seq(1,100), replace=TRUE, size=100)
> table(b)/100*sum(seq(1,100))

4. run the following code in current R 2.1.0 and updated R.

for(prob in seq(1,4)){
  for(sample in seq(1,4)){
x = seq(1:(10^prob))   # short to long x
p = abs(rnorm(length(x)))  # prob vector
times = 10^(6-prob)   # run shorter cases more times
Rprof(paste("sample_", prob, "_", sample, ".prof", sep=''))
for(t in seq(1,times)){
  sample(x, prob=p, size=10^sample, replace=TRUE )
}
Rprof(NULL)
  }
}

Basically, I tried to test the performance of sample(replace=TRUE, prob=..)
with different length of x and size.

5. process the profiles and here is the result.
p: length of prob and x
size: size of sample
cell: execution time of old/updated sample()

  size\p10  10^210^3   10^4
  10   2.4/1.6  0.32/0.22   0.20/0.08  0.24/0.06  
  10^2 3.1/2.6  0.48/0.28   0.28/0.06  0.30/0.06
  10^3 11.8/11.11.84/1.14   0.94/0.18  0.96/0.08
  10^4 96.8/96.615.34/9.68  7.54/1.06  7.48/0.16
  run: 1100010010 times

We can see that the alias method is quicker than the linear search
method in all cases. The performance difference is greatest (>50 times)
when the original algorithm need to search in a long prob vector.

I have not thoroughly tested the new function. I will do so if you
(the developers) think that this has the potential to be incorporated
into R.

Thanks.

Bo Peng
Department of Statistics
Rice University

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] efficiency of sample() with prob.

2005-06-22 Thread Bo Peng
On 6/21/05, Vadim Ogranovich <[EMAIL PROTECTED]> wrote:
> In his "Introduction to Probability Models" Sheldon Ross describes (sec
> 11.4.1, 8th edition) the alias method for such weighted sampling.
> It is based on some decomposition of the original distribution (the
> weights) into a mixture of two-point distributions. 

This sounds like Walker's alias method for weighted sampling. I looked
through Knoth's 'the art of computer programming' and find this
algorithm. I implemented this one but it is just as efficient as the
bisection lookup method in my case. The reason is that the setup of
this algorithm is complicated so it is suited for getting large sample
from short weighted sequences. Anyway, I do suggest R developers try
this algorithm for sample with replacement. A sample code can be found
at http://statistik.wu-wien.ac.at/arvag/monograph/arvag-src/algo03_03.c
.

BTW, does anyone know a quicker algorithm to set up the internal table
of the alias method?

Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] efficiency of sample() with prob.

2005-06-21 Thread Bo Peng
Dear list,

A while ago, Vadim asked opinions on improving efficiency of sample()
with prob, e.g. sample with replacement with weight.  (
https://stat.ethz.ch/pipermail/r-devel/2004-September/030844.html ) He
did not post what he ended up with this problem though.

I am having exactly the same problem. I need to sample  with
replacement from a population of size 10 million with fitness values
for each individual. sample() is too slow for this purpose.

I implement a bisection search algorithm. It is about 30% faster than
the linear search one but still not good enough to me. (I can post the
function if needed). Does anybody have some good ideas? The only other
idea I have is using a faster (but worse) random number generator just
for this application.

Thanks.
Bo

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel