Re: [Rd] Running two R instances at the same time

2009-09-08 Thread Peter Juhasz
On Mon, Sep 7, 2009 at 4:47 PM, Andrew Piskorski a...@piskorski.com wrote:
> On Sat, Sep 05, 2009 at 08:31:18PM +0200, Peter Juhasz wrote:
>
> > I don't understand how this is possible. Maybe there is an issue of
> > thread-safety with the R backend, meaning that the two R *interpreter*
> > instances are talking to the same backend that's capable of processing
> > only one thing at a time?
>
> No.  Particularly since there is no R backend involved at all, not
> if you're starting up the standard R from Ubuntu 9.04 (rather than
> Rserve or something else unusual).  People run multiple R processes
> concurrently on the same (multi-core) machine all the time, works
> fine.
>
> > Please see http://www.perlmonks.org/?node_id=792460 for an extended
> > discussion of the problem, and especially
> > http://www.perlmonks.org/?node_id=793506 for excerpts of output and
> > actual code.
>
> The most likely explanation seems to be that you have a bug in your
> Perl code.  Have you tried using your Perl framework to fork something
> OTHER than R?  Have you tried manually starting up two R processes and
> running your R code that way?  And, what is the actual R code you're
> running?  You don't seem to have shown it anywhere.
>
> --
> Andrew Piskorski a...@piskorski.com
> http://www.piskorski.com/


Actually, I have run some tests that clarified the issue somewhat.

- It is always possible that my Perl code is buggy but that doesn't
seem to play a role in this case.
- I tried to use my Perl system to start two non-R processes - it
worked as expected, they ran concurrently without ill effects.

But please see http://www.perlmonks.org/index.pl?node_id=793907 ,
where I posted my R code in its simplest form along with an example
run which exhibits the symptoms I originally wrote about.

From that test I conclude that it is not my Perl code nor R itself
that is wrong here, but the specific package I use. That package -
NADA, from which I use the 'cenros' command - seems to be the culprit.

Please forgive my complete lack of experience when it comes to R; it
made me assume things that didn't make sense.

Péter Juhász

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Running two R instances at the same time

2009-09-07 Thread Andrew Piskorski
On Sat, Sep 05, 2009 at 08:31:18PM +0200, Peter Juhasz wrote:

> I don't understand how this is possible. Maybe there is an issue of
> thread-safety with the R backend, meaning that the two R *interpreter*
> instances are talking to the same backend that's capable of processing
> only one thing at a time?

No.  Particularly since there is no R backend involved at all, not
if you're starting up the standard R from Ubuntu 9.04 (rather than
Rserve or something else unusual).  People run multiple R processes
concurrently on the same (multi-core) machine all the time, works
fine.

> Please see http://www.perlmonks.org/?node_id=792460 for an extended
> discussion of the problem, and especially
> http://www.perlmonks.org/?node_id=793506 for excerpts of output and
> actual code.

The most likely explanation seems to be that you have a bug in your
Perl code.  Have you tried using your Perl framework to fork something
OTHER than R?  Have you tried manually starting up two R processes and
running your R code that way?  And, what is the actual R code you're
running?  You don't seem to have shown it anywhere.

-- 
Andrew Piskorski a...@piskorski.com
http://www.piskorski.com/



Re: [Rd] Running two R instances at the same time

2009-09-07 Thread Andrew Piskorski
On Mon, Sep 07, 2009 at 05:36:38PM +0200, Peter Juhasz wrote:

> But please see http://www.perlmonks.org/index.pl?node_id=793907 ,
> where I posted my R code in its simplest form along with an example
> run which exhibits the symptoms I originally wrote about.

Ah, your two-process serialization is probably happening during that
cenros() call then.  (You may want to run with/without that call to
confirm.)  cenros() is in the NADA package and seems to use survreg()
from the survival package.  The survival package looks bigger than
NADA and includes a bunch of C code, so perhaps one of its C
implementations is using some sort of mutex-locked system call.  If
so, it'd be interesting to know where the serialization is happening.

  http://cran.r-project.org/web/packages/NADA/index.html
  http://cran.r-project.org/web/packages/survival/index.html

You may want to run your code under strace (perhaps with -cfF) and/or
ltrace, to get a list of C-level functions that are actually being
called.  That might give you an idea of where the blocking is
occurring, and could also help the NADA or survival package
maintainers when you ask them about this.
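
As a rough sketch of the tracing suggestion above: the two helper functions below wrap strace and ltrace so that each writes a per-call summary to a file. The function names, the output file names and the `Rscript job.R` command line are placeholders for illustration, not anything taken from the original code. Only -c and -f are used here; -F is left out because recent strace versions document it as deprecated.

```shell
#!/bin/sh
# Sketch only: "job.R" and the summary file names are placeholders.

# Run a command under strace and write a per-syscall summary to $1.
#   -c  count time, calls and errors for each system call
#   -f  follow child processes created by the traced program
#   -o  write the report to a file instead of stderr
trace_syscalls() {
    out=$1; shift
    strace -c -f -o "$out" "$@"
}

# Same idea for library calls, using ltrace.
trace_libcalls() {
    out=$1; shift
    ltrace -c -f -o "$out" "$@"
}

# Example invocations (not run here):
#   trace_syscalls strace-summary.txt Rscript job.R
#   trace_libcalls ltrace-summary.txt Rscript job.R
```

Comparing the summary from a run that calls cenros() with one that does not may show which calls account for the extra wall-clock time.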

(But, hm, haven't any of your international collaborators run across
this serialization problem before?)

-- 
Andrew Piskorski a...@piskorski.com
http://www.piskorski.com/



Re: [Rd] Running two R instances at the same time

2009-09-05 Thread Simon Urbanek


On Sep 5, 2009, at 2:31 PM, Peter Juhasz wrote:


> Reposting from R-help:
>
> Dear R experts,
>
> please excuse me for writing to the mailing list without subscribing.
> I have a somewhat urgent problem that relates to R.
>
> I have to process large amounts of data with R - I'm in an
> international collaboration and the data processing protocol is fixed,
> that is, a specific set of R commands has to be used.
>
> I wrote a perl program that manages creation of data subsets from my
> database and feeds these subsets to an R process via pipes.
>
> This worked all right, however, I wanted to speed things up by
> exploiting the fact that I have a dual-core machine. So I rewrote my
> perl driver program to use two threads, each starting its own R
> instance, getting data off a queue and feeding it to its R process.
>
> This also worked, except that I noticed something very peculiar: the
> processing time was almost exactly the same for both cases. I did some
> tests to look at this, and it seems that R needs twice the time to do
> the exact same thing if there are two instances of it running.
>
> I don't understand how this is possible. Maybe there is an issue of
> thread-safety with the R backend, meaning that the two R *interpreter*
> instances are talking to the same backend that's capable of processing
> only one thing at a time?



No, at least not in R itself. Clearly there are many explanations
(you are accessing the data in some way that is not parallelizable, R
is already using both cores, perl does something funny that you are
not anticipating ...), but I see too little evidence. The perl code is
too much of a mess to really tell - why don't you just start two of
your jobs manually in the background and clock them? For starters use
simply

time .. 
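
As a rough sketch of that manual check, the script below times one job on its own and then two copies started in the background, and prints both wall-clock figures. It assumes a `date` that supports `+%s` (GNU and BSD do), and `JOB` is a placeholder: it defaults to `sleep 1` and should be replaced by the real R invocation, for example `Rscript job.R` with a hypothetical `job.R`.

```shell
#!/bin/sh
# Placeholder job: set JOB to your own R invocation, e.g. JOB='Rscript job.R'
# (job.R is hypothetical). Defaults to a one-second sleep.
JOB=${JOB:-"sleep 1"}

# Run a command or shell function and print the elapsed wall-clock seconds.
elapsed() {
    start=$(date +%s)
    "$@" >&2            # keep the job's own output out of the captured result
    end=$(date +%s)
    echo $((end - start))
}

# One job in the foreground.
run_one() { sh -c "$JOB"; }

# Two copies in the background; wait until both have finished.
run_two() {
    sh -c "$JOB" &
    sh -c "$JOB" &
    wait
}

one=$(elapsed run_one)
two=$(elapsed run_two)
echo "one job: ${one}s, two concurrent jobs: ${two}s"
```

On a dual-core machine, two independent CPU-bound jobs should take about as long as one. If the second figure comes out close to double the first, the jobs are being serialized somewhere.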

In perl I wouldn't use threads, it should be as simple as

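#!/usr/bin/perl
use strict;
use warnings;

my $children = 0;

# Fork a child that runs the given shell command and then exits.
sub run {
    my ($cmd) = @_;
    $children++;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        print "job $children started\n";
        system $cmd;
        print "job $children done\n";
        exit 0;
    }
}

run "sleep 1";
run "sleep 2";
# etc.

# Reap every child before reporting completion.
while ($children) { wait; $children--; }
print "Jobs done.\n";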

Fino:sandbox$ ./tt
job 1 started
job 2 started
job 1 done
job 2 done
Jobs done.

(replace sleep by your R invocation ... use your imagination to  
improve it since it's admittedly very crude but helps to track it  
down ...)


Cheers,
Simon



> Technical details: OS was Ubuntu 9.04 running on a Core2Duo E7300, and
> the R version used was the default one from the Ubuntu repository.
>
> Please see http://www.perlmonks.org/?node_id=792460 for an extended
> discussion of the problem, and especially
> http://www.perlmonks.org/?node_id=793506 for excerpts of output and
> actual code.
>
> I have received several suggestions about R packages that would enable
> parallel processing in some way or other, and I'm thankful for those.
>
> However, at this point I'm interested in having two completely
> unrelated R processes that run simultaneously, not in parallel
> processing from within R.
> I have to admit that I'm an absolute beginner when it comes to R and
> this project will be finished before I could learn everything I'd need
> for a pure R solution. I'm familiar with perl, however, so I'd like to
> stick to that.
>
> Thanks for your answers in advance and please excuse me if this causes
> too much noise:
>
> Péter Juhász
> physicist
