Re: [Rd] Running two R instances at the same time
On Mon, Sep 7, 2009 at 4:47 PM, Andrew Piskorski a...@piskorski.com wrote:
> On Sat, Sep 05, 2009 at 08:31:18PM +0200, Peter Juhasz wrote:
>> I don't understand how this is possible. Maybe there is an issue of thread-safety with the R backend, meaning that the two R *interpreter* instances are talking to the same backend that's capable of processing only one thing at a time?
>
> No. Particularly since there is no R backend involved at all, not if you're starting up the standard R from Ubuntu 9.04 (rather than Rserve or something else unusual). People run multiple R processes concurrently on the same (multi-core) machine all the time, works fine.
>
>> Please see http://www.perlmonks.org/?node_id=792460 for an extended discussion of the problem, and especially http://www.perlmonks.org/?node_id=793506 for excerpts of output and actual code.
>
> The most likely explanation seems to be that you have a bug in your Perl code. Have you tried using your Perl framework to fork something OTHER than R? Have you tried manually starting up two R processes and running your R code that way?
>
> And, what is the actual R code you're running? You don't seem to have shown it anywhere.
>
> --
> Andrew Piskorski a...@piskorski.com
> http://www.piskorski.com/

Actually, I have run some tests that clarified the issue somewhat.

- It is always possible that my Perl code is buggy, but that doesn't seem to play a role in this case.
- I tried to use my Perl system to start two non-R processes. It worked as expected: they ran concurrently without ill effects.

But please see http://www.perlmonks.org/index.pl?node_id=793907 , where I posted my R code in its simplest form along with an example run which exhibits the symptoms I originally wrote about. From that test I conclude that neither my Perl code nor R itself is at fault here, but the specific package I use. That package, NADA, from which I use the 'cenros' command, seems to be the culprit.

Please forgive my complete lack of experience when it comes to R; it made me assume things that didn't make sense.

Péter Juhász

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Running two R instances at the same time
On Sat, Sep 05, 2009 at 08:31:18PM +0200, Peter Juhasz wrote:
> I don't understand how this is possible. Maybe there is an issue of thread-safety with the R backend, meaning that the two R *interpreter* instances are talking to the same backend that's capable of processing only one thing at a time?

No. Particularly since there is no R backend involved at all, not if you're starting up the standard R from Ubuntu 9.04 (rather than Rserve or something else unusual). People run multiple R processes concurrently on the same (multi-core) machine all the time, works fine.

> Please see http://www.perlmonks.org/?node_id=792460 for an extended discussion of the problem, and especially http://www.perlmonks.org/?node_id=793506 for excerpts of output and actual code.

The most likely explanation seems to be that you have a bug in your Perl code. Have you tried using your Perl framework to fork something OTHER than R? Have you tried manually starting up two R processes and running your R code that way?

And, what is the actual R code you're running? You don't seem to have shown it anywhere.

--
Andrew Piskorski a...@piskorski.com
http://www.piskorski.com/
Re: [Rd] Running two R instances at the same time
On Mon, Sep 07, 2009 at 05:36:38PM +0200, Peter Juhasz wrote:
> But please see http://www.perlmonks.org/index.pl?node_id=793907 , where I posted my R code in its simplest form along with an example run which exhibits the symptoms I originally wrote about.

Ah, your two-process serialization is probably happening during that cenros() call then. (You may want to run with/without that call to confirm.) cenros() is in the NADA package and seems to use survreg() from the survival package. The survival package looks bigger than NADA and includes a bunch of C code, so perhaps one of its C implementations is using some sort of mutex-locked system call. If so, it'd be interesting to know where the serialization is happening.

http://cran.r-project.org/web/packages/NADA/index.html
http://cran.r-project.org/web/packages/survival/index.html

You may want to run your code under strace (perhaps with -cfF) and/or ltrace, to get a list of C-level functions that are actually being called. That might give you an idea of where the blocking is occurring, and could also help the NADA or survival package maintainers when you ask them about this.

(But, hm, haven't any of your international collaborators run across this serialization problem before?)

--
Andrew Piskorski a...@piskorski.com
http://www.piskorski.com/
Re: [Rd] Running two R instances at the same time
On Sep 5, 2009, at 2:31 PM, Peter Juhasz wrote:
> Reposting from R-help:
>
> Dear R experts,
>
> please excuse me for writing to the mailing list without subscribing. I have a somewhat urgent problem that relates to R.
>
> I have to process large amounts of data with R. I'm in an international collaboration and the data processing protocol is fixed, that is, a specific set of R commands has to be used. I wrote a perl program that manages creation of data subsets from my database and feeds these subsets to an R process via pipes.
>
> This worked all right; however, I wanted to speed things up by exploiting the fact that I have a dual-core machine. So I rewrote my perl driver program to use two threads, each starting its own R instance, getting data off a queue and feeding it to its R process.
>
> This also worked, except that I noticed something very peculiar: the processing time was almost exactly the same for both cases. I did some tests to look at this, and it seems that R needs twice the time to do the exact same thing if there are two instances of it running.
>
> I don't understand how this is possible. Maybe there is an issue of thread-safety with the R backend, meaning that the two R *interpreter* instances are talking to the same backend that's capable of processing only one thing at a time?

No, at least not in R itself. Clearly there are many explanations (you are accessing the data in some way that is not parallelizable, R is already using both cores, perl does something funny that you are not anticipating ...), but I see too little evidence. The perl code is too much of a mess to really tell. Why don't you just start two of your jobs manually in the background and clock them? For starters, simply use time ..

In perl I wouldn't use threads; it should be as simple as

    #!/usr/bin/perl

    # Fork one child per job; the child runs the command and exits.
    sub run {
        $children++;
        if (fork() == 0) {
            print "job $children started\n";
            system $_[0];
            print "job $children done\n";
            exit 0;
        }
    }

    run "sleep 1";
    run "sleep 2";
    #etc.

    # Parent: reap every child before finishing.
    while ($children) { wait; $children--; }
    print "Jobs done.\n";

    Fino:sandbox$ ./tt
    job 1 started
    job 2 started
    job 1 done
    job 2 done
    Jobs done.

(replace sleep by your R invocation ... use your imagination to improve it since it's admittedly very crude but helps to track it down ...)

Cheers,
Simon

> Technical details: OS was Ubuntu 9.04 running on a Core2 Duo E7300, and the R version used was the default one from the Ubuntu repository.
>
> Please see http://www.perlmonks.org/?node_id=792460 for an extended discussion of the problem, and especially http://www.perlmonks.org/?node_id=793506 for excerpts of output and actual code.
>
> I have received several suggestions about R packages that would enable parallel processing in some way or other, and I'm thankful for those. However, at this point I'm interested in having two completely unrelated R processes that run simultaneously, not in parallel processing from within R. I have to admit that I'm an absolute beginner when it comes to R and this project will be finished before I could learn everything I'd need for a pure R solution. I'm familiar with perl, however, so I'd like to stick to that.
>
> Thanks for your answers in advance, and please excuse me if this causes too much noise.
>
> Péter Juhász
> physicist
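
The "start two jobs and clock them" check suggested in this thread can also be scripted. What follows is only a minimal sketch, written in Python rather than Perl, and it is not code from any of the posters. The names time_concurrent, slowdown and STAND_IN are made up for illustration, and the job is a 0.5 second sleep standing in for the real R invocation. The idea: time one copy of the job alone, then two copies started together. A ratio near 1.0 suggests the two processes really run in parallel; a ratio near 2.0 suggests something is serializing them.

```python
import subprocess
import sys
import time


def time_concurrent(cmd, copies):
    """Start `copies` independent processes running `cmd` and return the
    wall-clock seconds until all of them have exited."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def slowdown(cmd):
    """Ratio of two-at-once time to one-alone time.
    About 1.0 means parallel, about 2.0 means serialized."""
    alone = time_concurrent(cmd, 1)
    together = time_concurrent(cmd, 2)
    return together / alone


# Stand-in job (assumption): a child Python process that sleeps 0.5 s.
# Replace with the real R invocation, for example ["Rscript", "job.R"],
# where job.R is a hypothetical file name.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(0.5)"]

if __name__ == "__main__":
    print(f"slowdown with two copies: {slowdown(STAND_IN):.2f}")
```

A sleeping job uses no CPU, so the stand-in only shows that the harness itself does not serialize anything. The measurement becomes meaningful once the command is the actual R script, run once with and once without the cenros() call.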