Re: [Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-05 Thread David Bauer

> What would you say typically limits taskPR's approach, not finding
> enough instruction-level parallelism at the R script level, or the
> communications overhead (probably latency) of trying to make use of
> it?


Depends on the specific function.  The communication cost is 
significant, especially serialization and deserialization.  (Since I 
finally found the right way to force a flush of the TCP data, the actual 
network cost isn't a problem for moderately sized data.)  For reasons of 
simplicity of implementation and ease of correctness, a lot of the R 
environment is serialized and sent over with *each* operation.
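
To give a feel for why shipping the environment with each operation is
expensive, here is a rough sketch.  It is Python with pickle, not taskPR's
actual code, and the "environment" dictionary and payload_size() helper
are made up for illustration.  It compares the bytes serialized when a
large, unneeded object rides along with a small task against the bytes
needed for only what the task uses.

```python
import pickle

def payload_size(task_args, environment):
    """Bytes that would be sent if `environment` travels with the task."""
    return len(pickle.dumps((task_args, environment)))

# Hypothetical environment: one large object the task never touches,
# plus one small value that it does need.
environment = {"big_table": list(range(100_000)), "scale": 2.0}
task_args = (3, 4)

# Whole environment sent with the operation.
with_env = payload_size(task_args, environment)

# Only the binding the task actually uses.
without_env = payload_size(task_args, {"scale": environment["scale"]})

# The receiver pays again to deserialize everything that was sent.
args_back, env_back = pickle.loads(pickle.dumps((task_args, environment)))
```

Since the cost is paid on every operation, the difference between
with_env and without_env is multiplied by the number of tasks dispatched.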


In terms of the instruction-level parallelism available, code that is a 
performance bottleneck is usually rewritten in C or Fortran and called 
in large blocks.  So now the program is trying to find parallelism in 
the large blocks, which it usually can't.


I didn't have a lot of suitable code to try, and so the best example 
program was one that did a complex calculation followed by an accumulate 
operation in a loop.  Parallel-R/taskPR dynamically unrolled the loop 
(just like Tomasulo's algorithm does on a processor) and got a 
reasonable speedup (about half of linear).  Unfortunately, I don't even 
have that code example any more.
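
Since the original example is lost, the following is only a guess at its
general shape, written in Python rather than R.  complex_calculation()
is a hypothetical stand-in for the expensive step.  The "unrolled"
version issues every iteration's calculation up front and keeps only the
accumulate step sequential, which is the effect described above.

```python
from concurrent.futures import ThreadPoolExecutor

def complex_calculation(i):
    # Stand-in for the expensive part of each iteration; iterations
    # do not depend on one another.
    return sum(k * k for k in range(i + 1))

def serial_loop(n):
    total = 0
    for i in range(n):
        total += complex_calculation(i)  # compute, then accumulate
    return total

def unrolled_loop(n, workers=4):
    # Submit all calculations first; the accumulate step then consumes
    # the results in program order, so the answer matches the serial loop.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(complex_calculation, i) for i in range(n)]
        total = 0
        for future in futures:
            total += future.result()
    return total
```

Threads are used here only to keep the sketch short.  In CPython a real
speedup on CPU-bound work would need separate processes, which brings
back the serialization cost discussed earlier.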




> If latency, then perhaps taskPR would work better in a multi-threaded
> R interpreter, rather than across a TCP/IP network fabric.


Yes, most especially if serialization and deserialization could be 
avoided.  However, I don't believe R is thread-safe?  (Using shared 
memory, but between multiple R processes, was on the TODO list when the 
project ended.)


I was fortunate to have access to a very large NUMA machine at the time 
that I was originally working on this project, so the network itself 
wasn't a limiting factor.  (The network stack turned out to be a 
problem, though.)



David Bauer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-04 Thread Andrew Piskorski
I see about 7 different R packages for multi-process parallel
programming.  Which do you think is the best, most complete, and most
robust to pick for general purpose Erlang-style message-passing
programming in R, and why?

First here's my use case, and then my analysis so far.  I often have
code whose basic organization looks something like this:

1. Fetch step: For each date, gather up or pre-process a bunch of
   data.  Return a big list of data, one item on the list for each date.
2. Compute step: For each date on the big list of data, do a bunch of
   computations.

Of course, when the number of dates is large, it's pretty annoying to
wait for all the fetches to complete before starting the compute step.
(Especially when the compute step then hits a bug on the very first
date.)  So in practice, I end up breaking things apart to fetch and
then compute one date at a time, etc.

However, instead of completely serializing everything the way I do
now, it would be nice to have 2 concurrent threads of control
(processes, threads, coroutines, or whatever) which talk to each
other.  Then the compute thread can just periodically say to the fetch
thread, "Give me the next date's worth of data, please."  And usually
the fetch thread will already have that data fetched and ready to go.
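
A minimal sketch of that fetch/compute arrangement, in Python rather than
R and using threads plus a bounded queue in place of real message passing
between processes.  fetch(), compute(), and the prefetch depth are
hypothetical placeholders, not part of any of the packages discussed here.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def fetch(date):
    # Hypothetical stand-in for the slow data-gathering step.
    return {"date": date, "values": [1, 2, 3]}

def compute(data):
    # Hypothetical stand-in for the per-date computation.
    return data["date"], sum(data["values"])

def fetcher(dates, out):
    for date in dates:
        out.put(fetch(date))  # blocks once `prefetch` items are waiting
    out.put(_DONE)

def run(dates, prefetch=2):
    q = queue.Queue(maxsize=prefetch)
    worker = threading.Thread(target=fetcher, args=(dates, q), daemon=True)
    worker.start()
    results = []
    while True:
        item = q.get()  # "give me the next date's worth of data"
        if item is _DONE:
            break
        results.append(compute(item))
    worker.join()
    return results
```

The bounded queue keeps the fetcher only a few dates ahead, so a bug on
the very first date shows up after one fetch rather than after all of them.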

Also, sometimes my compute step is slow, and has lots of readily
parallelizable work, so it would be even better if I can optionally
run things across multiple physical machines in a cluster.

How to do it?  R is single-threaded and not thread safe, so threads
are out.  Coroutines are also probably out.  The obvious approach is
to use multiple R processes which talk to each other via some message
passing library.

Fortunately, R has a plethora of such packages.  My question is, which
is the best choice for this sort of use?  From reading their API docs,
here are my brief thoughts on each so far:

- papply:  Not suitable, no bi-directional communication.  Slave
  processes return values when the papply() call completes, that's it.

- biopara:  Not suitable, simple one-way master/slave communication
  only, just like papply.

- snow:  Not directly suitable, the supported communication is intended
  to be very simple.  But since it runs on top of Rmpi, perhaps its
  utility code would be useful in conjunction with Rmpi?

- taskPR:  Sounds equivalent to snow.  Also uses MPI underneath.

- Rmpi:  Probably.  Should definitely work for my needs, only question
  is if it's the best choice.  Is it stable, complete, robust, etc.?

- rpvm:  Maybe.  Should be equivalent to Rmpi, but MPI is much more
  popular on clusters than PVM these days.

- NetWorkSpaces:  Maybe.  This looks like a rather mature and
  well-supported multi-language TupleSpace implementation, so it could
  certainly be made to work.

  Passing all my large R data objects back and forth solely as strings
  seems very unappealing, but the docs hint that it includes direct
  (or at least transparent) support for binary R objects.  I need to
  start up and run an explicit NetWorkSpaces Python/Twisted server.

  Also, TupleSpace programming sounds somewhat more limiting than
  Erlang-style message passing (although I definitely do not know that
  for sure!).  On the other hand, the TupleSpace APIs sound a lot
  simpler than MPI.
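
For readers unfamiliar with the TupleSpace model, here is a toy in-process
sketch in Python.  It is not the NetWorkSpaces API: the class and its
put()/take() methods are invented for illustration, and None is used as a
wildcard in patterns.  Processes coordinate only by depositing tuples into
a shared space and withdrawing tuples that match a pattern.

```python
import threading

class TupleSpace:
    """Toy tuple space; None in a pattern matches any value."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def put(self, tup):
        """Deposit a tuple and wake any waiting takers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def take(self, pattern, timeout=None):
        """Remove and return the first matching tuple, waiting if needed."""
        with self._cond:
            found = []

            def ready():
                found[:] = [t for t in self._tuples
                            if self._matches(pattern, t)]
                return bool(found)

            if not self._cond.wait_for(ready, timeout):
                raise TimeoutError("no matching tuple")
            self._tuples.remove(found[0])
            return found[0]

ts = TupleSpace()
ts.put(("data", "2008-09-04", 42))
ts.put(("data", "2008-09-05", 43))
second = ts.take(("data", "2008-09-05", None))  # match on a specific date
first = ts.take(("data", None, None))           # match any remaining date
```

Note that the sender never names a recipient, which is the main difference
from Erlang-style messaging, where a message goes to a specific process.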

Since I've never done MPI programming before, I'm also curious about
some of the practical semantics of Rmpi.  E.g., is it possible to send
a message to a busy R process that says, "Stop what you're doing right
now!" and have it obeyed immediately?  Probably not, as I think that
would require either multiple threads or an active event loop
somewhere in either R or the MPI stack.
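
To illustrate the point, here is a small Python sketch (not Rmpi; the
busy_worker() function and its arguments are hypothetical).  Without a
second thread or an event loop, a "stop" request is only noticed when the
worker chooses to check for it, so it takes effect between chunks of work
rather than immediately.

```python
import threading

def busy_worker(chunks, stop, done):
    # A long job split into chunks.  The stop flag is polled only at
    # chunk boundaries, so a request arriving mid-chunk has to wait.
    for chunk in chunks:
        if stop.is_set():
            return "stopped"
        done.append(sum(chunk))
    return "finished"

stop = threading.Event()

# No stop request: every chunk is processed.
done_a = []
status_a = busy_worker([[1, 2], [3, 4]], stop, done_a)

# Stop requested before the next polling point: nothing more is done.
stop.set()
done_b = []
status_b = busy_worker([[5, 6]], stop, done_b)
```

How promptly the worker reacts therefore depends on how finely the work
is chunked, not on how quickly the message is delivered.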

Finally, here are links and some notes on each of the above 7 packages
(converted from HTML with 'lynx -dump'):

* [1]Rmpi ([2]CRAN, [3]tutorial), [4]rpvm ([5]CRAN). 
* [6]SNOW ([7]CRAN) - Simple Network of Workstations for R, high 
  level interface for parallel R on clusters, uses sockets, MPI, or 
  PVM underneath. Reportedly intended for embarrassingly parallel, 
  not closely coupled, problems. 

* [8]papply ([9]CRAN) 
* The [10]Parallel-R project provides both [11]RScaLAPACK ([12]CRAN) 
  and [13]taskPR ([14]old), using MPI. 
* [15]biopara - One-way master/slave communication, much like papply 
  or taskPR. Uses R sockets, no MPI or PVM underneath. 

* [16]NetWorkSpaces for R ([17]article, [18]FAQ) from [19]SCAI is a 
  [20]dual-licensed (GPL and commercial) Linda/tuplespace 
  implementation. Also, some aspects sound similar to the [21]data 
  flow variables in [22]Van Roy's [23]CTM and [24]Mozart/Oz. 
 
References 
   1. http://www.stats.uwo.ca/faculty/yu/Rmpi/ 
   2. http://cran.us.r-project.org/src/contrib/Descriptions/Rmpi.html 
   3. http://ace.acadiau.ca/math/ACMMaC/Rmpi/ 
   4. http://www.analytics.washington.edu/statcomp/projects/rhpc/rpvm/ 
   5. http://cran.us.r-project.org/src/contrib/Descriptions/rpvm.html 
   6. 

Re: [Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-04 Thread David Bauer
> - taskPR:  Sounds equivalent to snow.  Also uses MPI underneath.

Actually, it is very different from snow.  taskPR was an attempt to get 'free' 
parallelism out of already existing programs by using simple data dependencies 
to figure out which individual statements in a program can be run in parallel.  
The name comes from the description of the program as exploiting task-level 
parallelism.  Compare this to snow which uses data-level parallelism 
(performing the same operation on many pieces of data at once).  Additionally, 
MPI is optional, and only used for the initial setup of processes.
(If anybody actually uses or has successfully used this package, I would love 
to hear about it, btw.  While the package *does* work, there are probably few 
cases where it is worth it.)
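
As a rough illustration of the idea (a Python sketch, not taskPR's actual
algorithm), the function below takes statements described by the variable
each one assigns and the variables it reads, and groups them into "waves"
whose members have no data dependencies on one another.  The schedule()
name and the (target, reads) representation are assumptions made for this
example.

```python
def schedule(statements):
    """Group statements into waves of mutually independent statements.

    Each statement is (target, reads): the variable it assigns and the
    set of variables it uses.  Returns a list of waves, each a list of
    statement indices that could run in parallel.
    """
    levels = []
    for i, (target, reads) in enumerate(statements):
        level = 0
        for j in range(i):
            prev_target, prev_reads = statements[j]
            conflict = (prev_target in reads        # read-after-write
                        or target in prev_reads     # write-after-read
                        or target == prev_target)   # write-after-write
            if conflict:
                level = max(level, levels[j] + 1)
        levels.append(level)
    waves = [[] for _ in range(max(levels, default=-1) + 1)]
    for i, level in enumerate(levels):
        waves[level].append(i)
    return waves

# Hypothetical script:
#   0: a = f(x)
#   1: b = g(y)
#   2: c = a + b
#   3: d = h(x)
program = [("a", {"x"}), ("b", {"y"}), ("c", {"a", "b"}), ("d", {"x"})]
waves = schedule(program)
```

Here statements 0, 1, and 3 are independent and land in the first wave,
while statement 2 must wait for both a and b.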


David Bauer



Re: [Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-04 Thread Andrew Piskorski
On Thu, Sep 04, 2008 at 04:06:31PM -0400, David Bauer wrote:

 taskPR was an attempt to get 'free' parallelism out of already
 existing programs by using simple data dependencies to figure out
 which individual statements in a program can be run in parallel.
 The name comes from the description of the program as exploiting
 task-level parallelism.

Ah, and thus your reference to Tomasulo's algorithm, interesting.
Thanks for straightening me out there.

  http://users.ece.gatech.edu/~gte810u/Parallel-R/

 (If anybody actually uses or has successfully used this package, I
 would love to hear about it, btw.  While the package *does* work,
 there are probably few cases where it is worth it.)

What would you say typically limits taskPR's approach, not finding
enough instruction-level parallelism at the R script level, or the
communications overhead (probably latency) of trying to make use of
it?

If latency, then perhaps taskPR would work better in a multi-threaded
R interpreter, rather than across a TCP/IP network fabric.  To roughly
test that empirically (assuming you are in fact using MPI for the
communications), I suppose you could start up your several R processes
on a single fat SMP node, and use an MPI that sends messages through
fast shared memory.  That's probably still slower than
thread-to-thread communications, but it should be much lower latency
than TCP/IP.  Maybe you already tried something like that?

-- 
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com/
