On 4/11/07, AJ Rossini <[EMAIL PROTECTED]> wrote:
> On Tuesday 10 April 2007 23:17, Ramon Diaz-Uriarte wrote:
>
> > Of course, you are right there. I think that might still be the case.
> > At the time we made our decision, and decided to go for MPI, MPI 2 was
> > already out, and MPI seemed "more like the current/future standard"
> > than PVM.
>
> That's always been the case.  In fact, MPI is a standard, whereas PVM always
> was an implementation defining a so-called standard.
>


Oops, you are right. But beyond the question of whether or not it is a
standard, it seemed (and still seems) that "MPI is the current/future
stuff", whereas PVM seemed more like a useful but aging approach. (I am
aging too, so maybe that ain't that good an argument :-).


> > So using papply with Rmpi requires sharper programmers than using
> > snow? Hey, it is good to know I am that much smarter. I'll wear that
> > as a badge :-).
>
> You are!   I've never been patient enough to use plain Rmpi or rpvm except a
> few times, but for me, the advantage of snow is that you get all the
> backends, not just MPI.  In fact, I've heard mention that some folks are
> sticking together an NWS backend as well.

Oh, but except for a few very simple things, such as broadcasting data
or functions to all the slaves, or cleaning up, I never use Rmpi
directly. I always use papply, which really is a piece of cake.
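In fact, that is about all my direct Rmpi usage amounts to. A minimal
sketch (big.data and my.analysis are made-up names; it assumes a LAM
universe has already been booted with lamboot):

    library(Rmpi)
    library(papply)

    mpi.spawn.Rslaves(nslaves = 8)   # the Rmpi part: spawn the slaves ...
    mpi.bcast.Robj2slave(big.data)   # ... and broadcast a large object to all

    ## everything else goes through papply; with no slaves spawned it
    ## degrades to a plain lapply, which makes debugging easy
    res <- papply(as.list(1:100), function(i) my.analysis(i, big.data))

    mpi.close.Rslaves()              # cleanup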

I am just scratching the surface of this parallelism stuff, and I am
sticking to the simple "embarrassingly parallelizable" problems
(cross-validation, bootstrap, identical analyses on many samples,
etc.). So going any deeper into MPI (individual sends, receives, etc.)
was more trouble than it seemed worth. papply or, alternatively,
clusterApplyLB, is almost all I have ever needed or used.
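For instance, a bootstrap with snow looks roughly like this (mydata is
a made-up data frame with a column y):

    library(snow)

    cl <- makeCluster(8, type = "MPI")   # or type = "PVM" with rpvm
    clusterExport(cl, "mydata")          # ship the data to all slaves

    ## each replicate is independent, so clusterApplyLB can simply
    ## load-balance them across the slaves
    ## (NB: for real use, the slave RNGs should be seeded properly)
    boot.means <- clusterApplyLB(cl, 1:1000, function(i) {
        idx <- sample(nrow(mydata), replace = TRUE)
        mean(mydata$y[idx])
    })

    stopCluster(cl)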


> > Anyway, papply (with Rmpi) is not, in my experience, any harder than
> > snow (with either rpvm or Rmpi). In fact, I find papply a lot simpler
> > than snow (clusterApply and clusterApplyLB). For one thing, debugging
> > is very simple, since papply becomes lapply if no LAM universe is
> > booted.
>
> In fact it might be easier, since we never put together decent aggregation
> routines.
>
> (smarter doesn't mean works harder, just more intelligently :-).
>

I'll take that as a compliment :-).


> > I see, though, that I might want to check PVM just for the sake of the
> > fault tolerance in snowFT.
>
> Fault tolerance is one of those very ill-defined terms.  Specifically:
>
> #1 - mapping pRNG streams to work units, not just CPUs or dispatch order (both
> of which can differ), for reproducibility
>
> #2 - handling "failure to complete" on worker nodes gracefully.
>
> However, you'd need checkpointing or probably a miracle to handle failure on
> the master...
>

Aha, I had not thought of #1, seeing as I am much more concerned about
#2. (For #1, and to check results, I tend to run things under
controlled conditions, where if a worker shuts down, I'll bring it
back to life and start again; not elegant, but this happens rarely
enough that I don't worry too much.)
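For what it is worth, snow can at least give each node its own
L'Ecuyer stream via the rlecuyer package. A minimal sketch (though
note that this ties streams to nodes, not to work units, which is
precisely the distinction you are making):

    library(snow)
    library(rlecuyer)

    cl <- makeCluster(4, type = "MPI")
    ## reproducible per *node*; if dispatch order varies between runs,
    ## a given work unit may still see a different stream
    clusterSetupRNG(cl, type = "RNGstream", seed = rep(1234, 6))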

Right now, I am dealing with #2 via additional external scripts that
check that the LAM universes are up, examine log files for signs of
failure, modify the LAM host definition files if needed, restart the
LAM universes, etc., and with checkpointing in the R code. But I think
it is an ugly kludge (and a pain). I envy the Erlang guys.
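The checkpointing side, at least, is the easy part; from the master it
goes roughly along these lines (all names made up):

    ## each chunk's results go to disk, so a restarted run skips the
    ## work that had already finished before a crash
    for (ch in seq_along(chunks)) {
        out.file <- sprintf("chunk_%03d.RData", ch)
        if (file.exists(out.file)) next            # done in a previous run
        res <- papply(chunks[[ch]], do.one.task)   # do.one.task is made up
        save(res, file = out.file)
    }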

As for failure in the master ... I'll take that as an act of God, so
there is no point in trying to defeat it via miracles :-). Actually,
the scripts above could themselves be distributed (the checkpointing
is done from the master), so this is doable via a meta-script that
itself runs distributed. I've just added that to the "to-do" list.


Best,


R.
>
> best,
> -tony
>
> [EMAIL PROTECTED]
> Muttenz, Switzerland.
> "Commit early,commit often, and commit in a repository from which we can
> easily
> roll-back your mistakes" (AJR, 4Jan05).
>
>


-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
