On 8 Nov 2012, at 13:52, Skylar Thompson <[email protected]> wrote:

> I guess if your development time is sufficiently shorter than the
> equivalent compiled code, it could make sense.

This is true, and a lot of what these guys are writing is pipeline glue joining
other bits of software together, for which scripting languages are perfect.
But there is an element of "to the man with a hammer, everything looks like a
nail" going on, and people are writing analysis algorithms in these languages
too.  That's fine for prototyping, but once the code is in production and is
going to use thousands of CPU-years, it would be nice if the prototypes were
occasionally replaced with something that could run in hundreds of CPU-years
instead.  In those cases, investing a few extra weeks in implementing in a
"harder" language is cost-effective.
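
A back-of-envelope sketch of that trade-off follows.  Every figure in it (the
10x speedup, the cost per CPU-hour, the developer cost, the six weeks) is a
made-up assumption for illustration only, and rewrite_saving is a hypothetical
helper, not anything from a real tool.

```python
# All figures below are illustrative assumptions, not measurements.
HOURS_PER_CPU_YEAR = 365 * 24


def rewrite_saving(prototype_cpu_years, speedup, cost_per_cpu_hour,
                   dev_weeks, dev_cost_per_week):
    """Net saving from rewriting a prototype; positive means it pays off."""
    optimised_cpu_years = prototype_cpu_years / speedup
    cpu_years_saved = prototype_cpu_years - optimised_cpu_years
    compute_saving = cpu_years_saved * HOURS_PER_CPU_YEAR * cost_per_cpu_hour
    dev_cost = dev_weeks * dev_cost_per_week
    return compute_saving - dev_cost


# Hypothetical case: 2000 CPU-years of prototype code, a 10x faster rewrite,
# 0.02 currency units per CPU-hour, six developer weeks at 3000 per week.
net = rewrite_saving(2000, 10, 0.02, 6, 3000)
print(f"net saving: {net:,.0f}")
```

With numbers of this order the compute saving dwarfs the development cost;
for a job that only ever burns a few CPU-days the sign flips the other way.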

> In Genome Sciences here
> at University of Washington, the grad students are taught Python and R,
> and there's a number of people who love the Python MPI bindings. We also
> have some C MPI users, but it's not as popular as Python.
> 
> I suppose what you can say is, for the right application, Python MPI
> certainly is faster than serial Python.

Maybe, maybe not.  If the problem is embarrassingly parallel, as many genomics
problems are, then often it isn't.  To take an old example, we never adopted
MPI-BLAST at Sanger, because the throughput was always far greater running
multiple independent serial BLAST jobs, at least in a mixed environment where
the BLAST searches weren't terribly predictable.
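
The throughput point can be sketched with a toy model.  This is not a
benchmark of BLAST or MPI-BLAST: the job lengths are random, the 60% parallel
efficiency is an assumed figure, and farm_makespan and mpi_makespan are
hypothetical helpers written only for this illustration.

```python
import heapq
import random


def farm_makespan(durations, cores):
    """Wall time when each job runs serially on whichever core frees up first."""
    free_at = [0.0] * cores
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available core
        heapq.heappush(free_at, start + d)  # that core is busy until start + d
    return max(free_at)


def mpi_makespan(durations, cores, efficiency):
    """Wall time when each job in turn is spread across all cores.

    `efficiency` is the assumed fraction of ideal speedup actually achieved.
    """
    return sum(d / (cores * efficiency) for d in durations)


random.seed(0)
# Unpredictable job lengths: exponentially distributed, mean 1 time unit.
durations = [random.expovariate(1.0) for _ in range(10_000)]
cores = 64

farm = farm_makespan(durations, cores)
mpi = mpi_makespan(durations, cores, efficiency=0.6)
print(f"independent serial jobs: {farm:.1f} time units")
print(f"one parallel job at a time: {mpi:.1f} time units")
```

In this model the serial farm loses only a short tail at the end of the run,
while the parallel version pays its efficiency penalty on every single job.
The parallel version still wins on turnaround time for any one job, which is
a different goal from total throughput.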

Plus of course, writing that MPI version of the code is much harder to get 
right than the serial version, so it goes against the original argument for 
keeping the development time short.

I realise I'm playing devil's advocate here, to a great extent.  But most 
genomics that I've dealt with so far is really about high throughput, not about 
short turnaround time of a single analysis job.  Of course there are some 
exceptions, and I'm making far too many sweeping generalisations here.

Tim
