On 11/08/2012 06:10 AM, Tim Cutts wrote:
>
> On 8 Nov 2012, at 13:52, Skylar Thompson <[email protected]> wrote:
>
>> I guess if your development time is sufficiently shorter than
>> the equivalent compiled code, it could make sense.
>
> This is true, and a lot of what these guys are writing is pipeline
> glue joining other bits of software together, for which scripting
> languages are perfect. But there is an element of the "to the man
> with a hammer everything looks like a nail" thing going on, and
> people are writing analysis algorithms in these languages too.
> That's fine for prototyping, but once it runs in production and
> is going to use thousands of CPU-years, it might be nice if
> occasionally the prototypes were replaced with something that could
> run in hundreds of CPU-years instead. In those cases, investing a
> few extra weeks in implementing in a "harder" language is
> cost-effective.
>
>> In Genome Sciences here at the University of Washington, the grad
>> students are taught Python and R, and there are a number of people
>> who love the Python MPI bindings. We also have some C MPI users,
>> but it's not as popular as Python.
>>
>> I suppose what you can say is that, for the right application,
>> Python MPI certainly is faster than serial Python.
>
> Maybe, maybe not. If the problem is embarrassingly parallel, which
> many genomics problems are, it often isn't. We never adopted
> MPI-BLAST at Sanger, to take an old example, because the throughput
> was always far greater running multiple independent serial BLAST
> jobs, at least in a mixed environment where the BLAST searches
> weren't terribly predictable.
>
> Plus, of course, writing the MPI version of the code is much harder
> to get right than the serial version, so it goes against the
> original argument for keeping the development time short.
>
> I realise I'm playing devil's advocate here, to a great extent.
> But most genomics that I've dealt with so far is really about high
> throughput, not about short turnaround time of a single analysis
> job. Of course there are some exceptions, and I'm making far too
> many sweeping generalisations here.
>
> Tim
This is definitely true. Many of the MPI jobs here are not what many
Beowulfers think of as traditional, tightly-coupled parallel jobs.
Instead, there's one master rank that farms data-parallel jobs out to
the child ranks and then does some post-processing when everything is
finished. It could easily be written as a gang of serial jobs and get
the same speedup (or lack of speedup - a perennial challenge is
explaining how slow disks really are).

Skylar

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
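[Editor's illustration] The master/worker pattern Skylar describes can
indeed be written as exactly the "gang of serial jobs" he mentions,
with no MPI at all. The sketch below is hypothetical - the names
(run_farm, launch) and the sum-of-squares "analysis" are invented
stand-ins, not anyone's actual pipeline. A master process launches one
ordinary serial worker process per data chunk, lets the whole gang run
concurrently, then gathers the results and does the post-processing
step.

```python
# Hypothetical sketch of the master/worker pattern above, written as a
# gang of independent serial jobs instead of MPI ranks.
import subprocess
import sys

# Stand-in "analysis" each serial worker runs: sum the squares of its
# chunk. A real pipeline would exec an actual analysis binary here.
WORKER_CODE = "import sys; print(sum(int(x) ** 2 for x in sys.argv[1:]))"

def launch(chunk):
    # Fire off one independent serial job; don't wait for it yet.
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_CODE] + [str(x) for x in chunk],
        stdout=subprocess.PIPE, text=True,
    )

def run_farm(chunks):
    # "Master rank": scatter -- start every worker before collecting
    # any, so the whole gang runs concurrently.
    procs = [launch(chunk) for chunk in chunks]
    # Gather, then do the post-processing step (here, just a total).
    return sum(int(p.communicate()[0]) for p in procs)

if __name__ == "__main__":
    data = [list(range(i, i + 10)) for i in range(0, 100, 10)]
    print(run_farm(data))  # prints 328350, the sum of squares of 0..99
```

Whether this beats an MPI version is exactly Tim's point above: when
the chunks are truly independent, scheduler-level gangs of serial jobs
tend to win on throughput - and, per Skylar's parenthetical, both
versions end up waiting on the same slow disks.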
