Hi Ole,

On 22 August 2012 07:08, Ole Tange <[email protected]> wrote:
> So please write a few lines about the tasks you use it for -
> especially if you have reason to believe you are one of the few doing
> that kind of thing. If you want to be anonymous you can write me
> directly, but otherwise use the mailing list.
Good luck with the talk!

I use parallel to parallelise the outer loop of most bioinformatics software, especially HMMER3. Many pieces of software have no parallelisation of their own, so if I give them a long list of inputs they work through it serially. I work with quite large datasets: 1,765 genomes, each with 1-10 thousand protein sequences. With 5x 24-core desktops I can really cut back how long something takes. We even have an internal script that bridges parallel with the EC2 compute cloud, so if I need to do something extra big I just go wider and hand the list of EC2 machine names to parallel.

More day to day, I frequently use parallel to transform large files (hundreds of gigabytes per file) between text-based file formats, so parallel perl/sed. I use the --pipe feature a lot to split files too, so something like the FASTA format is splittable with parallel and I can pipe the data straight into another program.

I think you would do well to publish a short paper somewhere in the bioinformatics field about the speed-ups you can get using parallel with older non-parallel software.

Best,
Matt.

---
http://www.mattoates.co.uk
http://bccs.bris.ac.uk
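
A footnote on the FASTA splitting mentioned above, for anyone who has not met the format: the reason it splits cleanly is that every record starts with a ">" header line, so a splitter only ever has to cut just before one. Below is a minimal Python sketch of that idea, not of how parallel itself is implemented. The function name split_fasta and the records_per_chunk parameter are made up for illustration, and it chunks by record count, whereas --pipe by default works on block sizes.

```python
from typing import Iterable, Iterator, List


def split_fasta(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most records_per_chunk whole records.

    Hypothetical helper for illustration only: it cuts exclusively in front of
    a '>' header line, so no sequence is ever broken across two chunks.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")

    chunk: List[str] = []
    records_in_chunk = 0
    for line in lines:
        if line.startswith(">"):
            # A header starts a new record; emit the chunk first if it is full.
            if records_in_chunk == records_per_chunk:
                yield chunk
                chunk = []
                records_in_chunk = 0
            records_in_chunk += 1
        chunk.append(line)

    # Emit whatever is left over after the last header.
    if chunk:
        yield chunk


if __name__ == "__main__":
    demo = [">seq1\n", "MKV\n", "LLA\n", ">seq2\n", "MTT\n", ">seq3\n", "MAA\n"]
    for number, part in enumerate(split_fasta(demo, 2)):
        print(f"--- chunk {number} ---")
        print("".join(part), end="")
```

Each chunk is a valid FASTA fragment in its own right, which is what makes it safe to hand every piece to a separate copy of a serial program.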
