On Sat, Aug 20, 2011 at 12:54 AM, Nathan Watson-Haigh
<[email protected]> wrote:
>
> What I'm actually doing is using the ABySS genome assembler. Part of the 
> pipeline is:
>
> KAligner | ParseAligns | sort | DistanceEst
>
> KAligner takes sequences from one file (queries) and finds alignments against 
> sequences in another file (targets), outputting these in Sequence 
> Alignment/Map (SAM) format. ParseAligns takes the SAM format and filters out 
> some alignments. It is the ParseAligns step which is slowest and I'm looking 
> at how best to split up the work to make use of more cores. A job for early 
> next week!

The most obvious way seems to be:

  cat queries | parallel --pipe --files 'KAligner | ParseAligns | sort' |
    parallel -Xj1 sort -m {}\;rm {} | DistanceEst

Would that work?
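
The idea behind that command is chunk, sort each chunk, then merge the
sorted chunks. Here is a minimal sketch of the same pattern using only
split and sort, so it can be tried without ABySS or GNU Parallel.
chunk_sort_merge is a hypothetical helper name, and the per-chunk step
here is a plain sort; in the real pipeline the filter would run in its
place before the sort. This assumes the per-chunk step can work on any
block of lines independently.

```shell
#!/bin/sh
# Hypothetical helper: split stdin into chunks of N lines, sort each
# chunk as a background job, then merge the sorted chunks to stdout.
chunk_sort_merge() {
    lines_per_chunk=$1
    tmpdir=$(mktemp -d) || return 1

    # Split stdin into files of at most N lines each.
    split -l "$lines_per_chunk" - "$tmpdir/chunk."

    # Sort every chunk concurrently. Unlike parallel -j, nothing here
    # caps the number of jobs, so this suits a small number of chunks.
    for f in "$tmpdir"/chunk.*; do
        [ -e "$f" ] || continue
        sort "$f" > "$f.sorted" &
    done
    wait

    # sort -m merges inputs that are already sorted, without re-sorting.
    set -- "$tmpdir"/chunk.*.sorted
    if [ -e "$1" ]; then
        sort -m "$@"
    fi

    rm -rf "$tmpdir"
}
```

For example, `chunk_sort_merge 100000 < input > sorted` should give the
same lines as `sort input`, as long as both run under the same locale.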

/Ole
