Assuming your shell is bash, with this exported function
function slice {
# PURPOSE: After an optional -h lines of header (which are echoed
# unless suppressed with <-sh>), echo every <-n>th line (default:
# every line) starting with the <-m>th (counting from 1, starting
# with the first line after the header; default: starting with the
# <n>th line).
# AUTHOR: [email protected]
# EXAMPLE: slice -h=1 -sh -n=5 foo.tab > foo_every_fifth_line_after_the_one_line_header.tab
# set -e ;
perl -snwe 'BEGIN{ our $n ||= 1; our $m = $n unless defined($m); $m -= 1;
    our $h ||= 0; die "required: m <= n\n" unless $m < $n; our $sh }
  print $_ if (($. > $h) ? (($. - 1 - $h) % $n == $m) : ! $sh)' -- "$@"
}
export -f slice
...you can create parallel jobs in which each job greps its own slice of in.bam.
Pass parallel's {#} (the job's sequence number) as the value for -m, and the
same value you pass to parallel's -j as the value for -n.
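This works because the selections for -m=1 through -m=n are disjoint and together cover every line: line i (counting from 1 after any header) lands in slice ((i-1) mod n)+1. A quick way to convince yourself, sketched with an awk one-liner that uses the same arithmetic as slice (the helper names awk_slice and covers_once are hypothetical, and awk_slice has no header handling):

```bash
#!/usr/bin/env bash
# awk_slice N M: print input lines i (1-based) with (i - 1) % N == M - 1,
# i.e. the same selection as `slice -n=N -m=M` when there is no header.
awk_slice() {
  awk -v n="$1" -v m="$2" '(NR - 1) % n == m - 1'
}

# covers_once N FILE: succeed if slices 1..N of FILE, taken together,
# contain every line of FILE exactly once (compared as sorted multisets).
covers_once() {
  local n=$1 file=$2 k tmp status
  tmp=$(mktemp) || return 1
  for ((k = 1; k <= n; k++)); do
    awk_slice "$n" "$k" < "$file"
  done | sort > "$tmp"
  sort "$file" | cmp -s - "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

lines=$(mktemp)
seq 1 23 > "$lines"
awk_slice 10 3 < "$lines"   # prints 3, 13 and 23
covers_once 10 "$lines" && echo "slices 1..10 cover each line exactly once"
rm -f "$lines"
```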
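Because slice is an exported bash function, the jobs must run under bash:
GNU parallel runs each job through a shell, so this works when that shell is
bash. Don't add parallel's -q here; it quotes the whole command, so the |
characters would no longer build a pipeline.
The following is untested. The seq 10 only supplies the ten inputs that make
parallel start ten jobs.
seq 10 | parallel -j 10 'samtools view in.bam | slice -n=10 -m={#} | fgrep -w -f read.ids' > alignments.txt
The matches will not be in the same order as in in.bam: every job emits its
own slice, so lines from the ten slices end up interwoven.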
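If GNU parallel is not available, the same slice-and-grep idea can be sketched with plain bash background jobs. This is a minimal sketch under a few assumptions: the alignments have already been written out as SAM text (for example with samtools view in.bam > in.sam) so that every job can read the file; grep_slices is a made-up name; and awk stands in for slice. Each job writes to its own temporary file and the files are concatenated at the end, similar in spirit to GNU parallel's default grouping of each job's output, which keeps lines from different jobs from being mixed mid-line.

```bash
#!/usr/bin/env bash
# grep_slices SAM IDS NJOBS  (hypothetical helper, not part of GNU parallel)
# Job k reads every NJOBS-th line of the SAM text file starting at line k
# and greps it; all jobs run in the background at the same time.
grep_slices() {
  local sam=$1 ids=$2 n=$3 k tmpdir
  tmpdir=$(mktemp -d) || return 1
  for ((k = 1; k <= n; k++)); do
    # awk stands in for `slice -n=$n -m=$k` (samtools view without -h
    # emits no header lines); grep -F is the modern spelling of fgrep.
    awk -v n="$n" -v m="$k" '(NR - 1) % n == m - 1' "$sam" \
      | grep -Fw -f "$ids" > "$tmpdir/$k.out" &
  done
  wait   # block until every background job has finished
  # Print each job's matches as one block, in job order.
  for ((k = 1; k <= n; k++)); do
    cat "$tmpdir/$k.out"
  done
  rm -rf "$tmpdir"
}
```

Usage would be grep_slices in.sam read.ids 10 > alignments.txt. Note that every job still reads the whole SAM file (awk discards the lines outside its slice), so only the grepping is divided between jobs.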
From: [email protected]
[mailto:[email protected]] On Behalf Of Nathan S. Watson-Haigh
Sent: Friday, August 09, 2013 12:54 AM
To: [email protected]
Subject: Parallelising grep
I have a SAM/BAM file and I'd like to grep for alignments of certain read IDs.
I have the read ID strings in another file. I'm currently doing this with:
$ samtools view in.bam | fgrep -w -f read.ids > alignments.txt
Is it possible to parallelise the grep by having each grep process a different
subset of read IDs from the read.ids file? Or is there an alternative way to
parallelise this which I have overlooked?
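The approach asked about here, giving each grep a different subset of the read IDs, can be sketched as well. Assumptions: the alignments are available as SAM text (for example from samtools view in.bam > in.sam), grep_id_chunks is a hypothetical name, and the IDs are dealt round-robin into chunks with awk. One caveat of this design: a SAM line that contains IDs from two different chunks is printed once per chunk, whereas a single fgrep prints it once, so the two results only agree when each line can match at most one chunk.

```bash
#!/usr/bin/env bash
# grep_id_chunks SAM IDS NJOBS  (hypothetical helper)
# Deal the IDs round-robin into NJOBS chunks; each background job greps the
# whole SAM text file for one chunk of IDs.
grep_id_chunks() {
  local sam=$1 ids=$2 n=$3 k tmpdir
  tmpdir=$(mktemp -d) || return 1
  for ((k = 1; k <= n; k++)); do
    # Chunk k holds IDs k, k+n, k+2n, ... of the ID file.
    awk -v n="$n" -v m="$k" '(NR - 1) % n == m - 1' "$ids" > "$tmpdir/$k.ids"
    # With GNU grep, an empty chunk (more jobs than IDs) matches nothing.
    grep -Fw -f "$tmpdir/$k.ids" "$sam" > "$tmpdir/$k.out" &
  done
  wait   # block until every background grep has finished
  for ((k = 1; k <= n; k++)); do
    cat "$tmpdir/$k.out"
  done
  rm -rf "$tmpdir"
}
```

Here every job scans the full SAM file, and what is divided is the set of patterns each grep has to match.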
Cheers,
Nathan
--
Nathan S. Watson-Haigh, PhD
Research Fellow in Bioinformatics
Australian Centre for Plant Functional Genomics (ACPFG)
School of Agriculture, Food and Wine
University of Adelaide Waite Campus
Plant Genomics Centre
Hartley Grove, Urrbrae
SA 5064
Phone: +61 8 8313 2046
Mobile: +61 438 711 615
Skype: nathanhaigh
Email: [email protected]
Web: http://www.acpfg.com.au/bioinformatics
LinkedIn: http://www.linkedin.com/profile/view?id=114191748
Github: https://github.com/nathanhaigh/
        https://gist.github.com/nathanhaigh/
Twitter: @watsonhaigh (https://twitter.com/watsonhaigh)
         @BIG_SA1 (https://twitter.com/BIG_SA1)
RID: B-9833-2008 (http://www.researcherid.com/rid/B-9833-2008)
ResearchGate: Nathan_Watson-Haigh (https://www.researchgate.net/profile/Nathan_Watson-Haigh/)
