Re: feature suggestion: --preserve-blocking-factor

Ole Tange Sat, 18 Feb 2017 13:37:13 -0800

On Sat, Feb 18, 2017 at 5:39 AM, Cook, Malcolm <[email protected]> wrote:


> I don't think my needs were clear.

Your needs were clear and I am really surprised that you did not
understand the solution I proposed.

> I know you are bioinformatics savvy and are familiar with bedtools, so let me 
> cast my example in terms of bedtools.
>
> I have a huge sorted bedfile, my.bed, that I want to pipe into bedtools merge 
> (http://bedtools.readthedocs.io/en/latest/content/tools/merge.html)
>
> As required, it is sorted already.
>
> I could
>
>         cat my.bed | parallel -j10 --pipe --block 50M bedtools merge
>
> but the blocks that parallel breaks my.bed into might not keep each 
> chromosome together, and that is required for the merge to be correct.
>
> So I am looking for a means to instruct parallel that some ranges of records 
> must stay together within a block.

Yup. You want each chromosome to be treated as a record. So what you
do is to insert a record separator before each chromosome and tell GNU
Parallel to use that as record separator. The first column ($F[0] in
perl) is the chromosome, so when that changes we insert '\0', which
will never be in a normal bedfile. Then we ask GNU Parallel to split
records on \0 (--recend '\0') and remove the \0 (--rrs) before passing
the block to bedtools.

  cat my.bed | perl -ape '$F[0] ne $old and print "\0"; $old = $F[0]' |
    parallel --recend '\0' --rrs --pipe --block 50M -j10 bedtools merge

The only things I have changed from my previous email are:

example -> my.bed
$F[1] -> $F[0]
--block 200 -> --block 50M
wc -> bedtools merge

and added -j10.

I have the feeling you are now saying *DOH*.


/Ole
