On Thu, May 3, 2012 at 11:24 AM, Matt Oates (Home) <[email protected]> wrote:
> On 2 May 2012 15:23, Ole Tange <[email protected]> wrote:
>> On Tue, May 1, 2012 at 9:42 AM, Matt Oates (Home) <[email protected]> 
>> wrote:
>>> On 30 April 2012 21:51, Ole Tange <[email protected]> wrote:
>>>> On Thu, Apr 26, 2012 at 12:20 AM, Matt Oates (Home) <[email protected]> 
>>>> wrote:
>> :
>>>>> I then want to run something of the form:
>>>>>
>>>>> parallel -C '\t' -N 1 --pipe "myprogram /dev/stdin | cat <(echo {1})
>>>>> -" < file.tab | output-processing-program > results.tab
:
>> 21501699        MSAFFPVISSLNPAVPSVAAP
>> 21501700        MIGGILSCGITHTGITPLDVV
>> 21501701        MVIAIAKYFGWPLDQLDVVTA
>> 21501702        MKWHPDKNKNNLVEAQYRFQE
>>
>> If I understand you correctly, you want:
>>
>>  echo 21501699
>>  printf "21501699\tMSAFFPVISSLNPAVPSVAAP" | myprogram /dev/stdin
:
> >  cat foo.tab | parallel -C '\t' 'echo {1}; printf "{1}\t{2}\n" |
> > myprogram /dev/stdin'
:
> Yes this is what I want to achieve in theory but the protein sequences
> are too long to be used as command line arguments, they overflow the
> UNIX c/l buffer.

Use awk to put the short argument (the ID) on a line of its own, followed
by the full line:

cat file.tab | awk '{ print $1; print; }' | parallel --pipe -N2 'read A; echo $A; myprogram'

This works for sequences up to at least 50 MB.
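
To see what a single job receives, here is a minimal sketch that runs the two steps by hand, without parallel. It assumes two of the sample rows quoted above as input and uses cat as a stand-in for myprogram, so it only shows the data flow, not the real program. The variable names are made up for illustration.

```shell
# Two sample records (ID <tab> sequence), taken from the rows quoted above.
input=$(printf '21501699\tMSAFFPVISSLNPAVPSVAAP\n21501700\tMIGGILSCGITHTGITPLDVV\n')

# Step 1: print the ID on its own line, then the untouched record.
doubled=$(printf '%s\n' "$input" | awk '{ print $1; print }')

# Step 2: emulate one job. With -N2 a job gets two lines on stdin:
# the ID line and the full record. "read" consumes only the ID line;
# the program (cat here, an assumption standing in for myprogram)
# then reads the remaining record from stdin.
job_output=$(printf '%s\n' "$doubled" | head -n 2 | { read -r A; echo "$A"; cat; })

printf '%s\n' "$job_output"
```

Because the sequence only ever travels through stdin, it never has to fit on a command line; only the short ID ends up in a shell variable.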


/Ole
