Hi Ole,
Ole Tange <ole <at> tange.dk> writes:
> You have not given enough information to generate a big inputfile, so
> I cannot reproduce your test.
I know. I created a quick R script for dummy data I can post if there is
interest.
> Based on your explanation it sounds as if the awk script opens files A, B
> and C.
>
> If 2 awk scripts both open A, B and C then the last one wins and all
> data written by the first one is lost.
Plonk. I think that may indeed be the case. I had not thought that through.
I have to find a tool that does this in append mode.
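For what it's worth, awk itself distinguishes between truncating and
appending redirection: `print > "file"` truncates on the first write in
each process, while `print >> "file"` appends. A minimal, self-contained
illustration of both behaviours (all filenames here are made up for the
demo):

```shell
# Demonstration (hypothetical filenames) of awk's two redirection modes.
rm -f OUT APPENDED
printf 'x\n' > in1
printf 'y\n' > in2

# '>' truncates on the first write in each awk process, so a second
# invocation clobbers the first -- the "last one wins" effect above:
awk '{ print > "OUT" }' in1
awk '{ print > "OUT" }' in2
cat OUT        # only the second writer's line survives

# '>>' appends, so sequential invocations accumulate:
awk '{ print >> "APPENDED" }' in1
awk '{ print >> "APPENDED" }' in2
cat APPENDED   # both lines survive
```

Caveat: with truly concurrent appenders, partial lines can still
interleave if a writer's buffered output exceeds the atomic-write limit,
so `>>` alone is only safe for sequential or small-line workloads.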
> One way to solve that is to instead have the first invocation open A1,
> B1 and C1 while the second writes to A2, B2 and C2. You can use {#} or
> $PARALLEL_SEQ for that by writing to A{#} or A$PARALLEL_SEQ.
Hm. Then I have ~ N x cores files and need to aggregate those. I fear that
won't beat straight awk use. Shucks.
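Ole's per-job-file scheme plus the extra aggregation pass can be sketched
as follows. A plain shell loop stands in for GNU Parallel here, with the
loop counter playing the role of {#}; all filenames are illustrative:

```shell
# Sketch of the A{#} scheme: each "job" writes to its own numbered file,
# then one final pass concatenates them. A sequential shell loop stands
# in for GNU Parallel; 'jobnum' plays the role of the {#} job number.
printf 'x\n' > in1
printf 'y\n' > in2
jobnum=1
for f in in1 in2; do
    awk -v n="$jobnum" '{ print > ("A" n) }' "$f"
    jobnum=$((jobnum + 1))
done
cat A1 A2 > A   # the extra aggregation step
```

With GNU Parallel itself the per-job filename would come from {#} (or
$PARALLEL_SEQ) as Ole describes; the point of the sketch is just that the
aggregation is a single sequential `cat` at the end, which may be cheaper
than it first appears.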
> Maybe you were under the impression that GNU Parallel would cache all
> output (even data written to files) and write that atomically in the
> end, but that is only the case for STDERR and STDOUT.
Caching would be costly. I was hoping I could just append in pieces as I
find the respective lines.
Back to the drawing board.
Thanks, Dirk