Hi Ole,
Ole Tange <ole <at> tange.dk> writes:
> You have not given enough information to generate a big inputfile, so
> I cannot reproduce your test.
I know. I created a quick R script for dummy data I can post if there is
interest.
> Based on your explanation it sounds as if the awk script opens files A, B
> and C.
>
> If 2 awk scripts both open A, B and C then the last one wins and all
> data written by the first one is lost.
Plonk. I think that may indeed be the case. I had not thought that through.
I have to find a tool that does this in append mode.
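For what it's worth, awk itself distinguishes between truncating and
appending redirection: `print > "file"` truncates on the first write in
each process, while `print >> "file"` appends. A minimal, self-contained
illustration of both behaviours (all filenames here are made up for the
demo):

```shell
# Demonstration (hypothetical filenames) of awk's two redirection modes.
rm -f OUT APPENDED
printf 'x\n' > in1
printf 'y\n' > in2

# '>' truncates on the first write in each awk process, so a second
# invocation clobbers the first -- the "last one wins" effect above:
awk '{ print > "OUT" }' in1
awk '{ print > "OUT" }' in2
cat OUT        # only the second writer's line survives

# '>>' appends, so sequential invocations accumulate:
awk '{ print >> "APPENDED" }' in1
awk '{ print >> "APPENDED" }' in2
cat APPENDED   # both lines survive
```

Caveat: with truly concurrent appenders, partial lines can still
interleave if a writer's buffered output exceeds the atomic-write limit,
so `>>` alone is only safe for sequential or small-line workloads.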
> One way to solve that is to instead have the first invocation open A1,
> B1 and C1 while the second writes to A2, B2 and C2. You can use {#} or
> $PARALLEL_SEQ for that by writing to A{#} or A$PARALLEL_SEQ.
Hm. Then I have ~ N x cores files and need to aggregate those. I fear that
won't beat straight awk use. Shucks.
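Ole's per-job-file scheme plus the extra aggregation pass can be sketched
as follows. A plain shell loop stands in for GNU Parallel here, with the
loop counter playing the role of {#}; all filenames are illustrative:

```shell
# Sketch of the A{#} scheme: each "job" writes to its own numbered file,
# then one final pass concatenates them. A sequential shell loop stands
# in for GNU Parallel; 'jobnum' plays the role of the {#} job number.
printf 'x\n' > in1
printf 'y\n' > in2
jobnum=1
for f in in1 in2; do
    awk -v n="$jobnum" '{ print > ("A" n) }' "$f"
    jobnum=$((jobnum + 1))
done
cat A1 A2 > A   # the extra aggregation step
```

With GNU Parallel itself the per-job filename would come from {#} (or
$PARALLEL_SEQ) as Ole describes; the point of the sketch is just that the
aggregation is a single sequential `cat` at the end, which may be cheaper
than it first appears.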
> Maybe you were under the impression that GNU Parallel would cache all
> output (even data written to files) and write that atomically in the
> end, but that is only the case for STDERR and STDOUT.
Caching would be costly. I was hoping I could just append in pieces as I
find the respective lines.
Back to the drawing board.
Thanks, Dirk