Hi Denys,

(Disclaimer: I'm just a random guy commenting here, not affiliated with the coreutils team.)
The simple solution recommended by xargs's man page under the -P option is to use a separate output file for each parallel "file". One concern with your suggested coalesce approach is that this tool itself also needs to write its output in one atomic step. If coalesce itself performs a partial write then you're back to square one. I guess (and that's what you also say) that if the output goes to a local file system then complete writes are guaranteed. That might depend on the file system type or other factors, though; a kernel guru could clarify it. I'm not so sure about virtual, remote, FUSE etc. file systems.

If the output goes to a pipe for further processing, then I'm pretty sure all you'd see are short writes of some small pipe buffer size. This would make this coalesce tool tricky to use properly, and easy to get wrong during refactoring (like: oh, I need to pipe it through one more step before storing it in the file... now it's broken, why?). So using coalesce would require special attention. In order to prevent "wrong" usage, it could take the output filename as a parameter and refuse to write to its stdout. It could also reject "/dev/stdout", or verify that the output file is seekable. That being said, given these constraints, I'm wondering whether this is a wise approach to take.

Another solution could be a new option to xargs to run a merger process. This merger would read from the 199 different file descriptors (the output of each "file" connected to one of them) and write to its stdout whenever it gets an EOF or a specified separator character on one of them. It could do the same with the stderrs as well. It would be incompatible with using SIGUSR1/2 to dynamically change the number of xargs's parallel processes. It would be interesting to see whether findutils's developers fancy this idea.

I guess I'd just go with a separate temporary output file for each parallel "sh -c file"; that sounds by far the simplest to me.

cheers,
egmont
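
P.S. Just to illustrate the separate-output-files approach, here's a minimal sketch (the input items, file names and the -P 3 level are placeholders I made up): each parallel job writes to its own file under a temporary directory, and the results are only concatenated after xargs has exited, so no two writers ever share a stream.

```shell
# Each parallel job writes to its own file; merging happens only after
# all jobs have finished, so outputs can never interleave mid-record.
tmpdir=$(mktemp -d)

printf '%s\n' one two three |
  xargs -P 3 -I{} sh -c 'echo "result for $1" > "$2/$1.out"' sh {} "$tmpdir"

# Safe to merge now: each per-job file was written by a single process.
cat "$tmpdir"/*.out

rm -r "$tmpdir"
```

Passing the item as a positional argument ($1) rather than substituting {} directly into the sh -c script avoids shell injection from hostile input; with real data you'd also want job filenames that can't collide or contain slashes.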