Pádraig Brady wrote:
> Pádraig Brady wrote:
>>
>> You wouldn't want multiple threads/processes fighting over
>> the disk head so you would do something like:
>>
>>   find /disk1 | xargs md5sum & find /disk2 | xargs md5sum
>>
>> Note if we're piping/redirecting the output of the above
>> then we must be careful to line buffer the output from md5sum
>> so that it's not interspersed. Hmm I wonder should
>> we linebuffer the output from *sum by default.
>
> In the attached patch, I've changed the default buffering
> to line buffered to address the above issue. For standard
> size files there is a 2% performance drop.

Good catch.
It sounds like this fixes a real (albeit obscure) bug,
so this might deserve a NEWS item, though I admit it is borderline.
Thanks!

> p.s. I'll look at bypassing stdio on input to see
> if I can get at least the 2% back

IMHO, even if it did, it would not be worth it.

> From 0db7057c6256d9cd25e988b3fe23e97a0e30f717 Mon Sep 17 00:00:00 2001
> From: =?utf-8?q?P=C3=A1draig=20Brady?= <[email protected]>
> Date: Tue, 20 Oct 2009 19:19:58 +0100
> Subject: [PATCH] md5sum, sha*sum, sum: line buffer the outputted checksums

s/outputted/printed/
s/line buffer/line-buffer/

> * src/md5sum.c (main): Set stdout to line buffered mode
> to ensure parallel running instances don't intersperse
> their output. This adds 5% to the run time in the worst case
> of many zero length files, or 2% with standard file sizes.
> * src/sum.c (main): Likewise.
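
For readers following along, here is a minimal sketch of what "set stdout
to line buffered mode" amounts to in C. It is not the code from the patch:
the helper names line_buffer_stream and print_checksum_line are
hypothetical, and the sketch assumes setvbuf is called before anything
else is written to the stream.

```c
#include <stdio.h>

/* Hypothetical helper (not from the patch): switch FP to line-buffered
   mode so stdio flushes each time a newline is written.
   Call this before any other operation on FP.
   Returns 0 on success, nonzero on failure, like setvbuf.  */
int
line_buffer_stream (FILE *fp)
{
  /* A NULL buffer lets the C library allocate its own.  */
  return setvbuf (fp, NULL, _IOLBF, 0);
}

/* Hypothetical helper: print one "<hex digest>  <file name>" line in a
   single printf call, so the whole line reaches the stdio buffer
   together and is flushed at the trailing newline.
   Returns the number of bytes printed, or a negative value on error.  */
int
print_checksum_line (FILE *fp, const char *hex_digest, const char *file_name)
{
  return fprintf (fp, "%s  %s\n", hex_digest, file_name);
}
```

A caller would run line_buffer_stream (stdout) once at the top of main and
then print each result with print_checksum_line. With full buffering, a
block flushed to a shared pipe can end in the middle of a line, and that
is where two concurrent instances get their output interspersed. With line
buffering, each line that fits in the stdio buffer is normally handed to
the kernel whole, which is the property the patch relies on.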
