On 08/01/16 19:04, Assaf Gordon wrote:
> Hello Pádraig and all,
>
> On 01/08/2016 11:56 AM, Pádraig Brady wrote:
> [...]
>> Possible additions to this class:
>>
>> nl (N/A as primarily text rather than record oriented)
>> numfmt (ditto)
>> expand (ditto)
>> unexpand (ditto)
>>
>
> Attached similarly structured patch adding -z to numfmt (it does not include
> a NEWS entry, yet).

Cool. I was wondering a bit about numfmt, and thinking more
this could be useful for:

  du -0 ... | numfmt -z

> an open question:
> With -z, do embedded newlines count as whitespace/field delimiters ?
> (not sure if this applies to other programs).
>
> For example:
>
> $ printf "A B\tC\nD 1000\x00"
>
> Should the newline count as whitespace/field delimiter (since numfmt defaults
> to whitespace delimiters) ?
> If so, the "1000" should be the fifth field.
> If not, the "1000" should be in the fourth field (and "C\nD" count as one
> field).
>
> Currently, because the numfmt code uses "isblank()", newlines DO NOT count as
> whitespace:
>
> $ printf "A B\tC\nD 1000\x00" | ./src/numfmt -z --to=si --field=4 | od -a
> 0000000 A sp B sp C nl D sp 1 . 0 K nul
> 0000015

A very good point. This is not an issue for the utils
in my current patch set I think, but is for field processing utils
like numfmt, sort, join, uniq (cut delimits fields with a char
rather than a class). I.e. should these utils use isspace()
rather than isblank() when -z is specified? More conservatively
they probably should use isblank(c) || c=='\n'.

> Also,
> Two minor questions:
>
> 1. If null-terminated tests fail due to incorrect output, the log will contain:
> numfmt.pl: test z4: stdout mismatch, comparing z4.2 (expected) and z4.O
> (actual)
> Binary files z4.2 and z4.O differ
>
> This will make it hard for users to send us bug reports.
> Perhaps it's worth thinking about how to display a diff even for
> null-terminated lines (not sure how best to approach this).

Maybe we should have something like bcompare that diffs
the base64 of two files?

> 2. In the patch for "wc", the long-form of the parameter (for getopt_long) is
> "zero" instead of "zero-terminated" - is that intentional ?

Yes, to match other uses in that "class" of programs, like basename, etc.
Anyway -z may be moot for wc as discussed elsewhere in the thread.

thanks for the careful review!
Padraig.
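
For illustration only, the conservative isblank(c) || c=='\n' variant
discussed above might look roughly like the sketch below. The names
field_sep() and count_fields() are hypothetical and are not taken from
the numfmt source; the sketch assumes the C locale and a record that has
already been split off at its NUL terminator.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from numfmt): with -z the record terminator
   is NUL, so '\n' can be treated as one more field separator.
   Without -z only blanks (space and tab in the C locale) separate.  */
static bool
field_sep (unsigned char c, bool zero_terminated)
{
  return isblank (c) || (zero_terminated && c == '\n');
}

/* Count the separator-delimited fields in the NUL-terminated record S.
   Runs of separators are treated as a single delimiter.  */
static size_t
count_fields (const char *s, bool zero_terminated)
{
  size_t n = 0;
  bool in_field = false;

  for (; *s; s++)
    {
      bool sep = field_sep ((unsigned char) *s, zero_terminated);
      if (!sep && !in_field)
        n++;                    /* first byte of a new field */
      in_field = !sep;
    }
  return n;
}
```

With the example record "A B\tC\nD 1000", this sketch puts "1000" in the
fourth field when newline is not a separator, and in the fifth field when
it is, which are the two outcomes described in the quoted question.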
