On 07/10/10 19:25, Pádraig Brady wrote: > On 07/10/10 18:43, Assaf Gordon wrote: >> Pádraig Brady wrote, On 10/07/2010 06:22 AM: >>> On 07/10/10 01:03, Pádraig Brady wrote: >>>> On 06/10/10 21:41, Assaf Gordon wrote: >>>>> >>>>> The "--auto-format" feature simply builds the "-o" format line >>>>> automatically, based on the number of columns from both input files. >>>> >>>> Thanks for persisting with this and presenting a concise example. >>>> I agree that this is useful and can't think of a simple workaround. >>>> Perhaps the interface would be better as: >>>> >>>> -o {all (default), padded, FORMAT} >>>> >>>> where padded is the functionality you're suggesting? >>> >>> Thinking more about it, we mightn't need any new options at all. >>> Currently -e is redundant if -o is not specified. >>> So how about changing that so that if -e is specified >>> we operate as above by auto inserting empty fields? >>> Also I wouldn't base on the number of fields in the first line, >>> instead auto padding to the biggest number of fields >>> on the current lines under consideration. >> >> My concern is the principle of "least surprise" - if there are existing >> scripts/programs that specify "-e" without "-o" (doesn't make sense, but >> still possible) - this change will alter their behavior. >> >> Also, implying/forcing 'auto-format' when "-e" is used without "-o" might be >> a bit confusing. > > Well seeing as -e without -o currently does nothing, > I don't think we need to worry too much about changing that behavior. > Also to me, specifying -e EMPTY implicitly means I want > fields missing from one of the files replaced with EMPTY. > > Note POSIX is more explicit, and describes our current operation: > > -e EMPTY > Replace empty output fields in the list selected by -o with EMPTY > > So changing that would be an extension to POSIX. > But I still think it makes sense. > I'll prepare a patch soon, to do as I describe above, > unless there are objections.
The attached changes `join` (from what's done on other platforms) so that... `join -e` will automatically pad missing fields from one file so that the same number of fields are output from each file. Previously -e was only used for missing fields specified with -o or -j. With this change join now does: $ cat file1 a 1 2 b 1 d 1 2 $ cat file2 a 3 4 b 3 4 c 3 4 $ join -a1 -a2 -1 1 -2 1 -e. file1 file2 a 1 2 3 4 b 1 . 3 4 c . . 3 4 d 1 2 . . $ join -a1 -a2 -1 1 -2 4 -e. file1 file2 . . . . a 3 4 . . . . b 3 4 . . . . c 3 4 a 1 2 . . b 1 . d 1 2 . . $ join -a1 -a2 -1 4 -2 1 -e. file1 file2 . a 1 2 . . . . b 1 . . . d 1 2 . . . a . . 3 4 b . . 3 4 c . . 3 4 $ join -a1 -a2 -1 4 -2 4 -e. file1 file2 . a 1 2 a 3 4 . a 1 2 b 3 4 . a 1 2 c 3 4 . b 1 . a 3 4 . b 1 . b 3 4 . b 1 . c 3 4 . d 1 2 a 3 4 . d 1 2 b 3 4 . d 1 2 c 3 4 While -e without -o was previously a noop, and so could safely be extended IMHO, this will also change the behavior when with -e and -j are specified. Previously if -j > 1 was specified, and that field was missing, then -e would be used in its place, rather than the empty string. This still does that, but also does the padding. Without the -j issue I'd be 80:20 for just extending -e to auto pad, but given -j I'm 50:50. The alternative it to select this with say '-o padded', but that's less discoverable, and complicates the interface somewhat. cheers, Pádraig.
>From 3ea251d770fd79c37ee5195480574ccd6bfa156c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com> Date: Wed, 5 Jan 2011 11:52:54 +0000 Subject: [PATCH] join: auto pad missing fields when -e specified * src/join.c (prfields): A new function refactored from prjoin(), to output all but the join field. This function also applies the extra padding when appropriate. (prjoin): Don't swap line1 and line2 when line1 is blank so that the padding is applied to the right place. * tests/misc/join: Remove existing uses of -e without -o, and add 2 new cases to test the new auto padding behavior. * NEWS: Mention the change in behavior. Suggestion from Assaf Gordon --- NEWS | 6 ++++ src/join.c | 81 ++++++++++++++++++++++++++++++++++++------------------- tests/misc/join | 16 ++++++++--- 3 files changed, 71 insertions(+), 32 deletions(-) diff --git a/NEWS b/NEWS index 2a71ca6..f02e8a8 100644 --- a/NEWS +++ b/NEWS @@ -2,6 +2,12 @@ GNU coreutils NEWS -*- outline -*- * Noteworthy changes in release ?.? (????-??-??) [?] +** Changes in behavior + + join -e will automatically pad missing fields from one file + so that the same number of fields are output from each file. + Previously -e was only used for missing fields specified with -o or -j. + * Noteworthy changes in release 8.9 (2011-01-04) [stable] diff --git a/src/join.c b/src/join.c index afda5a1..1df24a9 100644 --- a/src/join.c +++ b/src/join.c @@ -527,6 +527,45 @@ prfield (size_t n, struct line const *line) fputs (empty_filler, stdout); } +/* Output all the fields in line, other than the join field. */ + +static void +prfields (struct line const *line, size_t join_field, + struct line const *oline, size_t ojoin_field) +{ + size_t i; + char output_separator = tab < 0 ? ' ' : tab; + + for (i = 0; i < join_field && i < line->nfields; ++i) + { + putchar (output_separator); + prfield (i, line); + } + for (i = join_field + 1; i < line->nfields; ++i) + { + putchar (output_separator); + prfield (i, line); + } + if (empty_filler && oline->nfields > line->nfields) + { + bool extra_filler = false; + + /* Increase padding by 1 to account for the implicit + join field outputted for a blank line. */ + if (line != &uni_blank) + extra_filler = true; + /* Or if we didn't print a join field, increase padding. */ + else if (oline->nfields <= ojoin_field || !oline->fields[ojoin_field].len) + extra_filler = true; + + for (i = !extra_filler; i < oline->nfields - line->nfields; ++i) + { + putchar (output_separator); + fputs (empty_filler, stdout); + } + } +} + /* Print the join of LINE1 and LINE2. */ static void @@ -534,6 +573,8 @@ prjoin (struct line const *line1, struct line const *line2) { const struct outlist *outlist; char output_separator = tab < 0 ? ' ' : tab; + size_t field; + struct line const *line; outlist = outlist_head.next; if (outlist) @@ -543,9 +584,6 @@ prjoin (struct line const *line1, struct line const *line2) o = outlist; while (1) { - size_t field; - struct line const *line; - if (o->file == 0) { if (line1 == &uni_blank) @@ -574,37 +612,24 @@ prjoin (struct line const *line1, struct line const *line2) } else { - size_t i; - if (line1 == &uni_blank) { - struct line const *t; - t = line1; - line1 = line2; - line2 = t; + line = line2; + field = join_field_2; } - prfield (join_field_1, line1); - for (i = 0; i < join_field_1 && i < line1->nfields; ++i) - { - putchar (output_separator); - prfield (i, line1); - } - for (i = join_field_1 + 1; i < line1->nfields; ++i) + else { - putchar (output_separator); - prfield (i, line1); + line = line1; + field = join_field_1; } - for (i = 0; i < join_field_2 && i < line2->nfields; ++i) - { - putchar (output_separator); - prfield (i, line2); - } - for (i = join_field_2 + 1; i < line2->nfields; ++i) - { - putchar (output_separator); - prfield (i, line2); - } + /* Output the join field. */ + prfield (field, line); + + /* Output other fields. */ + prfields (line1, join_field_1, line2, join_field_2); + prfields (line2, join_field_2, line1, join_field_1); + putchar ('\n'); } } diff --git a/tests/misc/join b/tests/misc/join index 3696a03..7ca628b 100755 --- a/tests/misc/join +++ b/tests/misc/join @@ -43,7 +43,7 @@ my @tv = ( ['1e', '-a2', ["a 1\nb\n", "b\n"], "b\n", 0], ['1f', '-a2', ["b\n", "a\nb\n"], "a\nb\n", 0], -['2a', '-a1 -e .', ["a\nb\nc\n", "a x y\nb\nc\n"], "a x y\nb\nc\n", 0], +['2a', '-a1', ["a\nb\nc\n", "a x y\nb\nc\n"], "a x y\nb\nc\n", 0], ['2b', '-a1 -e . -o 2.1,2.2,2.3', ["a\nb\nc\n", "a x y\nb\nc\n"], "a x y\nb . .\nc . .\n", 0], ['2c', '-a1 -e . -o 2.1,2.2,2.3', ["a\nb\nc\nd\n", "a x y\nb\nc\n"], @@ -104,13 +104,13 @@ my @tv = ( ["apr 15\naug 20\ndec 18\n", "apr 06\naug 14\ndate\nfeb 15\n"], "06 apr\n14 aug\n- -\n15 -\n", 0], -['6a', '-e -', +['6a', '', ["a 1\nb 2\nd 4\n", "a 21\nb 22\nc 23\nf 26\n"], "a 1 21\nb 2 22\n", 0], -['6b', '-a1 -e -', +['6b', '-a1', ["a 1\nb 2\nd 4\n", "a 21\nb 22\nc 23\nf 26\n"], "a 1 21\nb 2 22\nd 4\n", 0], -['6c', '-a1 -e -', +['6c', '-a1', ["a 21\nb 22\nc 23\nf 26\n", "a 1\nb 2\nd 4\n"], "a 21 1\nb 22 2\nc 23\nf 26\n", 0], @@ -127,6 +127,14 @@ my @tv = ( # From David Dyck ['9a', '', [" a 1\n b 2\n", " a Y\n b Z\n"], "a 1 Y\nb 2 Z\n", 0], +# Test -e without -o (auto pad) +['10a', '-a1 -a2 -e .', + ["a 1 2\nb 1\nd 1 2\n", "a 3 4\nb 3 4\nc 3 4\n"], + "a 1 2 3 4\nb 1 . 3 4\nc . . 3 4\nd 1 2 . .\n", 0], +['10b', '-a1 -a2 -j3 -e .', + ["a 1 2\nb 1\nd 1 2\n", "a 3 4\nb 3 4\nc 3 4\n"], + "2 a 1 . .\n. b 1 . .\n2 d 1 . .\n4 . . a 3\n4 . . b 3\n4 . . c 3\n"], + # From Tim Smithers: fixed in 1.22l ['trailing-sp', '-t: -1 1 -2 1', ["a:x \n", "a:y \n"], "a:x :y \n", 0], -- 1.7.3.4