On 07/10/10 19:25, Pádraig Brady wrote:
> On 07/10/10 18:43, Assaf Gordon wrote:
>> Pádraig Brady wrote, On 10/07/2010 06:22 AM:
>>> On 07/10/10 01:03, Pádraig Brady wrote:
>>>> On 06/10/10 21:41, Assaf Gordon wrote:
>>>>>
>>>>> The "--auto-format" feature simply builds the "-o" format line 
>>>>> automatically, based on the number of columns from both input files.
>>>>
>>>> Thanks for persisting with this and presenting a concise example.
>>>> I agree that this is useful and can't think of a simple workaround.
>>>> Perhaps the interface would be better as:
>>>>
>>>> -o {all (default), padded, FORMAT}
>>>>
>>>> where padded is the functionality you're suggesting?
>>>
>>> Thinking more about it, we mightn't need any new options at all.
>>> Currently -e is redundant if -o is not specified.
>>> So how about changing that so that if -e is specified
>>> we operate as above by auto inserting empty fields?
>>> Also I wouldn't base on the number of fields in the first line,
>>> instead auto padding to the biggest number of fields
>>> on the current lines under consideration.
>>
>> My concern is the principle of "least surprise" - if there are existing 
>> scripts/programs that specify "-e" without "-o" (doesn't make sense, but 
>> still possible) - this change will alter their behavior.
>>
>> Also, implying/forcing 'auto-format' when "-e" is used without "-o" might be 
>> a bit confusing.
> 
> Well seeing as -e without -o currently does nothing,
> I don't think we need to worry too much about changing that behavior.
> Also to me, specifying -e EMPTY implicitly means I want
> fields missing from one of the files replaced with EMPTY.
> 
> Note POSIX is more explicit, and describes our current operation:
> 
> -e EMPTY
>   Replace empty output fields in the list selected by -o with EMPTY
> 
> So changing that would be an extension to POSIX.
> But I still think it makes sense.
> I'll prepare a patch soon, to do as I describe above,
> unless there are objections.

The attached changes `join` (from what's done on other platforms) so that...

`join -e` will automatically pad missing fields from one file
so that the same number of fields are output from each file.
Previously -e was only used for missing fields specified with -o or -j.

With this change join now does:

$ cat file1
a 1 2
b 1
d 1 2

$ cat file2
a 3 4
b 3 4
c 3 4

$ join -a1 -a2 -1 1 -2 1 -e. file1 file2
a 1 2 3 4
b 1 . 3 4
c . . 3 4
d 1 2 . .

$ join -a1 -a2 -1 1 -2 4 -e. file1 file2
. . . . a 3 4
. . . . b 3 4
. . . . c 3 4
a 1 2 . .
b 1 .
d 1 2 . .

$ join -a1 -a2 -1 4 -2 1 -e. file1 file2
. a 1 2 . . .
. b 1 . .
. d 1 2 . . .
a . . 3 4
b . . 3 4
c . . 3 4

$ join -a1 -a2 -1 4 -2 4 -e. file1 file2
. a 1 2 a 3 4
. a 1 2 b 3 4
. a 1 2 c 3 4
. b 1 . a 3 4
. b 1 . b 3 4
. b 1 . c 3 4
. d 1 2 a 3 4
. d 1 2 b 3 4
. d 1 2 c 3 4

While -e without -o was previously a noop, and so could safely be extended IMHO,
this will also change the behavior when with -e and -j are specified.
Previously if -j > 1 was specified, and that field was missing,
then -e would be used in its place, rather than the empty string.
This still does that, but also does the padding.
Without the -j issue I'd be 80:20 for just extending -e to auto pad,
but given -j I'm 50:50.  The alternative it to select this with
say '-o padded', but that's less discoverable, and complicates
the interface somewhat.

cheers,
Pádraig.
>From 3ea251d770fd79c37ee5195480574ccd6bfa156c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 5 Jan 2011 11:52:54 +0000
Subject: [PATCH] join: auto pad missing fields when -e specified

* src/join.c (prfields): A new function refactored from prjoin(),
to output all but the join field. This function also applies
the extra padding when appropriate.
(prjoin): Don't swap line1 and line2 when line1 is blank
so that the padding is applied to the right place.
* tests/misc/join: Remove existing uses of -e without -o,
and add 2 new cases to test the new auto padding behavior.
* NEWS: Mention the change in behavior.
Suggestion from Assaf Gordon
---
 NEWS            |    6 ++++
 src/join.c      |   81 ++++++++++++++++++++++++++++++++++++-------------------
 tests/misc/join |   16 ++++++++---
 3 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/NEWS b/NEWS
index 2a71ca6..f02e8a8 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU coreutils NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Changes in behavior
+
+  join -e will automatically pad missing fields from one file
+  so that the same number of fields are output from each file.
+  Previously -e was only used for missing fields specified with -o or -j.
+
 
 * Noteworthy changes in release 8.9 (2011-01-04) [stable]
 
diff --git a/src/join.c b/src/join.c
index afda5a1..1df24a9 100644
--- a/src/join.c
+++ b/src/join.c
@@ -527,6 +527,45 @@ prfield (size_t n, struct line const *line)
     fputs (empty_filler, stdout);
 }
 
+/* Output all the fields in line, other than the join field.  */
+
+static void
+prfields (struct line const *line, size_t join_field,
+          struct line const *oline, size_t ojoin_field)
+{
+  size_t i;
+  char output_separator = tab < 0 ? ' ' : tab;
+
+  for (i = 0; i < join_field && i < line->nfields; ++i)
+    {
+      putchar (output_separator);
+      prfield (i, line);
+    }
+  for (i = join_field + 1; i < line->nfields; ++i)
+    {
+      putchar (output_separator);
+      prfield (i, line);
+    }
+  if (empty_filler && oline->nfields > line->nfields)
+    {
+      bool extra_filler = false;
+
+      /* Increase padding by 1 to account for the implicit
+         join field outputted for a blank line.  */
+      if (line != &uni_blank)
+        extra_filler = true;
+      /* Or if we didn't print a join field, increase padding.  */
+      else if (oline->nfields <= ojoin_field || !oline->fields[ojoin_field].len)
+        extra_filler = true;
+
+      for (i = !extra_filler; i < oline->nfields - line->nfields; ++i)
+        {
+          putchar (output_separator);
+          fputs (empty_filler, stdout);
+        }
+    }
+}
+
 /* Print the join of LINE1 and LINE2.  */
 
 static void
@@ -534,6 +573,8 @@ prjoin (struct line const *line1, struct line const *line2)
 {
   const struct outlist *outlist;
   char output_separator = tab < 0 ? ' ' : tab;
+  size_t field;
+  struct line const *line;
 
   outlist = outlist_head.next;
   if (outlist)
@@ -543,9 +584,6 @@ prjoin (struct line const *line1, struct line const *line2)
       o = outlist;
       while (1)
         {
-          size_t field;
-          struct line const *line;
-
           if (o->file == 0)
             {
               if (line1 == &uni_blank)
@@ -574,37 +612,24 @@ prjoin (struct line const *line1, struct line const *line2)
     }
   else
     {
-      size_t i;
-
       if (line1 == &uni_blank)
         {
-          struct line const *t;
-          t = line1;
-          line1 = line2;
-          line2 = t;
+          line = line2;
+          field = join_field_2;
         }
-      prfield (join_field_1, line1);
-      for (i = 0; i < join_field_1 && i < line1->nfields; ++i)
-        {
-          putchar (output_separator);
-          prfield (i, line1);
-        }
-      for (i = join_field_1 + 1; i < line1->nfields; ++i)
+      else
         {
-          putchar (output_separator);
-          prfield (i, line1);
+          line = line1;
+          field = join_field_1;
         }
 
-      for (i = 0; i < join_field_2 && i < line2->nfields; ++i)
-        {
-          putchar (output_separator);
-          prfield (i, line2);
-        }
-      for (i = join_field_2 + 1; i < line2->nfields; ++i)
-        {
-          putchar (output_separator);
-          prfield (i, line2);
-        }
+      /* Output the join field.  */
+      prfield (field, line);
+
+      /* Output other fields.  */
+      prfields (line1, join_field_1, line2, join_field_2);
+      prfields (line2, join_field_2, line1, join_field_1);
+
       putchar ('\n');
     }
 }
diff --git a/tests/misc/join b/tests/misc/join
index 3696a03..7ca628b 100755
--- a/tests/misc/join
+++ b/tests/misc/join
@@ -43,7 +43,7 @@ my @tv = (
 ['1e', '-a2',      ["a 1\nb\n", "b\n"],   "b\n",         0],
 ['1f', '-a2',      ["b\n", "a\nb\n"], "a\nb\n",          0],
 
-['2a', '-a1 -e .',  ["a\nb\nc\n", "a x y\nb\nc\n"], "a x y\nb\nc\n", 0],
+['2a', '-a1',  ["a\nb\nc\n", "a x y\nb\nc\n"], "a x y\nb\nc\n", 0],
 ['2b', '-a1 -e . -o 2.1,2.2,2.3',  ["a\nb\nc\n", "a x y\nb\nc\n"],
  "a x y\nb . .\nc . .\n", 0],
 ['2c', '-a1 -e . -o 2.1,2.2,2.3',  ["a\nb\nc\nd\n", "a x y\nb\nc\n"],
@@ -104,13 +104,13 @@ my @tv = (
  ["apr 15\naug 20\ndec 18\n", "apr 06\naug 14\ndate\nfeb 15\n"],
   "06 apr\n14 aug\n- -\n15 -\n", 0],
 
-['6a', '-e -',
+['6a', '',
  ["a 1\nb 2\nd 4\n", "a 21\nb 22\nc 23\nf 26\n"],
  "a 1 21\nb 2 22\n", 0],
-['6b', '-a1 -e -',
+['6b', '-a1',
  ["a 1\nb 2\nd 4\n", "a 21\nb 22\nc 23\nf 26\n"],
  "a 1 21\nb 2 22\nd 4\n", 0],
-['6c', '-a1 -e -',
+['6c', '-a1',
  ["a 21\nb 22\nc 23\nf 26\n", "a 1\nb 2\nd 4\n"],
  "a 21 1\nb 22 2\nc 23\nf 26\n", 0],
 
@@ -127,6 +127,14 @@ my @tv = (
 # From David Dyck
 ['9a', '', [" a 1\n b 2\n", " a Y\n b Z\n"], "a 1 Y\nb 2 Z\n", 0],
 
+# Test -e without -o (auto pad)
+['10a', '-a1 -a2 -e .',
+ ["a 1 2\nb 1\nd 1 2\n", "a 3 4\nb 3 4\nc 3 4\n"],
+ "a 1 2 3 4\nb 1 . 3 4\nc . . 3 4\nd 1 2 . .\n", 0],
+['10b', '-a1 -a2 -j3 -e .',
+ ["a 1 2\nb 1\nd 1 2\n", "a 3 4\nb 3 4\nc 3 4\n"],
+ "2 a 1 . .\n. b 1 . .\n2 d 1 . .\n4 . . a 3\n4 . . b 3\n4 . . c 3\n"],
+
 # From Tim Smithers: fixed in 1.22l
 ['trailing-sp', '-t: -1 1 -2 1', ["a:x \n", "a:y \n"], "a:x :y \n", 0],
 
-- 
1.7.3.4

Reply via email to