I'm thinking of pushing both the doc change (that join -t '\0' usually operates on the whole line) and the new functionality (that join -t '' always operates on the whole line), as described below...
On 29/10/09 12:15, Pádraig Brady wrote:
It's quite common to want `join` to operate on the whole line; see for example
https://bugzilla.redhat.com/show_bug.cgi?id=531355
In addition, `sort` operates on the whole line by default, so I think there
should be an easy way for join to do the same. The most logical way to me is
to specify an empty separator with -t '', as is done in the patch below.

Would this be useful? If not, I'll at least document the -t '\0' option,
which achieves the same thing if and only if there are no NUL characters
in the line. Note that '\0' support was added in f9118c1c.

cheers,
Pádraig.

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index df7e963..57f6f11 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -5458,6 +5458,8 @@ locales and options if the output of @command{sort} is fed to
 sort a file on its default join field, but if you select a non-default
 locale, join field, separator, or comparison options, then you should
 do so consistently between @command{join} and @command{sort}.
+If @samp{join -t ''} is specified then the whole line is considered which
+matches the default operation of sort.
 
 If the input has no unpairable lines, a @acronym{GNU} extension is
 available; the sort order can be any order that considers two fields
@@ -5559,7 +5561,10 @@ option---are subject to the specified @var{field-list}.
 Use character @var{char} as the input and output field separator.
 Treat as significant each occurrence of @var{char} in the input file.
 Use @samp{sort -t @var{char}}, without the @option{-b} option of
-@samp{sort}, to produce this ordering.
+@samp{sort}, to produce this ordering. If @samp{join -t ''} is specified,
+the whole line is considered, matching the default operation of sort.
+If @samp{-t '\0'} is specified then the @acronym{ASCII} @sc{nul}
+character is used to delimit the fields.
 
 @item -v @var{file-number}
 Print a line for each unpairable line in file @var{file-number}
diff --git a/src/join.c b/src/join.c
index d734a91..8c9b9d3 100644
--- a/src/join.c
+++ b/src/join.c
@@ -204,7 +204,8 @@ the remaining fields from FILE1, the remaining fields from FILE2, all\n\
 separated by CHAR.\n\
 \n\
 Important: FILE1 and FILE2 must be sorted on the join fields.\n\
-E.g., use `sort -k 1b,1' if `join' has no options.\n\
+E.g., use ` sort -k 1b,1 ' if `join' has no options,\n\
+or use ` join -t '' ' if `sort' has no options.\n\
 Note, comparisons honor the rules specified by `LC_COLLATE'.\n\
 If the input is not sorted and some lines cannot be joined, a\n\
 warning message will be given.\n\
@@ -1024,8 +1025,8 @@ main (int argc, char **argv)
           {
             unsigned char newtab = optarg[0];
             if (! newtab)
-              error (EXIT_FAILURE, 0, _("empty tab"));
-            if (optarg[1])
+              newtab = '\n'; /* '' => process the whole line. */
+            else if (optarg[1])
               {
                 if (STREQ (optarg, "\\0"))
                   newtab = '\0';