Bug#1068891: coreutils: is join -t '' just comm -12?

2024-04-12 Thread Pádraig Brady

On 12/04/2024 23:59, наб wrote:

Package: coreutils
Version: 9.1-1
Version: 9.4-3
Severity: normal

Dear Maintainer,

POSIX.1-202x/D3:
−t char
    Use character char as a separator, for both input and output.
    Every appearance of char in a line shall be significant. When
    this option is specified, the collating sequence shall be the
    same as sort without the −b option.
so obviously allowing -t '' is an extension.

Manual:
-t CHAR
   use CHAR as input and output field separator

Important: FILE1 and FILE2 must be sorted on the  join  fields.  E.g.,
use  "sort  -k  1b,1"  if 'join' has no options, or use "join -t ''" if
'sort' has no options.  Note, comparisons honor the rules specified by
'LC_COLLATE'.   If  the  input  is  not sorted and some lines cannot be
joined, a warning message will be given.

So given
$ cat f1
row1	f1	1
urow1	f1	2
$ cat f2
row1	f2	1
urow2	f2	2
which are stable against both sort and sort -k 1b,1
$ join f?
row1 f1 1 f2 1
$ join f? -t '	'
row1	f1	1	f2	1
is all as expected.

But
$ join f? -t ''
returns empty. What would empty -t mean, anyway?
The empty string can either be found at every position
(clearly not the case here, otherwise this'd be joined on r and u)
or at no positions, so
$ cat g1
row1
urow1
$ cat g2
row1
urow2
$ join g? -t ''
row1
which is, well
$ comm g? -12
row1

Somehow I don't feel like this is a good recommendation?


Well sort with no options operates on the whole line.
So the corresponding join -t '' operates on the whole line.

cheers,
Pádraig
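
A minimal sketch of the point made in the reply above, assuming GNU coreutils (the empty -t argument is an extension, as the report notes). It recreates the g1 and g2 files from the report in a scratch directory; the variable names tmp, join_out and comm_out are made up for illustration.

```shell
# Work in a scratch directory so the example leaves no files behind.
tmp=$(mktemp -d)
printf 'row1\nurow1\n' > "$tmp/g1"
printf 'row1\nurow2\n' > "$tmp/g2"

# Use the C locale so that sort, join and comm agree on the ordering.
export LC_ALL=C

# GNU join with an empty separator: the whole line is the join field.
join_out=$(join -t '' "$tmp/g1" "$tmp/g2")

# comm -12 suppresses columns 1 and 2, leaving only lines common to both files.
comm_out=$(comm -12 "$tmp/g1" "$tmp/g2")

echo "join: $join_out"
echo "comm: $comm_out"

rm -r "$tmp"
```

Both commands print row1 for these inputs, matching the report. The two should only be expected to agree while neither file repeats a line: join emits one output line per matching pair, so repeated lines multiply in its output.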



Bug#1068891: coreutils: is join -t '' just comm -12?

2024-04-12 Thread наб
Package: coreutils
Version: 9.1-1
Version: 9.4-3
Severity: normal

Dear Maintainer,

POSIX.1-202x/D3:
−t char
    Use character char as a separator, for both input and output.
    Every appearance of char in a line shall be significant. When
    this option is specified, the collating sequence shall be the
    same as sort without the −b option.
so obviously allowing -t '' is an extension.

Manual:
-t CHAR
   use CHAR as input and output field separator

Important: FILE1 and FILE2 must be sorted on the  join  fields.  E.g.,
use  "sort  -k  1b,1"  if 'join' has no options, or use "join -t ''" if
'sort' has no options.  Note, comparisons honor the rules specified by
'LC_COLLATE'.   If  the  input  is  not sorted and some lines cannot be
joined, a warning message will be given.

So given
$ cat f1
row1	f1	1
urow1	f1	2
$ cat f2
row1	f2	1
urow2	f2	2
which are stable against both sort and sort -k 1b,1
$ join f?
row1 f1 1 f2 1
$ join f? -t '	'
row1	f1	1	f2	1
is all as expected.

But
$ join f? -t ''
returns empty. What would empty -t mean, anyway?
The empty string can either be found at every position
(clearly not the case here, otherwise this'd be joined on r and u)
or at no positions, so
$ cat g1
row1
urow1
$ cat g2
row1
urow2
$ join g? -t ''
row1
which is, well
$ comm g? -12
row1

Somehow I don't feel like this is a good recommendation?

Best,
наб

-- System Information:
Debian Release: 12.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.1.0-12-amd64 (SMP w/24 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_FIRMWARE_WORKAROUND, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages coreutils depends on:
ii  libacl1      2.3.1-3
ii  libattr1     1:2.5.1-4
ii  libc6        2.36-9+deb12u4
ii  libgmp10     2:6.2.1+dfsg1-1.1
ii  libselinux1  3.4-1+b6

coreutils recommends no packages.

coreutils suggests no packages.

-- no debconf information

