bug#70532: sort: Mention counting fields from the end

2024-04-26 Thread Dan Jacobson
> "PB" == Pádraig Brady  writes:
PB> All good suggestions. I'll at least add an example along the lines of:

PB>   awk '{print $NF, $0}' | sort -k1,1 | cut -f2- -d' '

OK, also say what it's doing. Not everybody knows awk.
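
For readers who don't know awk, the explanation could go roughly like
this: awk copies the last field ($NF) to the front of each line, sort
orders the lines on that copy alone, and cut strips the copy off again.
A minimal sketch, where the function name sort_by_last_field and the
sample input are made up for illustration, and LC_ALL=C is my addition
to keep the ordering predictable:

```shell
# sort_by_last_field: hypothetical helper, reads lines on stdin.
sort_by_last_field() {
  awk '{print $NF, $0}' |     # prepend a copy of the last field
    LC_ALL=C sort -k1,1 |     # sort on the prepended copy only
    cut -d' ' -f2-            # remove the prepended copy again
}

# Sample input, invented for this example.
printf '%s\n' 'pear 3 zulu' 'fig alpha' 'kiwi 7 8 mike' | sort_by_last_field
```

Lines with different numbers of fields are handled the same way, since
$NF always names the last one.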

Also join(1) needs a tip added. Users might want to join on
e.g., the second to last field on a variable-number-of-fields file.
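
The same trick as in the awk example above might serve as the join tip.
A rough sketch, assuming whitespace-separated fields and at least two
fields per line; the helper names key_on_penultimate and
join_on_penultimate and the sample data are hypothetical:

```shell
key_on_penultimate() {
  # Emit "KEY original-line" sorted on KEY, where KEY is the
  # second-to-last field. Lines with fewer than two fields are skipped.
  awk 'NF >= 2 {print $(NF-1), $0}' "$1" | LC_ALL=C sort -k1,1
}

join_on_penultimate() {
  t1=$(mktemp) || return 1
  t2=$(mktemp) || { rm -f "$t1"; return 1; }
  key_on_penultimate "$1" > "$t1"
  key_on_penultimate "$2" > "$t2"
  LC_ALL=C join "$t1" "$t2"     # joins on field 1, the prepended key
  rm -f "$t1" "$t2"
}

# Sample data, invented for this example.
d=$(mktemp -d) || exit 1
printf '%s\n' 'x y k1 end' 'p k2 q' > "$d/a"
printf '%s\n' 'k2 z' 'm n k1 w'     > "$d/b"
join_on_penultimate "$d/a" "$d/b"
rm -r "$d"
```

Each output line starts with the key, followed by the full original
line from each file.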

bug#70601: join(1) info page -o: too hazy at first

2024-04-26 Thread Dan Jacobson
The join man page is really clear,

       -o FORMAT
              obey FORMAT while constructing output line

A base hit on the first pitch. The info page on the other hand,

‘-o auto’
 If the keyword ‘auto’ is specified, infer the output format from
 the first line in each file.  This is the same as the default
 output format but also ensures the same number of fields are output
 for each line.  Missing fields are replaced with the ‘-e’ option
 and extra fields are discarded.

even after reading down to here, hasn't tipped the user off that it is
talking about the output format! It's its turn up to bat but it's still
drinking coffee in the bullpen.

In fact, FIELD-LIST vs. FORMAT makes the user double-check that they are
actually looking at the same command's documentation.
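
A short demonstration might make the '-o auto' text land faster. A
sketch, assuming GNU join; the file contents, the NA filler and the
function name join_auto are invented here:

```shell
# join_auto FILE1 FILE2: hypothetical wrapper around GNU join.
# -a1 -a2 also print unpairable lines, -e NA fills missing fields,
# -o auto takes the field count from the first line of each file.
join_auto() {
  LC_ALL=C join -a1 -a2 -e NA -o auto "$1" "$2"
}

# Sample data, invented for this example.
d=$(mktemp -d) || exit 1
printf '%s\n' 'a 1 2' 'b 3'   > "$d/file1"
printf '%s\n' 'a x'   'c y z' > "$d/file2"
join_auto "$d/file1" "$d/file2"
rm -r "$d"
```

Here file1's first line has two non-key fields and file2's has one, so
every output line gets the key plus three fields: the short "b 3" line
is padded with NA, and the extra "z" on the "c" line is dropped.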

bug#70600: trailing whitespace spotted in join info pages

2024-04-26 Thread Dan Jacobson
The join info pages have tons of trailing whitespace.
sed 's/$/$/' reveals:

‘sort -u file1 file2’                Union of unsorted files$
‘sort file1 file2 | uniq -d’         Intersection of unsorted files$
‘sort file1 file1 file2 | uniq -u’   Difference of unsorted files$
‘sort file1 file2 | uniq -u’         Symmetric Difference of unsorted$

Or in emacs
  (setq-default show-trailing-whitespace t)
  (info "(coreutils) Set operations")
will drive home the point.
Same with some other pages in that manual.

Why is this bad? It suggests that whatever typesetting program you are
using is like a broken Xerox machine spitting out extra blank pages and
wasting paper, even if we're rich enough not to care.
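
To list only the offending lines, something like the following might
do. It is a sketch: the function name trailing_ws_lines and the sample
input are made up, and feeding it from the info reader is left as an
assumption about the local setup.

```shell
# trailing_ws_lines: hypothetical helper, reads text on stdin and prints
# "LINENO:line$" for every line that ends in whitespace.
trailing_ws_lines() {
  grep -n '[[:space:]]$' | sed 's/$/$/'
}

# Sample input, invented for this example: only line 2 ends in blanks.
printf 'clean\ndirty  \nalso clean\n' | trailing_ws_lines
```

Piping the text of an info node into the function would then show each
affected line with its line number and a visible end-of-line marker.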

bug#70599: join vs. numeric order

2024-04-26 Thread Dan Jacobson
(info "(coreutils) Sorting files for join") needs to talk about numeric order:

$ seq 111|join --check-order - /dev/null
join: -:10: is not sorted: 10

So the info manual needs to mention 'Even though your files might be in
perfect "sort --numeric-sort" order, you need to make them into plain
"sort" order first. Sorry. At least you'll get the same number of joins.'

Or, add a new join -n option. The join man page could now say:
'-n: use numeric comparisons. Note sort order also needs to be "sort -n" order.'

And/or mention how the user might tinker with LC_NUMERIC and/or
LC_COLLATE to somehow achieve numeric sorting...
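
The first suggestion, re-sorting into plain "sort" order before joining,
might be illustrated along these lines. A sketch, assuming GNU join for
--check-order; the function name join_numeric_files and the sample data
are invented, and the final "sort -n" is my addition to put the result
back into numeric order:

```shell
# join_numeric_files FILE1 FILE2: hypothetical helper for two files
# whose first field is a number and which are in "sort -n" order.
join_numeric_files() {
  t1=$(mktemp) || return 1
  t2=$(mktemp) || { rm -f "$t1"; return 1; }
  LC_ALL=C sort -k1,1 "$1" > "$t1"    # plain (byte) order, as join expects
  LC_ALL=C sort -k1,1 "$2" > "$t2"
  LC_ALL=C join --check-order "$t1" "$t2" | LC_ALL=C sort -n
  rm -f "$t1" "$t2"
}

# Sample data, invented for this example.
d=$(mktemp -d) || exit 1
printf '%s\n' '9 nine' '10 ten' '11 eleven' > "$d/words"
printf '%s\n' '9 IX' '10 X' '11 XI'         > "$d/roman"
join_numeric_files "$d/words" "$d/roman"
rm -r "$d"
```

The two sorts and the join all run under the same locale, which is what
join needs in order to accept the input as sorted.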

bug#70586: cp walks dir differently than rm and is hitting "File name too long" where this could be avoided

2024-04-26 Thread Arkadiusz Miśkiewicz via GNU coreutils Bug Reports


When deleting a directory tree whose path is longer than PATH_MAX, rm -r
walks it in a way that avoids hitting the limit:

$ (for i in `seq 1 2000`; do mkdir 1234567890123456789012345678901234567890; \
    cd 1234567890123456789012345678901234567890; done)

$ rm -r 1234567890123456789012345678901234567890

but cp doesn't do that:

$ (for i in `seq 1 2000`; do mkdir 1234567890123456789012345678901234567890; \
    cd 1234567890123456789012345678901234567890; done)

$ cp -a 1234567890123456789012345678901234567890 2

cp: cannot stat 
 File name too long

I wonder (and am reporting this as an enhancement request) why cp isn't
made to do the same smart thing and avoid hitting ENAMETOOLONG?

$ cp --version
cp (GNU coreutils) 9.5

Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )