bug#24015: [PATCH] sort: make -h work with -k and blank used as thousands separator

Pádraig Brady Sun, 17 Jul 2016 12:53:19 -0700

On 17/07/16 17:02, Kamil Dudka wrote:
> * src/sort.c (find_unit_order): Allow to skip only one occurrence
> of thousands_sep to avoid finding the unit in the next column in case
> thousands_sep matches as blank and is used as column delimiter.
> * tests/misc/sort-h-thousands-sep.sh: Add regression test for this bug.
> * tests/local.mk: Reference the test.
> Reported at https://bugzilla.redhat.com/1355780
> ---
>  src/sort.c                         | 12 ++++++----
>  tests/local.mk                     |  1 +
>  tests/misc/sort-h-thousands-sep.sh | 45 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+), 4 deletions(-)
>  create mode 100755 tests/misc/sort-h-thousands-sep.sh
> 
> diff --git a/src/sort.c b/src/sort.c
> index f717604..a2cadda 100644
> --- a/src/sort.c
> +++ b/src/sort.c
> @@ -1904,12 +1904,16 @@ find_unit_order (char const *number)
>       to be lacking in units.
>       FIXME: add support for multibyte thousands_sep and decimal_point.  */
>  
> -  do
> +  while (ISDIGIT (ch = *p++))
>      {
> -      while (ISDIGIT (ch = *p++))
> -        nonzero |= ch - '0';
> +      nonzero |= ch - '0';
> +
> +      /* Allow to skip only one occurrence of thousands_sep to avoid finding
> +         the unit in the next column in case thousands_sep matches as blank
> +         and is used as column delimiter.  */
> +      if (*p == thousands_sep)
> +        ++p;
>      }
> -  while (ch == thousands_sep);


This is an improvement.
Though I now also see an existing inconsistency in how we treat a blank
that follows the last digit in this case, i.e. the output differs
between locales:

$ printf '%s\n' '1 M' '2 K' | LANG=en_US git/coreutils/src/sort -h
1 M
2 K

$ printf '%s\n' '1 M' '2 K' | LANG=sv_SE git/coreutils/src/sort -h
2 K
1 M

We should probably not consider a blank after the last digit
as part of the number here. I.e. the first output is correct,
treating each input line as 2 separate fields.

> diff --git a/tests/misc/sort-h-thousands-sep.sh b/tests/misc/sort-h-thousands-sep.sh
> new file mode 100755
> index 0000000..a1e02de
> --- /dev/null
> +++ b/tests/misc/sort-h-thousands-sep.sh
> @@ -0,0 +1,45 @@
> +#!/bin/sh
> +# exercise 'sort -h' in locales where thousands separator is blank
> +
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
> +print_ver_ sort
> +
> +tee in > exp1 << _EOF_
> +1       1k      4 003   1M
> +2k      2M      4 002   2
> +3M      3       4 001   3k
> +_EOF_
> +
> +cat > exp2 << _EOF_
> +3M      3       4 001   3k
> +1       1k      4 003   1M
> +2k      2M      4 002   2
> +_EOF_
> +
> +cat > exp3 << _EOF_
> +3M      3       4 001   3k
> +2k      2M      4 002   2
> +1       1k      4 003   1M
> +_EOF_
> +

A test for the case I highlighted would be good.

> +for i in 1 2 3; do
> +  LC_ALL="sv_SE.utf8" sort -h -k $i "in" > "out${i}" || fail=1
> +  compare "exp${i}" "out${i}" || fail=1
> +done

We'd have to skip_ the test if sv_SE wasn't available.
Maybe something like:

  test "$(LC_ALL=sv_SE locale thousands_sep)" = ' ' ||
    skip_ 'The Swedish locale with blank thousands separator is unavailable'

This deserves an entry in NEWS also.

thanks!
Pádraig
