On 17/07/16 17:02, Kamil Dudka wrote:
> * src/sort.c (find_unit_order): Allow to skip only one occurrence
> of thousands_sep to avoid finding the unit in the next column in case
> thousands_sep matches as blank and is used as column delimiter.
> * tests/misc/sort-h-thousands-sep.sh: Add regression test for this bug.
> * tests/local.mk: Reference the test.
> Reported at https://bugzilla.redhat.com/1355780
> ---
>  src/sort.c                         | 12 ++++++----
>  tests/local.mk                     |  1 +
>  tests/misc/sort-h-thousands-sep.sh | 45 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+), 4 deletions(-)
>  create mode 100755 tests/misc/sort-h-thousands-sep.sh
>
> diff --git a/src/sort.c b/src/sort.c
> index f717604..a2cadda 100644
> --- a/src/sort.c
> +++ b/src/sort.c
> @@ -1904,12 +1904,16 @@ find_unit_order (char const *number)
>       to be lacking in units.
>       FIXME: add support for multibyte thousands_sep and decimal_point.  */
>
> -  do
> +  while (ISDIGIT (ch = *p++))
>      {
> -      while (ISDIGIT (ch = *p++))
> -        nonzero |= ch - '0';
> +      nonzero |= ch - '0';
> +
> +      /* Allow to skip only one occurrence of thousands_sep to avoid finding
> +         the unit in the next column in case thousands_sep matches as blank
> +         and is used as column delimiter.  */
> +      if (*p == thousands_sep)
> +        ++p;
>      }
> -  while (ch == thousands_sep);

This is an improvement.
However, I now also see an existing inconsistency in how we treat
a trailing blank in this case:

  $ printf '%s\n' '1 M' '2 K' | LANG=en_US git/coreutils/src/sort -h
  1 M
  2 K
  $ printf '%s\n' '1 M' '2 K' | LANG=sv_SE git/coreutils/src/sort -h
  2 K
  1 M

We should probably not consider a blank after the last digit
to be part of the number here.  I.e., the first output is correct,
treating the input as 2 separate fields.

> diff --git a/tests/misc/sort-h-thousands-sep.sh b/tests/misc/sort-h-thousands-sep.sh
> new file mode 100755
> index 0000000..a1e02de
> --- /dev/null
> +++ b/tests/misc/sort-h-thousands-sep.sh
> @@ -0,0 +1,45 @@
> +#!/bin/sh
> +# exercise 'sort -h' in locales where thousands separator is blank
> +
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
> +print_ver_ sort
> +
> +tee in > exp1 << _EOF_
> +1 1k 4 003 1M
> +2k 2M 4 002 2
> +3M 3 4 001 3k
> +_EOF_
> +
> +cat > exp2 << _EOF_
> +3M 3 4 001 3k
> +1 1k 4 003 1M
> +2k 2M 4 002 2
> +_EOF_
> +
> +cat > exp3 << _EOF_
> +3M 3 4 001 3k
> +2k 2M 4 002 2
> +1 1k 4 003 1M
> +_EOF_
> +

A test for the case I highlighted would be good.

> +for i in 1 2 3; do
> +  LC_ALL="sv_SE.utf8" sort -h -k $i "in" > "out${i}" || fail=1
> +  compare "exp${i}" "out${i}" || fail=1
> +done

We'd have to skip_ the test if sv_SE isn't available.
Maybe something like:

  test "$(LC_ALL=sv_SE locale thousands_sep)" = ' ' ||
    skip_ 'The Swedish locale with blank thousands separator is unavailable'

This deserves an entry in NEWS also.

thanks!
Pádraig
