Re: sort b option in pos2 has strange effect
Jim Meyering wrote: For each fix, I usually try to determine when the bug was introduced and mention that in NEWS. Both of these date back to the very beginning, since sort from textutils-1.13 (yes, I actually built it ;-) exhibits the same incorrect behavior, and the code in that function barely changed between my initial import in 1992 and that 1.13. How many more +16-year-old bugs are lurking? Interesting :) I also noticed that freeBSD/Mac OS X use coreutils sort so they have the same issue. Also the i18n patch in fedora 8 at least seems to be varying one of the problems somewhat: upstream buggy coreutils: $ printf a y\na z\n | sort -k1,1b #buggy a z a y $ printf a y\na z\n | sort -k1b,1 #ok a y a z fedora 8: $ printf a y\na z\n | sort -k1,1b #ok a y a z $ printf a y\na z\n | sort -k1b,1 #buggy a z a y So I'll add the attached test I think to check for that. cheers, Pádraig From b857242886538be7c1ae387c85b86e9f96fecb80 Mon Sep 17 00:00:00 2001 From: =?utf-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Fri, 27 Feb 2009 08:40:42 + Subject: [PATCH] tests: sort: Check skipping blanks in multibyte locales * tests/misc/sort: On Fedora 8 at least, sort -k1b,1 mishandles blanks in multibyte locales, so add appropriate test. --- tests/misc/sort |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/tests/misc/sort b/tests/misc/sort index 3af2388..b6ee905 100755 --- a/tests/misc/sort +++ b/tests/misc/sort @@ -24,6 +24,10 @@ my $prog = 'sort'; # Turn off localization of executable's output. @ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3; +my $locale = $ENV{LOCALE_FR_UTF8}; +! defined $locale || $locale eq 'none' + and $locale = 'C'; + # Since each test is run with a file name and with redirected stdin, # the name in the diagnostic is either the file name or -. # Normalize each diagnostic to use '-'. @@ -216,6 +220,11 @@ my @Tests = # next field are not included in the sort. I.E. order should not change here. [18f, '-k1,1b', {IN=a y\na z\n}, {OUT=a y\na z\n}], +# When ignoring leading blanks for start position, ensure blanks from +# next field are not included in the sort. I.E. order should not change here. +# This was noticed as an issue on fedora 8 (only in multibyte locales). +[18g, '-k1b,1', {IN=a y\na z\n}, {OUT=a y\na z\n}, {ENV = LC_ALL=$locale}], + # This looks odd, but works properly -- 2nd keyspec is never # used because all lines are different. [19a, '+0 +1nr', {IN=b 2\nb 1\nb 3\n}, {OUT=b 1\nb 2\nb 3\n}], -- 1.5.3.6 ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
Pádraig Brady wrote: I also noticed that freeBSD/Mac OS X use coreutils sort so they have the same issue. Also the i18n patch in fedora 8 at least seems to be varying one of the problems somewhat: upstream buggy coreutils: $ printf a y\na z\n | sort -k1,1b #buggy a z a y $ printf a y\na z\n | sort -k1b,1 #ok a y a z fedora 8: $ printf a y\na z\n | sort -k1,1b #ok a y a z $ printf a y\na z\n | sort -k1b,1 #buggy a z a y So I'll add the attached test I think to check for that. Good catch. Thanks. ... +# When ignoring leading blanks for start position, ensure blanks from +# next field are not included in the sort. I.E. order should not change here. +# This was noticed as an issue on fedora 8 (only in multibyte locales). +[18g, '-k1b,1', {IN=a y\na z\n}, {OUT=a y\na z\n}, {ENV = LC_ALL=$locale}], Please split that longer-than-80 line onto two, e.g.: [18g, '-k1b,1', {IN=a y\na z\n}, {OUT=a y\na z\n}, {ENV = LC_ALL=$locale}], ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
I've tweaked the patch a bit to simplify some code and expect to push it soon. Thanks to my friendly LUG I confirmed that solaris 9 and 10 behave as expected for these commands: printf a a b\nz a a\n | sort -k2,3.0 printf a y\na z\n | sort -k1,1b cheers, Pádraig. From 4a1f5d98265cf74297d9e523aa99fca80cc51e3c Mon Sep 17 00:00:00 2001 From: =?utf-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Tue, 24 Feb 2009 08:37:18 + Subject: [PATCH] sort: Fix two bugs with determining the end of field * src/sort.c: When no specific number of chars to skip is specified for the end field, always skip the whole field. Also never include leading spaces from next field. * tests/misc/sort: Add 2 new tests for these cases. * NEWS: Mention this bug fix. * THANKS: Add bug reporter. Reported by Davide Canova --- NEWS|6 ++ THANKS |1 + src/sort.c | 38 +- tests/misc/sort |6 ++ 4 files changed, 26 insertions(+), 25 deletions(-) diff --git a/NEWS b/NEWS index 82ded9d..05d22cb 100644 --- a/NEWS +++ b/NEWS @@ -2,6 +2,12 @@ GNU coreutils NEWS-*- outline -*- * Noteworthy changes in release ?.? (-??-??) [?] +** Bug fixes + + sort now handles specified key ends correctly. + Previously -k1,1b would have caused leading space from field 2 to be + included in the sort while -k2,3.0 would have not included field 3. + * Noteworthy changes in release 7.1 (2009-02-21) [stable] diff --git a/THANKS b/THANKS index 5c25321..4b35a37 100644 --- a/THANKS +++ b/THANKS @@ -137,6 +137,7 @@ David Godfrey d...@delta.demon.co.uk David Luyer david_lu...@pacific.net.au David Madoredavid.mad...@ens.fr David Malonedwmal...@cnri.dit.ie +Davide Canova kc.can...@gmail.com Dawson Engler eng...@stanford.edu Dean Gaudet dean-savan...@arctic.org Deepak Goel de...@gnufans.org diff --git a/src/sort.c b/src/sort.c index f438563..27726a5 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1366,7 +1366,6 @@ begfield (const struct line *line, const struct keyfield *key) char *ptr = line-text, *lim = ptr + line-length - 1; size_t sword = key-sword; size_t schar = key-schar; - size_t remaining_bytes; /* The leading field separator itself is included in a field when -t is absent. */ @@ -1392,12 +1391,7 @@ begfield (const struct line *line, const struct keyfield *key) while (ptr lim blanks[to_uchar (*ptr)]) ++ptr; - /* Advance PTR by SCHAR (if possible), but no further than LIM. */ - remaining_bytes = lim - ptr; - if (schar remaining_bytes) -ptr += schar; - else -ptr = lim; + ptr = MIN (lim, ptr + schar); return ptr; } @@ -1410,7 +1404,9 @@ limfield (const struct line *line, const struct keyfield *key) { char *ptr = line-text, *lim = ptr + line-length - 1; size_t eword = key-eword, echar = key-echar; - size_t remaining_bytes; + + if (echar == 0) +eword++; /* Skip all of end field. */ /* Move PTR past EWORD fields or to one past the last byte on LINE, whichever comes first. If there are more than EWORD fields, leave @@ -1487,19 +1483,14 @@ limfield (const struct line *line, const struct keyfield *key) } #endif - /* If we're ignoring leading blanks when computing the End - of the field, don't start counting bytes until after skipping - past any leading blanks. */ - if (key-skipeblanks) -while (ptr lim blanks[to_uchar (*ptr)]) - ++ptr; + if (echar != 0) /* We need to skip over a portion of the end field. */ +{ + if (key-skipeblanks) /* blanks not counted in echar. */ + while (ptr lim blanks[to_uchar (*ptr)]) + ++ptr; - /* Advance PTR by ECHAR (if possible), but no further than LIM. */ - remaining_bytes = lim - ptr; - if (echar remaining_bytes) -ptr += echar; - else -ptr = lim; + ptr = MIN (lim, ptr + echar); +} return ptr; } @@ -3152,12 +3143,9 @@ main (int argc, char **argv) badfieldspec (optarg, N_(field number is zero)); } if (*s == '.') - s = parse_field_count (s + 1, key-echar, - N_(invalid number after `.')); - else { - /* `-k 2,3' is equivalent to `+1 -3'. */ - key-eword++; + s = parse_field_count (s + 1, key-echar, + N_(invalid number after `.')); } s = set_ordering (s, key, bl_end); } diff --git a/tests/misc/sort b/tests/misc/sort index 3e8eda6..3af2388 100755 --- a/tests/misc/sort +++ b/tests/misc/sort @@ -110,6 +110,8 @@ my @Tests = [07b, '-k 2,3', {IN=a a b\nz a a\n}, {OUT=z a a\na a b\n}], [07c, '-k 2,3', {IN=y k b\nz k a\n}, {OUT=z k a\ny k b\n}], [07d, '+1 -3', {IN=y k b\nz k a\n}, {OUT=z k a\ny k b\n}], +# ensure a character position of 0 includes whole field +[07e, '-k 2,3.0', {IN=a a b\nz a a\n},
Re: sort b option in pos2 has strange effect
Pádraig Brady wrote: I've tweaked the patch a bit to simplify some code and expect to push it soon. Thanks to my friendly LUG I confirmed that solaris 9 and 10 behave as expected for these commands: printf a a b\nz a a\n | sort -k2,3.0 printf a y\na z\n | sort -k1,1b cheers, Pádraig. From 4a1f5d98265cf74297d9e523aa99fca80cc51e3c Mon Sep 17 00:00:00 2001 From: =?utf-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Tue, 24 Feb 2009 08:37:18 + Subject: [PATCH] sort: Fix two bugs with determining the end of field * src/sort.c: When no specific number of chars to skip is specified for the end field, always skip the whole field. Also never include leading spaces from next field. * tests/misc/sort: Add 2 new tests for these cases. * NEWS: Mention this bug fix. * THANKS: Add bug reporter. Reported by Davide Canova ... Thanks! Looks fine. Only one question: - /* If we're ignoring leading blanks when computing the End - of the field, don't start counting bytes until after skipping - past any leading blanks. */ - if (key-skipeblanks) -while (ptr lim blanks[to_uchar (*ptr)]) - ++ptr; + if (echar != 0) /* We need to skip over a portion of the end field. */ +{ + if (key-skipeblanks) /* blanks not counted in echar. */ Was something wrong with the comment you're removing, above? + while (ptr lim blanks[to_uchar (*ptr)]) + ++ptr; - /* Advance PTR by ECHAR (if possible), but no further than LIM. */ - remaining_bytes = lim - ptr; - if (echar remaining_bytes) -ptr += echar; - else -ptr = lim; + ptr = MIN (lim, ptr + echar); +} ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
Jim Meyering wrote: Pádraig Brady wrote: I've tweaked the patch a bit to simplify some code and expect to push it soon. Thanks to my friendly LUG I confirmed that solaris 9 and 10 behave as expected for these commands: printf a a b\nz a a\n | sort -k2,3.0 printf a y\na z\n | sort -k1,1b cheers, Pádraig. From 4a1f5d98265cf74297d9e523aa99fca80cc51e3c Mon Sep 17 00:00:00 2001 From: =?utf-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Tue, 24 Feb 2009 08:37:18 + Subject: [PATCH] sort: Fix two bugs with determining the end of field * src/sort.c: When no specific number of chars to skip is specified for the end field, always skip the whole field. Also never include leading spaces from next field. * tests/misc/sort: Add 2 new tests for these cases. * NEWS: Mention this bug fix. * THANKS: Add bug reporter. Reported by Davide Canova ... Thanks! Looks fine. Only one question: - /* If we're ignoring leading blanks when computing the End - of the field, don't start counting bytes until after skipping - past any leading blanks. */ - if (key-skipeblanks) -while (ptr lim blanks[to_uchar (*ptr)]) - ++ptr; + if (echar != 0) /* We need to skip over a portion of the end field. */ +{ + if (key-skipeblanks) /* blanks not counted in echar. */ Was something wrong with the comment you're removing, above? I thought it was too verbose. It's replaced with: /* blanks not counted in echar. */ which should be obvious in along with the code? +while (ptr lim blanks[to_uchar (*ptr)]) + ++ptr; - /* Advance PTR by ECHAR (if possible), but no further than LIM. */ - remaining_bytes = lim - ptr; - if (echar remaining_bytes) -ptr += echar; - else -ptr = lim; + ptr = MIN (lim, ptr + echar); +} Same here. I removed the comment as the code is (now) obvious I think. cheers, Pádraig. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
Pádraig Brady wrote: ... - /* If we're ignoring leading blanks when computing the End - of the field, don't start counting bytes until after skipping - past any leading blanks. */ - if (key-skipeblanks) -while (ptr lim blanks[to_uchar (*ptr)]) - ++ptr; + if (echar != 0) /* We need to skip over a portion of the end field. */ +{ + if (key-skipeblanks) /* blanks not counted in echar. */ Was something wrong with the comment you're removing, above? I thought it was too verbose. It's replaced with: /* blanks not counted in echar. */ which should be obvious in along with the code? In that case, please stick with the longer comment. Not only is it a complete sentence (which we prefer), but I find it more readable/descriptive. + while (ptr lim blanks[to_uchar (*ptr)]) + ++ptr; - /* Advance PTR by ECHAR (if possible), but no further than LIM. */ - remaining_bytes = lim - ptr; - if (echar remaining_bytes) -ptr += echar; - else -ptr = lim; + ptr = MIN (lim, ptr + echar); +} Same here. I removed the comment as the code is (now) obvious I think. Removing that one is ok, I suppose. But in general, please try to add rather than remove comments. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
For each fix, I usually try to determine when the bug was introduced and mention that in NEWS. Both of these date back to the very beginning, since sort from textutils-1.13 (yes, I actually built it ;-) exhibits the same incorrect behavior, and the code in that function barely changed between my initial import in 1992 and that 1.13. How many more +16-year-old bugs are lurking? ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
Davide Canova wrote: It seems to be doing as you describe, plus if a b option is used in POS2, it also eats the leading blanks in the field after (POS2 field if .0 is specified, POS2 field + 1 if .0 is implied): $ sort -k2b,3.0b a a b z a a ^D z a a a a b The location of a field-end is not affected by whether initial blanks are skipped, therefore a b option in POS2 should have some effect only if a non-zero '.c' character position is provided. I don't know what's going on exactly thought as as I don't know what's expected. It certainly seems buggy. I tried to omit the .0 AND the b option in POS2 in all our examples and I think what I get is the expected behavior. Specifying them shouldn't change anything. I think the attached patch should fix this issue up. cheers, Pádraig. From 3ca0151f2761ae0cbef14d3b4d36c183337ed6f7 Mon Sep 17 00:00:00 2001 From: =?utf-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Tue, 24 Feb 2009 08:37:18 + Subject: [PATCH] sort: Fix a couple of bugs with determining end of fields Issue reported by Davide Canova kc.can...@gmail.com * src/sort.c: When no specific number of chars is specified to skip in the end field, always skip the whole field. Also never include leading spaces from next field. * tests/misc/sort: Add 2 new tests for these cases. --- src/sort.c | 36 +++- tests/misc/sort |5 + 2 files changed, 24 insertions(+), 17 deletions(-) diff --git a/src/sort.c b/src/sort.c index f438563..a5416f9 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1412,6 +1412,9 @@ limfield (const struct line *line, const struct keyfield *key) size_t eword = key-eword, echar = key-echar; size_t remaining_bytes; + if (echar == 0) +eword++; /* Skip all of end field. */ + /* Move PTR past EWORD fields or to one past the last byte on LINE, whichever comes first. If there are more than EWORD fields, leave PTR pointing at the beginning of the field having zero-based index, @@ -1487,19 +1490,21 @@ limfield (const struct line *line, const struct keyfield *key) } #endif - /* If we're ignoring leading blanks when computing the End - of the field, don't start counting bytes until after skipping - past any leading blanks. */ - if (key-skipeblanks) -while (ptr lim blanks[to_uchar (*ptr)]) - ++ptr; + if (echar != 0) /* We need to skip over a portion of the end field. */ +{ + if (key-skipeblanks) /* blanks not counted in echar. */ +{ + while (ptr lim blanks[to_uchar (*ptr)]) +++ptr; +} - /* Advance PTR by ECHAR (if possible), but no further than LIM. */ - remaining_bytes = lim - ptr; - if (echar remaining_bytes) -ptr += echar; - else -ptr = lim; + /* Advance PTR by ECHAR (if possible), but no further than LIM. */ + remaining_bytes = lim - ptr; + if (echar remaining_bytes) +ptr += echar; + else +ptr = lim; +} return ptr; } @@ -3152,12 +3157,9 @@ main (int argc, char **argv) badfieldspec (optarg, N_(field number is zero)); } if (*s == '.') - s = parse_field_count (s + 1, key-echar, - N_(invalid number after `.')); - else { - /* `-k 2,3' is equivalent to `+1 -3'. */ - key-eword++; + s = parse_field_count (s + 1, key-echar, + N_(invalid number after `.')); } s = set_ordering (s, key, bl_end); } diff --git a/tests/misc/sort b/tests/misc/sort index 3e8eda6..3e34f30 100755 --- a/tests/misc/sort +++ b/tests/misc/sort @@ -110,6 +110,7 @@ my @Tests = [07b, '-k 2,3', {IN=a a b\nz a a\n}, {OUT=z a a\na a b\n}], [07c, '-k 2,3', {IN=y k b\nz k a\n}, {OUT=z k a\ny k b\n}], [07d, '+1 -3', {IN=y k b\nz k a\n}, {OUT=z k a\ny k b\n}], +[07e, '-k 2,3.0', {IN=a a b\nz a a\n}, {OUT=z a a\na a b\n}], # # report an error for `.' without following char spec [08a, '-k 2.,3', {EXIT=2}, @@ -210,6 +211,10 @@ my @Tests = # key start and key end. [18e, '-nb -k1.1,1.2', {IN= 901\n100\n}, {OUT=100\n 901\n}], +# When ignoring leading blanks for end position, ensure blanks from +# next field are not included in the sort. I.E. order should not change here. +[18f, '-k1,1b', {IN=a y\na z\n}, {OUT=a y\na z\n}], + # This looks odd, but works properly -- 2nd keyspec is never # used because all lines are different. [19a, '+0 +1nr', {IN=b 2\nb 1\nb 3\n}, {OUT=b 1\nb 2\nb 3\n}], -- 1.5.3.6 ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
Pádraig Brady wrote: I've looked at this a little more, and wow it's confusing. I think the documentation is inconsistent at least wrt -kPOS1,POS2.C For POS2.0 it says that it refers to the end of POS2 field. In fact what the code seems to be doing is referring to the end of field, POS2 - 1. This is demonstrated like: $ cat sort.test2 a a b z a a $ ./sort-dbg -k2b,3b sort.test2 #eword=3 here z a a a a b $ ./sort-dbg -k2b,3.0b sort.test2 #eword=2 here a a b z a a It seems to be doing as you describe, plus if a b option is used in POS2, it also eats the leading blanks in the field after (POS2 field if .0 is specified, POS2 field + 1 if .0 is implied): $ sort -k2b,3.0b a a b z a a ^D z a a a a b The location of a field-end is not affected by whether initial blanks are skipped, therefore a b option in POS2 should have some effect only if a non-zero '.c' character position is provided. I don't know what's going on exactly thought as as I don't know what's expected. It certainly seems buggy. I tried to omit the .0 AND the b option in POS2 in all our examples and I think what I get is the expected behavior. Specifying them shouldn't change anything. thanks! DC ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
Pádraig Brady p...@draigbrady.com writes: Can someone at least confirm that we (should) interpret the a x input as having 2 fields, 'a' and ' x' ? A field comprises a maximal sequence of non-separating characters and, in the absence of option −t, any preceding field separator. Andreas. -- Andreas Schwab, SuSE Labs, sch...@suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: sort b option in pos2 has strange effect
Davide Canova wrote: $ sort -k 1b,1b #1 a x a y a z ^D a z a y a x $ sort -k 1b,1.0b #2 a x a y a z ^D a x a y a z I'm confused by the first command output too. I'm not sure what the code is trying to do. It looks like if you do specify and end field, but don't specify the .# part of it (as in the first example above), it bumps the end field up one? The line of code doing that is: http://url.ie/141l That in conjunction with skipping ending blanks is causing the issue I think. cheers, Pádraig. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils