I will check later this weekend. I have to investigate.
> On Jul 24, 2014, at 4:09 PM, "Alexander Pyhalov via illumos-discuss"
> <[email protected]> wrote:
>
> Hello.
> During a GNU grep update I found that one test fails: the output of
> gawk 'BEGIN { printf "\xe2\x80\x80\n" }' is not matched by grep '\s'.
>
> The GNU grep test suite checks that the following UTF-8 characters are spaces:
>
> utf8_space_characters=$(sed 's/.*://;s/ */\\x/g' <<\EOF
> U+0009 Horizontal Tab: 09
> U+000B Vertical Tab: 0b
> U+000C Form feed: 0c
> U+000D Carriage return: 0d
> U+0020 SPACE: 20
> U+1680 OGHAM SPACE MARK: e1 9a 80
> U+2000 EN QUAD: e2 80 80
> U+2001 EM QUAD: e2 80 81
> U+2002 EN SPACE: e2 80 82
> U+2003 EM SPACE: e2 80 83
> U+2004 THREE-PER-EM SPACE: e2 80 84
> U+2005 FOUR-PER-EM SPACE: e2 80 85
> U+2006 SIX-PER-EM SPACE: e2 80 86
> U+2008 PUNCTUATION SPACE: e2 80 88
> U+2009 THIN SPACE: e2 80 89
> U+200A HAIR SPACE: e2 80 8a
> U+205F MEDIUM MATHEMATICAL SPACE: e2 81 9f
> U+3000 IDEOGRAPHIC SPACE: e3 80 80
> EOF
> )
>
> The checks for
>   e1 9a 80
>   e2 80 80 - e2 80 8a
>   e2 81 9f, e3 80 80
> fail.
>
> I've verified with the following C99 program that iswspace considers
> \u2003 (which, as I understand it, corresponds to e2 80 83) and \u205f
> (e2 81 9f) to be non-spaces:
>
> #include <locale.h>
> #include <stdio.h>
> #include <wchar.h>
> #include <wctype.h>
>
> static void try_with(wchar_t c, const char *loc)
> {
>     // Report a missing locale instead of silently testing the previous one.
>     if (setlocale(LC_ALL, loc) == NULL) {
>         printf("locale %s is not available\n", loc);
>         return;
>     }
>     printf("in locale %s iswspace returned %d\n", loc, iswspace((wint_t)c));
> }
>
> int main(void)
> {
>     // wchar_t c = L'\u2003';  // EM SPACE
>     wchar_t c = L'\u205f';     // MEDIUM MATHEMATICAL SPACE
>     try_with(c, "C");
>     try_with(c, "en_US.UTF-8");
>     return 0;
> }
>
> On FreeBSD, the same test program reports both characters as spaces in the
> en_US.UTF-8 locale.
> Is this a bug, or am I missing something?
>
> --
> System Administrator of Southern Federal University Computer Center
>
> -------------------------------------------
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/22003744-9012f59c
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
