I will check later this weekend.  I have to investigate.

> On Jul 24, 2014, at 4:09 PM, "Alexander Pyhalov via illumos-discuss" 
> <[email protected]> wrote:
> 
> Hello.
> During the GNU grep update I found that one test fails: specifically, the
> output of gawk 'BEGIN { printf "\xe2\x80\x80\n" }' is not matched by grep '\s'.
> 
> The GNU grep test suite checks that the following UTF-8 characters are spaces:
> 
> utf8_space_characters=$(sed 's/.*://;s/  */\\x/g' <<\EOF
> U+0009 Horizontal Tab:            09
> U+000B Vertical Tab:              0b
> U+000C Form feed:                 0c
> U+000D Carriage return:           0d
> U+0020 SPACE:                     20
> U+1680 OGHAM SPACE MARK:          e1 9a 80
> U+2000 EN QUAD:                   e2 80 80
> U+2001 EM QUAD:                   e2 80 81
> U+2002 EN SPACE:                  e2 80 82
> U+2003 EM SPACE:                  e2 80 83
> U+2004 THREE-PER-EM SPACE:        e2 80 84
> U+2005 FOUR-PER-EM SPACE:         e2 80 85
> U+2006 SIX-PER-EM SPACE:          e2 80 86
> U+2008 PUNCTUATION SPACE:         e2 80 88
> U+2009 THIN SPACE:                e2 80 89
> U+200A HAIR SPACE:                e2 80 8a
> U+205F MEDIUM MATHEMATICAL SPACE: e2 81 9f
> U+3000 IDEOGRAPHIC SPACE:         e3 80 80
> EOF
> )
> 
> Checks for
> e1 9a 80
> e2 80 80 - e2 80 8a
> e2 81 9f, e3 80 80
> fail.
> 
> I've verified with the following C99 program
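> #include <locale.h>
> #include <stdio.h>
> #include <wchar.h>
> #include <wctype.h>
>
> /* Report whether iswspace() treats c as a space in the given locale. */
> static void try_with(wchar_t c, const char *loc)
> {
>    if (setlocale(LC_ALL, loc) == NULL) {
>        printf("locale %s is not available\n", loc);
>        return;
>    }
>    printf("in locale %s iswspace(U+%04lX) returned %d\n",
>           loc, (unsigned long)c, iswspace((wint_t)c));
> }
>
> int main(void)
> {
>    /* Swap in L'\u2003' (EM SPACE) to test the other character. */
>    wchar_t ch = L'\u205f'; /* U+205F MEDIUM MATHEMATICAL SPACE */
>    try_with(ch, "C");
>    try_with(ch, "en_US.UTF-8");
>    return 0;
> }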
> 
> that iswspace considers \u2003 (which, as I understand it, corresponds to
> e2 80 83) and \u205f (e2 81 9f) to be non-spaces.
> I ran the same test program on FreeBSD. There, both characters are
> considered spaces in the en_US.UTF-8 locale.
> Is this a bug, or am I missing something?
> 
> -- 
> System Administrator of Southern Federal University Computer Center
> 
> 
> -------------------------------------------
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/22003744-9012f59c
> Modify Your Subscription: https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com
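
The quoted message assumes that e2 80 83 is the UTF-8 encoding of U+2003 and that e2 81 9f is U+205F. A minimal sketch of how that correspondence could be checked independently of any locale follows. The function name utf8_decode_bmp is a hypothetical helper made up for this illustration; it is not part of libc or of the GNU grep test suite, and it only handles sequences of up to three bytes, which is enough for every character in the quoted list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode a single UTF-8 sequence of 1 to 3 bytes into its code point.
 * Returns the code point, or -1 if the bytes are not a well-formed
 * sequence of exactly `len` bytes (overlong forms and surrogates are
 * rejected). Hypothetical helper for illustration only.
 */
long utf8_decode_bmp(const unsigned char *s, size_t len)
{
    assert(s != NULL);

    if (len == 1 && s[0] < 0x80)
        return s[0];                      /* plain ASCII */

    if (len == 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        long cp = ((long)(s[0] & 0x1F) << 6) | (long)(s[1] & 0x3F);
        return cp >= 0x80 ? cp : -1;      /* reject overlong forms */
    }

    if (len == 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        long cp = ((long)(s[0] & 0x0F) << 12) |
                  ((long)(s[1] & 0x3F) << 6) |
                  (long)(s[2] & 0x3F);
        if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                    /* overlong or surrogate */
        return cp;
    }

    return -1;                            /* longer or malformed input */
}
```

Passing the three bytes e2 80 83 with a length of 3 should yield 0x2003, and e2 81 9f should yield 0x205F, which matches the code points used in the quoted test program. Decoding by hand rather than through mbrtowc keeps the check independent of the locale data that is under suspicion.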

