Pádraig Brady wrote:
> \u3000 is ideographic space, i.e. a space generally used in east asian text
> so that alignment is maintained. Since it's a space, and not non breaking
> space it should be treated as a blank character IMHO.

It should be treated like a space character. Implementations essentially
agree on what this means. See gnulib/tests/test-c32isspace.c.

The "blank" character category has, unfortunately, so much variation among
implementations that it is not really useful. See
gnulib/tests/test-c32isblank.c:

    case '3':
      /* Locale encoding is UTF-8. */
      {
      #if defined __GLIBC__
        /* U+00A0 NO-BREAK SPACE */
        is = for_character ("\302\240", 2);
        ASSERT (is == 0);
      #endif
        /* U+00B7 MIDDLE DOT */
        is = for_character ("\302\267", 2);
        ASSERT (is == 0);
      #if defined __GLIBC__
        /* U+202F NARROW NO-BREAK SPACE */
        is = for_character ("\342\200\257", 3);
        ASSERT (is == 0);
      #endif
        /* U+3002 IDEOGRAPHIC FULL STOP */
        is = for_character ("\343\200\202", 3);
        ASSERT (is == 0);
        /* U+1D13D MUSICAL SYMBOL QUARTER REST */
        is = for_character ("\360\235\204\275", 4);
        ASSERT (is == 0);
        /* U+E0020 TAG SPACE */
        is = for_character ("\363\240\200\240", 4);
        ASSERT (is == 0);
      }

I could not find any non-ASCII character for which iswblank is true
across platforms.
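
To compare platforms directly, something like the following minimal sketch
could be used. It relies only on the standard iswspace and iswblank from
<wctype.h>, not on gnulib's c32isspace/c32isblank. The helper names
classify and report are hypothetical, made up for this illustration, and
the set of sample characters is my own choice.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of gnulib: returns a bit mask describing
   how the current LC_CTYPE locale classifies WC.
   Bit 0 (value 1): iswspace is true.  Bit 1 (value 2): iswblank is true.  */
int
classify (wint_t wc)
{
  return (iswspace (wc) ? 1 : 0) | (iswblank (wc) ? 2 : 0);
}

/* Hypothetical helper: switches LC_CTYPE to the locale named NAME and
   prints the classification of a few sample characters.
   Returns -1 if the locale is not available, 0 otherwise.  */
int
report (const char *name)
{
  static const struct { unsigned int code; const char *desc; } samples[] =
    {
      { 0x0009, "CHARACTER TABULATION" },
      { 0x0020, "SPACE" },
      { 0x00A0, "NO-BREAK SPACE" },
      { 0x3000, "IDEOGRAPHIC SPACE" }
    };
  size_t i;

  if (setlocale (LC_CTYPE, name) == NULL)
    return -1;
  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int c = classify ((wint_t) samples[i].code);
      printf ("U+%04X %-22s space=%d blank=%d\n",
              samples[i].code, samples[i].desc, c & 1, (c >> 1) & 1);
    }
  return 0;
}
```

Calling report ("") from main with a UTF-8 locale set in the environment
prints one line per sample character. The TAB and SPACE lines should be the
same everywhere; the two non-ASCII lines are where I would expect the output
to differ between platforms.
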
Bruno