-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Nicholas Clark wrote:
| On Wed, Oct 05, 2005 at 05:20:34PM -0400, [EMAIL PROTECTED] wrote:
|
|>Should a non-breaking space character be treated as whitespace in
|>perl source code? It doesn't appear to be:
|
|
| As far as I know code points outside the range 0-127 are invalid, except as
| quotes for q, qq, etc, by default. Under use utf8; Unicode word characters
| can also be used in identifiers.

The classification of characters (locale category LC_CTYPE) is
locale-dependent and therefore, unfortunately, system-dependent. In most
(if not all) locales defined in current GNU libc versions, the no-break
space is classified as punctuation, graphical, and printable, not as
whitespace. See
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/localedata/locales/i18n?rev=1.23&content-type=text/x-cvsweb-markup&cvsroot=glibc
and search for "<U00A0>".

| I doubt that this will change in perl 5, because the parser is written in C,
| and so it would be very hard work to replace it with something that was fully
| Unicode aware.

I don't see what this has to do with C; it has to do with the locale
definitions used in the system libc.

But in fact, the whole purpose of the no-break space is to provide a
blank character that is _not_ interpreted as a space.

Ciao, Guido
- --
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDXlWUOo0HNPWNDz0RAqo/AKCPbQzVnSEC2FNY3bQWafaVpqcbRwCgwfmv
jG5jX81CcdJ1KFL9HzhS81w=
=1RuS
-----END PGP SIGNATURE-----