On Fri, 2005-04-15 at 17:44, Juerd wrote:
> Is there a <?ws>-like thingy that is always \s+?

Not sure what that means exactly.

> Do \s and <?ws> match non-breaking whitespace, U+00A0?

As I understood, Perl 6 was going to use the Unicode standard(s) to
determine the whitespacishness of each codepoint. Going to Google, I
find:

    http://www.fileformat.info/info/unicode/category/Zs/list.htm

which lists all of the "separator, space" characters.

> How about:
>
> Code    Name                           Character.isWhitespace()
> U+0008  backspace                      No
> U+00A0  no break space                 No
>         (Repeated for overview)
> U+1361  ethiopic wordspace             No
> U+2000  en quad                        Yes
> U+2001  em quad                        Yes
> U+2002  en space                       Yes
> U+2003  em space                       Yes
> U+2004  three per em space             Yes
> U+2005  four per em space              Yes
> U+2006  six per em space               Yes
> U+2007  figure space                   No
> U+2008  punctuation space              Yes
> U+2009  thin space                     Yes
> U+200A  hair space                     Yes
> U+200B  zero width space               Yes
> U+202F  narrow no break space          No
> U+205F  medium mathematic space        Yes
> U+2060  word joiner                    No
>         (What is that, anyway?)
>         Comments: WJ, a zero width non-breaking space (only)
>         intended for disambiguation of functions for byte order mark
> U+3000  ideographic space              Yes
> U+FEFF  zero width non-breaking space  No

> \s is said (in S05) to match any unicode whitespace, but letting it
> match NBSP and then using \s for splitting things is wrong, I think.

Thankfully, NBSP (U+00A0) is not whitespace by that test:
Character.isWhitespace() says No, even though U+00A0 does appear in the
Zs list above.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
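
For anyone who wants to check the "separator, space" (Zs) membership
mentioned above, here is a minimal sketch. It is in Python, not Perl 6,
and it assumes Python's bundled unicodedata tables are an acceptable
stand-in for the list at the URL above. The helper names
is_space_separator and report are made up for illustration, and the
codepoints are a handful taken from the quoted table.

```python
import unicodedata

# A few codepoints from the quoted table (illustrative selection only).
CODEPOINTS = [0x0008, 0x00A0, 0x2003, 0x2007, 0x202F, 0x3000]


def is_space_separator(cp: int) -> bool:
    """Return True if the codepoint's general category is Zs."""
    return unicodedata.category(chr(cp)) == "Zs"


def report(codepoints):
    """Map 'U+XXXX' labels to whether each codepoint is in category Zs."""
    return {f"U+{cp:04X}": is_space_separator(cp) for cp in codepoints}


if __name__ == "__main__":
    for label, in_zs in report(CODEPOINTS).items():
        print(label, "Zs" if in_zs else "not Zs")
```

Zs membership is not the same test as Java's Character.isWhitespace(),
which is documented to exclude the no-break spaces U+00A0, U+2007 and
U+202F. Which of the two \s should follow is the open question here.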