Is there a <?ws>-like thingy that always matches \s+?

Do \s and <?ws> match non-breaking whitespace, U+00A0?

How about:

    U+0008  backspace
    U+00A0  no-break space (repeated here for completeness)
    U+1361  ethiopic wordspace
    U+2000  en quad
    U+2001  em quad
    U+2002  en space
    U+2003  em space
    U+2004  three-per-em space
    U+2005  four-per-em space
    U+2006  six-per-em space
    U+2007  figure space
    U+2008  punctuation space
    U+2009  thin space 
    U+200A  hair space
    U+200B  zero width space
    U+202F  narrow no-break space
    U+205F  medium mathematical space
    U+2060  word joiner (What is that, anyway?)
    U+3000  ideographic space
    U+FEFF  zero width no-break space
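
For comparison only (this is Python, not Perl 6, and says nothing about what S05 intends), here is a small sketch that checks which of the code points above Python 3's Unicode-aware \s matches, together with each one's Unicode general category. The CODE_POINTS list and the classify helper are illustrative names, and the results depend on the Unicode database bundled with the Python version. On a recent Python 3 only the Zs (space separator) entries match; backspace, the Ethiopic wordspace and the Cf format characters (U+200B, U+2060, U+FEFF) do not.

```python
import re
import unicodedata

# The code points from the list above.
CODE_POINTS = [
    0x0008, 0x00A0, 0x1361,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2007, 0x2008, 0x2009, 0x200A, 0x200B,
    0x202F, 0x205F, 0x2060, 0x3000, 0xFEFF,
]

# With str patterns, Python's \s is Unicode-aware by default.
WS = re.compile(r"\s")


def classify(cps):
    """Map each code point to (general category, does \\s match it?)."""
    result = {}
    for cp in cps:
        ch = chr(cp)
        result[cp] = (unicodedata.category(ch), WS.fullmatch(ch) is not None)
    return result


if __name__ == "__main__":
    for cp, (cat, is_ws) in classify(CODE_POINTS).items():
        # Control characters such as U+0008 have no Unicode name.
        name = unicodedata.name(chr(cp), "<control>")
        mark = "\\s" if is_ws else "--"
        print(f"U+{cp:04X}  {cat}  {mark}  {name}")
```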
    
\s is said (in S05) to match any Unicode whitespace, but I think letting
it match NBSP and then using \s to split things is wrong: a no-break
space is there precisely to keep two words together.
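
To make the splitting concern concrete, here is a rough Python analogy (again not Perl 6). Splitting on Python's Unicode-aware \s+ cuts "foo\xA0bar" in two, while a hypothetical "breaking whitespace" class, meaning \s minus the no-break spaces U+00A0, U+2007 and U+202F, keeps it as one field. The helper names and the exact set of excluded characters are assumptions made for illustration.

```python
import re

# Python's Unicode-aware \s: NBSP counts as a separator here.
ANY_WS = re.compile(r"\s+")

# Hypothetical "breaking whitespace": [^\S...] means "whitespace,
# except the listed no-break spaces" (U+00A0, U+2007, U+202F).
BREAKING_WS = re.compile(r"[^\S\u00A0\u2007\u202F]+")


def split_any(text):
    # Drop empty fields caused by leading/trailing whitespace.
    return [field for field in ANY_WS.split(text) if field]


def split_breaking(text):
    return [field for field in BREAKING_WS.split(text) if field]


if __name__ == "__main__":
    sample = "foo\u00A0bar baz"
    print(split_any(sample))       # NBSP splits "foo" from "bar"
    print(split_breaking(sample))  # NBSP keeps "foo" and "bar" together
```

Python's own str.split() with no argument behaves like split_any here, so "foo\xA0bar".split() also yields two elements.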

Are the contents of <> split using <?ws>? (If $foo is "foo\xA0bar",
does <<$foo>> produce one element or two?)


Juerd
-- 
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html 
http://convolution.nl/gajigu_juerd_n.html
