Larry Wall <[EMAIL PROTECTED]> wrote:
> On Fri, Apr 15, 2005 at 11:44:03PM +0200, Juerd wrote:
>: Is there a <?ws>-like thingy that is always \s+?
> Not currently, since \s+ is there.  <?ws> used to be that, but
> currently is defined as the magical whitespace matcher used by :words.

>: Do \s and <?ws> match non-breaking whitespace, U+00A0?

> Yes.

> Yes, any Unicode whitespace, but you seem to have a different list than
> I do.

Well, there are three different whitespace lists. The Parrot program [1]
below shows all, including space and blank.

$ ./parrot ws.imc
char    uws  ws  jws  sp  bl
U+0008  0    0   0    0   0
U+0009  1    1   0    1   1
U+000a  1    1   0    1   0
U+000b  1    1   0    1   0
U+000c  1    1   0    1   0
U+000d  1    1   0    1   0
U+0020  1    1   1    1   1
U+0085  1    1   0    1   0
U+00a0  1    0   1    1   1
U+1680  1    1   1    1   1
U+180e  1    1   1    1   1
U+2000  1    1   1    1   1
U+2001  1    1   1    1   1
U+2002  1    1   1    1   1
U+2003  1    1   1    1   1
U+2004  1    1   1    1   1
U+2005  1    1   1    1   1
U+2006  1    1   1    1   1
U+2007  1    0   1    1   1
U+2008  1    1   1    1   1
U+2009  1    1   1    1   1
U+200a  1    1   1    1   1
U+2028  1    1   1    1   0
U+2029  1    1   1    1   0
U+202f  1    0   1    1   1
U+205f  1    1   1    1   1
U+2060  0    0   0    0   0
U+3000  1    1   1    1   1
U+feff  0    0   0    0   0

> So I make it:

which seems to match Parrot_char_is_JavaSpaceChar

leo

[1] Needs some additions, which I'll ci in a minute, and the ICU lib
installed. There isn't an interface for these functions yet, so they are
looked up via dlsym(3) inside parrot itself.
$ cat ws.imc
.sub main @MAIN
    .local pmc chars, uws, ws, jws, sp, bl, nul, fmt
    .local int i, n, is, c
    chars = new ResizableIntegerArray
    push chars, 0x8
    push chars, 0x9
    push chars, 0xa
    push chars, 0xb
    push chars, 0xc
    push chars, 0xd
    push chars, 0x20
    push chars, 0x85
    push chars, 0xA0
    push chars, 0x1680
    push chars, 0x180e
    i = 0x2000
pl:
    push chars, i
    inc i
    if i <= 0x200a goto pl
    push chars, 0x2028
    push chars, 0x2029
    push chars, 0x202f
    push chars, 0x205f
    push chars, 0x2060
    push chars, 0x3000
    push chars, 0xfeff
    null nul
    uws = dlfunc nul, "Parrot_char_is_UWhiteSpace", "IJI"
    ws  = dlfunc nul, "Parrot_char_is_Whitespace", "IJI"
    jws = dlfunc nul, "Parrot_char_is_JavaSpaceChar", "IJI"
    sp  = dlfunc nul, "Parrot_char_is_space", "IJI"
    bl  = dlfunc nul, "Parrot_char_is_blank", "IJI"
    n = elements chars
    i = 0
    print "char uws ws jws sp bl\n"
loop:
    fmt = new ResizableIntegerArray
    c = chars[i]
    push fmt, c
    is = uws(c)
    push fmt, is
    is = ws(c)
    push fmt, is
    is = jws(c)
    push fmt, is
    is = sp(c)
    push fmt, is
    is = bl(c)
    push fmt, is
    $S0 = sprintf "U+%04x\t%d\t%d\t%d\t%d\t%d\n", fmt
    print $S0
    inc i
    if i < n goto loop
.end
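
For readers without a Parrot build and ICU at hand, a rough cross-check of the same code points can be sketched in Python. This is not the ICU lookup used above: it relies only on Python's own str.isspace() and unicodedata tables, so the columns are analogues of the Parrot ones and individual rows may differ with the Unicode version in use. The helper names is_separator and is_breaking_separator are hypothetical, and so is the reading behind them: that the jws column looks like the Zs/Zl/Zp general categories, and that the three no-break spaces are where ws and jws disagree among the separators.

```python
import unicodedata

# Same code points as the ws.imc listing in this message.
CODE_POINTS = (
    [0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x2060, 0x3000, 0xFEFF]
)

# Assumption: the no-break spaces, i.e. the rows where ws=0 but jws=1 above.
NO_BREAK = {0x00A0, 0x2007, 0x202F}


def is_separator(cp):
    """True if cp has general category Zs, Zl or Zp (hypothetical helper)."""
    return unicodedata.category(chr(cp)) in ("Zs", "Zl", "Zp")


def is_breaking_separator(cp):
    """A separator that is not a no-break space (hypothetical helper)."""
    return is_separator(cp) and cp not in NO_BREAK


def rows():
    """Yield (code point, str.isspace, separator, breaking separator)."""
    for cp in CODE_POINTS:
        yield cp, chr(cp).isspace(), is_separator(cp), is_breaking_separator(cp)


if __name__ == "__main__":
    print("char\tisspace\tsep\tbrk")
    for cp, sp, sep, brk in rows():
        # Booleans format as 0/1 with %d, matching the table layout above.
        print("U+%04x\t%d\t%d\t%d" % (cp, sp, sep, brk))
```

Running the file prints one tab-separated row per code point in the same "U+%04x" layout as the Parrot output, which makes it easy to compare against the table by eye.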