Larry Wall wrote:
> On Fri, Apr 15, 2005 at 11:44:03PM +0200, Juerd wrote:
>: Is there a <?ws>-like thingy that is always \s+?

> Not currently, since \s+ is there.  <?ws> used to be that, but
> currently is defined as the magical whitespace matcher used by :words.

>: Do \s and <?ws> match non-breaking whitespace, U+00A0?

> Yes.

> Yes, any Unicode whitespace, but you seem to have a different list than
> I do.

Well, there are three different whitespace lists (the uws, ws, and jws
columns below). The Parrot program [1] below prints all three, plus the
"space" and "blank" classes (the sp and bl columns).

$ ./parrot ws.imc
char    uws     ws      jws     sp      bl
U+0008  0       0       0       0       0
U+0009  1       1       0       1       1
U+000a  1       1       0       1       0
U+000b  1       1       0       1       0
U+000c  1       1       0       1       0
U+000d  1       1       0       1       0
U+0020  1       1       1       1       1
U+0085  1       1       0       1       0
U+00a0  1       0       1       1       1
U+1680  1       1       1       1       1
U+180e  1       1       1       1       1
U+2000  1       1       1       1       1
U+2001  1       1       1       1       1
U+2002  1       1       1       1       1
U+2003  1       1       1       1       1
U+2004  1       1       1       1       1
U+2005  1       1       1       1       1
U+2006  1       1       1       1       1
U+2007  1       0       1       1       1
U+2008  1       1       1       1       1
U+2009  1       1       1       1       1
U+200a  1       1       1       1       1
U+2028  1       1       1       1       0
U+2029  1       1       1       1       0
U+202f  1       0       1       1       1
U+205f  1       1       1       1       1
U+2060  0       0       0       0       0
U+3000  1       1       1       1       1
U+feff  0       0       0       0       0

> So I make it:

which seems to match Parrot_char_is_JavaSpaceChar (the jws column above).

leo

[1]

The program needs some additions to Parrot, which I'll commit in a
minute, and it needs the ICU library installed. There isn't an interface
for these functions yet, so they are looked up via dlsym(3) inside the
parrot executable itself.

$ cat ws.imc
.sub main @MAIN
    .local pmc chars, uws, ws, jws, sp, bl, nul, fmt
    .local int i, n, is, c
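    # Code points to classify: C0 controls, space, NEL, NBSP,
    # and the Unicode space and line/paragraph separators
    chars = new ResizableIntegerArray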
    push chars, 0x8
    push chars, 0x9
    push chars, 0xa
    push chars, 0xb
    push chars, 0xc
    push chars, 0xd
    push chars, 0x20
    push chars, 0x85
    push chars, 0xA0
    push chars, 0x1680
    push chars, 0x180e
    # U+2000..U+200A: the block of typographic spaces
    i = 0x2000
pl:
    push chars, i
    inc i
    if i <= 0x200a goto pl
    push chars, 0x2028
    push chars, 0x2029
    push chars, 0x202f
    push chars, 0x205f
    push chars, 0x2060
    push chars, 0x3000
    push chars, 0xfeff

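    # A null library handle makes dlfunc look the symbols up in the
    # running parrot executable. Signature "IJI": returns an INTVAL and
    # takes the interpreter plus the code point as an INTVAL.
    null nul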
    uws = dlfunc nul, "Parrot_char_is_UWhiteSpace", "IJI"
    ws  = dlfunc nul, "Parrot_char_is_Whitespace", "IJI"
    jws = dlfunc nul, "Parrot_char_is_JavaSpaceChar", "IJI"
    sp  = dlfunc nul, "Parrot_char_is_space", "IJI"
    bl  = dlfunc nul, "Parrot_char_is_blank", "IJI"

    n = elements chars
    i = 0
    # Tab-separated header, matching the row format used below
    print "char\tuws\tws\tjws\tsp\tbl\n"
loop:
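    # Collect the code point and its five predicate results (1 or 0)
    # as the argument list for sprintf
    fmt = new ResizableIntegerArray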
    c = chars[i]
    push fmt, c
    is = uws(c)
    push fmt, is
    is = ws(c)
    push fmt, is
    is = jws(c)
    push fmt, is
    is = sp(c)
    push fmt, is
    is = bl(c)
    push fmt, is
    $S0 = sprintf "U+%04x\t%d\t%d\t%d\t%d\t%d\n", fmt
    print $S0
    inc i
    if i < n goto loop
.end
