I came up with these tests, which I thought should work:

    ok("π" ~~ /<[π]>/, "π as a character class");
    ok("π" ~~ /<[\x03c0]>/, "π as a character class (hex)");
    ok("π" ~~ /<[\x0391 .. \x03c9]>/, "π in a character class range");
    ok("π" ~~ /\w/, "π as a word character");

Of those, only the first one actually worked. The others all fail. Am I misunderstanding how these constructs should work?

PS: The reason I'm running into this is that my URI matcher for RFC3987 needs to match many large blocks of characters (essentially all non-ASCII, valid Unicode codepoints) per the spec at:

    http://www.ietf.org/rfc/rfc3987.txt

Specifically:

    ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
            / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
            / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
            / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
            / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
            / %xD0000-DFFFD / %xE1000-EFFFD

Which I tried to translate as:

    token ucschar {
        <+[\xA0 .. \xD7FF] +
          [\xF900 .. \xFDCF] +
          [\xFDF0 .. \xFFEF] +
          [\x10000 .. \x1FFFD] +
          [\x20000 .. \x2FFFD] +
          [\x30000 .. \x3FFFD] +
          [\x40000 .. \x4FFFD] +
          [\x50000 .. \x5FFFD] +
          [\x60000 .. \x6FFFD] +
          [\x70000 .. \x7FFFD] +
          [\x80000 .. \x8FFFD] +
          [\x90000 .. \x9FFFD] +
          [\xA0000 .. \xAFFFD] +
          [\xB0000 .. \xBFFFD] +
          [\xC0000 .. \xCFFFD] +
          [\xD0000 .. \xDFFFD] +
          [\xE1000 .. \xEFFFD]>
    }

But this refuses to match my test IRI's one-character path:

    http://www.example.com/π

--
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs