The code points giving you trouble are the noncharacters 0xFDD0..0xFDEF: http://stackoverflow.com/questions/5188679/whats-the-purpose-of-the-noncharacters-ufdd0-to-ufdef
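
For reference, the full set of Unicode noncharacters is that block of 32 plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF), 66 in all. Some of those fall inside your third range, #x10000-#x10FFFF. Here is a minimal sketch of a membership test; is-nonchar is a made-up name, not something Rakudo provides:

```perl6
# Hypothetical helper (not part of Rakudo): True for the 66 Unicode noncharacters.
sub is-nonchar(Int $cp) {
    # The contiguous block U+FDD0..U+FDEF.
    return True if 0xFDD0 <= $cp <= 0xFDEF;
    # The last two code points of every plane: U+xFFFE and U+xFFFF.
    return ($cp +& 0xFFFE) == 0xFFFE;
}

say is-nonchar(0xFDD0);    # True
say is-nonchar(0xFDCF);    # False
say is-nonchar(0x1FFFE);   # True
```

I haven't checked whether Rakudo rejects each of these the same way it rejects 0xFDD0, so treat them as candidates for further splitting rather than confirmed failures.
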
You can split this into two ranges to avoid the problematic points (and you could use the same approach to combine the distinct ranges you have above):

$ perl6 -e 'say ?("\c[0xFDCF]" ~~ /<[\c[0xE000]..\c[0xFDCF]\c[0xFDF0]..\c[0xFFFD]]>/)'
True

Note that if you have invalid UTF-8 input, though (for example, a string that itself contains 0xFDD0), you'll still get the invalid character error, so you'll need to deal with that before trying to use the rule:

$ perl6 -e 'say ?("\c[0xFDD0]" ~~ /<[\c[0xE000]..\c[0xFDCF]\c[0xFDF0]..\c[0xFFFD]]>/)'
===SORRY!===
Invalid character for UTF-8 encoding

Hope this helps.

On Mon, Feb 18, 2013 at 11:29 PM, David Warring <david.warr...@gmail.com> wrote:

> Hi Guys,
> A quick question.
>
> I'm trying to interpret unicode code-point ranges from the CSS 3 spec -
> http://www.w3.org/TR/css3-syntax/#CHARSETS
>
> The rule in question is
>
> nonascii :== #x80-#xD7FF #xE000-#xFFFD #x10000-#x10FFFF
>
> Where (I think) these are unicode code-point ranges.
>
> The latest rakudo build is fine with:
>
> % perl6 -e '/<[\c[0x80]..\c[0xD7FF]]>/'
>
> ...but doesn't like the second (or third) range:
>
> % perl6 -e '/<[\c[0xE000]..\c[0xFFFD]]>/'
> ===SORRY!===
> Invalid character for UTF-8 encoding
>
> ...the individual code points are ok:
>
> % perl6 -e '/<[\c[0xE000]]>/'
> % perl6 -e '/<[\c[0xFFFD]]>/'
>
> I think I'm getting the above error because not all unicode code-points
> are defined for the range xE000 to xFFFD - see
> http://www.utf8-chartable.de/unicode-utf8-table.pl .
>
> I'm just having a problem implementing a concise regex/grammar rule for the
> above. Looking for advice.
>
> Cheers,
> David Warring
>

--
Will "Coke" Coleda