Has anybody specifically looked at how Perl6 regexes might map to
the various requirements of UTS#18, Unicode Regular Expressions?
http://unicode.org/reports/tr18/
I ask because, to my inexperienced eye, quite a few perl6isms are
*much* better at this than what obtains in perl5, so I wondered
whether this was by conscious intent and design. Is/Was it?
I'm also curious whether there are active plans to address the
TR18 requirements in perl6 regexes. It would be a wonderful
feather in perl6's cap to be able to legitimately claim Level 2
or even Level 3 compliance, since, besides perl5, only ICU currently
manages even Level 1, with everybody else *very* far behind.
TR18 specifies three levels of support (Basic, Extended, and Tailored),
each with its own specific, reasonably well-defined requirements:
=Level 1: Basic Unicode Support
RL1.1 Hex Notation
RL1.2 Properties
RL1.2a Compatibility Properties
RL1.3 Subtraction and Intersection
RL1.4 Simple Word Boundaries
RL1.5 Simple Loose Matches
RL1.6 Line Boundaries
RL1.7 Supplementary Code Points
=Level 2: Extended Unicode Support
RL2.1 Canonical Equivalents
RL2.2 Default Grapheme Clusters
RL2.3 Default Word Boundaries
RL2.4 Default Loose Matches
RL2.5 Name Properties
RL2.6 Wildcard Properties
=Level 3: Tailored Unicode Support
RL3.1 Tailored Punctuation
RL3.2 Tailored Grapheme Clusters
RL3.3 Tailored Word Boundaries
RL3.4 Tailored Loose Matches
RL3.5 Tailored Ranges
RL3.6 Context Matching
RL3.7 Incremental Matches
(RL3.8 Unicode Set Sharing)
RL3.9 Possible Match Sets
RL3.10 Folded Matching
RL3.11 Submatchers
thanks,
--tom