http://bugs.exim.org/show_bug.cgi?id=1279

Summary: Support Unicode Extended Grapheme Clusters with \X
Product: PCRE
Version: N/A
Platform: All
OS/Version: All
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected]

Please support matching the full standard Unicode definition of Extended Grapheme Clusters with \X.

Currently, \X does not implement the Unicode definition of an Extended Grapheme Cluster, although it is fairly close. The documentation acknowledges this:

  "Note that recent versions of Perl have changed \X to match what Unicode calls an "extended grapheme cluster", which has a more complicated definition."

Unfortunately, this makes PCRE incompatible with both Perl and ICU regular expressions in several important situations. Matching Extended Grapheme Clusters is one of the most common tasks in Unicode regular expressions. This gap therefore reflects somewhat poorly on PCRE, which otherwise has excellent Unicode support!

So again, please support matching the full standard Unicode definition of Extended Grapheme Clusters with \X. For some measure of backwards compatibility, perhaps an option could enable or disable this support. That said, I suspect anyone currently using \X in patterns is trying to match real Extended Grapheme Clusters and is only getting an approximation. It is less likely (but obviously possible) that someone thought, "I need (?>\PM\pM*) ... oh! \X in PCRE matches that!".

Thanks for PCRE and for considering this upgrade! =)
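To make the difference concrete, here is a minimal Python sketch that segments the same strings two ways. Python's built-in `re` module has no `\X`, so both definitions are emulated by hand.

`legacy_x` follows the (?>\PM\pM*) definition quoted above. `extended_x` implements only a simplified subset of the Unicode UAX #29 extended grapheme cluster rules: CR LF, controls, Hangul syllable sequences, combining marks plus ZWJ, and regional-indicator pairs. It approximates Unicode properties from general categories. Both function names are hypothetical, and this is not PCRE's (or Perl's) implementation.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    # \pM: general categories Mn, Mc, Me
    return unicodedata.category(ch).startswith("M")

def legacy_x(text: str) -> list[str]:
    # Legacy \X, i.e. (?>\PM\pM*): one non-mark followed by any run of marks.
    # Simplification: a leading run of marks (where the real pattern would not
    # match at all) is emitted here as its own cluster.
    clusters, i = [], 0
    while i < len(text):
        j = i + 1
        while j < len(text) and is_mark(text[j]):
            j += 1
        clusters.append(text[i:j])
        i = j
    return clusters

def hangul_type(ch: str):
    # Hangul_Syllable_Type ranges (L, V, T jamo; LV/LVT precomposed syllables)
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None

def is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def is_control(ch: str) -> bool:
    # Approximation of Grapheme_Cluster_Break=Control (ignores Cf characters)
    return ch not in "\r\n" and unicodedata.category(ch) in ("Cc", "Zl", "Zp")

def no_break(prev: str, cur: str, ri_run: int) -> bool:
    if prev == "\r" and cur == "\n":                      # GB3: CR x LF
        return True
    if prev in "\r\n" or is_control(prev):                # GB4: break after controls
        return False
    if cur in "\r\n" or is_control(cur):                  # GB5: break before controls
        return False
    hp, hc = hangul_type(prev), hangul_type(cur)
    if hp == "L" and hc in ("L", "V", "LV", "LVT"):       # GB6
        return True
    if hp in ("LV", "V") and hc in ("V", "T"):            # GB7
        return True
    if hp in ("LVT", "T") and hc == "T":                  # GB8
        return True
    if is_mark(cur) or cur == "\u200d":                   # GB9/GB9a (approximated)
        return True
    # GB12/GB13: join regional indicators in pairs (odd-length run so far)
    if is_regional_indicator(prev) and is_regional_indicator(cur) and ri_run % 2 == 1:
        return True
    return False                                          # GB999: otherwise break

def extended_x(text: str) -> list[str]:
    if not text:
        return []
    clusters, start = [], 0
    # ri_run = length of the run of regional indicators ending at text[i - 1]
    ri_run = 1 if is_regional_indicator(text[0]) else 0
    for i in range(1, len(text)):
        if not no_break(text[i - 1], text[i], ri_run):
            clusters.append(text[start:i])
            start = i
        ri_run = ri_run + 1 if is_regional_indicator(text[i]) else 0
    clusters.append(text[start:])
    return clusters

flags = "\U0001F1FA\U0001F1F8\U0001F1EC\U0001F1E7"  # two flag emoji (US, GB)
for s in ["e\u0301", "\r\n", "\u1100\u1161\u11A8", flags, "\n\u0301"]:
    print(repr(s), len(legacy_x(s)), len(extended_x(s)))
```

"e" plus a combining acute accent is one cluster under both definitions. CR LF, a Hangul L+V+T jamo sequence, and a pair of flags each come out as more clusters under the legacy rule. LF followed by a mark goes the other way: the legacy rule glues them, while UAX #29 breaks after the LF. Full UAX #29 has further rules this sketch omits, such as Prepend (GB9b) and, in later revisions, emoji ZWJ sequences.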
