> > preprocessing. Another example, if I want to search for /resume/e,
> > (equivalent matching), the regex engine can normalize the case, fully
> > decompose input string, strip off any combining character, and do 8-bit
> Hmmm. The above sounds complicated not quite what I had in mind
> for equivalence matching: I would have just said "both the pattern
> and the target need to normalized, as defined by Unicode". Then
> the comparison and searching reduce to the trivial cases of byte
> equivalence and searching (of which B-M is the most popular example).
You are right in some sense. But "normalized, as defined by Unicode" may not be simple. I looked at the Unicode regex report, TR18. It does not specify whether "resume" and "re`sume`" should be treated as equivalent, and a user may or may not want this kind of normalization.

Hong
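To make the distinction concrete, here is a rough Python sketch of the two behaviours being discussed. It is only an illustration, not something taken from TR18 or from any regex engine: the function names and the strip_marks switch are made up for this example. With strip_marks off, the pattern and the target are only case-folded and decomposed (NFD), so precomposed and decomposed spellings of an accented word compare equal but the unaccented word does not. With strip_marks on, combining characters are also dropped, which is the looser matching from the /resume/e example.

```python
import unicodedata


def fold_for_matching(text: str, strip_marks: bool = True) -> str:
    """Hypothetical helper: case-fold, fully decompose, optionally drop marks."""
    # Case-fold first, then canonically decompose (NFD) so that an accented
    # letter becomes a base letter followed by its combining mark(s).
    decomposed = unicodedata.normalize("NFD", text.casefold())
    if strip_marks:
        # unicodedata.combining() returns a non-zero class for combining marks.
        decomposed = "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
    return decomposed


def equivalent_find(pattern: str, target: str, strip_marks: bool = True) -> int:
    """Hypothetical helper: plain substring search on the folded strings.

    The returned index refers to the folded target, not the original one.
    """
    return fold_for_matching(target, strip_marks).find(
        fold_for_matching(pattern, strip_marks)
    )
```

Once both sides are folded the same way, the search itself is an ordinary substring search, as suggested in the quoted message. The open question is only which folding the user wants, which is why it is a switch here rather than a fixed rule.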