> > preprocessing. Another example, if I want to search for /resume/e,
> > (equivalent matching), the regex engine can normalize the case, fully 
> > decompose input string, strip off any combining character, and do 8-bit
> 
> Hmmm.  The above sounds complicated, and not quite what I had in mind
> for equivalence matching: I would have just said "both the pattern
> and the target need to be normalized, as defined by Unicode".  Then 
> the comparison and searching reduce to the trivial cases of byte
> equivalence and searching (of which B-M is the most popular example).

You are right in a sense, but "normalized, as defined by Unicode"
may not be simple. I looked at the Unicode regular expression report
(TR18), and it does not specify whether "resume" and "résumé" are
equivalent. Some users will want that kind of normalization and
others will not.
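
To make the distinction concrete, here is a rough sketch in Python (not
Perl, and not a proposal for how the regex engine should do it). It
assumes the standard unicodedata module; the helper names canonical_key
and loose_key are made up for illustration. Canonical normalization
makes the precomposed and decomposed spellings of "résumé" compare
equal, but it takes the extra, lossy steps of stripping combining marks
and folding case before plain "resume" matches as well.

```python
import unicodedata


def canonical_key(text: str) -> str:
    """Canonical normalization only (NFC): accents are preserved."""
    return unicodedata.normalize("NFC", text)


def loose_key(text: str) -> str:
    """Hypothetical 'equivalent matching' key: decompose, drop marks, fold case.

    This goes beyond Unicode canonical equivalence, so whether it is
    wanted is a policy decision for the user.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (general category Mn), e.g. U+0301 COMBINING ACUTE.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.casefold()


precomposed = "r\u00e9sum\u00e9"      # e-acute as a single code point
combining = "re\u0301sume\u0301"      # plain e followed by U+0301

# Canonical normalization unifies the two accented spellings...
same_accented = canonical_key(precomposed) == canonical_key(combining)
# ...but keeps them distinct from the unaccented word.
same_as_plain = canonical_key(precomposed) == canonical_key("resume")
# Only the looser key treats all three as one.
loose_match = loose_key(precomposed) == loose_key("Resume")
```

Once both the pattern and the target have been passed through the same
key function, the search itself is an ordinary string search, as you
describe. The open question is which key the user asked for.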

Hong
