Hong Zhang <[EMAIL PROTECTED]> writes:

>> > preprocessing. Another example, if I want to search for /resume/e,
>> > (equivalent matching), the regex engine can normalize the case, fully
>> > decompose the input string, strip off any combining characters, and do 8-bit
>>
>> Hmmm. The above sounds complicated and not quite what I had in mind
>> for equivalence matching: I would have just said "both the pattern
>> and the target need to be normalized, as defined by Unicode". Then
>> the comparison and searching reduce to the trivial cases of byte
>> equivalence and searching (of which B-M is the most popular example).
>
> You are right in some sense. But "normalized, as defined by Unicode"
> may not be simple. I looked at the Unicode regex TR18. It does not specify
> equivalence of "resume" vs "re`sume`", but the user may or may not
> want this kind of normalization.
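The normalization-based equivalence matching described in the quoted text (fold the case, fully decompose, strip combining characters, then compare plainly) can be sketched in Python with the standard `unicodedata` module. This is a minimal sketch under stated assumptions: `fold` and `equivalent_contains` are hypothetical names, case folding uses `str.casefold`, combining marks are detected with `unicodedata.combining`, and it is not how any particular regex engine implements `/e`.

```python
import unicodedata


def fold(s: str) -> str:
    """Hypothetical helper: case-fold, fully decompose (NFD), drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    # combining() returns a nonzero class for combining marks such as U+0301
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equivalent_contains(pattern: str, text: str) -> bool:
    """Hypothetical helper: plain substring search once both sides are folded."""
    return fold(pattern) in fold(text)


if __name__ == "__main__":
    print(fold("Résumé"))                                       # resume
    print(equivalent_contains("resume", "My RÉSUMÉ is attached"))  # True
```

One caveat this makes visible: folding can change string length (for instance `"ß".casefold()` is `"ss"`), so a match position found in the folded text does not map directly back to an offset in the original string.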
But e` and e are different letters, man. And re`sume` and resume are different words, come to that. If the user wants something that'll match 'em both, then the pattern should surely be:

    /r[ee`]sum[ee`]/

Of course, it might be nice to have something that lets us do

    /r\any_accented(e)sum\any_accented(e)/

(or some such; the notation is terrible, I know), but my point is that such searches should be explicit.

-- 
Piers

   "It is a truth universally acknowledged that a language in
    possession of a rich syntax must be in need of a rewrite."
         -- Jane Austen?