Hong Zhang <[EMAIL PROTECTED]> writes:

>> > preprocessing. Another example, if I want to search for /resume/e,
>> > (equivalent matching), the regex engine can normalize the case, fully 
>> > decompose input string, strip off any combining character, and do 8-bit
>> 
>> Hmmm.  The above sounds complicated, and not quite what I had in mind
>> for equivalence matching: I would have just said "both the pattern
>> and the target need to be normalized, as defined by Unicode".  Then 
>> the comparison and searching reduce to the trivial cases of byte
>> equivalence and searching (of which B-M is the most popular example).
>
> You are right in some sense. But "normalized, as defined by Unicode"
> may not be simple. I looked at the Unicode regex TR18. It does not specify
> equivalence of "resume" vs "re`sume`", but a user may or may not
> want this kind of normalization.

But e` and e are different letters, man. And re`sume` and resume are
different words, come to that. If the user wants something that'll
match 'em both then the pattern should surely be:

   /r[ee`]sum[ee`]/

Of course, it might be nice to have something that lets us do

   /r\any_accented(e)sum\any_accented(e)/

(or some such, notation is terrible I know), but my point is that such
searches should be explicit.
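
To make the contrast concrete, here is a rough Python sketch of the two
approaches. It assumes that e` above stands for a precomposed accented e
(U+00E9 is used here), and the helper names strip_accents, explicit_search
and folded_search are made up for illustration, not taken from any
proposal.

```python
import re
import unicodedata


def strip_accents(text):
    """Decompose to NFD, then drop every combining mark.

    This is the implicit folding described in the quoted text:
    decompose the input and strip off combining characters.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Explicit approach: the pattern itself lists which letters are acceptable.
# U+00E9 (e with acute) is an assumption about what e` means above.
EXPLICIT = re.compile("r[e\u00e9]sum[e\u00e9]")


def explicit_search(target):
    """Match the explicit pattern; compose the target to NFC first so a
    decomposed e + combining accent is seen as the single letter U+00E9."""
    return EXPLICIT.search(unicodedata.normalize("NFC", target)) is not None


def folded_search(plain_pattern, target):
    """Implicit approach: fold accents out of the target, then match."""
    return re.search(plain_pattern, strip_accents(target)) is not None
```

With the explicit version the pattern author decides which letters count
as a match; with the folded version every accent in the target is
discarded whether the user asked for that or not.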

-- 
Piers

   "It is a truth universally acknowledged that a language in
    possession of a rich syntax must be in need of a rewrite."
         -- Jane Austen?
