One feature that seems to be missing from the re module (and from every other
text-searching tool I know of) is diacritic-insensitive search. I would like
something like this to match:

re.match("franc", "français")

in much the same way that we can have a case-insensitive search:

re.match("(?i)fran", "Français").

Another related and more general problem (in the sense that it could easily be 
used to solve the first problem) would be to translate a string removing any 
diacritical mark:

nodiac("Français") -> "Francais"

The algorithm for such a function is trivial, but there are a lot of marks
that can be put on a letter. It would be necessary to have the list of every
"a" with something on it, i.e. "à, á, ã", etc., and likewise for every other
letter. Trying to make such a list by hand would be tedious and would
inevitably leave some symbols out.
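
One way to avoid building that list by hand may be to let the standard
unicodedata module do the decomposition. The sketch below assumes Python 3
and Unicode text. It normalizes the string to NFD, which splits a letter such
as "ç" into "c" plus a combining cedilla, and then drops the combining marks.
The helper match_nodiac is a hypothetical name used only for illustration; it
is not part of re.

```python
import re
import unicodedata


def nodiac(text):
    """Return text with combining diacritical marks removed."""
    # NFD splits e.g. "ç" into "c" followed by U+0327 COMBINING CEDILLA.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for the marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_nodiac(pattern, text, flags=0):
    """Hypothetical helper: apply re.match to the stripped text."""
    # The pattern is assumed to be written without diacritics.
    return re.match(pattern, nodiac(text), flags)


print(nodiac("Français"))                                # Francais
print(match_nodiac("franc", "français") is not None)     # True
```

This only handles marks that Unicode treats as separate combining characters.
Letters such as "ø" have no decomposition and pass through unchanged, and the
returned match object refers to the stripped string, so its offsets may not
line up with the original text.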

Olive


-- 
http://mail.python.org/mailman/listinfo/python-list