On 29 March 2013 13:38, Kornel Benko wrote:
I seem unable to find strings containing non-ASCII characters (e.g. latin2).

(Please try to use UTF-8 encoding to read this mail)

The regex search string may be "pou.i.", so I was expecting to find e.g. "použiť". Instead I have to use '..' to match each of these single characters ("pou..i..").


I believe this happens because "ž" is encoded as two bytes in UTF-8, and I guess the regexp matching software in use works on bytes, not characters. So you are forced to use two periods to match the two bytes of "ž", and even more if you want to match Chinese characters, which typically take three bytes each.
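To illustrate the byte-versus-character explanation, here is a small sketch using Python's standard `re` module (not the matcher used in the original report, which is unknown here). A bytes pattern makes "." consume one byte, which reproduces the "pou..i.." behaviour on the UTF-8 encoding of "použiť":

```python
import re

word = "použiť"
data = word.encode("utf-8")  # the raw bytes a byte-oriented matcher sees

# UTF-8 bytes per character: ASCII letters take 1, "ž" and "ť" take 2 each.
byte_lengths = [len(ch.encode("utf-8")) for ch in word]
print(byte_lengths)

# In a bytes pattern each "." consumes exactly one byte, not one character.
print(re.fullmatch(rb"pou.i.", data))                # no match: too few periods
print(re.fullmatch(rb"pou..i..", data) is not None)  # two periods per 2-byte character

# A common Chinese character such as "中" occupies three bytes in UTF-8.
print(len("中".encode("utf-8")))
```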

The solution would be regexp matching software that is Unicode-aware.
A link to such software:
http://abies.nmsu.edu/pkgsrc/boost/libs/regex/doc/icu_strings.html
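The linked Boost/ICU page is for C++; as a rough illustration of the same idea in Python, the sketch below decodes the bytes first so that "." matches one code point. The helper name `find_chars` is hypothetical. It also applies NFC normalization (standard `unicodedata` module), so that a "ž" stored as "z" plus a combining caron still counts as one character, and it accepts an encoding name so latin2 text can be searched the same way:

```python
import re
import unicodedata

def find_chars(pattern, raw, encoding="utf-8"):
    """Hypothetical helper: decode, normalize to NFC, then match characters."""
    text = unicodedata.normalize("NFC", raw.decode(encoding))
    return re.findall(pattern, text)

# Precomposed "ž" (U+017E), encoded as UTF-8.
composed = "použiť".encode("utf-8")
# Same word with "ž" written as "z" + COMBINING CARON (U+030C).
decomposed = "pouz\u030ci\u0165".encode("utf-8")
# Same word in ISO-8859-2 (latin2), where each letter is a single byte.
latin2 = "použiť".encode("latin2")

print(find_chars(r"pou.i.", composed))
print(find_chars(r"pou.i.", decomposed))
print(find_chars(r"pou.i.", latin2, "latin2"))
```

In all three cases the single-period pattern "pou.i." finds the word, because matching happens after decoding rather than on the raw bytes.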

Helge Hafting
