Re: Regular expression for non-ascii chars, advanced search

Helge Hafting Sun, 31 Mar 2013 04:39:58 -0700

On 29. mars 2013 13:38, Kornel Benko wrote:

I seem unable to find strings using non ascii chars (e.g. latin2)


(Please try to use UTF-8 encoding to read this mail)

The regex search string may be "pou.i.", so I was expecting to find

e.g. "použiť". I have to use '..' to find this single chars. ("pou..i..")

I believe this happens because the "ž" is encoded as two bytes when using UTF-8. And I guess the regexp matching software in use works on "bytes", not "characters". So you are forced to use two periods to match the two bytes in "ž", and even more if you want to match Chinese characters.
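
A small sketch of the effect, in case it helps. Python is used here only because it can match against either bytes or characters; it is an assumption for illustration and not the regex engine under discussion. The variable names are made up for the example.

```python
import re

# "použiť", written with escapes so the example does not depend on
# how this mail or file happens to be encoded.
word = "pou\u017ei\u0165"
raw = word.encode("utf-8")  # the same word as UTF-8 bytes

# Character-based matching: each "." consumes one character.
char_match = re.fullmatch(r"pou.i.", word) is not None

# Byte-based matching: each "." consumes one byte, so the same
# pattern no longer matches...
byte_match_single = re.fullmatch(rb"pou.i.", raw) is not None
# ...and two periods are needed for each two-byte character.
byte_match_double = re.fullmatch(rb"pou..i..", raw) is not None

print(len(word), len(raw))                    # 6 8
print(char_match, byte_match_single, byte_match_double)  # True False True

# A common Chinese character takes three bytes in UTF-8.
print(len("\u4e2d".encode("utf-8")))          # 3
```

The byte-mode results mirror what was reported: "pou.i." fails and "pou..i.." succeeds.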


The solution would be regexp matching software that is Unicode-aware.
A link to such software:
http://abies.nmsu.edu/pkgsrc/boost/libs/regex/doc/icu_strings.html

Helge Hafting
