On 2019-10-12 20:48, Serhiy Storchaka wrote:
12.10.19 21:08, Eko palypse wrote:
So how can I make it work with utf8 encoded text?

You cannot. First, \w in re.LOCALE works only when the text is encoded
with the locale encoding (cp1252 in your case). Second, re.LOCALE
supports only 8-bit charsets. So even if you set the utf-8 locale, it
would not help.

Regular expressions with re.LOCALE are slow. It may be more efficient to
decode the text and use a Unicode regular expression.

+1

It's best to treat re.LOCALE as being for old legacy encodings that use/used 8 bits per character. Wherever possible, decode to Unicode and work with that instead.
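
For illustration, here is a minimal sketch of the decode-first approach, assuming the input arrives as UTF-8 bytes. The sample string and the variable names are made up for the example and are not from the original question.

```python
import re

# Hypothetical sample input: UTF-8 encoded bytes with non-ASCII letters.
raw = "naïve café Grüße".encode("utf-8")

# Decode to str first; with a str pattern, \w is Unicode-aware by default.
text = raw.decode("utf-8")
words = re.findall(r"\w+", text)
print(words)

# For comparison: a bytes pattern (no re.LOCALE) treats only ASCII
# characters as word characters, so multi-byte letters split the words.
byte_words = re.findall(rb"\w+", raw)
print(byte_words)
```

The str version returns whole words, while the bytes version breaks at every non-ASCII letter, which is why decoding before matching is usually the simpler route.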
--
https://mail.python.org/mailman/listinfo/python-list
