Re: Identifying unicode punctuation characters with Python regex

Martin v. Löwis Fri, 14 Nov 2008 02:31:11 -0800

> I'm trying to build a regex in python to identify punctuation
> characters in all the languages. Some regex implementations support an
> extended syntax \p{P} that does just that. As far as I know, python re
> doesn't. Any idea of a possible alternative?


You should use character classes. You can generate them automatically
from the unicodedata module: check whether unicodedata.category(c)
starts with "P".

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list

Re: Identifying unicode punctuation characters with Python regex

Reply via email to