Re: From JoyceUlysses.txt -- words occurring exactly once

Dieter Maurer via Python-list Tue, 04 Jun 2024 09:41:33 -0700

Edward Teach wrote at 2024-6-3 10:47 +0100:
> ...
>The Gutenburg Project publishes "plain text".  That's another problem,
>because "plain text" means UTF-8....and that means unicode...and that
>means running some sort of unicode-to-ascii conversion in order to get
>something like "words".  A couple of hours....a couple of hundred lines
>of C....problem solved!


Unicode supports the notion of a "word" even better than ASCII does.
For example, the `\w` (word character) regular expression class
works for Unicode just as it does for ASCII (of course with a larger
set of letters, digits, etc.).
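
A minimal sketch of what I mean, in Python 3, where `\w` in a `str`
pattern is Unicode-aware by default. The function name and the sample
sentence are made up for illustration; they are not taken from the
original program or from JoyceUlysses.txt.

```python
import re
from collections import Counter

def words_occurring_once(text):
    """Return the sorted list of words that occur exactly once in text."""
    # \w+ matches runs of Unicode letters, digits and underscores,
    # so accented letters stay inside their words.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)

# Hypothetical sample text with non-ASCII letters.
sample = "Señor Bloom met señor Dedalus; Bloom ate a Käse sandwich."
print(words_occurring_once(sample))

# For contrast: with re.ASCII, \w no longer matches "ä",
# so the word is split in two.
print(re.findall(r"\w+", "Käse", re.ASCII))
```

Note that `\w+` treats an apostrophe or hyphen as a word boundary, so
whether "don't" counts as one word or two is a separate decision that
this sketch does not make.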
-- 
https://mail.python.org/mailman/listinfo/python-list
