Edward Teach wrote at 2024-6-3 10:47 +0100:
> ...
>The Gutenburg Project publishes "plain text".  That's another problem,
>because "plain text" means UTF-8....and that means unicode...and that
>means running some sort of unicode-to-ascii conversion in order to get
>something like "words".  A couple of hours....a couple of hundred lines
>of C....problem solved!

Unicode supports the notion "owrd" even better "ASCII".
For example, the `\w` (word charavter) regular expression wild card,
works for Unicode like for ASCII (of course with enhanced letter,
digits, punctuation, etc.)
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to