Edward Teach wrote at 2024-6-3 10:47 +0100:
> ...
>The Gutenburg Project publishes "plain text". That's another problem,
>because "plain text" means UTF-8....and that means unicode...and that
>means running some sort of unicode-to-ascii conversion in order to get
>something like "words". A couple of hours....a couple of hundred lines
>of C....problem solved!

Unicode supports the notion of a "word" even better than ASCII does. For example, the `\w` (word character) regular expression class works for Unicode just as it does for ASCII (with, of course, a correspondingly larger set of letters, digits, etc.), so no unicode-to-ascii conversion is needed to get words.
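
To illustrate, here is a minimal sketch in Python 3, where `str` patterns are Unicode-aware by default and `re.ASCII` restricts `\w` to `[a-zA-Z0-9_]`. The sample sentence and the function names `unicode_words` and `ascii_words` are made up for this example. The NFC normalization step is my own assumption about the input: it only matters if the text contains decomposed characters (a base letter followed by a combining mark), which `\w` would otherwise split.

```python
import re
import unicodedata

# Hypothetical sample text; any UTF-8 file decoded to str behaves the same way.
SAMPLE = "Die Straße nach Ærøskøbing führt über São Paulo"


def _nfc(text):
    # Compose base letter + combining mark, e.g. "u" + U+0308 becomes "ü".
    return unicodedata.normalize("NFC", text)


def unicode_words(text):
    """Return runs of word characters using the default Unicode-aware class."""
    return re.findall(r"\w+", _nfc(text))


def ascii_words(text):
    """Same pattern, but limited to [a-zA-Z0-9_] via re.ASCII, for comparison."""
    return re.findall(r"\w+", _nfc(text), flags=re.ASCII)


if __name__ == "__main__":
    print(unicode_words(SAMPLE))
    print(ascii_words(SAMPLE))
```

With the Unicode-aware pattern, "Straße" and "Ærøskøbing" come back as single words. With `re.ASCII` they are cut at every non-ASCII letter ("Stra", "e", "r", "sk", "bing", ...), which is roughly what a naive ASCII-only word splitter would produce.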