On 2013-02-23 17:51, Devin Jeanpierre wrote:
> On Sat, Feb 23, 2013 at 12:41 PM, MRAB <pyt...@mrabarnett.plus.com>
> wrote:
>> Getting full case folding to work can be tricky. There's always
>> going to be a limit to what's worth doing.
>>
>> There are also areas where it's not clear what the result should
>> be. You've already mentioned matching 's' against 'ß' (fails) and
>> matching 'ss' against 'ß' (succeeds), but how about matching
>> '(s)(s)' against 'ß' (fails)?
>
> For the record, Perl also says that 'ss' matches 'ß', but 's+' does
> not.
>
> I would find it helpful to know the exact rules. The regex module
> docs say that it works, but don't say what it means to "work".

The basic rule is that a series of characters in the regex must match a
series of characters in the text, with no partial matches in either.

For example, 'ss' can match 'ß', but 's' can't match 'ß' because that
would be matching part of 'ß'.
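
As a rough illustration of that rule, here is a minimal sketch that uses only
Python's built-in str.casefold(). It is not the regex module's actual code;
the function name find_folded is hypothetical, and it handles only a literal
pattern, on the assumption that a match must start and end on whole-character
boundaries of the original text.

```python
def find_folded(literal, text):
    """Return (start, end) indices into `text` of the first full-case-folded
    match of `literal`, or None. A match may not begin or end in the middle
    of a character whose folded form is longer than one character."""
    needle = literal.casefold()
    # Fold each character separately so we know where each one begins
    # in the folded text.
    folded_parts = [ch.casefold() for ch in text]
    boundaries = {}  # offset in folded text -> index in original text
    offset = 0
    for i, part in enumerate(folded_parts):
        boundaries[offset] = i
        offset += len(part)
    boundaries[offset] = len(text)

    haystack = "".join(folded_parts)
    pos = haystack.find(needle)
    while pos != -1:
        end = pos + len(needle)
        # Accept only matches aligned to whole characters of `text`.
        if pos in boundaries and end in boundaries:
            return boundaries[pos], boundaries[end]
        pos = haystack.find(needle, pos + 1)
    return None


print(find_folded("ss", "ß"))       # (0, 1): 'ss' matches all of 'ß'
print(find_folded("s", "ß"))        # None: 's' would match only part of 'ß'
print(find_folded("SS", "Straße"))  # (4, 5)
```

The alignment check is what rejects 's' against 'ß': the folded text is 'ss',
but both candidate positions for 's' start or end inside the single original
character.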

In a regex like 's+', you're asking it to match one or more repetitions
of 's', but that would mean that 's' would have to match part of 'ß' in
the first iteration and the remainder of 'ß' in the second iteration.
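
The repetition case can be sketched the same way. This is again only an
illustration under the assumption that every iteration must consume whole
characters of the text; match_unit_at and match_one_or_more are hypothetical
helpers, not functions from the regex module.

```python
def match_unit_at(unit, text, start):
    """Return the end index if `unit` matches whole characters of `text`
    beginning at `start` under full case folding, else None."""
    needle = unit.casefold()
    folded = ""
    for end in range(start, len(text)):
        folded += text[end].casefold()
        if folded == needle:
            return end + 1
        if not needle.startswith(folded):
            # The folded text has diverged from, or overshot, the unit.
            return None
    return None


def match_one_or_more(unit, text):
    """Emulate matching `(?:unit)+` against all of `text`, where each
    iteration must consume whole characters."""
    pos = 0
    count = 0
    while pos < len(text):
        end = match_unit_at(unit, text, pos)
        if end is None:
            break
        pos = end
        count += 1
    return count > 0 and pos == len(text)


print(match_one_or_more("s", "ss"))   # True: two iterations, one 's' each
print(match_one_or_more("s", "ß"))    # False: one iteration cannot take half of 'ß'
print(match_one_or_more("ss", "ßSS")) # True: 'ß', then 'SS'
```

Supporting 's+' against 'ß' would mean letting an iteration stop partway
through a folded character and carrying that state into the next iteration,
which this sketch deliberately does not attempt.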

Although it's theoretically possible to do that, the code is already
difficult enough. The cost outweighs the potential benefit.

If you'd like to have a go at implementing it, the code _is_ open
source. :-)
--
http://mail.python.org/mailman/listinfo/python-list