On Tue, Mar 29, 2011 at 22:40, Lennart Regebro <rege...@gmail.com> wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".

For that matter, what happens with combining characters?

    '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL LETTER O WITH DIAERESIS}'

I guess the filesystem shouldn't treat these as the same (even though
they are canonically equivalent), but what if some web service does? I
suspect you should normalize both strings before comparing them in any
blacklist. And what happens with surrogates when you normalize?

//Lennart
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
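
To make the normalization suggestion above concrete, here is a minimal sketch using the standard library's unicodedata.normalize(). The helper name is_blacklisted and the choice of NFC are assumptions for illustration only, and the sketch deliberately leaves the surrogate question open.

```python
import unicodedata

# The two spellings of "o with diaeresis" from the message above.
decomposed = "\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}"
precomposed = "\N{LATIN SMALL LETTER O WITH DIAERESIS}"


def is_blacklisted(name, blacklist, form="NFC"):
    """Hypothetical helper: compare after normalizing both sides.

    NFC is an assumption; any form works as long as the name and the
    blacklist entries are normalized the same way.
    """
    normalized_entries = {unicodedata.normalize(form, entry) for entry in blacklist}
    return unicodedata.normalize(form, name) in normalized_entries


# A plain comparison treats the two spellings as different strings.
plain_match = decomposed in {precomposed}

# Normalizing both sides first makes them compare equal.
normalized_match = is_blacklisted(decomposed, {precomposed})
```

With a plain comparison the decomposed spelling slips past a blacklist that holds the precomposed one; with both sides normalized it is caught.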