Tony Nelson wrote: > Umm, 0 (NUL) is a valid output character in most of the 8-bit character > sets. It could be handled by having a separate "exceptions" string of the > unicode code points that actually map to the exception char.
Yes. But only U+0000 should normally map to 0. It could be special-cased altogether. > As you are concerned about pathological cases for hashing (that would make > the hash chains long), it is worth noting that in such cases this data > structure could take 64K bytes. Of course, no such case occurs in standard > encodings, and 64K is affordable as long is it is rare. I'm not concerned with long hash chains, I dislike having collisions in the first place (even if they are only for two code points). Having to deal with collisions makes the code more complicated, and less predictable. It's primarily a matter of taste: avoid hashtables if you can :-) Regards, Martin _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com