Dan Sugalski <[EMAIL PROTECTED]> wrote:

> You're going to run into problems no matter what you do, and as
> transcoding could happen with each comparison arguably you need to make a
> local copy of the string for each comparison, as otherwise you run the
> risk of significant data loss as a string gets transcoded back and forth
> across a lossy boundary.

Here again is what I had already proposed:
 * as long as there are only ascii keys: noop
 * on the first non-ascii key, convert all hash keys to utf8; this
   doesn't change their hash values
 * from then on, if a key is neither ascii nor utf8, transcode it in
   find_bucket() before string_compare

The hash (assuming ascii is mostly used) starts out with its compare
function pointer set to a strcmp()-like compare function. Each key that
enters the hash, whether for insert or for find, is checked for its
encoding/type. When the first non-ascii key is inserted, the hash keys
are converted to utf8 and the compare function pointer is switched to a
utf8 compare. Non-ascii search keys are always transcoded to utf8 first,
and only once.

> Regardless, I think at least a single string copy with comparison against
> that copy within the hash functions is the only way to get correct
> results.

Yes. That's the point: a single string copy. As it stands, each compare
could do a transcode, i.e. generate a new string.

>                                       Dan

leo
