Dan Sugalski <[EMAIL PROTECTED]> wrote:

> You're going to run into problems no matter what you do, and as
> transcoding could happen with each comparison arguably you need to make a
> local copy of the string for each comparison, as otherwise you run the
> risk of significant data loss as a string gets transcoded back and forth
> across a lossy boundary.

Here again is what I had already proposed:

* as long as there are only ASCII keys: no-op
* on the first non-ASCII key, convert all hash keys to UTF-8
  - this doesn't change hash values
* from then on, if a key is non-ASCII and non-UTF-8, transcode it in
  find_bucket() before string_compare

The hash (assuming mainly ASCII is used) starts out with a compare
function pointer that points to a strcmp()-like compare function. Each
key that enters the hash, either for insert or for find, is checked for
its encoding/type. When the first non-ASCII key is inserted, the hash
keys are converted to UTF-8 and the compare function pointer is changed
to do a UTF-8 compare. Non-ASCII search keys are always transcoded to
UTF-8 first, and only once.

> Regardless, I think at least a single string copy with comparison against
> that copy within the hash functions is the only way to get correct
> results.

Yes. That's the point: a single string copy. As things are now, each
compare could do a transcode, i.e. generate a new string.

> Dan

leo
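
A minimal sketch of the scheme proposed above, in plain C, to make the
steps concrete. It is not Parrot code: the struct, the encoding enum and
all signatures (including this find_bucket()) are made up for
illustration, Latin-1 stands in for "non-ASCII and non-UTF-8", a linear
scan replaces real hashing, and the UTF-8 compare is a byte-wise
stand-in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding tags; Latin-1 stands in for any non-UTF-8 type. */
enum key_enc { ENC_ASCII, ENC_LATIN1, ENC_UTF8 };

typedef int (*cmp_fn)(const char *, const char *);

#define MAX_KEYS 16

struct sketch_hash {
    char  *keys[MAX_KEYS];   /* stored keys: ASCII, or UTF-8 after the switch */
    int    values[MAX_KEYS];
    size_t n;
    cmp_fn compare;          /* starts as the strcmp()-like compare */
    int    is_utf8;
};

static int ascii_compare(const char *a, const char *b) { return strcmp(a, b); }
/* Stand-in: byte-wise equality is enough once both sides are UTF-8. */
static int utf8_compare(const char *a, const char *b) { return strcmp(a, b); }

/* Return a malloc'ed UTF-8 copy of s; ASCII and UTF-8 input is copied as-is. */
static char *to_utf8(const char *s, enum key_enc enc)
{
    char *out = malloc(2 * strlen(s) + 1);
    char *p = out;
    assert(out != NULL);
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (enc == ENC_LATIN1 && c >= 0x80) {
            *p++ = (char)(0xC0 | (c >> 6));     /* two-byte UTF-8 sequence */
            *p++ = (char)(0x80 | (c & 0x3F));
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    return out;
}

static void hash_init(struct sketch_hash *h)
{
    memset(h, 0, sizeof *h);
    h->compare = ascii_compare;
}

/* Transcode the search key at most once, then compare against that copy. */
static int find_bucket(struct sketch_hash *h, const char *key, enum key_enc enc)
{
    char *copy = NULL;
    int found = -1;
    size_t i;
    if (enc != ENC_ASCII && enc != ENC_UTF8) {
        copy = to_utf8(key, enc);
        key = copy;
    }
    for (i = 0; i < h->n; i++) {
        if (h->compare(h->keys[i], key) == 0) { found = (int)i; break; }
    }
    free(copy);
    return found;
}

static void hash_insert(struct sketch_hash *h, const char *key,
                        enum key_enc enc, int value)
{
    int i;
    if (enc != ENC_ASCII && !h->is_utf8) {
        /* First non-ASCII key: existing ASCII keys are already valid
         * UTF-8 byte for byte, so only the compare function changes. */
        h->is_utf8 = 1;
        h->compare = utf8_compare;
    }
    i = find_bucket(h, key, enc);
    if (i < 0) {
        assert(h->n < MAX_KEYS);
        i = (int)h->n++;
        h->keys[i] = to_utf8(key, enc);   /* store the key as UTF-8 */
    }
    h->values[i] = value;
}
```

The stored keys are never transcoded back, so nothing crosses a lossy
boundary twice; only the incoming key gets one temporary copy per lookup.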