On 6 Aug 2013, at 14:30, Stefan Bidi <[email protected]> wrote:

> I copied the hash algorithm straight out of -base, so they should match. I
> remember a few months ago Richard was playing around with hash functions and
> this might be causing some issues, now.

It wouldn't on a normal setup ... the experimental hash code is used only if you explicitly build it.

> I just looked it up, the changes were made on rev 36344.
>
> There is another issue... -base allows UTF-8 strings, which will not be
> hashed to the same UTF-16 value.

They are hashed to the same value as other strings; in base, hashing is computed on the Unicode codepoints.

> In my opinion, allowing UTF-8 string literals is not a good idea and base
> should revert back to Latin1 as the default C string encoding.

gnustep-base still uses Latin1 as the default C string encoding. The change with string literals is one from ASCII to UTF-8.

> I'm actually debating adding a UTF-16 string literals configure option for
> corebase. I believe using UTF-16 internally is the only sane solution to
> non-ASCII encodings.
>
> I've tried experimenting with other hash functions that are not
> one-at-a-time, but unfortunately have not found anything that will work on
> both ASCII and Unicode strings consistently. It would be really nice to be
> able to work with 32- or 64-bit integers directly instead of 8- or 16-bit
> characters. If we could use UTF-16 across the board, this wouldn't be a problem.

base uses the 16-bit codepoints to compute string hashes ... which is of course fine for ASCII and UTF-16, since ASCII is a true subset of Unicode and each ASCII character therefore has exactly the same value as the corresponding UTF-16 character.
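
To illustrate the point about hashing on 16-bit values rather than on the stored bytes, here is a rough sketch in C. It is not the code from -base or corebase: the function names are made up for this example, Bob Jenkins' one-at-a-time hash is used only as a stand-in, and the UTF-8 decoder assumes well-formed input. Each storage encoding is converted to UTF-16 units first, and only those units are fed to the hash.

```c
#include <stddef.h>
#include <stdint.h>

/* One mixing step of the one-at-a-time hash, fed a 16-bit unit.
 * Hypothetical helper for illustration; not the gnustep-base code. */
static uint32_t oaat_step(uint32_t h, uint16_t unit)
{
    h += unit;
    h += h << 10;
    h ^= h >> 6;
    return h;
}

/* Final avalanche of the one-at-a-time hash. */
static uint32_t oaat_finish(uint32_t h)
{
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Hash a UTF-16 string, one unit per step. */
uint32_t hash_utf16(const uint16_t *s, size_t n)
{
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = oaat_step(h, s[i]);
    return oaat_finish(h);
}

/* Latin-1 bytes have the same values as the first 256 Unicode
 * codepoints, so each byte widens directly to one 16-bit unit. */
uint32_t hash_latin1(const unsigned char *s, size_t n)
{
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = oaat_step(h, (uint16_t)s[i]);
    return oaat_finish(h);
}

/* Decode UTF-8 to codepoints, re-encode as UTF-16 units, hash those.
 * Assumes well-formed UTF-8; a truncated trailing sequence is dropped. */
uint32_t hash_utf8(const unsigned char *s, size_t n)
{
    uint32_t h = 0;
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else                         { cp = b & 0x07; len = 4; }
        if (i + len > n)
            break;
        for (size_t k = 1; k < len; k++)
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            cp -= 0x10000;
            h = oaat_step(h, (uint16_t)(0xD800 + (cp >> 10)));
            h = oaat_step(h, (uint16_t)(0xDC00 + (cp & 0x3FF)));
        } else {
            h = oaat_step(h, (uint16_t)cp);
        }
    }
    return oaat_finish(h);
}
```

With this arrangement "abc" hashes identically whether it is stored as ASCII, Latin1, UTF-8 or UTF-16, and so does a non-ASCII character such as U+00E9. Hashing the raw UTF-8 bytes instead would give a different value for the same string, which is the mismatch being discussed.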
