Re: Hash computation and TFB

Richard Frith-Macdonald Tue, 06 Aug 2013 06:42:35 -0700

On 6 Aug 2013, at 14:30, Stefan Bidi <[email protected]> wrote:

> I copied the hash algorithm straight out of -base, so they should match.  I 
> remember a few months ago Richard was playing around with hash functions and 
> this might be causing some issues, now.


It wouldn't on a normal setup ... the experimental hash code is used only if 
you explicitly build it.

> I just looked it up, the changes were made on rev 36344.
> 
> There is another issue... -base allows UTF-8 strings, which will not be 
> hashed to the same UTF-16 value.

They are hashed to the same value as other strings; in base, hashing is 
computed on the unicode codepoints, not on the encoded bytes.
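
As a rough illustration of that point (a sketch, not the actual -base code): 
if a UTF-8 string is first decoded to 16-bit units and the hash is then run 
over those units, it gets the same hash as the equivalent UTF-16 string.  The 
helper names utf8_to_utf16 and hash_units16 are made up for this example, the 
decoder assumes well-formed input, and the mixing step is Bob Jenkins' 
one-at-a-time function used purely as a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode well-formed UTF-8 into UTF-16 code units.
 * Performs no validation; 'out' must have room for 'len' units.
 * Returns the number of 16-bit units written. */
static size_t utf8_to_utf16(const unsigned char *in, size_t len, uint16_t *out)
{
  size_t i = 0, n = 0;

  while (i < len)
    {
      unsigned char b = in[i];
      uint32_t cp;

      if (b < 0x80)                     /* 1-byte sequence (ASCII) */
        {
          cp = b;
          i += 1;
        }
      else if ((b & 0xE0) == 0xC0)      /* 2-byte sequence */
        {
          cp = ((uint32_t)(b & 0x1F) << 6) | (uint32_t)(in[i + 1] & 0x3F);
          i += 2;
        }
      else if ((b & 0xF0) == 0xE0)      /* 3-byte sequence */
        {
          cp = ((uint32_t)(b & 0x0F) << 12)
            | ((uint32_t)(in[i + 1] & 0x3F) << 6)
            | (uint32_t)(in[i + 2] & 0x3F);
          i += 3;
        }
      else                              /* 4-byte sequence */
        {
          cp = ((uint32_t)(b & 0x07) << 18)
            | ((uint32_t)(in[i + 1] & 0x3F) << 12)
            | ((uint32_t)(in[i + 2] & 0x3F) << 6)
            | (uint32_t)(in[i + 3] & 0x3F);
          i += 4;
        }

      if (cp >= 0x10000)                /* outside the BMP: surrogate pair */
        {
          cp -= 0x10000;
          out[n++] = (uint16_t)(0xD800 | (cp >> 10));
          out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
      else
        {
          out[n++] = (uint16_t)cp;
        }
    }
  return n;
}

/* Stand-in hash: one-at-a-time mixing, fed one 16-bit unit per step. */
static uint32_t hash_units16(const uint16_t *s, size_t n)
{
  uint32_t h = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      h += s[i];
      h += h << 10;
      h ^= h >> 6;
    }
  h += h << 3;
  h ^= h >> 11;
  h += h << 15;
  return h;
}
```

With this, the UTF-8 bytes 63 61 66 C3 A9 decode to the four units 
0063 0061 0066 00E9, so hashing the decoded buffer and hashing a UTF-16 
buffer holding the same string give identical results.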

>  In my opinion, allowing UTF-8 string literals is not a good idea and base 
> should revert back to Latin1 as the default C string encoding.

gnustep-base still uses latin1 as the default C string encoding.  The change 
with string literals is one from ascii to utf-8.

>  I'm actually debating adding a UTF-16 string literals configure option for 
> corebase.  I believe using UTF-16 internally is the only sane solution to 
> non-ASCII encodings.
> 
> I've tried experimenting with other hash functions that are not 
> one-at-a-time, but unfortunately have not found anything that will work on 
> both ASCII and Unicode strings consistently.  It would be really nice to be 
> able to work with 32- or 64-bit integers directly instead of 8- or 16-bit 
> characters.  If we could use UTF-16 across the board, this wouldn't be a problem.

base uses the 16-bit codepoints to compute string hashes ... which is of course 
fine for ascii and utf-16 since ascii is a true subset of unicode and each 
ascii character therefore has exactly the same value as the corresponding 
utf-16 character.
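
A minimal sketch of why that works, assuming a hash that consumes one 16-bit 
value per character (this is not the -base implementation; the function names 
are invented for the example and the mixing step is the one-at-a-time function 
used as a placeholder).  The ascii path widens each byte to 16 bits before 
mixing, so both paths feed the hash exactly the same sequence of values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder mixing step (one-at-a-time), taking one 16-bit unit. */
static uint32_t oaat_mix(uint32_t h, uint16_t unit)
{
  h += unit;
  h += h << 10;
  h ^= h >> 6;
  return h;
}

/* Placeholder finalisation step. */
static uint32_t oaat_finish(uint32_t h)
{
  h += h << 3;
  h ^= h >> 11;
  h += h << 15;
  return h;
}

/* Hash an ascii buffer: each byte is widened to a 16-bit unit first. */
static uint32_t hash_ascii_units(const char *s, size_t n)
{
  uint32_t h = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      h = oaat_mix(h, (uint16_t)(unsigned char)s[i]);
    }
  return oaat_finish(h);
}

/* Hash a utf-16 buffer: the units are used as they are. */
static uint32_t hash_utf16_units(const uint16_t *s, size_t n)
{
  uint32_t h = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      h = oaat_mix(h, s[i]);
    }
  return oaat_finish(h);
}
```

For example, hash_ascii_units("hash", 4) and hash_utf16_units() applied to 
the units 0068 0061 0073 0068 return the same value, because the widened 
bytes and the utf-16 units are numerically identical.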



_______________________________________________
Gnustep-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnustep-dev
