Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()
I can't wait to try it with the plastic server... :-P

----- Original Message -----
From: Miguel de Icaza [EMAIL PROTECTED]
To: Andreas Nahr [EMAIL PROTECTED]
Cc: 'Robert Jordan' [EMAIL PROTECTED]; mono-devel-list@lists.ximian.com
Sent: Monday, December 03, 2007 3:45 PM
Subject: Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()

Hello,

First of all, I love the idea.

> Don't forget that 4 bytes per Hashcode isn't enough. You also need a
> boolean to store if the hash is already computed (as e.g. 0 is a valid
> hash, too).

You could assume that any string over N would contain the precomputed
hashcode immediately after the string in a sizeof(IntPtr) aligned 32-bit
location.

What N should be still needs to be measured.

Miguel
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()
>> Don't forget that 4 bytes per Hashcode isn't enough. You also need a
>> boolean to store if the hash is already computed (as e.g. 0 is a valid
>> hash, too).
>
> You could assume that any string over N would contain the precomputed
> hashcode immediately after the string in a sizeof(IntPtr) aligned 32-bit
> location.

I don't think precomputing (a.k.a. non-lazy init) is a good idea. I for one
have quite a few applications that handle enormous numbers of strings, and
none of those strings ever needs its hash code computed. I'd rather pay 1
(or 4) additional bytes and not precompute it. For other projects, however,
things will be different...

> 5) GetHashCode should never be called for a string that is not yet fully
> built (like in StringBuilder), so there is no worry about the string
> changing after the hash code field has been set

Well, string at least contains:

    internal unsafe void InternalSetChar (int idx, char val)
    internal unsafe void InternalSetLength (int newLength)

which may be useful for more advanced optimizations in case they aren't
already used.

Greetings
Andreas
Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()
Don't forget that 4 bytes per Hashcode isn't enough. You also need a
boolean to store if the hash is already computed (as e.g. 0 is a valid
hash, too). And then you would need one additional check for this boolean
per call.

And don't forget that strings within the corelib ARE mutable to some
extent.

Greetings
Andreas

-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Alan McGovern
Sent: Saturday, December 1, 2007 17:22
To: Robert Jordan
Cc: mono-devel-list@lists.ximian.com
Subject: [U-SPAM] Re: [Mono-dev] String.GetHashCode()

Also, just looking at the string source a bit more closely, it has a
GetCaseInsensitiveHashcode method too, so I'd assume that would need to be
cached too, which would mean 8 bytes would be needed. This wouldn't scale
well.

Fair enough. Twas just an idea.

Alan.

On Dec 1, 2007 4:09 PM, Robert Jordan [EMAIL PROTECTED] wrote:

> Tinco Andringa wrote:
>> (Woops, only replied to kamil)
>>
>> If Jerome is right and the overhead is only 4 bytes, then overhead
>> shouldn't be a problem at all. The worst case size of a string would be
>> 1 character, of 2 bytes + something to end it with, like an int
>> containing its length, 2 bytes, or a terminating character, probably 2
>> bytes too. Making it at least 4 bytes. A worst case scenario of
>
> Look at a heavy string consumer: [g]mcs. The average string it has to
> process is probably only 4-5 chars long. That's roughly 12 bytes. Adding
> 4 bytes for the hash code is a huge overhead that only pays off if
> GetHashCode is called frequently, and that is definitely not the common
> scenario for most strings.
>
> Robert
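The size argument above can be put into numbers. What follows is only a rough sketch in C#: the 12-byte baseline is Robert's estimate for a typical [g]mcs string, the 1-byte flag is the low end of Andreas's "1 (or 4)" range, and `HashFieldOverhead` and `Percent` are made-up names for illustration. Real object layout (headers, alignment, padding) is not modelled.

```csharp
using System;

static class HashFieldOverhead
{
    // Extra bytes expressed as a percentage of the baseline size.
    public static double Percent(int baselineBytes, int extraBytes)
    {
        return 100.0 * extraBytes / baselineBytes;
    }

    static void Main()
    {
        // Robert's rough figure for a 4-5 character string; an estimate, not a measurement.
        const int typicalStringBytes = 12;

        // One cached 32-bit hash code.
        Console.WriteLine("hash only:          {0:F0}%", Percent(typicalStringBytes, 4));
        // Hash code plus a one-byte "already computed" flag (ignoring padding).
        Console.WriteLine("hash + flag:        {0:F0}%", Percent(typicalStringBytes, 4 + 1));
        // Hash code plus a cached case-insensitive hash code (Alan's 8 bytes).
        Console.WriteLine("both hashes cached: {0:F0}%", Percent(typicalStringBytes, 4 + 4));
    }
}
```

On those assumptions a single cached hash adds about a third to the size of a typical string, which is the overhead Robert is objecting to.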
Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()
On 02/12/2007, Andreas Nahr [EMAIL PROTECTED] wrote:
> Don't forget that 4 bytes per Hashcode isn't enough. You also need a
> boolean to store if the hash is already computed (as e.g. 0 is a valid
> hash, too). And then you would need one additional check for this
> boolean per call.

Technically it would be safe (and no worse than current behaviour) to
recalculate the hash every time in the rare case that it's exactly zero.

> And don't forget that strings within the corelib ARE mutable to some
> extent.

That sounds somewhat more important.

Wouldn't it be better, in the cases where precomputing the hash would have
a large benefit, just to create a new class like PrecomputedHashString that
stores a string along with its hash? Then the application itself could
optimize for the cases where this matters by using the different class as
the key to its hash tables.

Have fun,

Avery
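The "recalculate when it's exactly zero" idea can be sketched in a few lines of C#. This is an illustration only, not Mono's String implementation: `CachedHashText`, its `Computations` counter and the polynomial hash are all made up for the example, and the wrapped text is assumed to be immutable.

```csharp
using System;

// Hypothetical stand-in for a string type that carries one extra 32-bit field.
sealed class CachedHashText
{
    private readonly string text;
    private int hash;   // 0 means "not computed yet" (or the hash really is 0)

    // Counts how often the hash was actually computed; for illustration only.
    public int Computations { get; private set; }

    public CachedHashText(string text)
    {
        if (text == null) throw new ArgumentNullException("text");
        this.text = text;
    }

    public override int GetHashCode()
    {
        int h = hash;
        if (h == 0)
        {
            // No separate flag: a genuine hash of 0 is simply recomputed on
            // every call, which is no worse than having no cache at all.
            h = Compute(text);
            Computations++;
            hash = h;
        }
        return h;
    }

    public override bool Equals(object obj)
    {
        CachedHashText other = obj as CachedHashText;
        return other != null && string.Equals(text, other.text, StringComparison.Ordinal);
    }

    // Simple polynomial hash chosen for the sketch; not the algorithm Mono uses.
    private static int Compute(string s)
    {
        unchecked
        {
            int h = 0;
            foreach (char c in s) h = 31 * h + c;
            return h;
        }
    }
}

static class ZeroSentinelDemo
{
    static void Main()
    {
        var cached = new CachedHashText("mono");
        cached.GetHashCode();
        cached.GetHashCode();
        Console.WriteLine(cached.Computations);   // computed once, then served from the field

        var empty = new CachedHashText("");        // hashes to 0 with this function
        empty.GetHashCode();
        empty.GetHashCode();
        Console.WriteLine(empty.Computations);    // recomputed on each call
    }
}
```

This answers the flag objection only. The cache still assumes the characters never change after the first call, which is exactly the mutability concern Andreas raises.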
Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()
2007/12/2, Avery Pennarun [EMAIL PROTECTED]:
> On 02/12/2007, Andreas Nahr [EMAIL PROTECTED] wrote:
>> Don't forget that 4 bytes per Hashcode isn't enough. You also need a
>> boolean to store if the hash is already computed (as e.g. 0 is a valid
>> hash, too). And then you would need one additional check for this
>> boolean per call.
>
> Technically it would be safe (and no worse than current behaviour) to
> recalculate the hash every time in the rare case that it's exactly zero.
>
>> And don't forget that strings within the corelib ARE mutable to some
>> extent.
>
> That sounds somewhat more important.
>
> Wouldn't it be better, in the cases where precomputing the hash would
> have a large benefit, just to create a new class like
> PrecomputedHashString that stores a string along with its hash? Then the
> application itself could optimize for the cases where this matters by
> using the different class as the key to its hash tables.

If only they didn't make string sealed ;)

> Have fun,
>
> Avery

--
Kamil Skalski
http://nazgul.omega.pl
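For what it's worth, a key class along the lines Avery describes does not have to derive from string; it can wrap one. Below is a minimal sketch in C#, assuming a hypothetical `PrecomputedHashString` that computes the hash once in its constructor. Only the class name comes from the thread; the members and the demo are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: stores a string together with its hash, computed once up front.
sealed class PrecomputedHashString : IEquatable<PrecomputedHashString>
{
    private readonly int hash;

    public string Value { get; private set; }

    public PrecomputedHashString(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        Value = value;
        hash = value.GetHashCode();   // paid once here, never again on lookups
    }

    public bool Equals(PrecomputedHashString other)
    {
        // Compare the cheap integer first, then fall back to the characters.
        return other != null
            && hash == other.hash
            && string.Equals(Value, other.Value, StringComparison.Ordinal);
    }

    public override bool Equals(object obj) { return Equals(obj as PrecomputedHashString); }

    public override int GetHashCode() { return hash; }

    public override string ToString() { return Value; }
}

static class WrapperKeyDemo
{
    static void Main()
    {
        var counts = new Dictionary<PrecomputedHashString, int>();
        var key = new PrecomputedHashString("System.Collections.Hashtable");

        counts[key] = 1;
        counts[key] = counts[key] + 1;   // both lookups reuse the stored hash

        // A second wrapper around equal text finds the same entry.
        Console.WriteLine(counts[new PrecomputedHashString("System.Collections.Hashtable")]);
    }
}
```

The cost of composition is the one Kamil hints at: the wrapper cannot be passed where a plain string is expected, so callers have to unwrap `Value` at those boundaries, and every key carries an extra object on top of the string itself.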