Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()

2007-12-03 Thread pablosantosluac
I can't wait to try it with the Plastic server... :-P
- Original Message - 
From: Miguel de Icaza [EMAIL PROTECTED]
To: Andreas Nahr [EMAIL PROTECTED]
Cc: 'Robert Jordan' [EMAIL PROTECTED]; mono-devel-list@lists.ximian.com
Sent: Monday, December 03, 2007 3:45 PM
Subject: Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()


 Hello,
 
First of all, I love the idea.
 
 Don't forget that 4 bytes per Hashcode isn't enough. You also need a
 boolean to store if the hash is already computed (as e.g. 0 is a valid
 hash, too).
 
You could assume that any string over N would contain the
 precomputed hashcode immediately after the string in a sizeof(IntPtr)
 aligned 32-bit location.
 
What N should be still needs to be measured.
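
A rough sketch of the layout arithmetic behind this suggestion, assuming the character data itself starts at a pointer-aligned address and ignoring any terminator. The names TrailingHashLayout, HashSlotOffset and HasHashSlot are hypothetical and do not exist in the corelib; the threshold N is left as a parameter because it has not been measured.

```csharp
using System;

// Hypothetical helper, for illustration only; not part of Mono's String.
public static class TrailingHashLayout
{
    // Round offset up to the next multiple of alignment (a power of two).
    public static int AlignUp(int offset, int alignment)
    {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Byte offset, measured from the start of the character data, at which
    // a 32-bit hash slot would sit for a string of the given length.
    // pointerSize stands in for sizeof(IntPtr): 4 or 8.
    public static int HashSlotOffset(int length, int pointerSize)
    {
        int charBytes = length * sizeof(char); // 2 bytes per UTF-16 char
        return AlignUp(charBytes, pointerSize);
    }

    // Only strings longer than the (still unmeasured) threshold N would
    // carry the extra slot.
    public static bool HasHashSlot(int length, int n)
    {
        return length > n;
    }
}
```

With these assumptions a 5-character string puts the slot at byte offset 12 on a 32-bit runtime and at 16 on a 64-bit one, so the real cost per string is the 4-byte slot plus whatever padding the alignment adds.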
 
 Miguel
 
 ___
 Mono-devel-list mailing list
 Mono-devel-list@lists.ximian.com
 http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()

2007-12-03 Thread Andreas Nahr
 Don't forget that 4 bytes per Hashcode isn't enough. You also need a 
 boolean to store if the hash is already computed (as e.g. 0 is a valid 
 hash, too).

You could assume that any string over N would contain the precomputed
hashcode immediately after the string in a sizeof(IntPtr) aligned 32-bit
location.


I don't think precomputing (a.k.a. non-lazy init) would be a good thing. I for
one have quite a few applications that handle enormous numbers of strings.
However, none of these strings ever needs its hash code computed. I'd rather
pay 1 (or 4) additional bytes and not precompute it.
For other projects, however, things will be different...

 5) GetHashCode should never be called for a string that is not yet fully
 built (like in StringBuilder), so there is no worry about the string
 changing after the hash code field has been set

Well, string at least contains:
internal unsafe void InternalSetChar (int idx, char val)
internal unsafe void InternalSetLength (int newLength)
which may be useful for more advanced optimizations in case they aren't
already used.

Greetings
Andreas



Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()

2007-12-02 Thread Andreas Nahr
Don't forget that 4 bytes per Hashcode isn't enough. You also need a boolean
to store if the hash is already computed (as e.g. 0 is a valid hash, too).
And then you would need one additional check for this boolean per call.
And don't forget that strings within the corelib ARE mutable to some extent.
 
Greetings
Andreas


From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On behalf of Alan
McGovern
Sent: Saturday, December 1, 2007 17:22
To: Robert Jordan
Cc: mono-devel-list@lists.ximian.com
Subject: [U-SPAM] Re: [Mono-dev] String.GetHashCode()


Also, just looking at the string source a bit more closely, it has a
GetCaseInsensitiveHashcode method too, so I'd assume that would need to be
cached as well, which would mean 8 bytes would be needed. This wouldn't scale
well.

Fair enough. Twas just an idea.

Alan.


On Dec 1, 2007 4:09 PM, Robert Jordan [EMAIL PROTECTED] wrote:


Tinco Andringa wrote:
 (Woops, only replied to kamil)

 If Jerome is right and the overhead is only 4 bytes, then overhead
 shouldn't be a problem at all. The worst case size of a string would 
 be 1 character, of 2 bytes + something to end it with, like an int
 containing its length, 2 bytes, or a terminating character, probably
 2 bytes too. Making it at least 4 bytes.  A worst case scenario of 


Look at a heavy string consumer: [g]mcs. The average string
it has to process is probably only 4-5 chars long.
That's roughly 12 bytes. Adding 4 bytes for the hash code
is a huge overhead that only pays off if GetHashCode is
called frequently, and that is definitely not a common scenario
for most of the strings.

Robert




Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()

2007-12-02 Thread Avery Pennarun
On 02/12/2007, Andreas Nahr [EMAIL PROTECTED] wrote:
 Don't forget that 4 bytes per Hashcode isn't enough. You also need a boolean
 to store if the hash is already computed (as e.g. 0 is a valid hash, too).
 And then you would need one additional check for this boolean per call.

Technically it would be safe (and no worse than current behaviour) to
recalculate the hash every time in the rare case that it's exactly
zero.
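
A minimal sketch of that idea, written as a separate wrapper rather than as a change to String itself: a cached value of 0 doubles as the "not computed yet" marker, so no extra boolean is stored. LazyHashedText, GetCachedHash and ComputeCount are hypothetical names, and the hash function is just an illustrative stand-in, not Mono's.

```csharp
using System;

// Hypothetical wrapper, for illustration only; not Mono's String.
public sealed class LazyHashedText
{
    private readonly string value;
    private int cachedHash; // 0 means "not cached yet"

    // Counts real computations; present only to make the behaviour visible.
    public int ComputeCount { get; private set; }

    public LazyHashedText(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        this.value = value;
    }

    public int GetCachedHash()
    {
        int h = cachedHash;
        if (h != 0) return h;      // cache hit: one comparison, no flag
        ComputeCount++;
        h = ComputeHash(value);
        cachedHash = h;            // a real hash of 0 is simply never cached
        return h;
    }

    // Illustrative stand-in for the real string hash.
    private static int ComputeHash(string s)
    {
        unchecked
        {
            int h = 0;
            foreach (char c in s) h = h * 31 + c;
            return h;
        }
    }
}
```

In this sketch the empty string hashes to 0, so it is recomputed on every call, which is exactly today's cost; every other string pays for the computation once.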

 And don't forget that strings within the corelib ARE mutable to some extent.

That sounds somewhat more important.

Wouldn't it be better, in the cases where precomputing the hash would
have a large benefit, just to create a new class like
PrecomputedHashString that stores a string along with its hash?  Then
the application itself could optimize for the cases where this matters
by using the different class as the key to its hash tables.
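
Something along these lines, as a sketch only: the member names and the choice of ordinal comparison are assumptions made for the example. It wraps the string by composition, so it does not need to derive from String.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the proposed class; member names are illustrative assumptions.
public sealed class PrecomputedHashString : IEquatable<PrecomputedHashString>
{
    private readonly string value;
    private readonly int hash;

    public PrecomputedHashString(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        this.value = value;
        this.hash = value.GetHashCode(); // computed once, up front
    }

    public string Value { get { return value; } }

    public bool Equals(PrecomputedHashString other)
    {
        // Compare the cheap stored hash first, then the characters.
        return other != null
            && hash == other.hash
            && string.Equals(value, other.value, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as PrecomputedHashString);
    }

    public override int GetHashCode() { return hash; }

    public override string ToString() { return value; }
}

public static class PrecomputedHashStringDemo
{
    // Builds a small table keyed by the wrapper and looks a key up again.
    public static int Lookup(string key)
    {
        var table = new Dictionary<PrecomputedHashString, int>();
        table[new PrecomputedHashString("mono")] = 1;
        table[new PrecomputedHashString("mcs")] = 2;

        int found;
        return table.TryGetValue(new PrecomputedHashString(key), out found) ? found : -1;
    }
}
```

Only the applications that actually hash their strings heavily pay the extra 4 bytes (plus the wrapper object), and plain strings stay as small as they are now.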

Have fun,

Avery


Re: [Mono-dev] [U-SPAM] Re: String.GetHashCode()

2007-12-02 Thread Kamil Skalski
2007/12/2, Avery Pennarun [EMAIL PROTECTED]:
 On 02/12/2007, Andreas Nahr [EMAIL PROTECTED] wrote:
  Don't forget that 4 bytes per Hashcode isn't enough. You also need a boolean
  to store if the hash is already computed (as e.g. 0 is a valid hash, too).
  And then you would need one additional check for this boolean per call.

 Technically it would be safe (and no worse than current behaviour) to
 recalculate the hash every time in the rare case that it's exactly
 zero.

  And don't forget that strings within the corelib ARE mutable to some extent.

 That sounds somewhat more important.

 Wouldn't it be better, in the cases where precomputing the hash would
 have a large benefit, just to create a new class like
 PrecomputedHashString that stores a string along with its hash?  Then
 the application itself could optimize for the cases where this matters
 by using the different class as the key to its hash tables.

If only they didn't make string sealed ;)


 Have fun,

 Avery



-- 
Kamil Skalski
http://nazgul.omega.pl