Xiao-Feng Li wrote: > On 7/4/07, Tim Ellison <[EMAIL PROTECTED]> wrote: >> Xiao-Feng Li wrote: >> >> <big snip> >> I just wanted to pick up on this... >> >> > (Note if we use a formula for hashing, it will be computed every time >> > it's accessed, we need consider the overhead of the computation. >> >> Not each time it is accessed, just when it is first stored, right? > > Tim, This is one of the design decisions made for DRLVM hashcode > implementation. Based on a common assumption that hashcode is not > frequently accessed (only a small ratio of objects are accessed for > hashcode, and only small ratio of applications use hashcode), we > usually don't want a constant storage for hashcode. Instead, we only > allocate the extra storage when the hashcode is accessed AND the > object is moved; Otherwise, we directly return the object address for > hashcode. Since most objects die before they are moved in a collection > (this is another common assumption we make for objects life spans), > most hashcodes will be accessed before they are stored.
Ah, ok - thanks. Hopefully we have not tipped the balance too far if we require DRLVM to perform some computation on the address before returning it. > So in this design of hashcode, we have some assumptions and make perf > optimizations based on them. It's unclear if the design is the best in > general, since common workloads don't have extensive hashcode > accesses, hence cannot really exhibit the design differences. (Btw, > Dacapo has some apps using hashcode heavily. ) Agreed, we will probably only notice these things as we scale up to large numbers of entries in a hashed collection. There are other algorithmic things we can do to improve HashMap too if we are prepared to bet on 'typical usage'. For example, new entries are always added to the head of the linked list for a given bucket. If we add them in their natural order (so the link list is ordered by hashcode value), then in general you would only have to search half the length of the link list to determine that the key does not exist. It would be a win with get's for absent keys, but a loss if there are more put's than get's. Regards, Tim
