Thanks for a thoughtful reply, Bob! Let's get the code to the trunk. Tim, Mark, Nathan, should we acquire a gold ticket to modify java.lang.Thread?
Thanks, Aleksey. On Fri, Jun 6, 2008 at 10:36 PM, Bob Lee <[EMAIL PROTECTED]> wrote: > On Fri, Jun 6, 2008 at 10:09 AM, Aleksey Shipilev < > [EMAIL PROTECTED]> wrote: > >> I'm not saying that your implementation is bad. On contrary, your >> implementation is much faster than original Harmony one. What I'm >> saying is, ThreadLocalBench is still 3x slower on Harmony/DRLVM than >> on Sun 1.6.0_05. That might be connected with poor codegeneration on >> Harmony JIT, some disabled optimizations there, etc. I don't know what >> the real cause is. That's the question to answer. >> > > It sounds like you're comparing DRLVM's JIT compiler to Sun's. I'm not at > all surprised by the outcome--Sun has quite a head start. > > >> Leaving Harmony/DRLVM aside, it's wonderful that your implementation >> is going neck-to-neck with Sun's on another VM. So, why don't to >> experiment and beat them? :) >> > > Sun's impl was finely tuned by Doug Lea, BTW. I do beat the RI in 3 tests, > they beat me in 2. When I say one impl beats another, we're really talking > about a few ns in either direction; there's really no point in comparing. I > think I've prioritized the correct use cases. I'm a little slower in tests > which instantiate and throw away millions of thread local variables. I don't > think that's the norm. > > Besides raw throughput, I also beat the RI in a few more areas: more > agressive clean up, smaller memory footprint, and fewer allocations and > garbage collections. > > I do plan to contribute it to the OpenJDK. > > >> From this point-of-view, I raised the >> question on rationale of having open-addressed + linear-probed hashmap >> instead of, say, chaining. Taking away caching opportunities, Knuth >> had analyzed the average lookup count for different probing schemes, >> and linear probing performed the baddest. > > > That doesn't apply here. Knuth used English words for keys. We generate well > dispersed hash codes. In practice, this implementation almost never actually > probes. When it does probe, it probes the same area of memory. > > >> Actually, we had recently >> moved IdentityHashMap from the exactly the same addressing to chaining >> [1], and results are pretty good. There's additional +80% on >> ThreadLocalBench and I expect we can get them on your implementation >> too! > > > In general, you should favor open addressing over chaining: less > indirection, fewer allocations, smaller memory footprint, better memory > locality (if you use linear probing). My guess is that IdentityHashMap just > had a flawed impl of open addressing--it's very easy to get something wrong. > > I could have used quadratic probing, double hashing, or cuckoo hashing, but > all of those are aimed at avoiding clustering. I generate well dispersed > hash codes in the first place, so they would have just added more complexity > and reduced memory locality with no benefit. > > Obviously, if you're using externally provided hash codes, you have to use > chaining, but that's not the case here. > > Bob >
