Re: RFR JDK-8011200 (was 7143928) : (coll) Optimize for Empty ArrayList and HashMap

Martin Buchholz Tue, 09 Apr 2013 19:57:31 -0700

Mike, thanks.

I don't see anything wrong in this version, although the ongoing
complexification and special-case-ification (with attendant risk of bugs)
of ArrayList and HashMap, the two most didactically important classes,
continue to bother me.  This change continues to feel marginal.  Anyways,
this is OK with me if it's OK with Alan.


The original code that shifts by one every time has a certain elegance, and
because it's rare to need more than one doubling, is also hard to beat
performance-wise.

 546             while (newCapacity < targetCapacity)
 547                 newCapacity <<= 1;

---

should there be an "else" added in the code below?

 563         if (table == EMPTY_TABLE) { 564
inflateTable(Math.max((int) (numKeysToBeAdded * loadFactor),
threshold)); 565         } 566
 567         /*
 568          * Expand the map if the map if the number of mappings to be added
 569          * is greater than or equal to threshold.  This is
conservative; the
 570          * obvious condition is (m.size() + size) >= threshold, but this
 571          * condition could result in a map with twice the
appropriate capacity,
 572          * if the keys to be added overlap with the keys already
in this map.
 573          * By using the conservative calculation, we subject ourself
 574          * to at most one extra resize.
 575          */
 576         if (numKeysToBeAdded > threshold) {

---

There are now so many "if (isEmpty())" checks that I wonder again whether
it would be better to use a null for the empty table, since null checks are
closer to free.

---

Style:
"""
Inflates the table.
"""

 285      * Inflate the table

---

288 // Find a power of 2 >= initialCapacity

There's no "initialCapacity" variable here.



On Tue, Apr 9, 2013 at 6:17 PM, Mike Duigou <[email protected]> wrote:

> Hello all;
>
> Another update to the webrev incorporating feedback from Alan and Martin.
>
> http://cr.openjdk.java.net/~mduigou/JDK-8011200/3/webrev/
>
> - The key change in this revision is that the capacity of both cloned()
> and deserialized instances is determined by the number of entries NOT the
> capacity of the source HashMap. As Alan says, we don't know how the clone
> or deserialized instance is going to be used. Without maintaining the
> original user specified initial capacity we're possibly better off not
> allocating excessive bucket table.
>
> Mike
>
>
>
> On Apr 6 2013, at 15:02 , Mike Duigou wrote:
>
> > Hello all;
> >
> > Another, and hopefully the final, update to the webrev for this issue.
> The revised webrev is here:
> >
> > http://cr.openjdk.java.net/~mduigou/JDK-8011200/1/webrev/
> >
> > The important changes in this revision:
> >
> > - I've removed the serialData change in HashMap. The implementation now
> reads the capacity and gracefully handles non-power of 2 values.
> >
> > - I'm not entirely convinced that having serialization emulate clone()
> for capacity handling is the best answer. I might also want to change
> clone() to size it's result based upon the number of mappings in the source
> rather its the capacity. Anybody have strong feelings about this to suggest
> one behaviour is obviously better?
> >
> > Any other final thoughts?
> >
> > Mike
> >
> > On Apr 2 2013, at 15:50 , Mike Duigou wrote:
> >
> >> Hello again;
> >>
> >> I have updated the patch with the received review feedback. The revised
> webrev is here:
> >>
> >> http://cr.openjdk.java.net/~mduigou/JDK-8011200/1/webrev/
> >>
> >> The important changes in this revision:
> >>
> >> - The behaviour of the readObject/writeObject serialization for both
> classes now more closely mirrors the behaviour of clone(). For ArrayList
> this means that the deserialized list has a capacity the same as the size.
> ie. as if trimToSize() was called. For HashMap, the opposite is true, the
> capacity is the same as was in effect when the object was serialized.
> (HashMap also tries to protect itself from nonsensical/harmful input). The
> implementation changes to serialization preserve forward and backward
> compatibility--all serialized objects are compatible with all
> implementations. I will file a spec change request for the addition of ", a
> power of 2" to the @serialData tag for this existing but previously
> unstated requirement.
> >>
> >> - Use of Arrays.fill has been reverted. I did change one fill case so
> that the loop can be optimized. (size field was being updated with each
> iteration). I very slightly expanded the docs.
> >>
> >> This is starting to look like a nice set of changes.
> >>
> >> Mike
> >>
> >> On Apr 1 2013, at 21:44 , Mike Duigou wrote:
> >>
> >>> Hello all;
> >>>
> >>> Last night while pushing another changeset I accidentally pushed a
> changeset for JKD-7143928. Since the review and testing was not complete on
> this issue I have backed out that changeset and created a new bug number to
> continue development. The new webrev to complete the review is:
> >>>
> >>> http://cr.openjdk.java.net/~mduigou/JDK-8011200/0/webrev/
> >>>
> >>> It is currently unchanged from the last posted changeset for 7143928.
> >>>
> >>> Mike
> >>>
> >>> On Apr 1 2013, at 19:00 , Mike Duigou wrote:
> >>>
> >>>> Hello all;
> >>>>
> >>>> I have posted an updated version of the empty ArrayList and HashMap
> patch.
> >>>>
> >>>> http://cr.openjdk.java.net/~mduigou/JDK-7143928/1/webrev/
> >>>>
> >>>> This revised implementation introduces *no new fields* to either
> class. For ArrayList the lazy allocation of the backing array occurs only
> if the list is created at default size. According to our performance
> analysis team, approximately 85% of ArrayList instances are created at
> default size so this optimization will be valid for an overwhelming
> majority of cases.
> >>>>
> >>>> For HashMap, creative use is made of the threshold field to track the
> requested initial size until the bucket array is needed. On the read side
> the empty map case is tested with isEmpty(). On the write size a comparison
> of (table == EMPTY_TABLE) is used to detect the need to inflate the bucket
> array. In readObject there's a little more work to try to choose an
> efficient initial capacity.
> >>>>
> >>>> Mike
> >>>>
> >>>> On Mar 26 2013, at 17:25 , Mike Duigou wrote:
> >>>>
> >>>>> Hello all;
> >>>>>
> >>>>> This is a review for optimization work that came out of internal
> analysis of Oracle's Java applications. It's based upon analysis that shows
> that in large applications as much as 10% of maps and lists are initialized
> but never receive any entries. A smaller number spend a large proportion of
> their lifetime empty. We've found similar results across other workloads as
> well. This patch is not a substitute for pre-sizing your collections and
> maps--doing so will *always* have better results.
> >>>>>
> >>>>> This patch extends HashMap and ArrayList to provide special handling
> for newly created instances that avoids creating the backing array until
> needed. There is a very small additional cost for detecting when to inflate
> the map or list that is measurable in interpreted tests but disappears in
> JITed code.
> >>>>>
> >>>>> http://cr.openjdk.java.net/~mduigou/JDK-7143928/0/webrev/
> >>>>>
> >>>>> We expect that should this code prove successful in Java 8 it will
> be backported to Java 7 updates.
> >>>>>
> >>>>> The unit test may appear to be somewhat unrelated. It was created
> after resolving a bug in an early version of this patch to detect the issue
> encountered (LinkedHashMap.init() was not being called in readObject() when
> the map was empty).
> >>>>>
> >>>>> Mike
> >>>>
> >>>
> >>
> >
>
>

Re: RFR JDK-8011200 (was 7143928) : (coll) Optimize for Empty ArrayList and HashMap

Reply via email to