> `TreeMap` sub-maps use the default `IteratorSpliterator` implementation for 
> `TreeMap$EntrySetView` which is slow for some operations, because 
> `EntrySetView.size()` iterates all elements. This is most trivially shown by 
> something like `largeTreeMap.tailMap(0L, false).entrySet().limit(1).count()` 
> taking a long time. This showed up in my application, where it was trivial to 
> mitigate by switching to a for loop, but I think the fix is easy enough.
> 
> `keySet()` does not have the same problem, as it provides a custom 
> `Spliterator` implementation which is not `Spliterator.SIZED`, and returns 
> `Long.MAX_VALUE` for `estimateSize()` (which is the recommended approach when 
> the size is expensive to compute). I'm *assuming* this optimization was 
> simply missed for the EntryIterator in the original implementation, but I 
> don't know for sure.
> 
> This patch fixes the issue by using `Spliterators.spliteratorUnknownSize` for 
> `EntrySetView`, marking the size as unknown avoids accidentally iterating the 
> whole thing for simple operations.
> 
> 
> Basic performance test: `map.tailMap(0L, 
> false).entrySet().stream().limit(1).count()` for a `TreeMap` with 
> `10_000_000` entries.
> 
> Before (keySet is fast using `SubMapKeyIterator`, entrySet is slow):
> 
> class java.util.TreeMap$KeySet
>     .stream().limit(1).count() took 0.046ms
>     spliterator = SubMapKeyIterator, estimateSize() = 9223372036854775807
> class java.util.TreeMap$AscendingSubMap$AscendingEntrySetView
>     .stream().limit(1).count() took 218ms
>     spliterator = IteratorSpliterator, estimateSize() = 9999999 
> 
> 
> After (entrySet is now fast):
> 
> class java.util.TreeMap$KeySet
>       .stream().limit(1).count() took 0.017ms
>       spliterator = SubMapKeyIterator, estimateSize() = 9223372036854775807
> class java.util.TreeMap$AscendingSubMap$AscendingEntrySetView
>       .stream().limit(1).count() took 0.115419ms
>       spliterator = IteratorSpliterator, estimateSize() = 9223372036854775807
> 
> 
> Behaviour is tested by the test cases added in 
> https://bugs.openjdk.org/browse/JDK-8376698.
> 
> The Set default stream() implementation uses spliterator(), and since our 
> spliterator() implementation is no longer late-binding (it eagerly creates 
> the iterator), we override stream() to use a late-binding wrapper.

Oli Gillespie has updated the pull request incrementally with one additional 
commit since the last revision:

  Make stream() late-binding

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28608/files
  - new: https://git.openjdk.org/jdk/pull/28608/files/144216ba..4ebc2526

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28608&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28608&range=03-04

  Stats: 8 lines in 2 files changed: 7 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28608.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28608/head:pull/28608

PR: https://git.openjdk.org/jdk/pull/28608

Reply via email to