On Wed, 5 Jun 2024 05:40:12 GMT, Tobias Hartmann <thartm...@openjdk.org> wrote:

>> MaxXing has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Revert last commit, and push the `LoadNode` back to the worklist to wait 
>> for the dead code to be removed.
>
> Impressive results! I haven't looked at the change yet but here are a few 
> questions / requests:
> - Could you add a screenshot of the IR of the case you are describing?
> - Wouldn't it help to add the LoadNode back to the IGVN worklist and wait for 
> the dead path to be removed?
> - Could you add an [IR 
> framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md)
>  test that verifies that the optimization works as expected?
> 
> Thanks,
> Tobias

@TobiHartmann  Hi Tobias, thanks for your reply!

> Could you add a screenshot of the IR of the case you are describing?

Sure. Here's a simple example of iterating over the keys of `ConcurrentHashMap`:


public long sumMapKeys() {
  long sum = 0;
  Enumeration it = map.keys();
  while (it.hasMoreElements()) {
    sum += (Long) it.nextElement();
  }
  return sum;
}


And here's what `-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations` says:


JavaObject(6) NoEscape(NoEscape) [ 183F 189F 205F 196F 191F 179F 186F 202F 208F 
233F 637F 1069F [ 104 109 ]]     92  Allocate  === 76 6 69 8 1 (90 89 24 1 1 1 
22 1 1 43 77 87 ) [[ 93 94 95 102 103 104 ]]  rawptr:NotNull ( int:>=0, 
java/lang/Object:NotNull *, bool, top, bool ) ConcurrentHashMap::keys @ bci:16 
(line 2152) MyBenchmark::sumMapKeys @ bci:6 (line 105) !jvms: 
ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 
(line 105)
LocalVar(60) [ 92P [ 109 183b 189b 205b ]]    104  Proj  === 92  [[ 105 109 183 
189 205 ]] #5 !jvms: ConcurrentHashMap::keys @ bci:16 (line 2152) 
MyBenchmark::sumMapKeys @ bci:6 (line 105)
LocalVar(103) [ 104 92P [ 196b 191b 179b 186b 202b 208b 233b 637b 1069b ]]    
109  CheckCastPP  === 106 104  [[ 1801 1771 1754 1584 1503 1503 1490 1490 1466 
1466 1451 1451 208 1393 1381 1381 179 179 186 186 208 196 191 191 196 202 202 
228 297 233 233 1363 1363 1393 1321 1321 1311 637 637 648 648 998 988 1311 1301 
477 477 487 487 497 497 569 1301 672 685 767 672 978 920 539 539 1069 1069 557 
557 1128 569 631 1061 685 631 ]]  
#java/util/concurrent/ConcurrentHashMap$KeyIterator 
(java/util/Iterator,java/util/Enumeration):NotNull:exact *  
Oop:java/util/concurrent/ConcurrentHashMap$KeyIterator 
(java/util/Iterator,java/util/Enumeration):NotNull:exact * !jvms: 
ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 
(line 105)

NotScalar (Field load)  109  CheckCastPP  === 106 104  [[ 1801 1771 1754 1584 
1503 1503 1490 1490 1128 569 1451 1451 208 1393 1381 1381 672 685 1069 1069 208 
196 191 191 196 978 685 631 297 767 672 569 1301 1393 1321 1321 1311 557 557 
631 1061 998 988 1311 1301 477 477 487 487 497 497 ]]  
#java/util/concurrent/ConcurrentHashMap$KeyIterator 
(java/util/Iterator,java/util/Enumeration):NotNull:exact *,iid=92  
Oop:java/util/concurrent/ConcurrentHashMap$KeyIterator 
(java/util/Iterator,java/util/Enumeration):NotNull:exact *,iid=92 !jvms: 
ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 
(line 105)
  >>>>  2186  LoadI  === _ 1973 191  [[ 2185 ]]  
@java/util/concurrent/ConcurrentHashMap$Traverser+12 *, name=index, idx=9; #int 
!orig=[2178],[2171],[1373] !jvms: ConcurrentHashMap$Traverser::advance @ bci:51 
(line 3369) ConcurrentHashMap$KeyIterator::next @ bci:28 (line 3468) 
ConcurrentHashMap$KeyIterator::nextElement @ bci:1 (line 3472) 
MyBenchmark::sumMapKeys @ bci:21 (line 107)


It shows that scalar replacement is aborted due to field load 2186:

<img width="909" alt="ir-diff" 
src="https://github.com/openjdk/jdk/assets/5129820/2e35769a-6e0f-4c8a-96ec-09923a5eb2c0";>

As we can see its memory is a Phi, and it should be split by 
`LoadNode::split_through_phi` if its address `109 CheckCastPP` dominates the 
control flow of the Phi node `1330 Region`.

The control node of `CheckCastPP` is `106 Proj`:

<img width="360" alt="cast" 
src="https://github.com/openjdk/jdk/assets/5129820/ebf6f7f8-fcd1-4d83-8734-3f1864dcc946";>

And it does dominate 1330, although not that obvious:

<img width="960" alt="cfg" 
src="https://github.com/openjdk/jdk/assets/5129820/e53b25de-c966-4f89-b30e-54ccc35f645b";>

But `Node::dominates` think it doesn't because of the dead control flow.

> Wouldn't it help to add the LoadNode back to the IGVN worklist and wait for 
> the dead path to be removed?

I tried to revert the change of main algorithm of `Node::dominates`, and just 
add some code to add the LoadNode back to the worklist if we met dead path. It 
works, still makes the iteration ~30% faster:


Benchmark                                  (nkeys)  Mode  Cnt        Score      
 Error  Units
Maps.testConcurrentHashMapIterators          10000  avgt   15   312720.415 ±  
3255.500  ns/op


Thanks for pointing this out. I updated this PR, and the latest commit is 
passing test tier1-4.

> Could you add an [IR 
> framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md)
>  test that verifies that the optimization works as expected?

I'm still trying to reproduce this case without using `ConcurrentHashMap`, but 
haven't found a way yet. Can I add an IR test that depends on 
`ConcurrentHashMap`? (It might need to be updated if the implementation of 
`ConcurrentHashMap` changes, I guess.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19496#issuecomment-2151588662

Reply via email to