gortiz commented on PR #8766:
URL: https://github.com/apache/pinot/pull/8766#issuecomment-1137183022

   I've added two new commits, creating what I called V2 and V3, and ran the 
benchmark for both. Taking into consideration what @richardstartin said 
about IdentityHashMap, I ran these tests twice: once using an IdentityHashMap 
and once using a normal HashMap.
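   For context on that discussion, the difference is that a HashMap keyed by String compares keys with `equals()`, while an IdentityHashMap compares them with `==`, so two distinct String instances with the same content behave differently. A minimal standalone illustration (not Pinot code):

   ```java
   import java.util.HashMap;
   import java.util.IdentityHashMap;
   import java.util.Map;

   public class IdentityVsHashMap {
       public static void main(String[] args) {
           // Two distinct String instances with equal content, as happens when
           // the same column name arrives from different callers.
           String a = new String("myColumn");
           String b = new String("myColumn");

           Map<String, Integer> hashMap = new HashMap<>();
           hashMap.put(a, 1);
           // HashMap compares keys with equals(), so the lookup hits.
           System.out.println(hashMap.get(b)); // 1

           Map<String, Integer> identityMap = new IdentityHashMap<>();
           identityMap.put(a, 1);
           // IdentityHashMap compares keys with ==, so the lookup misses.
           System.out.println(identityMap.get(b)); // null
       }
   }
   ```

   The identity comparison is cheaper per lookup, but it only works when the cache is keyed by the exact same object instance every time.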
   
   The first commit, V2, applies what @Jackie-Jiang suggested: cache not 
only the hashes but also the converted values, and instead of doing that 
lazily, do it eagerly while the expression is visited during the prepare 
phase. This implies that the key used in the cache is a `Predicate`, not the 
`literals`. I also had to give up type safety, because the value for the 
EQ operators is a single `CachedValue` while the value for the IN operator is 
a list of them. I could create two maps instead, but I didn't like that 
solution either. Which one would you prefer? Do you have another solution in 
mind?
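   
   To make the trade-off concrete, here is a sketch of that shape. All names (`Predicate`, `CachedValue`, the cache class itself) are hypothetical stand-ins, not Pinot's actual classes; the point is the `Object`-valued map that holds either a single value or a list:

   ```java
   import java.util.IdentityHashMap;
   import java.util.List;
   import java.util.Map;

   public class PredicateValueCache {
       // Hypothetical stand-ins for the predicate and its cached converted value.
       public interface Predicate {}

       public static class CachedValue {
           public final Object converted;
           public final int hash;
           public CachedValue(Object converted, int hash) {
               this.converted = converted;
               this.hash = hash;
           }
       }

       // The value type is Object because EQ predicates cache a single CachedValue
       // while IN predicates cache a List<CachedValue> -- this is the lost type
       // safety. Keys are compared by identity, matching the per-instance caching.
       private final Map<Predicate, Object> _cache = new IdentityHashMap<>();

       public void putEq(Predicate p, CachedValue v) {
           _cache.put(p, v);
       }

       public void putIn(Predicate p, List<CachedValue> vs) {
           _cache.put(p, vs);
       }

       public CachedValue getEq(Predicate p) {
           return (CachedValue) _cache.get(p);
       }

       @SuppressWarnings("unchecked")
       public List<CachedValue> getIn(Predicate p) {
           return (List<CachedValue>) _cache.get(p);
       }
   }
   ```

   The alternative mentioned above would split `_cache` into two maps, one `Map<Predicate, CachedValue>` and one `Map<Predicate, List<CachedValue>>`, trading a second lookup structure for the unchecked casts.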
   
   The second commit, V3, has a bigger impact (it cuts the cost in half!). 
The idea here is to add a cache of DataSource in ImmutableSegment. The 
flamegraphs created from the benchmark make it clear that a lot of time was 
spent computing this value. That makes sense: there are a lot of segments and 
we create a new instance for each one. In fact the instance creation is quite 
expensive not because it has to allocate memory, but because the column names 
used as keys are different String instances with the same content. This forces 
HashMap to compare them with equals, which is relatively expensive when the 
operation itself is on the order of microseconds.
   
   In the benchmark each segment is read many times, so having the 
DataSource cached on every execution but the first is a huge advantage. In a 
Pinot server that condition should also hold, and the improvement should be 
applicable to other methods that heavily use `getDataSource()`. As I said in 
another comment, I'm not sure about the correctness of this change. It seems 
correct, but I would like confirmation from someone who knows the codebase 
better.
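   
   A minimal sketch of the kind of per-segment cache V3 describes, with a hypothetical `DataSource` stand-in rather than Pinot's real class:

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   public class CachedSegment {
       // Hypothetical stand-in for the real DataSource, whose construction
       // is what the flamegraphs showed as expensive.
       public static class DataSource {
           public final String column;
           public DataSource(String column) {
               this.column = column;
           }
       }

       private final Map<String, DataSource> _dataSourceCache = new ConcurrentHashMap<>();

       // Instead of building a new DataSource on every call, build it once per
       // column and serve the cached instance afterwards.
       public DataSource getDataSource(String column) {
           return _dataSourceCache.computeIfAbsent(column, this::createDataSource);
       }

       private DataSource createDataSource(String column) {
           // Expensive construction in the real segment; trivial here.
           return new DataSource(column);
       }
   }
   ```

   Using `computeIfAbsent` keeps the cache thread-safe and only pays the equals-based key comparison once per column instead of on every query; the correctness question is whether a DataSource is actually safe to share across queries.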
   
   The results of the benchmark are the following:
   
   Before any change:
   ```
   Benchmark                           (_numRows)  (_numSegments)  Mode  Cnt    Score   Error  Units
   BenchmarkServerSegmentPruner.query          10              10  avgt    5    1.426 ± 0.003  us/op
   BenchmarkServerSegmentPruner.query          10             100  avgt    5   14.719 ± 1.342  us/op
   BenchmarkServerSegmentPruner.query          10            1000  avgt    5  174.879 ± 5.115  us/op
   ```
   V1 (yesterday):
   ```
   Benchmark                           (_numRows)  (_numSegments)  Mode  Cnt    Score    Error  Units
   BenchmarkServerSegmentPruner.query          10              10  avgt    5    0.972 ±  0.003  us/op
   BenchmarkServerSegmentPruner.query          10             100  avgt    5   10.149 ±  0.046  us/op
   BenchmarkServerSegmentPruner.query          10            1000  avgt    5  135.063 ± 14.670  us/op
   ```
   V2
   ```
   V2 with HashMap
   Benchmark                                (_numRows)  (_numSegments)  Mode  Cnt    Score   Error  Units
   BenchmarkColumnValueSegmentPruner.query          10              10  avgt    5    1.325 ± 0.029  us/op
   BenchmarkColumnValueSegmentPruner.query          10             100  avgt    5   12.851 ± 0.171  us/op
   BenchmarkColumnValueSegmentPruner.query          10            1000  avgt    5  138.314 ± 2.238  us/op
   
   V2 with IdentityHashMap
   Benchmark                                (_numRows)  (_numSegments)  Mode  Cnt    Score   Error  Units
   BenchmarkColumnValueSegmentPruner.query          10              10  avgt    5    1.110 ± 0.001  us/op
   BenchmarkColumnValueSegmentPruner.query          10             100  avgt    5   10.659 ± 0.139  us/op
   BenchmarkColumnValueSegmentPruner.query          10            1000  avgt    5  120.608 ± 0.159  us/op
   ```
   
   V3
   ```
   V3 with HashMap
   Benchmark                                (_numRows)  (_numSegments)  Mode  Cnt   Score   Error  Units
   BenchmarkColumnValueSegmentPruner.query          10              10  avgt    5   0.971 ± 0.003  us/op
   BenchmarkColumnValueSegmentPruner.query          10             100  avgt    5   4.843 ± 0.045  us/op
   BenchmarkColumnValueSegmentPruner.query          10            1000  avgt    5  72.658 ± 0.735  us/op
   
   V3 with IdentityHashMap
   Benchmark                                (_numRows)  (_numSegments)  Mode  Cnt   Score   Error  Units
   BenchmarkColumnValueSegmentPruner.query          10              10  avgt    5   0.556 ± 0.001  us/op
   BenchmarkColumnValueSegmentPruner.query          10             100  avgt    5   4.964 ± 0.268  us/op
   BenchmarkColumnValueSegmentPruner.query          10            1000  avgt    5  58.304 ± 0.382  us/op
   ```
   
   I cannot print them all in a single JMH execution because each variant 
requires some modifications to the code. I will try to edit this message to 
add a table that makes the difference clear, but the TL;DR is that by merging 
these changes we should get an almost 57% improvement when pruning 1000 
segments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

