[GitHub] [pinot] gortiz commented on pull request #8766: Optimize ColumnValueSegmentPruner by caching value hashes

GitBox Wed, 25 May 2022 00:09:33 -0700


gortiz commented on PR #8766:
URL: https://github.com/apache/pinot/pull/8766#issuecomment-1136857476


   > I feel a better way to optimize this segment pruner would be to first 
pre-process the predicates (convert the value, compute the hash etc.), then use 
the pre-processed values to prune each segment. This can avoid the overhead of 
processing the predicate for each segment. Since the `FilterContext` won't be 
changed, we should be able to use the identity map to store the mapping from 
`Predicate` to the pre-computed values
   
   That sounds like a good idea and would be worth pursuing as a future 
improvement, although I'm not sure how much priority it deserves. JFR 
profiles taken while running the benchmarks show that most of the time is 
spent in `DataSource dataSource = dataSourceCache.computeIfAbsent(column, 
segment::getDataSource);`.
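
For reference, the pre-processing idea quoted above could look roughly like the sketch below. It is only an illustration under stated assumptions: `EqPredicateSketch`, `SegmentSummarySketch` and `PreprocessedPrunerSketch` are hypothetical stand-ins, not Pinot classes, and a plain set of hashes takes the place of whatever per-segment structure the real pruner consults.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical equality predicate (column = value); not Pinot's Predicate. */
final class EqPredicateSketch {
  final String column;
  final String value;

  EqPredicateSketch(String column, String value) {
    this.column = column;
    this.value = value;
  }
}

/** Hypothetical per-segment summary: hashes of the values present in each column. */
final class SegmentSummarySketch {
  final Map<String, Set<Integer>> valueHashesByColumn;

  SegmentSummarySketch(Map<String, Set<Integer>> valueHashesByColumn) {
    this.valueHashesByColumn = valueHashesByColumn;
  }
}

/** Hashes every predicate value once per query, then reuses the result for each segment. */
final class PreprocessedPrunerSketch {
  // Keyed by object identity, assuming the predicate objects are not replaced while pruning.
  private final Map<EqPredicateSketch, Integer> hashByPredicate = new IdentityHashMap<>();

  PreprocessedPrunerSketch(List<EqPredicateSketch> predicates) {
    for (EqPredicateSketch predicate : predicates) {
      hashByPredicate.put(predicate, predicate.value.hashCode());
    }
  }

  /** Returns true when the segment cannot contain the predicate value and may be skipped. */
  boolean canPrune(SegmentSummarySketch segment, EqPredicateSketch predicate) {
    Integer hash = hashByPredicate.get(predicate);
    Set<Integer> hashes = segment.valueHashesByColumn.get(predicate.column);
    if (hash == null || hashes == null) {
      return false; // Not enough information: keep the segment.
    }
    return !hashes.contains(hash);
  }
}
```

The per-segment work is then reduced to a map lookup and a set membership check; the value hashing happens once in the constructor.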
   
   If we still want to focus on improving the performance of this class, I 
think we should look at how to make `getDataSource` faster. Specifically, 
`ImmutableSegmentImpl` should be able to cache these values. I don't know the 
codebase that well, but it seems that these data sources are immutable, 
relatively expensive to create, and used several times during the lifecycle 
of an `ImmutableSegmentImpl`.
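
A minimal sketch of what segment-level caching might look like, assuming data sources really are immutable once built. `SketchDataSource` and `CachingSegmentSketch` are hypothetical names used for illustration, not the actual `DataSource` or `ImmutableSegmentImpl` code, and the `loader` function stands in for whatever currently builds a data source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a per-column data source. */
final class SketchDataSource {
  final String column;

  SketchDataSource(String column) {
    this.column = column;
  }
}

/** Hypothetical immutable segment that memoizes its data sources by column name. */
final class CachingSegmentSketch {
  private final Function<String, SketchDataSource> loader;
  private final Map<String, SketchDataSource> cache = new ConcurrentHashMap<>();

  CachingSegmentSketch(Function<String, SketchDataSource> loader) {
    this.loader = loader;
  }

  /** Builds the data source on first access and returns the same instance afterwards. */
  SketchDataSource getDataSource(String column) {
    return cache.computeIfAbsent(column, loader);
  }
}
```

With something like this, the cost of building a data source would be paid once per segment and column rather than once per query, at the price of keeping the instances alive for as long as the segment is loaded.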


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

