Jackie-Jiang commented on a change in pull request #6776:
URL: https://github.com/apache/incubator-pinot/pull/6776#discussion_r612697831
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java
##########
@@ -238,6 +237,66 @@ private boolean pruneRangePredicate(IndexSegment segment, RangePredicate rangePr
return false;
}
+ /**
+ * For IN predicate, segment will not be pruned if the size of values is greater than threshold
+ * Prune the segment based on:
+ * <ul>
+ * <li>Column min/max value</li>
+ * </ul>
+ * Returns:
+ * <ul>
+ * <li> true if segment can be pruned </li>
+ * <li> false if size of values > threshold or any of the value is greater than min value or smaller than max value of segment</li>
+ * </ul>
+ */
+ private boolean pruneInPredicate(IndexSegment segment, InPredicate inPredicate, Map<String, DataSource> dataSourceCache) {
+ String column = inPredicate.getLhs().getIdentifier();
+ DataSource dataSource = dataSourceCache.computeIfAbsent(column, segment::getDataSource);
+ // NOTE: Column must exist after DataSchemaSegmentPruner
+ assert dataSource != null;
+ DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
+ List<String> values = inPredicate.getValues();
+ //check max threshold value
+ if (values.size() > _inPredicateThreshold) {
+ return false;
+ }
+
+ for (String value : values) {
+ Comparable inValue = convertValue(value, dataSourceMetadata.getDataType());
+ if (!checkMinMaxRange(dataSourceMetadata, inValue)) {
+ return false;
+ }
+ }
+ return true;
+ }
+
+ /**
+ * Check if the comparable value is within min/max range
+ * <ul>
+ * <li>Column min/max value</li>
+ * </ul>
+ * Returns:
+ * <ul>
+ * <li> true if the value is smaller than min value or value is greater than max value</li>
+ * <li> false if the value is greater than min value or value is smaller than max value</li>
+ * </ul>
+ */
+ private boolean checkMinMaxRange(DataSourceMetadata dataSourceMetadata, Comparable value) {
Review comment:
Let's keep it this way then. This is cleaner, and should have trivial
performance impact.
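
For reference, the pruning rule described in the Javadoc above can be sketched in isolation. The body of `checkMinMaxRange` is not shown in this hunk, so the sketch below is an assumption about the intended semantics, not the PR's code: a segment is prunable only if the value list is within the threshold and every value falls outside the segment's `[min, max]` range. The class and method names (`InPredicatePruneSketch`, `isOutOfRange`, `canPrune`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class InPredicatePruneSketch {

  /** Returns true if value lies outside [min, max], i.e. no row in the segment can equal it. */
  static <T extends Comparable<T>> boolean isOutOfRange(T value, T min, T max) {
    return value.compareTo(min) < 0 || value.compareTo(max) > 0;
  }

  /**
   * Returns true if the segment can be pruned for an IN predicate:
   * the value list is no larger than the threshold and every value is out of range.
   */
  static <T extends Comparable<T>> boolean canPrune(List<T> values, T min, T max, int threshold) {
    if (values.size() > threshold) {
      // Too many values to check cheaply; keep the segment.
      return false;
    }
    for (T value : values) {
      if (!isOutOfRange(value, min, max)) {
        // At least one value may be present in the segment.
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Segment column range is [10, 20] in all three cases.
    System.out.println(canPrune(Arrays.asList(1, 2), 10, 20, 5));    // all values below min
    System.out.println(canPrune(Arrays.asList(1, 15), 10, 20, 5));   // 15 is inside the range
    System.out.println(canPrune(Arrays.asList(1, 2, 3), 10, 20, 2)); // list exceeds the threshold
  }
}
```

With these semantics, a helper that returns true for "out of range" reads more naturally under a name like `isOutOfRange` than `checkMinMaxRange`, though that naming is only a suggestion.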
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java
##########
@@ -59,8 +62,11 @@
@SuppressWarnings({"rawtypes", "unchecked"})
public class ColumnValueSegmentPruner implements SegmentPruner {
+ private int _inPredicateThreshold;
+
@Override
public void init(PinotConfiguration config) {
+ _inPredicateThreshold = config.getProperty(Server.CONFIG_OF_VALUE_PRUNER_IN_PREDICATE_THRESHOLD, Server.DEFAULT_VALUE_PRUNER_IN_PREDICATE_THRESHOLD);
Review comment:
This is incorrect: by the time the config is passed in from
`SegmentPrunerProvider`, the prefix has already been removed, so the key to
use here is `inpredicate.threshold`.
The test only passes because it passes the top-level config directly into
this pruner class, which is not what happens in the production code.
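
To illustrate the mismatch, here is a minimal sketch using plain maps instead of `PinotConfiguration`. The prefix string, the default value, and the `subset` helper are hypothetical stand-ins; the thread only states that the prefix is removed before the pruner sees the config and that the remaining key is `inpredicate.threshold`.

```java
import java.util.HashMap;
import java.util.Map;

public class PrunerConfigSketch {
  // Hypothetical prefix; the real one used by SegmentPrunerProvider is not shown in this thread.
  static final String PRUNER_PREFIX = "pruner.columnvaluesegmentpruner.";
  // Key named in the review comment.
  static final String IN_PREDICATE_THRESHOLD_KEY = "inpredicate.threshold";
  // Hypothetical default, for illustration only.
  static final int DEFAULT_IN_PREDICATE_THRESHOLD = 10;

  /** Mimics a provider handing the pruner only the keys under its prefix, with the prefix removed. */
  static Map<String, String> subset(Map<String, String> config, String prefix) {
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> entry : config.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        result.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  /** Reads the threshold under the given key, falling back to the default if the key is absent. */
  static int readThreshold(Map<String, String> config, String key) {
    String value = config.get(key);
    return value == null ? DEFAULT_IN_PREDICATE_THRESHOLD : Integer.parseInt(value);
  }

  public static void main(String[] args) {
    Map<String, String> topLevel = new HashMap<>();
    topLevel.put(PRUNER_PREFIX + IN_PREDICATE_THRESHOLD_KEY, "25");

    // Production path: the pruner only sees the prefix-stripped subset.
    Map<String, String> prunerConfig = subset(topLevel, PRUNER_PREFIX);
    // Full key misses and silently falls back to the default.
    System.out.println(readThreshold(prunerConfig, PRUNER_PREFIX + IN_PREDICATE_THRESHOLD_KEY)); // 10
    // Prefix-less key finds the configured value.
    System.out.println(readThreshold(prunerConfig, IN_PREDICATE_THRESHOLD_KEY)); // 25

    // Test path: passing the top-level config directly makes the full key appear to work.
    System.out.println(readThreshold(topLevel, PRUNER_PREFIX + IN_PREDICATE_THRESHOLD_KEY)); // 25
  }
}
```

The failure mode is silent: the lookup with the full key does not throw, it just returns the default, so a test that exercises the production wiring (prefix-stripped config) would catch it.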