Jackie-Jiang commented on a change in pull request #6776:
URL: https://github.com/apache/incubator-pinot/pull/6776#discussion_r612012764
##########
File path:
pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
##########
@@ -237,6 +237,8 @@
"pinot.server.instance.realtime.alloc.offheap.direct";
public static final String PREFIX_OF_CONFIG_OF_PINOT_FS_FACTORY =
"pinot.server.storage.factory";
public static final String PREFIX_OF_CONFIG_OF_PINOT_CRYPTER =
"pinot.server.crypter";
+ public static final int DEFAULT_VALUE_FOR_IN_PREDICATE_THRESHOLD = 10;
+ public static final String CONFIG_THRESHOLD_FOR_IN_PREDICATE=
"pinot.server.query.executor.pruner.columnvaluesegmentpruner.in.threshold";
Review comment:
```suggestion
public static final String CONFIG_OF_VALUE_PRUNER_IN_PREDICATE_THRESHOLD =
"pinot.server.query.executor.pruner.columnvaluesegmentpruner.inpredicate.threshold";
public static final int DEFAULT_VALUE_PRUNER_IN_PREDICATE_THRESHOLD = 10;
```
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java
##########
@@ -59,8 +62,11 @@
@SuppressWarnings({"rawtypes", "unchecked"})
public class ColumnValueSegmentPruner implements SegmentPruner {
+ private int _inPredicateThreshold;
+
@Override
public void init(PinotConfiguration config) {
+ _inPredicateThreshold =
config.getProperty(Server.CONFIG_THRESHOLD_FOR_IN_PREDICATE,
Server.DEFAULT_VALUE_FOR_IN_PREDICATE_THRESHOLD);
Review comment:
Should we exclude the prefix
`pinot.server.query.executor.pruner.columnvaluesegmentpruner` from the config
key?
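To illustrate the question above, here is a minimal sketch of one possible reading: the pruner receives a configuration already scoped to its own prefix, so the constant only needs the relative part of the key. This is an assumption; the thread does not confirm how the configuration reaches `init`. The class `PrunerConfigSketch`, the helper `subset`, and the relative key `inpredicate.threshold` are hypothetical, and a plain `Map` stands in for `PinotConfiguration`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for PinotConfiguration.
final class PrunerConfigSketch {
  // Prefix quoted in the review comment.
  static final String PRUNER_PREFIX = "pinot.server.query.executor.pruner.columnvaluesegmentpruner";
  // Assumed relative key once the prefix is excluded (not confirmed by the thread).
  static final String IN_PREDICATE_THRESHOLD_KEY = "inpredicate.threshold";
  static final int DEFAULT_IN_PREDICATE_THRESHOLD = 10;

  // Returns the entries under the given prefix, with "prefix." stripped from each key.
  static Map<String, String> subset(Map<String, String> config, String prefix) {
    String dotted = prefix + ".";
    Map<String, String> scoped = new HashMap<>();
    for (Map.Entry<String, String> entry : config.entrySet()) {
      if (entry.getKey().startsWith(dotted)) {
        scoped.put(entry.getKey().substring(dotted.length()), entry.getValue());
      }
    }
    return scoped;
  }

  // Reads the threshold from an already-scoped config, falling back to the default.
  static int readThreshold(Map<String, String> prunerConfig) {
    String raw = prunerConfig.get(IN_PREDICATE_THRESHOLD_KEY);
    return raw == null ? DEFAULT_IN_PREDICATE_THRESHOLD : Integer.parseInt(raw.trim());
  }
}
```

With this shape, the pruner never repeats its own prefix, and the full key only appears where the server-level configuration is written.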
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java
##########
@@ -238,6 +237,66 @@ private boolean pruneRangePredicate(IndexSegment segment,
RangePredicate rangePr
return false;
}
+ /**
+ * For IN predicate, segment will not be pruned if the size of values is
greater than threshold
+ * Prune the segment based on:
+ * <ul>
+ * <li>Column min/max value</li>
+ * </ul>
+ * Returns:
+ * <ul>
+ * <li> true if segment can be pruned </li>
+ * <li> false if size of values > threshold or any of the value is greater
than min value or smaller than max value of segment</li>
+ * </ul>
+ */
+ private boolean pruneInPredicate(IndexSegment segment, InPredicate
inPredicate, Map<String, DataSource> dataSourceCache) {
+ String column = inPredicate.getLhs().getIdentifier();
+ DataSource dataSource = dataSourceCache.computeIfAbsent(column,
segment::getDataSource);
+ // NOTE: Column must exist after DataSchemaSegmentPruner
+ assert dataSource != null;
+ DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
+ List<String> values = inPredicate.getValues();
+ //check max threshold value
+ if (values.size() > _inPredicateThreshold) {
+ return false;
+ }
+
+ for (String value : values) {
+ Comparable inValue = convertValue(value,
dataSourceMetadata.getDataType());
+ if (!checkMinMaxRange(dataSourceMetadata, inValue)) {
+ return false;
+ }
+ }
+ return true;
+ }
+
+ /**
+ * Check if the comparable value is within min/max range
+ * <ul>
+ * <li>Column min/max value</li>
+ * </ul>
+ * Returns:
+ * <ul>
+ * <li> true if the value is smaller than min value or value is greater
than max value</li>
+ * <li> false if the value is greater than min value or value is smaller
than max value</li>
+ * </ul>
+ */
+ private boolean checkMinMaxRange(DataSourceMetadata dataSourceMetadata,
Comparable value) {
Review comment:
I would suggest returning `false` when the value is not within the range,
which is more intuitive.
(Optional) We can also slightly improve performance by performing the null
checks on the min/max values once up front and then looping over the values to
compare, instead of repeating the null checks for each value.
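
A rough sketch of what both suggestions could look like together, written against plain `Comparable` values rather than the Pinot types. The names `InPredicatePruneSketch`, `isWithinRange`, and `canPrune` are hypothetical, and treating a missing min or max as "cannot prune" is an assumption, since the body of `checkMinMaxRange` is not shown in the diff.

```java
import java.util.List;

// Hypothetical sketch: generic Comparable values stand in for the Pinot data source metadata.
final class InPredicatePruneSketch {

  // Returns true if the value lies within [min, max], false otherwise.
  // Callers must pass non-null bounds.
  static <T extends Comparable<T>> boolean isWithinRange(T value, T min, T max) {
    return value.compareTo(min) >= 0 && value.compareTo(max) <= 0;
  }

  // Returns true only if the segment can be pruned, i.e. no IN value can match.
  static <T extends Comparable<T>> boolean canPrune(List<T> values, T min, T max, int threshold) {
    // Skip pruning when the IN list is larger than the threshold.
    if (values.size() > threshold) {
      return false;
    }
    // Null checks are done once here instead of once per value.
    // Assumption: without both bounds the segment is conservatively kept.
    if (min == null || max == null) {
      return false;
    }
    for (T value : values) {
      if (isWithinRange(value, min, max)) {
        // At least one value may be present in the segment, so it cannot be pruned.
        return false;
      }
    }
    return true;
  }
}
```

For example, with segment bounds 10 and 20, the IN list (1, 2) allows pruning, while (1, 15) does not because 15 falls inside the range.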