Jackie-Jiang commented on code in PR #14163:
URL: https://github.com/apache/pinot/pull/14163#discussion_r1792544661
##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -346,6 +365,156 @@ private static Expression rewriteRangeExpression(Expression range, FilterKind ki
return range;
}
+ /**
+ * Rewrite expressions of the form "column BETWEEN lower AND upper" to ensure that lower and upper bounds are the same
+ * datatype as the column (or can be cast to the same datatype in the server).
+ */
+ private static Expression rewriteBetweenExpression(Expression between, DataType dataType) {
+ List<Expression> operands = between.getFunctionCall().getOperands();
+ Expression lower = operands.get(1);
+ Expression upper = operands.get(2);
+
+ if (lower.isSetLiteral()) {
+ switch (lower.getLiteral().getSetField()) {
+ case LONG_VALUE: {
+ long actual = lower.getLiteral().getLongValue();
+ // Other data types can be converted on the server side.
Review Comment:
This part I still don't follow. Do you mean that we should not rewrite `BETWEEN`
the same way as other range filters, or that doing so is too complicated? As long
as we have computed the lower and upper bounds, we should be able to assemble them
back into a `BETWEEN`.
It is also fine to do it separately.
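A minimal sketch of that idea, assuming an INT column and double-valued literal bounds: round the lower bound up and the upper bound down, clamp both to the int range, and treat an empty range as always false. `IntBetweenBounds` and `tighten` are hypothetical names for illustration, not Pinot APIs, and treating NaN as never matching is an assumption of the sketch.

```java
import java.util.Optional;

/** Hypothetical helper (not part of Pinot): tightens double BETWEEN bounds for an INT column. */
final class IntBetweenBounds {
  final int lower;
  final int upper;

  private IntBetweenBounds(int lower, int upper) {
    this.lower = lower;
    this.upper = upper;
  }

  /**
   * Returns the tightest inclusive int range equivalent to "col BETWEEN lower AND upper"
   * for an INT column, or Optional.empty() when no int value can match (always false).
   */
  static Optional<IntBetweenBounds> tighten(double lower, double upper) {
    // Assumption of this sketch: a NaN bound never matches any value.
    if (Double.isNaN(lower) || Double.isNaN(upper)) {
      return Optional.empty();
    }
    // Round inward (lower up, upper down), then clamp to the int range.
    double lo = Math.max(Math.ceil(lower), Integer.MIN_VALUE);
    double hi = Math.min(Math.floor(upper), Integer.MAX_VALUE);
    // An empty range covers both reversed bounds and bounds outside the int range.
    if (lo > hi) {
      return Optional.empty();
    }
    return Optional.of(new IntBetweenBounds((int) lo, (int) hi));
  }
}
```

With this shape, `tighten(1.5, 3.5)` yields the range [2, 3], which can be assembled back into `BETWEEN 2 AND 3`, while `tighten(2.2, 2.8)` yields an empty result that an optimizer could replace with a constant false filter.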
##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -75,33 +85,41 @@ Expression optimizeChild(Expression filterExpression, @Nullable Schema schema) {
Function function = filterExpression.getFunctionCall();
FilterKind kind = FilterKind.valueOf(function.getOperator());
switch (kind) {
- case IS_NULL:
- case IS_NOT_NULL:
- // No need to try to optimize IS_NULL and IS_NOT_NULL operations on numerical columns.
+ case BETWEEN: {
+ // Verify that value is a numeric column before rewriting.
+ List<Expression> operands = function.getOperands();
+ Expression value = operands.get(0);
Review Comment:
(minor) Suggest naming it `lhs`. `value` is a little misleading here, as
it is usually a column or a function.
Currently, for other filter kinds we check whether `rhs` is numeric, but not
here. That check itself seems unnecessary, and we can consider consolidating
the logic by only checking whether `lhs` is numeric.
##########
pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/QueryGenerator.java:
##########
@@ -1005,6 +1005,16 @@ public QueryFragment generatePredicate(String columnName, boolean useMultistageE
List<String> columnValues = _columnToValueList.get(columnName);
String leftValue = pickRandom(columnValues);
String rightValue = pickRandom(columnValues);
+
+ if (_singleValueNumericalColumnNames.contains(columnName)) {
Review Comment:
Did you add this in order for the test to pass? We probably want to test
scenarios where lower is larger than upper (the always-false scenario).
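To make the always-false case concrete: standard SQL `BETWEEN` is inclusive and equivalent to `value >= lower AND value <= upper`, so reversed bounds select no rows. Below is a rough reference check that a test could compare query results against; `BetweenSemanticsSketch`, `between`, and `filter` are hypothetical names, not part of `QueryGenerator`.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical reference model of BETWEEN semantics, for illustration only. */
final class BetweenSemanticsSketch {
  /** Inclusive BETWEEN: true only when lower <= value <= upper. */
  static boolean between(long value, long lower, long upper) {
    return value >= lower && value <= upper;
  }

  /** Keeps the values that satisfy "value BETWEEN lower AND upper". */
  static List<Long> filter(List<Long> values, long lower, long upper) {
    return values.stream()
        .filter(v -> between(v, lower, upper))
        .collect(Collectors.toList());
  }
}
```

Leaving the randomly picked bounds unsorted in some generated queries would exercise this path: with reversed bounds the expected result set is empty, regardless of the column values.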
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]