jbimbert commented on a change in pull request #1298: DRILL-5796: Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204325743
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##########
 @@ -124,8 +124,7 @@ private static LogicalExpression createIsTruePredicate(LogicalExpression expr) {
    */
   private static LogicalExpression createIsFalsePredicate(LogicalExpression expr) {
     return new ParquetIsPredicate<Boolean>(expr, (exprStat, evaluator) ->
-        //if min value is not false or if there are all nulls  -> canDrop
-        isAllNulls(exprStat, evaluator.getRowCount()) || exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin()
+      isAllNulls(exprStat, evaluator.getRowCount()) || exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin() ? RowsMatch.NONE : checkNull(exprStat)
 
 Review comment:
   OK, I found the reason why the tests pass:
   1. We need several parquet files; otherwise the process is bypassed in
      AbstractParquetGroupScan.applyFilter.
   2. Some parquet files must actually be dropped, again in
      AbstractParquetGroupScan.applyFilter, because of this early return:
      if (qualifiedRGs.size() == rowGroupInfos.size() ) { return null } ...
   3. If at least one of the row groups is SOME, then the filter is applied
      to all of them, in ParquetPushDownFilter.doOnMatch L 179.
   Together, these 3 conditions explain why the tests pass.
   One way to check that they fail is to put ft0.parquet, ft0.parquet and
   tt1.parquet in the same folder and run an IS TRUE predicate. The result
   then reads F, T, F, T (wrong) instead of F, F (expected)!
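
   Points 2 and 3 above can be sketched as follows. This is a simplified
   model for illustration only, not the Drill code: the class
   ApplyFilterSketch, the method filterStillApplied, the row group names and
   the use of a Map are all hypothetical, and treating "qualified" as
   "not NONE" is an assumption. Only the size comparison with the null
   return and the SOME rule come from the comment itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ApplyFilterSketch {
    // ALL is assumed here; the comment itself only mentions NONE and SOME.
    enum RowsMatch { ALL, NONE, SOME }

    /**
     * Hypothetical model of the pruning step: keep every row group that the
     * statistics do not rule out, and return null when nothing was dropped
     * (mirrors the qualifiedRGs.size() == rowGroupInfos.size() check).
     */
    static List<String> applyFilter(Map<String, RowsMatch> rowGroups) {
        List<String> qualified = new ArrayList<>();
        for (Map.Entry<String, RowsMatch> e : rowGroups.entrySet()) {
            if (e.getValue() != RowsMatch.NONE) { // assumption: NONE means "can drop"
                qualified.add(e.getKey());
            }
        }
        if (qualified.size() == rowGroups.size()) {
            return null; // nothing pruned, so no new scan is produced
        }
        return qualified;
    }

    /** Point 3: a single SOME row group keeps the filter in place for all of them. */
    static boolean filterStillApplied(Map<String, RowsMatch> rowGroups) {
        return rowGroups.containsValue(RowsMatch.SOME);
    }

    public static void main(String[] args) {
        Map<String, RowsMatch> rowGroups = new LinkedHashMap<>();
        rowGroups.put("rg0", RowsMatch.NONE); // hypothetical row groups
        rowGroups.put("rg1", RowsMatch.ALL);
        System.out.println("qualified: " + applyFilter(rowGroups));
        System.out.println("filter kept: " + filterStillApplied(rowGroups));
    }
}
```

   In this model, a wrong ALL/NONE decision only becomes visible in the
   query result when at least one row group is dropped and none is SOME,
   since otherwise the filter is evaluated again on the rows themselves.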
   
   I have then written the IS TRUE, IS FALSE, IS NOT TRUE and IS NOT FALSE
   predicates based on the following cases:
   a. ST:[min: true, max: true, num_nulls: ?]
   b. ST:[min: false, max: false, num_nulls: ?]
   c. ST:[min: false, max: true, num_nulls: ?]
   d. num_nulls = RC (row count), i.e. all values are null
   And checked all of these cases.
   
   I also introduced 4 helper functions for code readability: minIsTrue, 
minIsFalse, maxIsTrue and maxIsFalse
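
   As an illustration of the case analysis, here is a minimal sketch of how
   the four predicates could be derived from min, max and num_nulls. It is
   not the code of this PR: the Stat class, hasNoNulls, and the signatures
   of minIsTrue, minIsFalse, maxIsTrue, maxIsFalse and isAllNulls are
   hypothetical, RowsMatch.ALL is assumed, and num_nulls is assumed to be
   known. It relies only on the ordering false < true.

```java
final class BooleanStatSketch {
    enum RowsMatch { ALL, NONE, SOME }

    /** Hypothetical stand-in for boolean column statistics; min/max are null when every value is null. */
    static final class Stat {
        final Boolean min;
        final Boolean max;
        final long numNulls;

        Stat(Boolean min, Boolean max, long numNulls) {
            this.min = min;
            this.max = max;
            this.numNulls = numNulls;
        }

        boolean hasNonNullValue() { return min != null && max != null; }
    }

    static boolean isAllNulls(Stat s, long rowCount) { return s.numNulls == rowCount; }
    static boolean hasNoNulls(Stat s) { return s.numNulls == 0; }

    // Since false < true: min == true means every non-null value is true,
    // and max == false means every non-null value is false.
    static boolean minIsTrue(Stat s)  { return s.hasNonNullValue() && s.min; }
    static boolean minIsFalse(Stat s) { return s.hasNonNullValue() && !s.min; }
    static boolean maxIsTrue(Stat s)  { return s.hasNonNullValue() && s.max; }
    static boolean maxIsFalse(Stat s) { return s.hasNonNullValue() && !s.max; }

    static RowsMatch isTrue(Stat s, long rowCount) {
        if (isAllNulls(s, rowCount) || maxIsFalse(s)) return RowsMatch.NONE; // no true value at all
        if (minIsTrue(s) && hasNoNulls(s)) return RowsMatch.ALL;             // only true values
        return RowsMatch.SOME;
    }

    static RowsMatch isFalse(Stat s, long rowCount) {
        if (isAllNulls(s, rowCount) || minIsTrue(s)) return RowsMatch.NONE;  // no false value at all
        if (maxIsFalse(s) && hasNoNulls(s)) return RowsMatch.ALL;            // only false values
        return RowsMatch.SOME;
    }

    // IS NOT TRUE matches false and null rows.
    static RowsMatch isNotTrue(Stat s, long rowCount) {
        if (isAllNulls(s, rowCount) || maxIsFalse(s)) return RowsMatch.ALL;
        if (minIsTrue(s) && hasNoNulls(s)) return RowsMatch.NONE;
        return RowsMatch.SOME;
    }

    // IS NOT FALSE matches true and null rows.
    static RowsMatch isNotFalse(Stat s, long rowCount) {
        if (isAllNulls(s, rowCount) || minIsTrue(s)) return RowsMatch.ALL;
        if (maxIsFalse(s) && hasNoNulls(s)) return RowsMatch.NONE;
        return RowsMatch.SOME;
    }

    public static void main(String[] args) {
        Stat mixed = new Stat(false, true, 0); // case c with no nulls
        System.out.println("IS TRUE on case c: " + isTrue(mixed, 4));
    }
}
```

   Returning SOME whenever neither ALL nor NONE can be proven keeps the
   sketch conservative: the row group is read and the filter decides per row.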

