Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 )
Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters
......................................................................


Patch Set 28: Code-Review+1

(2 comments)

Just a nit and a comment. Should be able to +2 after that.

http://gerrit.cloudera.org:8080/#/c/17075/27/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17075/27/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@260
PS27, Line 260: }
> Looks like mixture of files of different format (like Parquet and ORC at the
> same time) is not allowed.
This may not be accurate. In HdfsScanNode's computeScanRangeLocation(), we examine the file format of each partition and record it, and only clear a flag if a partition is not Parquet-based. It does not assert or return an error. Here's the snippet in HdfsScanNode.java:

    fileFormats_.add(partition.getFileFormat());
    if (!isParquetBased(partition.getFileFormat())) {
      allParquet = false;
    }

However, for statistics, as I mentioned before (and you seem to be in agreement), having at least one Parquet partition is sufficient to trigger the min-max filter checks, since it does not affect the correctness of results.


http://gerrit.cloudera.org:8080/#/c/17075/28/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17075/28/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@387
PS28, Line 387: /*
Can you remove this method?

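
To illustrate the point about mixed file formats in the first comment, here is a minimal standalone sketch, not the actual HdfsScanNode code. The FileFormat enum, the Summary class, summarize(), and the anyParquet flag are hypothetical names made up for this example, and isParquetBased() here is a simplified stand-in for the real helper. It only shows the idea: formats are recorded per partition, a non-Parquet partition clears a flag instead of raising an error, and a single Parquet partition is enough to set the "at least one" flag.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class FormatScanSketch {
  // Hypothetical stand-in for the planner's file format enum.
  enum FileFormat { PARQUET, ORC, TEXT }

  // Result of one pass over the partitions of a table.
  static final class Summary {
    final Set<FileFormat> formats;  // every format seen
    final boolean allParquet;       // false once any non-Parquet partition is seen
    final boolean anyParquet;       // true once any Parquet partition is seen

    Summary(Set<FileFormat> formats, boolean allParquet, boolean anyParquet) {
      this.formats = formats;
      this.allParquet = allParquet;
      this.anyParquet = anyParquet;
    }
  }

  // Simplified assumption: only PARQUET counts as Parquet-based here.
  static boolean isParquetBased(FileFormat format) {
    return format == FileFormat.PARQUET;
  }

  // Records each partition's format and updates the flags.
  // Mixed formats are tolerated: nothing is asserted and no error is returned.
  static Summary summarize(List<FileFormat> partitionFormats) {
    Set<FileFormat> formats = EnumSet.noneOf(FileFormat.class);
    boolean allParquet = true;
    boolean anyParquet = false;
    for (FileFormat format : partitionFormats) {
      formats.add(format);
      if (isParquetBased(format)) {
        anyParquet = true;
      } else {
        allParquet = false;
      }
    }
    return new Summary(formats, allParquet, anyParquet);
  }

  public static void main(String[] args) {
    Summary s = summarize(Arrays.asList(FileFormat.PARQUET, FileFormat.ORC));
    System.out.println("formats=" + s.formats
        + " allParquet=" + s.allParquet
        + " anyParquet=" + s.anyParquet);
  }
}
```

Under these assumptions, a table with one Parquet and one ORC partition is summarized without error, and the anyParquet flag is the kind of condition that could gate the min-max filter checks.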
-- 
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Mar 2021 21:39:27 +0000
Gerrit-HasComments: Yes