Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 )
Change subject: IMPALA-10314: Optimize planning time for simple limits ...................................................................... Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup@3115 PS5, Line 3115: KW_WHERE opt_plan_hints:pred_hints expr:e I guess this is a bit limiting in the it applies only to the whole where clause. Should it be part of the expr production below so it can be attached to any expression? I don't think this affects the functionality of this patch, since we're only checking the top-level statement anyway, but it seems like itwould me more elegant to have the expr hint be associated with the expr in the parser? If there are complications with that, maybe a comment here explaining the limitation would be sufficient. http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java File fe/src/main/java/org/apache/impala/analysis/Predicate.java: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java@30 PS5, Line 30: isAlwaysTrue_ maybe hasAlwaysTrueHint_ just to make it crystal-clear that it's not actually a guarantee? http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@869 PS5, Line 869: if (fsHasBlocks && fd.getNumFileBlocks() == 0) continue; nit: use braces for multi-line if http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870 PS5, Line 870: fd.getFileLength() > Yes. Totally agree. We probably can live with the 0-row data files through We already had to deal with a similar issue here https://impala.apache.org/docs/build/html/topics/impala_optimize_partition_key_scans.html ; we should document similarly Generally it doesn't make any sense to write files with 0 rows and it should be rare. Our experience is that some misbehaving tools can generate 0 row files (we've seen Spark do it with issues like https://issues.apache.org/jira/browse/SPARK-10216). You're right that the files are non-empty because they have the footer with the schema. I don't think there's an upper bound on the size either though, cause they could have an arbitrarily complex scheme. -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 20 Nov 2020 01:29:42 +0000 Gerrit-HasComments: Yes