Qifan Chen has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16098 )
Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans
......................................................................

WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

This work addresses a current limitation in computing the total row
count for a Hive table: the row count can be incorrectly computed as 0
even though some partitions of the table contain data.

CDPD-12560 documents a form of corruption in partition stats in Hive
tables that contributes to this limitation in Impala: the row count of
a partition is set to 0 even though the partition size is a positive
value. The corruption can only happen when hive.stats.autogather=true
during both table creation and table loading.

In the fix, as long as no partition in a Hive table exhibits any stats
corruption, including the type described above, the total row count for
the table is computed from the row counts of all partitions. Otherwise,
Impala estimates the total row count from the total size of the
partitions and the row width, if feasible.

Testing:
1. Ran unit tests with the queries documented in the case against Hive
   tables with the following configurations:
   a. No stats corruption in any partition;
   b. Stats corruption in some partitions;
   c. Stats corruption in all partitions.

Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
1 file changed, 11 insertions(+), 7 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16098/2
--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
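
For readers following the review, the row-count rule described in the commit message can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in HdfsScanNode.java: the class RowCountSketch, the PartitionStats holder, the method names, and the convention that a negative row count is also treated as unusable are all hypothetical. The only corruption pattern taken from the message itself is "row count 0 with a positive partition size".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the rule in the commit message; not Impala's actual code.
final class RowCountSketch {

    /** Hypothetical holder for the two per-partition stats the rule looks at. */
    static final class PartitionStats {
        final long numRows;    // row count recorded in the partition stats
        final long sizeBytes;  // on-disk size of the partition

        PartitionStats(long numRows, long sizeBytes) {
            this.numRows = numRows;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Corruption pattern from the commit message: zero rows recorded for a
     * partition that has a positive size. Treating a negative row count as
     * unusable too is an assumption made for this sketch.
     */
    static boolean isCorrupt(PartitionStats p) {
        return p.numRows < 0 || (p.numRows == 0 && p.sizeBytes > 0);
    }

    /**
     * Returns the sum of the partition row counts when no partition is corrupt.
     * Otherwise estimates rows as totalBytes / avgRowWidthBytes when a positive
     * row width is available, and returns -1 (unknown) when it is not.
     */
    static long estimateTotalRows(List<PartitionStats> partitions, double avgRowWidthBytes) {
        long totalRows = 0;
        long totalBytes = 0;
        boolean statsUsable = true;
        for (PartitionStats p : partitions) {
            totalBytes += p.sizeBytes;
            if (isCorrupt(p)) {
                statsUsable = false;
            } else {
                totalRows += p.numRows;
            }
        }
        if (statsUsable) {
            return totalRows;
        }
        if (avgRowWidthBytes > 0) {
            return Math.round(totalBytes / avgRowWidthBytes);
        }
        return -1;
    }

    public static void main(String[] args) {
        List<PartitionStats> clean = Arrays.asList(
                new PartitionStats(10, 100), new PartitionStats(20, 200));
        List<PartitionStats> partlyCorrupt = Arrays.asList(
                new PartitionStats(10, 100), new PartitionStats(0, 400));
        System.out.println(estimateTotalRows(clean, 10.0));         // prints 30
        System.out.println(estimateTotalRows(partlyCorrupt, 10.0)); // prints 50
    }
}
```

The three test configurations listed in the message (no corruption, some partitions corrupt, all partitions corrupt) map onto the two branches here: the first takes the summing path, while the other two fall back to the size-based estimate or to "unknown" when no row width is available.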