[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16098/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16098/3//COMMIT_MSG@13
PS3, Line 13: CDPD-12560 documents a form of corruption in partition stats in 
Hive
BTW, I see this is marked WIP, so it is not ready for review, but I just want
to mention that CDPD is Cloudera's JIRA system. We should avoid referencing it
in the context of the Apache JIRA.



--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 19 Jun 2020 19:40:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6381/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 19 Jun 2020 16:48:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6380/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 19 Jun 2020 16:47:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6379/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 19 Jun 2020 16:38:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..

WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

This work addresses the current limitation in computing the total row
count for a Hive table. The row count can be incorrectly computed as 0,
even though there exists data in partitions of the Hive table.

CDPD-12560 documents a form of corruption in partition stats in Hive
tables that contributes to this limitation in Impala: the row count of
a partition is set to 0 even though the partition size is a positive
value. The corruption can only happen when hive.stats.autogather=true
during both table creation and loading.

In the fix, as long as no partition in a Hive table exhibits any stats
corruption, including the kind described above, the total row count for
the table is computed from the row counts in all partitions. Otherwise,
Impala estimates the total row count from the total size of the partitions
and the row width, if feasible.
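
The rule described above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in HdfsScanNode.java: the
class RowCountSketch, the PartitionStats holder, the use of -1 for an unknown
row count, and the choice to treat unknown counts like corrupt ones are all
hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; names are illustrative and not taken from HdfsScanNode.java. */
class RowCountSketch {
  /** Per-partition stats. Assumption: numRows is -1 when the count is unknown. */
  static final class PartitionStats {
    final long numRows;
    final long sizeBytes;

    PartitionStats(long numRows, long sizeBytes) {
      this.numRows = numRows;
      this.sizeBytes = sizeBytes;
    }
  }

  /** The corruption described in the commit message: 0 rows but a positive size. */
  static boolean isCorrupt(PartitionStats p) {
    return p.numRows == 0 && p.sizeBytes > 0;
  }

  /**
   * Sums the per-partition row counts when every partition has usable stats.
   * Otherwise falls back to totalBytes / avgRowWidthBytes, or returns -1 when
   * no row width is available (the "if feasible" case).
   */
  static long estimateTotalRows(List<PartitionStats> partitions, double avgRowWidthBytes) {
    boolean allUsable = true;
    long totalRows = 0;
    long totalBytes = 0;
    for (PartitionStats p : partitions) {
      totalBytes += p.sizeBytes;
      if (p.numRows < 0 || isCorrupt(p)) {
        allUsable = false;  // Assumption: unknown counts are handled like corrupt ones.
      } else {
        totalRows += p.numRows;
      }
    }
    if (allUsable) return totalRows;
    if (avgRowWidthBytes > 0) return Math.round(totalBytes / avgRowWidthBytes);
    return -1;
  }

  public static void main(String[] args) {
    List<PartitionStats> clean = Arrays.asList(
        new PartitionStats(100, 1000), new PartitionStats(200, 2000));
    List<PartitionStats> corrupt = Arrays.asList(
        new PartitionStats(100, 1000), new PartitionStats(0, 2000));
    System.out.println(estimateTotalRows(clean, 8.0));    // 300, summed from stats
    System.out.println(estimateTotalRows(corrupt, 8.0));  // 375, i.e. 3000 bytes / 8
  }
}
```

With a single corrupt partition the whole table falls back to the size-based
estimate, which matches the "as long as no partition" wording above.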

Testing:
1. Ran unit tests with queries documented in the case against Hive tables
   with the following configurations:
   a. No stats corruption in any partitions;
   b. Stats corruption in some partitions;
   c. Stats corruption in all partitions.

Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
1 file changed, 11 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16098/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..

WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

This work addresses the current limitation in computing the total row
count for a Hive table. The row count can be incorrectly computed as 0,
even though there exists data in some partitions of the Hive table.

CDPD-12560 documents a form of corruption in partition stats in Hive
tables that contributes to this limitation in Impala: the row count of
a partition is set to 0 even though the partition size is a positive
value. The corruption can only happen when hive.stats.autogather=true
during both table creation and table loading.

In the fix, as long as no partition in a Hive table exhibits any stats
corruption, including the type described above, the total row count for
the table is computed from the row counts in all partitions. Otherwise,
Impala estimates the total row count from the total size of the partitions
and the row width, if feasible.

Testing:
1. Ran unit tests with queries documented in the case against Hive tables
   with the following configurations:
   a. No stats corruption in any partitions;
   b. Stats corruption in some partitions;
   c. Stats corruption in all partitions.

Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
1 file changed, 11 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16098/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16098/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16098/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1180
PS1, Line 1180:   // If all partitions have good stats, return the total 
row count, contributed
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16098/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1182
PS1, Line 1182:   if (!hasCorruptTableStats_ && numPartitionsWithNumRows_ > 
0) return partitionNumRows_;
line too long (92 > 90)
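
For reference, the two style messages above can be reproduced with a check
along these lines. This is a minimal sketch and not the actual Jenkins job:
the class name LineLintSketch and its method are hypothetical, and the
90-column limit is taken only from the "(92 > 90)" message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two style checks reported in this message. */
class LineLintSketch {
  // Assumption: the limit is 90 columns, as implied by "line too long (92 > 90)".
  static final int MAX_LINE_LENGTH = 90;

  /** Returns the style problems found on one source line (without its newline). */
  static List<String> check(String line) {
    List<String> problems = new ArrayList<>();
    if (!line.isEmpty() && Character.isWhitespace(line.charAt(line.length() - 1))) {
      problems.add("line has trailing whitespace");
    }
    if (line.length() > MAX_LINE_LENGTH) {
      problems.add("line too long (" + line.length() + " > " + MAX_LINE_LENGTH + ")");
    }
    return problems;
  }
}
```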




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 19 Jun 2020 16:10:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-06-19 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16098 )


Change subject: WIP IMPALA-9744: Treat corrupt table stats as missing to avoid 
bad plans
..

WIP IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

This work addresses the current limitation in computing the total row
count for a Hive table. The row count can be incorrectly computed as 0,
even though there exists data in some partitions of the Hive table.

CDPD-12560 documents a form of corruption in partition stats in Hive
tables that contributes to this limitation in Impala: the row count of
a partition is set to 0 even though the partition size is a positive
value. The corruption can only happen when hive.stats.autogather=true
during both table creation and table loading.

In the fix, as long as no partition in a Hive table exhibits any stats
corruption, including the type described above, the total row count for
the table is computed from the row counts in all partitions. Otherwise,
Impala estimates the total row count from the total size of the partitions
and the row width, if feasible.

Testing:
1. Ran unit tests with queries documented in the case against Hive tables
   with the following configurations:
   a. No stats corruption in any partitions;
   b. Stats corruption in some partitions;
   c. Stats corruption in all partitions.

Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
1 file changed, 10 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16098/1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar