Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
......................................................................


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/16098/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16098/10//COMMIT_MSG@27
PS10, Line 27: configrations
nit: typo


http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@a1164
PS10, Line 1164:
why remove this?


http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1179
PS10, Line 1179: // If all partitions have good stats, return the total row 
count, contributed
               :     // by all of them, as the row count for the table.
is this the intended behavior? so if a user has a table with hundreds of 
partitions, adding a single partition with corrupt stats basically causes the 
stats estimation to fall back to the table-size based estimation? wouldn't a 
more reasonable behavior be if you see a partition with corrupt stats you 
fallback to the file-size based estimation just for that partition?


http://gerrit.cloudera.org:8080/#/c/16098/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1220
PS10, Line 1220:       if ( numRows > 0 )
               :         numRows = Math.min(numRows, estNumRows);
               :       else
               :         numRows = estNumRows;
for any if statement thats is longer than a single line, you should add curly 
braces. so it should be:

 if (numRows > 0) {
        numRows = Math.min(numRows, estNumRows);
 } else {
        numRows = estNumRows;
 }

you don't need the extra space in the if condition either


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@180
PS10, Line 180: impala_9744
I know other methods do this, but I'm generally not a fan of including the JIRA 
number in a test method name. I think we can come up with a more descriptive 
name for this method.


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@194
PS10, Line 194: #
nit remove


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@199
PS10, Line 199:       CREATE TABLE {0}.{1} (
I think you could just use 'create table like parquet' rather than listing out 
all the columns. You can create the table in Impala and still modify it using 
the load data local inpath statement in Hive.


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@215
PS10, Line 215:       set hive.stats.autogather=true;
do you need this?


http://gerrit.cloudera.org:8080/#/c/16098/10/tests/metadata/test_compute_stats.py@236
PS10, Line 236:     #
I think its probably best to create separate test methods for the partitioned 
and unpartitioned cases



--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Sat, 27 Jun 2020 03:43:27 +0000
Gerrit-HasComments: Yes

Reply via email to