Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8136 )

Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@59
PS2, Line 59:  *   table-level column statistics. Existing partition-objects and their row count not
nit: are not


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@338
PS2, Line 338: expectAllPartitions_ = false;
I don't think you need that. I think it's already initialized to false.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@422
PS2, Line 422: expectAllPartitions_ = !(table_ instanceof HdfsTable) ||
             :           !BackendConfig.INSTANCE.enableStatsExtrapolation();
I think there is a conflict between this line and the comment about 
expectAllPartitions_ (L124).


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@435
PS2, Line 435: // Tablesample clause to be used for all child queries.
             :     String tableSampleSql = analyzeTableSampleClause(analyzer);
nit: move it closer to L452?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@440
PS2, Line 440: if (!updateTableStatsOnly()) {
             :       for (Column partCol: hdfsTable.getClusteringColumns()) {
             :         groupByCols.add(ToSqlUtils.getIdentSql(partCol.getName()));
             :       }
             :     }
merge with L450?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@503
PS2, Line 503: Sets 'sampleFileBytes_' according
             :    * to the sample.
I think it's important to stress that this function computes the sample. It is 
kind of implied by this line, but let's make it explicit to avoid confusion.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@524
PS2, Line 524: Set total file bytes being scanned based on the sample.
Maybe "Compute a sample of files to be scanned and set 'sampleFileBytes_'", or 
something like that.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
File fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java:

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java@67
PS2, Line 67: Long
nit: do you need an object here?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@691
PS2, Line 691: Reference<Long> numUpdatedPartitions, Reference<Long> numUpdatedColumns
Add a comment about these two.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@698
PS2, Line 698: if (LOG.isInfoEnabled()) {
Does this mean that it won't print anything for debug and/or trace? Is there 
any reason why we don't want that?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@804
PS2, Line 804: Hive
I think we should start calling these HMS tables/columns. Besides, soon HMS 
will be a separate thing from HIVE :)


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@870
PS2, Line 870: Preconditions.checkState(val >= 0);
             :     Preconditions.checkState(sampleFileBytes >= 0);
             :     Preconditions.checkState(totalFileBytes >= 0);
nit: merge into one statement?
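
A minimal sketch of what the merged check could look like. The class name, the
validate() method, and the local checkState() helper are hypothetical; the
helper only stands in for Guava's Preconditions.checkState so the snippet runs
without Guava on the classpath.

```java
final class MergedChecks {
  // Stand-in for Guava's Preconditions.checkState (assumption: same
  // behavior, throws IllegalStateException when the expression is false).
  static void checkState(boolean expression) {
    if (!expression) throw new IllegalStateException();
  }

  // Hypothetical method; the parameter names mirror the ones in the patch.
  static void validate(long val, long sampleFileBytes, long totalFileBytes) {
    // One statement instead of three separate checks.
    checkState(val >= 0 && sampleFileBytes >= 0 && totalFileBytes >= 0);
  }
}
```

The trade-off is that a single merged check no longer shows which of the three
conditions failed.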


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@875
PS2, Line 875: return Math.round(val * mult);
Alternatively, you can use LongMath.checkedMultiply(), catch the arithmetic 
exception and return Long.MAX_VALUE. I know it looks like more code than what 
you currently have, but I feel it makes clearer what will happen in all cases 
than round() does.
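
A rough sketch of the idea, under a few assumptions: the class and method names
are hypothetical, the multiplier is treated as a long (the type of 'mult' in
the patch is not shown here), and the JDK's Math.multiplyExact() is used in
place of Guava's LongMath.checkedMultiply() so the snippet has no external
dependencies. Both throw ArithmeticException on overflow.

```java
final class ExtrapolateSketch {
  // Hypothetical helper: multiplies two non-negative longs and saturates
  // at Long.MAX_VALUE instead of silently overflowing.
  static long scaleSaturating(long val, long mult) {
    try {
      // Throws ArithmeticException if the product does not fit in a long.
      return Math.multiplyExact(val, mult);
    } catch (ArithmeticException e) {
      // Assumes both inputs are non-negative, so overflow is always positive.
      return Long.MAX_VALUE;
    }
  }
}
```

For example, scaleSaturating(6, 7) returns 42, and
scaleSaturating(Long.MAX_VALUE, 2) returns Long.MAX_VALUE.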



--
To view, visit http://gerrit.cloudera.org:8080/8136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Balazs Jeszenszky <jes...@gmail.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Nov 2017 20:00:24 +0000
Gerrit-HasComments: Yes
