Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8136 )
Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@59
PS2, Line 59: * table-level column statistics. Existing partition-objects and their row count not
nit: are not


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@338
PS2, Line 338: expectAllPartitions_ = false;
I don't think you need that. I think it's already initialized to false.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@422
PS2, Line 422: expectAllPartitions_ = !(table_ instanceof HdfsTable) ||
             :     !BackendConfig.INSTANCE.enableStatsExtrapolation();
I think there is a conflict between this line and the comment about expectAllPartitions_ (L124).


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@435
PS2, Line 435: // Tablesample clause to be used for all child queries.
             : String tableSampleSql = analyzeTableSampleClause(analyzer);
nit: move it closer to L452?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@440
PS2, Line 440: if (!updateTableStatsOnly()) {
             :   for (Column partCol: hdfsTable.getClusteringColumns()) {
             :     groupByCols.add(ToSqlUtils.getIdentSql(partCol.getName()));
             :   }
             : }
merge with L450?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@503
PS2, Line 503: Sets 'sampleFileBytes_' according
             : * to the sample.
I think it's important to stress that this function computes the sample.
It is kind of implied by this line, but let's make it explicit to avoid confusion.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@524
PS2, Line 524: Set total file bytes being scanned based on the sample.
Maybe "Compute a sample of files to be scanned and set 'sampleFileBytes_'", or something like that.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
File fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java:

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java@67
PS2, Line 67: Long
nit: do you need an object here?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@691
PS2, Line 691: Reference<Long> numUpdatedPartitions, Reference<Long> numUpdatedColumns
Add a comment about these two.


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@698
PS2, Line 698: if (LOG.isInfoEnabled()) {
Does this mean that it won't print anything for debug and/or trace? Is there any reason why we don't want that?


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@804
PS2, Line 804: Hive
I think we should start calling these HMS tables/columns. Besides, soon HMS will be a separate thing from HIVE :)


http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@870
PS2, Line 870: Preconditions.checkState(val >= 0);
             : Preconditions.checkState(sampleFileBytes >= 0);
             : Preconditions.checkState(totalFileBytes >= 0);
nit: merge into one statement?

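
To make the nit on line 870 concrete, here is a rough sketch of what the merged check could look like. It is not the code from the patch: the class name, the method name `extrapolate`, the way `mult` is derived, and the zero-sample guard are all assumptions made for illustration, and a local `checkState` stands in for Guava's `Preconditions.checkState(boolean)` so the snippet compiles on its own.

```java
// Hypothetical stand-in for the method under review; only the three checks
// and the final Math.round(val * mult) come from the quoted patch lines.
final class ExtrapolateSketch {
  // Local replacement for Guava's Preconditions.checkState(boolean),
  // used here only so the sketch has no external dependencies.
  static void checkState(boolean expression) {
    if (!expression) throw new IllegalStateException();
  }

  static long extrapolate(long val, long sampleFileBytes, long totalFileBytes) {
    // The three separate checks merged into a single statement.
    checkState(val >= 0 && sampleFileBytes >= 0 && totalFileBytes >= 0);
    // Assumption: nothing was sampled, so there is nothing to scale.
    if (sampleFileBytes == 0) return 0;
    // Assumption: 'mult' is the ratio of total to sampled file bytes.
    double mult = (double) totalFileBytes / sampleFileBytes;
    return Math.round(val * mult);
  }

  public static void main(String[] args) {
    // 100 rows seen in 1000 of 4000 bytes, scaled up by a factor of 4.
    System.out.println(extrapolate(100, 1000, 4000));
  }
}
```

One trade-off to keep in mind: with a single merged statement, a failure no longer tells you which of the three values was negative, so passing a message to the check may be worth it.
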

http://gerrit.cloudera.org:8080/#/c/8136/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@875
PS2, Line 875: return Math.round(val * mult);
Alternatively, you can use LongMath.checkedMultiply(), catch the arithmetic exception and return Long.MAX_VALUE. I know it looks more than what you currently have, but I feel it's more clear what will happen in all cases compared to using round().


-- 
To view, visit http://gerrit.cloudera.org:8080/8136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Balazs Jeszenszky <jes...@gmail.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Nov 2017 20:00:24 +0000
Gerrit-HasComments: Yes