Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 )
Change subject: IMPALA-5152: Introduce metadata loading phase ...................................................................... Patch Set 1: (19 comments) First round of comments. Overall approach looks reasonable to me. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@73 PS1, Line 73: if (!fromClauseOnly) tblRefs.add(new TableRef(tableName_.toPath(), null)); Maybe replace the if check with a Preconditions.checkState(!fromClauseOnly); since it doesn't make sense to call this function with true in this context. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@66 PS1, Line 66: // Set in analyzeAndAuthorize(). : private ImpaladCatalog catalog_; If I understand it correctly, that shouldn't be needed, given that analysis can proceed with the LoadedTables state. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@518 PS1, Line 518: analysisResult_.isAlterTableStmt() || : analysisResult_.isAlterViewStmt()); Curious, where did this come from? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2361 PS1, Line 2361: or the db I don't think the existence of db is checked in this function anymore. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java File fe/src/main/java/org/apache/impala/analysis/FromClause.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java@68 PS1, Line 68: public @Override http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@64 PS1, Line 64: public abstract void collectTableRefs(List<TableRef> tblRefs, boolean fromClauseOnly); If most of the statement classes that extend StatementBase don't return anything you can just provide a default implementation here and avoid adding a function in all these cases. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@823 PS1, Line 823: * ignored and not returned or added to 'loadedTables'. Comment where defaultDb is used. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@825 PS1, Line 825: private Set<TableName> getMissingTables nit: can you put the following two functions below requestTblLoadAndWait so that the code reads top-down? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@866 PS1, Line 866: org.apache.impala.analysis.Path.getCandidateTables nit: you can use import static if you want to avoid the classpath inside the function. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@872 PS1, Line 872: LoadedTables The name sounds a bit generic. Maybe LoadedTablesCache or something to indicate the temporal nature of it (i.e. it is specific to a query not a global thing). http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@873 PS1, Line 873: public final ImpaladCatalog catalog; I see that this is used for initializing the Analyzer. But given that the analysis proceeds based on the loaded tables (my understanding from the commit msg), does the Analyzer still need it? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@885 PS1, Line 885: If non-NULL You mean the timeline, no? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@889 PS1, Line 889: defaultDb nit: I would call it sessionDb http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@903 PS1, Line 903: public public? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@916 PS1, Line 916: requestedTbls It seems that this is only used for the log message in L965. Can't you use a counter instead? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@919 PS1, Line 919: final long debugLoggingFrequency = 10; : final long retryLoadFrequency = 20; nit: move them at the beginning of this function? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@924 PS1, Line 924: if (issueLoadRequest) { Since we know there are missing tbls (condition in while), why do we need this check? The way I read this is "there are missing tbls but I will not ask them to be loaded" and I am not sure I understand the reason. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@943 PS1, Line 943: Tables remaining: %s If views are used, that number may go up and down and I am wondering if this may cause some confusion to anyone looking at the logs. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1091 PS1, Line 1091: AnalysisContext analysisCtx = new AnalysisContext(queryCtx, authzConfig_, timeline); : AnalysisResult analysisResult = : analysisCtx.analyzeAndAuthorize(queryCtx.client_request.stmt, this); As is today, we have the following call sequence: Frontend::createExecRequest()->AnalysisContext::analyzeAndAuthorize()->Frontend::requestTableLoadAndWait(). I think it may make more sense, to first collect the required tbls state (i.e. load any missing ones) and pass that to the AnalysisContext to perform the analysis. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Thu, 11 Jan 2018 00:26:57 +0000 Gerrit-HasComments: Yes