Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8958 )

Change subject: IMPALA-5152: Introduce metadata loading phase
......................................................................


Patch Set 3:

(13 comments)

I was curious how this works, so took a look through. It certainly makes a ton 
of sense to me to explicitly load the metadata as one phase of the query.

http://gerrit.cloudera.org:8080/#/c/8958/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8958/3//COMMIT_MSG@62
PS3, Line 62: Testing:
            : - A core/hdfs run succeeded
            : - No new tests were added because the existing functional tests
            :   provide good coverage of various metadata loading scenarios.
            : - The issue reported in IMPALA-5152 is basically impossible now.
            :   Adding FE unit tests for that bug specifically would require
            :   ugly changes to the new code to enable such testing.
Your view of these tests is obviously much deeper than mine. We expect that for 
most queries, the number of round trips (on a clean "cache") is exactly one, 
but for views involving views, it's one plus the number of views (or fewer). 
Does it make sense to explicitly count round trips (perhaps expose it as a 
metric of "number of rounds of prioritized loading" in the profile) and concoct 
one of these deliberately?


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Path.java@284
PS3, Line 284:   public static List<TableName> getCandidateTables(List<String> 
path, String sessionDb) {
This seems easy to unit test. Would it make sense to do so?


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@61
PS3, Line 61:    * Subclasses should override this method as necessary.
Is it possible that someone will forget to do so in the future in an unpleasant 
way? (You could make this abstract and force implementors to provide the empty 
function definition explicitly.)


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@231
PS3, Line 231:     stmt.collectTableRefs(tblRefs, true);
I don't know what the usual style here is, but you might want to consider:

"true /* only from clause */" or even creating an enum or creating a method 
called collectTableRefsFromClauseOnly().

Basically, boolean arguments are really hard to remember.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@964
PS1, Line 964:       if (tbl == null) continue;
Perhaps %d requested and %d loaded tables? I think the fraction (e.g., "10/8") 
would be meaningless without the source.


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@827
PS3, Line 827:   public static final class StmtTableCache {
javadoc?

Is this actually a cache?


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@866
PS3, Line 866:     final long debugLoggingFrequency = 10;
nit: rename to include units? (e.g., debugLoggingFrequencyMillis).

In a previous life, I've made these difficult-but-configurable via Java system 
properties:

 
Integer.getInteger("org.apache.impala.service.Frontend.DEBUG_LOGGING_FREQUENCY",
 5000)

Up to you.


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@894
PS3, Line 894:    // Loop until all the missing tables are loaded in the 
Impalad's catalog cache.
             :     // In every iteration of this loop we wait for one catalog 
update to arrive.
             :     while (!missingTbls.isEmpty()) {
Is there a limit to the number of iterations we're willing to do? Similarly, 
can we check if the query has been cancelled?

I worry that a bug here will cause an infinite loop, whereas in practice, if 
catalogUpdateCount > 10, something is wrong?


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@906
PS3, Line 906:         LOG.warn("Catalog restart detected while waiting for 
table metadata.");
For debugging, would it be useful to log the old and new versions of the 
catalog? Those don't seem to be obviously exposed right now (but could maybe be 
exposed via a toString() implementation.)


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@928
PS3, Line 928:       // 3) Periodically retry to avoid a hang due to 
anomalies/bugs, e.g.,
Do you feel that this case requires more explicit logging?


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@959
PS3, Line 959:         tbl = catalog.getTable(tblName.getDb(), 
tblName.getTbl());
There's a "throw on error" argument to getTable(). Should that be used instead 
of the try/catch?


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@960
PS3, Line 960:       } catch (CatalogException e) {
             :         Preconditions.checkState(e instanceof 
DatabaseNotFoundException);
is this just

catch (DatabaseNotFoundException ignored) {
  // ...
}

i.e., if you get anything else, it'll rethrow via the Precondition.

Might be nice to have a comment about why, though.


http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@964
PS3, Line 964:       if (tbl == null) continue;
Is there a chance that not recording tbl==null causes extra round trips?

For example, let's say we resolved a view, and that view refers to table "x". 
getCandidateTables() might generate "y.x" which isn't loaded because it doesn't 
exist, and perhaps we already knew that, because "y.x" was generated in a 
previous pass as well.

I've not thought this all the way through, but seems like it's possible.



--
To view, visit http://gerrit.cloudera.org:8080/8958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55
Gerrit-Change-Number: 8958
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Feb 2018 20:20:05 +0000
Gerrit-HasComments: Yes

Reply via email to