Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11942 )
Change subject: IMPALA-7801: Remove toSql() from ParseNode interface. ...................................................................... Patch Set 2: (6 comments) Thanks much for the improvement. Some general comments sprinkled among the files. http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@395 PS2, Line 395: String sql = toSql(DEFAULT); Let's think about the meaning of the options for expressions. Expr nodes won't have the state to provide un-rewritten SQL: a node is what it is, and does not know if it should be considered rewritten or not. The rewritten state is something that the outer non-expr node must provide. Expr nodes can decide whether to include or exclude implicit casts. This, then, raises and interesting issue. The ToSqlOptions enum is an enum: we get to pick one value. But, some of the options overlap: I may want to include implicit casts as one choice, and may want source or rewritten SQL on the other hand. I wonder, can we make this simpler somehow? http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1300 PS2, Line 1300: name, (printExpr) ? " '" + toSql(DEFAULT) + "'" : "", type_.toString())); This comment applies in a zillion other places. Our general rule for error messages should be that we show the user's original SQL without rewrites or implicit casts. We have hundreds of such usages and we have to check that new messages use the correct form. Can we standardize these somehow? One thought is to rename the option from DEFAULT to something like ORIGINAL or SOURCE. But, this means we still have to sprinkle this option across many error messages. This is, really, exposing implementation in too many places. As it turns out, ParseNode is an interface, with many direct decedents. Does it make sense to introduce a new base class, AbstractParseNode, that has a userSql() (or sourceSql()) method that we use in error messages? Recognizing that expressions are different than statements, maybe: ParseNode |- Expr (root of all expressions) |- AbstractStmt (root of all statements) The statement base would provide the source/rewritten options. Exprs only provide the implicit cast or not options. http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/ParseNode.java File fe/src/main/java/org/apache/impala/analysis/ParseNode.java: http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/ParseNode.java@33 PS2, Line 33: String toSql(ToSqlOptions options); A common practice in Java is to include the enum definition in the interface file if the enum is used only by the interface (and its implementations). http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@217 PS2, Line 217: "OFFSET requires an ORDER BY clause: " + limitElement_.toSql(DEFAULT).trim()); I wonder if we can solve a mess here with exceptions also. Rather than building message (very inconsistently) in each message, maybe have a builder: throw new AnalysisException .builder("OFFSET requires an ORDER BY clause") .expr(limitElement_) .build(); The builder would have methods for expressions (which will call the proper toSql version), for statements, for additional context, and so on. This would avoid the issue of exposing the magic toSql(DEFAULT) incantation in zillions of places. http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/planner/SortNode.java@44 PS2, Line 44: import static org.apache.impala.analysis.ToSqlOptions.DEFAULT; Two suggestions. 1. Rename this as ToSqlOption (or ToSqlFormat) since we can pass only one option. 2. Import the enum, not the static value. Reference it with the enum name: ToSqlFormat.DEFAULT (or, based on earlier suggestions, ToSqlFormat.SOURCE). http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/planner/SortNode.java@214 PS2, Line 214: output.append(info_.getSortExprs().get(i).toSql(DEFAULT) + " "); This is an example of the point made elsewhere. It SEEMS we are asking the expression to give the DEFAULT (source) SQL. But, an expression can't make that decision. if we want the source SQL, we'd have to have cached it. I'm working on a project to do just that in a more standardized form. (Though, the result is still pretty messy in order to minimize massive changes...) -- To view, visit http://gerrit.cloudera.org:8080/11942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17025901838e9ffd753894a8087170123f9d8b33 Gerrit-Change-Number: 11942 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Fri, 16 Nov 2018 21:56:53 +0000 Gerrit-HasComments: Yes