[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has abandoned this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Abandoned Abandoning the review until I can work on it again. -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: Line 76: return Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident; > I think we can still make progress here, I'm not sure I completely follow y If I remember correctly, I encountered a problem with the com.cloudera.impala.analysis.TableName class. It stores the table name as a String, but the table name itself can be qualified in case of complex types, thus it can not be quoted by simply putting it between backticks. It also seems that com.cloudera.impala.analysis.InsertStmt and com.cloudera.impala.catalog.View constructors take a List parameter for columns and store it in the object. It seems to me that in order to properly quote these, we would need to refactor the code to use a data structure capable of representing hierarchical names unambiguously to pass the table and column references around and to store them. I thought that the ongoing effort you mentioned is about addressing this. -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: Line 76: return Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident; > Those are two separate issues. In your example, we are guarding against a l I added splitting because when I just put the whole string between backticks then Impala ended up escaping qualified identifiers of complex types incorrectly. I noticed that these references were already in a string form by the time getIdentSql gets called, so there is no way to properly quote them without refactoring to pass them in a structured format to the affected functions. The "Invalid column/field name" error message mislead me to believe that dots are not allowed in any identifier, which lead me to come up with my approach which would work if identifiers really couldn't contain dots. Since this turned out to be a false assumption, it seems that I have to discard this change as this task can only be implemented properly after IMPALA-2287 and probably some other refactorings. -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Alex Behm has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: Line 69: public static String getIdentSql(String ident) { > I agree that it's slighly less pleasing to the eye but I think correctness/ For SHOW CREATE TABLE and CREATE VIEW it's fine to quote all identifiers, but I don't think we should needlessly pollute error messages, e.g., in AnalysisExceptions. To the best of my knowledge, Hive does not quote identifiers in error reports, and neither should we. Line 76: return Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident; > Hmmm. Interesting. On the other hand: Those are two separate issues. In your example, we are guarding against a limitation in the Hive Metastore that disallows certain column names. Generally, the Hive Metastore is pretty restrictive. However, within an Impala query you can use more or less arbitrary identifiers, for better of for worse. There's an argument to be made that we should not allow that, but that's the state of the world today. Ambiguities are a reality that we cannot escape, even if we disallowed dots in quoted identifiers. If you are curious, you can look at AnalyzeStmts#TestSlotRefPathAmbiguity() and the other *Ambiguity tests to get a flavor. I still don't see how the specific problem we are discussing is not solvable. This function should accept a single identifier and quote it. It's up to the caller (who has more knowledge about the identifier) to pass in the right value. For the struct case there is an existing JIRA to change the behavior: IMPALA-2287 My basic question here is: Why did you add the splitting? -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: Line 69: public static String getIdentSql(String ident) { > So we are in agreement then? I agree that it's slighly less pleasing to the eye but I think correctness/unambiguity is more important than aesthetics. Also Hive does the same, so I think people are used to it. Line 76: return Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident; > select `a.b.c`.`x.y.z` from (select 1 `x.y.z`) `a.b.c` Hmmm. Interesting. On the other hand: > create view test as select `a.b.c`.`x.y.z` from (select 1 `x.y.z`) `a.b.c`; ERROR: AnalysisException: Invalid column/field name: x.y.z This is a serious problem, because it means that identifier names are ambigous. I checked that in case of a struct, functions get "column_name.field_name" as the column name. I haven't checked your example yet, but my guess is that in this case the column name parameter will become "x.y.z". Still, the former should be quoted as `column_name`.`field_name` and latter as `x.y.z`. It seems that it's too late to fix this problem by the time getIdentSql gets called. -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: Line 69: public static String getIdentSql(String ident) { > The main issue with this approach is that all invocations of toSql() will n Initially I was also worried that identifiers become "ugly" and considered making this behavior optional. I was thinking of a query option that the user can control using the SET statement. Then I checked how Hive works and I found that it already quotes every identifier (I checked the table and view definitions if I remember correctly), so I came to the conclusion that quoting everything should be okay. Of course I expected some suggestions to how it should work, that's why I didn't fix the styling issues in the test. Line 76: return Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident; > I don't think splitting here is the right fix. An identifier itself could c I checked that identifiers can't contain dots, or at least users can't create such identifiers. If you try specifying `a.b` as an identifier, Impala will complain that dots are not allowed in identifier names. Do you know of a code piece that gets around this mechanism and creates identifiers with dots in some other way? Qualified table or column references are exactly the reason why I only made this version public and the other one private, as only this is safe to use to quote table or column references. In case of complex types, dots can become a part of either the table reference or the column reference. In case of a struct, a column reference might look like column_name.field_name, which should be quoted as `column_name`.`field_name`. In case of a map or array, table references might look like table_name.map_or_array_name, which should be quoted `table_name`.`map_or_array_name`. This is my understanding based on the short amount of time I spent on Impala, so please correct me if I'm wrong. Line 88: if (ident.equals("*")) { > This doesn't seem right or necessary. First, "*" is not an identifier (it i Without this check a SELECT * became SELECT `*` which is invalid. -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Alex Behm has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java File fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java: Line 112: sb.append(" " + type_.toString()); toSql() Line 114: sb.append(" " + typeDef_.toString()); toSql() http://gerrit.cloudera.org:8080/#/c/4527/2/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: Line 69: public static String getIdentSql(String ident) { The main issue with this approach is that all invocations of toSql() will not have quoted identifiers. Unfortunately, we also use toSql() for error reporting. For example, take a look at AnalyticExpr.java or any other Expr. If analyze() fails we typically we print the offending expr with toSql(). Error messages will now become unnecessarily weird. I think the callers need to decide the quoting policy. We should have two versions of toSql(): 1. Only adds quotes if necessary for Impala (no need to consider Hive as before) 2. Always quote identifiers Imo, the default toSql() should have the first behavior, and we may add another version that always quotes identifiers. For example, we could have a setup like this: private String toSql(boolean quoteAllIdents); private String toSql() { return toSql(false); } You can imagine other variants. Happy to discuss. Line 76: return Joiner.on(".").join(getStandaloneIdentSqlList((Splitter.on('.').split(ident; I don't think splitting here is the right fix. An identifier itself could contain a dot, even if unlikely. Also, qualified table or column references consist of multiple identifiers and not a single identifier, so the premise of this fix is kind of wrong (based on your comment). Line 88: if (ident.equals("*")) { This doesn't seem right or necessary. First, "*" is not an identifier (it is a terminal), and second those clauses that could contain "*" should handle it specially in toSql(). For my understanding, what use case is this change addressing? -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has uploaded a new patch set (#2). Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. IMPALA-784: Use `-s in SHOW CREATE TABLE output Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 --- M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java M fe/src/main/java/com/cloudera/impala/analysis/DeleteStmt.java M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java M fe/src/main/java/com/cloudera/impala/analysis/TableName.java M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java M fe/src/main/java/com/cloudera/impala/analysis/UpdateStmt.java M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java 8 files changed, 339 insertions(+), 338 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4527/2 -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Tim Armstrong has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 1: (2 comments) This seems reasonable to me. I think we should get Alex Behm to confirm, since he was the one who originally wrote some of this code. How did you determine all the places that needed to be changed? We should also fix this for "SHOW CREATE FUNCTION" and any other similar statements that output SQL for consistency. The CREATE TABLE expected output is probably coming from .test files. E.g. testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test http://gerrit.cloudera.org:8080/#/c/4527/1/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: PS1, Line 72: reassamble reassemble Line 73: // This is safe, because periods are not allowed inside identifier names for Thanks for the explanation. -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. Patch Set 1: The functional tests still fail with false positives like - CREATE TABLE functional_kudu.dimtbl (id BIGINT, name STRING, zip INT) + CREATE TABLE `functional_kudu`.`dimtbl` (`id` BIGINT, `name` STRING, `zip` INT) I haven't fixed these yet, because I still have to figure out where the expected strings come from, but first I wanted to make sure that we agree on the approach to be taken. The same applies to the overly long lines in the unit test. That test is fixed functionally but not stylistically, as I didn't want to spend too much time on it until I get feedback about whether this is the right approach. Thanks, Zoltan -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan IvanfiGerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-784: Use `-s in SHOW CREATE TABLE output
Zoltan Ivanfi has uploaded a new change for review. http://gerrit.cloudera.org:8080/4527 Change subject: IMPALA-784: Use `-s in SHOW CREATE TABLE output .. IMPALA-784: Use `-s in SHOW CREATE TABLE output Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 --- M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java M fe/src/main/java/com/cloudera/impala/analysis/DeleteStmt.java M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java M fe/src/main/java/com/cloudera/impala/analysis/TableName.java M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java M fe/src/main/java/com/cloudera/impala/analysis/UpdateStmt.java M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java 8 files changed, 339 insertions(+), 338 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4527/1 -- To view, visit http://gerrit.cloudera.org:8080/4527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8e66439daa35d2e3725e7277c16f3cc5fdb6f128 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi