Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12221 )
Change subject: [PROTOTYPE] IMPALA-5872: Test case builder for query planner ...................................................................... Patch Set 1: (8 comments) I'll include a stand-alone planner only Java test in PS2. > Is there a way to include the query itself in the test case? That eliminates > the question about "did I get the correct data for the query or query for the > data?" It is already there. Included an example output in comments on sql-parser.cup http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup@282 PS1, Line 282: KW_ENCODING, KW_END, KW_ESCAPED, KW_EXISTS, KW_EXPLAIN, KW_EXPORT, KW_EXTENDED, > Will introducing a keyword using a common term cause existing queries to fa I redid the syntax to "copy testcase [to|from]" per Greg's suggestion. But yes, I see your point. Anytime we add new keywords we are at a risk of creating new incompatibility issues. I think we should document it and suggest the users to quote them going forward. http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup@773 PS1, Line 773: KW_EXPORT KW_TESTCASE KW_INTO KW_OUTFILE STRING_LITERAL:path query_stmt:query > Insert a keyword before the query to allow future revisions? Eliminate OUTF Redid the grammar to copy testcase...(eliminates the outfile keyword). Also what's the advantage in doing it the json way? Its pretty simple to add but I don't see the usecase. http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup@782 PS1, Line 782: RESULT = new LoadTestCaseStmt(new HdfsUri(path)); > What does LOAD mean? Is this a load-and-plan? Suppose I want to do an EXPLA It just loads the metadata. The query_stmt is already collected. Sample output from a load looks like this. ``` [localhost:21000] default> copy testcase from 'hdfs://localhost:20500/tmp/impala-testcase-data-354469c8-1371-4b4e-8978-943018088553'; Query: copy testcase from 'hdfs://localhost:20500/tmp/impala-testcase-data-354469c8-1371-4b4e-8978-943018088553' ......... +----------------------------------------------------------------------------------------------------------------+ | summary | +----------------------------------------------------------------------------------------------------------------+ | Testcase generated using Impala version 3.2.0-SNAPSHOT. 1 db(s), 2 table(s) and 2 view(s) imported for query: | | | | SELECT v1.a FROM foo.v1 INNER JOIN foo.v2 ON v1.a = v2.a | +----------------------------------------------------------------------------------------------------------------+``` http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java File fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java: http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@86 PS1, Line 86: table.getLock().lock(); > Is the lock needed? None of our other table-related code in the analyzer or That is for Table.toThrift(). So far we had callers for it only in the Catalog code, hence you don't see this pattern in the analysis code. We had some issues in the past with multiple threads trying to serialize the table structure without the lock and that caused some weird data corruption issues. So one of the fixes added an assert which checks that the current thread holds a lock. It's annoying but I don't think we can avoid it. (This is another side-effect of Catalog and analysis sharing Catalog object implementations) http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@94 PS1, Line 94: if (!result.getDbs().contains(thriftDb)) { > Does this work? contains() uses the 'equals()' method, which, presumably, i Done http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@99 PS1, Line 99: Set<FeView> allReferencedViews = new HashSet<>(); > This will also be a value set, which means we will compare inline views fie Guava to the rescue, it has a newIdentityHashSet() which is a wrapper around IdentityMap. http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@101 PS1, Line 101: for (FeView feView: allReferencedViews) { > Set iteration has no defined order. Run the same operation twice and the ex sorted by fullname for deterministic behavior. http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@121 PS1, Line 121: outputPath_.getPath(), testOutputFilePrefix + UUID.randomUUID().toString()); > This appends a UUID. The user gave a name. How will the user know the actua Actually, user just gives an "input directory" where Impala writes the tescase and outputs the file name as query response. I thought this would be a better user experience since finding a non-existing file name and then running the query is like two steps instead of the current approach, which is just one step. A sample output with the patch looks like this. Lemme know if you disagree. ``` [localhost:21000] foo> copy testcase to 'hdfs:///tmp' select v1.a from v1 join v2 on v1.a=v2.a; Query: copy testcase to 'hdfs:///tmp' select v1.a from v1 join v2 on v1.a=v2.a +--------------------------------------------------------------------------------------+ | Test case data output path | +--------------------------------------------------------------------------------------+ | hdfs://localhost:20500/tmp/impala-testcase-data-354469c8-1371-4b4e-8978-943018088553 | +--------------------------------------------------------------------------------------+ Fetched 1 row(s) in 3.79s ``` -- To view, visit http://gerrit.cloudera.org:8080/12221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52 Gerrit-Change-Number: 12221 Gerrit-PatchSet: 1 Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Balazs Jeszenszky <jes...@gmail.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com> Gerrit-Comment-Date: Tue, 22 Jan 2019 23:09:48 +0000 Gerrit-HasComments: Yes