Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12221 )

Change subject: [PROTOTYPE] IMPALA-5872: Test case builder for query planner
......................................................................


Patch Set 1:

(8 comments)

I'll include a stand-alone planner only Java test in PS2.

> Is there a way to include the query itself in the test case? That eliminates 
> the question about "did I get the correct data for the query or query for the 
> data?"

It is already there. Included an example output in comments on sql-parser.cup

http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup@282
PS1, Line 282:   KW_ENCODING, KW_END, KW_ESCAPED, KW_EXISTS, KW_EXPLAIN, 
KW_EXPORT, KW_EXTENDED,
> Will introducing a keyword using a common term cause existing queries to fa
I redid the syntax to "copy testcase [to|from]" per Greg's suggestion. But yes, 
I see your point. Anytime we add new keywords we are at a risk of creating new 
incompatibility issues. I think we should document it and suggest the users to 
quote them going forward.


http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup@773
PS1, Line 773:   KW_EXPORT KW_TESTCASE KW_INTO KW_OUTFILE STRING_LITERAL:path 
query_stmt:query
> Insert a keyword before the query to allow future revisions? Eliminate OUTF
Redid the grammar to copy testcase...(eliminates the outfile keyword). Also 
what's the advantage in doing it the json way? Its pretty simple to add but I 
don't see the usecase.


http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/cup/sql-parser.cup@782
PS1, Line 782:     RESULT = new LoadTestCaseStmt(new HdfsUri(path));
> What does LOAD mean? Is this a load-and-plan? Suppose I want to do an EXPLA
It just loads the metadata. The query_stmt is already collected. Sample output 
from a load looks like this.

```
[localhost:21000] default> copy testcase from 
'hdfs://localhost:20500/tmp/impala-testcase-data-354469c8-1371-4b4e-8978-943018088553';
Query: copy testcase from 
'hdfs://localhost:20500/tmp/impala-testcase-data-354469c8-1371-4b4e-8978-943018088553'
.........
+----------------------------------------------------------------------------------------------------------------+
| summary                                                                       
                                 |
+----------------------------------------------------------------------------------------------------------------+
| Testcase generated using Impala version 3.2.0-SNAPSHOT. 1 db(s), 2 table(s) 
and 2 view(s) imported for query:  |
|                                                                               
                                 |
| SELECT v1.a FROM foo.v1 INNER JOIN foo.v2 ON v1.a = v2.a                      
                                 |
+----------------------------------------------------------------------------------------------------------------+```


http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java
File fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java:

http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@86
PS1, Line 86:       table.getLock().lock();
> Is the lock needed? None of our other table-related code in the analyzer or
That is for Table.toThrift(). So far we had callers for it only in the Catalog 
code, hence you don't see this pattern in the analysis code. We had some issues 
in the past with multiple threads trying to serialize the table structure 
without the lock and that caused some weird data corruption issues. So one of 
the fixes added an assert which checks that the current thread holds a lock. 
It's annoying but I don't think we can avoid it. (This is another side-effect 
of Catalog and analysis sharing Catalog object implementations)


http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@94
PS1, Line 94:       if (!result.getDbs().contains(thriftDb)) {
> Does this work? contains() uses the 'equals()' method, which, presumably, i
Done


http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@99
PS1, Line 99:     Set<FeView> allReferencedViews = new HashSet<>();
> This will also be a value set, which means we will compare inline views fie
Guava to the rescue, it has a newIdentityHashSet() which is a wrapper around 
IdentityMap.


http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@101
PS1, Line 101:     for (FeView feView: allReferencedViews) {
> Set iteration has no defined order. Run the same operation twice and the ex
sorted by fullname for deterministic behavior.


http://gerrit.cloudera.org:8080/#/c/12221/1/fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java@121
PS1, Line 121:         outputPath_.getPath(), testOutputFilePrefix + 
UUID.randomUUID().toString());
> This appends a UUID. The user gave a name. How will the user know the actua
Actually, user just gives an "input directory" where Impala writes the tescase 
and outputs the file name as query response. I thought this would be a better 
user experience since finding a non-existing file name and then running the 
query is like two steps instead of the current approach, which is just one 
step. A sample output with the patch looks like this. Lemme know if you 
disagree.

```
[localhost:21000] foo> copy testcase to 'hdfs:///tmp' select v1.a from v1 join 
v2 on v1.a=v2.a;
Query: copy testcase to 'hdfs:///tmp' select v1.a from v1 join v2 on v1.a=v2.a
+--------------------------------------------------------------------------------------+
| Test case data output path                                                    
       |
+--------------------------------------------------------------------------------------+
| 
hdfs://localhost:20500/tmp/impala-testcase-data-354469c8-1371-4b4e-8978-943018088553
 |
+--------------------------------------------------------------------------------------+
Fetched 1 row(s) in 3.79s
```



--
To view, visit http://gerrit.cloudera.org:8080/12221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
Gerrit-Change-Number: 12221
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Balazs Jeszenszky <jes...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Jan 2019 23:09:48 +0000
Gerrit-HasComments: Yes

Reply via email to