Taras Bobrovytsky has posted comments on this change.

Change subject: Add nested testdata flattener
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5787/1//COMMIT_MSG
Commit Message:

Line 22: 
> How was this tested?
Done


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/README
File testdata/TableFlattener/README:

Line 3: 
> It would be useful to explain also how nested data types (array, struct, ma
Added a brief description.


Line 11: 
> Mention that testdata/bin/generate-load-nested.sh uses this.
Done


PS1, Line 12: There are various options to specify the type of input file but 
the output is always
            : parquet/snappy.
> You could mention that the tool tries to use file extension to infer the in
Done


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/pom.xml
File testdata/TableFlattener/pom.xml:

Line 1: <?xml version="1.0" encoding="UTF-8"?>
> Would Maven permit this whole tree be moved to testdata/src/main/java/org/a
Possibly, but I think that should be done after this is checked in.


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java
File 
testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java:

PS1, Line 93: "value"
> Why not item for array?
This should not be modified because the query generator relies on this.


PS1, Line 96: "idx"
> Why not pos?
As mentioned above, the query generator relies on these the way there are now. 
A future patch can modify both this and the query generator.


PS1, Line 103: "_values";
> This will lead to "foo__values" (with 2 underscores). Is that intended?
Yes, this is intended, the query generator relies on this. This could be 
modified as together with the query generator in the future.


PS1, Line 120: "_"
> Shouldn't this use getNameSeparator() ?
Done


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java
File 
testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java:

PS1, Line 51: dataset
> It would read a bit clearer to call this dstDataSet.
Done


Line 60:     Preconditions.checkState(srcSchema.getType() == Type.RECORD);
> It would be nice to have a comment on this method describing what it does a
Let's do this as part of a separate patch. I would like to just get this code 
in because it works.


PS1, Line 65: field.schema()
> Why not fieldSchema?
Done


Line 85:     // Ensure that the parent schema has an id field so the child can 
reference the
> Same with this method: add brief description of method and parameter meanin
Let's do this in a separate patch.


PS1, Line 93:     LinkedList<Field> fields = new LinkedList<>();
> Is it correct to be instantiating fields this way, or should you be using t
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to