Taras Bobrovytsky has posted comments on this change. Change subject: Add nested testdata flattener ......................................................................
Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/5787/1//COMMIT_MSG Commit Message: Line 22: > How was this tested? Done http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/README File testdata/TableFlattener/README: Line 3: > It would be useful to explain also how nested data types (array, struct, ma Added a brief description. Line 11: > Mention that testdata/bin/generate-load-nested.sh uses this. Done PS1, Line 12: There are various options to specify the type of input file but the output is always : parquet/snappy. > You could mention that the tool tries to use file extension to infer the in Done http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/pom.xml File testdata/TableFlattener/pom.xml: Line 1: <?xml version="1.0" encoding="UTF-8"?> > Would Maven permit this whole tree be moved to testdata/src/main/java/org/a Possibly, but I think that should be done after this is checked in. http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java File testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java: PS1, Line 93: "value" > Why not item for array? This should not be modified because the query generator relies on this. PS1, Line 96: "idx" > Why not pos? As mentioned above, the query generator relies on these the way there are now. A future patch can modify both this and the query generator. PS1, Line 103: "_values"; > This will lead to "foo__values" (with 2 underscores). Is that intended? Yes, this is intended, the query generator relies on this. This could be modified as together with the query generator in the future. PS1, Line 120: "_" > Shouldn't this use getNameSeparator() ? Done http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java File testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java: PS1, Line 51: dataset > It would read a bit clearer to call this dstDataSet. Done Line 60: Preconditions.checkState(srcSchema.getType() == Type.RECORD); > It would be nice to have a comment on this method describing what it does a Let's do this as part of a separate patch. I would like to just get this code in because it works. PS1, Line 65: field.schema() > Why not fieldSchema? Done Line 85: // Ensure that the parent schema has an id field so the child can reference the > Same with this method: add brief description of method and parameter meanin Let's do this in a separate patch. PS1, Line 93: LinkedList<Field> fields = new LinkedList<>(); > Is it correct to be instantiating fields this way, or should you be using t Done -- To view, visit http://gerrit.cloudera.org:8080/5787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Michael Brown <mi...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-HasComments: Yes