[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.
Lars Volker has posted comments on this change.

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 9: Having the repetition level set to REPEATED on the root schema
> Well, IMPALA-4826 contains a link to the Parquet Jira (that explains why th
My reasoning was that it makes the commit message more self-contained. When reading it, I wondered why we decided to ignore the field in the root schema. The explanation is that the Parquet format says that those fields should not be set.

http://gerrit.cloudera.org:8080/#/c/7870/2//COMMIT_MSG
Commit Message:

PS2, Line 7: wrong scan result
Here it says wrong result, but below it says that the queries failed. Can you make both consistent?

--
To view, visit http://gerrit.cloudera.org:8080/7870
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

Patch Set 2: Code-Review+1

Gerrit-MessageType: comment Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.
Gabor Kaszab has posted comments on this change.

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.
> nit: Colon after JIRA
Done

Line 9: Having the repetition level set to REPEATED on the root schema
> Yeah it would be good to reference the PARQUET JIRA.
Well, IMPALA-4826 contains a link to the Parquet Jira (that explains why this should be ignored). I don't see the point of adding a reference to the commit message as well.

http://gerrit.cloudera.org:8080/#/c/7870/1/testdata/data/README
File testdata/data/README:

Line 111: Generated by hacking Impala's Parquet writer.
> If it's just a one or two line change we could include the diff inline here
Done: Added the repro steps here.

Gerrit-MessageType: comment Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.
Gabor Kaszab has uploaded a new patch set (#2).

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

Having the repetition level set to REPEATED on the root schema caused a scan to fail with an error when Impala tried to parse that table.
As a solution, the 'REPEATED' repetition level is ignored when the root schema is processed.

Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
---
M be/src/exec/parquet-metadata-utils.cc
M testdata/data/README
A testdata/data/repeated_root_schema.parquet
M tests/query_test/test_scanners.py
4 files changed, 27 insertions(+), 1 deletion(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/7870/2

Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong
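To make the fix described in the commit message concrete, here is a minimal C++ sketch. It is not the code in be/src/exec/parquet-metadata-utils.cc. The types and functions (`Repetition`, `SchemaElement`, `LeafLevels`, `Walk`, `ComputeLevels`) are hypothetical, simplified stand-ins for the Thrift-generated Parquet types. The sketch walks a Parquet schema flattened in depth-first order and derives each leaf column's maximum definition and repetition levels. It skips the root element's repetition type, since the Parquet format says the root should not carry one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Thrift-generated Parquet types.
enum class Repetition { REQUIRED, OPTIONAL, REPEATED };

struct SchemaElement {
  std::string name;
  Repetition repetition;
  int num_children;  // 0 for leaf columns
};

struct LeafLevels {
  std::string name;
  int max_def_level;
  int max_rep_level;
};

// Recursively consumes the subtree starting at schema[*idx] (depth-first order)
// and records the max definition/repetition level of every leaf it reaches.
static void Walk(const std::vector<SchemaElement>& schema, std::size_t* idx,
                 int def, int rep, bool is_root, std::vector<LeafLevels>* out) {
  assert(*idx < schema.size() && "truncated schema");
  const SchemaElement& e = schema[(*idx)++];
  // Only non-root elements contribute levels: the root's repetition type, even
  // if a writer set it to REPEATED, is ignored.
  if (!is_root) {
    if (e.repetition == Repetition::OPTIONAL) {
      ++def;
    } else if (e.repetition == Repetition::REPEATED) {
      ++def;
      ++rep;
    }
  }
  if (!is_root && e.num_children == 0) {
    out->push_back({e.name, def, rep});
    return;
  }
  for (int i = 0; i < e.num_children; ++i) {
    Walk(schema, idx, def, rep, /*is_root=*/false, out);
  }
}

// Element 0 of a flattened Parquet schema is always the root.
std::vector<LeafLevels> ComputeLevels(const std::vector<SchemaElement>& schema) {
  std::vector<LeafLevels> out;
  std::size_t idx = 0;
  Walk(schema, &idx, 0, 0, /*is_root=*/true, &out);
  return out;
}
```

For example, take a schema whose root is marked REPEATED, with a REQUIRED leaf `id` and an OPTIONAL leaf `name`. It yields a max repetition level of 0 for both leaves. Without the root check, each leaf would inherit a spurious repetition level of 1.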
[Impala-ASF-CR] IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 9: Having the repetition level set to REPEATED on the root schema
> Can you explain why it should be ignored, i.e. include a reference to the p
Yeah it would be good to reference the PARQUET JIRA.

http://gerrit.cloudera.org:8080/#/c/7870/1/testdata/data/README
File testdata/data/README:

Line 111: Generated by hacking Impala's Parquet writer.
> I'm not very happy with us collecting more and more specially crafted files
If it's just a one or two line change we could include the diff inline here.

Gerrit-MessageType: comment Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.
Lars Volker has posted comments on this change.

Change subject: IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

Patch Set 1:

(3 comments)

Looks good to me, only minor comments.

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.
nit: Colon after JIRA

Line 9: Having the repetition level set to REPEATED on the root schema
Can you explain why it should be ignored, i.e. include a reference to the Parquet format?

http://gerrit.cloudera.org:8080/#/c/7870/1/testdata/data/README
File testdata/data/README:

Line 111: Generated by hacking Impala's Parquet writer.
I'm not very happy with us collecting more and more specially crafted files without a way to repro them. Can you push the hack somewhere, e.g. your public GitHub, and mention the link here? That way we have a chance of preserving the information. What do others think?

Gerrit-MessageType: comment Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.
Gabor Kaszab has uploaded a new change for review.

http://gerrit.cloudera.org:8080/7870

Change subject: IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

Having the repetition level set to REPEATED on the root schema caused a scan to fail with an error when Impala tried to parse that table.
As a solution, the 'REPEATED' repetition level is ignored when the root schema is processed.

Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
---
M be/src/exec/parquet-metadata-utils.cc
M testdata/data/README
A testdata/data/repeated_root_schema.parquet
M tests/query_test/test_scanners.py
4 files changed, 22 insertions(+), 1 deletion(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/7870/1

Gerrit-MessageType: newchange Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab