[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

2017-08-30 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in 
Parquet.
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 9: Having the repetition level set to REPEATED on the root schema
> Well, IMPALA-4826 contains a link to the Parquet Jira (that explains why th
My reasoning was that it makes the commit message more self-contained. When
reading it, I wondered why we decided to ignore the field in the root schema.
The explanation is that the Parquet format says that those fields should not be
set.


http://gerrit.cloudera.org:8080/#/c/7870/2//COMMIT_MSG
Commit Message:

PS2, Line 7: wrong scan result
Here it says wrong result, but below it says that the queries failed. Can you 
make both consistent?


-- 
To view, visit http://gerrit.cloudera.org:8080/7870
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

2017-08-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in 
Parquet.
..


Patch Set 2: Code-Review+1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

2017-08-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change.

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in 
Parquet.
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.
> nit: Colon after JIRA
Done


Line 9: Having the repetition level set to REPEATED on the root schema
> Yeah it would be good to reference the PARQUET JIRA.
Well, IMPALA-4826 contains a link to the Parquet Jira (that explains why this 
should be ignored). I don't see the point of adding a reference to the commit 
msg as well.


http://gerrit.cloudera.org:8080/#/c/7870/1/testdata/data/README
File testdata/data/README:

Line 111: Generated by hacking Impala's Parquet writer.
> If it's just a one or two line change we could include the diff inline here
Done: Added the repro steps here.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

2017-08-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded a new patch set (#2).

Change subject: IMPALA-4826: Fix wrong scan result on repeated root schema in 
Parquet.
..

IMPALA-4826: Fix wrong scan result on repeated root schema in Parquet.

Having the repetition level set to REPEATED on the root schema
caused the scan to fail with an error when Impala tried to parse that table.

As a solution, the 'REPEATED' repetition level is ignored when the
root schema is processed.

Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
---
M be/src/exec/parquet-metadata-utils.cc
M testdata/data/README
A testdata/data/repeated_root_schema.parquet
M tests/query_test/test_scanners.py
4 files changed, 27 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/7870/2
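The commit message only says that the root's REPEATED repetition type is ignored. Below is a minimal C++ sketch of that idea; it is not the code in be/src/exec/parquet-metadata-utils.cc. SchemaElement, FieldRepetitionType, SchemaNode, BuildNode and BuildSchemaTree are hypothetical, simplified stand-ins for the Thrift-generated Parquet metadata types and Impala's schema resolution. The sketch assumes the usual Dremel rule: an OPTIONAL field adds one definition level, and a REPEATED field adds one definition level and one repetition level.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mirror of parquet-format's FieldRepetitionType Thrift enum.
enum class FieldRepetitionType { REQUIRED = 0, OPTIONAL = 1, REPEATED = 2 };

// Hypothetical, simplified stand-in for a Thrift SchemaElement. The file
// footer stores the schema as a flat, depth-first list of these.
struct SchemaElement {
  std::string name;
  FieldRepetitionType repetition_type;
  int num_children;  // 0 for leaf columns
};

// A node of the reconstructed schema tree, annotated with its maximum
// definition and repetition levels.
struct SchemaNode {
  std::string name;
  int max_def_level;
  int max_rep_level;
  std::vector<SchemaNode> children;
};

namespace {

// Consumes elems[*idx] and its descendants in depth-first order.
SchemaNode BuildNode(const std::vector<SchemaElement>& elems, std::size_t* idx,
                     int def_level, int rep_level, bool is_root) {
  if (*idx >= elems.size()) throw std::out_of_range("schema list truncated");
  const SchemaElement& e = elems[(*idx)++];
  assert(e.num_children >= 0);
  // The idea behind the fix: the root's repetition type is ignored, so a root
  // marked REPEATED adds neither a definition nor a repetition level.
  if (!is_root) {
    if (e.repetition_type == FieldRepetitionType::OPTIONAL) {
      ++def_level;
    } else if (e.repetition_type == FieldRepetitionType::REPEATED) {
      ++def_level;  // a repeated field may be empty, so it is "definable"
      ++rep_level;
    }
  }
  SchemaNode node{e.name, def_level, rep_level, {}};
  for (int i = 0; i < e.num_children; ++i) {
    node.children.push_back(BuildNode(elems, idx, def_level, rep_level, false));
  }
  return node;
}

}  // namespace

// Builds the schema tree from the flat element list; elems[0] is the root.
SchemaNode BuildSchemaTree(const std::vector<SchemaElement>& elems) {
  std::size_t idx = 0;
  SchemaNode root = BuildNode(elems, &idx, 0, 0, /*is_root=*/true);
  if (idx != elems.size()) throw std::invalid_argument("trailing schema elements");
  return root;
}
```

With this rule, a file whose root is marked REPEATED and has a REQUIRED child `id` gives `id` a max definition level and max repetition level of 0. If the sketch honored the root's REPEATED instead, every column would get one extra definition level and one extra repetition level, as if the whole file were a single nested collection.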
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

2017-08-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4826 Fix wrong scan result on repeated root schema in 
Parquet.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 9: Having the repetition level set to REPEATED on the root schema
> Can you explain why it should be ignored, i.e. include a reference to the p
Yeah it would be good to reference the PARQUET JIRA.


http://gerrit.cloudera.org:8080/#/c/7870/1/testdata/data/README
File testdata/data/README:

Line 111: Generated by hacking Impala's Parquet writer.
> I'm not very happy with us collecting more and more specially crafted files
If it's just a one or two line change we could include the diff inline here.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

2017-08-29 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4826 Fix wrong scan result on repeated root schema in 
Parquet.
..


Patch Set 1:

(3 comments)

Looks good to me, only minor comments.

http://gerrit.cloudera.org:8080/#/c/7870/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.
nit: Colon after JIRA


Line 9: Having the repetition level set to REPEATED on the root schema
Can you explain why it should be ignored, i.e. include a reference to the 
parquet format?


http://gerrit.cloudera.org:8080/#/c/7870/1/testdata/data/README
File testdata/data/README:

Line 111: Generated by hacking Impala's Parquet writer.
I'm not very happy with us collecting more and more specially crafted files
without a way to repro them. Can you push the hack somewhere, e.g. your
public GitHub, and mention the link here? That way we have a chance of
preserving the information.

What do others think?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

2017-08-29 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7870

Change subject: IMPALA-4826 Fix wrong scan result on repeated root schema in 
Parquet.
..

IMPALA-4826 Fix wrong scan result on repeated root schema in Parquet.

Having the repetition level set to REPEATED on the root schema
caused the scan to fail with an error when Impala tried to parse that table.

As a solution, the 'REPEATED' repetition level is ignored when the
root schema is processed.

Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
---
M be/src/exec/parquet-metadata-utils.cc
M testdata/data/README
A testdata/data/repeated_root_schema.parquet
M tests/query_test/test_scanners.py
4 files changed, 22 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/7870/1
-- 

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7ea84589e1d122ad9d43adde46893ec0ecc5f9c4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab