[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range as we are scanning Parquet.

Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Reviewed-on: http://gerrit.cloudera.org:8080/5343
Reviewed-by: Taras Bobrovytsky 
Tested-by: Internal Jenkins
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 141 insertions(+), 36 deletions(-)

Approvals:
  Taras Bobrovytsky: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 2: Code-Review+2

Carrying the +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5343/1/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 143:   /// the time falls into 00:00:00..23:59:59.9 range.
> Stale comment - we don't validate time.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Taras Bobrovytsky (Code Review)
Hello Internal Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5343

to look at the new patch set (#2).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range as we are scanning Parquet.

Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 141 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/5343/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 1: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/586/

-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5343/1/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 143:   /// the time falls into 00:00:00..23:59:59.9 range.
Stale comment - we don't validate time.


-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 1:

The previous review https://gerrit.cloudera.org/#/c/4968/ has been abandoned 
because I deleted a draft, which put the reivew in a weird state. (Gerrit bug)

-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5343

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range as we are scanning Parquet.

Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 142 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/5343/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4968/9/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 657: 
> Unnecessary blank line.
Done


http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 151: const static int64_t NUMBER_OF_NANOSECONDS_IN_24H =
> boost provides a direct way to construct the min and max date with 
Done


PS7, Line 160: g buffer. The size of the buffer should be a
> It looks like boost date_time doesn't support leap seconds: https://groups.
After discussing offline, we agreed that we will not implement validation of 
number of seconds since midnight in this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-12-02 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4968

to look at the new patch set (#10).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range and time falls into 00:00:00..23:59:59.9 range as we
are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 142 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4968/9/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 657: 
Unnecessary blank line.


http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 151: const static int64_t NUMBER_OF_NANOSECONDS_IN_24H =
boost provides a direct way to construct the min and max date with 


  boost::gregorian::date(boost::date_time::min_date_time)
  boost::gregorian::date(boost::date_time::max_date_time)


PS7, Line 160: g buffer. The size of the buffer should be a
> Yes, that's true, some days can have an extra second: https://en.wikipedia.
It looks like boost date_time doesn't support leap seconds: 
https://groups.google.com/forum/#!topic/boost-list/i_0h_amkk-c

"Leap second support never got added to the library -- most applications can 
ignore them.  In principle, of course, you can use the library to help you do 
the calculations.  Basically they are like leap years except that they are 
irregular -- so you have to use a collection to perform the adjustment instead 
of an algorithm."

I think we should allow for one leap second per day just so we don't regress 
anything where the data is potentially valid. It would be good to add some 
testing of this (maybe as a follow) to make sure we don't crash, even if we do 
return bogus results.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-29 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#9).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range and time falls into 00:00:00..23:59:59.9 range as we
are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 150 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-29 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4968

to look at the new patch set (#9).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range and time falls into 00:00:00..23:59:59.9 range as we
are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 150 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-29 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4968

to look at the new patch set (#9).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range and time falls into 00:00:00..23:59:59.9 range as we
are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 152 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-29 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4968

to look at the new patch set (#9).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range and time falls into 00:00:00..23:59:59.9 range as we
are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 152 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-29 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4968/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: ErrorMsg* msg = NULL;
> see comment below. this pointer is never freed.  what's the objection to pu
Ok, I agree, it's better to put the logic inside ValidateSlot().


Line 484:   // This point is reached if slot validation succeeds, but slot 
conversion fails.
> but don't we need to validate the converted value? i.e. the timestamp might
Convert slot should never return an invalid value. I tried it manually and it 
works correctly.


Line 603: *msg = new ErrorMsg(TErrorCode::PARQUET_TIMESTAMP_OUT_OF_RANGE,
> this memory is leaked.
This is not a problem any more.


http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

PS7, Line 160: g buffer. The size of the buffer should be a
> What I'm asking is: don't some days legitimately have more than NUMBER_OF_N
Yes, that's true, some days can have an extra second: 
https://en.wikipedia.org/wiki/Leap_second So far there hasn't been a case where 
a day has 2 leap seconds, but it's not impossible in the future.

If we do this, the way seconds are printed to screen would be incorrect, Impala 
would print "24:00:00" instead of "23:59:60".

Also, we would need to be able to cast string as timestamp that looks like 
"23:59:60" if we want to support leap seconds.

I'm not really sure what's the right solution here. Should we just increase the 
constant by 1?

By the way, even Oracle doesn't handle leap seconds properly (the number of 
seconds can't be greater that 59): 
http://stackoverflow.com/questions/31136211/how-to-handle-leap-seconds-in-oracle


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-28 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4968/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: ErrorMsg* msg = NULL;
see comment below. this pointer is never freed.  what's the objection to 
putting the logic inside ValidateSlot() so that the memory allocation can stay 
automatic?


Line 484:   // This point is reached if slot validation succeeds, but slot 
conversion fails.
but don't we need to validate the converted value? i.e. the timestamp might be 
within bounds before conversion, but not after. do we handle that?


Line 603: *msg = new ErrorMsg(TErrorCode::PARQUET_TIMESTAMP_OUT_OF_RANGE,
this memory is leaked.


http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

PS7, Line 160: g buffer. The size of the buffer should be a
> The result of a select statement would look like, for example:
What I'm asking is: don't some days legitimately have more than 
NUMBER_OF_NANOSECNODS_IN_24H when leap seconds are in play?


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-23 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: ErrorMsg msg;
> this has a non-trivial destructor, so we should probably only construct it 
I added some comments to make the code more clear.
It doesn't seem elegant to pass tuple into ValidateSlot, so I fixed the problem 
with the non-trivial destructor differently.


PS7, Line 474: val_ptr
> you'll also have to be careful that we pass the right pointer here, dependi
I think I'm passing it correctly in the latest patch.


http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 147:   /// Verifies that the timestamp date falls into a valid range 
(years 1400..).
> update this comment to be clear that we also validate time.
Done


PS7, Line 156: l
> doesn't this have to be "ll" (long long)?
Yes, that's true (it's 64 bits).


PS7, Line 160: time_.ticks() < NUMBER_OF_NANOSECONDS_IN_24H
> I'm slightly worried about how this interacts with leap seconds.  Are you s
The result of a select statement would look like, for example:
1995-05-05 54:35:31


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-23 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4968

to look at the new patch set (#8).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range and time falls into 00:00:00..23:59:59.9 range as we
are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
13 files changed, 154 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: ErrorMsg msg;
this has a non-trivial destructor, so we should probably only construct it 
after checking NeedsValidation().  Also, why are NeedsValidation() and 
NeedsConversion() mutually exclusive?  Actually, doesn't this break 
--convert_legacy_hive_parquet_utc_timestamps since we'll never take the 
conversion path for timestamp now? (Do we not have tests for that?)  That is, I 
think both NeedsConversion() and NeedsValidation() can be true, and the code 
needs to be adjusted for that.

Also consider tucking lines 475 to 480 inside ValidateSlot() and that way the 
ErrorMsg doesn't have to be passed back and forth, and it'll naturally keep the 
construction after the NeedsValidation() check.


PS7, Line 474: val_ptr
you'll also have to be careful that we pass the right pointer here, depending 
on how you address the comment above.


http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 147:   /// Verifies that the timestamp date falls into a valid range 
(years 1400..).
update this comment to be clear that we also validate time.


PS7, Line 156: l
doesn't this have to be "ll" (long long)?


PS7, Line 160: time_.ticks() < NUMBER_OF_NANOSECONDS_IN_24H
I'm slightly worried about how this interacts with leap seconds.  Are you sure 
this is always illegal?  If it doesn't crash, what behavior did you see (i.e. 
how do you know this is the correct bounds)?


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-18 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4968

to look at the new patch set (#7).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the date falls into 1400.. year
range and time falls into 00:00:00..23:59:59.9 range as we
are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
12 files changed, 149 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-18 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG
Commit Message:

PS6, Line 12: did't
> typo
Done


PS6, Line 14: date
> year*
Done


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 526: parent_->parse_status_ = Status(*msg);
> the claim is we are pulling this out to avoid status construction code, but
Good point, Done.


PS6, Line 598:  
> nit: extraneous space
Done


Line 599:   if (!src->IsValidDate()) {
> UNLIKELY
Done, (also changed the order of the if statement).


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS6, Line 512: 
> was this an incorrect comment?
I saw a comment in the implementation, so I didn't think it was true any more:
"// AssembleCollection() advances child readers, so we don't need to call 
NextLevels()"

I looked at the implementation, but it turns out AssembleCollection() actually 
calls NextLevels(). I'll put the comment back.


PS6, Line 449: void* slot
> shouldn't this be 'tuple'.  but the fact that this compiled probably means 
Yeah, I don't think this is used anywhere, so I'll delete it.


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 156: return date_.day_number() >= MIN_DAY_NUMBER && date_.day_number() 
<= MAX_DAY_NUMBER;
> are we sure these are the only invalidate TimestampValue's that may have be
Yes, it turns out there are (but they don't cause a crash). I added validation 
of time_ as well.


http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/bad_parquet_data/README
File testdata/bad_parquet_data/README:

PS6, Line 16: out-of-range-timestamp.parq
> not sure what you mean -- aren't you adding a parquet file to test this? co
I'm adding a file to testdata/data, this readme is in testdata/bad_parquet_data


http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
File 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test:

PS6, Line 7: Parquet file '$NAM
> why is the message repeated?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 526: parent_->parse_status_ = Status(*msg);
the claim is we are pulling this out to avoid status construction code, but 
didn't we already construct a status if we call this function (in 
LogOrReturnError())?


PS6, Line 598:  
nit: extraneous space


Line 599:   if (!src->IsValidDate()) {
UNLIKELY


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS6, Line 512: 
was this an incorrect comment?


PS6, Line 449: void* slot
shouldn't this be 'tuple'.  but the fact that this compiled probably means that 
this method isn't actually ever called or defined (i.e. the one in 
BaseScalarColumnReader), and so this declaration should just be deleted.


http://gerrit.cloudera.org:8080/#/c/4968/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 156: return date_.day_number() >= MIN_DAY_NUMBER && date_.day_number() 
<= MAX_DAY_NUMBER;
are we sure these are the only invalidate TimestampValue's that may have been 
copied in?  Are there any invalid time_ values?


http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/bad_parquet_data/README
File testdata/bad_parquet_data/README:

PS6, Line 16: out-of-range-timestamp.parq
> The parquet file is no longer in this folder, so I will remove this comment
not sure what you mean -- aren't you adding a parquet file to test this? could 
you just make whatever change you are proposing.


http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
File 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test:

PS6, Line 7: Parquet file '$NAM
why is the message repeated?


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG
Commit Message:

PS6, Line 14: date
year*


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-17 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG
Commit Message:

PS6, Line 12: did't
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/bad_parquet_data/README
File testdata/bad_parquet_data/README:

PS6, Line 16: out-of-range-timestamp.parq
The parquet file is no longer in this folder, so I will remove this comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 445:   /// Writes the next value into the next slot in the *tuple using 
pool if necessary.
> ... into the destination slot in tuple ...
Done


Line 480: } else if (UNLIKELY(NeedsConversion() &&
> Is it possible that an otherwise invalid value becomes valid after conversi
No, because the input to conversion must be a valid date.


http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 512:   /// the next slot in *tuple.
> ... into the appropriate destination slot in 'tuple'.
Done


http://gerrit.cloudera.org:8080/#/c/4968/5/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
File 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test:

Line 5: DROP TABLE IF EXISTS invalid_parquet_timestamp;
> setting up the table is already done in the .py file, right?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we did't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the timestamp falls into 1400.. date
range as we are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/data/out_of_range_timestamp.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M tests/query_test/test_scanners.py
9 files changed, 133 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 5:

(4 comments)

Almost ready for +1

http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 445:   /// Writes the next value into the next slot in the *tuple using 
pool if necessary.
... into the destination slot in tuple ...

(it's not the "next" slot, it is always the same slot for a particular column 
reader)


Line 480: } else if (UNLIKELY(NeedsConversion() &&
Is it possible that an otherwise invalid value becomes valid after conversion?


http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 512:   /// the next slot in *tuple.
... into the appropriate destination slot in 'tuple'.

(it's not the "next" slot, it is always the same slot for a particular column 
reader)


http://gerrit.cloudera.org:8080/#/c/4968/5/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
File 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test:

Line 5: DROP TABLE IF EXISTS invalid_parquet_timestamp;
setting up the table is already done in the .py file, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-15 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we did't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the timestamp falls into 1400.. date
range as we are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/data/out_of_range_timestamp.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M tests/query_test/test_scanners.py
9 files changed, 136 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-15 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 77: 
> I'll remove the empty lines I accidentally added here and in other files.
Done


Line 233: // TODO: consider adding validation for other types, such as 
DECIMAL.
> Let's file a JIRA instead. You don't need to identify whether there is a bu
Done


Line 388:   ReadSlot(tuple, pool);
> single line if it fits
Done


Line 475: if (NeedsValidation() && 
!ValidateSlot(reinterpret_cast(slot))) {
> The API seems a little clunky because we call a generic ValidateSlot() but 
Added ErrorMsg* to ValidateSlot.


Line 503: DCHECK(!needs_conversion_);
> DCHECK(false)?
This DCHECK should be removed completely.


Line 513:   /// Sets the slot to NULL if the slot value is invalid, e.g., due 
to being
> Update comment, doesn't set anything
Done


Line 515:   bool ValidateSlot(T* src) {
> const function
Done


Line 520:   /// Pull out slow-path Status construction code from 
ReadRepetitionLevel()/
> remove the last part "from ReadRepetitionLevel..." I don't think that's acc
Done


Line 1084:   void* slot = tuple->GetSlot(tuple_offset_);
> just inline into L1087
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 511:   /// Recursively reads from children_ to assemble a single 
CollectionValue into
> Update comment to reflect API change
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 22: 
> remove blank lines
Done


Line 150:   inline bool IsValidDate() const {
> nice
:)


http://gerrit.cloudera.org:8080/#/c/4968/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 312:"The valid date range is 1400..."),
> mention the full range with year-month-day to make it clearer
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 382:   hadoop fs -put -f 
${IMPALA_HOME}/testdata/bad_parquet_data/out-of-range-timestamp.parq \
> Let's avoid doing this and instead create a test table on-the-fly in the te
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 248:   def test_zero_rows(self, vector, unique_database):
> it would be nice to have a new test test_timestamp_out_of_range that looks 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 4:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 233: // TODO: consider adding validation for other types, such as 
DECIMAL.
Let's file a JIRA instead. You don't need to identify whether there is a bug or 
not, the JIRA can say that we should investigate the DECIMAL case (and link to 
this JIRA).


Line 388:   ReadSlot(tuple, pool);
single line if it fits


Line 475: if (NeedsValidation() && 
!ValidateSlot(reinterpret_cast(slot))) {
The API seems a little clunky because we call a generic ValidateSlot() but then 
set a type-specific error message. How would you generalize this for DECIMAL?

Imo it'll make the code easier to follow without baking in the TIMESTAMP 
assumption here.

Maybe ValidateSlot() could take an ErrorMsg* as output param which would be set 
when returning false. Or perhaps you have another idea.


Line 503: DCHECK(!needs_conversion_);
DCHECK(false)?


Line 513:   /// Sets the slot to NULL if the slot value is invalid, e.g., due 
to being
Update comment, doesn't set anything


Line 515:   bool ValidateSlot(T* src) {
const function


Line 520:   /// Pull out slow-path Status construction code from 
ReadRepetitionLevel()/
remove the last part "from ReadRepetitionLevel..." I don't think that's 
accurate anymore


Line 1084:   void* slot = tuple->GetSlot(tuple_offset_);
just inline into L1087


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 511:   /// Recursively reads from children_ to assemble a single 
CollectionValue into
Update comment to reflect API change


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 22: 
remove blank lines


Line 150:   inline bool IsValidDate() const {
nice


http://gerrit.cloudera.org:8080/#/c/4968/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 312:"The valid date range is 1400..."),
mention the full range with year-month-day to make it clearer


http://gerrit.cloudera.org:8080/#/c/4968/4/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 382:   hadoop fs -put -f 
${IMPALA_HOME}/testdata/bad_parquet_data/out-of-range-timestamp.parq \
Let's avoid doing this and instead create a test table on-the-fly in the test. 
There's no need to store this in the snapshot and having the loading separate 
from the actual test can cause confusion.


http://gerrit.cloudera.org:8080/#/c/4968/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 248:   def test_zero_rows(self, vector, unique_database):
it would be nice to have a new test test_timestamp_out_of_range that looks like 
this one

the test_corrput_files doesn't really follow best practices imo, and let's not 
fix that in this patch, so adding a new test seems easiest/cleanest

also it's arguable whether these bad timestamp files are really corrupt, the 
values are certainly legal INT96, but the interpretation as a timestamp is the 
problem


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-10 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 44: #from tests.util.get_parquet_metadata import get_parquet_metadata
I'll remove this.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-10 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 77: 
I'll remove the empty lines I accidentally added here and in other files.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-10 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we did't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the timestamp falls into 1400.. date
range as we are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/bad_parquet_data/out-of-range-timestamp.parq
M testdata/bin/create-load-data.sh
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
M tests/query_test/test_scanners.py
10 files changed, 120 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky