[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Reviewed-on: http://gerrit.cloudera.org:8080/7954 Reviewed-by: Lars VolkerTested-by: Impala Public Jenkins --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M testdata/workloads/functional-query/queries/QueryTest/exprs.test 8 files changed, 146 insertions(+), 10 deletions(-) Approvals: Lars Volker: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 18 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Sat, 21 Oct 2017 03:13:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 20 Oct 2017 23:13:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1360/ -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 20 Oct 2017 23:13:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/7954/17/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/7954/17/be/src/exprs/expr-test.cc@5005 PS17, Line 5005: // Timestamp conversions of "dateless" times should return null (and not crash, This part is from different commit that was applied because of the rebase. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 20 Oct 2017 12:40:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Alex Behm, Dan Hecht, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#16). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M testdata/workloads/functional-query/queries/QueryTest/exprs.test 8 files changed, 146 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/16 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 16 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 15: > Csaba, are those failing tests specifically require a timestamp > that issues such a warning? If not, I suggest we just change the > timestamps of those tests in a way that preserves test coverage and > avoids the warnings problem. Yes, those tests work with "non timestamp" strings by design. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 18 Oct 2017 14:28:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 15: Csaba, are those failing tests specifically require a timestamp that issues such a warning? If not, I suggest we just change the timestamps of those tests in a way that preserves test coverage and avoids the warnings problem. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 13 Oct 2017 18:16:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@139 PS9, Line 139: } > Literal::Literal(ColumnType type, double v) also uses TimestampValue::FromS I have found the cause of this issue, see https://issues.apache.org/jira/browse/IMPALA-5664?focusedCommentId=16203482=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16203482 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 13 Oct 2017 13:05:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 15: I have checked the output of jenkins. There are 3 problematic front end tests: AnalyzeDDLTest.TestCreateManagedKuduTable error, (null pointer exception), side effect of adding warning to invalid string->timestamp casts PlannerTest.testKuduSelectivity: failure, not 100% sure that it is related to the patch PlannerTest.testPartitionPruning: failure, side effect of adding warning to invalid string->timestamp casts As a short term solution, the warning can be removed from the string->timestamp conversion, as it is not related to the original issue. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 12 Oct 2017 15:59:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 15: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1323/ -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 10 Oct 2017 21:46:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 10 Oct 2017 17:48:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 15: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1323/ -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 10 Oct 2017 17:48:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc@277 PS11, Line 277: } > I mean the warning messages that you're adding here have no verification. I have added tests to exprs.test. There are files called "out-of-range-timestamp-abort-on-error.test" and "out-of-range-timestamp-continue-on-error.test", but they seem to be Parquet specific. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc@670 PS11, Line 670: * > Yes, we should follow the style guide so we have a consistent style, which Done http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc@116 PS11, Line 116: ( > still not fixed: Done -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 10 Oct 2017 13:55:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#14). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Also added logging to "cast to timestamp" functions. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M testdata/workloads/functional-query/queries/QueryTest/exprs.test 9 files changed, 183 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/14 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#13). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Also added logging to "cast to timestamp" functions. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h 8 files changed, 81 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/13 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc@277 PS11, Line 277: } > They are exercised by different parts of be/exprs/expr-test.cc .I have sear I mean the warning messages that you're adding here have no verification. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc@670 PS11, Line 670: * > I can change this, but I prefer it this way actually. What is the rule, sho Yes, we should follow the style guide so we have a consistent style, which overall makes the code more readable. I'm sure everyone has something that they dislike about the style, but we need consistency. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.h@43 PS11, Line 43: > This comment was just wrong, we state <= -12-31 or <1-01-01 everywh Okay, thanks for confirming. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc@116 PS11, Line 116: ( > Done still not fixed: if (HasDate()... -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 06 Oct 2017 17:27:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 11: (5 comments) http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc@277 PS11, Line 277: } > all of these new if-stmts need end-to-end tests to cover. they aren't exerc They are exercised by different parts of be/exprs/expr-test.cc .I have searched for "TestIsNull.*as timestamp", and the existing tests cover every code line, but not every type instantiation for CAST_TO_SUBSECOND_TIMESTAMP / CAST_TO_TIMESTAMP, only BIGINT/DECIMAL/DOUBLE. Actually most integer types can not represent values that are outside the valid range, because it needs >32 bit. This design (BE unit tests call FE to parse expressions) seems strange to me, but there is probably a valid reason why it was not done in FE or E2E. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/timestamp-functions-ir.cc@139 PS11, Line 139: } > also needs test Done http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc@670 PS11, Line 670: * > nit: missing space here and below I can change this, but I prefer it this way actually. What is the rule, should we follow the style guide even if the contributor consciously ignored it? It doesn`t matter in this case, but it would matter with short variable names, e.g. I find a*a + 2*a*b much more readable than a * a + 2 * a * b. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.h@43 PS11, Line 43: > is that what the documentation says? and did this code change affect the up This comment was just wrong, we state <= -12-31 or <1-01-01 everywhere else. This limitation comes from boost::gregorian, and this patch has no effect on it. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc@116 PS11, Line 116: ( > style Done -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 06 Oct 2017 13:00:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#12). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Also added logging to "cast to timestamp" functions. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h 8 files changed, 81 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/12 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 11: (5 comments) http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/cast-functions-ir.cc@277 PS11, Line 277: } all of these new if-stmts need end-to-end tests to cover. they aren't exercised by the unit test since that exercises the layer below. http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/exprs/timestamp-functions-ir.cc@139 PS11, Line 139: } also needs test http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-test.cc@670 PS11, Line 670: * nit: missing space here and below http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.h@43 PS11, Line 43: is that what the documentation says? and did this code change affect the upper bound or was this comment just wrong? http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/7954/11/be/src/runtime/timestamp-value.cc@116 PS11, Line 116: ( style -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 05 Oct 2017 17:34:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#11). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Also added logging to "cast to timestamp" functions. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h 7 files changed, 66 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/11 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 10: The fe/ files should not be in the patch, I just needed them to fix a compile error. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 04 Oct 2017 15:15:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 9: (4 comments) http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@136 PS9, Line 136: if(!tv.HasDateAndTime()){ > here an elsewhere - please follow the style used by the rest of impala. Done http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@137 PS9, Line 137: UnixMicrosToUtcTimestamp > is that meaningful to the user? I have changed it to its query name. http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@139 PS9, Line 139: } > are the ones you added the only places where your new validation might prod Literal::Literal(ColumnType type, double v) also uses TimestampValue::FromSubsecondUnixTime, but there is no function context there, and I do not know how to call that constructor. If I put a double to a place where a timestamp is expected, I get an analyses exception. I have also noticed someting strange: select timestamp_cmp(timstamp_col, cast(cast(-17987443200.1 as double) as timestamp)) from table; The table has 3 rows and 4 "UDF WARNING: Could not convert -17987443200.1 to timestamp" are printed. It would make more sense to me to have 1 call to cast (if the casts with constant arguments would be optimized away) or 3 (if no optimization takes place, and the cast is called for every row). Does Impala try to optimize functions calls with constant parameters? http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@751 PS9, Line 751: datetime.date().year(); > it still seems like this is doing the same validation. should we remove thi It is the same check, but I would prefer to leave it as it is for now - it leads to a nice warning message, and the try-catch block is needed anyway because of AddInterval. It could be done in another commit + jira like clean up/speed up timestamp functions. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 04 Oct 2017 15:09:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#10). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Also added logging to "cast to timestamp" functions. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java 9 files changed, 76 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/10 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 9: (4 comments) http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@136 PS9, Line 136: if(!tv.HasDateAndTime()){ here an elsewhere - please follow the style used by the rest of impala. http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@137 PS9, Line 137: UnixMicrosToUtcTimestamp is that meaningful to the user? http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@139 PS9, Line 139: } are the ones you added the only places where your new validation might produce a !HasDateAndTime() TimestampValue? http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@751 PS9, Line 751: datetime.date().year(); it still seems like this is doing the same validation. should we remove this an instead do the result_value.HasDateAndTime() check here? Or is this really a different case? -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 02 Oct 2017 21:51:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h@86 PS2, Line 86: } > But what we need is to log to the warning log, like the code in timestamp-f I have added some logging to warning. The string -> timestamp cast is not related to this issue, but I think it is very useful to warn the user about the invalid formatting. I have left the logging in TimestampValue::Validate, maybe it will be useful for postmortem analysis one day. http://gerrit.cloudera.org:8080/#/c/7954/8/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/7954/8/be/src/runtime/timestamp-value.cc@116 PS8, Line 116: /// The time zone of the resulting ptime is local time. This is called by > nit: for style consistency combine lines 115 & 116. Done -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 02 Oct 2017 18:00:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#9). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Also added logging to "cast to timestamp" functions. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h 7 files changed, 66 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/9 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7954/4/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/4/be/src/runtime/timestamp-value.h@173 PS4, Line 173: > i think it would be best to avoid running the constructor given these thing Done http://gerrit.cloudera.org:8080/#/c/7954/4/be/src/runtime/timestamp-value.h@287 PS4, Line 287: > nit: extraneous space Done http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h@86 PS2, Line 86: } > Logging in the caller is fine but I don't see that added to your diff. I have added some logging, but not on the caller side. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 29 Sep 2017 16:40:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 5: Ping? -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 27 Sep 2017 17:20:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/exprs/timestamp-functions-ir.cc@332 PS2, Line 332: DCHECK(ts_value.IsValidDate()); > why does this DCHECK no longer make sense? are we saying it's trivially tr Yes, my idea was that range check must be done at construction, and it is enough to check HasDate() in other timestamp related functions. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 22 Sep 2017 13:35:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/exprs/timestamp-functions-ir.cc@332 PS2, Line 332: StringVal result(context, 10); why does this DCHECK no longer make sense? are we saying it's trivially true now after check HasDate() and with the checks you've added in this commit? http://gerrit.cloudera.org:8080/#/c/7954/4/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/4/be/src/runtime/timestamp-value.h@173 PS4, Line 173: boost::gregorian::date i think it would be best to avoid running the constructor given these things can throw exceptions, so let's use 'const &' here. http://gerrit.cloudera.org:8080/#/c/7954/4/be/src/runtime/timestamp-value.h@287 PS4, Line 287: nit: extraneous space http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h@86 PS2, Line 86: /// Constructors that parse from a date/time string. See TimestampParser for details > I think it would be better to do the logging on the caller side, because de Logging in the caller is fine but I don't see that added to your diff. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 21 Sep 2017 20:50:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7954/5/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/5/be/src/runtime/timestamp-value.h@173 PS5, Line 173: This became static to enable other functions to check whether their date is valid or not without actually creating a TimestampValue or calling year() to generate an exception. I think it would be better if there was a single function that does this check, so if 1400 will be changed for some reason ( IMPALA-2009 ), there will be no need to change many parts of the code. http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h@86 PS2, Line 86: } > One purpose of FunctionContext is to be the interface back into impala's ru I think it would be better to do the logging on the caller side, because depending on the caller, the format for the log can be different, e.g. unix timestamp vs -MM-DD string. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 21 Sep 2017 17:29:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#4). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.h 5 files changed, 47 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/4 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/7954 ) Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-test.cc@637 PS2, Line 637: Sub-second F > The problem is only with the functions that take sub-second resolution, rig Done http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-test.cc@685 PS2, Line 685: > nit: typo Done -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 21 Sep 2017 15:43:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7954 to look at the new patch set (#3). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.h 6 files changed, 50 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/3 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-Change-Number: 7954 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS2, Line 80: !date_.is_special() > I looked at ScalarColumnReader::ValidateSlot(), and n My comment was just that HasDate() == !date_.is_special(), and using HasDate() would make it easier to grep all the places we check for this condition. I'm not sure I follow your comment about public functions. Since the scanner does a cast, it explicitly checks IsValidDate(). It is kinda gross, though. PS2, Line 80: !date_.is_special() > I think that ScalarColumnReader ::ValidateSlot() is ac Thanks for finding that. Let's discuss on the jira. Line 86: } > About giving a warning to the user: is it ok to log in low level classes li One purpose of FunctionContext is to be the interface back into impala's runtime for UDFs (and expressions in general). It'd be best to leak it outside of exprs, udf, and codegen. So, I think it'd make sense to have a static wrapper around the constructor where we do this validation that lives in the exprs code, and that can have access to FunctionContext to do the logging, and construct the TimestampValue only for well formed values. (Then this constructor could do a dcheck for that). -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS2, Line 80: !date_.is_special() > I looked at ScalarColumnReader::ValidateSlot(), and n I think that ScalarColumnReader ::ValidateSlot() is actually a bit too strict, which lead to inconsistency between text and parquet files. I have created a ticket for this: https://issues.apache.org/jira/browse/IMPALA-5942 We should decide the correct behavior in these cases, before continuing with this ticket. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS2, Line 80: !date_.is_special() > how about HasDate()? But also, should IsValidDate() be checking HasDate()? I looked at ScalarColumnReader::ValidateSlot(), and now I see that TimestampValue is created via reinterpret_cast on buffer pointers - this means that its constructor is skipped, so it is not possible to force validity in the public functions. >That code looks bogus if date_.is_special(), Special values lie outside the valid time range, so IsValidDate can only return true if is_special() is false. TimestampValue can have not_a_date_time in date_ or time_, it is only invalid, if both are not_a_date_time. I am not sure, what will/should happen in these cases. Line 86: } > why is that that the year() calls in timestamp-functions-ir.cc aren't suffi About giving a warning to the user: is it ok to log in low level classes like TimestampValue? AddSub calls FunctionContext::AddWarning, while TimestampValue`s functions do not receive FunctionContext arguments. If TimestampValue functions cannot log, then I will have to look for their callers and add checks + warnings there. Note that I am not familiar with Impala`s logging system yet. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Dan Hecht has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: PS2, Line 637: FromUnixTime The problem is only with the functions that take sub-second resolution, right? I think that's important to clarify -- otherwise it's not clear what we mean by rounding. PS2, Line 685: n nit: typo http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS2, Line 80: !date_.is_special() how about HasDate()? But also, should IsValidDate() be checking HasDate()? That code looks bogus if date_.is_special(), so the call from ScalarColumnReader::ValidateSlot() seems dangerous (if the parquet file contained a corrupted value that corresponded to not_a_date_time). PS2, Line 81: ptime > There was no exception thrown at the time of creation, but much later, when We don't want to use exceptions in impala. We limit their use to when interacting with other libraries. So not throwing an exception here is the right thing. Line 86: } why is that that the year() calls in timestamp-functions-ir.cc aren't sufficient, e.g.: // Validate that the ptime is not "special" (ie not_a_date_time) and has a valid year. // If validation fails, an exception is thrown. datetime.date().year(); Centralizing where we do this validation seems like a good idea, but unlike the other code that attempts to do this kind of thing, e.g. TimestampFunctions::AddSub(), we don't get a warning. The user should probably get a warning in this case. 2) If we're going to do the validation in a central place, we should try to clean up the other code that does the year() calls (assuming it's trying to validate the same thing). -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Lars Volker has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: Code-Review+1 Thanks for fixing this. LGTM, let's see what others say. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS2, Line 81: ptime > Have you considered catching the error when the invalid ptime object is cre There was no exception thrown at the time of creation, but much later, when the TimestampValue::ToString() function was called. It would be possible to throw an exception here, and add try+catch to every function where this constructor is called, but it would be a bit more work. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Lars Volker has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS2, Line 81: ptime Have you considered catching the error when the invalid ptime object is created? What is the benefit of doing it here? -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/7954/1//COMMIT_MSG Commit Message: PS1, Line 7: Impala > nit: Capitalize Impala (it's a name) Done PS1, Line 9: FromSubsecondUnixTime > We usually refer to functions as Function(). Done PS1, Line 11: 1400-01-01 00:00:00 > You could write dates in the more standard format (1400-01-01 00:00:00). Done PS1, Line 13: The maximum : case, -12-31 59:59:59 is a bit different, because as I understand, : with nanosecond precision posix times, the maximum value is actually : 1-01-01. 00:00:00 minus 1 nanosec. > Can you add tests with sub-second deltas around the upper bound, too? Done http://gerrit.cloudera.org:8080/#/c/7954/1/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: Line 637: // FromUnixTime functions incorrectly accepted the last second of 1399 as valid, > Please wrap lines at 90 characters. Please follow the surrounding code for Done PS1, Line 637: he last second of 1399 as valid, > Please outline in the comment briefly why the last second needs special att Done PS1, Line 637: ns i > Extra word? Done Line 640: MIN_DATE_AS_UNIX_TIME - 0.1).HasDate()); > While you are here, can you also add tests for the exact beginning of the l Done http://gerrit.cloudera.org:8080/#/c/7954/1/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS1, Line 80: s_special() && UNLI > Can you add a comment explaining why this check is necessary? Done Line 80: if(!date_.is_special() && UNLIKELY(!IsValidDate())){ > Please use spaces instead of tabs. Done http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 84: time_ = boost::posix_time::time_duration(boost::posix_time::not_a_date_time); The first patch was incorrect - it did not set time_ to not_a_date_time, but 00:00:00 instead. This caused the last second of year 1399 to be inconsistent with other dates before year 1400. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Csaba Ringhofer has uploaded a new patch set (#2). Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. IMPALA-5664: Unix time to timestamp conversions may crash Impala TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros() are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400-01-01 00:00:00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, -12-31 59:59:59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1-01-01. 00:00:00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. To fix this, the posix times are checked in the constructor of TimestampValue, and if it is outside the valid interval,both time_ and date_ are set to not_a_date_time. Test: select cast(-17987443200-0.1 as timestamp); This query no longer crashes, but returns NULL, similarly to other < 1400 timestamps. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.h 2 files changed, 34 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/2 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash impala (boost exception)
Lars Volker has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash impala (boost exception) .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/7954/1//COMMIT_MSG Commit Message: PS1, Line 7: impala nit: Capitalize Impala (it's a name) PS1, Line 9: FromSubsecondUnixTime We usually refer to functions as Function(). PS1, Line 11: 1400.01.01 00.00.00 You could write dates in the more standard format (1400-01-01 00:00:00). PS1, Line 13: The maximum : case, .12.31 59.59.59 is a bit different, because as I understand, : with nanosecond precision posix times, the maximum value is actually : 1.01.01. 00.00.00 minus 1 nanosec. Can you add tests with sub-second deltas around the upper bound, too? PS1, Line 18: TimestampValue::FromUnixTimeNanos can create problematic TimestampValues : both <1400 and 1<=. Doens't this change fix this? Line 22: HasDate/HasTime is true, then it really is a valid timestamp. Can you include in the commit message how you fixed it and how you test it? Generally speaking, it's usually good to structure commit messages for bug fixes as "Problem, Solution, Tests". http://gerrit.cloudera.org:8080/#/c/7954/1/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: Line 637: // last second of 1399 need can be special, see https://issues.apache.org/jira/browse/IMPALA-5664 Please wrap lines at 90 characters. Please follow the surrounding code for indents and capitalization of comments. PS1, Line 637: need Extra word? PS1, Line 637: 1399 Please state of what. PS1, Line 637: https://issues.apache.org/jira/browse/IMPALA-5664 Please outline in the comment briefly why the last second needs special attention. Line 640: EXPECT_FALSE(TimestampValue::FromUnixTimeNanos(MIN_DATE_AS_UNIX_TIME, -NANOS_PER_MICRO*100).HasDate()); While you are here, can you also add tests for the exact beginning of the last second? We also should have tests that add a subsecond interval for symmetry. Then please also add similar tests for the maximum date. http://gerrit.cloudera.org:8080/#/c/7954/1/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS1, Line 80: !date_.is_special() Can you add a comment explaining why this check is necessary? Line 80: if(UNLIKELY(!date_.is_special() && !IsValidDate())) *this = TimestampValue(); Please use spaces instead of tabs. -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash impala (boost exception)
Csaba Ringhofer has uploaded a new change for review. http://gerrit.cloudera.org:8080/7954 Change subject: IMPALA-5664: Unix time to timestamp conversions may crash impala (boost exception) .. IMPALA-5664: Unix time to timestamp conversions may crash impala (boost exception) TimestampValue::FromSubsecondUnixTime and UtcFromUnixTimeMicros are incorrect only in case of the last second of 1399, because these sub-second values are rounded first towards 1400.01.01 00.00.00, which is accepted as a valid date, and the sub-second part is subtracted afterwards, leading to a date outside the valid interval. The maximum case, .12.31 59.59.59 is a bit different, because as I understand, with nanosecond precision posix times, the maximum value is actually 1.01.01. 00.00.00 minus 1 nanosec. TimestampValue::FromUnixTimeNanos can create problematic TimestampValues both <1400 and 1<=. These timestamps can cause problems, because most code assumes that if HasDate/HasTime is true, then it really is a valid timestamp. Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff --- M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.h 2 files changed, 10 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/1 -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba Ringhofer