[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
fifteencai has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 3: For further information on this improvement, please see @IMPALA-9922 -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 3 Gerrit-Owner: fifteencai Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Mon, 14 Dec 2020 07:17:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7848/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 3 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Mon, 14 Dec 2020 02:07:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7847/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 2 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Mon, 14 Dec 2020 01:51:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
fifteencai has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 3: > Patch Set 2: > > (5 comments) I have fixed these problem in PatchSet 3 -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 3 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Mon, 14 Dec 2020 01:44:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
fifteencai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. IMPALA-9922: A better approach to deal with date's sub-second fractions In this patch, we relax the constraint on date format of to_timestamp(). Previously, the sub-second fraction width should never exceed the data, otherwise, the conversion will fail. This constraint is stricter than many query engines like Hive, Presto, ClickHouse etc. Hence we've encountered inconsistent conversions on dataset with malformed values. To relax this constraint, we made these modifications: 1) Introduced a new parameter `desired_length` alongside with `tok_len`, the former parameter refers to the fraction width we want. 2) When fraction part is a single `S`, the `desired_length` is set to actual width of data, making the output consistent with older logic. 3) In `DatetimeSimpleDateFormatParser`, move a pointer computation out of the loop, avoiding unnecessary code execution. 4) Since fraction is always positive, change it's type from `int32` to `uint32` Example: > Before: select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS") NULL > After: select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS") 2020-01-01 18:00:00.012000 Testing: 1. Added 3 test cases in `timestamp-test.cc`. 2. Passed all backend tests Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 --- M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/timestamp-test.cc 5 files changed, 60 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/16869/3 -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 3 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7846/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 1 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 14 Dec 2020 01:35:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/datetime-simple-date-format-parser.cc@475 PS2, Line 475: if (!ParseFractionToken(tok_val, tok_len, fraction_desired_len, dt_result)) return false; line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@629 PS2, Line 629: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 10:54:00:13", false, true, true, 2020, 10, 1, 10, 54, 0, 1300)) line too long (123 > 90) http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@631 PS2, Line 631: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13400)) line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@633 PS2, Line 633: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 10:54:00:134",false, true, true, 2020, 10, 1, 10, 54, 0, 13400)) line too long (124 > 90) http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@635 PS2, Line 635: (TimestampTC("-MM-dd HH:mm:ss:S", "2020-10-01 10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13450)) line too long (123 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 2 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 14 Dec 2020 01:30:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
fifteencai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. IMPALA-9922: A better approach to deal with date's sub-second fractions In this patch, we relax the constraint on date format of to_timestamp(). Previously, the sub-second fraction width should never exceed the data, otherwise, the conversion will fail. This constraint is stricter than many query engines like Hive, Presto, ClickHouse etc. Hence we've encountered inconsistent conversions on dataset with malformed values. To relax this constraint, we made these modifications: 1) Introduced a new parameter `desired_length` alongside with `tok_len`, the former parameter refers to the fraction width we want. 2) When fraction part is a single `S`, the `desired_length` is set to actual width of data, making the output consistent with older logic. 3) In `DatetimeSimpleDateFormatParser`, move a pointer computation out of the loop, avoiding unnecessary code execution. Example: > Before: select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS") NULL > After: select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS") 2020-01-01 18:00:00.012000 Testing: 1. Added 3 test cases in `timestamp-test.cc`. 2. Passed all backend tests Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 --- M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/timestamp-test.cc 5 files changed, 36 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/16869/2 -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 2 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/datetime-simple-date-format-parser.cc@475 PS1, Line 475: if (!ParseFractionToken(tok_val, tok_len, fraction_desired_len, dt_result)) return false; line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@629 PS1, Line 629: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 10:54:00:13", false, true, true, 2020, 10, 1, 10, 54, 0, 1300)) line too long (123 > 90) http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@631 PS1, Line 631: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13400)) line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@633 PS1, Line 633: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 10:54:00:134",false, true, true, 2020, 10, 1, 10, 54, 0, 13400)) line too long (124 > 90) http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@635 PS1, Line 635: (TimestampTC("-MM-dd HH:mm:ss:S", "2020-10-01 10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13450)) line too long (123 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 1 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 14 Dec 2020 01:13:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
fifteencai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16869 Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. IMPALA-9922: A better approach to deal with date's sub-second fractions In this patch, we relax the constraint on date format of to_timestamp(). Previously, the sub-second fraction width should never exceed the data, otherwise, the conversion will fail. This constraint is stricter than many query engines like Hive, Presto, ClickHouse etc. Hence we've encountered inconsistent conversions on dataset with malformed values. To relax this constraint, we made these modifications: 1) Introduced a new parameter `desired_length` alongside with `tok_len`, the former parameter refers to the fraction width we want. 2) When fraction part is a single `S`, the `desired_length` is set to actual width of data, making the output consistent with older logic. 3) In `DatetimeSimpleDateFormatParser`, move a pointer computation out of the loop, avoiding unnecessary code execution. Example: > Before: select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS") > After: select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS") Testing: 1. Added 3 test cases in `timestamp-test.cc`. 2. Passed all backend tests Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 --- M be/src/runtime/datetime-iso-sql-format-parser.cc M be/src/runtime/datetime-parser-common.cc M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/timestamp-test.cc 5 files changed, 36 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/16869/1 -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 1 Gerrit-Owner: fifteencai Gerrit-Reviewer: Tim Armstrong