[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread fifteencai (Code Review)
fifteencai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..


Patch Set 3:

For further information on this improvement, please see @IMPALA-9922


--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 3
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Mon, 14 Dec 2020 07:17:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7848/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 3
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Mon, 14 Dec 2020 02:07:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7847/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 2
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Mon, 14 Dec 2020 01:51:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread fifteencai (Code Review)
fifteencai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..


Patch Set 3:

> Patch Set 2:
>
> (5 comments)

I have fixed these problem in PatchSet 3


--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 3
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Mon, 14 Dec 2020 01:44:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..

IMPALA-9922: A better approach to deal with date's sub-second fractions

In this patch, we relax the constraint on date format of to_timestamp().

Previously, the sub-second fraction width should never exceed the data,
otherwise, the conversion will fail. This constraint is stricter than
many query engines like Hive, Presto, ClickHouse etc. Hence we've
encountered inconsistent conversions on dataset with malformed values.

To relax this constraint, we made these modifications:
1) Introduced a new parameter `desired_length` alongside with `tok_len`,
the former parameter refers to the fraction width we want.
2) When fraction part is a single `S`, the `desired_length` is set to
actual width of data, making the output consistent with older logic.
3) In `DatetimeSimpleDateFormatParser`, move a pointer computation out
of the loop, avoiding unnecessary code execution.
4) Since fraction is always positive, change it's type from `int32` to
`uint32`

Example:
> Before:
select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS")
NULL

> After:
select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS")
2020-01-01 18:00:00.012000


Testing:
1. Added 3 test cases in `timestamp-test.cc`.
2. Passed all backend tests

Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/timestamp-test.cc
5 files changed, 60 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/16869/3
--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 3
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7846/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 1
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 14 Dec 2020 01:35:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/datetime-simple-date-format-parser.cc@475
PS2, Line 475: if (!ParseFractionToken(tok_val, tok_len, 
fraction_desired_len, dt_result)) return false;
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@629
PS2, Line 629: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 
10:54:00:13", false, true, true, 2020, 10, 1, 10, 54, 0, 1300))
line too long (123 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@631
PS2, Line 631: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 
10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13400))
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@633
PS2, Line 633: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 
10:54:00:134",false, true, true, 2020, 10, 1, 10, 54, 0, 13400))
line too long (124 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/2/be/src/runtime/timestamp-test.cc@635
PS2, Line 635: (TimestampTC("-MM-dd HH:mm:ss:S", "2020-10-01 
10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13450))
line too long (123 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 2
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 14 Dec 2020 01:30:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..

IMPALA-9922: A better approach to deal with date's sub-second fractions

In this patch, we relax the constraint on date format of to_timestamp().

Previously, the sub-second fraction width should never exceed the data,
otherwise, the conversion will fail. This constraint is stricter than
many query engines like Hive, Presto, ClickHouse etc. Hence we've
encountered inconsistent conversions on dataset with malformed values.

To relax this constraint, we made these modifications:
1) Introduced a new parameter `desired_length` alongside with `tok_len`,
the former parameter refers to the fraction width we want.
2) When fraction part is a single `S`, the `desired_length` is set to
actual width of data, making the output consistent with older logic.
3) In `DatetimeSimpleDateFormatParser`, move a pointer computation out
of the loop, avoiding unnecessary code execution.


Example:
> Before:
select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS")
NULL

> After:
select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS")
2020-01-01 18:00:00.012000


Testing:
1. Added 3 test cases in `timestamp-test.cc`.
2. Passed all backend tests

Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/timestamp-test.cc
5 files changed, 36 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/16869/2
--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 2
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16869 )

Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/datetime-simple-date-format-parser.cc@475
PS1, Line 475: if (!ParseFractionToken(tok_val, tok_len, 
fraction_desired_len, dt_result)) return false;
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@629
PS1, Line 629: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 
10:54:00:13", false, true, true, 2020, 10, 1, 10, 54, 0, 1300))
line too long (123 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@631
PS1, Line 631: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 
10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13400))
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@633
PS1, Line 633: (TimestampTC("-MM-dd HH:mm:ss:SSS", "2020-10-01 
10:54:00:134",false, true, true, 2020, 10, 1, 10, 54, 0, 13400))
line too long (124 > 90)


http://gerrit.cloudera.org:8080/#/c/16869/1/be/src/runtime/timestamp-test.cc@635
PS1, Line 635: (TimestampTC("-MM-dd HH:mm:ss:S", "2020-10-01 
10:54:00:1345",false, true, true, 2020, 10, 1, 10, 54, 0, 13450))
line too long (123 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 1
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 14 Dec 2020 01:13:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions

2020-12-13 Thread fifteencai (Code Review)
fifteencai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16869


Change subject: IMPALA-9922: A better approach to deal with date's sub-second 
fractions
..

IMPALA-9922: A better approach to deal with date's sub-second fractions

In this patch, we relax the constraint on date format of to_timestamp().

Previously, the sub-second fraction width should never exceed the data,
otherwise, the conversion will fail. This constraint is stricter than
many query engines like Hive, Presto, ClickHouse etc. Hence we've
encountered inconsistent conversions on dataset with malformed values.

To relax this constraint, we made these modifications:
1) Introduced a new parameter `desired_length` alongside with `tok_len`,
the former parameter refers to the fraction width we want.
2) When fraction part is a single `S`, the `desired_length` is set to
actual width of data, making the output consistent with older logic.
3) In `DatetimeSimpleDateFormatParser`, move a pointer computation out
of the loop, avoiding unnecessary code execution.


Example:
> Before:
select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS")

> After:
select to_timestamp("2020-01-01 18:00:00.12","-MM-dd HH:mm:ss.SSS")


Testing:
1. Added 3 test cases in `timestamp-test.cc`.
2. Passed all backend tests

Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/timestamp-test.cc
5 files changed, 36 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/16869/1
--
To view, visit http://gerrit.cloudera.org:8080/16869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 1
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Tim Armstrong