berkaysynnada opened a new pull request, #5603: URL: https://github.com/apache/arrow-datafusion/pull/5603
# Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> Closes #5411 and #5412. # Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> This PR is composed of two parts but served as one PR since the parts are closely related (they share constants, change similar places in the file). 340 lines are added for implementation, and 670 are for tests. The first support is to timestamp subtraction within `ScalarValue`. In postgre, the behaviour is like: ``` CREATE TABLE test ( id INT, ts1 TIMESTAMP, ts2 TIMESTAMP ); ``` ``` INSERT INTO test VALUES (1, '2016-03-13 00:00:00', '2017-04-13 12:11:03'); INSERT INTO test VALUES (2, '2016-03-04 06:00:00', '2016-03-01 20:33:48'); INSERT INTO test VALUES (3, '2017-03-04 23:59:59', '2016-03-01 12:00:00'); ``` ``` SELECT ts2 - ts1 AS diff_ts FROM test; ``` <img width="352" alt="image" src="https://user-images.githubusercontent.com/124376117/223647110-d7381366-b087-43a8-a0aa-1ccbc623734d.png"> We can result in the same way with postgre. The difference is shown in days as the largest time unit. However, since we don't have such detailed fields for an interval, `TimestampSecond` or `TimestampMillisecond` timestamp subtractions give result in `IntervalDayTime` variant, and `TimestampMicrosecond` or `TimestampNanosecond` timestamp subtractions give result in `IntervalMonthDayNano` variant without using the month field. However, I need to underline that we can apply this operation only on scalar values. Supporting columnar operations is in the scope of arrow-rs, and it needs to be implemented in a similar way to have consistency. The second support is comparing, adding and subtracting two `ScalarValue` interval types, `IntervalYearMonth`, `IntervalDayTime`, `IntervalMonthDayNano`. With this PR, we can apply these operations between both the same and different variants. arrow-rs support is required also here to support columnar values. # What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> `impl_op` match arms are extended to cover timestamp and interval types. It should be noted that we need the same types of timestamps to apply subtraction. `impl PartialOrd for ScalarValue `and `impl PartialEq for ScalarValue` are extended to handle interval types. However, to be able to compare months and days, we need to assume a month equals to 30 days (postgre behaves in the same way). # Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, there are some tests covering edge cases with timezone options, and randomized tests that first add/subtract an interval type to/from a timestamp, then take the difference between the resulting timestamp and the initial timestamp to assert the equality with the given interval. For the interval operations, some tests are written to cover all variations of operations for different types. timestamp_next | timestamp_prev | Return as days -- | -- | -- 2017-04-16 00:00:00 . 000_000_100 | 2016-03-13 00:00:00 . 000_000_025 | months: 0, days: 399, nanos: 75 2016-03-13 00:00:00 . 000_000_025 | 2017-04-16 00:00:00 . 000_000_100 | months: 0, days: -399, nanos: -75 2016-12-16 01:00:00 . 100 | 2016-12-15 00:00:00 . 000 | days: 1, millis: 3600100 2016-12-15 00:00:00 . 000 | 2016-12-16 01:00:00 . 100 | days: -1, millis: -3600100 # Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> - # To run end to end queries including timestamp subtraction: 1) `try_new_impl` function in arrow-rs gives an error while validating the schema and columns as such: > `column types must match schema types, expected Timestamp(Second, None) but found Interval(DayTime) at column index 0` That expectation of the output column needs to be changed to interval type. 2) In Datafusion, there is a function `coerce_types` returning the output type of applying `op` to an argument of `lhs_type` and `rhs_type`. These codes with the extensions in planner.rs and binary.rs also need to be reshaped. `evaluate` function in datetime.rs has to handle columnar values at each side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org