berkaysynnada opened a new pull request, #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and 
enhancements and this helps us generate change logs for our releases. You can 
link an issue to this PR using the GitHub syntax. For example `Closes #123` 
indicates that this PR will close issue #123.
   -->
   
   Closes #5411 and #5412.
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in 
the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your 
changes and offer better suggestions for fixes.  
   -->
   
   This PR is composed of two parts but served as one PR since the parts are 
closely related (they share constants, change similar places in the file). 340 
lines are added for implementation, and 670 are for tests.
   
   The first support is to timestamp subtraction within `ScalarValue`. In 
postgre, the behaviour is like: 
   ```
   CREATE TABLE test (
     id INT,
     ts1 TIMESTAMP,
     ts2 TIMESTAMP
     
   );
   ```
   ```
   INSERT INTO test VALUES (1, '2016-03-13 00:00:00', '2017-04-13 12:11:03');
   INSERT INTO test VALUES (2, '2016-03-04 06:00:00', '2016-03-01 20:33:48');
   INSERT INTO test VALUES (3, '2017-03-04 23:59:59', '2016-03-01 12:00:00');
   ```
   ```
   SELECT
   ts2 - ts1 AS diff_ts
   FROM test;
   ```
   <img width="352" alt="image" 
src="https://user-images.githubusercontent.com/124376117/223647110-d7381366-b087-43a8-a0aa-1ccbc623734d.png";>
   
   We can result in the same way with postgre. The difference is shown in days 
as the largest time unit. However, since we don't have such detailed fields for 
an interval, `TimestampSecond` or `TimestampMillisecond` timestamp subtractions 
give result in `IntervalDayTime` variant, and `TimestampMicrosecond` or 
`TimestampNanosecond` timestamp subtractions give result in 
`IntervalMonthDayNano` variant without using the month field. However, I need 
to underline that we can apply this operation only on scalar values. Supporting 
columnar operations is in the scope of arrow-rs, and it needs to be implemented 
in a similar way to have consistency.
   
   The second support is comparing, adding and subtracting two `ScalarValue` 
interval types, `IntervalYearMonth`, `IntervalDayTime`, `IntervalMonthDayNano`. 
With this PR, we can apply these operations between both the same and different 
variants. arrow-rs support is required also here to support columnar values.
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is 
sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   `impl_op` match arms are extended to cover timestamp and interval types. It 
should be noted that we need the same types of timestamps to apply subtraction. 
`impl PartialOrd for ScalarValue `and `impl PartialEq for ScalarValue` are 
extended to handle interval types. However, to be able to compare months and 
days, we need to assume a month equals to 30 days (postgre behaves in the same 
way).
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are 
they covered by existing tests)?
   -->
   
   Yes, there are some tests covering edge cases with timezone options, and 
randomized tests that first add/subtract an interval type to/from a timestamp, 
then take the difference between the resulting timestamp and the initial 
timestamp to assert the equality with the given interval. For the interval 
operations, some tests are written to cover all variations of operations for 
different types.
   
   
   
   
   timestamp_next | timestamp_prev | Return as days
   -- | -- | --
   2017-04-16 00:00:00 . 000_000_100 | 2016-03-13 00:00:00 . 000_000_025 | 
months: 0, days: 399, nanos: 75
   2016-03-13 00:00:00 . 000_000_025 | 2017-04-16 00:00:00 . 000_000_100 | 
months: 0, days: -399, nanos: -75
   2016-12-16 01:00:00 . 100 | 2016-12-15 00:00:00 . 000 | days: 1, millis: 
3600100
   2016-12-15 00:00:00 . 000 | 2016-12-16 01:00:00 . 100 | days: -1, millis: 
-3600100
   
   
   
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be 
updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api 
change` label.
   -->
   -
   
   # To run end to end queries including timestamp subtraction:
   
   1) `try_new_impl` function in arrow-rs gives an error while validating the 
schema and columns as such:
   
   > `column types must match schema types, expected Timestamp(Second, None) 
but found Interval(DayTime) at column index 0`
    
   That expectation of the output column needs to be changed to interval type.
   
   2) In Datafusion, there is a function `coerce_types` returning the output 
type of applying `op` to an argument of `lhs_type` and `rhs_type`. These codes 
with the extensions in planner.rs and binary.rs also need to be reshaped. 
`evaluate` function in datetime.rs has to handle columnar values at each side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to