felipecrv commented on code in PR #6906:
URL: https://github.com/apache/arrow-rs/pull/6906#discussion_r1894369802
##########
arrow-arith/src/numeric.rs:
##########
@@ -550,6 +568,21 @@ date!(Date64Type);
trait IntervalOp: ArrowPrimitiveType {
fn add(left: Self::Native, right: Self::Native) -> Result<Self::Native,
ArrowError>;
fn sub(left: Self::Native, right: Self::Native) -> Result<Self::Native,
ArrowError>;
+ fn mul_int(left: Self::Native, right: i32) -> Result<Self::Native,
ArrowError>;
+ fn mul_float(left: Self::Native, right: f64) -> Result<Self::Native,
ArrowError>;
+ fn div_int(left: Self::Native, right: i32) -> Result<Self::Native,
ArrowError>;
+ fn div_float(left: Self::Native, right: f64) -> Result<Self::Native,
ArrowError>;
Review Comment:
> Unfortunately intervals cannot be coerced to a duration as proposed...
For the purpose of multiplication/division, some of the interval types can
(internally to the kernel) -- before returning the output array, they are
converted back to interval type to preserve their semantic meaning.
## YEAR_MONTH
Example: the double of `1 year and 1 month` is `2 years and 2 months` which
is what you get if you convert `(1y, 1m)` to `13m` then 2x to `26m` which
should then become `(2y, 2m)`. This is all fine because every year, no matter
what, has 12 months. So 12 months can become a year in the interval. This is
why the underlying implementation itself is in number of months.
Daylight savings and calendar consideration only come into play when you add
an interval to a specific timestamp/date which is not the case here. It would
be wrong to add `1y+1m` to `2024-12-20` as just adding the number of seconds
that interval in 30-day months converts to, but the operation we are doing here
is the scaling of the interval by some factor while preserving the semantic
meaning.
But maybe the **multiplication by a float** should not be available. Any
system that wants to support `interval[YEAR_MONTH]*float` multiplication should
round the float first or convert the interval to an interval type with more
resolution.
## DAY_TIME
`IntervalDayTimeType` expects the millis part to be less than a whole day
worth of milliseconds (i.e. leap seconds are not accounted for) [1][2]. Perhaps
the output of scaling operations on them (mul/div) should be a `MONTH_DAY_NANO`
interval. Because unlike years that always have 12 months, days don't always
have the same number of millis in them.
## MONTH_DAY_NANO
For `MonthDayNano` my approach breaks indeed -- the components should be
scaled independently. And the number of nanos is unbounded -- it doesn't have
to be less than a day's worth of nanos.
Multiplication by float and division gets really confusing though.
[1]
https://github.com/apache/arrow/blob/02a165922e46e5ed6dd3ed2446141cd0922a7c54/format/Schema.fbs#L398
[2]
https://github.com/apache/arrow/commit/7f7f72da6fa75f955bd69ac147b8494608c9aeb4
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]