Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567253020
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
I just looked into this. The downside currently is w.r.t. performance
and being able to utilize Arrow kernels. The `ScalarFunction` repeats scalar
values, the date part, e.g. 'HOUR' for `date_part('HOUR', dt)` will be for
repeated for each row.
In PostgreSQL, expressions are not allowed for `date_part` / `extract`,
`date_trunc` etc. :
https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
> Note that here the field parameter needs to be a string value, not a name.
The valid field names for date_part are the same as for extract.
This is also happening with the existing `date_trunc` function where
currently the date part (as string) is repeated / matched against for each row
and also evaluated per row (see below). That won't work with the `hour` kernel
for obvious reasons.
```rust
let result = range
.map(|i| {
if array.is_null(i) {
Ok(0_i64)
} else {
let date_time = match granularity_array.value(i) {
"second" => array
.value_as_datetime(i)
.and_then(|d| d.with_nanosecond(0)),
"minute" => array
.value_as_datetime(i)
.and_then(|d| d.with_nanosecond(0))
.and_then(|d| d.with_second(0)),
[...]
```
So I think here we have a few options:
* Refactor/Optimize `ScalarFunction` to also allow for scalar values, be
able to check on them + support literals (I guess it should use `ColumnarValue`
instead of just `Array`s).
* Have a similar (inefficient) implementation for extract from / date_part
to compute as currently `date_trunc`. I think that
* Refactor/Optimize `ScalarFunction` later and keep the Extract as is for
now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]