jacobmarble commented on issue #3015:
URL:
https://github.com/apache/arrow-datafusion/issues/3015#issuecomment-1204739188
I don't know how far DataFusion likes to deviate from PostgreSQL-flavored
SQL, but here's an idea for this feature.
### Suggestion 1
Make the default value of the `origin` argument `1970-01-01 00:00:00 UTC`.
It seems there is no default value specified by PostgreSQL. Certainly there
exists a need for `origin` values like `XXXX-XX-XX 00:02:30 UTC`, but the far
more common uses of `date_bin` are certainly with `stride` values like 1s, 10s,
30s, 1m, 5m, 10m, 15m, 20m, 30m, 1h, 2h, 3h, 6h, etc, with the reasonable
expectation that the result of `date_bin` align with a top-of-the-hour origin.
The result is that 99% of queries written with this function will be shorter
and less error prone.
Some proposed, equivalent statements:
```sql
-- as specified by PostgreSQL - origin at Unix epoch
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp
'1970-01-01 00:00:00 UTC');
-- 2022-08-03 14:45:00
-- proposed
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST');
```
### Suggestion 2
Allow `origin` values with type `timestamp`, `date`, `time` or `interval`.
Again, PostgreSQL doesn't have any prior art here.
The result enables better readability.
Some proposed, equivalent statements:
```sql
-- as specified by PostgreSQL - origin shifts bins forward 5 minutes
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp
'1970-01-01 00:05:00 UTC');
-- 2022-08-03 14:35:00
-- proposed
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', time
'00:05:00');
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', interval
'5 minutes');
```
```sql
-- as specified by PostgreSQL - origin shifts bins back 5 minutes
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp
'1969-12-31 23:55:00 UTC');
-- 2022-08-03 14:40:00
-- proposed
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', - time
'00:05:00');
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', interval
'- 5 minutes');
```
```sql
-- as specified by PostgreSQL - origin shifts bins forward one day
select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', timestamp
'1970-01-02 00:00:00 UTC');
-- 2022-07-29 00:00:00
-- proposed
select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', date
'1970-01-02');
select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', interval '1
day');
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]