aedelbro opened a new issue, #14800:
URL: https://github.com/apache/druid/issues/14800
### Description
Currently, `LATEST_BY` and `EARLIEST_BY` accept only 1 timestamp column. The
proposed would allow the function to accept multiple timestamp columns, to be
used as tie-breakers.
for example: `LATEST_BY("myValueColumn", ARRAY[__time,
"mySecondTimeColumn"])`. The second (or `nth`) time column in the array would
be used if the proceeding time columns are equal.
### Motivation
Our motivation is to use a second time column to determine which rows are
"latest" or "earliest". If you have 2 rows with the exact same `__time` value,
both rows are "latest" or "earliest" (depending on the data and query being
used). It is no longer deterministic which row's value is returned. In our
application, it "flip flops" between the different values, as which value
"wins" is determined by druid's internal processing.
Adding a second (or more than 1) time column to use would resolve ties on
the primary time column.
Our application accepts events from customers where the customers may
provide the timestamp of the event. They are also allowed to provide secondary
time columns. We want to be able to find the "first" and "last" value for a
given timestamp. A simple example being a timestamp for when the data was
received by our application, so that we can emulate an "update" to a timestamp
without losing data by rolling up the source data.
### Example
csv data:
```csv
__time,time2,value,dim
1,1,1,a
1,2,2,a
1,3,3,b
2,1,4,b
2,2,5,b
2,2,6,a
3,1,7,a
3,2,8,b
3,3,9,a
```
Query and expected result:
```
SELECT
TIME_FLOOR(__time, 'PT1H') as flooredTime,
"dim",
EARLIEST_BY("value", ARRAY[_time, "time2"]) as "earliest",
LATEST_BY("value", ARRAY[_time, "time2"]) as "latest"
FROM "myDataSource" GROUP BY TIME_FLOOR(__time, 'PT1H'), "dim"
+--------------------------+-----+----------+--------+
| flooredTime | dim | earliest | latest |
+--------------------------+-----+----------+--------+
| 1970-01-01T00:00:00.000Z | a | 1 | 9 |
| 1970-01-01T00:00:00.000Z | b | 3 | 8 |
+--------------------------+-----+----------+--------+
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]