aedelbro opened a new issue, #14800:
URL: https://github.com/apache/druid/issues/14800

   ### Description
   
   Currently, `LATEST_BY` and `EARLIEST_BY` accept only 1 timestamp column. The 
proposed would allow the function to accept multiple timestamp columns, to be 
used as tie-breakers.
   for example: `LATEST_BY("myValueColumn", ARRAY[__time, 
"mySecondTimeColumn"])`. The second (or `nth`) time column in the array would 
be used if the proceeding time columns are equal. 
   
   
   ### Motivation
   
   Our motivation is to use a second time column to determine which rows are 
"latest" or "earliest". If you have 2 rows with the exact same `__time` value, 
both rows are "latest" or "earliest" (depending on the data and query being 
used). It is no longer deterministic which row's value is returned. In our 
application, it "flip flops" between the different values, as which value 
"wins" is determined by druid's internal processing.
   Adding a second (or more than 1) time column to use would resolve ties on 
the primary time column. 
   
   Our application accepts events from customers where the customers may 
provide the timestamp of the event. They are also allowed to provide secondary 
time columns. We want to be able to find the "first" and "last" value for a 
given timestamp. A simple example being a timestamp for when the data was 
received by our application, so that we can emulate an "update" to a timestamp 
without losing data by rolling up the source data.
   
   
   ### Example
   csv data:
   ```csv
   __time,time2,value,dim
   1,1,1,a
   1,2,2,a
   1,3,3,b
   2,1,4,b
   2,2,5,b
   2,2,6,a
   3,1,7,a
   3,2,8,b
   3,3,9,a
   ```
   
   Query and expected result:
   ```
   SELECT 
     TIME_FLOOR(__time, 'PT1H') as flooredTime,
     "dim",
     EARLIEST_BY("value", ARRAY[_time, "time2"]) as "earliest", 
     LATEST_BY("value", ARRAY[_time, "time2"]) as "latest" 
   FROM "myDataSource" GROUP BY TIME_FLOOR(__time, 'PT1H'), "dim"
   
   +--------------------------+-----+----------+--------+
   |        flooredTime       | dim | earliest | latest |
   +--------------------------+-----+----------+--------+
   | 1970-01-01T00:00:00.000Z | a   |        1 |      9 |
   | 1970-01-01T00:00:00.000Z | b   |        3 |      8 |
   +--------------------------+-----+----------+--------+
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to