I'm not sure how to express my logic simply when early triggers are a
necessity.
My application has large windows (~2 weeks) where early triggering is
absolutely required. However, most of its logic is relatively simple and can
be expressed in SQL. There's a ton of duplication, like the following:
```
SELECT A, B, C,
  COUNT(*) FILTER (WHERE my_condition) AS total_conditions,
  COUNT(*) AS total,
  ROUND(COUNT(*) FILTER (WHERE my_condition)/(COUNT(*)), 1) AS condition_rate,
  AVG(D),
  AVG(E),
  AVG(F)
FROM foo
GROUP BY A, B, C, SESSION(...)
```
Just imagine these kinds of queries duplicated a ton, just varying which
fields are being averaged and grouped by.
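To make the shape of the duplication concrete, here is a rough sketch of
rendering that query from a template outside of Flink. It is plain string
building, not a Flink API: `build_session_query` and its parameter names are
hypothetical, and the `SESSION(...)` clause is passed through as an opaque
fragment because its arguments are elided above.

```python
def build_session_query(table, group_cols, condition, avg_cols, window):
    """Render one aggregate query from the shared template.

    Every argument is a plain SQL fragment. `window` is inserted untouched,
    so the caller supplies the full SESSION(...) clause.
    """
    keys = ", ".join(group_cols)
    matched = f"COUNT(*) FILTER (WHERE {condition})"
    select_items = [
        keys,
        f"{matched} AS total_conditions",
        "COUNT(*) AS total",
        # Same rate expression as in the hand-written query.
        f"ROUND({matched}/(COUNT(*)), 1) AS condition_rate",
    ]
    # Only the averaged fields vary from query to query.
    select_items += [f"AVG({col})" for col in avg_cols]
    select_list = ",\n  ".join(select_items)
    return (
        f"SELECT {select_list}\n"
        f"FROM {table}\n"
        f"GROUP BY {keys}, {window}"
    )


query = build_session_query(
    table="foo",
    group_cols=["A", "B", "C"],
    condition="my_condition",
    avg_cols=["D", "E", "F"],
    window="SESSION(...)",
)
print(query)
```

This only removes the copy-and-paste on the SQL side. It does nothing about
early triggering, which is the part I don't know how to get in SQL.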
Writing these by hand in SQL is fairly easy, with some copying and pasting.
A quick Ctrl+F over what I have so far gives an idea of the scale:
COUNT - 50
AVG - 27
GROUP BY - 12
Since Flink doesn't support GROUPING SETS for streaming, I'll actually need
to duplicate a lot of these queries, so the counts above are an underestimate.
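As a sketch of what that expansion looks like, the snippet below turns a list
of grouping sets into one GROUP BY query per set, which is the manual
stand-in for GROUPING SETS. The helper name `expand_grouping_sets` and the
example sets are hypothetical, and `SESSION(...)` is again passed through
as-is.

```python
def expand_grouping_sets(table, grouping_sets, aggregates, window):
    """Return one GROUP BY query per grouping set.

    Each set becomes its own query; this is the duplication that
    GROUPING SETS would otherwise hide.
    """
    queries = []
    for cols in grouping_sets:
        # Grouping columns are selected first, then the shared aggregates.
        select_list = ", ".join(list(cols) + list(aggregates))
        # The window fragment is always the last GROUP BY item.
        group_by = ", ".join(list(cols) + [window])
        queries.append(
            f"SELECT {select_list} FROM {table} GROUP BY {group_by}"
        )
    return queries


queries = expand_grouping_sets(
    table="foo",
    grouping_sets=[("A", "B", "C"), ("A", "B"), ("A",)],
    aggregates=["COUNT(*) AS total", "AVG(D)"],
    window="SESSION(...)",
)
for q in queries:
    print(q)
```

Three grouping sets already mean three separate queries, each with its own
copy of the aggregates.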
Is writing an absolute ton of custom AggregateFunction boilerplate the only
way to solve this problem? Is there no way to abstract this while
maintaining early triggers? I feel like I'm missing something. Is streaming
Flink SQL only meant for short windows, where firing only at the end of the
window is acceptable?
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/