Thanks for digging into this.
Regarding this query:

INSERT INTO the_table
  SELECT window_end, COUNT(*)
    FROM TABLE(TUMBLE(TABLE interactions, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
GROUP BY window_end
  HAVING now() - window_end <= INTERVAL '14' DAYS;

I am not sure I understand what the conclusion is on the data retention
question, i.e. the case where the continuous streaming SQL query itself
has retention semantics. I think we would need to answer the following
questions (I will call the query that computes the managed table the
"view materializer query" - VMQ).

(1) I guess the VMQ will send no updates for windows once the "retention
period" (14 days) is over, as you said. That makes sense.

(2) Will the VMQ send retractions so that the data will be removed from the
table (via compactions)?
  - if yes, this seems semantically better for users, but it will be
expensive to keep the timers for retractions.
  - if not, we can still solve this by adding filters to queries against
the managed table, as long as these queries are in Flink.
  - any subscriber to the changelog stream would not see a strictly correct
result if we are not doing the retractions
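
To make the trade-off in (2) concrete, here is a minimal Python sketch of
the two options. It is a toy model, not Flink code: the function names,
the "+I"/"-D" record tags, and the sample timestamps are all made up for
illustration. It assumes one timer per window that fires at
window_end + 14 days when retractions are enabled.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)  # taken from the HAVING clause

def read_with_filter(rows, now):
    """No retractions: expired rows stay in the table and every
    reader re-applies the HAVING predicate itself."""
    return {w: c for w, c in rows.items() if now - w <= RETENTION}

def changelog(rows, now, retract):
    """Hypothetical VMQ output. With retract=True, each expired window
    gets a delete record, as if its retention timer had fired."""
    log = [("+I", w, c) for w, c in sorted(rows.items())]
    if retract:
        log += [("-D", w, c) for w, c in sorted(rows.items())
                if now - w > RETENTION]
    return log

def apply_changelog(log):
    """What a plain changelog subscriber materializes."""
    table = {}
    for op, w, c in log:
        if op == "+I":
            table[w] = c
        else:
            table.pop(w, None)
    return table

now = datetime(2021, 11, 15, 12, 0)
rows = {
    datetime(2021, 10, 20, 12, 5): 7,  # window_end older than 14 days
    datetime(2021, 11, 10, 12, 5): 3,  # window_end within retention
}

via_filter = read_with_filter(rows, now)
via_retractions = apply_changelog(changelog(rows, now, retract=True))
stale_subscriber = apply_changelog(changelog(rows, now, retract=False))
```

In this model the filtering reader and the retraction-based changelog
agree, while a subscriber to the changelog without retractions keeps the
expired window. The cost of the retraction variant is the per-window
timer state, which the sketch does not model.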

(3) Do we want time retention semantics handled by the compaction?
  - if we say that we lazily apply the deletes in the queries that read the
managed tables, then we could also age out the old data during compaction.
  - that is cheap, but it might be too much of a special case to be very
relevant here.
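
As a rough illustration of (3), the sketch below shows what aging out
old data during compaction could look like. It is a toy model in Python
under stated assumptions: a "run" is just a dict keyed by window_end,
later runs overwrite earlier ones, and the compact() helper and the
sample data are hypothetical, not part of any existing storage API.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)  # taken from the HAVING clause

def compact(runs, now, retention=RETENTION):
    """Merge runs (oldest first) into a single run. Later runs win on
    key conflicts. Rows whose window_end has aged out are dropped in
    the same pass, so no delete record is needed to remove them."""
    merged = {}
    for run in runs:
        merged.update(run)
    return {w: c for w, c in merged.items() if now - w <= retention}

now = datetime(2021, 11, 15, 12, 0)
old_window = datetime(2021, 10, 20, 12, 5)
recent_window = datetime(2021, 11, 10, 12, 5)

runs = [
    {old_window: 7, recent_window: 3},  # older run
    {recent_window: 4},                 # newer run updates the count
]
compacted = compact(runs, now)
```

Readers would still need the filter between compactions, because an
expired row stays visible until the next compaction touches it.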

(4) Do we want to declare those types of queries "out of scope" initially?
  - if yes, how many users are we affecting? (I guess probably not many,
but would be good to hear some thoughts from others on this)
  - should we simply reject such queries in the optimizer as "not possible
to support in managed tables"? I would suggest that; it is always better to
tell users exactly what works and what does not, rather than letting them
be surprised in the end. Users can still remove the HAVING clause if they
want the query to run, and that would be better than if the VMQ just
silently ignores those semantics.

Thanks,
Stephan
