Hi all,

After some experimentation, we see no problem with putting the dynamic
storage outside of Flink. Doing so also allowed us to design the
interface in more depth.

What do you think? If there are no objections, I would like to ask the
PMC for help: we want to propose flink-dynamic-storage as a Flink
subproject and build the project under Apache.

Best,
Jingsong


On Wed, Nov 24, 2021 at 8:10 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> Hi Stephan,
>
> Thanks for your reply.
>
> Data never expires automatically.
>
> If there is a need for data retention, the user can choose one of the
> following options:
> - Filter the data themselves in the SQL that queries the managed
> table.
> - Define a time partition and delete expired partitions themselves
> (DROP PARTITION ...).
> - In a future version, we will support the "DELETE FROM" statement, so
> users can delete expired data according to conditions.
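>
> As a rough sketch of these three options (not taken from the FLIP):
> the table the_table and the column window_end come from Stephan's
> query, while the partition column dt and its value are hypothetical.
> The exact DROP PARTITION and DELETE FROM syntax that managed tables
> will accept is an assumption here.
>
> ```sql
> -- Option 1: filter out expired rows in the query that reads the table.
> SELECT *
> FROM the_table
> WHERE window_end >= CURRENT_TIMESTAMP - INTERVAL '14' DAY;
>
> -- Option 2: drop an expired time partition by hand.
> -- (dt is a hypothetical partition column.)
> ALTER TABLE the_table DROP PARTITION (dt = '2021-11-01');
>
> -- Option 3: delete by condition (planned, not supported yet).
> DELETE FROM the_table
> WHERE window_end < CURRENT_TIMESTAMP - INTERVAL '14' DAY;
> ```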
>
> So to answer your question:
>
> > Will the VMQ send retractions so that the data will be removed from the 
> > table (via compactions)?
>
> The current implementation does not send retractions. I think that, in
> theory, they should be sent; for now, users can filter with conditions
> in subsequent queries.
> And yes, a subscriber would not see a strictly correct result. I
> think this is something we can improve for Flink SQL.
>
> > Do we want time retention semantics handled by the compaction?
>
> Currently, no. Data never expires automatically.
>
> > Do we want to declare those types of queries "out of scope" initially?
>
> I think we want users to be able to use the three options above to
> meet their requirements.
>
> I will update the FLIP to make the definition clearer and more explicit.
>
> Best,
> Jingsong
>
> On Wed, Nov 24, 2021 at 5:01 AM Stephan Ewen <ewenstep...@gmail.com> wrote:
> >
> > Thanks for digging into this.
> > Regarding this query:
> >
> > INSERT INTO the_table
> >   SELECT window_end, COUNT(*)
> >     FROM TABLE(TUMBLE(TABLE interactions, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
> > GROUP BY window_end
> >   HAVING now() - window_end <= INTERVAL '14' DAYS;
> >
> > I am not sure I understand what the conclusion is on the data retention 
> > question, where the continuous streaming SQL query has retention semantics. 
> > I think we would need to answer the following questions (I will call the 
> > query that computed the managed table the "view materializer query" - VMQ).
> >
> > (1) I guess the VMQ will send no updates for windows once the "retention 
> > period" (14 days) is over, as you said. That makes sense.
> >
> > (2) Will the VMQ send retractions so that the data will be removed from the 
> > table (via compactions)?
> >   - if yes, this seems semantically better for users, but it will be 
> > expensive to keep the timers for retractions.
> >   - if not, we can still solve this by adding filters to queries against 
> > the managed table, as long as these queries are in Flink.
> >   - any subscriber to the changelog stream would not see a strictly 
> > correct result if we are not doing the retractions
> >
> > (3) Do we want time retention semantics handled by the compaction?
> >   - if we say that we lazily apply the deletes in the queries that read the 
> > managed tables, then we could also age out the old data during compaction.
> >   - that is cheap, but it might be too much of a special case to be very 
> > relevant here.
> >
> > (4) Do we want to declare those types of queries "out of scope" initially?
> >   - if yes, how many users are we affecting? (I guess probably not many, 
> > but would be good to hear some thoughts from others on this)
> >   - should we simply reject such queries in the optimizer as "not possible 
> > to support in managed tables"? I would suggest that. It is always better 
> > to tell users exactly what works and what does not, rather than letting 
> > them be surprised in the end. Users can still remove the HAVING clause if 
> > they want the query to run, and that would be better than the VMQ just 
> > silently ignoring those semantics.
> >
> > Thanks,
> > Stephan
> >
>
>
> --
> Best, Jingsong Lee



--
Best, Jingsong Lee
