+1 with a separate repo and +1 with the flink-storage name

On Fri, Jan 7, 2022 at 8:40 AM Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi everyone,
>
> Vote for create a separate sub project for FLIP-188 thread is here:
> https://lists.apache.org/thread/wzzhr27cvrh6w107bn464m1m1ycfll1z
>
> Best,
> Jingsong
>
>
> On Fri, Jan 7, 2022 at 3:30 PM Jingsong Li <jingsongl...@gmail.com> wrote:
> >
> > Hi Timo,
> >
> > I think we can consider exposing to DataStream users in the future, if
> > the API definition is clear after.
> > I am fine with `flink-table-store` too.
> > But I tend to prefer shorter and clearer name:
> > `flink-store`.
> >
> > I think I can create a separate thread to vote.
> >
> > Looking forward to your thoughts!
> >
> > Best,
> > Jingsong
> >
> >
> > On Thu, Dec 30, 2021 at 9:48 PM Timo Walther <twal...@apache.org> wrote:
> > >
> > > +1 for a separate repository. And also +1 for finding a good name.
> > >
> > > `flink-warehouse` would be definitely a good marketing name but I agree
> > > that we should not start marketing for code bases. Are we planning to
> > > make this storage also available to DataStream API users? If not, I
> > > would also vote for `flink-managed-table` or better:
> `flink-table-store`
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > >
> > > On 29.12.21 07:58, Jingsong Li wrote:
> > > > Thanks Till for your suggestions.
> > > >
> > > > Personally, I like flink-warehouse, this is what we want to convey to
> > > > the user, but it indicates a bit too much scope.
> > > >
> > > > How about just calling it flink-store?
> > > > Simply to convey an impression: this is flink's store project,
> > > > providing a built-in store for the flink compute engine, which can be
> > > > used by flink-table as well as flink-datastream.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Tue, Dec 28, 2021 at 5:15 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
> > > >>
> > > >> Hi Jingsong,
> > > >>
> > > >> I think that developing flink-dynamic-storage as a separate sub
> project is
> > > >> a very good idea since it allows us to move a lot faster and
> decouple
> > > >> releases from Flink. Hence big +1.
> > > >>
> > > >> Do we want to name it flink-dynamic-storage or shall we use a more
> > > >> descriptive name? dynamic-storage sounds a bit generic to me and I
> wouldn't
> > > >> know that this has something to do with letting Flink manage your
> tables
> > > >> and their storage. I don't have a very good idea but maybe we can
> call it
> > > >> flink-managed-tables, flink-warehouse, flink-olap or so.
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > >> On Tue, Dec 28, 2021 at 9:49 AM Martijn Visser <
> mart...@ververica.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Jingsong,
> > > >>>
> > > >>> That sounds promising! +1 from my side to continue development
> under
> > > >>> flink-dynamic-storage as a Flink subproject. I think having a more
> in-depth
> > > >>> interface will benefit everyone.
> > > >>>
> > > >>> Best regards,
> > > >>>
> > > >>> Martijn
> > > >>>
> > > >>> On Tue, 28 Dec 2021 at 04:23, Jingsong Li <jingsongl...@gmail.com>
> wrote:
> > > >>>
> > > >>>> Hi all,
> > > >>>>
> > > >>>> After some experimentation, we felt no problem putting the dynamic
> > > >>>> storage outside of flink, and it also allowed us to design the
> > > >>>> interface in more depth.
> > > >>>>
> > > >>>> What do you think? If there is no problem, I am asking for PMC's
> help
> > > >>>> here: we want to propose flink-dynamic-storage as a flink
> subproject,
> > > >>>> and we want to build the project under apache.
> > > >>>>
> > > >>>> Best,
> > > >>>> Jingsong
> > > >>>>
> > > >>>>
> > > >>>> On Wed, Nov 24, 2021 at 8:10 PM Jingsong Li <
> jingsongl...@gmail.com>
> > > >>>> wrote:
> > > >>>>>
> > > >>>>> Hi Stephan,
> > > >>>>>
> > > >>>>> Thanks for your reply.
> > > >>>>>
> > > >>>>> Data never expires automatically.
> > > >>>>>
> > > >>>>> If there is a need for data retention, the user can choose one
> of the
> > > >>>>> following options:
> > > >>>>> - In the SQL for querying the managed table, users filter the
> data by
> > > >>>> themselves
> > > >>>>> - Define the time partition, and users can delete the expired
> > > >>>>> partition by themselves. (DROP PARTITION ...)
> > > >>>>> - In the future version, we will support the "DELETE FROM"
> statement,
> > > >>>>> users can delete the expired data according to the conditions.
> > > >>>>>
> > > >>>>> So to answer your question:
> > > >>>>>
> > > >>>>>> Will the VMQ send retractions so that the data will be removed
> from
> > > >>>> the table (via compactions)?
> > > >>>>>
> > > >>>>> The current implementation is not sending retraction, which I
> think
> > > >>>>> theoretically should be sent, currently the user can filter by
> > > >>>>> subsequent conditions.
> > > >>>>> And yes, the subscriber would not see strictly a correct result.
> I
> > > >>>>> think this is something we can improve for Flink SQL.
> > > >>>>>
> > > >>>>>> Do we want time retention semantics handled by the compaction?
> > > >>>>>
> > > >>>>> Currently, no, Data never expires automatically.
> > > >>>>>
> > > >>>>>> Do we want to declare those types of queries "out of scope"
> initially?
> > > >>>>>
> > > >>>>> I think we want users to be able to use three options above to
> > > >>>>> accomplish their requirements.
> > > >>>>>
> > > >>>>> I will update FLIP to make the definition clearer and more
> explicit.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Jingsong
> > > >>>>>
> > > >>>>> On Wed, Nov 24, 2021 at 5:01 AM Stephan Ewen <
> ewenstep...@gmail.com>
> > > >>>> wrote:
> > > >>>>>>
> > > >>>>>> Thanks for digging into this.
> > > >>>>>> Regarding this query:
> > > >>>>>>
> > > >>>>>> INSERT INTO the_table
> > > >>>>>>    SELECT window_end, COUNT(*)
> > > >>>>>>      FROM (TUMBLE(TABLE interactions, DESCRIPTOR(ts), INTERVAL
> '5'
> > > >>>> MINUTES))
> > > >>>>>> GROUP BY window_end
> > > >>>>>>    HAVING now() - window_end <= INTERVAL '14' DAYS;
> > > >>>>>>
> > > >>>>>> I am not sure I understand what the conclusion is on the data
> > > >>>> retention question, where the continuous streaming SQL query has
> retention
> > > >>>> semantics. I think we would need to answer the following
> questions (I will
> > > >>>> call the query that computed the managed table the "view
> materializer
> > > >>>> query" - VMQ).
> > > >>>>>>
> > > >>>>>> (1) I guess the VMQ will send no updates for windows beyond the
> > > >>>> "retention period" is over (14 days), as you said. That makes
> sense.
> > > >>>>>>
> > > >>>>>> (2) Will the VMQ send retractions so that the data will be
> removed
> > > >>>> from the table (via compactions)?
> > > >>>>>>    - if yes, this seems semantically better for users, but it
> will be
> > > >>>> expensive to keep the timers for retractions.
> > > >>>>>>    - if not, we can still solve this by adding filters to
> queries
> > > >>>> against the managed table, as long as these queries are in Flink.
> > > >>>>>>    - any subscriber to the changelog stream would not see
> strictly a
> > > >>>> correct result if we are not doing the retractions
> > > >>>>>>
> > > >>>>>> (3) Do we want time retention semantics handled by the
> compaction?
> > > >>>>>>    - if we say that we lazily apply the deletes in the queries
> that
> > > >>>> read the managed tables, then we could also age out the old data
> during
> > > >>>> compaction.
> > > >>>>>>    - that is cheap, but it might be too much of a special case
> to be
> > > >>>> very relevant here.
> > > >>>>>>
> > > >>>>>> (4) Do we want to declare those types of queries "out of scope"
> > > >>>> initially?
> > > >>>>>>    - if yes, how many users are we affecting? (I guess probably
> not
> > > >>>> many, but would be good to hear some thoughts from others on this)
> > > >>>>>>    - should we simply reject such queries in the optimizer as
> "not
> > > >>>> possible to support in managed tables"? I would suggest that,
> always better
> > > >>>> to tell users exactly what works and what not, rather than
> letting them be
> > > >>>> surprised in the end. Users can still remove the HAVING clause if
> they want
> > > >>>> the query to run, and that would be better than if the VMQ just
> silently
> > > >>>> ignores those semantics.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Stephan
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Best, Jingsong Lee
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Best, Jingsong Lee
> > > >>>>
> > > >>>
> > > >
> > > >
> > > >
> > >
> >
> >
> > --
> > Best, Jingsong Lee
>
>
>
> --
> Best, Jingsong Lee
>

Reply via email to