I'm not suggesting to add support for Ignite. This was just an example.
Plasma and Arrow sound interesting, too.
For the sake of this proposal, it would be up to the user to implement a
TableFactory and corresponding TableSource / TableSink classes to persist
and read the data.

Am Mo., 26. Nov. 2018 um 12:06 Uhr schrieb Flavio Pompermaier <
pomperma...@okkam.it>:

> What about to add also Apache Plasma + Arrow as an alternative to Apache
> Ignite?
> [1]
> https://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/
>
> On Mon, Nov 26, 2018 at 11:56 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
> > Hi,
> >
> > Thanks for the proposal!
> >
> > To summarize, you propose a new method Table.cache(): Table that will
> > trigger a job and write the result into some temporary storage as defined
> > by a TableFactory.
> > The cache() call blocks while the job is running and eventually returns a
> > Table object that represents a scan of the temporary table.
> > When the "session" is closed (closing to be defined?), the temporary
> tables
> > are all dropped.
> >
> > I think this behavior makes sense and is a good first step towards more
> > interactive workloads.
> > However, its performance suffers from writing to and reading from
> external
> > systems.
> > I think this is OK for now. Changes that would significantly improve the
> > situation (i.e., pinning data in-memory across jobs) would have large
> > impacts on many components of Flink.
> > Users could use in-memory filesystems or storage grids (Apache Ignite) to
> > mitigate some of the performance effects.
> >
> > Best, Fabian
> >
> >
> >
> > Am Mo., 26. Nov. 2018 um 03:38 Uhr schrieb Becket Qin <
> > becket....@gmail.com
> > >:
> >
> > > Thanks for the explanation, Piotrek.
> > >
> > > Is there any extra thing user can do on a MaterializedTable that they
> > > cannot do on a Table? After users call *table.cache(), *users can just
> > use
> > > that table and do anything that is supported on a Table, including SQL.
> > >
> > > Naming wise, either cache() or materialize() sounds fine to me. cache()
> > is
> > > a bit more general than materialize(). Given that we are enhancing the
> > > Table API to also support non-relational processing cases, cache()
> might
> > be
> > > slightly better.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
> > > On Fri, Nov 23, 2018 at 11:25 PM Piotr Nowojski <
> pi...@data-artisans.com
> > >
> > > wrote:
> > >
> > > > Hi Becket,
> > > >
> > > > Ops, sorry I didn’t notice that you intend to reuse existing
> > > > `TableFactory`. I don’t know why, but I assumed that you want to
> > provide
> > > an
> > > > alternate way of writing the data.
> > > >
> > > > Now that I hopefully understand the proposal, maybe we could rename
> > > > `cache()` to
> > > >
> > > > void materialize()
> > > >
> > > > or going step further
> > > >
> > > > MaterializedTable materialize()
> > > > MaterializedTable createMaterializedView()
> > > >
> > > > ?
> > > >
> > > > The second option with returning a handle I think is more flexible
> and
> > > > could provide features such as “refresh”/“delete” or generally
> speaking
> > > > manage the the view. In the future we could also think about adding
> > hooks
> > > > to automatically refresh view etc. It is also more explicit -
> > > > materialization returning a new table handle will not have the same
> > > > implicit side effects as adding a simple line of code like
> `b.cache()`
> > > > would have.
> > > >
> > > > It would also be more SQL like, making it more intuitive for users
> > > already
> > > > familiar with the SQL.
> > > >
> > > > Piotrek
> > > >
> > > > > On 23 Nov 2018, at 14:53, Becket Qin <becket....@gmail.com> wrote:
> > > > >
> > > > > Hi Piotrek,
> > > > >
> > > > > For the cache() method itself, yes, it is equivalent to creating a
> > > > BUILT-IN
> > > > > materialized view with a lifecycle. That functionality is missing
> > > today,
> > > > > though. Not sure if I understand your question. Do you mean we
> > already
> > > > have
> > > > > the functionality and just need a syntax sugar?
> > > > >
> > > > > What's more interesting in the proposal is do we want to stop at
> > > creating
> > > > > the materialized view? Or do we want to extend that in the future
> to
> > a
> > > > more
> > > > > useful unified data store distributed with Flink? And do we want to
> > > have
> > > > a
> > > > > mechanism allow more flexible user job pattern with their own user
> > > > defined
> > > > > services. These considerations are much more architectural.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Fri, Nov 23, 2018 at 6:01 PM Piotr Nowojski <
> > > pi...@data-artisans.com>
> > > > > wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> Interesting idea. I’m trying to understand the problem. Isn’t the
> > > > >> `cache()` call an equivalent of writing data to a sink and later
> > > reading
> > > > >> from it? Where this sink has a limited live scope/live time? And
> the
> > > > sink
> > > > >> could be implemented as in memory or a file sink?
> > > > >>
> > > > >> If so, what’s the problem with creating a materialised view from a
> > > table
> > > > >> “b” (from your document’s example) and reusing this materialised
> > view
> > > > >> later? Maybe we are lacking mechanisms to clean up materialised
> > views
> > > > (for
> > > > >> example when current session finishes)? Maybe we need some
> syntactic
> > > > sugar
> > > > >> on top of it?
> > > > >>
> > > > >> Piotrek
> > > > >>
> > > > >>> On 23 Nov 2018, at 07:21, Becket Qin <becket....@gmail.com>
> wrote:
> > > > >>>
> > > > >>> Thanks for the suggestion, Jincheng.
> > > > >>>
> > > > >>> Yes, I think it makes sense to have a persist() with
> > > lifecycle/defined
> > > > >>> scope. I just added a section in the future work for this.
> > > > >>>
> > > > >>> Thanks,
> > > > >>>
> > > > >>> Jiangjie (Becket) Qin
> > > > >>>
> > > > >>> On Fri, Nov 23, 2018 at 1:55 PM jincheng sun <
> > > sunjincheng...@gmail.com
> > > > >
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Hi Jiangjie,
> > > > >>>>
> > > > >>>> Thank you for the explanation about the name of `cache()`, I
> > > > understand
> > > > >> why
> > > > >>>> you designed this way!
> > > > >>>>
> > > > >>>> Another idea is whether we can specify a lifecycle for data
> > > > persistence?
> > > > >>>> For example, persist (LifeCycle.SESSION), so that the user is
> not
> > > > >> worried
> > > > >>>> about data loss, and will clearly specify the time range for
> > keeping
> > > > >> time.
> > > > >>>> At the same time, if we want to expand, we can also share in a
> > > certain
> > > > >>>> group of session, for example: LifeCycle.SESSION_GROUP(...), I
> am
> > > not
> > > > >> sure,
> > > > >>>> just an immature suggestion, for reference only!
> > > > >>>>
> > > > >>>> Bests,
> > > > >>>> Jincheng
> > > > >>>>
> > > > >>>> Becket Qin <becket....@gmail.com> 于2018年11月23日周五 下午1:33写道:
> > > > >>>>
> > > > >>>>> Re: Jincheng,
> > > > >>>>>
> > > > >>>>> Thanks for the feedback. Regarding cache() v.s. persist(),
> > > > personally I
> > > > >>>>> find cache() to be more accurately describing the behavior,
> i.e.
> > > the
> > > > >>>> Table
> > > > >>>>> is cached for the session, but will be deleted after the
> session
> > is
> > > > >>>> closed.
> > > > >>>>> persist() seems a little misleading as people might think the
> > table
> > > > >> will
> > > > >>>>> still be there even after the session is gone.
> > > > >>>>>
> > > > >>>>> Great point about mixing the batch and stream processing in the
> > > same
> > > > >> job.
> > > > >>>>> We should absolutely move towards that goal. I imagine that
> would
> > > be
> > > > a
> > > > >>>> huge
> > > > >>>>> change across the board, including sources, operators and
> > > > >> optimizations,
> > > > >>>> to
> > > > >>>>> name some. Likely we will need several separate in-depth
> > > discussions.
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>>
> > > > >>>>> Jiangjie (Becket) Qin
> > > > >>>>>
> > > > >>>>> On Fri, Nov 23, 2018 at 5:14 AM Xingcan Cui <
> xingc...@gmail.com>
> > > > >> wrote:
> > > > >>>>>
> > > > >>>>>> Hi all,
> > > > >>>>>>
> > > > >>>>>> @Shaoxuan, I think the lifecycle or access domain are both
> > > > orthogonal
> > > > >>>> to
> > > > >>>>>> the cache problem. Essentially, this may be the first time we
> > plan
> > > > to
> > > > >>>>>> introduce another storage mechanism other than the state.
> Maybe
> > > it’s
> > > > >>>>> better
> > > > >>>>>> to first draw a big picture and then concentrate on a specific
> > > part?
> > > > >>>>>>
> > > > >>>>>> @Becket, yes, actually I am more concerned with the underlying
> > > > >> service.
> > > > >>>>>> This seems to be quite a major change to the existing
> codebase.
> > As
> > > > you
> > > > >>>>>> claimed, the service should be extendible to support other
> > > > components
> > > > >>>> and
> > > > >>>>>> we’d better discussed it in another thread.
> > > > >>>>>>
> > > > >>>>>> All in all, I also eager to enjoy the more interactive Table
> > API,
> > > in
> > > > >>>> case
> > > > >>>>>> of a general and flexible enough service mechanism.
> > > > >>>>>>
> > > > >>>>>> Best,
> > > > >>>>>> Xingcan
> > > > >>>>>>
> > > > >>>>>>> On Nov 22, 2018, at 10:16 AM, Xiaowei Jiang <
> > xiaow...@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>> Relying on a callback for the temp table for clean up is not
> > very
> > > > >>>>>> reliable.
> > > > >>>>>>> There is no guarantee that it will be executed successfully.
> We
> > > may
> > > > >>>>> risk
> > > > >>>>>>> leaks when that happens. I think that it's safer to have an
> > > > >>>> association
> > > > >>>>>>> between temp table and session id. So we can always clean up
> > temp
> > > > >>>>> tables
> > > > >>>>>>> which are no longer associated with any active sessions.
> > > > >>>>>>>
> > > > >>>>>>> Regards,
> > > > >>>>>>> Xiaowei
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Nov 22, 2018 at 12:55 PM jincheng sun <
> > > > >>>>> sunjincheng...@gmail.com>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi Jiangjie&Shaoxuan,
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks for initiating this great proposal!
> > > > >>>>>>>>
> > > > >>>>>>>> Interactive Programming is very useful and user friendly in
> > case
> > > > of
> > > > >>>>> your
> > > > >>>>>>>> examples.
> > > > >>>>>>>> Moreover, especially when a business has to be executed in
> > > several
> > > > >>>>>> stages
> > > > >>>>>>>> with dependencies,such as the pipeline of Flink ML, in order
> > to
> > > > >>>>> utilize
> > > > >>>>>> the
> > > > >>>>>>>> intermediate calculation results we have to submit a job by
> > > > >>>>>> env.execute().
> > > > >>>>>>>>
> > > > >>>>>>>> About the `cache()`  , I think is better to named
> `persist()`,
> > > And
> > > > >>>> The
> > > > >>>>>>>> Flink framework determines whether we internally cache in
> > memory
> > > > or
> > > > >>>>>> persist
> > > > >>>>>>>> to the storage system,Maybe save the data into state backend
> > > > >>>>>>>> (MemoryStateBackend or RocksDBStateBackend etc.)
> > > > >>>>>>>>
> > > > >>>>>>>> BTW, from the points of my view in the future, support for
> > > > streaming
> > > > >>>>> and
> > > > >>>>>>>> batch mode switching in the same job will also benefit in
> > > > >>>> "Interactive
> > > > >>>>>>>> Programming",  I am looking forward to your JIRAs and FLIP!
> > > > >>>>>>>>
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Jincheng
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Becket Qin <becket....@gmail.com> 于2018年11月20日周二 下午9:56写道:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi all,
> > > > >>>>>>>>>
> > > > >>>>>>>>> As a few recent email threads have pointed out, it is a
> > > promising
> > > > >>>>>>>>> opportunity to enhance Flink Table API in various aspects,
> > > > >>>> including
> > > > >>>>>>>>> functionality and ease of use among others. One of the
> > > scenarios
> > > > >>>>> where
> > > > >>>>>> we
> > > > >>>>>>>>> feel Flink could improve is interactive programming. To
> > explain
> > > > the
> > > > >>>>>>>> issues
> > > > >>>>>>>>> and facilitate the discussion on the solution, we put
> > together
> > > > the
> > > > >>>>>>>>> following document with our proposal.
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1d4T2zTyfe7hdncEUAxrlNOYr4e5IMNEZLyqSuuswkA0/edit?usp=sharing
> > > > >>>>>>>>>
> > > > >>>>>>>>> Feedback and comments are very welcome!
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Jiangjie (Becket) Qin
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>

Reply via email to