Hi,

I have a question about the key information of TableAggregateFunction.
IIUC, you need to define
something like primary key or unique key in the result table of
TableAggregateFunction, and also
need a way to let user configure this through the API. My question is, will
that effect the logic of
TableAggregateFunction user wrote? Should the user know that there will a
key and make some changes
to this function?

If so, what's the semantic the user should learned. If not, how will
framework deal with the output of user's
TableAggregateFunction. For example, if user output multiple rows with the
same key, should be latter one
replace previous ones?

Best,
Kurt


On Mon, Jul 1, 2019 at 7:19 AM jincheng sun <sunjincheng...@gmail.com>
wrote:

> Hi hequn, Thanks for the reply! I think `withKeys` solution is our better
> choice!
>
>
> Hequn Cheng <chenghe...@gmail.com> 于2019年6月26日周三 下午5:11写道:
>
> > Hi Jincheng,
> >
> > Thanks for raising the discussion!
> > The key information is very important for query optimizations. It would
> be
> > nice if we can use upsert mode to achieve better performance.
> >
> > +1 for the `withKeys` proposal. :)
> >
> > Best, Hequn
> >
> >
> > On Wed, Jun 26, 2019 at 4:37 PM jincheng sun <sunjincheng...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > With the continuous efforts from the community, we already supported
> > > `flatAggregate`[1] on TableAPI in retract mode. I think It's better to
> > add
> > > upsert mode for  `flatAggregate`.
> > >
> > > The result table of streaming non-window `flatAggregate` is a table
> > > contains updates. We can, of course, use a RetractStreamTableSink[2] to
> > > emit the table, but we can get better performance in upsert mode.
> > However,
> > > due to the lack of keys, we can’t use an UpsertStreamTableSink to emit
> > the
> > > table. We don’t have this problem for a normal aggregate as it emits a
> > > single row for each group, so the unique keys are exactly the same with
> > the
> > > group keys. While for a `flatAggregate`, its pretty difference that due
> > to
> > > emits multi rows(a “sub-table”) for a single group. To solve this
> > problem,
> > > we need to find a way to define keys on flatAggregate, so that we can
> > also
> > > use upsert sink to emit the result table after flatAggregate.
> > >
> > > So, Aljoscha, Hequn and I prepared a design document for how to define
> > the
> > > update keys for  `flatAggregate` in upsert mode.  The detail can be
> found
> > > here:
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/183qHo8PDG-xserDi_AbGP6YX9aPY0rVr80p3O3Gyz6U/edit?usp=sharing
> > >
> > > I appreciate it if you can have look at the document and any comments
> are
> > > welcome!
> > >
> > >
> > > Best,
> > >
> > > Jincheng
> > >
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739
> > >
> > > [2]
> > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/sourceSinks.html#defining-a-streamtablesource
> > >
> >
>

Reply via email to