I'm also +1 for idea #2.

Regarding the updated interface,

Result applyAggregates(
        List<CallExpression> aggregateExpressions,
        int[] groupSet,
        DataType aggOutputDataType);

final class Result {
    private final List<CallExpression> acceptedAggregates;
    private final List<CallExpression> remainingAggregates;
}

I have the following comments:

1) Do we need the composite Result return type? Is a boolean return type
enough?
    From my understanding, all of the aggregates should be accepted;
otherwise, the pushdown should fail.
    Therefore, users don't need to distinguish which aggregates are
"accepted".

2) Does the `aggOutputDataType` represent the produced data type of the new
source, or just the return type of all the agg functions?
    I would prefer `producedDataType`, just like `SupportsReadingMetadata`,
to reduce the effort for users to concatenate a final output type.
    The return type of each agg function can be obtained from the
`CallExpression`.
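    For illustration, the per-function information is already carried by the
expression itself (a small sketch, assuming a resolved `CallExpression` named
`call`):

// sketch: everything we need per aggregate is available on the CallExpression
FunctionDefinition function = call.getFunctionDefinition(); // e.g. built-in SUM / COUNT
DataType returnType = call.getOutputDataType();             // return type of this agg call
List<ResolvedExpression> args = call.getResolvedChildren(); // argument expressions

    So the extra `DataType` parameter is only worth keeping if it means the
full produced row type of the new source.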

3) What do you think about renaming `groupSet` to `grouping` or
`groupedFields`?
    The name `groupSet` may confuse users into thinking it relates to
"grouping sets".
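    Putting (1) - (3) together, the method could then look roughly like the
following (only a sketch to make the suggestions concrete, names are of course
open for discussion):

boolean applyAggregates(
        List<CallExpression> aggregateExpressions,
        int[] groupedFields,
        DataType producedDataType);

    Here `producedDataType` would be the full output type of the new source
(grouped fields + aggregate results), analogous to
`SupportsReadingMetadata#applyReadableMetadata`.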


What do you think?

Best,
Jark



On Tue, 5 Jan 2021 at 11:04, Kurt Young <ykt...@gmail.com> wrote:

> Sorry for the typo -_-!
> I meant idea #2.
>
> Best,
> Kurt
>
>
> On Tue, Jan 5, 2021 at 10:59 AM Sebastian Liu <liuyang0...@gmail.com>
> wrote:
>
>> Hi Kurt,
>>
>> Thx a lot for your feedback. If local aggregation is more like a physical
>> operator than a logical operator, I think your suggestion would be idea #2,
>> which handles everything in the physical optimization phase?
>>
>> Looking forward to further discussion.
>>
>>
>> Kurt Young <ykt...@gmail.com> wrote on Tue, Jan 5, 2021 at 9:52 AM:
>>
>>> Local aggregation is more like a physical operator than a logical
>>> operator. I would suggest going with idea #1.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Dec 30, 2020 at 8:31 PM Sebastian Liu <liuyang0...@gmail.com>
>>> wrote:
>>>
>>> > Hi Jark, Thx a lot for your quick reply and valuable suggestions.
>>> > For (1): Agree. Since we are in the period of upgrading to the new table
>>> > source API, we really should build the new optimizer rule on the new
>>> > interfaces; if the new rule doesn't use the new API, we'll have to
>>> > upgrade it sooner or later. I have changed the SupportsAggregatePushDown
>>> > definition in the above proposal to use the ability interface.
>>> >
>>> > For (2): Agree. Using CallExpression is a better choice, and I have
>>> > resolved this comment in the proposal.
>>> >
>>> > For (3): I suggest we first support the JDBC connector, as we don't have
>>> > a Druid connector and the ES connector only has a sink API at present.
>>> >
>>> > But perhaps the biggest question is whether we should use idea 1 or
>>> > idea 2 from the proposal.
>>> >
>>> > What do you think? After we reach agreement on the proposal, our team
>>> > can drive this feature to completion.
>>> >
>>> > Jark Wu <imj...@gmail.com> wrote on Tue, Dec 29, 2020 at 2:58 PM:
>>> >
>>> > > Hi Sebastian,
>>> > >
>>> > > Thanks for the proposal. I think this is a great improvement for Flink
>>> > > SQL. I went through the design doc and have the following thoughts:
>>> > >
>>> > > 1) Flink has deprecated the legacy TableSource in 1.11 and proposed a
>>> > > new set of DynamicTableSource interfaces. Could you update your
>>> > > proposal to use the new interfaces, following the existing ability
>>> > > interfaces, e.g. SupportsFilterPushDown and SupportsProjectionPushDown?
>>> > >
>>> > > 2) Personally, I think CallExpression would be a better representation
>>> > > than separate `FunctionDefinition` and args, because it would be easier
>>> > > to know the index and type of the arguments.
>>> > >
>>> > > 3) It would be better to list in the plan which connectors will be
>>> > > supported.
>>> > >
>>> > > Best,
>>> > > Jark
>>> > >
>>> > >
>>> > > On Tue, 29 Dec 2020 at 00:49, Sebastian Liu <liuyang0...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hi all,
>>> > > >
>>> > > > I'd like to discuss a new feature for the Blink Planner.
>>> > > > The Aggregate operator of Flink SQL is currently fully executed at
>>> > > > the Flink layer. With the development of storage systems, many
>>> > > > downstream storages of Flink SQL can handle the Aggregate operator
>>> > > > themselves. Pushing aggregates down to the data source layer will
>>> > > > improve performance by reducing network IO and computation overhead.
>>> > > >
>>> > > > I have drafted a design doc for this new feature:
>>> > > >
>>> > > > https://docs.google.com/document/d/1kGwC_h4qBNxF2eMEz6T6arByOB8yilrPLqDN0QBQXW4/edit?usp=sharing
>>> > > >
>>> > > > Any comment or discussion is welcome.
>>> > > >
>>> > > > --
>>> > > >
>>> > > > *With kind regards
>>> > > > ------------------------------------------------------------
>>> > > > Sebastian Liu 刘洋
>>> > > > Institute of Computing Technology, Chinese Academy of Science
>>> > > > Mobile\WeChat: +86—15201613655
>>> > > > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>>> > > > QQ: 3239559*
>>> > > >
>>> > >
>>> >
>>> >
>>> > --
>>> >
>>> > *With kind regards
>>> > ------------------------------------------------------------
>>> > Sebastian Liu 刘洋
>>> > Institute of Computing Technology, Chinese Academy of Science
>>> > Mobile\WeChat: +86—15201613655
>>> > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>>> > QQ: 3239559*
>>> >
>>>
>>
>>
>> --
>>
>> *With kind regards
>> ------------------------------------------------------------
>> Sebastian Liu 刘洋
>> Institute of Computing Technology, Chinese Academy of Science
>> Mobile\WeChat: +86—15201613655
>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>> QQ: 3239559*
>>
>>
