Re: Aggregates push down

Benchao Li Thu, 23 Jun 2022 05:08:36 -0700

Вадим,

Yes, we are short of documentation at the moment.


> There is source code for adapters with processing aggregates, for example,
> Mongodb,

This is a different topic. Adapters such as Mongo/JDBC transform all the
RelNodes into their own convention and then translate those RelNodes into
their own query dialect. They do not push down filters/projections/aggregates
one by one; instead, they have their own way of implementing the whole
RelNode tree.
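To make the contrast concrete, here is a minimal Java sketch of the "translate the whole tree" style. The plan types (`Scan`, `Filter`, `Project`) and `DialectTranslator` are hypothetical stand-ins, not Calcite classes; a real adapter would walk RelNodes in its own convention and emit its own dialect.

```java
import java.util.ArrayList;
import java.util.List;

final class DialectTranslator {
  // Hypothetical plan nodes standing in for RelNodes that an adapter has
  // already converted into its own convention. These are NOT Calcite classes.
  interface Node {}
  record Scan(String table) implements Node {}
  record Filter(Node input, String condition) implements Node {}
  record Project(Node input, List<String> columns) implements Node {}

  // Pieces of the single query being built in the target dialect.
  private String from;
  private final List<String> where = new ArrayList<>();
  private List<String> select = List.of("*");

  // Translates the whole tree into one query, instead of pushing
  // individual operators into a scan.
  static String translate(Node root) {
    DialectTranslator t = new DialectTranslator();
    t.visit(root);
    String sql = "SELECT " + String.join(", ", t.select) + " FROM " + t.from;
    return t.where.isEmpty() ? sql : sql + " WHERE " + String.join(" AND ", t.where);
  }

  // Visits inputs first, so each operator is applied on top of its input.
  // Toy limitation: filters and projections refer to base-table column names.
  private void visit(Node node) {
    if (node instanceof Scan s) {
      from = s.table();
    } else if (node instanceof Filter f) {
      visit(f.input());
      where.add(f.condition());
    } else if (node instanceof Project p) {
      visit(p.input());
      select = p.columns();
    } else {
      throw new IllegalArgumentException("Unsupported node: " + node);
    }
  }

  public static void main(String[] args) {
    Node plan = new Project(
        new Filter(new Scan("orders"), "amount > 10"), List.of("id", "amount"));
    System.out.println(translate(plan));
  }
}
```

Because the adapter owns the entire tree, there is nothing to "push": every operator it understands simply becomes part of the generated query.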

On Wed, 22 Jun 2022 at 15:09, Вадим Ахмедов <akhmedov.va...@gmail.com> wrote:

> Hi Benchao,
> Thank you very much for your reply. Unfortunately, I did not find what you
> wrote in the Calcite documentation; it seems to be very sketchy. There is
> source code for adapters with processing aggregates, for example, Mongodb,
> but understanding it thoroughly takes a lot of time, which I often do not
> have. If the documentation had examples showing a minimal implementation
> of projection pushdown, a minimal implementation of filter pushdown, and
> the same for aggregates, it would be very helpful.
>
> On Sat, 18 Jun 2022 at 11:50, Benchao Li <libenc...@apache.org> wrote:
>
> > Hi Вадим,
> >
> > I'd like to start by explaining how projections and filters are pushed
> > down.
> >
> > 1. First, we need a RelNode that can do projections and filters; in
> > Calcite, this is BindableTableScan[1].
> > 2. Then we need rules that match a Filter/Project on top of a Scan and
> > push the filters/projections into the Scan; in Calcite, these are
> > FilterTableScanRule[2] and ProjectTableScanRule[3].
> > 3. Finally, we translate the Scan with its filters and/or projections
> > into an executable form. This differs from system to system, because
> > each has its own physical representation. In Calcite, BindableTableScan
> > is transformed into a TableScanNode[4], which in turn pushes the filters
> > and projections into a ProjectableFilterableTable[5].
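
Steps 1 and 2 can be mimicked in a small toy planner. This is a hypothetical model, not Calcite's API: `PushableScan` plays the role of BindableTableScan, and `optimize` does what FilterTableScanRule and ProjectTableScanRule do, merging a Filter or Project that sits directly on top of the scan into the scan itself. Step 3 (turning the scan into something executable) is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class ScanPushdown {
  // Hypothetical toy plan, not Calcite's API. PushableScan can carry
  // filters and a projection, like BindableTableScan does.
  interface Node {}
  // Semantics: apply all filters to the full row, then keep `columns`
  // (an empty list means "all columns").
  record PushableScan(String table, List<String> filters, List<String> columns) implements Node {}
  record Filter(Node input, String condition) implements Node {}
  record Project(Node input, List<String> columns) implements Node {}

  // Rewrites bottom-up, applying a FilterTableScanRule-like and a
  // ProjectTableScanRule-like transformation whenever a Filter or Project
  // sits directly on top of the scan.
  static Node optimize(Node node) {
    if (node instanceof Filter f) {
      Node input = optimize(f.input());
      if (input instanceof PushableScan s) {
        List<String> filters = new ArrayList<>(s.filters());
        filters.add(f.condition());
        return new PushableScan(s.table(), filters, s.columns());
      }
      return new Filter(input, f.condition());
    }
    if (node instanceof Project p) {
      Node input = optimize(p.input());
      if (input instanceof PushableScan s) {
        // Columns are referenced by name, so no index remapping is needed in this toy.
        return new PushableScan(s.table(), s.filters(), p.columns());
      }
      return new Project(input, p.columns());
    }
    return node;
  }

  public static void main(String[] args) {
    Node plan = new Project(
        new Filter(new PushableScan("orders", List.of(), List.of()), "amount > 10"),
        List.of("id", "amount"));
    // The Filter and the Project both end up inside a single PushableScan.
    System.out.println(optimize(plan));
  }
}
```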
> >
> > Hence, to extend Calcite to push aggregations into the Scan, you need
> > the same process: a physical Scan node that can do aggregations, and a
> > rule that matches an Aggregate on top of a Scan and pushes it down. You
> > also need to implement the corresponding physical logic.
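
The same toy approach extended to aggregates, as a sketch under the assumption that the storage engine can evaluate only a fixed set of aggregate functions. `Scan`, `Aggregate`, `SUPPORTED` and `pushAggregate` are hypothetical; in Calcite this would be your own planner rule plus your own physical scan node.

```java
import java.util.List;
import java.util.Set;

final class AggregatePushdown {
  // Hypothetical toy plan, not Calcite's API. The scan can also carry an
  // aggregation; empty groupBy and aggCalls mean "nothing aggregated yet".
  interface Node {}
  record Scan(String table, List<String> groupBy, List<String> aggCalls) implements Node {
    boolean aggregated() {
      return !groupBy.isEmpty() || !aggCalls.isEmpty();
    }
  }
  // Aggregate calls are written as "FUNC(arg)", e.g. "SUM(amount)" or "COUNT(*)".
  record Aggregate(Node input, List<String> groupBy, List<String> aggCalls) implements Node {}

  // Functions the (hypothetical) storage engine can evaluate by itself.
  static final Set<String> SUPPORTED = Set.of("COUNT", "SUM", "MIN", "MAX", "AVG");

  static boolean supported(String aggCall) {
    int paren = aggCall.indexOf('(');
    return paren > 0 && SUPPORTED.contains(aggCall.substring(0, paren).toUpperCase());
  }

  // Rule: an Aggregate directly on top of a not-yet-aggregated Scan becomes one
  // aggregating Scan. If any call is unsupported, the plan is returned
  // unchanged, so the aggregation stays in the engine (in memory).
  static Node pushAggregate(Node node) {
    if (node instanceof Aggregate agg
        && agg.input() instanceof Scan scan
        && !scan.aggregated()
        && agg.aggCalls().stream().allMatch(AggregatePushdown::supported)) {
      return new Scan(scan.table(), agg.groupBy(), agg.aggCalls());
    }
    return node;
  }

  public static void main(String[] args) {
    Scan base = new Scan("orders", List.of(), List.of());
    Node pushed = pushAggregate(
        new Aggregate(base, List.of("region"), List.of("COUNT(*)", "SUM(amount)")));
    System.out.println(pushed);
    Node kept = new Aggregate(base, List.of(), List.of("STDDEV(amount)"));
    System.out.println(pushAggregate(kept) == kept); // true: not pushed
  }
}
```

Returning the plan unchanged when a call is unsupported keeps the rule safe: the Aggregate simply stays where it is and Calcite evaluates it.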
> >
> > If you want the Scan node to handle all of the
> > projection/filter/aggregation pushdown, you need to be careful with how
> > they combine, because they are generally not pushed down in one go; e.g.
> > you may push an aggregation into a Scan into which the filters have
> > already been pushed.
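
One concrete pitfall with the mix, sketched in a hypothetical toy model (not Calcite code): once an aggregation is inside the scan, a Filter above it refers to aggregated rows (HAVING-like), so it must not be merged into the scan's pre-aggregation filters. The opposite order, pushing an Aggregate into a scan that already holds filters, is safe, because those filters run before aggregation anyway.

```java
import java.util.ArrayList;
import java.util.List;

final class MixedPushdown {
  // Hypothetical scan: filters are applied to raw rows first, then the
  // optional aggregation (empty aggCalls = not aggregated).
  interface Node {}
  record Scan(String table, List<String> filters, List<String> aggCalls) implements Node {
    boolean aggregated() {
      return !aggCalls.isEmpty();
    }
  }
  record Filter(Node input, String condition) implements Node {}
  record Aggregate(Node input, List<String> aggCalls) implements Node {}

  // Filter rule: only safe while the scan has not aggregated yet; otherwise the
  // condition would be evaluated on raw rows instead of aggregated ones.
  static Node pushFilter(Filter filter) {
    if (filter.input() instanceof Scan scan && !scan.aggregated()) {
      List<String> filters = new ArrayList<>(scan.filters());
      filters.add(filter.condition());
      return new Scan(scan.table(), filters, scan.aggCalls());
    }
    return filter;
  }

  // Aggregate rule: safe on a filtered scan, because the existing filters
  // already run before aggregation. Never aggregates twice inside one scan.
  static Node pushAggregate(Aggregate agg) {
    if (agg.input() instanceof Scan scan && !scan.aggregated()) {
      return new Scan(scan.table(), scan.filters(), agg.aggCalls());
    }
    return agg;
  }

  public static void main(String[] args) {
    Scan base = new Scan("orders", List.of(), List.of());
    // WHERE amount > 10, then SUM(amount): both end up inside the scan.
    Node filtered = pushFilter(new Filter(base, "amount > 10"));
    Node aggregated = pushAggregate(new Aggregate(filtered, List.of("SUM(amount)")));
    System.out.println(aggregated);
    // A filter on the aggregated result (HAVING-like) stays above the scan.
    Node having = pushFilter(new Filter(aggregated, "SUM(amount) > 100"));
    System.out.println(having instanceof Filter); // true
  }
}
```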
> >
> > Hope this helps~
> >
> > [1] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L207
> > [2] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/FilterTableScanRule.java#L57
> > [3] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/ProjectTableScanRule.java#L57
> > [4] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/TableScanNode.java#L63
> > [5] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/schema/ProjectableFilterableTable.java#L38
> >
> > On Fri, 17 Jun 2022 at 16:59, Вадим Ахмедов <akhmedov.va...@gmail.com> wrote:
> >
> > > Hi!
> > >
> > > I'm modifying a driver based on Apache Calcite that works with AWS S3
> > > storage using SQL queries. The interaction with S3 storage uses the S3
> > > Select dialect, which is very similar to SQL. The driver uses
> > > ProjectableFilterableTable to scan CSV data loaded from AWS. The
> > > filters, as a list of RexNodes, are used in the scan method to transform
> > > SQL queries into AWS S3 Select queries. Thus projections and filters are
> > > pushed down into the requests sent to S3 storage.
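
For reference, a minimal sketch of what such a scan method might do with the filter list. Assumptions: `Comparison`, `buildQuery` and `col` are hypothetical (the real driver receives `List<RexNode>` through `ProjectableFilterableTable.scan(DataContext, List<RexNode>, int[])`), CSV columns without a header are addressed positionally as `s._1`, `s._2`, and so on, and, as we understand the `ProjectableFilterableTable` Javadoc, filters the table handles are removed from the mutable list while the rest are evaluated by Calcite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

final class S3SelectWhere {
  // Hypothetical stand-in for a simple RexNode such as =($0, 'EU'):
  // a 0-based column index, an operator and a literal.
  record Comparison(int column, String op, String literal) {}

  // Builds an S3 Select query. Translated filters are removed from the mutable
  // list; anything left in it still has to be evaluated by Calcite.
  static String buildQuery(List<Comparison> filters, int[] projects) {
    List<String> conditions = new ArrayList<>();
    for (Iterator<Comparison> it = filters.iterator(); it.hasNext(); ) {
      Comparison c = it.next();
      // CSV fields are strings, so only (in)equality is translated here;
      // range comparisons would need a CAST and are left to Calcite.
      if (c.op().equals("=") || c.op().equals("<>")) {
        String literal = "'" + c.literal().replace("'", "''") + "'";
        conditions.add(col(c.column()) + " " + c.op() + " " + literal);
        it.remove();
      }
    }
    String select = projects == null
        ? "*"
        : Arrays.stream(projects).mapToObj(S3SelectWhere::col).collect(Collectors.joining(", "));
    String sql = "SELECT " + select + " FROM S3Object s";
    return conditions.isEmpty() ? sql : sql + " WHERE " + String.join(" AND ", conditions);
  }

  // Positional column reference: 0-based index -> s._1, s._2, ...
  static String col(int zeroBasedIndex) {
    return "s._" + (zeroBasedIndex + 1);
  }

  public static void main(String[] args) {
    List<Comparison> filters = new ArrayList<>(List.of(
        new Comparison(0, "=", "EU"),
        new Comparison(2, ">", "100")));
    System.out.println(buildQuery(filters, new int[] {1, 2}));
    System.out.println(filters); // only the '>' comparison is left for Calcite
  }
}
```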
> > >
> > > Now I need to modify the driver so that aggregate functions are pushed
> > > down as well.
> > >
> > > The Calcite documentation has a hint:
> > > "If you want more control, you should write a planner rule. This will
> > > allow you to push down expressions, to make a cost-based decision about
> > > whether to push down processing, and push down more complex operations
> > > such as join, aggregation, and sort."
> > >
> > > I really need advice on how to push down the aggregate functions with
> > > minimal modification of the driver source code. Somehow I have to keep
> > > Calcite from evaluating the aggregate functions itself and push them
> > > into the S3 Select queries instead, so that the aggregation happens on
> > > the S3 side rather than in memory.
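
If an Aggregate does reach the scan (through a custom rule), the remaining work is generating the S3 Select text. A sketch under stated assumptions, to be checked against the AWS S3 Select SQL reference: it accepts COUNT, SUM, AVG, MIN and MAX but no GROUP BY, and CSV fields are strings, hence the CAST. `AggCall`, `toS3Select` and `render` are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

final class S3SelectAggregate {
  // Hypothetical description of one aggregate call: function name and 0-based
  // column, with column == -1 meaning COUNT(*).
  record AggCall(String function, int column) {}

  // Assumption: these are the aggregate functions S3 Select accepts.
  static final Set<String> SUPPORTED = Set.of("COUNT", "SUM", "AVG", "MIN", "MAX");

  // Returns S3 Select text, or empty when the aggregation cannot be pushed
  // (grouping, or an unsupported call); Calcite must then aggregate itself.
  static Optional<String> toS3Select(List<Integer> groupBy, List<AggCall> calls, String where) {
    if (!groupBy.isEmpty() || calls.isEmpty()) {
      return Optional.empty(); // assumption: no GROUP BY in S3 Select
    }
    for (AggCall call : calls) {
      boolean countStar = call.column() < 0;
      if (!SUPPORTED.contains(call.function()) || (countStar && !call.function().equals("COUNT"))) {
        return Optional.empty();
      }
    }
    String select = calls.stream().map(S3SelectAggregate::render).collect(Collectors.joining(", "));
    String sql = "SELECT " + select + " FROM S3Object s";
    return Optional.of(where == null ? sql : sql + " WHERE " + where);
  }

  static String render(AggCall call) {
    if (call.column() < 0) {
      return "COUNT(*)";
    }
    String column = "s._" + (call.column() + 1);
    if (call.function().equals("COUNT")) {
      return "COUNT(" + column + ")";
    }
    // CSV fields are strings; cast so SUM/AVG/MIN/MAX are numeric
    // (assumes the column really holds numbers).
    return call.function() + "(CAST(" + column + " AS FLOAT))";
  }

  public static void main(String[] args) {
    List<AggCall> calls = List.of(new AggCall("COUNT", -1), new AggCall("SUM", 2));
    System.out.println(toS3Select(List.of(), calls, "s._1 = 'EU'").orElse("(aggregate in Calcite)"));
    System.out.println(toS3Select(List.of(0), calls, null).orElse("(aggregate in Calcite)"));
  }
}
```

When `toS3Select` returns empty, the rule should simply not fire, leaving Calcite's in-memory Aggregate in place.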
> > >
> > > If I replace ProjectableFilterableTable with TranslatableTable, the
> > > code will become 10 times more complicated.
> > >
> > > Maybe there is some simpler way to push down the aggregates?
> > >
> > > If TranslatableTable is the only way to solve this problem, is there a
> > > minimal example I can follow?
> > >
> > > Driver source code
> > > https://github.com/amannm/lake-driver
> > >
> > > Thanks,
> > > Vadim A.
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


-- 

Best,
Benchao Li
