Re: Aggregates push down

Вадим Ахмедов Wed, 22 Jun 2022 00:09:54 -0700

Hi Benchao,
Thank you very much for your reply. Unfortunately, I could not find what you
described in the Calcite documentation, which seems rather sketchy on this
topic. There is source code for adapters that handle aggregates, for example
MongoDB, but understanding it thoroughly takes a lot of time, which is often
in short supply. If the documentation had examples showing a minimal
implementation of projection pushdown, a minimal implementation of filter
pushdown, and the same for aggregates, it would be very helpful.


Sat, 18 Jun 2022 at 11:50, Benchao Li <libenc...@apache.org>:

> Hi Вадим,
>
> I'd like to share how the projections and filters are pushed down
> in the first place.
>
> 1. First, we need a RelNode that can perform projections and filters;
> in Calcite, this is BindableTableScan[1].
> 2. Then we need a rule that matches a Filter/Project on top of a Scan
> and pushes it into the Scan; in Calcite, this is done by
> FilterTableScanRule[2] and ProjectTableScanRule[3].
> 3. Finally, we need to translate the Scan with its filters and/or
> projections into an executable form. This may differ between projects,
> because each has its own physical representation. In Calcite,
> BindableTableScan is transformed into TableScanNode[4], which
> in turn pushes the filters and projections into
> ProjectableFilterableTable[5].
>
> Hence, to extend Calcite to push aggregations into Scan, you need
> the same process. You need a physical Scan node which can do aggregations,
> and a rule to match Aggregate on top of Scan to push it down. Then you also
> need to implement the corresponding physical logic.
>
> If you want the Scan node to handle all of the projection/filter/aggregation
> pushdown, you need to handle their combinations carefully, because generally
> they are not pushed down in one go; e.g. you may push an aggregation into a
> Scan that has already had filters pushed into it.
>
> Hope this helps~
>
> [1] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L207
> [2] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/FilterTableScanRule.java#L57
> [3] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/ProjectTableScanRule.java#L57
> [4] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/TableScanNode.java#L63
> [5] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/schema/ProjectableFilterableTable.java#L38
>
> Вадим Ахмедов <akhmedov.va...@gmail.com> wrote on Fri, 17 Jun 2022 at 16:59:
>
> > Hi!
> >
> > I'm modifying a driver based on Apache Calcite that works with AWS S3
> > storage using SQL queries. The interaction with S3 storage uses the S3
> > Select dialect, which is very similar to SQL. The driver uses
> > ProjectableFilterableTable to scan CSV data loaded from AWS. The filters,
> > passed as a list of RexNodes, are used in the scan method to translate SQL
> > queries into AWS S3 Select queries. Thus projects and filters are pushed
> > down into the requests sent to S3 storage.
> >
> > Now I need to modify the driver so that aggregate functions are pushed
> > down as well.
> >
> > Calcite documentation has a hint:
> > "If you want more control, you should write a planner rule. This will
> > allow you to push down expressions, to make a cost-based decision about
> > whether to push down processing, and push down more complex operations
> > such as join, aggregation, and sort."
> >
> > I really need advice on how I can push down the aggregate functions with
> > minimal modification of the driver source code. I need to somehow skip the
> > aggregate functions in the SQL and push them into the S3 Select queries,
> > so that the aggregation happens on the S3 side and not in memory.
> >
> > If I try to replace ProjectableFilterableTable with TranslatableTable, the
> > code will become 10 times more complicated.
> >
> > Maybe there is some simpler way to push down the aggregates?
> >
> > If TranslatableTable is the only way to solve this problem, what
> > minimalistic example can I use for this?
> >
> > Driver source code
> > https://github.com/amannm/lake-driver
> >
> > Thanks,
> > Vadim A.
> >
>
>
> --
>
> Best,
> Benchao Li
>
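
The three steps described in the quoted reply (a scan node that can hold
pushed-down work, a rule that matches an operator on top of the scan, and a
translation of the scan into an executable form) can be sketched without any
Calcite dependency. This is only a toy model, not the Calcite API: Node, Scan,
Filter, Aggregate and pushDown are hypothetical names standing in for RelNode,
a custom table scan, and planner rules, and the generated string only assumes
the general shape of an S3 Select query. It handles a single global aggregate
with no GROUP BY.

```java
import java.util.ArrayList;
import java.util.List;

final class PushdownSketch {
    /** Plan node; a stand-in for Calcite's RelNode (hypothetical, not the Calcite API). */
    interface Node {}

    /** Step 1: a scan that carries whatever has been pushed into it so far. */
    static final class Scan implements Node {
        final String table;
        final List<String> filters; // conditions already pushed down
        final String aggregate;     // e.g. "COUNT(*)", or null if none was pushed
        Scan(String table, List<String> filters, String aggregate) {
            this.table = table; this.filters = filters; this.aggregate = aggregate;
        }
        /** Step 3: translate the scan into an executable form (assumed query shape). */
        String toQuery() {
            StringBuilder sb = new StringBuilder("SELECT ")
                .append(aggregate == null ? "*" : aggregate)
                .append(" FROM ").append(table);
            if (!filters.isEmpty()) {
                sb.append(" WHERE ").append(String.join(" AND ", filters));
            }
            return sb.toString();
        }
    }

    static final class Filter implements Node {
        final String condition; final Node input;
        Filter(String condition, Node input) { this.condition = condition; this.input = input; }
    }

    static final class Aggregate implements Node {
        final String call; final Node input;
        Aggregate(String call, Node input) { this.call = call; this.input = input; }
    }

    /** Step 2: "rules" that match Filter-on-Scan and Aggregate-on-Scan, bottom-up. */
    static Node pushDown(Node node) {
        if (node instanceof Filter) {
            Filter f = (Filter) node;
            Node in = pushDown(f.input);
            // A filter above an already aggregated scan would be a HAVING, not a
            // WHERE, so it is only merged when no aggregate has been pushed yet.
            if (in instanceof Scan && ((Scan) in).aggregate == null) {
                Scan s = (Scan) in;
                List<String> merged = new ArrayList<>(s.filters);
                merged.add(f.condition);
                return new Scan(s.table, merged, null);
            }
            return new Filter(f.condition, in);
        }
        if (node instanceof Aggregate) {
            Aggregate a = (Aggregate) node;
            Node in = pushDown(a.input);
            // The scan may already hold filters; they are kept alongside the aggregate.
            if (in instanceof Scan && ((Scan) in).aggregate == null) {
                Scan s = (Scan) in;
                return new Scan(s.table, s.filters, a.call);
            }
            return new Aggregate(a.call, in);
        }
        return node;
    }

    public static void main(String[] args) {
        Node plan = new Aggregate("COUNT(*)",
            new Filter("s.price > 10",
                new Scan("S3Object s", new ArrayList<String>(), null)));
        Scan scan = (Scan) pushDown(plan);
        // prints: SELECT COUNT(*) FROM S3Object s WHERE s.price > 10
        System.out.println(scan.toQuery());
    }
}
```

The check on the aggregate field is where the mix of pushdowns matters: an
aggregate may be pushed into a scan that already holds filters, but a filter
sitting above an aggregated scan is left in place, because merging it into the
WHERE clause would change the result.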
