Hi Akash,
*Just my opinion*: once Spark supports it, we can handle whatever needs to
be supported in Carbon.
*Doing this change independently of Spark could make us lose the advantage
once Spark brings it in as the default.*

Qubole's dynamic filtering is already merged in PrestoSQL, and it will
likely be merged into Spark as well, since it is beneficial.
So maybe we can first support Spark 3.x with Carbon (which will first
bring in the DPP [dynamic partition pruning] optimization)
and handle dynamic filtering when Spark supports it.
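The dynamic-filtering idea discussed in the quoted mail (scan the small
table first, derive min/max and a bloom-like index from it, then use that
to reduce the scan of the big table) can be sketched in plain Python.
This is only an illustration of the technique under my own assumptions;
`Block`, `build_filter`, and `pruned_scan` are hypothetical names, not
CarbonData or Spark APIs:

```python
# Hypothetical sketch of dynamic filtering; not a CarbonData/Spark API.
from dataclasses import dataclass, field

@dataclass
class Block:
    """A block of the big (left) table carrying min/max stats for the join key."""
    min_key: int
    max_key: int
    rows: list = field(default_factory=list)  # (key, payload) tuples

def build_filter(small_table_keys):
    """Step 1: scan the small (right) table once and derive a runtime filter:
    min/max bounds plus the exact key set (a stand-in for a bloom index)."""
    keys = set(small_table_keys)
    return min(keys), max(keys), keys

def pruned_scan(blocks, lo, hi, keys):
    """Step 2: apply the filter while scanning the big table, skipping whole
    blocks whose [min, max] range cannot contain any join key."""
    out = []
    for b in blocks:
        if b.max_key < lo or b.min_key > hi:
            continue  # entire block pruned without reading its rows
        out.extend(row for row in b.rows if row[0] in keys)
    return out

blocks = [
    Block(1, 10, [(1, "a"), (7, "b")]),
    Block(11, 20, [(12, "c"), (19, "d")]),
    Block(21, 30, [(25, "e")]),
]
lo, hi, keys = build_filter([12, 19])
print(pruned_scan(blocks, lo, hi, keys))  # only the middle block is read
```

The same shape applies whether the filter is kept in the driver's cache or
matched against segment-level min/max metadata: the cost of one pass over
the small table buys block-level skipping on the big one.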


Thanks,
Ajantha

On Tue, Nov 10, 2020 at 3:28 PM akashrn5 <akashr...@gmail.com> wrote:

> Please note the below points in addition to the above:
>
> 1. There is a JIRA in Spark similar to what I have raised:
>
> https://issues.apache.org/jira/browse/SPARK-27227
> It is aimed at the same goal, but it is still in progress and targeted
> for Spark 3.1.0.
> There, the plan is to first execute a query on the right table to get
> metadata such as min/max and a bloom index, and then apply it to the left
> table. The design is still in review; we can go through it once and look
> deeper into it.
>
> 2.
>
> https://www.qubole.com/blog/enhance-spark-performance-with-dynamic-filtering/
> This is also a similar feature, but it is only in Qubole's private
> version, so please consider this as well.
>
> With the above info and our segment-info metadata (or maybe information
> we store in the cache once we scan the small table), we can use that
> info to reduce the scan of the big table.
> Note that we still do not have Spark 3 integration, and dynamic
> filtering is still in the design phase.
>
> Please give your inputs; we can discuss further.
>
> Thanks
>
> Akash R
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
