Re: Improve carbondata CDC performance

akashrn5 Thu, 18 Feb 2021 02:23:14 -0800

Hi,

I get your point: basically, you want this logic to be usable for normal
joins as well.
I had already raised a discussion on the same topic earlier; you can check it
here.
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Join-optimization-with-Carbondata-s-metadata-tt103186.html#a103187>


Since Spark already handles this in its newer versions, everyone's opinion
was not to implement it ahead of Spark. So this will be specific to CDC, where
the case is a little different: we join an intermediate dataframe with the
source to get the files to scan. So this should be fine.

The only problem is the cartesian product, as mentioned in the design doc. You
can check it and give your inputs on that. I also have one more solution:
searching in a distributed way with an interval tree data structure.
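
To make the interval tree idea concrete, here is a rough single-process
sketch in Python. It is only an illustration and not the design doc's
implementation: it assumes each data file exposes a [min, max] range for the
join key, and the names (Node, build, files_containing, part-0 and so on) are
hypothetical. The distributed part is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    low: int                      # min key value in the file (assumed metadata)
    high: int                     # max key value in the file (assumed metadata)
    file_name: str                # hypothetical file identifier
    max_high: int                 # largest `high` in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: List[Tuple[int, int, str]]) -> Optional[Node]:
    """Build a balanced tree, ordered by `low`, from (low, high, file_name)."""
    items = sorted(intervals)

    def _build(lo: int, hi: int) -> Optional[Node]:
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        low, high, name = items[mid]
        node = Node(low, high, name, high)
        node.left = _build(lo, mid)
        node.right = _build(mid + 1, hi)
        # Propagate the largest upper bound so whole subtrees can be skipped.
        for child in (node.left, node.right):
            if child is not None:
                node.max_high = max(node.max_high, child.max_high)
        return node

    return _build(0, len(items))


def files_containing(node: Optional[Node], key: int,
                     out: Optional[List[str]] = None) -> List[str]:
    """Return the files whose [low, high] range contains `key`."""
    if out is None:
        out = []
    # Nothing in this subtree reaches up to `key`: prune it.
    if node is None or node.max_high < key:
        return out
    files_containing(node.left, key, out)
    if node.low <= key <= node.high:
        out.append(node.file_name)
    # Every interval on the right starts at or after node.low.
    if key >= node.low:
        files_containing(node.right, key, out)
    return out


files = [(1, 10, "part-0"), (5, 20, "part-1"),
         (15, 30, "part-2"), (40, 50, "part-3")]
tree = build(files)
print(files_containing(tree, 7))
print(files_containing(tree, 35))
```

With something like this, each incoming key is checked only against the
ranges that can contain it, instead of being compared with every file range
as in the cartesian product.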

Thanks,
Akash



