Calcite's optimizer is based on the Volcano optimizer [0]. That paper
outlines an algorithm which is essentially equivalent to what Calcite
uses. Supporting multiple systems doesn't complicate things very much.
The main addition in Calcite is what we call a "convention" trait,
which allows the optimizer to deal with expressions across multiple
systems. More details are available in a recently published paper on
Calcite [1].
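
To make the idea of a convention concrete, here is a toy sketch in
Python. It is not Calcite's API: the Node class, add_converters, and
the ENGINE_A/ENGINE_B names are all hypothetical. It only shows the
general idea that each operator is tagged with the system that runs
it, and that a conversion step is needed wherever an input's
convention differs from its parent's.

```python
# Toy sketch only: every name here is hypothetical, not Calcite's API.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    op: str                          # operator name, e.g. "Scan" or "Join"
    convention: str                  # which system executes this operator
    inputs: Tuple["Node", ...] = ()  # child operators


def add_converters(node: Node) -> Node:
    """Insert a Convert node wherever an input's convention differs
    from the convention of the operator consuming it."""
    fixed = []
    for child in node.inputs:
        child = add_converters(child)
        if child.convention != node.convention:
            # The converter produces rows in the parent's convention.
            name = f"Convert[{child.convention}->{node.convention}]"
            child = Node(name, node.convention, (child,))
        fixed.append(child)
    return Node(node.op, node.convention, tuple(fixed))


def show(node: Node, depth: int = 0) -> List[str]:
    """Render the plan as indented lines, one operator per line."""
    lines = ["  " * depth + f"{node.op} ({node.convention})"]
    for child in node.inputs:
        lines.extend(show(child, depth + 1))
    return lines


# A join running in ENGINE_A whose second input lives in ENGINE_B.
plan = Node("Join", "ENGINE_A",
            (Node("Scan", "ENGINE_A"), Node("Scan", "ENGINE_B")))
converted = add_converters(plan)
print("\n".join(show(converted)))
```

In a real optimizer the conversion step carries its own cost, so plans
that cross systems compete on cost with plans that stay in one system.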

One important caveat is that in many cases the cost model is not
likely to reflect the actual cost of query execution. It's generally
"good enough" in that the ordering of plans by cost will be
approximately correct. So although the plan selected is optimal
according to the cost model, it may not be the plan which is actually
the best in practice. That said, I would expect Calcite to pick a plan
which is generally quite close to optimal, but we have no guarantee of
this.
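
A small illustration of that caveat, again as a toy Python sketch. The
plan names and all of the numbers are made up for illustration; they
are not measurements from Calcite.

```python
# Made-up numbers for illustration only; not measurements from Calcite.
estimated = {"plan_a": 100.0, "plan_b": 120.0, "plan_c": 400.0}  # cost model
actual = {"plan_a": 95.0, "plan_b": 90.0, "plan_c": 380.0}       # real runtime

# The optimizer can only see the estimates, so it picks the cheapest estimate.
chosen = min(estimated, key=estimated.get)

# The plan that is truly fastest may be a different one.
best = min(actual, key=actual.get)

# How much slower the chosen plan is than the truly best plan.
slowdown = actual[chosen] / actual[best]
print(chosen, best, round(slowdown, 2))
```

Here the estimates rank the two leading plans in the wrong order, so
the chosen plan is not the fastest one, but because the ordering is
approximately right it is only slightly slower than the best.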

[0] https://pdfs.semanticscholar.org/a817/a3e74d1663d9eb35b4baf3161ab16f57df85.pdf
[1] https://arxiv.org/pdf/1802.10233.pdf

--
Michael Mior
mm...@apache.org

On Tue, Feb 5, 2019 at 15:52, Lekshmi <lekshmib...@gmail.com> wrote:
>
> Hi,
>    I would like to know about the Calcite CBO in detail, including how it
> deals with global optimization when multiple processing systems are
> associated with it. Any documentation, pointers are much appreciated.
>
>
> Thanks and Regards
>
> Lekshmi B.G
> Email: lekshmib...@gmail.com
