Calcite's optimizer is based on the Volcano optimizer [0]. That paper outlines an algorithm that is basically equivalent to what Calcite uses. Adding multiple systems doesn't complicate things very much. The main addition in Calcite is what we call a "convention" trait, which allows the optimizer to deal with expressions across multiple systems. More details are available in a recently published paper on Calcite [1].
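To make the convention idea a bit more concrete, here is a toy Python sketch. It is not Calcite's API and not the actual Volcano algorithm: the names Plan, convert and cheapest, the convention strings, and all the cost numbers are made up for illustration. It only shows the general shape of the idea, assuming each alternative plan is tagged with the system that would run it and that moving between systems goes through a converter with its own cost.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Plan:
    description: str  # human-readable plan shape
    convention: str   # hypothetical tag for the system that executes the plan
    cost: float       # estimated cost from a (made-up) cost model


def convert(plan: Plan, target: str,
            converter_costs: Dict[Tuple[str, str], float]) -> Optional[Plan]:
    """Return the plan in the target convention, adding a converter if needed.

    Returns None when no converter between the two conventions is registered.
    """
    if plan.convention == target:
        return plan
    key = (plan.convention, target)
    if key not in converter_costs:
        return None
    return Plan(
        description=f"Convert[{plan.convention}->{target}]({plan.description})",
        convention=target,
        cost=plan.cost + converter_costs[key],
    )


def cheapest(alternatives: List[Plan], required: str,
             converter_costs: Dict[Tuple[str, str], float]) -> Optional[Plan]:
    """Pick the lowest estimated cost among equivalent plans, after
    converting each one to the required convention where possible."""
    candidates = []
    for plan in alternatives:
        converted = convert(plan, required, converter_costs)
        if converted is not None:
            candidates.append(converted)
    return min(candidates, key=lambda p: p.cost, default=None)


# Two equivalent plans for the same query: the filter is either pushed
# down to the remote system or evaluated locally. All numbers are invented.
alternatives = [
    Plan("Filter(Scan(t))", "JDBC", 10.0),
    Plan("Filter(Scan(t))", "ENUMERABLE", 25.0),
]
converter_costs = {("JDBC", "ENUMERABLE"): 5.0}

best = cheapest(alternatives, "ENUMERABLE", converter_costs)
print(best)
```

With these invented numbers, the pushed-down plan plus its converter has a lower estimated cost than the purely local plan, so it is the one selected. The choice is only as good as the estimates that go into it.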
One important caveat is that the cost model often does not reflect the actual cost of query execution. It is generally "good enough" in that the ordering of plans by cost will be approximately correct. So although the optimizer selects the plan that is optimal according to the cost model, that plan may not be the one that is actually best in practice. That said, I would expect Calcite to pick a plan that is generally quite close to optimal, but we have no guarantee of this.

[0] https://pdfs.semanticscholar.org/a817/a3e74d1663d9eb35b4baf3161ab16f57df85.pdf
[1] https://arxiv.org/pdf/1802.10233.pdf

--
Michael Mior
mm...@apache.org

On Tue, Feb 5, 2019 at 15:52, Lekshmi <lekshmib...@gmail.com> wrote:
>
> Hi,
> I would like to know about the Calcite CBO in detail, including how it
> deals with global optimization when multiple processing systems are
> associated with it. Any documentation, pointers are much appreciated.
>
>
> Thanks and Regards
>
> Lekshmi B.G
> Email: lekshmib...@gmail.com