Hi Gonzalo,

You might want to check this WIP PR from Stamatis for Hive: https://github.com/apache/hive/pull/5249
Hive has its own physical plan (execution is based on Tez). The PR also introduces a Hive Spool operator. I am not familiar with the Pinot side of things, but I think the PR might be relevant.

Best regards,
Alessandro

On Thu, Sep 5, 2024, 12:26 Gonzalo Ortiz Jaureguizar <[email protected]> wrote:

> You are right Julian, I was referring to Re*l*Nodes, not Re*x*Nodes.
>
> I didn't know about Spool. I've read that very educational Jira ticket and
> the code using Spool in Calcite. It is very interesting to see the problems
> you have already faced with Spools.
> Given the current state of Spools in Calcite, I don't feel confident enough
> to implement our solution on top of them.
> It would be great to be able to work on having a complete Spool solution in
> Calcite, but I don't think I will be able to do so, especially because in
> Pinot we convert logical plans to physical plans without using Calcite
> (something I would love to change, but there are always other priorities!).
>
> Therefore I think we are going to implement our own solution that looks for
> subtrees once the logical planning is done. Hopefully in the future, with
> the experience we gain, we could try to contribute our solution to Calcite.
>
> Gonzalo
>
> On Wed, Sep 4, 2024 at 19:48, Julian Hyde (<[email protected]>) wrote:
>
> > Do you mean ‘common RelNodes’ rather than RexNodes?
> >
> > Are you aware of https://issues.apache.org/jira/browse/CALCITE-481 ? The
> > Spool operator (and related cases) is the starting point for discussions
> > about DAGs. It isn’t fully implemented, but at least we’d be using the
> > same terminology.
> >
> > Julian
> >
> > > On Sep 3, 2024, at 6:51 AM, Gonzalo Ortiz Jaureguizar <
> > > [email protected]> wrote:
> > >
> > > Hi there!
> > >
> > > In Pinot we want to work on a new optimization that lets us reuse some
> > > parts of the query plan.
> > > Basically what we want is to change our nodes to be able to send the
> > > same data to multiple parent operators, transforming our trees into DAGs
> > > like the one shown in this diagram from Vladimir Ozerov's post on the
> > > Querify Labs blog
> > > <https://www.querifylabs.com/blog/data-shuffling-in-distributed-sql-engines>:
> > >
> > > I've looked for older messages in the dev mailing list and I found some
> > > threads saying that the Calcite model is tree based and DAGs are not
> > > supported. If that is the case I will have to implement this optimization
> > > after the Calcite plan is generated, but I would like to avoid this
> > > because we are trying to move more and more logic into Calcite procedures
> > > and this would be a step back.
