Hi Gonzalo,
you might want to check this WIP PR from Stamatis for Hive:
https://github.com/apache/hive/pull/5249

Hive has its own physical plan (execution is based on Tez). The PR also
introduces a Hive Spool operator.

I am not familiar with the Pinot side of things but I think the PR might be
relevant.

Best regards,
Alessandro

On Thu, Sep 5, 2024, 12:26 Gonzalo Ortiz Jaureguizar <[email protected]>
wrote:

> You are right Julian, I was referring to Re*l*Nodes, not Re*x*Nodes.
>
> I didn't know about Spool. I've read that very educational Jira ticket and
> the code using Spool in Calcite. It is very interesting to see the problems
> you have already faced with Spools.
> Given the current state of Spools in Calcite, I don't feel confident enough
> to implement our solution on top of them.
> It would be great to be able to work on having a complete Spool solution in
> Calcite, but I don't think I will be able to do so, especially because in
> Pinot we convert logical plans to physical plans without using Calcite
> (something I would love to change, but there are always other priorities!).
>
> Therefore I think we are going to implement our own solution that looks for
> subtrees once the logical planning is done. Hopefully in the future, with
> the experience we gain, we can try to contribute our solution to Calcite.
>
> Gonzalo
>
> On Wed, Sep 4, 2024 at 19:48, Julian Hyde (<[email protected]>)
> wrote:
>
> > Do you mean ‘common RelNodes’ rather than RexNodes?
> >
> > Are you aware of https://issues.apache.org/jira/browse/CALCITE-481 ? The
> > Spool operator (and related cases) is the starting point for discussions
> > about DAGs. It isn’t fully implemented, but at least we’d be using the same
> > terminology.
> >
> > Julian
> >
> >
> > > On Sep 3, 2024, at 6:51 AM, Gonzalo Ortiz Jaureguizar <[email protected]> wrote:
> > >
> > > Hi there!
> > >
> > > In Pinot we want to work on a new optimization that lets us reuse some
> > > parts of the query plan.
> > > Basically, what we want is to change our nodes so that they can send the
> > > same data to multiple parent operators, transforming our trees into DAGs
> > > like the one shown in this diagram from Vladimir Ozerov's post on the
> > > Querify Labs blog
> > > <https://www.querifylabs.com/blog/data-shuffling-in-distributed-sql-engines>:
> > >
> > >
> > >
> > > I've looked for older messages in the dev mailing list and I found some
> > > threads saying that the Calcite model is tree-based and DAGs are not
> > > supported. If that is the case, I will have to implement this optimization
> > > after the Calcite plan is generated, but I would like to avoid this because
> > > we are trying to move more and more logic into Calcite procedures and this
> > > would be a step back.
> >
> >
>
