Re: [Discussion] Can we forbidden SEARCH operator when use other execution engine?

James Starr Tue, 08 Aug 2023 20:46:06 -0700

Alternatively, you could post-process the plan after all the
optimizations have been performed.  It is not a great deal of work to write
a recursive RelShuttle/RexShuttle that expands each SEARCH.
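
A minimal sketch of that idea follows. It uses toy expression types (Ref,
Lit, Call and Search are hypothetical stand-ins, not Calcite's RexNode
classes) and a point-only SEARCH, so it only illustrates the recursive
rewrite. In Calcite itself the expansion step is what RexUtil#expandSearch,
mentioned elsewhere in this thread, is for.

```java
import java.util.ArrayList;
import java.util.List;

class SearchExpansionSketch {
  /** Recursively rewrites every Search node into an OR of equality tests. */
  static Expr expand(Expr e) {
    if (e instanceof Search s) {
      Expr ref = expand(s.ref());
      List<Expr> tests = new ArrayList<>();
      for (int p : s.points()) {
        tests.add(new Call("=", List.of(ref, new Lit(p))));
      }
      // Assumption: the toy Search always carries at least one point.
      return tests.size() == 1 ? tests.get(0) : new Call("OR", tests);
    }
    if (e instanceof Call c) {
      // Rebuild the call with each operand expanded (the "shuttle" step).
      List<Expr> operands = new ArrayList<>();
      for (Expr operand : c.operands()) {
        operands.add(expand(operand));
      }
      return new Call(c.op(), operands);
    }
    return e; // Ref and Lit are leaves and are returned unchanged.
  }

  /** Returns true if any Search node remains anywhere in the tree. */
  static boolean containsSearch(Expr e) {
    if (e instanceof Search) {
      return true;
    }
    if (e instanceof Call c) {
      for (Expr operand : c.operands()) {
        if (containsSearch(operand)) {
          return true;
        }
      }
    }
    return false;
  }

  /** Renders an expression in prefix form, e.g. =(a, 1). */
  static String show(Expr e) {
    if (e instanceof Ref r) {
      return r.name();
    }
    if (e instanceof Lit l) {
      return Integer.toString(l.value());
    }
    if (e instanceof Call c) {
      List<String> parts = new ArrayList<>();
      for (Expr operand : c.operands()) {
        parts.add(show(operand));
      }
      return c.op() + "(" + String.join(", ", parts) + ")";
    }
    Search s = (Search) e;
    return "SEARCH(" + show(s.ref()) + ", " + s.points() + ")";
  }

  public static void main(String[] args) {
    // a IN (1, 3, 5) AND b > 10, with the IN held as a SEARCH node.
    Expr filter = new Call("AND", List.of(
        new Search(new Ref("a"), List.of(1, 3, 5)),
        new Call(">", List.of(new Ref("b"), new Lit(10)))));
    // prints AND(OR(=(a, 1), =(a, 3), =(a, 5)), >(b, 10))
    System.out.println(show(expand(filter)));
  }
}

// Toy expression tree: hypothetical stand-ins, NOT Calcite's RexNode classes.
interface Expr {}
record Ref(String name) implements Expr {}
record Lit(int value) implements Expr {}
record Call(String op, List<Expr> operands) implements Expr {}
// SEARCH(ref, points): true when ref equals any of the listed points.
record Search(Expr ref, List<Integer> points) implements Expr {}
```

Running the rewrite once, after planning has finished, keeps the benefit of
SEARCH during simplification and hands the backend only operators it
understands.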


James

On Tue, Aug 8, 2023 at 7:12 PM LakeShen <shenleifight...@gmail.com> wrote:

> From Calcite's point of view, Calcite is not bound to any engine; it is
> not aware of the specific implementation of the underlying engine. As
> Julian said, SEARCH (and Sarg) was introduced for Calcite's internal
> optimization.
>
> In our project, we also do some extra processing for SEARCH (and Sarg).
> Maybe we could add a configuration parameter that controls whether SEARCH
> (and Sarg) RexNodes are generated. For example, we could add a parameter
> to SqlToRelConverter for users who don't want SEARCH (and Sarg)
> expressions; if it is false, the SqlNode-to-RelNode conversion would not
> produce SEARCH (and Sarg) RexNodes.
>
> Best, LakeShen
>
> Julian Hyde <jhyde.apa...@gmail.com> wrote on Wed, Aug 9, 2023 at 04:28:
>
> > The optimizations are the reason that SEARCH (and Sarg) exist. For the
> > simplifier to handle all of the combinations of <, <=, >, >=, =, <>, and
> > AND, OR, NOT is prohibitively expensive; if the same expressions are
> > converted to Sargs they can be optimized using simple set operations.
> >
> >
> > > On Aug 4, 2023, at 5:57 PM, P.F. ZHAN <dethr...@gmail.com> wrote:
> > >
> > > Thanks, Julian, for your answer. Initially, I was thinking that SEARCH
> > > might not be essential. Since Calcite first translates it into a SEARCH
> > > operator, and then we can use RexUtil#expandSearch to rewrite it into an
> > > operator supported by Spark, it seems like doing the same job twice. So,
> > > instead, why not consider disabling this operator, thus avoiding the need
> > > to reconvert the operator back into an expression supported by other
> > > execution engines, such as Spark, later on? Frankly speaking, I might not
> > > have taken into account the potential simplifications brought by the
> > > SEARCH operator for optimizing the execution plan. If these
> > > simplifications produce a more efficient execution plan or make the
> > > optimization stage more efficient, then I'm open to exploring ways to
> > > implement an equivalent transformation within Kylin, or even exploring
> > > the possibility of creating a similar implementation in other execution
> > > engines like Spark.
> > >
> > > On Sat, Aug 5, 2023 at 2:57 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > >
> > >> I agree that it should be solved ‘by config’ but not by global config.
> > >> The mere fact that you are talking to Spark (i.e. using the JDBC adapter
> > >> with the Spark dialect) should be sufficient, right?
> > >>
> > >> Put another way: Calcite’s internal representation for expressions is
> > >> what it is. The fact that SEARCH is part of that representation has many
> > >> benefits for simplification. Just expect there to be a translation step
> > >> from that representation to any backend.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On Aug 4, 2023, at 7:22 AM, P.F. ZHAN <dethr...@gmail.com> wrote:
> > >>>
> > >>> Very nice suggestion. I wonder whether we could introduce this feature
> > >>> via config. That may be better for users who use more than one query
> > >>> engine to interpret and execute queries.
> > >>>
> > >>>
> > >>> On Fri, Aug 4, 2023 at 22:03 Alessandro Solimando <
> > >>> alessandro.solima...@gmail.com> wrote:
> > >>>
> > >>>> Hello,
> > >>>> as LakeShen suggests, you can take a look at RexUtil#expandSearch. You
> > >>>> can see it in action in the RexProgramTest tests; one example:
> > >>>>
> > >>>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/test/java/org/apache/calcite/rex/RexProgramTest.java#L1710-L1727
> > >>>>
> > >>>> Best regards,
> > >>>> Alessandro
> > >>>>
> > >>>> On Fri, 4 Aug 2023 at 15:45, LakeShen <shenleifight...@gmail.com> wrote:
> > >>>>
> > >>>>> Hi P.F. ZHAN, Calcite has a method, RexUtil#expandSearch, that expands
> > >>>>> SEARCH; maybe you could get some information from this method.
> > >>>>>
> > >>>>> There is also some logic to simplify SEARCH in the
> > >>>>> RexSimplify#simplifySearch method. I hope this helps.
> > >>>>> Here's the code:
> > >>>>> 1. https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L593
> > >>>>> 2. https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L2132
> > >>>>>
> > >>>>> Best,
> > >>>>> LakeShen
> > >>>>>
> > >>>>> Soumyadeep Mukhopadhyay <soumyamy...@gmail.com> wrote on Fri, Aug 4, 2023 at 20:29:
> > >>>>>
> > >>>>>> Thank you, shall explore more on this! :)
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, 4 Aug 2023 at 5:53 PM, P.F. ZHAN <dethr...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Aha, I'm using Apache Kylin, which uses Calcite to generate a logical
> > >>>>>>> plan and then converts it to a Spark plan to execute the query. Given
> > >>>>>>> that Calcite has more operations for aggregations, and Kylin wants to
> > >>>>>>> take full advantage of precomputed cubes (something like Calcite's
> > >>>>>>> materialized views), it uses both Calcite and Spark (for distributed
> > >>>>>>> computing). Maybe it's wild and a little fun, but it does work well
> > >>>>>>> in many scenarios.
> > >>>>>>>
> > >>>>>>> On Fri, Aug 4, 2023 at 8:10 PM Soumyadeep Mukhopadhyay <
> > >>>>>>> soumyamy...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> I am curious about your use case. Are you not losing out on the
> > >>>>>>>> optimisations of Calcite when you are using Spark? Is it possible
> > >>>>>>>> for you to share a general approach where we will be able to keep
> > >>>>>>>> the optimisations done by Calcite and use Spark on top of it?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Fri, 4 Aug 2023 at 5:19 PM, P.F. ZHAN <dethr...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Generally speaking, the SEARCH operator is very good, but when we
> > >>>>>>>>> use Calcite to optimize the logical plan and then use Spark to
> > >>>>>>>>> execute it, SEARCH is unsupported. So is there a more elegant way
> > >>>>>>>>> to turn off the SEARCH operator? Or how can we convert the SEARCH
> > >>>>>>>>> operator to the IN operator before converting the Calcite logical
> > >>>>>>>>> plan to the Spark logical plan? If we do this, we need to consider
> > >>>>>>>>> Join / Filter; are there any other RelNodes?
> > >>>>>>>>>
> > >>>>>>>>> Maybe it would be better for this optimization to be optional for
> > >>>>>>>>> now, since many query execution engines do not support this
> > >>>>>>>>> operator?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>
> > >>
> >
> >
>
