Re: [Discussion] Can we forbidden SEARCH operator when use other execution engine?

Julian Hyde Tue, 08 Aug 2023 13:28:35 -0700

The optimizations are the reason that SEARCH (and Sarg) exist. For the 
simplifier to handle all of the combinations of <, <=, >, >=, =, <>, and AND, 
OR, NOT is prohibitively expensive; if the same expressions are converted to 
Sargs they can be optimized using simple set operations.
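
To make the set-operation point concrete, here is a minimal sketch. It is not
Calcite's Sarg class; RangeSetSketch and all of its methods are hypothetical
names invented for this illustration, and it assumes a single non-null integer
column so that every predicate can be written as closed ranges. AND becomes
intersection, OR becomes union, and the result can be printed back as plain
comparisons for an engine that has no SEARCH operator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy stand-in for a Sarg on one integer column: sorted, disjoint closed ranges. */
final class RangeSetSketch {
    // Each int[] is {lo, hi}, both inclusive; MIN_VALUE / MAX_VALUE mean "unbounded".
    private final List<int[]> ranges;

    private RangeSetSketch(List<int[]> ranges) {
        this.ranges = ranges;
    }

    /** The predicate "lo <= x AND x <= hi"; an empty set if lo > hi. */
    static RangeSetSketch of(int lo, int hi) {
        List<int[]> r = new ArrayList<>();
        if (lo <= hi) {
            r.add(new int[] {lo, hi});
        }
        return new RangeSetSketch(r);
    }

    /** OR of two predicates on the same column: merge overlapping or adjacent ranges. */
    RangeSetSketch union(RangeSetSketch other) {
        List<int[]> all = new ArrayList<>(ranges);
        all.addAll(other.ranges);
        all.sort(Comparator.comparingInt((int[] r) -> r[0]));
        List<int[]> out = new ArrayList<>();
        for (int[] r : all) {
            int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            // Widen to long so that hi + 1 cannot overflow at Integer.MAX_VALUE.
            if (last != null && (long) last[1] + 1 >= r[0]) {
                last[1] = Math.max(last[1], r[1]);
            } else {
                out.add(new int[] {r[0], r[1]}); // copy, so inputs are never mutated
            }
        }
        return new RangeSetSketch(out);
    }

    /** AND of two predicates on the same column: pairwise range intersection. */
    RangeSetSketch intersect(RangeSetSketch other) {
        List<int[]> out = new ArrayList<>();
        for (int[] a : ranges) {
            for (int[] b : other.ranges) {
                int lo = Math.max(a[0], b[0]);
                int hi = Math.min(a[1], b[1]);
                if (lo <= hi) {
                    out.add(new int[] {lo, hi});
                }
            }
        }
        return new RangeSetSketch(out);
    }

    /** Expands the ranges back into comparisons (ignores NULL semantics). */
    String toSql(String col) {
        if (ranges.isEmpty()) {
            return "FALSE";
        }
        List<String> parts = new ArrayList<>();
        for (int[] r : ranges) {
            if (r[0] == r[1]) {
                parts.add(col + " = " + r[0]);
            } else if (r[0] == Integer.MIN_VALUE && r[1] == Integer.MAX_VALUE) {
                return "TRUE";
            } else if (r[0] == Integer.MIN_VALUE) {
                parts.add(col + " <= " + r[1]);
            } else if (r[1] == Integer.MAX_VALUE) {
                parts.add(col + " >= " + r[0]);
            } else {
                parts.add("(" + col + " >= " + r[0] + " AND " + col + " <= " + r[1] + ")");
            }
        }
        return String.join(" OR ", parts);
    }

    public static void main(String[] args) {
        // x > 3 AND x < 10 AND (x = 5 OR x >= 8), over integers
        RangeSetSketch left = of(4, 9);
        RangeSetSketch right = of(5, 5).union(of(8, Integer.MAX_VALUE));
        System.out.println(left.intersect(right).toSql("x"));
        // prints: x = 5 OR (x >= 8 AND x <= 9)
    }
}
```

The simplification itself never inspects operator combinations; it only merges
and clips ranges. The final toSql call plays the role of the translation step
back to ordinary comparisons, which in Calcite is what RexUtil#expandSearch is
for.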



> On Aug 4, 2023, at 5:57 PM, P.F. ZHAN <dethr...@gmail.com> wrote:
> 
> Thanks, Julian, for your answer. Initially, I was thinking that SEARCH
> might not be essential. Since Calcite first translates it into a SEARCH
> operator, and then we can use RexUtil#expandSearch to rewrite it into an
> operator supported by Spark, it seems like doing the same job twice. So,
> instead, why not consider disabling this operator, thus avoiding the need
> to reconvert the operator back into an expression supported by other
> execution engines, such as Spark, later on? Frankly speaking, I might not
> have taken into account the potential simplifications brought by the SEARCH
> operator for optimizing the execution plan. If these
> simplifications produce a more efficient execution plan or make the
> optimization stage more efficient, then I'm open to exploring ways to
> implement an equivalent transformation within Kylin, or even exploring the
> possibility of creating a similar implementation in other execution engines
> like Spark.
> 
> On Sat, Aug 5, 2023 at 2:57 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> 
>> I agree that it should be solved ‘by config’ but not by global config. The
>> mere fact that you are talking to Spark (i.e. using the JDBC adapter with
>> the Spark dialect) should be sufficient, right?
>> 
>> Put another way. Calcite’s internal representation for expressions is what
>> it is. The fact that SEARCH is part of that representation has many
>> benefits for simplification. Just expect there to be a translation step
>> from that representation to any backend.
>> 
>> Julian
>> 
>> 
>>> On Aug 4, 2023, at 7:22 AM, P.F. ZHAN <dethr...@gmail.com> wrote:
>>> 
>>> Very nice suggestion. I wonder whether we could make this configurable.
>>> That might work better for users who rely on more than one query engine
>>> to interpret and execute queries.
>>> 
>>> 
>>> On Fri, Aug 4, 2023 at 22:03 Alessandro Solimando <
>>> alessandro.solima...@gmail.com> wrote:
>>> 
>>>> Hello,
>>>> as LakeShen suggests, you can take a look at RexUtil#expandSearch. You
>>>> can see it in action in the RexProgramTest tests; here is one example:
>>>> 
>>>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/test/java/org/apache/calcite/rex/RexProgramTest.java#L1710-L1727
>>>> 
>>>> Best regards,
>>>> Alessandro
>>>> 
>>>> On Fri, 4 Aug 2023 at 15:45, LakeShen <shenleifight...@gmail.com> wrote:
>>>> 
>>>>> Hi P.F. ZHAN, Calcite has a method, RexUtil#expandSearch, that expands
>>>>> a SEARCH call; you may find it useful.
>>>>> 
>>>>> There is also some logic to simplify SEARCH in the
>>>>> RexSimplify#simplifySearch method. I hope this helps.
>>>>> Here's the code:
>>>>> 
>>>>> 1. https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L593
>>>>> 2. https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L2132
>>>>> 
>>>>> Best,
>>>>> LakeShen
>>>>> 
>>>>> On Fri, Aug 4, 2023 at 20:29, Soumyadeep Mukhopadhyay <soumyamy...@gmail.com> wrote:
>>>>> 
>>>>>> Thank you, shall explore more on this! :)
>>>>>> 
>>>>>> 
>>>>>> On Fri, 4 Aug 2023 at 5:53 PM, P.F. ZHAN <dethr...@gmail.com> wrote:
>>>>>> 
>>>>>>> Aha, I'm using Apache Kylin, which uses Calcite to generate a logical
>>>>>>> plan and then converts it to a Spark plan to execute the query. Given
>>>>>>> that Calcite has more operations for aggregations, and Kylin wants to
>>>>>>> take full advantage of precomputed cubes (something like Calcite's
>>>>>>> materialized views), it uses both Calcite and Spark (for distributed
>>>>>>> computing). Maybe it's wild and a little fun, but it does work well in
>>>>>>> many scenarios.
>>>>>>> 
>>>>>>> On Fri, Aug 4, 2023 at 8:10 PM Soumyadeep Mukhopadhyay <
>>>>>>> soumyamy...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I am curious about your use case. Are you not losing out on the
>>>>>>>> optimisations of Calcite when you are using Spark? Is it possible for
>>>>>>>> you to share a general approach where we will be able to keep the
>>>>>>>> optimisations done by Calcite and use Spark on top of it?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, 4 Aug 2023 at 5:19 PM, P.F. ZHAN <dethr...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Generally speaking, the SEARCH operator is very good, but when we use
>>>>>>>>> Calcite to optimize the logical plan and then use Spark to execute it,
>>>>>>>>> SEARCH is unsupported. So is there a more elegant way to turn off the
>>>>>>>>> SEARCH operator? Or how can we convert the SEARCH operator to the IN
>>>>>>>>> operator before converting the Calcite logical plan to the Spark
>>>>>>>>> logical plan? If we do this, we need to consider Join and Filter; are
>>>>>>>>> there any other RelNodes?
>>>>>>>>> 
>>>>>>>>> Maybe it would be better for this optimization to be optional for now,
>>>>>>>>> since many query execution engines do not support this operator?
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>>
