Re: Why is Filter's condition required to be flat?

2023-08-09 Thread Julian Hyde
Flattened expressions - so that AND never contains a child that is
AND, and OR never contains a child that is OR - is a canonized form
that seems to have more advantages than disadvantages. It's never
larger than the original (unlike CNF and DNF), frequently smaller than
the original, doesn't take much effort to canonize, and allows the
multiple consumers of expressions (simplifications, planner rules) to
make some assumptions that yield good results efficiently.

If you are only producing expressions you want Calcite to be liberal
in what it accepts; but if you are consuming expressions (e.g.
maintaining a rule) you want Calcite to be conservative in what it
accepts. You can't please both producers and consumers - it's a zero
sum game - but flattened expressions seemed to be a reasonable
compromise.

I hope there's a simple process to get your expressions flattened. If
there's not, log a bug, so that there's a documented process.

By the way, your note reminds me that we're not as diligent at
flattening OR as we are at flattening AND. We should fix that.

Julian

On Wed, Aug 9, 2023 at 9:09 AM Ian Bertolacci
 wrote:
>
> Hello,
> I’m curious about why Filter requires that the condition be a flattened tree?
> We have some transformations which occasionally result in non-flat trees, 
> which causes issues for us in testing.
>
> I know we can avoid this by constructing expressions using RelBuilder or the 
> RexUtil.composeConjunction, but our transformers work very similar to 
> RexShuttle, which simply clones a RexCall on reconstruction.
> So there are situations where a subexpression to an `AND` or `OR` expression 
> result in a nested `AND` or `OR` expression.
> One solution is to use RexUtil.flatten, but that only applies flattening for 
> subexpressions with the same operator, so that expressions like `AND( A, AND( 
> B, C ), OR( E, OR( F, G ) )`  only get partially flattened to only gets 
> flattened to `AND( A, B, C, OR( E, OR( F, G) )`  which still isn’t flat; the 
> correct flattening would be `AND( A, B, C, OR( E, F, G ) )`
> Similarly, something like `AND( A, AND( B, C ) ) == …` wouldn’t get flattened 
> at all.
>
> Is there a real reason for having this assertion?
> Would removing it cause problems?
> Thanks!
> -Ian J. Bertolacci


[jira] [Created] (CALCITE-5914) Cache compiled regular expressions in SQL function runtime

2023-08-09 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5914:


 Summary: Cache compiled regular expressions in SQL function runtime
 Key: CALCITE-5914
 URL: https://issues.apache.org/jira/browse/CALCITE-5914
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Cache compiled regular expressions (and other amortized work) in SQL function 
runtime.

Consider the following query:
{code}
SELECT ename, job, RLIKE(ename, 'A.*'), RLIKE(ename, job || '.*')
FROM emp
{code}

The first regular expression, {{A.*}}, is constant and can be compiled at 
prepare time or at the start of execution; the second regular expression '{{job 
|| '.*'}}' might vary from one row to the next. However if the {{job}} column 
has a small number of values it still might be beneficial to cache the compiled 
regular expression.

If {{SqlFunctions.rlike}} could use a cache (mapping from {{String}} to 
{{java.util.regex.Pattern}}) then it would achieve benefits in both the 
constant and non-constant cases.

The cache needs to:
 * be thread-safe (in case queries are executing using multiple threads),
 * return thread-safe objects (as is {{Pattern}}),
 * have bounded space (so that a query doesn't blow memory with 1 million 
distinct regular expressions),
 * disposed after the query has terminated,
 * (ideally) share with regexes of the same language in the same query,
 * not conflict with regexes of different languages in the same query.

One possibility is to add an {{interface FunctionState}}, with subclasses 
including {{class RegexpCache}}, and if argument 1 of a function is a subclass 
of {{FunctionState}} the compiler would initialize the state in the generated 
code. The function can rely on the state argument being initialized, and being 
the same object from one call to the next. Example:

{code}
interface FunctionState {
}

class RegexpCache implements FunctionState {
  final Cache cache = ...;
}
{code}

This change should install the cache for all applicable functions, including 
LIKE, ILIKE, RLIKE, SIMILAR, posix regex, REGEXP_CONTAINS, REGEXP_REPLACE, 
other REGEXP_ functions, PARSE_URL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Why is Filter's condition required to be flat?

2023-08-09 Thread Ian Bertolacci
Hello,
I’m curious about why Filter requires that the condition be a flattened tree?
We have some transformations which occasionally result in non-flat trees, which 
causes issues for us in testing.

I know we can avoid this by constructing expressions using RelBuilder or the 
RexUtil.composeConjunction, but our transformers work very similar to 
RexShuttle, which simply clones a RexCall on reconstruction.
So there are situations where a subexpression to an `AND` or `OR` expression 
result in a nested `AND` or `OR` expression.
One solution is to use RexUtil.flatten, but that only applies flattening for 
subexpressions with the same operator, so that expressions like `AND( A, AND( 
B, C ), OR( E, OR( F, G ) )`  only get partially flattened to only gets 
flattened to `AND( A, B, C, OR( E, OR( F, G) )`  which still isn’t flat; the 
correct flattening would be `AND( A, B, C, OR( E, F, G ) )`
Similarly, something like `AND( A, AND( B, C ) ) == …` wouldn’t get flattened 
at all.

Is there a real reason for having this assertion?
Would removing it cause problems?
Thanks!
-Ian J. Bertolacci


Re: [Discussion] Can we forbidden SEARCH operator when use other execution engine?

2023-08-09 Thread P.F. ZHAN
Thanks. I am planning to go through this operator and follow your
suggestions to handle my business.


On Wed, Aug 9, 2023 at 11:46 James Starr  wrote:

> Alternatively, you could post-process the results after all the
> optimizations are performed.  It is not a great deal of work to write a
> recursive Rel/Rex Shuttle.
>
> James
>
> On Tue, Aug 8, 2023 at 7:12 PM LakeShen  wrote:
>
> > From calcite's point of view, calcite is not binded to any engine, it is
> > not aware of the specific implementation of the underlying engine, as
> > julian said, SEARCH (and Sarg) is introduced for internal optimization of
> > calcite.
> >
> > In our project, we also do some extra processing for SEARCH (and Sarg),
> > maybe we could add a configuration parameter to control whether have the
> > SEARCH
> > (and Sarg) RexNode. For example,we could add a configuration parameter
> for
> > SqlToRelConverter if the user doesn't want a SEARCH (and Sarg)
> > expression,if false, SqlNode to RelNode would not have the SEARCH (and
> > Sarg) RexNode.
> >
> > Best, LakeShen
> >
> > Julian Hyde  于2023年8月9日周三 04:28写道:
> >
> > > The optimizations are the reason that SEARCH (and Sarg) exist. For the
> > > simplifier to handle all of the combinations of <, <=, >, >=, =, <>,
> and
> > > AND, OR, NOT is prohibitively expensive; if the same expressions are
> > > converted to Sargs they can be optimized using simple set operations.
> > >
> > >
> > > > On Aug 4, 2023, at 5:57 PM, P.F. ZHAN  wrote:
> > > >
> > > > Thanks, Julian, for your answer. Initially, I was thinking that
> SEARCH
> > > > might not be essential. Since Calcite first translates it into a
> SEARCH
> > > > operator, and then we can use RexUtil#expandSearch to rewrite it into
> > an
> > > > operator supported by Spark, it seems like doing the same job twice.
> > So,
> > > > instead, why not consider disabling this operator, thus avoiding the
> > need
> > > > to reconvert the operator back into an expression supported by other
> > > > execution engines, such as Spark, later on? Frankly speaking, I might
> > not
> > > > have taken into account the potential simplifications brought by the
> > > SEARCH
> > > > operator for optimizing the execution plan. If these
> > > > simplifications produce a more efficient execution plan or make the
> > > > optimization stage more efficient, then I'm open to exploring ways to
> > > > implement an equivalent transformation within Kylin, or even
> exploring
> > > the
> > > > possibility of creating a similar implementation in other execution
> > > engines
> > > > like Spark.
> > > >
> > > > On Sat, Aug 5, 2023 at 2:57 AM Julian Hyde 
> > > wrote:
> > > >
> > > >> I agree that it should be solved ‘by config’ but not by global
> config.
> > > The
> > > >> mere fact that you are talking to Spark (i.e. using the JDBC adapter
> > > with
> > > >> the Spark dialect) should be sufficient right?
> > > >>
> > > >> Put another way. Calcite’s internal representation for expressions
> is
> > > what
> > > >> it is. The fact that SEARCH is part of that representation has many
> > > >> benefits for simplification. Just expect there to be a a translation
> > > step
> > > >> from that representation to any backend.
> > > >>
> > > >> Julian
> > > >>
> > > >>
> > > >>> On Aug 4, 2023, at 7:22 AM, P.F. ZHAN  wrote:
> > > >>>
> > > >>> Very nice suggestion. I wonder can we introduce this feature by
> > config?
> > > >>> Maybe it’s better for users using more than one query engine to
> > > interpret
> > > >>> and execute query.
> > > >>>
> > > >>>
> > > >>> On Fri, Aug 4, 2023 at 22:03 Alessandro Solimando <
> > > >>> alessandro.solima...@gmail.com> wrote:
> > > >>>
> > >  Hello,
> > >  as LakeShen suggests, you can take a look into
> RexUtil#expandSearch,
> > > you
> > >  can see it in action in RexProgramTest tests, one example:
> > > 
> > > 
> > > 
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/test/java/org/apache/calcite/rex/RexProgramTest.java#L1710-L1727
> > > 
> > >  Best regards,
> > >  Alessandro
> > > 
> > >  On Fri, 4 Aug 2023 at 15:45, LakeShen 
> > > >> wrote:
> > > 
> > > > Hi P.F.ZHAN,in calcite,it has a method RexUtil#expandSearch to
> > expand
> > > > Search,maybe you could get some information from this method.
> > > >
> > > > There is also some logic to simplify Search in the
> > > > RexSimplify#simplifySearch method. I hope this could help you.
> > > > Here's the code: 1.
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L593
> > > > 2.
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L2132
> > > >
> > > > Best,
> > > >