Hi Julian,

Thanks for opening the discussion.

In general, I am convinced of the value and importance of the proposal.

I want to discuss more about the algebra involved, as the simplified range
sets may not have the minimum computational costs.
Some examples:

1. Suppose we have expression x in (1, 2, 4, 5, 6).

The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
4 and x <= 6), which represents 4 comparisons and 3 logical operations.
Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6,
which represents 5 comparisons and 4 logical operations.
It's hard to say which one has a smaller computational cost.

2. Suppose we have expression x in (1, 2, 4, 5)
The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
4 and x <= 5).
Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3.
I am not sure if the range set implementation supports this type of
simplification.

Therefore, to address above problems,
1. We may need a model to estimate the cost, and the simplification process
should be guided by the cost model.
2. Maybe we need to extend the range set implementation so it produces
expressions with minimum computational cost.

Best,
Liya Fan





On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> We have had several discussions over the years about how to represent
> IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land.
>
> I have generally taken the position that we should expand to ORs (e.g. “x
> = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow
> IN in RexCall.
>
> I have given this some further thought, as part of a couple of RexSimplify
> bugs [1] [2] and I now think we should replace IN with something more
> powerful, namely sargs. A sarg (or search argument [3]) is an ordered set
> of intervals, and can be represented as a Guava ImmutableRangeSet, such as
> "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants,
> but also ranges.
>
> Today you would write
>
>   RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))
>
> With sargs, you would instead write
>
>   RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1],
> [3..3], [5..5]]”)))
>
> There is a new operator IN_SARG, and a new node type RexSarg (it could
> almost be a RexLiteral).
>
> Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if
> we invest in sarg support in RexSimplify and RelOptPredicateList, that
> investment is likely to pay dividends in better plans.
>
> Guava RangeSets, and hence sargs, have support for discrete domains, so
> they can easily optimize ">2 AND <4" to "3”.
>
> Sargs would be the preferred (therefore canonical) form for any AND or OR
> list that has more than one comparison on the same operand (e.g. "x > 3 AND
> x < 17 AND x <> 10”).
>
> This proposal would subsume IN and therefore we would stop supporting IN
> in RexCall.
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-4155 <
> https://issues.apache.org/jira/browse/CALCITE-4155>
>
> [2] https://github.com/julianhyde/calcite/tree/4159-simplify <
> https://github.com/julianhyde/calcite/tree/4159-simplify>
>
> [3] https://en.wikipedia.org/wiki/Sargable <
> https://en.wikipedia.org/wiki/Sargable>

Reply via email to