Let’s bring this discussion back to the original question: for these kinds of 
predicates, is Sarg a better data structure than IN?

It absolutely is. There are algorithms in RexSimplify that, for each term in an 
OR list, simplify that term using all available predicates and, once it is 
simplified, add it to the list of predicates. The cost is therefore quadratic 
in the size of the OR list.
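
To make the cost concrete, here is a toy sketch of that pattern. It is not the 
RexSimplify code: the class OrListCost, the method countChecks, and the use of 
strings as terms are all made up for illustration, and "simplify" is reduced to 
a duplicate check. It only counts how many term-versus-predicate checks occur.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Calcite. */
public class OrListCost {
  /**
   * Walks an OR list, checking each term against every predicate
   * collected so far, then adding the term to the predicate list.
   * Returns the number of term-versus-predicate checks performed.
   */
  public static long countChecks(List<String> orTerms) {
    List<String> predicates = new ArrayList<>();
    long checks = 0;
    for (String term : orTerms) {
      boolean redundant = false;
      for (String predicate : predicates) {
        checks++;
        // Stand-in for "simplify using this predicate": drop duplicates.
        if (predicate.equals(term)) {
          redundant = true;
          break;
        }
      }
      if (!redundant) {
        predicates.add(term);
      }
    }
    return checks;
  }
}
```

For n distinct terms the inner loop runs 0 + 1 + ... + (n - 1) = n(n - 1)/2 
times, so an OR list of 1,000 terms costs roughly half a million checks.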

The Sarg data structure converts many kinds of large OR-lists and predicate 
sets into a single term. Because that term is immutable and based on sorted 
ranges (or points), it can be handled efficiently (e.g. two Sargs can be 
intersected using a merge, rather than nested loops). That will make our 
simplification process more efficient.
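
As a rough sketch of what "intersected using a merge" means, assuming each side 
is a sorted list of disjoint closed ranges over long values: the class 
RangeListIntersect and its Range type below are hypothetical and much simpler 
than the actual Sarg class, which also has to handle open bounds, other 
datatypes and nulls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Calcite's Sarg. */
public class RangeListIntersect {
  /** A closed range [lo, hi]; a point is a range with lo == hi. */
  public static final class Range {
    public final long lo;
    public final long hi;

    public Range(long lo, long hi) {
      this.lo = lo;
      this.hi = hi;
    }
  }

  /**
   * Intersects two lists of ranges. Each list must be sorted by lower
   * bound and contain no overlapping ranges. Runs in O(a.size() + b.size())
   * because each step advances one of the two cursors.
   */
  public static List<Range> intersect(List<Range> a, List<Range> b) {
    List<Range> result = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < a.size() && j < b.size()) {
      Range x = a.get(i);
      Range y = b.get(j);
      long lo = Math.max(x.lo, y.lo);
      long hi = Math.min(x.hi, y.hi);
      if (lo <= hi) {
        result.add(new Range(lo, hi)); // the two ranges overlap
      }
      // Advance the side whose current range ends first; the other
      // range may still overlap the next range on this side.
      if (x.hi < y.hi) {
        i++;
      } else {
        j++;
      }
    }
    return result;
  }
}
```

Intersecting {[1, 5], [10, 20]} with {[3, 12], [18, 30]} this way visits each 
range once and yields {[3, 5], [10, 12], [18, 20]}, where nested loops would 
compare every range on one side with every range on the other.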

Whether we then add new classes of optimization is a discussion for another day.

Julian


> On Aug 10, 2020, at 1:13 PM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> 
> wrote:
> 
> Julian>I cannot see any cases that would become more expensive.
> 
> I mean the optimization passes might be complicated, not the storage of
> sargs themselves.
> For instance, CALCITE-4155 converts a in (1, 2, 3, 4, 5) to
> a >= 1 and a <= 5
> Is it a significant improvement?
> Is between representation always better?
> Does the case appear very often in practice?
> 
> However, the optimization does take time, so it looks like extra logic with
> doubtful gains.
> Of course, it is unlikely sargs would be a major time consumer, however, if
> we keep adding tiny simplifications we might
> end up with a condition where the planning is slow and we have no way to
> fix it.
> 
> I'm ok with people updating RexSimplify (and 4155 looks like an innocent
> feature), however, I think the current design is not really scalable
> (e.g. it might process the same expression multiple times).
> 
> Vladimir
