Re: SEARCH and Sargs

Stephen Carlin Sat, 18 Oct 2025 06:04:09 -0700

What you said makes perfect sense to me about restricting the functions.
Definitely think it was a good idea to create it and implement it.


But I do think you are slightly off about where the tension lies.  I think
the tension is on a very specific issue for the most part, at least for me.

If I'm understanding what you are saying about the tension correctly: I
think your NVL example is there to show that you can't really guess which
way to do the expand search?  And this is true.  However, the difference
between NVL and a case statement is prolly not something I care much
about.  There are only 2 parameters in an NVL function.  If you give me a
CASE when I originally had an NVL  my first reaction would be, "Where the
heck did that come from?" followed by  "meh, they do the same thing and the
efficiency isn't much different, the end user can figure out that they're
the same thing, I'll get around to fixing that later, if ever."

But the IN case is far, far different.  It is expanded as  "A = 1 OR A = 2
OR A =3 OR ...." and 100 more ORs which is a huge headache.  100 OR
statements is quite the nightmare compared to 1 IN statement.  I really
need that expansion to an IN operator.

Not sure how many operators fit the IN model as opposed to the NVL model,
but I'd imagine it's not a lot.  I'm guessing the main complaints came
about IN.

On Tue, Oct 14, 2025 at 4:17 PM Julian Hyde <[email protected]> wrote:

> SEARCH is a semi-join too. It’s just a semi-join to an infinite set (the
> points on the number line, or any totally-ordered type). :)
>
> I agree that it’s useful to have multiple functions, and mappings between
> them. We certainly allow multiple functions in the SQL we consume, and the
> SQL we produce (via the JDBC adapter). And we map between them by
> desugaring (e.g. "a BETWEEN b AND c" ==> "a >= b AND a <= c”.)
>
> It is very advantageous to restrict the set of functions that can be used
> in planner rules and simplifications. Since SEARCH can represent BETWEEN
> (on a constant range) and >, the argument goes, let’s convert them both to
> SEARCH, and our rules can be simpler.
>
> I am a fan of this approach. It is much harder to produce smart,
> consistent, robust planning rules if we let the surface area get too large.
>
> This approach is at tension with people who want and expect their
> operators to pass through planning unchanged. If someone wrote "NVL(a, b)"
> they are surprised if it comes out as "CASE WHEN a IS NULL THEN b ELSE a”.
> I guess we can mitigate by re-sugaring on the backend, before we generate
> SQL.
>
> Julian
>
>
>
>
> > On Oct 14, 2025, at 1:51 PM, Stephen Carlin <[email protected]>
> wrote:
> >
> > Heh, it's funny you say that because Stamatis filed this improvement
> > request in a meeting where I was complaining about things being more
> > complicated and having broken code because of SEARCH :)
> >
> > My main complaint is this, and perhaps it makes sense to file another
> Jira
> > for a potential new improvement:
> >
> > I totally get SEARCH and its usefulness.  So I have no complaints about
> the
> > existence of SEARCH, why it is needed, and why it makes sense.
> >
> > My problem with SEARCH is simply that it makes my code more complex.  I
> > kinda think of it this way:  The SEARCH operator is a superset of the
> > BETWEEN operator.  And we either are ok with having a representation for
> > both or we are stuck with it due to legacy reasons.  I prefer the former
> > explanation, heh.  So why not have an IN_FUNC operator as well?  No need
> to
> > get rid of SEARCH!  Just keep it as a superset.  And then create a
> RexUtil
> > method that does the work for me that converts a relevant SEARCH to an
> > IN_FUNC.
> >
> > Perhaps the general issue with IN is this: Like SEARCH, IN also has a
> > broader definition than just a function, but the broader definition is
> > different from SEARCH.  In our internal discussion, we talked about how
> > "IN" is always a semi-join of some nature, sometimes of course with a
> > subquery.  I believe this is how Calcite treats IN.  But in practice, IN
> > has a specialized case as something that fits right into what a
> > RexCall/function does.  I think many databases (at least the two I work
> on)
> > implement this specialized case of IN as a function internally, just like
> > BETWEEN.  So to have a 1:1 RexCall operator to function for IN just seems
> > to make intuitive sense for me.
> >
> > Perhaps we don't have a problem with the existence of a BETWEEN operator
> > stems from the fact that BETWEEN doesn't have the same type of overload
> > that IN does.
> >
> > So yeah though: If we don't think that an IN type operator is good for
> > Calcite, I need to develop code local to my database that does a
> conversion
> > to a function which differs from my normal RexCall conversion.   And my
> > intuition tells me that this effort has already been done elsewhere.
> Which
> > is why I think it belongs in Calcite, where the community would benefit.
> >
> > Anyway, having said all this:  the improvement in CALCITE-7226 will
> > definitely help me out because creating all those ORs before generating a
> > SEARCH seems quite inefficient. It might even lead to problems in my code
> > as the multiple ORs generate many layers of RexNodes which I think
> created
> > some really slow compilations for me during one of my tests.
> >
> > On Tue, Oct 14, 2025 at 11:30 AM Julian Hyde <[email protected]>
> wrote:
> >
> >> It’s nice to see issues like
> >> https://issues.apache.org/jira/browse/CALCITE-7226 "Convert IN to
> SEARCH
> >> in SqlToRelConverter” being logged.
> >>
> >> When I introduced SEARCH I got a lot of complaints about how I was
> making
> >> plans more complicated and generally breaking things. All I was really
> >> doing was bringing the concept of a Sarg [1] into our logical algebra.
> I’m
> >> glad that people now see that SEARCH leads to more robust, efficient
> plans.
> >>
> >> Julian
> >>
> >> [1] https://en.wikipedia.org/wiki/Sargable
>
>

Re: SEARCH and Sargs

Reply via email to