I am not sure I got your implication by "pollute". If you mean changes, yes, it 
requires some changes in rules. Do we need to change enumerables? Not 
necessary. We can just add a security guard saying that it is supported. Not 
everyone requires the Enumerable operators to support everything. More 
importantly, currently there is no logic or rules to translate sub-query 
directly to SEMI/ANTI joins, let alone translating directly to ANTI_NOTIN. 
Currently NOT IN is expanded to NOT(IN ...) before entering RelNode land. That 
means we don't even have the chance to generate the NOT IN anti join. Is that 
still a concern?

Even if some day, some contributor extends Calcite's parser and 
SubqueryRemovalRule to be able to  transform NOT_IN subquery into NOT IN anti 
join, we still have chance to disable it. Is that still a concern?

There are many ways to play it safe.

> Brainstorming: maybe we could consider it as a separate logical operator
> (with its corresponding enumerable implementation)?
It doesn't sound cool. It requires much more work. You have to duplicate all 
the rules, metadata handler that deal with LogicalJoin, and for some rule that 
matches Join base class, you have to check it is a LogicalJoin or the logical 
operator for ANTI_NOTIN.

On 2020/07/20 08:28:42, Ruben Q L <rube...@gmail.com> wrote: 
> I have some concerns that this new type would "pollute" the existing Join
> logic, rules and enumerable implementations.
> 
> Brainstorming: maybe we could consider it as a separate logical operator
> (with its corresponding enumerable implementation)?
> 
> 
> Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan <h.y...@alibaba-inc.com> a
> écrit :
> 
> > I agree that NOT IN is toxic, and it is error-prone.
> > But you can't prevent people writing SQL with not in sub-queries, would
> > you rather let optimizer generate inefficient plan?
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人:Julian Hyde<jh...@apache.org>
> > 日 期:2020年07月20日 11:56:35
> > 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
> > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN
> >
> > Yuck!
> >
> > NOT IN is toxic. I'd rather keep it out of the algebra.
> >
> > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan <hy...@apache.org> wrote:
> > >
> > > Hi all,
> > >
> > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks
> > to Ruben for reminding).
> > > For some simple boolean context NOT_IN subquery, we can't transform it
> > to ANTI join. e.g.:
> > >
> > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is
> > nullable
> > >
> > > Because if there is a null value in the results of subquery, the NOT IN
> > predicate will return false, the whole query returns empty. And in Calcite,
> > the plan for this kind of query is inefficient.
> > >
> > > If we have ANTI_NOTIN to represent this kind of join, we can generate
> > more efficient plan, as long as the query executor support it.
> > >
> > > Thoughts?
> > >
> > > Haisheng Yuan
> > >
> > >
> >
> 

Reply via email to