Re: [DISCUSS] New RexNode: RexListCmp

2020-07-21 Thread Scott Reynolds
I wanted to reply and share our recent requirement for handling SQL like
the following `error_code IN (3002, 3030)' and the challenges we
faced. For our implementation on top of Apache Kudu, each disjunction
creates a `Scanner' – a resource we need to limit as it represents a
denial of service attack vector (e.g. too many scanners, heap fills
up). Good news for us is Kudu ships with an [`inListPredicate'] and we
expected a plan to include the `SqlKind.IN' as the function which we
could translate into `inListPredicate'. We were surprised when it didn't
do that. We did eventually make this work for our customers with a hack
below but it is not valid plan – for instance unparsing the plan
produces invalid SQL query – and therefore is brittle (but *works*
:fingers-crossed:) .

┌
│ // This is not the correct use of Array.
│ final RelDataType listType =
builder.getTypeFactory().createArrayType(fieldType, -1);
│ return builder.call(SqlStdOperatorTable.IN,
│  builder.field(conditionTableName, columnName),
│  rexBuilder.makeLiteral(resultValue, listType, true));
└
We filed a ticket to do it the correct way, which is to take all the
disjunctions, and "un-parse" them into `inListPredicate' calls *if
possible*. This struck us as pretty dense code *but* would apply to
other disjunctions.

It would be *great* if Calcite shipped with a `RexCall' that our
implementation could translate with little effort into a
`inListPredicate'.


[`inListPredicate']
https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduPredicate.html#newInListPredicate-org.apache.kudu.ColumnSchema-java.util.List-

On Mon, Jul 20, 2020 at 3:09 PM Stamatis Zampetakis 
wrote:

> Another quick thought as far as it concerns the IN operator would be to use
> RexCall as it is right now where the first operand in the list is a
> RexInputRef for instance and the rest are the literals.
> I assume that taking this direction would need to change a bit the
> respective SqlOperator.
>
> I haven't thought of this thoroughly so maybe there are important things
> that I am missing.
>
> Best,
> Stamatis
>
>
> On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde  wrote:
>
> > The name isn't very intuitive.
> >
> > The concept of a list and a comparison operator seems OK. As Vladimir
> > points out, it is somewhat similar to RexSubQuery, so maybe this could
> > be a sub-class (but organizing the data a bit more efficiently).
> >
> > I would be very wary of null semantics. RexNode scalar operators are
> > forced to do 3-valued logic, but this is almost a relational operator
> > and it would be better without that burden.
> >
> > Julian
> >
> >
> >
> > On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
> >  wrote:
> > >
> > > >Do you know what is the impact on Enumerable implementation?
> > >
> > > I guess there are plenty of options there.
> > >
> > > The key question regarding RexListCmp is as we introduce a new Rex
> node,
> > > all the planning rules and all engines
> > > must support it somehow.
> > >
> > > Technically speaking, we have RexSubQuery.
> > > Haisheng, have you considered an option to stick with RexSubQuery to
> > avoid
> > > having two more-or-less the same rex classes?
> > >
> > > Vladimir
> >
>


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Stamatis Zampetakis
Another quick thought as far as it concerns the IN operator would be to use
RexCall as it is right now where the first operand in the list is a
RexInputRef for instance and the rest are the literals.
I assume that taking this direction would need to change a bit the
respective SqlOperator.

I haven't thought of this thoroughly so maybe there are important things
that I am missing.

Best,
Stamatis


On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde  wrote:

> The name isn't very intuitive.
>
> The concept of a list and a comparison operator seems OK. As Vladimir
> points out, it is somewhat similar to RexSubQuery, so maybe this could
> be a sub-class (but organizing the data a bit more efficiently).
>
> I would be very wary of null semantics. RexNode scalar operators are
> forced to do 3-valued logic, but this is almost a relational operator
> and it would be better without that burden.
>
> Julian
>
>
>
> On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
>  wrote:
> >
> > >Do you know what is the impact on Enumerable implementation?
> >
> > I guess there are plenty of options there.
> >
> > The key question regarding RexListCmp is as we introduce a new Rex node,
> > all the planning rules and all engines
> > must support it somehow.
> >
> > Technically speaking, we have RexSubQuery.
> > Haisheng, have you considered an option to stick with RexSubQuery to
> avoid
> > having two more-or-less the same rex classes?
> >
> > Vladimir
>


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Julian Hyde
The name isn't very intuitive.

The concept of a list and a comparison operator seems OK. As Vladimir
points out, it is somewhat similar to RexSubQuery, so maybe this could
be a sub-class (but organizing the data a bit more efficiently).

I would be very wary of null semantics. RexNode scalar operators are
forced to do 3-valued logic, but this is almost a relational operator
and it would be better without that burden.

Julian



On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
 wrote:
>
> >Do you know what is the impact on Enumerable implementation?
>
> I guess there are plenty of options there.
>
> The key question regarding RexListCmp is as we introduce a new Rex node,
> all the planning rules and all engines
> must support it somehow.
>
> Technically speaking, we have RexSubQuery.
> Haisheng, have you considered an option to stick with RexSubQuery to avoid
> having two more-or-less the same rex classes?
>
> Vladimir


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Vladimir Sitnikov
>Do you know what is the impact on Enumerable implementation?

I guess there are plenty of options there.

The key question regarding RexListCmp is as we introduce a new Rex node,
all the planning rules and all engines
must support it somehow.

Technically speaking, we have RexSubQuery.
Haisheng, have you considered an option to stick with RexSubQuery to avoid
having two more-or-less the same rex classes?

Vladimir


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Rui Wang
Hi Haisheng,

Do you know what is the impact on Enumerable implementation? Will
RexListCmp still be converted AND/OR something else in that layer or there
will be new logic introduced to generate code for RexListCmp?

-Rui

On Sun, Jul 19, 2020 at 10:13 PM Enrico Olivelli 
wrote:

> Il Lun 20 Lug 2020, 03:00 Haisheng Yuan  ha scritto:
>
> > Hi all,
> >
> > This is a rough idea, I'd like to see how the community think about it.
> >
> > RexListCmp extends RexNode / RexCall {
> >   public final SqlOperator op;
> >   public final RexNode left;
> >   public final ImmutableList list;
> >   public final RexQuantifier quantifier;
> >   public final RelDataType type;
> > }
> >
> > Enum RexQuantifier {
> >   ALL,
> >   ANY
> > }
> >
> > Background:
> >
> > It is not uncommon that the query contains large number of constant IN
> > list, e.g.
> > 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1);
> > 2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1);
> >
> > Currently, Calcite either translates it into a Join, or expand to OR/AND,
> > which is inefficient, and may cause problems.
> >
>
> Yes. It is not efficient.
>
> I would love to see this new feature
>
> Thanks
> Enrico
>
>
>
>
> > With RexListCmp, the predicate in query 1) will be represented as:
> > RexListCmp {
> >   op = "<>",
> >   left = "a"
> >   list = "1,2,3...1"
> >   quantifier = "ALL"
> > }
> >
> > The predicate in query 2) will be represented as:
> > RexListCmp {
> >   op = "=",
> >   left = "b"
> >   list = "1,2,3...1"
> >   quantifier = "ANY"
> > }
> >
> > It may also be used to represent the predicate in the following query:
> >
> > SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ...
> (1000,
> > 1000));
> >
> > Further more, it is extensible. The op is not limited to be equals or not
> > equals, it also be >, <, >=, <=, IDF, INDF or even customized sql
> operator
> > like geospatial operator intersect:
> > boolean &&( geometry A , geometry B )
> >
> > Thoughts?
> >
> > Thanks,
> > Haisheng Yuan
> >
> >
> >
>


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-19 Thread Enrico Olivelli
Il Lun 20 Lug 2020, 03:00 Haisheng Yuan  ha scritto:

> Hi all,
>
> This is a rough idea, I'd like to see how the community think about it.
>
> RexListCmp extends RexNode / RexCall {
>   public final SqlOperator op;
>   public final RexNode left;
>   public final ImmutableList list;
>   public final RexQuantifier quantifier;
>   public final RelDataType type;
> }
>
> Enum RexQuantifier {
>   ALL,
>   ANY
> }
>
> Background:
>
> It is not uncommon that the query contains large number of constant IN
> list, e.g.
> 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1);
> 2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1);
>
> Currently, Calcite either translates it into a Join, or expand to OR/AND,
> which is inefficient, and may cause problems.
>

Yes. It is not efficient.

I would love to see this new feature

Thanks
Enrico




> With RexListCmp, the predicate in query 1) will be represented as:
> RexListCmp {
>   op = "<>",
>   left = "a"
>   list = "1,2,3...1"
>   quantifier = "ALL"
> }
>
> The predicate in query 2) will be represented as:
> RexListCmp {
>   op = "=",
>   left = "b"
>   list = "1,2,3...1"
>   quantifier = "ANY"
> }
>
> It may also be used to represent the predicate in the following query:
>
> SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... (1000,
> 1000));
>
> Further more, it is extensible. The op is not limited to be equals or not
> equals, it also be >, <, >=, <=, IDF, INDF or even customized sql operator
> like geospatial operator intersect:
> boolean &&( geometry A , geometry B )
>
> Thoughts?
>
> Thanks,
> Haisheng Yuan
>
>
>


[DISCUSS] New RexNode: RexListCmp

2020-07-19 Thread Haisheng Yuan
Hi all,

This is a rough idea, I'd like to see how the community think about it.

RexListCmp extends RexNode / RexCall {
  public final SqlOperator op;
  public final RexNode left;
  public final ImmutableList list;
  public final RexQuantifier quantifier;
  public final RelDataType type;
}

Enum RexQuantifier {
  ALL,
  ANY
}

Background:

It is not uncommon that the query contains large number of constant IN list, 
e.g.
1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1);
2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1);

Currently, Calcite either translates it into a Join, or expand to OR/AND, which 
is inefficient, and may cause problems.

With RexListCmp, the predicate in query 1) will be represented as:
RexListCmp {
  op = "<>",
  left = "a"
  list = "1,2,3...1"
  quantifier = "ALL"
}

The predicate in query 2) will be represented as:
RexListCmp {
  op = "=",
  left = "b"
  list = "1,2,3...1"
  quantifier = "ANY"
}

It may also be used to represent the predicate in the following query:

SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... (1000, 
1000));

Further more, it is extensible. The op is not limited to be equals or not 
equals, it also be >, <, >=, <=, IDF, INDF or even customized sql operator like 
geospatial operator intersect:
boolean &&( geometry A , geometry B )

Thoughts?

Thanks,
Haisheng Yuan