Re: [DISCUSS] New RexNode: RexListCmp
I wanted to reply and share our recent requirement for handling SQL like the following `error_code IN (3002, 3030)' and the challenges we faced. For our implementation on top of Apache Kudu, each disjunction creates a `Scanner' – a resource we need to limit as it represents a denial of service attack vector (e.g. too many scanners, heap fills up). Good news for us is Kudu ships with an [`inListPredicate'] and we expected a plan to include the `SqlKind.IN' as the function which we could translate into `inListPredicate'. We were surprised when it didn't do that. We did eventually make this work for our customers with a hack below but it is not valid plan – for instance unparsing the plan produces invalid SQL query – and therefore is brittle (but *works* :fingers-crossed:) . ┌ │ // This is not the correct use of Array. │ final RelDataType listType = builder.getTypeFactory().createArrayType(fieldType, -1); │ return builder.call(SqlStdOperatorTable.IN, │ builder.field(conditionTableName, columnName), │ rexBuilder.makeLiteral(resultValue, listType, true)); └ We filed a ticket to do it the correct way, which is to take all the disjunctions, and "un-parse" them into `inListPredicate' calls *if possible*. This struck us as pretty dense code *but* would apply to other disjunctions. It would be *great* if Calcite shipped with a `RexCall' that our implementation could translate with little effort into a `inListPredicate'. [`inListPredicate'] https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduPredicate.html#newInListPredicate-org.apache.kudu.ColumnSchema-java.util.List- On Mon, Jul 20, 2020 at 3:09 PM Stamatis Zampetakis wrote: > Another quick thought as far as it concerns the IN operator would be to use > RexCall as it is right now where the first operand in the list is a > RexInputRef for instance and the rest are the literals. > I assume that taking this direction would need to change a bit the > respective SqlOperator. > > I haven't thought of this thoroughly so maybe there are important things > that I am missing. > > Best, > Stamatis > > > On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde wrote: > > > The name isn't very intuitive. > > > > The concept of a list and a comparison operator seems OK. As Vladimir > > points out, it is somewhat similar to RexSubQuery, so maybe this could > > be a sub-class (but organizing the data a bit more efficiently). > > > > I would be very wary of null semantics. RexNode scalar operators are > > forced to do 3-valued logic, but this is almost a relational operator > > and it would be better without that burden. > > > > Julian > > > > > > > > On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov > > wrote: > > > > > > >Do you know what is the impact on Enumerable implementation? > > > > > > I guess there are plenty of options there. > > > > > > The key question regarding RexListCmp is as we introduce a new Rex > node, > > > all the planning rules and all engines > > > must support it somehow. > > > > > > Technically speaking, we have RexSubQuery. > > > Haisheng, have you considered an option to stick with RexSubQuery to > > avoid > > > having two more-or-less the same rex classes? > > > > > > Vladimir > > >
Re: [DISCUSS] New RexNode: RexListCmp
Another quick thought as far as it concerns the IN operator would be to use RexCall as it is right now where the first operand in the list is a RexInputRef for instance and the rest are the literals. I assume that taking this direction would need to change a bit the respective SqlOperator. I haven't thought of this thoroughly so maybe there are important things that I am missing. Best, Stamatis On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde wrote: > The name isn't very intuitive. > > The concept of a list and a comparison operator seems OK. As Vladimir > points out, it is somewhat similar to RexSubQuery, so maybe this could > be a sub-class (but organizing the data a bit more efficiently). > > I would be very wary of null semantics. RexNode scalar operators are > forced to do 3-valued logic, but this is almost a relational operator > and it would be better without that burden. > > Julian > > > > On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov > wrote: > > > > >Do you know what is the impact on Enumerable implementation? > > > > I guess there are plenty of options there. > > > > The key question regarding RexListCmp is as we introduce a new Rex node, > > all the planning rules and all engines > > must support it somehow. > > > > Technically speaking, we have RexSubQuery. > > Haisheng, have you considered an option to stick with RexSubQuery to > avoid > > having two more-or-less the same rex classes? > > > > Vladimir >
Re: [DISCUSS] New RexNode: RexListCmp
The name isn't very intuitive. The concept of a list and a comparison operator seems OK. As Vladimir points out, it is somewhat similar to RexSubQuery, so maybe this could be a sub-class (but organizing the data a bit more efficiently). I would be very wary of null semantics. RexNode scalar operators are forced to do 3-valued logic, but this is almost a relational operator and it would be better without that burden. Julian On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov wrote: > > >Do you know what is the impact on Enumerable implementation? > > I guess there are plenty of options there. > > The key question regarding RexListCmp is as we introduce a new Rex node, > all the planning rules and all engines > must support it somehow. > > Technically speaking, we have RexSubQuery. > Haisheng, have you considered an option to stick with RexSubQuery to avoid > having two more-or-less the same rex classes? > > Vladimir
Re: [DISCUSS] New RexNode: RexListCmp
>Do you know what is the impact on Enumerable implementation? I guess there are plenty of options there. The key question regarding RexListCmp is as we introduce a new Rex node, all the planning rules and all engines must support it somehow. Technically speaking, we have RexSubQuery. Haisheng, have you considered an option to stick with RexSubQuery to avoid having two more-or-less the same rex classes? Vladimir
Re: [DISCUSS] New RexNode: RexListCmp
Hi Haisheng, Do you know what is the impact on Enumerable implementation? Will RexListCmp still be converted AND/OR something else in that layer or there will be new logic introduced to generate code for RexListCmp? -Rui On Sun, Jul 19, 2020 at 10:13 PM Enrico Olivelli wrote: > Il Lun 20 Lug 2020, 03:00 Haisheng Yuan ha scritto: > > > Hi all, > > > > This is a rough idea, I'd like to see how the community think about it. > > > > RexListCmp extends RexNode / RexCall { > > public final SqlOperator op; > > public final RexNode left; > > public final ImmutableList list; > > public final RexQuantifier quantifier; > > public final RelDataType type; > > } > > > > Enum RexQuantifier { > > ALL, > > ANY > > } > > > > Background: > > > > It is not uncommon that the query contains large number of constant IN > > list, e.g. > > 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1); > > 2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1); > > > > Currently, Calcite either translates it into a Join, or expand to OR/AND, > > which is inefficient, and may cause problems. > > > > Yes. It is not efficient. > > I would love to see this new feature > > Thanks > Enrico > > > > > > With RexListCmp, the predicate in query 1) will be represented as: > > RexListCmp { > > op = "<>", > > left = "a" > > list = "1,2,3...1" > > quantifier = "ALL" > > } > > > > The predicate in query 2) will be represented as: > > RexListCmp { > > op = "=", > > left = "b" > > list = "1,2,3...1" > > quantifier = "ANY" > > } > > > > It may also be used to represent the predicate in the following query: > > > > SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... > (1000, > > 1000)); > > > > Further more, it is extensible. The op is not limited to be equals or not > > equals, it also be >, <, >=, <=, IDF, INDF or even customized sql > operator > > like geospatial operator intersect: > > boolean &&( geometry A , geometry B ) > > > > Thoughts? > > > > Thanks, > > Haisheng Yuan > > > > > > >
Re: [DISCUSS] New RexNode: RexListCmp
Il Lun 20 Lug 2020, 03:00 Haisheng Yuan ha scritto: > Hi all, > > This is a rough idea, I'd like to see how the community think about it. > > RexListCmp extends RexNode / RexCall { > public final SqlOperator op; > public final RexNode left; > public final ImmutableList list; > public final RexQuantifier quantifier; > public final RelDataType type; > } > > Enum RexQuantifier { > ALL, > ANY > } > > Background: > > It is not uncommon that the query contains large number of constant IN > list, e.g. > 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1); > 2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1); > > Currently, Calcite either translates it into a Join, or expand to OR/AND, > which is inefficient, and may cause problems. > Yes. It is not efficient. I would love to see this new feature Thanks Enrico > With RexListCmp, the predicate in query 1) will be represented as: > RexListCmp { > op = "<>", > left = "a" > list = "1,2,3...1" > quantifier = "ALL" > } > > The predicate in query 2) will be represented as: > RexListCmp { > op = "=", > left = "b" > list = "1,2,3...1" > quantifier = "ANY" > } > > It may also be used to represent the predicate in the following query: > > SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... (1000, > 1000)); > > Further more, it is extensible. The op is not limited to be equals or not > equals, it also be >, <, >=, <=, IDF, INDF or even customized sql operator > like geospatial operator intersect: > boolean &&( geometry A , geometry B ) > > Thoughts? > > Thanks, > Haisheng Yuan > > >
[DISCUSS] New RexNode: RexListCmp
Hi all, This is a rough idea, I'd like to see how the community think about it. RexListCmp extends RexNode / RexCall { public final SqlOperator op; public final RexNode left; public final ImmutableList list; public final RexQuantifier quantifier; public final RelDataType type; } Enum RexQuantifier { ALL, ANY } Background: It is not uncommon that the query contains large number of constant IN list, e.g. 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1); 2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1); Currently, Calcite either translates it into a Join, or expand to OR/AND, which is inefficient, and may cause problems. With RexListCmp, the predicate in query 1) will be represented as: RexListCmp { op = "<>", left = "a" list = "1,2,3...1" quantifier = "ALL" } The predicate in query 2) will be represented as: RexListCmp { op = "=", left = "b" list = "1,2,3...1" quantifier = "ANY" } It may also be used to represent the predicate in the following query: SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... (1000, 1000)); Further more, it is extensible. The op is not limited to be equals or not equals, it also be >, <, >=, <=, IDF, INDF or even customized sql operator like geospatial operator intersect: boolean &&( geometry A , geometry B ) Thoughts? Thanks, Haisheng Yuan