There’s not very much difference between supporting IN in RexCalls vs. supporting $HARD_IN in RexCalls. The distinction is important to me because I think it’s important to have a “core” set of operators, and I don’t want to add IN to that core. Stamatis feels differently, and I respect that.
One thing that makes me thing that IN does not belong in “core” is that there is a generalization range-sets. With IN we can represent “x = 1 or x = 3 or x = 5”, but with range sets we can also represent “x = 1 or x between 3 and 5 or x> 100”. We make heavy use of range sets in DateRangeRules. Julian > On Oct 8, 2019, at 2:47 PM, Xiening Dai <xndai....@gmail.com> wrote: > > In my opinion, we will need both - supporting IN operation (either through an > operator or an internal function) and adding support for building a balanced > tree. It’s always good to be resilient and capable of handling edge cases. > The IN support might require more work. Haisheng’s proposal is a practical > solution to current issue. > >> On Oct 8, 2019, at 11:06 AM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote: >> >> Adding IN RexNode only partially solves the problem, as it is still masking >> the underlying issue. The fundamental reason for the stack overflow iies in >> the left-deep binary tree. For queries that have tens of thousands of OR >> condition, but not equals, which is not uncommon in our case, e.g. >> (a like '...') or (b like '...') or (c like '..') >> there will still be stack overflow. >> >> - Haisheng >> >> ------------------------------------------------------------------ >> 发件人:Stamatis Zampetakis<zabe...@gmail.com> >> 日 期:2019年10月08日 15:09:01 >> 收件人:<dev@calcite.apache.org> >> 主 题:Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic >> >> It might be better to add a proper IN operator in RexCalls instead of >> something internal that does more or less the same thing. >> It is true that adds more paths in the code and thus requires some >> additional dev and further support but I think it is worth it. >> Many people so far expressed an interest to work on various cases involving >> an IN operator so it might not be long before >> we have full support for the IN operator. >> >> SqlToRelConverter can still decide to expand or not based on some criterion >> or property. >> >> >> On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jh...@apache.org> wrote: >> >>> A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to >>> $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to >>> $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x >>> IN (v1, v2)". >>> >>> At any point in this lifecycle, you could intercept and and simplify. >>> >>> On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h.y...@alibaba-inc.com> >>> wrote: >>>> >>>> Will the filter condition with “$HARD_IN” internal function be able to >>> pushed down and be recognized by the source SQL system, like Peter >>> mentioned? >>>> >>>> If not, we have to translate the internal function back to IN during >>> Rel2Sql phase. Otherwise, the data read from the source table can be much >>> larger. >>>> >>>> - Haisheng >>>> >>>> ------------------------------------------------------------------ >>>> 发件人:Julian Hyde<jh...@apache.org> >>>> 日 期:2019年10月08日 04:53:11 >>>> 收件人:dev<dev@calcite.apache.org> >>>> 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic >>>> >>>> In >>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209 >>> < >>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209> >>> I floated the idea of a “$HARD_IN” internal function that has the same >>> semantics as IN but is not expanded to ‘… = OR … = …’. >>>> >>>> I think it would be a useful tool, if used judiciously. >>>> >>>> Julian >>>> >>>> >>>>> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h.y...@alibaba-inc.com> >>> wrote: >>>>> >>>>> As a workaround, you can modify you SqlRexConverlet, create a RexCall >>> with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a >>> flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4. >>>>> Because every OR RexCall has exactly 2 operands, it won't transform >>> into SqlCall with left deep tree. >>>>> >>>>> Let me know it works for you or not. >>>>> >>>>> - Haisheng >>>>> >>>>> ------------------------------------------------------------------ >>>>> 发件人:Haisheng Yuan<h.y...@alibaba-inc.com> >>>>> 日 期:2019年10月05日 07:37:04 >>>>> 收件人:Peter Wicks (pwicks)<pwi...@micron.com>; dev@calcite.apache.org< >>> dev@calcite.apache.org> >>>>> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >>> Logic >>>>> >>>>> If you want to push the filter down to the source SQL sytem, then >>> transforming to a join won't help you either. >>>>> >>>>> The reason of stackoverflow for large ORs is the left deep binary >>> tree, we need to change it to balanced binary tree, to reduce the depth of >>> the call. >>>>> >>>>> I will open a pull request later. >>>>> >>>>> - Haisheng >>>>> >>>>> ------------------------------------------------------------------ >>>>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com> >>>>> 日 期:2019年10月04日 21:32:25 >>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org> >>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >>> Logic >>>>> >>>>> Zoltan, >>>>> >>>>> Thanks for the suggestion. I actually tried doing a UDF first, and it >>> was also successful, sorry for not sharing those details earlier. >>>>> The problem with the UDF is that the predicates are not pushed down to >>> the source SQL system (by design), and this can result in a 100x increase >>> in the amount of data returned from the database. This data will be >>> correctly filtered by the UDF, but returning 100x the data makes it a lot >>> slower. So I was trying to push it down to the source server instead. >>>>> >>>>> What do you mean by, "I guess Calcite might probably won't be able to >>> do much with these ORs anyway..."? From my experiments I've seen two >>> results from passing in this many OR's: >>>>> >>>>> - If no other predicates are included in the query, then Calcite >>> succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4) >>>>> - If additional predicates are included, then Calcite nests the OR >>> statements, leading to a stackoverflow for very large OR's, which is >>> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4) >>>>> >>>>> Thanks, >>>>> Peter >>>>> >>>>> -----Original Message----- >>>>> From: Zoltan Haindrich <k...@rxd.hu> >>>>> Sent: Friday, October 4, 2019 12:38 AM >>>>> To: dev@calcite.apache.org; Haisheng Yuan <h.y...@alibaba-inc.com>; >>> Peter Wicks (pwicks) <pwi...@micron.com> >>>>> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" >>> Conversion Logic >>>>> >>>>> >>>>> I think you might try another approach: introduce some UDF and use >>> your translation logic to call that - as the UDF will be opaque for calcite >>> it will be left alone. >>>>> I guess Calcite might probably won't be able to do much with these ORs >>> anyway... >>>>> >>>>> >>>>> On 10/3/19 11:26 PM, Haisheng Yuan wrote: >>>>>> I don't think this can be done in SqlRexConvertlet, which converts >>> SqlNode to RexNode. >>>>>> You might need to massage the SqlToRelConverter to create the RelNode >>> that you want. >>>>>> >>>>>> BTW, I still think we need RexNode for IN/ANY. >>>>> I also feel that there is some need for IN nodes; but there are some >>> good sides of not having it as well: like simplification handles them >>> better. >>>>> >>>>> >>>>>> >>>>>> - Haisheng >>>>>> >>>>>> ------------------------------------------------------------------ >>>>>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com> >>>>>> 日 期:2019年10月04日 04:03:51 >>>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org> >>>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >>> Logic >>>>>> >>>>>> Haisheng, >>>>>> >>>>>> Yes, that is what I would like to do. Unfortunately, I’m not sure how >>> to proceed to actually do that. I was hoping for a pointer to an example >>> that is similar? >>>>>> >>>>>> Thanks! >>>>>> Peter >>>>>> >>>>>> From: Haisheng Yuan <h.y...@alibaba-inc.com> >>>>>> Sent: Thursday, October 3, 2019 1:35 PM >>>>>> To: Peter Wicks (pwicks) <pwi...@micron.com>; dev@calcite.apache.org >>>>>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >>> Logic >>>>>> >>>>>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, >>> unfortunately. >>>>>> >>>>>> You can create a Values node with these authorization data, and make >>> a semi join with the table and Values you created. >>>>>> >>>>>> - Haisheng >>>>>> >>>>>> ------------------------------------------------------------------ >>>>>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com<mailto:pwi...@micron.com>> >>>>>> 日 期:2019年10月04日 02:34:02 >>>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto: >>> dev@calcite.apache.org%3c...@calcite.apache.org>> >>>>>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic >>>>>> >>>>>> A little detail about what I'm trying to do: >>>>>> >>>>>> I have an external API that contains authorization information on a >>> per user basis. I want users to be able to include an operation in their >>> query that will filter data based on this authorization data. >>>>>> >>>>>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that >>> implements SqlRexConvertlet, and I am able to get this working. The user >>> includes in their predicate statement `custom_authorize(column)`, my >>> convertlet queries the API, gets the authorization rules, builds an OR >>> statement, and the results come back. This works sometimes, but other times >>> the OR condition becomes too large, and I run into CALCITE-2792: >>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&reserved=0, >>> which causes a stackoverflow and my query dies. >>>>>> >>>>>> So I tried converting to an IN statement, having read that IN >>> statements are automatically converted to a sub query join when the default >>> limit of 20 is exceeded. The problem is that this appears only to be true >>> for IN statements that are included in the initial query. IN statements >>> created as the result of a convertlet do not get modified, and are sent as >>> an IN statement, which results in a failure to parse the query. I looked at >>> how Calcite normally does this translation from IN to exists using a join, >>> but it depends on a lot of classes/instances that aren't available in the >>> SqlRexContext space. Is it possible to rewrite my IN statement to a >>> join/exists query like Calcite normally does? >>>>>> >>>>>> Also, am I doing things all wrong? Is there a better way to go about >>> this? >>>>>> >>>>>> Code Sample below is for the OR version, the commented code can be >>> swapped in to see how I was building the IN statement. >>>>>> >>>>>> @Override >>>>>> public RexNode convertCall(SqlRexContext cx, SqlCall call) { >>>>>> HashSet<String> keyList = null; >>>>>> try { >>>>>> keyList = new >>> Manager().getAllowedIDs(getContextInformation().getQueryUser()); >>>>>> } catch (SQLException e) { >>>>>> e.printStackTrace(); >>>>>> } >>>>>> >>>>>> final RexBuilder rexBuilder = cx.getRexBuilder(); >>>>>> final RexNode column = cx.convertExpression(call.operand(0)); >>>>>> >>>>>> final List<RexNode> nodes = new ArrayList<>(); >>>>>> for(String s: keyList) { >>>>>> nodes.add(rexBuilder.makeCall(EQUALS, column, >>> rexBuilder.makeLiteral(s))); >>>>>> //nodes.add(rexBuilder.makeLiteral(s)); >>>>>> } >>>>>> >>>>>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes); >>>>>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new >>> RexNode[0])); >>>>>> >>>>>> return in; >>>>>> } >>>>>> >>>>>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, >>> RexNode... nodes) { >>>>>> return rexBuilder.makeCall(SqlStdOperatorTable.IN, >>>>>> ImmutableList.<RexNode>builder().add(node).add(nodes).build()); >>>>>> } >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Peter >>>>>> >>>>> >>>>> >>>> >>>> >>> >> >