There’s not very much difference between supporting IN in RexCalls vs. 
supporting $HARD_IN in RexCalls. The distinction is important to me because I 
think it’s important to have a “core” set of operators, and I don’t want to add 
IN to that core.  Stamatis feels differently, and I respect that.

One thing that makes me thing that IN does not belong in “core” is that there 
is a generalization range-sets. With IN we can represent “x = 1 or x = 3 or x = 
5”, but with range sets we can also represent “x = 1 or x between 3 and 5 or x> 
100”. We make heavy use of range sets in DateRangeRules.

Julian


> On Oct 8, 2019, at 2:47 PM, Xiening Dai <xndai....@gmail.com> wrote:
> 
> In my opinion, we will need both - supporting IN operation (either through an 
> operator or an internal function) and adding support for building a balanced 
> tree. It’s always good to be resilient and capable of handling edge cases. 
> The IN support might require more work. Haisheng’s proposal is a practical 
> solution to current issue.
> 
>> On Oct 8, 2019, at 11:06 AM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
>> 
>> Adding IN RexNode only partially solves the problem, as it is still masking 
>> the underlying issue. The fundamental reason for the stack overflow iies in 
>> the left-deep binary tree. For queries that have tens of thousands of OR 
>> condition, but not equals, which is not uncommon in our case, e.g.
>> (a like '...') or (b like '...') or (c like '..')
>> there will still be stack overflow. 
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Stamatis Zampetakis<zabe...@gmail.com>
>> 日 期:2019年10月08日 15:09:01
>> 收件人:<dev@calcite.apache.org>
>> 主 题:Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> It might be better to add a proper IN operator in RexCalls instead of
>> something internal that does more or less the same thing.
>> It is true that adds more paths in the code and thus requires some
>> additional dev and further support but I think it is worth it.
>> Many people so far expressed an interest to work on various cases involving
>> an IN operator so it might not be long before
>> we have full support for the IN operator.
>> 
>> SqlToRelConverter can still decide to expand or not based on some criterion
>> or property.
>> 
>> 
>> On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jh...@apache.org> wrote:
>> 
>>> A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
>>> $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
>>> $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
>>> IN (v1, v2)".
>>> 
>>> At any point in this lifecycle, you could intercept and and simplify.
>>> 
>>> On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h.y...@alibaba-inc.com>
>>> wrote:
>>>> 
>>>> Will the filter condition with “$HARD_IN” internal function be able to
>>> pushed down and be recognized by the source SQL system, like Peter
>>> mentioned?
>>>> 
>>>> If not, we have to translate the internal function back to IN during
>>> Rel2Sql phase. Otherwise, the data read from the source table can be much
>>> larger.
>>>> 
>>>> - Haisheng
>>>> 
>>>> ------------------------------------------------------------------
>>>> 发件人:Julian Hyde<jh...@apache.org>
>>>> 日 期:2019年10月08日 04:53:11
>>>> 收件人:dev<dev@calcite.apache.org>
>>>> 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
>>>> 
>>>> In
>>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
>>> <
>>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
>>> I floated the idea of a “$HARD_IN” internal function that has the same
>>> semantics as IN but is not expanded to ‘… = OR … = …’.
>>>> 
>>>> I think it would be a useful tool, if used judiciously.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h.y...@alibaba-inc.com>
>>> wrote:
>>>>> 
>>>>> As a workaround, you can modify you SqlRexConverlet, create a RexCall
>>> with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a
>>> flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
>>>>> Because every OR RexCall has exactly 2 operands, it won't transform
>>> into SqlCall with left deep tree.
>>>>> 
>>>>> Let me know it works for you or not.
>>>>> 
>>>>> - Haisheng
>>>>> 
>>>>> ------------------------------------------------------------------
>>>>> 发件人:Haisheng Yuan<h.y...@alibaba-inc.com>
>>>>> 日 期:2019年10月05日 07:37:04
>>>>> 收件人:Peter Wicks (pwicks)<pwi...@micron.com>; dev@calcite.apache.org<
>>> dev@calcite.apache.org>
>>>>> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>> 
>>>>> If you want to push the filter down to the source SQL sytem, then
>>> transforming to a join won't help you either.
>>>>> 
>>>>> The reason of stackoverflow for large ORs is the left deep binary
>>> tree, we need to change it to balanced binary tree, to reduce the depth of
>>> the call.
>>>>> 
>>>>> I will open a pull request later.
>>>>> 
>>>>> - Haisheng
>>>>> 
>>>>> ------------------------------------------------------------------
>>>>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com>
>>>>> 日 期:2019年10月04日 21:32:25
>>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
>>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>> 
>>>>> Zoltan,
>>>>> 
>>>>> Thanks for the suggestion. I actually tried doing a UDF first, and it
>>> was also successful, sorry for not sharing those details earlier.
>>>>> The problem with the UDF is that the predicates are not pushed down to
>>> the source SQL system (by design), and this can result in a 100x increase
>>> in the amount of data returned from the database. This data will be
>>> correctly filtered by the UDF, but returning 100x the data makes it a lot
>>> slower. So I was trying to push it down to the source server instead.
>>>>> 
>>>>> What do you mean by, "I guess Calcite might probably won't be able to
>>> do much with these ORs anyway..."? From my experiments I've seen two
>>> results from passing in this many OR's:
>>>>> 
>>>>> - If no other predicates are included in the query, then Calcite
>>> succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
>>>>> - If additional predicates are included, then Calcite nests the OR
>>> statements, leading to a stackoverflow for very large OR's, which is
>>> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
>>>>> 
>>>>> Thanks,
>>>>> Peter
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Zoltan Haindrich <k...@rxd.hu>
>>>>> Sent: Friday, October 4, 2019 12:38 AM
>>>>> To: dev@calcite.apache.org; Haisheng Yuan <h.y...@alibaba-inc.com>;
>>> Peter Wicks (pwicks) <pwi...@micron.com>
>>>>> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN"
>>> Conversion Logic
>>>>> 
>>>>> 
>>>>> I think you might try another approach: introduce some UDF and use
>>> your translation logic to call that - as the UDF will be opaque for calcite
>>> it will be left alone.
>>>>> I guess Calcite might probably won't be able to do much with these ORs
>>> anyway...
>>>>> 
>>>>> 
>>>>> On 10/3/19 11:26 PM, Haisheng Yuan wrote:
>>>>>> I don't think this can be done in SqlRexConvertlet, which converts
>>> SqlNode to RexNode.
>>>>>> You might need to massage the SqlToRelConverter to create the RelNode
>>> that you want.
>>>>>> 
>>>>>> BTW, I still think we need RexNode for IN/ANY.
>>>>> I also feel that there is some need for IN nodes; but there are some
>>> good sides of not having it as well: like simplification handles them
>>> better.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> - Haisheng
>>>>>> 
>>>>>> ------------------------------------------------------------------
>>>>>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com>
>>>>>> 日 期:2019年10月04日 04:03:51
>>>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
>>>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>>> 
>>>>>> Haisheng,
>>>>>> 
>>>>>> Yes, that is what I would like to do. Unfortunately, I’m not sure how
>>> to proceed to actually do that. I was hoping for a pointer to an example
>>> that is similar?
>>>>>> 
>>>>>> Thanks!
>>>>>> Peter
>>>>>> 
>>>>>> From: Haisheng Yuan <h.y...@alibaba-inc.com>
>>>>>> Sent: Thursday, October 3, 2019 1:35 PM
>>>>>> To: Peter Wicks (pwicks) <pwi...@micron.com>; dev@calcite.apache.org
>>>>>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>>> 
>>>>>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode,
>>> unfortunately.
>>>>>> 
>>>>>> You can create a Values node with these authorization data, and make
>>> a semi join with the table and Values you created.
>>>>>> 
>>>>>> - Haisheng
>>>>>> 
>>>>>> ------------------------------------------------------------------
>>>>>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com<mailto:pwi...@micron.com>>
>>>>>> 日 期:2019年10月04日 02:34:02
>>>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:
>>> dev@calcite.apache.org%3c...@calcite.apache.org>>
>>>>>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
>>>>>> 
>>>>>> A little detail about what I'm trying to do:
>>>>>> 
>>>>>> I have an external API that contains authorization information on a
>>> per user basis. I want users to be able to include an operation in their
>>> query that will filter data based on this authorization data.
>>>>>> 
>>>>>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that
>>> implements SqlRexConvertlet, and I am able to get this working. The user
>>> includes in their predicate statement `custom_authorize(column)`, my
>>> convertlet queries the API, gets the authorization rules, builds an OR
>>> statement, and the results come back. This works sometimes, but other times
>>> the OR condition becomes too large, and I run into CALCITE-2792:
>>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
>>> which causes a stackoverflow and my query dies.
>>>>>> 
>>>>>> So I tried converting to an IN statement, having read that IN
>>> statements are automatically converted to a sub query join when the default
>>> limit of 20 is exceeded. The problem is that this appears only to be true
>>> for IN statements that are included in the initial query. IN statements
>>> created as the result of a convertlet do not get modified, and are sent as
>>> an IN statement, which results in a failure to parse the query. I looked at
>>> how Calcite normally does this translation from IN to exists using a join,
>>> but it depends on a lot of classes/instances that aren't available in the
>>> SqlRexContext space. Is it possible to rewrite my IN statement to a
>>> join/exists query like Calcite normally does?
>>>>>> 
>>>>>> Also, am I doing things all wrong? Is there a better way to go about
>>> this?
>>>>>> 
>>>>>> Code Sample below is for the OR version, the commented code can be
>>> swapped in to see how I was building the IN statement.
>>>>>> 
>>>>>> @Override
>>>>>> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
>>>>>> HashSet<String> keyList = null;
>>>>>> try {
>>>>>> keyList = new
>>> Manager().getAllowedIDs(getContextInformation().getQueryUser());
>>>>>> } catch (SQLException e) {
>>>>>> e.printStackTrace();
>>>>>> }
>>>>>> 
>>>>>> final RexBuilder rexBuilder = cx.getRexBuilder();
>>>>>> final RexNode column = cx.convertExpression(call.operand(0));
>>>>>> 
>>>>>> final List<RexNode> nodes = new ArrayList<>();
>>>>>> for(String s: keyList) {
>>>>>> nodes.add(rexBuilder.makeCall(EQUALS, column,
>>> rexBuilder.makeLiteral(s)));
>>>>>> //nodes.add(rexBuilder.makeLiteral(s));
>>>>>> }
>>>>>> 
>>>>>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
>>>>>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new
>>> RexNode[0]));
>>>>>> 
>>>>>> return in;
>>>>>> }
>>>>>> 
>>>>>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node,
>>> RexNode... nodes) {
>>>>>> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
>>>>>> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Peter
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 

Reply via email to