In 
https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
 
<https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
 I floated the idea of a “$HARD_IN” internal function that has the same 
semantics as IN but is not expanded to ‘… = OR … = …’.

I think it would be a useful tool, if used judiciously. 

Julian


> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> 
> As a workaround, you can modify you SqlRexConverlet, create a RexCall with 
> balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a flat 
> RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> Because every OR RexCall has exactly 2 operands, it won't transform into 
> SqlCall with left deep tree.
> 
> Let me know it works for you or not.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Haisheng Yuan<h.y...@alibaba-inc.com>
> 日 期:2019年10月05日 07:37:04
> 收件人:Peter Wicks (pwicks)<pwi...@micron.com>; 
> dev@calcite.apache.org<dev@calcite.apache.org>
> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> If you want to push the filter down to the source SQL sytem, then 
> transforming to a join won't help you either.
> 
> The reason of stackoverflow for large ORs is the left deep binary tree, we 
> need to change it to balanced binary tree, to reduce the depth of the call.
> 
> I will open a pull request later.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Peter Wicks (pwicks)<pwi...@micron.com>
> 日 期:2019年10月04日 21:32:25
> 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> Zoltan,
> 
> Thanks for the suggestion. I actually tried doing a UDF first, and it was 
> also successful, sorry for not sharing those details earlier.
> The problem with the UDF is that the predicates are not pushed down to the 
> source SQL system (by design), and this can result in a 100x increase in the 
> amount of data returned from the database. This data will be correctly 
> filtered by the UDF, but returning 100x the data makes it a lot slower. So I 
> was trying to push it down to the source server instead.
> 
> What do you mean by, "I guess Calcite might probably won't be able to do much 
> with these ORs anyway..."? From my experiments I've seen two results from 
> passing in this many OR's:
> 
> - If no other predicates are included in the query, then Calcite succeeds! It 
> leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> - If additional predicates are included, then Calcite nests the OR 
> statements, leading to a stackoverflow for very large OR's, which is 
> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4) 
> 
> Thanks,
> Peter
> 
> -----Original Message-----
> From: Zoltan Haindrich <k...@rxd.hu> 
> Sent: Friday, October 4, 2019 12:38 AM
> To: dev@calcite.apache.org; Haisheng Yuan <h.y...@alibaba-inc.com>; Peter 
> Wicks (pwicks) <pwi...@micron.com>
> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> 
> I think you might try another approach: introduce some UDF and use your 
> translation logic to call that - as the UDF will be opaque for calcite it 
> will be left alone.
> I guess Calcite might probably won't be able to do much with these ORs 
> anyway...
> 
> 
> On 10/3/19 11:26 PM, Haisheng Yuan wrote:
>> I don't think this can be done in SqlRexConvertlet, which converts SqlNode 
>> to RexNode.
>> You might need to massage the SqlToRelConverter to create the RelNode that 
>> you want.
>> 
>> BTW, I still think we need RexNode for IN/ANY.
> I also feel that there is some need for IN nodes; but there are some good 
> sides of not having it as well: like simplification handles them better.
> 
> 
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com>
>> 日 期:2019年10月04日 04:03:51
>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Haisheng,
>> 
>> Yes, that is what I would like to do. Unfortunately, I’m not sure how to 
>> proceed to actually do that. I was hoping for a pointer to an example that 
>> is similar?
>> 
>> Thanks!
>> Peter
>> 
>> From: Haisheng Yuan <h.y...@alibaba-inc.com>
>> Sent: Thursday, October 3, 2019 1:35 PM
>> To: Peter Wicks (pwicks) <pwi...@micron.com>; dev@calcite.apache.org
>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, 
>> unfortunately.
>> 
>> You can create a Values node with these authorization data, and make a semi 
>> join with the table and Values you created.
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<pwi...@micron.com<mailto:pwi...@micron.com>>
>> 日 期:2019年10月04日 02:34:02
>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:dev@calcite.apache.org%3c...@calcite.apache.org>>
>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> A little detail about what I'm trying to do:
>> 
>> I have an external API that contains authorization information on a per user 
>> basis. I want users to be able to include an operation in their query that 
>> will filter data based on this authorization data.
>> 
>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that implements 
>> SqlRexConvertlet, and I am able to get this working. The user includes in 
>> their predicate statement `custom_authorize(column)`, my convertlet queries 
>> the API, gets the authorization rules, builds an OR statement, and the 
>> results come back. This works sometimes, but other times the OR condition 
>> becomes too large, and I run into CALCITE-2792: 
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
>>  which causes a stackoverflow and my query dies.
>> 
>> So I tried converting to an IN statement, having read that IN statements are 
>> automatically converted to a sub query join when the default limit of 20 is 
>> exceeded. The problem is that this appears only to be true for IN statements 
>> that are included in the initial query. IN statements created as the result 
>> of a convertlet do not get modified, and are sent as an IN statement, which 
>> results in a failure to parse the query. I looked at how Calcite normally 
>> does this translation from IN to exists using a join, but it depends on a 
>> lot of classes/instances that aren't available in the SqlRexContext space. 
>> Is it possible to rewrite my IN statement to a join/exists query like 
>> Calcite normally does?
>> 
>> Also, am I doing things all wrong? Is there a better way to go about this?
>> 
>> Code Sample below is for the OR version, the commented code can be swapped 
>> in to see how I was building the IN statement.
>> 
>> @Override
>> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
>> HashSet<String> keyList = null;
>> try {
>> keyList = new 
>> Manager().getAllowedIDs(getContextInformation().getQueryUser());
>> } catch (SQLException e) {
>> e.printStackTrace();
>> }
>> 
>> final RexBuilder rexBuilder = cx.getRexBuilder();
>> final RexNode column = cx.convertExpression(call.operand(0));
>> 
>> final List<RexNode> nodes = new ArrayList<>();
>> for(String s: keyList) {
>> nodes.add(rexBuilder.makeCall(EQUALS, column, rexBuilder.makeLiteral(s)));
>> //nodes.add(rexBuilder.makeLiteral(s));
>> }
>> 
>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new 
>> RexNode[0]));
>> 
>> return in;
>> }
>> 
>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, RexNode... 
>> nodes) {
>> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
>> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
>> }
>> 
>> 
>> Thanks,
>> Peter
>> 
> 
> 

Reply via email to