If you want to push the filter down to the source SQL sytem, then transforming 
to a join won't help you either.

The reason of stackoverflow for large ORs is the left deep binary tree, we need 
to change it to balanced binary tree, to reduce the depth of the call.

I will open a pull request later.

- Haisheng

------------------------------------------------------------------
发件人:Peter Wicks (pwicks)<pwi...@micron.com>
日 期:2019年10月04日 21:32:25
收件人:dev@calcite.apache.org<dev@calcite.apache.org>
主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic

Zoltan,

Thanks for the suggestion. I actually tried doing a UDF first, and it was also 
successful, sorry for not sharing those details earlier.
The problem with the UDF is that the predicates are not pushed down to the 
source SQL system (by design), and this can result in a 100x increase in the 
amount of data returned from the database. This data will be correctly filtered 
by the UDF, but returning 100x the data makes it a lot slower. So I was trying 
to push it down to the source server instead.

What do you mean by, "I guess Calcite might probably won't be able to do much 
with these ORs anyway..."? From my experiments I've seen two results from 
passing in this many OR's:

 - If no other predicates are included in the query, then Calcite succeeds! It 
leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
 - If additional predicates are included, then Calcite nests the OR statements, 
leading to a stackoverflow for very large OR's, which is CALCITE-2792, ((((a=1) 
OR a=2) OR a=3) OR a=4) 

Thanks,
 Peter

-----Original Message-----
From: Zoltan Haindrich <k...@rxd.hu> 
Sent: Friday, October 4, 2019 12:38 AM
To: dev@calcite.apache.org; Haisheng Yuan <h.y...@alibaba-inc.com>; Peter Wicks 
(pwicks) <pwi...@micron.com>
Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic


I think you might try another approach: introduce some UDF and use your 
translation logic to call that - as the UDF will be opaque for calcite it will 
be left alone.
I guess Calcite might probably won't be able to do much with these ORs anyway...


On 10/3/19 11:26 PM, Haisheng Yuan wrote:
> I don't think this can be done in SqlRexConvertlet, which converts SqlNode to 
> RexNode.
> You might need to massage the SqlToRelConverter to create the RelNode that 
> you want.
> 
> BTW, I still think we need RexNode for IN/ANY.
I also feel that there is some need for IN nodes; but there are some good sides 
of not having it as well: like simplification handles them better.


> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Peter Wicks (pwicks)<pwi...@micron.com>
> 日 期:2019年10月04日 04:03:51
> 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> Haisheng,
> 
> Yes, that is what I would like to do. Unfortunately, I’m not sure how to 
> proceed to actually do that. I was hoping for a pointer to an example that is 
> similar?
> 
> Thanks!
> Peter
> 
> From: Haisheng Yuan <h.y...@alibaba-inc.com>
> Sent: Thursday, October 3, 2019 1:35 PM
> To: Peter Wicks (pwicks) <pwi...@micron.com>; dev@calcite.apache.org
> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, unfortunately.
> 
> You can create a Values node with these authorization data, and make a semi 
> join with the table and Values you created.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Peter Wicks (pwicks)<pwi...@micron.com<mailto:pwi...@micron.com>>
> 日 期:2019年10月04日 02:34:02
> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:dev@calcite.apache.org%3c...@calcite.apache.org>>
> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> A little detail about what I'm trying to do:
> 
> I have an external API that contains authorization information on a per user 
> basis. I want users to be able to include an operation in their query that 
> will filter data based on this authorization data.
> 
> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that implements 
> SqlRexConvertlet, and I am able to get this working. The user includes in 
> their predicate statement `custom_authorize(column)`, my convertlet queries 
> the API, gets the authorization rules, builds an OR statement, and the 
> results come back. This works sometimes, but other times the OR condition 
> becomes too large, and I run into CALCITE-2792: 
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
>  which causes a stackoverflow and my query dies.
> 
> So I tried converting to an IN statement, having read that IN statements are 
> automatically converted to a sub query join when the default limit of 20 is 
> exceeded. The problem is that this appears only to be true for IN statements 
> that are included in the initial query. IN statements created as the result 
> of a convertlet do not get modified, and are sent as an IN statement, which 
> results in a failure to parse the query. I looked at how Calcite normally 
> does this translation from IN to exists using a join, but it depends on a lot 
> of classes/instances that aren't available in the SqlRexContext space. Is it 
> possible to rewrite my IN statement to a join/exists query like Calcite 
> normally does?
> 
> Also, am I doing things all wrong? Is there a better way to go about this?
> 
> Code Sample below is for the OR version, the commented code can be swapped in 
> to see how I was building the IN statement.
> 
> @Override
> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
> HashSet<String> keyList = null;
> try {
> keyList = new Manager().getAllowedIDs(getContextInformation().getQueryUser());
> } catch (SQLException e) {
> e.printStackTrace();
> }
> 
> final RexBuilder rexBuilder = cx.getRexBuilder();
> final RexNode column = cx.convertExpression(call.operand(0));
> 
> final List<RexNode> nodes = new ArrayList<>();
> for(String s: keyList) {
> nodes.add(rexBuilder.makeCall(EQUALS, column, rexBuilder.makeLiteral(s)));
> //nodes.add(rexBuilder.makeLiteral(s));
> }
> 
> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new 
> RexNode[0]));
> 
> return in;
> }
> 
> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, RexNode... 
> nodes) {
> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
> }
> 
> 
> Thanks,
> Peter
> 

Reply via email to