Re: Re: Re: Re: [DISCUSS] On-demand traitset request
Yes, you are right, they are similar. But the metadata framework can't give the optimization engine the flexibility to extend to other traits without modifying the core. Our goal should be to make the optimization engine do all the work; physical operators just specify their behavior and characteristics.

In SQL Server, there is another physical property called Rewindability, which requires a physical operator to be rewindable. If we want to add this trait, as I said in my last email [1], users just need to override the derive method to consider Rewindability:

    public T derivedTrait(RelTraitDef traitDef)

If using metadata, users have to define their own metadata for the added trait, like RelMdRewindability, and modify the core to call the method, which is not ideal. But we can leverage RelMdDistribution and RelMdCollation, if applicable, in the derivedTrait methods.

However, I don't agree with the design of RelMdCollation returning a list of collations. And I don't think the property requirement and derivation of a physical operator should be scattered across different places, though that is a minor thing.

[1] http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/

- Haisheng

On Fri, Oct 25, 2019 at 08:36:07 GMT Stamatis Zampetakis wrote:
> I would like a further clarification regarding the methods:
> derivedDistribution()
> derivedCollation()
>
> What would be the difference from the existing derivation mechanism in
> RelMdDistribution [1] and RelMdCollation [2]? Are they not sufficient to
> provide the necessary information?
>
> [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistribution.java
> [2] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdCollation.java
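To make the Rewindability example from Haisheng's reply concrete, here is a minimal sketch of an operator-level derive hook. Everything in it is an assumption made for illustration: Trait, TraitDef, TraitDefs, PhysicalOp, StreamScan, Spool and Filter are hypothetical stand-ins rather than Calcite classes, and the generic signature is only one possible reading of the derivedTrait(RelTraitDef) method named in the email.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical marker for a physical trait; not a Calcite type. */
interface Trait {}

/** The extra trait from the email, modelled as a two-valued enum (assumption). */
enum Rewindability implements Trait { REWINDABLE, NOT_REWINDABLE }

/** Hypothetical stand-in for a trait definition, playing the role of RelTraitDef. */
final class TraitDef<T extends Trait> {
  final Class<T> type;
  final T defaultValue;

  TraitDef(Class<T> type, T defaultValue) {
    this.type = type;
    this.defaultValue = defaultValue;
  }
}

final class TraitDefs {
  static final TraitDef<Rewindability> REWINDABILITY =
      new TraitDef<>(Rewindability.class, Rewindability.NOT_REWINDABLE);

  private TraitDefs() {}
}

/** Hypothetical physical operator base class with a single derive hook. */
abstract class PhysicalOp {
  final List<PhysicalOp> inputs;

  PhysicalOp(PhysicalOp... inputs) {
    this.inputs = Arrays.asList(inputs);
  }

  /** Default: the operator delivers the default value of the trait. */
  <T extends Trait> T derivedTrait(TraitDef<T> def) {
    return def.defaultValue;
  }
}

/** A leaf that keeps the default (not rewindable). */
final class StreamScan extends PhysicalOp {
  StreamScan() {
    super();
  }
}

/** A materializing operator that always delivers REWINDABLE. */
final class Spool extends PhysicalOp {
  Spool(PhysicalOp input) {
    super(input);
  }

  @Override <T extends Trait> T derivedTrait(TraitDef<T> def) {
    if (Rewindability.class.equals(def.type)) {
      return def.type.cast(Rewindability.REWINDABLE);
    }
    return super.derivedTrait(def);
  }
}

/** A pass-through operator: it delivers whatever its only input delivers. */
final class Filter extends PhysicalOp {
  Filter(PhysicalOp input) {
    super(input);
  }

  @Override <T extends Trait> T derivedTrait(TraitDef<T> def) {
    return inputs.get(0).derivedTrait(def);
  }
}
```

In this sketch, adding the trait touches only the operators that care about it: new Filter(new Spool(new StreamScan())).derivedTrait(TraitDefs.REWINDABILITY) yields REWINDABLE, while a bare StreamScan falls back to the default.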
[jira] [Created] (CALCITE-3450) Support Intersect and Minus in RelMdTableReferences.
xzh_dz created CALCITE-3450:
-------------------------------

Summary: Support Intersect and Minus in RelMdTableReferences.
Key: CALCITE-3450
URL: https://issues.apache.org/jira/browse/CALCITE-3450
Project: Calcite
Issue Type: Wish
Reporter: xzh_dz

Support Intersect and Minus in RelMdTableReferences.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Re: Re: [DISCUSS] On-demand traitset request
I didn't say adding to RelNode, but a new API/interface for physical operators only. What matters is not the number of interfaces, but the necessity of these methods.

- Haisheng

------------------------------------------------------------------
From: Danny Chan
Date: 2019-10-25 09:55:56
Subject: Re: Re: [DISCUSS] On-demand traitset request

I have the same feeling. It seems like too many interfaces for the physical node (we do not really have a physical class for physical nodes yet), so these interfaces may just end up in RelNode. That is too complex and too much for me. Can we have a way that does not modify the nodes themselves?

Best,
Danny Chan

On Oct 23, 2019, 2:53 PM +0800, Stamatis Zampetakis wrote:
> Overall, I agree that better encapsulation of propagation and derivation of
> traits would be beneficial for our system.
>
> Regarding the API proposed by Haisheng, I have to think a bit more on it.
> At first glance, adding such methods directly in the RelNode API does not
> appear an ideal solution since I don't see how easily it can be extended to
> support other kinds of traits.
>
> Best,
> Stamatis
>
> On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan wrote:
>
> > To Stamatis,
> > Not exactly. My initial thought was giving the physical operator the
> > ability to customize and fully control the physical property derivation
> > strategy, which can further help the purpose-driven trait request. But since
> > we agreed to think about a more high-level API to support on-demand traitset
> > requests, I will illustrate what API is expected from an implementor's
> > perspective.
> >
> > Jinfeng gave us basic steps on how the plan might be generated in a
> > top-down, purpose-driven-only manner. I think differently about the first
> > several steps.
> >
> > SELECT DISTINCT c, b FROM
> >   ( SELECT R.c c, S.b b FROM R, S
> >     WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >
> > Aggregate . (c, b)
> > +--- MergeJoin . (a, b, c)
> >      |--- TableScan on R
> >      +--- TableScan on S
> >
> > 1. Aggregate requires collation (c,b) from its child, not a permutation.
> > 2. MergeJoin's parent requires (c,b). It has 2 options: pass it down, or
> >    ignore it.
> >    a) Pass down. It has a join condition on (a,b,c), and the required
> >       columns can be covered by the join condition columns, so MergeJoin
> >       will try to deliver (c,b,a), and both children must match exactly.
> >       Then we will have a sort on both children of MergeJoin.
> >    b) Ignore it. Require its first child's collation on (a,b,c), but with
> >       matching type subset. R delivers (c,b,a). Then use the first child's
> >       derived collation trait to require its second child to match exactly.
> >       Thus we have a sort on S, and a sort on top of MergeJoin.
> >
> > Both plans might be good or bad. If R and S are large but the join result is
> > small, plan b) might be better; otherwise plan a) might be better.
> >
> > Anyway, I hope the physical operators can have full control of the physical
> > property requests and derivation, in the physical operator class itself, not
> > in rules, not in other places.
> >
> > In our experience, we have spent too much time writing code to deal with
> > all kinds of property requirements and derivation. But in fact,
> > life should be easier. I would like the physical operator to provide the
> > following API, so that a 3rd-party implementor just needs to
> > override/implement the methods and take care of nothing more.
> >
> > 1. void setDistributionRequests(int numReq)
> > Each operator can specify how many optimization requests on some trait it
> > wants to make, e.g. HashJoin may request the following distributions on both
> > children:
> > - (hash distribution on key1, hash distribution on key1)
> > - (hash distribution on key2, hash distribution on key2)
> > - (hash distribution on all keys, hash distribution on all keys)
> > - (Any, Broadcast)
> > - (Gather, Gather)
> >
> > 2. RelDistribution requiredDistribution(RelDistribution required, int
> > child) //same for collation
> > Given the required distribution from the parent operator, returns the
> > required distribution for its nth child.
> >
> > 3. RelDistribution derivedDistribution() //same for collation
> > Derives the distribution of the operator itself from its child operators.
> >
> > 4. MatchType distributionMatchType(int child) //same for collation
> > Returns the distribution match type for its nth child, i.e. how it must
> > match the other children.
> > Similar to Jinfeng's point, I think there should be 3 types of matching:
> > exact, satisfy, subset.
> > e.g.
> > R is distributed by (a), S is distributed by (a,b)
> > select * from R join S using a,b,c
> > If we have the plan
> > HashJoin
> > |-- TableScan on R
> > +-- TableScan on S
> > we may require the match type on S to be satisfy: (a,b) satisfies the
> > required distribution (a,b,c).
> > For the outer child R, we require it to be an exact match with the inner.
> >
> > 5. ExecOrder getExecOrder()
> > Returns how the operator's children are executed, left to right or right
> > to left. Typically, hash join is right
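Two of the worked examples in the quoted proposal can be checked with a few lines of code: the "pass down" option for MergeJoin, and the exact/satisfy match types for HashJoin. The sketch below is illustrative only. The class and method names are hypothetical, columns are modelled as plain strings, and the reading of "satisfy" (the delivered hash keys are a non-empty subset of the required keys) is an assumption drawn from the (a,b) versus (a,b,c) example. The "subset" match type is left out because the thread does not define it precisely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Illustrative helpers only; the names are hypothetical and not part of Calcite. */
final class TraitRequestSketch {
  private TraitRequestSketch() {}

  /**
   * Option a) "pass down": if every required sort column is a join key, order
   * the join keys so that the required columns come first. Otherwise keep the
   * join-key order, since the request cannot be passed down.
   */
  static List<String> passDownCollation(List<String> required, List<String> joinKeys) {
    if (!joinKeys.containsAll(required)) {
      return new ArrayList<>(joinKeys);
    }
    LinkedHashSet<String> ordered = new LinkedHashSet<>(required);
    ordered.addAll(joinKeys); // appends only the join keys not already present
    return new ArrayList<>(ordered);
  }

  /** "exact": the delivered keys equal the required keys, in the same order. */
  static boolean exactMatch(List<String> delivered, List<String> required) {
    return delivered.equals(required);
  }

  /** "satisfy" (assumed meaning): delivered keys are a non-empty subset of the required keys. */
  static boolean satisfies(List<String> delivered, List<String> required) {
    return !delivered.isEmpty() && required.containsAll(delivered);
  }

  public static void main(String[] args) {
    // Parent requires (c,b); join keys are (a,b,c).
    System.out.println(passDownCollation(
        Arrays.asList("c", "b"), Arrays.asList("a", "b", "c"))); // prints [c, b, a]

    // S is distributed by (a,b); the join requires (a,b,c).
    List<String> sKeys = Arrays.asList("a", "b");
    System.out.println(satisfies(sKeys, Arrays.asList("a", "b", "c"))); // prints true

    // R is distributed by (a) and must match S exactly, so it needs redistribution.
    System.out.println(exactMatch(Arrays.asList("a"), sKeys)); // prints false
  }
}
```

Under these assumptions, S can keep its existing distribution while R is redistributed on (a,b), which is the outcome the HashJoin example describes.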
[jira] [Created] (CALCITE-3449) Modify sql will have sub schema in it's table name
Zhuang created CALCITE-3449:
-------------------------------

Summary: Modify sql will have sub schema in it's table name
Key: CALCITE-3449
URL: https://issues.apache.org/jira/browse/CALCITE-3449
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.21.0
Reporter: Zhuang

When sending queries to the target database, sub-schemas should be removed from the SQL, but this does not happen for modify statements (update/delete/insert).

For example,
{quote}delete from "sub_schema".target_table where ...
{quote}
will send the following SQL to the target database:
{quote}delete from {color:#ff}"sub_schema"{color}.target_table where ...
{quote}
But select queries work as expected.
{quote}select * from "sub_schema".target_table
{quote}
will be translated into the following and sent to the target database:
{quote}select * from target_table
{quote}
I've done some inspection and found that the two code paths are different.

In org.apache.calcite.rel.rel2sql.RelToSqlConverter.java, the table name for a modify statement includes the sub-schema.
{code:title=RelToSqlConverter.java|borderStyle=solid}
public Result visit(TableModify modify) {
  final Map pairs = ImmutableMap.of();
  final Context context = aliasContext(pairs, false);

  // Target Table Name
  final SqlIdentifier sqlTargetTable =
      new SqlIdentifier(modify.getTable().getQualifiedName(), POS);
{code}
But for a select query, the table name is just the table name, without the sub-schema.
{code:title=Bar.java|borderStyle=solid}
public Result visit(TableScan e) {
  final SqlIdentifier identifier;
  final JdbcTable jdbcTable = e.getTable().unwrap(JdbcTable.class);
  if (jdbcTable != null) {
    // Use the foreign catalog, schema and table names, if they exist,
    // rather than the qualified name of the shadow table in Calcite.
    identifier = jdbcTable.tableName();
  } else {
    final List qualifiedName = e.getTable().getQualifiedName();
    identifier = new SqlIdentifier(qualifiedName, SqlParserPos.ZERO);
  }
{code}
I'm really new to Calcite and to filing issues, thanks!
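The difference between the two code paths in this report can be reproduced in isolation. The sketch below is not Calcite code: TableRef and TableNameSketch are hypothetical stand-ins, and the two methods only mirror the name-selection logic of the quoted visit(TableScan) and visit(TableModify) snippets.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a table seen by the SQL generator; not a Calcite class. */
final class TableRef {
  /** Name inside the planner's schema tree, e.g. [sub_schema, target_table]. */
  final List<String> qualifiedName;
  /** Name in the target database, or null if the table is not JDBC-backed. */
  final List<String> foreignName;

  TableRef(List<String> qualifiedName, List<String> foreignName) {
    this.qualifiedName = qualifiedName;
    this.foreignName = foreignName;
  }
}

final class TableNameSketch {
  private TableNameSketch() {}

  /** Mirrors the quoted TableScan branch: prefer the foreign name when present. */
  static List<String> nameForScan(TableRef t) {
    return t.foreignName != null ? t.foreignName : t.qualifiedName;
  }

  /** Mirrors the quoted TableModify branch: always the qualified name. */
  static List<String> nameForModify(TableRef t) {
    return t.qualifiedName;
  }

  public static void main(String[] args) {
    TableRef t = new TableRef(
        Arrays.asList("sub_schema", "target_table"),
        Arrays.asList("target_table"));
    System.out.println("scan:   " + String.join(".", nameForScan(t)));
    System.out.println("modify: " + String.join(".", nameForModify(t)));
  }
}
```

With the same table, the scan path yields target_table while the modify path yields sub_schema.target_table, which matches the behavior described in the report. Routing both paths through one name-selection helper would be one way to align them, though the report itself does not propose a specific fix.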
[jira] [Created] (CALCITE-3448) AggregateOnProjectToAggregateUnifyRule ignores Project incorrectly when there's missing grouping or mapping breaks ordering
jin xing created CALCITE-3448:
---------------------------------

Summary: AggregateOnProjectToAggregateUnifyRule ignores Project incorrectly when there's missing grouping or mapping breaks ordering
Key: CALCITE-3448
URL: https://issues.apache.org/jira/browse/CALCITE-3448
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: jin xing
Assignee: jin xing
Re: [DISCUSS] State of the project 2019
Oh, you can add me on WeChat (Weixin; send me a personal mail for that), and I have a free ticket for the conference!

Best,
Danny Chan

On Oct 25, 2019, 6:29 PM +0800, Juan Pan wrote:
> Hi Danny,
>
> I am interested in your upcoming talk in Beijing, China. How can I take part
> in it? Can you give me more detail?
>
> Juan Pan
>
> panj...@apache.org
> Juan Pan (Trista), Apache ShardingSphere
>
> On 10/23/2019 18:23, Danny Chan wrote:
> > > I gave a talk last year in a university in
> > > France, and nobody in the audience had ever heard of Calcite before.
> >
> > Oops, that's a pity. I will also give a talk about Calcite at Flink Forward
> > Asia 2019 in Beijing, China; I hope more people will get to know Apache
> > Calcite.
> >
> > Best,
> > Danny Chan
> > On Oct 23, 2019, 2:36 PM +0800, dev@calcite.apache.org wrote:
> > > I gave a talk last year in a university in
> > > France, and nobody in the audience had ever heard of Calcite before.
Re: [DISCUSS] State of the project 2019
Hi Danny,

I am interested in your upcoming talk in Beijing, China. How can I take part in it? Can you give me more detail?

Juan Pan

panj...@apache.org
Juan Pan (Trista), Apache ShardingSphere

On 10/23/2019 18:23, Danny Chan wrote:
> > I gave a talk last year in a university in
> > France, and nobody in the audience had ever heard of Calcite before.
>
> Oops, that's a pity. I will also give a talk about Calcite at Flink Forward
> Asia 2019 in Beijing, China; I hope more people will get to know Apache
> Calcite.
>
> Best,
> Danny Chan
> On Oct 23, 2019, 2:36 PM +0800, dev@calcite.apache.org wrote:
> > I gave a talk last year in a university in
> > France, and nobody in the audience had ever heard of Calcite before.
[jira] [Created] (CALCITE-3447) Fix equivalents in method SubstitutionVisitor#go
daimin created CALCITE-3447:
-------------------------------

Summary: Fix equivalents in method SubstitutionVisitor#go
Key: CALCITE-3447
URL: https://issues.apache.org/jira/browse/CALCITE-3447
Project: Calcite
Issue Type: Bug
Components: core
Reporter: daimin

The code segment here depends on the `hashCode` and `equals` methods of class `MutableRel`:

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/SubstitutionVisitor.java#L492-L502]

However, the implementation in class `MutableScan` delegates to class `TableScan`, which relies directly on the implementations in class `Object`. As a result, two `MutableScan` instances on exactly the same table are not considered equivalent.
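The effect described in this issue is easy to reproduce outside Calcite. In the sketch below, all three classes are hypothetical stand-ins: PlainScan inherits identity-based equals/hashCode from Object, DelegatingScan forwards both methods to the node it wraps (the behavior the report attributes to MutableScan), and NameComparingScan shows one possible alternative that compares by qualified table name. It is not meant to represent the fix that was actually adopted.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-in for a scan node that inherits Object's identity-based equals/hashCode. */
final class PlainScan {
  final List<String> tableName;

  PlainScan(String... tableName) {
    this.tableName = Arrays.asList(tableName);
  }
}

/** Wrapper that forwards equals/hashCode to the wrapped node. */
final class DelegatingScan {
  final PlainScan rel;

  DelegatingScan(PlainScan rel) {
    this.rel = rel;
  }

  @Override public boolean equals(Object o) {
    // Ends up in Object.equals, i.e. reference equality of the wrapped nodes.
    return o instanceof DelegatingScan && rel.equals(((DelegatingScan) o).rel);
  }

  @Override public int hashCode() {
    return rel.hashCode();
  }
}

/** Alternative wrapper that compares scans by their qualified table name. */
final class NameComparingScan {
  final PlainScan rel;

  NameComparingScan(PlainScan rel) {
    this.rel = rel;
  }

  @Override public boolean equals(Object o) {
    return o instanceof NameComparingScan
        && rel.tableName.equals(((NameComparingScan) o).rel.tableName);
  }

  @Override public int hashCode() {
    return rel.tableName.hashCode();
  }
}
```

Two DelegatingScan instances built from separate PlainScan("hr", "emps") objects compare as unequal, while the corresponding NameComparingScan instances compare as equal and share a hash code, so they can be found in hash-based lookups of equivalent nodes.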
Re: Re: [DISCUSS] On-demand traitset request
I would like a further clarification regarding the methods:

derivedDistribution()
derivedCollation()

What would be the difference from the existing derivation mechanism in RelMdDistribution [1] and RelMdCollation [2]? Are they not sufficient to provide the necessary information?

[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistribution.java
[2] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdCollation.java