Re: Re: Re: Re: [DISCUSS] On-demand traitset request

2019-10-25 Thread Haisheng Yuan
Yes, you are right, they are similar. But the metadata framework can't provide the 
flexibility for the optimization engine to extend to other traits without 
modifying the core. Our goal should be to make the optimization engine do all 
the work; physical operators just specify their behavior and characteristics.

In SQL Server, there is another physical property called Rewindability, which 
requires a physical operator to be rewindable. If we want to add this trait, as 
I said in my last email [1], users just need to override the derive method to 
consider Rewindability:
public <T extends RelTrait> T derivedTrait(RelTraitDef<T> traitDef)

If using metadata, users have to define their own metadata for the added trait, 
like RelMdRewindability, and modify the core to call the method, which is not 
ideal. But we can leverage RelMdDistribution and RelMdCollation, where 
applicable, in the derivedTrait methods. However, I don't agree with the design 
of RelMdCollation returning a list of collations. And I don't think the property 
requirement and derivation of a physical operator should be scattered across 
different places, though that is a minor thing.
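To make the contrast with per-trait metadata handlers concrete, here is a minimal, self-contained Java sketch of a single generic derivation hook like the derivedTrait signature above. `Trait`, `TraitDef`, `PhysicalNode`, `Scan`, `Receive` and `Filter` are hypothetical stand-ins, not Calcite's `RelTrait`/`RelTraitDef`/`RelNode` API, and the rewindability assigned to each operator is an illustrative assumption. What it shows: adding a trait such as Rewindability only needs a new trait definition plus overrides in the operators that care about it; the engine-side call `derivedTrait(def)` stays the same for every trait.

```java
/** Usage: derive rewindability for two tiny plans. */
final class RewindabilityDemo {
  public static void main(String[] args) {
    System.out.println(new Filter(new Scan()).derivedTrait(Rewindability.DEF));    // REWINDABLE
    System.out.println(new Filter(new Receive()).derivedTrait(Rewindability.DEF)); // NOT_REWINDABLE
  }
}

/** Hypothetical stand-in for a physical trait value. */
interface Trait {}

/** Hypothetical stand-in for a trait definition, typed by the trait it defines. */
final class TraitDef<T extends Trait> {
  final String name;
  final Class<T> traitClass;

  TraitDef(String name, Class<T> traitClass) {
    this.name = name;
    this.traitClass = traitClass;
  }
}

/** A user-defined trait, added without touching any engine code. */
enum Rewindability implements Trait {
  REWINDABLE, NOT_REWINDABLE;

  static final TraitDef<Rewindability> DEF =
      new TraitDef<>("rewindability", Rewindability.class);
}

/** Hypothetical physical-operator interface with one generic derivation hook. */
interface PhysicalNode {
  /** Returns the trait this operator delivers for {@code def}, or null if it does not know. */
  default <T extends Trait> T derivedTrait(TraitDef<T> def) {
    return null;
  }
}

/** Illustrative assumption: a scan of a stored table can always restart from the beginning. */
final class Scan implements PhysicalNode {
  @Override
  public <T extends Trait> T derivedTrait(TraitDef<T> def) {
    return Rewindability.DEF.equals(def) ? def.traitClass.cast(Rewindability.REWINDABLE) : null;
  }
}

/** Illustrative assumption: rows received from another process arrive once and cannot be replayed. */
final class Receive implements PhysicalNode {
  @Override
  public <T extends Trait> T derivedTrait(TraitDef<T> def) {
    return Rewindability.DEF.equals(def) ? def.traitClass.cast(Rewindability.NOT_REWINDABLE) : null;
  }
}

/** A streaming filter delivers whatever its input delivers, for any trait. */
final class Filter implements PhysicalNode {
  private final PhysicalNode input;

  Filter(PhysicalNode input) {
    this.input = input;
  }

  @Override
  public <T extends Trait> T derivedTrait(TraitDef<T> def) {
    return input.derivedTrait(def);
  }
}
```

Note that `Filter` never mentions Rewindability: pass-through operators keep working unchanged when a new trait definition appears.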


[1] 
http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/
- Haisheng

On Fri, Oct 25, 2019 at 08:36:07 GMT Stamatis Zampetakis  
wrote:
I would like a further clarification regarding the methods:
derivedDistribution()
derivedCollation()

What would be the difference from the existing derivation mechanism in
RelMdDistribution [1] and RelMdCollation [2]?
Are they not sufficient to provide the necessary information?

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistribution.java
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdCollation.java



[jira] [Created] (CALCITE-3450) Support Intersect and Minus in RelMdTableReferences.

2019-10-25 Thread xzh_dz (Jira)
xzh_dz created CALCITE-3450:
---

 Summary: Support Intersect and Minus in RelMdTableReferences.
 Key: CALCITE-3450
 URL: https://issues.apache.org/jira/browse/CALCITE-3450
 Project: Calcite
  Issue Type: Wish
Reporter: xzh_dz


Support Intersect and Minus in RelMdTableReferences.
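One plausible approach, sketched under assumptions: treat Intersect and Minus like any other set operator and collect the table references of all inputs, shifting the entity (occurrence) number whenever a table already appeared in an earlier input, so that each occurrence stays distinct. `TableRef` and `SetOpTableRefs` are hypothetical stand-ins, not Calcite's `RelTableRef` or `RelMdTableReferences`; whether Minus should also report tables that only appear in its subtracted inputs is a design question this sketch does not settle (it reports them).

```java
import java.util.*;

/** Usage: SELECT ... FROM emp  INTERSECT  SELECT ... FROM emp JOIN dept ... */
final class SetOpTableRefsDemo {
  public static void main(String[] args) {
    List<String> emp = List.of("hr", "emp");
    List<String> dept = List.of("hr", "dept");
    Set<TableRef> left = Set.of(new TableRef(emp, 0));
    Set<TableRef> right = Set.of(new TableRef(emp, 0), new TableRef(dept, 0));
    System.out.println(SetOpTableRefs.tableReferences(List.of(left, right)));
  }
}

/** Hypothetical table reference: a qualified table name plus an occurrence number. */
final class TableRef {
  final List<String> qualifiedName;
  final int entityNumber;

  TableRef(List<String> qualifiedName, int entityNumber) {
    this.qualifiedName = qualifiedName;
    this.entityNumber = entityNumber;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof TableRef)) {
      return false;
    }
    TableRef that = (TableRef) o;
    return qualifiedName.equals(that.qualifiedName) && entityNumber == that.entityNumber;
  }

  @Override public int hashCode() {
    return Objects.hash(qualifiedName, entityNumber);
  }

  @Override public String toString() {
    return qualifiedName + "#" + entityNumber;
  }
}

final class SetOpTableRefs {
  /**
   * Combines the table references of every input of a set operator. A table that already
   * appeared in an earlier input gets its entity numbers shifted past the earlier occurrences.
   * Returns null if any input's references are unknown (null).
   */
  static Set<TableRef> tableReferences(List<Set<TableRef>> inputs) {
    Set<TableRef> result = new LinkedHashSet<>();
    Map<List<String>, Integer> countsSoFar = new HashMap<>();
    for (Set<TableRef> inputRefs : inputs) {
      if (inputRefs == null) {
        return null;
      }
      // Count this input's occurrences separately, so shifts only depend on earlier inputs.
      Map<List<String>, Integer> countsInInput = new HashMap<>();
      for (TableRef ref : inputRefs) {
        int shift = countsSoFar.getOrDefault(ref.qualifiedName, 0);
        result.add(new TableRef(ref.qualifiedName, shift + ref.entityNumber));
        countsInInput.merge(ref.qualifiedName, 1, Integer::sum);
      }
      countsInInput.forEach((name, n) -> countsSoFar.merge(name, n, Integer::sum));
    }
    return result;
  }
}
```

In the usage example, `emp` appears in both inputs, so the result holds `emp#0`, `emp#1` and `dept#0`.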



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Re: [DISCUSS] On-demand traitset request

2019-10-25 Thread Haisheng Yuan
I didn't suggest adding them to RelNode, but rather a new API/interface for 
physical operators only.

What matters is not the number of interfaces, but the necessity of these 
methods.

- Haisheng

--
From: Danny Chan
Date: 2019-10-25 09:55:56
To:
Subject: Re: Re: [DISCUSS] On-demand traitset request

I have the same feeling: it seems like too many interfaces for the physical node (we 
do not really have a physical class for physical nodes yet), so these interfaces 
may just end up in RelNode, which is too complex and too much for me. Can 
we have a way that does not modify the nodes themselves?

Best,
Danny Chan
On 2019-10-23 2:53 PM +0800, Stamatis Zampetakis wrote:
> Overall, I agree that better encapsulation of propagation and derivation of
> traits would be beneficial for our system.
>
> Regarding the API proposed by Haisheng, I have to think a bit more on it.
> At first glance, adding such methods directly in the RelNode API does not
> appear to be an ideal solution, since I don't see how easily it can be
> extended to support other kinds of traits.
>
> Best,
> Stamatis
>
> On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan 
> wrote:
>
> > To Stamatis,
> > Not exactly. My initial thought was to give the physical operator the
> > ability to customize and fully control its physical property derivation
> > strategy, which can further help purpose-driven trait requests. But since
> > we agree to think about a more high-level API to support on-demand traitset
> > requests, I will illustrate what API is expected from the implementor's
> > perspective.
> >
> > Jinfeng gave us the basic steps for how the plan might be generated in a
> > purely top-down, purpose-driven manner. I think differently about the
> > first several steps.
> >
> > SELECT DISTINCT c, b FROM
> > ( SELECT R.c c, S.b b FROM R, S
> > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >
> > Aggregate . (c, b)
> > +--- MergeJoin . (a, b, c)
> >      |--- TableScan on R
> >      +--- TableScan on S
> >
> > 1. Aggregate requires collation (c,b) from its child, not a permutation of it.
> > 2. MergeJoin's parent requires (c,b), so MergeJoin has 2 options: pass it
> > down, or ignore it.
> > a) Pass it down. The join condition is on (a,b,c) and the required columns
> > are covered by the join condition columns, so MergeJoin will try to deliver
> > (c,b,a), and both children must match it exactly. Then we will have a sort
> > on both children of MergeJoin.
> > b) Ignore it. Require collation on (a,b,c) from its first child, but with
> > match type subset. R delivers (c,b,a). Then use the first child's derived
> > collation trait to require an exact match from its second child. Thus we
> > have a sort on S, and a sort on top of MergeJoin.
> >
> > Either plan might be good or bad. If R and S are large but the join result
> > is small, plan b) might be better; otherwise plan a) might be better.
> >
> > Anyway, I hope the physical operators can have full control over physical
> > property requests and derivation, in the physical operator class itself,
> > not in rules or other places.
> >
> > In our experience, we have spent too much time writing code to deal with
> > all kinds of property requirements and derivations. But in fact, life
> > should be easier. I would like the physical operator to provide the
> > following API, so that 3rd-party implementors just need to
> > override/implement these methods, with nothing more to take care of.
> >
> > 1. void setDistributionRequests(int numReq)
> > Each operator can specify how many optimization requests on some trait it
> > wants to make, e.g. HashJoin may request the following distributions on
> > both children:
> > - (hash distribution on key1, hash distribution on key1)
> > - (hash distribution on key2, hash distribution on key2)
> > - (hash distribution on all keys, hash distribution on all keys)
> > - (Any, Broadcast)
> > - (Gather, Gather)
> >
> > 2. RelDistribution requiredDistribution(RelDistribution required, int child) //same for collation
> > Given the distribution required by the parent operator, returns the
> > required distribution for its nth child.
> >
> > 3. RelDistribution derivedDistribution() //same for collation
> > Derives the distribution of the operator itself from its child operators.
> >
> > 4. MatchType distributionMatchType(int child) //same for collation
> > Returns the distribution match type for its nth child, i.e. how it has to
> > match the other children.
> > Similar to Jinfeng's point, I think there should be 3 types of matching:
> > exact, satisfy, subset.
> > e.g.
> > R is distributed by (a), S is distributed by (a,b)
> > select * from R join S using (a, b, c)
> > If we have the plan
> > HashJoin
> >   |-- TableScan on R
> >   +-- TableScan on S
> > we may require the match type on S to be satisfy: (a,b) satisfies the
> > required distribution (a,b,c).
> > For the outer child R, we require it to exactly match the inner.
> >
> > 5. ExecOrder getExecOrder()
> > Returns the order in which the operator's children are executed, left to
> > right or right to left. Typically, hash join is right
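The match types in item 4 can be illustrated with a small, self-contained Java sketch of the HashJoin example above (R distributed by (a), S by (a,b), join on (a,b,c)). `HashDist`, `MatchType` and `DistributionMatch` are hypothetical names, not Calcite's `RelDistribution` API. SATISFY is interpreted here as "hashed on a non-empty subset of the required keys", and SUBSET is left unmodeled because the proposal does not spell it out for distributions.

```java
import java.util.*;

/** Usage: the HashJoin example, where R is the outer and S the inner child. */
final class HashJoinMatchDemo {
  public static void main(String[] args) {
    HashDist r = new HashDist("a");
    HashDist s = new HashDist("a", "b");
    HashDist joinKeys = new HashDist("a", "b", "c");
    System.out.println(DistributionMatch.exchangesNeeded(r, s, joinKeys));
    // [outer -> hash[a, b]]
  }
}

/** Hypothetical hash distribution on a set of columns. */
final class HashDist {
  final Set<String> keys;

  HashDist(String... keys) {
    this.keys = new HashSet<>(Arrays.asList(keys));
  }

  @Override public String toString() {
    return "hash" + new TreeSet<>(keys);
  }
}

/** The three match types named in the proposal. */
enum MatchType { EXACT, SATISFY, SUBSET }

final class DistributionMatch {
  /** Does {@code delivered} meet {@code required} under the given match type? */
  static boolean matches(HashDist delivered, HashDist required, MatchType type) {
    switch (type) {
      case EXACT:
        return delivered.keys.equals(required.keys);
      case SATISFY:
        // Rows that agree on every required key also agree on any non-empty subset of
        // those keys, so hashing on such a subset already puts them on the same node.
        return !delivered.keys.isEmpty() && required.keys.containsAll(delivered.keys);
      default:
        // Not modeled in this sketch.
        throw new UnsupportedOperationException(type.toString());
    }
  }

  /** Inner must SATISFY the join keys; outer must EXACTly match what the inner delivers. */
  static List<String> exchangesNeeded(HashDist outer, HashDist inner, HashDist joinKeys) {
    List<String> exchanges = new ArrayList<>();
    HashDist innerDelivered = inner;
    if (!matches(inner, joinKeys, MatchType.SATISFY)) {
      exchanges.add("inner -> " + joinKeys);
      innerDelivered = joinKeys;
    }
    if (!matches(outer, innerDelivered, MatchType.EXACT)) {
      exchanges.add("outer -> " + innerDelivered);
    }
    return exchanges;
  }
}
```

S can be used as-is, while R must be redistributed on (a,b) so that matching rows of both sides land on the same node.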

[jira] [Created] (CALCITE-3449) Modify sql will have sub schema in it's table name

2019-10-25 Thread Zhuang (Jira)
Zhuang created CALCITE-3449:
---

 Summary: Modify sql will have sub schema in it's table name
 Key: CALCITE-3449
 URL: https://issues.apache.org/jira/browse/CALCITE-3449
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0
Reporter: Zhuang


When sending queries to the target database, sub schemas should be removed from 
the SQL, but this does not happen for modify statements (UPDATE/DELETE/INSERT). 
For example, 
{quote}delete from "sub_schema".target_table where ...
{quote}
will send the following SQL to the target database:
{quote}delete from {color:#ff0000}"sub_schema"{color}.target_table where ... 
{quote}
But SELECT queries work as expected:
{quote}select * from "sub_schema".target_table
{quote}
will be translated into the following and sent to the target database:
{quote}select * from target_table
{quote}
I did some investigation and found that the two code paths differ. In 
org.apache.calcite.rel.rel2sql.RelToSqlConverter.java, the table names of modify 
statements include the sub schemas.
{code:title=RelToSqlConverter.java|borderStyle=solid}
public Result visit(TableModify modify) {
  final Map<String, RelDataType> pairs = ImmutableMap.of();
  final Context context = aliasContext(pairs, false);

  // Target Table Name
  final SqlIdentifier sqlTargetTable =
      new SqlIdentifier(modify.getTable().getQualifiedName(), POS);
  // ...
}
{code}

But for SELECT queries, the table name is just the table name, without the sub schema.

{code:title=Bar.java|borderStyle=solid}
public Result visit(TableScan e) {
  final SqlIdentifier identifier;
  final JdbcTable jdbcTable = e.getTable().unwrap(JdbcTable.class);
  if (jdbcTable != null) {
    // Use the foreign catalog, schema and table names, if they exist,
    // rather than the qualified name of the shadow table in Calcite.
    identifier = jdbcTable.tableName();
  } else {
    final List<String> qualifiedName = e.getTable().getQualifiedName();
    identifier = new SqlIdentifier(qualifiedName, SqlParserPos.ZERO);
  }
  // ...
}
{code}
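A minimal sketch of the direction this report points to, assuming the fix is to make the TableModify path prefer the foreign table name the same way the TableScan path does. This is a self-contained model, not Calcite code: `ModifyTargetName` and its methods are hypothetical, `calciteName` stands for Calcite's qualified name (which includes the sub schema), and `foreignName` plays the role that `jdbcTable.tableName()` plays in the snippet above.

```java
import java.util.List;
import java.util.Optional;

final class ModifyTargetName {
  public static void main(String[] args) {
    List<String> calciteName = List.of("sub_schema", "target_table");
    Optional<List<String>> foreignName = Optional.of(List.of("target_table"));
    System.out.println(deleteSql(targetName(calciteName, foreignName), "id = 1"));
    // DELETE FROM "target_table" WHERE id = 1
  }

  /** Prefer the foreign (remote) name when the table has one, as the TableScan path does. */
  static List<String> targetName(List<String> calciteName, Optional<List<String>> foreignName) {
    return foreignName.orElse(calciteName);
  }

  /** Renders a DELETE statement with double-quoted identifiers. */
  static String deleteSql(List<String> name, String condition) {
    StringBuilder sb = new StringBuilder("DELETE FROM ");
    for (int i = 0; i < name.size(); i++) {
      if (i > 0) {
        sb.append('.');
      }
      sb.append('"').append(name.get(i)).append('"');
    }
    return sb.append(" WHERE ").append(condition).toString();
  }
}
```

Keeping the naming decision in one place would let both the scan and the modify paths produce the same remote name.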

I'm really new to Calcite and to filing issues, thanks!
 





[jira] [Created] (CALCITE-3448) AggregateOnProjectToAggregateUnifyRule ignores Project incorrectly when there's missing grouping or mapping breaks ordering

2019-10-25 Thread jin xing (Jira)
jin xing created CALCITE-3448:
-

 Summary: AggregateOnProjectToAggregateUnifyRule ignores Project 
incorrectly when there's missing grouping or mapping breaks ordering
 Key: CALCITE-3448
 URL: https://issues.apache.org/jira/browse/CALCITE-3448
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: jin xing
Assignee: jin xing








Re: [DISCUSS] State of the project 2019

2019-10-25 Thread Danny Chan
Oh, you can add me on WeChat (send me a personal email for that); I have a free 
ticket for the conference!

Best,
Danny Chan
On 2019-10-25 6:29 PM +0800, Juan Pan wrote:
> Hi Danny,
>
>
> I am interested in your upcoming talk in Beijing, China. How can I take part 
> in it? Can you give me more details?
>
>
> Juan Pan
>
>
> panj...@apache.org
> Juan Pan(Trista), Apache ShardingSphere
>
>
> On 10/23/2019 18:23,Danny Chan wrote:
> I gave a talk last year in a university in
> France, and nobody in the audience had ever heard of Calcite before.
>
> Oops, that's a pity. I will also give a talk about Calcite at Flink Forward 
> Asia 2019 in Beijing, China; I hope more people will get to know Apache Calcite.
>
> Best,
> Danny Chan
> On 2019-10-23 2:36 PM +0800, dev@calcite.apache.org wrote:
>
> I gave a talk last year in a university in
> France, and nobody in the audience had ever heard of Calcite before.


Re: [DISCUSS] State of the project 2019

2019-10-25 Thread Juan Pan
Hi Danny,


I am interested in your upcoming talk in Beijing, China. How can I take part in 
it? Can you give me more details?


 Juan Pan


panj...@apache.org
Juan Pan(Trista), Apache ShardingSphere


On 10/23/2019 18:23,Danny Chan wrote:
I gave a talk last year in a university in
France, and nobody in the audience had ever heard of Calcite before.

Oops, that's a pity. I will also give a talk about Calcite at Flink Forward 
Asia 2019 in Beijing, China; I hope more people will get to know Apache Calcite.

Best,
Danny Chan
On 2019-10-23 2:36 PM +0800, dev@calcite.apache.org wrote:

I gave a talk last year in a university in
France, and nobody in the audience had ever heard of Calcite before.


[jira] [Created] (CALCITE-3447) Fix equivalents in method SubstitutionVisitor#go

2019-10-25 Thread daimin (Jira)
daimin created CALCITE-3447:
---

 Summary: Fix equivalents in method SubstitutionVisitor#go
 Key: CALCITE-3447
 URL: https://issues.apache.org/jira/browse/CALCITE-3447
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: daimin


The code segment here depends on the `hashCode` and `equals` methods of class 
`MutableRel`:

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/SubstitutionVisitor.java#L492-L502]

However, the implementations in class `MutableScan` delegate to class 
`TableScan`, which relies directly on the implementations inherited from class 
`Object`. As a result, two `MutableScan`s on exactly the same table are not 
considered equivalent.
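The effect described above can be reproduced outside Calcite with any hash-based lookup. In this hypothetical Java sketch, `IdentityScan` keeps `Object`'s identity-based `equals`/`hashCode`, while `ValueScan` defines equality by the scanned table (modeled here as a list of names, an assumption; a real fix would presumably compare the underlying table object). Neither class is Calcite's `MutableScan` or `TableScan`, and the `HashMap` stands in for whatever equivalence structure the linked code uses.

```java
import java.util.*;

final class EquivalenceDemo {
  public static void main(String[] args) {
    List<String> table = List.of("hr", "emp");
    System.out.println(foundEquivalent(new IdentityScan(table), new IdentityScan(table))); // false
    System.out.println(foundEquivalent(new ValueScan(table), new ValueScan(table)));       // true
  }

  /** Registers one node in a hash map, then probes with a separately built but identical node. */
  static <T> boolean foundEquivalent(T registered, T probe) {
    Map<T, String> equivalents = new HashMap<>();
    equivalents.put(registered, "already seen");
    return equivalents.containsKey(probe);
  }
}

/** Keeps Object's identity-based equals/hashCode, like the behavior described in the report. */
final class IdentityScan {
  final List<String> table;

  IdentityScan(List<String> table) {
    this.table = table;
  }
}

/** Defines equality by the scanned table, so two scans of the same table are equivalent. */
final class ValueScan {
  final List<String> table;

  ValueScan(List<String> table) {
    this.table = table;
  }

  @Override public boolean equals(Object o) {
    return o instanceof ValueScan && table.equals(((ValueScan) o).table);
  }

  @Override public int hashCode() {
    return table.hashCode();
  }
}
```

Overriding `equals` and `hashCode` together matters: a hash map first buckets by `hashCode`, so an `equals` override alone would still miss the match.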





Re: Re: [DISCUSS] On-demand traitset request

2019-10-25 Thread Stamatis Zampetakis
I would like a further clarification regarding the methods:
derivedDistribution()
derivedCollation()

What would be the difference from the existing derivation mechanism in
RelMdDistribution [1] and RelMdCollation [2]?
Are they not sufficient to provide the necessary information?

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistribution.java
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdCollation.java


On Fri, Oct 25, 2019 at 3:56 AM Danny Chan  wrote:

> I have the same feeling: it seems like too many interfaces for the physical
> node (we do not really have a physical class for physical nodes yet), so these
> interfaces may just end up in RelNode, which is too complex and too much
> for me. Can we have a way that does not modify the nodes themselves?
>
> Best,
> Danny Chan