[jira] [Created] (CALCITE-3718) Support Intersect and Minus in Bindables

2020-01-08 Thread xzh_dz (Jira)
xzh_dz created CALCITE-3718:
---

 Summary: Support Intersect and Minus in Bindables
 Key: CALCITE-3718
 URL: https://issues.apache.org/jira/browse/CALCITE-3718
 Project: Calcite
  Issue Type: Wish
Reporter: xzh_dz


Intersect and Minus are not supported in Bindables; this PR tries to add support
for Intersect and Minus in Bindables.
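
For context, a rough sketch of the kind of operator this could add, assuming it 
mirrors the existing Bindables.BindableUnion and leans on the set operations that 
linq4j's Enumerable already provides; the class shape below is illustrative, not 
the actual patch:
{code:java}
// Hypothetical BindableIntersect.bind(), sketched after BindableUnion.
// Proper value equality for Object[] rows is elided in this sketch.
public Enumerable<Object[]> bind(DataContext dataContext) {
  Enumerable<Object[]> left = ((BindableRel) getInput(0)).bind(dataContext);
  Enumerable<Object[]> right = ((BindableRel) getInput(1)).bind(dataContext);
  return left.intersect(right); // linq4j INTERSECT; except() would back Minus
}
{code}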





Re: Cannot parse query that contains colons for casting in postgres sql

2020-01-08 Thread XING JIN
Hi, Rick ~
The Babel parser supports casting with the double colon [1].

[1] https://issues.apache.org/jira/browse/CALCITE-2843
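
For example, roughly (this assumes the calcite-babel artifact is on the
classpath; a sketch, not tested):

SqlParser.Config babelConfig = SqlParser.configBuilder()
    .setParserFactory(SqlBabelParserImpl.FACTORY)
    .setConformance(SqlConformanceEnum.BABEL)
    .build();
SqlParser parser = SqlParser.create("SELECT 'test'::text t FROM countries", babelConfig);
SqlNode node = parser.parseQuery();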

-Jin

Rick Liao wrote on Thu, Jan 9, 2020 at 6:05 AM:

> Hello all,
>
> I'm trying to parse a postgres sql query that contains :: for casting. The
> parser fails when it reaches the colon. I'm using version 1.21.0. This is
> what I'm running (with a simple query to showcase the error):
>
> SqlParser.Config parserConfig =
>     SqlParser.configBuilder().setConformance(SqlConformanceEnum.LENIENT).build();
> SqlParser parser1 = SqlParser.create("SELECT 'test'::text t from countries", parserConfig);
> SqlNode parsedQuery1 = parser1.parseQuery();
>
> The error:
>
> org.apache.calcite.sql.parser.SqlParseException: Encountered ":" at line 1, column 14.
>
> Was expecting one of:
>
>     "AS" ... "EXCEPT" ... "FETCH" ... "FROM" ... "INTERSECT" ... "LIMIT" ...
>     "OFFSET" ... "ORDER" ... "MINUS" ... "UESCAPE" ... "UNION" ... "," ...
>     "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ...
>     "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "+" ... "-" ...
>     "*" ... "/" ... "%" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ...
>     "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ...
>     "SUCCEEDS" ... "MULTISET" ... "[" ...
>     [several expected-token names, e.g. identifier and literal tokens, were
>     stripped by the mail rendering]
>
> at org.apache.calcite.sql.parser.impl.SqlParserImpl.convertException(SqlParserImpl.java:343)
> at org.apache.calcite.sql.parser.impl.SqlParserImpl.normalizeException(SqlParserImpl.java:142)
> at org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:147)
> at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:162)
>
> Perhaps I have just not found the right conformance for Postgres SQL.
>
> Thanks for your help!
>
> Rick
>


Re: [DISCUSS] propagateCostImprovements vs incremental bestCost maintenance vs metadata

2020-01-08 Thread Xiening Dai
That’s why I say it’s hard to solve under the current framework design. The example 
query you provided can be, and should be, optimized during the logical 
transformation phase. At that moment there shouldn’t be any cost calculation, 
since all we are doing is exploring equivalences. Once the transformation is 
done, the row count and uniqueness, which could affect cost calculation, 
shouldn’t change anymore. 

But with the current Calcite design, rules don’t have a concept of distinct 
stages, which means a RelNode can be implemented before it has been fully 
explored. That’s why we get problems like this.


> On Jan 8, 2020, at 1:42 PM, Vladimir Sitnikov  
> wrote:
> 
>> In theory, the cardinality and uniqueness of a RelSubset should never
> change, per the definition of an equivalence set
> 
> I agree. It is like in theory there is no difference between theory and
> practice :)
> 
> What if we have select empid from (select empid from emps where empid>0)
> where empid<0  ?
> The original logical plans would likely have two filters, and metadata
> would estimate rowcount as 15% * 15% * tableRowCount (or something like
> that).
> 
> Later the optimizer might realize the relation is equivalent to an empty
> relation, and it could refine the row count as 0.
> So cardinality estimates of a set can vary over time, and I don't think we
> can prevent that.
> 
> It would be nice if metadata could identify a fixed point somehow like in
> dataflow algorithms.
> 
>> We should probably fix defineMaterialization() to provide uniqueness info
> in the first place
> 
> I'm not sure that is the only trigger.
> The thing is VolcanoPlanner.isValid is not activated by default, so we do
> not see cost-related assertion errors.
> 
> Vladimir



Cannot parse query that contains colons for casting in postgres sql

2020-01-08 Thread Rick Liao
Hello all,

I'm trying to parse a postgres sql query that contains :: for casting. The
parser fails when it reaches the colon. I'm using version 1.21.0. This is
what I'm running (with a simple query to showcase the error):

SqlParser.Config parserConfig =
    SqlParser.configBuilder().setConformance(SqlConformanceEnum.LENIENT).build();
SqlParser parser1 = SqlParser.create("SELECT 'test'::text t from countries", parserConfig);
SqlNode parsedQuery1 = parser1.parseQuery();

The error:

org.apache.calcite.sql.parser.SqlParseException: Encountered ":" at line 1, column 14.

Was expecting one of:

    "AS" ... "EXCEPT" ... "FETCH" ... "FROM" ... "INTERSECT" ... "LIMIT" ...
    "OFFSET" ... "ORDER" ... "MINUS" ... "UESCAPE" ... "UNION" ... "," ...
    "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ...
    "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "+" ... "-" ...
    "*" ... "/" ... "%" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ...
    "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ...
    "SUCCEEDS" ... "MULTISET" ... "[" ...
    [several expected-token names, e.g. identifier and literal tokens, were
    stripped by the mail rendering]

at org.apache.calcite.sql.parser.impl.SqlParserImpl.convertException(SqlParserImpl.java:343)
at org.apache.calcite.sql.parser.impl.SqlParserImpl.normalizeException(SqlParserImpl.java:142)
at org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:147)
at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:162)

Perhaps I have just not found the right conformance for Postgres SQL.

Thanks for your help!

Rick


Re: [DISCUSS] propagateCostImprovements vs incremental bestCost maintenance vs metadata

2020-01-08 Thread Vladimir Sitnikov
> In theory, the cardinality and uniqueness of a RelSubset should never
change, per the definition of an equivalence set

I agree. It is like in theory there is no difference between theory and
practice :)

What if we have select empid from (select empid from emps where empid>0)
where empid<0  ?
The original logical plans would likely have two filters, and metadata
would estimate rowcount as 15% * 15% * tableRowCount (or something like
that).

Later the optimizer might realize the relation is equivalent to an empty
relation, and it could refine the row count as 0.
So cardinality estimates of a set can vary over time, and I don't think we
can prevent that.

It would be nice if metadata could identify a fixed point somehow like in
dataflow algorithms.
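
A rough sketch of that fixed-point idea (not existing Calcite code; mq, the
subsets collection and the cache are assumed here): keep re-deriving metadata
until the cached estimates stop changing, the way an iterative dataflow
analysis would:

// Hypothetical sketch only - iterate metadata derivation to a fixed point.
Map<RelSubset, Double> cache = new HashMap<>();
boolean changed = true;
while (changed) {
  changed = false;
  for (RelSubset subset : subsets) {
    Double rowCount = mq.getRowCount(subset); // re-derive the estimate
    Double previous = cache.put(subset, rowCount);
    if (!Objects.equals(rowCount, previous)) {
      changed = true; // an estimate moved; run another pass
    }
  }
}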

> We should probably fix defineMaterialization() to provide uniqueness info
in the first place

I'm not sure that is the only trigger.
The thing is VolcanoPlanner.isValid is not activated by default, so we do
not see cost-related assertion errors.
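(If isValid is guarded by a Java assert - which is my assumption here - running
the JVM with assertions enabled, e.g. -ea:org.apache.calcite.plan.volcano...,
should surface these errors.)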

Vladimir


Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-08 Thread Xiening Dai
I see. But that’s unrelated to join ordering. 

> On Jan 7, 2020, at 11:29 PM, Danny Chan  wrote:
> 
> Internally we have a multi-input merge join; for each input there may be
> collation permutations.
> 
> Best,
> Danny Chan
> On Jan 8, 2020, at 1:20 AM +0800, Xiening Dai wrote:
>> Danny, I am not sure how this would affect join re-ordering. Could you 
>> elaborate?
>> 
>> 
>>> On Jan 7, 2020, at 1:29 AM, Danny Chan  wrote:
>>> 
>>> Hi guys, it seems that the discussion has gone silent now, so do we have any 
>>> conclusions that can be contributed to the current code, i.e. the suggested API 
>>> change or new abstraction?
>>> 
>>> Or better, can someone put together a design doc so that we can push to get it 
>>> implemented?
>>> 
>>> Personally, I was always looking forward to the result, because Apache Flink 
>>> suffers from bad planning performance for join re-ordering or trait 
>>> auto-adaptation.
>>> 
>>> Best,
>>> Danny Chan
On Nov 20, 2019, at 2:14 AM +0800, Vladimir Ozerov wrote:
 Hi Igor,
 
 Thank you for the details. Meanwhile, I solved it with separation of
 conversion rules from the physical optimization rules. So the first pass
 creates physical nodes with unknown physical properties (top-down), while
 subsequent processing of the leaf nodes triggers rules which convert "bad"
 physical nodes to "good" physical nodes with known distribution and
 collation.
 
 Regards,
 Vladimir.
 
 Mon, Nov 18, 2019 at 13:43, Seliverstov Igor wrote:
 
> Vladimir,
> 
> Hope it may help you.
> 
> Currently we apply the following approach (just a rough description):
> 
> 1) We created an API to derive possible trait permutations on the basis
> of children's traits (quite similar to the one described in the «On Demand
> trait set request» topic)
> 
> 2) Added a general rule that copies Logical nodes while requesting our
> convention from their children (IGNITE convention, ANY distribution, EMPTY
> collation) and sets the importance of the old Logical nodes to zero - so we
> have a Logical parent whose input satisfies any possible distribution, and
> no rules match the previous logical node any more.
> 
> 3) Physical rules create physical rel nodes only if physical traits can be
> derived (there is no «barrier», described in one of the previous messages);
> derived traits are a collection; we don't create a physical rel node for
> each possible trait set, and we may also set zero importance for previously
> created rel nodes to decrease the search space.
> 
> Now that we know the actual and required distribution, we don't need
> AbstractConverters and can just call the TraitDef.convert() method inside a
> rule.
> A rule is still able to produce the same output several times, but
> «memoization» inside the planner solves that for us.
> 
> Preliminary tests show almost zero overhead from the approach.
> 
> Regards,
> Igor
> 
> 
>> On Nov 14, 2019, at 14:49, Vladimir Ozerov wrote:
>> 
>> Hi Xing,
>> 
>> Thanks for your suggestion. Yes, this may help, and if I get your idea
>> right, I already had it in my reproducer:
>> 1) Create the converted physical input:
>> 
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49
>> 2) Use it in case no physical children were found:
>> 
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L79
>> 
>> This idea is borrowed from Apache Drill physical rules. But the problem is
>> that this approach leads to a suboptimal plan - the parent node doesn't know
>> the future distribution of a child node. And as a result, it doesn't know
>> its own distribution. So the final plan is constructed this way:
>> 1.1) Root enforced "SINGLETON" on its child:
>> -> PhysicalRoot[SINGLETON]
>> -> Converter[SINGLETON]
>> -> PhysicalProject[*ANY*]
>> -> PhysicalScan[REPLICATED]
>> 
>> 1.2) But since the child (PhysicalProject) failed to infer its distribution
>> during the rule call, it falls back to ANY distribution. In order to satisfy
>> the SINGLETON distribution of the parent, we inject an exchange in the final
>> plan:
>> -> PhysicalRoot[SINGLETON]
>> * -> Exchange[SINGLETON]*
>> -> PhysicalProject[*ANY*]
>> -> PhysicalScan[REPLICATED]
>> 
>> 2) But this is a suboptimal plan. The optimal plan is:
>> -> PhysicalRoot[SINGLETON]
>> -> PhysicalProject[REPLICATED]
>> -> PhysicalScan[REPLICATED]
>> 
>> You may observe it in my tests:
>> 1)
>> 
> https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L46
>> -
>> works as you described and produces a non-optimal plan with an exchange
>> 2)
>> 
> 

Re: [DISCUSS] propagateCostImprovements vs incremental bestCost maintenance vs metadata

2020-01-08 Thread Xiening Dai
This is similar to CALCITE-2166, where a RelNode’s best cost could increase 
after its input RelSubset’s cardinality is changed. Unfortunately there’s no easy 
way to fix this with the current framework design. In theory, the cardinality and 
uniqueness of a RelSubset should never change, per the definition of an 
equivalence set. The cost propagation logic is built upon that assumption, which 
I think is fine.

Regarding this particular issue, I think the JIRA is pointing in the right 
direction. We should probably fix defineMaterialization() to provide uniqueness 
info in the first place. This would be a much easier fix than updating cost 
propagation.

> On Jan 8, 2020, at 7:21 AM, Vladimir Sitnikov  
> wrote:
> 
> Hi,
> 
> As far as I understand, the incremental best/bestCost maintenance at
> RelSubset level does not really work.
> 
> That issue is triggered a lot in MaterializationTests due to
> https://issues.apache.org/jira/browse/CALCITE-3682
> (MaterializationService#defineMaterialization
> loses information on unique keys)
> In other words, materialization does not have uniqueness information, so
> when the planner realizes that materialization is connected to the source
> table,
> it suddenly receives extra metadata which alters cost estimates
> dramatically.
> 
> Here's the setup:
> 1) RelSubset assumes that best and bestCost are always maintained
> incrementally.
> 2) If a relation changes (e.g. it is added to a subset), the cost change is
> propagated to parentRels (~all the rels that might have that rel as input).
> 
> The propagation happens only in case the new rel is a new best (see [1]).
> So far it looks ok: if we have the new best, then we propagate to other
> parents.
> If the new rel is worse than the previous best, why bother with propagation?
> 
> == Now comes the issue ==
> The newly added rel might easily affect the costs of other rels even if the
> rel is not the best in its subset.
> 
> Here's how that is possible:
> RelMdColumnUniqueness#areColumnsUnique(RelSubset, ...) iterates over all
> the rels in the subset, so even if the newly added rel is not the best, it
> might happen to answer the areColumnsUnique request, and other cost
> functions that rely on uniqueness (e.g. cardinality estimates) would change.
> 
> In other words: if the planner somehow realizes a certain subset returns
> unique rows, then a join (in a very distant subset) that was supposed to be
> M*N
> becomes M+N, and its cost greatly reduces even though the subset's best is
> not changed.
> 
> At this point, I'm inclined to think that incremental bestCost maintenance
> is not really possible.
> 
> Any thoughts?
> 
> [1]:
> https://github.com/apache/calcite/blob/571731b80a58eb095ebac7123285c375e7afff90/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L358-L360
> 
> Vladimir



[DISCUSS] propagateCostImprovements vs incremental bestCost maintenance vs metadata

2020-01-08 Thread Vladimir Sitnikov
Hi,

As far as I understand, the incremental best/bestCost maintenance at
RelSubset level does not really work.

That issue is triggered a lot in MaterializationTests due to
https://issues.apache.org/jira/browse/CALCITE-3682
(MaterializationService#defineMaterialization
loses information on unique keys)
In other words, materialization does not have uniqueness information, so
when the planner realizes that materialization is connected to the source
table,
it suddenly receives extra metadata which alters cost estimates
dramatically.

Here's the setup:
1) RelSubset assumes that best and bestCost are always maintained
incrementally.
2) If a relation changes (e.g. it is added to a subset), the cost change is
propagated to parentRels (~all the rels that might have that rel as input).

The propagation happens only in case the new rel is a new best (see [1]).
So far it looks ok: if we have the new best, then we propagate to other
parents.
If the new rel is worse than the previous best, why bother with propagation?

== Now comes the issue ==
The newly added rel might easily affect the costs of other rels even if the
rel is not the best in its subset.

Here's how that is possible:
RelMdColumnUniqueness#areColumnsUnique(RelSubset, ...) iterates over all
the rels in the subset, so even if the newly added rel is not the best, it
might happen to answer the areColumnsUnique request, and other cost
functions that rely on uniqueness (e.g. cardinality estimates) would change.

In other words: if the planner somehow realizes a certain subset returns
unique rows, then a join (in a very distant subset) that was supposed to be
M*N
becomes M+N, and its cost greatly reduces even though the subset's best is
not changed.
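
(Concretely: with M = N = 1,000,000 rows, an M*N estimate is 10^12 while M+N is
only 2,000,000 - a swing of more than five orders of magnitude from a single
uniqueness answer.)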

At this point, I'm inclined to think that incremental bestCost maintenance is
not really possible.

Any thoughts?

[1]:
https://github.com/apache/calcite/blob/571731b80a58eb095ebac7123285c375e7afff90/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L358-L360

Vladimir


[discuss] Implement Presto adapter

2020-01-08 Thread Forward Xu
Hi everybody,

I'd like to kick off a discussion on implementing a Presto adapter in Calcite.

We know that Presto is also a powerful SQL query engine, widely used for OLAP
over big data. So I want to add an adapter for Presto to Calcite.

Would love to hear your thoughts.
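
Presto already speaks JDBC, so one possible starting point, before a dedicated
adapter exists, would be Calcite's generic JDBC adapter. A rough sketch (the
driver class and URL below are assumptions based on Presto's JDBC
documentation, untested):

import javax.sql.DataSource;
import org.apache.calcite.adapter.jdbc.JdbcSchema;
import org.apache.calcite.schema.SchemaPlus;

// rootSchema is an existing SchemaPlus, e.g. from a CalciteConnection.
DataSource ds = JdbcSchema.dataSource(
    "jdbc:presto://localhost:8080/hive/default", // assumed URL
    "com.facebook.presto.jdbc.PrestoDriver",     // assumed driver class
    "user", null);
rootSchema.add("PRESTO", JdbcSchema.create(rootSchema, "PRESTO", ds, null, null));

A dedicated adapter could then go further, e.g. pushing down Presto-specific
functions and mapping its type system.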

Best,
Forward


[jira] [Created] (CALCITE-3717) Query fails with "division by zero" exception

2020-01-08 Thread Hank Miller (Jira)
Hank Miller created CALCITE-3717:


 Summary: Query fails with "division by zero" exception
 Key: CALCITE-3717
 URL: https://issues.apache.org/jira/browse/CALCITE-3717
 Project: Calcite
  Issue Type: Bug
Reporter: Hank Miller


Hi, community. We use Calcite 1.21 to connect to and compute over data resident 
in PostgreSQL and MySQL. It works pretty well, but we encountered a problem 
yesterday.

We combine data from different data sources.

Assume the combined table is _T(…, c1, c2, c3)_, in which _c1, c2, c3_ are 
integer columns. Then, according to our business logic, we need to do further 
data processing in Calcite.
{code:java}
select case when c1=0 then ..
            when c2=0 then 1.0/c1 + ...
            when c3=0 then 1.0/c1 + 1.0/c2 + ...
            else -99.99 end
from T{code}
However, the query throws "java.lang.ArithmeticException: Division by zero".

Any idea what is going wrong?
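
One workaround worth trying, assuming the exception comes from a division being 
evaluated for rows where its CASE guard does not hold (for instance during 
expression reduction): guard each division with NULLIF so it yields NULL instead 
of throwing.
{code:java}
select case when c1=0 then ..
            when c2=0 then 1.0/nullif(c1, 0) + ...
            when c3=0 then 1.0/nullif(c1, 0) + 1.0/nullif(c2, 0) + ...
            else -99.99 end
from T{code}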


