[jira] [Commented] (CALCITE-4786) Facilitate use of graalvm native-image compilation

2021-12-02 Thread Lai Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452786#comment-17452786
 ] 

Lai Zhou commented on CALCITE-4786:
---

Sounds like an exciting project. (y)
 
 

> Facilitate use of graalvm native-image compilation
> --
>
> Key: CALCITE-4786
> URL: https://issues.apache.org/jira/browse/CALCITE-4786
> Project: Calcite
>  Issue Type: Improvement
>  Components: build
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now, there are a number of things that make it difficult to use 
> Calcite with GraalVM native compilation.
> There are several reasons that supporting this kind of compilation could be 
> beneficial:
> - Enable use of Calcite as a Lambda with minimal startup time
> - Create a Calcite shared library that can be easily embedded in other 
> languages
> Initially, I would focus this work on core parsing and query planning. 
> This work was inspired by work on https://substrait.io
> Let's use this ticket to track improvements that can be done to enable this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2020-09-09 Thread Lai Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou closed CALCITE-2741.
-
Resolution: Not A Problem

> Add operator table with Hive-specific built-in functions
> 
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> I wrote a Hive adapter for Calcite to support Hive SQL, including UDF, 
> UDAF, UDTF and some SqlSpecialOperator implementations.
> What do you think of supporting a direct implementation of Hive SQL like this?
> I think it will be valuable when someone wants to migrate Hive ETL jobs to 
> a real-time scenario.





[jira] [Closed] (CALCITE-2992) Enhance implicit conversions when generating hash join keys for an equiCondition

2020-09-09 Thread Lai Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou closed CALCITE-2992.
-
Resolution: Fixed

> Enhance implicit conversions when generating hash join keys for an 
> equiCondition
> 
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Consider the following SQL join:
> {code:java}
> select t1.*, t2.* from t1 join t2 on t1.intValue = t2.longValue
> {code}
> As is known, in Java:
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> // Boxed values of different types never compare equal:
> intValue.equals(longValue);   // false
> // and arrays inherit Object's identity hashCode, so two key arrays
> // holding equal values generally will not even hash alike.
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert the hash join keys to strings and 
> compare the string values.
>  
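The key mismatch described above can be reproduced in plain Java. The sketch below is a standalone illustration (the class and method names are mine, not Calcite's): a build side keyed on a boxed Long is probed with a boxed Integer, and the probe only succeeds once both keys are widened to the same type.

```java
import java.util.HashMap;
import java.util.Map;

public class HashKeyDemo {
    // Build a single-entry "build side" keyed on the raw boxed join value,
    // then probe it -- the way a naive hash join would.
    static boolean probeMatches(Object buildKey, Object probeKey) {
        Map<Object, String> buildSide = new HashMap<>();
        buildSide.put(buildKey, "row");
        return buildSide.containsKey(probeKey);
    }

    public static void main(String[] args) {
        // Integer 2 and Long 2L happen to share a hash code but are never
        // equals(), so the probe misses:
        System.out.println(probeMatches(2L, 2));  // false
        // Widening both keys to the same type repairs the lookup:
        System.out.println(probeMatches(((Number) 2L).longValue(),
                                        ((Number) 2).longValue()));  // true
    }
}
```

Normalizing both sides to one representation (a common wider type, or the strings proposed in the description) is what makes the keys comparable.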





[jira] [Created] (CALCITE-4161) MergeJoin algorithm should not assume inputs sorted in ascending order

2020-08-06 Thread Lai Zhou (Jira)
Lai Zhou created CALCITE-4161:
-

 Summary: MergeJoin algorithm should not assume inputs sorted in 
ascending order
 Key: CALCITE-4161
 URL: https://issues.apache.org/jira/browse/CALCITE-4161
 Project: Calcite
  Issue Type: Bug
  Components: core, linq4j
Affects Versions: 1.24.0
Reporter: Lai Zhou


Given the SQL query:
{code:java}
select id,first_name,vs.specialty_id from vets join vet_specialties vs on 
vets.id = vs.vet_id and vet_id>1 order by id desc limit 100
{code}
the final plan is:
{code:java}
EnumerableCalc(expr#0..3=[{inputs}], proj#0..1=[{exprs}], specialty_id=[$t3])
  EnumerableLimit(fetch=[100])
    EnumerableMergeJoin(condition=[=($0, $2)], joinType=[inner])
      EnumerableSort(sort0=[$0], dir0=[DESC])
        EnumerableCalc(expr#0..2=[{inputs}], proj#0..1=[{exprs}])
          JdbcToEnumerableConverter
            JdbcFilter(condition=[>($0, 1)])
              JdbcTableScan(table=[[default, vets]])
      EnumerableSort(sort0=[$0], dir0=[DESC])
        JdbcToEnumerableConverter
          JdbcFilter(condition=[>($0, 1)])
            JdbcTableScan(table=[[default, vet_specialties]])
{code}
The inputs of the EnumerableMergeJoin are sorted in descending order, but the 
MergeJoinEnumerator only supports inputs sorted in ascending order, so the 
result is wrong.

I think the MergeJoin should not assume its inputs are sorted in ascending 
order; it should be aware of the inputs' actual sort order.
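A sort-order-aware merge loop can be sketched by parameterizing it with a comparator that encodes the inputs' actual order. This is a simplified standalone sketch over integer keys, not Calcite's MergeJoinEnumerator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MergeJoinSketch {
    // Merge-join two key-sorted lists on equality. The comparator encodes the
    // inputs' actual sort order (ascending or descending) instead of the
    // algorithm silently assuming ascending input.
    static List<int[]> mergeJoin(List<Integer> left, List<Integer> right,
                                 Comparator<Integer> order) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int c = order.compare(left.get(i), right.get(j));
            if (c < 0) {
                i++;                          // left key comes first in `order`
            } else if (c > 0) {
                j++;                          // right key comes first in `order`
            } else {
                int key = left.get(i);
                int jStart = j;
                // emit the cross product of the two runs of equal keys
                while (i < left.size() && left.get(i) == key) {
                    for (j = jStart; j < right.size() && right.get(j) == key; j++) {
                        out.add(new int[]{left.get(i), right.get(j)});
                    }
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both inputs sorted DESC; passing the matching comparator yields the
        // correct three join pairs.
        List<Integer> left = Arrays.asList(3, 2, 1);
        List<Integer> right = Arrays.asList(2, 2, 1);
        System.out.println(mergeJoin(left, right, Comparator.reverseOrder()).size()); // 3
    }
}
```

The same loop handles ascending inputs with `Comparator.naturalOrder()`; only the comparator changes, not the algorithm.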

 

 





[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions

2019-08-26 Thread Lai Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915562#comment-16915562
 ] 

Lai Zhou commented on CALCITE-3284:
---

yes, you're right.

I have pushed a new commit, which resolved this issue.

 

> Enumerable hash semijoin / antijoin support non-equi join conditions
> 
>
> Key: CALCITE-3284
> URL: https://issues.apache.org/jira/browse/CALCITE-3284
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Haisheng Yuan
>Priority: Major
>
> Calcite should be able to generate an enumerable hash semijoin / antijoin 
> with non-equi join conditions, as long as there are equi-join conditions, so 
> that we can do a hash lookup.
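The approach described in the quoted issue — hash on the equi keys, then evaluate the residual predicate per candidate — might look roughly like this standalone sketch of a semi join (the names and the int[]-row representation are illustrative, not Calcite's):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class HashSemiJoinSketch {
    // Semi join: emit each left row that has at least one right match on the
    // equi key (column 0) that also satisfies the non-equi predicate.
    static List<int[]> semiJoin(List<int[]> left, List<int[]> right,
                                BiPredicate<int[], int[]> nonEqui) {
        Map<Integer, List<int[]>> built = new HashMap<>();
        for (int[] r : right) {                       // build on the equi key
            built.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {                        // probe
            for (int[] r : built.getOrDefault(l[0], Collections.emptyList())) {
                if (nonEqui.test(l, r)) {             // residual condition
                    out.add(l);
                    break;                            // one match is enough
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> left = Arrays.asList(new int[]{1, 5}, new int[]{2, 3});
        List<int[]> right = Arrays.asList(new int[]{1, 10}, new int[]{2, 1});
        // join on column 0, residual predicate: l[1] < r[1]
        System.out.println(semiJoin(left, right, (l, r) -> l[1] < r[1]).size()); // 1
    }
}
```

An anti join is the mirror image: emit the left row only when no candidate passes the residual predicate.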





[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions

2019-08-26 Thread Lai Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915531#comment-16915531
 ] 

Lai Zhou commented on CALCITE-3284:
---

[~rubenql],

Since Calcite doesn't support the `semi join` / `anti join` keywords yet, how 
can one construct a SQL query to test a semi join with non-equi conditions?

And an anti join with non-equi conditions?

 

> Enumerable hash semijoin / antijoin support non-equi join conditions
> 
>
> Key: CALCITE-3284
> URL: https://issues.apache.org/jira/browse/CALCITE-3284
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Haisheng Yuan
>Priority: Major
>
> Calcite should be able to generate an enumerable hash semijoin / antijoin 
> with non-equi join conditions, as long as there are equi-join conditions, so 
> that we can do a hash lookup.





[jira] [Issue Comment Deleted] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions

2019-08-26 Thread Lai Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-3284:
--
Comment: was deleted

(was: [~rubenql], Did Calcite already support semi join/anti join keywords?

I intend to query 
{code:java}
SELECT d.deptno, d.name FROM depts d semi join emps e on d.deptno=e.deptno and 
d.deptno >10{code}
but the `semi join` is not supported now.

So how to write a case for semi/anti join with nonEquiCondition?)

> Enumerable hash semijoin / antijoin support non-equi join conditions
> 
>
> Key: CALCITE-3284
> URL: https://issues.apache.org/jira/browse/CALCITE-3284
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Haisheng Yuan
>Priority: Major
>
> Calcite should be able to generate an enumerable hash semijoin / antijoin 
> with non-equi join conditions, as long as there are equi-join conditions, so 
> that we can do a hash lookup.





[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions

2019-08-26 Thread Lai Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915526#comment-16915526
 ] 

Lai Zhou commented on CALCITE-3284:
---

[~rubenql], does Calcite already support the semi join / anti join keywords?

I intend to run
{code:java}
SELECT d.deptno, d.name FROM depts d semi join emps e on d.deptno = e.deptno and 
d.deptno > 10
{code}
but `semi join` is not supported yet.

So how should one write a test case for a semi/anti join with a non-equi 
condition?

> Enumerable hash semijoin / antijoin support non-equi join conditions
> 
>
> Key: CALCITE-3284
> URL: https://issues.apache.org/jira/browse/CALCITE-3284
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Haisheng Yuan
>Priority: Major
>
> Calcite should be able to generate an enumerable hash semijoin / antijoin 
> with non-equi join conditions, as long as there are equi-join conditions, so 
> that we can do a hash lookup.





[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions

2019-08-23 Thread Lai Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914082#comment-16914082
 ] 

Lai Zhou commented on CALCITE-3284:
---

I'll resolve this issue in this PR.

[https://github.com/apache/calcite/pull/1156]

 

 

> Enumerable hash semijoin / antijoin support non-equi join conditions
> 
>
> Key: CALCITE-3284
> URL: https://issues.apache.org/jira/browse/CALCITE-3284
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Haisheng Yuan
>Priority: Major
>
> Calcite should be able to generate an enumerable hash semijoin / antijoin 
> with non-equi join conditions, as long as there are equi-join conditions, so 
> that we can do a hash lookup.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-08-22 Thread Lai Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913882#comment-16913882
 ] 

Lai Zhou commented on CALCITE-2973:
---

I noticed this performance issue before. I'll try to find a better way to 
handle it.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query over a large dataset (such as 1*1), the 
> nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.
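The hash-join-for-theta-joins idea quoted above can be sketched in isolation: the equi part of the condition builds and probes the hash table, while the remaining theta part is checked only per candidate pair rather than in a nested loop over all row combinations. This is an illustrative standalone sketch, not the actual Enumerable implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class PredicativeHashJoinSketch {
    // Inner theta join driven by a hash probe: the equi part of the condition
    // (column 0) builds and probes the hash table; the remaining theta part is
    // evaluated per candidate pair.
    static List<int[]> join(List<int[]> left, List<int[]> right,
                            BiPredicate<int[], int[]> residual) {
        Map<Integer, List<int[]>> built = new HashMap<>();
        for (int[] r : right) {
            built.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : built.getOrDefault(l[0], Collections.emptyList())) {
                if (residual.test(l, r)) {
                    out.add(new int[]{l[0], l[1], r[1]});  // key, left val, right val
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> left = Arrays.asList(new int[]{1, 5}, new int[]{1, 9});
        List<int[]> right = Arrays.asList(new int[]{1, 7});
        // equi condition on column 0 plus theta condition l[1] < r[1]
        System.out.println(join(left, right, (l, r) -> l[1] < r[1]).size()); // 1
    }
}
```

The work is proportional to rows sharing an equi key, which is why this beats the nested-loop plan for large inputs.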





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-08-21 Thread Lai Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912868#comment-16912868
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~rubenql], [~hyuan], [~julianhyde], [~danny0405], the PR is ready; would 
someone help review it?

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query over a large dataset (such as 1*1), the 
> nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852674#comment-16852674
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~rubenql], [~michaelmior], the patch is now good enough to be merged.

I adopted my initial solution to support non-inner joins with mixed conditions 
(equi and non-equi conditions): introducing an EnumerablePredicativeHashJoin 
(previously called EnumerableThetaHashJoin).

EnumerablePredicativeHashJoin and EnumerableHashJoin share the same hash join 
algorithm, but EnumerablePredicativeHashJoin extends Join rather than EquiJoin.

I believe this solution does no harm to the current rules, but in the long 
term we'd better change EnumerableHashJoin to extend Join.

[~hyuan] created an issue to work on this, see 
https://issues.apache.org/jira/browse/CALCITE-3089.

So I think we can resolve this issue first.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query over a large dataset (such as 1*1), the 
> nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.





[jira] [Commented] (CALCITE-2992) Enhance implicit conversions when generating hash join keys for an equiCondition

2019-05-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852603#comment-16852603
 ] 

Lai Zhou commented on CALCITE-2992:
---

Maybe we can make the CAST translation include this conversion logic, making 
the solution more generic.

But it will take a lot of work to make the validator handle implicit casting; 
I'll spend some time on the related issues.

> Enhance implicit conversions when generating hash join keys for an 
> equiCondition
> 
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Consider the following SQL join:
> {code:java}
> select t1.*, t2.* from t1 join t2 on t1.intValue = t2.longValue
> {code}
> As is known, in Java:
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> // Boxed values of different types never compare equal:
> intValue.equals(longValue);   // false
> // and arrays inherit Object's identity hashCode, so two key arrays
> // holding equal values generally will not even hash alike.
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert the hash join keys to strings and 
> compare the string values.
>  





[jira] [Updated] (CALCITE-3071) Cache the whole sql plan to reduce the latency and improve the performance

2019-05-15 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-3071:
--
Description: 
In real business, as SQL queries become complex, the overhead of SQL planning 
increases quickly, and many of the queries are duplicates.

We already have some caching work aimed at improving performance, such as 
https://issues.apache.org/jira/browse/CALCITE-2703, which reduces code 
generation and class loading overhead when executing queries in the 
EnumerableConvention, but I think it's not enough.

I propose caching the whole SQL plan to reduce latency: for the same SQL, 
ignoring cost-based optimization on statistics here, we can cache the 
generated code.

I use the FrameworkConfig API to execute SQL queries, so I can easily do this 
job that way.

But it's not easy to cache a whole SQL execution plan (that is, the generated 
code) in the JDBC-Connection-based processing flow, because there is a lot of 
intermediate state in that flow.

Let's discuss this feature and the probable solutions.

  was:
In real business, when sql queries become complex,  the overhead of sql plan 
will increase quickly , and many of the sql queries are duplicates.

We already do something on cacheing to improve the performance, such as the 
issue

https://issues.apache.org/jira/browse/CALCITE-2703,

which reduce code generation and class loading overhead when executing queries 
in the EnumerableConvention, but I think it's not enough.

I propose to cache the whole sql plan to reduce the latency ,for the same sql

, ignoring the cost optimizing based on statistics here, we can cache the 
generated code for it.

I use the FrameworkConfig API to execute sql queries, in this way I can easily 
do this job .

but it's not easy to make a whole sql execution plan(that says code-gen) cache 
in the sql processing flow based on JDBC Connection, because there're many 
intermediate state in this processing flow.

 

Let's discuss this feature and the probable solutions.


> Cache the whole sql plan to reduce the latency and improve the performance
> --
>
> Key: CALCITE-3071
> URL: https://issues.apache.org/jira/browse/CALCITE-3071
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> In real business, as SQL queries become complex, the overhead of SQL 
> planning increases quickly, and many of the queries are duplicates.
> We already have some caching work aimed at improving performance, such as 
> https://issues.apache.org/jira/browse/CALCITE-2703, which reduces code 
> generation and class loading overhead when executing queries in the 
> EnumerableConvention, but I think it's not enough.
> I propose caching the whole SQL plan to reduce latency: for the same SQL, 
> ignoring cost-based optimization on statistics here, we can cache the 
> generated code.
> I use the FrameworkConfig API to execute SQL queries, so I can easily do 
> this job that way.
> But it's not easy to cache a whole SQL execution plan (that is, the 
> generated code) in the JDBC-Connection-based processing flow, because there 
> is a lot of intermediate state in that flow.
>  
> Let's discuss this feature and the probable solutions.
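The whole-plan cache proposed above amounts to memoizing the expensive prepare step by SQL text. Below is a minimal standalone sketch; `PlanCacheSketch` and `prepareCalls` are hypothetical names, and a real Calcite cache would hold the generated code (e.g. a Bindable) and need invalidation on schema or statistics changes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PlanCacheSketch {
    // Hypothetical whole-plan cache keyed by the SQL text. The cached value
    // stands in for the generated plan; `prepare` is the expensive step
    // (parse, validate, optimize, generate code) run only on a miss.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    int prepareCalls = 0;

    Object plan(String sql, Function<String, Object> prepare) {
        return cache.computeIfAbsent(sql, s -> {
            prepareCalls++;                  // only on a cache miss
            return prepare.apply(s);
        });
    }

    public static void main(String[] args) {
        PlanCacheSketch cache = new PlanCacheSketch();
        Object p1 = cache.plan("select * from t", sql -> new Object());
        Object p2 = cache.plan("select * from t", sql -> new Object());
        System.out.println(p1 == p2);             // true: second call reuses the plan
        System.out.println(cache.prepareCalls);   // 1
    }
}
```

Keying on raw SQL text is the simplest policy; as the discussion notes, the hard part is doing this inside the JDBC flow, where intermediate state is spread across the connection.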





[jira] [Created] (CALCITE-3071) Cache the whole sql plan to reduce the latency and improve the performance

2019-05-15 Thread Lai Zhou (JIRA)
Lai Zhou created CALCITE-3071:
-

 Summary: Cache the whole sql plan to reduce the latency and 
improve the performance
 Key: CALCITE-3071
 URL: https://issues.apache.org/jira/browse/CALCITE-3071
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.19.0
Reporter: Lai Zhou


In real business, as SQL queries become complex, the overhead of SQL planning 
increases quickly, and many of the queries are duplicates.

We already do something with caching to improve performance, such as 
https://issues.apache.org/jira/browse/CALCITE-2703, which reduces code 
generation and class loading overhead when executing queries in the 
EnumerableConvention, but I think it's not enough.

I propose caching the whole SQL plan to reduce latency: for the same SQL, 
ignoring cost-based optimization on statistics here, we can cache the 
generated code.

I use the FrameworkConfig API to execute SQL queries, so I can easily do this 
job that way.

But it's not easy to cache a whole SQL execution plan (that is, the generated 
code) in the JDBC-Connection-based processing flow, because there is a lot of 
intermediate state in that flow.

Let's discuss this feature and the probable solutions.





[jira] [Updated] (CALCITE-3069) Make the JDBC Connection more extensible like the FrameworkConfig API

2019-05-15 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-3069:
--
Description: 
More and more users are interested in building custom SQL engines on top of 
Calcite.

But different SQL engines differ in SQL parsing, expression conversions, 
implicit type casting, and even the physical implementations of the logical 
plan.

I think the FrameworkConfig API now provides a better way than the JDBC 
Connection to customize these things. Are there any plans on the roadmap to 
make the JDBC Connection config as extensible as the FrameworkConfig API, to 
improve Calcite's extensibility?

Otherwise, implementing a whole physical plan like the default Enumerable 
implementation is tedious and requires a lot of work. Maybe we can do 
something to make the physical and execution plan (that is, the code 
generation) more customizable.

Are there any thoughts on this issue?

 

  was:
More and more users are interested in building custom sql engines on top of 
Calcite.

But for different sql engines, there're differences on sql parsing, expression 
conversions, implicit type casting ...even the physical implementations for 
logical plan. 

I think the FrameworkConfig API now provided a better way than JDBC Connection 
to custom these things.Are there any plans  in the roadmap to enhance the JDBC 
Connection config , like FrameworkConfig API , to improve Calcite's 
extensibility ?

Otherwise, implementing the whole physical plan like the default 
Enumerable-implementation will be boring , that also require a lot of work. May 
be we can do something to make the physical and execution plan(that says 
code-gen ) more customizable.

Are there any thoughts on this issue?

 
 * customizable

 * customizable

 

 


> Make the JDBC Connection more extensible like the FrameworkConfig API
> -
>
> Key: CALCITE-3069
> URL: https://issues.apache.org/jira/browse/CALCITE-3069
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> More and more users are interested in building custom SQL engines on top of 
> Calcite.
> But different SQL engines differ in SQL parsing, expression conversions, 
> implicit type casting, and even the physical implementations of the logical 
> plan.
> I think the FrameworkConfig API now provides a better way than the JDBC 
> Connection to customize these things. Are there any plans on the roadmap to 
> make the JDBC Connection config as extensible as the FrameworkConfig API, 
> to improve Calcite's extensibility?
> Otherwise, implementing a whole physical plan like the default Enumerable 
> implementation is tedious and requires a lot of work. Maybe we can do 
> something to make the physical and execution plan (that is, the code 
> generation) more customizable.
> Are there any thoughts on this issue?
>  





[jira] [Created] (CALCITE-3069) Make the JDBC Connection more extensible like the FrameworkConfig API

2019-05-15 Thread Lai Zhou (JIRA)
Lai Zhou created CALCITE-3069:
-

 Summary: Make the JDBC Connection more extensible like the 
FrameworkConfig API
 Key: CALCITE-3069
 URL: https://issues.apache.org/jira/browse/CALCITE-3069
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.19.0
Reporter: Lai Zhou


More and more users are interested in building custom SQL engines on top of 
Calcite.

But different SQL engines differ in SQL parsing, expression conversions, 
implicit type casting, and even the physical implementations of the logical 
plan.

I think the FrameworkConfig API now provides a better way than the JDBC 
Connection to customize these things. Are there any plans on the roadmap to 
make the JDBC Connection config as extensible as the FrameworkConfig API, to 
improve Calcite's extensibility?

Otherwise, implementing a whole physical plan like the default Enumerable 
implementation is tedious and requires a lot of work. Maybe we can do 
something to make the physical and execution plan (that is, the code 
generation) more customizable.

Are there any thoughts on this issue?

 

 





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-15 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840251#comment-16840251
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~rubenql], good analysis. I tested this solution, but there are still some 
failing tests; the report:
{code:java}
[ERROR] Tests run: 5018, Failures: 47, Errors: 7, Skipped: 115
[ERROR] Errors:
[ERROR] LatticeSuggesterTest.testEmpDept:76 » IndexOutOfBounds index (8) must be less ...
[ERROR] LatticeSuggesterTest.testExpressionInAggregate:272 » IndexOutOfBounds index (3...
[ERROR] LatticeSuggesterTest.testFoodMartAll:389->checkFoodMartAll:301 » IndexOutOfBounds
[ERROR] LatticeSuggesterTest.testFoodMartAllEvolve:393->checkFoodMartAll:301 » IndexOutOfBounds
[ERROR] LatticeSuggesterTest.testFoodmart:153 » IndexOutOfBounds index (17) must be le...
[ERROR] LatticeSuggesterTest.testSharedSnowflake:264 » IndexOutOfBounds index (31) mus...
[ERROR] MaterializationTest.testJoinMaterialization9:1825->checkMaterialize:202->checkMaterialize:210 » SQL
{code}
Checking LatticeSuggesterTest.testSharedSnowflake, I found that the 
!join.analyzeCondition().isEqui() check did harm to this query.

If I keep the line as
{code:java}
!(join instanceof EquiJoin)
{code}
almost all of the reported failing tests pass, except 
MaterializationTest.testJoinMaterialization9. You can change this line to 
find more details. I think this modification is not safe.

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query over a large dataset (such as 1*1), the 
> nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-15 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840190#comment-16840190
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~rubenql], thanks, I understand it.

When creating a SemiJoin from an EnumerableJoin, the remainCondition is lost. 
Now it comes back to my previous question: should we define the 
EnumerableJoin as an EquiJoin or as a pure Join? If it's an EquiJoin, the 
condition contains just the equi part.

If we change the EnumerableJoin to a pure Join, it will cause some other 
problems, such as the FilterJoinRule not working.

My initial solution is to introduce an EnumerableThetaHashJoin to handle 
non-inner joins that contain a remainCondition. EnumerableThetaHashJoin is 
more like EnumerableThetaJoin, which is a Join rather than an EquiJoin, and 
EnumerableThetaHashJoin and Enumerable(Hash)Join can share the same hash join 
algorithm.

I think this solution is clearer and will do no harm to the current rules.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query over a large dataset (such as 1*1), the 
> nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.





[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-15 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965
 ] 

Lai Zhou edited comment on CALCITE-2973 at 5/15/19 7:26 AM:


[~rubenql], now an inner join with a remainCondition won't be converted to an 
inner join plus a filter; the Enumerable(Hash)Join can handle it in a generic 
way.

But some tests fail; removing the filter may have introduced a bug. I'll 
check the problems.

[~zabetak], [~rubenql]

I found a problem with inner joins; please check out the current codebase and 
see the test case JdbcAdapterTest.testScalarSubQuery, for the non-equi join 
SQL:
{code:java}
CalciteAssert.model(JdbcTest.SCOTT_MODEL)
    .query("SELECT COUNT(ename) AS cEname FROM \"SCOTT\".\"EMP\" "
        + "WHERE DEPTNO > (SELECT deptno FROM \"SCOTT\".\"DEPT\" "
        + "WHERE dname = 'ACCOUNTING')")
    .enable(CalciteAssert.DB == CalciteAssert.DatabaseInstance.HSQLDB)
    .returns("CENAME=11\n");
{code}
Before, the generated plan was:
{code:java}
EnumerableAggregate(group=[{}], CENAME=[COUNT($1)])
  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[>($t2, $t0)], proj#0..2=[{exprs}], $condition=[$t3])
    EnumerableJoin(condition=[true], joinType=[inner], remainCondition=[>($2, $0)])
      EnumerableAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
        JdbcToEnumerableConverter
          JdbcFilter(condition=[=($1, 'ACCOUNTING')])
            JdbcTableScan(table=[[SCOTT, DEPT]])
      JdbcToEnumerableConverter
        JdbcProject(ENAME=[$1], DEPTNO=[$7])
          JdbcTableScan(table=[[SCOTT, EMP]])
{code}
 

After replacing the filter with a remainCondition, the planner finds a 
semi-join based plan as the best plan, but it is a bad plan:
{code:java}
EnumerableAggregate(group=[{}], CENAME=[COUNT($0)])
  EnumerableSemiJoin(condition=[true], joinType=[inner])
    JdbcToEnumerableConverter
      JdbcProject(ENAME=[$1], DEPTNO=[$7])
        JdbcTableScan(table=[[SCOTT, EMP]])
    JdbcToEnumerableConverter
      JdbcFilter(condition=[=($1, 'ACCOUNTING')])
        JdbcTableScan(table=[[SCOTT, DEPT]])
{code}
The condition of this logical SemiJoin was simply true.

Could you help me identify this problem?
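
For context on what the remainCondition enables: a hash join can still be used when only part of the join condition is an equi comparison — build the hash table on the equi keys, then evaluate the residual theta predicate per matching pair. A minimal standalone sketch, not Calcite code (names and row shapes are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class HashJoinSketch {
  // Joins rows on an equi key (column 0), then applies a residual theta
  // predicate per matching pair, mirroring how a remainCondition is checked
  // inside the hash join instead of in a separate filter.
  static List<int[]> join(List<int[]> left, List<int[]> right,
      BiPredicate<int[], int[]> remainCondition) {
    Map<Integer, List<int[]>> buildSide = new HashMap<>();
    for (int[] r : right) {
      buildSide.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
    }
    List<int[]> out = new ArrayList<>();
    for (int[] l : left) {
      for (int[] r : buildSide.getOrDefault(l[0], Collections.emptyList())) {
        if (remainCondition.test(l, r)) {       // residual non-equi check
          out.add(new int[] {l[0], l[1], r[1]});
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> emp = Arrays.asList(new int[]{10, 5}, new int[]{10, 1}, new int[]{20, 7});
    List<int[]> dept = Arrays.asList(new int[]{10, 3}, new int[]{20, 9});
    // equi key: column 0; residual theta condition: left column 1 > right column 1
    List<int[]> result = join(emp, dept, (l, r) -> l[1] > r[1]);
    System.out.println(result.size()); // prints 1: only {10,5} x {10,3} passes the residual
  }
}
```

Only the hash probe on the equi key is avoided work; the residual predicate still runs once per key-matching pair.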

 


was (Author: hhlai1990):
[~rubenql], now the inner join with a remainCondtion won't be converted to an 
inner-join plus a filter , the Enumerable(Hash)Join can handle it in a generic 
way.

But some of tests are failed, may introduce a bug after dropping the filter. 
I'll check the problems .

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1),
> the nested-loop join process will take dozens of times longer than the sort-merge
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-14 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965
 ] 

Lai Zhou edited comment on CALCITE-2973 at 5/15/19 3:35 AM:


[~rubenql], now the inner join with a remainCondition won't be converted to an 
inner join plus a filter; the Enumerable(Hash)Join can handle it in a generic 
way.

But some tests fail; dropping the filter may have introduced a bug. 
I'll look into the problems.


was (Author: hhlai1990):
[~rubenql], now the inner join with a remainCondtion won't be converted to an 
inner-join plus a filter , the Enumerable(Hash)Join can handle it in a generic 
way.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1),
> the nested-loop join process will take dozens of times longer than the sort-merge
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will
> improve the performance greatly.





[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-14 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965
 ] 

Lai Zhou edited comment on CALCITE-2973 at 5/15/19 3:13 AM:


[~rubenql], now the inner join with a remainCondition won't be converted to an 
inner join plus a filter; the Enumerable(Hash)Join can handle it in a generic 
way.


was (Author: hhlai1990):
[~rubenql], now the inner join with a remainCondtion won't be converted to an 
inner-join and a filter , the Enumerable(Hash)Join can handle it.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1),
> the nested-loop join process will take dozens of times longer than the sort-merge
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will
> improve the performance greatly.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-14 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~rubenql], now the inner join with a remainCondition won't be converted to an 
inner join and a filter; the Enumerable(Hash)Join can handle it.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1),
> the nested-loop join process will take dozens of times longer than the sort-merge
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will
> improve the performance greatly.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-13 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838611#comment-16838611
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~rubenql], I agree with you. It's a good idea to use approach 1 to handle 
the inner join case.

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1),
> the nested-loop join process will take dozens of times longer than the sort-merge
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will
> improve the performance greatly.





[jira] [Commented] (CALCITE-2173) Sample implementation of ArrowAdapter

2019-05-13 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838403#comment-16838403
 ] 

Lai Zhou commented on CALCITE-2173:
---

[~masayuki038], I'll take some time to review your code; I'm only just 
beginning to get familiar with Arrow. (y)

 

> Sample implementation of ArrowAdapter
> -
>
> Key: CALCITE-2173
> URL: https://issues.apache.org/jira/browse/CALCITE-2173
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Masayuki Takahashi
>Priority: Minor
>
> I try to implement an Apache Arrow adapter.
> [https://github.com/masayuki038/calcite/tree/arrow2/arrow/src/main/java/org/apache/calcite/adapter/arrow]
> Issues:
> * Add ArrowJoin, ArrowUnion, etc.
> * This Arrow adapter uses org.apache.calcite.adapter.enumerable.PhysTypeImpl, 
> so I have added an 'of' method on PhysType to create a PhysTypeImpl instance, since 
> it can't be accessed from the arrow package.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-12 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838246#comment-16838246
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~zabetak], for the query you mentioned,
{code:java}
SELECT e.name FROM emp e
INNER JOIN department d ON e.address.zipcode = d.zipcode
{code}
I added a test for it, and I found that the RexFieldAccess `e.address.zipcode` is 
converted to a new RexInputRef by JoinPushExpressionsRule;

see 
[https://github.com/apache/calcite/blob/6afa38bae794462e6e250237a1b60cc4220b2885/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L3290].

Please see the latest commit; there's a test named 
`leftOuterJoinWithPredicateContainsRexFieldAccess` in EnumerableJoinTest.

I admit the rule-based approach you proposed is also good for this issue, but I 
still think it's a little complicated, and it seems to increase the overhead of 
computation if we introduce a new projection.
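
The rule-based alternative discussed here can be pictured outside Calcite: push the nested field access below the join as an explicit projection, so the join condition afterwards references only a flat column (a plain RexInputRef, in Calcite terms). A hypothetical sketch — names like `joinOnZipcode` are invented for illustration:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class FieldAccessPushdownSketch {
  // Rows are modeled as name -> nested-attribute map; "e.address.zipcode" is a
  // nested field access. Flattening zipcode into its own column first means the
  // join condition only needs a plain column reference afterwards.
  static List<String> joinOnZipcode(Map<String, Map<String, String>> emps,
      Set<String> deptZipcodes) {
    return emps.entrySet().stream()
        // the "projection" pushed below the join: flatten the nested field
        .map(e -> new String[] {e.getKey(), e.getValue().get("zipcode")})
        // the join condition now tests the projected flat column only
        .filter(row -> deptZipcodes.contains(row[1]))
        .map(row -> row[0])
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> emps = new HashMap<>();
    emps.put("alice", Map.of("zipcode", "94105"));
    emps.put("bob", Map.of("zipcode", "10001"));
    System.out.println(joinOnZipcode(emps, Set.of("94105"))); // prints [alice]
  }
}
```

The extra projection is the overhead the comment above worries about: every left row is materialized with the flattened column whether or not it survives the join.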

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1),
> the nested-loop join process will take dozens of times longer than the sort-merge
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will
> improve the performance greatly.





[jira] [Commented] (CALCITE-2173) Sample implementation of ArrowAdapter

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836862#comment-16836862
 ] 

Lai Zhou commented on CALCITE-2173:
---

[~masayuki038], I know what you mean. This Arrow adapter is just an adapter that 
connects to data sources in the Arrow format.

As you said, executing the query in parallel is a good way to improve 
performance.

Another way is to organize the in-memory data in a columnar format and enable 
vectorized expression execution.

Both of the above ways require rewriting the implementations of the operators.
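
As a rough illustration of why the columnar layout helps — this is plain Java, not the Arrow API: a predicate evaluated over a primitive column array runs a tight loop over contiguous memory, which is the access pattern a vectorized operator would use.

```java
public class ColumnarFilterSketch {
  // Column-at-a-time evaluation of "value > threshold". The loop iterates a
  // primitive array (one column stored contiguously), avoiding the per-row
  // object traversal a row-oriented operator would do; an Arrow-backed
  // implementation would iterate a vector buffer in the same fashion.
  static int countGreaterColumnar(long[] column, long threshold) {
    int count = 0;
    for (long v : column) {
      if (v > threshold) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long[] ids = {5, 150, 42, 300, 99};
    System.out.println(countGreaterColumnar(ids, 100)); // prints 2
  }
}
```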

> Sample implementation of ArrowAdapter
> -
>
> Key: CALCITE-2173
> URL: https://issues.apache.org/jira/browse/CALCITE-2173
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Masayuki Takahashi
>Priority: Minor
>
> I try to implement an Apache Arrow adapter.
> [https://github.com/masayuki038/calcite/tree/arrow2/arrow/src/main/java/org/apache/calcite/adapter/arrow]
> Issues:
> * Add ArrowJoin, ArrowUnion, etc.
> * This Arrow adapter uses org.apache.calcite.adapter.enumerable.PhysTypeImpl, 
> so I have added an 'of' method on PhysType to create a PhysTypeImpl instance, since 
> it can't be accessed from the arrow package.





[jira] [Commented] (CALCITE-2173) Sample implementation of ArrowAdapter

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836434#comment-16836434
 ] 

Lai Zhou commented on CALCITE-2173:
---

[~masayuki038], are you still working on this?

What's your recent plan? I'd be glad to spend some time on this issue with you.

If we have the Arrow adapter, we can support vectorized UDF execution; I'd like 
to see the performance improvement.

 

> Sample implementation of ArrowAdapter
> -
>
> Key: CALCITE-2173
> URL: https://issues.apache.org/jira/browse/CALCITE-2173
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Masayuki Takahashi
>Priority: Minor
>
> I try to implement an Apache Arrow adapter.
> [https://github.com/masayuki038/calcite/tree/arrow2/arrow/src/main/java/org/apache/calcite/adapter/arrow]
> Issues:
> * Add ArrowJoin, ArrowUnion, etc.
> * This Arrow adapter uses org.apache.calcite.adapter.enumerable.PhysTypeImpl, 
> so I have added an 'of' method on PhysType to create a PhysTypeImpl instance, since 
> it can't be accessed from the arrow package.





[jira] [Commented] (CALCITE-2040) Create adapter for Apache Arrow

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836422#comment-16836422
 ] 

Lai Zhou commented on CALCITE-2040:
---

[~masayuki038]

Great job. I added a relation link to 
https://issues.apache.org/jira/browse/CALCITE-2173 .
I think it's more than an `adapter` for Calcite; it may be a new physical 
implementation, like the default Enumerable implementation.
 

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would 
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in 
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, 
> say, CSV files using the file adapter: an Arrow data set does not have a URL. 
> (Unless we use Arrow's 
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
>  format, or use an in-memory file system such as Alluxio.) So we would need 
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it 
> would also be good to have Arrow as a calling convention. That is, 
> implementations of relational operators such as Filter, Project, Aggregate in 
> addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file 
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
> CALCITE-2025) it would make a lot of sense to translate those formats 
> directly into Arrow (applying simple projects and filters first if 
> applicable). Those adapters would belong as a "contrib" module in the Arrow 
> project better than in Calcite.





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836162#comment-16836162
 ] 

Lai Zhou edited comment on CALCITE-2040 at 5/9/19 7:40 AM:
---

I think it may improve performance a lot if we have Arrow as a calling 
convention.

[~julianhyde], do you mean that a new kind of Enumerable implementation for Filter, 
Project, Aggregate and TableScan needs to be introduced?

I found someone did part of this on GitHub.

See 
[https://github.com/masayuki038/calcite-arrow-sample/blob/master/src/main/scala/net/wrap_trap/calcite_arrow_sample/ArrowTranslatableTable.scala]

It may be a good start.

I'm just getting familiar with Arrow. I'd be glad to have a try at making Arrow 
a calling convention in Calcite.


was (Author: hhlai1990):
I think it may improve a lot of performance  if we have Arrow as a calling 
convention.

[~julianhyde],Do you mean a new kind of  Enumerable-implementations for Filter, 
Project, Aggregate and TableScan need to be introduced ?

I'm just getting familiar with Arrow.I'm glad to have a try on making Arrow as 
a calling convention in Calcite.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would 
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in 
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, 
> say, CSV files using the file adapter: an Arrow data set does not have a URL. 
> (Unless we use Arrow's 
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
>  format, or use an in-memory file system such as Alluxio.) So we would need 
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it 
> would also be good to have Arrow as a calling convention. That is, 
> implementations of relational operators such as Filter, Project, Aggregate in 
> addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file 
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
> CALCITE-2025) it would make a lot of sense to translate those formats 
> directly into Arrow (applying simple projects and filters first if 
> applicable). Those adapters would belong as a "contrib" module in the Arrow 
> project better than in Calcite.





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836162#comment-16836162
 ] 

Lai Zhou edited comment on CALCITE-2040 at 5/9/19 7:35 AM:
---

I think it may improve performance a lot if we have Arrow as a calling 
convention.

[~julianhyde], do you mean that a new kind of Enumerable implementation for Filter, 
Project, Aggregate and TableScan needs to be introduced?

I'm just getting familiar with Arrow. I'd be glad to have a try at making Arrow 
a calling convention in Calcite.


was (Author: hhlai1990):
I think it may improve a lot of performance  if we have Arrow as a calling 
convention.

[~julianhyde],Do you mean a new kind of  Enumerable-implementations for Filter, 
Project, Aggregate and TableScan need to be introduced ?

I'm just getting familiar with Arrow.I will have a try to make Arrow as a 
calling convention.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would 
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in 
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, 
> say, CSV files using the file adapter: an Arrow data set does not have a URL. 
> (Unless we use Arrow's 
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
>  format, or use an in-memory file system such as Alluxio.) So we would need 
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it 
> would also be good to have Arrow as a calling convention. That is, 
> implementations of relational operators such as Filter, Project, Aggregate in 
> addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file 
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
> CALCITE-2025) it would make a lot of sense to translate those formats 
> directly into Arrow (applying simple projects and filters first if 
> applicable). Those adapters would belong as a "contrib" module in the Arrow 
> project better than in Calcite.





[jira] [Commented] (CALCITE-2040) Create adapter for Apache Arrow

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836162#comment-16836162
 ] 

Lai Zhou commented on CALCITE-2040:
---

I think it may improve performance a lot if we have Arrow as a calling 
convention.

[~julianhyde], do you mean that a new kind of Enumerable implementation for Filter, 
Project, Aggregate and TableScan needs to be introduced?

I'm just getting familiar with Arrow. I will have a try at making Arrow a 
calling convention.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would 
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in 
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, 
> say, CSV files using the file adapter: an Arrow data set does not have a URL. 
> (Unless we use Arrow's 
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
>  format, or use an in-memory file system such as Alluxio.) So we would need 
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it 
> would also be good to have Arrow as a calling convention. That is, 
> implementations of relational operators such as Filter, Project, Aggregate in 
> addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file 
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
> CALCITE-2025) it would make a lot of sense to translate those formats 
> directly into Arrow (applying simple projects and filters first if 
> applicable). Those adapters would belong as a "contrib" module in the Arrow 
> project better than in Calcite.





[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-05-06 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833649#comment-16833649
 ] 

Lai Zhou edited comment on CALCITE-2741 at 5/6/19 9:49 AM:
---

[~zabetak], I also think it is not exactly an adapter. My initial goal was to 
build a real-time, high-performance in-memory SQL engine that supports Hive SQL 
dialects on top of Calcite.

I tried the JDBC interface first, but I encountered some issues:
 # Custom config issue: for every JDBC connection we need to put the data of 
the current session into the schema, which means the current schema is bound to 
the current session.

So the static SchemaFactory can't cover this; we need to introduce DDL 
functions like those in the calcite-server module. The SqlDdlNodes in the 
calcite-server module populate the table through the FrameworkConfig API.

When we execute a SQL statement like
{code:java}
create table t1 as select * from t2 where t2.id > 100
{code}
the populate method is invoked; see 
[SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]. 
We need to customize the FrameworkConfig here, including the 
OperatorTable, SqlConformance and other custom configs. By the way, the 
FrameworkConfig should be built with all the configs from the current 
CalcitePrepare.Context rather than only the rootSchema; that was a bug.

And the config options of CalcitePrepare.Context are just a subset of 
FrameworkConfig; most of the time we need to use the FrameworkConfig API directly 
to build a new SQL engine.

When we execute a SQL statement like
{code:java}
select * from t2 where t2.id > 100
{code}
CalcitePrepareImpl handles this SQL flow and does a similar thing, but 
some configs are hard-coded, such as the RexExecutor and Programs.

When implementing the EnumerableRel, the RelImplementor might also need to be 
customized; see the example 
[HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java].

The JDBC interface doesn't provide a way to customize these configs, so we 
proposed a new Table API, inspired by Apache Flink, to simplify the usage 
of Calcite when building a new SQL engine.

      2. Cache issue: it's not easy to cache the whole SQL plan if we use the JDBC 
interface to handle a query, due to its multi-phase processing flow, but it 
is very easy to do this with the Table API; see 
[TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412].

Summary:

The proposed Table API makes it easy to configure the SQL engine and to cache the 
whole SQL plan to improve query performance. It fits scenarios that satisfy 
these conditions:

the data sources are deterministic and already in memory, and there is no 
computation that needs to be pushed down;

-the SQL queries are deterministic, without dynamic parameters, so the whole-SQL-plan 
cache will be helpful (we can also use placeholders in the execution plan 
to cache dynamic queries).-
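
The plan-cache idea can be sketched generically: key a compiled plan by its SQL text and reuse it across executions, which only pays off when the same statement repeats verbatim. The names below (`PlanCacheSketch`, `getOrCompile`) are hypothetical placeholders, not the actual Table API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PlanCacheSketch {
  // Caches "compiled plans" keyed by SQL text; computeIfAbsent compiles once
  // per distinct statement, so repeated identical queries (no dynamic
  // parameters) skip parsing/planning entirely.
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  String getOrCompile(String sql, Function<String, String> compiler) {
    return cache.computeIfAbsent(sql, compiler);
  }

  public static void main(String[] args) {
    PlanCacheSketch planCache = new PlanCacheSketch();
    int[] compilations = {0};
    Function<String, String> compiler = sql -> {
      compilations[0]++;                  // count real compilations
      return "plan:" + sql.hashCode();    // stand-in for a compiled plan
    };
    planCache.getOrCompile("select * from t2 where t2.id > 100", compiler);
    planCache.getOrCompile("select * from t2 where t2.id > 100", compiler);
    System.out.println(compilations[0]); // prints 1: second call hit the cache
  }
}
```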


was (Author: hhlai1990):
[~zabetak],I also think it was not exactly an adapter. My initial goal was to

build a real-time/high-performance in memory sql engine that supports hive sql 
dialects on top of Calcite.

I had a try to use the JDBC interface first, but I encountered some issues:
 # custom config issue:  For every JDBC connection, we need put the data of 
current session into the schema, it means that current schema is bound to 
current session.

So the static SchemaFactory can't work out for this, we need introduce the DDL 
functions like what was in calcite-server module. The SqlDdlNodes in 

calcite-server module would populate the table through FrameworkConfig API .

When we execute a sql like 
{code:java}
create table t1 as select * from t2 where t2.id>100{code}
the populate method will be invoked,see  
[SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]
 . We need custom the FrameworkConfig here, include 
OperatorTable,SqlConformance and more other custom configs. By the way, the 
FrameworkConfig should be builded with all the configs from current 
CalcitePrepare.Context rather than only the rootSchema , it was a bug.

And the config options of CalcitePrepare.Context was just a subset of 
FrameworkConfig, most of the time we need use the FrameworkConfig API directly 
to build a new sql engine.

When we execute a sql like 
{code:java}
select * from t2 where t2.id>100

{code}
CalcitePrepareImpl would handle this sql flow, it did the similar thing, but 
some configs are hard coded , such as RexExecutor,Programs.

When implementing the EnumerableRel, the 

[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-05-06 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833649#comment-16833649
 ] 

Lai Zhou edited comment on CALCITE-2741 at 5/6/19 9:36 AM:
---

[~zabetak], I also think it is not exactly an adapter. My initial goal was to 
build a real-time, high-performance in-memory SQL engine that supports Hive SQL 
dialects on top of Calcite.

I tried the JDBC interface first, but I encountered some issues:
 # Custom config issue: for every JDBC connection we need to put the data of 
the current session into the schema, which means the current schema is bound to 
the current session.

So the static SchemaFactory can't cover this; we need to introduce DDL 
functions like those in the calcite-server module. The SqlDdlNodes in the 
calcite-server module populate the table through the FrameworkConfig API.

When we execute a SQL statement like
{code:java}
create table t1 as select * from t2 where t2.id > 100
{code}
the populate method is invoked; see 
[SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]. 
We need to customize the FrameworkConfig here, including the 
OperatorTable, SqlConformance and other custom configs. By the way, the 
FrameworkConfig should be built with all the configs from the current 
CalcitePrepare.Context rather than only the rootSchema; that was a bug.

And the config options of CalcitePrepare.Context are just a subset of 
FrameworkConfig; most of the time we need to use the FrameworkConfig API directly 
to build a new SQL engine.

When we execute a SQL statement like
{code:java}
select * from t2 where t2.id > 100
{code}
CalcitePrepareImpl handles this SQL flow and does a similar thing, but 
some configs are hard-coded, such as the RexExecutor and Programs.

When implementing the EnumerableRel, the RelImplementor might also need to be 
customized; see the example 
[HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java].

The JDBC interface doesn't provide a way to customize these configs, so we 
proposed a new Table API, inspired by Apache Flink, to simplify the usage 
of Calcite when building a new SQL engine.

      2. Cache issue: it's not easy to cache the whole SQL plan if we use the JDBC 
interface to handle a query, due to its multi-phase processing flow, but it 
is very easy to do this with the Table API; see 
[TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412].

Summary:

The proposed Table API makes it easy to configure the SQL engine and to cache the 
whole SQL plan to improve query performance. It fits scenarios that satisfy 
these conditions:

the data sources are deterministic and already in memory, and there is no 
computation that needs to be pushed down;

the SQL queries are deterministic, without dynamic parameters, so the whole-SQL-plan 
cache will be helpful (we can also use placeholders in the execution plan 
to cache dynamic queries).


was (Author: hhlai1990):
[~zabetak],I also think it was not exactly an adapter. My initial goal was to

build a real-time/high-performance in memory sql engine that supports hive sql 
dialects on top of Calcite.

I had a try to use the JDBC interface first, but I encountered some issues:
 # custom config issue:  For every JDBC connection, we need put the data of 
current session into the schema, it means that current schema is bound to 
current session.

So the static SchemaFactory can't work out for this, we need introduce the DDL 
functions like what was in calcite-server module. The SqlDdlNodes in 

calcite-server module would populate the table through FrameworkConfig API .

When we execute a sql like 
{code:java}
create table t1 as select * from t2 where t2.id>100{code}
the populate method will be invoked,see  
[SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]
 . We need custom the FrameworkConfig here, include 
OperatorTable,SqlConformance and more other custom configs. By the way, the 
FrameworkConfig should be builded with all the configs from current 
CalcitePrepare.Context rather than only the rootSchema , it was a bug.

And the config options of CalcitePrepare.Context was just a subset of 
FrameworkConfig, most of the time we need use the FrameworkConfig API directly 
to build a new sql engine.

When we execute a sql like 
{code:java}
select * from t2 where t2.id>100

{code}
CalcitePrepareImpl would handle this sql flow, it did the similar thing, but 
some configs are hard coded , such as RexExecutor,Programs.

When implementing the EnumerableRel, the RelImplementor 

[jira] [Commented] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-05-06 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833649#comment-16833649
 ] 

Lai Zhou commented on CALCITE-2741:
---

[~zabetak], I also think it was not exactly an adapter. My initial goal was to 
build a real-time, high-performance in-memory SQL engine that supports Hive SQL 
dialects on top of Calcite.

I tried the JDBC interface first, but I ran into some issues:
 # custom config issue: for every JDBC connection, we need to put the data of 
the current session into the schema, which means the current schema is bound to 
the current session.

So a static SchemaFactory doesn't work here; we need to introduce DDL functions 
like those in the calcite-server module. The SqlDdlNodes class in the 
calcite-server module populates the table through the FrameworkConfig API.

When we execute a SQL statement like
{code:java}
create table t1 as select * from t2 where t2.id>100{code}
the populate method will be invoked; see 
[SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221].
We need to customize the FrameworkConfig here, including the OperatorTable, 
SqlConformance and other custom configs. By the way, the FrameworkConfig should 
be built with all the configs from the current CalcitePrepare.Context rather 
than only the rootSchema; that was a bug.

And the config options of CalcitePrepare.Context are just a subset of 
FrameworkConfig's; most of the time we need to use the FrameworkConfig API 
directly to build a new SQL engine.

When we execute a SQL statement like
{code:java}
select * from t2 where t2.id>100

{code}
CalcitePrepareImpl handles this SQL flow and does something similar, but some 
configs are hard-coded, such as the RexExecutor and Programs.

When implementing an EnumerableRel, the RelImplementor might also need to be 
customized; see the example 
[HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java].

The JDBC interface currently doesn't provide a way to customize these configs, 
so we proposed a new Table API, inspired by Apache Flink, to simplify the use 
of Calcite when building a new SQL engine. 

      2. cache issue: it's not easy to cache the whole SQL plan if we use the 
JDBC interface to handle a query, due to its multi-phase processing flow, but 
it is very easy to do with the Table API; see 
[TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412].

summary:

The proposed Table API makes it easy to configure the SQL engine and cache the 
whole SQL plan to improve query performance. It fits scenarios that satisfy 
these conditions:

the data sources are deterministic and already in memory, and no computation 
needs to be pushed down;

the SQL queries are deterministic, so caching the whole SQL plan is helpful;
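Since lookup keys on the SQL text, the whole-plan cache described above can be sketched in plain Java. This is a minimal sketch, not TableEnv's actual code: `Plan` and `compile` below are hypothetical stand-ins for the real parse/validate/plan pipeline.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a whole-plan cache keyed by SQL text: compile once, reuse the
// plan for every later identical query. `Plan` and `compile` are
// hypothetical stand-ins for the real parse/validate/plan pipeline.
public class PlanCache {
    record Plan(String sql) { }

    private final Map<String, Plan> cache = new ConcurrentHashMap<>();
    final AtomicInteger compilations = new AtomicInteger();

    // Stands in for parsing, validating and planning the query.
    Plan compile(String sql) {
        compilations.incrementAndGet();
        return new Plan(sql);
    }

    // Returns the cached plan, compiling only on the first request.
    public Plan getPlan(String sql) {
        return cache.computeIfAbsent(sql, this::compile);
    }

    public static void main(String[] args) {
        PlanCache env = new PlanCache();
        env.getPlan("select * from t2 where t2.id > 100");
        env.getPlan("select * from t2 where t2.id > 100");  // cache hit
        System.out.println("compilations = " + env.compilations.get());  // 1
    }
}
```

This only pays off under the conditions stated above: deterministic queries whose identical SQL text maps to one reusable plan.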


> Add operator table with Hive-specific built-in functions
> 
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> I write a hive adapter for calcite to support Hive sql ,includes 
> UDF、UDAF、UDTF and some of SqlSpecialOperator.
> How do you think of supporting a direct implemention of hive sql like this?
> I think it will be valuable when someone want to migrate his hive etl jobs to 
> real-time scene.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-05-04 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2741:
--
Affects Version/s: 1.19.0
  Description: 
I wrote a Hive adapter for Calcite to support Hive SQL, including UDFs, UDAFs, 
UDTFs and some SqlSpecialOperators.

What do you think of supporting a direct implementation of Hive SQL like this?

I think it will be valuable when someone wants to migrate their Hive ETL jobs 
to a real-time scenario.

  was:
[~julianhyde],

I write a hive adapter for calcite to support Hive sql ,includes UDF、UDAF、UDTF 
and some of SqlSpecialOperator.

How do you think of supporting a direct implemention of hive sql like this?

I think it will be valueable when someone want to migrate his hive etl jobs to 
real-time scene.


> Add operator table with Hive-specific built-in functions
> 
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> I write a hive adapter for calcite to support Hive sql ,includes 
> UDF、UDAF、UDTF and some of SqlSpecialOperator.
> How do you think of supporting a direct implemention of hive sql like this?
> I think it will be valuable when someone want to migrate his hive etl jobs to 
> real-time scene.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830001#comment-16830001
 ] 

Lai Zhou edited comment on CALCITE-2282 at 5/1/19 2:01 AM:
---

[~zhztheplayer] ,thanks, you're right. " That said, you can put a operator with 
same NAME, KIND to your own table, then validator will use it to replace the 
original one"

It really works. I don't need to rewrite the Parser.jj to replace DIVIDE. I 
forgot to mention my new solution for DIVIDE in my last comment.

Here is my code:

 
{code:java}
// For a binary operator (e.g. DIVIDE), keep the same syntax class:
newOp = new SqlBinaryOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);

// For a prefix operator, likewise keep SqlPrefixOperator:
newOp = new SqlPrefixOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);
{code}
If we put in an operator with the same NAME, KIND and SqlSyntax to replace the 
original one, we'd better keep the same class (`SqlBinaryOperator` or 
`SqlPrefixOperator`), so I introduced a new constructor for each of them to 
construct the operator.

As [~julianhyde] said, "Another technique could be a visitor that walks over 
expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." That also works.

Thanks.
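The visitor technique can be sketched outside Calcite with a minimal expression tree. In Calcite itself this would be a RexShuttle overriding visitCall; all type and operator names below are hypothetical stand-ins, not Calcite API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-ins for RexNode/RexCall; the names are hypothetical.
interface Expr { }
record Lit(double value) implements Expr { }
record Call(String op, List<Expr> operands) implements Expr { }

public class OperatorReplacer {
    // Walks the tree and swaps one operator name for another, analogous to
    // a RexShuttle replacing Calcite's DIVIDE with Hive's DIVIDE.
    static Expr replace(Expr e, String from, String to) {
        if (e instanceof Call c) {
            List<Expr> newOperands = c.operands().stream()
                .map(o -> replace(o, from, to))
                .collect(Collectors.toList());
            return new Call(c.op().equals(from) ? to : c.op(), newOperands);
        }
        return e;  // literals and other leaves pass through unchanged
    }

    public static void main(String[] args) {
        Expr tree = new Call("DIVIDE", List.of(
            new Lit(1),
            new Call("DIVIDE", List.of(new Lit(4), new Lit(2)))));
        Call rewritten = (Call) replace(tree, "DIVIDE", "HIVE_DIVIDE");
        System.out.println(rewritten.op());  // HIVE_DIVIDE
    }
}
```

The rewrite happens after parsing, so no Parser.jj change is needed; only the operator identity in the expression tree changes.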


was (Author: hhlai1990):
[~zhztheplayer] ,thanks, you're right. " That said, you can put a operator with 
same NAME, KIND to your own table, then validator will use it to replace the 
original one"

It really works. I don't need to rewrite the Parser.jj to replace DIVIDE. I 
forgot my new solutions for DIVIDE in last comment.

Here is my code:

 
{code:java}
newOp = new SqlBinaryOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
newOp = new SqlPrefixOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);

{code}
If we  put an operator with same NAME, KIND to replace the original one, we'd 
better keep the same class `SqlBinaryOperator` or `SqlPrefixOperator`. So I 
introduced a new constructor

for them to construct the Operator.

as [~julianhyde] said, "Another technique could be a visitor that walks over 
expressions and replaces Calcite's DIVIDE with Hive's DIVIDE."  It also works .

Thanks.

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-04-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830061#comment-16830061
 ] 

Lai Zhou edited comment on CALCITE-2741 at 4/30/19 9:40 AM:


hi [~julianhyde], [~zabetak], [~zhztheplayer], [~hyuan], [~francischuang],

I created a new adapter for Calcite that supports Hive SQL queries on datasets.

Since the extensions are based on Calcite 1.18.0, I pushed the project to a 
new codebase: [https://github.com/51nb/marble]

And I proposed a Table API to make it easy to execute a SQL query.

We use it in our company's core financial business to unify the way we compute 
lots of model variables.

This project shows how we extended Calcite core to support Hive SQL queries; 
it may be helpful to people who want to build a customized SQL engine on top 
of Calcite.


was (Author: hhlai1990):
hi,[~julianhyde] , [~zabetak],[~zhztheplayer], [~hyuan],[~francischuang]

I create a new adapter of Calcite that support hive sql queries on dataset. 

Since the extensions is made base on Calcite 1.18.0, I pushed the project to a 
new codebase: [https://github.com/51nb/marble]

And I proposed a Table API to make it easy to execute a sql query.

We use it in our company's core financial business to unify the way to compute 
lots of model variables .

This project shows how we extend Calcite core to support hive sql queries, it 
may be helpful to people 

who wants to build a customized sql engine on top of Calcite.

> Add operator table with Hive-specific built-in functions
> 
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Lai Zhou
>Priority: Minor
>
> [~julianhyde],
> I write a hive adapter for calcite to support Hive sql ,includes 
> UDF、UDAF、UDTF and some of SqlSpecialOperator.
> How do you think of supporting a direct implemention of hive sql like this?
> I think it will be valueable when someone want to migrate his hive etl jobs 
> to real-time scene.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-04-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830061#comment-16830061
 ] 

Lai Zhou commented on CALCITE-2741:
---

hi [~julianhyde], [~zabetak], [~zhztheplayer], [~hyuan], [~francischuang],

I created a new adapter for Calcite that supports Hive SQL queries on datasets.

Since the extensions are based on Calcite 1.18.0, I pushed the project to a 
new codebase: [https://github.com/51nb/marble].

And I proposed a Table API to make it easy to execute a SQL query.

We use it in our company's core financial business to unify the way we compute 
lots of model variables.

This project shows how we extended Calcite core to support Hive SQL queries; 
it may be helpful to people who want to build a customized SQL engine on top 
of Calcite.

> Add operator table with Hive-specific built-in functions
> 
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Lai Zhou
>Priority: Minor
>
> [~julianhyde],
> I write a hive adapter for calcite to support Hive sql ,includes 
> UDF、UDAF、UDTF and some of SqlSpecialOperator.
> How do you think of supporting a direct implemention of hive sql like this?
> I think it will be valueable when someone want to migrate his hive etl jobs 
> to real-time scene.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

2019-04-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830061#comment-16830061
 ] 

Lai Zhou edited comment on CALCITE-2741 at 4/30/19 8:30 AM:


hi [~julianhyde], [~zabetak], [~zhztheplayer], [~hyuan], [~francischuang],

I created a new adapter for Calcite that supports Hive SQL queries on datasets.

Since the extensions are based on Calcite 1.18.0, I pushed the project to a 
new codebase: [https://github.com/51nb/marble]

And I proposed a Table API to make it easy to execute a SQL query.

We use it in our company's core financial business to unify the way we compute 
lots of model variables.

This project shows how we extended Calcite core to support Hive SQL queries; 
it may be helpful to people who want to build a customized SQL engine on top 
of Calcite.


was (Author: hhlai1990):
hi,[~julianhyde] , [~zabetak],[~zhztheplayer], [~hyuan],[~francischuang]

I create a new adapter of Calcite that support hive sql queries on dataset. 

Since the extensions is made base on Calcite 1.18.0, I pushed the project to a 
new codebase. 

[[https://github.com/51nb/marble]|[https://github.com/51nb/marble]].

And I proposed a Table API to make it easy to execute a sql query.

We use it in our company's core financial business to unify the way to compute 
lots of model variables .

This project shows how we extend Calcite core to support hive sql queries, it 
may be helpful to people 

who wants to build a customized sql engine on top of Calcite.

> Add operator table with Hive-specific built-in functions
> 
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Lai Zhou
>Priority: Minor
>
> [~julianhyde],
> I write a hive adapter for calcite to support Hive sql ,includes 
> UDF、UDAF、UDTF and some of SqlSpecialOperator.
> How do you think of supporting a direct implemention of hive sql like this?
> I think it will be valueable when someone want to migrate his hive etl jobs 
> to real-time scene.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830001#comment-16830001
 ] 

Lai Zhou edited comment on CALCITE-2282 at 4/30/19 6:52 AM:


[~zhztheplayer] ,thanks, you're right. " That said, you can put a operator with 
same NAME, KIND to your own table, then validator will use it to replace the 
original one"

It really works. I don't need to rewrite the Parser.jj to replace DIVIDE. I 
forgot to mention my new solution for DIVIDE in my last comment.

Here is my code:

 
{code:java}
newOp = new SqlBinaryOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
newOp = new SqlPrefixOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);

{code}
If we put in an operator with the same NAME and KIND to replace the original 
one, we'd better keep the same class (`SqlBinaryOperator` or 
`SqlPrefixOperator`), so I introduced a new constructor for each of them to 
construct the operator.

As [~julianhyde] said, "Another technique could be a visitor that walks over 
expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." That also works.

Thanks.


was (Author: hhlai1990):
[~zhztheplayer] ,thanks, you're right. " That said, you can put a operator with 
same NAME, KIND to your own table, then validator will use it to replace the 
original one"

It really works. I don't need to rewrite the Parser.jj to replace DIVIDE. I 
forgot my new solutions for DIVIDE in last comment.

Here is my code:

 
{code:java}
newOp = new SqlBinaryOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
newOp = new SqlPrefixOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);

{code}
If we  put an operator with same NAME, KIND to replace the original one, we'd 
better keep the same class `SqlBinaryOperator` or `SqlPrefixOperator`. So I 
introduced a new constructor

for them to construct the Operator.

as [~julianhyde] said, "Another technique could be a visitor that walks over 
expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." can also work .

Thanks.

 

 

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-30 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830001#comment-16830001
 ] 

Lai Zhou commented on CALCITE-2282:
---

[~zhztheplayer] ,thanks, you're right. " That said, you can put a operator with 
same NAME, KIND to your own table, then validator will use it to replace the 
original one"

It really works. I don't need to rewrite the Parser.jj to replace DIVIDE. I 
forgot to mention my new solution for DIVIDE in my last comment.

Here is my code:

 
{code:java}
newOp = new SqlBinaryOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
newOp = new SqlPrefixOperator(upName,
operatorInSqlStdOperatorTable.getKind(),
operatorInSqlStdOperatorTable.getLeftPrec(),
operatorInSqlStdOperatorTable.getRightPrec(),
HiveSqlUDFReturnTypeInference.INSTANCE, null,
HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);

{code}
If we put in an operator with the same NAME and KIND to replace the original 
one, we'd better keep the same class (`SqlBinaryOperator` or 
`SqlPrefixOperator`), so I introduced a new constructor for each of them to 
construct the operator.

As [~julianhyde] said, "Another technique could be a visitor that walks over 
expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." That can also 
work.

Thanks.

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:30 AM:


[~zabetak], [~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
current SqlConformance to RexExecutorImpl when reducing expressions. But 
currently the caller, RexSimplify, doesn't have a SqlConformance.

Consider the simplifyCast of RexSimplify: in Hive SQL,
{code:java}
select cast('' as Decimal)
{code}
returns null, but in Calcite it throws an exception. I want to use a 
SqlConformance to customize the generated expression when simplifying a cast.

Since 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution, we can ignore it here.

 


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data source using 
different sql dialects meanwhile ? Do we really need this feature?

If just consider the problem of RexExecutorImpl here, we need pass the current 
SqlConformance

to RexExecutorImpl when reducing expressions.But now the caller  RexSimplify 
didn't have a

SqlConformance.

The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 did have a `TODO` solution, we can ignore it here .

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way.  To support different 
> SQL compatibility modes well, many place of current codebase is possible to 
> be modified.
> It will `drill a hole` to pass the SqlConformance config in the whole process 
> of  one sql query.
> May be we can put the SqlConformance config in ThreadLocal, avoiding pass it 
> frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
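The ThreadLocal approach suggested in the issue description could be sketched as below. This is a sketch under assumptions: `ConformanceHolder` and its `Conformance` enum are hypothetical stand-ins for SqlConformanceEnum, not Calcite API.

```java
// Sketch of a per-thread conformance setting: each query thread sees its own
// value without "drilling a hole" to pass the config through every call site.
// All names here are hypothetical stand-ins, not Calcite API.
public class ConformanceHolder {
    public enum Conformance { DEFAULT, HIVE }  // stand-in for SqlConformanceEnum

    private static final ThreadLocal<Conformance> CURRENT =
        ThreadLocal.withInitial(() -> Conformance.DEFAULT);

    public static void set(Conformance c) { CURRENT.set(c); }
    public static Conformance get() { return CURRENT.get(); }
    public static void clear() { CURRENT.remove(); }  // avoid leaks in thread pools

    public static void main(String[] args) throws InterruptedException {
        set(Conformance.HIVE);
        Conformance[] seenByOther = new Conformance[1];
        Thread other = new Thread(() -> seenByOther[0] = get());
        other.start();
        other.join();
        System.out.println("this thread:  " + get());           // HIVE
        System.out.println("other thread: " + seenByOther[0]);  // DEFAULT
        clear();
    }
}
```

One caveat of this design: with pooled threads the value must be cleared after each query, or a later query on the same thread inherits a stale conformance.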


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:31 AM:


[~zabetak], [~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
current SqlConformance to RexExecutorImpl when reducing expressions. But 
currently the caller, RexSimplify, doesn't have a SqlConformance.

Consider the simplifyCast of RexSimplify: in Hive SQL,
{code:java}
select cast('' as decimal)
{code}
returns null, but in Calcite it throws an exception. I want to use a 
SqlConformance to customize the generated expression when simplifying a cast.

Since 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution, we can ignore it here.

 


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data source using 
different sql dialects meanwhile ? Do we really need this feature?

If just consider the problem of RexExecutorImpl here, we need pass the current 
SqlConformance

to RexExecutorImpl when reducing expressions.But now the caller  RexSimplify 
didn't have a SqlConformance.

Consider the simplifyCast of RexSimplify, in Hive Sql 
{code:java}
select cast('' as Decimal)
{code}
it will return null, but in Calcite it will throw exception.  I want to use a 
SqlConformance to customize the generated expression when simplifing Cast.

 

Since the 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 did have a `TODO` solution, we can ignore it here .

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way.  To support different 
> SQL compatibility modes well, many place of current codebase is possible to 
> be modified.
> It will `drill a hole` to pass the SqlConformance config in the whole process 
> of  one sql query.
> May be we can put the SqlConformance config in ThreadLocal, avoiding pass it 
> frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:30 AM:


[~zabetak], [~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
current SqlConformance to RexExecutorImpl when reducing expressions. But 
currently the caller, RexSimplify, doesn't have a SqlConformance.

Consider the simplifyCast of RexSimplify: in Hive SQL,
{code:java}
select cast('' as Decimal)
{code}
returns null, but in Calcite it throws an exception. I want to use a 
SqlConformance to customize the generated expression when simplifying a cast.

Since 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution, we can ignore it here.

 


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data source using 
different sql dialects meanwhile ? Do we really need this feature?

If just consider the problem of RexExecutorImpl here, we need pass the current 
SqlConformance

to RexExecutorImpl when reducing expressions.But now the caller  RexSimplify 
didn't have a SqlConformance.

Consider the simplifyCast of RexSimplify, in Hive Sql 

 
{code:java}
select cast('' as Decimal)
{code}
will return null, but in Calcite it will throw exception.  I want to use a 
SqlConformance to customize the generated expression when simplifing Cast.

 

Since the 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 did have a `TODO` solution, we can ignore it here .

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way.  To support different 
> SQL compatibility modes well, many place of current codebase is possible to 
> be modified.
> It will `drill a hole` to pass the SqlConformance config in the whole process 
> of  one sql query.
> May be we can put the SqlConformance config in ThreadLocal, avoiding pass it 
> frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:21 AM:


[~zabetak], [~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
current SqlConformance to RexExecutorImpl when reducing expressions. But 
currently the caller, RexSimplify, doesn't have a SqlConformance.

Since 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution, we can ignore it here.

 


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
current SqlConformance to RexExecutorImpl when reducing expressions.

The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard-coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need 
> to be modified.
> Passing the SqlConformance config through the whole process of one SQL 
> query would `drill a hole` through many layers.
> Maybe we can put the SqlConformance config in a ThreadLocal, to avoid 
> passing it around frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:18 AM:


[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
current SqlConformance to RexExecutorImpl when reducing expressions.

The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions.

The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard-coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need 
> to be modified.
> Passing the SqlConformance config through the whole process of one SQL 
> query would `drill a hole` through many layers.
> Maybe we can put the SqlConformance config in a ThreadLocal, to avoid 
> passing it around frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:01 AM:


[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions.

The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution. So, may I create a PR to solve 
RexExecutorImpl's problem first?

 

 

 

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard-coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need 
> to be modified.
> Passing the SqlConformance config through the whole process of one SQL 
> query would `drill a hole` through many layers.
> Maybe we can put the SqlConformance config in a ThreadLocal, to avoid 
> passing it around frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:00 AM:


[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

 

 

 

 


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

 

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard-coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need 
> to be modified.
> Passing the SqlConformance config through the whole process of one SQL 
> query would `drill a hole` through many layers.
> Maybe we can put the SqlConformance config in a ThreadLocal, to avoid 
> passing it around frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:00 AM:


[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources using 
different SQL dialects at the same time? Do we really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources? Do we 
really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

 

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard-coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need 
> to be modified.
> Passing the SqlConformance config through the whole process of one SQL 
> query would `drill a hole` through many layers.
> Maybe we can put the SqlConformance config in a ThreadLocal, to avoid 
> passing it around frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou commented on CALCITE-3014:
---

[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources? Do we 
really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

 

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard-coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need 
> to be modified.
> Passing the SqlConformance config through the whole process of one SQL 
> query would `drill a hole` through many layers.
> Maybe we can put the SqlConformance config in a ThreadLocal, to avoid 
> passing it around frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-28 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840
 ] 

Lai Zhou edited comment on CALCITE-3014 at 4/28/19 6:42 AM:


[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources? Do we 
really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

 

 


was (Author: hhlai1990):
[~zabetak],[~julianhyde]

Is there already an example query that joins different data sources? Do we 
really need this feature?

If we just consider the problem of RexExecutorImpl here, we need to pass the 
RexToLixTranslator to RexExecutorImpl when reducing expressions. The 
[AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
 already has a `TODO` solution.

So, may I create a PR to solve RexExecutorImpl's problem first?

 

 

> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard-coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need 
> to be modified.
> Passing the SqlConformance config through the whole process of one SQL 
> query would `drill a hole` through many layers.
> Maybe we can put the SqlConformance config in a ThreadLocal, to avoid 
> passing it around frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-25 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-3014:
--
Description: 
I found SqlConformanceEnum is hard-coded in a few places.

[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]

[https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]

I think it's not easy to fix them in a generic way. To support different 
SQL compatibility modes well, many places in the current codebase may need to 
be modified.

Passing the SqlConformance config through the whole process of one SQL query 
would `drill a hole` through many layers.

Maybe we can put the SqlConformance config in a ThreadLocal, to avoid passing 
it around frequently.

 

 

  was:
[~julianhyde]  I found SqlConformanceEnum is hard-coded in a few places.

[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]

[https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]

I think it's not easy to fix them in a generic way. To support different 
SQL compatibility modes well, many places in the current codebase may need to 
be modified.

Passing the SqlConformance config through the whole process of one SQL query 
would `drill a hole` through many layers.

Maybe we can put the SqlConformance config in a ThreadLocal, to avoid passing 
it around frequently.

 

 


> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> I found SqlConformanceEnum is hard coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way.  To support different 
> SQL compatibility modes well, many place of current codebase is possible to 
> be modified.
> It will `drill a hole` to pass the SqlConformance config in the whole process 
> of  one sql query.
> May be we can put the SqlConformance config in ThreadLocal, avoiding pass it 
> frequently.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-25 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825923#comment-16825923
 ] 

Lai Zhou commented on CALCITE-2282:
---

[~zhztheplayer], it is like what I commented before:
{code:java}
/**
 * Since SqlOperator is identified by name and kind, see
 * {@link SqlOperator#equals(Object)} and {@link SqlOperator#hashCode()},
 * we can override implementors of operators declared in
 * SqlStdOperatorTable.
 */
{code}
{code:java}
SqlOperator newOp = new HiveSqlFunction(functionInStd.getNameAsId(),
    functionInStd.getKind(), HiveSqlUDFReturnTypeInference.INSTANCE,
    functionInStd.getFunctionType());
register(newOp);
{code}
But DIVIDE can't be replaced correctly this way. You will find that the 
static built-in functions are not looked up from the customized 
OperatorTable.
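The override-by-registration idea above relies on operators being keyed by (name, kind). A plain-Java stand-in (the `OperatorKey` and `SimpleOperatorTable` names below are hypothetical, not Calcite API) shows why a later registration can shadow a standard operator, and also why a call site that bypasses the table keeps the old behavior:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an operator table keyed by (name, kind),
// mirroring SqlOperator#equals/hashCode. The last registration for a key
// wins, which is how a Hive operator can shadow the standard one -- but
// only for lookups that actually go through the table.
class SimpleOperatorTable {
    static final class OperatorKey {
        final String name;
        final String kind;

        OperatorKey(String name, String kind) {
            this.name = name;
            this.kind = kind;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof OperatorKey)) {
                return false;
            }
            OperatorKey k = (OperatorKey) o;
            return name.equals(k.name) && kind.equals(k.kind);
        }

        @Override public int hashCode() {
            return Objects.hash(name, kind);
        }
    }

    private final Map<OperatorKey, String> implementors = new HashMap<>();

    void register(String name, String kind, String implementor) {
        // Re-registering the same (name, kind) replaces the earlier entry.
        implementors.put(new OperatorKey(name, kind), implementor);
    }

    String lookup(String name, String kind) {
        return implementors.get(new OperatorKey(name, kind));
    }
}
```

Statically referenced operators never call `lookup`, which matches the observation that DIVIDE cannot be replaced this way.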

 

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-24 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825653#comment-16825653
 ] 

Lai Zhou edited comment on CALCITE-2282 at 4/25/19 2:35 AM:


[~danny0405], this is how I override the built-in operators.

In Parser.jj:
{code:java}
| <LIKE> { op = SqlStdOperatorTable.LIKE; }
| <SIMILAR> <TO> { op = SqlStdOperatorTable.SIMILAR_TO; }
| <RLIKE> { op = HiveSqlOperatorTable.RLIKE; }
| <REGEXP> { op = HiveSqlOperatorTable.REGEXP; } )
{code}
{code:java}
| <PLUS> { return SqlStdOperatorTable.PLUS; }
| <MINUS> { return SqlStdOperatorTable.MINUS; }
| <STAR> { return HiveSqlOperatorTable.MULTIPLY; }
| <SLASH> { return HiveSqlOperatorTable.DIVIDE; }
{code}
The default DIVIDE operator in SqlStdOperatorTable is not OK for real business.

Consider the following SQL: select 2/5. The result is 0, but we expect 0.4.

[~julianhyde], currently the only way to customize the DIVIDE operator is to 
rewrite Parser.jj.

I didn't find a way, as [~zhztheplayer] suggested, to customize the static 
built-in operators without changing the parser for this use case.

 


was (Author: hhlai1990):
[~danny0405], this is how I override the built-in operators.

In Parser.jj:
{code:java}
| <LIKE> { op = SqlStdOperatorTable.LIKE; }
| <SIMILAR> <TO> { op = SqlStdOperatorTable.SIMILAR_TO; }
| <RLIKE> { op = HiveSqlOperatorTable.RLIKE; }
| <REGEXP> { op = HiveSqlOperatorTable.REGEXP; } )
{code}
{code:java}
| <PLUS> { return SqlStdOperatorTable.PLUS; }
| <MINUS> { return SqlStdOperatorTable.MINUS; }
| <STAR> { return HiveSqlOperatorTable.MULTIPLY; }
| <SLASH> { return HiveSqlOperatorTable.DIVIDE; }
{code}
The default DIVIDE operator in SqlStdOperatorTable is not OK for real business.

Consider the following SQL: select 2/5. The result is 0, but we expect 0.4.

[~julianhyde], currently the only way to customize the DIVIDE operator is to 
rewrite Parser.jj.

I didn't find a way, as [~zhztheplayer] suggested, to use a custom operator 
table without changing the parser for this use case.
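The 2/5 behavior described above matches Java's truncating integer division; a Hive-style DIVIDE has to promote the operands to a fractional type instead. A minimal sketch of the difference (`DivideDemo` is a hypothetical class, not Calcite or Hive code):

```java
import java.math.BigDecimal;
import java.math.MathContext;

class DivideDemo {
    // Standard integer division truncates toward zero: 2 / 5 == 0.
    static int stdDivide(int a, int b) {
        return a / b;
    }

    // Hive-style division promotes to decimal, so 2 / 5 == 0.4.
    static BigDecimal hiveDivide(int a, int b) {
        return BigDecimal.valueOf(a)
                .divide(BigDecimal.valueOf(b), MathContext.DECIMAL64);
    }
}
```

This is why swapping in a Hive DIVIDE operator changes the result type of the expression, not just its value.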

 

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-24 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825653#comment-16825653
 ] 

Lai Zhou edited comment on CALCITE-2282 at 4/25/19 2:35 AM:


[~danny0405], this is how I override the built-in operators.

In Parser.jj:
{code:java}
| <LIKE> { op = SqlStdOperatorTable.LIKE; }
| <SIMILAR> <TO> { op = SqlStdOperatorTable.SIMILAR_TO; }
| <RLIKE> { op = HiveSqlOperatorTable.RLIKE; }
| <REGEXP> { op = HiveSqlOperatorTable.REGEXP; } )
{code}
{code:java}
| <PLUS> { return SqlStdOperatorTable.PLUS; }
| <MINUS> { return SqlStdOperatorTable.MINUS; }
| <STAR> { return HiveSqlOperatorTable.MULTIPLY; }
| <SLASH> { return HiveSqlOperatorTable.DIVIDE; }
{code}
The default DIVIDE operator in SqlStdOperatorTable is not OK for real business.

Consider the following SQL: select 2/5. The result is 0, but we expect 0.4.

[~julianhyde], currently the only way to customize the DIVIDE operator is to 
rewrite Parser.jj.

I didn't find a way, as [~zhztheplayer] suggested, to customize the static 
built-in operators without changing the parser for this use case.

 


was (Author: hhlai1990):
[~danny0405], this is how I override the built-in operators.

In Parser.jj:
{code:java}
| <LIKE> { op = SqlStdOperatorTable.LIKE; }
| <SIMILAR> <TO> { op = SqlStdOperatorTable.SIMILAR_TO; }
| <RLIKE> { op = HiveSqlOperatorTable.RLIKE; }
| <REGEXP> { op = HiveSqlOperatorTable.REGEXP; } )
{code}
{code:java}
| <PLUS> { return SqlStdOperatorTable.PLUS; }
| <MINUS> { return SqlStdOperatorTable.MINUS; }
| <STAR> { return HiveSqlOperatorTable.MULTIPLY; }
| <SLASH> { return HiveSqlOperatorTable.DIVIDE; }
{code}
The default DIVIDE operator in SqlStdOperatorTable is not OK for real business.

Consider the following SQL: select 2/5. The result is 0, but we expect 0.4.

[~julianhyde], currently the only way to customize the DIVIDE operator is to 
rewrite Parser.jj.

I didn't find a way, as [~zhztheplayer] suggested, to customize the static 
built-in operators without changing the parser for this use case.

 

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-24 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825653#comment-16825653
 ] 

Lai Zhou edited comment on CALCITE-2282 at 4/25/19 2:33 AM:


[~danny0405], this is how I override the built-in operators.

In Parser.jj:
{code:java}
| <LIKE> { op = SqlStdOperatorTable.LIKE; }
| <SIMILAR> <TO> { op = SqlStdOperatorTable.SIMILAR_TO; }
| <RLIKE> { op = HiveSqlOperatorTable.RLIKE; }
| <REGEXP> { op = HiveSqlOperatorTable.REGEXP; } )
{code}
{code:java}
| <PLUS> { return SqlStdOperatorTable.PLUS; }
| <MINUS> { return SqlStdOperatorTable.MINUS; }
| <STAR> { return HiveSqlOperatorTable.MULTIPLY; }
| <SLASH> { return HiveSqlOperatorTable.DIVIDE; }
{code}
The default DIVIDE operator in SqlStdOperatorTable is not OK for real business.

Consider the following SQL: select 2/5. The result is 0, but we expect 0.4.

[~julianhyde], currently the only way to customize the DIVIDE operator is to 
rewrite Parser.jj.

I didn't find a way, as [~zhztheplayer] suggested, to use a custom operator 
table without changing the parser for this use case.

 


was (Author: hhlai1990):
[~danny0405], this is how I override the built-in operators.

In Parser.jj:
{code:java}
| <LIKE> { op = SqlStdOperatorTable.LIKE; }
| <SIMILAR> <TO> { op = SqlStdOperatorTable.SIMILAR_TO; }
| <RLIKE> { op = HiveSqlOperatorTable.RLIKE; }
| <REGEXP> { op = HiveSqlOperatorTable.REGEXP; } )
{code}
{code:java}
| <PLUS> { return SqlStdOperatorTable.PLUS; }
| <MINUS> { return SqlStdOperatorTable.MINUS; }
| <STAR> { return HiveSqlOperatorTable.MULTIPLY; }
| <SLASH> { return HiveSqlOperatorTable.DIVIDE; }
{code}
The default DIVIDE operator in SqlStdOperatorTable is not OK for real business.

Consider the following SQL: select 2/5. The result is 0, but we expect 0.4.

[~julianhyde], currently the only way to customize the DIVIDE operator is to 
rewrite Parser.jj.

I didn't find a way, as [~zhztheplayer] suggested, to use a custom operator 
table without changing the parser for this use case.

 

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-24 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825653#comment-16825653
 ] 

Lai Zhou commented on CALCITE-2282:
---

[~danny0405], it's the way how I override the built-in operators:

In Parser.jj
{code:java}
|  { op = SqlStdOperatorTable.LIKE; } 
|   { op = SqlStdOperatorTable.SIMILAR_TO; }
|  { op = HiveSqlOperatorTable.RLIKE; }
|  { op = HiveSqlOperatorTable.REGEXP; } )
{code}
{code:java}
|  { return SqlStdOperatorTable.PLUS; } 
|  { return SqlStdOperatorTable.MINUS; } 
|  { return HiveSqlOperatorTable.MULTIPLY; } 
|  { return HiveSqlOperatorTable.DIVIDE; }{code}
The default DIVIDE operator in SqlStdOperatorTable is not suitable for real business use.

Consider the following SQL:

select 2/5. The result is 0, but we expect 0.4.

[~julianhyde], currently the only way to customize the DIVIDE operator is to rewrite 
Parser.jj.

I didn't find a way, as [~zhztheplayer] suggested, to use a custom operator table 
without changing the parser for this use case.

 

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-22 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823659#comment-16823659
 ] 

Lai Zhou commented on CALCITE-2973:
---

I modified EnumerableJoin so that it can handle a non-equi join that has equi 
conditions.

I didn't rename EnumerableJoin this time; we can rename it to 
`EnumerableHashJoin` in a later patch.

Now EnumerableDefaults's method `join_` implements the hash join algorithm 
for a join, whether or not it has a non-equi condition.
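The idea can be sketched as a plain-Java hash join that builds on the equi keys and applies the residual (non-equi) predicate while probing. This is an illustrative sketch, not the actual EnumerableDefaults.join_ implementation; all names here are hypothetical:

```java
import java.util.*;
import java.util.function.*;

public class HashJoinWithResidual {
    // Hash join over the equi keys, with a residual (non-equi)
    // predicate checked on each candidate pair -- the idea behind
    // letting a hash join execute theta joins that have equi conditions.
    static <L, R, K> List<Map.Entry<L, R>> join(
            List<L> left, List<R> right,
            Function<L, K> leftKey, Function<R, K> rightKey,
            BiPredicate<L, R> residual) {
        // Build phase: index the right side by its equi key.
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        // Probe phase: look up equi-key matches, then apply the
        // residual condition before emitting a joined pair.
        List<Map.Entry<L, R>> out = new ArrayList<>();
        for (L l : left) {
            for (R r : index.getOrDefault(leftKey.apply(l), List.of())) {
                if (residual.test(l, r)) {
                    out.add(Map.entry(l, r));
                }
            }
        }
        return out;
    }
}
```

The nested-loop cost is paid only within each equi-key bucket, which is where the speedup over a full nested-loop join comes from.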

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.





[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-22 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823090#comment-16823090
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/23/19 2:16 AM:


[~zabetak], [~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change 
it to extend `Join`?

I tried changing it to extend `Join`, but the FilterJoinRule doesn't work: 
it can't push down the remaining condition into the inputs of the join correctly.

See 
[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165]

If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field for 
EnumerableJoin to reference the join condition, as we need to extract the 
remaining part of the condition. So what's the better way?

 


was (Author: hhlai1990):
[~zabetak], [~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change 
it to extend `Join`?

I tried changing it to extend `Join`, but the FilterJoinRule doesn't work: 
it can't push down the remaining condition into a filter after an inner join 
correctly.

See 
[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165]

If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field for 
EnumerableJoin to reference the join condition, as we need to extract the 
remaining part of the condition. So what's the better way?

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.





[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-22 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823090#comment-16823090
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/22/19 2:18 PM:


[~zabetak], [~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change 
it to extend `Join`?

I tried changing it to extend `Join`, but the FilterJoinRule doesn't work: 
it can't push down the remaining condition into a filter after an inner join 
correctly.

See 
[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165]

If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field for 
EnumerableJoin to reference the join condition, as we need to extract the 
remaining part of the condition. So what's the better way?

 


was (Author: hhlai1990):
[~zabetak], [~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change 
it to extend `Join`?

I tried changing it to extend `Join`, but the FilterJoinRule doesn't work: 
it can't push down the remaining condition into a filter after an inner join 
correctly.

See 
[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165]

If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field for 
EnumerableJoin to reference the join condition, because we need to extract the 
remaining part of the condition. So what's the better way?

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.





[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-22 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823090#comment-16823090
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~zabetak], [~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change 
it to extend `Join`?

I tried changing it to extend `Join`, but the FilterJoinRule doesn't work: 
it can't push down the remaining condition into a filter after an inner join 
correctly.

See 
[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165]

If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field for 
EnumerableJoin to reference the join condition, because we need to extract the 
remaining part of the condition. So what's the better way?

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.





[jira] [Updated] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-22 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-3014:
--
Description: 
[~julianhyde]  I found SqlConformanceEnum is hard coded in a few places.

[https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]

[https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]

I think it's not easy to fix them in a generic way. To support different 
SQL compatibility modes well, many places in the current codebase may need to be 
modified.

It would `drill a hole` through the code to pass the SqlConformance config 
through the whole process of one SQL query.

Maybe we can put the SqlConformance config in a ThreadLocal, avoiding passing it 
frequently.
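A minimal sketch of the ThreadLocal idea (a hypothetical holder class; a String stands in for the SqlConformance type) might look like:

```java
public class ConformanceContext {
    // Hypothetical per-query conformance setting. Call sites that
    // currently hard-code SqlConformanceEnum.DEFAULT would read the
    // thread-bound value instead of receiving it as a parameter.
    private static final ThreadLocal<String> CONFORMANCE =
            ThreadLocal.withInitial(() -> "DEFAULT");

    static void set(String conformance) { CONFORMANCE.set(conformance); }

    static String get() { return CONFORMANCE.get(); }

    // Reset to the initial value when the query finishes, so pooled
    // threads do not leak the previous query's setting.
    static void clear() { CONFORMANCE.remove(); }
}
```

The trade-off of this design is implicit state: every entry point must remember to set and clear the value, which is exactly why "drilling a hole" through explicit parameters is usually preferred.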

 

 

  was:
[~julianhyde]  I found SqlConformanceEnum is hard coded in a few places.

[1|https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]

[2|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]

I think it's not easy to fix them in a generic way. To support different 
SQL compatibility modes well, many places in the current codebase may need to be 
modified.

It would `drill a hole` through the code to pass the SqlConformance config 
through the whole process of one SQL query.

Maybe we can put the SqlConformance config in a ThreadLocal, avoiding passing it 
frequently.

 

 


> SqlConformanceEnum is hard coded in a few places
> 
>
> Key: CALCITE-3014
> URL: https://issues.apache.org/jira/browse/CALCITE-3014
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>
> [~julianhyde]  I found SqlConformanceEnum is hard coded in a few places.
> [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]
> [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]
> I think it's not easy to fix them in a generic way. To support different 
> SQL compatibility modes well, many places in the current codebase may need to 
> be modified.
> It would `drill a hole` through the code to pass the SqlConformance config 
> through the whole process of one SQL query.
> Maybe we can put the SqlConformance config in a ThreadLocal, avoiding passing 
> it frequently.
>  
>  





[jira] [Created] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places

2019-04-22 Thread Lai Zhou (JIRA)
Lai Zhou created CALCITE-3014:
-

 Summary: SqlConformanceEnum is hard coded in a few places
 Key: CALCITE-3014
 URL: https://issues.apache.org/jira/browse/CALCITE-3014
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.19.0
Reporter: Lai Zhou


[~julianhyde]  I found SqlConformanceEnum is hard coded in a few places.

[1|https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81]

[2|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226]

I think it's not easy to fix them in a generic way. To support different 
SQL compatibility modes well, many places in the current codebase may need to be 
modified.

It would `drill a hole` through the code to pass the SqlConformance config 
through the whole process of one SQL query.

Maybe we can put the SqlConformance config in a ThreadLocal, avoiding passing it 
frequently.

 

 





[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-18 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820939#comment-16820939
 ] 

Lai Zhou edited comment on CALCITE-2282 at 4/18/19 10:23 AM:
-

[~julianhyde], I think we should let the validator resolve operators; the 
parser only needs to parse the SQL. After the parser has parsed the SQL, we just 
have unresolved functions.

I'm working on bridging the Hive functions to Calcite these days.

Since implementing a whole function list for a DB is a large amount of tedious 
work, I just want to make the parser and the functions compatible with Hive, 
so I hope to reuse the built-in functions of Calcite as much as possible.

I made some extensions to reach it:
 # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to 
lookupOperatorOverloads, so I can plug in some new operators or replace the 
built-in operators of Calcite. For example, I want to implement a Hive DIVIDE 
operator, so I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; the type inferred by
 * {@link SqlStdOperatorTable#DIVIDE} is not right.
 */
public static final SqlBinaryOperator DIVIDE =
new SqlBinaryOperator(
"/",
SqlKind.DIVIDE,
60,
true,
HiveSqlUDFReturnTypeInference.INSTANCE,
null,
HiveSqlFunction.ArgChecker.INSTANCE);

{code}
2. Introduce a post-processor for RexImpTable to define implementors. Here is 
part of the code:
{code:java}
private void defineImplementors() {
  //define implementors for hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
for (SqlOperator sqlOperator : operatorList) {
  if (sqlOperator instanceof HiveSqlAggFunction) {
HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
  } else {
/** Since SqlOperator is identified by name and kind (see
 *  {@link SqlOperator#equals(Object)} and
 *  {@link SqlOperator#hashCode()}),
 *  we can override implementors of operators declared in
 *  SqlStdOperatorTable.
 */
CallImplementor callImplementor;
if (sqlOperator.getName().equals("NOT RLIKE")) {
  callImplementor =
  RexImpTable.createImplementor(
  RexImpTable.NotImplementor.of(
  new HiveUDFImplementor()), NullPolicy.STRICT, false);
} else {
  callImplementor =
  RexImpTable.createImplementor(
  new HiveUDFImplementor(), NullPolicy.NONE, false);
}
map.put(sqlOperator, callImplementor);

  }

}
// directly override some implementors of SqlOperator that declared in
// SqlStdOperatorTable
map.put(SqlStdOperatorTable.ITEM,
new RexImpTable.ItemImplementor(true));
  });
{code}

  The way I achieve my goal might be quick and dirty. If Calcite can be more 
pluggable, it will be friendlier to people who want to build a new SQL engine 
on top of Calcite.
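The override trick works because SqlOperator identity is determined by name and kind, so a later map.put under the same key replaces the standard implementor. A toy sketch of that keyed dispatch (hypothetical types, not Calcite's actual RexImpTable):

```java
import java.util.HashMap;
import java.util.Map;

public class OperatorDispatch {
    // Operators are keyed by (name, kind), mirroring SqlOperator#equals
    // and SqlOperator#hashCode; registering an entry under an existing
    // key overrides it. All types here are hypothetical stand-ins.
    record OpKey(String name, String kind) {}

    private final Map<OpKey, String> implementors = new HashMap<>();

    void register(String name, String kind, String implementor) {
        implementors.put(new OpKey(name, kind), implementor); // last write wins
    }

    String lookup(String name, String kind) {
        return implementors.get(new OpKey(name, kind));
    }
}
```

Registering a Hive implementor under the same (name, kind) as a standard operator therefore shadows the standard one without touching the parser.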


was (Author: hhlai1990):
[~julianhyde], I think we should let the validator resolve operators; the 
parser only needs to parse the SQL. After the parser has parsed the SQL, we just 
have unresolved functions.

I'm working on bridging the Hive functions to Calcite these days.

Since implementing a whole function list for a DB is a large amount of tedious 
work, I just want to make the parser and the functions compatible with Hive, 
so I hope to reuse the built-in functions of Calcite as much as possible.

I made some extensions to reach it:
 # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to 
lookupOperatorOverloads, so I can plug in some new operators or replace the 
built-in operators of Calcite. For example, I want to implement a Hive DIVIDE 
operator, so I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; the type inferred by
 * {@link SqlStdOperatorTable#DIVIDE} is not right.
 */
public static final SqlBinaryOperator DIVIDE =
new SqlBinaryOperator(
"/",
SqlKind.DIVIDE,
60,
true,
HiveSqlUDFReturnTypeInference.INSTANCE,
null,
HiveSqlFunction.ArgChecker.INSTANCE);

{code}
2. Introduce a post-processor for RexImpTable to define implementors. Here is 
part of the code:
{code:java}
private void defineImplementors() {
  //define implementors for hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
for (SqlOperator sqlOperator : operatorList) {
  if (sqlOperator instanceof HiveSqlAggFunction) {
HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
  } else 

[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-18 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820939#comment-16820939
 ] 

Lai Zhou edited comment on CALCITE-2282 at 4/18/19 10:22 AM:
-

[~julianhyde], I think we should let the validator resolve operators; the 
parser only needs to parse the SQL. After the parser has parsed the SQL, we just 
have unresolved functions.

I'm working on bridging the Hive functions to Calcite these days.

Since implementing a whole function list for a DB is a large amount of tedious 
work, I just want to make the parser and the functions compatible with Hive, 
so I hope to reuse the built-in functions of Calcite as much as possible.

I made some extensions to reach it:
 # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to 
lookupOperatorOverloads, so I can plug in some new operators or replace the 
built-in operators of Calcite. For example, I want to implement a Hive DIVIDE 
operator, so I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; the type inferred by
 * {@link SqlStdOperatorTable#DIVIDE} is not right.
 */
public static final SqlBinaryOperator DIVIDE =
new SqlBinaryOperator(
"/",
SqlKind.DIVIDE,
60,
true,
HiveSqlUDFReturnTypeInference.INSTANCE,
null,
HiveSqlFunction.ArgChecker.INSTANCE);

{code}
2. Introduce a post-processor for RexImpTable to define implementors. Here is 
part of the code:
{code:java}
private void defineImplementors() {
  //define implementors for hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
for (SqlOperator sqlOperator : operatorList) {
  if (sqlOperator instanceof HiveSqlAggFunction) {
HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
  } else {
/** Since SqlOperator is identified by name and kind (see
 *  {@link SqlOperator#equals(Object)} and
 *  {@link SqlOperator#hashCode()}),
 *  we can override implementors of operators declared in
 *  SqlStdOperatorTable.
 */
CallImplementor callImplementor;
if (sqlOperator.getName().equals("NOT RLIKE")) {
  callImplementor =
  RexImpTable.createImplementor(
  RexImpTable.NotImplementor.of(
  new HiveUDFImplementor()), NullPolicy.STRICT, false);
} else {
  callImplementor =
  RexImpTable.createImplementor(
  new HiveUDFImplementor(), NullPolicy.NONE, false);
}
map.put(sqlOperator, callImplementor);

  }

}
// directly override some implementors of SqlOperator that declared in
// SqlStdOperatorTable
map.put(SqlStdOperatorTable.ITEM,
new RexImpTable.ItemImplementor(true));
  });
{code}

  The way I achieve my goal might be quick and dirty. If Calcite can be more 
pluggable, it will be friendlier to people who want to use Calcite to make a new 
SQL engine.


was (Author: hhlai1990):
[~julianhyde], I think we should let the validator resolve operators; the 
parser only needs to parse the SQL. After the parser has parsed the SQL, we just 
have unresolved functions.

I'm working on bridging the Hive functions to Calcite these days.

Since implementing a whole function list for a DB is a large amount of tedious 
work, I just want to make the parser and the functions compatible with Hive, 
so I hope to reuse the built-in functions of Calcite as much as possible.

I made some extensions to reach it:
 # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to 
lookupOperatorOverloads, so I can plug in some new operators or replace the 
built-in operators of Calcite. For example, I want to implement a Hive DIVIDE 
operator, so I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; the type inferred by
 * {@link SqlStdOperatorTable#DIVIDE} is not right.
 */
public static final SqlBinaryOperator DIVIDE =
new SqlBinaryOperator(
"/",
SqlKind.DIVIDE,
60,
true,
HiveSqlUDFReturnTypeInference.INSTANCE,
null,
HiveSqlFunction.ArgChecker.INSTANCE);

{code}
2. Introduce a post-processor for RexImpTable to define implementors. Here is 
part of the code:
{code:java}
private void defineImplementors() {
  //define implementors for hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
for (SqlOperator sqlOperator : operatorList) {
  if (sqlOperator instanceof HiveSqlAggFunction) {
HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
  } else {

[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-18 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820939#comment-16820939
 ] 

Lai Zhou edited comment on CALCITE-2282 at 4/18/19 10:19 AM:
-

[~julianhyde], I think we should let the validator resolve operators; the 
parser only needs to parse the SQL. After the parser has parsed the SQL, we just 
have unresolved functions.

I'm working on bridging the Hive functions to Calcite these days.

Since implementing a whole function list for a DB is a large amount of tedious 
work, I just want to make the parser and the functions compatible with Hive, 
so I hope to reuse the built-in functions of Calcite as much as possible.

I made some extensions to reach it:
 # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to 
lookupOperatorOverloads, so I can plug in some new operators or replace the 
built-in operators of Calcite. For example, I want to implement a Hive DIVIDE 
operator, so I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; the type inferred by
 * {@link SqlStdOperatorTable#DIVIDE} is not right.
 */
public static final SqlBinaryOperator DIVIDE =
new SqlBinaryOperator(
"/",
SqlKind.DIVIDE,
60,
true,
HiveSqlUDFReturnTypeInference.INSTANCE,
null,
HiveSqlFunction.ArgChecker.INSTANCE);

{code}
2. Introduce a post-processor for RexImpTable to define implementors. Here is 
part of the code:
{code:java}
private void defineImplementors() {
  //define implementors for hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
for (SqlOperator sqlOperator : operatorList) {
  if (sqlOperator instanceof HiveSqlAggFunction) {
HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
  } else {
/** Since SqlOperator is identified by name and kind (see
 *  {@link SqlOperator#equals(Object)} and
 *  {@link SqlOperator#hashCode()}),
 *  we can override implementors of operators declared in
 *  SqlStdOperatorTable.
 */
CallImplementor callImplementor;
if (sqlOperator.getName().equals("NOT RLIKE")) {
  callImplementor =
  RexImpTable.createImplementor(
  RexImpTable.NotImplementor.of(
  new HiveUDFImplementor()), NullPolicy.STRICT, false);
} else {
  callImplementor =
  RexImpTable.createImplementor(
  new HiveUDFImplementor(), NullPolicy.NONE, false);
}
map.put(sqlOperator, callImplementor);

  }

}
// directly override some implementors of SqlOperator that declared in
// SqlStdOperatorTable
map.put(SqlStdOperatorTable.ITEM,
new RexImpTable.ItemImplementor(true));
  });
{code}

  The way I achieve my goal might be quick and dirty, but I think it will be 
friendlier to people who want to use Calcite to make a new SQL engine.


was (Author: hhlai1990):
[~julianhyde], I think we should let the validator resolve operators; the 
parser only needs to parse the SQL. After the parser has parsed the SQL, we just 
have unresolved functions.

I'm working on bridging the Hive functions to Calcite these days.

Since implementing a whole function list for a DB is a large amount of tedious 
work, I just want to make the parser and the functions compatible with Hive, 
so I hope to reuse the built-in functions of Calcite as much as possible.

I made some extensions to reach it:
 # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to 
lookupOperatorOverloads, so I can plug in some new operators or replace the 
built-in operators of Calcite. For example, I want to implement a Hive DIVIDE 
operator, so I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; the type inferred by
 * {@link SqlStdOperatorTable#DIVIDE} is not right.
 */
public static final SqlBinaryOperator DIVIDE =
new SqlBinaryOperator(
"/",
SqlKind.DIVIDE,
60,
true,
HiveSqlUDFReturnTypeInference.INSTANCE,
null,
HiveSqlFunction.ArgChecker.INSTANCE);

{code}

 # introduce a post-processor for RexImpTable to define implementors. Here is 
part of the code:
{code:java}
private void defineImplementors() {
  //define implementors for hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
for (SqlOperator sqlOperator : operatorList) {
  if (sqlOperator instanceof HiveSqlAggFunction) {
HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
  } else {
/**since 

[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-18 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820939#comment-16820939
 ] 

Lai Zhou commented on CALCITE-2282:
---

[~julianhyde], I think we should let the validator resolve operators; the 
parser only needs to parse the SQL. After the parser has parsed the SQL, we just 
have unresolved functions.

I'm working on bridging the Hive functions to Calcite these days.

Since implementing a whole function list for a DB is a large amount of tedious 
work, I just want to make the parser and the functions compatible with Hive, 
so I hope to reuse the built-in functions of Calcite as much as possible.

I made some extensions to reach it:
 # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to 
lookupOperatorOverloads, so I can plug in some new operators or replace the 
built-in operators of Calcite. For example, I want to implement a Hive DIVIDE 
operator, so I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; the type inferred by
 * {@link SqlStdOperatorTable#DIVIDE} is not right.
 */
public static final SqlBinaryOperator DIVIDE =
new SqlBinaryOperator(
"/",
SqlKind.DIVIDE,
60,
true,
HiveSqlUDFReturnTypeInference.INSTANCE,
null,
HiveSqlFunction.ArgChecker.INSTANCE);

{code}

 # introduce a post-processor for RexImpTable to define implementors. Here is 
part of the code:
{code:java}
private void defineImplementors() {
  //define implementors for hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
for (SqlOperator sqlOperator : operatorList) {
  if (sqlOperator instanceof HiveSqlAggFunction) {
HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
  } else {
/** Since SqlOperator is identified by name and kind (see
 *  {@link SqlOperator#equals(Object)} and
 *  {@link SqlOperator#hashCode()}),
 *  we can override implementors of operators declared in
 *  SqlStdOperatorTable.
 */
CallImplementor callImplementor;
if (sqlOperator.getName().equals("NOT RLIKE")) {
  callImplementor =
  RexImpTable.createImplementor(
  RexImpTable.NotImplementor.of(
  new HiveUDFImplementor()), NullPolicy.STRICT, false);
} else {
  callImplementor =
  RexImpTable.createImplementor(
  new HiveUDFImplementor(), NullPolicy.NONE, false);
}
map.put(sqlOperator, callImplementor);

  }

}
// directly override some implementors of SqlOperator that declared in
// SqlStdOperatorTable
map.put(SqlStdOperatorTable.ITEM,
new RexImpTable.ItemImplementor(true));
  });
{code}

  The way I achieve my goal might be quick and dirty, but I think it will be 
friendlier to people who want to use Calcite to make a new SQL engine.

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-18 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820902#comment-16820902
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/18/19 9:33 AM:


[~zabetak], I can't find a good way to break a theta join into an equi-join plus 
filter/projection; I think it would also make the rules hard to understand.

But I found another simple and clear way; please see the latest commit 
[https://github.com/apache/calcite/pull/1156/files]

We still keep EquiJoin as a pure equi-join without a remaining condition.

For a theta join, as Calcite defined in the EnumerableJoinRule,
{code:java}
!info.isEqui() && join.getJoinType() != JoinRelType.INNER{code}
 

if it has equi keys, we can use a hash join or merge join instead of a 
nested-loop join to improve performance.

So I introduced a new join rel named `EnumerableThetaHashJoin`. In addition, 
I found there are some differences between the algorithms of a pure hash join 
and a hash join with a remaining condition:

When we implement a pure hash join, we just need to compare the hash join keys, 
but when we implement a hash join with a remaining condition, we also need to 
compare some other columns to find the unmatched records.

So I introduced a new method named `thetaHashJoin` in EnumerableDefaults.
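That difference can be sketched in plain Java (a hypothetical illustration, not the actual EnumerableDefaults implementation): build a hash table on the equi keys, then evaluate the remaining theta condition on each candidate pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class ThetaHashJoinSketch {
    // Rows are int[] {equiKey, value}; the residual predicate carries the
    // non-equi part of the theta condition.
    static List<int[]> join(List<int[]> left, List<int[]> right,
                            BiPredicate<int[], int[]> residual) {
        // Build phase: hash the right side on its equi key.
        Map<Integer, List<int[]>> built = new HashMap<>();
        for (int[] r : right) {
            built.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // Probe phase: equi-key lookup first, then the remaining condition.
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : built.getOrDefault(l[0], new ArrayList<>())) {
                if (residual.test(l, r)) {
                    out.add(new int[]{l[0], l[1], r[1]});
                }
            }
        }
        return out;
    }
}
```

A pure hash join is the same code with `residual` always true; the extra predicate evaluation per candidate pair is what distinguishes the theta variant.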

 

 

 


was (Author: hhlai1990):
[~zabetak], I can't find a good way to break a theta join into an equi-join + 
filter/projection , I think it will also make the rules hard to understand.

But I found another simple and clear way , please see the latest commit 
[https://github.com/apache/calcite/pull/1156/files]

We still keep the EquiJoin as a pure equil join without remain condition.

For a theta join, as Calcite defined in the EnumerableJoinRule,
{code:java}
!info.isEqui() && join.getJoinType() != JoinRelType.INNER{code}
 

if it has equi keys, we can use a hash-join or merge-join instead of 
nested-loop-join to improve the performance .

So I introduced a new join rel named `EnumerableThetaHashJoin ` . In addition , 
I found there are some difference  between algorithms of pure hash join and 
hash join with remain condition :

When we implement a pure hash join , we just need to compare the hash join keys 
, but when we implement a hash join with remain condition, we need to compare 
some other columns to find the unmatched records.

So I introduced a new method named `thetaHashJoin` in EnumerableDefaults.

 

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-18 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820902#comment-16820902
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~zabetak], I can't find a good way to break a theta join into an equi-join plus 
filter/projection; I think it would also make the rules hard to understand.

But I found another simple and clear way; please see the latest commit 
[https://github.com/apache/calcite/pull/1156/files]

We still keep EquiJoin as a pure equi-join without a remaining condition.

For a theta join, as Calcite defined in the EnumerableJoinRule,
{code:java}
!info.isEqui() && join.getJoinType() != JoinRelType.INNER{code}
 

if it has equi keys, we can use a hash join or merge join instead of a 
nested-loop join to improve performance.

So I introduced a new join rel named `EnumerableThetaHashJoin`. In addition, 
I found there are some differences between the algorithms of a pure hash join 
and a hash join with a remaining condition:

When we implement a pure hash join, we just need to compare the hash join keys, 
but when we implement a hash join with a remaining condition, we also need to 
compare some other columns to find the unmatched records.

So I introduced a new method named `thetaHashJoin` in EnumerableDefaults.

 

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-17 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/18/19 3:04 AM:


[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never chosen, I changed the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now a join rel node will be converted to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

see

[EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition; I introduced the 
remainCondition to generate the predicate for the join.

see

[EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduced a new algorithm to support a join with a predicate.

see

[EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never taken ,I change the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now  a join rel node will be converted  to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

see 
[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition, I introduce a the 
remainCondition to generate the predicate for the join.

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduce a new  algorithm  to support join with predicate. 

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-17 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/18/19 3:02 AM:


[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never chosen, I changed the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now a join rel node will be converted to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

see 
[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition; I introduced the 
remainCondition to generate the predicate for the join.

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduced a new algorithm to support a join with a predicate.

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never taken ,I change the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now  a join rel node will be converted  to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

see 
[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition, I introduce a the 
remainCondition to generate the predicate for the join.

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduce a new  method to support join with predicate,  it doesn't 
affect  the old join method .

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-17 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/18/19 3:01 AM:


[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never chosen, I changed the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now a join rel node will be converted to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

see 
[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition; I introduced the 
remainCondition to generate the predicate for the join.

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduced a new method to support a join with a predicate; it doesn't 
affect the old join method.

see

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never taken ,I change the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now  a join rel node will be converted  to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

see 
[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition, I introduce a the 
remainCondition to generate the predicate for the join.

see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduce a new  method to support join with predicate,  it doesn't 
affect  the old join method .

see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-16 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819701#comment-16819701
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~zabetak], many thanks for your suggestion. I'll take it into account and 
give you feedback later.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-1531) SqlValidatorException when boolean operators are used with NULL

2019-04-16 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819698#comment-16819698
 ] 

Lai Zhou commented on CALCITE-1531:
---

[~julianhyde], could you reopen this issue and change the summary to "Infer 
the type for a naked NULL literal from its context"?

[~zabetak], should I create a new issue?

> SqlValidatorException when boolean operators are used with NULL
> ---
>
> Key: CALCITE-1531
> URL: https://issues.apache.org/jira/browse/CALCITE-1531
> Project: Calcite
>  Issue Type: Bug
>Reporter: Serhii Harnyk
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.11.0
>
>
> SqlValidatorException when we use boolean AND, OR operators with null.
> {noformat}
> 0: jdbc:calcite:localhost> SELECT (CASE WHEN true or null then 1 else 0 end) 
> from (VALUES(1));
> 2016-12-06 17:12:47,622 [main] ERROR - 
> org.apache.calcite.sql.validate.SqlValidatorException: Illegal use of 'NULL'
> 2016-12-06 17:12:47,623 [main] ERROR - 
> org.apache.calcite.runtime.CalciteContextException: From line 1, column 27 to 
> line 1, column 30: Illegal use of 'NULL'
> Error: Error while executing SQL "SELECT (CASE WHEN true or null then 1 else 
> 0 end) from (VALUES(1))": From line 1, column 27 to line 1, column 30: 
> Illegal use of 'NULL' (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-04-16 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818921#comment-16818921
 ] 

Lai Zhou commented on CALCITE-2282:
---

+1. [~julianhyde], is there a good way to solve this?

I use a customized parser that overrides the createCall method of 
SqlAbstractParserImpl to activate my OperatorTable.

I think it's a typical case that speaks to Calcite's objective: one 
planner fits all.

 
{code:java}
public class ${parser.class} extends SqlAbstractParserImpl
{
private static final Logger LOGGER = CalciteTrace.getParserTracer();

// Can't use quoted literal because of a bug in how JavaCC translates
// backslash-backslash.
private static final char BACKSLASH = 0x5c;
private static final char DOUBLE_QUOTE = 0x22;
private static final String DQ = DOUBLE_QUOTE + "";
private static final String DQDQ = DQ + DQ;

private static Metadata metadata;

private Casing unquotedCasing;
private Casing quotedCasing;
private int identifierMaxLength;
private SqlConformance conformance;

/**
 * {@link SqlParserImplFactory} implementation for creating parser.
 */
public static final SqlParserImplFactory FACTORY = new 
SqlParserImplFactory() {
public SqlAbstractParserImpl getParser(Reader stream) {
return new ${parser.class}(stream);
}
};

  @Override
  public SqlCall createCall(
  SqlIdentifier funName,
  SqlParserPos pos,
  SqlFunctionCategory funcType,
  SqlLiteral functionQualifier,
  SqlNode[] operands) {
SqlOperator fun = null;

// First, try a half-hearted resolution as a builtin function.
// If we find one, use it; this will guarantee that we
// preserve the correct syntax (i.e. don't quote builtin function
// name when regenerating SQL).
if (funName.isSimple()) {
  final List<SqlOperator> list = new ArrayList<>();
  HiveSqlOperatorTable.instance().lookupOperatorOverloads(funName,
      funcType, SqlSyntax.FUNCTION, list);
  if (list.size() > 0) {
    fun = list.get(0);
  }
}

// Otherwise, just create a placeholder function.  Later, during
// validation, it will be resolved into a real function reference.
if (fun == null) {
  fun = new SqlUnresolvedFunction(funName, null, null, null, null,
  funcType);
}

return fun.createCall(functionQualifier, pos, operands);
}
{code}
 

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Sudheesh Katkam
>Priority: Major
> Attachments: CALCITE-2282.patch.txt
>
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2992) Enhance implicit conversions when generating hash join keys for an equiCondition

2019-04-16 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2992:
--
Summary: Enhance implicit conversions when generating hash join keys for an 
equiCondition  (was: Make implicit conversions when generating hash join keys 
for an equiCondition)

> Enhance implicit conversions when generating hash join keys for an 
> equiCondition
> 
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Consider the following SQL join:
>  
> {code:java}
> select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
> {code}
> as known in Java:
>  
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> new Object[]{intValue}.hashCode().equals
> (
> new Object[]{longValue}.hashCode()
> )
> = false;
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert hash join keys to string and compare string 
> values.
>  
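To make the quoted point concrete, a JDK-only sketch (independent of Calcite; class and method names are illustrative) showing that boxed Integer and Long never compare equal, so mixed-type keys miss in a HashMap unless both sides are widened to one type:

```java
import java.util.HashMap;
import java.util.Map;

public class HashKeyDemo {
    static boolean lookupWithoutWidening() {
        Map<Object, String> map = new HashMap<>();
        map.put(2L, "row");        // key stored as boxed Long
        return map.containsKey(2); // probed with boxed Integer -> miss
    }

    static boolean lookupWithWidening() {
        Map<Object, String> map = new HashMap<>();
        map.put(2L, "row");
        return map.containsKey((long) 2); // both sides widened to long -> hit
    }
}
```

Widening both keys to a common type is what makes the hash-table probe succeed; converting everything to string, as the original description suggested, is one such common type but not the only option.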



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition

2019-04-16 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765
 ] 

Lai Zhou edited comment on CALCITE-2992 at 4/16/19 8:24 AM:


[~julianhyde], you're right. I added a test case:

[https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83]

The hashJoinKeysCompareIntAndLong() test method passes.

In my use case, I customized the `EQUALS` operator with another 
SqlOperandTypeChecker, which performs dynamic operand checking via a Hive 
GenericUDFOPEqual. So the CAST translation is never applied.

But Calcite doesn't support implicit conversions well for types that belong to 
different type families now.

See the test method hashJoinKeysCompareIntAndString(); it would fail.

I made a new commit to roll back the last commit, and did some work to enhance 
the implicit conversions when generating hash join keys. Consider the 
following cases:

If the leftKey type is String and the rightKey type is Int, we can convert the 
keys to Double.
If the leftKey type is String and the rightKey type is Decimal, we can convert 
the keys to Decimal.

The implicit conversions in this patch for hash join keys wouldn't depend on 
the CAST translation, nor be in conflict with it.
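The widening rules described above could look roughly like this (a hypothetical sketch; the class and method names are illustrative, not the patch's actual code):

```java
import java.math.BigDecimal;

public class KeyWiden {
    // Convert a join key to the chosen common type so that equal values
    // produce equal objects (and therefore equal hash codes).
    static Object widen(Object key, Class<?> commonType) {
        if (commonType == Double.class) {
            // e.g. a (String, Int) key pair is compared as Double
            return Double.valueOf(key.toString());
        }
        if (commonType == BigDecimal.class) {
            // e.g. a (String, Decimal) key pair is compared as BigDecimal
            return new BigDecimal(key.toString());
        }
        return key;
    }
}
```

Routing both sides of the equi-condition through the same widening function keeps the conversion local to the hash-key generation, which is why it neither depends on nor conflicts with the CAST translation.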

 

 


was (Author: hhlai1990):
[~julianhyde], you're right. I add a test case:

[https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83]

The hashJoinKeysCompareIntAndLong() test method can pass through.

In my useCase, I customized the `EQUALS` operator with another 
SqlOperandTypeChecker ,which perform a dynamic operand checking by a Hive 
GenericUDFOPEqual. So the CAST translation is never taken.

But Calcite doesn't support implicit conversions well for types that belong to 
different type family now.

See the test method hashJoinKeysCompareIntAndString(), it would fail.

I made a new commit to roll-back the last commit , and do something to enhance 
the implicit conversions when
 generating hash join keys. Considering follow case:

If leftKey type is String ,and rightKey  type is Int, we can convert the keys 
to Double.
 If leftKey type is String ,and rightKey is Decimal, we can convert the keys to 
Decimal.

The implicit conversions in this patch for hash join keys would’t depend on the 
CAST translation , nor be in conflict with it . 

 

 

> Make implicit conversions when generating hash join keys for an equiCondition
> -
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Consider the following SQL join:
>  
> {code:java}
> select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
> {code}
> as known in Java:
>  
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> new Object[]{intValue}.hashCode().equals
> (
> new Object[]{longValue}.hashCode()
> )
> = false;
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert hash join keys to string and compare string 
> values.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition

2019-04-16 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765
 ] 

Lai Zhou edited comment on CALCITE-2992 at 4/16/19 8:24 AM:


[~julianhyde], you're right. I added a test case:

[https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83]

The hashJoinKeysCompareIntAndLong() test method passes.

In my use case, I customized the `EQUALS` operator with another 
SqlOperandTypeChecker, which performs dynamic operand checking via a Hive 
GenericUDFOPEqual. So the CAST translation is never applied.

But Calcite doesn't support implicit conversions well for types that belong to 
different type families now.

See the test method hashJoinKeysCompareIntAndString(); it would fail.

I made a new commit to roll back the last commit, and did some work to enhance 
the implicit conversions when generating hash join keys. Consider the 
following cases:

If the leftKey type is String and the rightKey type is Int, we can convert the 
keys to Double.
If the leftKey type is String and the rightKey type is Decimal, we can convert 
the keys to Decimal.

The implicit conversions in this patch for hash join keys wouldn't depend on 
the CAST translation, nor be in conflict with it.

 

 


was (Author: hhlai1990):
[~julianhyde], you're right. I add a test case:

[EnumerableJoinTest|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83]

The hashJoinKeysCompareIntAndLong() test method can pass through.

In my useCase, I customized the `EQUALS` operator with another 
SqlOperandTypeChecker ,which perform a dynamic operand checking by a Hive 
GenericUDFOPEqual. So the CAST translation is never taken.

But Calcite doesn't support implicit conversions well for types that belong to 
different type family now.

See the test method hashJoinKeysCompareIntAndString(), it would fail.

I made a new commit to roll-back the last commit , and do something to enhance 
the implicit conversions when
 generating hash join keys. Considering follow case:

If leftKey type is String ,and rightKey  type is Int, we can convert the keys 
to Double.
 If leftKey type is String ,and rightKey is Decimal, we can convert the keys to 
Decimal.

The implicit conversions in this patch for hash join keys would’t depend on the 
CAST translation , nor be in conflict with it . 

 

 

> Make implicit conversions when generating hash join keys for an equiCondition
> -
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Considering the following SQL join:
>  
> {code:java}
> select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
> {code}
> As is known in Java:
>  
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> Arrays.asList(intValue).equals(Arrays.asList(longValue)); // false
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert hash join keys to string and compare string 
> values.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)





[jira] [Commented] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition

2019-04-16 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765
 ] 

Lai Zhou commented on CALCITE-2992:
---

[~julianhyde], you're right. I added a test case:

[EnumerableJoinTest|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83]

The hashJoinKeysCompareIntAndLong() test method passes.

In my use case, I customized the `EQUALS` operator with another 
SqlOperandTypeChecker, which performs dynamic operand checking via a Hive 
GenericUDFOPEqual. So the CAST translation is never applied.

But Calcite doesn't yet support implicit conversions well for types that belong to 
different type families.

See the test method hashJoinKeysCompareIntAndString(); it would fail.

I made a new commit to roll back the last commit and to enhance the implicit 
conversions when generating hash join keys. Consider the following cases:

If the leftKey type is String and the rightKey type is Int, we can convert both keys 
to Double.
If the leftKey type is String and the rightKey type is Decimal, we can convert both 
keys to Decimal.

The implicit conversions in this patch for hash join keys wouldn't depend on the 
CAST translation, nor conflict with it.

 

 

> Make implicit conversions when generating hash join keys for an equiCondition
> -
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Considering the following SQL join:
>  
> {code:java}
> select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
> {code}
> As is known in Java:
>  
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> Arrays.asList(intValue).equals(Arrays.asList(longValue)); // false
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert hash join keys to string and compare string 
> values.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition

2019-04-15 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2992:
--
Description: 
Considering the following SQL join:

 
{code:java}
select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
{code}
As is known in Java:

 
{code:java}
Integer intValue = 2;
Long longValue = 2L;
Arrays.asList(intValue).equals(Arrays.asList(longValue)); // false
{code}
We shouldn't use the original Object as a key in the HashMap;

I think it'd be better to convert hash join keys to string and compare string 
values.

 

  was:
Considering follow sql join:

 
{code:java}
select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
{code}
as known in java :

 
{code:java}
Integer intValue = 2;
Long longValue = 2L;
Objects.equals(intValue, longValue) = false;
{code}
We shoudn't use the orginal Object as a key in the HashMap,

I think it'd be better to convert hash join keys to string and compare string 
values.

 

Summary: Make implicit conversions when generating hash join keys for 
an equiCondition  (was: Enhance implicit conversions for different sql type 
family)

> Make implicit conversions when generating hash join keys for an equiCondition
> -
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Considering the following SQL join:
>  
> {code:java}
> select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
> {code}
> As is known in Java:
>  
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> Arrays.asList(intValue).equals(Arrays.asList(longValue)); // false
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert hash join keys to string and compare string 
> values.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2992) Enhance implicit conversions for different sql type family

2019-04-15 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2992:
--
Summary: Enhance implicit conversions for different sql type family  (was: 
Make implicit conversions when generating hash join keys for an equiCondition)

> Enhance implicit conversions for different sql type family
> --
>
> Key: CALCITE-2992
> URL: https://issues.apache.org/jira/browse/CALCITE-2992
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Considering the following SQL join:
>  
> {code:java}
> select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
> {code}
> As is known in Java:
>  
> {code:java}
> Integer intValue = 2;
> Long longValue = 2L;
> Objects.equals(intValue, longValue); // false
> {code}
> We shouldn't use the original Object as a key in the HashMap;
> I think it'd be better to convert hash join keys to string and compare string 
> values.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition

2019-04-12 Thread Lai Zhou (JIRA)
Lai Zhou created CALCITE-2992:
-

 Summary: Make implicit conversions when generating hash join keys 
for an equiCondition
 Key: CALCITE-2992
 URL: https://issues.apache.org/jira/browse/CALCITE-2992
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.19.0
Reporter: Lai Zhou


Considering the following SQL join:

 
{code:java}
select t1.*,t2.*  from t1 join t2 on t1.intValue=t2.longValue
{code}
As is known in Java:

 
{code:java}
Integer intValue = 2;
Long longValue = 2L;
Objects.equals(intValue, longValue); // false
{code}
We shouldn't use the original Object as a key in the HashMap;

I think it'd be better to convert hash join keys to string and compare string 
values.
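The key-mismatch problem described above can be reproduced with a small sketch (a hypothetical example, not part of the patch): hash lookups on boxed keys of different types miss even when the numeric values are equal.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the problem: with boxed join keys of different types, a hash
// lookup misses because Integer.equals(Long) is always false, even though
// the numeric values (and even the hash codes, here) are equal.
public class BoxedKeyMiss {
    public static void main(String[] args) {
        Map<Object, String> build = new HashMap<>();
        Long longValue = 2L;
        build.put(longValue, "t2 row");          // build side keyed on a Long

        Integer intValue = 2;
        System.out.println(build.get(intValue)); // null: Integer(2) never equals Long(2L)

        // One workaround, per the description: normalize keys to strings.
        Map<Object, String> normalized = new HashMap<>();
        normalized.put(String.valueOf(longValue), "t2 row");
        System.out.println(normalized.get(String.valueOf(intValue))); // t2 row
    }
}
```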

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-11 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/12/19 4:04 AM:


[~julianhyde], [~zabetak], [~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never taken, I changed the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now a join rel node will be converted to an EnumerableJoin if it has mixed 
equi and non-equi conditions; see 
[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition; I introduced the 
remainCondition to generate the predicate for the join; see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduced a new method to support a join with a predicate; it doesn't 
affect the old join method; see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]
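The approach above — hashing on the equi keys and evaluating the remaining non-equi condition as a per-row predicate — can be sketched roughly as follows (class and method names are hypothetical, not Calcite's actual API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a hash join with a residual ("remain") condition:
// probe the hash table on the equi key, then filter matches with the
// non-equi part of the join condition.
public class HashJoinWithPredicate {
    // Rows are int[]{key, value}; the equi condition is on column 0 and
    // the remainCondition here is "left.value < right.value".
    static List<int[]> join(List<int[]> left, List<int[]> right) {
        Map<Integer, List<int[]>> built = new HashMap<>();
        for (int[] r : right) {
            built.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : built.getOrDefault(l[0], Collections.emptyList())) {
                if (l[1] < r[1]) { // per-row residual predicate
                    out.add(new int[] {l[0], l[1], r[0], r[1]});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> left = Arrays.asList(new int[]{1, 10}, new int[]{2, 30});
        List<int[]> right = Arrays.asList(new int[]{1, 20}, new int[]{2, 25});
        // Only (1,10)x(1,20) satisfies both the equi and non-equi conditions.
        System.out.println(join(left, right).size()); // 1
    }
}
```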

 

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak],[~hyuan]

I make a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never taken ,I change the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now  a join rel node will be converted  to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

see 
[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition, I introduce a the 
remainCondition to generate the predicate for the join.

see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduce a new  method to support join with predicate,  it doesn't 
affect  the old join method .

see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the sort-merge 
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-11 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~julianhyde], [~zabetak], [~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never taken, I changed the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

Now a join rel node will be converted to an EnumerableJoin if it has mixed 
equi and non-equi conditions; see 
[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

Now EnumerableJoin can handle a per-row condition; I introduced the 
remainCondition to generate the predicate for the join; see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduced a new method to support a join with a predicate; it doesn't 
affect the old join method; see

[https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]

 

 

 

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the sort-merge 
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-04-11 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2973:
--
Summary: Allow theta joins that have equi conditions to be executed using a 
hash join algorithm  (was: Allow theta joins that have equi conditions to be 
executed using a merge join or hash join algorithm)

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the sort-merge 
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a merge join or hash join algorithm

2019-04-11 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/11/19 10:04 AM:
-

[~julianhyde], [~zabetak], good idea.

I just created a new rule for my application, to avoid changing calcite-core.

I'll make a PR later to allow theta joins to be executed using a merge join or 
hash join.

I drew a table to describe the relationship between join types and join operators 
after the redesign:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only* *non-equi condition*|can't be planned|EnumerableThetaJoin |
|*mixed equi and non-equi condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
 EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has both equi and non-equi conditions, we have three 
choices to plan it.

Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule.

What do you think if I introduce a new rule (EnumerableThetaHashJoinRule) to 
allow theta joins to be executed using a hash join?

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators 
after re-desgined:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition*|EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
 EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has  equi and non-equi  conditions meanwhile, we 
have 3 choice to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 

> Allow theta joins that have equi conditions to be executed using a merge join 
> or hash join algorithm
> 
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Now the EnumerableMergeJoinRule only supports an inner equi join.
> If users make a theta-join query for a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the sort-merge 
> join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a merge join or hash join algorithm

2019-04-11 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/11/19 10:04 AM:
-

[~julianhyde], [~zabetak], good idea.

I just created a new rule for my application, to avoid changing calcite-core.

I'll make a PR later to allow theta joins to be executed using a merge join or 
hash join.

I drew a table to describe the relationship between join types and join operators 
after the redesign:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only* *non-equi condition*|*can't be planned*|EnumerableThetaJoin |
|*mixed equi and non-equi condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
 EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has both equi and non-equi conditions, we have three 
choices to plan it.

Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule.

What do you think if I introduce a new rule (EnumerableThetaHashJoinRule) to 
allow theta joins to be executed using a hash join?

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators 
after re-desgined:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition*|can't be planned|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
 EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has  equi and non-equi  conditions meanwhile, we 
have 3 choice to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 

> Allow theta joins that have equi conditions to be executed using a merge join 
> or hash join algorithm
> 
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2973) Allow theta joins that has equi keys to be executed using a merge join or hash join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2973:
--
Summary: Allow theta joins that has equi keys to be executed using a merge 
join or hash join algorithm  (was: Allow theta joins to be executed using a 
merge join algorithm)

> Allow theta joins that has equi keys to be executed using a merge join or 
> hash join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/3/19 3:51 AM:
---

[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I drew a table to describe the relationship between join types and join 
operators after the redesign:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition*|EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
 EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has both equi and non-equi conditions, we have 
three choices to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition*|EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has  ** equi and non-equi  condition meanwhile, we 
have 3 choice to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 

> Allow theta joins to be executed using a merge join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/3/19 3:51 AM:
---

[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I drew a table to describe the relationship between join types and join 
operators after the redesign:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition*|EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
 EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has both equi and non-equi conditions, we have 
three choices to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators 
after re-desgined:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition*|EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
 EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has  ** equi and non-equi  condition meanwhile, we 
have 3 choice to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 

> Allow theta joins to be executed using a merge join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/3/19 3:46 AM:
---

[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition*|EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
 (changed)+
EnumerableFilter
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
 (changed)
 or
 EnumerableHashJoin
 (new)|

If a join is non-inner and has both equi and non-equi conditions, we have 
three choices to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition* **  **  |EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
(changed)
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
(changed)
 or
 EnumerableHashJoin
(new)|

If a join is non-inner and has  ** equi and non-equi  condition meanwhile, we 
have 3 choice to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 

> Allow theta joins to be executed using a merge join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/3/19 3:45 AM:
---

[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition** ** * |EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
EnumerableFilter
 or
 EnumerableMergeJoin(changed)
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin(changed)
 or
 EnumerableHashJoin(new)|

If a join is non-inner and has both equi and non-equi conditions, we have 
three choices to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators:

 
|| ||inner||non-inner||
|*only equi condition*|EnumerableJoin|EnumerableJoin|
|*only*  *non-equi  condition*** ** |EnumerableJoin|EnumerableThetaJoin|
|*mixed equi and non-equi  condition*|EnumerableJoin+EnumerableFilter
or
EnumerableMergeJoin(changed)
 
|EnumerableThetaJoin
or
 EnumerableMergeJoin
or
EnumerableHashJoin|

If a join is non-inner and has  ** equi and non-equi  condition meanwhile, we 
have 3 choice to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 

> Allow theta joins to be executed using a merge join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/3/19 3:45 AM:
---

[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition* **  **  |EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
 EnumerableFilter
 or
 EnumerableMergeJoin
(changed)
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin
(changed)
 or
 EnumerableHashJoin
(new)|

If a join is non-inner and has both equi and non-equi conditions, we have 
three choices to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 


was (Author: hhlai1990):
[~julianhyde],[~zabetak] , good idea.

I just create a new rule for my application, to avoid changing the  
calcite-core.

I'll make a PR later to  allow theta joins to be executed using a merge join or 
hash join.

I draw a table to describe the relationship of join types and join operators:

 
|| ||inner||non-inner ||
|*only equi condition*|EnumerableJoin|EnumerableJoin |
|*only*  *non-equi  condition** ** * |EnumerableJoin|EnumerableThetaJoin |
|*mixed equi and non-equi  condition*|EnumerableJoin+
EnumerableFilter
 or
 EnumerableMergeJoin(changed)
  |EnumerableThetaJoin
 or
 EnumerableMergeJoin(changed)
 or
 EnumerableHashJoin(new)|

If a join is non-inner and has  ** equi and non-equi  condition meanwhile, we 
have 3 choice to plan it.

Now  EnumerableThetaJoin  and EnumerableMergeJoin have a corresponding rule 
respectively, 

What do you think if I introduce a  new rule( EnumerableThetaHashJoinRule) to 
allow theta joins  to be executed using a hash join?

 

 

> Allow theta joins to be executed using a merge join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~julianhyde], [~zabetak], good idea.

I just created a new rule in my application, to avoid changing calcite-core.

I'll make a PR later to allow theta joins to be executed using a merge join or 
hash join.

I drew a table to describe the relationship between join types and join 
operators:

 
|| ||inner||non-inner||
|*only equi condition*|EnumerableJoin|EnumerableJoin|
|*only non-equi condition*|EnumerableJoin|EnumerableThetaJoin|
|*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed)|EnumerableThetaJoin, or EnumerableMergeJoin, or EnumerableHashJoin|

If a join is non-inner and has both equi and non-equi conditions, we have 
three choices to plan it.

Currently EnumerableThetaJoin and EnumerableMergeJoin each have a 
corresponding rule.

What do you think if I introduce a new rule (EnumerableThetaHashJoinRule) to 
allow theta joins to be executed using a hash join?

 

 

> Allow theta joins to be executed using a merge join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm

2019-04-02 Thread Lai Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2973:
--
Summary: Allow theta joins to be executed using a merge join algorithm  
(was: Make EnumerableMergeJoinRule to support a theta join)

> Allow theta joins to be executed using a merge join algorithm
> -
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>
> Currently, the EnumerableMergeJoinRule only supports an inner equi join.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join process will take dozens of times longer than the 
> sort-merge join process.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2973) Make EnumerableMergeJoinRule to support a theta join

2019-04-02 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807393#comment-16807393
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/2/19 9:28 AM:
---

[~julianhyde], consider another query whose join condition contains both an 
equi condition and a non-equi condition:

 
{code:java}
SELECT t1.i_item_desc FROM item t1 LEFT OUTER JOIN item_1 t2 ON 
t1.i_item_sk=t2.i_item_sk and t2.i_item_sk <1{code}
 A merge join is also a good fit for this query, but currently it will be 
converted to a nested-loop join.

I tried replacing the default ENUMERABLE_JOIN_RULE with a customized rule:
{code:java}
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
  // EnumerableJoinRel only supports equi-join. We can put a filter on top
  // if it is an inner join.
  try {
boolean hasEquiKeys = !info.leftKeys.isEmpty()
&& !info.rightKeys.isEmpty();
if (hasEquiKeys) {
  return convertToThetaMergeJoin(rel);
} else {
  return new EnumerableThetaJoin(cluster, traitSet, left, right,
  join.getCondition(), join.getVariablesSet(), join.getJoinType());
}
  } catch (Exception e) {
EnumerableRules.LOGGER.debug(e.toString());
return null;
  }
}
{code}
If the join has equi keys, it will be converted to an EnumerableThetaMergeJoin:
{code:java}
new EnumerableThetaMergeJoin(cluster, traits, left, right, 
info.getEquiCondition(left, right, cluster.getRexBuilder()), 
info.getRemaining(cluster.getRexBuilder()), info.leftKeys, info.rightKeys, 
join.getVariablesSet(), join.getJoinType());{code}
I implemented EnumerableThetaMergeJoin to handle a theta join with equi keys.

The key difference between EnumerableThetaMergeJoin and EnumerableMergeJoin is 
that EnumerableThetaMergeJoin uses a predicate generated from the remaining 
(non-equi) part of the JoinInfo, and that predicate is applied to the 
cartesian result of the merge join.

see 
[https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298]

I made the following changes:
{code:java}
public TResult current() {
  final List list = cartesians.current();
  @SuppressWarnings("unchecked") final TSource left =
  (TSource) list.get(0);
  @SuppressWarnings("unchecked") final TInner right =
  (TInner) list.get(1);
  // apply the residual (non-equi) predicate to each pair from the cartesian result
  boolean isNonEquiPredicateSatisfied = predicate.apply(left, right);
  if (!isNonEquiPredicateSatisfied) {
if (generateNullsOnLeft) {
  return resultSelector.apply(null, right);
}
if (generateNullsOnRight) {
  return resultSelector.apply(left, null);
}
  }
  return resultSelector.apply(left, right);
}
{code}
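As a standalone illustration of that approach (all names here are hypothetical, not Calcite's actual implementation), a sort-merge join can emit the cartesian product of each pair of matching key groups and filter it with the residual predicate, which is the essence of the EnumerableThetaMergeJoin described above:

```java
import java.util.*;

public class MergeJoinResidualSketch {
  // Hypothetical row type: an equi-join key plus a payload value.
  record Row(int key, int val) {}

  // Sort-merge join on the equi key; the residual theta condition
  // (here: left.val < right.val) filters each pair produced from the
  // cartesian product of two matching key groups.
  static List<int[]> join(List<Row> left, List<Row> right) {
    List<Row> l = new ArrayList<>(left), r = new ArrayList<>(right);
    l.sort(Comparator.comparingInt(Row::key));
    r.sort(Comparator.comparingInt(Row::key));
    List<int[]> out = new ArrayList<>();
    int i = 0, j = 0;
    while (i < l.size() && j < r.size()) {
      int cmp = Integer.compare(l.get(i).key(), r.get(j).key());
      if (cmp < 0) { i++; continue; }
      if (cmp > 0) { j++; continue; }
      // Equal keys: walk the cartesian product of the two key groups,
      // applying the residual predicate to each pair.
      int jStart = j;
      while (i < l.size() && l.get(i).key() == r.get(jStart).key()) {
        for (j = jStart; j < r.size() && r.get(j).key() == l.get(i).key(); j++) {
          if (l.get(i).val() < r.get(j).val()) {   // residual theta condition
            out.add(new int[]{l.get(i).val(), r.get(j).val()});
          }
        }
        i++;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> left = List.of(new Row(1, 10), new Row(1, 30), new Row(2, 5));
    List<Row> right = List.of(new Row(1, 20), new Row(2, 50));
    for (int[] p : join(left, right)) {
      System.out.println(p[0] + "," + p[1]);
    }
  }
}
```

Note this sketch covers only the inner-join case; the null-generating branches for outer joins (as in the modified current() above) are omitted for brevity.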
 

 


was (Author: hhlai1990):
[~julianhyde] , consider another query that the join conditions contains an 
equi condition and a non-equi condition meanwhile :

 
{code:java}
SELECT t1.i_item_desc FROM item t1 LEFT OUTER JOIN item_1 t2 ON 
t1.i_item_sk=t2.i_item_sk and t2.i_item_sk <1{code}
 Merge join is  also good for this query. But now it will be converted to a 
nested loop join.

 I have a try to replace the default ENUMERABLE_JOIN_RULE by a customized rule:
{code:java}
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
  // EnumerableJoinRel only supports equi-join. We can put a filter on top
  // if it is an inner join.
  try {
boolean hasEquiKeys = !info.leftKeys.isEmpty()
&& !info.rightKeys.isEmpty();
if (hasEquiKeys) {
  return convertToThetaMergeJoin(rel);
} else {
  return new EnumerableThetaJoin(cluster, traitSet, left, right,
  join.getCondition(), join.getVariablesSet(), join.getJoinType());
}
  } catch (Exception e) {
EnumerableRules.LOGGER.debug(e.toString());
return null;
  }
}
{code}
 if the join has equi-keys, it will convert the join rel to a 
EnumerableThetaMergeJoin .
{code:java}
new EnumerableThetaMergeJoin(cluster, traits, left, right, 
info.getEquiCondition(left, right, cluster.getRexBuilder()), 
info.getRemaining(cluster.getRexBuilder()), info.leftKeys, info.rightKeys, 
join.getVariablesSet(), join.getJoinType());{code}
I implement the  EnumerableThetaMergeJoin to handle a theta join with equi keys 
.

The key difference of  EnumerableThetaMergeJoin and  EnumerableMergeJoin is 
that:

EnumerableThetaMergeJoin use a predicate generated by the remaining part of the 
JoinInfo,

and the  predicate will be applied on the cartesians result  of a merge join.

see 
[https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298]

 I do some changes:
{code:java}
public TResult current() {
  final List list = 
