[jira] [Commented] (CALCITE-4786) Facilitate use of graalvm native-image compilation
[ https://issues.apache.org/jira/browse/CALCITE-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452786#comment-17452786 ] Lai Zhou commented on CALCITE-4786: --- Sounds like an exciting project. (y) > Facilitate use of graalvm native-image compilation > -- > > Key: CALCITE-4786 > URL: https://issues.apache.org/jira/browse/CALCITE-4786 > Project: Calcite > Issue Type: Improvement > Components: build >Reporter: Jacques Nadeau >Priority: Major > > Right now, there are a number of things that make it difficult to use Calcite > with GraalVM native compilation. > There are several reasons why supporting this kind of compilation could be > beneficial: > - Enable use of Calcite as a Lambda with minimal startup time > - Create a Calcite shared library that can be easily embedded in other > languages > Initially, I would focus this work on core parsing and query planning. > This work was inspired by work on https://substrait.io > Let's use this ticket to track improvements that can be done to enable this. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou closed CALCITE-2741. - Resolution: Not A Problem > Add operator table with Hive-specific built-in functions > > > Key: CALCITE-2741 > URL: https://issues.apache.org/jira/browse/CALCITE-2741 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > I wrote a Hive adapter for Calcite to support Hive SQL, including > UDF, UDAF, UDTF and some SqlSpecialOperators. > What do you think of supporting a direct implementation of Hive SQL like this? > I think it will be valuable when someone wants to migrate their Hive ETL jobs to > a real-time scenario. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CALCITE-2992) Enhance implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou closed CALCITE-2992. - Resolution: Fixed > Enhance implicit conversions when generating hash join keys for an > equiCondition > > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > As is known, in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > // boxed values of different types never compare equal: > intValue.equals(longValue); // false > {code} > We shouldn't use the original boxed Object as a key in the HashMap; > I think it'd be better to convert hash join keys to strings and compare string > values. -- This message was sent by Atlassian Jira (v8.3.4#803005)
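The mismatch described in the ticket above can be reproduced in a few lines of plain Java. The sketch below is hypothetical (the class and method names are invented for illustration; it is not Calcite code): a probe key boxed as Integer never equals a build key boxed as Long, so a HashMap lookup on the raw boxed values misses. Widening both sides to the wider numeric type is one possible fix, shown here as an alternative to the string conversion proposed in the ticket.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class illustrating the key-type mismatch.
public class BoxedJoinKeyDemo {

    // Build side keyed on a Long; probe with an Integer. Integer 2 and
    // Long 2L even share a hash code, but Integer.equals(Long) is always
    // false, so the lookup misses.
    static boolean rawKeysMatch() {
        Map<Object, String> buildSide = new HashMap<>();
        buildSide.put(2L, "row from t2");
        return buildSide.containsKey(2); // Integer probe: miss
    }

    // Widening both sides to the wider numeric type before hashing makes
    // the lookup succeed.
    static boolean widenedKeysMatch() {
        Map<Object, String> buildSide = new HashMap<>();
        buildSide.put(2L, "row from t2");
        Integer probe = 2;
        return buildSide.containsKey(probe.longValue()); // boxed as Long: hit
    }

    public static void main(String[] args) {
        System.out.println(rawKeysMatch());     // false
        System.out.println(widenedKeysMatch()); // true
    }
}
```

Widening to the join's common type avoids the per-row string allocation that string conversion would incur.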
[jira] [Created] (CALCITE-4161) MergeJoin algorithm should not assume inputs sorted in ascending order
Lai Zhou created CALCITE-4161: - Summary: MergeJoin algorithm should not assume inputs sorted in ascending order Key: CALCITE-4161 URL: https://issues.apache.org/jira/browse/CALCITE-4161 Project: Calcite Issue Type: Bug Components: core, linq4j Affects Versions: 1.24.0 Reporter: Lai Zhou Given a SQL query: {code:java} select id,first_name,vs.specialty_id from vets join vet_specialties vs on vets.id = vs.vet_id and vet_id>1 order by id desc limit 100 {code} the final plan is: {code:java} EnumerableCalc(expr#0..3=[{inputs}], proj#0..1=[{exprs}], specialty_id=[$t3]) EnumerableLimit(fetch=[100]) EnumerableMergeJoin(condition=[=($0, $2)], joinType=[inner]) EnumerableSort(sort0=[$0], dir0=[DESC]) EnumerableCalc(expr#0..2=[{inputs}], proj#0..1=[{exprs}]) JdbcToEnumerableConverter JdbcFilter(condition=[>($0, 1)]) JdbcTableScan(table=[[default, vets]]) EnumerableSort(sort0=[$0], dir0=[DESC]) JdbcToEnumerableConverter JdbcFilter(condition=[>($0, 1)]) JdbcTableScan(table=[[default, vet_specialties]]) {code} The inputs of the EnumerableMergeJoin are sorted in descending order, but the MergeJoinEnumerator only supports inputs sorted in ascending order, so the result is wrong. I think MergeJoin should not assume its inputs are sorted in ascending order; it should take the inputs' sort order into account. -- This message was sent by Atlassian Jira (v8.3.4#803005)
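One way to remove the ascending-order assumption is to route every key comparison through the inputs' declared sort direction, so the "which side is behind" decision flips under DESC inputs. The sketch below is hypothetical (it is not Calcite's MergeJoinEnumerator; the names are invented, rows are bare ints, and duplicate-key runs are not handled):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a merge join over two sorted integer arrays that
// honors the inputs' sort direction instead of assuming ascending order.
public class MergeJoinDirectionDemo {

    static List<int[]> mergeJoin(int[] left, int[] right, boolean descending) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            int c = Integer.compare(left[i], right[j]);
            if (descending) {
                c = -c; // under DESC, the larger key is "earlier"
            }
            if (c < 0) {
                i++; // left side is behind in sort order: advance it
            } else if (c > 0) {
                j++; // right side is behind
            } else {
                out.add(new int[] {left[i], right[j]}); // keys match
                j++; // duplicate-key runs are omitted in this sketch
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both inputs sorted DESC, as in the plan above; keys 5 and 2 match.
        List<int[]> rows = mergeJoin(new int[] {5, 3, 2}, new int[] {5, 2, 1}, true);
        System.out.println(rows.size()); // 2
    }
}
```

The same loop with `descending = false` handles ascending inputs, which is the point of the ticket: the algorithm only needs the direction as an input, not a fixed assumption.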
[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions
[ https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915562#comment-16915562 ] Lai Zhou commented on CALCITE-3284: --- Yes, you're right. I have pushed a new commit that resolves this issue. > Enumerable hash semijoin / antijoin support non-equi join conditions > > > Key: CALCITE-3284 > URL: https://issues.apache.org/jira/browse/CALCITE-3284 > Project: Calcite > Issue Type: Improvement >Reporter: Haisheng Yuan >Priority: Major > > Calcite should be able to generate enumerable hash semijoin / antijoin with > non-equi join conditions, as long as there are equi-join conditions, so that > we can do a hash lookup. -- This message was sent by Atlassian Jira (v8.3.2#803003)
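The quoted description maps onto a simple two-phase algorithm: hash on the equi keys, then evaluate the non-equi predicate on each hash match. The sketch below is an illustrative stand-in, not the committed Calcite change; the class, row layout (column 0 as the equi key) and residual predicate are all invented:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: hash semi/anti join where the equi part drives the
// hash lookup and the non-equi part is a residual predicate checked on
// each candidate match.
public class HashSemiAntiJoinDemo {

    static List<int[]> join(int[][] probe, int[][] build,
                            BiPredicate<int[], int[]> residual, boolean anti) {
        // Build a hash table on the equi key (column 0 here).
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] row : build) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] p : probe) {
            boolean matched = table.getOrDefault(p[0], new ArrayList<>())
                .stream().anyMatch(b -> residual.test(p, b));
            if (matched != anti) { // semi: any match passes; anti: no match passes
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] depts = {{10, 1}, {20, 2}, {30, 3}};
        int[][] emps = {{20, 100}, {30, 1}};
        // equi: deptno = deptno; residual (invented): dept col1 < emp col1
        System.out.println(join(depts, emps, (d, e) -> d[1] < e[1], false).size()); // semi: 1
        System.out.println(join(depts, emps, (d, e) -> d[1] < e[1], true).size());  // anti: 2
    }
}
```

Note the semi and anti variants differ only in whether the "any match" flag is required or forbidden, which is why a single hash-join kernel can serve both.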
[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions
[ https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915531#comment-16915531 ] Lai Zhou commented on CALCITE-3284: --- [~rubenql], since Calcite doesn't support the `semi join` or `anti join` keywords yet, how can we construct a SQL query to test a semi join with non-equi conditions? And an anti join with non-equi conditions? > Enumerable hash semijoin / antijoin support non-equi join conditions > > > Key: CALCITE-3284 > URL: https://issues.apache.org/jira/browse/CALCITE-3284 > Project: Calcite > Issue Type: Improvement >Reporter: Haisheng Yuan >Priority: Major > > Calcite should be able to generate enumerable hash semijoin / antijoin with > non-equi join conditions, as long as there are equi-join conditions, so that > we can do a hash lookup. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Issue Comment Deleted] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions
[ https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-3284: -- Comment: was deleted (was: [~rubenql], Does Calcite already support the semi join/anti join keywords? I intend to run {code:java} SELECT d.deptno, d.name FROM depts d semi join emps e on d.deptno=e.deptno and d.deptno >10{code} but `semi join` is not supported now. So how should I write a test case for a semi/anti join with a non-equi condition?) > Enumerable hash semijoin / antijoin support non-equi join conditions > > > Key: CALCITE-3284 > URL: https://issues.apache.org/jira/browse/CALCITE-3284 > Project: Calcite > Issue Type: Improvement >Reporter: Haisheng Yuan >Priority: Major > > Calcite should be able to generate enumerable hash semijoin / antijoin with > non-equi join conditions, as long as there are equi-join conditions, so that > we can do a hash lookup. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions
[ https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915526#comment-16915526 ] Lai Zhou commented on CALCITE-3284: --- [~rubenql], Does Calcite already support the semi join/anti join keywords? I intend to run {code:java} SELECT d.deptno, d.name FROM depts d semi join emps e on d.deptno=e.deptno and d.deptno >10{code} but `semi join` is not supported now. So how should I write a test case for a semi/anti join with a non-equi condition? > Enumerable hash semijoin / antijoin support non-equi join conditions > > > Key: CALCITE-3284 > URL: https://issues.apache.org/jira/browse/CALCITE-3284 > Project: Calcite > Issue Type: Improvement >Reporter: Haisheng Yuan >Priority: Major > > Calcite should be able to generate enumerable hash semijoin / antijoin with > non-equi join conditions, as long as there are equi-join conditions, so that > we can do a hash lookup. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (CALCITE-3284) Enumerable hash semijoin / antijoin support non-equi join conditions
[ https://issues.apache.org/jira/browse/CALCITE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914082#comment-16914082 ] Lai Zhou commented on CALCITE-3284: --- I'll resolve this issue in this PR: [https://github.com/apache/calcite/pull/1156] > Enumerable hash semijoin / antijoin support non-equi join conditions > > > Key: CALCITE-3284 > URL: https://issues.apache.org/jira/browse/CALCITE-3284 > Project: Calcite > Issue Type: Improvement >Reporter: Haisheng Yuan >Priority: Major > > Calcite should be able to generate enumerable hash semijoin / antijoin with > non-equi join conditions, as long as there are equi-join conditions, so that > we can do a hash lookup. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913882#comment-16913882 ] Lai Zhou commented on CALCITE-2973: --- I noticed this performance issue before. I'll try to find a better way to handle it. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 6h 50m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912868#comment-16912868 ] Lai Zhou commented on CALCITE-2973: --- [~rubenql], [~hyuan], [~julianhyde], [~danny0405], the PR is ready; would someone help to review it? > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852674#comment-16852674 ] Lai Zhou commented on CALCITE-2973: --- [~rubenql], [~michaelmior], the patch is now good enough to be merged. I adopted my initial solution to support non-inner joins with mixed conditions (equi conditions and non-equi conditions): introducing an EnumerablePredicativeHashJoin (which I previously called EnumerableThetaHashJoin). EnumerablePredicativeHashJoin and EnumerableHashJoin share the same hash join algorithm, but EnumerablePredicativeHashJoin extends Join rather than EquiJoin. I believe this solution will do no harm to the current rules, but in the long term we'd better change EnumerableHashJoin to extend Join. [~hyuan] created an issue to work on this, see https://issues.apache.org/jira/browse/CALCITE-3089. So I think we can resolve this issue first. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
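The split described in the comment above (a shared hash-join kernel on the equi keys, with the non-equi part evaluated as a residual predicate on each hash match) can be sketched in a few lines. This is an illustrative stand-in, not the EnumerablePredicativeHashJoin implementation; the class name, row layout (column 0 as the equi key) and residual predicate are invented:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: inner hash join with an equi key plus a residual
// (non-equi) predicate evaluated on each candidate match.
public class PredicativeHashJoinDemo {

    static List<int[]> hashJoin(int[][] build, int[][] probe,
                                BiPredicate<int[], int[]> residual) {
        // Build phase: hash the build side on the equi key (column 0).
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] row : build) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        // Probe phase: hash lookup on the equi key, then apply the residual.
        List<int[]> out = new ArrayList<>();
        for (int[] p : probe) {
            for (int[] b : table.getOrDefault(p[0], new ArrayList<>())) {
                if (residual.test(b, p)) {
                    out.add(new int[] {b[0], b[1], p[1]}); // emit joined row
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] t1 = {{1, 10}, {2, 20}};
        int[][] t2 = {{1, 5}, {2, 30}};
        // equi: t1.c0 = t2.c0; residual (invented): t1.c1 < t2.c1
        List<int[]> rows = hashJoin(t1, t2, (b, p) -> b[1] < p[1]);
        System.out.println(rows.size()); // 1: only the key-2 pair passes the residual
    }
}
```

The hash lookup prunes to equi-key matches first, which is why this beats a nested loop even though the residual predicate is still evaluated row by row.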
[jira] [Commented] (CALCITE-2992) Enhance implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852603#comment-16852603 ] Lai Zhou commented on CALCITE-2992: --- Maybe we can extend the CAST translation to include this conversion logic, making the solution more generic. But it will take a lot of work to make the validator handle implicit casting; I'll spend some time on the related issues. > Enhance implicit conversions when generating hash join keys for an > equiCondition > > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > As is known, in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > // boxed values of different types never compare equal: > intValue.equals(longValue); // false > {code} > We shouldn't use the original boxed Object as a key in the HashMap; > I think it'd be better to convert hash join keys to strings and compare string > values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-3071) Cache the whole sql plan to reduce the latency and improve the performance
[ https://issues.apache.org/jira/browse/CALCITE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-3071: -- Description: In real-world business, when SQL queries become complex, the overhead of SQL planning increases quickly, and many of the SQL queries are duplicates. We already have some caching work aimed at improving performance, such as https://issues.apache.org/jira/browse/CALCITE-2703, which reduces code generation and class loading overhead when executing queries in the EnumerableConvention, but I think it's not enough. I propose to cache the whole SQL plan to reduce latency: for the same SQL, ignoring cost-based optimization on statistics here, we can cache the generated code for it. I use the FrameworkConfig API to execute SQL queries, and in this way I can easily do this job. But it's not easy to cache a whole SQL execution plan (that is, the generated code) in the SQL processing flow based on a JDBC Connection, because there are many intermediate states in this processing flow. Let's discuss this feature and the probable solutions. was: In real-world business, when SQL queries become complex, the overhead of SQL planning increases quickly, and many of the SQL queries are duplicates. We already do something with caching to improve performance, such as https://issues.apache.org/jira/browse/CALCITE-2703, which reduces code generation and class loading overhead when executing queries in the EnumerableConvention, but I think it's not enough. I propose to cache the whole SQL plan to reduce latency: for the same SQL, ignoring cost-based optimization on statistics here, we can cache the generated code for it. I use the FrameworkConfig API to execute SQL queries, and in this way I can easily do this job. But it's not easy to cache a whole SQL execution plan (that is, the generated code) in the SQL processing flow based on a JDBC Connection, because there are many intermediate states in this processing flow.
Let's discuss this feature and the probable solutions. > Cache the whole sql plan to reduce the latency and improve the performance > -- > > Key: CALCITE-3071 > URL: https://issues.apache.org/jira/browse/CALCITE-3071 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > In real-world business, when SQL queries become complex, the overhead of SQL planning > increases quickly, and many of the SQL queries are duplicates. > We already have some caching work aimed at improving performance, such as > https://issues.apache.org/jira/browse/CALCITE-2703, > which reduces code generation and class loading overhead when executing > queries in the EnumerableConvention, but I think it's not enough. > I propose to cache the whole SQL plan to reduce latency: for the same SQL, > ignoring cost-based optimization on statistics here, we can cache the > generated code for it. > I use the FrameworkConfig API to execute SQL queries, and in this way I can > easily do this job. > But it's not easy to cache a whole SQL execution plan (that is, the generated > code) in the SQL processing flow based on a JDBC Connection, because there are > many intermediate states in this processing flow. > > Let's discuss this feature and the probable solutions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3071) Cache the whole sql plan to reduce the latency and improve the performance
Lai Zhou created CALCITE-3071: - Summary: Cache the whole sql plan to reduce the latency and improve the performance Key: CALCITE-3071 URL: https://issues.apache.org/jira/browse/CALCITE-3071 Project: Calcite Issue Type: Improvement Components: core Affects Versions: 1.19.0 Reporter: Lai Zhou In real-world business, when SQL queries become complex, the overhead of SQL planning increases quickly, and many of the SQL queries are duplicates. We already do something with caching to improve performance, such as https://issues.apache.org/jira/browse/CALCITE-2703, which reduces code generation and class loading overhead when executing queries in the EnumerableConvention, but I think it's not enough. I propose to cache the whole SQL plan to reduce latency: for the same SQL, ignoring cost-based optimization on statistics here, we can cache the generated code for it. I use the FrameworkConfig API to execute SQL queries, and in this way I can easily do this job. But it's not easy to cache a whole SQL execution plan (that is, the generated code) in the SQL processing flow based on a JDBC Connection, because there are many intermediate states in this processing flow. Let's discuss this feature and the probable solutions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
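Ignoring statistics-dependent choices, the proposal above amounts to memoizing the fully prepared plan on the SQL text. A hypothetical sketch (none of these names are Calcite APIs; the Function merely stands in for the generated code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache the prepared plan keyed on the SQL text, so
// repeated identical queries skip parsing, validation, optimization and
// code generation. Cost choices that depend on changing statistics are
// deliberately ignored, as the ticket proposes.
public class PlanCacheDemo {

    private final Map<String, Function<Object[], Object>> cache = new ConcurrentHashMap<>();
    int compilations = 0; // counts how often the expensive prepare runs

    Function<Object[], Object> plan(String sql) {
        return cache.computeIfAbsent(sql, s -> {
            compilations++;            // expensive prepare happens once per SQL text
            return args -> s.length(); // placeholder for the generated code
        });
    }

    public static void main(String[] args) {
        PlanCacheDemo demo = new PlanCacheDemo();
        demo.plan("select * from t");
        demo.plan("select * from t"); // cache hit: no recompilation
        System.out.println(demo.compilations); // 1
    }
}
```

A real cache would also need invalidation on schema change and an eviction policy, which is exactly the kind of intermediate state the ticket says is hard to manage in the JDBC Connection flow.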
[jira] [Updated] (CALCITE-3069) Make the JDBC Connection more extensible like the FrameworkConfig API
[ https://issues.apache.org/jira/browse/CALCITE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-3069: -- Description: More and more users are interested in building custom SQL engines on top of Calcite. But different SQL engines differ in SQL parsing, expression conversion, implicit type casting, and even the physical implementations of the logical plan. I think the FrameworkConfig API now provides a better way than the JDBC Connection to customize these things. Are there any plans in the roadmap to enhance the JDBC Connection config, like the FrameworkConfig API, to improve Calcite's extensibility? Otherwise, implementing the whole physical plan like the default Enumerable implementation is tedious and also requires a lot of work. Maybe we can do something to make the physical and execution plan (that is, the generated code) more customizable. Are there any thoughts on this issue? was: More and more users are interested in building custom SQL engines on top of Calcite. But different SQL engines differ in SQL parsing, expression conversion, implicit type casting, and even the physical implementations of the logical plan. I think the FrameworkConfig API now provides a better way than the JDBC Connection to customize these things. Are there any plans in the roadmap to enhance the JDBC Connection config, like the FrameworkConfig API, to improve Calcite's extensibility? Otherwise, implementing the whole physical plan like the default Enumerable implementation is tedious and also requires a lot of work. Maybe we can do something to make the physical and execution plan (that is, the generated code) more customizable. Are there any thoughts on this issue?
> Make the JDBC Connection more extensible like the FrameworkConfig API > - > > Key: CALCITE-3069 > URL: https://issues.apache.org/jira/browse/CALCITE-3069 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > More and more users are interested in building custom SQL engines on top of > Calcite. > But different SQL engines differ in SQL parsing, > expression conversion, implicit type casting, and even the physical > implementations of the logical plan. > I think the FrameworkConfig API now provides a better way than the JDBC > Connection to customize these things. Are there any plans in the roadmap to > enhance the JDBC Connection config, like the FrameworkConfig API, to improve > Calcite's extensibility? > Otherwise, implementing the whole physical plan like the default > Enumerable implementation is tedious and also requires a lot of work. > Maybe we can do something to make the physical and execution plan (that is, > the generated code) more customizable. > Are there any thoughts on this issue? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3069) Make the JDBC Connection more extensible like the FrameworkConfig API
Lai Zhou created CALCITE-3069: - Summary: Make the JDBC Connection more extensible like the FrameworkConfig API Key: CALCITE-3069 URL: https://issues.apache.org/jira/browse/CALCITE-3069 Project: Calcite Issue Type: Improvement Components: core Affects Versions: 1.19.0 Reporter: Lai Zhou More and more users are interested in building custom SQL engines on top of Calcite. But different SQL engines differ in SQL parsing, expression conversion, implicit type casting, and even the physical implementations of the logical plan. I think the FrameworkConfig API now provides a better way than the JDBC Connection to customize these things. Are there any plans in the roadmap to enhance the JDBC Connection config, like the FrameworkConfig API, to improve Calcite's extensibility? Otherwise, implementing the whole physical plan like the default Enumerable implementation is tedious and also requires a lot of work. Maybe we can do something to make the physical and execution plan (that is, the generated code) more customizable. Are there any thoughts on this issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840251#comment-16840251 ] Lai Zhou commented on CALCITE-2973: --- [~rubenql], good analysis. I tested this solution, but there are still some failing tests; the report: {code:java} [ERROR] Tests run: 5018, Failures: 47, Errors: 7, Skipped: 115 [ERROR] Errors: [ERROR] LatticeSuggesterTest.testEmpDept:76 » IndexOutOfBounds index (8) must be less ... [ERROR] LatticeSuggesterTest.testExpressionInAggregate:272 » IndexOutOfBounds index (3... [ERROR] LatticeSuggesterTest.testFoodMartAll:389->checkFoodMartAll:301 » IndexOutOfBounds [ERROR] LatticeSuggesterTest.testFoodMartAllEvolve:393->checkFoodMartAll:301 » IndexOutOfBounds [ERROR] LatticeSuggesterTest.testFoodmart:153 » IndexOutOfBounds index (17) must be le... [ERROR] LatticeSuggesterTest.testSharedSnowflake:264 » IndexOutOfBounds index (31) mus... [ERROR] MaterializationTest.testJoinMaterialization9:1825->checkMaterialize:202->checkMaterialize:210 » SQL {code} Checking LatticeSuggesterTest.testSharedSnowflake, I found that the !join.analyzeCondition().isEqui() check did harm to this query. If I keep the line as {code:java} !(join instanceof EquiJoin) {code} almost all of the reported failing tests succeed, except MaterializationTest.testJoinMaterialization9. You can change this line to find more details. I think this modification is not safe. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840190#comment-16840190 ] Lai Zhou commented on CALCITE-2973: --- [~rubenql], thanks, I understand it. When creating a SemiJoin from an EnumerableJoin, the remainCondition is lost. This brings me back to my previous question: should we define EnumerableJoin as an EquiJoin or as a pure Join? If it's an EquiJoin, the condition just contains the equi part. If we change EnumerableJoin to a pure Join, it will cause some other problems, such as the FilterJoinRule no longer working. My initial solution is to introduce an EnumerableThetaHashJoin to handle a non-inner join that contains a remainCondition. This EnumerableThetaHashJoin is more like an EnumerableThetaJoin, which is a Join rather than an EquiJoin, and EnumerableThetaHashJoin and Enumerable(Hash)Join can share the same hash join algorithm. I think this solution is clearer and will do no harm to the current rules. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965 ] Lai Zhou edited comment on CALCITE-2973 at 5/15/19 7:26 AM: [~rubenql], now an inner join with a remainCondition won't be converted to an inner join plus a filter; the Enumerable(Hash)Join can handle it in a generic way. But some tests fail; we may have introduced a bug by removing the filter. I'll check the problems. [~zabetak], [~rubenql], I found a problem with inner joins; please check out the current codebase and see the test case JdbcAdapterTest.testScalarSubQuery. For the non-equi join SQL {code:java} CalciteAssert.model(JdbcTest.SCOTT_MODEL) .query("SELECT COUNT(ename) AS cEname FROM \"SCOTT\".\"EMP\" " + "WHERE DEPTNO > (SELECT deptno FROM \"SCOTT\".\"DEPT\" " + "WHERE dname = 'ACCOUNTING')") .enable(CalciteAssert.DB == CalciteAssert.DatabaseInstance.HSQLDB) .returns("CENAME=11\n"); {code} Before, the generated plan was: {code:java} EnumerableAggregate(group=[{}], CENAME=[COUNT($1)]) EnumerableCalc(expr#0..2=[{inputs}], expr#3=[>($t2, $t0)], proj#0..2=[{exprs}], $condition=[$t3]) EnumerableJoin(condition=[true], joinType=[inner], remainCondition=[>($2, $0)]) EnumerableAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)]) JdbcToEnumerableConverter JdbcFilter(condition=[=($1, 'ACCOUNTING')]) JdbcTableScan(table=[[SCOTT, DEPT]]) JdbcToEnumerableConverter JdbcProject(ENAME=[$1], DEPTNO=[$7]) JdbcTableScan(table=[[SCOTT, EMP]]) {code} After replacing the filter with a remainCondition, the planner picks a semi-join-based plan as the best plan, but it is a bad plan.
{code:java} EnumerableAggregate(group=[{}], CENAME=[COUNT($0)]) EnumerableSemiJoin(condition=[true], joinType=[inner]) JdbcToEnumerableConverter JdbcProject(ENAME=[$1], DEPTNO=[$7]) JdbcTableScan(table=[[SCOTT, EMP]]) JdbcToEnumerableConverter JdbcFilter(condition=[=($1, 'ACCOUNTING')]) JdbcTableScan(table=[[SCOTT, DEPT]]) {code} The condition of this logical SemiJoin was true. Could you help me identify this problem? was (Author: hhlai1990): [~rubenql], now an inner join with a remainCondition won't be converted to an inner join plus a filter; the Enumerable(Hash)Join can handle it in a generic way. But some tests fail; we may have introduced a bug by dropping the filter. I'll check the problems. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965 ] Lai Zhou edited comment on CALCITE-2973 at 5/15/19 3:35 AM: [~rubenql], the inner join with a remainCondition is now no longer converted to an inner join plus a filter; Enumerable(Hash)Join can handle it in a generic way. But some tests are failing; dropping the filter may have introduced a bug. I'll look into the problems. was (Author: hhlai1990): [~rubenql], now the inner join with a remainCondtion won't be converted to an inner-join plus a filter , the Enumerable(Hash)Join can handle it in a generic way.
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965 ] Lai Zhou edited comment on CALCITE-2973 at 5/15/19 3:13 AM: [~rubenql], the inner join with a remainCondition is now no longer converted to an inner join plus a filter; Enumerable(Hash)Join can handle it in a generic way. was (Author: hhlai1990): [~rubenql], now the inner join with a remainCondtion won't be converted to an inner-join and a filter , the Enumerable(Hash)Join can handle it.
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839965#comment-16839965 ] Lai Zhou commented on CALCITE-2973: --- [~rubenql], the inner join with a remainCondition is now no longer converted to an inner join plus a filter; Enumerable(Hash)Join can handle it.
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838611#comment-16838611 ] Lai Zhou commented on CALCITE-2973: --- [~rubenql], I agree with you. It's a good idea to use approach 1 to handle the inner-join case. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of time than the sort-merge > join process . > So if we can apply merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2173) Sample implementation of ArrowAdapter
[ https://issues.apache.org/jira/browse/CALCITE-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838403#comment-16838403 ] Lai Zhou commented on CALCITE-2173: --- [~masayuki038], I'll take some time to review your code; I'm only just beginning to get familiar with Arrow. (y) > Sample implementation of ArrowAdapter > - > > Key: CALCITE-2173 > URL: https://issues.apache.org/jira/browse/CALCITE-2173 > Project: Calcite > Issue Type: Improvement >Reporter: Masayuki Takahashi >Priority: Minor > > I try to implement Apache Arrow adaper. > [https://github.com/masayuki038/calcite/tree/arrow2/arrow/src/main/java/org/apache/calcite/adapter/arrow] > Issues: > * Add ArrowJoin, ArrowUnion, etc.. > * This Arrow Adapter use org.apache.calcite.adapter.enumerable.PhysTypeImpl. > So I have added 'of' method on PhysType to create PhysTypeImpl instance since > it can't access from arrow package. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838246#comment-16838246 ] Lai Zhou commented on CALCITE-2973: --- [~zabetak], for the query you mentioned,
{code:java}
SELECT e.name FROM emp e INNER JOIN department d ON e.address.zipcode = d.zipcode
{code}
I added a test for it, and I found that the RexFieldAccess `e.address.zipcode` is converted to a new RexInputRef; that conversion is made by JoinPushExpressionsRule, see [https://github.com/apache/calcite/blob/6afa38bae794462e6e250237a1b60cc4220b2885/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L3290]. Please see the latest commit; there's a test named `leftOuterJoinWithPredicateContainsRexFieldAccess` in EnumerableJoinTest. I admit the rule-based approach you proposed is also good for this issue, but I still think it's a little complicated, and it seems to increase the computation overhead if we introduce a new projection. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of time than the sort-merge > join process . > So if we can apply merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
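For readers unfamiliar with what JoinPushExpressionsRule achieves here, a minimal sketch in plain Java (not Calcite relational operators; the `Emp`/`Address` classes and the `joinNames` method are invented for illustration): the nested access `e.address.zipcode` is evaluated in a projection pushed below the join, so the join condition afterwards compares two flat columns — which is exactly the "new RexInputRef" mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of projection push-down: evaluate a nested field access below the
 *  join, so the join predicate becomes a plain column-to-column comparison. */
public class PushExpressionBelowJoin {
    static final class Address { final String zipcode; Address(String z) { zipcode = z; } }
    static final class Emp { final String name; final Address address;
        Emp(String n, Address a) { name = n; address = a; } }

    static List<String> joinNames(List<Emp> emps, List<String> deptZipcodes) {
        // Step 1 -- the pushed-down projection: compute e.address.zipcode
        // below the join, producing a flat column alongside the name.
        List<String[]> projected = new ArrayList<>();
        for (Emp e : emps) projected.add(new String[]{e.name, e.address.zipcode});
        // Step 2 -- the join condition is now an equality of flat columns,
        // i.e. a plain input reference on each side.
        List<String> out = new ArrayList<>();
        for (String[] row : projected)
            for (String z : deptZipcodes)
                if (row[1].equals(z)) out.add(row[0]);
        return out;
    }

    public static void main(String[] args) {
        List<Emp> emps = List.of(new Emp("a", new Address("100")),
                                 new Emp("b", new Address("200")));
        System.out.println(joinNames(emps, List.of("100"))); // prints [a]
    }
}
```

Once the condition is a flat-column equality, a hash-join rule can recognize the equi key directly, which is why the rule matters for this ticket.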
[jira] [Commented] (CALCITE-2173) Sample implementation of ArrowAdapter
[ https://issues.apache.org/jira/browse/CALCITE-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836862#comment-16836862 ] Lai Zhou commented on CALCITE-2173: --- [~masayuki038], I know what you mean. This Arrow adapter is just an adapter that connects to a data source in Arrow format. As you said, executing the query in parallel is a good way to improve performance. Another way is to organize the in-memory data in a columnar format and enable vectorized expression execution. Both of these approaches require rewriting the implementations of the operators.
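The row-wise versus columnar distinction in the comment above can be illustrated with a toy example (plain Java, not Arrow; the class and method names are invented): the columnar variant evaluates an expression over whole primitive arrays in one tight loop, which is the code shape that vectorized execution — and the JIT's auto-vectorization — can exploit.

```java
/** Sketch contrasting row-at-a-time evaluation with columnar, vectorized
 *  evaluation of the expression x + y over a batch of rows. */
public class VectorizedEval {
    // Row-wise: the engine walks tuple by tuple; in a real interpreter each
    // step typically involves per-row dispatch and boxing overhead.
    static long sumRowWise(int[][] rows) {
        long total = 0;
        for (int[] row : rows) total += row[0] + row[1];
        return total;
    }

    // Columnar: the batch is stored column by column; the addition is a tight
    // loop over primitive arrays that the JIT can unroll and SIMD-vectorize.
    static long sumColumnar(int[] xCol, int[] yCol) {
        long total = 0;
        for (int i = 0; i < xCol.length; i++) total += xCol[i] + yCol[i];
        return total;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 2}, {3, 4}};
        // Same answer either way; only the memory layout and loop shape differ.
        System.out.println(sumRowWise(rows)); // prints 10
        System.out.println(sumColumnar(new int[]{1, 3}, new int[]{2, 4})); // prints 10
    }
}
```

An Arrow-backed operator would work on Arrow's columnar vectors instead of `int[]`, but the execution shape is the same: one expression applied to a whole column batch at a time.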
[jira] [Commented] (CALCITE-2173) Sample implementation of ArrowAdapter
[ https://issues.apache.org/jira/browse/CALCITE-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836434#comment-16836434 ] Lai Zhou commented on CALCITE-2173: --- [~masayuki038], are you still working on this? What's your recent plan? I'm glad to spend some time on this issue with you. If we have the Arrow adapter, we can support vectorized UDF execution; I'd like to see the performance improvement.
[jira] [Commented] (CALCITE-2040) Create adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836422#comment-16836422 ] Lai Zhou commented on CALCITE-2040: --- [~masayuki038] Great job. I added a relation link to https://issues.apache.org/jira/browse/CALCITE-2173 . I think it's more than an `adapter` of Calcite; it may be a new physical implementation, like the default Enumerable implementation. > Create adapter for Apache Arrow > --- > > Key: CALCITE-2040 > URL: https://issues.apache.org/jira/browse/CALCITE-2040 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would > allow people to execute SQL statements, via JDBC or ODBC, on data stored in > Arrow in-memory format. > Since Arrow is an in-memory format, it is not as straightforward as reading, > say, CSV files using the file adapter: an Arrow data set does not have a URL. > (Unless we use Arrow's > [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/] > format, or use an in-memory file system such as Alluxio.) So we would need > to devise a way of addressing Arrow data sets. > Also, since Arrow is an extremely efficient format for processing data, it > would also be good to have Arrow as a calling convention. That is, > implementations of relational operators such as Filter, Project, Aggregate in > addition to just TableScan. > Lastly, when we have an Arrow convention, if we build adapters for file > formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in > CALCITE-2025) it would make a lot of sense to translate those formats > directly into Arrow (applying simple projects and filters first if > applicable). Those adapters would belong as a "contrib" module in the Arrow > project better than in Calcite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836162#comment-16836162 ] Lai Zhou edited comment on CALCITE-2040 at 5/9/19 7:40 AM: --- I think it may improve performance a lot if we have Arrow as a calling convention. [~julianhyde], do you mean that a new kind of Enumerable implementation for Filter, Project, Aggregate and TableScan needs to be introduced? I found that someone has done part of this on GitHub; see [https://github.com/masayuki038/calcite-arrow-sample/blob/master/src/main/scala/net/wrap_trap/calcite_arrow_sample/ArrowTranslatableTable.scala]. It may be a good start. I'm just getting familiar with Arrow, and I'm glad to have a try at making Arrow a calling convention in Calcite. was (Author: hhlai1990): I think it may improve a lot of performance if we have Arrow as a calling convention. [~julianhyde],Do you mean a new kind of Enumerable-implementations for Filter, Project, Aggregate and TableScan need to be introduced ? I'm just getting familiar with Arrow.I'm glad to have a try on making Arrow as a calling convention in Calcite. > Create adapter for Apache Arrow > --- > > Key: CALCITE-2040 > URL: https://issues.apache.org/jira/browse/CALCITE-2040 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would > allow people to execute SQL statements, via JDBC or ODBC, on data stored in > Arrow in-memory format. > Since Arrow is an in-memory format, it is not as straightforward as reading, > say, CSV files using the file adapter: an Arrow data set does not have a URL. > (Unless we use Arrow's > [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/] > format, or use an in-memory file system such as Alluxio.) So we would need > to devise a way of addressing Arrow data sets. 
> Also, since Arrow is an extremely efficient format for processing data, it > would also be good to have Arrow as a calling convention. That is, > implementations of relational operators such as Filter, Project, Aggregate in > addition to just TableScan. > Lastly, when we have an Arrow convention, if we build adapters for file > formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in > CALCITE-2025) it would make a lot of sense to translate those formats > directly into Arrow (applying simple projects and filters first if > applicable). Those adapters would belong as a "contrib" module in the Arrow > project better than in Calcite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836162#comment-16836162 ] Lai Zhou edited comment on CALCITE-2040 at 5/9/19 7:35 AM: --- I think it may improve performance a lot if we have Arrow as a calling convention. [~julianhyde], do you mean that a new kind of Enumerable implementation for Filter, Project, Aggregate and TableScan needs to be introduced? I'm just getting familiar with Arrow, and I'm glad to have a try at making Arrow a calling convention in Calcite. was (Author: hhlai1990): I think it may improve a lot of performance if we have Arrow as a calling convention. [~julianhyde],Do you mean a new kind of Enumerable-implementations for Filter, Project, Aggregate and TableScan need to be introduced ? I'm just getting familiar with Arrow.I will have a try to make Arrow as a calling convention. > Create adapter for Apache Arrow > --- > > Key: CALCITE-2040 > URL: https://issues.apache.org/jira/browse/CALCITE-2040 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would > allow people to execute SQL statements, via JDBC or ODBC, on data stored in > Arrow in-memory format. > Since Arrow is an in-memory format, it is not as straightforward as reading, > say, CSV files using the file adapter: an Arrow data set does not have a URL. > (Unless we use Arrow's > [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/] > format, or use an in-memory file system such as Alluxio.) So we would need > to devise a way of addressing Arrow data sets. > Also, since Arrow is an extremely efficient format for processing data, it > would also be good to have Arrow as a calling convention. That is, > implementations of relational operators such as Filter, Project, Aggregate in > addition to just TableScan. 
> Lastly, when we have an Arrow convention, if we build adapters for file > formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in > CALCITE-2025) it would make a lot of sense to translate those formats > directly into Arrow (applying simple projects and filters first if > applicable). Those adapters would belong as a "contrib" module in the Arrow > project better than in Calcite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2040) Create adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836162#comment-16836162 ] Lai Zhou commented on CALCITE-2040: --- I think it may improve performance a lot if we have Arrow as a calling convention. [~julianhyde], do you mean that a new kind of Enumerable implementation for Filter, Project, Aggregate and TableScan needs to be introduced? I'm just getting familiar with Arrow, and I will have a try at making Arrow a calling convention. > Create adapter for Apache Arrow > --- > > Key: CALCITE-2040 > URL: https://issues.apache.org/jira/browse/CALCITE-2040 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would > allow people to execute SQL statements, via JDBC or ODBC, on data stored in > Arrow in-memory format. > Since Arrow is an in-memory format, it is not as straightforward as reading, > say, CSV files using the file adapter: an Arrow data set does not have a URL. > (Unless we use Arrow's > [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/] > format, or use an in-memory file system such as Alluxio.) So we would need > to devise a way of addressing Arrow data sets. > Also, since Arrow is an extremely efficient format for processing data, it > would also be good to have Arrow as a calling convention. That is, > implementations of relational operators such as Filter, Project, Aggregate in > addition to just TableScan. > Lastly, when we have an Arrow convention, if we build adapters for file > formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in > CALCITE-2025) it would make a lot of sense to translate those formats > directly into Arrow (applying simple projects and filters first if > applicable). Those adapters would belong as a "contrib" module in the Arrow > project better than in Calcite. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833649#comment-16833649 ] Lai Zhou edited comment on CALCITE-2741 at 5/6/19 9:49 AM: --- [~zabetak], I also think it is not exactly an adapter. My initial goal was to build a real-time, high-performance, in-memory SQL engine on top of Calcite that supports Hive SQL dialects. I tried the JDBC interface first, but I encountered some issues: # custom config issue: For every JDBC connection we need to put the data of the current session into the schema, which means that the current schema is bound to the current session. So the static SchemaFactory can't handle this; we need to introduce DDL functions like those in the calcite-server module. The SqlDdlNodes in the calcite-server module populate the table through the FrameworkConfig API. When we execute SQL like
{code:java}
create table t1 as select * from t2 where t2.id>100
{code}
the populate method will be invoked, see [SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]. We need to customize the FrameworkConfig here, including the OperatorTable, SqlConformance, and other custom configs. By the way, the FrameworkConfig should be built with all the configs from the current CalcitePrepare.Context rather than only the rootSchema; that was a bug. And the config options of CalcitePrepare.Context are just a subset of FrameworkConfig; most of the time we need to use the FrameworkConfig API directly to build a new SQL engine. When we execute SQL like
{code:java}
select * from t2 where t2.id>100
{code}
CalcitePrepareImpl handles this SQL flow and does a similar thing, but some configs are hard-coded, such as the RexExecutor and Programs.
When implementing the EnumerableRel, the RelImplementor might also need to be customized; see the example [HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java]. Currently the JDBC interface doesn't provide a way to customize these configs, so we proposed a new Table API, inspired by Apache Flink, to simplify the usage of Calcite when building a new SQL engine. 2. cache issue: It's not easy to cache the whole SQL plan if we use the JDBC interface to handle a query, due to its multi-phase processing flow, but it is very easy to do with the Table API; see [TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412]. summary: The proposed Table API makes it easy to configure the SQL engine and to cache the whole SQL plan to improve query performance. It fits scenarios that satisfy these conditions: the data sources are deterministic and already in memory, and no computation needs to be pushed down; -the sql queries are deterministic,without dynamic parameters, so the whole sql plan cache will be helpful(we can also use placeholders in the execution plan to cache the dynamic query ).- was (Author: hhlai1990): [~zabetak],I also think it was not exactly an adapter. My initial goal was to build a real-time/high-performance in memory sql engine that supports hive sql dialects on top of Calcite. I had a try to use the JDBC interface first, but I encountered some issues: # custom config issue: For every JDBC connection, we need put the data of current session into the schema, it means that current schema is bound to current session. So the static SchemaFactory can't work out for this, we need introduce the DDL functions like what was in calcite-server module. The SqlDdlNodes in calcite-server module would populate the table through FrameworkConfig API . 
When we execute a sql like {code:java} create table t1 as select * from t2 where t2.id>100{code} the populate method will be invoked,see [SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221] . We need custom the FrameworkConfig here, include OperatorTable,SqlConformance and more other custom configs. By the way, the FrameworkConfig should be builded with all the configs from current CalcitePrepare.Context rather than only the rootSchema , it was a bug. And the config options of CalcitePrepare.Context was just a subset of FrameworkConfig, most of the time we need use the FrameworkConfig API directly to build a new sql engine. When we execute a sql like {code:java} select * from t2 where t2.id>100 {code} CalcitePrepareImpl would handle this sql flow, it did the similar thing, but some configs are hard coded , such as RexExecutor,Programs. When implementing the EnumerableRel, the
[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833649#comment-16833649 ] Lai Zhou edited comment on CALCITE-2741 at 5/6/19 9:36 AM: --- [~zabetak], I also think it is not exactly an adapter. My initial goal was to build a real-time, high-performance, in-memory SQL engine on top of Calcite that supports Hive SQL dialects. I tried the JDBC interface first, but I encountered some issues: # custom config issue: For every JDBC connection we need to put the data of the current session into the schema, which means that the current schema is bound to the current session. So the static SchemaFactory can't handle this; we need to introduce DDL functions like those in the calcite-server module. The SqlDdlNodes in the calcite-server module populate the table through the FrameworkConfig API. When we execute SQL like
{code:java}
create table t1 as select * from t2 where t2.id>100
{code}
the populate method will be invoked, see [SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]. We need to customize the FrameworkConfig here, including the OperatorTable, SqlConformance, and other custom configs. By the way, the FrameworkConfig should be built with all the configs from the current CalcitePrepare.Context rather than only the rootSchema; that was a bug. And the config options of CalcitePrepare.Context are just a subset of FrameworkConfig; most of the time we need to use the FrameworkConfig API directly to build a new SQL engine. When we execute SQL like
{code:java}
select * from t2 where t2.id>100
{code}
CalcitePrepareImpl handles this SQL flow and does a similar thing, but some configs are hard-coded, such as the RexExecutor and Programs.
When implementing the EnumerableRel, the RelImplementor might also need to be customized; see the example [HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java]. Currently the JDBC interface doesn't provide a way to customize these configs, so we proposed a new Table API, inspired by Apache Flink, to simplify the usage of Calcite when building a new SQL engine. 2. cache issue: It's not easy to cache the whole SQL plan if we use the JDBC interface to handle a query, due to its multi-phase processing flow, but it is very easy to do with the Table API; see [TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412]. summary: The proposed Table API makes it easy to configure the SQL engine and to cache the whole SQL plan to improve query performance. It fits scenarios that satisfy these conditions: the data sources are deterministic and already in memory, and no computation needs to be pushed down; the SQL queries are deterministic, without dynamic parameters, so the whole SQL plan cache will be helpful (we can also use placeholders in the execution plan to cache dynamic queries). was (Author: hhlai1990): [~zabetak],I also think it was not exactly an adapter. My initial goal was to build a real-time/high-performance in memory sql engine that supports hive sql dialects on top of Calcite. I had a try to use the JDBC interface first, but I encountered some issues: # custom config issue: For every JDBC connection, we need put the data of current session into the schema, it means that current schema is bound to current session. So the static SchemaFactory can't work out for this, we need introduce the DDL functions like what was in calcite-server module. The SqlDdlNodes in calcite-server module would populate the table through FrameworkConfig API . 
When we execute a sql like {code:java} create table t1 as select * from t2 where t2.id>100{code} the populate method will be invoked,see [SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221] . We need custom the FrameworkConfig here, include OperatorTable,SqlConformance and more other custom configs. By the way, the FrameworkConfig should be builded with all the configs from current CalcitePrepare.Context rather than only the rootSchema , it was a bug. And the config options of CalcitePrepare.Context was just a subset of FrameworkConfig, most of the time we need use the FrameworkConfig API directly to build a new sql engine. When we execute a sql like {code:java} select * from t2 where t2.id>100 {code} CalcitePrepareImpl would handle this sql flow, it did the similar thing, but some configs are hard coded , such as RexExecutor,Programs. When implementing the EnumerableRel, the RelImplementor
[jira] [Commented] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833649#comment-16833649 ] Lai Zhou commented on CALCITE-2741: --- [~zabetak], I also think it is not exactly an adapter. My initial goal was to build a real-time, high-performance, in-memory SQL engine on top of Calcite that supports Hive SQL dialects. I tried the JDBC interface first, but I encountered some issues: # custom config issue: For every JDBC connection we need to put the data of the current session into the schema, which means that the current schema is bound to the current session. So the static SchemaFactory can't handle this; we need to introduce DDL functions like those in the calcite-server module. The SqlDdlNodes in the calcite-server module populate the table through the FrameworkConfig API. When we execute SQL like
{code:java}
create table t1 as select * from t2 where t2.id>100
{code}
the populate method will be invoked, see [SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221]. We need to customize the FrameworkConfig here, including the OperatorTable, SqlConformance, and other custom configs. By the way, the FrameworkConfig should be built with all the configs from the current CalcitePrepare.Context rather than only the rootSchema; that was a bug. And the config options of CalcitePrepare.Context are just a subset of FrameworkConfig; most of the time we need to use the FrameworkConfig API directly to build a new SQL engine. When we execute SQL like
{code:java}
select * from t2 where t2.id>100
{code}
CalcitePrepareImpl handles this SQL flow and does a similar thing, but some configs are hard-coded, such as the RexExecutor and Programs.
When implementing the EnumerableRel, the RelImplementor might also need to be customized; see the example [HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java]. The JDBC interface currently provides no way to customize these configs, so we proposed a new Table API, inspired by Apache Flink, to simplify the use of Calcite when building a new SQL engine. 2. cache issue: It's not easy to cache the whole SQL plan when using the JDBC interface to handle a query, because of its multi-phase processing flow, but it is very easy with the Table API; see [TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412]. Summary: the proposed Table API makes it easy to configure the SQL engine and to cache the whole SQL plan to improve query performance. It fits scenarios that satisfy these conditions: the data sources are deterministic and already in memory, and no computation needs to be pushed down; the SQL queries are deterministic, so the whole-plan cache is helpful. > Add operator table with Hive-specific built-in functions > > > Key: CALCITE-2741 > URL: https://issues.apache.org/jira/browse/CALCITE-2741 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > I wrote a Hive adapter for Calcite to support Hive SQL, including UDF, UDAF, UDTF and some SqlSpecialOperator. > What do you think of supporting a direct implementation of Hive SQL like this? > I think it will be valuable when someone wants to migrate their Hive ETL jobs to real-time scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
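The whole-plan cache described above can be sketched with plain JDK collections. This is a minimal sketch of the idea, not the marble TableEnv implementation; `PreparedPlan` and the compile step are hypothetical stand-ins for Calcite's parse/validate/optimize/codegen pipeline:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a whole-plan cache: the expensive multi-phase compilation
// runs once per distinct SQL text; later executions reuse the cached plan.
public class PlanCacheSketch {
    // Hypothetical stand-in for a fully compiled, executable plan.
    record PreparedPlan(String sql) {}

    private final Map<String, PreparedPlan> cache = new ConcurrentHashMap<>();
    final AtomicInteger compilations = new AtomicInteger();

    PreparedPlan plan(String sql) {
        // computeIfAbsent keeps the cache safe under concurrent queries.
        return cache.computeIfAbsent(sql, s -> {
            compilations.incrementAndGet(); // the expensive phase happens here
            return new PreparedPlan(s);
        });
    }

    public static void main(String[] args) {
        PlanCacheSketch env = new PlanCacheSketch();
        env.plan("select * from t2 where t2.id>100");
        env.plan("select * from t2 where t2.id>100"); // cache hit
        System.out.println(env.compilations.get());   // prints 1: compiled once
    }
}
```

Keying the cache by the SQL text only works under the conditions stated above (deterministic queries over in-memory data); a multi-phase JDBC flow has no single place to hang such a map.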
[jira] [Updated] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-2741: -- Affects Version/s: 1.19.0 Description: I wrote a Hive adapter for Calcite to support Hive SQL, including UDF, UDAF, UDTF and some SqlSpecialOperator. What do you think of supporting a direct implementation of Hive SQL like this? I think it will be valuable when someone wants to migrate their Hive ETL jobs to real-time scenarios. was: [~julianhyde], I write a hive adapter for calcite to support Hive sql ,includes UDF、UDAF、UDTF and some of SqlSpecialOperator. How do you think of supporting a direct implemention of hive sql like this? I think it will be valueable when someone want to migrate his hive etl jobs to real-time scene. > Add operator table with Hive-specific built-in functions > > > Key: CALCITE-2741 > URL: https://issues.apache.org/jira/browse/CALCITE-2741 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > I wrote a Hive adapter for Calcite to support Hive SQL, including UDF, UDAF, UDTF and some SqlSpecialOperator. > What do you think of supporting a direct implementation of Hive SQL like this? > I think it will be valuable when someone wants to migrate their Hive ETL jobs to real-time scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830001#comment-16830001 ] Lai Zhou edited comment on CALCITE-2282 at 5/1/19 2:01 AM: --- [~zhztheplayer], thanks, you're right. "That said, you can put an operator with the same NAME and KIND into your own table, then the validator will use it to replace the original one." It really works: I don't need to rewrite Parser.jj to replace DIVIDE. I forgot to mention my new solution for DIVIDE in my last comment. Here is my code:
{code:java}
// Branch for binary operators such as DIVIDE:
newOp = new SqlBinaryOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
// Branch for prefix operators:
newOp = new SqlPrefixOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);
{code}
If we put an operator with the same NAME, KIND and SqlSyntax in place of the original one, we had better keep the same class, `SqlBinaryOperator` or `SqlPrefixOperator`, so I introduced a new constructor for them to construct the operator. As [~julianhyde] said, "Another technique could be a visitor that walks over expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." That also works. Thanks. was (Author: hhlai1990): [~zhztheplayer] ,thanks, you're right. " That said, you can put a operator with same NAME, KIND to your own table, then validator will use it to replace the original one" It really works. I don't need to rewrite the Parser.jj to replace DIVIDE. I forgot my new solutions for DIVIDE in last comment. Here is my code: {code:java} newOp = new SqlBinaryOperator(upName, operatorInSqlStdOperatorTable.getKind(), operatorInSqlStdOperatorTable.getLeftPrec(), operatorInSqlStdOperatorTable.getRightPrec(), HiveSqlUDFReturnTypeInference.INSTANCE, null, HiveSqlFunction.ArgChecker.INSTANCE); newOp = new SqlPrefixOperator(upName, operatorInSqlStdOperatorTable.getKind(), operatorInSqlStdOperatorTable.getLeftPrec(), operatorInSqlStdOperatorTable.getRightPrec(), HiveSqlUDFReturnTypeInference.INSTANCE, null, HiveSqlFunction.ArgChecker.INSTANCE); register(newOp); {code} If we put an operator with same NAME, KIND to replace the original one, we'd better keep the same class `SqlBinaryOperator` or `SqlPrefixOperator`. So I introduced a new constructor for them to construct the Operator. as [~julianhyde] said, "Another technique could be a visitor that walks over expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." It also works . Thanks. > Allow OperatorTable to be pluggable in the parser > - > > Key: CALCITE-2282 > URL: https://issues.apache.org/jira/browse/CALCITE-2282 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Sudheesh Katkam >Priority: Major > Attachments: CALCITE-2282.patch.txt > > > SqlAbstractParserImpl [hardcodes the OperatorTable to > SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334]. > Make this pluggable via a protected method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
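The same-NAME-and-KIND replacement mechanism discussed above can be illustrated with a plain-JDK sketch. `OpKind`, `Operator` and the key scheme below are hypothetical stand-ins for Calcite's SqlKind/SqlOperator and operator-table lookup, not actual Calcite APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of operator overriding: an operator table keyed by (name, kind)
// where registering a custom operator under the same key shadows the
// standard one, so no parser (Parser.jj) change is needed.
public class OperatorTableSketch {
    enum OpKind { DIVIDE, MINUS_PREFIX }

    // Hypothetical stand-in for a SqlOperator; "impl" labels its semantics.
    record Operator(String name, OpKind kind, String impl) {}

    private final Map<String, Operator> table = new HashMap<>();

    private static String key(String name, OpKind kind) {
        return name.toUpperCase() + "/" + kind;
    }

    void register(Operator op) {
        // Same NAME + KIND: the custom operator replaces the original.
        table.put(key(op.name(), op.kind()), op);
    }

    Operator lookup(String name, OpKind kind) {
        return table.get(key(name, kind));
    }

    public static void main(String[] args) {
        OperatorTableSketch t = new OperatorTableSketch();
        t.register(new Operator("/", OpKind.DIVIDE, "standard"));
        // A Hive-style DIVIDE registered later shadows the standard one.
        t.register(new Operator("/", OpKind.DIVIDE, "hive"));
        System.out.println(t.lookup("/", OpKind.DIVIDE).impl()); // prints "hive"
    }
}
```

In Calcite itself the validator performs the lookup, which is why keeping the same operator class and syntax matters; this sketch only shows the shadowing-by-key idea.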
[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830061#comment-16830061 ] Lai Zhou edited comment on CALCITE-2741 at 4/30/19 9:40 AM: hi [~julianhyde], [~zabetak], [~zhztheplayer], [~hyuan], [~francischuang], I created a new adapter for Calcite that supports Hive SQL queries on datasets. Since the extensions are based on Calcite 1.18.0, I pushed the project to a new codebase: [https://github.com/51nb/marble] And I proposed a Table API to make it easy to execute a SQL query. We use it in our company's core financial business to unify the way we compute lots of model variables. This project shows how we extended the Calcite core to support Hive SQL queries; it may be helpful to people who want to build a customized SQL engine on top of Calcite. was (Author: hhlai1990): hi,[~julianhyde] , [~zabetak],[~zhztheplayer], [~hyuan],[~francischuang] I create a new adapter of Calcite that support hive sql queries on dataset. Since the extensions is made base on Calcite 1.18.0, I pushed the project to a new codebase: [https://github.com/51nb/marble] And I proposed a Table API to make it easy to execute a sql query. We use it in our company's core financial business to unify the way to compute lots of model variables . This project shows how we extend Calcite core to support hive sql queries, it may be helpful to people who wants to build a customized sql engine on top of Calcite. > Add operator table with Hive-specific built-in functions > > > Key: CALCITE-2741 > URL: https://issues.apache.org/jira/browse/CALCITE-2741 > Project: Calcite > Issue Type: New Feature > Components: core >Reporter: Lai Zhou >Priority: Minor > > [~julianhyde], > I wrote a Hive adapter for Calcite to support Hive SQL, including UDF, UDAF, UDTF and some SqlSpecialOperator. > What do you think of supporting a direct implementation of Hive SQL like this? > I think it will be valuable when someone wants to migrate their Hive ETL jobs > to real-time scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830061#comment-16830061 ] Lai Zhou commented on CALCITE-2741: --- hi [~julianhyde], [~zabetak], [~zhztheplayer], [~hyuan], [~francischuang], I created a new adapter for Calcite that supports Hive SQL queries on datasets. Since the extensions are based on Calcite 1.18.0, I pushed the project to a new codebase: [https://github.com/51nb/marble]. And I proposed a Table API to make it easy to execute a SQL query. We use it in our company's core financial business to unify the way we compute lots of model variables. This project shows how we extended the Calcite core to support Hive SQL queries; it may be helpful to people who want to build a customized SQL engine on top of Calcite. > Add operator table with Hive-specific built-in functions > > > Key: CALCITE-2741 > URL: https://issues.apache.org/jira/browse/CALCITE-2741 > Project: Calcite > Issue Type: New Feature > Components: core >Reporter: Lai Zhou >Priority: Minor > > [~julianhyde], > I wrote a Hive adapter for Calcite to support Hive SQL, including UDF, UDAF, UDTF and some SqlSpecialOperator. > What do you think of supporting a direct implementation of Hive SQL like this? > I think it will be valuable when someone wants to migrate their Hive ETL jobs > to real-time scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions
[ https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830061#comment-16830061 ] Lai Zhou edited comment on CALCITE-2741 at 4/30/19 8:30 AM: hi [~julianhyde], [~zabetak], [~zhztheplayer], [~hyuan], [~francischuang], I created a new adapter for Calcite that supports Hive SQL queries on datasets. Since the extensions are based on Calcite 1.18.0, I pushed the project to a new codebase: [https://github.com/51nb/marble] And I proposed a Table API to make it easy to execute a SQL query. We use it in our company's core financial business to unify the way we compute lots of model variables. This project shows how we extended the Calcite core to support Hive SQL queries; it may be helpful to people who want to build a customized SQL engine on top of Calcite. was (Author: hhlai1990): hi,[~julianhyde] , [~zabetak],[~zhztheplayer], [~hyuan],[~francischuang] I create a new adapter of Calcite that support hive sql queries on dataset. Since the extensions is made base on Calcite 1.18.0, I pushed the project to a new codebase. [[https://github.com/51nb/marble]|[https://github.com/51nb/marble]]. And I proposed a Table API to make it easy to execute a sql query. We use it in our company's core financial business to unify the way to compute lots of model variables . This project shows how we extend Calcite core to support hive sql queries, it may be helpful to people who wants to build a customized sql engine on top of Calcite. > Add operator table with Hive-specific built-in functions > > > Key: CALCITE-2741 > URL: https://issues.apache.org/jira/browse/CALCITE-2741 > Project: Calcite > Issue Type: New Feature > Components: core >Reporter: Lai Zhou >Priority: Minor > > [~julianhyde], > I wrote a Hive adapter for Calcite to support Hive SQL, including UDF, UDAF, UDTF and some SqlSpecialOperator. > What do you think of supporting a direct implementation of Hive SQL like this? > I think it will be valuable when someone wants to migrate their Hive ETL jobs > to real-time scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830001#comment-16830001 ] Lai Zhou edited comment on CALCITE-2282 at 4/30/19 6:52 AM: [~zhztheplayer], thanks, you're right. "That said, you can put an operator with the same NAME and KIND into your own table, then the validator will use it to replace the original one." It really works: I don't need to rewrite Parser.jj to replace DIVIDE. I forgot to mention my new solution for DIVIDE in my last comment. Here is my code:
{code:java}
// Branch for binary operators such as DIVIDE:
newOp = new SqlBinaryOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
// Branch for prefix operators:
newOp = new SqlPrefixOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);
{code}
If we put an operator with the same NAME and KIND in place of the original one, we had better keep the same class, `SqlBinaryOperator` or `SqlPrefixOperator`, so I introduced a new constructor for them to construct the operator. As [~julianhyde] said, "Another technique could be a visitor that walks over expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." That also works. Thanks. was (Author: hhlai1990): [~zhztheplayer] ,thanks, you're right. " That said, you can put a operator with same NAME, KIND to your own table, then validator will use it to replace the original one" It really works. I don't need to rewrite the Parser.jj to replace DIVIDE. I forgot my new solutions for DIVIDE in last comment. Here is my code: {code:java} newOp = new SqlBinaryOperator(upName, operatorInSqlStdOperatorTable.getKind(), operatorInSqlStdOperatorTable.getLeftPrec(), operatorInSqlStdOperatorTable.getRightPrec(), HiveSqlUDFReturnTypeInference.INSTANCE, null, HiveSqlFunction.ArgChecker.INSTANCE); newOp = new SqlPrefixOperator(upName, operatorInSqlStdOperatorTable.getKind(), operatorInSqlStdOperatorTable.getLeftPrec(), operatorInSqlStdOperatorTable.getRightPrec(), HiveSqlUDFReturnTypeInference.INSTANCE, null, HiveSqlFunction.ArgChecker.INSTANCE); register(newOp); {code} If we put an operator with same NAME, KIND to replace the original one, we'd better keep the same class `SqlBinaryOperator` or `SqlPrefixOperator`. So I introduced a new constructor for them to construct the Operator. as [~julianhyde] said, "Another technique could be a visitor that walks over expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." can also work . Thanks. > Allow OperatorTable to be pluggable in the parser > - > > Key: CALCITE-2282 > URL: https://issues.apache.org/jira/browse/CALCITE-2282 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Sudheesh Katkam >Priority: Major > Attachments: CALCITE-2282.patch.txt > > > SqlAbstractParserImpl [hardcodes the OperatorTable to > SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334]. > Make this pluggable via a protected method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830001#comment-16830001 ] Lai Zhou commented on CALCITE-2282: --- [~zhztheplayer], thanks, you're right. "That said, you can put an operator with the same NAME and KIND into your own table, then the validator will use it to replace the original one." It really works: I don't need to rewrite Parser.jj to replace DIVIDE. I forgot to mention my new solution for DIVIDE in my last comment. Here is my code:
{code:java}
// Branch for binary operators such as DIVIDE:
newOp = new SqlBinaryOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
// Branch for prefix operators:
newOp = new SqlPrefixOperator(upName,
    operatorInSqlStdOperatorTable.getKind(),
    operatorInSqlStdOperatorTable.getLeftPrec(),
    operatorInSqlStdOperatorTable.getRightPrec(),
    HiveSqlUDFReturnTypeInference.INSTANCE, null,
    HiveSqlFunction.ArgChecker.INSTANCE);
register(newOp);
{code}
If we put an operator with the same NAME and KIND in place of the original one, we had better keep the same class, `SqlBinaryOperator` or `SqlPrefixOperator`, so I introduced a new constructor for them to construct the operator. As [~julianhyde] said, "Another technique could be a visitor that walks over expressions and replaces Calcite's DIVIDE with Hive's DIVIDE." That can also work. Thanks.
> Allow OperatorTable to be pluggable in the parser > - > > Key: CALCITE-2282 > URL: https://issues.apache.org/jira/browse/CALCITE-2282 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Sudheesh Katkam >Priority: Major > Attachments: CALCITE-2282.patch.txt > > > SqlAbstractParserImpl [hardcodes the OperatorTable to > SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334]. > Make this pluggable via a protected method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840 ] Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:30 AM: [~zabetak], [~julianhyde], is there already an example query that joins different data sources using different SQL dialects at the same time? Do we really need this feature? If we only consider the problem of RexExecutorImpl here, we need to pass the current SqlConformance to RexExecutorImpl when reducing expressions, but the caller, RexSimplify, does not have a SqlConformance now. Consider the simplifyCast of RexSimplify: in Hive SQL {code:java} select cast('' as Decimal) {code} returns null, but Calcite throws an exception. I want to use a SqlConformance to customize the generated expression when simplifying CAST. Since [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] already has a `TODO` solution, we can ignore it here. was (Author: hhlai1990): [~zabetak],[~julianhyde] Is there already an example query that joins different data source using different sql dialects meanwhile ? Do we really need this feature? If just consider the problem of RexExecutorImpl here, we need pass the current SqlConformance to RexExecutorImpl when reducing expressions.But now the caller RexSimplify didn't have a SqlConformance. The [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] did have a `TODO` solution, we can ignore it here .
> SqlConformanceEnum is hard coded in a few places > > > Key: CALCITE-3014 > URL: https://issues.apache.org/jira/browse/CALCITE-3014 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > I found that SqlConformanceEnum is hard coded in a few places. > [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] > [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] > I think it's not easy to fix them in a generic way. To support different > SQL compatibility modes well, many places in the current codebase may need to > be modified. > It would `drill a hole` to pass the SqlConformance config through the whole process > of one SQL query. > Maybe we can put the SqlConformance config in a ThreadLocal to avoid passing it > around frequently. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
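The dialect difference behind the CAST example above can be sketched in plain Java. This is a hand-rolled illustration of the semantics, not a Calcite or Hive API; `hiveCastToDecimal` is a hypothetical helper:

```java
import java.math.BigDecimal;

// Sketch: a strict (standard-SQL) CAST of '' to DECIMAL fails, while a
// Hive-style lenient CAST yields NULL instead of throwing.
public class LenientCastSketch {
    static BigDecimal hiveCastToDecimal(String s) {
        try {
            return new BigDecimal(s.trim()); // strict parse, throws on ''
        } catch (NumberFormatException e) {
            return null;                     // Hive semantics: malformed => NULL
        }
    }

    public static void main(String[] args) {
        System.out.println(hiveCastToDecimal("3.14")); // prints 3.14
        System.out.println(hiveCastToDecimal(""));     // prints null
    }
}
```

A conformance-aware simplifyCast would choose between these two behaviors based on the active SqlConformance instead of hard-coding the strict one.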
[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840 ] Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:31 AM: [~zabetak], [~julianhyde], is there already an example query that joins different data sources using different SQL dialects at the same time? Do we really need this feature? If we only consider the problem of RexExecutorImpl here, we need to pass the current SqlConformance to RexExecutorImpl when reducing expressions, but the caller, RexSimplify, does not have a SqlConformance now. Consider the simplifyCast of RexSimplify: in Hive SQL {code:java} select cast('' as decimal) {code} returns null, but Calcite throws an exception. I want to use a SqlConformance to customize the generated expression when simplifying CAST. Since [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] already has a `TODO` solution, we can ignore it here. was (Author: hhlai1990): [~zabetak],[~julianhyde] Is there already an example query that joins different data source using different sql dialects meanwhile ? Do we really need this feature? If just consider the problem of RexExecutorImpl here, we need pass the current SqlConformance to RexExecutorImpl when reducing expressions.But now the caller RexSimplify didn't have a SqlConformance. Consider the simplifyCast of RexSimplify, in Hive Sql {code:java} select cast('' as Decimal) {code} it will return null, but in Calcite it will throw exception. I want to use a SqlConformance to customize the generated expression when simplifing Cast. Since the [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] did have a `TODO` solution, we can ignore it here .
> SqlConformanceEnum is hard coded in a few places > > > Key: CALCITE-3014 > URL: https://issues.apache.org/jira/browse/CALCITE-3014 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > I found that SqlConformanceEnum is hard coded in a few places. > [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] > [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] > I think it's not easy to fix them in a generic way. To support different > SQL compatibility modes well, many places in the current codebase may need to > be modified. > It would `drill a hole` to pass the SqlConformance config through the whole process > of one SQL query. > Maybe we can put the SqlConformance config in a ThreadLocal to avoid passing it > around frequently. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
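The ThreadLocal idea from the ticket description can be sketched with the JDK alone; `Conformance` and the helper names are hypothetical stand-ins, not Calcite APIs:

```java
import java.util.function.Supplier;

// Sketch: carry a per-query conformance setting in a ThreadLocal so it
// does not have to be threaded through every call in the pipeline.
public class ConformanceHolderSketch {
    enum Conformance { DEFAULT, HIVE }

    private static final ThreadLocal<Conformance> CURRENT =
        ThreadLocal.withInitial(() -> Conformance.DEFAULT);

    static Conformance current() { return CURRENT.get(); }

    // Run a query body under a given conformance, restoring the previous
    // value afterwards so pooled threads are not polluted.
    static <T> T withConformance(Conformance c, Supplier<T> body) {
        Conformance old = CURRENT.get();
        CURRENT.set(c);
        try {
            return body.get();
        } finally {
            CURRENT.set(old);
        }
    }

    public static void main(String[] args) {
        String mode = withConformance(Conformance.HIVE, () -> current().name());
        System.out.println(mode);      // prints HIVE inside the query scope
        System.out.println(current()); // prints DEFAULT: restored afterwards
    }
}
```

The try/finally restore is the important part: without it, a thread-pool worker would leak one query's conformance into the next query.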
[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840 ] Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:30 AM: [~zabetak], [~julianhyde], is there already an example query that joins different data sources using different SQL dialects at the same time? Do we really need this feature? If we only consider the problem of RexExecutorImpl here, we need to pass the current SqlConformance to RexExecutorImpl when reducing expressions, but the caller, RexSimplify, does not have a SqlConformance now. Consider the simplifyCast of RexSimplify: in Hive SQL {code:java} select cast('' as Decimal) {code} returns null, but Calcite throws an exception. I want to use a SqlConformance to customize the generated expression when simplifying CAST. Since [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] already has a `TODO` solution, we can ignore it here. was (Author: hhlai1990): [~zabetak],[~julianhyde] Is there already an example query that joins different data source using different sql dialects meanwhile ? Do we really need this feature? If just consider the problem of RexExecutorImpl here, we need pass the current SqlConformance to RexExecutorImpl when reducing expressions.But now the caller RexSimplify didn't have a SqlConformance. Consider the simplifyCast of RexSimplify, in Hive Sql {code:java} select cast('' as Decimal) {code} will return null, but in Calcite it will throw exception. I want to use a SqlConformance to customize the generated expression when simplifing Cast. Since the [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] did have a `TODO` solution, we can ignore it here .
> SqlConformanceEnum is hard coded in a few places > > > Key: CALCITE-3014 > URL: https://issues.apache.org/jira/browse/CALCITE-3014 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > I found that SqlConformanceEnum is hard coded in a few places. > [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] > [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] > I think it's not easy to fix them in a generic way. To support different > SQL compatibility modes well, many places in the current codebase may need to > be modified. > It would `drill a hole` to pass the SqlConformance config through the whole process > of one SQL query. > Maybe we can put the SqlConformance config in a ThreadLocal to avoid passing it > around frequently. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840 ] Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:21 AM: [~zabetak], [~julianhyde], is there already an example query that joins different data sources using different SQL dialects at the same time? Do we really need this feature? If we only consider the problem of RexExecutorImpl here, we need to pass the current SqlConformance to RexExecutorImpl when reducing expressions, but the caller, RexSimplify, does not have a SqlConformance now. Since [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] already has a `TODO` solution, we can ignore it here. was (Author: hhlai1990): [~zabetak],[~julianhyde] Is there already an example query that joins different data source using different sql dialects meanwhile ? Do we really need this feature? If just consider the problem of RexExecutorImpl here, we need pass the current SqlConformance to RexExecutorImpl when reducing expressions. The [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] did have a `TODO` solution . So,may I create a PR to solve the RexExecutorImpl's problem first?
> SqlConformanceEnum is hard coded in a few places > > > Key: CALCITE-3014 > URL: https://issues.apache.org/jira/browse/CALCITE-3014 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > I found that SqlConformanceEnum is hard coded in a few places. > [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] > [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] > I think it's not easy to fix them in a generic way. To support different > SQL compatibility modes well, many places in the current codebase may need to > be modified. > It would `drill a hole` to pass the SqlConformance config through the whole process > of one SQL query. > Maybe we can put the SqlConformance config in a ThreadLocal to avoid passing it > around frequently. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840 ] Lai Zhou edited comment on CALCITE-3014 at 4/28/19 7:18 AM: [~zabetak],[~julianhyde] Is there already an example query that joins different data source using different sql dialects meanwhile ? Do we really need this feature? If just consider the problem of RexExecutorImpl here, we need pass the current SqlConformance to RexExecutorImpl when reducing expressions. The [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] did have a `TODO` solution . So,may I create a PR to solve the RexExecutorImpl's problem first? was (Author: hhlai1990): [~zabetak],[~julianhyde] Is there already an example query that joins different data source using different sql dialects meanwhile ? Do we really need this feature? If just consider the problem of RexExecutorImpl here, we need pass the RexToLixTranslator to RexExecutorImpl when reducing expressions. The [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] did have a `TODO` solution . So,may I create a PR to solve the RexExecutorImpl's problem first? > SqlConformanceEnum is hard coded in a few places > > > Key: CALCITE-3014 > URL: https://issues.apache.org/jira/browse/CALCITE-3014 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > I found SqlConformanceEnum is hard coded in a few places. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827840#comment-16827840 ] Lai Zhou commented on CALCITE-3014: --- [~zabetak],[~julianhyde] Is there already an example query that joins different data sources? Do we really need this feature? If we just consider the problem of RexExecutorImpl here, we need to pass the RexToLixTranslator to RexExecutorImpl when reducing expressions. [AggregateNode.java#L226|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] already has a `TODO` suggesting a solution. So, may I create a PR to solve the RexExecutorImpl problem first? > SqlConformanceEnum is hard coded in a few places > > Key: CALCITE-3014 > URL: https://issues.apache.org/jira/browse/CALCITE-3014 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.19.0 > Reporter: Lai Zhou > Priority: Major > > I found SqlConformanceEnum is hard coded in a few places. > [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] > [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] > I think it's not easy to fix them in a generic way. To support different SQL compatibility modes well, many places in the current codebase would likely need to be modified. Passing the SqlConformance config through the whole process of one SQL query would `drill a hole` through the codebase. Maybe we can put the SqlConformance config in a ThreadLocal to avoid passing it around frequently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-3014: -- Description: I found SqlConformanceEnum is hard coded in a few places. [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] I think it's not easy to fix them in a generic way. To support different SQL compatibility modes well, many places in the current codebase would likely need to be modified. Passing the SqlConformance config through the whole process of one SQL query would `drill a hole` through the codebase. Maybe we can put the SqlConformance config in a ThreadLocal to avoid passing it around frequently. was: [~julianhyde] I found SqlConformanceEnum is hard coded in a few places. [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] I think it's not easy to fix them in a generic way. To support different SQL compatibility modes well, many places in the current codebase would likely need to be modified. Passing the SqlConformance config through the whole process of one SQL query would `drill a hole` through the codebase. Maybe we can put the SqlConformance config in a ThreadLocal to avoid passing it around frequently.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825923#comment-16825923 ] Lai Zhou commented on CALCITE-2282: --- [~zhztheplayer], It's like what I commented before, {code:java} /** Since SqlOperator is identified by name and kind (see * {@link SqlOperator#equals(Object)} and * {@link SqlOperator#hashCode()}), we can override implementors of operators declared in * SqlStdOperatorTable. */ {code} {code:java} SqlOperator newOp = new HiveSqlFunction(functionInStd.getNameAsId(), functionInStd.getKind(), HiveSqlUDFReturnTypeInference.INSTANCE, functionInStd.getFunctionType()); register(newOp); {code} But DIVIDE can't be replaced correctly this way. You will find that statically referenced built-in functions are not looked up from the customized OperatorTable. > Allow OperatorTable to be pluggable in the parser > > Key: CALCITE-2282 > URL: https://issues.apache.org/jira/browse/CALCITE-2282 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Sudheesh Katkam > Priority: Major > Attachments: CALCITE-2282.patch.txt > > SqlAbstractParserImpl [hardcodes OperatorTable to SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334]. > Make this pluggable via a protected method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
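The override-by-registration trick described above can be modeled in a few lines. This is an illustrative sketch, not Calcite code: the class `OperatorTableSketch` and its String-based operators are assumptions, and it demonstrates only the (name, kind) keying; the caveat in the comment stands, since parser code that references a static operator instance directly bypasses any such table.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the override trick: operators are keyed by (name, kind),
// mirroring how SqlOperator#equals/hashCode identify an operator, so
// registering a new operator under an existing key replaces the standard
// one. Class and method names here are illustrative, not Calcite's.
public class OperatorTableSketch {
    private final Map<String, String> ops = new HashMap<>();

    private static String key(String name, String kind) {
        return name + "|" + kind;
    }

    /** Later registrations win, which is what makes the override work. */
    public void register(String name, String kind, String impl) {
        ops.put(key(name, kind), impl);
    }

    public String lookup(String name, String kind) {
        return ops.get(key(name, kind));
    }

    public static void main(String[] args) {
        OperatorTableSketch table = new OperatorTableSketch();
        table.register("/", "DIVIDE", "std");
        table.register("/", "DIVIDE", "hive"); // same (name, kind): overrides
        System.out.println(table.lookup("/", "DIVIDE")); // hive
    }
}
```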
[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825653#comment-16825653 ] Lai Zhou edited comment on CALCITE-2282 at 4/25/19 2:35 AM: [~danny0405], this is how I override the built-in operators in Parser.jj: {code:java} | { op = SqlStdOperatorTable.LIKE; } | { op = SqlStdOperatorTable.SIMILAR_TO; } | { op = HiveSqlOperatorTable.RLIKE; } | { op = HiveSqlOperatorTable.REGEXP; } ) {code} {code:java} | { return SqlStdOperatorTable.PLUS; } | { return SqlStdOperatorTable.MINUS; } | { return HiveSqlOperatorTable.MULTIPLY; } | { return HiveSqlOperatorTable.DIVIDE; }{code} The default DIVIDE operator in SqlStdOperatorTable is not acceptable for real business use. Consider the following SQL: select 2/5. The result is 0, but we expect 0.4. [~julianhyde], currently the only way to customize the DIVIDE operator is to rewrite Parser.jj. I didn't find a way, as [~zhztheplayer] suggested, to customize static built-in operators without changing the parser for this use case. was (Author: hhlai1990): [~danny0405], this is how I override the built-in operators in Parser.jj: {code:java} | { op = SqlStdOperatorTable.LIKE; } | { op = SqlStdOperatorTable.SIMILAR_TO; } | { op = HiveSqlOperatorTable.RLIKE; } | { op = HiveSqlOperatorTable.REGEXP; } ) {code} {code:java} | { return SqlStdOperatorTable.PLUS; } | { return SqlStdOperatorTable.MINUS; } | { return HiveSqlOperatorTable.MULTIPLY; } | { return HiveSqlOperatorTable.DIVIDE; }{code} The default DIVIDE operator in SqlStdOperatorTable is not acceptable for real business use. Consider the following SQL: select 2/5. The result is 0, but we expect 0.4. [~julianhyde], currently the only way to customize the DIVIDE operator is to rewrite Parser.jj. I didn't find a way, as [~zhztheplayer] suggested, to use a custom operator table without changing the parser for this use case.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825653#comment-16825653 ] Lai Zhou commented on CALCITE-2282: --- [~danny0405], this is how I override the built-in operators in Parser.jj: {code:java} | { op = SqlStdOperatorTable.LIKE; } | { op = SqlStdOperatorTable.SIMILAR_TO; } | { op = HiveSqlOperatorTable.RLIKE; } | { op = HiveSqlOperatorTable.REGEXP; } ) {code} {code:java} | { return SqlStdOperatorTable.PLUS; } | { return SqlStdOperatorTable.MINUS; } | { return HiveSqlOperatorTable.MULTIPLY; } | { return HiveSqlOperatorTable.DIVIDE; }{code} The default DIVIDE operator in SqlStdOperatorTable is not acceptable for real business use. Consider the following SQL: select 2/5. The result is 0, but we expect 0.4. [~julianhyde], currently the only way to customize the DIVIDE operator is to rewrite Parser.jj. I didn't find a way, as [~zhztheplayer] suggested, to use a custom operator table without changing the parser for this use case. > Allow OperatorTable to be pluggable in the parser > > Key: CALCITE-2282 > URL: https://issues.apache.org/jira/browse/CALCITE-2282 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Sudheesh Katkam > Priority: Major > Attachments: CALCITE-2282.patch.txt > > SqlAbstractParserImpl [hardcodes OperatorTable to SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334]. > Make this pluggable via a protected method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
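The 2/5 example above comes down to operand type inference. The sketch below uses plain Java arithmetic only to mirror the two behaviors; the method names `standardDivide` and `hiveDivide` are illustrative and not part of any Calcite or Hive API.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Why SELECT 2/5 yields 0 under the standard DIVIDE operator: with integer
// operands the result type stays integer and the quotient truncates, while
// a Hive-style DIVIDE promotes the operands to decimal before dividing.
public class DivideSemantics {
    /** Standard-SQL-like integer division: truncates toward zero. */
    public static int standardDivide(int a, int b) {
        return a / b;
    }

    /** Hive-like division: operands promoted to decimal first. */
    public static BigDecimal hiveDivide(int a, int b) {
        return new BigDecimal(a).divide(new BigDecimal(b), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        System.out.println(standardDivide(2, 5)); // 0
        System.out.println(hiveDivide(2, 5));     // 0.4
    }
}
```

So overriding DIVIDE is less about the parse tree and more about swapping the operator's return-type and operand-promotion behavior, which is why replacing the operator instance matters.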
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823659#comment-16823659 ] Lai Zhou commented on CALCITE-2973: --- I modified EnumerableJoin to be able to deal with non-equi joins that have equi conditions. I didn't rename EnumerableJoin this time; we can rename it to `EnumerableHashJoin` in a later patch. Now EnumerableDefaults' method `join_` implements the hash join algorithm for a join, whether or not it has a non-equi condition. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core > Affects Versions: 1.19.0 > Reporter: Lai Zhou > Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
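The core idea of the ticket can be sketched independently of Calcite: build a hash table on the equi keys and evaluate only the residual non-equi predicate inside each matching bucket, instead of running a full nested loop. Everything here is illustrative; rows are `int[] {key, value}` and the names do not correspond to the EnumerableDefaults API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hash join for a theta join whose condition contains an equi part:
// only rows that share an equi key ever see the residual predicate.
public class HashThetaJoin {
    public static List<int[]> join(int[][] left, int[][] right,
                                   BiPredicate<int[], int[]> residual) {
        // Build side: bucket right rows by their equi key (column 0).
        Map<Integer, List<int[]>> buckets = new HashMap<>();
        for (int[] r : right) {
            buckets.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // Probe side: test the non-equi remainder within each bucket.
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : buckets.getOrDefault(l[0], new ArrayList<>())) {
                if (residual.test(l, r)) {
                    out.add(new int[] {l[0], l[1], r[1]});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] left = {{1, 10}, {1, 5}, {2, 7}};
        int[][] right = {{1, 6}, {2, 9}};
        // Condition: l.key = r.key AND l.val > r.val
        List<int[]> rows = join(left, right, (l, r) -> l[1] > r[1]);
        System.out.println(rows.size()); // 1
    }
}
```

This is why the speedup over nested-loop join is large when the equi keys are selective: the residual predicate runs on bucket-sized candidate sets rather than on the full cross product.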
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823090#comment-16823090 ] Lai Zhou edited comment on CALCITE-2973 at 4/23/19 2:16 AM: [~zabetak],[~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change it to extend `Join`? I tried changing it to extend `Join`, but then the FilterJoinRule doesn't work: it can't correctly push down the remaining condition into the inputs of the join. See [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165] If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field in EnumerableJoin to reference the full join condition, as we need to extract the remaining part of it. So what's the better way? was (Author: hhlai1990): [~zabetak],[~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change it to extend `Join`? I tried changing it to extend `Join`, but then the FilterJoinRule doesn't work: it can't correctly push down the remaining condition into a filter after an inner join. See [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165] If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field in EnumerableJoin to reference the full join condition, as we need to extract the remaining part of it. So what's the better way?
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823090#comment-16823090 ] Lai Zhou commented on CALCITE-2973: --- [~zabetak], [~hyuan], should we keep EnumerableJoin as an `EquiJoin` or change it to extend `Join`? I tried changing it to extend `Join`, but then the FilterJoinRule doesn't work: it can't correctly push the remaining condition down into a filter after an inner join. See [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165] If we keep EnumerableJoin as an `EquiJoin`, we need to introduce a field on EnumerableJoin that references the full join condition, because we need to extract the remaining part of it. So what's the better way? > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users make a theta-join query over a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
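The design question above, keeping the equi part of a join condition for hashing while carrying a separate "remaining" condition, can be sketched as follows. This is a minimal illustration only: the string-based conjunct model and the `SplitJoinCondition` name are made up for the example (Calcite does this over RexNodes, e.g. in JoinInfo).

```java
import java.util.*;

/** Sketch: partition a join condition's conjuncts into the equi part
 * (usable as hash keys) and the remaining (theta) part. */
public class SplitJoinCondition {
  /** Conjuncts of the form "LEFT.x = RIGHT.y" are equi; all others
   * fall into the remaining condition evaluated per row. */
  static Map<String, List<String>> split(List<String> conjuncts) {
    Map<String, List<String>> parts = new LinkedHashMap<>();
    parts.put("equi", new ArrayList<>());
    parts.put("remaining", new ArrayList<>());
    for (String c : conjuncts) {
      boolean equi = c.matches("LEFT\\.\\w+ = RIGHT\\.\\w+");
      parts.get(equi ? "equi" : "remaining").add(c);
    }
    return parts;
  }

  public static void main(String[] args) {
    // l.deptno = r.deptno AND l.sal > r.bonus
    System.out.println(split(List.of(
        "LEFT.deptno = RIGHT.deptno", "LEFT.sal > RIGHT.bonus")));
  }
}
```

The equi bucket drives the hash table; the remaining bucket is exactly the "remain condition" the comment says must be extractable from the full join condition.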
[jira] [Updated] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
[ https://issues.apache.org/jira/browse/CALCITE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-3014: -- Description: [~julianhyde] I found SqlConformanceEnum is hard-coded in a few places. [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] I think it's not easy to fix them in a generic way. To support different SQL compatibility modes well, many places in the current codebase would likely need to be modified. Passing the SqlConformance config through the whole process of one SQL query would `drill a hole` through many method signatures. Maybe we can put the SqlConformance config in a ThreadLocal, to avoid passing it around frequently. was: [~julianhyde] I found SqlConformanceEnum is hard coded in a few places. [1|https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] [2|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] I think it's not easy to fix them in a generic way. To support different SQL compatibility modes well, many place of current codebase is possible to be modified. It will `drill a hole` to pass the SqlConformance config in the whole process of one sql query. May be we can put the SqlConformance config in ThreadLocal, avoiding pass it frequently. 
> SqlConformanceEnum is hard coded in a few places > > > Key: CALCITE-3014 > URL: https://issues.apache.org/jira/browse/CALCITE-3014 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > > [~julianhyde] I found SqlConformanceEnum is hard-coded in a few places. > [https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] > [https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] > I think it's not easy to fix them in a generic way. To support different > SQL compatibility modes well, many places in the current codebase would likely > need to be modified. > Passing the SqlConformance config through the whole process of one SQL query > would `drill a hole` through many method signatures. > Maybe we can put the SqlConformance config in a ThreadLocal, to avoid passing > it around frequently. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3014) SqlConformanceEnum is hard coded in a few places
Lai Zhou created CALCITE-3014: - Summary: SqlConformanceEnum is hard coded in a few places Key: CALCITE-3014 URL: https://issues.apache.org/jira/browse/CALCITE-3014 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.19.0 Reporter: Lai Zhou [~julianhyde] I found SqlConformanceEnum is hard-coded in a few places. [1|https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rex/RexExecutorImpl.java#L81] [2|https://github.com/apache/calcite/blob/72f36a8830afe7f903d8cb32cf547ea484e49fef/core/src/main/java/org/apache/calcite/interpreter/AggregateNode.java#L226] I think it's not easy to fix them in a generic way. To support different SQL compatibility modes well, many places in the current codebase would likely need to be modified. Passing the SqlConformance config through the whole process of one SQL query would `drill a hole` through many method signatures. Maybe we can put the SqlConformance config in a ThreadLocal, to avoid passing it around frequently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
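The ThreadLocal idea proposed in this issue can be sketched as below. The `ConformanceContext` class and the conformance names are hypothetical stand-ins, not Calcite API; the point is only that a per-thread setting can be read anywhere in the query pipeline without threading it through every method signature.

```java
/** Sketch: carry a per-query conformance setting in a ThreadLocal
 * instead of passing it through every call. Names are illustrative. */
public class ConformanceContext {
  private static final ThreadLocal<String> CONFORMANCE =
      ThreadLocal.withInitial(() -> "DEFAULT");

  /** What any deeply nested code (e.g. an executor) would read. */
  static String current() {
    return CONFORMANCE.get();
  }

  /** Runs a task under the given conformance, restoring the old value after. */
  static void withConformance(String conformance, Runnable task) {
    String previous = CONFORMANCE.get();
    CONFORMANCE.set(conformance);
    try {
      task.run();
    } finally {
      CONFORMANCE.set(previous);   // avoid leaking state across queries
    }
  }

  public static void main(String[] args) {
    System.out.println(current());
    withConformance("LENIENT", () -> System.out.println(current()));
    System.out.println(current());
  }
}
```

The restore-in-finally step matters: without it, pooled threads would leak one query's conformance into the next, which is the usual hazard of the ThreadLocal approach.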
[jira] [Comment Edited] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820939#comment-16820939 ] Lai Zhou edited comment on CALCITE-2282 at 4/18/19 10:23 AM: - [~julianhyde], I think we should let the validator resolve operators; the parser only needs to parse the SQL. After the parser has parsed the SQL, we just have unresolved functions. I'm working on bridging the Hive functions to Calcite these days. Since implementing a whole function list for a DB is a lot of tedious work, I just want to make the parser and the functions compatible with Hive, so I hope to reuse the built-in functions of Calcite as much as possible. I made some extensions to reach this goal: 1. Introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable and overrides lookupOperatorOverloads, so I can plug in new operators or replace the built-in operators of Calcite. For example, to implement a Hive DIVIDE operator, I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; type inference is not right with
 * {@link SqlStdOperatorTable#DIVIDE}.
 */
public static final SqlBinaryOperator DIVIDE = new SqlBinaryOperator(
    "/",
    SqlKind.DIVIDE,
    60,
    true,
    HiveSqlUDFReturnTypeInference.INSTANCE,
    null,
    HiveSqlFunction.ArgChecker.INSTANCE);
{code}
2. Introduce a post-processor for RexImpTable, to define implementors.
Here is part of the code:
{code:java}
private void defineImplementors() {
  // define implementors for each Hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
    for (SqlOperator sqlOperator : operatorList) {
      if (sqlOperator instanceof HiveSqlAggFunction) {
        HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
        aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
      } else {
        /* Since SqlOperator is identified by name and kind (see
         * {@link SqlOperator#equals(Object)} and
         * {@link SqlOperator#hashCode()}), we can override implementors of
         * operators declared in SqlStdOperatorTable. */
        CallImplementor callImplementor;
        if (sqlOperator.getName().equals("NOT RLIKE")) {
          callImplementor = RexImpTable.createImplementor(
              RexImpTable.NotImplementor.of(new HiveUDFImplementor()),
              NullPolicy.STRICT, false);
        } else {
          callImplementor = RexImpTable.createImplementor(
              new HiveUDFImplementor(), NullPolicy.NONE, false);
        }
        map.put(sqlOperator, callImplementor);
      }
    }
    // directly override some implementors of operators declared in
    // SqlStdOperatorTable
    map.put(SqlStdOperatorTable.ITEM, new RexImpTable.ItemImplementor(true));
  });
}
{code}
The way to achieve my goal might be quick and dirty. If Calcite were more pluggable, it would be friendlier to people who want to build a new SQL engine on top of Calcite. was (Author: hhlai1990): [~julianhyde], I think we should let the validator to resolve operators , the parser only need to parse the sql . After the parser parsed the sql, we just have unresolved functions. I'm working to bridge the hive functions to calcite these days. Since implementing a whole function list for a DB is an amount of work and boring,I just want to make the parser and the functions compatible with hive ,so I hope to reuse the built-in functions of Calcite as possible. 
I made some extensions to reach it: # introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable to lookupOperatorOverloads, so I can plugin some new operators or replace the built-in operators of Calcite. For example, I want to implement a Hive DIVIDE operator, then I redefined it: {code:java} /** * replace the DIVIDE FUNCTION in Parser.jj, Inferring type is not right by * {@link SqlStdOperatorTable#DIVIDE} */ public static final SqlBinaryOperator DIVIDE = new SqlBinaryOperator( "/", SqlKind.DIVIDE, 60, true, HiveSqlUDFReturnTypeInference.INSTANCE, null, HiveSqlFunction.ArgChecker.INSTANCE); {code} 2.introduce a post processor for RexImpTable, to define Implementors . Here is some part of the code : {code:java} private void defineImplementors() { //define implementors for hive operator final List operatorList = getOperatorList(); RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> { for (SqlOperator sqlOperator : operatorList) { if (sqlOperator instanceof HiveSqlAggFunction) { HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator; aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction)); } else
[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820939#comment-16820939 ] Lai Zhou commented on CALCITE-2282: --- [~julianhyde], I think we should let the validator resolve operators; the parser only needs to parse the SQL. After the parser has parsed the SQL, we just have unresolved functions. I'm working on bridging the Hive functions to Calcite these days. Since implementing a whole function list for a DB is a lot of tedious work, I just want to make the parser and the functions compatible with Hive, so I hope to reuse the built-in functions of Calcite as much as possible. I made some extensions to reach this goal: 1. Introduce a HiveSqlOperatorTable which extends ReflectiveSqlOperatorTable and overrides lookupOperatorOverloads, so I can plug in new operators or replace the built-in operators of Calcite. For example, to implement a Hive DIVIDE operator, I redefined it:
{code:java}
/**
 * Replaces the DIVIDE function in Parser.jj; type inference is not right with
 * {@link SqlStdOperatorTable#DIVIDE}.
 */
public static final SqlBinaryOperator DIVIDE = new SqlBinaryOperator(
    "/",
    SqlKind.DIVIDE,
    60,
    true,
    HiveSqlUDFReturnTypeInference.INSTANCE,
    null,
    HiveSqlFunction.ArgChecker.INSTANCE);
{code}
2. Introduce a post-processor for RexImpTable, to define implementors.
Here is part of the code:
{code:java}
private void defineImplementors() {
  // define implementors for each Hive operator
  final List operatorList = getOperatorList();
  RexImpTable.INSTANCE.defineImplementors((map, aggMap, winAggMap) -> {
    for (SqlOperator sqlOperator : operatorList) {
      if (sqlOperator instanceof HiveSqlAggFunction) {
        HiveSqlAggFunction aggFunction = (HiveSqlAggFunction) sqlOperator;
        aggMap.put(aggFunction, () -> new HiveUDAFImplementor(aggFunction));
      } else {
        /* Since SqlOperator is identified by name and kind (see
         * {@link SqlOperator#equals(Object)} and
         * {@link SqlOperator#hashCode()}), we can override implementors of
         * operators declared in SqlStdOperatorTable. */
        CallImplementor callImplementor;
        if (sqlOperator.getName().equals("NOT RLIKE")) {
          callImplementor = RexImpTable.createImplementor(
              RexImpTable.NotImplementor.of(new HiveUDFImplementor()),
              NullPolicy.STRICT, false);
        } else {
          callImplementor = RexImpTable.createImplementor(
              new HiveUDFImplementor(), NullPolicy.NONE, false);
        }
        map.put(sqlOperator, callImplementor);
      }
    }
    // directly override some implementors of operators declared in
    // SqlStdOperatorTable
    map.put(SqlStdOperatorTable.ITEM, new RexImpTable.ItemImplementor(true));
  });
}
{code}
The way to achieve my goal might be quick and dirty, but I think it would be friendly to people who want to use Calcite to build a new SQL engine. 
> Allow OperatorTable to be pluggable in the parser > - > > Key: CALCITE-2282 > URL: https://issues.apache.org/jira/browse/CALCITE-2282 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Sudheesh Katkam >Priority: Major > Attachments: CALCITE-2282.patch.txt > > > SqlAbstractParserImpl [hardcodes OperatorTable to > SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334]. > Make this pluggable via a protected method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
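The pluggable lookup discussed in this thread can be reduced to a simple idea: operators live in a registry keyed by name, and a dialect such as Hive replaces a built-in like "/" by registering its own entry. A minimal sketch follows; the `Operator` interface and `PluggableOperatorTable` class are illustrative stand-ins, not Calcite's SqlOperator or SqlOperatorTable API (which also keys on syntax and kind).

```java
import java.util.*;

/** Sketch: an operator table where later registrations override
 * earlier ones, so a dialect can replace a standard operator. */
public class PluggableOperatorTable {
  /** Stand-in for an operator implementation. */
  interface Operator {
    String describe();
  }

  private final Map<String, Operator> operators = new HashMap<>();

  /** Registering under an existing name replaces the built-in. */
  void register(String name, Operator op) {
    operators.put(name, op);
  }

  Optional<Operator> lookup(String name) {
    return Optional.ofNullable(operators.get(name));
  }

  public static void main(String[] args) {
    PluggableOperatorTable table = new PluggableOperatorTable();
    table.register("/", () -> "standard divide");
    // A Hive-flavored table overrides "/" with its own semantics
    // (Hive's division yields a double even for integer operands).
    table.register("/", () -> "hive divide");
    System.out.println(table.lookup("/").get().describe());
  }
}
```

This is the same override trick the comment relies on: because lookup is by name (and, in Calcite, kind), a redefined operator shadows the standard one everywhere the table is consulted.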
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820902#comment-16820902 ] Lai Zhou edited comment on CALCITE-2973 at 4/18/19 9:33 AM: [~zabetak], I can't find a good way to break a theta join into an equi-join plus filter/projection; I think that would also make the rules hard to understand. But I found another simple and clear way; please see the latest commit [https://github.com/apache/calcite/pull/1156/files] We still keep EquiJoin as a pure equi join without a remaining condition. For a theta join, as Calcite defines in the EnumerableJoinRule, {code:java} !info.isEqui() && join.getJoinType() != JoinRelType.INNER{code} if it has equi keys, we can use a hash join or merge join instead of a nested-loop join to improve the performance. So I introduced a new join rel named `EnumerableThetaHashJoin`. In addition, I found there are some differences between the algorithms of a pure hash join and a hash join with a remaining condition: when we implement a pure hash join, we just need to compare the hash-join keys, but when we implement a hash join with a remaining condition, we also need to compare some other columns to find the unmatched records. So I introduced a new method named `thetaHashJoin` in EnumerableDefaults. was (Author: hhlai1990): [~zabetak], I can't find a good way to break a theta join into an equi-join + filter/projection , I think it will also make the rules hard to understand. But I found another simple and clear way , please see the latest commit :[[https://github.com/apache/calcite/pull/1156/files]|[https://github.com/apache/calcite/pull/1156/files]] We still keep the EquiJoin as a pure equil join without remain condition. For a theta join, as Calcite defined in the EnumerableJoinRule, {code:java} !info.isEqui() && join.getJoinType() != JoinRelType.INNER{code} if it has equi keys, we can use a hash-join or merge-join instead of nested-loop-join to improve the performance . 
So I introduced a new join rel named `EnumerableThetaHashJoin ` . In addition , I found there are some difference between algorithms of pure hash join and hash join with remain condition : When we implement a pure hash join , we just need to compare the hash join keys , but when we implement a hash join with remain condition, we need to compare some other columns to find the unmatched records. So I introduced a new method named `thetaHashJoin` in EnumerableDefaults. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users make a theta-join query over a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820902#comment-16820902 ] Lai Zhou commented on CALCITE-2973: --- [~zabetak], I can't find a good way to break a theta join into an equi-join plus filter/projection; I think that would also make the rules hard to understand. But I found another simple and clear way; please see the latest commit [https://github.com/apache/calcite/pull/1156/files] We still keep EquiJoin as a pure equi join without a remaining condition. For a theta join, as Calcite defines in the EnumerableJoinRule, {code:java} !info.isEqui() && join.getJoinType() != JoinRelType.INNER{code} if it has equi keys, we can use a hash join or merge join instead of a nested-loop join to improve the performance. So I introduced a new join rel named `EnumerableThetaHashJoin`. In addition, I found there are some differences between the algorithms of a pure hash join and a hash join with a remaining condition: when we implement a pure hash join, we just need to compare the hash-join keys, but when we implement a hash join with a remaining condition, we also need to compare some other columns to find the unmatched records. So I introduced a new method named `thetaHashJoin` in EnumerableDefaults. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users make a theta-join query over a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. 
> So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
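The algorithmic difference described above (an equi-key hit must still pass the residual predicate, and rows with no passing match must be detected as unmatched, e.g. to null-pad them in a LEFT join) can be sketched as follows. The row shapes and the `leftThetaHashJoin` name are illustrative only, not the actual `EnumerableDefaults.thetaHashJoin` signature.

```java
import java.util.*;
import java.util.function.BiPredicate;

/** Sketch: LEFT hash join on equal keys plus a residual (theta) condition. */
public class ThetaHashJoinSketch {
  /** Rows are (key, value) pairs; output is "leftVal|rightVal" strings. */
  static List<String> leftThetaHashJoin(int[][] left, int[][] right,
      BiPredicate<int[], int[]> residual) {
    // Build phase: hash the right input on its equi key (column 0).
    Map<Integer, List<int[]>> hash = new HashMap<>();
    for (int[] r : right) {
      hash.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
    }
    List<String> out = new ArrayList<>();
    // Probe phase: equi-key lookup first, then the residual check per row.
    for (int[] l : left) {
      boolean matched = false;
      for (int[] r : hash.getOrDefault(l[0], Collections.emptyList())) {
        if (residual.test(l, r)) {      // the extra columns compared here
          out.add(l[1] + "|" + r[1]);
          matched = true;
        }
      }
      if (!matched) {
        out.add(l[1] + "|null");        // null-pad the unmatched left row
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int[][] left = {{1, 10}, {2, 20}, {3, 30}};
    int[][] right = {{1, 5}, {2, 30}};
    // join condition: l.key = r.key AND l.val > r.val
    System.out.println(leftThetaHashJoin(left, right, (l, r) -> l[1] > r[1]));
  }
}
```

Note that row (2, 20) finds an equi-key candidate but fails the residual test, so it still counts as unmatched; a pure equi hash join has no such case, which is why a separate code path is needed.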
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943 ] Lai Zhou edited comment on CALCITE-2973 at 4/18/19 3:04 AM: [~julianhyde],[~zabetak],[~hyuan] I made a PR to improve the EnumerableJoin. Since EnumerableMergeJoin is never taken ,I change the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm." Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions. see [EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] Now EnumerableJoin can handle a per-row condition, I introduce a the remainCondition to generate the predicate for the join. see [EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduce a new algorithm to support join with predicate. see [EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061] was (Author: hhlai1990): [~julianhyde],[~zabetak],[~hyuan] I made a PR to improve the EnumerableJoin. Since EnumerableMergeJoin is never taken ,I change the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm." Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions. 
See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] Now EnumerableJoin can handle a per-row condition; I introduce the remainCondition to generate the predicate for the join. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduce a new algorithm to support a join with a predicate. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061] > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943 ] Lai Zhou edited comment on CALCITE-2973 at 4/18/19 3:02 AM: [~julianhyde],[~zabetak],[~hyuan] I made a PR to improve the EnumerableJoin. Since EnumerableMergeJoin is never taken, I changed the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm." Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] Now EnumerableJoin can handle a per-row condition; I introduce the remainCondition to generate the predicate for the join. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduce a new algorithm to support a join with a predicate. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061] was (Author: hhlai1990): [~julianhyde],[~zabetak],[~hyuan] I made a PR to improve the EnumerableJoin. Since EnumerableMergeJoin is never taken, I changed the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm."
Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] Now EnumerableJoin can handle a per-row condition; I introduce the remainCondition to generate the predicate for the join. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduce a new method to support a join with a predicate; it doesn't affect the old join method. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061] > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process.
> So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943 ] Lai Zhou edited comment on CALCITE-2973 at 4/18/19 3:01 AM: [~julianhyde],[~zabetak],[~hyuan] I made a PR to improve the EnumerableJoin. Since EnumerableMergeJoin is never taken, I changed the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm." Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] Now EnumerableJoin can handle a per-row condition; I introduce the remainCondition to generate the predicate for the join. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduce a new method to support a join with a predicate; it doesn't affect the old join method. See [https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061] was (Author: hhlai1990): [~julianhyde],[~zabetak],[~hyuan] I made a PR to improve the EnumerableJoin.
Since EnumerableMergeJoin is never taken, I changed the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm." Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions. See [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] Now EnumerableJoin can handle a per-row condition; I introduce the remainCondition to generate the predicate for the join. See [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduce a new method to support a join with a predicate; it doesn't affect the old join method. See [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061] > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819701#comment-16819701 ] Lai Zhou commented on CALCITE-2973: --- [~zabetak], many thanks for your suggestion. I'll take it into account and give you feedback later. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-1531) SqlValidatorException when boolean operators are used with NULL
[ https://issues.apache.org/jira/browse/CALCITE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819698#comment-16819698 ] Lai Zhou commented on CALCITE-1531: --- [~julianhyde], could you reopen this issue and change the summary to "Infer the type for a naked NULL literal from its context"? [~zabetak], should I create a new issue? > SqlValidatorException when boolean operators are used with NULL > --- > > Key: CALCITE-1531 > URL: https://issues.apache.org/jira/browse/CALCITE-1531 > Project: Calcite > Issue Type: Bug >Reporter: Serhii Harnyk >Assignee: Julian Hyde >Priority: Major > Fix For: 1.11.0 > > > SqlValidatorException when we use boolean AND, OR operators with null. > {noformat} > 0: jdbc:calcite:localhost> SELECT (CASE WHEN true or null then 1 else 0 end) > from (VALUES(1)); > 2016-12-06 17:12:47,622 [main] ERROR - > org.apache.calcite.sql.validate.SqlValidatorException: Illegal use of 'NULL' > 2016-12-06 17:12:47,623 [main] ERROR - > org.apache.calcite.runtime.CalciteContextException: From line 1, column 27 to > line 1, column 30: Illegal use of 'NULL' > Error: Error while executing SQL "SELECT (CASE WHEN true or null then 1 else > 0 end) from (VALUES(1))": From line 1, column 27 to line 1, column 30: > Illegal use of 'NULL' (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser
[ https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818921#comment-16818921 ] Lai Zhou commented on CALCITE-2282: --- +1. [~julianhyde], is there a good way to solve this? I use a customized parser that overrides the createCall method of SqlAbstractParserImpl to activate my OperatorTable. I think it's a typical case that responds to Calcite's objective: one planner fits all.
{code:java}
public class ${parser.class} extends SqlAbstractParserImpl
{
  private static final Logger LOGGER = CalciteTrace.getParserTracer();

  // Can't use quoted literal because of a bug in how JavaCC translates
  // backslash-backslash.
  private static final char BACKSLASH = 0x5c;
  private static final char DOUBLE_QUOTE = 0x22;
  private static final String DQ = DOUBLE_QUOTE + "";
  private static final String DQDQ = DQ + DQ;

  private static Metadata metadata;

  private Casing unquotedCasing;
  private Casing quotedCasing;
  private int identifierMaxLength;
  private SqlConformance conformance;

  /**
   * {@link SqlParserImplFactory} implementation for creating parser.
   */
  public static final SqlParserImplFactory FACTORY = new SqlParserImplFactory() {
    public SqlAbstractParserImpl getParser(Reader stream) {
      return new ${parser.class}(stream);
    }
  };

  @Override public SqlCall createCall(
      SqlIdentifier funName,
      SqlParserPos pos,
      SqlFunctionCategory funcType,
      SqlLiteral functionQualifier,
      SqlNode[] operands) {
    SqlOperator fun = null;

    // First, try a half-hearted resolution as a builtin function.
    // If we find one, use it; this will guarantee that we
    // preserve the correct syntax (i.e. don't quote builtin function
    // name when regenerating SQL).
    if (funName.isSimple()) {
      final List<SqlOperator> list = new ArrayList<>();
      HiveSqlOperatorTable.instance().lookupOperatorOverloads(funName,
          funcType, SqlSyntax.FUNCTION, list);
      if (list.size() > 0) {
        fun = list.get(0);
      }
    }

    // Otherwise, just create a placeholder function. Later, during
    // validation, it will be resolved into a real function reference.
    if (fun == null) {
      fun = new SqlUnresolvedFunction(funName, null, null, null, null,
          funcType);
    }
    return fun.createCall(functionQualifier, pos, operands);
  }
{code}
> Allow OperatorTable to be pluggable in the parser > - > > Key: CALCITE-2282 > URL: https://issues.apache.org/jira/browse/CALCITE-2282 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Sudheesh Katkam >Priority: Major > Attachments: CALCITE-2282.patch.txt > > > SqlAbstractParserImpl [hardcodes OperatorTable to SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334]. > Make this pluggable via a protected method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
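The overridden createCall above follows a simple resolution pattern: look the function name up in the custom operator table first, and fall back to an unresolved placeholder that the validator resolves later. A dependency-free sketch of just that pattern follows; it does not use Calcite's API, and the map contents and class names are hypothetical stand-ins for a HiveSqlOperatorTable-style lookup.

```java
import java.util.Map;

/** Self-contained sketch (no Calcite dependency) of the lookup-then-fallback
 *  pattern used by the custom createCall; all names are illustrative. */
public class OperatorLookupSketch {
  /** Stand-in for a pluggable operator table (hypothetical entries). */
  static final Map<String, String> CUSTOM_OPERATORS =
      Map.of("CONCAT_WS", "HiveConcatWs", "NVL", "HiveNvl");

  /** Returns the resolved operator if the table knows the name, otherwise a
   *  placeholder that would be resolved during validation. */
  static String resolve(String funName) {
    String op = CUSTOM_OPERATORS.get(funName.toUpperCase());
    return op != null ? op : "SqlUnresolvedFunction(" + funName + ")";
  }

  public static void main(String[] args) {
    System.out.println(resolve("nvl"));     // found in the custom table
    System.out.println(resolve("my_udf"));  // deferred to validation
  }
}
```

The point of CALCITE-2282 is to make the table in the first step injectable (e.g. via a protected method) instead of hardcoding SqlStdOperatorTable.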
[jira] [Updated] (CALCITE-2992) Enhance implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-2992: -- Summary: Enhance implicit conversions when generating hash join keys for an equiCondition (was: Make implicit conversions when generating hash join keys for an equiCondition) > Enhance implicit conversions when generating hash join keys for an > equiCondition > > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > As is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > intValue.equals(longValue); // false, even though 2 == 2L > {code} > We shouldn't use the original boxed Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
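The boxed-key pitfall behind this issue can be demonstrated directly: Integer.equals and Long.equals are type-sensitive, so a hash table keyed on one boxed type never matches a probe of the other, even for the same numeric value. This snippet only illustrates the problem, not Calcite's join internals.

```java
import java.util.HashMap;
import java.util.Map;

/** Shows why an Integer build key and a Long probe key never match in a
 *  HashMap, which is exactly the equi-join failure described in CALCITE-2992. */
public class BoxedKeyMismatch {
  public static void main(String[] args) {
    Integer intValue = 2;
    Long longValue = 2L;

    // equals() checks the runtime type first, so the comparison fails.
    System.out.println(intValue.equals(longValue)); // false

    Map<Object, String> hashTable = new HashMap<>();
    hashTable.put(intValue, "left row");
    // Lookup misses despite 2 == 2L: same hash bucket, but equals() is false.
    System.out.println(hashTable.get(longValue));   // null
  }
}
```

Note that the failure comes from equals(), not necessarily from differing hash codes: Integer 2 and Long 2L happen to share hashCode 2, yet the HashMap lookup still returns null.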
[jira] [Comment Edited] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765 ] Lai Zhou edited comment on CALCITE-2992 at 4/16/19 8:24 AM: [~julianhyde], you're right. I added a test case: [https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now. See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. was (Author: hhlai1990): [~julianhyde], you're right. I added a test case: [https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now.
See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. > Make implicit conversions when generating hash join keys for an equiCondition > - > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > As is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > intValue.equals(longValue); // false, even though 2 == 2L > {code} > We shouldn't use the original boxed Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765 ] Lai Zhou edited comment on CALCITE-2992 at 4/16/19 8:24 AM: [~julianhyde], you're right. I added a test case: [https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now. See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. was (Author: hhlai1990): [~julianhyde], you're right. I added a test case: [EnumerableJoinTest|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now.
See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. > Make implicit conversions when generating hash join keys for an equiCondition > - > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > As is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > intValue.equals(longValue); // false, even though 2 == 2L > {code} > We shouldn't use the original boxed Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765 ] Lai Zhou edited comment on CALCITE-2992 at 4/16/19 8:23 AM: [~julianhyde], you're right. I added a test case: [EnumerableJoinTest|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now. See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. was (Author: hhlai1990): [~julianhyde], you're right. I added a test case: [EnumerableJoinTest|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now.
See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. > Make implicit conversions when generating hash join keys for an equiCondition > - > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > As is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > intValue.equals(longValue); // false, even though 2 == 2L > {code} > We shouldn't use the original boxed Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765 ] Lai Zhou edited comment on CALCITE-2992 at 4/16/19 8:23 AM: [~julianhyde], you're right. I added a test case: [https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now. See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. was (Author: hhlai1990): [~julianhyde], you're right. I added a test case: [EnumerableJoinTest.java#L83|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes.
In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now. See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. > Make implicit conversions when generating hash join keys for an equiCondition > - > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > As is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > intValue.equals(longValue); // false, even though 2 == 2L > {code} > We shouldn't use the original boxed Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765 ] Lai Zhou edited comment on CALCITE-2992 at 4/16/19 8:23 AM: [~julianhyde], you're right. I added a test case: [EnumerableJoinTest|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual. So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type families now. See the test method hashJoinKeysCompareIntAndString(); it would fail. I made a new commit to roll back the last commit and enhance the implicit conversions when generating hash join keys. Consider the following cases: if the leftKey type is String and the rightKey type is Int, we can convert the keys to Double; if the leftKey type is String and the rightKey type is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys wouldn't depend on the CAST translation, nor conflict with it. was (Author: hhlai1990): [~julianhyde], you're right. I added a test case: [https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual.
So the CAST translation is never taken. But Calcite doesn't support implicit conversions well for types that belong to different type family now. See the test method hashJoinKeysCompareIntAndString(), it would fail. I made a new commit to roll-back the last commit , and do something to enhance the implicit conversions when generating hash join keys. Considering follow case: If leftKey type is String ,and rightKey type is Int, we can convert the keys to Double. If leftKey type is String ,and rightKey is Decimal, we can convert the keys to Decimal. The implicit conversions in this patch for hash join keys would’t depend on the CAST translation , nor be in conflict with it . > Make implicit conversions when generating hash join keys for an equiCondition > - > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Considering follow sql join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > as known in java : > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > new Object[]{intValue}.hashCode().equals > ( > new Object[]{longValue}.hashCode() > ) > = false; > {code} > We shoudn't use the orginal Object as a key in the HashMap, > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
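The comment above proposes coercing hash join keys across type families (String vs. Int compared as Double, String vs. Decimal compared as Decimal). A minimal stand-alone sketch of such key coercion is below; the class and method names are hypothetical illustrations, not Calcite's API:

```java
import java.math.BigDecimal;

public class JoinKeyCoercion {
    /** Coerces a join key to a common comparison type for the given pair of
     *  key types, mirroring the rules described in the comment above. */
    static Object coerce(Object key, Class<?> leftType, Class<?> rightType) {
        if (leftType == String.class && rightType == Integer.class
                || leftType == Integer.class && rightType == String.class) {
            // String vs. integer: compare as Double so "2" and 2 hash alike.
            return Double.valueOf(key.toString());
        }
        if (leftType == String.class && rightType == BigDecimal.class
                || leftType == BigDecimal.class && rightType == String.class) {
            // String vs. decimal: compare as BigDecimal to keep precision.
            return new BigDecimal(key.toString());
        }
        return key; // same type family: no conversion needed
    }

    public static void main(String[] args) {
        Object l = coerce("2", String.class, Integer.class);
        Object r = coerce(2, String.class, Integer.class);
        // Both sides coerced to Double 2.0, so they now hash and compare equal.
        System.out.println(l.equals(r)); // true
    }
}
```

Because both sides are mapped to the same boxed type before hashing, the coerced values satisfy the `hashCode`/`equals` contract that a hash join's build and probe phases rely on.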
[jira] [Commented] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818765#comment-16818765 ] Lai Zhou commented on CALCITE-2992: --- [~julianhyde], you're right. I added a test case: [EnumerableJoinTest.java#L83|https://github.com/apache/calcite/blob/c3d3c6468a54d0033073fc94d221c401c30be0a3/core/src/test/java/org/apache/calcite/test/enumerable/EnumerableJoinTest.java#L83] The hashJoinKeysCompareIntAndLong() test method passes. In my use case, I customized the `EQUALS` operator with another SqlOperandTypeChecker, which performs dynamic operand checking via a Hive GenericUDFOPEqual, so the CAST translation is never applied. But Calcite doesn't currently support implicit conversions well for types that belong to different type families; see the test method hashJoinKeysCompareIntAndString(), which fails. I made a new commit to roll back the last commit and to enhance the implicit conversions performed when generating hash join keys. Consider the following cases: if the left key type is String and the right key type is Int, we can convert both keys to Double; if the left key type is String and the right key type is Decimal, we can convert both keys to Decimal. The implicit conversions for hash join keys in this patch don't depend on the CAST translation, nor do they conflict with it.
> Make implicit conversions when generating hash join keys for an equiCondition > - > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > as is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > new Object[]{intValue}.hashCode().equals > ( > new Object[]{longValue}.hashCode() > ); > // false > {code} > We shouldn't use the original Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-2992: -- Description: Consider the following SQL join: {code:java} select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue {code} as is known in Java: {code:java} Integer intValue = 2; Long longValue = 2L; new Object[]{intValue}.hashCode().equals ( new Object[]{longValue}.hashCode() ); // false {code} We shouldn't use the original Object as a key in the HashMap; I think it'd be better to convert hash join keys to string and compare string values. was: Consider the following SQL join: {code:java} select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue {code} as is known in Java: {code:java} Integer intValue = 2; Long longValue = 2L; Objects.equals(intValue, longValue); // false {code} We shouldn't use the original Object as a key in the HashMap; I think it'd be better to convert hash join keys to string and compare string values. Summary: Make implicit conversions when generating hash join keys for an equiCondition (was: Enhance implicit conversions for different sql type family) > Make implicit conversions when generating hash join keys for an equiCondition > - > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > as is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > new Object[]{intValue}.hashCode().equals > ( > new Object[]{longValue}.hashCode() > ); > // false > {code} > We shouldn't use the original Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values.
> -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-2992) Enhance implicit conversions for different sql type family
[ https://issues.apache.org/jira/browse/CALCITE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-2992: -- Summary: Enhance implicit conversions for different sql type family (was: Make implicit conversions when generating hash join keys for an equiCondition) > Enhance implicit conversions for different sql type family > -- > > Key: CALCITE-2992 > URL: https://issues.apache.org/jira/browse/CALCITE-2992 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Consider the following SQL join: > > {code:java} > select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue > {code} > as is known in Java: > > {code:java} > Integer intValue = 2; > Long longValue = 2L; > Objects.equals(intValue, longValue); // false > {code} > We shouldn't use the original Object as a key in the HashMap; > I think it'd be better to convert hash join keys to string and compare string > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2992) Make implicit conversions when generating hash join keys for an equiCondition
Lai Zhou created CALCITE-2992: - Summary: Make implicit conversions when generating hash join keys for an equiCondition Key: CALCITE-2992 URL: https://issues.apache.org/jira/browse/CALCITE-2992 Project: Calcite Issue Type: Improvement Components: core Affects Versions: 1.19.0 Reporter: Lai Zhou Consider the following SQL join: {code:java} select t1.*,t2.* from t1 join t2 on t1.intValue=t2.longValue {code} as is known in Java: {code:java} Integer intValue = 2; Long longValue = 2L; Objects.equals(intValue, longValue); // false {code} We shouldn't use the original Object as a key in the HashMap; I think it'd be better to convert hash join keys to string and compare string values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
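The boxed-key mismatch the issue describes can be demonstrated in a few lines of plain Java; the class and variable names here are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BoxedKeyDemo {
    public static void main(String[] args) {
        Integer intValue = 2;
        Long longValue = 2L;

        // A boxed Integer and a boxed Long are never equal(),
        // even when they hold the same numeric value.
        System.out.println(Objects.equals(intValue, longValue)); // false

        // So a HashMap keyed on the raw boxed object misses the match,
        // even though Integer(2) and Long(2L) happen to share a hash code:
        Map<Object, String> build = new HashMap<>();
        build.put(intValue, "row from t1");
        System.out.println(build.get(longValue)); // null: probe with Long fails

        // Coercing both keys to one common type restores the match:
        Map<Long, String> coerced = new HashMap<>();
        coerced.put(intValue.longValue(), "row from t1");
        System.out.println(coerced.get(longValue)); // row from t1
    }
}
```

This is exactly why an `int = bigint` equi-condition needs either a CAST on one side or an implicit key conversion before the hash table is built.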
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943 ] Lai Zhou edited comment on CALCITE-2973 at 4/12/19 4:04 AM: [~julianhyde], [~zabetak], [~hyuan] I made a PR to improve the EnumerableJoin. Since EnumerableMergeJoin is never chosen, I changed the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm." Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions; see [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] EnumerableJoin can now handle a per-row condition; I introduced the remainCondition to generate the predicate for the join. See [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduced a new method to support a join with a predicate; it doesn't affect the old join method. See [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]
> Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users run a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule to a theta join, it will > improve performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815943#comment-16815943 ] Lai Zhou commented on CALCITE-2973: --- [~julianhyde], [~zabetak], [~hyuan] I made a PR to improve the EnumerableJoin. Since EnumerableMergeJoin is never chosen, I changed the summary to "Allow theta joins that have equi conditions to be executed using a hash join algorithm." Now a join rel node will be converted to an EnumerableJoin if it has mixed equi and non-equi conditions; see [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62] EnumerableJoin can now handle a per-row condition; I introduced the remainCondition to generate the predicate for the join. See [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250] I also introduced a new method to support a join with a predicate; it doesn't affect the old join method. See [https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]
> Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users run a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule to a theta join, it will > improve performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
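The approach described above, hash on the equi keys and then evaluate the remaining non-equi condition per matched pair, can be sketched in stand-alone Java. This is an illustrative version of the technique only, not the actual EnumerableDefaults method or its signature:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

public class ThetaHashJoin {
    /** Inner hash join on an equi key, with a residual predicate applied to
     *  each candidate pair (the "remainCondition" in the comment above). */
    static <L, R, K> List<String> join(
            List<L> left, List<R> right,
            Function<L, K> leftKey, Function<R, K> rightKey,
            BiPredicate<L, R> remainCondition) {
        // Build phase: index the right input by its equi key.
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        // Probe phase: equi-match via the hash index, then filter each
        // candidate pair with the non-equi part of the condition.
        List<String> out = new ArrayList<>();
        for (L l : left) {
            List<R> matches = index.get(leftKey.apply(l));
            if (matches == null) continue;
            for (R r : matches) {
                if (remainCondition.test(l, r)) {
                    out.add(l + "|" + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // t1 join t2 on t1.k = t2.k and t1.v < t2.v (hypothetical {k, v} rows)
        List<int[]> t1 = List.of(new int[]{1, 10}, new int[]{2, 50});
        List<int[]> t2 = List.of(new int[]{1, 20}, new int[]{2, 30});
        List<String> rows = join(t1, t2,
                a -> a[0], b -> b[0],      // equi keys
                (a, b) -> a[1] < b[1]);    // residual theta condition
        System.out.println(rows.size()); // 1: only the k=1 pair (10 < 20) survives
    }
}
```

The point of the design is that only rows sharing an equi key are ever tested against the residual predicate, which is what turns an O(n*m) nested loop into a hash-join-sized probe.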
[jira] [Updated] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-2973: -- Summary: Allow theta joins that have equi conditions to be executed using a hash join algorithm (was: Allow theta joins that have equi conditions to be executed using a merge join or hash join algorithm) > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users run a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule to a theta join, it will > improve performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a merge join or hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324 ] Lai Zhou edited comment on CALCITE-2973 at 4/11/19 10:04 AM: - [~julianhyde], [~zabetak], good idea. I just created a new rule for my application, to avoid changing calcite-core. I'll make a PR later to allow theta joins to be executed using a merge join or hash join. I drew a table to describe the relationship of join types and join operators after the redesign: || ||inner||non-inner|| |*only equi condition*|EnumerableJoin|EnumerableJoin| |*only non-equi condition*|can't be planned|EnumerableThetaJoin| |*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed) + EnumerableFilter|EnumerableThetaJoin, or EnumerableMergeJoin (changed), or EnumerableHashJoin (new)| If a join is non-inner and has both equi and non-equi conditions, we have three choices to plan it. Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule. What do you think of introducing a new rule (EnumerableThetaHashJoinRule) to allow theta joins to be executed using a hash join? was (Author: hhlai1990): [~julianhyde], [~zabetak], good idea. I just created a new rule for my application, to avoid changing calcite-core. I'll make a PR later to allow theta joins to be executed using a merge join or hash join. I drew a table to describe the relationship of join types and join operators after the redesign: || ||inner||non-inner|| |*only equi condition*|EnumerableJoin|EnumerableJoin| |*only non-equi condition*|EnumerableJoin|EnumerableThetaJoin| |*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed) + EnumerableFilter|EnumerableThetaJoin, or EnumerableMergeJoin (changed), or EnumerableHashJoin (new)| If a join is non-inner and has both equi and non-equi conditions, we have three choices to plan it.
Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule. What do you think of introducing a new rule (EnumerableThetaHashJoinRule) to allow theta joins to be executed using a hash join? > Allow theta joins that have equi conditions to be executed using a merge join > or hash join algorithm > > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users run a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule to a theta join, it will > improve performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-2973) Allow theta joins that has equi keys to be executed using a merge join or hash join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-2973: -- Summary: Allow theta joins that has equi keys to be executed using a merge join or hash join algorithm (was: Allow theta joins to be executed using a merge join algorithm) > Allow theta joins that has equi keys to be executed using a merge join or > hash join algorithm > - > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users run a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule to a theta join, it will > improve performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324 ] Lai Zhou edited comment on CALCITE-2973 at 4/3/19 3:51 AM: --- [~julianhyde], [~zabetak], good idea. I just created a new rule for my application, to avoid changing calcite-core. I'll make a PR later to allow theta joins to be executed using a merge join or hash join. I drew a table to describe the relationship of join types and join operators after the redesign: || ||inner||non-inner|| |*only equi condition*|EnumerableJoin|EnumerableJoin| |*only non-equi condition*|EnumerableJoin|EnumerableThetaJoin| |*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed) + EnumerableFilter|EnumerableThetaJoin, or EnumerableMergeJoin (changed), or EnumerableHashJoin (new)| If a join is non-inner and has both equi and non-equi conditions, we have three choices to plan it. Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule. What do you think of introducing a new rule (EnumerableThetaHashJoinRule) to allow theta joins to be executed using a hash join? was (Author: hhlai1990): [~julianhyde], [~zabetak], good idea. I just created a new rule for my application, to avoid changing calcite-core. I'll make a PR later to allow theta joins to be executed using a merge join or hash join. I drew a table to describe the relationship of join types and join operators: || ||inner||non-inner|| |*only equi condition*|EnumerableJoin|EnumerableJoin| |*only non-equi condition*|EnumerableJoin|EnumerableThetaJoin| |*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed) + EnumerableFilter|EnumerableThetaJoin, or EnumerableMergeJoin (changed), or EnumerableHashJoin (new)| If a join is non-inner and has both equi and non-equi conditions, we have three choices to plan it.
Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule. What do you think of introducing a new rule (EnumerableThetaHashJoinRule) to allow theta joins to be executed using a hash join? > Allow theta joins to be executed using a merge join algorithm > - > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users run a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule to a theta join, it will > improve performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324 ] Lai Zhou edited comment on CALCITE-2973 at 4/3/19 3:46 AM: --- [~julianhyde], [~zabetak], good idea. I just created a new rule for my application, to avoid changing calcite-core. I'll make a PR later to allow theta joins to be executed using a merge join or hash join. I drew a table to describe the relationship of join types and join operators: || ||inner||non-inner|| |*only equi condition*|EnumerableJoin|EnumerableJoin| |*only non-equi condition*|EnumerableJoin|EnumerableThetaJoin| |*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed) + EnumerableFilter|EnumerableThetaJoin, or EnumerableMergeJoin (changed), or EnumerableHashJoin (new)| If a join is non-inner and has both equi and non-equi conditions, we have three choices to plan it. Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule. What do you think of introducing a new rule (EnumerableThetaHashJoinRule) to allow theta joins to be executed using a hash join? was (Author: hhlai1990): [~julianhyde], [~zabetak], good idea. I just created a new rule for my application, to avoid changing calcite-core. I'll make a PR later to allow theta joins to be executed using a merge join or hash join. I drew a table to describe the relationship of join types and join operators: || ||inner||non-inner|| |*only equi condition*|EnumerableJoin|EnumerableJoin| |*only non-equi condition*|EnumerableJoin|EnumerableThetaJoin| |*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed)|EnumerableThetaJoin, or EnumerableMergeJoin (changed), or EnumerableHashJoin (new)| If a join is non-inner and has both equi and non-equi conditions, we have three choices to plan it.
Now EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule. What do you think of introducing a new rule (EnumerableThetaHashJoinRule) to allow theta joins to be executed using a hash join? > Allow theta joins to be executed using a merge join algorithm > - > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.19.0 >Reporter: Lai Zhou >Priority: Minor > > Now the EnumerableMergeJoinRule only supports an inner equi join. > If users run a theta-join query on a large dataset (such as 1*1), > the nested-loop join process will take dozens of times longer than the sort-merge > join process. > So if we can apply a merge-join or hash-join rule to a theta join, it will > improve performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808324#comment-16808324 ] Lai Zhou commented on CALCITE-2973: --- [~julianhyde], [~zabetak], good idea. I have created a new rule for my application, to avoid changing calcite-core. I'll make a PR later to allow theta joins to be executed using a merge join or hash join. I drew a table to describe the relationship between join types and join operators:
|| ||inner||non-inner||
|*only equi condition*|EnumerableJoin|EnumerableJoin|
|*only non-equi condition*|EnumerableJoin|EnumerableThetaJoin|
|*mixed equi and non-equi condition*|EnumerableJoin + EnumerableFilter, or EnumerableMergeJoin (changed)|EnumerableThetaJoin, or EnumerableMergeJoin, or EnumerableHashJoin|
If a join is non-inner and has both equi and non-equi conditions, we have three choices to plan it. EnumerableThetaJoin and EnumerableMergeJoin each have a corresponding rule. What do you think about introducing a new rule (EnumerableThetaHashJoinRule) to allow theta joins to be executed using a hash join?
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-2973) Allow theta joins to be executed using a merge join algorithm
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lai Zhou updated CALCITE-2973: -- Summary: Allow theta joins to be executed using a merge join algorithm (was: Make EnumerableMergeJoinRule to support a theta join)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2973) Make EnumerableMergeJoinRule to support a theta join
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807393#comment-16807393 ] Lai Zhou edited comment on CALCITE-2973 at 4/2/19 9:28 AM: --- [~julianhyde], consider another query whose join condition contains both an equi condition and a non-equi condition:
{code:java}
SELECT t1.i_item_desc
FROM item t1
LEFT OUTER JOIN item_1 t2
  ON t1.i_item_sk = t2.i_item_sk AND t2.i_item_sk < 1
{code}
A merge join is also good for this query, but currently it is converted to a nested-loop join. I tried replacing the default ENUMERABLE_JOIN_RULE with a customized rule:
{code:java}
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
  // EnumerableJoinRel only supports equi-join. We can put a filter on top
  // if it is an inner join.
  try {
    boolean hasEquiKeys = !info.leftKeys.isEmpty() && !info.rightKeys.isEmpty();
    if (hasEquiKeys) {
      return convertToThetaMergeJoin(rel);
    } else {
      return new EnumerableThetaJoin(cluster, traitSet, left, right,
          join.getCondition(), join.getVariablesSet(), join.getJoinType());
    }
  } catch (Exception e) {
    EnumerableRules.LOGGER.debug(e.toString());
    return null;
  }
}
{code}
If the join has equi keys, it is converted to an EnumerableThetaMergeJoin:
{code:java}
new EnumerableThetaMergeJoin(cluster, traits, left, right,
    info.getEquiCondition(left, right, cluster.getRexBuilder()),
    info.getRemaining(cluster.getRexBuilder()),
    info.leftKeys, info.rightKeys,
    join.getVariablesSet(), join.getJoinType());
{code}
I implemented EnumerableThetaMergeJoin to handle a theta join with equi keys. The key difference between EnumerableThetaMergeJoin and EnumerableMergeJoin is that EnumerableThetaMergeJoin uses a predicate generated from the remaining part of the JoinInfo, and the predicate is applied to the Cartesian results of the merge join.
See [https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298]. I made some changes:
{code:java}
public TResult current() {
  final List list = cartesians.current();
  @SuppressWarnings("unchecked")
  final TSource left = (TSource) list.get(0);
  @SuppressWarnings("unchecked")
  final TInner right = (TInner) list.get(1);
  // Apply the non-equi predicate to each row of the Cartesian result.
  boolean isNonEquiPredicateSatisfied = predicate.apply(left, right);
  if (!isNonEquiPredicateSatisfied) {
    if (generateNullsOnLeft) {
      return resultSelector.apply(null, right);
    }
    if (generateNullsOnRight) {
      return resultSelector.apply(left, null);
    }
  }
  return resultSelector.apply(left, right);
}
{code}
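The merge-join-plus-residual-predicate idea described above can be sketched end to end in plain Java (this is a standalone illustration, not the actual EnumerableDefaults code; the class name and the use of -1 to stand in for NULL are hypothetical). Both inputs are sorted on the equi key; key-equal pairs are found by cursor advancement, and the residual non-equi predicate then decides whether a pair is emitted as-is or null-padded, mirroring the per-pair null-padding in the current() snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch of a left outer sort-merge join with a residual predicate. */
public class ThetaMergeJoinSketch {
  /** Rows are {key, value}; both inputs must be sorted by key ascending. */
  public static List<int[]> leftOuterMergeJoin(int[][] left, int[][] right,
      BiPredicate<int[], int[]> residual) {
    List<int[]> result = new ArrayList<>();
    int j = 0;
    for (int[] l : left) {
      // Advance the right cursor past keys smaller than the current left key.
      while (j < right.length && right[j][0] < l[0]) {
        j++;
      }
      boolean matched = false;
      // Scan every key-equal right row; the residual predicate decides padding.
      for (int k = j; k < right.length && right[k][0] == l[0]; k++) {
        matched = true;
        if (residual.test(l, right[k])) {
          result.add(new int[] {l[0], l[1], right[k][1]});
        } else {
          result.add(new int[] {l[0], l[1], -1}); // -1 stands in for NULL
        }
      }
      if (!matched) {
        result.add(new int[] {l[0], l[1], -1}); // no key match: null-pad the right side
      }
    }
    return result;
  }
}
```

For the example query, `residual` would test `t2.i_item_sk < 1`: the equi keys still drive the merge, so the join avoids the nested loop, and only the residual check runs per matching pair.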