date:20190514

[jira] [Commented] (CALCITE-3065) RexLiteral#getValueAs should consider primitive type

2019-05-14 Thread Danny Chan (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840099#comment-16840099
 ] 

Danny Chan commented on CALCITE-3065:
-

[~Aron.tao] I checked the usages of *SqlLiteral#getValueAs* and 
*SqlCallBinding#getOperandLiteralValue*: we never pass a primitive Java 
class as the argument anywhere in the current Calcite codebase.

So I'm wondering why you need a code snippet like that. Maybe you should just 
fetch and match the SqlTypeName [1] of the literal and pass a boxed type 
class to *SqlLiteral#getValueAs* explicitly.

 

[1] 
[https://github.com/apache/calcite/blob/e98c779d1ec0bc87c81a72b974c89a41a7222a07/core/src/main/java/org/apache/calcite/sql/SqlLiteral.java#L152]
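A minimal sketch of the workaround suggested in this comment: switch on the literal's type name and hand `getValueAs` a boxed class, never a primitive one. `TypeName` is a hypothetical stand-in for Calcite's `SqlTypeName` (the real enum has many more values), and the classes chosen for each type are illustrative assumptions, not a statement of what `getValueAs` supports.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for Calcite's SqlTypeName; only a few values shown.
enum TypeName { BOOLEAN, INTEGER, BIGINT, DOUBLE, DECIMAL, VARCHAR }

final class BoxedClassForType {
  private BoxedClassForType() {}

  // Returns the boxed Java class a caller might pass to getValueAs for a
  // literal of the given type. Never returns a primitive class like int.class.
  static Class<?> boxedClassFor(TypeName type) {
    switch (type) {
      case BOOLEAN:
        return Boolean.class;
      case INTEGER:
        return Integer.class;
      case BIGINT:
        return Long.class;
      case DOUBLE:
        return Double.class;
      case DECIMAL:
        return BigDecimal.class;
      case VARCHAR:
        return String.class;
      default:
        throw new IllegalArgumentException("unsupported type: " + type);
    }
  }
}
```

With this approach the caller, not `getValueAs`, owns the primitive-versus-boxed decision.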

> RexLiteral#getValueAs should consider primitive type
> 
>
> Key: CALCITE-3065
> URL: https://issues.apache.org/jira/browse/CALCITE-3065
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiatao Tao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-05-13-12-04-36-365.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> !image-2019-05-13-12-04-36-365.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (CALCITE-3067) Splunk Calcite adapter cannot parse right session keys from Splunk 7.2

2019-05-14 Thread Shawn Chen (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Chen updated CALCITE-3067:

Description: *SplunkConnectionImpl.java* successfully builds a connection 
to Splunk 7.2, but it cannot parse the correct session key due to the outdated 
regular expression in this class, so it receives an HTTP 401 response after 
sending further search commands to Splunk.  (was: *SplunkConnectionImpl.java* 
successfully builds a connection to Splunk 7.2, but it cannot parse the correct 
session key due to the outdated regular expression in this class, so it 
cannot send further search commands to Splunk.)

> Splunk Calcite adapter cannot parse right session keys from Splunk 7.2
> --
>
> Key: CALCITE-3067
> URL: https://issues.apache.org/jira/browse/CALCITE-3067
> Project: Calcite
>  Issue Type: Bug
>  Components: splunk
>Affects Versions: 1.19.0
> Environment: Splunk 7.2 on Mac OS 10.14
> Calcite 1.19
>Reporter: Shawn Chen
>Priority: Major
> Fix For: next
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> *SplunkConnectionImpl.java* successfully builds a connection to Splunk 7.2, 
> but it cannot parse the correct session key due to the outdated regular 
> expression in this class, so it receives an HTTP 401 response after sending 
> further search commands to Splunk.
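For illustration only, a sketch of a more tolerant session-key extractor. It assumes the login response carries the key in a `<sessionKey>` element and that the outdated pattern was too strict about which characters a key may contain. Neither the old regex in `SplunkConnectionImpl` nor the exact Splunk 7.2 key format appears in this report, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SessionKeyParser {
  private SessionKeyParser() {}

  // Accept any run of characters up to the closing tag rather than a narrow
  // character class, so keys containing '^', '_' or '-' still match.
  // Assumption: the key itself never contains '<'.
  private static final Pattern SESSION_KEY =
      Pattern.compile("<sessionKey>([^<]+)</sessionKey>");

  // Returns the session key, or null if the response does not contain one.
  static String parse(String loginResponse) {
    Matcher m = SESSION_KEY.matcher(loginResponse);
    return m.find() ? m.group(1).trim() : null;
  }
}
```

If the key is mis-parsed, later requests would carry an invalid key, which would be consistent with the HTTP 401 responses described in the report.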




[jira] [Updated] (CALCITE-3067) Splunk Calcite adapter cannot parse right session keys from Splunk 7.2

2019-05-14 Thread Shawn Chen (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Chen updated CALCITE-3067:

Description: *SplunkConnectionImpl.java* successfully builds a connection 
to Splunk 7.2, but it cannot parse the correct session key due to the outdated 
regular expression in this class, so it cannot send further search 
commands to Splunk.  (was: SplunkConnectionImpl.java successfully builds a 
connection to Splunk 7.2, but it cannot parse the correct session key due to 
the outdated regular expression. )

> Splunk Calcite adapter cannot parse right session keys from Splunk 7.2
> --
>
> Key: CALCITE-3067
> URL: https://issues.apache.org/jira/browse/CALCITE-3067
> Project: Calcite
>  Issue Type: Bug
>  Components: splunk
>Affects Versions: 1.19.0
> Environment: Splunk 7.2 on Mac OS 10.14
> Calcite 1.19
>Reporter: Shawn Chen
>Priority: Major
> Fix For: next
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> *SplunkConnectionImpl.java* successfully builds a connection to Splunk 7.2, 
> but it cannot parse the correct session key due to the outdated regular 
> expression in this class, so it cannot send further search commands to 
> Splunk.




[jira] [Updated] (CALCITE-3067) Splunk Calcite adapter cannot parse right session keys from Splunk 7.2

2019-05-14 Thread Shawn Chen (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Chen updated CALCITE-3067:

Summary: Splunk Calcite adapter cannot parse right session keys from Splunk 
7.2  (was: Splunk Calcite adapter cannot parse the right session key from 
Splunk 7.2)

> Splunk Calcite adapter cannot parse right session keys from Splunk 7.2
> --
>
> Key: CALCITE-3067
> URL: https://issues.apache.org/jira/browse/CALCITE-3067
> Project: Calcite
>  Issue Type: Bug
>  Components: splunk
>Affects Versions: 1.19.0
> Environment: Splunk 7.2 on Mac OS 10.14
> Calcite 1.19
>Reporter: Shawn Chen
>Priority: Major
> Fix For: next
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> SplunkConnectionImpl.java successfully builds a connection to Splunk 7.2, but 
> it cannot parse the correct session key due to the outdated regular 
> expression. 




[jira] [Created] (CALCITE-3067) Splunk Calcite adapter cannot parse the right session key from Splunk 7.2

2019-05-14 Thread Shawn Chen (JIRA)

Shawn Chen created CALCITE-3067:
---

 Summary: Splunk Calcite adapter cannot parse the right session key 
from Splunk 7.2
 Key: CALCITE-3067
 URL: https://issues.apache.org/jira/browse/CALCITE-3067
 Project: Calcite
  Issue Type: Bug
  Components: splunk
Affects Versions: 1.19.0
 Environment: Splunk 7.2 on Mac OS 10.14

Calcite 1.19
Reporter: Shawn Chen
 Fix For: next


SplunkConnectionImpl.java successfully builds a connection to Splunk 7.2, but 
it cannot parse the correct session key due to the outdated regular expression. 




[jira] [Resolved] (CALCITE-3017) Improve null handling of JsonValueExpressionOperator

2019-05-14 Thread Hongze Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang resolved CALCITE-3017.
---
Resolution: Fixed

Fixed in 
[e98c779d1ec0bc87c81a72b974c89a41a7222a07|https://github.com/apache/calcite/commit/e98c779d1ec0bc87c81a72b974c89a41a7222a07].

> Improve null handling of JsonValueExpressionOperator
> 
>
> Key: CALCITE-3017
> URL: https://issues.apache.org/jira/browse/CALCITE-3017
> Project: Calcite
>  Issue Type: Sub-task
>Reporter: Hongze Zhang
>Assignee: Hongze Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In Calcite's implementation, {{JsonValueExpressionOperator}} currently 
> returns null regardless of whether the argument is a JSON null value or a SQL 
> NULL value. But in MySQL, some JSON functions behave differently on these two 
> kinds of null input. For instance, if we execute the MySQL JSON function 
> {{JSON_STORAGE_SIZE}} as follows:
> {code:sql}
> SELECT JSON_STORAGE_SIZE(null), JSON_STORAGE_SIZE('null')
> {code}
> The result should be:
> ||JSON_STORAGE_SIZE(null)||JSON_STORAGE_SIZE('null')||
> |null|2|
> We should improve the operator a bit to support different behaviors.
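A minimal sketch of the distinction, assuming a hypothetical representation in which SQL NULL is a Java `null` and a parsed JSON null is a dedicated sentinel object. It is not Calcite's `JsonValueExpressionOperator` code, and the `typeOf` function does not compute MySQL's storage size; it only shows how one function can return SQL NULL for one kind of null and a real value for the other.

```java
// Hypothetical sentinel for a parsed JSON null, distinct from Java null (SQL NULL).
enum JsonNull { INSTANCE }

final class JsonNullDemo {
  private JsonNullDemo() {}

  // SQL NULL input stays Java null; the JSON literal "null" becomes the
  // JsonNull sentinel. Other values are kept as raw text for brevity.
  static Object parse(String jsonText) {
    if (jsonText == null) {
      return null;                  // SQL NULL
    }
    if (jsonText.trim().equals("null")) {
      return JsonNull.INSTANCE;     // JSON null
    }
    return jsonText.trim();
  }

  // Hypothetical JSON_TYPE-style function: SQL NULL in, SQL NULL out;
  // JSON null in, the non-null string "NULL" out.
  static String typeOf(String jsonText) {
    Object value = parse(jsonText);
    if (value == null) {
      return null;                  // propagate SQL NULL
    }
    if (value == JsonNull.INSTANCE) {
      return "NULL";                // JSON null is a real value with a type
    }
    return "OTHER";
  }
}
```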




[jira] [Resolved] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-05-14 Thread Haisheng Yuan (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haisheng Yuan resolved CALCITE-2936.

   Resolution: Fixed
Fix Version/s: 1.20.0

Fixed in 
https://github.com/apache/calcite/commit/3f24710db7e1ef91eb8fe934057456bffc2de780.

> Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
> --
>
> Key: CALCITE-2936
> URL: https://issues.apache.org/jira/browse/CALCITE-2936
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Haisheng Yuan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> An EXISTS or NOT EXISTS sub-query whose inner child is an aggregate with no 
> grouping columns should be simplified to a Boolean constant.
> Example:
> {code:java}
> exists(select sum(i) from X) --> true
> not exists(select sum(i) from X) --> false
> {code}
> Repro:
> {code:java}
> @Test public void testExistentialSubquery() {
>   final String sql = "SELECT e1.empno\n"
>       + "FROM emp e1 where exists\n"
>       + "(select avg(sal) from emp e2 where e1.empno = e2.empno )";
>   sql(sql).decorrelate(true).ok();
> }
> {code}
> We got plan:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], EMPNO0=[CAST($9):INTEGER], $f1=[CAST($10):BOOLEAN])
>     LogicalJoin(condition=[=($0, $9)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         LogicalProject(EMPNO=[$0], $f0=[true])
>           LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
>             LogicalProject(EMPNO=[$0], SAL=[$5])
>               LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> The preferred plan should be:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
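The reasoning behind the rewrite: an aggregate with no grouping columns returns exactly one row, even over empty input, so `EXISTS` over it is always true and `NOT EXISTS` always false. In the repro the correlated `WHERE` filters rows before aggregation, so the argument still holds. The self-contained illustration below models `SELECT SUM(i) FROM X` with a hypothetical `sumNoGroupBy` method; it demonstrates the semantics only and is not Calcite's rule implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class ExistsOverAggregate {
  private ExistsOverAggregate() {}

  // Models "SELECT SUM(i) FROM X" with no GROUP BY: always exactly one row.
  // The single value is null when X is empty (or all nulls), as SQL SUM does.
  static List<Integer> sumNoGroupBy(List<Integer> x) {
    Integer sum = null;
    for (Integer i : x) {
      if (i != null) {
        sum = (sum == null) ? i : sum + i;
      }
    }
    List<Integer> rows = new ArrayList<>();
    rows.add(sum);                  // one row, possibly holding null
    return rows;
  }

  // EXISTS is true iff the sub-query yields at least one row.
  static boolean exists(List<?> subQueryRows) {
    return !subQueryRows.isEmpty();
  }
}
```

Because `sumNoGroupBy` never returns an empty list, `exists` over it is constant, which is why the whole sub-query can be replaced by a Boolean literal.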




[jira] [Commented] (CALCITE-2624) Add a rule to copy a sort below a join operator

2019-05-14 Thread Haisheng Yuan (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839978#comment-16839978
 ] 

Haisheng Yuan commented on CALCITE-2624:


The PR is mixing physical property enforcement with logical plan exploration. 
Physical properties such as sort order should only be requested by physical 
operators and then satisfied through enforcement. The patch creates more useless 
plan alternatives than necessary. It only makes sense to push a top limit into 
the outer relation of an outer join; other than that, I don't think this is the 
right way to go.

> Add a rule to copy a sort below a join operator
> ---
>
> Key: CALCITE-2624
> URL: https://issues.apache.org/jira/browse/CALCITE-2624
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: Stamatis Zampetakis
>Assignee: Khawla Mouhoubi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the only rule that allows a sort to traverse a binary operator is 
> the SortJoinTransposeRule. The rule was introduced mainly to push limits in 
> the case of left and right outer joins (see CALCITE-831).
> I assume that the main reason that we don't have more rules is that sorts 
> with limits and offsets cannot be pushed safely below many types of join 
> operators. However, in many cases, it is possible and beneficial for 
> optimization purposes to just push the sort without the limit and offset. 
> Since we do not know in advance whether the join operator preserves the order, 
> we cannot remove the sort operator on top of the join (that is why I am saying 
> copy and not transpose). The latter is not really a problem since the 
> SortRemoveRule can detect such cases and remove the sort if it is redundant.
> A few concrete examples where this optimization makes sense are outlined 
> below:
>  * allow the sort to be later absorbed by an index scan and disappear from 
> the plan (Sort + TableScan => IndexScan with RelCollation);
>  * allow operators that require sorted inputs to be exploited more easily 
> (e.g., merge join);
>  * allow the sort to be performed on a possibly smaller result (assuming that 
> the physical binary operator that is going to be used preserves the order of 
> left/right input and the top sort operator can be removed entirely).
> I propose to add a new rule (e.g., SortCopyBelowJoinRule, 
> SortJoinCopyBelowRule) which allows a sort to be copied to the left or right 
> (or to both if it is rather easy to decompose the sort) of a join operator 
> (excluding the limit and offset attributes) if the respective inputs are not 
> already sorted. 
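To illustrate the third bullet: if the physical join preserves the order of its left input, sorting the left input first (without limit or offset) makes the join output already sorted on that column, which is the situation in which a rule like SortRemoveRule could drop the top sort. A minimal sketch, assuming a hash join that builds on the right input and streams the left input in order; the class, method names, and `int[]` row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderPreservingHashJoin {
  private OrderPreservingHashJoin() {}

  // Inner equi-join on column 0. Left rows are {key, sortCol}; right rows are
  // {key, payload}. Builds a hash table on the right input and probes it while
  // scanning the left input in its given order, so output follows left order.
  static List<int[]> join(List<int[]> left, List<int[]> right) {
    Map<Integer, List<int[]>> build = new HashMap<>();
    for (int[] r : right) {
      build.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
    }
    List<int[]> out = new ArrayList<>();
    for (int[] l : left) {
      for (int[] r : build.getOrDefault(l[0], Collections.emptyList())) {
        out.add(new int[] {l[0], l[1], r[1]});
      }
    }
    return out;
  }

  // "Copy the sort below the join": sort the left input on column 1
  // (no limit, no offset) before joining.
  static List<int[]> sortLeftThenJoin(List<int[]> left, List<int[]> right) {
    List<int[]> sortedLeft = new ArrayList<>(left);
    sortedLeft.sort(Comparator.comparingInt(row -> row[1]));
    return join(sortedLeft, right);
  }
}
```

A merge join or nested-loop join that streams its left input would have the same property; a join that does not preserve order would still need the top sort.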




[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-14 Thread Lai Zhou (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839965#comment-16839965
 ] 

Lai Zhou edited comment on CALCITE-2973 at 5/15/19 3:35 AM:


[~rubenql], now an inner join with a remaining (non-equi) condition is no longer 
converted to an inner join plus a filter; the Enumerable(Hash)Join can handle it 
in a generic way.

But some tests are failing; dropping the filter may have introduced a bug. 
I'll look into the problems.
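A minimal sketch of the idea in this comment: instead of splitting the join into an equi-join plus a separate filter, the hash join matches rows on the equi key and evaluates the remaining (non-equi) condition on each candidate pair. The class, the `int[]` row layout (column 0 is the key), and the predicate are illustrative assumptions, not Calcite's `EnumerableHashJoin` code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class HashJoinWithRemainingCondition {
  private HashJoinWithRemainingCondition() {}

  // Inner join on column 0. Candidate pairs found through the hash table are
  // emitted only if the remaining condition also holds for the pair.
  static List<int[][]> join(List<int[]> left, List<int[]> right,
      BiPredicate<int[], int[]> remaining) {
    Map<Integer, List<int[]>> table = new HashMap<>();
    for (int[] r : right) {
      table.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
    }
    List<int[][]> out = new ArrayList<>();
    for (int[] l : left) {
      List<int[]> matches = table.get(l[0]);
      if (matches == null) {
        continue;                   // no equi-key match at all
      }
      for (int[] r : matches) {
        if (remaining.test(l, r)) { // non-equi part, e.g. l[1] < r[1]
          out.add(new int[][] {l, r});
        }
      }
    }
    return out;
  }
}
```

This sketch covers inner joins only. For an outer join the remaining condition decides whether a pair matches, not whether the outer row survives, so it cannot simply be applied as a filter afterwards.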


was (Author: hhlai1990):
[~rubenql], now an inner join with a remaining (non-equi) condition is no longer 
converted to an inner join plus a filter; the Enumerable(Hash)Join can handle it 
in a generic way.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join can take dozens of times longer than a sort-merge 
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will 
> improve performance greatly.




[jira] [Comment Edited] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-14 Thread Lai Zhou (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839965#comment-16839965
 ] 

Lai Zhou edited comment on CALCITE-2973 at 5/15/19 3:13 AM:


[~rubenql], now an inner join with a remaining (non-equi) condition is no longer 
converted to an inner join plus a filter; the Enumerable(Hash)Join can handle it 
in a generic way.


was (Author: hhlai1990):
[~rubenql], now an inner join with a remaining (non-equi) condition is no longer 
converted to an inner join plus a filter; the Enumerable(Hash)Join can handle it.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join can take dozens of times longer than a sort-merge 
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will 
> improve performance greatly.




[jira] [Commented] (CALCITE-3065) RexLiteral#getValueAs should consider primitive type

2019-05-14 Thread Jiatao Tao (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839961#comment-16839961
 ] 

Jiatao Tao commented on CALCITE-3065:
-

Hi [~danny0405] [~julianhyde]

Code like this:

{code:scala}
val tp = new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT)
literal.getValueAs(tp.getJavaClass(literal.getType).asInstanceOf[java.lang.Class[_]])
{code}

`tp.getJavaClass(literal.getType)` returns int rather than Integer, and getValueAs 
cannot recognize int, so I opened this JIRA.

 

Primitive.box(clazz) does work, but in my opinion it would be more elegant to 
leave this to the callee; maybe we can do it in the method "getValueAs".
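A sketch of the change proposed here: normalize a primitive `Class` to its wrapper before dispatching on the requested type, so that asking for `int.class` behaves like asking for `Integer.class`. The `box` helper mirrors what this comment says `Primitive.box(clazz)` does, but is written with plain JDK types; `getValueAs` below is a hypothetical simplification that handles only a numeric literal held as a `BigDecimal`, not the real `RexLiteral` method.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

final class LiteralValues {
  private LiteralValues() {}

  // Primitive class -> wrapper class, e.g. int.class -> Integer.class.
  private static final Map<Class<?>, Class<?>> BOXES = new HashMap<>();

  static {
    BOXES.put(boolean.class, Boolean.class);
    BOXES.put(byte.class, Byte.class);
    BOXES.put(char.class, Character.class);
    BOXES.put(short.class, Short.class);
    BOXES.put(int.class, Integer.class);
    BOXES.put(long.class, Long.class);
    BOXES.put(float.class, Float.class);
    BOXES.put(double.class, Double.class);
  }

  // Returns the wrapper for a primitive class; any other class is unchanged.
  static Class<?> box(Class<?> clazz) {
    return BOXES.getOrDefault(clazz, clazz);
  }

  // Hypothetical, simplified getValueAs. Boxing first lets callers pass
  // either int.class or Integer.class and get the same result.
  static <T> T getValueAs(BigDecimal value, Class<T> clazz) {
    Class<?> boxed = box(clazz);
    Object result;
    if (boxed == Integer.class) {
      result = value.intValueExact();
    } else if (boxed == Long.class) {
      result = value.longValueExact();
    } else if (boxed == BigDecimal.class) {
      result = value;
    } else {
      throw new IllegalArgumentException("cannot convert to " + clazz);
    }
    // clazz.cast(result) would throw for int.class, because a primitive
    // Class never reports an Integer as an instance; use an unchecked cast.
    @SuppressWarnings("unchecked")
    T t = (T) result;
    return t;
  }
}
```

Doing the boxing inside the method keeps every caller free of the primitive-versus-wrapper distinction, which is the design argument made in this comment.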

 

Thanks for your reply; I hope to hear your opinion again.

 

> RexLiteral#getValueAs should consider primitive type
> 
>
> Key: CALCITE-3065
> URL: https://issues.apache.org/jira/browse/CALCITE-3065
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiatao Tao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-05-13-12-04-36-365.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> !image-2019-05-13-12-04-36-365.png!




[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-14 Thread Lai Zhou (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839965#comment-16839965
 ] 

Lai Zhou commented on CALCITE-2973:
---

[~rubenql], now an inner join with a remaining (non-equi) condition is no longer 
converted to an inner join plus a filter; the Enumerable(Hash)Join can handle it.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query over a large dataset (such as 1*1), 
> the nested-loop join can take dozens of times longer than a sort-merge 
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will 
> improve performance greatly.




[jira] [Commented] (CALCITE-3062) Do not populate provenanceMap if not used

2019-05-14 Thread Chunwei Lei (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839943#comment-16839943
 ] 

Chunwei Lei commented on CALCITE-3062:
--

I am not quite sure whether my concern makes sense, so let's see what others 
think about it. ;)

> Do not populate provenanceMap if not used
> -
>
> Key: CALCITE-3062
> URL: https://issues.apache.org/jira/browse/CALCITE-3062
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{VolcanoPlanner}}'s field {{provenanceMap}} tracks the provenance of each 
> node added to the planner, but the information is only used when the 
> planner's log level is DEBUG. When the planner is not running in debug mode, 
> this data goes to waste and uses memory unnecessarily.
> The planner should be changed so that the information is only captured if it 
> is used later.
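A minimal sketch of the proposed change, assuming the planner can ask its logger whether debug output is enabled. It uses `java.util.logging` (where `Level.FINE` roughly corresponds to DEBUG) so that it is self-contained; Calcite itself logs through SLF4J, and the class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ProvenanceTrackingPlanner {
  // Allocated and populated only when provenance will actually be logged.
  private final Map<String, String> provenanceMap;

  // Decides once, at construction, whether provenance is worth recording.
  ProvenanceTrackingPlanner(Logger logger) {
    this.provenanceMap =
        logger.isLoggable(Level.FINE) ? new HashMap<String, String>() : null;
  }

  // Records where a node came from, but only in debug mode.
  void register(String node, String provenance) {
    if (provenanceMap != null) {
      provenanceMap.put(node, provenance);
    }
  }

  int trackedCount() {
    return provenanceMap == null ? 0 : provenanceMap.size();
  }
}
```

Checking the level once at construction is a design choice; if the log level can change while a planner is alive, checking inside `register` instead would be safer at the cost of a call per node.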




[jira] [Commented] (CALCITE-2601) Support REVERSE(str) in SqlFunctions

2019-05-14 Thread Julian Hyde (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839931#comment-16839931
 ] 

Julian Hyde commented on CALCITE-2601:
--

* It's not a standard function. Can you create a new function category "j" (for 
JDBC)?
* And move it to {{SqlLibraryOperators}}.
* Also change the JIRA case title to "Add REVERSE function"; whether it is in 
{{SqlFunctions}} is an implementation detail.
* In the description, you don't need to say that it returns null if the input 
is null.

> Support REVERSE(str) in SqlFunctions
> 
>
> Key: CALCITE-2601
> URL: https://issues.apache.org/jira/browse/CALCITE-2601
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: TANG Wen-hui
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Returns the string _{{str}}_ with the order of the characters reversed.
> I think 'Reverse' seems to be a generic function.
>  
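A minimal sketch of the runtime function, assuming it follows the usual convention of returning null for a null argument. `StringBuilder.reverse()` treats surrogate pairs as single characters, so supplementary characters are not split. The class name is hypothetical and this is not the code from the pull request.

```java
final class ReverseFunction {
  private ReverseFunction() {}

  // REVERSE(str): the characters of str in reverse order; null in, null out.
  static String reverse(String s) {
    return s == null ? null : new StringBuilder(s).reverse().toString();
  }
}
```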




[jira] [Commented] (CALCITE-3050) Integrate SqlDialect and SqlParser.Config

2019-05-14 Thread Chunwei Lei (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839920#comment-16839920
 ] 

Chunwei Lei commented on CALCITE-3050:
--

[~julianhyde], I would like to review it.

> Integrate SqlDialect and SqlParser.Config
> -
>
> Key: CALCITE-3050
> URL: https://issues.apache.org/jira/browse/CALCITE-3050
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Danny Chan
>Priority: Major
>
> {{SqlDialect}} is used by the JDBC adapter to generate SQL in the target 
> dialect of a data source. {{SqlParser.Config}} is used to set what the parser 
> should allow for SQL statements sent to Calcite. But both are 
> representations of a "dialect", and they come together when we want to use a 
> Babel parser to understand SQL statements that are meant for a data source.
> So it makes sense to integrate them, somehow. We could add a method 
> {code}void SqlParser.ConfigBuilder.setFrom(SqlDialect dialect){code} or do it 
> from the other end, {code}SqlDialect.configureParser(SqlParser.ConfigBuilder 
> configBuilder){code}
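A sketch of the second option, `SqlDialect.configureParser(SqlParser.ConfigBuilder configBuilder)`, using tiny hypothetical stand-in classes (`Dialect`, `ParserConfigBuilder`, `ParserConfig`) instead of Calcite's real types. Which settings a real dialect would copy into the parser config (here, the identifier quote string and case sensitivity) is an assumption for illustration.

```java
// Hypothetical immutable parser configuration.
final class ParserConfig {
  final String quoteString;
  final boolean caseSensitive;

  ParserConfig(String quoteString, boolean caseSensitive) {
    this.quoteString = quoteString;
    this.caseSensitive = caseSensitive;
  }
}

// Hypothetical builder standing in for SqlParser.ConfigBuilder.
final class ParserConfigBuilder {
  private String quoteString = "\"";
  private boolean caseSensitive = true;

  ParserConfigBuilder setQuoteString(String quoteString) {
    this.quoteString = quoteString;
    return this;
  }

  ParserConfigBuilder setCaseSensitive(boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
    return this;
  }

  ParserConfig build() {
    return new ParserConfig(quoteString, caseSensitive);
  }
}

// Hypothetical dialect that knows its own lexical conventions.
final class Dialect {
  private final String identifierQuoteString;
  private final boolean caseSensitive;

  Dialect(String identifierQuoteString, boolean caseSensitive) {
    this.identifierQuoteString = identifierQuoteString;
    this.caseSensitive = caseSensitive;
  }

  // Copies the dialect's conventions into the builder, so a parser built from
  // it accepts the same SQL that this dialect would generate.
  ParserConfigBuilder configureParser(ParserConfigBuilder builder) {
    return builder.setQuoteString(identifierQuoteString)
        .setCaseSensitive(caseSensitive);
  }
}
```

Putting the method on the dialect side keeps the parser package unaware of dialects; the `setFrom` alternative would reverse that dependency.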




[jira] [Commented] (CALCITE-2282) Allow OperatorTable to be pluggable in the parser

2019-05-14 Thread Danny Chan (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839900#comment-16839900
 ] 

Danny Chan commented on CALCITE-2282:
-

I will merge this PR if there are no further comments within 24 hours.

> Allow OperatorTable to be pluggable in the parser
> -
>
> Key: CALCITE-2282
> URL: https://issues.apache.org/jira/browse/CALCITE-2282
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Sudheesh Katkam
>Assignee: Danny Chan
>Priority: Major
>  Labels: pull-request-available
> Attachments: CALCITE-2282.patch.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SqlAbstractParserImpl [hardcodes OperatorTable to 
> SqlStdOperatorTable|https://github.com/apache/calcite/blob/8327e674e7f0a768d124fa37fd75cda4b8a35bb6/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L334].
>  Make this pluggable via a protected method.
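A sketch of "make this pluggable via a protected method": the base parser asks an overridable hook for its operator table instead of referring to the standard table directly. All names here (`OperatorTable`, `AbstractParser`, `ExtendedParser`, `MY_FUNC`) are hypothetical simplifications, not the real `SqlAbstractParserImpl` API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical minimal operator table: just a set of known operator names.
class OperatorTable {
  private final Set<String> names = new HashSet<>();

  OperatorTable(String... operatorNames) {
    for (String n : operatorNames) {
      names.add(n.toUpperCase(Locale.ROOT));
    }
  }

  boolean contains(String name) {
    return names.contains(name.toUpperCase(Locale.ROOT));
  }
}

abstract class AbstractParser {
  static final OperatorTable STANDARD = new OperatorTable("+", "-", "UPPER");

  // The hook: subclasses override this to plug in another table. The default
  // returns the standard table, so existing behavior is unchanged.
  protected OperatorTable getOperatorTable() {
    return STANDARD;
  }

  boolean isKnownFunction(String name) {
    return getOperatorTable().contains(name);
  }
}

// A hypothetical dialect-specific parser adding MY_FUNC to the standard names.
class ExtendedParser extends AbstractParser {
  private static final OperatorTable EXTENDED =
      new OperatorTable("+", "-", "UPPER", "MY_FUNC");

  @Override protected OperatorTable getOperatorTable() {
    return EXTENDED;
  }
}
```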




[jira] [Updated] (CALCITE-2913) Adapter for Apache Kafka

2019-05-14 Thread Xu Mingmin (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated CALCITE-2913:

Summary: Adapter for Apache Kafka  (was: Add Apache Kafka Adapter)

> Adapter for Apache Kafka
> 
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have a 
> version in our environment and would like to share it if it does not duplicate 
> existing work.
> Hint: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.
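To make "exposes a Kafka topic as a Stream table" concrete, a hypothetical sketch of the per-record conversion such an adapter needs: each consumed record becomes one row with partition, offset, timestamp, key and value columns. The `TopicRecord` class, the column layout, and the UTF-8 decoding are assumptions for illustration; the Kafka client's own record type is not used so that the example stays self-contained.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a record consumed from a Kafka topic.
final class TopicRecord {
  final int partition;
  final long offset;
  final long timestamp;
  final byte[] key;
  final byte[] value;

  TopicRecord(int partition, long offset, long timestamp, byte[] key, byte[] value) {
    this.partition = partition;
    this.offset = offset;
    this.timestamp = timestamp;
    this.key = key;
    this.value = value;
  }
}

final class RecordToRow {
  private RecordToRow() {}

  // One record -> one stream-table row:
  // (PARTITION, OFFSET, TIMESTAMP, KEY, VALUE), key/value decoded as UTF-8.
  static Object[] toRow(TopicRecord r) {
    return new Object[] {
        r.partition,
        r.offset,
        r.timestamp,
        r.key == null ? null : new String(r.key, StandardCharsets.UTF_8),
        r.value == null ? null : new String(r.value, StandardCharsets.UTF_8)
    };
  }
}
```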




[jira] [Updated] (CALCITE-2913) Add Apache Kafka Adapter

2019-05-14 Thread Andrei Sereda (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Sereda updated CALCITE-2913:
---
Summary: Add Apache Kafka Adapter  (was: Add Kafka Adapter)

> Add Apache Kafka Adapter
> 
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have a 
> version in our environment and would like to share it if it does not duplicate 
> existing work.
> Hint: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Updated] (CALCITE-2913) Add Kafka Adapter

2019-05-14 Thread Andrei Sereda (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Sereda updated CALCITE-2913:
---
Summary: Add Kafka Adapter  (was: add a KafkaAdapter for Stream)

> Add Kafka Adapter
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have a 
> version in our environment and would like to share it if it does not duplicate 
> existing work.
> Hint: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Andrei Sereda (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839780#comment-16839780
 ] 

Andrei Sereda commented on CALCITE-2913:


[~mingmxu] For a Calcite Kafka demo, a Docker image is fine.

> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have a 
> version in our environment and would like to share it if it does not duplicate 
> existing work.
> Hint: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Xu Mingmin (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839774#comment-16839774
 ] 

Xu Mingmin commented on CALCITE-2913:
-

[~sereda] Fair enough. I think {{MockConsumer}} is good for unit/integration 
tests.

What I meant is: how can we make it easier to try STREAM SQL (that's why I 
started this task)? That seems too far from the scope of this task, though; let's 
address it later.

> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have a 
> version in our environment and would like to share it if it does not duplicate 
> existing work.
> Hint: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Comment Edited] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Andrei Sereda (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839760#comment-16839760
 ] 

Andrei Sereda edited comment on CALCITE-2913 at 5/14/19 8:36 PM:
-

[~mingmxu] and [~julianhyde], regarding Docker images: I would like to avoid 
having external dependencies for unit / integration tests. 

I prefer the approach taken by the Elastic / Mongo / Geode adapters, where an 
embedded (or fake) Java version is used during tests. As we have seen during 
releases, people often forget to run integration tests before submitting a PR, 
and issues are only discovered at release time.

My goal is to run most tests in the PR phase, and perhaps only performance tests 
upon release. 

Example:  
[EmbeddedKafkaBroker|https://github.com/spring-projects/spring-kafka/blob/master/spring-kafka-test/src/main/java/org/springframework/kafka/test/EmbeddedKafkaBroker.java]
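A minimal sketch of the "embedded (or fake)" testing approach described in this comment: the code under test depends on a small interface, and unit tests supply an in-memory fake, so no broker or Docker image is needed. `RecordSource`, `InMemoryRecordSource` and `StreamReader` are hypothetical names; with the real Kafka client, the `MockConsumer` mentioned elsewhere in this thread would play the fake's role.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical abstraction over "something that yields batches of messages".
interface RecordSource {
  List<String> poll(int maxRecords);
}

// In-memory fake used by tests instead of a real broker.
final class InMemoryRecordSource implements RecordSource {
  private final Deque<String> pending = new ArrayDeque<>();

  InMemoryRecordSource(String... messages) {
    pending.addAll(Arrays.asList(messages));
  }

  @Override public List<String> poll(int maxRecords) {
    List<String> batch = new ArrayList<>();
    while (batch.size() < maxRecords && !pending.isEmpty()) {
      batch.add(pending.poll());
    }
    return batch;
  }
}

// Code under test: reads everything currently available, batch by batch,
// stopping at the first empty batch.
final class StreamReader {
  private StreamReader() {}

  static List<String> drain(RecordSource source, int batchSize) {
    List<String> all = new ArrayList<>();
    List<String> batch;
    while (!(batch = source.poll(batchSize)).isEmpty()) {
      all.addAll(batch);
    }
    return all;
  }
}
```

Because the fake runs entirely in the JVM, tests like this can run in the PR phase, which is the goal stated above.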
 



was (Author: asereda):
[~mingmxu] and [~julianhyde], regarding Docker images: I would like to avoid 
having external dependencies for unit / integration tests. 

I prefer the approach taken by the Elastic / Mongo / Geode adapters, where an 
embedded (or fake) Java version is used during tests. As we have seen during 
releases, people often forget to run integration tests before submitting a PR, 
and issues are only discovered at release time.

My goal is to run most tests in the PR phase, and perhaps only performance tests 
upon release. 


> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have a 
> version in our environment and would like to share it if it does not duplicate 
> existing work.
> Hint: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Andrei Sereda (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839760#comment-16839760
 ] 

Andrei Sereda commented on CALCITE-2913:


[~mingmxu] and [~julianhyde], regarding Docker images: I would like to avoid 
having external dependencies for unit / integration tests. 

I prefer the approach taken by the Elastic / Mongo / Geode adapters, where an 
embedded (or fake) Java version is used during tests. As we have seen during 
releases, people often forget to run integration tests before submitting a PR, 
and issues are only discovered at release time.

My goal is to run most tests in the PR phase, and perhaps only performance tests 
upon release. 


> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have one 
> version in our environment and would like to share it if it is not a duplicate.
> Note: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Xu Mingmin (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839755#comment-16839755
 ] 

Xu Mingmin commented on CALCITE-2913:
-

Thanks [~julianhyde] for your comments; let me update 2 and 3 first.

For 4, I'm sure there are some Docker images; let me find one with sample 
data (I will create and share one if I cannot find one), not only for smoke tests 
but also as a tutorial.

> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have one 
> version in our environment and would like to share it if it is not a duplicate.
> Note: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Xu Mingmin (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839757#comment-16839757
 ] 

Xu Mingmin commented on CALCITE-2913:
-

[~sereda] I agree not to put the dependency in the code. We may add it to the 
documentation only, so users can follow the steps to try it. Currently, 
[http://calcite.apache.org/docs/stream.html] has lots of examples, but they are not 
that easy to reproduce.

> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have one 
> version in our environment and would like to share it if it is not a duplicate.
> Note: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Andrei Sereda (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839749#comment-16839749
 ] 

Andrei Sereda commented on CALCITE-2913:


{quote}
Maybe I missed it, but is there a Docker image or other method by which a 
release manager can do a quick smoke test, i.e. run a test that really talks to 
Kafka?
{quote}
There is a test with 
[MockConsumer|https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/MockConsumer.html]
 which comes close.

Additionally, one can run an embedded Kafka in tests. Ideally, I would like to 
avoid a Docker dependency.

> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have one 
> version in our environment and would like to share it if it is not a duplicate.
> Note: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-2913) add a KafkaAdapter for Stream

2019-05-14 Thread Julian Hyde (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839731#comment-16839731
 ] 

Julian Hyde commented on CALCITE-2913:
--

I made an initial quick pass (didn't look at the code, just the overall 
structure):
* Yes, I'd like this module to be part of Calcite.
* Make sure to respect Kafka's branding. On each page of the documentation where 
Kafka is mentioned, the first mention should call it "Apache Kafka". I'd change 
the name of this case/commit to "Adapter for Apache Kafka".
* In a couple of places (pom.xml and adapter.md) there is a list of 
modules/adapters; you've put kafka at the end, but it should be in alphabetical 
order.
* Maybe I missed it, but is there a Docker image or other method by which a 
release manager can do a quick smoke test, i.e. run a test that really talks to 
Kafka?

> add a KafkaAdapter for Stream
> -
>
> Key: CALCITE-2913
> URL: https://issues.apache.org/jira/browse/CALCITE-2913
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: Xu Mingmin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Add a new adapter that exposes a Kafka topic as a Stream table. We have one 
> version in our environment and would like to share it if it is not a duplicate.
> Note: we are actually extending it as a batch table, but limiting it to a Stream 
> table sounds more straightforward to me as a first step.




[jira] [Commented] (CALCITE-3052) Error while applying rule MaterializedViewAggregateRule(Project-Aggregate): ArrayIndexOutOfBoundsException

2019-05-14 Thread Jesus Camacho Rodriguez (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839726#comment-16839726
 ] 

Jesus Camacho Rodriguez commented on CALCITE-3052:
--

[~anha], I have pushed an addendum to the PR. Once again, I ran all the regression 
tests in Hive and Calcite, and they passed. Could you check again? Thanks!

> Error while applying rule MaterializedViewAggregateRule(Project-Aggregate): 
> ArrayIndexOutOfBoundsException
> --
>
> Key: CALCITE-3052
> URL: https://issues.apache.org/jira/browse/CALCITE-3052
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.19.0
>Reporter: Anton Haidai
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Materialized views enabled:*
> # {{select avg(grade), count\(*), max(grade), sum(grade), min(grade), team 
> from students group by team}}
> # {{select avg(grade), count\(*), max(grade), sum(grade), min(grade), team, 
> faculty from students group by faculty, team}},
> *Query:*
> # {{select count\(*), team from students group by team}}
> *Error* (stacktrace is obtained using the current *master* branch: 
> "247c7d4f76"):
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>   at 
> com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:60)
>   at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:841)
>   at 
> org.apache.calcite.rel.rules.AbstractMaterializedViewRule$MaterializedViewAggregateRule.rewriteView(AbstractMaterializedViewRule.java:1507)
>   at 
> org.apache.calcite.rel.rules.AbstractMaterializedViewRule.perform(AbstractMaterializedViewRule.java:522)
>   at 
> org.apache.calcite.rel.rules.AbstractMaterializedViewRule$MaterializedViewProjectAggregateRule.onMatch(AbstractMaterializedViewRule.java:1776)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
>   ... 71 common frames omitted
> {noformat}
> Reproducible only if both materialized views listed are enabled: either of 
> them alone can be used successfully with the query without any errors. It 
> looks like it is reproducible when AbstractMaterializedViewRule is trying to 
> rewrite one materialized view using the other materialized view.
> Currently, I'm trying to reproduce the issue in "MaterializationTest", 
> without success so far; I'll update the ticket if I find a working way to 
> reproduce the issue in the test.




[jira] [Commented] (CALCITE-3050) Integrate SqlDialect and SqlParser.Config

2019-05-14 Thread Julian Hyde (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839647#comment-16839647
 ] 

Julian Hyde commented on CALCITE-3050:
--

I have added the method 
{{SqlDialect.configureParser(SqlParser.ConfigBuilder)}}, among other things, in 
https://github.com/julianhyde/calcite/tree/3050-dialect-parser-config. Can 
someone please review?

> Integrate SqlDialect and SqlParser.Config
> -
>
> Key: CALCITE-3050
> URL: https://issues.apache.org/jira/browse/CALCITE-3050
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Danny Chan
>Priority: Major
>
> {{SqlDialect}} is used by the JDBC adapter to generate SQL in the target 
> dialect of a data source. {{SqlParser.Config}} is used to set what the parser 
> should allow for SQL statements sent to Calcite. But both are representations 
> of a "dialect", and they come together when we want to use a 
> Babel parser to understand SQL statements that are meant for a data source.
> So it makes sense to integrate them, somehow. We could add a method 
> {code}void SqlParser.ConfigBuilder.setFrom(SqlDialect dialect){code} or do it 
> from the other end, {code}SqlDialect.configureParser(SqlParser.ConfigBuilder 
> configBuilder){code}




[jira] [Commented] (CALCITE-2973) Allow theta joins that have equi conditions to be executed using a hash join algorithm

2019-05-14 Thread Stamatis Zampetakis (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839561#comment-16839561
 ] 

Stamatis Zampetakis commented on CALCITE-2973:
--

It seems that the majority ([~hhlai1990], [~hyuan], [~julianhyde], [~rubenql]) 
believes that changing the operator is better (or at least less complex) than 
adding a new rule. If that's the case, I am willing to go along with it.

[~rubenql], from your comments it seems that you have done a rather exhaustive 
review. Don't hesitate to merge the PR if you think it is done. You can mark it 
as LGTM-will-merge-soon, and if nobody complains over the next few days you can 
proceed with the merge.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Lai Zhou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query on a large dataset (such as 1*1), 
> the nested-loop join takes dozens of times longer than the sort-merge 
> join.
> So if we could apply the merge-join or hash-join rule to a theta join, it 
> would greatly improve performance.
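A minimal sketch of the idea in the title, assuming an inner join whose condition is one equality (the equi part) AND an arbitrary remaining predicate (the theta part): hash the right input on its equi-key, probe with each left row, and evaluate the remaining predicate only on rows whose keys match. This is plain Java, not Calcite's Enumerable code; `HashThetaJoinSketch` and `join` are hypothetical names, and SQL NULL semantics are ignored.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

final class HashThetaJoinSketch {
  // Inner join on leftKey(l).equals(rightKey(r)) AND residual(l, r).
  static <L, R, K> List<Map.Entry<L, R>> join(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> residual) {
    // Build phase: group the right input by its equi-join key.
    Map<K, List<R>> table = new HashMap<>();
    for (R r : right) {
      table.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }
    // Probe phase: only rows with an equal key are candidates; the non-equi
    // (theta) part of the condition is checked on those candidates alone.
    List<Map.Entry<L, R>> out = new ArrayList<>();
    for (L l : left) {
      List<R> candidates = table.getOrDefault(leftKey.apply(l), Collections.emptyList());
      for (R r : candidates) {
        if (residual.test(l, r)) {
          out.add(new AbstractMap.SimpleImmutableEntry<>(l, r));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Rows are {deptno, sal}; condition: a.deptno = b.deptno AND a.sal > b.sal.
    List<int[]> a = Arrays.asList(new int[] {10, 300}, new int[] {20, 100});
    List<int[]> b = Arrays.asList(new int[] {10, 200}, new int[] {10, 400}, new int[] {20, 50});
    for (Map.Entry<int[], int[]> e : join(a, b, x -> x[0], y -> y[0], (x, y) -> x[1] > y[1])) {
      System.out.println(Arrays.toString(e.getKey()) + " " + Arrays.toString(e.getValue()));
    }
  }
}
```

With the data in `main`, only the pairs [10, 300]/[10, 200] and [20, 100]/[20, 50] survive, and the theta predicate is evaluated on 3 candidate pairs instead of all 6.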




[jira] [Commented] (CALCITE-2979) Add a block-based nested loop join algorithm

2019-05-14 Thread Ruben Quesada Lopez (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839555#comment-16839555
 ] 

Ruben Quesada Lopez commented on CALCITE-2979:
--

Thanks for your answer [~zabetak]! I agree that kind of solution should work 
fine with a correlated INNER join (though it would not be applicable to SEMI / 
ANTI joins, and I'm not quite sure about LEFT joins). Anyway, let us wait until 
[~khawlamhb] shares her study on this subject with the rest of us :)

> Add a block-based nested loop join algorithm
> 
>
> Key: CALCITE-2979
> URL: https://issues.apache.org/jira/browse/CALCITE-2979
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Stamatis Zampetakis
>Assignee: Khawla Mouhoubi
>Priority: Major
>  Labels: performance
>
> Currently, Calcite provides a tuple-based nested loop join algorithm 
> implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin. 
> This means that for each tuple of the outer relation we probe (set variables) 
> in the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin 
> method) which first gathers blocks (batches) of tuples from the outer 
> relation and then probes the inner relation once per block.
> There are cases (e.g., indexes) where the inner relation can be accessed with 
> more than one value at once, which can greatly improve performance, in 
> particular when the outer relation is big.




[jira] [Commented] (CALCITE-2979) Add a block-based nested loop join algorithm

2019-05-14 Thread Stamatis Zampetakis (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839542#comment-16839542
 ] 

Stamatis Zampetakis commented on CALCITE-2979:
--

Thanks for the analysis [~rubenql]! 

I haven't figured out all the details of the best way to do it, and I 
guess there is more than one choice. It would be nice if [~khawlamhb], who is 
working on it right now, outlined some possible alternatives with their 
advantages/disadvantages. Just a quick thought (that I guess could work) would 
be to generate a plan like the following:

{noformat}
Filter(A.id > B.id)
  Correlate(blockSize=3)
    Scan(A)
    Filter(OR(>(cor0_0,B.id), >(cor0_1,B.id), >(cor0_2,B.id)))
      Scan(B)
{noformat}

So the implementation of Correlate basically performs a Cartesian product, and the 
filter on top eliminates the tuples that shouldn't be there.
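To make the plan above concrete, here is a hedged sketch in plain Java (not Calcite's EnumerableCorrelate), assuming integer columns and the theta condition `a > b`: the outer input is read in blocks, the inner input is probed once per block with the OR of the per-tuple conditions, and a filter on top re-checks the original condition for every (outer, candidate) pair, because the OR probe alone does not say which outer tuple matched. `BlockNestedLoopSketch` and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlockNestedLoopSketch {
  // Number of times the inner input was probed by the last call to join().
  static int innerScans = 0;

  // Inner join of outer values a and inner values b on a > b,
  // reading the outer input in blocks of blockSize tuples.
  static List<int[]> join(int[] a, int[] b, int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    innerScans = 0;
    List<int[]> out = new ArrayList<>();
    for (int start = 0; start < a.length; start += blockSize) {
      int[] block = Arrays.copyOfRange(a, start, Math.min(start + blockSize, a.length));
      innerScans++;
      // Probe the inner input once per block with OR(cor_0 > b, cor_1 > b, ...).
      List<Integer> candidates = new ArrayList<>();
      for (int bv : b) {
        for (int av : block) {
          if (av > bv) {
            candidates.add(bv);
            break; // one satisfied disjunct is enough to keep the inner row
          }
        }
      }
      // Filter on top: pair every tuple of the block with every candidate
      // and re-check a > b, discarding pairs produced only by the OR.
      for (int av : block) {
        for (int bv : candidates) {
          if (av > bv) {
            out.add(new int[] {av, bv});
          }
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> rows = join(new int[] {5, 1, 3, 7}, new int[] {2, 4, 6}, 3);
    System.out.println(rows.size() + " rows, " + innerScans + " inner scans");
  }
}
```

Here `main` prints `6 rows, 2 inner scans`; a tuple-at-a-time nested loop over the same four outer values would scan the inner input four times.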

> Add a block-based nested loop join algorithm
> 
>
> Key: CALCITE-2979
> URL: https://issues.apache.org/jira/browse/CALCITE-2979
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Stamatis Zampetakis
>Assignee: Khawla Mouhoubi
>Priority: Major
>  Labels: performance
>
> Currently, Calcite provides a tuple-based nested loop join algorithm 
> implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin. 
> This means that for each tuple of the outer relation we probe (set variables) 
> in the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin 
> method) which first gathers blocks (batches) of tuples from the outer 
> relation and then probes the inner relation once per block.
> There are cases (e.g., indexes) where the inner relation can be accessed with 
> more than one value at once, which can greatly improve performance, in 
> particular when the outer relation is big.




[jira] [Commented] (CALCITE-2965) Implement string functions: REPEAT, SPACE, SOUNDEX, DIFFERENCE

2019-05-14 Thread Chunwei Lei (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839420#comment-16839420
 ] 

Chunwei Lei commented on CALCITE-2965:
--

[~julianhyde], the PR is updated. Could you please review it? Thanks.

> Implement string functions: REPEAT, SPACE, SOUNDEX, DIFFERENCE
> --
>
> Key: CALCITE-2965
> URL: https://issues.apache.org/jira/browse/CALCITE-2965
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Chunwei Lei
>Assignee: Chunwei Lei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Some string functions, including REPEAT, SPACE, SOUNDEX and DIFFERENCE, are not 
> implemented yet. It would be great if these functions could be implemented.
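For reference, here is a plain-Java sketch of the four functions, under assumptions that real databases differ on: REPEAT and SPACE return an empty string for a non-positive count, SOUNDEX follows the common four-character American Soundex rules (non-letters dropped, H and W not separating equal codes), and DIFFERENCE counts the positions at which two Soundex codes agree (0 to 4). `StringFunctionsSketch` is a hypothetical name; this is not the code in the PR.

```java
import java.util.Locale;

final class StringFunctionsSketch {
  // REPEAT(s, n): s concatenated n times; "" when n <= 0 (assumed behavior).
  static String repeat(String s, int n) {
    if (n <= 0) {
      return "";
    }
    StringBuilder sb = new StringBuilder(s.length() * n);
    for (int i = 0; i < n; i++) {
      sb.append(s);
    }
    return sb.toString();
  }

  // SPACE(n): n space characters.
  static String space(int n) {
    return repeat(" ", n);
  }

  // Soundex digit for an upper-case letter; '0' for vowels and Y, '-' for H and W.
  private static char code(char c) {
    switch (c) {
      case 'B': case 'F': case 'P': case 'V':
        return '1';
      case 'C': case 'G': case 'J': case 'K': case 'Q': case 'S': case 'X': case 'Z':
        return '2';
      case 'D': case 'T':
        return '3';
      case 'L':
        return '4';
      case 'M': case 'N':
        return '5';
      case 'R':
        return '6';
      case 'H': case 'W':
        return '-';
      default:
        return '0';
    }
  }

  // SOUNDEX(s): first letter plus three digits, zero-padded; "" if s has no letters.
  static String soundex(String s) {
    String u = s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
    if (u.isEmpty()) {
      return "";
    }
    StringBuilder out = new StringBuilder().append(u.charAt(0));
    char last = code(u.charAt(0));
    for (int i = 1; i < u.length() && out.length() < 4; i++) {
      char c = code(u.charAt(i));
      if (c == '-') {
        continue; // H and W are skipped without resetting 'last'
      }
      if (c != '0' && c != last) {
        out.append(c);
      }
      last = c; // a vowel sets 'last' to '0', so a repeated code after it is kept
    }
    while (out.length() < 4) {
      out.append('0');
    }
    return out.toString();
  }

  // DIFFERENCE(a, b): number of positions (0..4) where the Soundex codes agree.
  static int difference(String a, String b) {
    String x = soundex(a);
    String y = soundex(b);
    int same = 0;
    for (int i = 0; i < Math.min(x.length(), y.length()); i++) {
      if (x.charAt(i) == y.charAt(i)) {
        same++;
      }
    }
    return same;
  }

  public static void main(String[] args) {
    System.out.println(repeat("ab", 3));                // ababab
    System.out.println("[" + space(2) + "]");           // [  ]
    System.out.println(soundex("Ashcraft"));            // A261
    System.out.println(difference("Robert", "Rupert")); // 4
  }
}
```

NULL handling, which the SQL functions need, is left out to keep the sketch short.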




[jira] [Comment Edited] (CALCITE-2979) Add a block-based nested loop join algorithm

2019-05-14 Thread Ruben Quesada Lopez (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839358#comment-16839358
 ] 

Ruben Quesada Lopez edited comment on CALCITE-2979 at 5/14/19 12:25 PM:


[~zabetak], I have a general question regarding the design of this new 
block-based nested loop join algorithm. Let us say that the plan that you just 
wrote in your last comment implements an inner join:
{code}
Join(A.id > B.id; type=INNER)
  Scan(A)
  Scan(B)
==>
NestedLoop(blockSize=3)
  Scan(A)
  ElasticScan(table=B, query="OR(>(cor0[0],B.id), >(cor0[1],B.id), >(cor0[2],B.id))")
{code}

The result of the NestedLoop operator must contain a tupleA concatenated with a 
tupleB that matches it. If we have a block of correlation variables, and we 
translate the join condition as a filter with an OR condition, we would get the 
appropriate tupleB, but how does the NestedLoop know with which tuple(s) from A 
it must be concatenated? We know for sure that the tupleB matches cor0[0], 
cor0[1] or cor0[2], but which one(s) exactly? Maybe I'm missing something, but 
I think this information will not be available from the NestedLoop's perspective.

The only scenario that I can think of where this strategy could work is a 
correlated SEMI (or ANTI) join, because only the left part is returned after 
the first match (or no match at all) with the right part, and we do not care 
which exact tuple from B matched. But in order to implement such a strategy, we 
would need to invert the correlation variable logic, and set it from right to 
left (instead of left to right):
{code}
-- fetch the departments with at least one employee
Join (dept.id=emp.deptId, type=SEMI)
  Scan (dept)
  Scan (emp)
==>
NestedLoop(blockSize=3, $cor0=emp)
  Filter(OR(=(dept.id, cor0[0].deptId), =(dept.id, cor0[1].deptId), =(dept.id, cor0[2].deptId)))
    Scan(dept)
  Scan(emp)
==>
NestedLoop(blockSize=3, $cor0=emp)
  ElasticScan(table=dept, query="OR(=(dept.id, cor0[0].deptId), =(dept.id, cor0[1].deptId), =(dept.id, cor0[2].deptId))")
  Scan(emp)
{code}
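The SEMI-join variant can be sketched the same way in plain Java, under stated assumptions: department and employee rows are reduced to their `id`/`deptId` integers, the correlation variable is bound on the emp side one block at a time, and each probe keeps the departments matching any value in the block. Because the same department can match in several blocks, this sketch deduplicates with a set; the comment above does not specify that detail. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BlockSemiJoinSketch {
  // SEMI join: ids of departments that have at least one employee, with the
  // correlation variable bound on the emp side, one block of deptIds at a time.
  static List<Integer> deptsWithEmployees(
      List<Integer> deptIds, List<Integer> empDeptIds, int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    // A dept can match in several blocks; a semi join must return it only once.
    Set<Integer> emitted = new LinkedHashSet<>();
    for (int start = 0; start < empDeptIds.size(); start += blockSize) {
      List<Integer> block =
          empDeptIds.subList(start, Math.min(start + blockSize, empDeptIds.size()));
      // One probe of dept per block: OR(=(dept.id, cor0[0].deptId), =(dept.id, cor0[1].deptId), ...)
      for (Integer id : deptIds) {
        if (block.contains(id)) {
          emitted.add(id); // which emp matched is irrelevant for a semi join
        }
      }
    }
    return new ArrayList<>(emitted);
  }

  public static void main(String[] args) {
    List<Integer> result = deptsWithEmployees(
        Arrays.asList(10, 20, 30, 40), Arrays.asList(20, 10, 20, 40, 20), 3);
    System.out.println(result); // [10, 20, 40]
  }
}
```

Department 30 has no employees, and department 20 is emitted once even though it matches in both blocks.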


was (Author: rubenql):
[~zabetak], I have a general question regarding the design of this new 
block-based nested loop join algorithm. Let us say that the plan that you just 
wrote in your last comment implements an inner join:
{code}
Join(A.id > B.id; type=INNER)
  Scan(A)
  Scan(B)
==>
NestedLoop(blockSize=3)
  Scan(A)
  ElasticScan(table=B, query="OR(>(cor0[0],B.id), >(cor0[1],B.id), >(cor0[2],B.id))")
{code}

The result of the NestedLoop operator must contain a tupleA concatenated with a 
tupleB that matches it. If we have a block of correlation variables, and we 
translate the join condition as a filter with an OR condition, we would get the 
appropriate tupleB, but how does the NestedLoop know with which tuple(s) from A 
it must be concatenated? We know for sure that the tupleB matches cor0[0], 
cor0[1] or cor0[2], but which one(s) exactly? Maybe I'm missing something, but 
this information will not be available from the NestedLoop's perspective.

The only scenario that I can think of where this strategy could work is a 
correlated SEMI (or ANTI) join, because only the left part is returned after 
the first match (or no match at all) with the right part, and we do not care 
which exact tuple from B matched. But in order to implement such a strategy, we 
would need to invert the correlation variable logic, and set it from right to 
left (instead of left to right):
{code}
-- fetch the depts with at least one emp
Join (dept.id=emp.dptId, type=SEMI)
  Scan (dept)
  Scan (emp)
==>
NestedLoop(blockSize=3, $cor0=emp)
  Filter(OR(=(cor0[0].dptId, dept.id), =(cor0[1].dptId, dept.id), =(cor0[2].dptId, dept.id)))
    Scan(dept)
  Scan(emp)
==>
NestedLoop(blockSize=3, $cor0=emp)
  ElasticScan(table=dept, query="OR(=(cor0[0].dptId, dept.id), =(cor0[1].dptId, dept.id), =(cor0[2].dptId, dept.id))")
  Scan(emp)
{code}

> Add a block-based nested loop join algorithm
> 
>
> Key: CALCITE-2979
> URL: https://issues.apache.org/jira/browse/CALCITE-2979
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Stamatis Zampetakis
>Assignee: Khawla Mouhoubi
>Priority: Major
>  Labels: performance
>
> Currently, Calcite provides a tuple-based nested loop join algorithm 
> implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin. 
> This means that for each tuple of the outer relation we probe (set variables) 
> in the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin 
> method) which first gathers blocks (batches) of tuples from the outer 
> relation and then probes the inner relation once per block.
> There are cases (eg., indexes) where the inner

[jira] [Commented] (CALCITE-2979) Add a block-based nested loop join algorithm

2019-05-14 Thread Ruben Quesada Lopez (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839358#comment-16839358
 ] 

Ruben Quesada Lopez commented on CALCITE-2979:
--

[~zabetak], I have a general question regarding the design of this new 
block-based nested loop join algorithm. Let us say that the plan that you just 
wrote in your last comment implements an inner join:
{code}
Join(A.id > B.id; type=INNER)
  Scan(A)
  Scan(B)
==>
NestedLoop(blockSize=3)
  Scan(A)
  ElasticScan(table=B, query="OR(>(cor0[0],B.id), >(cor0[1],B.id), >(cor0[2],B.id))")
{code}

The result of the NestedLoop operator must contain a tupleA concatenated with a 
tupleB that matches it. If we have a block of correlation variables, and we 
translate the join condition as a filter with an OR condition, we would get the 
appropriate tupleB, but how does the NestedLoop know with which tuple(s) from A 
it must be concatenated? We know for sure that the tupleB matches cor0[0], 
cor0[1] or cor0[2], but which one(s) exactly? Maybe I'm missing something, but 
this information will not be available from the NestedLoop's perspective.

The only scenario that I can think of where this strategy could work is a 
correlated SEMI (or ANTI) join, because only the left part is returned after 
the first match (or no match at all) with the right part, and we do not care 
which exact tuple from B matched. But in order to implement such a strategy, we 
would need to invert the correlation variable logic, and set it from right to 
left (instead of left to right):
{code}
-- fetch the depts with at least one emp
Join (dept.id=emp.dptId, type=SEMI)
  Scan (dept)
  Scan (emp)
==>
NestedLoop(blockSize=3, $cor0=emp)
  Filter(OR(=(cor0[0].dptId, dept.id), =(cor0[1].dptId, dept.id), =(cor0[2].dptId, dept.id)))
    Scan(dept)
  Scan(emp)
==>
NestedLoop(blockSize=3, $cor0=emp)
  ElasticScan(table=dept, query="OR(=(cor0[0].dptId, dept.id), =(cor0[1].dptId, dept.id), =(cor0[2].dptId, dept.id))")
  Scan(emp)
{code}

> Add a block-based nested loop join algorithm
> 
>
> Key: CALCITE-2979
> URL: https://issues.apache.org/jira/browse/CALCITE-2979
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.19.0
>Reporter: Stamatis Zampetakis
>Assignee: Khawla Mouhoubi
>Priority: Major
>  Labels: performance
>
> Currently, Calcite provides a tuple-based nested loop join algorithm 
> implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin. 
> This means that for each tuple of the outer relation we probe (set variables) 
> in the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin 
> method) which first gathers blocks (batches) of tuples from the outer 
> relation and then probes the inner relation once per block.
> There are cases (e.g., indexes) where the inner relation can be accessed with 
> more than one value at once, which can greatly improve performance, in 
> particular when the outer relation is big.




[jira] [Assigned] (CALCITE-3034) CSV test case description does not match it's code logic

2019-05-14 Thread Danny Chan (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chan reassigned CALCITE-3034:
---

 Assignee: Danny Chan
Affects Version/s: 1.19.0
Fix Version/s: 1.20.0

> CSV test case description does not match it's code logic
> 
>
> Key: CALCITE-3034
> URL: https://issues.apache.org/jira/browse/CALCITE-3034
> Project: Calcite
>  Issue Type: Bug
>  Components: csv-adapter
>Affects Versions: 1.19.0
>Reporter: FaxianZhao
>Assignee: Danny Chan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The javadoc of o.a.c.t.CsvTest#testFilterableWhere2 says 'Filter that can 
> be partly handled by CsvFilterableTable.' 
>  Actually, the filter is never handled: o.a.c.a.c.CsvFilterableTable#scan receives 
> the whole WHERE condition as a single RexNode whose SqlKind is AND.
>  So, the code logic returns false directly.
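What "partly handled" could look like is sketched below, using hypothetical mini expression classes rather than Calcite's `RexNode` API, and not necessarily what the eventual fix did: the scan flattens a top-level AND into its conjuncts, handles the `column = literal` ones itself, and hands the rest back to the caller to evaluate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartialFilterSketch {
  // Hypothetical stand-ins for a filter expression tree; NOT Calcite's RexNode classes.
  interface Expr {}

  static final class And implements Expr {
    final List<Expr> operands;
    And(Expr... operands) { this.operands = Arrays.asList(operands); }
  }

  // "column = literal", the only shape this hypothetical table evaluates itself.
  static final class Eq implements Expr {
    final int column;
    final String value;
    Eq(int column, String value) { this.column = column; this.value = value; }
  }

  // Any predicate the table cannot evaluate (e.g. a range or a function call).
  static final class Other implements Expr {}

  // Records each pushable "column = literal" conjunct in filterValues and returns
  // the conjuncts the caller must still evaluate on the rows the table returns.
  static List<Expr> push(Expr filter, String[] filterValues) {
    List<Expr> remaining = new ArrayList<>();
    if (filter instanceof And) {
      for (Expr operand : ((And) filter).operands) {
        remaining.addAll(push(operand, filterValues));
      }
    } else if (filter instanceof Eq) {
      Eq eq = (Eq) filter;
      String previous = filterValues[eq.column];
      if (previous == null || previous.equals(eq.value)) {
        filterValues[eq.column] = eq.value;
      } else {
        remaining.add(filter); // conflicting equality on the same column: caller checks it
      }
    } else {
      remaining.add(filter);
    }
    return remaining;
  }

  public static void main(String[] args) {
    String[] filterValues = new String[3];
    Expr where = new And(new Eq(1, "F"), new Other());
    List<Expr> rest = push(where, filterValues);
    System.out.println(Arrays.toString(filterValues) + ", remaining=" + rest.size());
  }
}
```

Here `main` prints `[null, F, null], remaining=1`: the equality on column 1 is pushed into the scan and the other conjunct is left for the caller. Leaving a conflicting second equality to the caller keeps the combined result correct.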




[jira] [Resolved] (CALCITE-3034) CSV test case description does not match it's code logic

2019-05-14 Thread Danny Chan (JIRA)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chan resolved CALCITE-3034.
-
Resolution: Fixed

> CSV test case description does not match it's code logic
> 
>
> Key: CALCITE-3034
> URL: https://issues.apache.org/jira/browse/CALCITE-3034
> Project: Calcite
>  Issue Type: Bug
>  Components: csv-adapter
>Affects Versions: 1.19.0
>Reporter: FaxianZhao
>Assignee: Danny Chan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The javadoc of o.a.c.t.CsvTest#testFilterableWhere2 says 'Filter that can 
> be partly handled by CsvFilterableTable.' 
>  Actually, the filter is never handled: o.a.c.a.c.CsvFilterableTable#scan receives 
> the whole WHERE condition as a single RexNode whose SqlKind is AND.
>  So, the code logic returns false directly.




[jira] [Commented] (CALCITE-3034) CSV test case description does not match it's code logic

2019-05-14 Thread Danny Chan (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839200#comment-16839200
 ] 

Danny Chan commented on CALCITE-3034:
-

Fixed in 
[b039a3|https://github.com/apache/calcite/commit/b039a36a3d683f0948531b16a9d97ee50615a5eb].
 Thanks for your PR, [~Faxian]!

> CSV test case description does not match it's code logic
> 
>
> Key: CALCITE-3034
> URL: https://issues.apache.org/jira/browse/CALCITE-3034
> Project: Calcite
>  Issue Type: Bug
>  Components: csv-adapter
>Reporter: FaxianZhao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The javadoc of o.a.c.t.CsvTest#testFilterableWhere2 says 'Filter that can 
> be partly handled by CsvFilterableTable.' 
>  Actually, the filter is never handled: o.a.c.a.c.CsvFilterableTable#scan receives 
> the whole WHERE condition as a single RexNode whose SqlKind is AND.
>  So, the code logic returns false directly.




[jira] [Commented] (CALCITE-3052) Error while applying rule MaterializedViewAggregateRule(Project-Aggregate): ArrayIndexOutOfBoundsException

2019-05-14 Thread Anton Haidai (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839178#comment-16839178
 ] 

Anton Haidai commented on CALCITE-3052:
---

[~jcamachorodriguez], thank you for your help!
While the error is still reproducible in my environment, I believe that you 
identified the issue correctly, because the rule now gets through the fixed code 
and fails much further along. Here is the stack trace:
{code}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at 
com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:60)
at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:841)
at 
org.apache.calcite.rel.rules.AbstractMaterializedViewRule$3.visitInputRef(AbstractMaterializedViewRule.java:2448)
at 
org.apache.calcite.rel.rules.AbstractMaterializedViewRule$3.visitInputRef(AbstractMaterializedViewRule.java:2412)
at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
at org.apache.calcite.rex.RexShuttle.apply(RexShuttle.java:277)
at 
org.apache.calcite.rel.rules.AbstractMaterializedViewRule.shuttleReferences(AbstractMaterializedViewRule.java:2473)
at 
org.apache.calcite.rel.rules.AbstractMaterializedViewRule.access$900(AbstractMaterializedViewRule.java:105)
at 
org.apache.calcite.rel.rules.AbstractMaterializedViewRule$MaterializedViewAggregateRule.rewriteView(AbstractMaterializedViewRule.java:1552)
at 
org.apache.calcite.rel.rules.AbstractMaterializedViewRule.perform(AbstractMaterializedViewRule.java:522)
at 
org.apache.calcite.rel.rules.AbstractMaterializedViewRule$MaterializedViewProjectAggregateRule.onMatch(AbstractMaterializedViewRule.java:1777)
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
{code}

Here is the dump of variables involved:
{code}
queryAggregate.getRowType(): RecordType(VARCHAR team, DOUBLE EXPR$0, BIGINT EXPR$1, BIGINT EXPR$2, BIGINT EXPR$3, BIGINT EXPR$4)

queryAggregate.getGroupCount(): 1

queryAggregate.getAggCallList(): 5

result.getRowType(): RecordType(VARCHAR team, BIGINT EXPR$1, BIGINT EXPR$2)

topExprs: "$0" "$2" "$3"

expr: "$3"

rewrittenExpr: $4

viewExprs: {$0=[6], $1=[5], $2=[0], $3=[1], $4=[2], $5=[3], $6=[4]}

rewritingMapping: [size=3, sourceCount=7, targetCount=6, elements=[1:2, 2:3, 6:0]]
{code}


> Error while applying rule MaterializedViewAggregateRule(Project-Aggregate): 
> ArrayIndexOutOfBoundsException
> --
>
> Key: CALCITE-3052
> URL: https://issues.apache.org/jira/browse/CALCITE-3052
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.19.0
>Reporter: Anton Haidai
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Materialized views enabled:*
> # {{select avg(grade), count\(*), max(grade), sum(grade), min(grade), team 
> from students group by team}}
> # {{select avg(grade), count\(*), max(grade), sum(grade), min(grade), team, 
> faculty from students group by faculty, team}},
> *Query:*
> # {{select count\(*), team from students group by team}}
> *Error* (stacktrace is obtained using the current *master* branch: 
> "247c7d4f76"):
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>   at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:60)
>   at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:841)
>   at org.apache.calcite.rel.rules.AbstractMaterializedViewRule$MaterializedViewAggregateRule.rewriteView(AbstractMaterializedViewRule.java:1507)
>   at org.apache.calcite.rel.rules.AbstractMaterializedViewRule.perform(AbstractMaterializedViewRule.java:522)
>   at org.apache.calcite.rel.rules.AbstractMaterializedViewRule$MaterializedViewProjectAggregateRule.onMatch(AbstractMaterializedViewRule.java:1776)
>   at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
>   ... 71 common frames omitted
> {noformat}
> Reproducible only if both materialized views listed are enabled: either one 
> alone can be used with the query without any errors. It looks like the issue 
> occurs when AbstractMaterializedViewRule tries to rewrite one materialized 
> view using the other materialized view.
> Currently, I'm trying to reproduce the issue in "MaterializationTest", 
> without success so far; I'll update the ticket if I find a way to reproduce 
> it in a test.




[jira] [Comment Edited] (CALCITE-3021) ArrayEqualityComparer should use Arrays#deepEquals/deepHashCode instead of Arrays#equals/hashCode

2019-05-14 Thread Ruben Quesada Lopez (JIRA)



[ 
https://issues.apache.org/jira/browse/CALCITE-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829168#comment-16829168
 ] 

Ruben Quesada Lopez edited comment on CALCITE-3021 at 5/14/19 7:37 AM:
---

[~julianhyde], did you get a chance to take a look at the new test in the PR?


was (Author: rubenql):
[~julianhyde], did you get a chance to take a look at the new test in the PR?


> ArrayEqualityComparer should use Arrays#deepEquals/deepHashCode instead of 
> Arrays#equals/hashCode
> -
>
> Key: CALCITE-3021
> URL: https://issues.apache.org/jira/browse/CALCITE-3021
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.19.0
>Reporter: Ruben Quesada Lopez
>Assignee: Ruben Quesada Lopez
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, ArrayEqualityComparer (which is used as comparer for 
> JavaRowFormat.ARRAY) performs the array comparison based on Arrays#equals and 
> Arrays#hashCode (see Functions.java):
> {code:java}
>   private static class ArrayEqualityComparer
>       implements EqualityComparer<Object[]> {
>     public boolean equal(Object[] v1, Object[] v2) {
>       return Arrays.equals(v1, v2);
>     }
>     public int hashCode(Object[] t) {
>       return Arrays.hashCode(t);
>     }
>   }
> {code}
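> This will lead to incorrect comparisons in case of multidimensional arrays, 
> e.g. a row (array) with a struct field (another array) inside, because 
> Arrays#equals compares nested arrays with Object#equals (i.e. by reference) 
> and Arrays#hashCode uses their identity hash codes. To fix the issue, 
> Arrays#deepEquals / Arrays#deepHashCode should be used: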
> {code:java}
>   private static class ArrayEqualityComparer
>       implements EqualityComparer<Object[]> {
>     public boolean equal(Object[] v1, Object[] v2) {
>       return Arrays.deepEquals(v1, v2);
>     }
>     public int hashCode(Object[] t) {
>       return Arrays.deepHashCode(t);
>     }
>   }
> {code}
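A minimal, self-contained sketch of the difference described in the ticket. To keep it compilable without Calcite on the classpath, Calcite's EqualityComparer is replaced by a local stand-in interface with the same two methods used in the snippets; the class names ShallowArrayComparer, DeepArrayComparer and ArrayComparerDemo are hypothetical. Two rows with equal contents, each holding a nested array as a struct field, compare as unequal with Arrays#equals but as equal (with matching hash codes) with Arrays#deepEquals / Arrays#deepHashCode.

```java
import java.util.Arrays;

/** Local stand-in for Calcite's EqualityComparer (assumption: same two methods). */
interface EqualityComparer<T> {
  boolean equal(T v1, T v2);
  int hashCode(T t);
}

/** Behaviour before the fix: nested arrays are compared by reference. */
class ShallowArrayComparer implements EqualityComparer<Object[]> {
  public boolean equal(Object[] v1, Object[] v2) {
    return Arrays.equals(v1, v2);
  }
  public int hashCode(Object[] t) {
    return Arrays.hashCode(t);
  }
}

/** Behaviour proposed in the ticket: nested arrays are compared element by element. */
class DeepArrayComparer implements EqualityComparer<Object[]> {
  public boolean equal(Object[] v1, Object[] v2) {
    return Arrays.deepEquals(v1, v2);
  }
  public int hashCode(Object[] t) {
    return Arrays.deepHashCode(t);
  }
}

class ArrayComparerDemo {
  public static void main(String[] args) {
    // Two rows with equal contents; the second field is a struct stored as a nested array.
    Object[] row1 = {1, new Object[] {"a", 2}};
    Object[] row2 = {1, new Object[] {"a", 2}};
    EqualityComparer<Object[]> shallow = new ShallowArrayComparer();
    EqualityComparer<Object[]> deep = new DeepArrayComparer();
    // false: the nested arrays are distinct objects, so Object#equals fails
    System.out.println(shallow.equal(row1, row2));
    // true: deepEquals recurses into the nested arrays
    System.out.println(deep.equal(row1, row2));
    // true: deepHashCode is consistent with deepEquals
    System.out.println(deep.hashCode(row1) == deep.hashCode(row2));
  }
}
```

Keeping equal and hashCode on the same "deep" semantics matters because hash-based operators (e.g. a hash join or distinct keyed by rows) rely on equal rows producing equal hash codes.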



