[jira] [Resolved] (CALCITE-3881) SqlFunctions#addMonths yields incorrect results in some corner case

2021-04-07 Thread Francis Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Chuang resolved CALCITE-3881.
-
Resolution: Fixed

> SqlFunctions#addMonths yields incorrect results in some corner case
> ---
>
> Key: CALCITE-3881
> URL: https://issues.apache.org/jira/browse/CALCITE-3881
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica, core
>Affects Versions: avatica-1.16.0
>Reporter: Zhenghua Gao
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: avatica-1.18.0, 1.23.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> SqlFunctions#addMonths uses DateTimeUtils#ymdToUnixDate to calculate the 
> JDN (Julian day number), but in some corner cases it yields incorrect results. 
> The root cause is that the algorithm in DateTimeUtils#ymdToUnixDate requires 
> a month in the range 1 to 12 [1], but SqlFunctions#addMonths may pass in a 
> month outside that range.
> BTW: I couldn't find the original paper for the algorithm, only 
> a JDN explanation. Please correct me if anyone can find the original paper.
>  
> The following case reproduces the bug:
> addMonths('2019-09-01', 6) should yield '2020-03-01'
> {code:java}
> @Test public void testAddMonths() { 
>   checkAddMonths(2019, 9, 1, 2020, 3, 1, 6); 
> } {code}
>  
> [1] 
> [http://www.cs.utsa.edu/~cs1063/projects/Spring2011/Project1/jdn-explanation.html]
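
A minimal sketch of the failure mode described above, not the actual Calcite or Avatica code: the class name AddMonthsSketch and both method bodies are illustrative assumptions. It uses the widely published integer JDN formula, which is only valid for a month in 1..12, and normalizes year and month before calling it. Clamping the day to the length of the target month is also an assumption made for this sketch.

```java
import java.time.LocalDate;

final class AddMonthsSketch {
  /** Days since 1970-01-01 via the integer JDN formula; month must be 1..12. */
  static int ymdToUnixDate(int year, int month, int day) {
    int a = (14 - month) / 12;
    int y = year + 4800 - a;
    int m = month + 12 * a - 3;
    int jdn = day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045;
    return jdn - 2440588; // 2440588 is the JDN of 1970-01-01
  }

  /** Adds n months (n may be negative) and returns days since 1970-01-01. */
  static int addMonths(int year, int month, int day, int n) {
    int zeroBased = (month - 1) + n;
    // Carry whole years out of the month so that the result is back in 1..12.
    int y = year + Math.floorDiv(zeroBased, 12);
    int m = Math.floorMod(zeroBased, 12) + 1;
    // Assumption: clamp the day to the last day of the target month.
    int lastDay = LocalDate.of(y, m, 1).lengthOfMonth();
    return ymdToUnixDate(y, m, Math.min(day, lastDay));
  }
}
```

With the normalization in place, the reported case (2019-09-01 plus 6 months) passes month 3 of 2020 to the formula instead of month 15 of 2019.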



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CALCITE-3881) SqlFunctions#addMonths yields incorrect results in some corner case

2021-04-07 Thread Francis Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Chuang updated CALCITE-3881:

Fix Version/s: avatica-1.18.0

> SqlFunctions#addMonths yields incorrect results in some corner case
> ---
>
> Key: CALCITE-3881
> URL: https://issues.apache.org/jira/browse/CALCITE-3881
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica, core
>Affects Versions: avatica-1.16.0
>Reporter: Zhenghua Gao
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.23.0, avatica-1.18.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h





[jira] [Reopened] (CALCITE-3881) SqlFunctions#addMonths yields incorrect results in some corner case

2021-04-07 Thread Francis Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Chuang reopened CALCITE-3881:
-

> SqlFunctions#addMonths yields incorrect results in some corner case
> ---
>
> Key: CALCITE-3881
> URL: https://issues.apache.org/jira/browse/CALCITE-3881
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica, core
>Affects Versions: avatica-1.16.0
>Reporter: Zhenghua Gao
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.23.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h





[jira] [Commented] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer

2021-04-07 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316765#comment-17316765
 ] 

Julian Hyde commented on CALCITE-4568:
--

The code is in 
[alibaba/cost-based-incremental-optimizer|https://github.com/alibaba/cost-based-incremental-optimizer].

I would ask Calcite committers to review the high-level design rather than the 
lines of code. (For example, what capabilities are missing, and would need to 
be added before this is a generally usable mode of operation for Calcite?)

> Tempura: extending Calcite into an incremental query optimizer
> --
>
> Key: CALCITE-4568
> URL: https://issues.apache.org/jira/browse/CALCITE-4568
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Botong Huang
>Priority: Major
>
> As discussed in the email thread, this is an attempt to extend the Calcite 
> optimizer into a general incremental query optimizer, based on our research 
> paper published in VLDB 2021:
> Tempura: a general cost-based optimizer framework for incremental data 
> processing
> To the best of our knowledge, this is the first general cost-based incremental 
> optimizer that can find the best plan across multiple families of incremental 
> computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in 
> the paper) show that the generated best plan is consistently much better 
> than the plans from each individual method alone.
> In general, incremental query planning is central to database view 
> maintenance and stream processing systems, and is being adopted in active 
> databases, resumable query execution, approximate query processing, etc. We 
> hope that this feature can help widen the spectrum of Calcite and 
> attract more use cases and adoption of Calcite.





[jira] [Updated] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer

2021-04-07 Thread Botong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated CALCITE-4568:
--
Summary: Tempura: extending Calcite into an incremental query optimizer  
(was: Tempura: extending Calcite into a incremental query optimizer)

> Tempura: extending Calcite into an incremental query optimizer
> --
>
> Key: CALCITE-4568
> URL: https://issues.apache.org/jira/browse/CALCITE-4568
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Botong Huang
>Priority: Major





[jira] [Created] (CALCITE-4568) Tempura: extending Calcite into a incremental query optimizer

2021-04-07 Thread Botong Huang (Jira)
Botong Huang created CALCITE-4568:
-

 Summary: Tempura: extending Calcite into a incremental query 
optimizer
 Key: CALCITE-4568
 URL: https://issues.apache.org/jira/browse/CALCITE-4568
 Project: Calcite
  Issue Type: New Feature
Reporter: Botong Huang


As discussed in the email thread, this is an attempt to extend the Calcite 
optimizer into a general incremental query optimizer, based on our research 
paper published in VLDB 2021:
Tempura: a general cost-based optimizer framework for incremental data 
processing

To the best of our knowledge, this is the first general cost-based incremental 
optimizer that can find the best plan across multiple families of incremental 
computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in 
the paper) show that the generated best plan is consistently much better than 
the plans from each individual method alone.

In general, incremental query planning is central to database view maintenance 
and stream processing systems, and is being adopted in active databases, 
resumable query execution, approximate query processing, etc. We hope 
that this feature can help widen the spectrum of Calcite and attract more use 
cases and adoption of Calcite.





[jira] [Resolved] (CALCITE-4503) Order of fields in records should follow that of the SQL types

2021-04-07 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved CALCITE-4503.
--
Resolution: Fixed

Fixed in 
[9e37120b1c6512354357f83dce0abb85176fc2c3|https://github.com/apache/calcite-avatica/commit/9e37120b1c6512354357f83dce0abb85176fc2c3].
 Thanks for the PR [~asolimando]!

> Order of fields in records should follow that of the SQL types
> --
>
> Key: CALCITE-4503
> URL: https://issues.apache.org/jira/browse/CALCITE-4503
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Affects Versions: 1.17.0
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> When dealing with records coming from Java classes, Avatica relies on the 
> order of fields coming from {{java.lang.Class#getFields}} instead of using 
> the order defined in the underlying SQL data type:
>  # [org.apache.calcite.avatica.MetaImpl#createGetter(int 
> ordinal)|https://github.com/apache/calcite-avatica/blob/ba20936bb1387793f34ae489760ec0cdbe205e4e/core/src/main/java/org/apache/calcite/avatica/MetaImpl.java#L145]
>  # 
> [org.apache.calcite.avatica.util.RecordIteratorCursor#RecordIteratorCursor(Iterator
>  iterator, Class 
> clazz)|https://github.com/apache/calcite-avatica/blob/ba20936bb1387793f34ae489760ec0cdbe205e4e/core/src/main/java/org/apache/calcite/avatica/util/RecordIteratorCursor.java#L42]
> This behaviour prevents changing the order of the fields, and it is particularly 
> problematic because the order returned by {{#getFields}} is JVM-specific.
>  
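
A minimal sketch of the idea in the description above, not the Avatica fix itself: the class FieldOrderSketch, the helper fieldsInSqlOrder, and the Emp record class are hypothetical names introduced for illustration. It resolves each field by name in the order given by the SQL type's column list, so the result does not depend on the order in which java.lang.Class#getFields happens to return fields.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class FieldOrderSketch {
  /** Hypothetical record class; its declaration order differs from the SQL order. */
  public static final class Emp {
    public String name;
    public int empno;
  }

  /** Looks up public fields one by one, following the SQL column order. */
  static List<Field> fieldsInSqlOrder(Class<?> clazz, List<String> columnNames) {
    List<Field> fields = new ArrayList<>();
    for (String columnName : columnNames) {
      try {
        fields.add(clazz.getField(columnName));
      } catch (NoSuchFieldException e) {
        throw new IllegalArgumentException("No public field named " + columnName, e);
      }
    }
    return fields;
  }
}
```

Getters built from the returned list line up with column ordinals of the SQL type regardless of how the JVM enumerates the class's fields.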





[jira] [Updated] (CALCITE-3956) Unify comparison logic for RelOptCost

2021-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-3956:

Labels: pull-request-available  (was: )

> Unify comparison logic for RelOptCost
> -
>
> Key: CALCITE-3956
> URL: https://issues.apache.org/jira/browse/CALCITE-3956
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, comparisons between RelOptCost objects are based on 3 methods:
> 1. {{boolean isLe(RelOptCost cost)}}
> 2. {{boolean isLt(RelOptCost cost)}}
> 3. {{boolean equals(RelOptCost cost)}}
> The 3 methods used in combination determine the relation between RelOptCost 
> objects. 
> There are some problems with this implementation:
> 1. Some logic is duplicated across the above methods, making it difficult to 
> maintain. 
> 2. To determine the relation between RelOptCost objects, we often need to 
> call more than one comparison method, leading to performance overhead.
> 3. Since the logic is spread across multiple methods, it is easy to end up with 
> contradictory comparison logic, which will surprise users. For example, 
> the following assertion should hold according to common sense:
> {{if a >= b, then we have a > b or a == b}}
> However, with the current implementation of {{VolcanoCost}}, we can easily 
> create instances that violate the above assertion. 
> To solve these problems, we want to make {{RelOptCost}} extend 
> {{Comparable}}, so that the comparison logic is unified in the 
> {{compareTo}} method. 
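
A minimal sketch of the proposal above, under stated assumptions: SketchCost is a hypothetical class, not RelOptCost or VolcanoCost, and the lexicographic ordering over (rows, cpu, io) is chosen only for illustration. The point it shows is that once isLe, isLt, and equals are all derived from a single compareTo, they cannot contradict each other.

```java
import java.util.Objects;

final class SketchCost implements Comparable<SketchCost> {
  final double rows;
  final double cpu;
  final double io;

  SketchCost(double rows, double cpu, double io) {
    this.rows = rows;
    this.cpu = cpu;
    this.io = io;
  }

  /** Single source of truth: lexicographic order (an assumption of this sketch). */
  @Override public int compareTo(SketchCost other) {
    int c = Double.compare(rows, other.rows);
    if (c != 0) {
      return c;
    }
    c = Double.compare(cpu, other.cpu);
    return c != 0 ? c : Double.compare(io, other.io);
  }

  boolean isLe(SketchCost other) {
    return compareTo(other) <= 0;
  }

  boolean isLt(SketchCost other) {
    return compareTo(other) < 0;
  }

  @Override public boolean equals(Object o) {
    return o instanceof SketchCost && compareTo((SketchCost) o) == 0;
  }

  @Override public int hashCode() {
    return Objects.hash(rows, cpu, io);
  }
}
```

Because all three predicates read the same integer, "a <= b implies a < b or a equals b" holds by construction.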





[jira] [Created] (CALCITE-4567) Revise modification of CursorFactory in LocalService#toResponse

2021-04-07 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created CALCITE-4567:


 Summary: Revise modification of CursorFactory in 
LocalService#toResponse
 Key: CALCITE-4567
 URL: https://issues.apache.org/jira/browse/CALCITE-4567
 Project: Calcite
  Issue Type: Improvement
  Components: avatica
Reporter: Stamatis Zampetakis


The {{LocalService#toResponse}} method is responsible for transforming a 
{{MetaResultSet}} to {{ResultSetResponse}} and in the process of doing this it 
may also change the {{CursorFactory}} that was passed in {{MetaResultSet}}.

The reasons for changing the {{CursorFactory}} inside this method are not 
obvious and appear a bit arbitrary at the moment. It seems that changing the 
{{CursorFactory}} in this method is tightly connected to Calcite (as a client) 
and may not work for other clients.

From a high-level perspective, a {{CursorFactory}} is necessary to be able to 
generate the appropriate Cursor. In principle, the client that creates the 
{{MetaResultSet}} is supposed to know what cursor it needs, so I assume 
that it should pass the correct {{CursorFactory}}.

The goal of this issue is to revise the respective code in 
{{LocalService#toResponse}} to make the intentions clear. Changes here are 
potentially breaking and may also require changes in Calcite and other clients.





[jira] [Commented] (CALCITE-4522) CPU cost of Sort should be lower if sort keys are empty

2021-04-07 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316148#comment-17316148
 ] 

Ruben Q L commented on CALCITE-4522:


I think this change introduces a regression in EnumerableLimitSort cost 
computation, specifically in the rowCount part (issue detected by a test suite 
in a downstream project).

EnumerableLimitSort cost formula used to be:
{code:java}
planner.getCostFactory().makeCost(inputRowCount, cpu, 0); // cpu is nLogM * bytesPerRow
{code}
After this change, the first parameter (rowCount) of this formula, in the case of a 
Sort with fetch (e.g. an EnumerableLimitSort), is no longer inputRowCount 
but {{readCount = Math.min(inCount, offsetValue + fetchValue)}} (which in 
practice would in most cases just be {{offsetValue + fetchValue}}):
{code:java}
planner.getCostFactory().makeCost(offsetValue + fetchValue, cpu, 0); // cpu is nLogM * bytesPerRow
{code}
In my understanding this is wrong: a Sort operator, even with fetch (such 
as EnumerableLimitSort), still needs to read and process inputRowCount 
rows (even though it only needs to keep offsetValue + fetchValue rows sorted). 
I therefore think the new formula underestimates the cost of Sort with fetch, and 
its first parameter should still be inputRowCount in all cases. Should I create a 
ticket to address this issue?
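
To make the size of the difference concrete, here is a small worked sketch. The class LimitSortRowCountSketch and the sample numbers are hypothetical; only the expression Math.min(inCount, offsetValue + fetchValue) comes from the comment above.

```java
final class LimitSortRowCountSketch {
  /** First makeCost argument after the change, as quoted in the comment above. */
  static double readCount(double inCount, double offsetValue, double fetchValue) {
    return Math.min(inCount, offsetValue + fetchValue);
  }

  public static void main(String[] args) {
    double inCount = 1_000_000d; // hypothetical input size
    double offsetValue = 0d;
    double fetchValue = 10d;
    // Before the change, the row-count component was the full input size.
    System.out.println("rowCount before: " + inCount);
    // After the change, it shrinks to offset + fetch whenever that is smaller.
    System.out.println("rowCount after:  " + readCount(inCount, offsetValue, fetchValue));
  }
}
```

For a large input with a small fetch, the row-count component drops by several orders of magnitude even though every input row is still read.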

> CPU cost of Sort should be lower if sort keys are empty
> ---
>
> Key: CALCITE-4522
> URL: https://issues.apache.org/jira/browse/CALCITE-4522
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: hqx
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.27.0
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> The old method of computing the cost of a sort has some problems.
>  # When the RelCollation is empty, there is no need to sort, but the CPU 
> cost of a sort is still computed.
>  # Using n * log\(n) * row_byte to estimate the CPU cost may be inaccurate, 
> where n is the output row count of the sort operator and row_byte is 
> the average number of bytes in one row.
> Instead, I suggest the following.
>  # The CPU cost is zero if the RelCollation is empty.
>  # Let heap_size be min(offset + fetch, input_count), and use input_count * 
> max(1, log(heap_size)) * row_byte to compute the CPU cost.
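
The two suggestions above can be sketched as a single function. This is an illustration under assumptions, not the code that was merged: the class SortCostSketch, the method name cpuCost, and its parameter list are hypothetical, and the natural logarithm is used because the description does not state a base.

```java
final class SortCostSketch {
  /**
   * CPU cost per the suggestion: zero for an empty collation, otherwise
   * input_count * max(1, log(heap_size)) * row_byte.
   */
  static double cpuCost(boolean collationEmpty, double inputCount,
      double offset, double fetch, double rowBytes) {
    if (collationEmpty) {
      return 0d; // nothing to sort
    }
    // Only offset + fetch rows need to be kept in sorted order at any time.
    double heapSize = Math.min(offset + fetch, inputCount);
    return inputCount * Math.max(1d, Math.log(heapSize)) * rowBytes;
  }
}
```

The max(1, ...) term keeps the cost at least linear in the input size, so a fetch of a single row still pays for reading every input row.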


