[jira] [Created] (CALCITE-4445) Create new conformance level for Postgresql

2020-12-23 Thread Jira
Ondřej Štumpf created CALCITE-4445:
--

 Summary: Create new conformance level for Postgresql
 Key: CALCITE-4445
 URL: https://issues.apache.org/jira/browse/CALCITE-4445
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Ondřej Štumpf
Assignee: Ondřej Štumpf


In CALCITE-4443, a new PostgreSQL-specific operator {{ILIKE}} was introduced.
Currently that operator is enabled only in the {{LENIENT}} and {{BABEL}}
conformance levels. Let's:
* create a new conformance level {{POSTGRESQL}}
** make sure it supports {{ILIKE}}
* use this new conformance level in the PostgreSQL and Vertica dialects
* update the tests for {{ILIKE}} to set the {{POSTGRESQL}} conformance level
instead of {{LENIENT}}
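
The proposed behaviour can be illustrated with a small, language-neutral sketch. It is written in Python rather than Java and does not use Calcite's real API: the names Conformance, ILIKE_LEVELS, allows_ilike and DIALECT_CONFORMANCE are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Conformance(Enum):
    """Hypothetical stand-in for a set of SQL conformance levels."""
    DEFAULT = "DEFAULT"
    LENIENT = "LENIENT"
    BABEL = "BABEL"
    POSTGRESQL = "POSTGRESQL"  # the level this issue proposes to add


# Levels in which ILIKE is accepted: LENIENT and BABEL (per the issue text),
# plus the proposed POSTGRESQL level.
ILIKE_LEVELS = frozenset(
    {Conformance.LENIENT, Conformance.BABEL, Conformance.POSTGRESQL}
)

# Hypothetical dialect-to-conformance mapping after the change.
DIALECT_CONFORMANCE = {
    "postgresql": Conformance.POSTGRESQL,
    "vertica": Conformance.POSTGRESQL,
}


def allows_ilike(level: Conformance) -> bool:
    """Return True if the given conformance level enables ILIKE."""
    return level in ILIKE_LEVELS
```

With this shape, a test for {{ILIKE}} would select the POSTGRESQL level directly instead of relying on the much broader LENIENT level.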



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New Calcite PMC chair: Haisheng Yuan

2020-12-23 Thread Michael Mior
Congratulations and thank you to both Stamatis and Haisheng for your
contributions!
--
Michael Mior
mm...@apache.org

On Mon, Dec 21, 2020 at 17:50, Haisheng Yuan wrote:
>
> Thanks everyone.
> It is my great honor to be appointed to serve as the community's PMC chair,
> and thank you, Stamatis, for the hard work and contributions you have made to
> the Calcite community.
>
> Regards,
> Haisheng Yuan
>
> On 2020/12/18 09:07:07, Ruben Q L wrote:
> > Congratulations Haisheng!
> > Thanks for your work Stamatis!
> >
> >
> > On Fri, Dec 18, 2020 at 8:17 AM Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > Thanks Stamatis for your hard work and dedication, and congratulations to
> > > Haisheng for this appointment!
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > On Fri, 18 Dec 2020 at 07:31, Xin Wang wrote:
> > >
> > > > Congrats Haisheng! Thanks for your work, Stamatis!
> > > >
> > > >
> > > > On Fri, Dec 18, 2020 at 12:08 PM, Fan Liya wrote:
> > > >
> > > > > Congratulations, Haisheng!
> > > > > Looking forward to your great work in the coming year!
> > > > >
> > > > > Stamatis, thanks for your great work in the past year!
> > > > >
> > > > > Best,
> > > > > Liya Fan
> > > > >
> > > > >
> > > > > On Fri, Dec 18, 2020 at 11:11 AM Feng Zhu wrote:
> > > > >
> > > > > > Thanks for your work and effort, Stamatis!
> > > > > > Congratulations, Haisheng!
> > > > > >
> > > > > > On Thu, Dec 17, 2020 at 9:49 PM, Stamatis Zampetakis wrote:
> > > > > >
> > > > > > > Calcite community members,
> > > > > > >
> > > > > > > I am pleased to announce that we have a new PMC chair and VP as per our
> > > > > > > tradition of rotating the chair once a year. I have resigned, and
> > > > > > > Haisheng was duly elected by the PMC and approved unanimously by the
> > > > > > > Board.
> > > > > > >
> > > > > > > Please join me in congratulating Haisheng!
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Xin
> > > >
> > >
> >


Proposal to extend Calcite into a incremental query optimizer

2020-12-23 Thread Botong Huang
Hi all,

This is a proposal to extend the Calcite optimizer into a general
incremental query optimizer, based on our research paper published in VLDB
2021:
Tempura: a general cost-based optimizer framework for incremental data
processing

We also have a demo at SIGMOD 2020 illustrating how Alibaba’s data
warehouse is planning to use this incremental query optimizer to alleviate
cluster-wide resource skew:
Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing

To the best of our knowledge, this is the first general cost-based incremental
optimizer that can find the best plan across multiple families of
incremental computing methods, including IVM, Streaming, DBToaster, etc.
Experiments (in the paper) show that the generated best plan is
consistently much better than the plans from each individual method alone.

In general, incremental query planning is central to database view
maintenance and stream processing systems, and is being adopted in active
databases, resumable query execution, approximate query processing, etc. We
hope that this feature can help widen the scope of Calcite and attract
more use cases and adoption.

Below is a brief description of the technical details. Please refer to the
Tempura paper for more details. We are also working on a journal version of
the paper with more implementation details.

Currently the query plan generated by Calcite is meant to be executed
all at once. In the proposal, Calcite’s memo will be extended with
temporal information so that it is capable of generating incremental plans
that include multiple sub-plans to execute at different time points.

The main idea is to view each table as one that changes over time, i.e. as a
Time-Varying Relation (TVR). To achieve that, we introduced TvrMetaSet into
Calcite’s memo, alongside RelSet and RelSubset, to track the related RelSets of
a changing table (e.g. a snapshot of the table at a certain time, the delta of
the table between two time points, etc.).
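
To make the snapshot and delta notions concrete, here is a minimal Python sketch. It is not the TvrMetaSet implementation and not Calcite code; the class TimeVaryingRelation and the helper apply_delta are hypothetical names, and a table is modelled simply as a multiset of rows with timestamped inserts (+1) and deletes (-1).

```python
from collections import Counter
from typing import List, Tuple

# One change record: (time, row, multiplicity change); +1 insert, -1 delete.
Change = Tuple[int, str, int]


def _drop_zeros(counts: Counter) -> Counter:
    """Remove rows whose net multiplicity is zero."""
    return Counter({row: m for row, m in counts.items() if m != 0})


class TimeVaryingRelation:
    """Hypothetical model of a table that changes over time."""

    def __init__(self, changes: List[Change]):
        self.changes = sorted(changes)

    def delta(self, t1: float, t2: float) -> Counter:
        """Net change in row multiplicities over the interval (t1, t2]."""
        net = Counter()
        for t, row, m in self.changes:
            if t1 < t <= t2:
                net[row] += m
        return _drop_zeros(net)

    def snapshot(self, t: float) -> Counter:
        """Contents of the table at time t."""
        return self.delta(float("-inf"), t)


def apply_delta(snapshot: Counter, delta: Counter) -> Counter:
    """Combine an earlier snapshot with a delta to get a later snapshot."""
    out = Counter(snapshot)
    for row, m in delta.items():
        out[row] += m
    return _drop_zeros(out)
```

In this model, the snapshot at a later time equals the snapshot at an earlier time combined with the delta between the two times, which is the kind of relationship between RelSets that a TVR is meant to track.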

[image: image.png]

For example, in the figure above, each vertical line is a TvrMetaSet
representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
represent time. Each black dot in the grid is a RelSet. Users can write TVR
rewrite rules to describe valid transformations between these dots. For
example, the blue lines are inter-TVR rules that describe how to compute
a certain RelSet of a TVR from RelSets of other TVRs. The red lines are
intra-TVR rules that describe transformations within a TVR. All TVR rewrite
rules are logical rules. All existing Calcite rules still work in the new
Volcano system without modification.
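
As an illustration of what an inter-TVR rule expresses, the sketch below computes the delta of a join from the snapshots and deltas of its inputs. It uses the textbook incremental view maintenance identity for an inner join (not the left outer join shown in the figure), and it is not taken from the Tempura rule set; the function names join, plus and join_delta are hypothetical. Rows are (key, value) pairs stored in a multiset.

```python
from collections import Counter


def join(left: Counter, right: Counter) -> Counter:
    """Inner equi-join on the first field of each row; multiplicities multiply."""
    out = Counter()
    for (lk, lv), lm in left.items():
        for (rk, rv), rm in right.items():
            if lk == rk:
                out[(lk, lv, rv)] += lm * rm
    return out


def plus(a: Counter, b: Counter) -> Counter:
    """Multiset sum that keeps negative counts and drops zero counts."""
    out = Counter(a)
    for row, m in b.items():
        out[row] += m
    return Counter({row: m for row, m in out.items() if m != 0})


def join_delta(s_old: Counter, ds: Counter, r_old: Counter, dr: Counter) -> Counter:
    """delta(S join R) = dS join R_old + S_old join dR + dS join dR."""
    return plus(plus(join(ds, r_old), join(s_old, dr)), join(ds, dr))
```

Applying join_delta to the old join result gives the same rows as recomputing the join on the new snapshots of S and R, which is why an optimizer can choose between the two strategies on cost.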

The changes for this feature consist of four parts:
1. Memo extension with TvrMetaSet
2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
well as the links between them.
3. A basic set of TvrRules, written using the upgraded rule engine API.
4. Multi-query optimization, used to find the best incremental plan
involving multiple time points.

Note that this feature is purely an extension; when disabled, it
does not change any existing Calcite behavior.

Beyond the scenarios in the paper, we also applied this Calcite-extended
incremental query optimizer to a type of periodic query called the “range
query” in Alibaba’s data warehouse. It achieved cost savings of 80% on
total CPU and memory consumption, and 60% on end-to-end execution time.

All comments and suggestions are welcome. Thanks and happy holidays!

Best,
Botong


[jira] [Created] (CALCITE-4446) Implement three-valued logic for SEARCH operator

2020-12-23 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-4446:


 Summary: Implement three-valued logic for SEARCH operator 
 Key: CALCITE-4446
 URL: https://issues.apache.org/jira/browse/CALCITE-4446
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Implement three-valued logic for SEARCH operator.

Consider the expression {{x IN (10, 20)}}, which we might represent as 
{{SEARCH(x, SARG(10, 20))}}. Suppose we invoke this with a value of {{NULL}} 
for {{x}}. Do we want it to return UNKNOWN, FALSE or TRUE? The answer is: all 
of the above.

Here are the 3 variants:
* {{SEARCH(10, 20, UNKNOWN AS TRUE)}}: {{x IS NULL OR x IN (10, 20)}} → TRUE
* {{SEARCH(10, 20, UNKNOWN AS UNKNOWN)}}: {{x IN (10, 20)}} → UNKNOWN
* {{SEARCH(10, 20, UNKNOWN AS FALSE)}}: {{x IS NOT NULL AND (x IN (10, 20))}} → FALSE

Currently {{class Sarg}} has a field {{boolean containsNull}} which deals with 
the first two cases. Changing {{boolean containsNull}} to {{RexUnknownAs 
unknownAs}} (which has 3 values) will allow us to represent the third. The new 
representation is symmetrical under negation, which de Morgan's law suggests is 
a good thing.
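
The three variants can be sketched as follows. This is a Python illustration and not Calcite code: UnknownAs is a hypothetical stand-in for the three-valued {{RexUnknownAs}}, None models SQL NULL/UNKNOWN, and the helpers search, not_3vl and negate are names made up for this example.

```python
from enum import Enum
from typing import Iterable, Optional


class UnknownAs(Enum):
    """What a NULL argument should evaluate to."""
    TRUE = "TRUE"
    UNKNOWN = "UNKNOWN"
    FALSE = "FALSE"


def search(x: Optional[int], values: Iterable[int],
           unknown_as: UnknownAs) -> Optional[bool]:
    """Evaluate `x IN values`; None stands for SQL NULL / UNKNOWN."""
    if x is None:
        if unknown_as is UnknownAs.TRUE:
            return True
        if unknown_as is UnknownAs.FALSE:
            return False
        return None
    return x in set(values)


def not_3vl(v: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: UNKNOWN stays UNKNOWN."""
    return None if v is None else not v


def negate(unknown_as: UnknownAs) -> UnknownAs:
    """Swap TRUE and FALSE; UNKNOWN is its own negation."""
    if unknown_as is UnknownAs.TRUE:
        return UnknownAs.FALSE
    if unknown_as is UnknownAs.FALSE:
        return UnknownAs.TRUE
    return UnknownAs.UNKNOWN
```

For a NULL input, negating the result of a search under one policy gives the result under the negated policy, which is the symmetry a two-valued {{containsNull}} flag cannot express.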





Re: Proposal to extend Calcite into a incremental query optimizer

2020-12-23 Thread JiaTao Tao
This seems interesting. The picture cannot be seen in the mail; could you open
a JIRA for this, so that people who are interested can subscribe to it?


Regards!

Aron Tao




Re: Proposal to extend Calcite into a incremental query optimizer

2020-12-23 Thread Botong Huang
Thanks Aron for pointing this out. To see the figure, please refer to Fig
3(a) in our paper: https://kai-zeng.github.io/papers/tempura-vldb2021.pdf

Best,
Botong

On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao wrote:

> Seems interesting, the pic can not be seen in the mail, may you open a JIRA
> for this, people who are interested in this can subscribe to the JIRA?
>
>
> Regards!
>
> Aron Tao
>