Re: Proposal to extend Calcite into an incremental query optimizer

2021-01-11 Thread Rui Wang
Congratulations on your work! And thanks for considering contributing it to
Calcite!

I took a first pass through your paper and have a question:

For the concrete implementations of the merge operator and of the delta
queries for joins and aggregations, the paper seems to adopt algorithms and
ideas proposed in prior work. Do you have a list of sources for those
algorithms? I assume those implementations can be customized, but some
common ones could be offered within Calcite. If so, a list would help
people understand the implementation details. (Or maybe I should just read
the relevant references in the paper :-))



-Rui

On Fri, Jan 1, 2021 at 11:05 AM Botong Huang wrote:

> Hi Julian,
>
> Thanks for your interest! Sure, let's figure out a plan that best benefits
> the community. Here are some clarifications that hopefully answer your
> questions.
>
> In our work (Tempura), users specify the set of time points at which the
> query may run and a cost function that expresses their preference over
> time. Tempura then generates the incremental plan that minimizes the
> overall cost function.
>
> In this incremental plan, the sub-plans at different time points can
> differ from each other, as opposed to the identical plans used in all delta
> runs in streaming or IVM. As mentioned in §2.1 of the Tempura paper, we can
> mimic the current streaming implementation by specifying two (logical) time
> points in Tempura, representing the initial run and the later delta runs
> respectively. In general, note that Tempura supports various forms of
> incremental computing, not only the small-delta, append-only data model of
> streaming systems. That's why we believe Tempura subsumes the current
> streaming support, as well as any IVM implementation.
>
> About the cost model, we did not come up with a separate cost model, but
> rather extended the existing one. Similar to multi-objective optimization,
> costs incurred at different time points are considered different
> dimensions. Tempura lets users supply a function that converts this cost
> vector into a final cost. So under this function, any two incremental plans
> are still comparable and there is an overall optimum. I guess we can go
> down the route of multi-objective parametric query optimization instead if
> there is a need.
>
> Next on materialized views and multi-query optimization, since our
> multi-time-point plan naturally involves materializing intermediate results
> for later time points, we need to solve the problem of choosing
> materializations and include the cost of saving and reusing the
> materializations when costing and comparing plans. We borrowed the
> multi-query optimization techniques to solve this problem even though we
> are looking at a single query. As a result, we think our work is orthogonal
> to Calcite's facilities for utilizing existing views, lattices, etc. We do
> feel that the multi-query optimization component can be adapted for wider
> use, but we would probably need more suggestions from the community.
>
> Lastly, our current implementation is set up in Java code; it should be
> straightforward to hook it up to a SQL shell.
>
> Thanks,
> Botong
>
> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde wrote:
>
> > Botong,
> >
> > This is very exciting; congratulations on this research, and thank you for
> > contributing it back to Calcite.
> >
> > The research touches several areas in Calcite: streaming, materialized
> > view maintenance, and multi-query optimization. As we already have some
> > solutions in those areas (Sigma and Delta relational operators, lattice,
> > and Spool operator), it will be interesting to see whether we can make them
> > compatible, or whether one concept can subsume others.
> >
> > Your work differs from streaming queries in that your relations are used
> > by “external” user queries, whereas in pure streaming queries, the only
> > activity is the change propagation. Did you find that you needed two
> > separate cost models - one for “view maintenance” and another for “user
> > queries” - since the objectives of each activity are so different?
> >
> > I wonder whether this work will hasten the arrival of multi-objective
> > parametric query optimization [1] in Calcite.
> >
> > I will make time over the next few days to read and digest your paper.
> > Then I expect that we will have a back-and-forth process to create
> > something that will be useful for the broader community.
> >
> > One thing will be particularly useful: making this functionality available
> > from a SQL shell, so that people can experiment with this functionality
> > without writing Java code or setting up complex databases and metadata. I
> > have in mind something like the simple DDL operations that are available in
> > Calcite’s ‘server’ module. I wonder whether we could devise some kind of
> > SQL syntax for a “multi-query”.
> >
> > Julian
> >
> > [1]
> >
> 
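
The cost model described in Botong's message above (one cost per time point,
folded into a single number by a user-supplied function) can be illustrated
with a small, self-contained Java sketch. Everything in it is hypothetical:
the class TimePointCostSketch, the weightedSum and cheapest helpers, and the
example numbers are not taken from Tempura or Calcite, and a weighted sum is
only one possible choice of user function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Toy model: one cost per time point, folded into a scalar by a user-supplied function. */
final class TimePointCostSketch {

  /** Hypothetical user function: weights[i] says how much cost at time point i matters. */
  static ToDoubleFunction<double[]> weightedSum(double... weights) {
    return costs -> {
      if (costs.length != weights.length) {
        throw new IllegalArgumentException("one weight per time point is required");
      }
      double total = 0;
      for (int i = 0; i < costs.length; i++) {
        total += weights[i] * costs[i];
      }
      return total;
    };
  }

  /** Returns the index of the plan with the smallest scalar cost, or -1 if there are no plans. */
  static int cheapest(List<double[]> planCosts, ToDoubleFunction<double[]> costFn) {
    int best = -1;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int i = 0; i < planCosts.size(); i++) {
      double c = costFn.applyAsDouble(planCosts.get(i));
      if (c < bestCost) {
        best = i;
        bestCost = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Cost vectors of two made-up incremental plans over two time points.
    double[] eager = {100, 10}; // does most of the work at the first time point
    double[] lazy = {20, 60};   // defers most of the work to the last time point
    List<double[]> plans = Arrays.asList(eager, lazy);

    // Early resources are cheap to the user, late ones are expensive.
    System.out.println(cheapest(plans, weightedSum(0.2, 1.0)));
    // Both time points matter equally.
    System.out.println(cheapest(plans, weightedSum(1.0, 1.0)));
  }
}
```

Because the user function maps every cost vector to a single number, any two
plans remain comparable; changing the weights changes which plan wins, which
is the sense in which the preference over time is left to the user.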

Re: [DISCUSS] Apache Calcite Online Meetup January 2021

2021-01-11 Thread Stamatis Zampetakis
That would be great Vladimir, I will update the agenda.

If possible please provide a title, duration, and abstract.

Best,
Stamatis


Re: Incorrect SQL generation from RelToSqlConverter

2021-01-11 Thread Julian Hyde
It looks like a bug. Please log a JIRA case.

I don't think that Correlate has been extensively tested in RelToSql.
However, there is one test that is similar to yours (not identical)
[1] and it passes.

Julian

[1] https://github.com/apache/calcite/blob/174a707e1c199c97d7cc3531f0cd2e94745f4366/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L4842



Incorrect SQL generation from RelToSqlConverter

2021-01-11 Thread Bali
Hi,

Thanks in advance for looking into this. I have been experimenting with a
proof of concept for my firm using Apache Calcite. I ran into this issue
while creating a simple correlated query. My understanding may be
wrong here. The following code constructs the correlated query:

final FrameworkConfig config = RelBuilderTest.config().build();
final RelBuilder builder = RelBuilder.create(config);
final Holder<RexCorrelVariable> v = Holder.of(null);

// EMP is the left input; v captures the correlation variable that refers to it.
builder.scan("EMP")
    .variable(v)
    // Right input: DEPT rows whose DEPTNO matches the current EMP row.
    .scan("DEPT")
    .filter(
        builder.equals(builder.field("DEPTNO"),
            builder.field(v.get(), "DEPTNO")))
    .project(builder.field("DEPT", "DNAME"), builder.field("DEPT", "LOC"))
    .correlate(
        JoinRelType.LEFT, v.get().id,
        builder.field(2, 0, "DEPTNO"));
RelDataType dataType = builder.peek().getRowType();
RelNode relNode = builder.project(builder.field("ENAME"),
        builder.field("DNAME"))
    .build();
System.out.println(RelOptUtil.toString(relNode));

// Convert the plan back to SQL using the ANSI dialect.
final RelToSqlConverter converter =
    new RelToSqlConverter(AnsiSqlDialect.DEFAULT);
final SqlNode sqlNode = converter.visitRoot(relNode).asStatement();
final String sql = sqlNode.toSqlString(AnsiSqlDialect.DEFAULT).getSql();
System.out.println(sql);

RowType after correlation step is:
RecordType(SMALLINT EMPNO, VARCHAR(10) ENAME, VARCHAR(9) JOB, SMALLINT MGR,
DATE HIREDATE, DECIMAL(7, 2) SAL, DECIMAL(7, 2) COMM, TINYINT DEPTNO,
VARCHAR(14) DNAME, VARCHAR(13) LOC)

Projecting "ENAME" and "DNAME" should work fine. Relational Tree for the
query looks like this:
LogicalProject(EMPNO=[$0], DNAME=[$8])
  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{7}])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalProject(DNAME=[$1], LOC=[$2])
      LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
        LogicalTableScan(table=[[scott, DEPT]])

while generated SQL from the query looks like this:
SELECT `$cor0`.`EMPNO`, `$cor0`.`DNAME`
FROM `scott`.`EMP` AS `$cor0`,
LATERAL (SELECT `DNAME`, `LOC`
FROM `scott`.`DEPT`
WHERE `DEPTNO` = `$cor0`.`DEPTNO`) AS `t0`

If I run this SQL directly through JDBC, it fails with the
error "Column 'DNAME' not found in table '$cor0'". Is there an
error in RelToSqlConverter, or am I missing something?

I would greatly appreciate any help.

Thanks,
~Bali
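
To see why qualifying DNAME with `$cor0` looks wrong, it helps to map an
output ordinal back to the input that owns it. The row type quoted above
lists eight EMP fields followed by DNAME and LOC, so ordinal 8 falls in the
right-hand (lateral) input. The Java sketch below is a toy illustration
only: FieldOwnerSketch and qualify are hypothetical names, and this is not
how RelToSqlConverter is implemented.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of how an output ordinal of a join-like node maps back to one of its inputs. */
final class FieldOwnerSketch {

  /**
   * Returns "alias.column" for an ordinal in the concatenated row
   * (all left fields first, then all right fields).
   */
  static String qualify(int ordinal,
      String leftAlias, List<String> leftFields,
      String rightAlias, List<String> rightFields) {
    if (ordinal < 0 || ordinal >= leftFields.size() + rightFields.size()) {
      throw new IndexOutOfBoundsException("no field at ordinal " + ordinal);
    }
    if (ordinal < leftFields.size()) {
      return leftAlias + "." + leftFields.get(ordinal);
    }
    // Ordinals past the left input are offsets into the right input.
    return rightAlias + "." + rightFields.get(ordinal - leftFields.size());
  }

  public static void main(String[] args) {
    // Field names copied from the row type in the message above.
    List<String> emp = Arrays.asList(
        "EMPNO", "ENAME", "JOB", "MGR", "HIREDATE", "SAL", "COMM", "DEPTNO");
    List<String> dept = Arrays.asList("DNAME", "LOC");

    System.out.println(qualify(0, "$cor0", emp, "t0", dept)); // $cor0.EMPNO
    System.out.println(qualify(8, "$cor0", emp, "t0", dept)); // t0.DNAME
  }
}
```

Under these assumptions, ordinal 8 resolves to the alias of the lateral
subquery (`t0` in the generated SQL) rather than to `$cor0`, which matches
the JDBC error reported above.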


Re: [DISCUSS] Apache Calcite Online Meetup January 2021

2021-01-11 Thread Vladimir Ozerov
Hi,

I can share our experience integrating Apache Calcite into the Hazelcast
distributed SQL engine.

Regards,
Vladimir

On Tue, Jan 5, 2021 at 00:48, Vineet G wrote:

> Hi Stamatis,
>
> Something has come up and unfortunately I will not be able to present the
> talk.
>
> Vineet
>
> > On Jan 3, 2021, at 1:37 PM, Stamatis Zampetakis wrote:
> >
> > I updated the agenda on meetup to include Julian's talk around spatial
> > queries.
> >
> > So far we have four presentations lasting approximately 1h45 so I still
> > find the duration reasonable.
> >
> > Of course, if more people are interested in presenting something we can
> > schedule another meetup in April as Julian suggested.
> > I am always happy to see what other people are working on and more Calcite
> > use-cases.
> >
> > Best,
> > Stamatis
> >
> > On Sun, Jan 3, 2021 at 2:09 AM Julian Hyde wrote:
> >
> >> In other news I’ll be co-presenting (with Mosha Pasumansky) a talk
> >> “Open source SQL - beyond parsers: ZetaSQL and Apache Calcite” at the
> >> Northwest Database Society Annual Meeting on January 29th. It’s virtual and
> >> free, but you must sign up to attend.
> >>
> >> Julian
> >>
> >> [1] https://sites.google.com/view/nwds2021
> >>
> >>> On Jan 2, 2021, at 12:47 PM, Julian Hyde wrote:
> >>>
> >>> I can give a talk “Implementing spatial queries using algebra
> >>> rewrites”, 20 minutes.
> >>>
> >>> But if that makes the meetup too long, I am equally happy to postpone
> >>> the talk. How about scheduling another meetup in, say, April?
> >>>
> >>> Julian
> >>>
> >>>> On Dec 31, 2020, at 3:10 AM, Stamatis Zampetakis wrote:
> >>>>
> >>>> I just published the event on Meetup [1].
> >>>>
> >>>> The agenda is not yet finalized so if there are people who would like to
> >>>> give a talk or add/remove things from the agenda please reply to this
> >>>> thread.
> >>>>
> >>>> Best,
> >>>> Stamatis
> >>>>
> >>>> [1] https://www.meetup.com/Apache-Calcite/events/275461117/
> >>>>
> >> On Mon, Nov 30, 2020 at 12:37 AM Rui Wang 
> >> wrote:
> >
> > Title: event timestamp semantic based streaming SQL
> > Abstract: This talk will cover, for the Calcite streaming SQL case, how to
> > reason about data completeness in terms of event timestamp semantics and
> > how to control materialization latency given unbounded input data (on the
> > Calcite roadmap but not implemented yet).
> >
> > Duration: 20~30 mins
> >
> >> On Tue, Nov 24, 2020 at 8:56 AM Slim Bouguerra 
> >> wrote:
> >>
> >> This is a great idea, thanks @Stamatis. Looking forward to learning more
> >> about Calcite, especially the streaming work.
> >>
> >>> On Mon, Nov 23, 2020 at 2:19 PM Rui Wang 
> >> wrote:
> >>
> >>> Sorry for the late reply, Stamatis. I have recently been pretty busy
> >>> with work as the end of the year approaches.
> >>>
> >>> The time in [1] works perfectly for me. I will share the abstract and
> >>> expected duration soon (it should be within this week).
> >>>
> >>>
> >>> -Rui
> >>>
> >>> On Fri, Nov 20, 2020 at 2:11 AM Stamatis Zampetakis <
> >> zabe...@gmail.com
> >>
> >>> wrote:
> >>>
>  That would be great Vineet!
> 
>  @Julian, @Rui, @Vineet:
>  Can you share a small abstract (2-3 sentences) and expected duration?
>  Can you check whether the date/times proposed previously [1] work for you?
>  If not, feel free to propose another slot.
> 
>  Best,
>  Stamatis
> 
>  [1] https://s.apache.org/uhrzo
> 
>  On Thu, Nov 19, 2020 at 6:18 PM Vineet Garg 
> > wrote:
> 
> > I think this is a great idea. +1 for the online meetup.
> >
> > If there are slots left I can also talk about how Hive leverages Calcite to
> > do query optimization.
> >
> > -Vineet
> >
> > On Fri, Nov 6, 2020 at 7:21 AM Stamatis Zampetakis <
> >> zabe...@gmail.com>
> > wrote:
> >
> >> Let's try to fix the date/time and tentative agenda so that we can add some
> >> information on meetup [1].
> >>
> >> So far we have three presenters: Julian, Rui, and myself. We can start like
> >> that, and if in the process more people are interested in giving a small
> >> talk we can update the program.
> >>
> >> Let's try to get a date in the last two weeks of January to give us a bit
> >> more time to prepare. Personally, I don't have a preference for that being
> >> a business day or not, and I am in UTC+1.
> >> For instance, how do you feel about Wednesday, 20 January 2021, 18:00:00 to
> >> 21:00 UTC+1 [2]?
> 

[jira] [Created] (CALCITE-4461) Do not cast to logical node inside Enumerable rules

2021-01-11 Thread Vladimir Ozerov (Jira)
Vladimir Ozerov created CALCITE-4461:


             Summary: Do not cast to logical node inside Enumerable rules
                 Key: CALCITE-4461
                 URL: https://issues.apache.org/jira/browse/CALCITE-4461
             Project: Calcite
          Issue Type: Task
          Components: core
    Affects Versions: 1.26.0
            Reporter: Vladimir Ozerov
            Assignee: Vladimir Ozerov
             Fix For: 1.27.0


Currently, some `Enumerable` rules work with the base operator classes, such as 
`Join`, while others cast to `Logical` counterparts, such as `LogicalJoin`, 
`LogicalProject`, etc. 

This makes it impossible to convert custom non-logical nodes into `Enumerable` 
using the built-in rules.

The proposal is to change all existing rules so that they work with the base 
`RelNode` classes.
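
The difference between the two styles of rule can be sketched with toy
classes. None of the types below are Calcite classes: ToyJoin,
ToyLogicalJoin, ToyCustomJoin, ToyEnumerableJoin and the two convert methods
are hypothetical stand-ins for a base operator, its logical subclass, a
user-defined subclass, the enumerable result, and the body of a converter
rule.

```java
/** Toy converter "rules"; none of these types are Calcite classes. */
final class ConverterRuleSketch {

  /** Casts to the logical subclass, so any other ToyJoin subclass fails. */
  static ToyEnumerableJoin convertViaCast(ToyJoin rel) {
    ToyLogicalJoin join = (ToyLogicalJoin) rel; // ClassCastException for ToyCustomJoin
    return new ToyEnumerableJoin(join.condition);
  }

  /** Uses only what the base class exposes, so every ToyJoin subclass works. */
  static ToyEnumerableJoin convertViaBase(ToyJoin rel) {
    return new ToyEnumerableJoin(rel.condition);
  }

  public static void main(String[] args) {
    ToyJoin custom = new ToyCustomJoin("a = b");

    // The base-class version converts the custom node without complaint.
    System.out.println(convertViaBase(custom).condition);

    // The casting version rejects the same node.
    try {
      convertViaCast(custom);
    } catch (ClassCastException e) {
      System.out.println("cast-based rule rejected " + custom.getClass().getSimpleName());
    }
  }
}

/** Stand-in for a base operator class. */
abstract class ToyJoin {
  final String condition;

  ToyJoin(String condition) {
    this.condition = condition;
  }
}

/** Stand-in for the built-in logical operator. */
final class ToyLogicalJoin extends ToyJoin {
  ToyLogicalJoin(String condition) {
    super(condition);
  }
}

/** Stand-in for a user-defined, non-logical operator. */
final class ToyCustomJoin extends ToyJoin {
  ToyCustomJoin(String condition) {
    super(condition);
  }
}

/** Stand-in for the enumerable (physical) operator produced by the rule. */
final class ToyEnumerableJoin extends ToyJoin {
  ToyEnumerableJoin(String condition) {
    super(condition);
  }
}
```

In this toy setup the only change needed to support the custom node is to
stop narrowing the type inside the rule, which is the kind of change the
issue proposes for the built-in rules.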





Re: Contributor rights

2021-01-11 Thread Vladimir Ozerov
Thank you very much, Francis!

On Sun, Jan 10, 2021 at 00:45, Francis Chuang wrote:

> Hey Vladimir,
>
> I've added you to the contributor role in jira.
>
> Francis
>
> On 9/01/2021 8:47 pm, Vladimir Ozerov wrote:
> > Hi,
> >
> > Could you please grant me contributor rights in Calcite JIRA? My username
> > is "vozerov".
> >
> > Thank you.
> > Vladimir.
> >
>