Re: [DISCUSS] The state of the project - 2017

Julian Hyde Sat, 11 Nov 2017 23:09:07 -0800

I agree. Let’s make it actionable: create a JIRA case, and to complete the task 
we should add the list of components and their owners to one of the web pages.


The list of components should line up with the components in JIRA. I 
don’t think it’s very important how we slice up the project into 
components; components do not always correspond to a particular Java package 
or piece of code, but more often to an area of functionality.

Julian


> On Nov 9, 2017, at 6:30 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> 
> Michael,
> 
> I think the ownership thinking is a really good idea. Things like trait
> behaviors, volcano, types, hep, decorrelation, parsing, sql-to-rel,
> materialized views are all good chunks that could be owned by someone (in
> addition to Avatica and each of the connectors).
> 
> On Wed, Nov 8, 2017 at 2:50 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
> 
>> Interesting thoughts about the paper you pointed to, Julian. I believe I
>> read it some time ago, but I'll have to dust it off and think about it in
>> the context of Calcite. All your other thoughts also sound like exciting
>> directions for Calcite.
>> 
>> I hope we can all find ways to take some of the burden off your shoulders.
>> While I am happy to serve as PMC chair, I'm still working on familiarizing
>> myself with the code base to the point where I can more quickly review PRs
>> with some level of confidence. (For the time being, I'm also not actively
>> using Calcite.) I wonder if others would be willing to step up to "own"
>> parts of the code base (e.g. as Josh does in many ways with Avatica). I
>> think if we could have the majority of components on JIRA assigned by
>> default to someone other than you, that might be a start. Of course,
>> practically speaking, so much is contained within core that this might have
>> only a marginal impact. We could also consider (on JIRA only) creating some
>> additional components to further partition things.
>> 
>> When I was thinking about CI, I forgot that you have your own build suite
>> running for the project, which is much appreciated :) But I'm sure we would
>> both agree that it would be nice if this extra testing wasn't resting
>> solely on you. I'll start a separate thread when I have time to start
>> hacking on CI-related things to get some more input.
>> 
>> --
>> Michael Mior
>> mm...@apache.org
>> 
>> 2017-11-08 16:34 GMT-05:00 Julian Hyde <jh...@apache.org>:
>> 
>>> Thanks for starting this discussion, Jesus. Here are some thoughts, in
>>> no particular order.
>>> 
>>> I too have noticed the increase in academic adoption. This is
>>> excellent. Shall we add a section to the "Powered by" page [1] on
>>> academic projects and papers?
>>> 
>>> I worry a lot about audience (or audiences). Who is using Calcite? Are
>>> we giving them what they need? Data engines (such as Drill, Hive and
>>> Flink) are one category, and I think they are fairly well served.
>>> Academics are another audience; some are succeeding, but I wonder
>>> whether it would be easier for them if we had some relevant examples,
>>> such as how to parse a query and optimize it using several different
>>> cost models and combinations of rules. What other audiences are there?
>>> 
>>> There is an audience who would like to use Calcite as a standalone
>>> engine; and folks who would like to incorporate materialized views,
>>> indexes and constraints into their engine but prefer to speak SQL
>>> rather than Java APIs. Those groups are not well served today. I am
>>> working on a server which has DDL support [2][3]; it would provide a
>>> (simple) standalone engine, but also allow us to demo materialized
>>> views, virtual columns, check constraints and foreign tables/schemas
>>> via SQL so that people building engines can more easily grasp the
>>> concepts.
>>> 
>>> I read Trummer & Koch's paper "Multi-objective parametric query
>>> optimization" [4] in CACM recently. It is a very exciting advance, and
>>> too much to cover in this thread, but it got me thinking about how
>>> Calcite could evolve to incorporate their ideas. I realize that giving
>>> RelOptCost multiple fields was a mistake, unless we also add the
>>> mechanics (piecewise-linear cost functions and polytopes) to handle
>>> them. The vast majority of Calcite remains applicable, so this would
>>> be evolutionary: Calcite's rules and algebra emerge intact in the new
>>> order, and Calcite's metadata framework can model the new cost
>>> functions. Extending Calcite could raise some interesting research
>>> topics. Is it possible to extend the parameter space (either the
>>> number of parameters or the value range of those parameters) after
>>> initial planning? Can we use parameters to model whether
>>> intermediate results are materialized (see [5]) or whether ephemeral
>>> materialized views happen to be present in cache? What new statistics
>>> do we need to gather to power the new cost functions? There is enough
>>> here to interest several researchers.
>>> 
>>> As for features:
>>> * I would like to get to full compliance with OpenGIS, because spatial
>>> support is much more straightforward in Calcite's algebraic approach
>>> than in engines which need to build a new data structure.
>>> * I also would like to give users a choice of engines in Calcite:
>>> Spark and perhaps something based on Arrow, in addition to the
>>> existing Enumerable engine.
>>> * I would like to continue to make the planner more modular, so that
>>> people can supply a program (a collection of rules organized into
>>> planning phases) and basically just say "go".
>>> * And I plan to continue my work to make data systems learn and adapt,
>>> creating and populating materialized views based on observed query
>>> patterns and data statistics.
>>> 
>>> Regarding governance: I think we are functioning well as a
>>> meritocratic community. High-quality contributions arrive from people
>>> who have never contributed before; this is happening more and more
>>> frequently, which is really excellent. On the other hand, this
>>> increases the load for reviewing (and proactively fixing)
>>> contributions, and too much of that work still falls on my shoulders.
>>> There are times when I get close to burnout, especially when people
>>> explicitly direct questions and pull requests at me.
>>> 
>>> I think Michael would be an excellent PMC chair. I am delighted that
>>> he is prepared to do the job.
>>> 
>>> Regarding CI: there is a bit more CI going on than meets the eye; I
>>> run several tests nightly on my home server, and also on a Windows VM,
>>> and speak up if things get broken. But I admit there has been bit-rot
>>> in some of the adapters, and having a public CI for those adapters
>>> would be useful, if we can do so without generating too much noise.
>>> 
>>> Julian
>>> 
>>> [1] https://calcite.apache.org/docs/powered_by.html
>>> 
>>> [2] https://issues.apache.org/jira/browse/CALCITE-707
>>> 
>>> [3] https://issues.apache.org/jira/browse/CALCITE-1991
>>> 
>>> [4] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract
>>> 
>>> [5] https://issues.apache.org/jira/browse/CALCITE-481
>>> 
>>> On Tue, Nov 7, 2017 at 9:19 AM, Josh Elser <els...@apache.org> wrote:
>>>> On 11/6/17 12:00 PM, Jesus Camacho Rodriguez wrote:
>>>>> 
>>>>> I am not involved in the Avatica effort, but it has been great to see
>>>>> Avatica continue maturing, moving into its own repository and following
>>>>> its own release cadence. Josh, Julian, if you want to add a few lines
>>>>> about the state of Avatica, that would be great.
>>>> 
>>>> 
>>>> Would be happy to :)
>>>> 
>>>> I've certainly been spending less time on core functionality. Avatica has
>>>> definitely passed the cusp for what most developers need. The majority of
>>>> users would find Avatica to be fully featured as a JDBC interface (but
>>>> there are some gaps that still exist).
>>>> 
>>>> We've started to see a focus on non-JDBC drivers for Avatica, which is a
>>>> great sign. Our Francis has been making progress on trying to adopt the
>>>> driver written in Go into the Apache codebase. There are a few other
>>>> drivers available as well. The presence of these drivers, and their
>>>> ability to continue to function, is good validation of the
>>>> protocol/stability model that we outlined/implemented in the past 1-2
>>>> years.
>>>> 
>>>> Avatica is still fairly low-volume, with only a few people contributing.
>>>> I'd love to see more people take an interest (it's a great stepping
>>>> stone into Calcite too ;P).
>>> 
>>
