RE: [PROPOSAL] Optiq
Good section. I agree with what it says, and I hope we can eventually help each other out with, e.g., a library of adaptors.

-Original Message-
From: Julian Hyde [mailto:julianh...@gmail.com]
Sent: 8 May 2014 20:03
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Optiq

The "Relationships with Other Apache Products" section has been updated to cover Optiq's functional overlaps with existing Apache projects.

https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products

Julian

On May 2, 2014, at 11:23 AM, Henry Saputra wrote:

> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
>
> Thanks,
>
> - Henry
>
> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra
> wrote:
>> Hi Ashutosh,
>>
>> Since there was a question/comment about relationship with Apache
>> MetaModel, I am asking to update the proposal to include this
>> discussion in either "Relationships with Other Apache Products" or
>> "Alignment" section before going for a VOTE.
>>
>> Apache Slider did the same thing with relation to Apache Twill and
>> Apache Helix projects.
>>
>> Thanks,
>>
>> - Henry
>>
>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan
>> wrote:
>>> I would like to propose Optiq as an Apache Incubator project. I
>>> have posted the proposal to
>>> https://wiki.apache.org/incubator/OptiqProposal and posted the text of the
>>> proposal below.
>>>
>>> Ashutosh.
>>>
>>> = Optiq =
>>> == Abstract ==
>>>
>>> Optiq is a framework that allows efficient translation of queries
>>> involving heterogeneous and federated data.
>>>
>>> == Proposal ==
>>>
>>> Optiq is a highly customizable engine for parsing and planning
>>> queries on data in a wide variety of formats. It allows
>>> database-like access, and in particular a SQL interface and advanced
>>> query optimization, for data not residing in a traditional database.
>>>
>>> == Background ==
>>>
>>> Databases were traditionally engineered in a monolithic stack,
>>> providing a data storage format, data processing algorithms, query
>>> parser, query planner, built-in functions, metadata repository and
>>> connectivity layer.
>>> They innovate in some areas but rarely in all.
>>>
>>> Modern data management systems are decomposing that stack into
>>> separate components, separating data, processing engine, metadata,
>>> and query language support. They are highly heterogeneous, with data
>>> in multiple locations and formats, caching and redundant data,
>>> different workloads, and processing occurring in different engines.
>>>
>>> Query planning (sometimes called query optimization) has always been
>>> a key function of a DBMS, because it allows the implementors to
>>> introduce new query-processing algorithms, and allows data
>>> administrators to re-organize the data without affecting
>>> applications built on that data. In a componentized system, the
>>> query planner integrates the components (data formats, engines,
>>> algorithms) without introducing unnecessary coupling or performance
>>> tradeoffs.
>>>
>>> But building a query planner is hard; many systems muddle along
>>> without a planner, and indeed a SQL interface, until the demand from
>>> their customers is overwhelming.
>>>
>>> There is an opportunity to make this process more efficient by
>>> creating a re-usable framework.
>>>
>>> == Rationale ==
>>>
>>> Optiq allows database-like access, and in particular a SQL interface
>>> and advanced query optimization, for data not residing in a
>>> traditional database. It is complementary to many current Hadoop and
>>> NoSQL systems, which have innovative and performant storage and
>>> runtime systems but lack a SQL interface and intelligent query translation.
>>>
>>> Optiq is already in use by several projects, including Apache Drill,
>>> Apache Hive and Cascading Lingual, and commercial products.
>>> >>> Optiq's architecture consists of: >>> >>> An extensible relational algebra. >>> SPIs (service-provider interfaces) for metadata (schemas and >>> tables), planner rules, statistics, cost-estimates, user-defined functions. >>> Built-in sets of rules for logical transformations a
Re: [PROPOSAL] Optiq
Yes, thanks for updating the proposal. Really appreciate it.

- Henry

On Thu, May 8, 2014 at 11:03 AM, Julian Hyde wrote:
> The “Relationships with Other Apache Products” section has been updated to
> cover Optiq’s functional overlaps with existing Apache projects.
>
> https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products
>
> Julian
>
> On May 2, 2014, at 11:23 AM, Henry Saputra wrote:
>
>> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra
>> wrote:
>>> Hi Ashutosh,
>>>
>>> Since there was a question/comment about relationship with Apache
>>> MetaModel, I am asking to update the proposal to include this
>>> discussion in either "Relationships with Other Apache Products" or
>>> "Alignment" section before going for a VOTE.
>>>
>>> Apache Slider did the same thing with relation to Apache Twill and
>>> Apache Helix projects.
>>>
>>> Thanks,
>>>
>>> - Henry
>>>
>>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan
>>> wrote:

I would like to propose Optiq as an Apache Incubator project. I have posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal below.

Ashutosh.

= Optiq =
== Abstract ==

Optiq is a framework that allows efficient translation of queries involving heterogeneous and federated data.

== Proposal ==

Optiq is a highly customizable engine for parsing and planning queries on data in a wide variety of formats. It allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database.

== Background ==

Databases were traditionally engineered in a monolithic stack, providing a data storage format, data processing algorithms, query parser, query planner, built-in functions, metadata repository and connectivity layer. They innovate in some areas but rarely in all.

Modern data management systems are decomposing that stack into separate components, separating data, processing engine, metadata, and query language support. They are highly heterogeneous, with data in multiple locations and formats, caching and redundant data, different workloads, and processing occurring in different engines.

Query planning (sometimes called query optimization) has always been a key function of a DBMS, because it allows the implementors to introduce new query-processing algorithms, and allows data administrators to re-organize the data without affecting applications built on that data. In a componentized system, the query planner integrates the components (data formats, engines, algorithms) without introducing unnecessary coupling or performance tradeoffs.

But building a query planner is hard; many systems muddle along without a planner, and indeed a SQL interface, until the demand from their customers is overwhelming.

There is an opportunity to make this process more efficient by creating a re-usable framework.

== Rationale ==

Optiq allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database. It is complementary to many current Hadoop and NoSQL systems, which have innovative and performant storage and runtime systems but lack a SQL interface and intelligent query translation.

Optiq is already in use by several projects, including Apache Drill, Apache Hive and Cascading Lingual, and commercial products.

Optiq's architecture consists of:

An extensible relational algebra.
SPIs (service-provider interfaces) for metadata (schemas and tables), planner rules, statistics, cost-estimates, user-defined functions.
Built-in sets of rules for logical transformations and common data-sources.
Two query planning engines driven by rules, statistics, etc. One engine is cost-based, the other rule-based.
Optional SQL parser, validator and translator to relational algebra.
Optional JDBC driver.

== Initial Goals ==

The initial goals are to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines.

As we move the code into the org.apache namespace, we will restructure components as necessary to allow clients to use just the components of Optiq that they need.

A version 1.0 release, including pre-built binaries, will foster wider adoption.

== Current Status ==

Optiq has had over a dozen minor releases over the last 18 months. Its core SQL par
Re: [PROPOSAL] Optiq
The “Relationships with Other Apache Products” section has been updated to cover Optiq’s functional overlaps with existing Apache projects. https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products Julian On May 2, 2014, at 11:23 AM, Henry Saputra wrote: > Ah sorry, I did not mean "asking to update", I meant "proposing to update". > > Thanks, > > - Henry > > On Fri, May 2, 2014 at 11:20 AM, Henry Saputra > wrote: >> HI Ashutosh, >> >> Since there was a question/ comment about relationship with Apache >> MetaModel, I am asking to update the proposal to include this >> discussion in either "Relationships with Other Apache Products" or >> "Alignment" section before going for a VOTE. >> >> Apache Slider did the same thing with relation to Apache Twill and >> Apache Helix projects. >> >> Thanks, >> >> - Henry >> >> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan >> wrote: >>> I would like to propose Optiq as an Apache Incubator project. I have >>> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and >>> posted the text of the proposal below. >>> >>> Ashutosh. >>> >>> = Optiq = >>> == Abstract == >>> >>> Optiq is a framework that allows efficient translation of queries involving >>> heterogeneous and federated data. >>> >>> == Proposal == >>> >>> Optiq is a highly customizable engine for parsing and planning queries on >>> data in a wide variety of formats. It allows database-like access, and in >>> particular a SQL interface and advanced query optimization, for data not >>> residing in a traditional database. >>> >>> == Background == >>> >>> Databases were traditionally engineered in a monolithic stack, providing a >>> data storage format, data processing algorithms, query parser, query >>> planner, built-in functions, metadata repository and connectivity layer. >>> They innovate in some areas but rarely in all. 
>>>
>>> Modern data management systems are decomposing that stack into separate
>>> components, separating data, processing engine, metadata, and query
>>> language support. They are highly heterogeneous, with data in multiple
>>> locations and formats, caching and redundant data, different workloads, and
>>> processing occurring in different engines.
>>>
>>> Query planning (sometimes called query optimization) has always been a key
>>> function of a DBMS, because it allows the implementors to introduce new
>>> query-processing algorithms, and allows data administrators to re-organize
>>> the data without affecting applications built on that data. In a
>>> componentized system, the query planner integrates the components (data
>>> formats, engines, algorithms) without introducing unnecessary coupling or
>>> performance tradeoffs.
>>>
>>> But building a query planner is hard; many systems muddle along without a
>>> planner, and indeed a SQL interface, until the demand from their customers
>>> is overwhelming.
>>>
>>> There is an opportunity to make this process more efficient by creating a
>>> re-usable framework.
>>>
>>> == Rationale ==
>>>
>>> Optiq allows database-like access, and in particular a SQL interface and
>>> advanced query optimization, for data not residing in a traditional
>>> database. It is complementary to many current Hadoop and NoSQL systems,
>>> which have innovative and performant storage and runtime systems but lack a
>>> SQL interface and intelligent query translation.
>>>
>>> Optiq is already in use by several projects, including Apache Drill, Apache
>>> Hive and Cascading Lingual, and commercial products.
>>>
>>> Optiq's architecture consists of:
>>>
>>> An extensible relational algebra.
>>> SPIs (service-provider interfaces) for metadata (schemas and tables),
>>> planner rules, statistics, cost-estimates, user-defined functions.
>>> Built-in sets of rules for logical transformations and common data-sources.
>>> Two query planning engines driven by rules, statistics, etc. One engine is
>>> cost-based, the other rule-based.
>>> Optional SQL parser, validator and translator to relational algebra.
>>> Optional JDBC driver.
>>> == Initial Goals ==
>>>
>>> The initial goals are to move the existing codebase to Apache and
>>> integrate with the Apache development process. Once this is accomplished,
>>> we plan for incremental development and releases that follow the Apache
>>> guidelines.
>>>
>>> As we move the code into the org.apache namespace, we will restructure
>>> components as necessary to allow clients to use just the components of
>>> Optiq that they need.
>>>
>>> A version 1.0 release, including pre-built binaries, will foster wider
>>> adoption.
>>>
>>> == Current Status ==
>>>
>>> Optiq has had over a dozen minor releases over the last 18 months. Its core
>>> SQL parser and validator, and its planning engine and core rules, are
>>> mature and robust and are the basis for several production systems; but
>>> other components and SPIs are still undergoing rapid evolution.
>>>
>>>
Re: [PROPOSAL] Optiq
Now that the discussion is settling down, I will start a vote thread shortly.

On Mon, May 5, 2014 at 3:22 PM, Ashutosh Chauhan wrote:
> Thanks everyone for great feedback. With Julian's help I have updated the
> section "Relationships with Other Apache projects" so that folks can get a
> sense where Optiq stands w.r.t other projects going on at ASF.
>
> Thanks,
> Ashutosh
>
>
> On Fri, May 2, 2014 at 11:23 AM, Henry Saputra wrote:
>
>> Ah sorry, I did not mean "asking to update", I meant "proposing to
>> update".
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra
>> wrote:
>> > Hi Ashutosh,
>> >
>> > Since there was a question/comment about relationship with Apache
>> > MetaModel, I am asking to update the proposal to include this
>> > discussion in either "Relationships with Other Apache Products" or
>> > "Alignment" section before going for a VOTE.
>> >
>> > Apache Slider did the same thing with relation to Apache Twill and
>> > Apache Helix projects.
>> >
>> > Thanks,
>> >
>> > - Henry
>> >
>> > On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan wrote:
>> >> I would like to propose Optiq as an Apache Incubator project. I have
>> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>> >> posted the text of the proposal below.
>> >>
>> >> Ashutosh.
>> >>
>> >> = Optiq =
>> >> == Abstract ==
>> >>
>> >> Optiq is a framework that allows efficient translation of queries involving
>> >> heterogeneous and federated data.
>> >>
>> >> == Proposal ==
>> >>
>> >> Optiq is a highly customizable engine for parsing and planning queries on
>> >> data in a wide variety of formats. It allows database-like access, and in
>> >> particular a SQL interface and advanced query optimization, for data not
>> >> residing in a traditional database.
>> >>
>> >> == Background ==
>> >>
>> >> Databases were traditionally engineered in a monolithic stack, providing a
>> >> data storage format, data processing algorithms, query parser, query
>> >> planner, built-in functions, metadata repository and connectivity layer.
>> >> They innovate in some areas but rarely in all.
>> >>
>> >> Modern data management systems are decomposing that stack into separate
>> >> components, separating data, processing engine, metadata, and query
>> >> language support. They are highly heterogeneous, with data in multiple
>> >> locations and formats, caching and redundant data, different workloads, and
>> >> processing occurring in different engines.
>> >>
>> >> Query planning (sometimes called query optimization) has always been a key
>> >> function of a DBMS, because it allows the implementors to introduce new
>> >> query-processing algorithms, and allows data administrators to re-organize
>> >> the data without affecting applications built on that data. In a
>> >> componentized system, the query planner integrates the components (data
>> >> formats, engines, algorithms) without introducing unnecessary coupling or
>> >> performance tradeoffs.
>> >>
>> >> But building a query planner is hard; many systems muddle along without a
>> >> planner, and indeed a SQL interface, until the demand from their customers
>> >> is overwhelming.
>> >>
>> >> There is an opportunity to make this process more efficient by creating a
>> >> re-usable framework.
>> >>
>> >> == Rationale ==
>> >>
>> >> Optiq allows database-like access, and in particular a SQL interface and
>> >> advanced query optimization, for data not residing in a traditional
>> >> database. It is complementary to many current Hadoop and NoSQL systems,
>> >> which have innovative and performant storage and runtime systems but lack a
>> >> SQL interface and intelligent query translation.
>> >>
>> >> Optiq is already in use by several projects, including Apache Drill, Apache
>> >> Hive and Cascading Lingual, and commercial products.
>> >>
>> >> Optiq's architecture consists of:
>> >>
>> >> An extensible relational algebra.
>> >> SPIs (service-provider interfaces) for metadata (schemas and tables),
>> >> planner rules, statistics, cost-estimates, user-defined functions.
>> >> Built-in sets of rules for logical transformations and common data-sources.
>> >> Two query planning engines driven by rules, statistics, etc. One engine is
>> >> cost-based, the other rule-based.
>> >> Optional SQL parser, validator and translator to relational algebra.
>> >> Optional JDBC driver.
>> >> == Initial Goals ==
>> >>
>> >> The initial goals are to move the existing codebase to Apache and
>> >> integrate with the Apache development process. Once this is accomplished,
>> >> we plan for incremental development and releases that follow the Apache
>> >> guidelines.
>> >>
>> >> As we move the code into the org.apache namespace, we will restructure
>> >> components as necessary to allow clients to use just the components of
>> >> Optiq that they need.
>> >>
>> >> A version 1.0 release, including pre-built binaries,
Re: [PROPOSAL] Optiq
On Fri, May 2, 2014 at 11:18 AM, Andrew Purtell wrote:
> All that I suggest is that candidate Apache projects articulate how they
> differ from related projects, and that we consider the strength of this
> argument when evaluating the long term viability of the effort and
> community. It would be good if proposals have a "related work" section done
> with the diligence and detail as the typical academic publication, I
> haven't seen that at least recently.

Thanks Andrew for articulating it even more clearly -- this is exactly the extra bit of info I was suggesting we add to the template. IOW, an explicit informational section may help bring clarity not only for the casual IPMC members, but also for the folks proposing a new project in the first place.

Thanks,
Roman.
Re: [PROPOSAL] Optiq
I realize this is a discussion about Optiq in particular, so please pardon the detour. I won't continue this discussion in this thread further.

On the subject of Optiq, I'd be +1 for incubation considering that at least two Apache projects incorporate it substantially, and a third is considering it. It would benefit them and Optiq, I should hope.

Getting back to the question of admitting projects with a high degree of overlap, or even a fork: in my opinion, this should be considered more carefully and with a less liberal attitude than I've seen. It would be unfortunate for incubation to serve as a tool for end runs against well-functioning Apache communities, where the differences are commercial externalities, not technical matters or personal issues between individuals. It is quite important to determine what the substantive technical differences are. Hand waving shouldn't be sufficient. Hypothetically, maybe the core difference is not technical; instead, the initial committer list is stacked with individuals from a single organization, and the proposal is for an as yet undeveloped codebase. Or it is otherwise rooted in the control freakery of a third party. The Foundation can become a tool for competition against healthy projects that has nothing to do with code or abstractions or personal differences. I think this betrays the Apache Way. Maybe I'm in an ethical minority.

On Fri, May 2, 2014 at 11:18 AM, Andrew Purtell wrote:
> All that I suggest is that candidate Apache projects articulate how they
> differ from related projects, and that we consider the strength of this
> argument when evaluating the long term viability of the effort and
> community. It would be good if proposals have a "related work" section done
> with the diligence and detail as the typical academic publication, I
> haven't seen that at least recently.
> > Differences in project direction leading to new projects (effectively, > sanctioned forks) is fine, although regrettable, since that would represent > an acknowledged failure of the Apache community process. "Creative > competition" between differing abstractions is fine. Etc. But if I come to > Apache to set up Apache Foo, with presumably the focus and care on > community development a motivating factor for that (otherwise why shouldn't > I just go to GitHub?), then if later the Incubator admits Apache FooBar > (incubating) and Apache FooBaz (incubating) that significantly overlap and > duplicate my efforts - overriding my concerns or objections - then I'd be > inclined to not view Apache as a particularly good steward of my community > development. The devil is in the details, which takes me back to the point > made in the above paragraph. > > > > > > > > On Fri, May 2, 2014 at 10:52 AM, Chris Douglas wrote: > >> On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell >> wrote: >> > If not part of the initial proposal, then >> > at least making a good case as a criteria for graduation, and writing up >> > related work and how the new project differentiates could be an initial >> > task done on JIRA after acceptance along the lines of the trademark >> search. >> >> I see this differently. Project overlap (particularly in the >> incubator) is neither surprising nor regrettable. Recently we've seen >> several SQL, streaming, and security projects. While these are all >> mature domains, the "best practices" are still being explored. Each >> branch in architecture may accommodate a new project, and each path >> through those tradeoffs will define those communities. They'll also >> define each other; by way of illustration, a project that's a subset >> of another becomes the "lightweight" implementation. If the enthusiasm >> for a project wanes, that's not a tragedy the incubator can prevent by >> forcing alignment based on the goal of the project. 
Rejecting a >> community will not cause them to join an existing one; they'll just >> leave Apache. >> >> More than losing an opportunity to foster a community, a policy >> favoring consolidation would actively harm innovation and >> experimentation. A requirement for uniqueness would reward first >> movers and leave no outlet for legitimate differences in project >> direction. Granting existing projects authority over prospective >> communities _because_ they compete is not an optimization. As we saw >> with HCatalog, sometimes revolutions don't become distinct communities >> and the effort is reabsorbed. The incubator should continue to support >> that natural process. >> >> Finally, it's not surprising that the incubator will see projects with >> similar goals in waves. The need for new abstractions is experienced >> jointly and solutions are explored concurrently. That's a feature of >> the incubator, not a bug. >> >> Articulating the project's "related work" is a useful exercise, which >> is why it's a section in the proposal. -C >> >> > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra > >wrote: >> > >> >> Unfortunately, similar projects entering A
Re: [PROPOSAL] Optiq
All fair points. However (as your example demonstrates), referring to this duplication as "failure" instead of evolution biases the incubator to protect existing projects. Putting new projects on the defensive is almost always unfair, unless they're literally forking an existing project. As you say, the details are more important than the general point, but the default lamentation over duplication is, in my view, misguided.

More concretely, the proposal is required to fill out a "related work" section. We don't need new processes, particularly if that section is fleshed out in threads like this one. -C

On Fri, May 2, 2014 at 11:18 AM, Andrew Purtell wrote:
> All that I suggest is that candidate Apache projects articulate how they
> differ from related projects, and that we consider the strength of this
> argument when evaluating the long term viability of the effort and
> community. It would be good if proposals have a "related work" section done
> with the diligence and detail as the typical academic publication, I
> haven't seen that at least recently.
>
> Differences in project direction leading to new projects (effectively,
> sanctioned forks) is fine, although regrettable, since that would represent
> an acknowledged failure of the Apache community process. "Creative
> competition" between differing abstractions is fine. Etc. But if I come to
> Apache to set up Apache Foo, with presumably the focus and care on
> community development a motivating factor for that (otherwise why shouldn't
> I just go to GitHub?), then if later the Incubator admits Apache FooBar
> (incubating) and Apache FooBaz (incubating) that significantly overlap and
> duplicate my efforts - overriding my concerns or objections - then I'd be
> inclined to not view Apache as a particularly good steward of my community
> development. The devil is in the details, which takes me back to the point
> made in the above paragraph.
> > > > > > > > On Fri, May 2, 2014 at 10:52 AM, Chris Douglas wrote: > >> On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell >> wrote: >> > If not part of the initial proposal, then >> > at least making a good case as a criteria for graduation, and writing up >> > related work and how the new project differentiates could be an initial >> > task done on JIRA after acceptance along the lines of the trademark >> search. >> >> I see this differently. Project overlap (particularly in the >> incubator) is neither surprising nor regrettable. Recently we've seen >> several SQL, streaming, and security projects. While these are all >> mature domains, the "best practices" are still being explored. Each >> branch in architecture may accommodate a new project, and each path >> through those tradeoffs will define those communities. They'll also >> define each other; by way of illustration, a project that's a subset >> of another becomes the "lightweight" implementation. If the enthusiasm >> for a project wanes, that's not a tragedy the incubator can prevent by >> forcing alignment based on the goal of the project. Rejecting a >> community will not cause them to join an existing one; they'll just >> leave Apache. >> >> More than losing an opportunity to foster a community, a policy >> favoring consolidation would actively harm innovation and >> experimentation. A requirement for uniqueness would reward first >> movers and leave no outlet for legitimate differences in project >> direction. Granting existing projects authority over prospective >> communities _because_ they compete is not an optimization. As we saw >> with HCatalog, sometimes revolutions don't become distinct communities >> and the effort is reabsorbed. The incubator should continue to support >> that natural process. >> >> Finally, it's not surprising that the incubator will see projects with >> similar goals in waves. The need for new abstractions is experienced >> jointly and solutions are explored concurrently. 
That's a feature of >> the incubator, not a bug. >> >> Articulating the project's "related work" is a useful exercise, which >> is why it's a section in the proposal. -C >> >> > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra > >wrote: >> > >> >> Unfortunately, similar projects entering Apache incubator are common >> >> things =( >> >> >> >> Even though each original project proposers can argue about >> >> differences in one way or another, it will eventually decided by >> >> adoption and community growth, and at the end the quality of the >> >> project itself. >> >> >> >> Some other incoming projects had been in similar questions/concerns >> >> regarding "competing" with existing ASF projects, e.g.: Twill vs >> >> Slider, Samza vs Storm vs S4, and several others. >> >> >> >> >> >> - Henry >> >> >> >> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning >> >> wrote: >> >> > I think that there is a huge difference between Metamodel and Optiq. >> >> > >> >> > In particular: >> >> > >> >> > - Optiq provides real SQL including nested queries, correlated >> >> sub-q
Re: [PROPOSAL] Optiq
Ah sorry, I did not mean "asking to update", I meant "proposing to update". Thanks, - Henry On Fri, May 2, 2014 at 11:20 AM, Henry Saputra wrote: > HI Ashutosh, > > Since there was a question/ comment about relationship with Apache > MetaModel, I am asking to update the proposal to include this > discussion in either "Relationships with Other Apache Products" or > "Alignment" section before going for a VOTE. > > Apache Slider did the same thing with relation to Apache Twill and > Apache Helix projects. > > Thanks, > > - Henry > > On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan > wrote: >> I would like to propose Optiq as an Apache Incubator project. I have >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and >> posted the text of the proposal below. >> >> Ashutosh. >> >> = Optiq = >> == Abstract == >> >> Optiq is a framework that allows efficient translation of queries involving >> heterogeneous and federated data. >> >> == Proposal == >> >> Optiq is a highly customizable engine for parsing and planning queries on >> data in a wide variety of formats. It allows database-like access, and in >> particular a SQL interface and advanced query optimization, for data not >> residing in a traditional database. >> >> == Background == >> >> Databases were traditionally engineered in a monolithic stack, providing a >> data storage format, data processing algorithms, query parser, query >> planner, built-in functions, metadata repository and connectivity layer. >> They innovate in some areas but rarely in all. >> >> Modern data management systems are decomposing that stack into separate >> components, separating data, processing engine, metadata, and query >> language support. They are highly heterogeneous, with data in multiple >> locations and formats, caching and redundant data, different workloads, and >> processing occurring in different engines. 
>>
>> Query planning (sometimes called query optimization) has always been a key
>> function of a DBMS, because it allows the implementors to introduce new
>> query-processing algorithms, and allows data administrators to re-organize
>> the data without affecting applications built on that data. In a
>> componentized system, the query planner integrates the components (data
>> formats, engines, algorithms) without introducing unnecessary coupling or
>> performance tradeoffs.
>>
>> But building a query planner is hard; many systems muddle along without a
>> planner, and indeed a SQL interface, until the demand from their customers
>> is overwhelming.
>>
>> There is an opportunity to make this process more efficient by creating a
>> re-usable framework.
>>
>> == Rationale ==
>>
>> Optiq allows database-like access, and in particular a SQL interface and
>> advanced query optimization, for data not residing in a traditional
>> database. It is complementary to many current Hadoop and NoSQL systems,
>> which have innovative and performant storage and runtime systems but lack a
>> SQL interface and intelligent query translation.
>>
>> Optiq is already in use by several projects, including Apache Drill, Apache
>> Hive and Cascading Lingual, and commercial products.
>>
>> Optiq's architecture consists of:
>>
>> An extensible relational algebra.
>> SPIs (service-provider interfaces) for metadata (schemas and tables),
>> planner rules, statistics, cost-estimates, user-defined functions.
>> Built-in sets of rules for logical transformations and common data-sources.
>> Two query planning engines driven by rules, statistics, etc. One engine is
>> cost-based, the other rule-based.
>> Optional SQL parser, validator and translator to relational algebra.
>> Optional JDBC driver.
>> == Initial Goals ==
>>
>> The initial goals are to move the existing codebase to Apache and
>> integrate with the Apache development process.
Once this is accomplished, >> we plan for incremental development and releases that follow the Apache >> guidelines. >> >> As we move the code into the org.apache namespace, we will restructure >> components as necessary to allow clients to use just the components of >> Optiq that they need. >> >> A version 1.0 release, including pre-built binaries, will foster wider >> adoption. >> >> == Current Status == >> >> Optiq has had over a dozen minor releases over the last 18 months. Its core >> SQL parser and validator, and its planning engine and core rules, are >> mature and robust and are the basis for several production systems; but >> other components and SPIs are still undergoing rapid evolution. >> >> === Meritocracy === >> >> We plan to invest in supporting a meritocracy. We will discuss the >> requirements in an open forum. We encourage the companies and projects >> using Optiq to discuss their requirements in an open forum and to >> participate in development. We will encourage and monitor community >> participation so that privileges can be extended to those that contribute. >> >> Optiq's pluggable architecture encourages develope
Re: [PROPOSAL] Optiq
HI Ashutosh, Since there was a question/ comment about relationship with Apache MetaModel, I am asking to update the proposal to include this discussion in either "Relationships with Other Apache Products" or "Alignment" section before going for a VOTE. Apache Slider did the same thing with relation to Apache Twill and Apache Helix projects. Thanks, - Henry On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan wrote: > I would like to propose Optiq as an Apache Incubator project. I have > posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and > posted the text of the proposal below. > > Ashutosh. > > = Optiq = > == Abstract == > > Optiq is a framework that allows efficient translation of queries involving > heterogeneous and federated data. > > == Proposal == > > Optiq is a highly customizable engine for parsing and planning queries on > data in a wide variety of formats. It allows database-like access, and in > particular a SQL interface and advanced query optimization, for data not > residing in a traditional database. > > == Background == > > Databases were traditionally engineered in a monolithic stack, providing a > data storage format, data processing algorithms, query parser, query > planner, built-in functions, metadata repository and connectivity layer. > They innovate in some areas but rarely in all. > > Modern data management systems are decomposing that stack into separate > components, separating data, processing engine, metadata, and query > language support. They are highly heterogeneous, with data in multiple > locations and formats, caching and redundant data, different workloads, and > processing occurring in different engines. > > Query planning (sometimes called query optimization) has always been a key > function of a DBMS, because it allows the implementors to introduce new > query-processing algorithms, and allows data administrators to re-organize > the data without affecting applications built on that data. 
In a > componentized system, the query planner integrates the components (data > formats, engines, algorithms) without introducing unncessary coupling or > performance tradeoffs. > > But building a query planner is hard; many systems muddle along without a > planner, and indeed a SQL interface, until the demand from their customers > is overwhelming. > > There is an opportunity to make this process more efficient by creating a > re-usable framework. > > == Rationale == > > Optiq allows database-like access, and in particular a SQL interface and > advanced query optimization, for data not residing in a traditional > database. It is complementary to many current Hadoop and NoSQL systems, > which have innovative and performant storage and runtime systems but lack a > SQL interface and intelligent query translation. > > Optiq is already in use by several projects, including Apache Drill, Apache > Hive and Cascading Lingual, and commercial products. > > Optiq's architecture consists of: > > An extensible relational algebra. > SPIs (service-provider interfaces) for metadata (schemas and tables), > planner rules, statistics, cost-estimates, user-defined functions. > Built-in sets of rules for logical transformations and common data-sources. > Two query planning engines driven by rules, statistics, etc. One engine is > cost-based, the other rule-based. > Optional SQL parser, validator and translator to relational algebra. > Optional JDBC driver. > == Initial Goals == > > The initial goals are be to move the existing codebase to Apache and > integrate with the Apache development process. Once this is accomplished, > we plan for incremental development and releases that follow the Apache > guidelines. > > As we move the code into the org.apache namespace, we will restructure > components as necessary to allow clients to use just the components of > Optiq that they need. > > A version 1.0 release, including pre-built binaries, will foster wider > adoption. 
> > == Current Status == > > Optiq has had over a dozen minor releases over the last 18 months. Its core > SQL parser and validator, and its planning engine and core rules, are > mature and robust and are the basis for several production systems; but > other components and SPIs are still undergoing rapid evolution. > > === Meritocracy === > > We plan to invest in supporting a meritocracy. We will discuss the > requirements in an open forum. We encourage the companies and projects > using Optiq to discuss their requirements in an open forum and to > participate in development. We will encourage and monitor community > participation so that privileges can be extended to those that contribute. > > Optiq's pluggable architecture encourages developers to contribute > extensions such as adapters for data sources, new planning rules, and > better statistics and cost-estimation functions. We look forward to > fostering a rich ecosystem of extensions. > > === Community === > > Building a data management system requires a hi
Re: [PROPOSAL] Optiq
All that I suggest is that candidate Apache projects articulate how they differ from related projects, and that we consider the strength of this argument when evaluating the long-term viability of the effort and community. It would be good if proposals had a "related work" section done with the same diligence and detail as the typical academic publication; I haven't seen that, at least recently.

Differences in project direction leading to new projects (effectively, sanctioned forks) are fine, although regrettable, since that would represent an acknowledged failure of the Apache community process. "Creative competition" between differing abstractions is fine. Etc.

But if I come to Apache to set up Apache Foo, with presumably the focus and care on community development a motivating factor for that (otherwise why shouldn't I just go to GitHub?), then if later the Incubator admits Apache FooBar (incubating) and Apache FooBaz (incubating) that significantly overlap and duplicate my efforts - overriding my concerns or objections - then I'd be inclined to not view Apache as a particularly good steward of my community development. The devil is in the details, which takes me back to the point made in the above paragraph.

On Fri, May 2, 2014 at 10:52 AM, Chris Douglas wrote:
> On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell wrote:
> > If not part of the initial proposal, then
> > at least making a good case as a criteria for graduation, and writing up
> > related work and how the new project differentiates could be an initial
> > task done on JIRA after acceptance along the lines of the trademark search.
>
> I see this differently. Project overlap (particularly in the
> incubator) is neither surprising nor regrettable. Recently we've seen
> several SQL, streaming, and security projects. While these are all
> mature domains, the "best practices" are still being explored.
Each > branch in architecture may accommodate a new project, and each path > through those tradeoffs will define those communities. They'll also > define each other; by way of illustration, a project that's a subset > of another becomes the "lightweight" implementation. If the enthusiasm > for a project wanes, that's not a tragedy the incubator can prevent by > forcing alignment based on the goal of the project. Rejecting a > community will not cause them to join an existing one; they'll just > leave Apache. > > More than losing an opportunity to foster a community, a policy > favoring consolidation would actively harm innovation and > experimentation. A requirement for uniqueness would reward first > movers and leave no outlet for legitimate differences in project > direction. Granting existing projects authority over prospective > communities _because_ they compete is not an optimization. As we saw > with HCatalog, sometimes revolutions don't become distinct communities > and the effort is reabsorbed. The incubator should continue to support > that natural process. > > Finally, it's not surprising that the incubator will see projects with > similar goals in waves. The need for new abstractions is experienced > jointly and solutions are explored concurrently. That's a feature of > the incubator, not a bug. > > Articulating the project's "related work" is a useful exercise, which > is why it's a section in the proposal. -C > > > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra >wrote: > > > >> Unfortunately, similar projects entering Apache incubator are common > >> things =( > >> > >> Even though each original project proposers can argue about > >> differences in one way or another, it will eventually decided by > >> adoption and community growth, and at the end the quality of the > >> project itself. 
> >> > >> Some other incoming projects had been in similar questions/concerns > >> regarding "competing" with existing ASF projects, e.g.: Twill vs > >> Slider, Samza vs Storm vs S4, and several others. > >> > >> > >> - Henry > >> > >> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning > >> wrote: > >> > I think that there is a huge difference between Metamodel and Optiq. > >> > > >> > In particular: > >> > > >> > - Optiq provides real SQL including nested queries, correlated > >> sub-queries > >> > and so on > >> > > >> > - Metamodel uses a fluent Java API ... SQL parsing and transformation > >> > doesn't appear to be a goal > >> > > >> > - Optiq provides highly advanced query transformations including > >> > decorrelations based on estimated execution costs. > >> > > >> > - Metamodel appears to provide no significant query transformations > >> > > >> > - Optiq only provides query execution as a by-product for testing > >> > > >> > - Metamodel has query execution as a central goal > >> > > >> > - Optiq provides a form of type inferencing for SQL queries. This is > >> > unique to Optiq as far as I know. > >> > > >> > > >> > > >> > On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen < > >> > kasper.soren...@humaninference.com> wrote: > >> > > >> >> I see
Re: [PROPOSAL] Optiq
On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell wrote:
> If not part of the initial proposal, then
> at least making a good case as a criteria for graduation, and writing up
> related work and how the new project differentiates could be an initial
> task done on JIRA after acceptance along the lines of the trademark search.

I see this differently. Project overlap (particularly in the incubator) is neither surprising nor regrettable. Recently we've seen several SQL, streaming, and security projects. While these are all mature domains, the "best practices" are still being explored. Each branch in architecture may accommodate a new project, and each path through those tradeoffs will define those communities. They'll also define each other; by way of illustration, a project that's a subset of another becomes the "lightweight" implementation. If the enthusiasm for a project wanes, that's not a tragedy the incubator can prevent by forcing alignment based on the goal of the project. Rejecting a community will not cause them to join an existing one; they'll just leave Apache.

More than losing an opportunity to foster a community, a policy favoring consolidation would actively harm innovation and experimentation. A requirement for uniqueness would reward first movers and leave no outlet for legitimate differences in project direction. Granting existing projects authority over prospective communities _because_ they compete is not an optimization. As we saw with HCatalog, sometimes revolutions don't become distinct communities and the effort is reabsorbed. The incubator should continue to support that natural process.

Finally, it's not surprising that the incubator will see projects with similar goals in waves. The need for new abstractions is experienced jointly and solutions are explored concurrently. That's a feature of the incubator, not a bug.

Articulating the project's "related work" is a useful exercise, which is why it's a section in the proposal.
-C > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote: > >> Unfortunately, similar projects entering Apache incubator are common >> things =( >> >> Even though each original project proposers can argue about >> differences in one way or another, it will eventually decided by >> adoption and community growth, and at the end the quality of the >> project itself. >> >> Some other incoming projects had been in similar questions/concerns >> regarding "competing" with existing ASF projects, e.g.: Twill vs >> Slider, Samza vs Storm vs S4, and several others. >> >> >> - Henry >> >> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning >> wrote: >> > I think that there is a huge difference between Metamodel and Optiq. >> > >> > In particular: >> > >> > - Optiq provides real SQL including nested queries, correlated >> sub-queries >> > and so on >> > >> > - Metamodel uses a fluent Java API ... SQL parsing and transformation >> > doesn't appear to be a goal >> > >> > - Optiq provides highly advanced query transformations including >> > decorrelations based on estimated execution costs. >> > >> > - Metamodel appears to provide no significant query transformations >> > >> > - Optiq only provides query execution as a by-product for testing >> > >> > - Metamodel has query execution as a central goal >> > >> > - Optiq provides a form of type inferencing for SQL queries. This is >> > unique to Optiq as far as I know. >> > >> > >> > >> > On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen < >> > kasper.soren...@humaninference.com> wrote: >> > >> >> I see a lot of conceptual similarity between Optiq and the Apache >> >> MetaModel (incubator) project [1]. Maybe something can be done to align >> the >> >> two projects, so that we avoid having two incubating projects that do >> >> basically the same thing? >> >> >> >> Or maybe there's some glaring difference that I am missing? 
At least it >> >> seems to me both to be projects that try to provide uniform querying >> >> capabilities to a wide array of data backends. Both project also favor a >> >> type-safe Java querying API instead of a String/SQL oriented query API. >> >> >> >> Regards, >> >> Kasper Sørensen >> >> >> >> [1] http://metamodel.incubator.apache.org/ >> >> >> >> >> >> From: Ashutosh Chauhan [hashut...@apache.org] >> >> Sent: 01 May 2014 00:21 >> >> To: general@incubator.apache.org >> >> Subject: [PROPOSAL] Optiq >> >> >> >> I would like to propose Optiq as an Apache Incubator project. I have >> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposaland >> >> posted the text of the proposal below. >> >> >> >> Ashutosh. >> >> >> >> = Optiq = >> >> == Abstract == >> >> >> >> Optiq is a framework that allows efficient translation of queries >> involving >> >> heterogeneous and federated data. >> >> >> >> == Proposal == >> >> >> >> Optiq is a highly customizable engine for parsing and planning queries >> on >> >> data in a wide variety of formats. It allows database-like acce
RE: [PROPOSAL] Optiq
I feel the same way. And to clarify my position a bit - I am in no way against having Optiq in the incubator; it sounds like a very impressive library. I was merely probing if it would be possible to merge or standardize some of the aspects of the projects - work together where it's possible, and differentiate where it makes sense. -----Original Message----- From: shaposh...@gmail.com [mailto:shaposh...@gmail.com] On Behalf Of Roman Shaposhnik Sent: 2 May 2014 01:49 To: general@incubator.apache.org Subject: Re: [PROPOSAL] Optiq On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell wrote: > One could imagine as part of the case for incubation and graduation > both an articulation of the project's place in the larger ecosystem, > similar to how academic papers customarily place their work and novel > findings within the larger field in 'Related Work'. If not part of the > initial proposal, then at least making a good case as a criteria for > graduation, and writing up related work and how the new project > differentiates could be an initial task done on JIRA after acceptance along > the lines of the trademark search. I would be a strong +1 to modify our proposal template to include a section like that. It will be, if nothing else, a strong forcing function to spend some time considering similar projects. Thanks, Roman. - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [PROPOSAL] Optiq
On Wed, Apr 30, 2014, at 03:21 PM, Ashutosh Chauhan wrote: > I would like to propose Optiq as an Apache Incubator project. I have > posted the proposal to https://wiki.apache.org/incubator/OptiqProposal > and > posted the text of the proposal below. > > Ashutosh. Given the importance of Optiq for several larger projects, my belief is that it is immensely relevant to see it transform into a community project under the ASF stewardship. I very much second the torch-carrying remark of Ted. Steven.
Re: [PROPOSAL] Optiq
On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell wrote: > One could imagine as part of the case for incubation and graduation both an > articulation of the project's place in the larger ecosystem, similar to how > academic papers customarily place their work and novel findings within the > larger field in 'Related Work'. If not part of the initial proposal, then > at least making a good case as a criteria for graduation, and writing up > related work and how the new project differentiates could be an initial > task done on JIRA after acceptance along the lines of the trademark search. I would be a strong +1 to modify our proposal template to include a section like that. It will be, if nothing else, a strong forcing function to spend some time considering similar projects. Thanks, Roman.
Re: [PROPOSAL] Optiq
On Thu, May 1, 2014 at 10:19 PM, Kasper Sørensen <kasper.soren...@humaninference.com> wrote:

> - Can you explain or link to more information about the type inference you
> mention?

The type inferencing is used by Drill. The problem is that strong typing is normally required to parse SQL statements. If you don't even yet know what columns exist, parsing a query is difficult for normal SQL parsers. Generating good code for such situations has to be delayed until type information is available.

See
http://tnachen.wordpress.com/2013/11/05/lifetime-of-a-query-in-drill-alpha-release/
https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
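
The idea of delaying type decisions until the data is visible can be illustrated with a toy sketch in Python. This is not Drill or Optiq code, and it makes no claim about how either project implements the feature; the names `ANY`, `plan_projection`, and `bind_types` are hypothetical and exist only for this illustration. The sketch assumes that a planner may mark unknown columns with a placeholder type and fill in concrete types once the first record is seen.

```python
# Toy illustration only: hypothetical names, not the Drill or Optiq API.

ANY = "ANY"  # placeholder for a column whose type is not yet known


def plan_projection(columns, schema=None):
    """Build a list of (column, type) pairs at planning time.

    Columns missing from the (possibly empty) schema get the ANY placeholder
    instead of causing a validation failure.
    """
    schema = schema or {}
    return [(col, schema.get(col, ANY)) for col in columns]


def bind_types(plan, first_record):
    """Replace ANY placeholders with types observed in an actual record."""
    bound = []
    for col, typ in plan:
        if typ == ANY and col in first_record:
            typ = type(first_record[col]).__name__
        bound.append((col, typ))
    return bound


# Planning happens before any schema is available.
plan = plan_projection(["name", "age"])

# Types are bound later, once a record has been read.
bound = bind_types(plan, {"name": "Ada", "age": 36})
```

Here `plan` carries only placeholders, and `bound` carries the concrete types taken from the record, which is the point at which type-specific code could be generated.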
Re: [PROPOSAL] Optiq
One could imagine as part of the case for incubation and graduation both an articulation of the project's place in the larger ecosystem, similar to how academic papers customarily place their work and novel findings within the larger field in 'Related Work'. If not part of the initial proposal, then at least making a good case as a criteria for graduation, and writing up related work and how the new project differentiates could be an initial task done on JIRA after acceptance along the lines of the trademark search. On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote: > Unfortunately, similar projects entering Apache incubator are common > things =( > > Even though each original project proposers can argue about > differences in one way or another, it will eventually decided by > adoption and community growth, and at the end the quality of the > project itself. > > Some other incoming projects had been in similar questions/concerns > regarding "competing" with existing ASF projects, e.g.: Twill vs > Slider, Samza vs Storm vs S4, and several others. > > > - Henry > > On Thu, May 1, 2014 at 12:14 AM, Ted Dunning > wrote: > > I think that there is a huge difference between Metamodel and Optiq. > > > > In particular: > > > > - Optiq provides real SQL including nested queries, correlated > sub-queries > > and so on > > > > - Metamodel uses a fluent Java API ... SQL parsing and transformation > > doesn't appear to be a goal > > > > - Optiq provides highly advanced query transformations including > > decorrelations based on estimated execution costs. > > > > - Metamodel appears to provide no significant query transformations > > > > - Optiq only provides query execution as a by-product for testing > > > > - Metamodel has query execution as a central goal > > > > - Optiq provides a form of type inferencing for SQL queries. This is > > unique to Optiq as far as I know. 
> > > > > > > > On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen < > > kasper.soren...@humaninference.com> wrote: > > > >> I see a lot of conceptual similarity between Optiq and the Apache > >> MetaModel (incubator) project [1]. Maybe something can be done to align > the > >> two projects, so that we avoid having two incubating projects that do > >> basically the same thing? > >> > >> Or maybe there's some glaring difference that I am missing? At least it > >> seems to me both to be projects that try to provide uniform querying > >> capabilities to a wide array of data backends. Both project also favor a > >> type-safe Java querying API instead of a String/SQL oriented query API. > >> > >> Regards, > >> Kasper Sørensen > >> > >> [1] http://metamodel.incubator.apache.org/ > >> > >> > >> From: Ashutosh Chauhan [hashut...@apache.org] > >> Sent: 01 May 2014 00:21 > >> To: general@incubator.apache.org > >> Subject: [PROPOSAL] Optiq > >> > >> I would like to propose Optiq as an Apache Incubator project. I have > >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposaland > >> posted the text of the proposal below. > >> > >> Ashutosh. > >> > >> = Optiq = > >> == Abstract == > >> > >> Optiq is a framework that allows efficient translation of queries > involving > >> heterogeneous and federated data. > >> > >> == Proposal == > >> > >> Optiq is a highly customizable engine for parsing and planning queries > on > >> data in a wide variety of formats. It allows database-like access, and > in > >> particular a SQL interface and advanced query optimization, for data not > >> residing in a traditional database. > >> > >> == Background == > >> > >> Databases were traditionally engineered in a monolithic stack, > providing a > >> data storage format, data processing algorithms, query parser, query > >> planner, built-in functions, metadata repository and connectivity layer. > >> They innovate in some areas but rarely in all. 
> >> > >> Modern data management systems are decomposing that stack into separate > >> components, separating data, processing engine, metadata, and query > >> language support. They are highly heterogeneous, with data in multiple > >> locations and formats, caching and redundant data, different workloads, > and > >> processing occurring in different engines. > >> > >> Query planning (sometimes called query optimization) has always been a > key > >> function of a DBMS, because it allows the implementors to introduce new > >> query-processing algorithms, and allows data administrators to > re-organize > >> the data without affecting applications built on that data. In a > >> componentized system, the query planner integrates the components (data > >> formats, engines, algorithms) without introducing unncessary coupling or > >> performance tradeoffs. > >> > >> But building a query planner is hard; many systems muddle along without > a > >> planner, and indeed a SQL interface, until the demand from their > customers > >> is overwhelming. > >> > >> There is an opportunity to make this pr
Re: [PROPOSAL] Optiq
Unfortunately, similar projects entering the Apache Incubator are common =(

Even though each project's original proposers can argue about differences in one way or another, the outcome will eventually be decided by adoption and community growth, and in the end by the quality of the project itself.

Some other incoming projects have faced similar questions and concerns about "competing" with existing ASF projects, e.g. Twill vs Slider, Samza vs Storm vs S4, and several others.

- Henry

On Thu, May 1, 2014 at 12:14 AM, Ted Dunning wrote:
> I think that there is a huge difference between Metamodel and Optiq.
>
> In particular:
>
> - Optiq provides real SQL including nested queries, correlated sub-queries
> and so on
>
> - Metamodel uses a fluent Java API ... SQL parsing and transformation
> doesn't appear to be a goal
>
> - Optiq provides highly advanced query transformations including
> decorrelations based on estimated execution costs.
>
> - Metamodel appears to provide no significant query transformations
>
> - Optiq only provides query execution as a by-product for testing
>
> - Metamodel has query execution as a central goal
>
> - Optiq provides a form of type inferencing for SQL queries. This is
> unique to Optiq as far as I know.
>
> On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <
> kasper.soren...@humaninference.com> wrote:
>
>> I see a lot of conceptual similarity between Optiq and the Apache
>> MetaModel (incubator) project [1]. Maybe something can be done to align the
>> two projects, so that we avoid having two incubating projects that do
>> basically the same thing?
>>
>> Or maybe there's some glaring difference that I am missing? At least it
>> seems to me both to be projects that try to provide uniform querying
>> capabilities to a wide array of data backends. Both projects also favor a
>> type-safe Java querying API instead of a String/SQL oriented query API.
>> >> Regards, >> Kasper Sørensen >> >> [1] http://metamodel.incubator.apache.org/ >> >> >> From: Ashutosh Chauhan [hashut...@apache.org] >> Sent: 01 May 2014 00:21 >> To: general@incubator.apache.org >> Subject: [PROPOSAL] Optiq >> >> I would like to propose Optiq as an Apache Incubator project. I have >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and >> posted the text of the proposal below. >> >> Ashutosh. >> >> = Optiq = >> == Abstract == >> >> Optiq is a framework that allows efficient translation of queries involving >> heterogeneous and federated data. >> >> == Proposal == >> >> Optiq is a highly customizable engine for parsing and planning queries on >> data in a wide variety of formats. It allows database-like access, and in >> particular a SQL interface and advanced query optimization, for data not >> residing in a traditional database. >> >> == Background == >> >> Databases were traditionally engineered in a monolithic stack, providing a >> data storage format, data processing algorithms, query parser, query >> planner, built-in functions, metadata repository and connectivity layer. >> They innovate in some areas but rarely in all. >> >> Modern data management systems are decomposing that stack into separate >> components, separating data, processing engine, metadata, and query >> language support. They are highly heterogeneous, with data in multiple >> locations and formats, caching and redundant data, different workloads, and >> processing occurring in different engines. >> >> Query planning (sometimes called query optimization) has always been a key >> function of a DBMS, because it allows the implementors to introduce new >> query-processing algorithms, and allows data administrators to re-organize >> the data without affecting applications built on that data. 
In a >> componentized system, the query planner integrates the components (data >> formats, engines, algorithms) without introducing unncessary coupling or >> performance tradeoffs. >> >> But building a query planner is hard; many systems muddle along without a >> planner, and indeed a SQL interface, until the demand from their customers >> is overwhelming. >> >> There is an opportunity to make this process more efficient by creating a >> re-usable framework. >> >> == Rationale == >> >> Optiq allows database-like access, and in particular a SQL interface and >> advanced query optimization, for data not residing in a traditional >> database. It is complementary to many current Hadoop and NoSQL systems, >> which have innovative and performant storage and runtime systems but lack a >> SQL interface and intelligent query translation. >> >> Optiq is already in use by several projects, including Apache Drill, Apache >> Hive and Cascading Lingual, and commercial products. >> >> Optiq's architecture consists of: >> >> An extensible relational algebra. >> SPIs (service-provider interfaces) for metadata (schemas and tables), >> planner rules, statistics, cost-estimates, user-defined functions. >> Built-in sets
RE: [PROPOSAL] Optiq
I am certainly not questioning the power of Optiq. Just noting that I think it has a lot of similarities, and obviously there are differences. But even in your list I find a lot of the points to be close or similar. I would hate to start a war over words in this thread, since I actually only wanted to point to a related project. But to your points:

- MetaModel also provides SQL support, including nested/sub-queries. Not correlated sub-queries though, which is a difference.
- SQL parsing is a goal. Not exactly sure what you mean by "transformations"; I think it's either User Defined Functions (UDFs) or transformations of the query itself to fit with a particular backend. UDFs are currently not supported, but we do quite a lot of tricks to transform and optimize the query plan based on the backing store.
- Sounds like Optiq is ahead of MetaModel in terms of query transformations.
- What do you mean when you say query execution is not a central goal of Optiq? What would you otherwise need your query for?
- Can you explain or link to more information about the type inference you mention?

Best regards,
Kasper

From: Ted Dunning [ted.dunn...@gmail.com]
Sent: 01 May 2014 09:14
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Optiq

I think that there is a huge difference between Metamodel and Optiq. In particular:

- Optiq provides real SQL including nested queries, correlated sub-queries and so on
- Metamodel uses a fluent Java API ... SQL parsing and transformation doesn't appear to be a goal
- Optiq provides highly advanced query transformations including decorrelations based on estimated execution costs.
- Metamodel appears to provide no significant query transformations
- Optiq only provides query execution as a by-product for testing
- Metamodel has query execution as a central goal
- Optiq provides a form of type inferencing for SQL queries. This is unique to Optiq as far as I know.
On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <kasper.soren...@humaninference.com> wrote:

> I see a lot of conceptual similarity between Optiq and the Apache MetaModel (incubator) project [1]. Maybe something can be done to align the two projects, so that we avoid having two incubating projects that do basically the same thing?
>
> Or maybe there's some glaring difference that I am missing? At least it seems to me that both projects try to provide uniform querying capabilities to a wide array of data backends. Both projects also favor a type-safe Java querying API instead of a String/SQL oriented query API.
>
> Regards,
> Kasper Sørensen
>
> [1] http://metamodel.incubator.apache.org/
>
> From: Ashutosh Chauhan [hashut...@apache.org]
> Sent: 01 May 2014 00:21
> To: general@incubator.apache.org
> Subject: [PROPOSAL] Optiq
>
> I would like to propose Optiq as an Apache Incubator project. I have posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal below.
>
> Ashutosh.
Re: [PROPOSAL] Optiq
Apache Hive has recently started work to integrate with Optiq as well. Having it as an Apache project will be good for both Optiq and Apache.

Alan.

On Apr 30, 2014, at 3:21 PM, Ashutosh Chauhan wrote:

> I would like to propose Optiq as an Apache Incubator project. I have posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal below.
>
> Ashutosh.
>
> = Optiq =
> == Abstract ==
>
> Optiq is a framework that allows efficient translation of queries involving heterogeneous and federated data.
>
> == Proposal ==
>
> Optiq is a highly customizable engine for parsing and planning queries on data in a wide variety of formats. It allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database.
>
> == Background ==
>
> Databases were traditionally engineered in a monolithic stack, providing a data storage format, data processing algorithms, query parser, query planner, built-in functions, metadata repository and connectivity layer. They innovate in some areas but rarely in all.
>
> Modern data management systems are decomposing that stack into separate components, separating data, processing engine, metadata, and query language support. They are highly heterogeneous, with data in multiple locations and formats, caching and redundant data, different workloads, and processing occurring in different engines.
>
> Query planning (sometimes called query optimization) has always been a key function of a DBMS, because it allows the implementors to introduce new query-processing algorithms, and allows data administrators to re-organize the data without affecting applications built on that data. In a componentized system, the query planner integrates the components (data formats, engines, algorithms) without introducing unnecessary coupling or performance tradeoffs.
>
> But building a query planner is hard; many systems muddle along without a planner, and indeed a SQL interface, until the demand from their customers is overwhelming.
>
> There is an opportunity to make this process more efficient by creating a re-usable framework.
>
> == Rationale ==
>
> Optiq allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database. It is complementary to many current Hadoop and NoSQL systems, which have innovative and performant storage and runtime systems but lack a SQL interface and intelligent query translation.
>
> Optiq is already in use by several projects, including Apache Drill, Apache Hive and Cascading Lingual, and commercial products.
>
> Optiq's architecture consists of:
>
>  * An extensible relational algebra.
>  * SPIs (service-provider interfaces) for metadata (schemas and tables), planner rules, statistics, cost-estimates, user-defined functions.
>  * Built-in sets of rules for logical transformations and common data-sources.
>  * Two query planning engines driven by rules, statistics, etc. One engine is cost-based, the other rule-based.
>  * Optional SQL parser, validator and translator to relational algebra.
>  * Optional JDBC driver.
>
> == Initial Goals ==
>
> The initial goals are to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines.
>
> As we move the code into the org.apache namespace, we will restructure components as necessary to allow clients to use just the components of Optiq that they need.
>
> A version 1.0 release, including pre-built binaries, will foster wider adoption.
>
> == Current Status ==
>
> Optiq has had over a dozen minor releases over the last 18 months.
> Its core SQL parser and validator, and its planning engine and core rules, are mature and robust and are the basis for several production systems; but other components and SPIs are still undergoing rapid evolution.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. We encourage the companies and projects using Optiq to discuss their requirements in an open forum and to participate in development. We will encourage and monitor community participation so that privileges can be extended to those that contribute.
>
> Optiq's pluggable architecture encourages developers to contribute extensions such as adapters for data sources, new planning rules, and better statistics and cost-estimation functions. We look forward to fostering a rich ecosystem of extensions.
>
> === Community ===
>
> Building a data management system requires a high degree of technical skill, and correspondingly, the community of developers directly using Optiq is potentially fairly small, albeit highly technical and engaged. But we also expect
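
The architecture list in the quoted proposal mentions planner rules, statistics, cost estimates, and two planning engines (one rule-based, one cost-based). As a rough illustration of how those pieces relate, here is a toy sketch in Python. It is not Optiq's API: the node classes, the `push_filter_below_join` rule, and the cost formula are all hypothetical and chosen only to keep the example small.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str
    rows: int            # row-count statistic supplied by the data source


@dataclass(frozen=True)
class Filter:
    child: "Node"
    column: str          # "table.column" that the predicate refers to
    selectivity: float   # estimated fraction of rows that pass


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"


Node = Union[Scan, Filter, Join]


def tables(node):
    """Names of all tables read below this node."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Filter):
        return tables(node.child)
    return tables(node.left) | tables(node.right)


def rows(node):
    """Estimated output row count (joins are treated as cross products)."""
    if isinstance(node, Scan):
        return node.rows
    if isinstance(node, Filter):
        return rows(node.child) * node.selectivity
    return rows(node.left) * rows(node.right)


def cost(node):
    """Toy cost: total number of rows each operator has to look at."""
    if isinstance(node, Scan):
        return node.rows
    if isinstance(node, Filter):
        return cost(node.child) + rows(node.child)
    return cost(node.left) + cost(node.right) + rows(node.left) * rows(node.right)


def push_filter_below_join(node):
    """Planner rule: move a filter under the join input it refers to."""
    if isinstance(node, Filter) and isinstance(node.child, Join):
        join = node.child
        table = node.column.split(".")[0]
        if table in tables(join.left):
            return Join(Filter(join.left, node.column, node.selectivity), join.right)
        if table in tables(join.right):
            return Join(join.left, Filter(join.right, node.column, node.selectivity))
    return node


def plan_rule_based(node):
    """Rule-based engine: apply the rule whenever it matches."""
    return push_filter_below_join(node)


def plan_cost_based(node):
    """Cost-based engine: generate alternatives, keep the cheapest."""
    return min([node, push_filter_below_join(node)], key=cost)


query = Filter(Join(Scan("emp", 1000), Scan("dept", 10)), "emp.salary", 0.1)
best = plan_cost_based(query)
```

With these made-up statistics the cost-based engine picks the rewritten plan, because filtering `emp` before the join shrinks the join input. A real planner explores many more alternatives and uses much richer statistics; the sketch only shows the division of labour between rules and cost estimates.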
Re: [PROPOSAL] Optiq
I agree with Ted. Optiq is a full-fledged cost-based query optimization framework for relational workloads.

I also want to highlight Optiq's JDBC infrastructure (and ODBC at a later point as well). Rather than implementing the JDBC specification, Optiq users only have to implement a few interfaces to get JDBC support.

The Stratosphere project (which recently entered the Apache Incubator) has also decided to use Optiq for its SQL interface. After some research, we found that Optiq is a perfect fit for our requirements. And I can confirm from a developer's perspective that Optiq is doing a great job.

I am confident that Optiq will become the standard query optimization framework for SQL-on-"BigData". With Drill and Hive relying on Optiq, major projects have already invested in it.

Robert

On Thu, May 1, 2014 at 9:14 AM, Ted Dunning wrote:

> I think that there is a huge difference between Metamodel and Optiq.
>
> In particular:
>
> - Optiq provides real SQL including nested queries, correlated sub-queries and so on
> - Metamodel uses a fluent Java API ... SQL parsing and transformation doesn't appear to be a goal
> - Optiq provides highly advanced query transformations including decorrelations based on estimated execution costs.
> - Metamodel appears to provide no significant query transformations
> - Optiq only provides query execution as a by-product for testing
> - Metamodel has query execution as a central goal
> - Optiq provides a form of type inferencing for SQL queries. This is unique to Optiq as far as I know.
>
> On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <kasper.soren...@humaninference.com> wrote:
>
> > I see a lot of conceptual similarity between Optiq and the Apache MetaModel (incubator) project [1]. Maybe something can be done to align the two projects, so that we avoid having two incubating projects that do basically the same thing?
> >
> > Or maybe there's some glaring difference that I am missing? At least it seems to me that both projects try to provide uniform querying capabilities to a wide array of data backends. Both projects also favor a type-safe Java querying API instead of a String/SQL oriented query API.
> >
> > Regards,
> > Kasper Sørensen
> >
> > [1] http://metamodel.incubator.apache.org/
> >
> > From: Ashutosh Chauhan [hashut...@apache.org]
> > Sent: 01 May 2014 00:21
> > To: general@incubator.apache.org
> > Subject: [PROPOSAL] Optiq
> >
> > I would like to propose Optiq as an Apache Incubator project. I have posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal below.
> >
> > Ashutosh.
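
Robert's point that users "only have to implement a few interfaces" rather than a whole connectivity layer can be pictured with a small sketch. The Python below is an analogy only, not Optiq's actual interfaces: the `Table` contract, the two adapters, and the generic `select` function are hypothetical names invented for this example.

```python
from typing import Callable, Iterator, Protocol, Sequence


class Table(Protocol):
    """Hypothetical adapter contract: column names plus a full scan."""

    columns: Sequence[str]

    def scan(self) -> Iterator[tuple]: ...


class CsvTable:
    """Adapter for CSV text held in memory."""

    def __init__(self, text: str):
        header, *lines = text.strip().splitlines()
        self.columns = header.split(",")
        self._rows = [tuple(line.split(",")) for line in lines]

    def scan(self) -> Iterator[tuple]:
        return iter(self._rows)


class RecordTable:
    """Adapter for a list of dicts, e.g. documents from a key-value store."""

    def __init__(self, columns: Sequence[str], records: list):
        self.columns = list(columns)
        self._records = records

    def scan(self) -> Iterator[tuple]:
        return (tuple(r.get(c) for c in self.columns) for r in self._records)


def select(table: Table, wanted: Sequence[str],
           where: Callable[[dict], bool] = lambda row: True) -> list:
    """Generic projection and filter, written once for every adapter."""
    positions = [list(table.columns).index(c) for c in wanted]
    result = []
    for raw in table.scan():
        if where(dict(zip(table.columns, raw))):
            result.append(tuple(raw[i] for i in positions))
    return result


csv_table = CsvTable("name,dept\nann,eng\nbob,ops")
doc_table = RecordTable(["name", "dept"], [{"name": "cat", "dept": "eng"}])

in_eng = lambda row: row["dept"] == "eng"
csv_result = select(csv_table, ["name"], in_eng)
doc_result = select(doc_table, ["name"], in_eng)
```

The design choice being illustrated is that the shared layer (`select` here, a SQL and JDBC front end in the framework Robert describes) is written once, while each data source contributes only a thin adapter.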
Re: [PROPOSAL] Optiq
I think that there is a huge difference between Metamodel and Optiq.

In particular:

- Optiq provides real SQL including nested queries, correlated sub-queries and so on
- Metamodel uses a fluent Java API ... SQL parsing and transformation doesn't appear to be a goal
- Optiq provides highly advanced query transformations including decorrelations based on estimated execution costs.
- Metamodel appears to provide no significant query transformations
- Optiq only provides query execution as a by-product for testing
- Metamodel has query execution as a central goal
- Optiq provides a form of type inferencing for SQL queries. This is unique to Optiq as far as I know.

On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <kasper.soren...@humaninference.com> wrote:

> I see a lot of conceptual similarity between Optiq and the Apache MetaModel (incubator) project [1]. Maybe something can be done to align the two projects, so that we avoid having two incubating projects that do basically the same thing?
>
> Or maybe there's some glaring difference that I am missing? At least it seems to me that both projects try to provide uniform querying capabilities to a wide array of data backends. Both projects also favor a type-safe Java querying API instead of a String/SQL oriented query API.
>
> Regards,
> Kasper Sørensen
>
> [1] http://metamodel.incubator.apache.org/
>
> From: Ashutosh Chauhan [hashut...@apache.org]
> Sent: 01 May 2014 00:21
> To: general@incubator.apache.org
> Subject: [PROPOSAL] Optiq
>
> I would like to propose Optiq as an Apache Incubator project. I have posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal below.
>
> Ashutosh.
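
For readers unfamiliar with the "decorrelation" Ted mentions: it rewrites a correlated sub-query, which is conceptually re-evaluated for every outer row, into an equivalent join. The sketch below uses Python's standard `sqlite3` module only as a convenient engine to show that the two forms return the same rows. The `emp` table and its data are made up for the example, and nothing here reflects how Optiq itself performs or costs the rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('ann', 'eng', 120), ('bob', 'eng', 100),
        ('cat', 'ops', 90),  ('dan', 'ops', 70), ('eve', 'ops', 80);
""")

# Correlated form: the inner query refers to the outer row (e.dept).
correlated = """
    SELECT e.name FROM emp e
    WHERE e.salary > (SELECT AVG(e2.salary) FROM emp e2 WHERE e2.dept = e.dept)
    ORDER BY e.name
"""

# Decorrelated form: compute every department average once, then join.
decorrelated = """
    SELECT e.name FROM emp e
    JOIN (SELECT dept, AVG(salary) AS avg_salary FROM emp GROUP BY dept) a
      ON a.dept = e.dept
    WHERE e.salary > a.avg_salary
    ORDER BY e.name
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_decorrelated = conn.execute(decorrelated).fetchall()
conn.close()
```

Both queries select the employees who earn more than their department's average. Which form is cheaper depends on the data and the engine, which is why Ted describes the choice as being driven by estimated execution costs.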
RE: [PROPOSAL] Optiq
I see a lot of conceptual similarity between Optiq and the Apache MetaModel (incubator) project [1]. Maybe something can be done to align the two projects, so that we avoid having two incubating projects that do basically the same thing?

Or maybe there's some glaring difference that I am missing? At least it seems to me that both projects try to provide uniform querying capabilities to a wide array of data backends. Both projects also favor a type-safe Java querying API instead of a String/SQL oriented query API.

Regards,
Kasper Sørensen

[1] http://metamodel.incubator.apache.org/

From: Ashutosh Chauhan [hashut...@apache.org]
Sent: 01 May 2014 00:21
To: general@incubator.apache.org
Subject: [PROPOSAL] Optiq

I would like to propose Optiq as an Apache Incubator project. I have posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal below.

Ashutosh.
Re: [PROPOSAL] Optiq
Optiq has been a key technology underpinning the progress of Drill. It has wide applicability for any project that needs SQL parsing and cost-based optimization. Julian has been carrying this torch for a long time, but I really think that having a wider community would help.

On Thu, May 1, 2014 at 12:21 AM, Ashutosh Chauhan wrote:

> I would like to propose Optiq as an Apache Incubator project. I have posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal below.
>
> Ashutosh.