Re: [VOTE] Accept Optiq into the incubator
With 6 +1s, the vote passes. Thanks, everyone, for taking the time to vote. The vote thread is now closed. I will proceed with the next steps now.

Thanks,
Ashutosh

On Mon, May 12, 2014 at 12:53 PM, Suresh Srinivas sur...@hortonworks.com wrote:

+1 (binding)

On Fri, May 9, 2014 at 11:03 AM, Ashutosh Chauhan hashut...@apache.org wrote:

Based on the results of the discussion thread ( http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E ), I would like to call a vote on accepting Optiq into the incubator.

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...

The vote will be open until Tuesday May 13 18:00 UTC.

https://wiki.apache.org/incubator/OptiqProposal

= Optiq =

== Abstract ==

Optiq is a framework that allows efficient translation of queries involving heterogeneous and federated data.

== Proposal ==

Optiq is a highly customizable engine for parsing and planning queries on data in a wide variety of formats. It allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database.

== Background ==

Databases were traditionally engineered in a monolithic stack, providing a data storage format, data processing algorithms, query parser, query planner, built-in functions, metadata repository and connectivity layer. They innovate in some areas but rarely in all.

Modern data management systems are decomposing that stack into separate components, separating data, processing engine, metadata, and query language support. They are highly heterogeneous, with data in multiple locations and formats, caching and redundant data, different workloads, and processing occurring in different engines.

Query planning (sometimes called query optimization) has always been a key function of a DBMS, because it allows the implementors to introduce new query-processing algorithms, and allows data administrators to re-organize the data without affecting applications built on that data. In a componentized system, the query planner integrates the components (data formats, engines, algorithms) without introducing unnecessary coupling or performance tradeoffs.

But building a query planner is hard; many systems muddle along without a planner, and indeed without a SQL interface, until the demand from their customers is overwhelming. There is an opportunity to make this process more efficient by creating a re-usable framework.

== Rationale ==

Optiq allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database. It is complementary to many current Hadoop and NoSQL systems, which have innovative and performant storage and runtime systems but lack a SQL interface and intelligent query translation.

Optiq is already in use by several projects, including Apache Drill, Apache Hive and Cascading Lingual, and commercial products.

Optiq's architecture consists of:

 * An extensible relational algebra.
 * SPIs (service-provider interfaces) for metadata (schemas and tables), planner rules, statistics, cost-estimates, user-defined functions.
 * Built-in sets of rules for logical transformations and common data-sources.
 * Two query planning engines driven by rules, statistics, etc. One engine is cost-based, the other rule-based.
 * Optional SQL parser, validator and translator to relational algebra.
 * Optional JDBC driver.

== Initial Goals ==

The initial goals are to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines.

As we move the code into the org.apache namespace, we will restructure components as necessary to allow clients to use just the components of Optiq that they need.

A version 1.0 release, including pre-built binaries, will foster wider adoption.

== Current Status ==

Optiq has had over a dozen minor releases over the last 18 months. Its core SQL parser and validator, and its planning engine and core rules, are mature and robust and are the basis for several production systems; but other components and SPIs are still undergoing rapid evolution.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. We encourage the companies and projects using Optiq to discuss their requirements in an open forum and to participate in development. We will encourage and monitor community participation so that privileges can be extended to those that contribute.

Optiq's pluggable architecture encourages developers to contribute
Re: [VOTE] Accept Optiq into the incubator
+1 (non-binding)

On Fri, May 9, 2014 at 11:33 PM, Ashutosh Chauhan hashut...@apache.org wrote:

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...
Re: [VOTE] Accept Optiq into the incubator
+1 - binding

Regards,
Alan

On May 9, 2014, at 11:03 AM, Ashutosh Chauhan hashut...@apache.org wrote:

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...
Re: [VOTE] Accept Optiq into the incubator
+1.

Alan.

On May 9, 2014, at 11:03 AM, Ashutosh Chauhan hashut...@apache.org wrote:

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...
Re: [VOTE] Accept Optiq into the incubator
+1 (non-binding)

2014-05-12 17:11 GMT+02:00 Alan Gates ga...@hortonworks.com:

+1.

Alan.

On May 9, 2014, at 11:03 AM, Ashutosh Chauhan hashut...@apache.org wrote:

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...
Re: [VOTE] Accept Optiq into the incubator
+1 (binding)

On Fri, May 9, 2014 at 11:03 AM, Ashutosh Chauhan hashut...@apache.org wrote:

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...
Re: [VOTE] Accept Optiq into the incubator
+1

On Sat, May 10, 2014 at 2:03 AM, Ashutosh Chauhan hashut...@apache.org wrote:

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...
[VOTE] Accept Optiq into the incubator
Based on the results of the discussion thread ( http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E ), I would like to call a vote on accepting Optiq into the incubator.

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...

The vote will be open until Tuesday May 13 18:00 UTC.

https://wiki.apache.org/incubator/OptiqProposal

= Optiq =

== Abstract ==

Optiq is a framework that allows efficient translation of queries involving heterogeneous and federated data.

== Proposal ==

Optiq is a highly customizable engine for parsing and planning queries on data in a wide variety of formats. It allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database.

== Background ==

Databases were traditionally engineered in a monolithic stack, providing a data storage format, data processing algorithms, query parser, query planner, built-in functions, metadata repository and connectivity layer. They innovate in some areas but rarely in all.

Modern data management systems are decomposing that stack into separate components, separating data, processing engine, metadata, and query language support. They are highly heterogeneous, with data in multiple locations and formats, caching and redundant data, different workloads, and processing occurring in different engines.

Query planning (sometimes called query optimization) has always been a key function of a DBMS, because it allows the implementors to introduce new query-processing algorithms, and allows data administrators to re-organize the data without affecting applications built on that data. In a componentized system, the query planner integrates the components (data formats, engines, algorithms) without introducing unnecessary coupling or performance tradeoffs.

But building a query planner is hard; many systems muddle along without a planner, and indeed without a SQL interface, until the demand from their customers is overwhelming. There is an opportunity to make this process more efficient by creating a re-usable framework.

== Rationale ==

Optiq allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database. It is complementary to many current Hadoop and NoSQL systems, which have innovative and performant storage and runtime systems but lack a SQL interface and intelligent query translation.

Optiq is already in use by several projects, including Apache Drill, Apache Hive and Cascading Lingual, and commercial products.

Optiq's architecture consists of:

 * An extensible relational algebra.
 * SPIs (service-provider interfaces) for metadata (schemas and tables), planner rules, statistics, cost-estimates, user-defined functions.
 * Built-in sets of rules for logical transformations and common data-sources.
 * Two query planning engines driven by rules, statistics, etc. One engine is cost-based, the other rule-based.
 * Optional SQL parser, validator and translator to relational algebra.
 * Optional JDBC driver.

== Initial Goals ==

The initial goals are to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines.

As we move the code into the org.apache namespace, we will restructure components as necessary to allow clients to use just the components of Optiq that they need.

A version 1.0 release, including pre-built binaries, will foster wider adoption.

== Current Status ==

Optiq has had over a dozen minor releases over the last 18 months. Its core SQL parser and validator, and its planning engine and core rules, are mature and robust and are the basis for several production systems; but other components and SPIs are still undergoing rapid evolution.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. We encourage the companies and projects using Optiq to discuss their requirements in an open forum and to participate in development. We will encourage and monitor community participation so that privileges can be extended to those that contribute.

Optiq's pluggable architecture encourages developers to contribute extensions such as adapters for data sources, new planning rules, and better statistics and cost-estimation functions. We look forward to fostering a rich ecosystem of extensions.

=== Community ===

Building a data management system requires a high degree of technical skill, and correspondingly, the community of developers directly using Optiq is potentially fairly small, albeit highly technical and engaged. But we also expect engagement from members of the communities of projects that use Optiq, such as Drill and Hive.
Re: [VOTE] Accept Optiq into the incubator
On Fri, May 9, 2014 at 8:03 PM, Ashutosh Chauhan hashut...@apache.org wrote:

[X] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...

The appearance of Stratosphere is a clear typo. I still vote to accept.
Re: [VOTE] Accept Optiq into the incubator
+1

-C

On Fri, May 9, 2014 at 11:03 AM, Ashutosh Chauhan hashut...@apache.org wrote:

Based on the results of the discussion thread ( http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E ), I would like to call a vote on accepting Optiq into the incubator.

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Stratosphere
[ ] -1 Do not accept Optiq because ...

The vote will be open until Tuesday May 13 18:00 UTC.

https://wiki.apache.org/incubator/OptiqProposal

= Optiq =

== Abstract ==

Optiq is a framework that allows efficient translation of queries involving heterogeneous and federated data.

== Proposal ==

Optiq is a highly customizable engine for parsing and planning queries on data in a wide variety of formats. It allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database.

== Background ==

Databases were traditionally engineered in a monolithic stack, providing a data storage format, data processing algorithms, query parser, query planner, built-in functions, metadata repository and connectivity layer. They innovate in some areas but rarely in all. Modern data management systems are decomposing that stack into separate components, separating data, processing engine, metadata, and query language support. They are highly heterogeneous, with data in multiple locations and formats, caching and redundant data, different workloads, and processing occurring in different engines.

Query planning (sometimes called query optimization) has always been a key function of a DBMS, because it allows the implementors to introduce new query-processing algorithms, and allows data administrators to re-organize the data without affecting applications built on that data. In a componentized system, the query planner integrates the components (data formats, engines, algorithms) without introducing unnecessary coupling or performance tradeoffs. But building a query planner is hard; many systems muddle along without a planner, and indeed without a SQL interface, until the demand from their customers is overwhelming. There is an opportunity to make this process more efficient by creating a re-usable framework.

== Rationale ==

Optiq allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database. It is complementary to many current Hadoop and NoSQL systems, which have innovative and performant storage and runtime systems but lack a SQL interface and intelligent query translation.

Optiq is already in use by several projects, including Apache Drill, Apache Hive and Cascading Lingual, and commercial products.

Optiq's architecture consists of:

 * An extensible relational algebra.
 * SPIs (service-provider interfaces) for metadata (schemas and tables), planner rules, statistics, cost-estimates, user-defined functions.
 * Built-in sets of rules for logical transformations and common data-sources.
 * Two query planning engines driven by rules, statistics, etc. One engine is cost-based, the other rule-based.
 * Optional SQL parser, validator and translator to relational algebra.
 * Optional JDBC driver.

== Initial Goals ==

The initial goals are to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. As we move the code into the org.apache namespace, we will restructure components as necessary to allow clients to use just the components of Optiq that they need. A version 1.0 release, including pre-built binaries, will foster wider adoption.

== Current Status ==

Optiq has had over a dozen minor releases over the last 18 months. Its core SQL parser and validator, and its planning engine and core rules, are mature and robust and are the basis for several production systems; but other components and SPIs are still undergoing rapid evolution.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. We encourage the companies and projects using Optiq to discuss their requirements in an open forum and to participate in development. We will encourage and monitor community participation so that privileges can be extended to those that contribute.

Optiq's pluggable architecture encourages developers to contribute extensions such as adapters for data sources, new planning rules, and better statistics and cost-estimation functions. We look forward to fostering a rich ecosystem of extensions.

=== Community ===

Building a data management system requires a high degree of technical skill, and correspondingly, the community of developers directly using Optiq is