Good section. I agree with what it says, and I hope we can eventually help each other out, e.g. with a library of adaptors.
-----Original Message-----
From: Julian Hyde [mailto:julianh...@gmail.com]
Sent: 8 May 2014 20:03
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Optiq

The "Relationships with Other Apache Products" section has been updated to cover Optiq's functional overlaps with existing Apache projects.

https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products

Julian

On May 2, 2014, at 11:23 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:

> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
>
> Thanks,
>
> - Henry
>
> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>> Hi Ashutosh,
>>
>> Since there was a question/comment about relationship with Apache
>> MetaModel, I am asking to update the proposal to include this
>> discussion in either "Relationships with Other Apache Products" or
>> "Alignment" section before going for a VOTE.
>>
>> Apache Slider did the same thing with relation to Apache Twill and
>> Apache Helix projects.
>>
>> Thanks,
>>
>> - Henry
>>
>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:
>>> I would like to propose Optiq as an Apache Incubator project. I
>>> have posted the proposal to
>>> https://wiki.apache.org/incubator/OptiqProposal and posted the text
>>> of the proposal below.
>>>
>>> Ashutosh.
>>>
>>> = Optiq =
>>> == Abstract ==
>>>
>>> Optiq is a framework that allows efficient translation of queries
>>> involving heterogeneous and federated data.
>>>
>>> == Proposal ==
>>>
>>> Optiq is a highly customizable engine for parsing and planning
>>> queries on data in a wide variety of formats. It allows
>>> database-like access, and in particular a SQL interface and advanced
>>> query optimization, for data not residing in a traditional database.
>>>
>>> == Background ==
>>>
>>> Databases were traditionally engineered in a monolithic stack,
>>> providing a data storage format, data processing algorithms, query
>>> parser, query planner, built-in functions, metadata repository and
>>> connectivity layer.
>>> They innovate in some areas but rarely in all.
>>>
>>> Modern data management systems are decomposing that stack into
>>> separate components, separating data, processing engine, metadata,
>>> and query language support. They are highly heterogeneous, with data
>>> in multiple locations and formats, caching and redundant data,
>>> different workloads, and processing occurring in different engines.
>>>
>>> Query planning (sometimes called query optimization) has always been
>>> a key function of a DBMS, because it allows the implementors to
>>> introduce new query-processing algorithms, and allows data
>>> administrators to re-organize the data without affecting
>>> applications built on that data. In a componentized system, the
>>> query planner integrates the components (data formats, engines,
>>> algorithms) without introducing unnecessary coupling or performance
>>> tradeoffs.
>>>
>>> But building a query planner is hard; many systems muddle along
>>> without a planner, and indeed a SQL interface, until the demand from
>>> their customers is overwhelming.
>>>
>>> There is an opportunity to make this process more efficient by
>>> creating a re-usable framework.
>>>
>>> == Rationale ==
>>>
>>> Optiq allows database-like access, and in particular a SQL interface
>>> and advanced query optimization, for data not residing in a
>>> traditional database. It is complementary to many current Hadoop and
>>> NoSQL systems, which have innovative and performant storage and
>>> runtime systems but lack a SQL interface and intelligent query
>>> translation.
>>>
>>> Optiq is already in use by several projects, including Apache Drill,
>>> Apache Hive and Cascading Lingual, and commercial products.
>>>
>>> Optiq's architecture consists of:
>>>
>>> * An extensible relational algebra.
>>> * SPIs (service-provider interfaces) for metadata (schemas and
>>> tables), planner rules, statistics, cost-estimates, user-defined
>>> functions.
>>> * Built-in sets of rules for logical transformations and common
>>> data-sources.
>>> * Two query planning engines driven by rules, statistics, etc. One
>>> engine is cost-based, the other rule-based.
>>> * Optional SQL parser, validator and translator to relational algebra.
>>> * Optional JDBC driver.
>>>
>>> == Initial Goals ==
>>>
>>> The initial goals are to move the existing codebase to Apache and
>>> integrate with the Apache development process. Once this is
>>> accomplished, we plan for incremental development and releases that
>>> follow the Apache guidelines.
>>>
>>> As we move the code into the org.apache namespace, we will
>>> restructure components as necessary to allow clients to use just the
>>> components of Optiq that they need.
>>>
>>> A version 1.0 release, including pre-built binaries, will foster
>>> wider adoption.
>>>
>>> == Current Status ==
>>>
>>> Optiq has had over a dozen minor releases over the last 18 months.
>>> Its core SQL parser and validator, and its planning engine and core
>>> rules, are mature and robust and are the basis for several
>>> production systems; but other components and SPIs are still
>>> undergoing rapid evolution.
>>>
>>> === Meritocracy ===
>>>
>>> We plan to invest in supporting a meritocracy. We will discuss the
>>> requirements in an open forum. We encourage the companies and
>>> projects using Optiq to discuss their requirements in an open forum
>>> and to participate in development. We will encourage and monitor
>>> community participation so that privileges can be extended to those
>>> that contribute.
>>>
>>> Optiq's pluggable architecture encourages developers to contribute
>>> extensions such as adapters for data sources, new planning rules,
>>> and better statistics and cost-estimation functions. We look forward
>>> to fostering a rich ecosystem of extensions.
>>>
>>> === Community ===
>>>
>>> Building a data management system requires a high degree of
>>> technical skill, and correspondingly, the community of developers
>>> directly using Optiq is potentially fairly small, albeit highly
>>> technical and engaged. But we also expect engagement from members of
>>> the communities of projects that use Optiq, such as Drill and Hive.
>>> And we intend to structure Optiq so that it can be used for lighter
>>> weight applications, such as providing a SQL and JDBC interface to a
>>> NoSQL system.
>>>
>>> === Core Developers ===
>>>
>>> The developers on the initial committers list are all experienced
>>> open source developers, and are actively using Optiq in their
>>> projects.
>>>
>>> * Julian Hyde is lead developer of Mondrian, an open source OLAP
>>> engine, and an Apache Drill committer.
>>> * Chris Wensel is lead developer of Cascading, and of Lingual, the
>>> SQL interface to Cascading built using Optiq.
>>> * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.
>>>
>>> In addition, there are several regular contributors whom we hope
>>> will graduate to committers during the incubation process.
>>>
>>> We realize that additional employer diversity is needed, and we will
>>> work aggressively to recruit developers from additional companies.
>>>
>>> === Alignment ===
>>>
>>> Apache, and in particular the ecosystem surrounding Hadoop, contains
>>> several projects for building data management systems that leverage
>>> each other's capabilities. Optiq is a natural fit for that
>>> ecosystem, and will help foster projects meeting new challenges.
>>>
>>> Optiq is already used by Apache Hive and Apache Drill; Optiq embeds
>>> Apache Spark as an optional engine; we are in discussion with Apache
>>> Phoenix about integrating JDBC and query planning.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned Products ===
>>>
>>> Optiq is already a key component in three independent projects, each
>>> backed by a different company, so the risk of being orphaned is
>>> relatively low. We plan to mitigate this risk by recruiting
>>> additional committers, and promoting Optiq's adoption as a framework
>>> by other projects.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> The initial committers are all Apache members, some of whom have
>>> several years in the Apache Hadoop community. The founder of the
>>> project, Julian Hyde, has been a founder and key developer in open
>>> source projects for over ten years.
>>>
>>> === Homogenous Developers ===
>>>
>>> The initial committers are employed by a number of companies,
>>> including Concurrent, Hortonworks, MapR Technologies and
>>> Salesforce.com. We are committed to recruiting additional committers
>>> from outside these companies.
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> Like most open source projects, Optiq receives substantial support
>>> from salaried developers. This is to be expected given that it is a
>>> highly technical framework. However, they are all passionate about
>>> the project, and we are confident that the project will continue
>>> even if no salaried developers contribute to the project. As a
>>> framework, the project encourages the involvement of members of
>>> other projects, and of academic researchers. We are committed to
>>> recruiting additional committers including non-salaried developers.
>>>
>>> === Relationships with Other Apache Products ===
>>>
>>> As mentioned in the Alignment section, Optiq is being used by Apache
>>> Hive and Apache Drill, and has adapters for Apache Phoenix and Apache Spark.
>>> Optiq often operates on data in a Hadoop environment, so
>>> collaboration with other Hadoop projects is desirable and highly
>>> likely.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> Optiq solves a real problem, as evidenced by its take-up by other
>>> projects. This proposal is not for the purpose of generating
>>> publicity. Rather, the primary benefits to joining Apache are those
>>> outlined in the Rationale section.
>>>
>>> == Documentation ==
>>>
>>> Additional documentation for Optiq may be found on its github site:
>>>
>>> * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
>>> * [[https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]]
>>> * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
>>> * [[https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Reference guide]]
>>>
>>> === Presentation: ===
>>>
>>> * [[https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|SQL on Big Data using Optiq]]
>>>
>>> == Initial Source ==
>>>
>>> The initial codebase resides in three projects, all hosted on github:
>>>
>>> * https://github.com/julianhyde/optiq
>>> * https://github.com/julianhyde/optiq-csv
>>> * https://github.com/julianhyde/linq4j
>>>
>>> === Source and Intellectual Property Submission Plan ===
>>>
>>> The initial codebase is already distributed under the Apache 2.0
>>> License. The owners of the IP have indicated willingness to sign the
>>> SGA.
>>>
>>> === External Dependencies ===
>>>
>>> Optiq and Linq4j have the following external dependencies.
>>>
>>> * Java 1.6, 1.7 or 1.8
>>> * Apache Maven, Commons
>>> * JavaCC (BSD license)
>>> * Sqlline 1.1.6 (BSD license)
>>> * Junit 4.11 (EPL)
>>> * Janino (BSD license)
>>> * Guava (Apache 2.0 license)
>>> * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0
>>> license)
>>>
>>> Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
>>> optiq-splunk) are currently developed alongside core Optiq, and have
>>> the following additional dependencies:
>>>
>>> * Open CSV 2.3 (Apache 2.0 license)
>>> * Apache Incubator Spark
>>> * Mongo Java driver (Apache 2.0 license)
>>>
>>> Upon acceptance to the incubator, we would begin a thorough analysis
>>> of all transitive dependencies to verify this information and
>>> introduce license checking into the build and release process by
>>> integrating with Apache Rat.
>>>
>>> === Cryptography ===
>>>
>>> Optiq will eventually support encryption on the wire. This is not
>>> one of the initial goals, and we do not expect Optiq to be a
>>> controlled export item due to the use of encryption.
>>>
>>> == Required Resources ==
>>>
>>> === Mailing Lists ===
>>>
>>> * priv...@optiq.incubator.apache.org
>>> * d...@optiq.incubator.apache.org (will be migrated from optiq-...@googlegroups.com)
>>> * comm...@optiq.incubator.apache.org
>>>
>>> === Source control ===
>>>
>>> The Optiq team would like to use git for source control, due to our
>>> current use of git/github. We request a writeable git repo
>>> git://git.apache.org/incubator-optiq, and mirroring to be set up to
>>> github through INFRA.
>>>
>>> === Issue Tracking ===
>>>
>>> Optiq currently uses the github issue tracking system associated
>>> with its github repo: https://github.com/julianhyde/optiq/issues. We
>>> will migrate to the Apache JIRA:
>>> http://issues.apache.org/jira/browse/OPTIQ.
>>>
>>> == Initial Committers ==
>>>
>>> * Julian Hyde (jhyde at apache dot org)
>>> * Jacques Nadeau (jacques at apache dot org)
>>> * James R. Taylor (jamestaylor at apache dot org)
>>> * Chris Wensel (cwensel at apache dot org)
>>>
>>> === Affiliations ===
>>>
>>> The initial committers are employees of Concurrent, Hortonworks,
>>> MapR and Salesforce.com.
>>>
>>> * Julian Hyde (Hortonworks)
>>> * Jacques Nadeau (MapR Technologies)
>>> * James R. Taylor (Salesforce.com)
>>> * Chris Wensel (Concurrent)
>>>
>>> == Sponsors ==
>>>
>>> === Champion ===
>>>
>>> * Ashutosh Chauhan (hashutosh at apache dot org)
>>>
>>> === Nominated Mentors ===
>>>
>>> * Ted Dunning (tdunning at apache dot org) - Chief Application
>>> Architect at MapR Technologies; committer for Lucene, Mahout and
>>> ZooKeeper.
>>> * Alan Gates (gates at apache dot org) - Architect at Hortonworks;
>>> committer for Pig, Hive and others.
>>> * Steven Noels (stevenn at apache dot org) - Chief Technical Officer
>>> at NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.
>>>
>>> === Sponsoring Entity ===
>>>
>>> The Apache Incubator.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org