Re: [VOTE] Graduate Lens from the Incubator
+1 (binding)

On Sat, Jul 25, 2015 at 3:49 AM, Jakob Homan jgho...@gmail.com wrote:

Following two positive discussions[1][2] about its current status, the Lens community has voted[3] to graduate from the Incubator. The vote passed with 22 +1s:

Binding +1 x 14: {Jakob, Jean-Baptiste, Yash, Amareshwari, Sharad, Raghavendra, Raju, Jaideep, Suma, Himanshu, Rajat, Srikanth, Chris, Arshad}
Non-binding +1 x 8: {Jothi, Kartheek, Tushar, Nitin, Pranav, Deepak, Ajay, Naresh}

The Lens community has:

* completed all required paperwork: https://incubator.apache.org/projects/lens.html
* completed multiple releases (2.0.1-beta, 2.1.0-beta, 2.2.0-beta)
* completed the name check procedure: https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-63
* opened nearly 700 JIRAs: https://issues.apache.org/jira/issues/?jql=project%20%3D%20LENS
* voted in multiple new committers/PPMC members
* been recommended as ready to graduate by the Incubator's shepherds: https://wiki.apache.org/incubator/July2015

Therefore, I'm calling a VOTE to graduate Lens with the following Board resolution. The VOTE will run 96 hours (an extra day since we're starting on a Friday), ending Tuesday, July 28, 4 PM PST.

[ ] +1 Graduate Apache Lens from the Incubator.
[ ] +0 Don't care.
[ ] -1 Don't graduate Apache Lens from the Incubator because ...

Here's my binding vote: +1.

-Jakob

[1] http://s.apache.org/LensGradDiscuss1
[2] http://s.apache.org/LensGradDiscuss2
[3] http://s.apache.org/LensGradVotePPMC

Apache Lens graduation resolution draft

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to unified analytics across multiple tiered data stores.
NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Lens Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Lens Project be and hereby is responsible for the creation and maintenance of software related to unified analytics across multiple tiered data stores; and be it further

RESOLVED, that the office of Vice President, Apache Lens be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Lens Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Lens Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Lens Project:

* Amareshwari Sriramadasu amareshwari at apache dot org
* Arshad Matin arshadmatin at apache dot org
* Gunther Hagleitner gunther at apache dot org
* Himanshu Gahlaut himanshugahlaut at apache dot org
* Jaideep Dhok jdhok at apache dot org
* Jean Baptiste Onofre jbonofre at apache dot org
* Raghavendra Singh raghavsingh at apache dot org
* Rajat Khandelwal prongs at apache dot org
* Raju Bairishetti raju at apache dot org
* Sharad Agarwal sharad at apache dot org
* Sreekanth Ramakrishnan sreekanth at apache dot org
* Srikanth Sundarrajan sriksun at apache dot org
* Suma Shivaprasad sumasai at apache dot org
* Vikram Dixit vikram at apache dot org
* Vinod Kumar Vavilapalli vinodkv at apache dot org
* Yash Sharma yash at apache dot org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Amareshwari Sriramadasu be appointed to the office of Vice President, Apache Lens, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further
RESOLVED, that the Apache Lens Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Lens podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Incubator Lens podling encumbered upon the Apache Incubator Project are hereafter discharged.
Re: [VOTE] Release of Apache Lens 2.2.0-beta-incubating
+1 (binding)

On Sun, Jul 12, 2015 at 8:39 AM, Jaideep Dhok jaideep.d...@inmobi.com wrote:

Hello everyone,

This is the call for vote for the following RC to be released as the official Apache Lens 2.2.0-beta-incubating release. This is our third release.

Apache Lens provides a Unified Analytics interface. Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for the analytical query. It seamlessly integrates Hadoop with traditional data warehouses to appear like one.

Vote on dev list: http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201507.mbox/%3CCAPYoVThzQCHdYVASR35zeYqHj_tWo93GuzTLzCrmRAq3qMjecg%40mail.gmail.com%3E

Result of vote on dev list: http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201507.mbox/%3CCAPYoVThOEAeMiNdtef%3D35QxRLetryRFKs3ED-oeCh2xi1KEqww%40mail.gmail.com%3E

The commit id is 9c45f1cb4c69ec5de6fe3320abdd5bd85c250e9f:
https://git-wip-us.apache.org/repos/asf/incubator-lens/repo?p=incubator-lens.git;a=commit;h=9c45f1cb4c69ec5de6fe3320abdd5bd85c250e9f

This corresponds to the tag: apache-lens-2.2.0-beta-incubating
https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=tag;h=refs/tags/apache-lens-2.2.0-beta-incubating

The release archives (tar.gz/.zip), signature, and checksums are here:
https://dist.apache.org/repos/dist/dev/incubator/lens/apache-lens-2.2.0-beta-incubating-rc0/

You can find the KEYS file here:
https://dist.apache.org/repos/dist/release/incubator/lens/KEYS

The release candidate consists of the following source distribution archive: apache-lens-2.2.0-beta-incubating-source-release.zip

In addition, the following supplementary binary distributions are provided for user convenience at the same location: apache-lens-2.2.0-beta-incubating-bin.tar.gz

The licensing of bundled bits in the archives has not changed from the previous release, and is documented at https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens

The Nexus Staging URL: https://repository.apache.org/content/repositories/orgapachelens-1005

Release notes available at https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329586&projectId=12315923

Vote will be open for at least 72 hours.

[ ] +1 approve
[ ] 0 no opinion
[ ] -1 disapprove (and reason why)

+1 from my side for the release.

Thanks,
Jaideep Dhok
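The release emails above point voters at the published signatures and checksums for the artifacts. As a minimal illustrative sketch of the checksum half of that verification (the file name and the choice of SHA-256 are assumptions for the example, not details of the Lens release process), checking a downloaded archive against a published digest might look like:

```python
import hashlib

def sha256_of(path, chunk_size=8192):
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large release archives don't load into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksum(artifact_path, expected_hex):
    """Compare the artifact's computed digest against the published checksum."""
    return sha256_of(artifact_path) == expected_hex.strip().lower()
```

Signature verification (against the KEYS file) would be done separately with GPG; this sketch only covers the checksum comparison.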
Re: [VOTE] Release of Apache Lens 2.1.0-beta-incubating
+1 (binding)

On Thu, Apr 30, 2015 at 5:35 PM, Amareshwari Sriramadasu amareshw...@apache.org wrote:

Hello everyone,

This is the call for vote for the following RC to be released as the official Apache Lens 2.1.0-beta-incubating release. This is our second release.

Apache Lens provides a Unified Analytics interface. Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for the analytical query. It seamlessly integrates Hadoop with traditional data warehouses to appear like one.

Vote on dev list: http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201504.mbox/%3CCABJEuZfT4HDK3c4rKxPg0_Kkc8KDfRjUr%2BHmKaJH44H77OeU0g%40mail.gmail.com%3E

Results of vote on dev list: http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201504.mbox/%3CCABJEuZe7rbjbwoiiOWKL8Lef%3Dsc%2BXcV173aiQ6Tpdwq7jz9ycQ%40mail.gmail.com%3E

The commit id is fdd19b9c2b17e329465cbde62dbce6f8be435cec:
https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=commit;h=fdd19b9c2b17e329465cbde62dbce6f8be435cec

This corresponds to the tag: apache-lens-2.1.0-beta-incubating
https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=tag;h=refs/tags/apache-lens-2.1.0-beta-incubating

The release archives (tar.gz/.zip), signature, and checksums are here:
https://dist.apache.org/repos/dist/dev/incubator/lens/apache-lens-2.1.0-beta-incubating-rc0

You can find the KEYS file here:
https://dist.apache.org/repos/dist/release/incubator/lens/KEYS

The release candidate consists of the following source distribution archive: apache-lens-2.1.0-beta-incubating-source-release.zip

In addition, the following supplementary binary distributions are provided for user convenience at the same location: apache-lens-2.1.0-beta-incubating-bin.tar.gz

The licensing of bundled bits in the archives has not changed from the previous release, and is documented at https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens

The Nexus Staging URL: https://repository.apache.org/content/repositories/orgapachelens-1003

Release notes available at https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315923&version=12328991

Vote will be open for at least 72 hours. Please vote on releasing this RC:

[ ] +1 approve
[ ] 0 no opinion
[ ] -1 disapprove (and reason why)

Thanks,
Amareshwari
Re: [VOTE] Accept Apache Atlas into Apache Incubator
+1 (binding)

On Fri, May 1, 2015 at 12:56 PM, Seetharam Venkatesh venkat...@innerzeal.com wrote:

Hello folks,

Following the discussion earlier in the thread http://s.apache.org/r2, I would like to call a VOTE for accepting Apache Atlas as a new incubator project. The proposal is available at: https://wiki.apache.org/incubator/AtlasProposal Also, the text of the latest wiki proposal is included at the bottom of this email.

The VOTE is open for at least the next 72 hours:

[ ] +1 accept Apache Atlas into the Apache Incubator
[ ] ±0 Abstain
[ ] -1 because...

Of course I am +1! (non-binding)

Thanks!

= Apache Atlas Proposal =

== Abstract ==

Apache Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the complete enterprise data ecosystem.

== Proposal ==

Apache Atlas allows agnostic governance visibility into Hadoop. These capabilities are enabled through a set of core foundational services powered by a flexible metadata repository. These services include:

* Search and lineage for datasets
* Metadata-driven data access control
* Indexed and searchable centralized auditing of operational events
* Data lifecycle management – ingestion to disposition
* Metadata interchange with other metadata tools

== Background ==

Hadoop is one of many platforms in the modern enterprise data ecosystem and requires governance controls commensurate with this reality. Currently, there is no easy or complete way to provide comprehensive visibility and control into Hadoop audit, lineage, and security for workflows that require both Hadoop and non-Hadoop processing. Most existing solutions are point-based and require a monolithic application workflow. Multi-tenancy and concurrency are problematic, as these offerings are not aware of activity outside of their narrow focus.
As Hadoop gains greater popularity, governance concerns will become increasingly vital to maturing the platform and furthering adoption. In particular, they are a barrier to expanding the enterprise data under management.

== Rationale ==

Atlas will address the issues discussed above by providing governance capabilities in Hadoop, using both a prescriptive and a forensic model enriched by business taxonomical metadata. Atlas, at its core, is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, enabling governance controls that are truly platform agnostic and that effectively (and defensibly) address compliance concerns. Initially working with a group of leading partners in several industries, Atlas is built to solve specific real-world governance problems, which accelerates product maturity and time to value. Atlas aims to grow a community to help build a widely adopted pattern for governance, metadata modeling, and exchange in Hadoop, which will advance the interests of the whole community.

== Current Status ==

An initial version with a valuable set of features has been developed by the list of initial committers and is hosted on GitHub.

=== Meritocracy ===

Our intent with this proposal is to start building a diverse developer community around Atlas following the Apache meritocracy model. We have wanted to make the project open source and encourage contributors from multiple organizations from the start. We plan to provide plenty of support to new developers and to quickly recruit those who make solid contributions to committer status.

=== Community ===

We are happy to report that the initial team already represents multiple organizations. We hope to extend the user and developer base further in the future and build a solid open source community around Atlas.

=== Core Developers ===

Atlas development is currently being led by engineers from Hortonworks – Harish Butani, Venkatesh Seetharam, Shwetha G S, and Jon Maron.
All the engineers have deep expertise in Hadoop and are quite familiar with the Hadoop ecosystem.

=== Alignment ===

The ASF is a natural host for Atlas given that it is already the home of Hadoop, Falcon, Hive, Pig, Oozie, Knox, Ranger, and other emerging "big data" software projects. Atlas has been designed to solve the data governance challenges and opportunities of the Hadoop ecosystem family of products, as well as integration with the traditional Enterprise Data ecosystem. Atlas fills the gap that the Hadoop ecosystem has been lacking in the areas of data governance and compliance management.

== Known Risks ==

=== Orphaned Products / Reliance on Salaried Developers ===

The core developers plan to work full time on the project, so there is very little risk of Atlas getting orphaned. A prototype of Atlas is in use and being actively developed by several companies that have a vested interest in its continued vitality and adoption.

=== Inexperience with
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (non-binding)

On Fri, Dec 19, 2014 at 10:59 AM, Roman Shaposhnik r...@apache.org wrote:

Following the discussion earlier (http://s.apache.org/kTp), I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email.

Vote is open until at least Sunday, 21st December 2014, 23:59:00 PST

[ ] +1 Accept Zeppelin into the Incubator
[ ] ±0 Indifferent to the acceptance of Zeppelin
[ ] -1 Do not accept Zeppelin because ...

Thanks,
Roman.

== Abstract ==

Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc.

== Proposal ==

Zeppelin is a modern web-based tool for data scientists to collaborate on large-scale data exploration and visualization projects. It is a notebook-style interpreter that enables collaborative analysis sessions to be shared between users. Zeppelin is independent of the execution framework itself. The current version runs on top of Apache Spark, but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date, e.g. Apache Flink and Crunch, as well as SQL-like backends such as Hive, Tajo, and MRQL.

We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: "Mir", "Yuga" or "Sora".

== Background ==

A large-scale data analysis workflow includes multiple steps, like data acquisition, pre-processing, and visualization, and may require the inter-operation of multiple different tools and technologies. With the widespread adoption of open source general-purpose data processing systems like Spark, there is a lack of open source, modern, user-friendly tools that combine the strengths of an interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities.
Zeppelin initially started as a GUI tool for a diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It has been open source since its inception in Sep 2013. Later, it became clear that there was a need for a broader web-based tool for data scientists to collaborate on data exploration over large-scale projects, not limited to SQL. So Zeppelin integrated full support for Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser.

== Rationale ==

There are no open source alternatives for a collaborative notebook-based interpreter with support for multiple distributed data processing systems. As the number of companies adopting and contributing back to Zeppelin grows, we think that a long-term home at the Apache foundation would be a great fit for the project, ensuring that processes and procedures are in place to keep the project and community "healthy" and free of any commercial, political or legal faults.

== Initial Goals ==

The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issue tracker and a Jenkins CI, as mentioned in the "Required Resources" section of the current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines.

To increase adoption, the major goal for the project would be to provide integration with as many projects from the Apache data ecosystem as possible, including new interpreters for Apache Hive and Apache Drill, and adding a Zeppelin distribution to Apache Bigtop. On the community-building side, the main goal is to attract a diverse set of contributors by promoting Zeppelin to a wide variety of engineers, starting Zeppelin user groups around the globe, and engaging with other existing Apache project communities online.
== Current Status ==

Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe, mentioned in the Affiliation section. The current implementation status is pre-release, with the public API not yet finalized. The current main and default backend processing engine is Apache Spark, with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, the application itself, a set of libraries, and startup/shutdown scripts. No platform-specific installation packages are provided yet, but this is something we are looking to provide as part of the Apache Bigtop integration. The project codebase is currently hosted at github.com, which will form the basis of the Apache git repository.

=== Meritocracy ===

Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handful of people and now it has multiple contributors,
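The pluggable-interpreter design described in the proposal can be pictured with a small sketch. This is not Zeppelin's actual API: the class and method names below are hypothetical, and the sketch only models the core idea of routing each notebook paragraph to whichever backend interpreter is registered for its directive.

```python
class Interpreter:
    """Base class: each backend (Spark, Hive, ...) would implement interpret()."""
    def interpret(self, paragraph: str) -> str:
        raise NotImplementedError

class EchoInterpreter(Interpreter):
    """Trivial stand-in backend that just upper-cases its input."""
    def interpret(self, paragraph: str) -> str:
        return paragraph.upper()

class Notebook:
    """Routes a cell like '%sql select 1' to the interpreter registered
    under the name following the '%' directive."""
    def __init__(self):
        self._registry = {}

    def register(self, name: str, interp: Interpreter) -> None:
        self._registry[name] = interp

    def run(self, cell: str) -> str:
        if not cell.startswith("%"):
            raise ValueError("cell must start with an %interpreter directive")
        name, _, body = cell[1:].partition(" ")
        return self._registry[name].interpret(body)
```

For example, after `nb.register("echo", EchoInterpreter())`, running `nb.run("%echo hello")` dispatches the paragraph body to that backend. New execution frameworks plug in by registering another `Interpreter` subclass, which is the extensibility the proposal describes.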
[RESULT] [VOTE] Accept Lens into the Apache Incubator (earlier called Grill)
The vote has passed with 9 binding +1s, 5 non-binding +1s, and no 0s or -1s.

Binding +1s:
Jean Baptiste
Jan I
Alan D Cabrera
Jakob Homan
Chris Douglas
Roman Shaposhnik
Joe Brockmeier
Vinod K V
Suresh Srinivas

Non-binding +1s:
Sharad Agarwal
Amareshwari S
Seetharam Venkatesh
Srikanth Sundarrajan
Ashish

Thanks everyone for voting. We will proceed with the next steps as per the IPMC guidelines.

Thanks
Sharad

On Mon, Oct 6, 2014 at 5:21 PM, Sharad Agarwal sha...@apache.org wrote:

Following the discussion earlier in the thread https://www.mail-archive.com/general@incubator.apache.org/msg45208.html, I would like to call a Vote for accepting Lens as a new incubator project. The proposal is available at: https://wiki.apache.org/incubator/LensProposal

Vote is open till Oct 09, 2014 4 PM PST.

[ ] +1 accept Lens in the Incubator
[ ] +/-0
[ ] -1 because...

Only Votes from Incubator PMC members are binding, but all are welcome to express their thoughts. I am +1 (non-binding).

Thanks
Sharad
[VOTE] Accept Lens into the Apache Incubator (earlier called Grill)
Following the discussion earlier in the thread https://www.mail-archive.com/general@incubator.apache.org/msg45208.html, I would like to call a Vote for accepting Lens as a new incubator project. The proposal is available at: https://wiki.apache.org/incubator/LensProposal

Vote is open till Oct 09, 2014 4 PM PST.

[ ] +1 accept Lens in the Incubator
[ ] +/-0
[ ] -1 because...

Only Votes from Incubator PMC members are binding, but all are welcome to express their thoughts. I am +1 (non-binding).

Thanks
Sharad
Re: [PROPOSAL] Grill as new Incubator project
The discussion seems to have settled down. I will start the vote thread on Lens shortly.
Re: [PROPOSAL] Grill as new Incubator project
Lens has a functional test suite that includes cube DDLs, queries, test data, scripts, etc., which requires standard build and test infra.

On Sep 27, 2014 3:45 AM, David Nalley da...@gnsa.us wrote:

currently employed by SoftwareAG. Raghavendra Singh from InMobi has built the QA automation for Grill. What kind of QA environment does Grill/Lens have currently? How much do you expect to need going forward?

--David
Re: [PROPOSAL] Grill as new Incubator project
Thanks Ted. We have renamed the proposal to Lens. The proposal is pasted here - https://wiki.apache.org/incubator/LensProposal

Thanks
Sharad

On Tue, Sep 23, 2014 at 12:04 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Both Lens and Blend are nice names. Nice connotations as well. I am slightly stunned by a quick search on the name Lens. I only found one software package with that name (and it was for lens calibration, so far from databases). A name like that is usually massively overused. This might be a really nice opportunity to get a nice one-syllable name.

On Mon, Sep 22, 2014 at 2:59 AM, Sharad Agarwal sha...@apache.org wrote:

Based on the feedback, we are considering renaming the project. Please provide feedback on the following names:

Apache Lens
Apache Blend

Thanks,
Sharad
Re: [PROPOSAL] Grill as new Incubator project
Based on the feedback, we are considering renaming the project. Please provide feedback on the following names:

Apache Lens
Apache Blend

Thanks,
Sharad
Re: [PROPOSAL] Grill as new Incubator project
Chris,

Thanks for your comments. The differences that I see are:

- SciDB exposes an Array Data model and Array Query Language (AQL). The Grill data model is based on OLAP Facts and Dimensions. Grill exposes an SQL-like language (a subset of HiveQL) that works on *logical* entities (facts, dimensions).
- The goal of Grill is not to build a new query execution database, but to unify existing ones by having a central metadata catalog, and to provide a Cube abstraction layer on top of it.

Thanks,
Sharad

On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

This sounds super cool! How does this relate to SciDB? Is it trying to do a similar thing?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-----Original Message-----
From: Sharad Agarwal sha...@apache.org
Reply-To: general@incubator.apache.org, sha...@apache.org
Date: Thursday, September 18, 2014 8:54 PM
To: general@incubator.apache.org
Subject: [PROPOSAL] Grill as new Incubator project

Grill Proposal
==

# Abstract

Grill is a platform that enables multi-dimensional queries in a unified way over datasets stored in multiple warehouses. Grill integrates Apache Hive with other data warehouses by tiering them together to form logical data cubes.

# Proposal

Grill provides a unified Cube abstraction for data stored in different stores. Grill tiers multiple data warehouses for unified representation and efficient access. It provides an SQL-like Cube query language to query and describe data sets organized in data cubes.
It enables users to run queries against Facts and Dimensions that can span multiple physical tables stored in different stores. The primary use cases that Grill aims to solve:

- Facilitate analytical queries by providing the OLAP-like Cube abstraction
- Data discovery, by providing a single metadata layer for data stored in different stores
- Unified access to data, by integrating Hive with other traditional data warehouses

# Background

Apache Hive is a data warehouse that facilitates querying and managing large datasets stored in distributed storage systems like HDFS. It provides an SQL-like language called HiveQL (aka HQL). Apache Hive is a widely used platform in various organizations for ad hoc analytical queries. In a typical data warehouse scenario, the data is multi-dimensional and organized into Facts and Dimensions to form Data Cubes. Grill provides this logical layer to enable querying and managing data as Cubes. The Grill project is actively being developed at InMobi to provide this higher level of analytical abstraction to seamlessly query data stored in different storages, including Hive and beyond.

# Rationale

The Grill project aims to ease analytical querying and cut data silos by providing a single view of data across multiple data stores. Conceiving data as a cube with hierarchical dimensions leads to conceptually straightforward operations that facilitate analysis. Integrating Apache Hive with other traditional warehouses provides the opportunity to optimize query execution cost by tiering the data across multiple warehouses.

Grill provides:

- Access to data Cubes via a Cube Query language similar to HiveQL.
- A driver-based architecture to allow plugging in systems like Hive and other warehouses such as columnar RDBMSs.
- Cost-based engine selection that provides optimal use of resources by selecting the best execution engine for a given query.
In a typical data warehouse, data is organized in Cubes with multiple dimensions and measures. This facilitates analysis by conceiving the data in terms of Facts and Dimensions instead of physical tables. Grill aims to provide this logical Cube abstraction on data warehouses like Hive and other traditional warehouses.

# Initial Goals

- Donate the Grill source code and documentation to the Apache Software Foundation
- Build a user and developer community
- Support Hive and other columnar data warehouses
- Support full query life cycle management
- Add authentication for querying cubes
- Provide detailed query statistics

# Long Term Goals

Here are some longer-term capabilities that would be added to Grill:

- Add authorization for managing and querying Cubes
- Provide REST and CLI for full Admin
Re: [PROPOSAL] Grill as new Incubator project
Chris,

Multi-dimensional here is in the context of the OLAP cube - http://en.wikipedia.org/wiki/OLAP_cube. The Grill data model consists of a set of measures which can be analysed along different dimensions. For remote sensing, data can be modelled as a cube: measurements on various sets of attributes (dimensions) as Facts, with time and space thought of as dimensions. Yes, it supports numerical data.

Ted,

Both are in the same general area, but I think there is very little chance of confusion, as their propositions are clearly completely different. And both words are simple and widely used nouns. We liked the name Grill as it is simple to spell and pronounce, and in some way conveys the project's meaning: to question intensely.

Thanks,
Sharad

On Sat, Sep 20, 2014 at 12:11 AM, Ted Dunning ted.dunn...@gmail.com wrote:

There is a strong phonetic similarity to Apache Drill, a project in the same general domain. Is the Grill name already baked in (pun intended)?

On Fri, Sep 19, 2014 at 7:24 AM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

Thank you Sharad. So I could use this system for remote sensing data, like 3-dimension (time, space, and measurement) type of cubes? Does it support numerical data well? Sorry for so many questions, just excited :)

-----Original Message-----
From: Sharad Agarwal sha...@apache.org
Reply-To: sha...@apache.org
Date: Friday, September 19, 2014 4:06 AM
To: Chris Mattmann chris.a.mattm...@jpl.nasa.gov
Cc: general@incubator.apache.org
Subject: Re: [PROPOSAL] Grill as new Incubator project

Chris,

Thanks for your comments. The differences that I see are:

- SciDB exposes an Array Data model and Array Query Language (AQL). The Grill data model is based on OLAP Facts and Dimensions. Grill exposes an SQL-like language (a subset of HiveQL) that works on *logical* entities (facts, dimensions).
- The goal of Grill is not to build a new query execution database, but to unify existing ones by having a central metadata catalog, and to provide a Cube abstraction layer on top of it.

Thanks,
Sharad

On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

This sounds super cool! How does this relate to SciDB? Is it trying to do a similar thing?

Cheers,
Chris

-----Original Message-----
From: Sharad Agarwal sha...@apache.org
Reply-To: general@incubator.apache.org, sha...@apache.org
Date: Thursday, September 18, 2014 8:54 PM
To: general@incubator.apache.org
Subject: [PROPOSAL] Grill as new Incubator project

Grill Proposal
==

# Abstract

Grill is a platform that enables multi-dimensional queries in a unified way over datasets stored in multiple warehouses. Grill integrates Apache Hive with other data warehouses by tiering them together to form logical data cubes.

# Proposal

Grill provides a unified Cube abstraction for data stored in different stores. Grill tiers multiple data warehouses for unified representation and efficient access. It provides an SQL-like Cube query language to query and describe data sets organized in data cubes. It enables users to run queries against Facts and Dimensions that can span multiple physical tables stored in different stores. The primary use cases that Grill aims to solve:

- Facilitate analytical queries by providing the OLAP-like Cube abstraction
- Data discovery, by providing a single metadata layer for data stored in different stores
- Unified access to data, by integrating Hive with other traditional data warehouses
[PROPOSAL] Grill as new Incubator project
Grill Proposal
==

# Abstract

Grill is a platform that enables multi-dimensional queries in a unified way over datasets stored in multiple warehouses. Grill integrates Apache Hive with other data warehouses by tiering them together to form logical data cubes.

# Proposal

Grill provides a unified Cube abstraction for data stored in different stores. Grill tiers multiple data warehouses for unified representation and efficient access. It provides a SQL-like Cube query language to query and describe data sets organized in data cubes. It enables users to run queries against Facts and Dimensions that can span multiple physical tables stored in different stores. The primary use cases that Grill aims to solve:

- Facilitate analytical queries by providing an OLAP-like Cube abstraction
- Data discovery, by providing a single metadata layer for data stored in different stores
- Unified access to data, by integrating Hive with other traditional data warehouses

# Background

Apache Hive is a data warehouse that facilitates querying and managing large datasets stored in distributed storage systems like HDFS. It provides a SQL-like language called HiveQL (aka HQL). Apache Hive is widely used in various organizations for ad-hoc analytical queries. In a typical data warehouse scenario, the data is multi-dimensional and organized into Facts and Dimensions to form data cubes. Grill provides this logical layer to enable querying and managing data as Cubes. The Grill project is actively being developed at InMobi to provide a higher level of analytical abstraction for seamlessly querying data stored in different storage systems, including Hive and beyond.

# Rationale

The Grill project aims to ease analytical querying and cut across data silos by providing a single view of data across multiple data stores. Conceiving data as a cube with hierarchical dimensions leads to conceptually straightforward operations that facilitate analysis.
Integrating Apache Hive with other traditional warehouses provides the opportunity to optimize query execution cost by tiering the data across multiple warehouses. Grill provides:

- Access to data Cubes via a Cube query language similar to HiveQL.
- A driver-based architecture that allows plugging in systems like Hive and other warehouses such as columnar RDBMSs.
- Cost-based engine selection, which makes optimal use of resources by selecting the best execution engine for a given query.

In a typical data warehouse, data is organized into Cubes with multiple dimensions and measures. This facilitates analysis by conceiving the data in terms of Facts and Dimensions instead of physical tables. Grill aims to provide this logical Cube abstraction on top of data warehouses like Hive and other traditional warehouses.

# Initial Goals

- Donate the Grill source code and documentation to the Apache Software Foundation
- Build a user and developer community
- Support Hive and other columnar data warehouses
- Support full query lifecycle management
- Add authentication for querying cubes
- Provide detailed query statistics

# Long Term Goals

Here are some longer-term capabilities that would be added to Grill:

- Add authorization for managing and querying Cubes
- Provide REST and CLI interfaces for full admin controls
- Capability to schedule queries
- Query caching
- Integrate with Apache Spark, creating Spark RDDs from Grill queries
- Integrate with Apache Optiq

# Current Status

The project is actively developed at InMobi. The first version was deployed at InMobi 4 months ago. This version allows querying dimension and fact data stored in Hive over the CLI. The source code and documentation are hosted on GitHub.

## Meritocracy

We intend to build a diverse developer and user community for the project following the Apache meritocracy model. We want to encourage contributors from multiple organizations, provide plenty of support to new developers, and welcome them to become committers.
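The cost-based engine selection mentioned above can be illustrated with a toy sketch: each pluggable driver offers a cost estimate for a query, and the cheapest one wins. The driver names and cost numbers below are made up for illustration; Grill's real cost model is not shown here.

```python
# Toy sketch of cost-based driver selection. The estimates are
# hypothetical -- real systems would use statistics, not substring checks.

def pick_driver(query, drivers):
    """Return the driver with the lowest estimated cost for the query."""
    return min(drivers, key=lambda d: d["estimate"](query))

drivers = [
    # Hive is assumed cheap for big scans, expensive for small lookups.
    {"name": "hive",  "estimate": lambda q: 100 if "large_fact" in q else 40},
    # A columnar RDBMS is assumed cheap for small dimension lookups.
    {"name": "rdbms", "estimate": lambda q: 5 if "small_dim" in q else 500},
]

print(pick_driver("select * from small_dim", drivers)["name"])   # rdbms
print(pick_driver("select * from large_fact", drivers)["name"])  # hive
```

The design point being illustrated: the caller submits one logical query, and routing to an execution engine is decided per query, not configured per table.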
## Community

Currently the project is being developed at InMobi. We hope to extend our contributor and user base significantly in the future and build a solid open source community around Grill.

Core Developers

Grill is currently being developed by Amareshwari Sriramadasu, Sharad Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan, who is currently employed by Software AG. Raghavendra Singh from InMobi has built the QA automation for Grill.

## Alignment

The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache Hive, Apache Spark and other emerging projects in the Big Data space. We believe that in any enterprise multiple data warehouses will co-exist, as not all workloads are cost-effective to run on a single one. Apache Hive is one of the crucial data warehouses, along with upcoming projects like Apache Spark, in the Hadoop ecosystem. Grill will benefit from working in close proximity with these projects. Traditional columnar data warehouses complement Apache Hive, as certain workloads continue to be cost
Re: [VOTE] Argus as a new incubator project
+1 (non-binding) On Mon, Jul 21, 2014 at 9:33 PM, Owen O'Malley omal...@apache.org wrote: Following the discussion earlier, I'm calling a vote to accept Argus as a new Incubator project. The proposal draft is available at: https://wiki.apache.org/incubator/ArgusProposal, and is also included below. The vote is open for 72h and closes on 24 July 2014 at 10am PST.

[ ] +1 accept Argus in the Incubator
[ ] +/-0
[ ] -1 because...

I'm +1. .. Owen
Re: [VOTE] Accept Optiq into the incubator
+1 (non-binding) On Fri, May 9, 2014 at 11:33 PM, Ashutosh Chauhan hashut...@apache.org wrote: Based on the results of the discussion thread ( http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E ), I would like to call a vote on accepting Optiq into the incubator.

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Optiq
[ ] -1 Do not accept Optiq because ...

The vote will be open until Tuesday May 13 18:00 UTC. https://wiki.apache.org/incubator/OptiqProposal

= Optiq =

== Abstract ==

Optiq is a framework that allows efficient translation of queries involving heterogeneous and federated data.

== Proposal ==

Optiq is a highly customizable engine for parsing and planning queries on data in a wide variety of formats. It allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database.

== Background ==

Databases were traditionally engineered as a monolithic stack, providing a data storage format, data processing algorithms, query parser, query planner, built-in functions, metadata repository and connectivity layer. They innovate in some areas but rarely in all. Modern data management systems are decomposing that stack into separate components, separating data, processing engine, metadata, and query language support. They are highly heterogeneous, with data in multiple locations and formats, caching and redundant data, different workloads, and processing occurring in different engines. Query planning (sometimes called query optimization) has always been a key function of a DBMS, because it allows the implementors to introduce new query-processing algorithms, and allows data administrators to re-organize the data without affecting applications built on that data.
In a componentized system, the query planner integrates the components (data formats, engines, algorithms) without introducing unnecessary coupling or performance tradeoffs. But building a query planner is hard; many systems muddle along without a planner, and indeed without a SQL interface, until the demand from their customers is overwhelming. There is an opportunity to make this process more efficient by creating a re-usable framework.

== Rationale ==

Optiq allows database-like access, and in particular a SQL interface and advanced query optimization, for data not residing in a traditional database. It is complementary to many current Hadoop and NoSQL systems, which have innovative and performant storage and runtime systems but lack a SQL interface and intelligent query translation. Optiq is already in use by several projects, including Apache Drill, Apache Hive and Cascading Lingual, and commercial products.

Optiq's architecture consists of:

* An extensible relational algebra.
* SPIs (service-provider interfaces) for metadata (schemas and tables), planner rules, statistics, cost estimates, and user-defined functions.
* Built-in sets of rules for logical transformations and common data sources.
* Two query planning engines driven by rules, statistics, etc. One engine is cost-based, the other rule-based.
* Optional SQL parser, validator and translator to relational algebra.
* Optional JDBC driver.

== Initial Goals ==

The initial goals are to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. As we move the code into the org.apache namespace, we will restructure components as necessary to allow clients to use just the components of Optiq that they need. A version 1.0 release, including pre-built binaries, will foster wider adoption.

== Current Status ==

Optiq has had over a dozen minor releases over the last 18 months.
Its core SQL parser and validator, and its planning engine and core rules, are mature and robust, and are the basis for several production systems; but other components and SPIs are still undergoing rapid evolution.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. We encourage the companies and projects using Optiq to discuss their requirements in an open forum and to participate in development. We will encourage and monitor community participation so that privileges can be extended to those that contribute. Optiq's pluggable architecture encourages developers to contribute extensions such as adapters for data sources, new planning rules, and better statistics and cost-estimation functions. We look forward to fostering a rich ecosystem of extensions.

=== Community ===

Building a data management system requires a high degree of technical skill, and correspondingly, the community of developers directly using
Re: [VOTE] Accept Storm into the Incubator
+1 (non-binding) On Fri, Sep 13, 2013 at 12:49 AM, Doug Cutting cutt...@apache.org wrote: Discussion about the Storm proposal has subsided, with the issues raised now seemingly resolved. I'd like to call a vote to accept Storm as a new Incubator podling. The proposal is included below and is also at: https://wiki.apache.org/incubator/StormProposal Let's keep the vote open for four working days, until 18 September.

[ ] +1 Accept Storm into the Incubator
[ ] +0 Don't care.
[ ] -1 Don't accept Storm because...

Doug

= Storm Proposal =

== Abstract ==

Storm is a distributed, fault-tolerant, and high-performance realtime computation system that provides strong guarantees on the processing of data.

== Proposal ==

Storm is a distributed real-time computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation. Its use cases span stream processing, distributed RPC, continuous computation, and more. Storm has become a preferred technology for near-realtime big-data processing in many organizations worldwide (see a partial list at https://github.com/nathanmarz/storm/wiki/Powered-By). As an open source project, Storm's developer community has grown rapidly to 46 members.

== Background ==

The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem. Storm fills that hole. Storm was initially developed and deployed at BackType in 2011. After 7 months of development, BackType was acquired by Twitter in July 2011. Storm was open sourced in September 2011. Storm has been under continuous development in its GitHub repository since being open-sourced.
It has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many minor ones.

== Rationale ==

Storm is a general platform for low-latency big-data processing. It is complementary to existing Apache projects such as Hadoop. Many applications are actually exploring using both Hadoop and Storm for big-data processing. Bringing Storm into Apache is very beneficial to both the Apache community and the Storm community. The rapid growth of the Storm community has been empowered by open source. We believe the Apache foundation is a great fit as the long-term home for Storm, as it provides an established process for community-driven development and decision making by consensus. This is exactly the model we want for future Storm development.

== Initial Goals ==

* Move the existing codebase to Apache
* Integrate with the Apache development process
* Ensure all dependencies are compliant with Apache License version 2.0
* Incremental development and releases per Apache guidelines

== Current Status ==

Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many minor ones. Storm 0.9 is about to be released. Storm is being used in production by over 50 organizations. The Storm codebase is currently hosted at github.com, which will seed the Apache git repository.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. Several companies have already expressed interest in this project, and we intend to invite additional developers to participate. We will encourage and monitor community participation so that privileges can be extended to those that contribute.

=== Community ===

The need for a low-latency big-data processing platform in open source is tremendous. Storm is currently being used by at least 50 organizations worldwide (see https://github.com/nathanmarz/storm/wiki/Powered-By), and is the most starred Java project on GitHub. By bringing Storm into Apache, we believe that the community will grow even bigger.
=== Core Developers ===

Storm was started by Nathan Marz at BackType, and now has developers from Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.

=== Alignment ===

In the big-data processing ecosystem, Storm is a very popular low-latency platform, while Hadoop is the primary platform for batch processing. We believe that having Hadoop and Storm aligned within the Apache foundation will help the further growth of the big-data community. The alignment is also beneficial to other Apache communities (such as Zookeeper, Thrift, Mesos). We could include additional sub-projects, Storm-on-YARN and Storm-on-Mesos, in the near future.

== Known Risks ==

=== Orphaned Products ===

The risk of the Storm project being abandoned is minimal. There are at least 50 organizations (Twitter, Yahoo!, Microsoft, Groupon, Baidu, Alibaba, Alipay, Taobao, PARC, RocketFuel etc)
Re: [PROPOSAL] Storm for Apache Incubator
+1 (non-binding)
Re: [VOTE] Accept Falcon into the Apache Incubator (was originally named Ivory)
those who make solid contributions to committer status.

=== Community ===

We are happy to report that the initial team already represents multiple organizations. We hope to extend the user and developer base further in the future and build a solid open source community around Falcon.

=== Core Developers ===

Falcon is currently being developed by three engineers from InMobi – Srikanth Sundarrajan, Shwetha G S, and Shaik Idris – and two Hortonworks employees – Sanjay Radia and Venkatesh Seetharam. In addition, Rohini Palaniswamy and Thiruvel Thirumoolan were also involved in the initial design discussions. Srikanth, Shwetha and Shaik are the original developers. All the engineers have built two generations of data management on Hadoop, have deep expertise in Hadoop, and are quite familiar with the Hadoop ecosystem. Samarth Gupta and Rishu Mehrothra, both from InMobi, have built the QA automation for Falcon.

=== Alignment ===

The ASF is a natural host for Falcon given that it is already the home of Hadoop, Pig, Knox, HCatalog, and other emerging "big data" software projects. Falcon has been designed to solve the data management challenges and opportunities of the Hadoop ecosystem family of products. Falcon fills the gap that the Hadoop ecosystem has had in the areas of data processing and data lifecycle management.

== Known Risks ==

=== Orphaned Products ===

The core developers plan to work full time on the project. There is very little risk of Falcon getting orphaned. Falcon is in use by the companies we work for, so those companies have an interest in its continued vitality.

=== Inexperience with Open Source ===

All of the core developers are active users and followers of open source. Srikanth Sundarrajan has been contributing patches to Apache Hadoop and Apache Oozie, and Shwetha GS has been contributing patches to Apache Oozie. Seetharam Venkatesh is a committer on Apache Knox.
Sharad Agarwal, Amareshwari SR (also an Apache Hive PMC member) and Sanjay Radia are PMC members on Apache Hadoop.

=== Homogeneous Developers ===

The current core developers are from a diverse set of organizations, namely InMobi and Hortonworks. We expect to quickly establish a developer community that includes contributors from several corporations post incubation.

=== Reliance on Salaried Developers ===

Currently, most developers are paid to work on Falcon, while a few contribute in their spare time. However, once the project has a community built around it post incubation, we expect to get committers and developers from outside the current core developers.

=== Relationships with Other Apache Products ===

Falcon is going to be used by the users of Hadoop and the Hadoop ecosystem in general.

=== An Excessive Fascination with the Apache Brand ===

While we respect the reputation of the Apache brand and have no doubts that it will attract contributors and users, our interest is primarily to give Falcon a solid home as an open source project following an established development model. We have also given reasons in the Rationale and Alignment sections.

== Documentation ==

http://wiki.apache.org/incubator/FalconProposal

== Initial Source ==

The source is currently in a GitHub repository at: https://github.com/sriksun/Falcon

== Source and Intellectual Property Submission Plan ==

The complete Falcon code is under the Apache Software License 2.0.

== External Dependencies ==

The dependencies all have Apache-compatible licenses. These include BSD- and MIT-licensed dependencies.
== Cryptography ==

None

== Required Resources ==

=== Mailing lists ===

* falcon-dev AT incubator DOT apache DOT org
* falcon-commits AT incubator DOT apache DOT org
* falcon-user AT incubator DOT apache DOT org
* falcon-private AT incubator DOT apache DOT org

=== Subversion Directory ===

Git is the preferred source control system: git://git.apache.org/falcon

=== Issue Tracking ===

JIRA FALCON

== Initial Committers ==

* Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
* Shwetha GS (shwetha.gs AT inmobi DOT com)
* Shaik Idris (shaik.idris AT inmobi DOT com)
* Venkatesh Seetharam (Venkatesh AT apache DOT org)
* Sanjay Radia (sanjay AT apache DOT org)
* Sharad Agarwal (sharad AT apache DOT org)
* Amareshwari SR (amareshwari AT apache DOT org)
* Samarth Gupta (samarth.gupta AT inmobi DOT com)
* Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)

== Affiliations ==

* Srikanth Sundarrajan (InMobi)
* Shwetha GS (InMobi)
* Shaik Idris (InMobi)
* Venkatesh Seetharam (Hortonworks Inc.)
* Sanjay Radia (Hortonworks Inc.)
* Sharad Agarwal (InMobi)
* Amareshwari SR (InMobi)
* Samarth Gupta (InMobi)
* Rishu Mehrothra (InMobi)

== Sponsors ==

=== Champion ===

* Arun C Murthy (acmurthy at apache dot org)

=== Nominated Mentors ===

* Alan Gates (gates AT apache DOT org)
* Chris Douglas (cdouglas AT apache DOT org
Re: [VOTE] Accept Tajo into the Apache Incubator
+1 (non-binding) On Thu, Feb 28, 2013 at 11:41 PM, Hyunsik Choi hyun...@apache.org wrote: Hi Folks, I'd like to call a VOTE for acceptance of Tajo into the Apache incubator. The vote will close on Mar 7 at 6:00 PM (PST).

[ ] +1 Accept Tajo into the Apache incubator
[ ] +0 Don't care.
[ ] -1 Don't accept Tajo into the incubator because...

Full proposal is pasted at the bottom of this email, and the corresponding wiki is http://wiki.apache.org/incubator/TajoProposal. Only VOTEs from Incubator PMC members are binding, but all are welcome to express their thoughts.

Thanks, Hyunsik

PS: From the initial discussion, the main changes are that I've added 4 new committers. Also, I've revised some of the description of Known Risks because the initial committers have become more diverse.

Tajo Proposal

= Abstract =

Tajo is a distributed data warehouse system for Hadoop.

= Proposal =

Tajo is a relational and distributed data warehouse system for Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation and ETL on large data sets by leveraging advanced database techniques. It supports SQL standards. Tajo is inspired by Dryad, MapReduce, Dremel, Scope, and parallel databases. Tajo uses HDFS as its primary storage layer, and it has its own query engine which allows direct control of distributed execution and data flow. As a result, Tajo has a variety of query evaluation strategies and more optimization opportunities. In addition, Tajo will have a native columnar execution engine and its own optimizer. Tajo will be an alternative choice to Hive/Pig on top of MapReduce.

= Background =

Big data analysis has gained much attention in industry. Open source communities have proposed scalable and distributed solutions for ad-hoc queries on big data. However, there is still room for improvement. Markets need faster and more efficient solutions. Recently, some alternatives (e.g., Cloudera's Impala and Amazon Redshift) have come out.
= Rationale =

There are a variety of open source distributed execution engines (e.g., Hive and Pig) running on top of MapReduce. They are limited by the MR framework: because they simply use the MR framework, they cannot directly control distributed execution and data flow. So, they have limited query evaluation strategies and optimization opportunities, and it is hard for them to be optimized for a certain type of data processing.

= Initial Goals =

The initial goal is to write more documents describing Tajo's internals. This will be helpful for recruiting more committers and building a solid community. Then, we will set milestones for short- and long-term plans.

= Current Status =

Tajo is in the alpha stage. Users can execute the usual SQL queries (e.g., selection, projection, group-by, join, union and sort), except for nested queries. Tajo provides various row/column storage formats, such as CSV, RowFile (a row-store file format we have implemented), RCFile, and Trevni, and it also has a rudimentary ETL feature to transform data from one format to another. In addition, Tajo provides hash and range repartitions. By using both repartition methods, Tajo processes aggregation, join, and sort queries over a number of cluster nodes. To evaluate the performance, we have carried out a benchmark test using TPC-H 1TB on 32 cluster nodes.

== Meritocracy ==

We will discuss the milestones and the future plan in an open forum. We plan to encourage an environment that supports a meritocracy. Contributors will have different privileges according to their contributions.

== Community ==

Big data analysis has gained attention from open source communities, industry, and academia. Some projects related to Hadoop already have very large and active communities. We expect that Tajo will also establish an active community. Since Tajo already supports a number of features and is in the alpha stage, it should attract a large community soon.
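As a rough illustration of the two repartition strategies mentioned above (hash and range), the sketch below partitions rows by a key. The partition counts and range boundaries are arbitrary, and this is an explanatory toy, not Tajo code.

```python
# Toy hash and range partitioners (not Tajo code; boundaries are made up).

def hash_partition(rows, key, n):
    """Spread rows across n partitions by hashing the key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def range_partition(rows, key, boundaries):
    """Sorted boundaries like [10, 20] give 3 ranges: <10, 10..19, >=20."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        idx = sum(row[key] >= b for b in boundaries)
        parts[idx].append(row)
    return parts

rows = [{"k": v} for v in (3, 12, 25, 7, 18)]
rp = range_partition(rows, "k", [10, 20])
print([len(p) for p in rp])  # [2, 2, 1]
```

Hash partitioning suits aggregations and equi-joins (equal keys land together); range partitioning keeps partitions ordered, which is what makes a distributed sort possible.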
== Core Developers ==

The core developers are a diverse group, many of whom are very experienced in open source and the Apache Hadoop ecosystem.

* Eli Reisman ereisman AT apache DOT org
* Henry Saputra hsaputra AT apache DOT org
* Hyunsik Choi hyunsik AT apache DOT org
* Jae Hwa Jung jhjung AT gruter DOT com
* Jihoon Son ghoonson AT gmail DOT com
* Jin Ho Kim jhkim AT gruter DOT com
* Roshan Sumbaly rsumbaly AT gmail DOT com
* Sangwook Kim swkim AT inervit DOT com
* Yi A Liu yi DOT a DOT liu AT intel DOT com

== Alignment ==

Tajo employs Apache Hadoop YARN as a resource management platform for large clusters. It uses HDFS as its primary storage layer. It already supports Hadoop-related data formats (RCFile, Trevni) and will support the ORC file format. In addition, we plan to integrate Tajo with other products in the Hadoop ecosystem. Tajo's modules are well organized, and these modules can also be used by other projects.

= Known Risks =

==
Re: [VOTE] Accept Tez into Incubator
+1 (non-binding) sharad