+1, this is a great addition!

On Mar 13, 2013, at 10:00 AM, Srikanth Sundarrajan wrote:
> = Ivory Proposal =
>
> == Abstract ==
> Ivory is a data processing and management solution for Hadoop designed for
> data motion, coordination of data pipelines, lifecycle management, and
> data discovery. Ivory enables end consumers to quickly onboard their data
> and its associated processing and management tasks on Hadoop clusters.
>
> == Proposal ==
> Ivory will enable easy data management via a declarative mechanism for
> Hadoop. Users of the Ivory platform simply define infrastructure endpoints,
> data sets and processing rules declaratively. These configurations are
> expressed in such a way that the dependencies between these entities are
> explicitly described. This information about inter-dependencies between
> various entities allows Ivory to orchestrate and manage various data
> management functions.
>
> The key use cases that Ivory addresses are:
> * Data Motion
> * Process orchestration and scheduling
> * Policy-based Lifecycle Management
> * Data Discovery
> * Operability/Usability
>
> With these features it is possible for users to onboard their data sets
> with a comprehensive and holistic understanding of how, when and where
> their data is managed across its lifecycle. Complex functions such as
> retrying failures, identifying possible SLA breaches or automated handling
> of input data changes are now simple directives. All the administrative
> functions and user level functions are available via RESTful APIs. The CLI
> is simply a wrapper over the RESTful APIs.
>
> == Background ==
> Hadoop and its ecosystem of products have made storing and processing
> massive amounts of data commonplace. This has enabled numerous
> organizations to gain valuable insights that they never could have
> achieved in the past. While it is easy to leverage Hadoop for crunching
> large volumes of data, organizing data, managing the life cycle of data
> and processing data is fairly involved.
> This is solved adequately well in a classic data platform involving data
> warehouses and standard ETL (extract-transform-load) tools, but remains
> largely unsolved on Hadoop today. In addition to data processing
> complexities, Hadoop presents new sets of challenges and opportunities
> relating to management of data.
>
> Data Management on Hadoop encompasses data motion, process orchestration,
> lifecycle management, data discovery, and other concerns that are beyond
> ETL. Ivory is a new data processing and management platform for Hadoop
> that solves this problem and creates additional opportunities by building
> on existing components within the Hadoop ecosystem (e.g. Apache Oozie,
> Apache Hadoop DistCp) without reinventing the wheel. Ivory has been in
> production at InMobi, going on its second year, and has been managing
> hundreds of feeds and processes.
>
> Ivory is being developed by engineers employed with InMobi, Hortonworks
> and Yahoo!. This platform addition will increase the adoption of Apache
> Hadoop by making data management tractable for end users. We are
> therefore proposing to make Ivory an Apache open source project.
>
> == Rationale ==
> The Ivory project aims to improve the usability of Apache Hadoop. As a
> result Apache Hadoop will grow its community of users by increasing the
> places Hadoop can be utilized and the use cases it will solve. By
> developing Ivory in Apache we hope to gather a diverse community of
> contributors, helping to ensure that Ivory is deployable for a broad
> range of scenarios. Members of the Hadoop development community will be
> able to influence Ivory’s roadmap, and contribute to it. We believe
> having Ivory as part of the Apache Hadoop ecosystem will be a great
> benefit to all of Hadoop's users.
>
> == Current Status ==
> Ivory is widely deployed in production within InMobi and moving on to its
> second year.
> A version with a valuable set of features has been developed by the
> list of initial committers and is hosted on GitHub.
>
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Ivory following the Apache meritocracy model.
> We have wanted to make the project open source and encourage contributors
> from multiple organizations from the start. We plan to provide plenty of
> support to new developers and to quickly recruit those who make solid
> contributions to committer status.
>
> === Community ===
> We are happy to report that the initial team already represents multiple
> organizations. We hope to extend the user and developer base further in
> the future and build a solid open source community around Ivory.
>
> === Core Developers ===
> Ivory is currently being developed by three engineers from InMobi
> (Srikanth Sundarrajan, Shwetha G S, and Shaik Idris) and two Hortonworks
> employees (Sanjay Radia and Venkatesh Seetharam). In addition, two Yahoo!
> employees, Rohini Palaniswamy and Thiruvel Thirumoolan, are involved.
> Srikanth, Shwetha and Shaik are the original developers. All the
> engineers have built two generations of Data Management on Hadoop, have
> deep expertise in Hadoop, and are quite familiar with the Hadoop
> ecosystem.
>
> === Alignment ===
> The ASF is a natural host for Ivory given that it is already the home of
> Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
> projects. Ivory has been designed to solve the data management challenges
> and opportunities of the Hadoop ecosystem family of products. Ivory fills
> a gap in the Hadoop ecosystem in the areas of data processing and data
> lifecycle management.
>
> == Known Risks ==
>
> === Orphaned products & Reliance on Salaried Developers ===
> The core developers plan to work full time on the project. There is very
> little risk of Ivory getting orphaned.
> Ivory is in use by the companies we work for, so the companies have an
> interest in its continued vitality.
>
> === Inexperience with Open Source ===
> All of the core developers are active users and followers of open source.
> Srikanth Sundarrajan has been contributing patches to Apache Hadoop and
> Apache Oozie, and Shwetha GS has been contributing patches to Apache
> Oozie. Seetharam Venkatesh is a committer on Apache Knox. Rohini
> Palaniswamy is a committer on Apache Pig. Sharad Agarwal, Amareshwari SR
> (also an Apache Hive PMC member) and Sanjay Radia are PMC members on
> Apache Hadoop.
>
> === Homogeneous Developers ===
> The current core developers are from a diverse set of organizations such
> as InMobi, Hortonworks, and Yahoo!. We expect to quickly establish a
> developer community that includes contributors from several corporations
> post incubation.
>
> === Reliance on Salaried Developers ===
> Currently, most developers are paid to work on Ivory, but a few are
> contributing in their spare time. However, once the project has a
> community built around it post incubation, we expect to get committers
> and developers from outside the current core developers.
>
> === Relationships with Other Apache Products ===
> Ivory is going to be used by the users of Hadoop and the Hadoop ecosystem
> in general.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts
> that it will attract contributors and users, our interest is primarily to
> give Ivory a solid home as an open source project following an
> established development model. We have also given reasons in the
> Rationale and Alignment sections.
>
> == Documentation ==
> There is documentation in the GitHub repository at:
> https://github.com/sriksun/Ivory
>
> == Initial Source ==
> The source is currently in the GitHub repository at:
> https://github.com/sriksun/Ivory
>
> == Source and Intellectual Property Submission Plan ==
> The complete Ivory code is under Apache Software License 2.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses. These include BSD
> and MIT licensed dependencies.
>
> == Cryptography ==
> None
>
> == Required Resources ==
>
> === Mailing lists ===
>
> * ivory-dev AT incubator DOT apache DOT org
> * ivory-commits AT incubator DOT apache DOT org
> * ivory-user AT incubator DOT apache DOT org
> * ivory-private AT incubator DOT apache DOT org
>
> === Subversion Directory ===
> https://svn.apache.org/repos/asf/incubator/ivory
>
> === Issue Tracking ===
> JIRA IVORY
>
> == Initial Committers ==
> * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> * Shwetha GS (shwetha.gs AT inmobi DOT com)
> * Shaik Idris (shaik.idris AT inmobi DOT com)
> * Venkatesh Seetharam (Venkatesh AT apache DOT com)
> * Rohini Palaniswamy (rohinip AT yahoo-inc DOT com)
> * Thiruvel Thirumoolan (thiruvel AT yahoo-inc DOT com)
> * Sanjay Radia (sanjay AT apache DOT org)
> * Sharad Agarwal (sharad AT apache DOT org)
> * Amareshwari SR (amareshwari AT apache DOT org)
>
> == Affiliations ==
> * Srikanth Sundarrajan (InMobi)
> * Shwetha GS (InMobi)
> * Shaik Idris (InMobi)
> * Venkatesh Seetharam (Hortonworks Inc)
> * Rohini Palaniswamy (Yahoo! Inc)
> * Thiruvel Thirumoolan (Yahoo! Inc)
> * Sanjay Radia (Hortonworks Inc)
> * Sharad Agarwal (InMobi)
> * Amareshwari SR (InMobi)
>
> == Sponsors ==
>
> === Champion ===
> * Arun C Murthy (acmurthy AT apache DOT org)
>
> === Nominated Mentors ===
> * Alan Gates (gates AT apache DOT org)
> * Chris Douglas (cdouglas AT apache DOT org)
> * Devaraj Das (ddas AT apache DOT org)
> * Owen O’Malley (omalley AT apache DOT org)
>
> === Sponsoring Entity ===
> Incubator PMC

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/