Re: [VOTE] Graduate Lens from the Incubator

2015-07-24 Thread Sharad Agarwal
+1 (binding)

On Sat, Jul 25, 2015 at 3:49 AM, Jakob Homan jgho...@gmail.com wrote:

 Following two positive discussions[1][2] about its current status, the
 Lens community has voted[3] to graduate from the Incubator.  The vote
 passed with 22 +1s:

 Binding +1 x 14: {Jakob, Jean-Baptiste, Yash, Amareshwari, Sharad,
 Raghavendra, Raju, Jaideep, Suma, Himanshu, Rajat, Srikanth, Chris,
 Arshad}

 Non-binding +1 x 8: {Jothi, Kartheek, Tushar, Nitin, Pranav, Deepak,
 Ajay, Naresh}

 The Lens community has:
 * completed all required paperwork:
 https://incubator.apache.org/projects/lens.html
 * completed multiple releases (2.0.1-beta, 2.1.0-beta, 2.2.0-beta)
 * completed the name check procedure:
 https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-63
 * opened nearly 700 JIRAs:
 https://issues.apache.org/jira/issues/?jql=project%20%3D%20LENS
 * voted in multiple new committers/PPMC members.
 * been recommended as ready to graduate by the Incubator's shepherds:
 https://wiki.apache.org/incubator/July2015

 Therefore, I'm calling a VOTE to graduate Lens with the following
 Board resolution.  The VOTE will run 96 hours (an extra day since
 we're starting on a Friday), ending Tuesday, July 28, at 4 PM PST.

 [ ] +1 Graduate Apache Lens from the Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't graduate Apache Lens from the Incubator because ...

 Here's my binding vote: +1.
 -Jakob

 [1] http://s.apache.org/LensGradDiscuss1
 [2] http://s.apache.org/LensGradDiscuss2
 [3] http://s.apache.org/LensGradVotePPMC

  Apache Lens graduation resolution draft
 WHEREAS, the Board of Directors deems it to be in the best interests of
 the Foundation and consistent with the Foundation's purpose to establish
 a Project Management Committee charged with the creation and maintenance
 of open-source software, for distribution at no charge to the public,
 related to unified analytics across multiple tiered data stores.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
 (PMC), to be known as the Apache Lens Project, be and hereby is
 established pursuant to Bylaws of the Foundation; and be it further

 RESOLVED, that the Apache Lens Project be and hereby is responsible
 for the creation and maintenance of software related to unified analytics
 across multiple tiered data stores; and be it further

 RESOLVED, that the office of Vice President, Apache Lens be and hereby
 is created, the person holding such office to serve at the direction of
 the Board of Directors as the chair of the Apache Lens Project, and to
 have primary responsibility for management of the projects within the
 scope of responsibility of the Apache Lens Project; and be it further

 RESOLVED, that the persons listed immediately below be and hereby are
 appointed to serve as the initial members of the Apache Lens Project:

 * Amareshwari Sriramadasu amareshwari at apache dot org
 * Arshad Matin arshadmatin at apache dot org
 * Gunther Hagleitner gunther at apache dot org
 * Himanshu Gahlaut himanshugahlaut at apache dot org
 * Jaideep Dhok jdhok at apache dot org
 * Jean Baptiste Onofre jbonofre at apache dot org
 * Raghavendra Singh raghavsingh at apache dot org
 * Rajat Khandelwal prongs at apache dot org
 * Raju Bairishetti raju at apache dot org
 * Sharad Agarwal sharad at apache dot org
 * Sreekanth Ramakrishnan sreekanth at apache dot org
 * Srikanth Sundarrajan sriksun at apache dot org
 * Suma Shivaprasad sumasai at apache dot org
 * Vikram Dixit vikram at apache dot org
 * Vinod Kumar Vavilapalli vinodkv at apache dot org
 * Yash Sharma yash at apache dot org

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Amareshwari Sriramadasu
 be appointed to the office of Vice President, Apache Lens, to serve in
 accordance with and subject to the direction of the Board of Directors
 and the Bylaws of the Foundation until death, resignation, retirement,
 removal or disqualification, or until a successor is appointed; and be
 it further

 RESOLVED, that the Apache Lens Project be and hereby is tasked with
 the migration and rationalization of the Apache Incubator Lens
 podling; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache Incubator
 Lens podling encumbered upon the Apache Incubator Project are
 hereafter discharged.





Re: [VOTE] Release of Apache Lens 2.2.0-beta-incubating

2015-07-13 Thread Sharad Agarwal
+1 (binding)

On Sun, Jul 12, 2015 at 8:39 AM, Jaideep Dhok jaideep.d...@inmobi.com
wrote:

 Hello everyone,

 This is the call for vote for the following RC to be released as official
 Apache Lens 2.2.0-beta-incubating release. This is our third release.

 Apache Lens provides a Unified Analytics interface. Lens aims to cut
 data analytics silos by providing a single view of data across multiple
 tiered data stores and an optimal execution environment for analytical
 queries. It seamlessly integrates Hadoop with traditional data warehouses
 so they appear as one.

 Vote on dev list:

 http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201507.mbox/%3CCAPYoVThzQCHdYVASR35zeYqHj_tWo93GuzTLzCrmRAq3qMjecg%40mail.gmail.com%3E

 Result of vote on dev list:

 http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201507.mbox/%3CCAPYoVThOEAeMiNdtef%3D35QxRLetryRFKs3ED-oeCh2xi1KEqww%40mail.gmail.com%3E

 The commit id is 9c45f1cb4c69ec5de6fe3320abdd5bd85c250e9f:
 https://git-wip-us.apache.org/repos/asf/incubator-lens/repo?p=incubator-lens.git;a=commit;h=9c45f1cb4c69ec5de6fe3320abdd5bd85c250e9f


 This corresponds to the tag: apache-lens-2.2.0-beta-incubating:
 https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=tag;h=refs/tags/apache-lens-2.2.0-beta-incubating


 The release archives (tar.gz/.zip), signatures, and checksums are here:
 * https://dist.apache.org/repos/dist/dev/incubator/lens/apache-lens-2.2.0-beta-incubating-rc0/

 You can find the KEYS file here:
 * https://dist.apache.org/repos/dist/release/incubator/lens/KEYS
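
 For reviewers who want to check such a release candidate locally, here is a
 minimal sketch in Python; the artifact names and the .sha512 sidecar are
 assumptions (adjust to the actual files), and gpg must already have the
 KEYS file imported (gpg --import KEYS):

    # Minimal local check of an RC artifact; file names are illustrative.
    import hashlib
    import subprocess

    ARTIFACT = "apache-lens-2.2.0-beta-incubating-source-release.zip"  # assumed local copy
    SIGNATURE = ARTIFACT + ".asc"       # detached GPG signature
    CHECKSUM = ARTIFACT + ".sha512"     # assumed checksum sidecar

    def sha512_of(path, chunk=1 << 20):
        """Compute the SHA-512 digest of a local file, reading in chunks."""
        digest = hashlib.sha512()
        with open(path, "rb") as handle:
            for block in iter(lambda: handle.read(chunk), b""):
                digest.update(block)
        return digest.hexdigest()

    # 1. Checksum: compare the computed digest with the published one.
    published = open(CHECKSUM).read().split()[0].lower()
    assert sha512_of(ARTIFACT) == published, "checksum mismatch"

    # 2. Signature: delegate to gpg; a non-zero exit status raises an error.
    subprocess.run(["gpg", "--verify", SIGNATURE, ARTIFACT], check=True)
    print("checksum and signature OK")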

 The release candidate consists of the following source distribution
 archive:
 apache-lens-2.2.0-beta-incubating-source-release.zip

 In addition, the following supplementary binary distributions are
 provided for user convenience at the same location:
 apache-lens-2.2.0-beta-incubating-bin.tar.gz

 The licensing of bundled bits in the archives has not changed from the
 previous release; it is documented at
 https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens

 The Nexus Staging URL:
 https://repository.apache.org/content/repositories/orgapachelens-1005

 Release notes are available at:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329586&projectId=12315923


 Vote will be open for at least 72 hours

 [ ] +1 approve
 [ ] 0 no opinion
 [ ] -1 disapprove (and reason why)

 +1 from my side for the release.

 Thanks,
 Jaideep Dhok




Re: [VOTE] Release of Apache Lens 2.1.0-beta-incubating

2015-05-04 Thread Sharad Agarwal
+1 (binding)

On Thu, Apr 30, 2015 at 5:35 PM, Amareshwari Sriramadasu
amareshw...@apache.org wrote:

 Hello everyone,

 This is the call for vote for the following RC to be released as official
 Apache Lens 2.1.0-beta-incubating release. This is our second release.

 Apache Lens provides a Unified Analytics interface. Lens aims to cut
 data analytics silos by providing a single view of data across multiple
 tiered data stores and an optimal execution environment for analytical
 queries. It seamlessly integrates Hadoop with traditional data warehouses
 so they appear as one.
 Vote on dev list:

 http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201504.mbox/%3CCABJEuZfT4HDK3c4rKxPg0_Kkc8KDfRjUr%2BHmKaJH44H77OeU0g%40mail.gmail.com%3E

 Results of vote on dev list:

 http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201504.mbox/%3CCABJEuZe7rbjbwoiiOWKL8Lef%3Dsc%2BXcV173aiQ6Tpdwq7jz9ycQ%40mail.gmail.com%3E

 The commit id is fdd19b9c2b17e329465cbde62dbce6f8be435cec:
 https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=commit;h=fdd19b9c2b17e329465cbde62dbce6f8be435cec

 This corresponds to the tag: apache-lens-2.1.0-beta-incubating:
 https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=tag;h=refs/tags/apache-lens-2.1.0-beta-incubating

 The release archives (tar.gz/.zip), signature, and checksums are
 here:
 https://dist.apache.org/repos/dist/dev/incubator/lens/apache-lens-2.1.0-beta-incubating-rc0

 You can find the KEYS file here:
 * https://dist.apache.org/repos/dist/release/incubator/lens/KEYS

 The release candidate consists of the following source distribution
 archive:
 apache-lens-2.1.0-beta-incubating-source-release.zip

 In addition, the following supplementary binary distributions are
 provided for user convenience at the same location:
 apache-lens-2.1.0-beta-incubating-bin.tar.gz

 The licensing of bundled bits in the archives has not changed from the
 previous release; it is documented at
 https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens

 The Nexus Staging URL:
 https://repository.apache.org/content/repositories/orgapachelens-1003

 Release notes are available at:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315923&version=12328991


 The vote will be open for at least 72 hours. Please vote on releasing this RC.

 [ ] +1 approve
 [ ] 0 no opinion
 [ ] -1 disapprove (and reason why)

 Thanks,
 Amareshwari



Re: [VOTE] Accept Apache Atlas into Apache Incubator

2015-05-04 Thread Sharad Agarwal
+1 (binding)

On Fri, May 1, 2015 at 12:56 PM, Seetharam Venkatesh 
venkat...@innerzeal.com wrote:

 Hello folks,

 Following the discussion earlier in the thread: http://s.apache.org/r2

 I would like to call a VOTE for accepting Apache Atlas as a new incubator
 project.

 The proposal is available at:
 https://wiki.apache.org/incubator/AtlasProposal
 Also, the text of the latest wiki proposal is included at the bottom of
 this email.

 The VOTE is open for at least the next 72 hours:

  [ ] +1 accept Apache Atlas into the Apache Incubator
  [ ] ±0 Abstain
  [ ] -1 because...

 Of course I am +1! (non-binding)

 Thanks!


 = Apache Atlas Proposal =

 == Abstract ==

 Apache Atlas is a scalable and extensible set of core foundational
 governance services that enable enterprises to effectively and efficiently
 meet their compliance requirements within Hadoop, while allowing integration
 with the complete enterprise data ecosystem.

 == Proposal ==

 Apache Atlas provides platform-agnostic governance visibility into Hadoop.
 These abilities are enabled through a set of core foundational services
 powered by a flexible metadata repository.

 These services include:

  * Search and Lineage for datasets
  * Metadata driven data access control
  * Indexed and searchable centralized auditing of operational events
  * Data lifecycle management – ingestion to disposition
  * Metadata interchange with other metadata tools
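
 As a purely conceptual illustration of the search-and-lineage idea above
 (none of the names below are Atlas APIs or its data model), dataset lineage
 can be pictured as a small directed graph of datasets and the processes that
 derive one from another:

    # Hypothetical, minimal lineage model: datasets are nodes, processes are
    # edges. This illustrates the idea only; it is not Atlas's model or API.
    from collections import defaultdict

    edges = defaultdict(list)  # output dataset -> list of (process, input dataset)

    def record_process(process, inputs, output):
        for source in inputs:
            edges[output].append((process, source))

    def upstream_lineage(dataset, depth=0):
        """Walk the graph upstream and print how a dataset was produced."""
        for process, source in edges.get(dataset, []):
            print("  " * depth + f"{dataset} <- {process} <- {source}")
            upstream_lineage(source, depth + 1)

    record_process("clean_clicks_job", ["raw_clicks"], "clean_clicks")
    record_process("daily_rollup_job", ["clean_clicks", "dim_campaign"], "daily_click_cube")
    upstream_lineage("daily_click_cube")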

 == Background ==

 Hadoop is one of many platforms in the modern enterprise data ecosystem and
 requires governance controls commensurate with this reality.

 Currently, there is no easy or complete way to provide comprehensive
 visibility and control into Hadoop audit, lineage, and security for
 workflows that require Hadoop and non-Hadoop processing.

 Most solutions are point-based and require a monolithic application
 workflow. Multi-tenancy and concurrency are problematic, as these offerings
 are not aware of activity outside their narrow focus.

 As Hadoop gains greater popularity, governance concerns will become
 increasingly vital to maturing the platform and furthering adoption. The
 lack of governance is a particular barrier to expanding the enterprise data
 under management.

 == Rationale ==

 Atlas will address the issues discussed above by providing governance
 capabilities in Hadoop, using both a prescriptive and a forensic model
 enriched by business taxonomical metadata. Atlas, at its core, is designed
 to exchange metadata with other tools and processes within and outside of
 the Hadoop stack, enabling governance controls that are truly platform
 agnostic and that effectively (and defensibly) address compliance concerns.

 Initially working with a group of leading partners in several industries,
 Atlas is built to solve specific real-world governance problems, which
 accelerates product maturity and time to value.

 Atlas aims to grow a community to help build a widely adopted pattern for
 governance, metadata modeling and exchange in Hadoop – which will advance
 the interests of the whole community.

 == Current Status ==

 An initial version with a valuable set of features has been developed by the
 initial committers and is hosted on GitHub.

 === Meritocracy ===

 Our intent with this proposal is to start building a diverse developer
 community around Atlas following the Apache meritocracy model. We have
 wanted to make the project open source and to encourage contributors from
 multiple organizations from the start.

 We plan to provide plenty of support to new developers and to quickly
 recruit those who make solid contributions to committer status.

 === Community ===

 We are happy to report that the initial team already represents multiple
 organizations. We hope to extend the user and developer base further in the
 future and build a solid open source community around Atlas.

 === Core Developers ===

 Atlas development is currently being led by engineers from Hortonworks –
 Harish Butani, Venkatesh Seetharam, Shwetha G S, and Jon Maron. All the
 engineers have deep expertise in Hadoop and are quite familiar with the
 Hadoop Ecosystem.

 === Alignment ===

 The ASF is a natural host for Atlas given that it is already the home of
 Hadoop, Falcon, Hive, Pig, Oozie, Knox, Ranger, and other emerging “big
 data” software projects.

 Atlas has been designed to solve the data governance challenges and
 opportunities of the Hadoop ecosystem family of products, as well as
 integration with the traditional enterprise data ecosystem.

 Atlas fills a gap in the Hadoop ecosystem in the areas of data governance
 and compliance management.

 == Known Risks ==

 === Orphaned products & Reliance on Salaried Developers ===
 The core developers plan to work full time on the project. There is very
 little risk of Atlas getting orphaned. A prototype of Atlas is in use and
 being actively developed by several companies, which have a vested interest
 in its continued vitality and adoption.

 === Inexperience with 

Re: [VOTE] Accept Zeppelin into the Apache Incubator

2014-12-18 Thread Sharad Agarwal
+1 (non-binding)

On Fri, Dec 19, 2014 at 10:59 AM, Roman Shaposhnik r...@apache.org wrote:

 Following the discussion earlier:
 http://s.apache.org/kTp

 I would like to call a VOTE for accepting
 Zeppelin as a new Incubator project.

 The proposal is available at:
 https://wiki.apache.org/incubator/ZeppelinProposal
 and is also attached to the end of this email.

 Vote is open until at least Sunday, 21st December 2014,
 23:59:00 PST

 [ ] +1 Accept Zeppelin into the Incubator
 [ ] ±0 Indifferent to the acceptance of Zeppelin
 [ ] -1 Do not accept Zeppelin because ...

 Thanks,
 Roman.

 == Abstract ==
 Zeppelin is a collaborative data analytics and visualization tool for
 distributed, general-purpose data processing systems such as Apache
 Spark, Apache Flink, etc.

 == Proposal ==
 Zeppelin is a modern web-based tool for data scientists to
 collaborate on large-scale data exploration and visualization
 projects. It is a notebook-style interpreter that enables collaborative
 analysis sessions shared between users. Zeppelin is independent of
 the execution framework itself: the current version runs on top of Apache
 Spark, but it has pluggable interpreter APIs to support other data
 processing systems. More execution frameworks could be added at a
 later date, e.g. Apache Flink and Crunch, as well as SQL-like backends such
 as Hive, Tajo, and MRQL.
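
 As a purely illustrative sketch of what a pluggable-interpreter abstraction
 of this kind can look like (hypothetical names; this is not Zeppelin's
 actual interface), a notebook front end simply dispatches each paragraph to
 whichever backend interpreter is registered for its prefix:

    # Illustrative only: a notebook front end dispatching each paragraph to
    # a registered backend interpreter. These names are not Zeppelin's API.
    class Interpreter:
        name = "base"
        def run(self, code: str) -> str:
            raise NotImplementedError

    class EchoSQLInterpreter(Interpreter):
        name = "sql"
        def run(self, code: str) -> str:
            # A real backend would submit `code` to Hive, SparkSQL, Presto, ...
            return "[sql backend] would execute: " + code.strip()

    class PythonInterpreter(Interpreter):
        name = "python"
        def run(self, code: str) -> str:
            namespace = {}
            exec(code, namespace)   # run the paragraph in its own namespace
            return str(namespace.get("result"))

    REGISTRY = {i.name: i for i in (EchoSQLInterpreter(), PythonInterpreter())}

    def run_paragraph(paragraph: str) -> str:
        """Paragraphs start with a %<interpreter> prefix, e.g. '%sql select 1'."""
        prefix, _, body = paragraph.partition(" ")
        return REGISTRY[prefix.lstrip("%")].run(body)

    print(run_paragraph("%sql select count(*) from logs"))
    print(run_paragraph("%python result = sum(range(10))"))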

 We have a strong preference for the project to be called Zeppelin. In
 case that may not be feasible, alternative names could be: “Mir”,
 “Yuga” or “Sora”.

 == Background ==
 A large-scale data analysis workflow includes multiple steps such as data
 acquisition, pre-processing and visualization, and may involve the
 interoperation of several different tools and technologies. Despite the
 widespread adoption of open source general-purpose data processing systems
 like Spark, there is a lack of open source, modern, user-friendly tools
 that combine the strengths of an interpreted language for data analysis
 with new in-browser visualization libraries and collaborative capabilities.

 Zeppelin initially started as a GUI tool for a diverse set of
 SQL-over-Hadoop systems like Hive, Presto, and Shark. It has been open
 source since its inception in September 2013. Later, it became clear that
 there was a need for a broader web-based tool for data scientists to
 collaborate on data exploration over large-scale projects, not
 limited to SQL. So Zeppelin integrated full support for Apache Spark
 while adding a collaborative environment with the ability to run and
 share interpreter sessions in the browser.

 == Rationale ==
 There are no open source alternatives for a collaborative
 notebook-based interpreter with support of multiple distributed data
 processing systems.

 As the number of companies adopting and contributing back to Zeppelin
 grows, we think that a long-term home at the Apache Foundation
 would be a great fit for the project, ensuring that processes and
 procedures are in place to keep the project and community “healthy” and
 free of any commercial, political or legal faults.

 == Initial Goals ==
 The initial goals will be to move the existing codebase to Apache and
 integrate with the Apache development process. This includes moving
 all infrastructure that we currently maintain, such as the website, the
 mailing list, the issue tracker and the Jenkins CI, as mentioned in the
 “Required Resources” section of the current proposal.
 Once this is accomplished, we plan for incremental development and
 releases that follow the Apache guidelines.
 To increase adoption, a major goal for the project is to
 provide integration with as many projects from the Apache data ecosystem
 as possible, including new interpreters for Apache Hive and Apache Drill
 and adding a Zeppelin distribution to Apache Bigtop.
 On the community-building side, the main goal is to attract a diverse
 set of contributors by promoting Zeppelin to a wide variety of
 engineers, starting Zeppelin user groups around the globe and
 engaging online with the communities of other existing Apache projects.


 == Current Status ==
 Currently, Zeppelin has 4 released versions and is used in production
 at a number of companies across the globe, mentioned in the Affiliation
 section. The current implementation status is pre-release, with the public
 API not yet finalized. The current main and default backend processing
 engine is Apache Spark, with consistent support for SparkSQL.
 Zeppelin is distributed as a binary package which includes an embedded
 web server, the application itself, a set of libraries and startup/shutdown
 scripts. No platform-specific installation packages are provided yet,
 but that is something we are looking to provide as part of the Apache
 Bigtop integration.
 The project codebase is currently hosted at github.com, which will form
 the basis of the Apache git repository.

 === Meritocracy ===
 Zeppelin is an open source project that already leverages meritocracy
 principles. It was started by a handful of people and now it has
 multiple contributors, 

[RESULT] [VOTE] Accept Lens into the Apache Incubator (earlier called Grill)

2014-10-09 Thread Sharad Agarwal
The vote has passed with 9 binding +1s, 5 non-binding +1s, and no 0s or -1s.

Binding +1s:
Jean Baptiste
Jan i
Alan D Cabrera
Jakob Homan
Chris Douglas
Roman Shaposhnik
Joe Brockmeier
Vinod K V
Suresh Srinivas


Non-binding +1s:
Sharad Agarwal
Amareshwari S
Seetharam Venkatesh
Srikanth Sundarrajan
Ashish

Thanks everyone for voting. We will proceed with the next steps as per the
IPMC guidelines.

Thanks
Sharad




On Mon, Oct 6, 2014 at 5:21 PM, Sharad Agarwal sha...@apache.org wrote:

 Following the discussion earlier in the thread
 https://www.mail-archive.com/general@incubator.apache.org/msg45208.html
 I would like to call a Vote for accepting Lens as a new incubator project.

 The proposal is available at:
 https://wiki.apache.org/incubator/LensProposal

 Vote is open till Oct 09, 2014 4 PM PST.

  [ ] +1 accept Lens in the Incubator
  [ ] +/-0
  [ ] -1 because...

 Only Votes from Incubator PMC members are binding, but all are welcome to
 express their thoughts.
 I am +1 (non-binding).

 Thanks
 Sharad



[VOTE] Accept Lens into the Apache Incubator (earlier called Grill)

2014-10-06 Thread Sharad Agarwal
Following the discussion earlier in the thread
https://www.mail-archive.com/general@incubator.apache.org/msg45208.html
I would like to call a Vote for accepting Lens as a new incubator project.

The proposal is available at:
https://wiki.apache.org/incubator/LensProposal

Vote is open till Oct 09, 2014 4 PM PST.

 [ ] +1 accept Lens in the Incubator
 [ ] +/-0
 [ ] -1 because...

Only Votes from Incubator PMC members are binding, but all are welcome to
express their thoughts.
I am +1 (non-binding).

Thanks
Sharad


Re: [PROPOSAL] Grill as new Incubator project

2014-10-03 Thread Sharad Agarwal
The discussion seems to have settled down. I will start the vote thread on
Lens shortly.


Re: [PROPOSAL] Grill as new Incubator project

2014-09-26 Thread Sharad Agarwal
Lens has a functional test suite that includes cube DDLs, queries, test
data, scripts, etc., and it requires standard build and test infra.
On Sep 27, 2014 3:45 AM, David Nalley da...@gnsa.us wrote:

  currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
  the QA automation for Grill.
 

 What kind of QA environment does Drill/Lens have currently? How much
 do you expect to need going forward?

 --David



Re: [PROPOSAL] Grill as new Incubator project

2014-09-23 Thread Sharad Agarwal
Thanks Ted. We have renamed the proposal to Lens.

The proposal is pasted here -
https://wiki.apache.org/incubator/LensProposal

Thanks
Sharad

On Tue, Sep 23, 2014 at 12:04 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Both Lens and Blend are nice names. Nice connotations as well.

 I am slightly stunned by a quick search on the name Lens. I only found one
 software package with that name (and it was for lens calibration, so far
 from databases). A name like that is usually massively overused.

 This might be a really nice opportunity to get a nice one-syllable name.




 On Mon, Sep 22, 2014 at 2:59 AM, Sharad Agarwal sha...@apache.org wrote:

  Based on the feedback, we are considering renaming the project.

  Please provide feedback on the following names:
  Apache Lens
  Apache Blend
 
  Thanks,
  Sharad
 



Re: [PROPOSAL] Grill as new Incubator project

2014-09-22 Thread Sharad Agarwal
Based on the feedback, we are considering renaming the project.

Please provide feedback on the following names:
Apache Lens
Apache Blend

Thanks,
Sharad


Re: [PROPOSAL] Grill as new Incubator project

2014-09-19 Thread Sharad Agarwal
Chris, Thanks for your comments.

The differences that I see are:
- SciDB exposes an Array data model and the Array Query Language (AQL). Grill's
data model is based on OLAP Facts and Dimensions. Grill exposes an SQL-like
language (a subset of HiveQL) that works on *logical* entities (facts,
dimensions).

- The goal of Grill is not to build a new query execution database, but to
unify existing ones by having a central metadata catalog, and to provide a Cube
abstraction layer on top of them.

Thanks,
Sharad

On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980) 
chris.a.mattm...@jpl.nasa.gov wrote:

 This sounds super cool!

 How does this relate to SciDB? is it trying to do a similar thing?

 Cheers,
 Chris


 ++
 Chris Mattmann, Ph.D.
 Chief Architect
 Instrument Software and Science Data Systems Section (398)
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 168-519, Mailstop: 168-527
 Email: chris.a.mattm...@nasa.gov
 WWW:  http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Associate Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++






Re: [PROPOSAL] Grill as new Incubator project

2014-09-19 Thread Sharad Agarwal
Chris,
Multi-dimensional here is in the context of an OLAP cube -
http://en.wikipedia.org/wiki/OLAP_cube
The Grill data model consists of a set of measures which can be analysed
along different dimensions.
For remote sensing, the data can be modelled as a cube - measurements on
various sets of attributes (dimensions) as Facts, with time and space thought
of as dimensions.
Yes, it supports numerical data.
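
To make the fact/dimension framing concrete, here is a tiny, purely
illustrative sketch in plain Python (not Grill's query language or APIs) of
measurements rolled up along chosen dimensions:

    # Illustrative OLAP-style roll-up: facts are rows of dimensions plus a
    # measure; a "cube query" aggregates the measure over chosen dimensions.
    # This is plain Python for illustration, not Grill code.
    from collections import defaultdict

    # Fact rows: day and region are dimensions, temperature is the measure.
    facts = [
        {"day": "2014-09-01", "region": "north", "temperature": 21.0},
        {"day": "2014-09-01", "region": "south", "temperature": 27.5},
        {"day": "2014-09-02", "region": "north", "temperature": 19.5},
        {"day": "2014-09-02", "region": "south", "temperature": 28.0},
    ]

    def rollup(rows, dims, measure):
        """Average a measure grouped by the requested dimensions."""
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[d] for d in dims)].append(row[measure])
        return {key: sum(vals) / len(vals) for key, vals in groups.items()}

    print(rollup(facts, ["region"], "temperature"))          # roll up over time
    print(rollup(facts, ["day", "region"], "temperature"))   # finer-grained cells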


Ted,
Both are in the same general area, but I think there is very little chance of
confusion, as their propositions are clearly quite different. And both
words are simple and widely used nouns.
We liked the name Grill as it is simple to spell and pronounce, and in some
way conveys the project's meaning - to question intensely.

Thanks,
Sharad

On Sat, Sep 20, 2014 at 12:11 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 There is a strong phonetic similarity to Apache Drill, a project in the
 same general domain.

 Is the Grill name already baked in (pun intended)?



 On Fri, Sep 19, 2014 at 7:24 AM, Mattmann, Chris A (3980) 
 chris.a.mattm...@jpl.nasa.gov wrote:

  Thank you Sharad. So I could use this system for remote sensing
  data, like 3-dimensional (time, space, and measurement) types of cubes?
  Does it support numerical data well?

  Sorry for so many questions - just excited :)
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
  Office: 168-519, Mailstop: 168-527
  Email: chris.a.mattm...@nasa.gov
  WWW:  http://sunset.usc.edu/~mattmann/
  ++
  Adjunct Associate Professor, Computer Science Department
  University of Southern California, Los Angeles, CA 90089 USA
  ++
 
 
 
 
 
 
[PROPOSAL] Grill as new Incubator project

2014-09-18 Thread Sharad Agarwal
Grill Proposal
==

# Abstract

Grill is a platform that enables multi-dimensional queries in a unified way
over datasets stored in multiple warehouses. Grill integrates Apache Hive
with other data warehouses by tiering them together to form logical data
cubes.


# Proposal

Grill provides a unified Cube abstraction for data stored in different
stores. Grill tiers multiple data warehouses for unified representation and
efficient access. It provides an SQL-like Cube query language to query and
describe data sets organized in data cubes. It enables users to run queries
against Facts and Dimensions that can span multiple physical tables stored
in different stores.

The primary use cases that Grill aims to solve:
- Facilitate analytical queries by providing an OLAP-like Cube abstraction
- Data discovery by providing a single metadata layer for data stored in
different stores
- Unified access to data by integrating Hive with other traditional data
warehouses


# Background

Apache Hive is a data warehouse that facilitates querying and managing
large datasets stored in distributed storage systems like HDFS. It provides
an SQL-like language called HiveQL (aka HQL). Apache Hive is a widely used
platform in various organizations for doing ad-hoc analytical queries.
In a typical data warehouse scenario, the data is multi-dimensional and
organized into Facts and Dimensions to form Data Cubes. Grill provides this
logical layer to enable querying and managing data as Cubes.
The Grill project is actively being developed at InMobi to provide a
higher level of analytical abstraction for seamlessly querying data stored
in different storages, including Hive and beyond.


# Rationale

The Grill project aims to ease analytical querying and cut
data silos by providing a single view of data across multiple data
stores.
Conceiving data as a cube with hierarchical dimensions leads to
conceptually straightforward operations that facilitate analysis. Integrating
Apache Hive with other traditional warehouses provides the opportunity to
optimize query execution cost by tiering the data across multiple
warehouses. Grill provides:
- Access to data Cubes via a Cube query language similar to HiveQL.
- A driver-based architecture that allows plugging in systems like Hive and
other warehouses such as columnar RDBMSs.
- Cost-based engine selection that makes optimal use of resources by
selecting the best execution engine for a given query.
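
As a rough illustration of the last point (a hypothetical sketch only, not
Grill's driver interface), cost-based engine selection amounts to asking each
pluggable driver for a cost estimate and dispatching the query to the
cheapest one:

    # Hypothetical driver-selection sketch; class names and the cost model
    # are invented for illustration and are not Grill's driver interface.
    class HiveDriver:
        def estimate_cost(self, query):    # batch engine: better for large scans
            return 100.0 if "fact_large" in query else 500.0
        def execute(self, query):
            return "ran on Hive"

    class ColumnarDriver:
        def estimate_cost(self, query):    # columnar store: better for small aggregates
            return 800.0 if "fact_large" in query else 50.0
        def execute(self, query):
            return "ran on the columnar warehouse"

    def run(query, drivers):
        # Pick the driver that claims the lowest cost for this query.
        best = min(drivers, key=lambda d: d.estimate_cost(query))
        return best.execute(query)

    print(run("cube select avg(temp) from fact_small", [HiveDriver(), ColumnarDriver()]))
    print(run("cube select sum(spend) from fact_large", [HiveDriver(), ColumnarDriver()]))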

In a typical Data warehouse, data is organized in Cubes with multiple
dimensions and measures. This facilitates the analysis by conceiving the
data in terms of Facts and Dimensions instead of physical tables. Grill
aims to provide this logical Cube abstraction on Data warehouses like Hive
and other traditional warehouses.


# Initial Goals

- Donate the Grill source code and documentation to Apache Software
Foundation
- Build a user and developer community
- Support Hive and other Columnar data warehouses
- Support full query life cycle management
- Add authentication for querying cubes
- Provide detailed query statistics


# Long Term Goals

Here are some longer-term capabilities that would be added to Grill
- Add authorization for managing and querying Cubes
- Provide REST and CLI for full Admin controls
- Capability to schedule queries
- Query caching
- Integrate with Apache Spark. Creating Spark RDD from Grill query
- Integrate with Apache Optiq


# Current Status

The project is actively developed at InMobi. The first version was deployed
at InMobi 4 months ago. This version allows querying dimension and fact
data stored in Hive over a CLI. The source code and documentation are hosted
on GitHub.

## Meritocracy

We intend to build a diverse developer and user community for the project
following the Apache meritocracy model. We want to encourage contributors
from multiple organizations, provide plenty of support to new developers
and welcome them to be committers.

## Community

Currently the project is being developed at InMobi. We hope to extend our
contributor and user base significantly in the future and build a solid
open source community around Grill.

## Core Developers

Grill is currently being developed by Amareshwari Sriramadasu, Sharad
Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan, who is
currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
the QA automation for Grill.

## Alignment

The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache Hive,
Apache Spark and other emerging projects in the Big Data space.
We believe that in any enterprise multiple data warehouses will co-exist, as
not all workloads are cost effective to run on a single one. Apache Hive is
one of the crucial data warehouses in the Hadoop ecosystem, along with
upcoming projects like Apache Spark. Grill will benefit from working in close
proximity with these projects.
Traditional columnar data warehouses complement Apache Hive, as certain
workloads continue to be cost

Re: [VOTE] Argus as a new incubator project

2014-07-22 Thread Sharad Agarwal
+1 (non-binding)


On Mon, Jul 21, 2014 at 9:33 PM, Owen O'Malley omal...@apache.org wrote:

 Following the discussion earlier, I'm calling a vote to accept Argus as a
 new Incubator project.

  The proposal draft is available at:
 https://wiki.apache.org/incubator/ArgusProposal, and is also included
  below.

  Vote is open for 72h and closes at 24 July 2014 at 10am PST.

  [ ] +1 accept Argus in the Incubator
  [ ] +/-0
  [ ] -1 because...

 I'm +1.

 .. Owen



Re: [VOTE] Accept Optiq into the incubator

2014-05-12 Thread Sharad Agarwal
+1 (non-binding)


On Fri, May 9, 2014 at 11:33 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Based on the results of the discussion thread (

 http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E
  ),  I would like to call a vote on accepting Optiq into the incubator.

 [ ] +1 Accept Optiq into the Incubator
 [ ] +0 Indifferent to the acceptance of Optiq
 [ ] -1 Do not accept Optiq because ...

 The vote will be open until Tuesday May 13 18:00 UTC.

 https://wiki.apache.org/incubator/OptiqProposal

 = Optiq =
 == Abstract ==

 Optiq is a framework that allows efficient translation of queries involving
 heterogeneous and federated data.

 == Proposal ==

 Optiq is a highly customizable engine for parsing and planning queries on
 data in a wide variety of formats. It allows database-like access, and in
 particular a SQL interface and advanced query optimization, for data not
 residing in a traditional database.

 == Background ==

 Databases were traditionally engineered in a monolithic stack, providing a
 data storage format, data processing algorithms, query parser, query
 planner, built-in functions, metadata repository and connectivity layer.
 They innovate in some areas but rarely in all.

 Modern data management systems are decomposing that stack into separate
 components, separating data, processing engine, metadata, and query
 language support. They are highly heterogeneous, with data in multiple
 locations and formats, caching and redundant data, different workloads, and
 processing occurring in different engines.

 Query planning (sometimes called query optimization) has always been a key
 function of a DBMS, because it allows the implementors to introduce new
 query-processing algorithms, and allows data administrators to re-organize
 the data without affecting applications built on that data. In a
 componentized system, the query planner integrates the components (data
 formats, engines, algorithms) without introducing unnecessary coupling or
 performance tradeoffs.

 But building a query planner is hard; many systems muddle along without a
 planner, and indeed a SQL interface, until the demand from their customers
 is overwhelming.

 There is an opportunity to make this process more efficient by creating a
 re-usable framework.

 == Rationale ==

 Optiq allows database-like access, and in particular a SQL interface and
 advanced query optimization, for data not residing in a traditional
 database. It is complementary to many current Hadoop and NoSQL systems,
 which have innovative and performant storage and runtime systems but lack a
 SQL interface and intelligent query translation.

 Optiq is already in use by several projects, including Apache Drill, Apache
 Hive and Cascading Lingual, and commercial products.

 Optiq's architecture consists of:

  * An extensible relational algebra.
  * SPIs (service-provider interfaces) for metadata (schemas and tables),
 planner rules, statistics, cost-estimates, user-defined functions.
  * Built-in sets of rules for logical transformations and common
 data-sources.
  * Two query planning engines driven by rules, statistics, etc. One engine
 is cost-based, the other rule-based.
  * Optional SQL parser, validator and translator to relational algebra.
  * Optional JDBC driver.
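
 As a purely conceptual sketch of what rule-driven planning over a relational
 algebra means (illustrative Python, not Optiq's Java APIs), a planner rule
 matches a pattern in the expression tree and rewrites it, and the planner
 applies its rules until none fire any more:

    # Toy rule-driven rewriter over a tiny relational algebra; not Optiq code.
    # Expressions are nested tuples: ("scan", table), ("filter", pred, child),
    # ("project", cols, child).

    def push_filter_below_project(expr):
        """Rule: Filter(Project(x)) -> Project(Filter(x)); for brevity we
        assume the predicate only uses projected columns."""
        if expr[0] == "filter" and expr[2][0] == "project":
            pred, (_, cols, child) = expr[1], expr[2]
            return ("project", cols, ("filter", pred, child))
        return expr

    def apply_rules(expr, rules):
        # Rewrite children first, then apply rules at this node to a fixpoint.
        expr = tuple(apply_rules(e, rules) if isinstance(e, tuple) else e
                     for e in expr)
        changed = True
        while changed:
            changed = False
            for rule in rules:
                new = rule(expr)
                if new != expr:
                    expr, changed = new, True
        return expr

    plan = ("filter", "amount > 10",
            ("project", ("amount", "region"), ("scan", "orders")))
    print(apply_rules(plan, [push_filter_below_project]))
    # -> ('project', ('amount', 'region'), ('filter', 'amount > 10', ('scan', 'orders')))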

 == Initial Goals ==

 The initial goals are to move the existing codebase to Apache and
 integrate with the Apache development process. Once this is accomplished,
 we plan for incremental development and releases that follow the Apache
 guidelines.

 As we move the code into the org.apache namespace, we will restructure
 components as necessary to allow clients to use just the components of
 Optiq that they need.

 A version 1.0 release, including pre-built binaries, will foster wider
 adoption.

 == Current Status ==

 Optiq has had over a dozen minor releases over the last 18 months. Its core
 SQL parser and validator, and its planning engine and core rules, are
 mature and robust and are the basis for several production systems; but
 other components and SPIs are still undergoing rapid evolution.

 === Meritocracy ===

 We plan to invest in supporting a meritocracy. We will discuss the
 requirements in an open forum. We encourage the companies and projects
 using Optiq to discuss their requirements in an open forum and to
 participate in development. We will encourage and monitor community
 participation so that privileges can be extended to those that contribute.

 Optiq's pluggable architecture encourages developers to contribute
 extensions such as adapters for data sources, new planning rules, and
 better statistics and cost-estimation functions. We look forward to
 fostering a rich ecosystem of extensions.

 === Community ===

 Building a data management system requires a high degree of technical
 skill, and correspondingly, the community of developers directly using
 

Re: [VOTE] Accept Storm into the Incubator

2013-09-14 Thread Sharad Agarwal
+1 (non-binding)


On Fri, Sep 13, 2013 at 12:49 AM, Doug Cutting cutt...@apache.org wrote:

 Discussion about the Storm proposal has subsided, issues raised now
 seemingly resolved.

 I'd like to call a vote to accept Storm as a new Incubator podling.

 The proposal is included below and is also at:

   https://wiki.apache.org/incubator/StormProposal

 Let's keep the vote open for four working days, until 18 September.

 [ ] +1 Accept Storm into the Incubator
 [ ] +0 Don't care.
 [ ] -1 Don't accept Storm because...

 Doug


 = Storm Proposal =

 == Abstract ==

 Storm is a distributed, fault-tolerant, and high-performance realtime
 computation system that provides strong guarantees on the processing
 of data.

 == Proposal ==

 Storm is a distributed real-time computation system. Similar to how
 Hadoop provides a set of general primitives for doing batch
 processing, Storm provides a set of general primitives for doing
 real-time computation. Its use cases span stream processing,
 distributed RPC, continuous computation, and more. Storm has become a
 preferred technology for near-realtime big-data processing by many
 organizations worldwide (see a partial list at
 https://github.com/nathanmarz/storm/wiki/Powered-By). As an open
 source project, Storm’s developer community has grown rapidly to 46
 members.

 == Background ==

 The past decade has seen a revolution in data processing. MapReduce,
 Hadoop, and related technologies have made it possible to store and
 process data at scales previously unthinkable. Unfortunately, these
 data processing technologies are not realtime systems, nor are they
 meant to be. The lack of a "Hadoop of realtime" has become the biggest
 hole in the data processing ecosystem. Storm fills that hole.

 Storm was initially developed and deployed at BackType in 2011. After
 7 months of development BackType was acquired by Twitter in July 2011.
 Storm was open sourced in September 2011.

 Storm has been under continuous development on its Github repository
 since being open-sourced. It has undergone four major releases (0.5,
 0.6, 0.7, 0.8) and many minor ones.


 == Rationale ==

 Storm is a general platform for low-latency big-data processing. It is
 complementary to the existing Apache projects, such as Hadoop. Many
 applications are actually exploring using both Hadoop and Storm for
 big-data processing. Bringing Storm into Apache is very beneficial to
 both the Apache community and the Storm community.

 The rapid growth of the Storm community is empowered by open source. We
 believe the Apache Foundation is a great fit as the long-term home for
 Storm, as it provides an established process for community-driven
 development and decision making by consensus. This is exactly the
 model we want for future Storm development.

 == Initial Goals ==

* Move the existing codebase to Apache
* Integrate with the Apache development process
* Ensure all dependencies are compliant with Apache License version 2.0
* Incremental development and releases per Apache guidelines

 == Current Status ==

 Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many
 minor ones. Storm 0.9 is about to be released. Storm is being used in
 production by over 50 organizations. Storm codebase is currently
 hosted at github.com, which will seed the Apache git repository.

 === Meritocracy ===

 We plan to invest in supporting a meritocracy. We will discuss the
 requirements in an open forum. Several companies have already
 expressed interest in this project, and we intend to invite additional
 developers to participate. We will encourage and monitor community
 participation so that privileges can be extended to those that
 contribute.

 === Community ===

 The need for a low-latency big-data processing platform in open
 source is tremendous. Storm is currently being used by at least 50
 organizations worldwide (see
 https://github.com/nathanmarz/storm/wiki/Powered-By), and is the most
 starred Java project on Github. By bringing Storm into Apache, we
 believe that the community will grow even bigger.

 === Core Developers ===

 Storm was started by Nathan Marz at BackType, and now has developers
 from Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.

 === Alignment ===

 In the big-data processing ecosystem, Storm is a very popular
 low-latency platform, while Hadoop is the primary platform for batch
 processing. We believe that having Hadoop and Storm aligned within the
 Apache Foundation will help the further growth of the big-data
 community. The alignment is also beneficial to other Apache
 communities (such as Zookeeper, Thrift, Mesos). We could include
 additional sub-projects, Storm-on-YARN and Storm-on-Mesos, in the near
 future.

 == Known Risks ==

 === Orphaned Products ===

 The risk of the Storm project being abandoned is minimal. There are at
 least 50 organizations (Twitter, Yahoo!, Microsoft, Groupon, Baidu,
 Alibaba, Alipay, Taobao, PARC, RocketFuel etc) 

Re: [PROPOSAL] Storm for Apache Incubator

2013-09-04 Thread Sharad Agarwal
+1 (non-binding)


Re: [VOTE] Accept Falcon into the Apache Incubator (was originally named Ivory)

2013-03-21 Thread Sharad Agarwal
 those who make solid contributions to committer status.

 === Community ===
 We are happy to report that the initial team already represents
 multiple organizations. We hope to extend the user and developer base
 further in the future and build a solid open source community around
 Falcon.

 === Core Developers ===
 Falcon is currently being developed by three engineers from InMobi –
 Srikanth Sundarrajan, Shwetha G S, and Shaik Idris – and two Hortonworks
 employees – Sanjay Radia and Venkatesh Seetharam. In addition, Rohini
 Palaniswamy and Thiruvel Thirumoolan were also involved in the
 initial design discussions. Srikanth, Shwetha and Shaik are the
 original developers. All the engineers have built two generations of
 Data Management on Hadoop, have deep expertise in Hadoop and are
 quite familiar with the Hadoop Ecosystem. Samarth Gupta & Rishu
 Mehrothra, both from InMobi, have built the QA automation for Falcon.

 === Alignment ===
 The ASF is a natural host for Falcon given that it is already the home
 of Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
 projects. Falcon has been designed to solve the data management
 challenges and opportunities of the Hadoop ecosystem family of
 products. Falcon fills a gap in the Hadoop ecosystem in the areas of
 data processing and data lifecycle management.

 == Known Risks ==

 === Orphaned products & Reliance on Salaried Developers ===
 The core developers plan to work full time on the project. There is
 very little risk of Falcon getting orphaned. Falcon is in use by the
 companies we work for, so those companies have an interest in its
 continued vitality.

 === Inexperience with Open Source ===
 All of the core developers are active users and followers of open
 source. Srikanth Sundarrajan has been contributing patches to Apache
 Hadoop and Apache Oozie, and Shwetha GS has been contributing patches to
 Apache Oozie. Seetharam Venkatesh is a committer on Apache Knox.
 Sharad Agarwal, Amareshwari SR (also an Apache Hive PMC member) and
 Sanjay Radia are PMC members on Apache Hadoop.

 === Homogeneous Developers ===
 The current core developers are from a diverse set of organizations such
 as InMobi and Hortonworks. We expect to quickly establish a developer
 community that includes contributors from several corporations post
 incubation.

 === Reliance on Salaried Developers ===
 Currently, most developers are paid to work on Falcon, but a few are
 contributing in their spare time. However, once the project has a
 community built around it post incubation, we expect to get committers
 and developers from outside the current core developers.

 === Relationships with Other Apache Products ===
 Falcon is going to be used by the users of Hadoop and the Hadoop
 ecosystem in general.

 === An Excessive Fascination with the Apache Brand ===
 While we respect the reputation of the Apache brand and have no doubts
 that it will attract contributors and users, our interest is primarily
 to give Falcon a solid home as an open source project following an
 established development model. We have also given reasons in the
 Rationale and Alignment sections.

 == Documentation ==
 http://wiki.apache.org/incubator/FalconProposal

 == Initial Source ==
 The source is currently in github repository at:
 https://github.com/sriksun/Falcon

 == Source and Intellectual Property Submission Plan ==
 The complete Falcon code is under Apache Software License 2.

 == External Dependencies ==
 The dependencies all have Apache compatible licenses. These include
 BSD, MIT licensed dependencies.

 == Cryptography ==
 None

 == Required Resources ==

 === Mailing lists ===

  * falcon-dev AT incubator DOT apache DOT org
  * falcon-commits AT incubator DOT apache DOT org
  * falcon-user AT incubator apache DOT org
  * falcon-private AT incubator DOT apache DOT org

 === Subversion Directory ===
 Git is the preferred source control system: git://git.apache.org/falcon

 === Issue Tracking ===
 JIRA FALCON

 == Initial Committers ==
  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
  * Shwetha GS (shwetha.gs AT inmobi DOT com)
  * Shaik Idris (shaik.idris AT inmobi DOT com)
  * Venkatesh Seetharam (Venkatesh AT apache DOT org)
  * Sanjay Radia (sanjay AT apache DOT org)
  * Sharad Agarwal (sharad AT apache DOT org)
  * Amareshwari SR (amareshwari AT apache DOT org)
  * Samarth Gupta (samarth.gupta AT inmobi DOT com)
  * Rishu Mehrothra (rishu.mehrothra AT inmobi DOT com)

 == Affiliations ==
  * Srikanth Sundarrajan (InMobi)
  * Shwetha GS (InMobi)
  * Shaik Idris (InMobi)
  * Venkatesh Seetharam (Hortonworks Inc.)
  * Sanjay Radia (Hortonworks Inc.)
  * Sharad Agarwal (InMobi)
  * Amareshwari SR (InMobi)
  * Samarth Gupta (InMobi)
  * Rishu Mehrothra (InMobi)

 == Sponsors ==

 === Champion ===
  * Arun C Murthy (acmurthy at apache dot org)

 === Nominated Mentors ===
  * Alan Gates (gates AT apache DOT org)
  * Chris Douglas (cdouglas AT apache DOT org)

Re: [VOTE] Accept Tajo into the Apache Incubator

2013-02-28 Thread Sharad Agarwal
+1 (non-binding)

On Thu, Feb 28, 2013 at 11:41 PM, Hyunsik Choi hyun...@apache.org wrote:

 Hi Folks,

 I'd like to call a VOTE for acceptance of Tajo into the Apache incubator.
 The vote will close on Mar 7 at 6:00 PM (PST).

 [] +1 Accept Tajo into the Apache incubator
 [] +0 Don't care.
 [] -1 Don't accept Tajo into the incubator because...

 Full proposal is pasted at the bottom on this email, and the corresponding
 wiki is http://wiki.apache.org/incubator/TajoProposal.

 Only VOTEs from Incubator PMC members are binding, but all are welcome to
 express their thoughts.

 Thanks,
 Hyunsik

 PS: From the initial discussion, the main changes are that I've added 4 new
 committers. Also, I've revised some of the description of Known Risks because
 the initial committers are now diverse.

 
 Tajo Proposal

 = Abstract =

 Tajo is a distributed data warehouse system for Hadoop.


 = Proposal =

 Tajo is a relational and distributed data warehouse system for Hadoop. Tajo
 is designed for low-latency and scalable ad-hoc queries, online aggregation
 and ETL on large-data sets by leveraging advanced database techniques. It
 supports SQL standards. Tajo is inspired by Dryad, MapReduce, Dremel,
 Scope, and parallel databases. Tajo uses HDFS as a primary storage layer,
 and it has its own query engine which allows direct control of distributed
 execution and data flow. As a result, Tajo has a variety of query
 evaluation strategies and more optimization opportunities. In addition,
 Tajo will have a native columnar execution and its optimizer. Tajo will
 be an alternative choice to Hive/Pig on top of MapReduce.


 = Background =

 Big data analysis has gained much attention in industry. Open source
 communities have proposed scalable and distributed solutions for ad-hoc
 queries on big data. However, there is still room for improvement. Markets
 need faster and more efficient solutions. Recently, some alternatives
 (e.g., Cloudera's Impala and Amazon Redshift) have come out.


 = Rationale =

 There are a variety of open source distributed execution engines (e.g.,
 Hive and Pig) running on top of MapReduce. They are limited by the MR
 framework: they cannot directly control distributed execution and data
 flow, and they just use the MR framework as-is. So, they have limited query
 evaluation strategies and optimization opportunities. It is hard for them
 to be optimized for a certain type of data processing.


 = Initial Goals =

 The initial goal is to write more documents describing Tajo's internals.
 This will be helpful for recruiting more committers and building a solid
 community. Then, we will make milestones for short- and long-term plans.


 = Current Status =

 Tajo is in the alpha stage. Users can execute the usual SQL queries (e.g.,
 selection, projection, group-by, join, union and sort) except for nested
 queries. Tajo provides various row/column storage formats, such as CSV,
 RowFile (a row-store file we have implemented), RCFile, and Trevni, and it
 also has a rudimentary ETL feature to transform one data format into
 another. In addition, Tajo provides hash and range repartitions. By
 using both repartition methods, Tajo processes aggregation, join, and sort
 queries over a number of cluster nodes. To evaluate the performance, we
 have carried out benchmark tests using TPC-H 1TB on 32 cluster nodes.
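
 As a small, purely illustrative sketch (not Tajo code) of the two repartition
 strategies mentioned above: hash repartitioning routes rows by a hash of the
 key, so equal keys meet on the same worker (useful for joins and group-by),
 while range repartitioning routes rows by key ranges, which is what a
 distributed sort needs:

    # Illustrative hash vs. range repartitioning of rows across workers;
    # this is a sketch of the general idea, not Tajo code.
    from bisect import bisect_left

    rows = [("apple", 4), ("banana", 7), ("cherry", 1), ("date", 9), ("fig", 3)]

    def hash_partition(rows, workers=3, key=lambda r: r[0]):
        """Equal keys land on the same worker -> suits joins and group-by."""
        parts = [[] for _ in range(workers)]
        for row in rows:
            parts[hash(key(row)) % workers].append(row)
        return parts

    def range_partition(rows, boundaries=("c", "e"), key=lambda r: r[0]):
        """Workers own contiguous key ranges -> per-worker sorts give a global sort."""
        parts = [[] for _ in range(len(boundaries) + 1)]
        for row in rows:
            parts[bisect_left(boundaries, key(row))].append(row)
        return parts

    print(hash_partition(rows))
    print(range_partition(rows))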


 == Meritocracy ==

 We will discuss the milestone and the future plan in an open forum. We plan
 to encourage an environment that supports a meritocracy. The contributors
 will have different privileges according to their contributions.


 == Community ==

 Big data analysis has gained attention from open source communities and
 from industrial and academic areas. Some projects related to Hadoop already
 have very large and active communities. We expect that Tajo will also
 establish an active community. Since Tajo already provides some working
 features and is in the alpha stage, it will attract a large community soon.


 == Core Developers ==

 Core developers are a diverse group of developers, many of which are very
 experienced in open source and the Apache Hadoop ecosystem.

  * Eli Reisman ereisman AT apache DOT org

  * Henry Saputra hsaputra AT apache DOT org

  * Hyunsik Choi hyunsik AT apache DOT org

  * Jae Hwa Jung jhjung AT gruter DOT com

  * Jihoon Son ghoonson AT gmail DOT com

  * Jin Ho Kim jhkim AT gruter DOT com

  * Roshan Sumbaly rsumbaly AT gmail DOT com

  * Sangwook Kim swkim AT inervit DOT com

  * Yi A Liu yi DOT a DOT liu AT intel DOT com


 == Alignment ==

 Tajo employs Apache Hadoop YARN as a resource management platform for large
 clusters. It uses HDFS as a primary storage layer. It already supports
 Hadoop-related data formats (RCFile, Trevni) and will support the ORC file
 format. In addition, we plan to integrate Tajo with other products in the
 Hadoop ecosystem. Tajo's modules are well organized, and these modules can
 also be used by other projects.


 = Known Risks =

 == 

Re: [VOTE] Accept Tez into Incubator

2013-02-22 Thread Sharad Agarwal
+1 (non-binding)

sharad