Re: [VOTE] Graduate Lens from the Incubator

2015-07-24 Thread Sharad Agarwal
+1 (binding)

On Sat, Jul 25, 2015 at 3:49 AM, Jakob Homan  wrote:

> Following two positive discussions[1][2] about its current status, the
> Lens community has voted[3] to graduate from the Incubator.  The vote
> passed with 22 +1s:
>
> Binding +1 x 14: {Jakob, Jean-Baptiste, Yash, Amareshwari, Sharad,
> Raghavendra, Raju, Jaideep, Suma, Himanshu, Rajat, Srikanth, Chris,
> Arshad}
>
> Non-binding +1 x 8: {Jothi, Kartheek, Tushar, Nitin, Pranav, Deepak,
> Ajay, Naresh}
>
> The Lens community has:
> * completed all required paperwork:
> https://incubator.apache.org/projects/lens.html
> * completed multiple releases (2.0.1-beta, 2.1.0-beta, 2.2.0-beta)
> * completed the name check procedure:
> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-63
> * opened nearly 700 JIRAs:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20LENS
> * voted in multiple new committers/PPMC members.
> * been recommended as ready to graduate by the Incubator's shepherds:
> https://wiki.apache.org/incubator/July2015
>
> Therefore, I'm calling a VOTE to graduate Lens with the following
> Board resolution.  The VOTE will run 96 hours (an extra day since
> we're starting on a Friday), ending Tuesday July 28 4 PM PST.
>
> [ ] +1 Graduate Apache Lens from the Incubator.
> [ ] +0 Don't care.
> [ ] -1 Don't graduate Apache Lens from the Incubator because ...
>
> Here's my binding vote: +1.
> -Jakob
>
> [1] http://s.apache.org/LensGradDiscuss1
> [2] http://s.apache.org/LensGradDiscuss2
> [3] http://s.apache.org/LensGradVotePPMC
>
>  Apache Lens graduation resolution draft
> WHEREAS, the Board of Directors deems it to be in the best interests of
> the Foundation and consistent with the Foundation's purpose to establish
> a Project Management Committee charged with the creation and maintenance
> of open-source software, for distribution at no charge to the public,
> related to unified analytics across multiple tiered data stores.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
> (PMC), to be known as the "Apache Lens Project", be and hereby is
> established pursuant to Bylaws of the Foundation; and be it further
>
> RESOLVED, that the Apache Lens Project be and hereby is responsible
> for the creation and maintenance of software related to unified analytics
> across multiple tiered data stores; and be it further
>
> RESOLVED, that the office of "Vice President, Apache Lens" be and hereby
> is created, the person holding such office to serve at the direction of
> the Board of Directors as the chair of the Apache Lens Project, and to
> have primary responsibility for management of the projects within the
> scope of responsibility of the Apache Lens Project; and be it further
>
> RESOLVED, that the persons listed immediately below be and hereby are
> appointed to serve as the initial members of the Apache Lens Project:
>
> * Amareshwari Sriramadasu 
> * Arshad Matin 
> * Gunther Hagleitner 
> * Himanshu Gahlaut 
> * Jaideep Dhok 
> * Jean Baptiste Onofre 
> * Raghavendra Singh 
> * Rajat Khandelwal 
> * Raju Bairishetti 
> * Sharad Agarwal 
> * Sreekanth Ramakrishnan 
> * Srikanth Sundarrajan 
> * Suma Shivaprasad 
> * Vikram Dixit 
> * Vinod Kumar Vavilapalli 
> * Yash Sharma 
>
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Amareshwari Sriramadasu
> be appointed to the office of Vice President, Apache Lens, to serve in
> accordance with and subject to the direction of the Board of Directors
> and the Bylaws of the Foundation until death, resignation, retirement,
> removal or disqualification, or until a successor is appointed; and be
> it further
>
> RESOLVED, that the Apache Lens Project be and hereby is tasked with
> the migration and rationalization of the Apache Incubator Lens
> podling; and be it further
>
> RESOLVED, that all responsibilities pertaining to the Apache Incubator
> Lens podling encumbered upon the Apache Incubator Project are
> hereafter discharged.
>


Re: [VOTE] Release of Apache Lens 2.2.0-beta-incubating

2015-07-13 Thread Sharad Agarwal
+1 (binding)

On Sun, Jul 12, 2015 at 8:39 AM, Jaideep Dhok 
wrote:

> Hello everyone,
>
> This is the call for vote for the following RC to be released as the
> official Apache Lens 2.2.0-beta-incubating release. This is our third release.
>
> Apache Lens provides a Unified Analytics interface. Lens aims to cut
> Data Analytics silos by providing a single view of data across multiple
> tiered data stores and an optimal execution environment for analytical
> queries. It seamlessly integrates Hadoop with traditional data warehouses
> to appear as one.
>
> Vote on dev list:
>
> http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201507.mbox/%3CCAPYoVThzQCHdYVASR35zeYqHj_tWo93GuzTLzCrmRAq3qMjecg%40mail.gmail.com%3E
>
> Result of vote on dev list:
>
> http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201507.mbox/%3CCAPYoVThOEAeMiNdtef%3D35QxRLetryRFKs3ED-oeCh2xi1KEqww%40mail.gmail.com%3E
>
> The commit id is 9c45f1cb4c69ec5de6fe3320abdd5bd85c250e9f:
> https://git-wip-us.apache.org/repos/asf/incubator-lens/repo?p=incubator-lens.git;a=commit;h=9c45f1cb4c69ec5de6fe3320abdd5bd85c250e9f
>
> This corresponds to the tag: apache-lens-2.2.0-beta-incubating:
> https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=tag;h=refs/tags/apache-lens-2.2.0-beta-incubating
>
>
> The release archives (tar.gz/.zip), signature, and checksums are
> here:
> * https://dist.apache.org/repos/dist/dev/incubator/lens/apache-lens-2.2.0-beta-incubating-rc0/
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/release/incubator/lens/KEYS
>
> The release candidate consists of the following source distribution
> archive:
> apache-lens-2.2.0-beta-incubating-source-release.zip
>
> In addition, the following supplementary binary distributions are
> provided for user convenience at the same location:
> apache-lens-2.2.0-beta-incubating-bin.tar.gz
>
> The licensing of bundled bits in the archives has not changed from the
> previous release and is documented at
> https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens
>
> The Nexus Staging URL:
> https://repository.apache.org/content/repositories/orgapachelens-1005
>
> Release notes available at
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329586&projectId=12315923
>
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 approve
> [ ] 0 no opinion
> [ ] -1 disapprove (and reason why)
>
> +1 from my side for the release.
>
> Thanks,
> Jaideep Dhok
>


Re: [VOTE] Accept Apache Atlas into Apache Incubator

2015-05-04 Thread Sharad Agarwal
+1 (binding)

On Fri, May 1, 2015 at 12:56 PM, Seetharam Venkatesh <
venkat...@innerzeal.com> wrote:

> Hello folks,
>
> Following the discussion earlier in the thread: http://s.apache.org/r2
>
> I would like to call a VOTE for accepting Apache Atlas as a new incubator
> project.
>
> The proposal is available at:
> https://wiki.apache.org/incubator/AtlasProposal
> Also, the text of the latest wiki proposal is included at the bottom of
> this email.
>
> The VOTE is open for at least the next 72 hours:
>
>  [ ] +1 accept Apache Atlas into the Apache Incubator
>  [ ] ±0 Abstain
>  [ ] -1 because...
>
> Of course I am +1! (non-binding)
>
> Thanks!
>
>
> = Apache Atlas Proposal =
>
> == Abstract ==
>
> Apache Atlas is a scalable and extensible set of core foundational
> governance services that enables enterprises to effectively and efficiently
> meet their compliance requirements within Hadoop and allows integration
> with the complete enterprise data ecosystem.
>
> == Proposal ==
>
> Apache Atlas allows agnostic governance visibility into Hadoop. These
> abilities are enabled through a set of core foundational services powered
> by a flexible metadata repository.
>
> These services include:
>
>  * Search and Lineage for datasets
>  * Metadata driven data access control
>  * Indexed and searchable centralized auditing of operational events
>  * Data lifecycle management – ingestion to disposition
>  * Metadata interchange with other metadata tools
>
> == Background ==
>
> Hadoop is one of many platforms in the modern enterprise data ecosystem and
> requires governance controls commensurate with this reality.
>
> Currently, there is no easy or complete way to provide comprehensive
> visibility and control into Hadoop audit, lineage, and security for
> workflows that require Hadoop and non-Hadoop processing.
>
> Many solutions are point-based and require a monolithic
> application workflow.  Multi-tenancy and concurrency are problematic, as
> these offerings are not aware of activity outside of their narrow focus.
>
> As Hadoop gains greater popularity, governance concerns will become
> increasingly vital to maturing the platform and furthering adoption. They
> are a particular barrier to expanding enterprise data under management.
>
> == Rationale ==
>
> Atlas will address the issues discussed above by providing governance
> capabilities in Hadoop -- using both a prescriptive and a forensic model
> enriched by business taxonomical metadata. Atlas, at its core, is
> designed to exchange metadata with other tools and processes within and
> outside of the Hadoop stack -- enabling governance controls that are truly
> platform agnostic and that effectively (and defensibly) address compliance
> concerns.
>
> Initially working with a group of leading partners in several industries,
> Atlas is built to solve specific real-world governance problems, which
> accelerates product maturity and time to value.
>
> Atlas aims to grow a community to help build a widely adopted pattern for
> governance, metadata modeling and exchange in Hadoop – which will advance
> the interests of the whole community.
>
> == Current Status ==
>
> An initial version with a valuable set of features has been developed by
> the initial committers and is hosted on GitHub.
>
> === Meritocracy ===
>
> Our intent with this proposal is to start building a diverse developer
> community around Atlas following the Apache meritocracy model. We have
> wanted to make the project open source and to encourage contributors from
> multiple organizations from the start.
>
> We plan to provide plenty of support to new developers and to quickly
> recruit those who make solid contributions to committer status.
>
> === Community ===
>
> We are happy to report that the initial team already represents multiple
> organizations. We hope to extend the user and developer base further in the
> future and build a solid open source community around Atlas.
>
> === Core Developers ===
>
> Atlas development is currently being led by engineers from Hortonworks –
> Harish Butani, Venkatesh Seetharam, Shwetha G S, and Jon Maron. All the
> engineers have deep expertise in Hadoop and are quite familiar with the
> Hadoop Ecosystem.
>
> === Alignment ===
>
> The ASF is a natural host for Atlas given that it is already the home of
> Hadoop, Falcon, Hive,  Pig, Oozie, Knox, Ranger, and other emerging “big
> data” software projects.
>
> Atlas has been designed to solve the data governance challenges and
> opportunities of the Hadoop ecosystem family of products as well as
> integration with the traditional Enterprise Data ecosystem.
>
> Atlas fills a gap in the Hadoop Ecosystem in the areas
> of data governance and compliance management.
>
> == Known Risks ==
>
> === Orphaned products & Reliance on Salaried Developers ===
> The core developers plan to work full time on the project. There is very
> little risk of Atlas getting orphaned.  A prototype of Atlas is in use and
> being

Re: [VOTE] Release of Apache Lens 2.1.0-beta-incubating

2015-05-04 Thread Sharad Agarwal
+1 (binding)

On Thu, Apr 30, 2015 at 5:35 PM, Amareshwari Sriramdasu <
amareshw...@apache.org> wrote:

> Hello everyone,
>
> This is the call for vote for the following RC to be released as the
> official Apache Lens 2.1.0-beta-incubating release. This is our second release.
>
> Apache Lens provides a Unified Analytics interface. Lens aims to cut
> Data Analytics silos by providing a single view of data across multiple
> tiered data stores and an optimal execution environment for analytical
> queries. It seamlessly integrates Hadoop with traditional data warehouses
> to appear as one.
> Vote on dev list:
>
> http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201504.mbox/%3CCABJEuZfT4HDK3c4rKxPg0_Kkc8KDfRjUr%2BHmKaJH44H77OeU0g%40mail.gmail.com%3E
>
> Results of vote on dev list:
>
> http://mail-archives.apache.org/mod_mbox/incubator-lens-dev/201504.mbox/%3CCABJEuZe7rbjbwoiiOWKL8Lef%3Dsc%2BXcV173aiQ6Tpdwq7jz9ycQ%40mail.gmail.com%3E
>
> The commit id is fdd19b9c2b17e329465cbde62dbce6f8be435cec:
> https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=commit;h=fdd19b9c2b17e329465cbde62dbce6f8be435cec
>
> This corresponds to the tag: apache-lens-2.1.0-beta-incubating:
> https://git-wip-us.apache.org/repos/asf?p=incubator-lens.git;a=tag;h=refs/tags/apache-lens-2.1.0-beta-incubating
>
> The release archives (tar.gz/.zip), signature, and checksums are
> here:
> https://dist.apache.org/repos/dist/dev/incubator/lens/apache-lens-2.1.0-beta-incubating-rc0
>
> You can find the KEYS file here:
> * https://dist.apache.org/repos/dist/release/incubator/lens/KEYS
>
> The release candidate consists of the following source distribution
> archive:
> apache-lens-2.1.0-beta-incubating-source-release.zip
>
> In addition, the following supplementary binary distributions are
> provided for user convenience at the same location:
> apache-lens-2.1.0-beta-incubating-bin.tar.gz
>
> The licensing of bundled bits in the archives has not changed from the
> previous release and is documented at
> https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens
>
> The Nexus Staging URL:
> https://repository.apache.org/content/repositories/orgapachelens-1003
>
> Release notes available at
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315923&version=12328991
>
>
> The vote will be open for at least 72 hours. Please vote on releasing this RC.
>
> [ ] +1 approve
> [ ] 0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Amareshwari
>


Re: [VOTE] Accept Zeppelin into the Apache Incubator

2014-12-18 Thread Sharad Agarwal
+1 (non-binding)

On Fri, Dec 19, 2014 at 10:59 AM, Roman Shaposhnik  wrote:
>
> Following the discussion earlier:
> http://s.apache.org/kTp
>
> I would like to call a VOTE for accepting
> Zeppelin as a new Incubator project.
>
> The proposal is available at:
> https://wiki.apache.org/incubator/ZeppelinProposal
> and is also attached to the end of this email.
>
> Vote is open until at least Sunday, 21st December 2014,
> 23:59:00 PST
>
> [ ] +1 Accept Zeppelin into the Incubator
> [ ] ±0 Indifferent to the acceptance of Zeppelin
> [ ] -1 Do not accept Zeppelin because ...
>
> Thanks,
> Roman.
>
> == Abstract ==
> Zeppelin is a collaborative data analytics and visualization tool for
> distributed, general-purpose data processing systems such as Apache
> Spark, Apache Flink, etc.
>
> == Proposal ==
> Zeppelin is a modern web-based tool for data scientists to
> collaborate on large-scale data exploration and visualization
> projects. It is a notebook-style interpreter that enables collaborative
> analysis sessions to be shared between users. Zeppelin is independent of
> the execution framework itself. The current version runs on top of Apache
> Spark, but it has pluggable interpreter APIs to support other data
> processing systems. More execution frameworks could be added at a
> later date, e.g. Apache Flink or Crunch, as well as SQL-like backends such
> as Hive, Tajo, and MRQL.
>
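For illustration only, here is a purely hypothetical sketch of what a pluggable interpreter contract could look like; this is not Zeppelin's actual interpreter API, and every name below (Interpreter, InterpreterResult-style return value, EchoInterpreter) is invented for the example:

    // Hypothetical illustration of a pluggable interpreter contract; the
    // interface and class names are invented and do not reflect Zeppelin's API.
    interface Interpreter {
        void open();                        // acquire a session with the backend
        String interpret(String paragraph); // run one notebook paragraph
        void close();                       // release backend resources
    }

    class EchoInterpreter implements Interpreter {
        public void open() { /* nothing to set up for this toy backend */ }

        public String interpret(String paragraph) {
            return "echo: " + paragraph;    // a real backend would submit the code
        }

        public void close() { }

        public static void main(String[] args) {
            Interpreter interp = new EchoInterpreter();
            interp.open();
            System.out.println(interp.interpret("1 + 1"));
            interp.close();
        }
    }

The point of such a contract is that the notebook front end never needs to know which engine runs a paragraph; Spark, Flink or a SQL backend would each be one implementation behind the same interface.
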
> We have a strong preference for the project to be called Zeppelin. In
> case that may not be feasible, alternative names could be: “Mir”,
> “Yuga” or “Sora”.
>
> == Background ==
> A large-scale data analysis workflow includes multiple steps like data
> acquisition, pre-processing, visualization, etc., and may include
> inter-operation of multiple different tools and technologies. With the
> widespread adoption of open source general-purpose data processing systems
> like Spark, there is a lack of open source, modern, user-friendly tools
> that combine the strengths of an interpreted language for data analysis
> with new in-browser visualization libraries and collaborative capabilities.
>
> Zeppelin initially started as a GUI tool for a diverse set of
> SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It has been open
> source since its inception in Sep 2013. Later, it became clear that
> there was a need for a broader web-based tool for data scientists to
> collaborate on data exploration over large-scale projects, not
> limited to SQL. So Zeppelin integrated full support for Apache Spark
> while adding a collaborative environment with the ability to run and
> share interpreter sessions in-browser.
>
> == Rationale ==
> There are no open source alternatives for a collaborative
> notebook-based interpreter with support of multiple distributed data
> processing systems.
>
> As the number of companies adopting and contributing back to Zeppelin
> grows, we think that having a long-term home at the Apache foundation
> would be a great fit for the project, ensuring that processes and
> procedures are in place to keep the project and community “healthy” and
> free of any commercial, political or legal faults.
>
> == Initial Goals ==
> The initial goals will be to move the existing codebase to Apache and
> integrate with the Apache development process. This includes moving
> all infrastructure that we currently maintain, such as a website, a
> mailing list, an issues tracker and a Jenkins CI, as mentioned in the
> “Required Resources” section of the current proposal.
> Once this is accomplished, we plan for incremental development and
> releases that follow the Apache guidelines.
> To increase adoption, the major goal for the project would be to
> provide integration with as many projects from the Apache data ecosystem
> as possible, including new interpreters for Apache Hive and Apache Drill,
> and adding a Zeppelin distribution to Apache Bigtop.
> On the community-building side, the main goal is to attract a diverse
> set of contributors by promoting Zeppelin to a wide variety of
> engineers, starting Zeppelin user groups around the globe and
> engaging with other existing Apache project communities online.
>
>
> == Current Status ==
> Currently, Zeppelin has 4 released versions and is used in production
> at a number of companies across the globe, mentioned in the Affiliation
> section. The current implementation status is pre-release, with the public
> API not yet finalized. The current main and default backend processing
> engine is Apache Spark, with consistent support for SparkSQL.
> Zeppelin is distributed as a binary package which includes an embedded
> webserver, the application itself, a set of libraries and startup/shutdown
> scripts. No platform-specific installation packages are provided yet,
> but that is something we are looking to provide as part of the Apache
> Bigtop integration.
> The project codebase is currently hosted at github.com, which will form
> the basis of the Apache git repository.
>
> === Meritocracy ===
> Zeppelin is an open source project that already leverages meritocracy
> p

[RESULT] [VOTE] Accept Lens into the Apache Incubator (earlier called Grill)

2014-10-09 Thread Sharad Agarwal
The vote has passed with 9 binding +1s, 5 non-binding +1s, and no 0s or -1s.

Binding +1s:
Jean Baptiste
Jan i
Alan D Cabrera
Jakob Homan
Chris Douglas
Roman Shaposhnik
Joe Brockmeier
Vinod K V
Suresh Srinivas


Non-binding +1s:
Sharad Agarwal
Amareshwari S
Seetharam Venkatesh
Srikanth Sundarrajan
Ashish

Thanks everyone for voting. We will proceed with the next steps as per the
IPMC guidelines.

Thanks
Sharad




On Mon, Oct 6, 2014 at 5:21 PM, Sharad Agarwal  wrote:

> Following the discussion earlier in the thread
> https://www.mail-archive.com/general@incubator.apache.org/msg45208.html
> I would like to call a Vote for accepting Lens as a new incubator project.
>
> The proposal is available at:
> https://wiki.apache.org/incubator/LensProposal
>
> The vote is open until Oct 09, 2014, 4 PM PST.
>
>  [ ] +1 accept Lens in the Incubator
>  [ ] +/-0
>  [ ] -1 because...
>
> Only Votes from Incubator PMC members are binding, but all are welcome to
> express their thoughts.
> I am +1 (non-binding).
>
> Thanks
> Sharad
>


[VOTE] Accept Lens into the Apache Incubator (earlier called Grill)

2014-10-06 Thread Sharad Agarwal
Following the discussion earlier in the thread
https://www.mail-archive.com/general@incubator.apache.org/msg45208.html
I would like to call a Vote for accepting Lens as a new incubator project.

The proposal is available at:
https://wiki.apache.org/incubator/LensProposal

The vote is open until Oct 09, 2014, 4 PM PST.

 [ ] +1 accept Lens in the Incubator
 [ ] +/-0
 [ ] -1 because...

Only Votes from Incubator PMC members are binding, but all are welcome to
express their thoughts.
I am +1 (non-binding).

Thanks
Sharad


Re: [PROPOSAL] Grill as new Incubator project

2014-10-03 Thread Sharad Agarwal
The discussion seems to have settled down. I will start the vote thread for
Lens shortly.


Re: [PROPOSAL] Grill as new Incubator project

2014-09-26 Thread Sharad Agarwal
Lens has a functional test suite that includes cube DDLs, queries, test
data, scripts, etc., which requires standard build and test infrastructure.
On Sep 27, 2014 3:45 AM, "David Nalley"  wrote:

> > currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
> > the QA automation for Grill.
> >
>
> What kind of QA environment does Drill/Lens have currently? How much
> do you expect to need going forward?
>
> --David
>


Re: [PROPOSAL] Grill as new Incubator project

2014-09-23 Thread Sharad Agarwal
Thanks Ted. We have renamed the proposal to Lens.

The proposal is pasted here ->
https://wiki.apache.org/incubator/LensProposal

Thanks
Sharad

On Tue, Sep 23, 2014 at 12:04 AM, Ted Dunning  wrote:

> Both Lens and Blend are nice names.  Nice connotations as well.
>
> I am slightly stunned by a quick search on the name Lens.  I only found one
> software package with that name (and it was for lens calibration, so far
> removed from databases).  A name like that is usually massively overused.
>
> This might be a really nice opportunity to get a nice one syllable name.
>
>
>
>
> On Mon, Sep 22, 2014 at 2:59 AM, Sharad Agarwal  wrote:
>
> > Based on the feedback, we are considering renaming the project.
> >
> > Please provide feedback on the following names:
> > Apache Lens
> > Apache Blend
> >
> > Thanks,
> > Sharad
> >
>


Re: [PROPOSAL] Grill as new Incubator project

2014-09-22 Thread Sharad Agarwal
Based on the feedback, we are considering renaming the project.

Please provide feedback on the following names:
Apache Lens
Apache Blend

Thanks,
Sharad


Re: [PROPOSAL] Grill as new Incubator project

2014-09-19 Thread Sharad Agarwal
Chris,
Multi-dimensional here is in the context of an OLAP cube ->
http://en.wikipedia.org/wiki/OLAP_cube
The Grill data model consists of a set of measures which can be analysed on
different dimensions.
For remote sensing, the data can be modelled as a cube -> measurements on
various sets of attributes (dimensions) as Facts, with time and space thought
of as dimensions.
Yes, it supports numerical data.
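
As a toy illustration of that modelling (the field names below are assumptions chosen to mirror the remote-sensing example, not a real Grill schema), one fact can be thought of as a measure keyed by dimension values:

    import java.time.Instant;

    // Toy model of one fact row in an OLAP cube: a measure keyed by dimensions.
    // The fields (time, latitude, longitude, temperature) are assumptions for
    // the remote-sensing example, not a schema from any real Grill deployment.
    class MeasurementFact {
        final Instant time;        // time dimension
        final double latitude;     // space dimensions
        final double longitude;
        final double temperature;  // the measure being analysed

        MeasurementFact(Instant time, double lat, double lon, double temp) {
            this.time = time;
            this.latitude = lat;
            this.longitude = lon;
            this.temperature = temp;
        }

        public static void main(String[] args) {
            MeasurementFact fact = new MeasurementFact(
                Instant.parse("2014-09-19T00:00:00Z"), 12.97, 77.59, 28.4);
            // Analysing a measure "on different dimensions" means grouping facts
            // like this one by time and/or space and aggregating the measure.
            System.out.println(fact.time + " @ (" + fact.latitude + ","
                + fact.longitude + ") -> " + fact.temperature);
        }
    }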


Ted,
Both are in the same general area, but I think there is very little chance of
confusion, as their propositions are clearly completely different. And both
words are simple and widely used nouns.
We liked the name Grill as it is simple to spell and pronounce, and it in some
way conveys the project's meaning -> to question intensely.

Thanks,
Sharad

On Sat, Sep 20, 2014 at 12:11 AM, Ted Dunning  wrote:

> There is a strong phonetic similarity to Apache Drill, a project in the
> same general domain.
>
> Is the Grill name already baked in (pun intended)?
>
>
>
> On Fri, Sep 19, 2014 at 7:24 AM, Mattmann, Chris A (3980) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
> > Thank you Sharad. So I could use this system for remote sensing
> > data, like 3-dimension (time, space, and measurement) type of cubes?
> > Does it support numerical data well?
> >
> > Sorry for so many questions just excited :)
> >
> > ++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++
> >
> >
> >
> >
> >
> >
> > -Original Message-
> > From: Sharad Agarwal 
> > Reply-To: "sha...@apache.org" 
> > Date: Friday, September 19, 2014 4:06 AM
> > To: Chris Mattmann 
> > Cc: "general@incubator.apache.org" 
> > Subject: Re: [PROPOSAL] Grill as new Incubator project
> >
> > >Chris, Thanks for your comments.
> > >
> > >
> > >The differences that I see are:
> > >- SciDB exposes Array Data model and Array Query Language (AQL). Grill
> > >data model is based on OLAP Fact and Dimensions. Grill exposes SQL like
> > >language (a subset of Hive QL) that works on *logical* entities (facts,
> > >dimensions)
> > >
> > >
> > >- The goal of Grill is not to build a new query execution database, but
> > >to unify them by having a central metadata catalog, and provide a Cube
> > >abstraction layer on top of it.
> > >
> > >
> > >
> > >Thanks,
> > >Sharad
> > >
> > >
> > >On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980)
> > > wrote:
> > >
> > >This sounds super cool!
> > >
> > >How does this relate to SciDB? is it trying to do a similar thing?
> > >
> > >Cheers,
> > >Chris
> > >
> > >
> > >++
> > >Chris Mattmann, Ph.D.
> > >Chief Architect
> > >Instrument Software and Science Data Systems Section (398)
> > >NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >Office: 168-519, Mailstop: 168-527
> > >Email: chris.a.mattm...@nasa.gov
> > >WWW:  http://sunset.usc.edu/~mattmann/
> > >++
> > >Adjunct Associate Professor, Computer Science Department
> > >University of Southern California, Los Angeles, CA 90089 USA
> > >++
> > >
> > >
> > >
> > >
> > >
> > >
> > >-Original Message-
> > >From: Sharad Agarwal 
> > >Reply-To: "general@incubator.apache.org"  >,
> > >"sha...@apache.org" 
> > >Date: Thursday, September 18, 2014 8:54 PM
> > >To: "general@incubator.apache.org" 
> > >Subject: [PROPOSAL] Grill as new Incubator project
> > >
> > >>Grill Proposal
> > >>==
> > >>
> > >># Abstract
> > >>
> > >>Grill is a platform that enables multi-dimensional q

Re: [PROPOSAL] Grill as new Incubator project

2014-09-19 Thread Sharad Agarwal
Chris, Thanks for your comments.

The differences that I see are:
- SciDB exposes an Array Data model and the Array Query Language (AQL). The
Grill data model is based on OLAP Facts and Dimensions. Grill exposes a
SQL-like language (a subset of HiveQL) that works on *logical* entities
(facts, dimensions).

- The goal of Grill is not to build a new query execution database, but to
unify existing ones by having a central metadata catalog and providing a Cube
abstraction layer on top of them.
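
To make the "logical entities" point concrete, here is a small illustrative sketch; the cube name, measure and dimension below are invented, and the syntax is only meant to suggest a HiveQL-like cube query, not Grill's exact grammar:

    // Illustrative only: an invented cube query in a HiveQL-like style. The
    // logical cube "sales_cube", measure "revenue" and dimension "city" are
    // assumptions; Grill would resolve them to physical fact and dimension
    // tables in whichever store holds them.
    public class CubeQueryExample {
        public static void main(String[] args) {
            String cubeQuery =
                "SELECT city, SUM(revenue) "
                + "FROM sales_cube "                 // logical cube, not a table
                + "WHERE dt BETWEEN '2014-09-01' AND '2014-09-07' "
                + "GROUP BY city";
            // The query references only logical entities; the metadata catalog
            // decides which physical tables and execution engine to use.
            System.out.println(cubeQuery);
        }
    }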

Thanks,
Sharad

On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> This sounds super cool!
>
> How does this relate to SciDB? is it trying to do a similar thing?
>
> Cheers,
> Chris
>
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++
>
>
>
>
>
>
> -Original Message-
> From: Sharad Agarwal 
> Reply-To: "general@incubator.apache.org" ,
> "sha...@apache.org" 
> Date: Thursday, September 18, 2014 8:54 PM
> To: "general@incubator.apache.org" 
> Subject: [PROPOSAL] Grill as new Incubator project
>
> >Grill Proposal
> >==
> >
> ># Abstract
> >
> >Grill is a platform that enables multi-dimensional queries in a unified
> >way
> >over datasets stored in multiple warehouses. Grill integrates Apache Hive
> >with other data warehouses by tiering them together to form logical data
> >cubes.
> >
> >
> ># Proposal
> >
> >Grill provides a unified Cube abstraction for data stored in different
> >stores. Grill tiers multiple data warehouses for unified representation
> >and
> >efficient access. It provides SQL-like Cube query language to query and
> >describe data sets organized in data cubes. It enables users to run
> >queries
> >against Facts and Dimensions that can span multiple physical tables stored
> >in different stores.
> >
> >The primary use cases that Grill aims to solve:
> >- Facilitate analytical queries by providing the OLAP like Cube
> >abstraction
> >- Data Discovery by providing single metadata layer for data stored in
> >different stores
> >- Unified access to data by integrating Hive with other traditional data
> >warehouses
> >
> >
> ># Background
> >
> >Apache Hive is a data warehouse that facilitates querying and managing
> >large datasets stored in distributed storage systems like HDFS. It
> >provides
> >SQL like language called HiveQL aka HQL.  Apache Hive is a widely used
> >platform in various organizations for doing adhoc analytical queries.
> >In a typical Data warehouse scenario, the data is multi-dimensional and
> >organized into Facts and Dimensions to form Data Cubes. Grill provides
> >this
> >logical layer to enable querying and manage data as Cubes.
> >The Grill project is actively being developed at InMobi to provide the
> >higher level of analytical abstraction to query data stored in different
> >storages including Hive and beyond seamlessly.
> >
> >
> ># Rationale
> >
> >The Grill project aims to ease the analytical querying capabilities and
> >cut
> >the data-silos by providing a single view of data across multiple data
> >stores.
> >Conceiving data as a cube with hierarchical dimensions leads to
> >conceptually straightforward operations to facilitate analysis.
> >Integrating
> >Apache Hive with other traditional warehouses provides the opportunity to
> >optimize on the query execution cost by tiering the data across multiple
> >warehouses. Grill provides
> >- Access to data Cubes via Cube Query language similar to HiveQL.
> >- Driver based architecture to allow for plugging systems like Hive and
> >other warehouses such as columnar data RDBMS.
> >- Cost based engine selection that provides optimal use of resources by
> >selecting the best execution engine for a given query.
> >
> >In a typical Data warehouse, data is organized in Cubes with multiple
> >dimensions and measures. This facilitates the analysis by conceiving the
> >data in terms of Facts and Dimensions instead of

[PROPOSAL] Grill as new Incubator project

2014-09-18 Thread Sharad Agarwal
Grill Proposal
==

# Abstract

Grill is a platform that enables multi-dimensional queries in a unified way
over datasets stored in multiple warehouses. Grill integrates Apache Hive
with other data warehouses by tiering them together to form logical data
cubes.


# Proposal

Grill provides a unified Cube abstraction for data stored in different
stores. Grill tiers multiple data warehouses for unified representation and
efficient access. It provides SQL-like Cube query language to query and
describe data sets organized in data cubes. It enables users to run queries
against Facts and Dimensions that can span multiple physical tables stored
in different stores.

The primary use cases that Grill aims to solve:
- Facilitate analytical queries by providing an OLAP-like Cube abstraction
- Data Discovery by providing a single metadata layer for data stored in
different stores
- Unified access to data by integrating Hive with other traditional data
warehouses


# Background

Apache Hive is a data warehouse that facilitates querying and managing
large datasets stored in distributed storage systems like HDFS. It provides
a SQL-like language called HiveQL (aka HQL). Apache Hive is a widely used
platform in various organizations for doing ad-hoc analytical queries.
In a typical Data warehouse scenario, the data is multi-dimensional and
organized into Facts and Dimensions to form Data Cubes. Grill provides this
logical layer to enable querying and managing data as Cubes.
The Grill project is actively being developed at InMobi to provide a
higher level of analytical abstraction for seamlessly querying data stored
in different storage systems, including Hive and beyond.


# Rationale

The Grill project aims to ease analytical querying and cut
data silos by providing a single view of data across multiple data
stores.
Conceiving data as a cube with hierarchical dimensions leads to
conceptually straightforward operations that facilitate analysis. Integrating
Apache Hive with other traditional warehouses provides the opportunity to
optimize the query execution cost by tiering the data across multiple
warehouses. Grill provides:
- Access to data Cubes via a Cube Query language similar to HiveQL.
- A driver-based architecture that allows plugging in systems like Hive and
other warehouses such as columnar RDBMSs (a sketch follows below).
- Cost-based engine selection that provides optimal use of resources by
selecting the best execution engine for a given query.
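
A minimal sketch of the driver model referenced above, using invented interface and class names (this is not Grill's actual driver SPI), showing how a per-driver cost estimate could steer engine selection:

    import java.util.Arrays;
    import java.util.List;

    // Invented sketch of a pluggable query driver with cost-based selection;
    // the QueryDriver interface and the cost numbers are assumptions for
    // illustration, not Grill's real driver SPI.
    interface QueryDriver {
        String name();
        double estimateCost(String cubeQuery);   // lower means cheaper to run
        String execute(String cubeQuery);
    }

    class DriverSelector {
        // Pick the driver reporting the lowest estimated cost for this query.
        static QueryDriver select(List<QueryDriver> drivers, String cubeQuery) {
            QueryDriver best = drivers.get(0);
            for (QueryDriver d : drivers) {
                if (d.estimateCost(cubeQuery) < best.estimateCost(cubeQuery)) {
                    best = d;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            QueryDriver hive = new QueryDriver() {
                public String name() { return "hive"; }
                public double estimateCost(String q) { return 10.0; } // batch engine
                public String execute(String q) { return "ran on Hive"; }
            };
            QueryDriver columnar = new QueryDriver() {
                public String name() { return "columnar-rdbms"; }
                public double estimateCost(String q) { return 2.0; }  // small result
                public String execute(String q) { return "ran on columnar store"; }
            };
            String query = "SELECT city, SUM(revenue) FROM sales_cube GROUP BY city";
            QueryDriver chosen = select(Arrays.asList(hive, columnar), query);
            System.out.println(chosen.name() + ": " + chosen.execute(query));
        }
    }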

In a typical Data warehouse, data is organized in Cubes with multiple
dimensions and measures. This facilitates analysis by conceiving of the
data in terms of Facts and Dimensions instead of physical tables. Grill
aims to provide this logical Cube abstraction on top of Data warehouses
like Hive and other traditional warehouses.


# Initial Goals

- Donate the Grill source code and documentation to Apache Software
Foundation
- Build a user and developer community
- Support Hive and other Columnar data warehouses
- Support full query life cycle management
- Add authentication for querying cubes
- Provide detailed query statistics


# Long Term Goals

Here are some longer-term capabilities that would be added to Grill
- Add authorization for managing and querying Cubes
- Provide REST and CLI for full Admin controls
- Capability to schedule queries
- Query caching
- Integrate with Apache Spark: creating a Spark RDD from a Grill query
- Integrate with Apache Optiq


# Current Status

The project is actively developed at InMobi. The first version was deployed
at InMobi 4 months ago. This version allows querying dimension and fact
data stored in Hive over a CLI. The source code and documentation are hosted
on GitHub.

## Meritocracy

We intend to build a diverse developer and user community for the project
following the Apache meritocracy model. We want to encourage contributors
from multiple organizations, provide plenty of support to new developers
and welcome them to be committers.

## Community

Currently the project is being developed at InMobi. We hope to extend our
contributor and user base significantly in the future and build a solid
open source community around Grill.

## Core Developers

Grill is currently being developed by Amareshwari Sriramadasu, Sharad
Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan, who is
currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
the QA automation for Grill.

## Alignment

The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache Hive,
Apache Spark and other emerging projects in the Big Data space.
We believe that in any enterprise multiple data warehouses will co-exist, as
not all workloads are cost-effective to run on a single one. Apache Hive is
one of the crucial data warehouses in the Hadoop ecosystem, along with
upcoming projects like Apache Spark. Grill will benefit from working in close
proximity with these projects.
The traditional Columnar data warehouses complement Apache Hive as certain
workloads continue to be cost

Re: [VOTE] Argus as a new incubator project

2014-07-22 Thread Sharad Agarwal
+1 (non-binding)


On Mon, Jul 21, 2014 at 9:33 PM, Owen O'Malley  wrote:

> Following the discussion earlier, I'm calling a vote to accept Argus as a
> new Incubator project.
>
>  The proposal draft is available at:
> https://wiki.apache.org/incubator/ArgusProposal, and is also included
>  below.
>
>  Vote is open for 72h and closes at 24 July 2014 at 10am PST.
>
>  [ ] +1 accept Argus in the Incubator
>  [ ] +/-0
>  [ ] -1 because...
>
> I'm +1.
>
> .. Owen
>


Re: [VOTE] Accept Optiq into the incubator

2014-05-12 Thread Sharad Agarwal
+1 (non-binding)


On Fri, May 9, 2014 at 11:33 PM, Ashutosh Chauhan wrote:

> Based on the results of the discussion thread (
>
> http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E
>  ),  I would like to call a vote on accepting Optiq into the incubator.
>
> [ ] +1 Accept Optiq into the Incubator
> [ ] +0 Indifferent to the acceptance of Optiq
> [ ] -1 Do not accept Optiq because ...
>
> The vote will be open until Tuesday May 13 18:00 UTC.
>
> https://wiki.apache.org/incubator/OptiqProposal
>
> = Optiq =
> == Abstract ==
>
> Optiq is a framework that allows efficient translation of queries involving
> heterogeneous and federated data.
>
> == Proposal ==
>
> Optiq is a highly customizable engine for parsing and planning queries on
> data in a wide variety of formats. It allows database-like access, and in
> particular a SQL interface and advanced query optimization, for data not
> residing in a traditional database.
>
> == Background ==
>
> Databases were traditionally engineered in a monolithic stack, providing a
> data storage format, data processing algorithms, query parser, query
> planner, built-in functions, metadata repository and connectivity layer.
> They innovate in some areas but rarely in all.
>
> Modern data management systems are decomposing that stack into separate
> components, separating data, processing engine, metadata, and query
> language support. They are highly heterogeneous, with data in multiple
> locations and formats, caching and redundant data, different workloads, and
> processing occurring in different engines.
>
> Query planning (sometimes called query optimization) has always been a key
> function of a DBMS, because it allows the implementors to introduce new
> query-processing algorithms, and allows data administrators to re-organize
> the data without affecting applications built on that data. In a
> componentized system, the query planner integrates the components (data
> formats, engines, algorithms) without introducing unnecessary coupling or
> performance tradeoffs.
>
> But building a query planner is hard; many systems muddle along without a
> planner, and indeed a SQL interface, until the demand from their customers
> is overwhelming.
>
> There is an opportunity to make this process more efficient by creating a
> re-usable framework.
>
> == Rationale ==
>
> Optiq allows database-like access, and in particular a SQL interface and
> advanced query optimization, for data not residing in a traditional
> database. It is complementary to many current Hadoop and NoSQL systems,
> which have innovative and performant storage and runtime systems but lack a
> SQL interface and intelligent query translation.
>
> Optiq is already in use by several projects, including Apache Drill, Apache
> Hive and Cascading Lingual, and commercial products.
>
> Optiq's architecture consists of:
>
>  * An extensible relational algebra.
>  * SPIs (service-provider interfaces) for metadata (schemas and tables),
> planner rules, statistics, cost-estimates, user-defined functions.
>  * Built-in sets of rules for logical transformations and common
> data-sources.
>  * Two query planning engines driven by rules, statistics, etc. One engine
> is cost-based, the other rule-based.
>  * Optional SQL parser, validator and translator to relational algebra.
>  * Optional JDBC driver.
>
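For the optional JDBC driver listed above, usage would look like plain JDBC. Note that the driver class name, the jdbc:optiq: URL with a model= parameter, and the "emps" table in this sketch are assumptions about pre-Apache Optiq packaging, so treat it as illustrative rather than a definitive recipe:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Sketch only: the driver class, connection URL and "emps" table are
    // assumptions; the actual setup depends on the Optiq model file used.
    public class OptiqJdbcSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("net.hydromatic.optiq.jdbc.Driver");  // assumed driver class
            try (Connection conn =
                     DriverManager.getConnection("jdbc:optiq:model=target/model.json");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM emps")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));  // print the first column
                }
            }
        }
    }

The appeal of this design is that the planner, SQL parser and adapters sit behind a standard JDBC surface, so existing tools can query non-database data sources without custom client code.
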
> == Initial Goals ==
>
> The initial goals are to move the existing codebase to Apache and
> integrate with the Apache development process. Once this is accomplished,
> we plan for incremental development and releases that follow the Apache
> guidelines.
>
> As we move the code into the org.apache namespace, we will restructure
> components as necessary to allow clients to use just the components of
> Optiq that they need.
>
> A version 1.0 release, including pre-built binaries, will foster wider
> adoption.
>
> == Current Status ==
>
> Optiq has had over a dozen minor releases over the last 18 months. Its core
> SQL parser and validator, and its planning engine and core rules, are
> mature and robust and are the basis for several production systems; but
> other components and SPIs are still undergoing rapid evolution.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. We encourage the companies and projects
> using Optiq to discuss their requirements in an open forum and to
> participate in development. We will encourage and monitor community
> participation so that privileges can be extended to those that contribute.
>
> Optiq's pluggable architecture encourages developers to contribute
> extensions such as adapters for data sources, new planning rules, and
> better statistics and cost-estimation functions. We look forward to
> fostering a rich ecosystem of extensions.
>
> === Community ===
>
> Building a data management system requires a high d

Re: [VOTE] Accept Storm into the Incubator

2013-09-14 Thread Sharad Agarwal
+1 (non-binding)


On Fri, Sep 13, 2013 at 12:49 AM, Doug Cutting  wrote:

> Discussion about the Storm proposal has subsided, issues raised now
> seemingly resolved.
>
> I'd like to call a vote to accept Storm as a new Incubator podling.
>
> The proposal is included below and is also at:
>
>   https://wiki.apache.org/incubator/StormProposal
>
> Let's keep the vote open for four working days, until 18 September.
>
> [ ] +1 Accept Storm into the Incubator
> [ ] +0 Don't care.
> [ ] -1 Don't accept Storm because...
>
> Doug
>
>
> = Storm Proposal =
>
> == Abstract ==
>
> Storm is a distributed, fault-tolerant, and high-performance realtime
> computation system that provides strong guarantees on the processing
> of data.
>
> == Proposal ==
>
> Storm is a distributed real-time computation system. Similar to how
> Hadoop provides a set of general primitives for doing batch
> processing, Storm provides a set of general primitives for doing
> real-time computation. Its use cases span stream processing,
> distributed RPC, continuous computation, and more. Storm has become a
> preferred technology for near-realtime big-data processing by many
> organizations worldwide (see a partial list at
> https://github.com/nathanmarz/storm/wiki/Powered-By). As an open
> source project, Storm’s developer community has grown rapidly to 46
> members.
>
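A minimal topology sketch in the spirit of the storm-starter examples; the pre-Apache backtype.storm package names and TestWordSpout are used on the assumption that they match the 0.8/0.9-era API, so treat this as illustrative:

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.testing.TestWordSpout;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Illustrative word-exclaiming topology: a spout emits words and a bolt
    // appends "!!!" to each one, showing the spout/bolt/grouping primitives.
    public class ExclamationTopologySketch {

        public static class ExclamationBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                collector.emit(new Values(input.getString(0) + "!!!"));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new TestWordSpout(), 1);
            builder.setBolt("exclaim", new ExclamationBolt(), 2).shuffleGrouping("words");

            // Run in-process for a few seconds; a real deployment would use
            // StormSubmitter.submitTopology instead of LocalCluster.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("exclamation-sketch", new Config(),
                builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
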
> == Background ==
>
> The past decade has seen a revolution in data processing. MapReduce,
> Hadoop, and related technologies have made it possible to store and
> process data at scales previously unthinkable. Unfortunately, these
> data processing technologies are not realtime systems, nor are they
> meant to be. The lack of a "Hadoop of realtime" has become the biggest
> hole in the data processing ecosystem. Storm fills that hole.
>
> Storm was initially developed and deployed at BackType in 2011. After
> 7 months of development BackType was acquired by Twitter in July 2011.
> Storm was open sourced in September 2011.
>
> Storm has been under continuous development on its Github repository
> since being open-sourced. It has undergone four major releases (0.5,
> 0.6, 0.7, 0.8) and many minor ones.
>
>
> == Rationale ==
>
> Storm is a general platform for low-latency big-data processing. It is
> complementary to the existing Apache projects, such as Hadoop. Many
> applications are actually exploring using both Hadoop and Storm for
> big-data processing. Bringing Storm into Apache is very beneficial to
> both Apache community and Storm community.
>
> The rapid growth of Storm community is empowered by open source. We
> believe the Apache foundation is a great fit as the long-term home for
> Storm, as it provides an established process for community-driven
> development and decision making by consensus. This is exactly the
> model we want for future Storm development.
>
> == Initial Goals ==
>
>* Move the existing codebase to Apache
>* Integrate with the Apache development process
>* Ensure all dependencies are compliant with Apache License version 2.0
>* Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many
> minor ones. Storm 0.9 is about to be released. Storm is being used in
> production by over 50 organizations. Storm codebase is currently
> hosted at github.com, which will seed the Apache git repository.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already
> expressed interest in this project, and we intend to invite additional
> developers to participate. We will encourage and monitor community
> participation so that privileges can be extended to those that
> contribute.
>
> === Community ===
>
> The need for a low-latency big-data processing platform in the open
> source is tremendous. Storm is currently being used by at least 50
> organizations worldwide (see
> https://github.com/nathanmarz/storm/wiki/Powered-By), and is the most
> starred Java project on Github. By bringing Storm into Apache, we
> believe that the community will grow even bigger.
>
> === Core Developers ===
>
> Storm was started by Nathan Marz at BackType, and now has developers
> from Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.
>
> === Alignment ===
>
> In the big-data processing ecosystem, Storm is a very popular
> low-latency platform, while Hadoop is the primary platform for batch
> processing. We believe that it will help the further growth of
> big-data community by having Hadoop and Storm aligned within Apache
> foundation. The alignment is also beneficial to other Apache
> communities (such as Zookeeper, Thrift, Mesos). We could include
> additional sub-projects, Storm-on-YARN and Storm-on-Mesos, in the near
> future.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of the Storm project being abandoned is minimal. There are at
> least 50

Re: [PROPOSAL] Storm for Apache Incubator

2013-09-04 Thread Sharad Agarwal
+1 (non-binding)


Re: [VOTE] Accept Falcon into the Apache Incubator (was originally named Ivory)

2013-03-21 Thread Sharad Agarwal
y the list of initial committers and is hosted on github.
>
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Falcon following the Apache meritocracy
> model. We have wanted to make the project open source and encourage
> contributors from multiple organizations from the start. We plan to
> provide plenty of support to new developers and to quickly recruit
> those who make solid contributions to committer status.
>
> === Community ===
> We are happy to report that the initial team already represents
> multiple organizations. We hope to extend the user and developer base
> further in the future and build a solid open source community around
> Falcon.
>
> === Core Developers ===
> Falcon is currently being developed by three engineers from InMobi –
> Srikanth Sunderrajan, Shwetha G S, and Shaik Idris, two Hortonworks
> employees – Sanjay Radia and Venkatesh Seetharam. In addition, Rohini
> Palaniswamy and Thiruvel Thirumoolan were also involved in the
> initial design discussions. Srikanth, Shwetha and Shaik are the
> original developers. All the engineers have built two generations of
> Data Management on Hadoop, having deep expertise in Hadoop, and are
> quite familiar with the Hadoop Ecosystem. Samarth Gupta & Rishu
> Mehrothra, both from InMobi, have built the QA automation for Falcon.
>
> === Alignment ===
> The ASF is a natural host for Falcon given that it is already the home
> of Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
> projects. Falcon has been designed to solve the data management
> challenges and opportunities of the Hadoop ecosystem family of
> products. Falcon fills the gap that Hadoop ecosystem has been lacking
> in the areas of data processing and data lifecycle management.
>
> == Known Risks ==
>
> === Orphaned products & Reliance on Salaried Developers ===
> The core developers plan to work full time on the project. There is
> very little risk of Falcon getting orphaned. Falcon is in use by
> companies we work for so the companies have an interest in its
> continued vitality.
>
> === Inexperience with Open Source ===
> All of the core developers are active users and followers of open
> source. Srikanth Sundarrajan has been contributing patches to Apache
> Hadoop and Apache Oozie, Shwetha GS has been contributing patches to
> Apache Oozie.  Seetharam Venkatesh is a committer on Apache Knox.
> Sharad Agarwal, Amareshwari SR (also a Apache Hive PMC member) and
> Sanjay Radia are PMC members on Apache Hadoop.
>
> === Homogeneous Developers ===
> The current core developers are from diverse set of organizations such
> as InMobi and Hortonworks. We expect to quickly establish a developer
> community that includes contributors from several corporations post
> incubation.
>
> === Reliance on Salaried Developers ===
> Currently, most developers are paid to do work on Falcon but few are
> contributing in their spare time. However, once the project has a
> community built around it post incubation, we expect to get committers
> and developers from outside the current core developers.
>
> === Relationships with Other Apache Products ===
> Falcon is going to be used by the users of Hadoop and the Hadoop
> ecosystem in general.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts
> that it will attract contributors and users, our interest is primarily
> to give Falcon a solid home as an open source project following an
> established development model. We have also given reasons in the
> Rationale and Alignment sections.
>
> == Documentation ==
> http://wiki.apache.org/incubator/FalconProposal
>
> == Initial Source ==
> The source is currently in github repository at:
> https://github.com/sriksun/Falcon
>
> == Source and Intellectual Property Submission Plan ==
> The complete Falcon code is under Apache Software License 2.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses. These include
> BSD, MIT licensed dependencies.
>
> == Cryptography ==
> None
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * falcon-dev AT incubator DOT apache DOT org
>  * falcon-commits AT incubator DOT apache DOT org
>  * falcon-user AT incubator apache DOT org
>  * falcon-private AT incubator DOT apache DOT org
>
> === Subversion Directory ===
> Git is the preferred source control system: git://git.apache.org/falcon
>
> === Issue Tracking ===
> JIRA FALCON
>
> == Initial Committers ==
>  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
>  * Shwetha GS (shwetha.gs AT inmobi DO

Re: [VOTE] Accept Tajo into the Apache Incubator

2013-02-28 Thread Sharad Agarwal
+1 (non-binding)

On Thu, Feb 28, 2013 at 11:41 PM, Hyunsik Choi  wrote:

> Hi Folks,
>
> I'd like to call a VOTE for acceptance of Tajo into the Apache incubator.
> The vote will close on Mar 7 at 6:00 PM (PST).
>
> [] +1 Accept Tajo into the Apache incubator
> [] +0 Don't care.
> [] -1 Don't accept Tajo into the incubator because...
>
> Full proposal is pasted at the bottom on this email, and the corresponding
> wiki is http://wiki.apache.org/incubator/TajoProposal.
>
> Only VOTEs from Incubator PMC members are binding, but all are welcome to
> express their thoughts.
>
> Thanks,
> Hyunsik
>
> PS: From the initial discussion, the main changes are that I've added 4 new
> committers. Also, I've revised some description of Known Risks because the
> initial committers have been diverse.
>
> 
> Tajo Proposal
>
> = Abstract =
>
> Tajo is a distributed data warehouse system for Hadoop.
>
>
> = Proposal =
>
> Tajo is a relational and distributed data warehouse system for Hadoop. Tajo
> is designed for low-latency and scalable ad-hoc queries, online aggregation
> and ETL on large-data sets by leveraging advanced database techniques. It
> supports SQL standards. Tajo is inspired by Dryad, MapReduce, Dremel,
> Scope, and parallel databases. Tajo uses HDFS as a primary storage layer,
> and it has its own query engine which allows direct control of distributed
> execution and data flow. As a result, Tajo has a variety of query
> evaluation strategies and more optimization opportunities. In addition,
> Tajo will have native columnar execution and its own optimizer. Tajo will
> be an alternative choice to Hive/Pig on top of MapReduce.
>
>
> = Background =
>
> Big data analysis has gained much attention in industry. Open source
> communities have proposed scalable and distributed solutions for ad-hoc
> queries on big data. However, there is still room for improvement. Markets
> need faster and more efficient solutions. Recently, some alternatives
> (e.g., Cloudera's Impala and Amazon Redshift) have come out.
>
>
> = Rationale =
>
> There are a variety of open source distributed execution engines (e.g.,
> Hive and Pig) running on top of MapReduce. They are limited by the MR
> framework. They cannot directly control distributed execution and data
> flow, and they just use the MR framework. So, they have limited query
> evaluation strategies and optimization opportunities. It is hard for them
> to be optimized for a certain type of data processing.
>
>
> = Initial Goals =
>
> The initial goal is to write more documents to describe Tajo's internals. It
> will be helpful to recruit more committers and to build a solid community.
> Then, we will make milestones for short/long term plans.
>
>
> = Current Status =
>
> Tajo is in the alpha stage. Users can execute usual SQL queries (e.g.,
> selection, projection, group-by, join, union and sort) except for nested
> queries. Tajo provides various row/column storage formats, such as CSV,
> RowFile (a row-store file we have implemented), RCFile, and Trevni, and it
> also has a rudimentary ETL feature to transform one data format into
> another. In addition, Tajo provides hash and range repartitions. By
> using both repartition methods, Tajo processes aggregation, join, and sort
> queries over a number of cluster nodes. To evaluate the performance, we
> have carried out a benchmark test using TPC-H 1TB on 32 cluster nodes.
>
>
> == Meritocracy ==
>
> We will discuss the milestone and the future plan in an open forum. We plan
> to encourage an environment that supports a meritocracy. The contributors
> will have different privileges according to their contributions.
>
>
> == Community ==
>
> Big data analysis has gained attention from open source communities and
> from industrial and academic areas. Some projects related to Hadoop already
> have very large and active communities. We expect that Tajo will also
> establish an active community. Since Tajo already provides some working
> features and is in the alpha stage, it will attract a large community soon.
>
>
> == Core Developers ==
>
> Core developers are a diverse group of developers, many of which are very
> experienced in open source and the Apache Hadoop ecosystem.
>
>  * Eli Reisman 
>
>  * Henry Saputra 
>
>  * Hyunsik Choi 
>
>  * Jae Hwa Jung 
>
>  * Jihoon Son 
>
>  * Jin Ho Kim 
>
>  * Roshan Sumbaly 
>
>  * Sangwook Kim 
>
>  * Yi A Liu 
>
>
> == Alignment ==
>
> Tajo employs Apache Hadoop Yarn as a resource management platform for large
> clusters. It uses HDFS as a primary storage layer. It already supports
> Hadoop-related data formats (RCFile, Trevni) and will support ORC file. In
> addition, we have a plan to integrate Tajo with other products of Hadoop
> ecosystem. Tajo's modules are well organized, and these modules can also be
> used for other projects.
>
>
> = Known Risks =
>
> == Orphaned Products ==
>
> Most of codes have been developed by only two core developers, who are
> Hyunsik Choi and 

Re: [VOTE] Accept Tez into Incubator

2013-02-22 Thread Sharad Agarwal
+1 (non-binding)

sharad