[ANNOUNCE] Apache VXQuery 0.3-incubating released

2014-05-19 Thread Vinayak Borkar

The Apache VXQuery (incubating) Team is happy to announce the second
release of Apache VXQuery, 0.3-incubating.

Apache VXQuery will be a standards-compliant XML Query processor
implemented in Java. The focus is on the evaluation of queries on large
amounts of XML data. Specifically, the goal is to evaluate queries on
large collections of relatively small XML documents. To achieve this,
queries are evaluated on a cluster of shared-nothing machines.

More information about the project can be found at

  http://incubator.apache.org/vxquery/

The release is available at

  http://www.apache.org/dyn/closer.cgi/incubator/vxquery/

and the sha1 checksum for

  apache-vxquery-0.3-incubating-source-release.zip

is

  ba9b2d5c886584c604652b24a505ea2d76231964
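
A downloaded archive can be checked against a published SHA-1 value with a
few lines of Python. This is only a minimal sketch: the helper names
sha1_of_file and matches_published_sha1 are made up for illustration, and it
assumes the zip file has already been downloaded to the local path you pass
in.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha1(path, published_hex):
    """Compare the local digest with a published one, ignoring case and whitespace."""
    return sha1_of_file(path) == published_hex.strip().lower()


# Hypothetical usage, assuming the archive sits in the current directory:
# matches_published_sha1(
#     "apache-vxquery-0.3-incubating-source-release.zip",
#     "ba9b2d5c886584c604652b24a505ea2d76231964",
# )
```

A matching checksum only shows that the download is intact; it does not
replace verifying the release signature.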

The Apache VXQuery Team would like to hear from you and welcomes
your comments and contributions.

Thanks
The Apache VXQuery Team

--

Apache VXQuery is an effort undergoing incubation at The Apache Software
Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further 
review indicates that the infrastructure, communications, and decision 
making process have stabilized in a manner consistent with other 
successful ASF projects. While incubation status is not necessarily a 
reflection of the completeness or stability of the code, it does 
indicate that the project has yet to be fully endorsed by the ASF.


-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Julien Le Dem
[X] +1 Accept Parquet into the Incubator
(non binding)
Julien





Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Roman Shaposhnik
On Sun, May 18, 2014 at 2:15 PM, Chris Aniszczyk wrote:
> [ ] +1 Accept Parquet into the Incubator
> [ ] +0 Indifferent to the acceptance of Parquet
> [ ] -1 Do not accept Parquet because ...

+1 (binding)

Thanks,
Roman.




Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Brock Noland
[X] +1 Accept Parquet into the Incubator

non-binding



Re: [VOTE] Accept Optiq into the incubator

2014-05-19 Thread Ashutosh Chauhan
With six +1 votes, the vote passes. Thanks, everyone, for taking the time
to vote. The vote thread is now closed, and I will proceed with the next
steps.

Thanks,
Ashutosh


On Mon, May 12, 2014 at 12:53 PM, Suresh Srinivas wrote:

> +1 (binding)
>
>
> On Fri, May 9, 2014 at 11:03 AM, Ashutosh Chauhan wrote:
>
> > Based on the results of the discussion thread
> > (
> > http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E
> > ), I would like to call a vote on accepting Optiq into the incubator.
> >
> > [ ] +1 Accept Optiq into the Incubator
> > [ ] +0 Indifferent to the acceptance of Optiq
> > [ ] -1 Do not accept Optiq because ...
> >
> > The vote will be open until Tuesday May 13 18:00 UTC.
> >
> > https://wiki.apache.org/incubator/OptiqProposal
> >
> > = Optiq =
> > == Abstract ==
> >
> > Optiq is a framework that allows efficient translation of queries
> > involving heterogeneous and federated data.
> >
> > == Proposal ==
> >
> > Optiq is a highly customizable engine for parsing and planning queries on
> > data in a wide variety of formats. It allows database-like access, and in
> > particular a SQL interface and advanced query optimization, for data not
> > residing in a traditional database.
> >
> > == Background ==
> >
> > Databases were traditionally engineered in a monolithic stack, providing
> > a data storage format, data processing algorithms, query parser, query
> > planner, built-in functions, metadata repository and connectivity layer.
> > They innovate in some areas but rarely in all.
> >
> > Modern data management systems are decomposing that stack into separate
> > components, separating data, processing engine, metadata, and query
> > language support. They are highly heterogeneous, with data in multiple
> > locations and formats, caching and redundant data, different workloads,
> > and processing occurring in different engines.
> >
> > Query planning (sometimes called query optimization) has always been a
> > key function of a DBMS, because it allows the implementors to introduce
> > new query-processing algorithms, and allows data administrators to
> > re-organize the data without affecting applications built on that data.
> > In a componentized system, the query planner integrates the components
> > (data formats, engines, algorithms) without introducing unnecessary
> > coupling or performance tradeoffs.
> >
> > But building a query planner is hard; many systems muddle along without
> > a planner, and indeed without a SQL interface, until the demand from
> > their customers is overwhelming.
> >
> > There is an opportunity to make this process more efficient by creating a
> > re-usable framework.
> >
> > == Rationale ==
> >
> > Optiq allows database-like access, and in particular a SQL interface and
> > advanced query optimization, for data not residing in a traditional
> > database. It is complementary to many current Hadoop and NoSQL systems,
> > which have innovative and performant storage and runtime systems but
> > lack a SQL interface and intelligent query translation.
> >
> > Optiq is already in use by several projects, including Apache Drill,
> > Apache Hive and Cascading Lingual, and commercial products.
> >
> > Optiq's architecture consists of:
> >
> >  * An extensible relational algebra.
> >  * SPIs (service-provider interfaces) for metadata (schemas and tables),
> >    planner rules, statistics, cost-estimates, user-defined functions.
> >  * Built-in sets of rules for logical transformations and common
> >    data-sources.
> >  * Two query planning engines driven by rules, statistics, etc. One
> >    engine is cost-based, the other rule-based.
> >  * Optional SQL parser, validator and translator to relational algebra.
> >  * Optional JDBC driver.
> >
> > == Initial Goals ==
> >
> > The initial goals are to move the existing codebase to Apache and
> > integrate with the Apache development process. Once this is accomplished,
> > we plan for incremental development and releases that follow the Apache
> > guidelines.
> >
> > As we move the code into the org.apache namespace, we will restructure
> > components as necessary to allow clients to use just the components of
> > Optiq that they need.
> >
> > A version 1.0 release, including pre-built binaries, will foster wider
> > adoption.
> >
> > == Current Status ==
> >
> > Optiq has had over a dozen minor releases over the last 18 months. Its
> > core SQL parser and validator, and its planning engine and core rules,
> > are mature and robust and are the basis for several production systems;
> > but other components and SPIs are still undergoing rapid evolution.
> >
> > === Meritocracy ===
> >
> > We plan to invest in supporting a meritocracy. We will discuss the
> > requirements in an open forum. We encourage the companies and projects
> > using Optiq to discuss their requirements in an open forum and to
> > participate in development. We will en

[VOTE] Release Apache Falcon version 0.5-incubating

2014-05-19 Thread Seetharam Venkatesh
Hello folks,

This is a call for a vote on the Apache Falcon 0.5-incubating release.

A vote was held on the developer mailing list and it passed with 8 +1's.

Vote thread: http://s.apache.org/xtf
Results thread: Apache mail is slow, and I do not see it in the archives
after 2 days.

The source tarball (*.tar.gz), signature (*.asc), checksum (*.md5, *.sha):
http://people.apache.org/~venkatesh/falcon-0.5-incubating-rc2/source/

The SHA1 checksum of the archive is cf076dba0c3eea436916dbe10846dfeac471c183

The release has been signed through key(42C7A5EA):
http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=0x1B16738C42C7A5EA

The tag to be voted upon:
https://git-wip-us.apache.org/repos/asf?p=incubator-falcon.git;a=tag;h=refs/tags/release-0.5-rc2

The list of fixed issues:
https://git-wip-us.apache.org/repos/asf?p=incubator-falcon.git;a=blob;f=CHANGES.txt;h=34ec2893b666c22f2a5977ca08b966b751e37e21;hb=refs/tags/release-0.5-rc2

Keys to verify the signature of the release artifact are available at:
http://www.apache.org/dist/incubator/falcon/KEYS
PGP release keys:
http://pgp.mit.edu/pks/lookup?op=vindex&search=0x1B16738C42C7A5EA

Note that this is a source only release and we are voting on the source.
Checksums:

SHA1 (release-0.5-incubating-rc2 / SHA:
cf076dba0c3eea436916dbe10846dfeac471c183)
MD5 (falcon-0.5-incubating-sources.tar.gz) = 6f6b57d1185e96e67ff628f4cb021455

Vote will be open for 72 hours.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

-- 
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry


Re: Setting up git for a new podling

2014-05-19 Thread Jake Farrell
Hi Alan,
Please create an infra ticket with the component "git". In it, specify the
initial source repo you wish to import (or say that the repository should
start blank) and the list that commit emails should go to. If you have any
questions, please let me know.

-Jake


On Mon, May 19, 2014 at 1:59 PM, Alan Gates wrote:

> All of the instructions I could find on podling setup talked about setting
> up SVN.  Are there instructions anywhere on how to set up git if the
> podling has requested that instead?
>
> Alan.


Setting up git for a new podling

2014-05-19 Thread Alan Gates
All of the instructions I could find on podling setup talked about setting up
SVN.  Are there instructions anywhere on how to set up git if the podling has
requested that instead?

Alan.
-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.




Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Andrew Purtell
+1 (binding)


On Sun, May 18, 2014 at 2:15 PM, Chris Aniszczyk wrote:

> Based on the results of the discussion thread:
>
> http://mail-archives.apache.org/mod_mbox/incubator-general/201405.mbox/%3CCAJg1wMRGhLu4P7LeVQB%2B5K0C-fr-pw2448uj%3D6-3zHag4F1EbA%40mail.gmail.com%3E
>
> I would like to call a vote on accepting Parquet into the incubator.
> https://wiki.apache.org/incubator/ParquetProposal
>
> [ ] +1 Accept Parquet into the Incubator
> [ ] +0 Indifferent to the acceptance of Parquet
> [ ] -1 Do not accept Parquet because ...
>
> The vote will be open until Thursday May 22nd 18:00 UTC.
>
> = Parquet Proposal =
>
> == Abstract ==
> Parquet is a columnar storage format for Hadoop.
>
> == Proposal ==
>
> We created Parquet to make the advantages of compressed, efficient columnar
> data representation available to any project in the Hadoop ecosystem,
> regardless of the choice of data processing framework, data model, or
> programming language.
>
> == Background ==
>
> Parquet is built from the ground up with complex nested data structures in
> mind, and uses the repetition/definition level approach to encoding such
> data structures, as popularized by Google Dremel (
> https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe
> this approach is superior to simple flattening of nested name spaces.
>
> Parquet is built to support very efficient compression and encoding
> schemes. Parquet allows compression schemes to be specified on a per-column
> level, and is future-proofed to allow adding more encodings as they are
> invented and implemented. We separate the concepts of encoding and
> compression, allowing parquet consumers to implement operators that work
> directly on encoded data without paying decompression and decoding penalty
> when possible.
>
> == Rationale ==
>
> Parquet is built to be used by anyone. We believe that an efficient,
> well-implemented columnar storage substrate should be useful to all
> frameworks without the cost of extensive and difficult to set up
> dependencies.
>
> Furthermore, the rapid growth of Parquet community is empowered by open
> source. We believe the Apache foundation is a great fit as the long-term
> home for Parquet, as it provides an established process for
> community-driven development and decision making by consensus. This is
> exactly the model we want for future Parquet development.
>
> == Initial Goals ==
>
>  * Move the existing codebase to Apache
>  * Integrate with the Apache development process
>  * Ensure all dependencies are compliant with Apache License version 2.0
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> Parquet has undergone 2 major releases:
> https://github.com/Parquet/parquet-format/releases of the core format and
> 22 releases: https://github.com/Parquet/parquet-mr/releases of the
> supporting set of Java libraries.
>
> The Parquet source is currently hosted at GitHub, which will seed the
> Apache git repository.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those that contribute.
>
> === Community ===
>
> There is a large need for an advanced columnar storage format for Hadoop.
> Parquet is being used in production by many organizations (see
> https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md)
>
>  * Cloudera: https://twitter.com/HenryR/statuses/324222874011451392
>  * Criteo: https://twitter.com/julsimon/statuses/312114074911666177
>  * Salesforce: https://twitter.com/TwitterOSS/statuses/392734610116726784
>  * Stripe: https://twitter.com/avibryant/statuses/391339949250715648
>  * Twitter: https://twitter.com/J_/statuses/315844725611581441
>
> By bringing Parquet into Apache, we believe that the community will grow
> even bigger.
>
> === Core Developers ===
>
> Parquet was initially developed as a collaboration between Twitter,
> Cloudera and Criteo.
>
> See
>
> https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
>
> === Alignment ===
>
> We believe that having Parquet at Apache will help further the growth of
> the big-data community, as it will encourage cooperation within the greater
> ecosystem of projects spawned by Apache Hadoop. The alignment is also
> beneficial to other Apache communities (such as Hadoop, Hive, Avro).
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of the Parquet project being abandoned is minimal. There are many
> organizations using Parquet in production, including Twitter, Cloudera,
> Stripe, and Salesforce (
> http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/).
>
> === Inexperience with Open Source ===
>
> Parquet has existed as a healthy open source for one year. Duri

Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Andrei Savu
+1 (binding)

-- Andrei Savu (from mobile)

Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Mark Struberg
+1 (binding)



LieGrue,
strub






Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Tom White
+1

Tom

On Mon, May 19, 2014 at 9:15 AM, Chris Aniszczyk  wrote:
> Based on the results of the discussion thread:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201405.mbox/%3CCAJg1wMRGhLu4P7LeVQB%2B5K0C-fr-pw2448uj%3D6-3zHag4F1EbA%40mail.gmail.com%3E
>
> I would like to call a vote on accepting Parquet into the incubator.
> https://wiki.apache.org/incubator/ParquetProposal
>
> [ ] +1 Accept Parquet into the Incubator
> [ ] +0 Indifferent to the acceptance of Parquet
> [ ] -1 Do not accept Parquet because ...
>
> The vote will be open until Thursday May 22nd 18:00 UTC.
>
> = Parquet Proposal =
>
> == Abstract ==
> Parquet is a columnar storage format for Hadoop.
>
> == Proposal ==
>
> We created Parquet to make the advantages of compressed, efficient columnar
> data representation available to any project in the Hadoop ecosystem,
> regardless of the choice of data processing framework, data model, or
> programming language.
>
> == Background ==
>
> Parquet is built from the ground up with complex nested data structures in
> mind, and uses the repetition/definition level approach to encoding such
> data structures, as popularized by Google Dremel (
> https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe
> this approach is superior to simple flattening of nested name spaces.
>
> Parquet is built to support very efficient compression and encoding
> schemes. Parquet allows compression schemes to be specified on a per-column
> level, and is future-proofed to allow adding more encodings as they are
> invented and implemented. We separate the concepts of encoding and
> compression, allowing Parquet consumers to implement operators that work
> directly on encoded data without paying a decompression and decoding
> penalty when possible.
>
> == Rationale ==
>
> Parquet is built to be used by anyone. We believe that an efficient,
> well-implemented columnar storage substrate should be useful to all
> frameworks without the cost of extensive, difficult-to-set-up
> dependencies.
>
> Furthermore, the rapid growth of the Parquet community is empowered by open
> source. We believe the Apache foundation is a great fit as the long-term
> home for Parquet, as it provides an established process for
> community-driven development and decision making by consensus. This is
> exactly the model we want for future Parquet development.
>
> == Initial Goals ==
>
>  * Move the existing codebase to Apache
>  * Integrate with the Apache development process
>  * Ensure all dependencies are compliant with Apache License version 2.0
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> Parquet has undergone 2 major releases of the core format
> (https://github.com/Parquet/parquet-format/releases) and 22 releases of the
> supporting set of Java libraries
> (https://github.com/Parquet/parquet-mr/releases).
>
> The Parquet source is currently hosted at GitHub, which will seed the
> Apache git repository.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those that contribute.
>
> === Community ===
>
> There is a large need for an advanced columnar storage format for Hadoop.
> Parquet is being used in production by many organizations (see
> https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md)
>
>  * Cloudera: https://twitter.com/HenryR/statuses/324222874011451392
>  * Criteo: https://twitter.com/julsimon/statuses/312114074911666177
>  * Salesforce: https://twitter.com/TwitterOSS/statuses/392734610116726784
>  * Stripe: https://twitter.com/avibryant/statuses/391339949250715648
>  * Twitter: https://twitter.com/J_/statuses/315844725611581441
>
> By bringing Parquet into Apache, we believe that the community will grow
> even bigger.
>
> === Core Developers ===
>
> Parquet was initially developed as a collaboration between Twitter,
> Cloudera and Criteo.
>
> See
> https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
>
> === Alignment ===
>
> We believe that having Parquet at Apache will help further the growth of
> the big-data community, as it will encourage cooperation within the greater
> ecosystem of projects spawned by Apache Hadoop. The alignment is also
> beneficial to other Apache communities (such as Hadoop, Hive, Avro).
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of the Parquet project being abandoned is minimal. There are many
> organizations using Parquet in production, including Twitter, Cloudera,
> Stripe, and Salesforce (
> http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/).
>
> === Inexperience with Open Source ===
>
> Parquet has existed as a healthy open source project for one year. During that
> 

Re: [VOTE] Accept Parquet into the incubator

2014-05-19 Thread Bertrand Delacretaz
On Sun, May 18, 2014 at 11:15 PM, Chris Aniszczyk wrote:
> ...I would like to call a vote on accepting Parquet into the incubator.
> https://wiki.apache.org/incubator/ParquetProposal..

+1

-Bertrand
