Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Ralph Goers
The ONLY issue I see for Flume to graduate is diversity.  No one will convince 
me that the current makeup constitutes diversity of any kind.  

Perhaps I shouldn't have brought up the mailing list issues, as that was only 
meant in the spirit of trying to offer some advice on how more diversity could 
be achieved.  Flume is really the only community I participate in that contains 
Cloudera employees, so I do find myself wondering whether the project is run 
this way because that is how all projects with a large number of Cloudera 
employees are run.  That might make all of those participants comfortable but 
might create a barrier to others.

In any case - I'm not insisting that the way the project is run needs to 
change. I'm simply saying I cannot support graduation with the current makeup 
of the committers and PMC. I don't have a hard and fast ratio - gaining 10 new 
unaffiliated committers who don't do much isn't nearly as good as 2 or 3 who 
are very active.  Ultimately the project needs to figure out how to solve this.


Ralph


On May 23, 2012, at 11:48 PM, Eric Sammer wrote:

> I appreciate your position Ralph and I don't want anyone to feel like they
> can't contribute. As we've talked about before, we've been quick to nurture
> new contributors to committer status successfully in a few cases. It's true
> that some of the more active committers are from Cloudera, but it's not to
> the exclusion of anyone. Others aren't from Cloudera. Those of us that work
> together are also very strict about abiding by the "if it's not on the
> mailing list, it didn't happen" rule (where "mailing list" can mean JIRA or
> other ASF infrastructure as well).
> 
> I'm happy to take your guidance as a mentor, but you also need to
> understand that some of the ways the Flume project has elected to operate
> are just a matter of taste. They were proposed, discussed, voted on (and
> not as a block by Cloudera employees, IIRC - pretty sure I was -0), and put
> in place, and they do not violate the Apache Way (RTC vs. CTR, for example). They aren't
> unheard of and they do not work to the exclusion of contributors (RTC, for
> instance, only impacts committers). I think the vote that was started was
> only to gauge community opinion as a first step (although I'm not
> completely well versed in the graduation process, to be honest).
> 
> If there are concrete things we can do to improve diversity, in your
> opinion, I am extremely open to hearing them. We already do many of the
> (excellent) things listed earlier in the thread. JIRA noise notwithstanding
> (again, it's a matter of taste - I use the email frequently as I find
> trawling through JIRA slow), I'm definitely open to ideas. Of course, if
> Flume simply needs to remain in the incubator until we develop greater
> diversity, that's fine too. If we're not ready, we're just not ready.
> 
> On Wed, May 23, 2012 at 11:18 PM, Ralph Goers 
> wrote:
> 
>> 
>> On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:
>> 
>>> On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
>>>  wrote:
>>>>
>>>> On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
>>>>
>>>>> On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
>>>>>  wrote:
>>>>>> Right after I read Jukka's email that started this thread, I
>>>>>> posted my reply and discovered to my shock that they had started a
>>>>>> graduation vote.  I am shocked because I have pointed out repeatedly the
>>>>>> project's complete lack of diversity.  Virtually all the active PMC members
>>>>>> and committers work for the same employer.  I have told them several times
>>>>>> that I would actually like to participate in the project but the way the
>>>>>> project works is very different than every other project I am involved with
>>>>>> at the ASF and the barriers to figuring out what is actually going on are very
>>>>>> high. Almost nothing is discussed directly on the dev list - it is all done
>>>>>> through Jira issues or the Review tool.  While all the Jira issue updates
>>>>>> and reviews are sent to the dev list, most of that is just noise.  Feel free
>>>>>> to review the dev list archives to see what I am talking about.
>>>>>
>>>>> I don't follow flume, but I'd propose to soften your objection only
>>>>> slightly. I've met other groups of people who like a JIRA-centric view
>>>>> of the world. I suspect that if they did a bunch of other good things
>>>>> called out below, you or others would find the JIRA business
>>>>> digestible. Also, on the other hand, I fear that the co-employed
>>>>> contributors are collaborating in the hallway, and the lack of the
>>>>> context in JIRA or on the list is contributing to the problem.
>>>>
>>>> I have reason to doubt the collaboration in the hallway aspect and I
>>>> certainly do not doubt everyone's good intent.  I'm not objecting to the
>>>> collaboration style as an issue preventing graduation. I'm just saying I
>>>> find it difficult to participate with that style and that simply makes me
>>>> wonder if that is making it harder to attract new committers.  I fully
>>>> realize 

Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Bertrand Delacretaz
> [X ] +1, bring Crunch into Incubator

-Bertrand

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Tommaso Teofili
+1

Tommaso

2012/5/23 Josh Wills 

> I would like to call a vote for accepting "Apache Crunch" for
> incubation in the Apache Incubator. The full proposal is available
> below.  We ask the Incubator PMC to sponsor it, with phunt as
> Champion, and phunt, tomwhite, and acmurthy volunteering to be
> Mentors.
>
> Please cast your vote:
>
> [ ] +1, bring Crunch into Incubator
> [ ] +0, I don't care either way,
> [ ] -1, do not bring Crunch into Incubator, because...
>
> This vote will be open for 72 hours and only votes from the Incubator
> PMC are binding.
>
> http://wiki.apache.org/incubator/CrunchProposal
>
> Proposal text from the wiki:
>
> --
> = Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =
>
> == Abstract ==
>
> Crunch is a Java library for writing, testing, and running pipelines
> of MapReduce jobs on Apache Hadoop.
>
> == Proposal ==
>
> Crunch is a Java library for writing, testing, and running pipelines
> of MapReduce jobs on Apache Hadoop. Its main goal is to provide a
> high-level API for writing and testing complex MapReduce jobs that
> require multiple processing stages.  It has a simple, flexible, and
> extensible data model that makes it ideal for processing data that
> does not naturally fit into a relational structure, such as time
> series and serialized object formats like JSON and Avro. It supports
> running pipelines either as a series of MapReduce jobs on an Apache
> Hadoop cluster or in memory on a single machine for fast testing and
> debugging.
>
> == Background ==
>
> Crunch was initially developed by Cloudera to simplify the process of
> creating sequences of dependent MapReduce jobs, especially jobs that
> processed non-relational data like time series. Its design was based
> on a paper Google published about a Java library they developed called
> FlumeJava that was created in order to solve a similar class of
> problems. Crunch was open-sourced by Cloudera on GitHub as an Apache
> 2.0 licensed project in October 2011. Since then, Crunch has been
> formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
> (February 2012), with an incremental update to version 0.2.1 (March
> 2012).  These releases are also distributed by Cloudera as source and
> binaries from Cloudera's Maven repository.
>
> == Rationale ==
>
> Most of the interesting analytical and data processing tasks that are
> run on an Apache Hadoop cluster require a series of MapReduce jobs to
> be executed in sequence. Developers who are creating these pipelines
> today need to manually assign the sequence of tasks to perform in a
> dependent chain of MapReduce jobs, even though there are a number of
> well-known patterns for fusing dependent computations together into a
> single MapReduce stage and for performing common types of joins and
> aggregations. This results in MapReduce pipelines that are more
> difficult to test, maintain, and extend to support new functionality.
>
> Furthermore, the type of data that is being stored and processed using
> Apache Hadoop is evolving. Although Hadoop was originally used for
> storing large volumes of structured text in the form of webpages and
> log files, it is now common for Hadoop to store complex, structured
> data formats such as JSON, Apache Avro, and Apache Thrift. These
> formats allow developers to work with serialized objects in
> programming languages like Java, C++, and Python, and allow for new
> types of analysis to be performed on complex data types. Hadoop has
> also been adopted by the scientific research community, who are using
> Hadoop to process time series data, structured binary files in the
> HDF5 format, and large medical and satellite images.
>
> Crunch addresses these challenges by providing a lightweight and
> extensible Java API for defining the stages of a data processing
> pipeline, which can then be run on an Apache Hadoop cluster as a
> sequence of dependent MapReduce jobs, or in-memory on a single
> machine to facilitate fast testing and debugging. Crunch relies on a
> small set of primitive abstractions that represent immutable,
> distributed collections of objects. Developers define functions that
> are applied to those objects in order to generate new immutable,
> distributed collections of objects. Crunch also provides a library of
> common MapReduce patterns for performing efficient joins and
> aggregation operations over these distributed collections that
> developers may integrate into their own pipelines. Crunch also
> provides native support for processing structured data formats
> like JSON, Apache Avro, and Apache Thrift, and is designed to be
> extensible to support working with any kind of data format that Java
> supports in its native form.
>
> == Initial Goals ==
>
> Crunch is currently in its first major release with a considerable
> number of enhancement reques

[RESULT] [VOTE] Release Apache Wookie 0.10.0-incubating (General Incubation List)

2012-05-24 Thread Scott Wilson
The 72 hour voting period has passed and the vote is now closed. Thanks to 
everyone who took time to review the release. 

With the three IPMC member votes (all 3 of them from mentors) and 3 PPMC votes, 
the vote succeeds.

IPMC Member voting record:

* Ate Douma: +1
* Ross Gardler: +1
* Matt Franklin: +1

* Denotes an IPMC member vote cast on the wookie-dev list.

Thanks,

Scott.

On 21 May 2012, at 16:13, Scott Wilson wrote:

> This is the third incubator release for Apache Wookie, with the artifacts
> being versioned as 0.10.0-incubating.
> 
> We are requesting a lazy consensus vote, as we have already received 3
> binding IPMC +1 votes during the release voting on wookie-dev:
> 
> Vote thread:
> http://markmail.org/message/2p4veen6n22w7hnb
> 
> Result:
> http://markmail.org/message/d2jzbrdgic3od5uj
> 
> Svn source tag:
> https://svn.apache.org/repos/asf/incubator/wookie/tags/0.10.0-incubating/
> 
> Release notes:
> https://svn.apache.org/repos/asf/incubator/wookie/tags/0.10.0-incubating/RELEASE_NOTES
> 
> Release artifacts:
> http://people.apache.org/builds/incubator/wookie/0.10.0-incubating/
> 
> Maven artifacts:
> https://repository.apache.org/content/repositories/orgapachewookie-094/
> 
> PGP release keys:
> https://svn.apache.org/repos/asf/incubator/wookie/KEYS
> 
> Lazy consensus, vote open for 72 hours.
> 
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why) 
> 
> 





RE: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Franklin, Matthew B.
+1 (binding)


Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Benson Margulies
+1 (binding) ...

And a friendly reminder to the PPMC via their mentors -- in response
to that email about limiting the initial committers to a tight group.
They will soon be learning that the big challenge of incubation is not
writing a lot of code, it's recruiting new faces. They'll want to
switch from 'just us chickens' to putting out the welcome mat as soon
as possible.

On Thu, May 24, 2012 at 2:29 AM, Bertrand Delacretaz
 wrote:
>> [X ] +1, bring Crunch into Incubator
>
> -Bertrand




[ANNOUNCE] Apache Wink 1.2.0-incubating release

2012-05-24 Thread Luciano Resende
The Apache Wink team is pleased to announce the release of Apache Wink
1.2.0-incubating.

Apache Wink is a simple yet solid framework for building RESTful Web
services. It consists of a Server module and a Client module for
developing and consuming RESTful Web services.

The Wink Server module is a complete implementation of the JAX-RS v1.1
specification. On top of this implementation, the Wink Server module
provides a set of additional features that were designed to facilitate
the development of RESTful Web services.

The Wink Client module is a Java-based framework that provides
functionality for communicating with RESTful Web services. The
framework is built on top of the JDK HttpURLConnection and adds
essential features that facilitate the development of such client
applications.

For full details about the release and to download the distributions
please go to:

http://incubator.apache.org/wink/downloads.html

Apache Wink welcomes your help. Any contribution, including code,
testing, contributions to the documentation, or bug reporting is
always appreciated. For more information on how to get involved in
Apache Wink visit the website at:

http://incubator.apache.org/wink/

Thank you for your interest in Apache Wink!

The Apache Wink Team.




Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Mike Percy
[x] +1, bring Crunch into Incubator (non-binding)

Regards,
Mike



Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Arun C Murthy
+1 (binding)


Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Eric Sammer
On May 24, 2012, at 12:20 AM, Ralph Goers  wrote:

> The ONLY issue I see for Flume to graduate is diversity.  No one will 
> convince me that the current makeup constitutes diversity of any kind.
>
> Perhaps I shouldn't have brought up the mailing list issues, as that was only 
> meant in the spirit of trying to offer some advice on how more diversity 
> could be achieved.  Flume is really the only community I participate in that 
> contains Cloudera employees, so I do find myself wondering whether the 
> project is run this way because that is how all projects with a large number 
> of Cloudera employees are run.  That might make all of those participants 
> comfortable but might create a barrier to others.

There are other projects where this is also the case, and they are easily
referenceable. There's an obvious (to me) implication that this is the
cause of the problem, and that's simply not true. If there are concrete
recommendations of things you feel we can do better, I know the Flume
community is open to those suggestions. To my knowledge, there's no
practice in place within Flume that isn't also in place in some other
ASF TLP.

>
> In any case - I'm not insisting that the way the project is run needs to 
> change. I'm simply saying I cannot support graduation with the current makeup 
> of the committers and PMC. I don't have a hard and fast ratio - gaining 10 
> new unaffiliated committers who don't do much isn't nearly as good as 2 or 3 
> who are very active.  Ultimately the project needs to figure out how to solve 
> this.

That's fine. So let's have a discussion about actionable tasks. I've
mentioned my thoughts on growing diversity in the past, although
admittedly it was within a response to a similar thread on our private
list. I'll start a thread on our dev list with the same thoughts for
the larger community to comment on. I welcome your contribution to
such a discussion!

Thanks.

>
>
> Ralph
>
>
> On May 23, 2012, at 11:48 PM, Eric Sammer wrote:
>
>> I appreciate your position Ralph and I don't want anyone to feel like they
>> can't contribute. As we've talked about before, we've been quick to nurture
>> new contributors to committer status successfully in a few cases. It's true
>> that some of the more active committers are from Cloudera, but it's not to
>> the exclusion of anyone. Others aren't from Cloudera. Those of us that work
>> together are also very strict about abiding to the "if it's not on the
>> mailing list, it didn't happen" rule (where "mailing list" can mean JIRA or
>> other ASF infrastructure as well).
>>
>> I'm happy to take your guidance as a mentor, but you also need to
>> understand that some of the ways the Flume project has elected to operate
>> are just a matter of taste. They were proposed, discussed, voted on (and
>> not as a block by Cloudera employees, IIRC - pretty sure I was -0), and put
>> in place and do not violate the Apache Way (like RTC vs. CTR). They aren't
>> unheard of and they do not work to the exclusion of contributors (RTC, for
>> instance, only impacts committers). I think the vote that was started was
>> only to gauge community opinion as a first step (although I'm not
>> completely well versed in the graduation process, to be honest).
>>
>> If there are concrete things we can do to improve diversity, in your
>> opinion, I am extremely open to hearing them. We already do many of the
>> (excellent) things listed earlier in the thread. JIRA noise notwithstanding
>> (again, it's a matter of taste - I use the email frequently as I find
>> trawling through JIRA slow) I'm definitely open to ideas. Of course, if
>> Flume simply needs to remain in the incubator until we develop greater
>> diversity, that's fine too. If we're not ready, we're just not ready.
>>
>> On Wed, May 23, 2012 at 11:18 PM, Ralph Goers 
>> wrote:
>>
>>>
>>> On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:
>>>
>>>> On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
>>>>  wrote:
>>>>>
>>>>> On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
>>>>>
>>>>>> On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
>>>>>>  wrote:
>>>>>>> Right after I read Jukka's email that started this thread and I
>>>>>>> posted my reply and discovered to my shock that they had started a
>>>>>>> graduation vote.  I am shocked because I have pointed out repeatedly the
>>>>>>> project's complete lack of diversity.  Virtually all the active PMC members
>>>>>>> and committers work for the same employer.  I have told them several times
>>>>>>> that I would actually like to participate in the project but the way the
>>>>>>> project works is very different than every other project I am involved with
>>>>>>> at the ASF and the barriers to figuring out what is actually going on are very
>>>>>>> high. Almost nothing is discussed directly on the dev list - it is all done
>>>>>>> through Jira issues or the Review tool.  While all the Jira issue updates
>>>>>>> and reviews are sent to the dev list most of that is just noise.  Feel free
>>>>>>> to review the dev list archives to see what I am talking a

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Arvind Prabhakar
Hi,

On Thu, May 24, 2012 at 12:19 AM, Ralph Goers wrote:

> The ONLY issue I see for Flume to graduate is diversity.  No one will
> convince me that the current makeup constitutes diversity of any kind.
>
> Perhaps I shouldn't have brought up the mailing list issues as that was
> only meant in the spirit of trying to offer some advice on how more
> diversity could be achieved.  Flume is really the only community I
> participate in that contains Cloudera employees so I do find myself
> wondering if the way the project is run is because that is the way all
> projects with a large number of Cloudera employees are run.  That might
> make all of those participants comfortable but might create a barrier to
> others.
>

Here are the committers who have been active in the past three months:

* Brock Noland (Cloudera)
* Hari Shreedharan  (Cloudera)
* Jarek Jarcec Cecho (AVG Technologies)
* Juhani Connolly   (CyberAgent)
* Mike Percy (Cloudera)
* Mingjie Lai (Trend Micro)
* Prasad Mujumdar (Cloudera)
* Will McQueen (Cloudera)
* Arvind Prabhakar (Cloudera)

There are four companies represented in this list: AVG Technologies,
Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
successfully graduated from Incubator in the past, this meets the diversity
requirements very well.


>
> In any case - I'm not insisting that the way the project is run needs to
> change. I'm simply saying I cannot support graduation with the current
> makeup of the committers and PMC. I don't have a hard and fast ratio -
> gaining 10 new unaffiliated committers who don't do much isn't nearly as
> good as 2 or 3 who are very active.  Ultimately the project needs to figure
> out how to solve this.
>

Stating that some committers "who don't do much isn't nearly as good as 2
or 3 who are very active" is an unfair characterization. This is even more
unfair to those who are part of the project but have not been active
lately for whatever reason, but have played a foundational role in
getting the project to the point where it is today. I think they are as
important as any other committer who may be very active at the moment.
Merit, once earned, never expires [1].

[1] http://www.apache.org/dev/committers.html#committer-set-term

Arvind


>
> Ralph
>
>
> On May 23, 2012, at 11:48 PM, Eric Sammer wrote:
>
> > I appreciate your position Ralph and I don't want anyone to feel like they
> > can't contribute. As we've talked about before, we've been quick to nurture
> > new contributors to committer status successfully in a few cases. It's true
> > that some of the more active committers are from Cloudera, but it's not to
> > the exclusion of anyone. Others aren't from Cloudera. Those of us that work
> > together are also very strict about abiding by the "if it's not on the
> > mailing list, it didn't happen" rule (where "mailing list" can mean JIRA or
> > other ASF infrastructure as well).
> >
> > I'm happy to take your guidance as a mentor, but you also need to
> > understand that some of the ways the Flume project has elected to operate
> > are just a matter of taste. They were proposed, discussed, voted on (and
> > not as a block by Cloudera employees, IIRC - pretty sure I was -0), and put
> > in place and do not violate the Apache Way (like RTC vs. CTR). They aren't
> > unheard of and they do not work to the exclusion of contributors (RTC, for
> > instance, only impacts committers). I think the vote that was started was
> > only to gauge community opinion as a first step (although I'm not
> > completely well versed in the graduation process, to be honest).
> >
> > If there are concrete things we can do to improve diversity, in your
> > opinion, I am extremely open to hearing them. We already do many of the
> > (excellent) things listed earlier in the thread. JIRA noise notwithstanding
> > (again, it's a matter of taste - I use the email frequently as I find
> > trawling through JIRA slow) I'm definitely open to ideas. Of course, if
> > Flume simply needs to remain in the incubator until we develop greater
> > diversity, that's fine too. If we're not ready, we're just not ready.
> >
> > On Wed, May 23, 2012 at 11:18 PM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> >
> >>
> >> On May 23, 2012, at 10:48 PM, Patrick Hunt wrote:
> >>
> >>> On Wed, May 23, 2012 at 10:36 PM, Ralph Goers
> >>>  wrote:
> >>>>
> >>>> On May 23, 2012, at 10:15 PM, Benson Margulies wrote:
> >>>>
> >>>>> On Wed, May 23, 2012 at 10:09 PM, Ralph Goers
> >>>>>  wrote:
> >>>>>> Right after I read Jukka's email that started this thread and I
> >>>>>> posted my reply and discovered to my shock that they had started a
> >>>>>> graduation vote.  I am shocked because I have pointed out repeatedly the
> >>>>>> project's complete lack of diversity.  Virtually all the active PMC members
> >>>>>> and committers work for the same employer.  I have told them several times
> >>>>>> that I would actually like to participate in the project but the way the
> >>>>>> project work

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Ralph Goers

On May 24, 2012, at 10:40 AM, Arvind Prabhakar wrote:

> Hi,
> 
> On Thu, May 24, 2012 at 12:19 AM, Ralph Goers 
> wrote:
> 
>> The ONLY issue I see for Flume to graduate is diversity.  No one will
>> convince me that the current makeup constitutes diversity of any kind.
>> 
>> Perhaps I shouldn't have brought up the mailing list issues as that was
>> only meant in the spirit of trying to offer some advice on how more
>> diversity could be achieved.  Flume is really the only community I
>> participate in that contains Cloudera employees so I do find myself
>> wondering if the way the project is run is because that is the way all
>> projects with a large number of Cloudera employees are run.  That might
>> make all of those participants comfortable but might create a barrier to
>> others.
>> 
> 
> Here are the committers who have been active in the past three months:
> 
> * Brock Noland (Cloudera)
> * Hari Shreedharan  (Cloudera)
> * Jarek Jarcec Cecho (AVG Technologies)
> * Juhani Connolly   (CyberAgent)
> * Mike Percy (Cloudera)
> * Mingjie Lai (Trend Micro)
> * Prasad Mujumdar (Cloudera)
> * Will McQueen (Cloudera)
> * Arvind Prabhakar (Cloudera)
> 
> There are four companies represented in this list: AVG Technologies,
> Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
> successfully graduated from Incubator in the past, this meets the diversity
> requirements very well.

I was mistaken and the list above is indeed correct.  For some reason I thought 
a couple of them had become Cloudera employees.  

However, none of those three are currently on the PPMC.  When you look at the 
PPMC list you should also include a few more Cloudera people who do participate 
in release votes and PPMC issues. Most, if not all, of the non-Cloudera PMC 
members don't.



> 
> 
>> 
>> In any case - I'm not insisting that the way the project is run needs to
>> change. I'm simply saying I cannot support graduation with the current
>> makeup of the committers and PMC. I don't have a hard and fast ratio -
>> gaining 10 new unaffiliated committers who don't do much isn't nearly as
>> good as 2 or 3 who are very active.  Ultimately the project needs to figure
>> out how to solve this.
>> 
> 
> Stating that some committers "who don't do much isn't nearly as good as 2
> or 3 who are very active" is an unfair characterization. This is even more
> unfair to those who are part of the project but have not been active
> lately for whatever reason, but have played a foundational role in
> getting the project to the point where it is today. I think they are as
> important as any other committer who may be very active at the moment.
> Merit, once earned, never expires [1].
> 
> [1] http://www.apache.org/dev/committers.html#committer-set-term

I think you misunderstood my point or I didn't state it very well.  Diversity 
isn't achieved simply by having bodies.  IOW I am not suggesting offering 
commit rights to people who haven't earned it just to meet some ratio.  
However, I am not suggesting the project has ever even considered doing that.

Ralph 



-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Doug Cutting

+1

Doug

On 05/23/2012 11:45 AM, Josh Wills wrote:

I would like to call a vote for accepting "Apache Crunch" for
incubation in the Apache Incubator. The full proposal is available
below.  We ask the Incubator PMC to sponsor it, with phunt as
Champion, and phunt, tomwhite, and acmurthy volunteering to be
Mentors.

Please cast your vote:

[ ] +1, bring Crunch into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Crunch into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator
PMC are binding.

http://wiki.apache.org/incubator/CrunchProposal

Proposal text from the wiki:
--
= Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =

== Abstract ==

Crunch is a Java library for writing, testing, and running pipelines
of MapReduce jobs on Apache Hadoop.

== Proposal ==

Crunch is a Java library for writing, testing, and running pipelines
of MapReduce jobs on Apache Hadoop. Its main goal is to provide a
high-level API for writing and testing complex MapReduce jobs that
require multiple processing stages.  It has a simple, flexible, and
extensible data model that makes it ideal for processing data that
does not naturally fit into a relational structure, such as time
series and serialized object formats like JSON and Avro. It supports
running pipelines either as a series of MapReduce jobs on an Apache
Hadoop cluster or in memory on a single machine for fast testing and
debugging.

== Background ==

Crunch was initially developed by Cloudera to simplify the process of
creating sequences of dependent MapReduce jobs, especially jobs that
processed non-relational data like time series. Its design was based
on a paper Google published about a Java library they developed called
FlumeJava that was created in order to solve a similar class of
problems. Crunch was open-sourced by Cloudera on GitHub as an Apache
2.0 licensed project in October 2011. During this time Crunch has been
formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
(February 2012), with an incremental update to version 0.2.1 (March
2012). These releases are also distributed by Cloudera as source and
binaries from Cloudera's Maven repository.

== Rationale ==

Most of the interesting analytical and data processing tasks that are
run on an Apache Hadoop cluster require a series of MapReduce jobs to
be executed in sequence. Developers who are creating these pipelines
today need to manually assign the sequence of tasks to perform in a
dependent chain of MapReduce jobs, even though there are a number of
well-known patterns for fusing dependent computations together into a
single MapReduce stage and for performing common types of joins and
aggregations. This results in MapReduce pipelines that are more
difficult to test, maintain, and extend to support new functionality.

Furthermore, the type of data that is being stored and processed using
Apache Hadoop is evolving. Although Hadoop was originally used for
storing large volumes of structured text in the form of webpages and
log files, it is now common for Hadoop to store complex, structured
data formats such as JSON, Apache Avro, and Apache Thrift. These
formats allow developers to work with serialized objects in
programming languages like Java, C++, and Python, and allow for new
types of analysis to be performed on complex data types. Hadoop has
also been adopted by the scientific research community, who are using
Hadoop to process time series data, structured binary files in the
HDF5 format, and large medical and satellite images.

Crunch addresses these challenges by providing a lightweight and
extensible Java API for defining the stages of a data processing
pipeline, which can then be run on an Apache Hadoop cluster as a
sequence of dependent MapReduce jobs, or in-memory on a single
machine to facilitate fast testing and debugging. Crunch relies on a
small set of primitive abstractions that represent immutable,
distributed collections of objects. Developers define functions that
are applied to those objects in order to generate new immutable,
distributed collections of objects. Crunch also provides a library of
common MapReduce patterns for performing efficient joins and
aggregation operations over these distributed collections that
developers may integrate into their own pipelines. Crunch also
provides native support for processing structured binary data formats
like JSON, Apache Avro, and Apache Thrift, and is designed to be
extensible to support working with any kind of data format that Java
supports in its native form.

== Initial Goals ==

Crunch is currently in its first major release with a considerable
number of enhancement requests, tasks, and issues recorded towards its
future development. The initial goal of this project will be to
continue to build community in the spirit of the "Apache

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Dave Fisher

On May 24, 2012, at 11:49 AM, Ralph Goers wrote:

> 
> On May 24, 2012, at 10:40 AM, Arvind Prabhakar wrote:
> 
>> Hi,
>> 
>> On Thu, May 24, 2012 at 12:19 AM, Ralph Goers 
>> wrote:
>> 
>>> The ONLY issue I see for Flume to graduate is diversity.  No one will
>>> convince me that the current makeup constitutes diversity of any kind.
>>> 
>>> Perhaps I shouldn't have brought up the mailing list issues as that was
>>> only meant in the spirit of trying to offer some advice on how more
>>> diversity could be achieved.  Flume is really the only community I
>>> participate in that contains Cloudera employees so I do find myself
>>> wondering if the way the project is run is because that is the way all
>>> projects with a large number of Cloudera employees are run.  That might
>>> make all of those participants comfortable but might create a barrier to
>>> others.
>>> 
>> 
>> Here are the committers who have been active in the past three months:
>> 
>> * Brock Noland (Cloudera)
>> * Hari Shreedharan  (Cloudera)
>> * Jarek Jarcec Cecho (AVG Technologies)
>> * Juhani Connolly   (CyberAgent)
>> * Mike Percy (Cloudera)
>> * Mingjie Lai (Trend Micro)
>> * Prasad Mujumdar (Cloudera)
>> * Will McQueen (Cloudera)
>> * Arvind Prabhakar (Cloudera)
>> 
>> There are four companies represented in this list: AVG Technologies,
>> Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
>> successfully graduated from Incubator in the past, this meets the diversity
>> requirements very well.
> 
> I was mistaken and the list above is indeed correct.  For some reason I 
> thought a couple of them had become Cloudera employees.  
> 
> However, none of those three are currently on the PPMC.  When you look at the 
> PPMC list you should also include a few more Cloudera people who do 
> participate in release votes and PPMC issues. Most, if not all, of the 
> non-Cloudera PMC members don't.

I started reading some of the Flume website and noticed that when you go to the
main Wiki page:

https://cwiki.apache.org/confluence/display/FLUME/Index

and click on "Flume Cookbook", the resource is hosted at cloudera.com:

http://archive.cloudera.com/cdh/3/flume/Cookbook/

This page lists "flume-...@cloudera.org" and is a file with a revision dated 
May 7, 2012.

You can make your own conclusions, but it looks like podling resources need to 
be migrated to the ASF.

Regards,
Dave

> 
> 
> 
>> 
>> 
>>> 
>>> In any case - I'm not insisting that the way the project is run needs to
>>> change. I'm simply saying I cannot support graduation with the current
>>> makeup of the committers and PMC. I don't have a hard and fast ratio -
>>> gaining 10 new unaffiliated committers who don't do much isn't nearly as
>>> good as 2 or 3 who are very active.  Ultimately the project needs to figure
>>> out how to solve this.
>>> 
>> 
>> Stating that some committers "who don't do much isn't nearly as good as 2
>> or 3 who are very active" is an unfair characterization. This is even more
>> unfair to those who are part of the project but have not been active
>> lately for whatever reason, but have played a foundational role in
>> getting the project to the point where it is today. I think they are as
>> important as any other committer who may be very active at the moment.
>> Merit, once earned, never expires [1].
>> 
>> [1] http://www.apache.org/dev/committers.html#committer-set-term
> 
> I think you misunderstood my point or I didn't state it very well.  Diversity 
> isn't achieved simply by having bodies.  IOW I am not suggesting offering 
> commit rights to people who haven't earned it just to meet some ratio.  
> However, I am not suggesting the project has ever even considered doing that.
> 
> Ralph 
> 
> 
> 





Re: Invitation to join Apache Kafka as a committer

2012-05-24 Thread Joel Koshy
Works now.

Thanks,

Joel

On Wed, May 23, 2012 at 8:08 PM, Kevan Miller wrote:

>
> On May 23, 2012, at 10:11 PM, Alan D. Cabrera wrote:
>
> > -kafka-private
> > +kafka-dev
> > +general
> >
> > Ahh, account was only created.  According to root:
> >
> >> Only PMC chairs can grant karma.  If needed, please post to the general@/
> >> dev@/private@ list of your project asking for someone with sufficient
> >> karma to grant access to 'jjkoshy'.
> >
> > Sorry about this confusion.  I don't have the necessary karma and am so
> > used to other mentors having it that I forgot that the above step needed to
> > be done.
> >
> > Can someone in the IPMC perform the needful?   Thanks!
> >
> > cc: incubator general
>
> Done.
>
> --kevan
>
>


Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Tom White
According to Clutch [1] the project has added 8 committers since it
entered incubation. Regarding diversity, committers from over four
organizations are actively involved in Flume development, which is
pretty healthy. There does seem to be a need to have more diversity at
the PPMC level, however, so that's something that could be worked on.

Tom

[1] http://incubator.apache.org/clutch.html

On Thu, May 24, 2012 at 2:06 PM, Dave Fisher  wrote:
>
> On May 24, 2012, at 11:49 AM, Ralph Goers wrote:
>
>>
>> On May 24, 2012, at 10:40 AM, Arvind Prabhakar wrote:
>>
>>> Hi,
>>>
>>> On Thu, May 24, 2012 at 12:19 AM, Ralph Goers 
>>> wrote:
>>>
>>>> The ONLY issue I see for Flume to graduate is diversity.  No one will
>>>> convince me that the current makeup constitutes diversity of any kind.
>>>>
>>>> Perhaps I shouldn't have brought up the mailing list issues as that was
>>>> only meant in the spirit of trying to offer some advice on how more
>>>> diversity could be achieved.  Flume is really the only community I
>>>> participate in that contains Cloudera employees so I do find myself
>>>> wondering if the way the project is run is because that is the way all
>>>> projects with a large number of Cloudera employees are run.  That might
>>>> make all of those participants comfortable but might create a barrier to
>>>> others.
>>>>
>>>
>>> Here are the committers who have been active in the past three months:
>>>
>>> * Brock Noland (Cloudera)
>>> * Hari Shreedharan  (Cloudera)
>>> * Jarek Jarcec Cecho (AVG Technologies)
>>> * Juhani Connolly   (CyberAgent)
>>> * Mike Percy (Cloudera)
>>> * Mingjie Lai (Trend Micro)
>>> * Prasad Mujumdar (Cloudera)
>>> * Will McQueen (Cloudera)
>>> * Arvind Prabhakar (Cloudera)
>>>
>>> There are four companies represented in this list: AVG Technologies,
>>> Cloudera, CyberAgent and Trend Micro. Compared to other projects that have
>>> successfully graduated from Incubator in the past, this meets the diversity
>>> requirements very well.
>>
>> I was mistaken and the list above is indeed correct.  For some reason I 
>> thought a couple of them had become Cloudera employees.
>>
>> However, none of those three are currently on the PPMC.  When you look at 
>> the PPMC list you should also include a few more Cloudera people who do 
>> participate in release votes and PPMC issues. Most, if not all, of the 
>> non-Cloudera PMC members don't.
>
> I started reading some of the Flume website and noticed that when you go to 
> the main Wiki page:
>
> https://cwiki.apache.org/confluence/display/FLUME/Index
>
> and click on "Flume Cookbook", the resource is hosted at cloudera.com:
>
> http://archive.cloudera.com/cdh/3/flume/Cookbook/
>
> This page lists "flume-...@cloudera.org" and is a file with a revision dated 
> May 7, 2012.
>
> You can make your own conclusions, but it looks like podling resources need to 
> be migrated to the ASF.
>
> Regards,
> Dave
>
>>
>>
>>
>>>
>>>
>>>>
>>>> In any case - I'm not insisting that the way the project is run needs to
>>>> change. I'm simply saying I cannot support graduation with the current
>>>> makeup of the committers and PMC. I don't have a hard and fast ratio -
>>>> gaining 10 new unaffiliated committers who don't do much isn't nearly as
>>>> good as 2 or 3 who are very active.  Ultimately the project needs to figure
>>>> out how to solve this.
>>>>
>>>
>>> Stating that some committers "who don't do much isn't nearly as good as 2
>>> or 3 who are very active" is an unfair characterization. This is even more
>>> unfair to those who are part of the project but have not been active
>>> lately for whatever reason, but have played a foundational role in
>>> getting the project to the point where it is today. I think they are as
>>> important as any other committer who may be very active at the moment.
>>> Merit, once earned, never expires [1].
>>>
>>> [1] http://www.apache.org/dev/committers.html#committer-set-term
>>
>> I think you misunderstood my point or I didn't state it very well.  
>> Diversity isn't achieved simply by having bodies.  IOW I am not suggesting 
>> offering commit rights to people who haven't earned it just to meet some 
>> ratio.  However, I am not suggesting the project has ever even considered 
>> doing that.
>>
>> Ralph
>>
>>
>>




Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Tom White
+1

Tom

On Wed, May 23, 2012 at 1:45 PM, Josh Wills  wrote:
> I would like to call a vote for accepting "Apache Crunch" for
> incubation in the Apache Incubator. The full proposal is available
> below.  We ask the Incubator PMC to sponsor it, with phunt as
> Champion, and phunt, tomwhite, and acmurthy volunteering to be
> Mentors.
>
> Please cast your vote:
>
> [ ] +1, bring Crunch into Incubator
> [ ] +0, I don't care either way,
> [ ] -1, do not bring Crunch into Incubator, because...
>
> This vote will be open for 72 hours and only votes from the Incubator
> PMC are binding.
>
> http://wiki.apache.org/incubator/CrunchProposal
>
> Proposal text from the wiki:
> --
> = Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =
>
> == Abstract ==
>
> Crunch is a Java library for writing, testing, and running pipelines
> of MapReduce jobs on Apache Hadoop.
>
> == Proposal ==
>
> Crunch is a Java library for writing, testing, and running pipelines
> of MapReduce jobs on Apache Hadoop. Its main goal is to provide a
> high-level API for writing and testing complex MapReduce jobs that
> require multiple processing stages.  It has a simple, flexible, and
> extensible data model that makes it ideal for processing data that
> does not naturally fit into a relational structure, such as time
> series and serialized object formats like JSON and Avro. It supports
> running pipelines either as a series of MapReduce jobs on an Apache
> Hadoop cluster or in memory on a single machine for fast testing and
> debugging.
>
> == Background ==
>
> Crunch was initially developed by Cloudera to simplify the process of
> creating sequences of dependent MapReduce jobs, especially jobs that
> processed non-relational data like time series. Its design was based
> on a paper Google published about a Java library they developed called
> FlumeJava that was created in order to solve a similar class of
> problems. Crunch was open-sourced by Cloudera on GitHub as an Apache
> 2.0 licensed project in October 2011. During this time Crunch has been
> formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
> (February 2012), with an incremental update to version 0.2.1 (March
> 2012). These releases are also distributed by Cloudera as source and
> binaries from Cloudera's Maven repository.
>
> == Rationale ==
>
> Most of the interesting analytical and data processing tasks that are
> run on an Apache Hadoop cluster require a series of MapReduce jobs to
> be executed in sequence. Developers who are creating these pipelines
> today need to manually assign the sequence of tasks to perform in a
> dependent chain of MapReduce jobs, even though there are a number of
> well-known patterns for fusing dependent computations together into a
> single MapReduce stage and for performing common types of joins and
> aggregations. This results in MapReduce pipelines that are more
> difficult to test, maintain, and extend to support new functionality.
>
> Furthermore, the type of data that is being stored and processed using
> Apache Hadoop is evolving. Although Hadoop was originally used for
> storing large volumes of structured text in the form of webpages and
> log files, it is now common for Hadoop to store complex, structured
> data formats such as JSON, Apache Avro, and Apache Thrift. These
> formats allow developers to work with serialized objects in
> programming languages like Java, C++, and Python, and allow for new
> types of analysis to be performed on complex data types. Hadoop has
> also been adopted by the scientific research community, who are using
> Hadoop to process time series data, structured binary files in the
> HDF5 format, and large medical and satellite images.
>
> Crunch addresses these challenges by providing a lightweight and
> extensible Java API for defining the stages of a data processing
> pipeline, which can then be run on an Apache Hadoop cluster as a
> sequence of dependent MapReduce jobs, or in-memory on a single
> machine to facilitate fast testing and debugging. Crunch relies on a
> small set of primitive abstractions that represent immutable,
> distributed collections of objects. Developers define functions that
> are applied to those objects in order to generate new immutable,
> distributed collections of objects. Crunch also provides a library of
> common MapReduce patterns for performing efficient joins and
> aggregation operations over these distributed collections that
> developers may integrate into their own pipelines. Crunch also
> provides native support for processing structured binary data formats
> like JSON, Apache Avro, and Apache Thrift, and is designed to be
> extensible to support working with any kind of data format that Java
> supports in its native form.
>
> == Initial Goals ==
>
> Crunch is currently in its first major release with a considerable
> numbe

Re: Flume Graduation (was Re: June reports in two weeks)

2012-05-24 Thread Arun C Murthy
On May 24, 2012, at 10:40 AM, Arvind Prabhakar wrote:

> Hi,
> 
> On Thu, May 24, 2012 at 12:19 AM, Ralph Goers 
> wrote:
> 
>> The ONLY issue I see for Flume to graduate is diversity.  No one will
>> convince me that the current makeup constitutes diversity of any kind.
> 
> Here are the committers who have been active in the past three months:
> 
> * Brock Noland (Cloudera)
> * Hari Shreedharan  (Cloudera)
> * Jarek Jarcec Cecho (AVG Technologies)
> * Juhani Connolly   (CyberAgent)
> * Mike Percy (Cloudera)
> * Mingjie Lai (Trend Micro)
> * Prasad Mujumdar (Cloudera)
> * Will McQueen (Cloudera)
> * Arvind Prabhakar (Cloudera)
> 
> There are four companies represented in this list: AVG Technologies,
> Cloudera, CyberAgent and Trend Micro. 

According to that list, two-thirds (6 of 9) of the active committers are from one organization.

My understanding is that the diversity argument is to prevent one organization 
from causing the project to stall if they lost interest... see #2 in:
http://incubator.apache.org/incubation/Incubation_Policy.html#Minimum+Graduation+Requirements
That, potentially, helps develop the ability to tolerate and resolve conflicts 
(#5) without resorting to corporate structures.

OTOH, graduation might actually help Flume get a more diverse community? Flume 
does seem to meet all other requirements... 

So, the question is: does the project feel that there is no single company 
which is vital to the success of the project? If so, Flume seems ready.

Arun

PS: From my own experience, in the early days of Hadoop we were very concerned 
about not just the number of companies but also their percentage of representation, 
and this, perversely, led to discrimination against folks from the majority 
contributor who were actually very qualified! *smile* 
And no, I'm not saying that is the right thing to do! *smile*


Re: [VOTE] Accept Crunch into the Apache Incubator

2012-05-24 Thread Niall Pemberton
+1

Niall

On Wed, May 23, 2012 at 7:45 PM, Josh Wills wrote:
> I would like to call a vote for accepting "Apache Crunch" for
> incubation in the Apache Incubator. The full proposal is available
> below.  We ask the Incubator PMC to sponsor it, with phunt as
> Champion, and phunt, tomwhite, and acmurthy volunteering to be
> Mentors.
>
> Please cast your vote:
>
> [ ] +1, bring Crunch into Incubator
> [ ] +0, I don't care either way,
> [ ] -1, do not bring Crunch into Incubator, because...
>
> This vote will be open for 72 hours and only votes from the Incubator
> PMC are binding.
>
> http://wiki.apache.org/incubator/CrunchProposal
>
> Proposal text from the wiki:
> --
> = Crunch - Easy, Efficient MapReduce Pipelines in Java and Scala =
>
> == Abstract ==
>
> Crunch is a Java library for writing, testing, and running pipelines
> of MapReduce jobs on Apache Hadoop.
>
> == Proposal ==
>
> Crunch is a Java library for writing, testing, and running pipelines
> of MapReduce jobs on Apache Hadoop. Its main goal is to provide a
> high-level API for writing and testing complex MapReduce jobs that
> require multiple processing stages. It has a simple, flexible, and
> extensible data model that makes it ideal for processing data that
> does not naturally fit into a relational structure, such as time
> series and serialized object formats like JSON and Avro. It supports
> running pipelines either as a series of MapReduce jobs on an Apache
> Hadoop cluster or in memory on a single machine for fast testing and
> debugging.
>
> == Background ==
>
> Crunch was initially developed by Cloudera to simplify the process of
> creating sequences of dependent MapReduce jobs, especially jobs that
> processed non-relational data like time series. Its design was based
> on a paper Google published about a Java library they developed called
> FlumeJava that was created in order to solve a similar class of
> problems. Crunch was open-sourced by Cloudera on GitHub as an Apache
> License 2.0 project in October 2011. Since then, Crunch has been
> formally released twice, as versions 0.1.0 (October 2011) and 0.2.0
> (February 2012), with an incremental update to version 0.2.1 (March
> 2012). These releases are also distributed by Cloudera as source and
> binaries from Cloudera's Maven repository.
>
> == Rationale ==
>
> Most of the interesting analytical and data processing tasks that are
> run on an Apache Hadoop cluster require a series of MapReduce jobs to
> be executed in sequence. Developers who are creating these pipelines
> today need to manually assign the sequence of tasks to perform in a
> dependent chain of MapReduce jobs, even though there are a number of
> well-known patterns for fusing dependent computations together into a
> single MapReduce stage and for performing common types of joins and
> aggregations. This results in MapReduce pipelines that are more
> difficult to test, maintain, and extend to support new functionality.
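The fusion idea mentioned above can be illustrated with a minimal sketch. This is not Crunch code: the class `FusionSketch`, its `Result` holder, and the `runUnfused`/`runFused` methods are hypothetical names, each "pass" over a plain Java `List` merely stands in for one MapReduce job, and only the simplest case (chained per-record transformations with no shuffle between them) is shown. Fusing across grouping or join stages is considerably more involved and is not attempted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: each "pass" over a List stands in for one MapReduce job.
final class FusionSketch {

    // Holds the output records and how many passes over the data were made.
    static final class Result<T> {
        final List<T> records;
        final int passes;

        Result(List<T> records, int passes) {
            this.records = records;
            this.passes = passes;
        }
    }

    // Unfused: every stage is its own pass and materializes an intermediate list.
    static <T> Result<T> runUnfused(List<T> input, List<Function<T, T>> stages) {
        List<T> current = input;
        int passes = 0;
        for (Function<T, T> stage : stages) {
            List<T> next = new ArrayList<>(current.size());
            for (T record : current) {
                next.add(stage.apply(record));
            }
            current = next;
            passes++;
        }
        return new Result<>(current, passes);
    }

    // Fused: the stages are composed into one function and applied in a single pass.
    static <T> Result<T> runFused(List<T> input, List<Function<T, T>> stages) {
        Function<T, T> fused = Function.identity();
        for (Function<T, T> stage : stages) {
            fused = fused.andThen(stage);
        }
        List<T> out = new ArrayList<>(input.size());
        for (T record : input) {
            out.add(fused.apply(record));
        }
        return new Result<>(out, 1);
    }

    public static void main(String[] args) {
        List<Function<String, String>> stages = new ArrayList<>();
        stages.add(String::trim);
        stages.add(String::toLowerCase);
        stages.add(s -> "x:" + s);

        List<String> input = List.of(" A ", "B ");
        Result<String> unfused = runUnfused(input, stages);
        Result<String> fused = runFused(input, stages);
        System.out.println("unfused: " + unfused.records + " after " + unfused.passes + " passes");
        System.out.println("fused:   " + fused.records + " after " + fused.passes + " pass");
    }
}
```

Both strategies produce the same records, but the unfused version walks the data once per stage (three times here) while the fused version walks it once. Saving those intermediate passes is the kind of optimization a pipeline library can apply automatically instead of leaving it to the developer.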
>
> Furthermore, the type of data that is being stored and processed using
> Apache Hadoop is evolving. Although Hadoop was originally used for
> storing large volumes of unstructured text in the form of webpages and
> log files, it is now common for Hadoop to store complex, structured
> data formats such as JSON, Apache Avro, and Apache Thrift. These
> formats allow developers to work with serialized objects in
> programming languages like Java, C++, and Python, and allow for new
> types of analysis to be performed on complex data types. Hadoop has
> also been adopted by the scientific research community, who are using
> Hadoop to process time series data, structured binary files in the
> HDF5 format, and large medical and satellite images.
>
> Crunch addresses these challenges by providing a lightweight and
> extensible Java API for defining the stages of a data processing
> pipeline, which can then be run on an Apache Hadoop cluster as a
> sequence of dependent MapReduce jobs, or in-memory on a single
> machine to facilitate fast testing and debugging. Crunch relies on a
> small set of primitive abstractions that represent immutable,
> distributed collections of objects. Developers define functions that
> are applied to those objects in order to generate new immutable,
> distributed collections of objects. Crunch also provides a library of
> common MapReduce patterns for performing efficient joins and
> aggregation operations over these distributed collections that
> developers may integrate into their own pipelines. Finally, Crunch
> provides native support for processing structured data formats
> like JSON, Apache Avro, and Apache Thrift, and is designed to be
> extensible to support working with any kind of data format that Java
> supports in its native form.
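The collection model described above (immutable collections, with user-defined functions producing new collections) can be sketched in memory on a single machine. The class `MemCollection` and its `parallelDo` method are hypothetical names chosen for illustration, not Crunch's actual API. The user function is assumed to be a `BiConsumer` that receives one input element plus a `Consumer` for emitting zero or more outputs, and nothing in this sketch is distributed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical, single-machine stand-in for an immutable distributed collection.
final class MemCollection<T> {
    private final List<T> items;

    private MemCollection(List<T> items) {
        // Defensive copy wrapped as read-only: callers cannot mutate the collection.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    static <T> MemCollection<T> of(List<T> items) {
        return new MemCollection<>(items);
    }

    // Applies fn to each element; fn may emit zero, one, or many outputs.
    // Always returns a new collection and never modifies this one.
    <R> MemCollection<R> parallelDo(BiConsumer<T, Consumer<R>> fn) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            fn.accept(item, out::add);
        }
        return new MemCollection<>(out);
    }

    List<T> asList() {
        return items;
    }

    public static void main(String[] args) {
        MemCollection<String> lines = MemCollection.of(List.of("the quick", "brown fox"));

        // One input line can emit several words.
        MemCollection<String> words = lines.parallelDo((String line, Consumer<String> emit) -> {
            for (String w : line.split(" ")) {
                emit.accept(w);
            }
        });

        // One input word can emit nothing: keep only lengths of words longer than three characters.
        MemCollection<Integer> lengths = words.parallelDo((String w, Consumer<Integer> emit) -> {
            if (w.length() > 3) {
                emit.accept(w.length());
            }
        });

        System.out.println(lines.asList());
        System.out.println(words.asList());
        System.out.println(lengths.asList());
    }
}
```

Each call returns a new collection and leaves its input untouched; in this sketch `lines` still holds the original two strings after both transformations. Treating collections as immutable is what lets a library defer, reorder, or fuse the underlying work rather than executing each step eagerly.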
>
> == Initial Goals ==
>
> Crunch is currently in its first major release with a considerable
> num