Re: [VOTE] Apache Syncope 1.0.0-incubating

2012-08-07 Thread Mark Struberg
+1 (binding)


Apart from the jQuery question it looks fine.
You might btw be careful with the samples. If we publish a WAR which contains 
other libraries, then this might be interpreted as 'distributing' them. 


In OpenWebBeans, DeltaSpike and MyFaces we don't deploy them any longer in 
binary form. Just the sources. Users can easily build it themselves. And most of 
the time they are only interested in the sample source anyway.

LieGrue,
strub



- Original Message -
> From: Francesco Chicchiriccò 
> To: general@incubator.apache.org
> Cc: 
> Sent: Monday, August 6, 2012 5:50 PM
> Subject: Re: [VOTE] Apache Syncope 1.0.0-incubating
> 
> On 06/08/2012 16:36, Alexei Fedotov wrote:
>>  Hello Francesco,
>> 
>>  Here are few things I have found via manual inspection:
>> 
>>  1. Jquery bundle contains several following strings: "Dual licensed
>>  under the MIT or GPL Version 2 licenses."
>>  *) source release LICENSE file does not contain MIT license;
>>  *) and the file itself does not look like APL licensed;
>>  *) and it is a part of the source release.
>> 
>>  Something should be fixed here, i.e. the files replaced with wget in
>>  the build script.
>> 
>>  2. ./legal_ext/LICENSE does not have a license for jquery. Does war
>>  contain jquery?
> 
> Hi Alexei,
> I've taken a look at other ASF projects including JQuery (or similar
> dual-licensed JS frameworks) and I've opened
> https://issues.apache.org/jira/browse/SYNCOPE-181
> We'll fix this ASAP.
> 
>>  Don't think these issues are stoppers.
> 
> Cool :-)
> What's your vote on the release, then?
> 
> Thanks for your review.
> Regards.
> 
> >>  On Mon, Aug 6, 2012 at 6:07 PM, Mark Struberg wrote:
> >>>  Hi Francesco, I can check in the evening.
> >>>
> >>>  LieGrue,
> >>>  strub
> >>>
> >>>  - Original Message -
> >>>>  From: Francesco Chicchiriccò
> >>>>  To: general@incubator.apache.org
> >>>>  Cc:
> >>>>  Sent: Monday, August 6, 2012 2:49 PM
> >>>>  Subject: Re: [VOTE] Apache Syncope 1.0.0-incubating
> >>>>
> >>>>  Hi IPMC members,
> >>>>  we are missing a single vote on this release: anyone interested to check?
> >>>>
> >>>>  TIA.
> >>>>  Regards.
> >>>>
> >>>>  On 03/08/2012 09:58, Francesco Chicchiriccò wrote:
> >>>>>  I've created a 1.0.0-incubating release, with the following artifacts up
> >>>>>  for a vote:
> >>>>>
> >>>>>  SVN source tag (r1367421):
> >>>>>  https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/
> >>>>>
> >>>>>  List of changes:
> >>>>>  https://svn.apache.org/repos/asf/incubator/syncope/tags/syncope-1.0.0-incubating/CHANGES
> >>>>>
> >>>>>  Maven staging repo:
> >>>>>  https://repository.apache.org/content/repositories/orgapachesyncope-100/
> >>>>>
> >>>>>  Source release (checksums and signatures are available at the same
> >>>>>  location):
> >>>>>  https://repository.apache.org/content/repositories/orgapachesyncope-100/org/apache/syncope/syncope-root/1.0.0-incubating/syncope-root-1.0.0-incubating-source-release.zip
> >>>>>
> >>>>>  Staging site:
> >>>>>  http://incubator.apache.org/syncope/1.0.0-incubating/
> >>>>>
> >>>>>  PGP release keys (signed using 273DF287):
> >>>>>  http://www.apache.org/dist/incubator/syncope/KEYS
> >>>>>
> >>>>>  This has been voted through on the syncope-...@incubator.apache.org
> >>>>>  mailing list [1],
> >>>>>  and now requires a vote on general@incubator.apache.org
> >>>>>
> >>>>>  Votes already cast (on syncope-dev):
> >>>>>
> >>>>>  +1 (binding)
> >>>>>  * Francesco Chicchiriccò
> >>>>>  * Massimiliano Perrone
> >>>>>  * Marco Di Sabatino Di Diodoro
> >>>>>  * Emmanuel Lécharny (IPMC member)
> >>>>>  * Simone Tripodi
> >>>>>  * Colm O hEigeartaigh (IPMC member)
> >>>>>
> >>>>>  +1 (non binding)
> >>>>>  * Denis Signoretto
> >>>>>
> >>>>>  Vote will be open for 72 hours.
> >>>>>
> >>>>>  [ ] +1  approve
> >>>>>  [ ] +0  no opinion
> >>>>>  [ ] -1  disapprove (and reason why)
> >>>>>
> >>>>>  Best regards.
> >>>>>
> >>>>>  [1]
> >>>>>  http://syncope-dev.1063484.n5.nabble.com/VOTE-Apache-Syncope-1-0-0-incubating-tp5710173p5710292.html
> 
> -- 
> Francesco Chicchiriccò
> 
> ASF Member, Apache Cocoon PMC and Apache Syncope PPMC Member
> http://people.apache.org/~ilgrosso/
> 
> 
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: Wiki write access

2012-08-07 Thread Jukka Zitting
Hi,

On Mon, Aug 6, 2012 at 10:29 PM, Tomer Shiran  wrote:
> I would like to create a Wiki page. Can you please grant me write access?
> (alias tshiran)

Added to ContributorsGroup.

BR,

Jukka Zitting




Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Hyrum K Wright
On Mon, Aug 6, 2012 at 9:37 PM, Greg Stein  wrote:
> On Aug 6, 2012 7:07 PM, "Gary Martin"  wrote:
>>...
>> The vote will be open for at least 72 hours and therefore ends after 11pm
> UTC on Thursday 9th August.
>>
>> [ ] +1 Release this package as Apache Bloodhound 0.1.0
>> [ ] +0 Don't care
>> [ ] -1 Do not release this package (please explain)
>
> Repeating my prior IPMC binding vote:
>
> +1 to release

Same.  +1 (binding)

-Hyrum




Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread ant elder
On Tue, Aug 7, 2012 at 12:07 AM, Gary Martin  wrote:
> Hi,
>
> I would like to request the beginning of the vote for the first release
> Apache Bloodhound in the incubator following the successful vote by the
> Bloodhound PPMC. Two of the four +1 PPMC votes were from the IPMC members
> Greg Stein and Hyrum Wright.
>
> The result of the vote is summarised here:
>
>http://markmail.org/message/i3g5t2m7gajuoyv6
>
> The artefacts for the release including the source distribution and KEYS can
> be found here:
>
>https://dist.apache.org/repos/dist/dev/incubator/bloodhound/
>
> The release itself is created from:
>
>https://svn.apache.org/repos/asf/incubator/bloodhound/branches/0.1
>(r1362530)
>
> Issues identified by Greg and Hyrum to be fixed for the next release are
> listed here:
>
>https://issues.apache.org/bloodhound/ticket/153
>
>
> The vote will be open for at least 72 hours and therefore ends after 11pm
> UTC on Thursday 9th August.
>
> [ ] +1 Release this package as Apache Bloodhound 0.1.0
> [ ] +0 Don't care
> [ ] -1 Do not release this package (please explain)
>
> Cheers,
> Gary

This looks similar to the Syncope release vote that's also happening
right now, in that the source distribution includes things like JQuery
but doesn't mention that in the LICENSE file. I'm a bit surprised
people are continuing to vote +1 on the Syncope release knowing that,
so am I getting this wrong and the JQuery license doesn't need to be
included here for some reason?
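
The kind of check discussed here (bundled files carrying a "Dual licensed
under the MIT or GPL" notice while LICENSE does not mention MIT) could be
roughly automated. The following is only a sketch under assumptions: the
function names files_with_marker and license_mentions are hypothetical, the
marker string is the one reported in the Syncope review, and a real release
audit would need more than a string match.

```python
import os
import re

# Marker string reported during the manual inspection of the jQuery bundle.
DUAL_LICENSE_MARKER = "Dual licensed under the MIT or GPL"


def files_with_marker(root, marker=DUAL_LICENSE_MARKER):
    """Return a sorted list of files under root whose text contains marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if marker in fh.read():
                        hits.append(path)
            except OSError:
                # Unreadable files are skipped rather than treated as hits.
                continue
    return sorted(hits)


def license_mentions(license_path, keyword="MIT"):
    """Return True if keyword appears as a whole word in the LICENSE file."""
    with open(license_path, encoding="utf-8", errors="ignore") as fh:
        text = fh.read()
    return re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None
```

If files_with_marker(".") returns anything while license_mentions("LICENSE")
is False, the source release is probably in the situation described above.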

   ...ant




[RESULT] [VOTE] S4 0.5.0 Release Candidate 1

2012-08-07 Thread Matthieu Morel

Hi,

The vote for this S4 release passed with the following results at the 
vote deadline:


+1: 7 (5 binding)
-1: 0

Details:

+1 IPMC:
acmurthy, phunt

+1 PPMC
kishoreg*, leoneu*, fpj

+1 wider community
Daniel Gomez, Karthik Kambatla


Thanks to all the participants in the voting process!

I'll now publish the artifacts, and after the sync delay, update the 
websites and send announcements.


Matthieu


(* voted on s4-dev list)





Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Branko Čibej
On 07.08.2012 13:14, ant elder wrote:
> This looks similar to the Syncope release vote that's also happening
> right now in that the source distribution includes things like JQuery
> but doesn't mention that in the LICENSE file. I'm a bit surprised
> people are continuing to vote +1 on the Syncope release knowing that
> so am I getting this wrong and the JQuery license doesn't need to be
> included here for some reason?

The NOTICE file explicitly notes external dependencies and their
(standard) licenses. Combined with the ticket that mentions adding
licenses of said dependencies to LICENSE, IMO, this is good enough for a
release candidate.

-- Brane





Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Gary Martin

On 08/07/2012 12:39 PM, Branko Čibej wrote:
> On 07.08.2012 13:14, ant elder wrote:
>> This looks similar to the Syncope release vote that's also happening
>> right now in that the source distribution includes things like JQuery
>> but doesn't mention that in the LICENSE file. I'm a bit surprised
>> people are continuing to vote +1 on the Syncope release knowing that
>> so am I getting this wrong and the JQuery license doesn't need to be
>> included here for some reason?
>
> The NOTICE file explicitly notes external dependencies and their
> (standard) licenses. Combined with the ticket that mentions adding
> licenses of said dependencies to LICENSE, IMO, this is good enough for a
> release candidate.
>
> -- Brane



I think it is also worth noting that Greg Stein has already mentioned 
this - see the first item in 
https://issues.apache.org/bloodhound/ticket/153 (which also contains the 
link to Greg's email) - and so this will be attended to in the next release.


Cheers,
Gary




[RESULT] [VOTE] Apache Syncope 1.0.0-incubating

2012-08-07 Thread Francesco Chicchiriccò

Hi all,
even though we would have reached the required number of +1 votes from IPMC 
members after the required 72 hours, I still have the feeling that we 
are missing some consensus here.


I will now revert the current release candidate and start again from 
scratch with another attempt.


Regards.

On 07/08/2012 11:21, Mark Struberg wrote:
> +1 (binding)
>
> Apart from the jQuery question it looks fine.
> You might btw be careful with the samples. If we publish a WAR which contains
> other libraries, then this might be interpreted as 'distributing' them.
>
> In OpenWebBeans, DeltaSpike and MyFaces we don't deploy them any longer in
> binary form. Just the sources. Users can easily build it themselves. And most of
> the time they are only interested in the sample source anyway.
>
> LieGrue,
> strub




--
Francesco Chicchiriccò

ASF Member, Apache Cocoon PMC and Apache Syncope PPMC Member
http://people.apache.org/~ilgrosso/





Re: [RESULT] [VOTE] S4 0.5.0 Release Candidate 1

2012-08-07 Thread Richard Frovarp

On 08/07/2012 06:36 AM, Matthieu Morel wrote:

> Hi,
>
> The vote for this S4 release passed with the following results at the
> vote deadline:
>
> +1: 7 (5 binding)
> -1: 0
>
> Details:
>
> +1 IPMC:
> acmurthy, phunt
>
> +1 PPMC
> kishoreg*, leoneu*, fpj
>
> +1 wider community
> Daniel Gomez, Karthik Kambatla
>
>
> Thanks to all the participants to the voting process!
>
> I'll now publish the artifacts, and after the sync delay, update the
> websites and send announcements.
>
> Matthieu



Best I know, you need three IPMC votes for it to pass.
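
To make the point concrete, here is a small sketch of the tally. It assumes
the usual release rule as I understand it (at least three binding +1 votes
and more binding +1 than -1, where only IPMC votes are binding on general@).
The Vote class and release_vote_passes function are hypothetical names, and
IPMC membership is taken only from the "+1 IPMC" line of the result mail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int         # +1, 0 or -1
    ipmc_member: bool  # assumption: only IPMC votes are binding on general@


def release_vote_passes(votes, required_binding=3):
    """True if binding +1 votes reach the threshold and outnumber binding -1."""
    binding_plus = sum(1 for v in votes if v.ipmc_member and v.value > 0)
    binding_minus = sum(1 for v in votes if v.ipmc_member and v.value < 0)
    return binding_plus >= required_binding and binding_plus > binding_minus


# Tally as listed in the S4 result mail above.
s4_votes = [
    Vote("acmurthy", 1, True),
    Vote("phunt", 1, True),
    Vote("kishoreg", 1, False),
    Vote("leoneu", 1, False),
    Vote("fpj", 1, False),
    Vote("Daniel Gomez", 1, False),
    Vote("Karthik Kambatla", 1, False),
]

print(release_vote_passes(s4_votes))  # False: only two IPMC +1 votes are listed
```

If one of the PPMC voters is also an IPMC member, the result changes, so the
membership flags are the part worth double-checking.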





Re: [VOTE] Release Droids 0.2.0-incubating (RC1)

2012-08-07 Thread Richard Frovarp

On 07/09/2012 11:19 AM, Richard Frovarp wrote:

> A 0.2.0-incubating release candidate has been created. This will be the
> second release of Apache Droids incubating.
>
> We have 3 +1 votes, with 1 +1 IPMC vote (rfrovarp). We are in need of 2
> more IPMC votes.
>
> Vote thread:
> http://mail-archives.apache.org/mod_mbox/incubator-droids-dev/201206.mbox/%3C4FDAA788.4000400%40apache.org%3E
>
> Release Notes:
> http://people.apache.org/~rfrovarp/droids/0.2.0-rc1/release-notes.html
>
> The following artifacts are up for a vote:
>
> SVN source tag (r1350453):
> https://svn.apache.org/repos/asf/incubator/droids/tags/0.2.0-incubating/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachedroids-238/
>
> Source release:
> http://people.apache.org/~rfrovarp/droids/0.2.0-rc1/
>
> PGP release keys (signed using 26B716B3):
> https://svn.apache.org/repos/asf/incubator/droids/KEYS
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)


I still need one more IPMC vote. Can someone please review?




Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Tomer Shiran
FYI: I have posted the proposal to the wiki and updated it based on the
feedback from Marvin and Jakob:
http://wiki.apache.org/incubator/DrillProposal

On Mon, Aug 6, 2012 at 2:29 PM, Ted Dunning  wrote:

> In fact, a big part of the motivation for proposing incubation before code
> is ready is exactly to foster the discussions needed to form community.
>
> It is true that many projects that start without the fundamentals face
> challenges that more mature projects face but that is really just a fact of
> life with young projects.
>
> My own experience includes a project that also started without an initial
> code drop.  Mahout has gone on to have a vibrant welcoming community that
> has fostered the donation and development of some very valuable software.
>  I expect Drill will be able to say the same thing before long.
>
> Sent from my iPhone
>
> On Aug 6, 2012, at 2:55 PM, Jakob Homan  wrote:
>
> > Any reason the design docs can't be put up in place of where the
> > source would normally go?
> >
> > On Mon, Aug 6, 2012 at 11:23 AM, Tomer Shiran wrote:
> >> Marvin, thanks for commenting on the proposal! The initial committers have
> >> been working on the design for several months, and will commit the design
> >> once the project is approved, so we do not expect much friction during the
> >> design phase. With that said, we certainly do want to engage others early
> >> on, and our goal in incubating earlier is to encourage feedback and
> >> contributions when it is still easy to change the APIs and extensibility
> >> points. This is important because Drill (unlike, say, Google's Dremel) must
> >> be really flexible in order to be relevant to a broad user base, allowing
> >> multiple data sources, data formats and query languages. While many
> >> projects enter incubation with a complete implementation, others don't, and
> >> due to the nature of this project we think that in this case it is better
> >> to start earlier.
> >>
> >> Thanks,
> >> Tomer
> >>
> >> On Mon, Aug 6, 2012 at 9:25 AM, Marvin Humphrey wrote:
> >>
> >>> On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning wrote:
> >>>
> >>>> Initial Source
> >>>> ==
> >>>> There is no initial source code. All source code will be developed within
> >>>> the Apache Incubator.
> >>>
> >>> Coming in without any source code is going to pose a challenge to this
> >>> podling.
> >>>
> >>>    http://www.apache.org/foundation/how-it-works.html#incubator
> >>>
> >>>    The incubator filters projects on the basis of the likeliness of them
> >>>    becoming successful meritocratic communities. The basic requirements for
> >>>    incubation are:
> >>>
> >>>    * a working codebase -- over the years and after several failures, the
> >>>      foundation came to understand that without an initial working
> >>>      codebase, it is generally hard to bootstrap a community. This is
> >>>      because merit is not well recognized by developers without a working
> >>>      codebase. Also, the friction that is developed during the initial
> >>>      design stage is likely to fragment the community.
> >>>
> >>> That last line in particular seems like something to watch out for.
> >>>
> >>> Marvin Humphrey


Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Greg Stein
On Tue, Aug 7, 2012 at 7:14 AM, ant elder  wrote:
>...
> This looks similar to the Syncope release vote that's also happening
> right now in that the source distribution includes things like JQuery
> but doesn't mention that in the LICENSE file. I'm a bit surprised
> people are continuing to vote +1 on the Syncope release knowing that
> so am I getting this wrong and the JQuery license doesn't need to be
> included here for some reason?

My feeling on the matter here is that these are incubating projects.
We allow things like (L)GPL dependencies in the releases, as long as a
PLAN exists to get rid of them. Of course, it must be perfectly clean
to graduate. But I believe we have wiggle room while incubating.

As Branko noted, the included projects are mentioned in the NOTICE
file, but that isn't quite Right. The 0.2.0 release will get it
corrected.

We could have stepped back and rolled another tarball, but I believe
it is more important for Bloodhound to get a release out [than to be
perfect on 0.1.0], in order to get some traction and some attraction
to build a larger community. The BH folks plan to release every few
weeks, so we should see the corrections in a release at the end of the
month. (or we could convince Gary to do another in a week or two :-)

Cheers,
-g




Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread ant elder
On Tue, Aug 7, 2012 at 6:04 PM, Greg Stein  wrote:
> On Tue, Aug 7, 2012 at 7:14 AM, ant elder  wrote:
>>...
>> This looks similar to the Syncope release vote that's also happening
>> right now in that the source distribution includes things like JQuery
>> but doesn't mention that in the LICENSE file. I'm a bit surprised
>> people are continuing to vote +1 on the Syncope release knowing that
>> so am I getting this wrong and the JQuery license doesn't need to be
>> included here for some reason?
>
> My feeling on the matter here is that these are incubating projects.
> We allow things like (L)GPL dependencies in the releases, as long as a
> PLAN exists to get rid of them. Of course, it must be perfectly clean
> to graduate. But I believe we have wiggle room while incubating.
>
> As Branko noted, the included projects are mentioned in the NOTICE
> file, but that isn't quite Right. The 0.2.0 release will get it
> corrected.
>
> We could have stepped back and rolled another tarball, but I believe
> it is more important for Bloodhound to get a release out [than to be
> perfect on 0.1.0], in order to get some traction and some attraction
> to build a larger community. The BH folks plan to release every few
> weeks, so we should see the corrections in a release at the end of the
> month. (or we could convince Gary to do another in a week or two :-)
>
> Cheers,
> -g
>

Gosh, I'm pretty sure we _don't_ allow things like (L)GPL dependencies
in Incubator releases; we allow them in the source in SVN, but I don't
recall any releases like that.

Anyway, that's beside the point. OK, so let's have this be a precedent
that sets Incubator policy: we now have some wiggle room while
incubating to do a release that violates ASF release policy, as long as
it will be fixed soon in another release and definitely before
graduating. A policy like that would help a lot with avoiding the
numerous respins some podling releases are made to do during voting
on general@.

   ...ant





RE: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Franklin, Matthew B.
>-Original Message-
>From: Marvin Humphrey [mailto:mar...@rectangular.com]
>Sent: Monday, August 06, 2012 12:25 PM
>To: general@incubator.apache.org
>Cc: Grant Ingersoll; Isabel Drost
>Subject: Re: [PROPOSAL] Drill for the Apache Incubator
>
>On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning 
>wrote:
>
>> Initial Source
>> ==
>> There is no initial source code. All source code will be developed within
>> the Apache Incubator.
>
>Coming in without any source code is going to pose a challenge to this
>podling.
>
>http://www.apache.org/foundation/how-it-works.html#incubator
>
>The incubator filters projects on the basis of the likeliness of
>them becoming
>successful meritocratic communities. The basic requirements for incubation
>are:
>
>* a working codebase -- over the years and after several failures, the
>  foundation came to understand that without an initial working
>  codebase, it is generally hard to bootstrap a community. This is
>  because merit is not well recognized by developers without a working
>  codebase. Also, the friction that is developed during the initial
>  design stage is likely to fragment the community.

It seems like there could be flexibility in this requirement, based on a few 
factors.  In this case, a design discussion has been ongoing; but I would also 
think that any community coming in with enough people who know the Apache way 
may not need as solid a starting point code-wise.

>
>That last line in particular seems like something to watch out for.
>
>Marvin Humphrey
>



Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Marvin Humphrey
On Tue, Aug 7, 2012 at 11:38 AM, ant elder  wrote:
> Gosh i'm pretty sure we _don't_ allow things like (L)GPL dependencies
> in Incubator releases, we allow them in the source in SVN but i don't
> recall any releases like that.

I know AOO had interactions with Legal regarding dmake, dictionaries and so
on, though I don't recall exactly what went into their release.  I would be
surprised if any category X dependencies have wound up in an incubating
release without Legal's involvement.

Lucy's early incubating releases had two Perl-licensed (Artistic/GPL)
dependencies (which were not bundled, but had to be downloaded and installed
separately by the consumer).  We sought a variance from Legal and got specific
approval from the Legal VP for our plan, which involved ditching both of the
problematic dependencies prior to graduation:

https://issues.apache.org/jira/browse/LEGAL-86

Are there other examples?

> Anyway that's beside the point, ok so let's have this be a precedent
> that sets Incubator policy - we now have some wiggle room while
> incubating to do a release that violates ASF release policy as long as
> it will be fixed soon in another release and definitely before
> graduating.

It seems that with regards to this Bloodhound release, the issue is restricted
to LICENSE/NOTICE, an area where ASF policies are notoriously unclear and
conformance is arguably spotty even among TLPs.  So long as the licenses of
all dependencies are being obeyed (e.g. no license headers or mandatory files
stripped from source files) and usage is compatible with ASF policy (no
category X dependencies, etc), I agree with the judgment call that an
incubating release need not be held up simply to move the text of the license
from LICENSE to NOTICE or vice versa.

IMO, this is different from releases with category X dependencies, where ASF
policies are clear and conformance is very high among TLPs.  I don't see that
the Incubator should consider this vote a precedent for overturning arbitrary
ASF policy.

If we don't like the poor state of ASF policy and conformance on
LICENSE/NOTICE then the ASF Membership should work to clarify the policy.

Marvin Humphrey




Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Greg Stein
On Tue, Aug 7, 2012 at 3:44 PM, Marvin Humphrey  wrote:
> On Tue, Aug 7, 2012 at 11:38 AM, ant elder  wrote:
>> Gosh i'm pretty sure we _don't_ allow things like (L)GPL dependencies
>> in Incubator releases, we allow them in the source in SVN but i don't
>> recall any releases like that.
>
> I know AOO had interactions with Legal regarding dmake, dictionaries and so
> on, though I don't recall exactly what went into their release.  I would be
> surprised if any category X dependencies have wound up in an incubating
> release without Legal's involvement.
>
> Lucy's early incubating releases had two Perl-licensed (Artistic/GPL)
> dependencies (which were not bundled, but had to be downloaded and installed
> separately by the consumer).  We sought a variance from Legal and got specific
> approval from the Legal VP for our plan, which involved ditching both of the
> problematic dependencies prior to graduation:
>
> https://issues.apache.org/jira/browse/LEGAL-86
>
> Are there other examples?

The one that I had in mind was Roller. Several of its incubating
releases had a hard dependency on Hibernate. They were required to
clean it up before graduation, of course.

You can look at the archives back in 2006 when it was incubating. In
particular, there is one sent to private@incubator that I would refer
you to:
  http://s.apache.org/c04  [only usable by ASF Members]

>> Anyway that's beside the point, ok so let's have this be a precedent
>> that sets Incubator policy - we now have some wiggle room while
>> incubating to do a release that violates ASF release policy as long as
>> it will be fixed soon in another release and definitely before
>> graduating.
>
> It seems that with regards to this Bloodhound release, the issue is restricted
> to LICENSE/NOTICE, an area where ASF policies are notoriously unclear and
> conformance is arguably spotty even among TLPs.

I've given some bad info in the past, but after the last go-round
(thanks Marvin), I feel that I've got a better handle on it. And
that's the feedback that I've now provided to the BH people.

> So long as the licenses of
> all dependencies are being obeyed (e.g. no license headers or mandatory files
> stripped from source files) and usage is compatible with ASF policy (no
> category X dependencies, etc),

All good here.

> I agree with the judgment call that an
> incubating release need not be held up simply to move the text of the license
> from LICENSE to NOTICE or vice versa.
>
> IMO, this is different from releases with category X dependencies, where ASF
> policies are clear and conformance is very high among TLPs.  I don't see that
> the Incubator should consider this vote a precedent for overturning arbitrary
> ASF policy.

For TLPs, I totally agree. For projects that are incubating... they
are NOT ASF projects by definition. That is why we've allowed a bit of
wiggle.

In any case, Bloodhound isn't requesting any funny deps. Just getting
a release out there with some already-known issues. That's why it got
my +1, and recommendation to just go with 0.1.0 rather than spinning
up a new tarball.

Cheers,
-g




Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Greg Stein
On Tue, Aug 7, 2012 at 2:38 PM, ant elder  wrote:
>...
> Gosh i'm pretty sure we _don't_ allow things like (L)GPL dependencies
> in Incubator releases, we allow them in the source in SVN but i don't
> recall any releases like that.

As I replied to Marvin, Apache Roller had a hard dependency on
Hibernate for some of its incubator releases. Allowing that was okay'd
by the IPMC, VP Legal, and the Board :-)

My view is that these are not true ASF projects, so *some* wiggle is
allowable, especially with a plan in hand.

(now, I still would not advocate for any release that seriously broke
the rules; at a minimum, get LICENSE/NOTICE and source file headers in
there; work on clarifying your dependencies and their licenses; etc)

> Anyway that's beside the point, ok so let's have this be a precedent
> that sets Incubator policy - we now have some wiggle room while
> incubating to do a release that violates ASF release policy as long as
> it will be fixed soon in another release and definitely before
> graduating. A policy like that would help a lot with avoiding the
> numerous respins some podling releases are made to do during voting
> on general@.

Exactly. We've seen a lot of back/forth which doesn't really help the
podling very much.

It's certainly a subjective judgement call. I don't know where to draw
the line, nor whether we must draw it. One of those "know it when you
see it" things. And we have the judgement of a large body of people
here on this list.

Cheers,
-g




[DISCUSS] [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Jukka Zitting
Hi,

[branching a discuss thread]

On Tue, Aug 7, 2012 at 10:56 PM, Greg Stein  wrote:
> As I replied to Marvin, Apache Roller had a hard dependency on
> Hibernate for some of its incubator releases. Allowing that was okay'd
> by the IPMC, VP Legal, and the Board :-)
>
> My view is that these are not true ASF projects, so *some* wiggle is
> allowable, especially with a plan in hand.

Note that even though podlings aren't full Apache projects yet,
incubating releases *are* official Apache releases, and should
therefore be held to a similar standard. If that standard can't easily
be reached, some podlings (like Subversion when it came in) have opted
to keep cutting non-Apache releases outside the ASF until those issues
have been resolved.

> It's certainly a subjective judgement call. I don't know where to draw
> the line, nor whether we must draw it. One of those "know it when you
> see it" things. And we have the judgement of a large body of people
> here on this list.

Personally I'm fine with things like missing license headers or
partially incomplete license metadata (which sounds like the case
here), as long as those are just omissions that don't fundamentally
affect our rights (or those of downstream users) to distribute the
releases and as long as there's a commitment to fix such issues in
time for the next release. Such minor issues are fairly common also in
many TLPs (I've filed a number of related bugs), so it's not even a
problem that's limited just to the Incubator.

Larger issues like exceptions to documented licensing policy (like in
the examples brought up here) should always be explicitly cleared with
legal, etc.

BR,

Jukka Zitting




Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Andrzej Bialecki

On 07/08/2012 21:14, Franklin, Matthew B. wrote:

>> -Original Message-
>> From: Marvin Humphrey [mailto:mar...@rectangular.com]
>> Sent: Monday, August 06, 2012 12:25 PM
>> To: general@incubator.apache.org
>> Cc: Grant Ingersoll; Isabel Drost
>> Subject: Re: [PROPOSAL] Drill for the Apache Incubator
>>
>> On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning wrote:
>>
>>> Initial Source
>>> ==
>>> There is no initial source code. All source code will be developed within
>>> the Apache Incubator.
>>
>> Coming in without any source code is going to pose a challenge to this
>> podling.
>>
>>     http://www.apache.org/foundation/how-it-works.html#incubator
>>
>>     The incubator filters projects on the basis of the likeliness of
>>     them becoming successful meritocratic communities. The basic
>>     requirements for incubation are:
>>
>>     * a working codebase -- over the years and after several failures, the
>>       foundation came to understand that without an initial working
>>       codebase, it is generally hard to bootstrap a community. This is
>>       because merit is not well recognized by developers without a working
>>       codebase. Also, the friction that is developed during the initial
>>       design stage is likely to fragment the community.
>
> It seems like there could be flexibility in this requirement, based on a few
> factors.  In this case, a design discussion has been ongoing; but I would also
> think that any community coming in with enough people who know the Apache way
> may also not need as much of a solid starting point code wise.


+1. Given the credentials and the experience of proposed committers and 
mentors, and the fact that the initial design is already done, I don't 
think this is a serious risk. And it's an exciting proposal with a 
potentially big impact.


--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
 ___.,___,___,___,_._. __<><
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com





Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread ant elder
On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein wrote:
> On Tue, Aug 7, 2012 at 3:44 PM, Marvin Humphrey wrote:
>> On Tue, Aug 7, 2012 at 11:38 AM, ant elder wrote:
>>> Gosh i'm pretty sure we _don't_ allow things like (L)GPL dependencies
>>> in Incubator releases, we allow them in the source in SVN but i don't
>>> recall any releases like that.
>>
>> I know AOO had interactions with Legal regarding dmake, dictionaries and so
>> on, though I don't recall exactly what went into their release.  I would be
>> surprised if any category X dependencies have wound up in an incubating
>> release without Legal's involvement.
>>
>> Lucy's early incubating releases had two Perl-licensed (Artistic/GPL)
>> dependencies (which were not bundled, but had to be downloaded and installed
>> separately by the consumer).  We sought a variance from Legal and got
>> specific approval from the Legal VP for our plan, which involved ditching
>> both of the problematic dependencies prior to graduation:
>>
>> https://issues.apache.org/jira/browse/LEGAL-86
>>
>> Are there other examples?
>
> The one that I had in mind was Roller. Several of its incubating
> releases had a hard dependency on Hibernate. They were required to
> clean it up before graduation, of course.
>
> You can look at the archives back in 2006 when it was incubating. In
> particular, there is one sent to private@incubator that I would refer
> you to:
>   http://s.apache.org/c04  [only usable by ASF Members]
>

Didn't that get subsequently revised by Cliff et al into "Incubating
projects must not distribute an official product release that includes
works covered by an excluded license" -
http://www.apache.org/legal/3party.html#transition-incubator

   ...ant




Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Otis Gospodnetic
I concur with Andrzej.  Let's see that VOTE Ted!

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>
> From: Andrzej Bialecki 
>To: general@incubator.apache.org 
>Sent: Tuesday, August 7, 2012 5:51 PM
>Subject: Re: [PROPOSAL] Drill for the Apache Incubator
> 
>On 07/08/2012 21:14, Franklin, Matthew B. wrote:
>>> -Original Message-
>>> From: Marvin Humphrey [mailto:mar...@rectangular.com]
>>> Sent: Monday, August 06, 2012 12:25 PM
>>> To: general@incubator.apache.org
>>> Cc: Grant Ingersoll; Isabel Drost
>>> Subject: Re: [PROPOSAL] Drill for the Apache Incubator
>>>
>>> On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning 
>>> wrote:
>>>
>>>> Initial Source
>>>> ==
>>>> There is no initial source code. All source code will be developed within
>>>> the Apache Incubator.
>>>
>>> Coming in without any source code is going to pose a challenge to this
>>> podling.
>>>
>>>     http://www.apache.org/foundation/how-it-works.html#incubator
>>>
>>>     The incubator filters projects on the basis of the likeliness of
>>>     them becoming successful meritocratic communities. The basic
>>>     requirements for incubation are:
>>>
>>>         * a working codebase -- over the years and after several failures,
>>>           the foundation came to understand that without an initial working
>>>           codebase, it is generally hard to bootstrap a community. This is
>>>           because merit is not well recognized by developers without a
>>>           working codebase. Also, the friction that is developed during the
>>>           initial design stage is likely to fragment the community.
>>
>> It seems like there could be flexibility in this requirement, based on a few 
>> factors.  In this case, a design discussion has been ongoing; but I would 
>> also think that any community coming in with enough people who know the 
>> Apache way may also not need as much of a solid starting point code wise.
>
>+1. Given the credentials and the experience of proposed committers and 
>mentors, and the fact that the initial design is already done, I don't 
>think this is a serious risk. And it's an exciting proposal with a 
>potentially big impact.
>
>-- 
>Best regards,
>Andrzej Bialecki
>http://www.sigram.com, blog http://www.sigram.com/blog
>  ___.,___,___,___,_._. __<><
>[___||.__|__/|__||\/|: Information Retrieval, System Integration
>___|||__||..\|..||..|: Contact: info at sigram dot com
>

Re: Status of Blur?

2012-08-07 Thread Tim Williams
Hi Otis,
Nice!  yeah, we're bootstrapping now...  join us on blur-dev@i.a.o and
blur-user@i.a.o

http://incubator.apache.org/projects/blur.html

The ticket's in now to get the git repo up too.

Thanks,
--tim

On Tue, Aug 7, 2012 at 8:05 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> What's the word on Blur?  The Proposal went well, VOTE thread got all +1s 
> back on July 20th, but not sure if anything is happening with it now and 
> I'm itching! :)
>
> Thanks,
> Otis
> 
>
> Search Analytics - http://sematext.com/search-analytics/index.html
> Scalable Performance Monitoring - http://sematext.com/spm/index.html




Re: [VOTE] Release Apache Bloodhound 0.1.0 (incubating)

2012-08-07 Thread Greg Stein
On Tue, Aug 7, 2012 at 5:54 PM, ant elder  wrote:
> On Tue, Aug 7, 2012 at 9:51 PM, Greg Stein  wrote:
>...
>> You can look at the archives back in 2006 when it was incubating. In
>> particular, there is one sent to private@incubator that I would refer
>> you to:
>>   http://s.apache.org/c04  [only usable by ASF Members]
>>
>
> Didn't that get subsequently revised by Cliff et al into "Incubating
> projects must not distribute an official product release that includes
> works covered by an excluded license" -
> http://www.apache.org/legal/3party.html#transition-incubator

Dunno. That link is for a draft document, and has been replaced by a
final/resolved form (see link at top of page).

Regardless... Jukka posted recently, and I'd look to his note for
"current policy". I think his statement puts Incubator policy a little
more relaxed than ASF, but likely not as relaxed as I would have
posited (in regards to dependencies).

Cheers,
-g




Amber - A Shepherd's View

2012-08-07 Thread Franklin, Matthew B.
Community looks well on the way to graduation.  Congratulations on the
recent release.

There are a few things on the status page that need to be filled in and
processes [1] like suitable name search need to be completed prior to
graduation vote at the IPMC level.

[1]:http://incubator.apache.org/guides/graduation.html#checklist





[VOTE] Accept Drill into the Apache Incubator

2012-08-07 Thread Ted Dunning
I would like to call a vote for accepting Drill for incubation in the
Apache Incubator. The full proposal is available below.  Discussion
over the last few days has been quite positive.

Please cast your vote:

[ ] +1, bring Drill into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Drill into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator
PMC are binding.  The start of the vote is just before 3AM UTC on 8
August so the closing time will be 3AM UTC on 11 August.

Thank you for your consideration!

Ted

http://wiki.apache.org/incubator/DrillProposal

= Drill =

== Abstract ==
Drill is a distributed system for interactive analysis of large-scale
datasets, inspired by
[[http://research.google.com/pubs/pub36632.html|Google's Dremel]].

== Proposal ==
Drill is a distributed system for interactive analysis of large-scale
datasets. Drill is similar to Google's Dremel, with the additional
flexibility needed to support a broader range of query languages, data
formats and data sources. It is designed to efficiently process nested
data. It is a design goal to scale to 10,000 servers or more and to be
able to process petabytes of data and trillions of records in seconds.

== Background ==
Many organizations have the need to run data-intensive applications,
including batch processing, stream processing and interactive
analysis. In recent years open source systems have emerged to address
the need for scalable batch processing (Apache Hadoop) and stream
processing (Storm, Apache S4). In 2010 Google published a paper called
"Dremel: Interactive Analysis of Web-Scale Datasets," describing a
scalable system used internally for interactive analysis of nested
data. No open source project has successfully replicated the
capabilities of Dremel.

== Rationale ==
There is a strong need in the market for low-latency interactive
analysis of large-scale datasets, including nested data (eg, JSON,
Avro, Protocol Buffers). This need was identified by Google and
addressed internally with a system called Dremel.

In recent years open source systems have emerged to address the need
for scalable batch processing (Apache Hadoop) and stream processing
(Storm, Apache S4). Apache Hadoop, originally inspired by Google's
internal MapReduce system, is used by thousands of organizations
processing large-scale datasets. Apache Hadoop is designed to achieve
very high throughput, but is not designed to achieve the sub-second
latency needed for interactive data analysis and exploration. Drill,
inspired by Google's internal Dremel system, is intended to address
this need.

It is worth noting that, as explained by Google in the original paper,
Dremel complements MapReduce-based computing. Dremel is not intended
as a replacement for MapReduce and is often used in conjunction with
it to analyze outputs of MapReduce pipelines or rapidly prototype
larger computations. Indeed, Dremel and MapReduce are both used by
thousands of Google employees.

Like Dremel, Drill supports a nested data model with data encoded in a
number of formats such as JSON, Avro or Protocol Buffers. In many
organizations nested data is the standard, so supporting a nested data
model eliminates the need to normalize the data. With that said, flat
data formats, such as CSV files, are naturally supported as a special
case of nested data.

The Drill architecture consists of four key components/layers:
 * Query languages: This layer is responsible for parsing the user's
query and constructing an execution plan.  The initial goal is to
support the SQL-like language used by Dremel and
[[https://developers.google.com/bigquery/docs/query-reference|Google
BigQuery]], which we call DrQL. However, Drill is designed to support
other languages and programming models, such as the
[[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query
Language]], [[http://www.cascading.org/|Cascading]] or
[[https://github.com/tdunning/Plume|Plume]].
 * Low-latency distributed execution engine: This layer is responsible
for executing the physical plan. It provides the scalability and fault
tolerance needed to efficiently query petabytes of data on 10,000
servers. Drill's execution engine is based on research in distributed
execution engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and
columnar storage, and can be extended with additional operators and
connectors.
 * Nested data formats: This layer is responsible for supporting
various data formats. The initial goal is to support the column-based
format used by Dremel. Drill is designed to support schema-based
formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV,
and schema-less formats such as JSON, BSON or YAML. In addition, it is
designed to support column-based formats such as Dremel,
AVRO-806/Trevni and RCFile, and row-based formats such as Protocol
Buffers, Avro, JSON, BSON and CSV. A particular distinction with Drill
is that the execution engine is flexible enoug

Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Ted Dunning
Just sent that out.

Thanks for the encouragement!

On Tue, Aug 7, 2012 at 6:02 PM, Otis Gospodnetic
 wrote:
> I concur with Andrzej.  Let's see that VOTE Ted!




Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-07 Thread Scott Deboy
+1 (binding)

On Tue, Aug 7, 2012 at 7:41 PM, Ted Dunning  wrote:

> I would like to call a vote for accepting Drill for incubation in the
> Apache Incubator. The full proposal is available below.  Discussion
> over the last few days has been quite positive.
>
> Please cast your vote:
>
> [ ] +1, bring Drill into Incubator
> [ ] +0, I don't care either way,
> [ ] -1, do not bring Drill into Incubator, because...
>
> This vote will be open for 72 hours and only votes from the Incubator
> PMC are binding.  The start of the vote is just before 3AM UTC on 8
> August so the closing time will be 3AM UTC on 11 August.
>
> Thank you for your consideration!
>
> Ted
>
> http://wiki.apache.org/incubator/DrillProposal
>
> = Drill =
>
> == Abstract ==
> Drill is a distributed system for interactive analysis of large-scale
> datasets, inspired by
> [[http://research.google.com/pubs/pub36632.html|Google's Dremel]].
>
> == Proposal ==
> Drill is a distributed system for interactive analysis of large-scale
> datasets. Drill is similar to Google's Dremel, with the additional
> flexibility needed to support a broader range of query languages, data
> formats and data sources. It is designed to efficiently process nested
> data. It is a design goal to scale to 10,000 servers or more and to be
> able to process petabytes of data and trillions of records in seconds.
>
> == Background ==
> Many organizations have the need to run data-intensive applications,
> including batch processing, stream processing and interactive
> analysis. In recent years open source systems have emerged to address
> the need for scalable batch processing (Apache Hadoop) and stream
> processing (Storm, Apache S4). In 2010 Google published a paper called
> "Dremel: Interactive Analysis of Web-Scale Datasets," describing a
> scalable system used internally for interactive analysis of nested
> data. No open source project has successfully replicated the
> capabilities of Dremel.
>
> == Rationale ==
> There is a strong need in the market for low-latency interactive
> analysis of large-scale datasets, including nested data (eg, JSON,
> Avro, Protocol Buffers). This need was identified by Google and
> addressed internally with a system called Dremel.
>
> In recent years open source systems have emerged to address the need
> for scalable batch processing (Apache Hadoop) and stream processing
> (Storm, Apache S4). Apache Hadoop, originally inspired by Google's
> internal MapReduce system, is used by thousands of organizations
> processing large-scale datasets. Apache Hadoop is designed to achieve
> very high throughput, but is not designed to achieve the sub-second
> latency needed for interactive data analysis and exploration. Drill,
> inspired by Google's internal Dremel system, is intended to address
> this need.
>
> It is worth noting that, as explained by Google in the original paper,
> Dremel complements MapReduce-based computing. Dremel is not intended
> as a replacement for MapReduce and is often used in conjunction with
> it to analyze outputs of MapReduce pipelines or rapidly prototype
> larger computations. Indeed, Dremel and MapReduce are both used by
> thousands of Google employees.
>
> Like Dremel, Drill supports a nested data model with data encoded in a
> number of formats such as JSON, Avro or Protocol Buffers. In many
> organizations nested data is the standard, so supporting a nested data
> model eliminates the need to normalize the data. With that said, flat
> data formats, such as CSV files, are naturally supported as a special
> case of nested data.
>
> The Drill architecture consists of four key components/layers:
>  * Query languages: This layer is responsible for parsing the user's
> query and constructing an execution plan.  The initial goal is to
> support the SQL-like language used by Dremel and
> [[https://developers.google.com/bigquery/docs/query-reference|Google
> BigQuery]], which we call DrQL. However, Drill is designed to support
> other languages and programming models, such as the
> [[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query
> Language]], [[http://www.cascading.org/|Cascading]] or
> [[https://github.com/tdunning/Plume|Plume]].
>  * Low-latency distributed execution engine: This layer is responsible
> for executing the physical plan. It provides the scalability and fault
> tolerance needed to efficiently query petabytes of data on 10,000
> servers. Drill's execution engine is based on research in distributed
> execution engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and
> columnar storage, and can be extended with additional operators and
> connectors.
>  * Nested data formats: This layer is responsible for supporting
> various data formats. The initial goal is to support the column-based
> format used by Dremel. Drill is designed to support schema-based
> formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV,
>

Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Arun C Murthy
Ted,

Wasn't clear, can I add myself now?

thanks,
Arun

On Aug 6, 2012, at 9:08 AM, Ted Dunning wrote:

> Sounds like some good pull.  I will call a vote tomorrow.
> 
> On Mon, Aug 6, 2012 at 9:45 AM, Arun C Murthy  wrote:
> 
>> Agreed, likewise.
>> 
>> I'd love to get involved and would like to add myself whenever you are
>> ready.
>> 
>> thanks,
>> Arun
>> 
>> On Aug 3, 2012, at 10:40 AM, Owen O'Malley wrote:
>> 
>>> On Thu, Aug 2, 2012 at 3:12 PM, Ted Dunning 
>> wrote:
>>> 
>>>> Drill is a distributed system for interactive analysis of large-scale
>>>> datasets, inspired by Google’s Dremel (
>>>> http://research.google.com/pubs/pub36632.html).
>>>>
>>> 
>>> This sounds really interesting Ted and I would love to help you. Would it
>>> be ok to add myself as one of the initial committers?
>>> 
>>> Thanks,
>>>  Owen
>> 
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Marvin Humphrey
On Tue, Aug 7, 2012 at 12:14 PM, Franklin, Matthew B.
 wrote:
>>The incubator filters projects on the basis of the likeliness of them
>>becoming successful meritocratic communities. The basic requirements for
>>incubation are:
>>
>>  * a working codebase -- over the years and after several failures,
>>the foundation came to understand that without an initial working
>>codebase, it is generally hard to bootstrap a community. This is
>>because merit is not well recognized by developers without a working
>>codebase. Also, the friction that is developed during the initial
>>design stage is likely to fragment the community.
>
> It seems like there could be flexibility in this requirement, based on a few
> factors.  In this case, a design discussion has been ongoing; but I would
> also think that any community coming in with enough people who know the
> Apache way may also not need as much of a solid starting point code wise.

In the abstract, I'm a little skeptical about your last point. The inclusive,
collaborative emphasis of the Apache Way is effective for evolutionary
development of an existing code base, but IMO it's less well suited to the
revolutionary act of starting a project.  Choosing what *not* to do is really
important when you start out, and that's not necessarily our strength.

In Drill's case, I think the focus problem is mitigated by the fact that the
podling will start with design documents and the Dremel whitepaper rather than
a blank slate empty repository.  In addition, the other classic problem which
afflicts podlings which start with no code -- difficulty refreshing the
community with no releases -- seems unlikely to manifest.

The proposal looks good to me now. :)

Marvin Humphrey




Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Marvin Humphrey
On Tue, Aug 7, 2012 at 10:09 PM, Arun C Murthy  wrote:

> Wasn't clear, can I add myself now?

Didn't the Incubator go back to discouraging open enrollment?

Is it OK to be invited in based on merit later, or do you feel that due to the
nature of this project, it's essential to be in on the ground floor?

Marvin Humphrey




Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-07 Thread Ashish
+1 (non-binding)

On Wed, Aug 8, 2012 at 8:11 AM, Ted Dunning  wrote:
> I would like to call a vote for accepting Drill for incubation in the
> Apache Incubator. The full proposal is available below.  Discussion
> over the last few days has been quite positive.
>
> Please cast your vote:
>
> [ ] +1, bring Drill into Incubator
> [ ] +0, I don't care either way,
> [ ] -1, do not bring Drill into Incubator, because...
>
> This vote will be open for 72 hours and only votes from the Incubator
> PMC are binding.  The start of the vote is just before 3AM UTC on 8
> August so the closing time will be 3AM UTC on 11 August.
>
> Thank you for your consideration!
>
> Ted
>
> http://wiki.apache.org/incubator/DrillProposal
>
> = Drill =
>
> == Abstract ==
> Drill is a distributed system for interactive analysis of large-scale
> datasets, inspired by
> [[http://research.google.com/pubs/pub36632.html|Google's Dremel]].
>
> == Proposal ==
> Drill is a distributed system for interactive analysis of large-scale
> datasets. Drill is similar to Google's Dremel, with the additional
> flexibility needed to support a broader range of query languages, data
> formats and data sources. It is designed to efficiently process nested
> data. It is a design goal to scale to 10,000 servers or more and to be
> able to process petabytes of data and trillions of records in seconds.
>
> == Background ==
> Many organizations have the need to run data-intensive applications,
> including batch processing, stream processing and interactive
> analysis. In recent years open source systems have emerged to address
> the need for scalable batch processing (Apache Hadoop) and stream
> processing (Storm, Apache S4). In 2010 Google published a paper called
> "Dremel: Interactive Analysis of Web-Scale Datasets," describing a
> scalable system used internally for interactive analysis of nested
> data. No open source project has successfully replicated the
> capabilities of Dremel.
>
> == Rationale ==
> There is a strong need in the market for low-latency interactive
> analysis of large-scale datasets, including nested data (eg, JSON,
> Avro, Protocol Buffers). This need was identified by Google and
> addressed internally with a system called Dremel.
>
> In recent years open source systems have emerged to address the need
> for scalable batch processing (Apache Hadoop) and stream processing
> (Storm, Apache S4). Apache Hadoop, originally inspired by Google's
> internal MapReduce system, is used by thousands of organizations
> processing large-scale datasets. Apache Hadoop is designed to achieve
> very high throughput, but is not designed to achieve the sub-second
> latency needed for interactive data analysis and exploration. Drill,
> inspired by Google's internal Dremel system, is intended to address
> this need.
>
> It is worth noting that, as explained by Google in the original paper,
> Dremel complements MapReduce-based computing. Dremel is not intended
> as a replacement for MapReduce and is often used in conjunction with
> it to analyze outputs of MapReduce pipelines or rapidly prototype
> larger computations. Indeed, Dremel and MapReduce are both used by
> thousands of Google employees.
>
> Like Dremel, Drill supports a nested data model with data encoded in a
> number of formats such as JSON, Avro or Protocol Buffers. In many
> organizations nested data is the standard, so supporting a nested data
> model eliminates the need to normalize the data. With that said, flat
> data formats, such as CSV files, are naturally supported as a special
> case of nested data.
>
> The Drill architecture consists of four key components/layers:
>  * Query languages: This layer is responsible for parsing the user's
> query and constructing an execution plan.  The initial goal is to
> support the SQL-like language used by Dremel and
> [[https://developers.google.com/bigquery/docs/query-reference|Google
> BigQuery]], which we call DrQL. However, Drill is designed to support
> other languages and programming models, such as the
> [[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query
> Language]], [[http://www.cascading.org/|Cascading]] or
> [[https://github.com/tdunning/Plume|Plume]].
>  * Low-latency distributed execution engine: This layer is responsible
> for executing the physical plan. It provides the scalability and fault
> tolerance needed to efficiently query petabytes of data on 10,000
> servers. Drill's execution engine is based on research in distributed
> execution engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and
> columnar storage, and can be extended with additional operators and
> connectors.
>  * Nested data formats: This layer is responsible for supporting
> various data formats. The initial goal is to support the column-based
> format used by Dremel. Drill is designed to support schema-based
> formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV,
> and schema-less formats such as JSON, BSON or YAML. In additi

Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-07 Thread Mattmann, Chris A (388J)
+1 (binding). Good luck and sounds cool!

Cheers,
Chris

On Aug 7, 2012, at 7:41 PM, Ted Dunning wrote:

> I would like to call a vote for accepting Drill for incubation in the
> Apache Incubator. The full proposal is available below.  Discussion
> over the last few days has been quite positive.
> 
> Please cast your vote:
> 
> [ ] +1, bring Drill into Incubator
> [ ] +0, I don't care either way,
> [ ] -1, do not bring Drill into Incubator, because...
> 
> This vote will be open for 72 hours and only votes from the Incubator
> PMC are binding.  The start of the vote is just before 3AM UTC on 8
> August so the closing time will be 3AM UTC on 11 August.
> 
> Thank you for your consideration!
> 
> Ted
> 
> http://wiki.apache.org/incubator/DrillProposal
> 
> = Drill =
> 
> == Abstract ==
> Drill is a distributed system for interactive analysis of large-scale
> datasets, inspired by
> [[http://research.google.com/pubs/pub36632.html|Google's Dremel]].
> 
> == Proposal ==
> Drill is a distributed system for interactive analysis of large-scale
> datasets. Drill is similar to Google's Dremel, with the additional
> flexibility needed to support a broader range of query languages, data
> formats and data sources. It is designed to efficiently process nested
> data. It is a design goal to scale to 10,000 servers or more and to be
> able to process petabytes of data and trillions of records in seconds.
> 
> == Background ==
> Many organizations have the need to run data-intensive applications,
> including batch processing, stream processing and interactive
> analysis. In recent years open source systems have emerged to address
> the need for scalable batch processing (Apache Hadoop) and stream
> processing (Storm, Apache S4). In 2010 Google published a paper called
> "Dremel: Interactive Analysis of Web-Scale Datasets," describing a
> scalable system used internally for interactive analysis of nested
> data. No open source project has successfully replicated the
> capabilities of Dremel.
> 
> == Rationale ==
> There is a strong need in the market for low-latency interactive
> analysis of large-scale datasets, including nested data (eg, JSON,
> Avro, Protocol Buffers). This need was identified by Google and
> addressed internally with a system called Dremel.
> 
> In recent years open source systems have emerged to address the need
> for scalable batch processing (Apache Hadoop) and stream processing
> (Storm, Apache S4). Apache Hadoop, originally inspired by Google's
> internal MapReduce system, is used by thousands of organizations
> processing large-scale datasets. Apache Hadoop is designed to achieve
> very high throughput, but is not designed to achieve the sub-second
> latency needed for interactive data analysis and exploration. Drill,
> inspired by Google's internal Dremel system, is intended to address
> this need.
> 
> It is worth noting that, as explained by Google in the original paper,
> Dremel complements MapReduce-based computing. Dremel is not intended
> as a replacement for MapReduce and is often used in conjunction with
> it to analyze outputs of MapReduce pipelines or rapidly prototype
> larger computations. Indeed, Dremel and MapReduce are both used by
> thousands of Google employees.
> 
> Like Dremel, Drill supports a nested data model with data encoded in a
> number of formats such as JSON, Avro or Protocol Buffers. In many
> organizations nested data is the standard, so supporting a nested data
> model eliminates the need to normalize the data. With that said, flat
> data formats, such as CSV files, are naturally supported as a special
> case of nested data.
> 
> The Drill architecture consists of four key components/layers:
> * Query languages: This layer is responsible for parsing the user's
> query and constructing an execution plan.  The initial goal is to
> support the SQL-like language used by Dremel and
> [[https://developers.google.com/bigquery/docs/query-reference|Google
> BigQuery]], which we call DrQL. However, Drill is designed to support
> other languages and programming models, such as the
> [[http://www.mongodb.org/display/DOCS/Mongo+Query+Language|Mongo Query
> Language]], [[http://www.cascading.org/|Cascading]] or
> [[https://github.com/tdunning/Plume|Plume]].
> * Low-latency distributed execution engine: This layer is responsible
> for executing the physical plan. It provides the scalability and fault
> tolerance needed to efficiently query petabytes of data on 10,000
> servers. Drill's execution engine is based on research in distributed
> execution engines (eg, Dremel, Dryad, Hyracks, CIEL, Stratosphere) and
> columnar storage, and can be extended with additional operators and
> connectors.
> * Nested data formats: This layer is responsible for supporting
> various data formats. The initial goal is to support the column-based
> format used by Dremel. Drill is designed to support schema-based
> formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV,
> and schema-les

Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-07 Thread Arun C Murthy
+1 (binding)


Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-07 Thread Devaraj Das
+1 (binding)


Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Arun C Murthy
On Aug 7, 2012, at 10:20 PM, Marvin Humphrey wrote:

> On Tue, Aug 7, 2012 at 10:09 PM, Arun C Murthy  wrote:
> 
>> Wasn't clear, can I add myself now?
> 
> Didn't the Incubator go back to discouraging open enrollment?
> 
> Is it OK to be invited in based on merit later, or do you feel that due to the
> nature of this project, it's essential to be in on the ground floor?

Frankly, I'm not sure right now - things change frequently in the incubator. 
I've seen open-enrollment encouraged and discouraged - sometimes even in the 
same Incubator project! *smile*

OTOH, the reason I'm interested, particularly in Drill, is simple:
# Unlike several other Incubator proposals I've seen, there is no existing body 
of code here. If there was I'd be more hesitant to ask. This isn't meant to be 
pejorative, but merely a statement of fact.
# I've contributed to Apache Hadoop since day one (over 6 yrs now) and I feel 
like my expertise, particularly in MapReduce, would be very useful to Drill - 
again, particularly given its nascent stage. (Likewise with someone like Owen 
who asked to be included too.)

Arun
-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Accept Drill into the Apache Incubator

2012-08-07 Thread Alex Karasulu
+1 (binding)

On Wed, Aug 8, 2012 at 8:33 AM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> +1 (binding). Good luck and sounds cool!
>
> Cheers,
> Chris

Re: [PROPOSAL] Drill for the Apache Incubator

2012-08-07 Thread Jakob Homan
On Mon, Aug 6, 2012 at 2:23 PM, Ted Dunning  wrote:
> No reason at all.
>
Sorry.  I may have been unclear.  I was requesting that the design
docs which are being referenced in the proposal:
"The requirement and design documents are currently stored in MapR
Technologies' source code repository. They will be checked in as part
of the initial code dump."
be made available for review as part of the proposal, much as an
initial source code base would be.  There is also a reference to a
presentation that has yet to be made available:
"High-level slides have been published by MapR: TODO"

Can those be made public?
