Re: Mahout as TLP

2010-02-12 Thread Dawid Weiss
> 1.  We'd like to organize several subprojects we wish to introduce (Core, 
> NLP, Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as 
> Lucene subprojects.

And the collections package, vectors, verification and evaluation
code, potential test data sets... yes, makes sense to make it a TLP. I
don't think Lucene folks will mind -- it's not like Mahout is going to
depart from using Lucene/ Hadoop, etc.

Not that my voice counts much here, but +1 to the idea.

Dawid


Re: Mahout as TLP

2010-02-12 Thread Ted Dunning
I am a bit ambivalent, but net +1 on this.  The deciding factor for me is
that it makes it easier to express the sub-projects.

On Fri, Feb 12, 2010 at 3:22 PM, Dawid Weiss  wrote:

> > 1.  We'd like to organize several subprojects we wish to introduce (Core,
> NLP, Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as
> Lucene subprojects.
>
> And the collections package, vectors, verification and evaluation
> code, potential test data sets... yes, makes sense to make it a TLP. I
> don't think Lucene folks will mind -- it's not like Mahout is going to
> depart from using Lucene/ Hadoop, etc.
>
> Not that my voice counts much here, but +1 to the idea.
>
> Dawid
>



-- 
Ted Dunning, CTO
DeepDyve


Re: Mahout as TLP

2010-02-12 Thread Jake Mannix
What are your ambivalencies, Ted?  I'm a little split myself, but all of my
"cons" are very fuzzy and hard to articulate (mainly around timing).

Could you spell out why your +1 is any weaker than it could be?

  -jake

On Fri, Feb 12, 2010 at 3:26 PM, Ted Dunning  wrote:

> I am a bit ambivalent, but net +1 on this.  The deciding factor for me is
> that it makes it easier to express the sub-projects.
>
> On Fri, Feb 12, 2010 at 3:22 PM, Dawid Weiss 
> wrote:
>
> > > 1.  We'd like to organize several subprojects we wish to introduce
> (Core,
> > NLP, Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as
> > Lucene subprojects.
> >
> > And the collections package, vectors, verification and evaluation
> > code, potential test data sets... yes, makes sense to make it a TLP. I
> > don't think Lucene folks will mind -- it's not like Mahout is going to
> > depart from using Lucene/ Hadoop, etc.
> >
> > Not that my voice counts much here, but +1 to the idea.
> >
> > Dawid
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>


Re: Mahout as TLP

2010-02-12 Thread Ted Dunning
My ambivalence has to do with uncertainties, mostly.  I don't have a clear
idea of what will change.  It seems like very little, but there is some
overhead.

It still seems like a good move regardless of what I don't know.

On Fri, Feb 12, 2010 at 4:39 PM, Jake Mannix  wrote:

> What are your ambivalencies, Ted?  I'm a little split myself, but all of my
> "cons"
> are very fuzzy and hard to articulate (mainly around timing).
>
> Could you spell out why your +1 is any weaker than it could be?
>
>  -jake
>
> On Fri, Feb 12, 2010 at 3:26 PM, Ted Dunning 
> wrote:
>
> > I am a bit ambivalent, but net +1 on this.  The deciding factor for me is
> > that it makes it easier to express the sub-projects.
> >
> > On Fri, Feb 12, 2010 at 3:22 PM, Dawid Weiss 
> > wrote:
> >
> > > > 1.  We'd like to organize several subprojects we wish to introduce
> > (Core,
> > > NLP, Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as
> > > Lucene subprojects.
> > >
> > > And the collections package, vectors, verification and evaluation
> > > code, potential test data sets... yes, makes sense to make it a TLP. I
> > > don't think Lucene folks will mind -- it's not like Mahout is going to
> > > depart from using Lucene/ Hadoop, etc.
> > >
> > > Not that my voice counts much here, but +1 to the idea.
> > >
> > > Dawid
> > >
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>



-- 
Ted Dunning, CTO
DeepDyve


Re: Mahout as TLP

2010-02-12 Thread Benson Margulies
TLP-itude means the following:

1) Mahout has its own PMC. That group will vote on committers,
releases, and other legal issues.

Funny, it's a short list, isn't it? There are many things we might
want to do that will be easier to organize if it's just 'us chickens'
that have to decide, not that the existing Lucene PMC has been
obstructionist. Of course, I write 'us chickens' when I'm likely to be
just a committer due to my relatively recent arrival, but you know
what I mean.

One particularly attractive idea is apparently to go create
subprojects, since Apache doesn't do sub-sub projects.

In the short term, I'd counsel against a rush to subdivide, but that's just me.



On Fri, Feb 12, 2010 at 7:57 PM, Ted Dunning  wrote:
> My ambivalence has to do with uncertainties, mostly.  I don't have a clear
> idea of what will change.  It seems like very little, but there is some
> overhead.
>
> It still seems like a good move regardless of what I don't know.
>
> On Fri, Feb 12, 2010 at 4:39 PM, Jake Mannix  wrote:
>
>> What are your ambivalencies, Ted?  I'm a little split myself, but all of my
>> "cons"
>> are very fuzzy and hard to articulate (mainly around timing).
>>
>> Could you spell out why your +1 is any weaker than it could be?
>>
>>  -jake
>>
>> On Fri, Feb 12, 2010 at 3:26 PM, Ted Dunning 
>> wrote:
>>
>> > I am a bit ambivalent, but net +1 on this.  The deciding factor for me is
>> > that it makes it easier to express the sub-projects.
>> >
>> > On Fri, Feb 12, 2010 at 3:22 PM, Dawid Weiss 
>> > wrote:
>> >
>> > > > 1.  We'd like to organize several subprojects we wish to introduce
>> > (Core,
>> > > NLP, Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as
>> > > Lucene subprojects.
>> > >
>> > > And the collections package, vectors, verification and evaluation
>> > > code, potential test data sets... yes, makes sense to make it a TLP. I
>> > > don't think Lucene folks will mind -- it's not like Mahout is going to
>> > > depart from using Lucene/ Hadoop, etc.
>> > >
>> > > Not that my voice counts much here, but +1 to the idea.
>> > >
>> > > Dawid
>> > >
>> >
>> >
>> >
>> > --
>> > Ted Dunning, CTO
>> > DeepDyve
>> >
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>


Re: Mahout as TLP

2010-02-12 Thread Ted Dunning
Presumably one of the benefits of this will be fewer +0 votes on Mahout
issues due to fewer Lucene-centric folks who don't follow our machinations.

On Fri, Feb 12, 2010 at 6:19 PM, Benson Margulies wrote:

> 1) Mahout has its own PMC. That group will vote on committers,
> releases, and other legal issues.
>



-- 
Ted Dunning, CTO
DeepDyve


Re: Mahout as TLP

2010-02-12 Thread Jake Mannix
So I'm strongly in favor of getting to decide our own destiny, so in
that sense I'm very much a +1 for this.  Ditto for the option to
create sub-projects.  Then there's the simple fact that we are not
in any real way a project that *belongs* as part of "Lucene" in the
long run.

What makes me ambivalent is: how many really active
developers do we have to support the administrative tasks around
running a TLP?   We certainly have momentum in terms of interest
and codebase.  But should a project only at the 0.2 stage (soon to
be 0.3) be a TLP?

I'm just wondering if we're just giving ourselves more work. From a
practical standpoint, does this make our lives easier, or harder, to
do this now as opposed to later?  Clearly it must be done at some
point, but doing it now has what effect, really?

  -jake

On Fri, Feb 12, 2010 at 7:10 PM, Ted Dunning  wrote:

> Presumably one of the benefits of this will be fewer +0 votes on Mahout
> issues due to fewer Lucene-centric folks who don't follow our machinations.
>
> On Fri, Feb 12, 2010 at 6:19 PM, Benson Margulies wrote:
>
> > 1) Mahout has its own PMC. That group will vote on committers,
> > releases, and other legal issues.
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>


Re: Mahout as TLP

2010-02-13 Thread Kay Kay
As a lurker in this community and an active user myself, I am 
expressing my opinion, for whatever it is worth.


I am happy with the decoupling of ML from Search, with the former 
warranting separate attention of its own. So, +1 on this eventually 
happening so that Mahout can be more independent, but my reservation 
has to do with the timing and specifically the versioning: how soon 
would a 1.0 release be feasible once this becomes a TLP?




On 02/12/2010 02:44 PM, Grant Ingersoll wrote:

> As many of you know, Mahout has been growing pretty quickly and has also 
> reached a critical mass.  I, along with some others in the Mahout community, 
> feel it would make sense for Mahout to become a TLP.  With this in mind, I've 
> submitted a proposal to the Lucene PMC to ask the board to make Mahout an 
> Apache TLP.  One piece of feedback from the PMC was a question as to whether 
> this has been discussed in the community and whether the community is for it.  
> I know it's been brought up tangentially in the past (see [1], [2], [3]) and 
> there wasn't any disagreement, but it seems it warrants a more formal 
> discussion.
> 
> I see the following pros:
> 1.  We'd like to organize several subprojects we wish to introduce (Core, NLP, 
> Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as Lucene 
> subprojects.
> 2.  I also think longer term that while Machine Learning and Search are often 
> related, they are not required of each other and that Mahout would be better 
> aligned with a more narrow focus of Machine Learning only.
> 3. The PMC can be more narrowly focused on Mahout and its needs and will be 
> better informed of Mahout's contributors, etc.
> 
> Cons:
> 1. Lucene has a very strong brand and I have no doubt that Mahout benefits from 
> that association.
> 2. Changing mailing lists, etc. is a bit of a hassle (mostly for 
> infrastructure), but not that big of a deal.  Still, Lucene is well established 
> and well-run, so sometimes inertia is a good thing.
> 
> At the end of the day, I'm +1.
> 
> 
> [1] 
> http://search.lucidimagination.com/search/document/a6e03af2952ff196/possible_contribution_at_somewhat_of_a_tangent_to_mahout#5a41be454d503779
> 
> [2] 
> http://search.lucidimagination.com/search/document/40c4c4ec11ca07b5/mi_clustering#7197ef846b384e4e
> 
> [3] 
> http://search.lucidimagination.com/search/document/1817a5e65c83bae3/proposing_a_c_port_for_apache_mahout#8e4e8eabc945264d




Re: Mahout as TLP

2010-02-13 Thread Grant Ingersoll
All valid points by the many who have responded.  Thanks!

When I woke up this morning, I thought maybe we should postpone until 0.3 is 
out, so it is good to see this expressed here as well.  

As for concerns about overhead, infra@ will take care of most of the heavy 
lifting (new mailing lists, migrating everyone over to the new ones).  We would 
need to move our website and put up a redirect, but that is trivial.  We'd also 
have to move our SVN, but that is trivial as well.  At the PMC level, the ASF 
seems to vary quite a bit here, AFAICT.  Lucene is pretty low key and very low 
volume and the subprojects pretty much run themselves.  I would suspect that 
Mahout would be the same given our roots.

Another thought is that we time it w/ a 1.0 release and come in with a big bang 
including press releases, etc.  On the other hand, if we do it sooner (after 
0.3), we can do two press releases, one for the move and one for the 1.0 
release.  This would give more exposure overall.

Finally, it's not clear the ASF likes lots of subprojects, so we'd need to be 
careful there.  Either that or we just have all committers be committers across 
all the subs.  Then again, it probably isn't a huge deal.  Lucene and Hadoop 
are the two primary examples of projects w/ subs and they are both well run, 
successful projects.

In the end, I still am +1, but think it makes sense to wait until after 0.3.  
Besides, since the next board meeting is Wednesday, this will give us more time 
to think about it.

-Grant

On Feb 13, 2010, at 3:55 AM, Kay Kay wrote:

> As a lurker in this community and an active user myself, I am expressing 
> my opinion, for whatever it is worth.
> 
> I am happy with the decoupling of ML from Search, with the former warranting 
> separate attention of its own. So, +1 on this eventually happening so that 
> Mahout can be more independent, but my reservation has to do with the timing 
> and specifically the versioning: how soon would a 1.0 release be feasible 
> once this becomes a TLP?
> 
> 
> 
> On 02/12/2010 02:44 PM, Grant Ingersoll wrote:
>> As many of you know, Mahout has been growing pretty quickly and has also 
>> reached a critical mass.  I, along with some others in the Mahout community, 
>> feel it would make sense for Mahout to become a TLP.  With this in mind, I've 
>> submitted a proposal to the Lucene PMC to ask the board to make Mahout an 
>> Apache TLP.  One piece of feedback from the PMC was a question as to whether 
>> this has been discussed in the community and whether the community is for 
>> it.  I know it's been brought up tangentially in the past (see [1], [2], 
>> [3]) and there wasn't any disagreement, but it seems it warrants a more 
>> formal discussion.
>> 
>> I see the following pros:
>> 1.  We'd like to organize several subprojects we wish to introduce (Core, 
>> NLP, Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as 
>> Lucene subprojects.
>> 2.  I also think longer term that while Machine Learning and Search are 
>> often related, they are not required of each other and that Mahout would be 
>> better aligned with a more narrow focus of Machine Learning only.
>> 3. The PMC can be more narrowly focused on Mahout and its needs and will be 
>> better informed of Mahout's contributors, etc.
>> 
>> Cons:
>> 1. Lucene has a very strong brand and I have no doubt that Mahout benefits 
>> from that association
>> 2. Changing mailing lists, etc. is a bit of a hassle (mostly for 
>> infrastructure), but not that big of a deal.  Still, Lucene is well 
>> established and well-run, so sometimes inertia is a good thing.
>> 
>> At the end of the day, I'm +1.
>> 
>> 
>> [1] 
>> http://search.lucidimagination.com/search/document/a6e03af2952ff196/possible_contribution_at_somewhat_of_a_tangent_to_mahout#5a41be454d503779
>> 
>> [2] 
>> http://search.lucidimagination.com/search/document/40c4c4ec11ca07b5/mi_clustering#7197ef846b384e4e
>> 
>> [3] 
>> http://search.lucidimagination.com/search/document/1817a5e65c83bae3/proposing_a_c_port_for_apache_mahout#8e4e8eabc945264d
>>   
> 



Re: Mahout as TLP

2010-02-13 Thread Ted Dunning
+1 to waiting.

On Sat, Feb 13, 2010 at 4:45 AM, Grant Ingersoll wrote:

> In the end, I still am +1, but think it makes sense to wait until after
> 0.3.  Besides, since the next board meeting is Wednesday, this will give us
> more time to think about it.
>



-- 
Ted Dunning, CTO
DeepDyve


Re: Mahout as TLP

2010-02-13 Thread Drew Farris
I can't say that I really understand the issues (if there are any) of
the Mahout project running under Lucene's PMC vs. a Mahout PMC, but it
sounds like that would be a big factor in deciding whether the project
should be migrated to its own TLP, e.g., if Mahout discussions took up a
significant portion of the Lucene PMC meetings.

It doesn't sound like much of anything else would change from the
perspective of a user or new contributor like myself. From my vantage
point I don't see much impetus to change at this very moment. If we
end up getting something that looks like a solid start on a C++ port,
it might be worth revisiting, but even then that could start off as a
submodule.

http://mahout.apache.org would be easier to remember :)

At the end of the day, I think I'm +1 for waiting as well.

Drew

On Sat, Feb 13, 2010 at 1:36 PM, Ted Dunning  wrote:
> +1 to waiting.
>
> On Sat, Feb 13, 2010 at 4:45 AM, Grant Ingersoll wrote:
>
>> In the end, I still am +1, but think it makes sense to wait until after
>> 0.3.  Besides, since the next board meeting is Wednesday, this will give us
>> more time to think about it.
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>


Re: Mahout as TLP

2010-02-13 Thread Benson Margulies
The ongoing admin is really no big deal. The PMC has to report to the
board once a month. As Grant noted, the initial work is mostly a gift
from infra.

I don't see any harm in getting 0.3 out first if that makes folks more
comfortable.



On Sat, Feb 13, 2010 at 2:42 PM, Drew Farris  wrote:
> I can't say that I really understand the issues (if there are any) of
> the Mahout project running under Lucene's PMC vs. a Mahout PMC, but it
> sounds like that would be a big factor in deciding whether the project
> should be migrated to its own TLP, e.g., if Mahout discussions took up a
> significant portion of the Lucene PMC meetings.
>
> It doesn't sound like much of anything else would change from the
> perspective of a user or new contributor like myself. From my vantage
> point I don't see much impetus to change at this very moment. If we
> end up getting something that looks like a solid start on a C++ port,
> it might be worth revisiting, but even then that could start off as a
> submodule.
>
> http://mahout.apache.org would be easier to remember :)
>
> At the end of the day, I think I'm +1 for waiting as well.
>
> Drew
>
> On Sat, Feb 13, 2010 at 1:36 PM, Ted Dunning  wrote:
>> +1 to waiting.
>>
>> On Sat, Feb 13, 2010 at 4:45 AM, Grant Ingersoll wrote:
>>
>>> In the end, I still am +1, but think it makes sense to wait until after
>>> 0.3.  Besides, since the next board meeting is Wednesday, this will give us
>>> more time to think about it.
>>>
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>


Re: Mahout as TLP

2010-02-13 Thread Grant Ingersoll

On Feb 13, 2010, at 3:20 PM, Benson Margulies wrote:

> The ongoing admin is really no big deal. The PMC has to report to the
> board once a month.

Once a quarter normally.

> As Grant noted, the initial work is mostly a gift
> from infra.
> 
> I don't see any harm in getting 0.3 out first if that makes folks more
> comfortable.

Yeah, this feels better to me the more I think about it.


Re: Mahout as TLP

2010-02-15 Thread Isabel Drost
On Sat Grant Ingersoll  wrote:
> > I don't see any harm in getting 0.3 out first if that makes folks
> > more comfortable.
> 
> Yeah, this feels better to me the more I think about it.

+1 from me as well: I really like the idea of Mahout becoming a TLP -
even before a 1.0 release is available.

However I think it makes sense to sort out the 0.3 release first. If I
am counting correctly, that would make for three reasons for press
releases: A new release, Mahout becoming a TLP and later on a 1.0
release. ;)

Isabel


Re: Mahout as TLP

2010-02-15 Thread Robin Anil
+1


Re: Mahout as TLP

2010-02-15 Thread Jeff Eastman

+1 on Isabel's comments.


Isabel Drost wrote:
> On Sat Grant Ingersoll wrote:
> > > I don't see any harm in getting 0.3 out first if that makes folks
> > > more comfortable.
> >
> > Yeah, this feels better to me the more I think about it.
>
> +1 from me as well: I really like the idea of Mahout becoming a TLP -
> even before a 1.0 release is available.
>
> However I think it makes sense to sort out the 0.3 release first. If I
> am counting correctly, that would make for three reasons for press
> releases: A new release, Mahout becoming a TLP and later on a 1.0
> release. ;)
>
> Isabel