I personally am caught between a desire on one hand to be inclusive of
everything, and a desire on the other hand to not make the project a
collection of bits and bobs from all over, with some algorithms
existing in C++, others in Java, some distributed, some not, some
supported, some a one-time dump, etc. It really harms end users'
ability to place what Mahout 'is' and how much to expect of it. Either
people will be surprised that some new scratch code isn't bug-free,
or they will assume that the mature bits of the code are probably just
as rough, when they may not be.

The latter wins out in my mind in this case: it 'feels' like a
different project at this point.

Let me however revive my suggestion that Mahout include a 'sandbox'
module of sorts to host anything at all. This neatly allows for
incorporation of anything, in any state, without confusing users as to
what should be expected of Mahout 'proper', which should be a
reasonably high bar come version 1.0.

On Sat, Oct 3, 2009 at 5:17 PM, Benson Margulies wrote:
> Folks,
>
> I may be in a position to contribute a very slick implementation of the
> Brown, Della Pietra, et al. bigram mutual information word clustering scheme
> sometime soon. It is written in C++, and if there's any map-reduce, it's via
> OpenMP, not Hadoop :-).
>
> As an ASF member, if I'm facilitating getting something useful out as open
> source, I'd rather push it out at Apache.
>
> Any interest in stretching the Mahout tent out to accommodate it?
>
> I'm asking now because I'm starting a negotiation with the academic owner
> thereof, and it would be useful to know in advance if I have a tentative
> home for it at Apache as opposed to having to just dump it into SourceForge.
>
> You could take the attitude that it's part of Mahout as a challenge: can
> anyone out there come up with a practical variation in Java/Hadoop?
>
> --benson
>
