Couple of thoughts, some slightly bigger than this specific topic:

1. I'm not against C++, but it hasn't attracted much attention here in Mahout yet. One thought is that we could port it, using a donated implementation as a reference. We could put it in a sandbox, as Sean suggested.

2. I've always envisioned Mahout as a TLP (top-level project). For instance, I've talked with the OpenNLP maintainers about donating it, along with the Maxent implementation (they seem amenable and just need to find the time). Under this vision, Mahout is a TLP chartered to provide machine learning implementations, with multiple subprojects. I could certainly see a subproject for C++ implementations. For instance, we could have:
   1. Core Java (common utilities, algorithms, etc.)
   2. Core C/C++ (ditto)
   3. OpenNLP (builds on Core Java, since OpenNLP's Maxent impl. would go to core): machine learning targeted specifically at text. The text-processing utilities currently in utils would likely move here so that the core can remain agnostic of input.
   4. Taste/Recommendations: all things collaborative filtering/recommendations
   5. Other verticals that require core, scalable ML libraries

My second point is a longer-term vision, and we are not there yet, but I think it builds a nice tent, addresses Sean's concerns (I believe) about Mahout being one big monolithic library with a lack of focus, and rounds out into a nice set of libraries that help real people solve real problems.

-Grant

On Oct 3, 2009, at 1:17 PM, Benson Margulies wrote:

Folks,

I may be in a position to contribute a very slick implementation of the
Brown, Della Pietra, et al. bigram mutual information word clustering scheme
sometime soon. It is written in C++, and if there's any map-reduce, it's via
OpenMP, not Hadoop :-).

As an ASF member, if I'm facilitating getting something useful out as open
source, I'd rather push it out at Apache.

Any interest in stretching the Mahout tent out to accommodate it?

I'm asking now because I'm starting a negotiation with the academic owner thereof, and it would be useful to know in advance if I have a tentative home for it at Apache as opposed to having to just dump it into SourceForge.

You could take the attitude that it's part of Mahout as a challenge: can
anyone out there come up with a practical variation in Java/Hadoop?

--benson

