I think rejecting that contribution is the right thing to do. It's
very important to narrow our focus. Let's put our efforts into
finishing and polishing what we are working on right now.
A big problem with the "old" Mahout was that we set the bar for
contributions too low and ended up with lots of non-integrated,
hard-to-use algorithms of varying quality.
What is the problem with not accepting a contribution? We agreed with
Andy that this might be better suited for inclusion in Spark's codebase,
and I think that was the right decision.
-s
On 06/18/2014 10:29 PM, Pat Ferrel wrote:
Taken from: Re: [jira] [Resolved] (MAHOUT-1153) Implement streaming random
forests

> Also, we don't have any mappings for Spark Streaming -- so if your
> implementation heavily relies on Spark Streaming, I think Spark itself is
> the right place for it to be a part of.
We are discouraging engine-specific work? Even dismissing Spark Streaming as a
whole?
> As it stands we don't have purely (c) methods, and indeed I believe these
> methods may be totally engine-specific, in which case MLlib is one of the
> possible good homes for them.
Adherence to a specific incarnation of an engine-neutral DSL has become a
requirement for inclusion in Mahout? The current DSL cannot be extended? Or it
can’t be extended in engine-specific ways? Or it can’t be extended with Spark
Streaming? I would have thought all of these things desirable; otherwise we are
limiting ourselves to a subset of what an engine can do, or to the subset of
problems that the current DSL supports.
I hope I’m misreading this, but it looks like we just discouraged a contributor
from adding post-Hadoop code in an interesting area to Mahout?