My own take is quite similar but a little different. It would be great if Mahout components are *usable* from Spark Streaming. For instance, model evaluation or on-line clustering both might well fit here. Likewise, all of our sequential math stuff could be useful.
Highly Spark-specific stuff really does belong in MLlib, however. Even they are liable to reject stuff that depends on Spark Streaming. Not my decision, however.

Finally, I don't view our mission as limited to the DSL work. We should also accept/generate useful applications of the DSL.

On Wed, Jun 18, 2014 at 2:09 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Let me try to re-word a little.
>
> Contributions we are accepting should have common parts with Mahout (let's
> not focus on whether they use or do not use Spark streaming).
>
> Does this sound more acceptable?
>
>
> On Wed, Jun 18, 2014 at 2:04 PM, Pat Ferrel <[email protected]> wrote:
>
> > OK, fair enough, for that matter no activity was grounds enough. But that
> > wasn’t really the question I asked and your answer below was not given in
> > the Jira, so...
> >
> > Are you suggesting that my questions can be read as statements, in the
> > name of “narrow(ing) our focus”?
> >
> > On Jun 18, 2014, at 1:37 PM, Sebastian Schelter <[email protected]> wrote:
> >
> > I think rejecting that contribution is the right thing to do. I think it's
> > very important to narrow our focus. Let us put our efforts into finishing
> > and polishing what we are working on right now.
> >
> > A big problem of the "old" mahout was that we set the barrier for
> > contributions too low and ended up with lots of non-integrated,
> > hard-to-use algorithms of varying quality.
> >
> > What is the problem with not accepting a contribution? We agreed with
> > Andy that this might be better suited for inclusion in Spark's codebase
> > and I think that was the right decision.
> >
> > -s
> >
> > On 06/18/2014 10:29 PM, Pat Ferrel wrote:
> > > Taken from: Re: [jira] [Resolved] (MAHOUT-1153) Implement streaming
> > > random forests
> > >
> > >> Also, we don't have any mappings for Spark Streaming -- so if your
> > >> implementation heavily relies on Spark streaming, i think Spark itself
> > >> is the right place for it to be a part of.
> > >
> > > We are discouraging engine specific work? Even dismissing Spark
> > > Streaming as a whole?
> > >
> > >> As it stands we don't have purely (c) methods and indeed i believe
> > >> these methods may be totally engine-specific in which case mllib is one
> > >> of possibly good homes for them.
> > >
> > > Adherence to a specific incarnation of an engine-neutral DSL has become
> > > a requirement for inclusion in Mahout? The current DSL cannot be
> > > extended? Or it can’t be extended with engine specific ways? Or it can’t
> > > be extended with Spark Streaming? I would have thought all of these
> > > things desirable otherwise we are limiting ourselves to a subset of what
> > > an engine can do or a subset of problems that the current DSL supports.
> > >
> > > I hope I’m misreading this but it looks like we just discourage a
> > > contributor from adding post hadoop code in an interesting area to
> > > Mahout?
