Ted,
Good list of details.  I signed up a few months ago to do some work on ALS 
under Sebastien's guidance, but I have been stretched too thin at work to 
start the planning or the work itself; I plan to start in a week or so.  I 
would personally like to work on (a) and (b), and I actively need both for 
work-related efforts.  Does it make sense to have a quick meeting of 
interested developers over Google chat/conference rather than email to 
discuss and assign folks to specifics?
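
For (a), as a concrete starting point, here is roughly what training with the 
existing single-threaded SGD classifier looks like today (a minimal sketch 
from memory of the org.apache.mahout.classifier.sgd API; exact method names 
and defaults should be checked against trunk before relying on this):

    import org.apache.mahout.classifier.sgd.L1;
    import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    // Binary classifier: 2 categories, 10 features, L1 prior.
    OnlineLogisticRegression learner =
        new OnlineLogisticRegression(2, 10, new L1())
            .learningRate(0.1)
            .lambda(1.0e-4);

    // Training is strictly sequential: one (label, feature vector) at a
    // time.  This loop is the part a parallel SGD implementation would
    // need to shard or batch.
    Vector features = new DenseVector(10);
    features.setQuick(0, 1.0);   // encoded feature values
    features.setQuick(3, 0.5);
    learner.train(1, features);

    // Score a new example: probability of category 1.
    double p = learner.classifyScalar(features);

Any parallel SGD work would presumably keep a similar surface while 
distributing the training loop.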

Thoughts?

Sent from my iPhone

On Oct 5, 2013, at 1:11 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> I was asked to answer an anonymous question about the future of Mahout on
> Quora and thought I should share the answer here as well.
> 
> That really depends on where the community of users wants to take Mahout.
> 
> Some possibilities include:
> 
> a) better classifiers.  Mahout's capabilities in this respect include Naive
> Bayes, Random Forest, and logistic regression trained via single-threaded
> stochastic gradient descent (SGD).  It would be good to have a high-quality
> parallel implementation of SGD, and some kind of deep learning as well.  The
> random forest implementation could also use some work.
> 
> b) faster horses.  I think that the sparse matrix implementations can be
> made significantly faster, even considering the cost-based optimizer
> versions that we already have.  The addition of JBLAS support for dense
> matrices would also be interesting.
> 
> c) better API interfaces.  The clustering interfaces are a bit of a
> shambles in spite of the cool capabilities available with streaming k-means
> and friends.
> 
> d) better human interfaces.  It would be great to have products like
> Dataiku drive Mahout capabilities.  Dataiku does a really great job on the
> data-cleansing end of machine learning, and Mahout has very little in that
> area.  It would also be nice to move forward with Dmitriy Lyubimov's work
> on Scala bindings for Mahout.
> 
> e) bigger community.  There are some closely related communities, like the
> folks working on Spark with MLI.  More cross-fertilization would be very
> cool.
> 
> f) more data.  Getting sample data for testing is very hard.  Getting data
> at scale is exceedingly hard.  If people could suggest a good, big and
> freely available dataset, that would be awesome.
> 
> None of these possibilities matters, however, unless somebody does the work.
> So the question to each reader of this answer is: "What would you like to
> see, and how can you help make that happen?"
