Re: Call to action – Mahout needs your help

Sebastian Schelter Mon, 25 Mar 2013 01:10:40 -0700

Hi,

throwing in my 2 cents here:

I think you made a very good point in stating that it is not clear
whether Mahout is a library or a standalone program to interact with
via the command line. IMO, it's first and foremost a library (similar to
Lucene), and this should also be reflected in the codebase.

I don't agree that we have a clear vision and simply lack manpower. I
actually think it's the other way round: Mahout is kind of stuck
because it does not have a clear vision. I think we faced, and still
face, very hard challenges, as we have to provide answers to the
following questions:

* for which problems and algorithms does it really make sense to use
MapReduce?

* how broad can the spectrum of things that we offer be without a
decline in quality?

* how do we deal with the fact that our codebase is split up into a
collection of algorithms, with very few people able to work on all
of them due to the required theoretical background and the complexity
of efficient code?

* how do we provide solutions that allow users to scale in a very
fine-grained way, e.g. from online to precomputed on a single machine to
precomputed via Hadoop in the recommender code?

I think that Mahout is, and should always be, more than recommenders, but
that we should be more courageous in throwing out things that are rarely
used, poorly maintained, or don't meet the quality standards we would
like to see.

It is also my personal experience (I have heard it over and over again
from our users) that it is extremely hard to get started with Mahout using
the available documentation. MiA (Mahout in Action) is the exception, but
people have to buy it first and it lacks a lot of the latest developments.
It would be awesome to have a reworked wiki that is comparable in quality
to MiA.

Best,
Sebastian

On 25.03.2013 07:29, Isabel Drost-Fromm wrote:
> 
> 
> On Monday, March 25, 2013 07:22:46 AM Isabel Drost-Fromm wrote:
>> On Sunday, March 24, 2013 05:38:00 PM Grant Ingersoll wrote:
>>> On Mar 24, 2013, at 5:03 PM, Isabel Drost-Fromm wrote:
>>>> What about an experiment: If you (reading this mail) were to write a two-
>>>> sentence vision statement for Mahout as you see it - what would that be?
>>>
>>> Produce open source, scalable machine learning code using a community
>>> development model.
>>
>> So taking that apart:
>>
>> - Hadoop is not necessarily part of the equation. All that we promise are
>> implementations that are reasonably scalable.
> 
> - We play well with small-ish (fits in memory), large (fits only in the
> memory of many machines) and huge (fits only on disk) datasets.
>  
>> - There is no restriction in there wrt. supporting only specific use cases -
>> in particular no restriction to be recommendations only.
>>
>> - There is no restriction to "only batch" or "only online" learning.
>>
>> If we want to be that broad we definitely lack lots of people, I think.
>>
>> The other question that I cannot answer today: Do we want to be a Java
>> library that people link with their project, a standalone program that
>> people interact with via the command line, a basis that people can easily
>> integrate into their Pig/Hive/Cascalog/Scalding/Cascading/what-ever-else
>> workflows or all of these?
> 
>
