Re: Mahout 1.0 goals

Suneel Marthi Mon, 03 Mar 2014 10:38:27 -0800

To get things moving for 1.0:

a) Address the 4 issues that Sean had raised - we have already started going 
through the backlog and closing issues, and have started converting code from 
the old MapReduce API to the newer MapReduce API.

   If someone could start looking at standardizing the input/output formats 
across classifiers, clustering and recommenders that would be great.  Guess 
Frank S. has already started work in that direction.

b)  Need a better and cleaner serialized form of Vectors to handle names and 
other kinds of stuff. This is going to impact everything that's presently 
implemented.

c)  Agree with ssc that we should start looking at Spark-Mahout integration.

d) Need volunteers to QA/address issues with the present classifiers/clustering 
algorithms. I personally can vouch for how disastrous it is to deploy any of 
Mahout's classifiers/clustering implementations in an Operations environment. A 
good example of that is Sean's recent patch for RDF.

The Naive Bayes code as it is now seems half-baked and incomplete. Not every 
code path in Streaming KMeans has been tested.

This should go some way toward addressing the technical debt that's piled up 
over the years.

On Monday, March 3, 2014 1:05 PM, Sebastian Schelter <[email protected]> wrote:

I would like to discuss whether we should start to have some 
Spark-related code in Mahout.

--sebastian

On 03/03/2014 06:56 PM, Suneel Marthi wrote:
> Grant had set up a Google Hangout for Mahout sometime last year before the 0.8 
> release.  I had one set up too for the 0.9 release. I definitely wouldn't want 
> to have a hangout on Saturday or the weekend.
>
> On Monday, March 3, 2014 12:52 PM, Ted Dunning <[email protected]> wrote:
>
> Happy to organize a Google Hangout.  That has the advantage of allowing more 
> attendees and supporting YouTube archiving.
>
> Sent from my iPhone
>
>
>> On Mar 3, 2014, at 9:34, Giorgio Zoppi <[email protected]> wrote:
>>
>> Hello All,
>> Dr. Dunning, could you set up a meeting next Saturday morning, so we can chat
>> over Skype to discuss improvements and what to do, and identify volunteers
>> and tasks?
>> Best Regards,
>> Giorgio
>>
>>
>> 2014-03-03 18:30 GMT+01:00 peng <[email protected]>:
>>
>>> Me three
>>>
>>>
>>>> On Sun 02 Mar 2014 11:45:33 AM EST, Ted Dunning wrote:
>>>>
>>>> Ravi,
>>>>
>>>> Good points.
>>>>
>>>> On Sun, Mar 2, 2014 at 12:38 AM, Ravi Mummulla <[email protected]>
>>>> wrote:
>>>>
>>>>> - Natively support Windows (guidance, etc. No documentation exists today,
>>>>> for instance)
>>>>
>>>> There is a bit of demand for that.
>>>>
>>>>> - Faster time to first application (from discovery to first application
>>>>> currently takes a non-trivial amount of effort; how can we lower the bar
>>>>> and reduce the friction for adoption?)
>>>>
>>>> There is huge evidence that this is important.
>>>>
>>>>> - Better documenting use cases with working samples/examples
>>>>> (Documentation on https://mahout.apache.org/users/basics/algorithms.html
>>>>> is spread out and there is too much focus on algorithms as opposed to
>>>>> use cases - this is an adoption blocker)
>>>>
>>>> This is also important.
>>>>
>>>>> - Uniformity of the API set across all algorithms (are we providing the
>>>>> same experience across all APIs?)
>>>>
>>>> And many people have been tripped up by this.
>>>>
>>>>> - Measuring/publishing scalability metrics of various algorithms (why
>>>>> would we want users to adopt Mahout vs. other frameworks for ML at
>>>>> scale?)
>>>>
>>>> I don't see this as being as important as some of your other points, but
>>>> it is still useful.
>>
>>
>> --
>> I want to be the ray of sunshine that wakes you every day
>> to make you breathe and live in me.
>> "Favola -Moda".
