Re: Is Mahout obsolete now?

2015-10-19 Thread Fei Shan
Spark is an in-memory, near-real-time machine learning framework with
Scala and Java interfaces.
Mahout is an offline machine learning framework with no Scala APIs.

They are both built on HDFS and the Hadoop engine.

Spark has an ecosystem like Hadoop's.
Mahout is part of the Hadoop ecosystem.

Spark can beat Mahout on processing speed
and on concise programming APIs.

For online data analysis, Spark is the better choice.
For offline data analysis, both fit well.



On Mon, Oct 19, 2015 at 9:14 PM, Prasad Priyadarshana Fernando <
bpp...@gmail.com> wrote:

> Hi,
>
> If I have used Mahout for my recommendation application, should I migrate
> to Spark MLlib? Is Mahout still supported and migrated?
>
> Thanks
>
> *Prasad Priyadarshana Fernando <http://www.linkedin.com/in/prasadfernando>*
> Mobile: +1 330 283 5827
>


Re: Is Mahout obsolete now?

2015-10-19 Thread Suneel Marthi
This is inaccurate. You obviously have not been following the
Mahout project. Mahout moved away from MapReduce long ago and presently
supports Spark and H2O as execution engines, with Flink planned.

I would suggest you look at the recent Mahout 0.11.0 release and see where
the project is before we delve into a comparison of Mahout vs. Spark.




Re: Is Mahout obsolete now?

2015-10-19 Thread Sean Owen
No, this is pretty wrong. Spark is not, in general, a real-time
anything. Spark Streaming is a near-real-time streaming framework, but
it is not something you can build models with. Spark MLlib / ML are
offline / batch. Not sure what you mean by Hadoop engine, but Spark
does not build on MapReduce, if that's what you mean.

The "classic" Mahout code (<= 0.9) is definitely deprecated. The "new"
Mahout is not. It has a fairly different new recommender system called
Samsara. It has Scala APIs. In fact, it uses Spark. I think you're
somehow talking about the "classic" Mahout code here only.



Re: Is Mahout obsolete now?

2015-10-19 Thread Suneel Marthi
Thanks Sean.
Samsara is the new distributed linear algebra DSL. It is engine-agnostic
and presently supports Spark and H2O (Flink is in the works).

We do have Recommenders built on top of Samsara today.



Re: Is Mahout obsolete now?

2015-10-19 Thread Suneel Marthi
Hi Prasad,

As Sean explained earlier in this thread, Mahout 0.9 and earlier, which
were MapReduce-based, are no longer supported.

We do have recommenders in Mahout 0.11.0 that have been built on the new
Samsara Math DSL.

I would definitely suggest that you check out the latest Mahout 0.11.0
release.



Re: Is Mahout obsolete now?

2015-10-19 Thread Ankit Goel
Hi,
I asked something similar in the past and got a reply that helped me make my
decision. Mahout started out as an ML library, but the direction it is taking
now is to let people build their own math. So it's becoming more of a platform
that supports Spark, among other engines, and you can even create your own
algorithms on it.




Re: Is Mahout obsolete now?

2015-10-19 Thread Pat Ferrel
BTW, this use of Mahout-Samsara on Spark for recs has really expanded. The
Samsara part is what I’m calling a Correlation Engine; it can be used to mix
usage, content, and context to make recs. I look back on two years ago as
pretty much groping around for solutions. Things are much clearer now (for me
at least).

Check out some slides about the math, leading to the “whole enchilada” 
equation. Ted Dunning, Sean Owen, and Sebastian Schelter get no small credit.
http://www.slideshare.net/pferrel/unified-recommender-39986309
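
For readers who want something concrete, here is a minimal sketch of how usage
and a second signal might be blended through cooccurrence. It is not the Mahout
or Samsara API. It assumes the blend has the form r = (A'A) h_a + (A'B) h_b,
which is an assumption of this sketch rather than something stated in this
thread. All names (transpose_times, mat_vec, recommend, purchases, views) are
hypothetical, and raw counts stand in for any significance filtering a real
system would apply.

```python
def transpose_times(a, b):
    """Return A^T B for two row-aligned matrices given as lists of rows.

    Each row is one user, so the product is accumulated one user at a time.
    """
    out = [[0] * len(b[0]) for _ in range(len(a[0]))]
    for row_a, row_b in zip(a, b):
        for i, x in enumerate(row_a):
            if x:  # skip items the user never touched
                for j, y in enumerate(row_b):
                    out[i][j] += x * y
    return out


def mat_vec(m, v):
    """Plain matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def recommend(purchases, views, user_purchases, user_views):
    """Score every primary item for one user from two kinds of history.

    purchases: users x items matrix (hypothetical primary action, A)
    views:     users x categories matrix (hypothetical secondary action, B)
    """
    cooccurrence = transpose_times(purchases, purchases)   # A^T A
    cross_occurrence = transpose_times(purchases, views)   # A^T B
    scores_a = mat_vec(cooccurrence, user_purchases)
    scores_b = mat_vec(cross_occurrence, user_views)
    return [sa + sb for sa, sb in zip(scores_a, scores_b)]
```

The point of the sketch is only that the second signal enters through a
cross-occurrence matrix with the same row (user) space as the first, so both
terms score the same set of items.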

We even have code running on the PredictionIO framework. It covers everything
from an SDK to an event store to real-time queries; loosely speaking, a lambda
architecture. Most of the whole enchilada is running except the content part of
the equation, which only works on metadata for now.
https://github.com/pferrel/scala-parallel-universal-recommendation

We even do custom versions at actionML.com





Re: Is Mahout obsolete now?

2015-10-19 Thread Dmitriy Lyubimov
On Mon, Oct 19, 2015 at 3:29 PM, Pat Ferrel  wrote:

> Even have code running using the PredictionIO framework. This includes an
> SDK to event store to realtime query. Loosely speaking a lambda
> architecture. Most of the whole enchilada running except the content part
> of the equation, which only works on metadata for now.
> https://github.com/pferrel/scala-parallel-universal-recommendation
>
> We even do custom versions at actionML.com
>

oh, nice!


Re: Is Mahout obsolete now?

2015-10-19 Thread go canal
I am now curious to know whether Spark and Mahout are out-of-core or in-core
solutions, specifically for matrix multiplication and factorization.
Thanks, canal




  

Re: Is Mahout obsolete now?

2015-10-19 Thread Dmitriy Lyubimov
Both.

As with MapReduce before, though, distributed algorithms are not just naive
translations of traditional in-memory algorithms.
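
As a rough illustration of that point (a sketch only, not how Spark or Mahout
implement it): the textbook way to compute A'A assumes the whole matrix is in
memory, while a distributed or out-of-core version is usually restructured as a
sum of per-row outer products, so each block of rows can be processed on its
own and the small results added up. The function names below are hypothetical.

```python
def gram_in_core(a):
    """Textbook A^T A: assumes the whole matrix is in memory at once."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]


def partial_gram(rows, n):
    """A^T A contribution of one block of rows (a sum of outer products)."""
    acc = [[0.0] * n for _ in range(n)]
    for row in rows:
        for i, x in enumerate(row):
            if x:  # skip zeros, which matters for sparse rows
                for j, y in enumerate(row):
                    acc[i][j] += x * y
    return acc


def gram_partitioned(blocks, n):
    """Add up per-block results; blocks may be any iterable of row blocks.

    Each block could sit on a different worker or be streamed from disk,
    since only the small n x n accumulators ever need to be combined.
    """
    total = [[0.0] * n for _ in range(n)]
    for block in blocks:
        part = partial_gram(block, n)
        for i in range(n):
            for j in range(n):
                total[i][j] += part[i][j]
    return total
```

Both functions return the same matrix, but only the second one never needs more
than one block of rows at a time.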



Re: Is Mahout obsolete now?

2015-10-20 Thread Pavan K Narayanan
Perhaps this page  needs
to be updated with algorithms and features of 0.11.0?



Re: Is Mahout obsolete now?

2015-10-20 Thread Dmitriy Lyubimov
Pavan, I guess part of the documentation difficulty is that the Mahout
Samsara environment is only used for "training", while external components
are used for "scoring". So it is not a 100% end-to-end Mahout solution to
document.

Pat, it would be nice to put some of your docs on the Mahout site, though.
What do you think? They will be bound by the Apache ICLA after that
(meaning anyone can cut and paste them into their books, with or,
in practice, without any attribution).



Re: Is Mahout obsolete now?

2015-10-20 Thread Pat Ferrel
They could be updated a bit, but these pages already cover quite a lot:
http://mahout.apache.org/users/algorithms/recommender-overview.html
http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html

I’ll add some docs about the lower level API and maybe a pointer to the slides 
(updated with Ted’s comments).
