RE: Mahout 1.0 goals

Saikat Kanjilal Wed, 12 Mar 2014 10:57:18 -0700

Agree on many levels. One thing I was thinking about along these lines is 
how Mahout fits into custom recommendation engines that already exist. To 
step back a bit, the web app will have:
1) UI: logic around serving up any data pipeline
2) Business logic in an MVC-like framework (Spring etc.)
3) Recommendations engine
4) Data store
5) Search


I see Mahout as populating 4 with a real-time view of recommendations (using 
Cascading or custom plugins underneath) and using Spark (or Storm) to serve 
these up. For number 3, Mahout can live underneath a more business-driven 
recommendations engine that ties into 2.

For 5, if the interfaces are defined correctly, we can potentially plug in any 
Lucene-like implementation under the hood. So the question in my mind becomes 
where and how Mahout can provide the most value, and what APIs need to be 
written for it to fit into one or more of the layers above.
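
To make the "interfaces defined correctly" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: SearchBackend, InMemorySearchBackend and RecommendationService are hypothetical names, not Mahout, Lucene, Solr or Elasticsearch APIs. The idea is only that layer 3 talks to layer 5 through a small interface, so the search implementation can be swapped.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SearchBackend(ABC):
    """Hypothetical interface for layer 5 (search); not a real Mahout API."""

    @abstractmethod
    def index(self, doc_id: str, terms: List[str]) -> None:
        """Make doc_id findable under each of the given terms."""

    @abstractmethod
    def search(self, term: str) -> List[str]:
        """Return the ids of documents indexed under term."""


class InMemorySearchBackend(SearchBackend):
    """Toy stand-in for a Lucene-like engine, kept in a dict for the sketch."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[str]] = {}

    def index(self, doc_id: str, terms: List[str]) -> None:
        for term in terms:
            ids = self._postings.setdefault(term.lower(), [])
            if doc_id not in ids:  # avoid duplicate postings
                ids.append(doc_id)

    def search(self, term: str) -> List[str]:
        return list(self._postings.get(term.lower(), []))


class RecommendationService:
    """Layer 3: depends only on the SearchBackend interface."""

    def __init__(self, backend: SearchBackend) -> None:
        self._backend = backend

    def publish(self, item_id: str, tags: List[str]) -> None:
        # In a real system the tags would come from a model-building job.
        self._backend.index(item_id, tags)

    def recommend_by_term(self, term: str, limit: int = 10) -> List[str]:
        return self._backend.search(term)[:limit]
```

A caller would build the service with whichever backend it has, for example RecommendationService(InMemorySearchBackend()), then call publish() from the batch side and recommend_by_term() from the serving side. Swapping in a real search engine would mean writing one more SearchBackend subclass.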

I was reading this story on Elasticsearch's site and it sparked some of the 
thoughts above:
http://www.elasticsearch.org/case-study/stumbleupon/

I'd love to volunteer to build something that does all of the above and 
showcases Mahout's abilities if there's enough interest.

Regards


> Subject: Re: Mahout 1.0 goals
> From: p...@occamsmachete.com
> Date: Tue, 11 Mar 2014 09:55:04 -0700
> To: dev@mahout.apache.org
> 
> Doing an example site for the solr-recommender, Ted and I were faced with 
> the same choices you mention below. He and I chose quite different 
> architectures, either of which is perfectly good.
> 
> I spent some time thinking about what the common integration points are for 
> web apps. Solr supports a large community of web app integrators and works 
> with about any data format and database out there. So in this special case 
> virtually any web app framework would have one or more methods for 
> integrating with Solr.
> 
> Why not Mahout?
> 
> There are at least two ends to web app integration: the input pipeline and 
> serving the results. Not to mention background, potentially periodic, model 
> creation. The web app framework usually defines the way data is served (html, 
> json, REST, the list of formats and protocols goes on) so let me put that 
> aside for now. To me this points to getting data into mahout and out again. 
> Ideally it should come in through an extremely flexible mechanism, which may 
> also serve to get the data out.
> 
> Input and output is primarily about translating formats, Ids, and 
> communicating with storage services (local fs, HDFS, S3, DB, …). I chose 
> Cascading to process input in a mostly scalable way. Cascading does not yet 
> have Schemas to support all the DBs so I built one for my DB (MongoDB) but it 
> does support most file systems. There has been some talk in that community 
> about adding Schemas for DBs, which is also possible to do yourself. It may 
> be possible to create several of the more common pipelines all the way from 
> reading data from a logfile, Cassandra, S3, etc through model creation to 
> output to the web app’s primary store. This leaves it somewhat independent of 
> the web app framework. If defined correctly it could have pluggable sink and 
> source types and flexible format definitions.
> 
> Maybe there are better data pipeline frameworks than Cascading and making 
> this work in 80% of use cases will be a fair amount of work but as long as 
> Mahout has enough users it remains an important missing piece. 
> 
> I suspect that any reasonable attempt at this input to Mahout to datastore 
> pipeline would be considered for inclusion or reference in Mahout-Examples.
> 
> 
> On Mar 8, 2014, at 2:31 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
> 
> OK, so the idea here is to form some strategic partnerships with other 
> open source products and provide Mahout as one component of a web 
> application, so the use cases for Mahout will be partly driven by the use 
> cases for the web application itself. In a nutshell, a web application 
> requires: 1) search 2) recommendations 3) a primary data store. The 
> recommendations may be driven by the higher-level use cases, but the key 
> piece here will be pushing Mahout into delivering real-time recommendations 
> that someone can then perform searches over. One example might be to search 
> for music recommendations, like what Spotify already does, and perform term 
> filters, term queries, or other Lucene-based searches to deliver results. 
> Another might be to identify how recommendations fit into the REST endpoints 
> or, in the case of service-izing Mahout, they can be REST endpoints. I've 
> been thinking about this for a while since lately I've seen a lot of 
> discussions around Mahout being hard to use or pick up and learn. If 
> there's enough interest I can go into more detail when we meet to discuss 1.0.
> 
> > Date: Sat, 8 Mar 2014 11:44:53 +0100
> > From: s...@apache.org
> > To: dev@mahout.apache.org
> > Subject: Re: Mahout 1.0 goals
> > 
> > Hm, can you elaborate more what you mean? IMHO Mahout is a library only, 
> > so we should not build a complete MVC application inside this project, I 
> > think this is something that people should build on top, like 
> > prediction.io .
> > 
> > --sebastian
> > 
> > 
> > On 03/08/2014 12:16 AM, Saikat Kanjilal wrote:
> >> I was also wondering if there'd be any interest in building a plugin to 
> >> interface with Elasticsearch and Spring. What I am thinking of is an 
> >> MVC-type service that performs Lucene-like searches on recommendation 
> >> algorithm data stored inside a low-latency data store. I saw that there 
> >> was a discussion on a Solr recommender on Mahout and would be glad to 
> >> help lead/build an Elasticsearch version.
> >> 
> >>> From: ted.dunn...@gmail.com
> >>> Date: Fri, 7 Mar 2014 15:04:42 -0800
> >>> Subject: Re: Mahout 1.0 goals
> >>> To: dev@mahout.apache.org
> >>> 
> >>> There was not yet a meeting.
> >>> 
> >>> I owe the list a summary of what people said and some suggested
> >>> roadmapping.  I will get to that on the weekend and we should be good for
> >>> a hangout meeting sometime next week.
> >>> 
> >>> 
> >>> 
> >>> On Fri, Mar 7, 2014 at 10:35 AM, Saikat Kanjilal 
> >>> <sxk1...@hotmail.com> wrote:
> >>> 
> >>>> Hey guys, I've been trying to follow the 1.0 goals. Was there already a
> >>>> meeting on the initial plans for development, and are there notes from
> >>>> it? I am particularly interested in deep learning and service-izing
> >>>> Mahout; let me know.
> >>>> Thanks
> >>>> 
> >>>>> From: ted.dunn...@gmail.com
> >>>>> Date: Tue, 4 Mar 2014 19:32:40 -0800
> >>>>> Subject: Re: Mahout 1.0 goals
> >>>>> To: dev@mahout.apache.org; s...@apache.org
> >>>>> 
> >>>>> On Tue, Mar 4, 2014 at 2:24 PM, Sebastian Schelter <s...@apache.org>
> >>>> wrote:
> >>>>> 
> >>>>>> - AFAIK it's also a problem to ship it license-wise as the required
> >>>>>> libraries would not be Apache licensed
> >>>>>> 
> >>>>>> See this discussion from the Spark community for details:
> >>>>>> 
> >>>>>> https://github.com/apache/incubator-spark/pull/575
> >>>>>> 
> >>>>> 
> >>>>> This is a real issue and getting a lot of time over on legal as well.
> >>>>> 
> >>>>> A non-optional LGPL dependency doesn't fly at this time.
> >>>> 
> >>>> 
> >>                                    
> >> 
> > 
>                                         
>
