+1 for pluggable sink and source types.
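For concreteness, here is a minimal sketch of what pluggable source and sink types could look like. It uses Python for brevity. Every name in it (`Source`, `Sink`, `CsvSource`, `ListSink`, `IdDictionary`, `run_pipeline`) is hypothetical and is not a Mahout or Cascading API. The item co-occurrence count is only a stand-in for real model creation.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Protocol, Set, Tuple


# Hypothetical interfaces: illustrative only, not Mahout or Cascading APIs.
class Source(Protocol):
    def read(self) -> Iterator[Tuple[str, str]]:
        """Yield (user_id, item_id) pairs using external string IDs."""
        ...


class Sink(Protocol):
    def write(self, rows: Iterable[Tuple[str, str, int]]) -> None:
        """Persist (item_a, item_b, count) rows to some store."""
        ...


class CsvSource:
    """Reads 'user,item' lines from any iterable of text lines."""

    def __init__(self, lines: Iterable[str]) -> None:
        self.lines = lines

    def read(self) -> Iterator[Tuple[str, str]]:
        for row in csv.reader(self.lines):
            if len(row) >= 2:  # skip blank or malformed lines
                yield row[0].strip(), row[1].strip()


class ListSink:
    """Collects output in memory; a real sink might write to MongoDB or Solr."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str, int]] = []

    def write(self, rows: Iterable[Tuple[str, str, int]]) -> None:
        self.rows.extend(rows)


class IdDictionary:
    """Maps external string IDs to dense int indices and back."""

    def __init__(self) -> None:
        self.to_index: Dict[str, int] = {}
        self.to_external: List[str] = []

    def index(self, external_id: str) -> int:
        if external_id not in self.to_index:
            self.to_index[external_id] = len(self.to_external)
            self.to_external.append(external_id)
        return self.to_index[external_id]


def run_pipeline(source: Source, sink: Sink) -> None:
    users, items = IdDictionary(), IdDictionary()
    history: Dict[int, Set[int]] = defaultdict(set)
    for user, item in source.read():
        history[users.index(user)].add(items.index(item))
    # Stand-in for model creation: count how often two items share a user.
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for item_set in history.values():
        for a, b in combinations(sorted(item_set), 2):
            counts[(a, b)] += 1
    # Translate internal indices back to external IDs before writing out.
    sink.write(
        (items.to_external[a], items.to_external[b], n)
        for (a, b), n in sorted(counts.items())
    )


data = io.StringIO("u1,songA\nu1,songB\nu2,songA\nu2,songB\nu2,songC\n")
sink = ListSink()
run_pipeline(CsvSource(data), sink)
print(sink.rows)
```

Swapping `CsvSource` for a reader over HDFS, S3, or Cassandra, or `ListSink` for a MongoDB or Solr writer, would leave `run_pipeline` untouched, which is the point of keeping sources and sinks pluggable.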


On Tue, Mar 11, 2014 at 5:55 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Doing an example site for the solr-recommender, Ted and I were faced with
> the same choices you mention below. He and I chose quite different
> architectures, either of which is perfectly good.
>
> I spent some time thinking about what the common integration points are
> for web apps. Solr supports a large community of web app integrators and
> works with just about any data format and database out there. So in this
> special case virtually any web app framework would have one or more methods
> for integrating with Solr.
>
> Why not Mahout?
>
> There are at least two ends to web app integration: the input pipeline and
> serving the results, not to mention background, potentially periodic, model
> creation. The web app framework usually defines the way data is served
> (HTML, JSON, REST; the list of formats and protocols goes on), so let me put
> that aside for now. To me this points to getting data into Mahout and out
> again. Ideally it should come in through an extremely flexible mechanism,
> which may also serve to get the data out.
>
> Input and output are primarily about translating formats and IDs, and
> communicating with storage services (local FS, HDFS, S3, DB, ...). I chose
> Cascading to process input in a mostly scalable way. Cascading does not yet
> have Schemes to support all the DBs, so I built one for my DB (MongoDB), but
> it does support most file systems. There has been some talk in that
> community about adding Schemes for DBs, which you can also do yourself. It
> may be possible to create several of the more common pipelines, all the way
> from reading data from a log file, Cassandra, S3, etc., through model
> creation, to output to the web app's primary store. This leaves it
> somewhat independent of the web app framework. If defined correctly, it
> could have pluggable sink and source types and flexible format definitions.
>
> Maybe there are better data pipeline frameworks than Cascading, and making
> this work in 80% of use cases will be a fair amount of work, but as long as
> Mahout has enough users it remains an important missing piece.
>
> I suspect that any reasonable attempt at this input-to-Mahout-to-datastore
> pipeline would be considered for inclusion or reference in Mahout-Examples.
>
>
> On Mar 8, 2014, at 2:31 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
> OK, so the idea here is to form some strategic partnerships with other
> open source products and provide Mahout as one component of a web
> application. The use cases for Mahout would then be partly driven by the
> use cases for the web application itself. In a nutshell, a web application
> requires: 1) search, 2) recommendations, and 3) a primary data store.
> The recommendations may be driven by the higher-level use cases, but the
> key piece here will be pushing Mahout into delivering real-time
> recommendations that someone can then perform searches over. One example
> might be searching music recommendations, like what Spotify already does,
> and applying term filters, term queries, or other Lucene-based searches to
> deliver results. Another might be to identify how recommendations fit into
> the REST endpoints or, in the case of service-izing Mahout, how they can be
> REST endpoints themselves. I've been thinking about this for a while, since
> lately I've seen a lot of discussion about Mahout being hard to use or pick
> up and learn. If there's enough interest, I can go into more detail when we
> meet to discuss 1.0.
>
> > Date: Sat, 8 Mar 2014 11:44:53 +0100
> > From: s...@apache.org
> > To: dev@mahout.apache.org
> > Subject: Re: Mahout 1.0 goals
> >
> > Hm, can you elaborate on what you mean? IMHO Mahout is only a library,
> > so we should not build a complete MVC application inside this project. I
> > think this is something that people should build on top, like
> > prediction.io.
> >
> > --sebastian
> >
> >
> > On 03/08/2014 12:16 AM, Saikat Kanjilal wrote:
> >> I was also wondering if there'd be any interest in building a plugin to
> >> interface with Elasticsearch and Spring. What I have in mind is an
> >> MVC-type service that performs Lucene-like searches on recommendation
> >> algorithm data stored inside a low-latency data store. I saw that there
> >> was a discussion on a Solr recommender on Mahout and would be glad to help
> >> lead/build an Elasticsearch version.
> >>
> >>> From: ted.dunn...@gmail.com
> >>> Date: Fri, 7 Mar 2014 15:04:42 -0800
> >>> Subject: Re: Mahout 1.0 goals
> >>> To: dev@mahout.apache.org
> >>>
> >>> There was not yet a meeting.
> >>>
> >>> I owe the list a summary of what people said and some suggested
> >>> roadmapping. I will get to that on the weekend, and we should be good for
> >>> a hangout meeting sometime next week.
> >>>
> >>>
> >>>
> >>> On Fri, Mar 7, 2014 at 10:35 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
> >>>
> >>>> Hey guys, I've been trying to follow the 1.0 goals. Was there already a
> >>>> meeting on the initial plans for development, and are there notes from
> >>>> it? I am particularly interested in deep learning and service-izing
> >>>> Mahout, so let me know.
> >>>> Thanks
> >>>>
> >>>>> From: ted.dunn...@gmail.com
> >>>>> Date: Tue, 4 Mar 2014 19:32:40 -0800
> >>>>> Subject: Re: Mahout 1.0 goals
> >>>>> To: dev@mahout.apache.org; s...@apache.org
> >>>>>
> >>>>> On Tue, Mar 4, 2014 at 2:24 PM, Sebastian Schelter <s...@apache.org> wrote:
> >>>>>
> >>>>>> - AFAIK it's also a problem to ship it license-wise, as the required
> >>>>>> libraries would not be Apache-licensed.
> >>>>>>
> >>>>>> See this discussion from the Spark community for details:
> >>>>>>
> >>>>>> https://github.com/apache/incubator-spark/pull/575
> >>>>>>
> >>>>>
> >>>>> This is a real issue and getting a lot of time over on legal as well.
> >>>>>
> >>>>> A non-optional LGPL dependency doesn't fly at this time.
> >>>>
> >>>>
> >>
> >>
> >
>
>
>
