+1 for pluggable sink and source types.
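To make the idea concrete, here is a rough sketch of what pluggable sources and sinks could look like. Everything in it is hypothetical: the `Source` and `Sink` protocols, the two registries and the `run_pipeline` helper are illustrative names, not Mahout or Cascading APIs, and in-memory lists stand in for real stores such as HDFS, S3 or MongoDB.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol, Tuple

# One interaction record: (user_id, item_id, preference strength).
Record = Tuple[str, str, float]


class Source(Protocol):
    """Hypothetical input side: anything that can yield records."""

    def read(self) -> Iterator[Record]: ...


class Sink(Protocol):
    """Hypothetical output side: anything that can accept records."""

    def write(self, records: Iterable[Record]) -> None: ...


class CsvLinesSource:
    """Parses 'user,item,value' lines; stands in for a logfile or HDFS reader."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = lines

    def read(self) -> Iterator[Record]:
        for line in self.lines:
            user, item, value = line.strip().split(",")
            yield (user, item, float(value))


class ListSink:
    """Collects records in memory; stands in for a DB or search-index writer."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> None:
        self.rows.extend(records)


# Registries map a type name to a factory, so a new store type can be
# plugged in without touching the pipeline code.
SOURCES: Dict[str, Callable[..., Source]] = {"csv-lines": CsvLinesSource}
SINKS: Dict[str, Callable[..., Sink]] = {"list": ListSink}


def run_pipeline(
    source: Source,
    sink: Sink,
    transform: Callable[[Record], Record] = lambda r: r,
) -> None:
    """Stream records from the source through a transform into the sink."""
    sink.write(transform(r) for r in source.read())


source = SOURCES["csv-lines"](["u1,i1,1.0", "u2,i3,0.5"])
sink = SINKS["list"]()
run_pipeline(source, sink)
```

The pipeline only depends on the two small protocols, so model creation could sit in the `transform` step while the source and sink types are chosen by name in configuration.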
On Tue, Mar 11, 2014 at 5:55 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Doing an example site for the solr-recommender, Ted and I were faced
> with the same choices you mention below. He and I chose quite different
> architectures, either of which is perfectly good.
>
> I spent some time thinking about what the common integration points are
> for web apps. Solr supports a large community of web app integrators and
> works with just about any data format and database out there. So in this
> special case virtually any web app framework would have one or more
> methods for integrating with Solr.
>
> Why not Mahout?
>
> There are at least two ends to web app integration: the input pipeline
> and serving the results. Not to mention background, potentially
> periodic, model creation. The web app framework usually defines the way
> data is served (html, json, REST, the list of formats and protocols goes
> on) so let me put that aside for now. To me this points to getting data
> into Mahout and out again. Ideally it should come in through an
> extremely flexible mechanism, which may also serve to get the data out.
>
> Input and output are primarily about translating formats and Ids, and
> communicating with storage services (local fs, HDFS, S3, DB, ...). I
> chose Cascading to process input in a mostly scalable way. Cascading
> does not yet have Schemas to support all the DBs, so I built one for my
> DB (MongoDB), but it does support most file systems. There has been some
> talk in that community about adding Schemas for DBs, which is also
> possible to do yourself. It may be possible to create several of the
> more common pipelines all the way from reading data from a logfile,
> Cassandra, S3, etc. through model creation to output to the web app's
> primary store. This leaves it somewhat independent of the web app
> framework. If defined correctly it could have pluggable sink and source
> types and flexible format definitions.
>
> Maybe there are better data pipeline frameworks than Cascading, and
> making this work in 80% of use cases will be a fair amount of work, but
> as long as Mahout has enough users it remains an important missing
> piece.
>
> I suspect that any reasonable attempt at this input-to-Mahout-to-
> datastore pipeline would be considered for inclusion or reference in
> Mahout-Examples.
>
>
> On Mar 8, 2014, at 2:31 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
> OK, so the idea here is to tie in with, and form strategic partnerships
> with, some other open source products and provide Mahout as one
> component of a web application, so the use cases for Mahout will be
> partly driven by the use cases for the web application itself. In a
> nutshell, a web application requires: 1) search 2) recommendations
> 3) a primary data store. The recommendations may be driven by the
> higher-level use cases, but the key piece here will be pushing Mahout
> into delivering real-time recommendations that someone can then perform
> searches over. One example might be to search for music recommendations,
> like what Spotify already does, and perform term filters, term queries
> or other Lucene-based searches to deliver results. Another might be to
> identify how recommendations fit into the REST endpoints or, in the case
> of service-izing Mahout, they can be REST endpoints themselves. I've
> been thinking about this for a while since lately I've seen a lot of
> discussions around Mahout being hard to use or pick up and learn. If
> there's enough interest I can go into more detail when we meet to
> discuss 1.0.
>
> > Date: Sat, 8 Mar 2014 11:44:53 +0100
> > From: s...@apache.org
> > To: dev@mahout.apache.org
> > Subject: Re: Mahout 1.0 goals
> >
> > Hm, can you elaborate more on what you mean? IMHO Mahout is a library
> > only, so we should not build a complete MVC application inside this
> > project. I think this is something that people should build on top,
> > like prediction.io.
> >
> > --sebastian
> >
> >
> > On 03/08/2014 12:16 AM, Saikat Kanjilal wrote:
> >> I was also wondering if there'd be any interest in building a plugin
> >> to interface with elasticsearch and spring. What I am thinking of is
> >> an MVC-type service that performs Lucene-like searches on
> >> recommendation algorithm data stored inside a low-latency data store.
> >> I saw that there was a discussion on a Solr recommender on Mahout and
> >> would be glad to help lead/build an elasticsearch version.
> >>
> >>> From: ted.dunn...@gmail.com
> >>> Date: Fri, 7 Mar 2014 15:04:42 -0800
> >>> Subject: Re: Mahout 1.0 goals
> >>> To: dev@mahout.apache.org
> >>>
> >>> There was not yet a meeting.
> >>>
> >>> I owe the list a summary of what people said and some suggested
> >>> roadmapping. I will get to that on the weekend and we should be good
> >>> for a hangout meeting sometime next week.
> >>>
> >>>
> >>>
> >>> On Fri, Mar 7, 2014 at 10:35 AM, Saikat Kanjilal
> >>> <sxk1...@hotmail.com> wrote:
> >>>
> >>>> Hey guys, I've been trying to follow the 1.0 goals. Was there
> >>>> already a meeting on what the initial plans are for development,
> >>>> and are there notes from that? I am particularly interested in deep
> >>>> learning and service-izing Mahout, let me know.
> >>>> Thanks
> >>>>
> >>>>> From: ted.dunn...@gmail.com
> >>>>> Date: Tue, 4 Mar 2014 19:32:40 -0800
> >>>>> Subject: Re: Mahout 1.0 goals
> >>>>> To: dev@mahout.apache.org; s...@apache.org
> >>>>>
> >>>>> On Tue, Mar 4, 2014 at 2:24 PM, Sebastian Schelter
> >>>>> <s...@apache.org> wrote:
> >>>>>
> >>>>>> - AFAIK it's also a problem to ship it license-wise as the
> >>>>>> required libraries would not be Apache licensed
> >>>>>>
> >>>>>> See this discussion from the Spark community for details:
> >>>>>>
> >>>>>> https://github.com/apache/incubator-spark/pull/575
> >>>>>>
> >>>>>
> >>>>> This is a real issue and getting a lot of time over on legal as
> >>>>> well.
> >>>>>
> >>>>> A non-optional LGPL dependency doesn't fly at this time.