While building an example site for the solr-recommender, Ted and I faced the same choices you mention below. He and I chose quite different architectures, either of which is perfectly good.
I spent some time thinking about what the common integration points are for web apps. Solr supports a large community of web app integrators and works with just about any data format and database out there, so in this special case virtually any web app framework has one or more ways to integrate with Solr. Why not Mahout?
There are at least two ends to web app integration: the input pipeline and serving the results. Not to mention background, potentially periodic, model creation. The web app framework usually defines the way data is served (HTML, JSON, REST; the list of formats and protocols goes on), so let me put that aside for now. To me this points to getting data into Mahout and out again. Ideally data should come in through an extremely flexible mechanism, which may also serve to get it out. Input and output are primarily about translating formats and IDs, and about communicating with storage services (local fs, HDFS, S3, DB, …).
I chose Cascading to process input in a mostly scalable way. Cascading does not yet have Schemes for all the DBs, so I built one for my DB (MongoDB), but it does support most file systems. There has been some talk in that community about adding Schemes for DBs, and you can also write one yourself. It may be possible to create several of the more common pipelines, all the way from reading data from a log file, Cassandra, S3, etc., through model creation, to output to the web app's primary store. This leaves the pipeline somewhat independent of the web app framework. If defined correctly, it could have pluggable source and sink types and flexible format definitions. Maybe there are better data pipeline frameworks than Cascading, and making this work in 80% of use cases will be a fair amount of work, but as long as Mahout has enough users, it remains an important missing piece. I suspect that any reasonable attempt at this input-to-Mahout-to-datastore pipeline would be considered for inclusion in, or reference from, Mahout-Examples.
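The source-to-model-to-sink pipeline idea above can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not Cascading's API and not Mahout code: every name in it (`Source`, `Sink`, `CsvLogSource`, `DictSink`, `IdDictionary`, `cooccurrence_model`, `run_pipeline`) is hypothetical. `CsvLogSource` stands in for a log file or S3 object, `DictSink` stands in for the web app's primary store (MongoDB, Cassandra, etc.), and `cooccurrence_model` is a toy stand-in for Mahout's model-building step that only counts item co-occurrences, with no LLR weighting and no Hadoop. What it tries to show is the shape: ID and format translation live in the pipeline, so sources and sinks can be swapped without touching the model step.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Protocol, Tuple

# Hypothetical plug-in interfaces: any object with a matching read()/write() fits.
class Source(Protocol):
    def read(self) -> Iterable[Tuple[str, str]]: ...

class Sink(Protocol):
    def write(self, rows: Iterable[Tuple[str, List[str]]]) -> None: ...

class CsvLogSource:
    """Hypothetical source: 'user,item' lines from any text stream (log file, S3 object, etc.)."""
    def __init__(self, stream: io.TextIOBase):
        self.stream = stream

    def read(self) -> Iterable[Tuple[str, str]]:
        for row in csv.reader(self.stream):
            if len(row) >= 2:  # skip blank or malformed lines
                yield row[0].strip(), row[1].strip()

class DictSink:
    """Hypothetical sink standing in for the web app's primary store (e.g. MongoDB)."""
    def __init__(self) -> None:
        self.store: Dict[str, List[str]] = {}

    def write(self, rows: Iterable[Tuple[str, List[str]]]) -> None:
        for key, values in rows:
            self.store[key] = values

class IdDictionary:
    """Maps external string IDs to dense ints for the model step, and back again."""
    def __init__(self) -> None:
        self.to_int: Dict[str, int] = {}
        self.to_ext: List[str] = []

    def encode(self, ext: str) -> int:
        if ext not in self.to_int:
            self.to_int[ext] = len(self.to_ext)
            self.to_ext.append(ext)
        return self.to_int[ext]

    def decode(self, i: int) -> str:
        return self.to_ext[i]

Model = Dict[int, List[int]]

def cooccurrence_model(pairs: List[Tuple[int, int]], top_n: int = 2) -> Model:
    """Toy stand-in for Mahout's model build: raw co-occurrence counts, no LLR."""
    history: Dict[int, set] = defaultdict(set)
    for user, item in pairs:
        history[user].add(item)
    counts: Dict[int, Counter] = defaultdict(Counter)
    for items in history.values():
        for a in items:
            for b in items:
                if a != b:
                    counts[a][b] += 1
    # Rank by count (descending), breaking ties by item ID so output is deterministic.
    return {a: [b for b, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
            for a, c in sorted(counts.items())}

def run_pipeline(source: Source, sink: Sink,
                 build_model: Callable[[List[Tuple[int, int]]], Model]) -> None:
    users, items = IdDictionary(), IdDictionary()
    # Input side: translate external IDs to ints before model creation.
    encoded = [(users.encode(u), items.encode(i)) for u, i in source.read()]
    model = build_model(encoded)
    # Output side: translate ints back to external IDs before writing to the store.
    sink.write((items.decode(k), [items.decode(v) for v in vs]) for k, vs in model.items())

if __name__ == "__main__":
    log = io.StringIO("alice,song1\nalice,song2\nbob,song1\nbob,song2\nbob,song3\n")
    sink = DictSink()
    run_pipeline(CsvLogSource(log), sink, cooccurrence_model)
    print(sink.store)
    # → {'song1': ['song2', 'song3'], 'song2': ['song1', 'song3'], 'song3': ['song1', 'song2']}
```

Swapping `DictSink` for something that upserts into MongoDB, or `CsvLogSource` for an HDFS or Cassandra reader, would leave `run_pipeline` unchanged, which is the kind of web-app-framework independence described above.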
On Mar 8, 2014, at 2:31 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
Ok, so the idea here is to form some strategic partnerships with other open source products and provide Mahout as one component of a web application. The use cases for Mahout would then be partly driven by the use cases for the web application itself. In a nutshell, a web application requires 1) search, 2) recommendations, and 3) a primary data store. The recommendations may be driven by the higher-level use cases, but the key piece here will be pushing Mahout into delivering real-time recommendations that someone can then perform searches over. One example might be searching music recommendations, as Spotify already does, and applying term filters, term queries, or other Lucene-based searches to deliver results. Another might be to identify how recommendations fit into REST endpoints or, in the case of service-izing Mahout, to make them REST endpoints. I've been thinking about this for a while, since lately I've seen a lot of discussion about Mahout being hard to use, pick up, and learn. If there's enough interest I can go into more detail when we meet to discuss 1.0.
> Date: Sat, 8 Mar 2014 11:44:53 +0100
> From: s...@apache.org
> To: dev@mahout.apache.org
> Subject: Re: Mahout 1.0 goals
>
> Hm, can you elaborate on what you mean? IMHO Mahout is only a library,
> so we should not build a complete MVC application inside this project.
> I think this is something people should build on top, like prediction.io.
>
> --sebastian
>
> On 03/08/2014 12:16 AM, Saikat Kanjilal wrote:
>> I was also wondering if there'd be any interest in building a plugin to
>> interface with Elasticsearch and Spring. What I am thinking of is an
>> MVC-type service that performs Lucene-like searches on recommendation
>> algorithm data stored inside a low-latency data store. I saw that there
>> was a discussion of a Solr recommender on Mahout and would be glad to
>> help lead/build an Elasticsearch version.
>>
>>> From: ted.dunn...@gmail.com
>>> Date: Fri, 7 Mar 2014 15:04:42 -0800
>>> Subject: Re: Mahout 1.0 goals
>>> To: dev@mahout.apache.org
>>>
>>> There was not yet a meeting.
>>>
>>> I owe the list a summary of what people said and some suggested
>>> roadmapping. I will get to that on the weekend and we should be good for a
>>> hangout meeting sometime next week.
>>>
>>> On Fri, Mar 7, 2014 at 10:35 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>>>
>>>> Hey guys, I've been trying to follow the 1.0 goals. Was there already a
>>>> meeting on the initial plans for development, and are there notes from
>>>> it? I am particularly interested in deep learning and service-izing
>>>> Mahout; let me know.
>>>> Thanks
>>>>
>>>>> From: ted.dunn...@gmail.com
>>>>> Date: Tue, 4 Mar 2014 19:32:40 -0800
>>>>> Subject: Re: Mahout 1.0 goals
>>>>> To: dev@mahout.apache.org; s...@apache.org
>>>>>
>>>>> On Tue, Mar 4, 2014 at 2:24 PM, Sebastian Schelter <s...@apache.org> wrote:
>>>>>
>>>>>> - AFAIK it's also a problem to ship it license-wise, as the required
>>>>>> libraries would not be Apache licensed.
>>>>>>
>>>>>> See this discussion from the Spark community for details:
>>>>>>
>>>>>> https://github.com/apache/incubator-spark/pull/575
>>>>>
>>>>> This is a real issue and getting a lot of time over on legal as well.
>>>>>
>>>>> A non-optional LGPL dependency doesn't fly at this time.