On Fri, Apr 16, 2010 at 11:56 AM, Sean Owen <sro...@gmail.com> wrote:

> On Fri, Apr 16, 2010 at 7:39 PM, Jake Mannix <jake.man...@gmail.com>
> wrote:
> > I will start playing around with Anthony's github-based stuff, and
> > see where a patch can be made.  The question is where it would
> > go?  It's a fully functioning project already over on its own.
>
>
> I suppose that's my question too -- what is being fixed by a move?
>

What is being fixed is that we currently have an open ticket for providing
LSA hooks for Solr, and lsa4solr provides an end-to-end solution for that
particular task, along with a bunch of other nice things (Clojure wrapping,
and thus a REPL).  If we want to say "MAHOUT-343 is a Won't Fix",
and that's the consensus, then that's fine.  It could also be implemented
in pure Java inside of mahout-core, but I don't see anyone stepping up to
the plate to write that, and here's someone who has already done it in
another JVM language we could use.



> The point about integrating with the ML community by having a
> 'LISP-speaking' module, to be friendlier, is a good one. It does call
> into question the Mahout identity -- is it for tinkering with in a lab
> to explore new algorithms (for which Clojure/LISP makes sense)? or is
> it for engineers and production systems at scale -- where Hadoop/Java
> is the lingua franca? Yeah, this is not just another language, but for
> a somewhat different audience.
>
> Maybe "both" is nice.


I think "both" is *necessary*.  At least at all the smallish Bay Area
startups doing some scalable production ML these days, there is not
much distinction between "researcher" and "production code-monkey":
you prototype in whatever language you can, you try it out (maybe using
Hadoop Streaming) on bigger data sets, then you productionalize it
(usually with the same engineers involved in all steps).

If Mahout could be helping along with that entire process, that would
be fantastic.  We'd have shell scripts and an actual REPL to tinker
with, and then when it came time to optimize performance, the same
exact libraries could be used and extended; no more "first we do
stuff with Matlab or R, then port all of the code over to Java/C++ later".


  -jake
