So here's my take: once we're a TLP (next month sometime?), it will be
a good time to start allowing subprojects or submodules which are
"scripting" layers on top of Mahout, whether they are PigLatin,
Cascalog, JRuby, or Clojure.  Especially if it's JVM-based, having
code/scripts which act as "drivers" and wrappers for what is
currently, for the most part, a library with a shell script (Taste-web
is the exception) would be a hugely useful addition.

To compare, the guys doing Incanter ( http://data-sorcery.org/ )
are using Clojure as a wrapper around Parallel-Colt
( http://sites.google.com/site/piotrwendykier/software/parallelcolt ),
which does all of the heavy lifting in pure Java.  That's why I'm not
too worried about performance (to respond to Robin's concern
down-thread while I'm writing this).

Personally, I'm not a Clojure guy, but Lisp has long been a mainstay
of the AI world, and if we're going to interface with academic ML and
AI more (which we should!!!), Clojure is going to be our best bet,
as it's just Lisp on the JVM.  Were I going to write a REPL for us,
I'd do it in JRuby (much like HBase uses JRuby's jirb for its shell),
and I may still, but the more easy, interactive ways of using Mahout
there are, the better, I'd say.

Hmm... this was a bit of a scattered response, but I'm really loath
to turn away a) nice hooks between Solr and Mahout, b) scripting-style
wrappers which could expand our community, and c) simply new
functionality.

I'm certainly game to help shepherd in any code we can use, although
I guess I'm fine waiting to help make a sub-project once we're a TLP
if that's the right way to go.

  -jake

On Fri, Apr 16, 2010 at 10:05 AM, Sean Owen <sro...@gmail.com> wrote:

> Clojure isn't my cup of tea but that's not important.
>
> It's an interesting question, how much belongs under the Mahout tent?
> There's a tradeoff between excluding useful extensions to the project
> on the one hand, and becoming a spare parts bin of code of varying
> levels of maturity and support.
>
> I'm inclined to see Mahout built, like any project, in layers. We've
> done a good job of that, with collections supporting math, supporting
> core, supporting examples/utils. core remains uneven, but, it's taking
> shape and it's to be expected at this stage that the project has lots
> of fragments that naturally come together or else are weeded out.
>
> So I have some concern with getting even core into better shape before
> considering broadening the tent to include other consumers. (We're not
> talking about integrating Clojure code into core right? Yeah it ends
> up as Java bytecode, but that's not quite the issue... it's not
> understandable and usable to the devs that would be using Java-based
> Mahout.)
>
> And so I have moderate preference for not mixing in other languages
> yet, just like with the talk of a C# port earlier today.
>
> I'm sure it's good code and cool and useful, and deserves support and
> collaboration and liaising; I'm only wondering about whether it ought
> to be a piece of Mahout - what is the benefit, against the subtle but
> real drawbacks.
>
