So here's my take: once we're a TLP (next month sometime?), it will be a good time to start allowing subprojects or submodules which are "scripting" layers on top of Mahout, whether they are Pig Latin, Cascalog, JRuby, or Clojure. Especially if it's JVM-based, having code/scripts which act as "drivers" and wrappers for what is currently, for the most part, a library with a shell script (Taste-web is the exception) would be a hugely useful addition.
To compare, the guys doing Incanter ( http://data-sorcery.org/ ) are using Clojure as a wrapper around Parallel-Colt ( http://sites.google.com/site/piotrwendykier/software/parallelcolt ), which does all of the heavy lifting in pure Java. That's why I'm not too worried about performance (to respond to Robin's concern down-thread while I'm writing this).

Personally, I'm not a Clojure guy, but Lisp has long been a mainstay of the AI world, and if we're going to interface with academic ML and AI more (which we should!!!), Clojure is going to be our best bet, as it's just Lisp on the JVM. Were I going to write a REPL for us, I'd do it in JRuby (much like HBase uses JRuby's jirb for its shell), and I may still, but the more easy, interactive ways of using Mahout, the better, I'd say.

Hmm... this was a bit scattered of a response, but I'm really loath to turn away a) nice hooks between Solr and Mahout, b) scripting-style wrappers which could expand our community, and c) simply new functionality.

I'm certainly game to help shepherd in any code we can use, although I guess I'm fine waiting to help make a sub-project once we're a TLP if that's the right way to go.

-jake

On Fri, Apr 16, 2010 at 10:05 AM, Sean Owen <sro...@gmail.com> wrote:

> Clojure isn't my cup of tea but that's not important.
>
> It's an interesting question, how much belongs under the Mahout tent?
> There's a tradeoff between excluding useful extensions to the project
> on the one hand, and becoming a spare parts bin of code of varying
> levels of maturity and support.
> I'm inclined to see Mahout built, like any project, in layers. We've
> done a good job of that, with collections supporting math, supporting
> core, supporting examples/utils. core remains uneven, but, it's taking
> shape and it's to be expected at this stage that the project has lots
> of fragments that naturally come together or else are weeded out.
>
> So I have some concern with getting even core into better shape before
> considering broadening the tent to include other consumers. (We're not
> talking about integrating Clojure code into core right? Yeah it ends
> up as Java bytecode, but that's not quite the issue... it's not
> understandable and usable to the devs that would be using Java-based
> Mahout.)
>
> And so I have moderate preference for not mixing in other languages
> yet, just like with the talk of a C# port earlier today.
>
> I'm sure it's good code and cool and useful, and deserves support and
> collaboration and liaising; I'm only wondering about whether it ought
> be a piece of Mahout - what is the benefit, against the subtle but
> real drawbacks.
>