+ users@

These are great ideas, and are just the kinds of high-level conversations I was hoping to engender. From my agile background, I'd hope to define 0.7 by a small number of "epic stories", in a subset of our overall capabilities, which could focus our attention on a set of derivative JIRAs that will give Mahout a quantum step forward in some functional area from our users' perspective. I think maybe 2-3 such "epics" are all we can handle in a release. I don't necessarily think mine are the right ones either, but they are meant to prime the pump.

If we could only do 2-3 epics, what would they be? Where would the biggest contributions lie?

On 2/11/12 9:45 PM, Lance Norskog wrote:
For incremental improvements, usability and correctness of algorithms.
The "new" Naive Bayes and SGD algorithms both seem to have trouble
classifying. Also, interpretation of results. It is hard to summarize
the quality of results. I often feel like the math-savvy implementors
print a bunch of numbers and say "that looks right", and the rest of
us struggle to get an intuition of what's going on and why.

For new features, "Mahout Online" would be great: a web service that
packages all of the "online" algorithms (tractable speed and memory
use).

On Sat, Feb 11, 2012 at 1:29 PM, Frank Scholten <[email protected]> wrote:
I'd like to add solving ClassNotFoundException problems with third
party jars in some jobs.

I experimented with having seq2sparse upload a third-party jar
containing an analyzer and add it to the DistributedCache. Uploading
works, but I haven't yet gotten it working inside the Mappers. I have
some code lying around for this that can be used as a starting point,
including a separate project that has dependencies on Mahout and on an
analyzer to test things out.

Another thing would be adding or improving the integration tools, for
example adding a mysql2seq tool to cluster text from a SQL database.

On Sat, Feb 11, 2012 at 8:01 PM, Jeff Eastman
<[email protected]>  wrote:
Now that 0.6 is in the box, it seems a good time to start thinking about
0.7, from a high level goal perspective at least. Here are a couple that
come to mind:

Target code freeze date August 1, 2012
Get Jenkins working for us again
Complete clustering refactoring and classification convergence
What kind of clustering refactoring do you mean here? I did some work on
creating bean configurations in the past (MAHOUT-612). I
underestimated the amount of work required to do the entire
refactoring. If this can be contributed and committed on a per-job
basis I would like to help out.

...


