I could help out with the internals of CBayes/Bayes and FPGrowth (if it
becomes ready by then), along with write-ups or how-tos on improving
efficiency on different datasets: how to understand your data, how to
enable or disable the various CBayes/Bayes parameters to fit non-text
data, and sparse versus dense databases in frequent pattern mining.
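To make the sparse/dense contrast concrete, here is a minimal,
self-contained Java sketch (illustrative only, not Mahout code; the
class name and item ids are made up for the example):

import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Sketch: the same four transactions stored sparsely (item-id lists)
// and densely (one bit per possible item per transaction).
public class SparseVsDense {
  public static void main(String[] args) {
    int numItems = 6; // items 0..5

    // Sparse form: each transaction lists only the items it contains.
    List<int[]> sparse = Arrays.asList(
        new int[] {0, 2, 3},
        new int[] {1, 2},
        new int[] {0, 2, 5},
        new int[] {2, 4});

    // Dense form: a BitSet per transaction, one bit per possible item.
    BitSet[] dense = new BitSet[sparse.size()];
    for (int t = 0; t < sparse.size(); t++) {
      dense[t] = new BitSet(numItems);
      for (int item : sparse.get(t)) {
        dense[t].set(item);
      }
    }

    // Support of item 2, counted in each representation (both give 4).
    int sparseSupport = 0;
    for (int[] txn : sparse) {
      for (int item : txn) {
        if (item == 2) { sparseSupport++; break; }
      }
    }
    int denseSupport = 0;
    for (BitSet txn : dense) {
      if (txn.get(2)) {
        denseSupport++;
      }
    }
    System.out.println("support(item 2): sparse=" + sparseSupport
        + ", dense=" + denseSupport);
  }
}

The sparse form does work proportional to the item occurrences that
actually exist, while the dense form pays numItems bits per transaction
regardless, which is why frequent pattern miners like FPGrowth usually
keep transactions sparse and prune infrequent items up front.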
Other than that, I could help out with any other write-ups on
classification, clustering, or pattern mining that you might need as
introductions to the topic at hand.


On Tue, Sep 22, 2009 at 11:04 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> The difference being that we focus on scalability.  This might involve
> Hadoop for some, all, or none of the steps.
>
> My definition of scalable is "handles data as big as nearly anybody
> produces".  That may or may not require Hadoop.  Many on-line learning
> systems are so fast that a single machine can munch near-Google-scale
> amounts of data in a few hours.  Many other algorithms might require
> Hadoop for an aggregation step, but nothing else.  Other algorithms
> might depend on a cluster of Lucene nodes.
>
> In any case, I think that the focus of Mahout should be scalable learning.
> Period.
>
> The methods used should be drawn from a useful toolkit which prominently
> includes Hadoop.  And Lucene.  And some linear algebra stuff.  And Taste.
>
> This leaves open whether the focus of the book should be scalable learning
> or whether it should be learning with Hadoop.
>
> On Tue, Sep 22, 2009 at 10:18 AM, Sean Owen <sro...@gmail.com> wrote:
>
> > The difference being, not emphasizing Hadoop? I understand that. I
> > also recall we'd agreed that we were not realistically considering any
> > other distributed processing framework in the near future, which I
> > took to mean before v1.0?
> >
> > On Tue, Sep 22, 2009 at 11:59 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> > > I would amend that (again) to clustering, classification, and
> > > recommendations at scale.  With Hadoop where necessary.
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
