I could help out with the internals of CBayes/Bayes, FPGrowth (if it is ready by then), and write-ups or how-tos on improving efficiency on different datasets: how to understand your data, how to enable or disable the various parameters of CBayes/Bayes to fit non-text data, and sparse vs. dense databases in frequent pattern mining. Beyond that, I could help with any other write-ups on classification, clustering, or pattern mining that you might need as introductions to the topic at hand.
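To make the sparse-vs-dense distinction concrete, here is a small, purely illustrative sketch (not Mahout code; the item names and helper functions are hypothetical) showing the two shapes of transaction database that frequent pattern mining has to cope with:

```python
# Hypothetical illustration of sparse vs. dense transaction databases
# for frequent pattern mining. Plain Python, no Mahout dependency.
from collections import Counter

# Sparse: large item vocabulary, each transaction touches only a few items
# (typical of retail basket data).
sparse_db = [
    {"milk", "bread"},
    {"beer", "diapers"},
    {"eggs", "cheese"},
    {"milk", "apples"},
]

# Dense: small vocabulary, most items appear in most transactions
# (typical of discretized feature data).
dense_db = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
]

def support_counts(db):
    """Count how many transactions each item occurs in (its support)."""
    counts = Counter()
    for txn in db:
        counts.update(txn)
    return counts

def density(db):
    """Fraction of (transaction, item) cells that are filled."""
    items = set().union(*db)
    return sum(len(t) for t in db) / (len(db) * len(items))

print(support_counts(sparse_db))
print(density(sparse_db))  # low: few items per transaction
print(density(dense_db))   # high: most items in most transactions
```

On dense data, long frequent itemsets abound and the search space explodes, so algorithm and parameter choices (minimum support, in particular) that work on sparse retail data can behave very differently; that contrast is what the write-up would explore.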
On Tue, Sep 22, 2009 at 11:04 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> The difference being that we focus on scalable. This might involve Hadoop
> for some, all, or none of the steps.
>
> My definition of scalable is "handles data as big as nearly anybody
> produces". That may or may not require Hadoop to do. Many on-line learning
> systems are so fast that a single machine can munch near Google-scale
> amounts of data in a few hours. Many other algorithms might require Hadoop
> for an aggregation step, but nothing else. Other algorithms might depend on
> a cluster of Lucene nodes.
>
> In any case, I think that the focus of Mahout should be scalable learning.
> Period.
>
> The methods used should be drawn from a useful toolkit which prominently
> includes Hadoop. And Lucene. And some linear algebra stuff. And Taste.
>
> This leaves open whether the focus of the book should be scalable learning
> or whether it should be learning with Hadoop.
>
> On Tue, Sep 22, 2009 at 10:18 AM, Sean Owen <sro...@gmail.com> wrote:
>
> > The difference being, not emphasizing Hadoop? I understand that. I
> > also recall we'd agreed that we were not realistically considering any
> > other distributed processing framework in the near future, which I
> > took to mean before v1.0?
> >
> > On Tue, Sep 22, 2009 at 11:59 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > I would amend that (again) to clustering, classification and
> > > recommendations at scale. With Hadoop where necessary.
>
> --
> Ted Dunning, CTO
> DeepDyve