Re: Mahout book

Sean Owen Tue, 22 Sep 2009 09:48:49 -0700

Good, glad to hear there is interest in making this book happen. I
agree, the very first step I'm going through with the publisher is
trying to answer these questions.

I sense some consensus that Mahout v1.0 is primarily clustering,
classification and recommendations at scale using Hadoop. That's a
pretty good mission statement.

Should the book also be a tutorial on these techniques, and on Hadoop?
There are already books like Collective Intelligence in Action, and
Hadoop: The Definitive Guide. My sense is the book shouldn't try to
duplicate these, though it's unavoidable to cover these topics
partially. From an initial conversation, it seemed like the most
available and useful niche would be to focus specifically on Mahout
(of course) and focus on practice as opposed to theory. So rather than
explain Hadoop -- walk through all aspects of running a big clustering
job with Mahout on Hadoop.

I think the intended audience you mention is right. It's for people
that are either already experienced engineers, or already familiar
with these techniques, but not both. A practical cookbook fills in the
gaps for either group.

Zaki, why don't I forward you the outline I am writing for the
publisher when I finish it, shortly?

The parts that currently would need the most help are sections on
clustering and classification. Obviously I can cover recommender
engines.

On Tue, Sep 22, 2009 at 11:35 AM, zaki rahaman <[email protected]> wrote:
> I've been a longtime lurker and I'm still getting used to the ins and outs
> of using Mahout (I've made some hacks to source in my own environment and
> have done some testing, but nothing in production yet) but I'd love to help
> out on a book, maybe with some of the background material. Maybe I'm the
> only one who feels this way, but any Mahout book should have some basic
> introductory background material -- some discussion about machine learning
> (classification, clustering), high level overviews of algorithms, and maybe
> some case studies/examples (why use Mahout vs. other tools?). And of course,
> the standard Intro chapter on MapReduce, HDFS, and the rest of the Hadoop
> environment (including deploying on EC2/S3). Again, it's probably best to
> sort out what does/doesn't belong, but first I think it would be a useful
> exercise to figure out who the intended audience really is. In my mind I
> would break it down into a few possibilities:
>
> 1. Java developers looking to incorporate ML algorithms into their existing
> projects/software.
> 2. People from more of an academic background well versed in ML, IR, NLP,
> etc. who are looking for an efficient and scalable software tool to use.
> 3. Devs from a non-Java environment (obv no one is going to write a
> beginner's Java guide, but highlighting parts of the API that may be able to
> interface with other tools -- I have a small library of python wrappers I
> use to set up and run some routine tasks)
>
> On Tue, Sep 22, 2009 at 12:17 PM, Sean Owen <[email protected]> wrote:
>
>> As I mentioned to some of you, there's a proposal to begin work on a
>> book on Mahout. It sounds early, but the publisher assures me it's
>> about the right time to begin, if we want a book out at roughly the
>> time '1.0' rolls out in a year or so. I've heard support for the idea,
>> and think it's a good thing.
>>
>> I'm going to move forward drafting a proposal and draft outline of
>> such a thing. It seems so far I am the (only?) one interested in
>> significant work in writing such a thing, which is cool, so I can
>> drive this -- but I'd be concerned if it were just me speaking for the
>> project book. Hence:
>>
>> - Who else might be interested in being a co-author and putting in
>> significant work?
>> - Would anyone care to read the proposal before I send it in?
>> - Would anyone help me, in the short term, draft an outline of the
>> content of the classification and clustering sections?
>>
>> Sean
>>
>
>
>
> --
> Zaki Rahaman
>
