Okay, I have brain freeze from reading the email below :-)

I think PLSI will do (or is at least a great starting point) for what I want. 
I am looking at a Hadoop install with Mahout on top; is there any need for 
Lucene?

Also, is there a "dummies" guide to all these algorithms, i.e. which are 
clustering algorithms, which are for indexing, which are for "abc"? I am 
reading a ton of information and am not 100% sure which categories they all 
fit into. I hope the question is not too vague.

Paul




________________________________
From: Ted Dunning <[email protected]>
To: [email protected]
Sent: Wednesday, 17 June, 2009 7:36:48
Subject: Re: newbie question: LSA anaylsis + others

Indeed there is.  And Prasenjit is being properly modest by not pointing out
that this was due to his efforts.

This is a great example of how terse a language like Pig can make many
problems that involve a bunch of counting.  Most EM-like algorithms fit into
this category including k-means, HMM fitting, Dirichlet Process mixture
modeling and lots of others.
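
To illustrate the "bunch of counting" point, here is a minimal sketch of a
single k-means iteration in plain Python. It is not the Pig code from
MAHOUT-106, and the function name kmeans_step is made up for this example.
The assignment loop plays the role of the map phase, and the per-cluster
counts and sums play the role of the reduce phase.

```python
from collections import defaultdict


def kmeans_step(points, centroids):
    """One k-means iteration written as 'assign, then count and sum'.

    Hypothetical helper for illustration only; points and centroids are
    sequences of equal-length numeric tuples or lists.
    """
    dims = len(centroids[0])
    sums = defaultdict(lambda: [0.0] * dims)  # per-cluster coordinate sums
    counts = defaultdict(int)                 # per-cluster point counts

    for p in points:
        # "Map" side: pick the nearest centroid by squared Euclidean distance.
        k = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        # "Reduce" side: nothing but counting and summing per cluster.
        counts[k] += 1
        sums[k] = [s + x for s, x in zip(sums[k], p)]

    # New centroid = sum / count; an empty cluster keeps its old centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]
```

Calling kmeans_step repeatedly until the centroids stop moving gives the
usual algorithm. The other EM-like methods follow the same pattern, with
fractional (expected) counts in place of the hard assignments used here.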

The problem in my mind is that it is difficult to tie all of the little
scripts together coherently.  Prasenjit did this using Python, but there is
still no cohesive whole to the resulting program, even if the result is much
smaller and probably easier to understand than a large Java program.

On Tue, Jun 16, 2009 at 11:07 PM, prasenjit mukherjee
<[email protected]> wrote:

> Well, there is a PLSI implementation using Pig (over Hadoop) as a Mahout
> patch: https://issues.apache.org/jira/browse/MAHOUT-106
>
>



      
