Okay, I have brain freeze from reading the email below :-) I think PLSI will do (or is a great starting point) for what I want. I am looking at a Hadoop install with Mahout on top; is there any need for Lucene?

Also, is there a "dummies" guide to all these algorithms, i.e. which are clustering algorithms, which are for indexing, which are for "abc"? I am reading a ton of information and am not 100% sure which categories they all fit into. I hope the question is not too vague.

Paul

________________________________
From: Ted Dunning <[email protected]>
To: [email protected]
Sent: Wednesday, 17 June, 2009 7:36:48
Subject: Re: newbie question: LSA anaylsis + others

Indeed there is. And Prasenjit is being properly modest by not pointing out that this was due to his efforts.

This is a great example of how terse a language like Pig can make many problems that involve a bunch of counting. Most EM-like algorithms fit into this category, including k-means, HMM fitting, Dirichlet process mixture modeling and lots of others.

The problem in my mind is that it is difficult to tie all of the little scripts together coherently. Prasenjit did this using Python, but there is still no cohesive whole to the resulting program, even if the result is much smaller and probably easier to understand than a large Java program.

On Tue, Jun 16, 2009 at 11:07 PM, prasenjit mukherjee <[email protected]> wrote:

> Well, there is a PLSI implementation using Pig (over Hadoop) as a Mahout
> patch: https://issues.apache.org/jira/browse/MAHOUT-106
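
As a rough illustration of the remark in Ted's reply that EM-like algorithms mostly come down to counting, here is a minimal Python sketch of one k-means iteration written as an assign-then-aggregate step. It is not taken from MAHOUT-106 or from Prasenjit's scripts; the function name `kmeans_step` and the toy data are made up for the example.

```python
from collections import defaultdict


def kmeans_step(points, centroids):
    """One k-means iteration phrased as 'assign, then count and sum per group'."""
    dim = len(centroids[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)

    for p in points:
        # E-like step: index of the nearest centroid (squared Euclidean distance).
        k = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        # The "counting" part: accumulate per-cluster sums and counts,
        # i.e. the kind of group-by aggregation a language like Pig keeps terse.
        counts[k] += 1
        sums[k] = [s + a for s, a in zip(sums[k], p)]

    # M-like step: new centroid = sum / count; an empty cluster keeps its old centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


# Hypothetical toy data: two obvious groups in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_step(points, centroids)
```

Only the per-cluster sums and counts need to be gathered across the data, which is why this family of algorithms maps naturally onto Hadoop-style aggregation; the outer loop that repeats the step until the centroids stop moving is the part that has to be tied together by a driver script.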
