Hi to one and all
This is my first time on this list. I have read through the wiki, FAQ and other
docs, but before I dive further into Mahout I have a few questions, or should I
say clarifications.
I am looking for a system which would allow me to:
1. Take a set of words.
2. Build clusters of these words, i.e. work out the semantic relationships
between the words (I guess I could use WordNet as a starter).
3. Once clusters of words have been formed, also work out the relationships
between the clusters themselves.
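
To make steps 1 to 3 concrete, here is a rough sketch in plain Python of what I have in mind. It is not Mahout code. The word vectors are made-up counts of how often each word appears near four hypothetical context features, and the 0.8 threshold is an arbitrary assumption. Words are grouped when their cosine similarity passes the threshold, and the clusters are then compared through their centroids.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_words(vectors, threshold):
    """Single-link grouping: a word joins every cluster that contains
    at least one word it is similar enough to (merging them if several)."""
    clusters = []
    for word in vectors:
        matches = [c for c in clusters
                   if any(cosine(vectors[word], vectors[o]) >= threshold for o in c)]
        merged = [word]
        for c in matches:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return [sorted(c) for c in clusters]

def centroid(cluster, vectors):
    """Mean vector of all words in a cluster."""
    dims = len(vectors[cluster[0]])
    return [sum(vectors[w][i] for w in cluster) / len(cluster) for i in range(dims)]

# Made-up counts over hypothetical context features: colour, paint, animal, pet
vectors = {
    "red":     [5, 3, 0, 0],
    "crimson": [4, 2, 0, 0],
    "scarlet": [3, 3, 0, 1],
    "dog":     [0, 0, 4, 5],
    "cat":     [0, 1, 5, 4],
}

clusters = cluster_words(vectors, threshold=0.8)  # assumed threshold
# Relationship between the two clusters, measured on their centroids
sim = cosine(centroid(clusters[0], vectors), centroid(clusters[1], vectors))
print(clusters, round(sim, 3))
```

With this toy data the colour words end up in one cluster and the animal words in another, and the similarity between the two clusters is low. On real data I assume the vectors would come from document co-occurrence counts or from WordNet.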
So in essence I could work out that red is similar to crimson, and hence a
search on red would return docs with crimson in them even though red is not
mentioned.
Would Mahout work here?
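
The red/crimson case can be sketched as a query expansion over a tiny inverted index. This is only an illustration in plain Python under my own assumptions: the documents and clusters are invented, and `build_index` and `expanded_search` are hypothetical helper names, not anything from Mahout or Lucene.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def expanded_search(term, index, clusters):
    """Search for the term plus every word that shares a cluster with it."""
    related = {term}
    for cluster in clusters:
        if term in cluster:
            related |= set(cluster)
    hits = set()
    for word in related:
        hits |= index.get(word, set())
    return sorted(hits)

# Invented example data
docs = {
    "d1": "the crimson curtain",
    "d2": "a red door",
    "d3": "the black cat",
}
clusters = [{"red", "crimson", "scarlet"}, {"cat", "dog"}]

index = build_index(docs)
results = expanded_search("red", index, clusters)
print(results)
```

Here a search on red returns d1 as well as d2, although d1 only mentions crimson. A term that belongs to no cluster is searched as-is.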
Of course, prior to this there is the problem of cleaning up the data, i.e.
stemming etc.
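
By cleaning up I mean something along the lines of the sketch below: lowercase the text, drop punctuation and stop words, and strip common suffixes. The stop-word list and the suffix rules are assumptions of mine and deliberately crude. This is not the Porter stemmer or any stemmer from a real library.

```python
import re

# Tiny, assumed stop-word list and suffix list, for illustration only
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is"}
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(token):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def clean(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

cleaned = clean("The painted doors, and the walls!")
print(cleaned)
```

The minimum-length check is there so that a short word such as red is not cut down to r just because it ends in "ed".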
Now I have read several detailed papers on clustering, ranking, etc., and of
course some algos are better than others, but to me a platform like Mahout
seems interesting since you can deploy the existing ones in the system and
add others later on.
Looking at the algorithms, it seems that LSI (PLSI) has not been implemented
yet. If so, which other algo would "suffice" in this case? Admittedly my
knowledge of algos is poor, to say the least :-). Also, where would Lucene fit
in (if it does)? Would it be used to search the results after the algos have
been applied, since it seems that Lucene just uses a weighting system to create
the index? Or can Mahout do it all?
As you can see, I am confused, but this is my first pass at this system.
tks
Paul
P.S. Are any of the algos feedback algos, i.e. could someone improve the
results using user feedback?