There seem to be quite a few alternatives around. I would be interested in comments on the following:- The work at NITLE <http://www.nitle.org/tools/semantic/search.htm> using Contextual Network Search (CNS) a graph-based alternative to LSI. This work *[PDF]* An Introduction to *Random* Indexing<http://www.sics.se/%7Emange/> found on Sahlgren's sics site. The former is advanced and feature rich. It would be complex to integrate into Lucene and would require more or less starting from scratch. Sahlgren doesn't seem to have an open source implementation and if there were one it may have to be rewritten. Nevertheless the key here is that it deals with the problem of matrix reduction intrinsically in the model and so is less computationally intensive, I think. It also deals with the problem of adding and subtracting from the matrix, as I understand it and without a computational penality. What do others think? Are Kanerva (the professor who developed these analytical techniques) and e.g. Sahlgren's methods available for implementation into an open source project? Adam
On 13/12/05, Dave Kor <[EMAIL PROTECTED]> wrote: > > On 12/13/05, Ian Soboroff <[EMAIL PROTECTED]> wrote: > > Paul Libbrecht <[EMAIL PROTECTED]> writes: > > > > > We're also thinking about implementing something similar to LSI within > > > ActiveMath which is lucene-powered where both formulae and text > > > searching would benefit of the latent-semantic-similarity. I've been > > > refrained of doing "exactly this" at least since LSI is patented. This > > > might also be a reason why there's no implementation in Lucene's > > > sandbox. > > > > > > Have you looked at other vector-based approaches which are not exactly > LSI ? > > > Have you looked at InfoMap NLP ? > > > > Look for Thomas Hofmann's "probabilistic LSI", and other recent work > > which cites it. > > You might also be interested in "Latent Dirichlet Allocation (LDA)" by > David Blei. In short, it is a more advanced version of "probabilistic > LSI". I am currently writing some code to dump Lucene documents into a > file format used by Blei's LDA implementation written in C. > > > Regards, > Dave. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >