My approach uses the same technique, but I'm mostly using HAG clustering. I did manage to add clustering support to a Lucene-based application (a customized solution), but I'd like to try to create a 'general purpose' library. I know it ain't easy! I've run into many scaling issues, but I saw that with optimized algorithms you can get pretty good results. Reading Carrot2- and Lucene-related messages, I figured out that I can cluster only the first n results, avoiding performance issues that way.

Lucene offers good support for a clustering framework based on tf-idf analysis (I haven't looked at k-means or EM so far). The most interesting problem is designing the architecture for such a system, so that it is general purpose but also very efficient.

Thanks,
Lorenzo
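
To make the "cluster only the first n results" idea concrete, here is a minimal sketch in plain Python rather than against the Lucene API. It assumes whitespace tokenisation, a simple tf-idf weighting, and average-link agglomerative clustering with a similarity cutoff. The names `tfidf_vectors`, `cosine`, `cluster_top_n` and the `threshold` value are made up for illustration; they are not part of Lucene or Carrot2, and this is not the customized implementation mentioned above.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build unit-length tf-idf vectors (as dicts) for tokenised documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw tf times (ln(N / df) + 1).
        vec = {t: tf[t] * (math.log(n_docs / df[t]) + 1.0) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster_top_n(ranked_texts, n=50, threshold=0.2):
    """Average-link agglomerative clustering over the first n hits only.

    Returns a list of clusters, each a sorted list of indices into
    ranked_texts. Hits beyond position n are never looked at, which keeps
    the quadratic clustering cost bounded regardless of the result size.
    """
    top = [text.lower().split() for text in ranked_texts[:n]]
    vecs = tfidf_vectors(top)
    clusters = [[i] for i in range(len(vecs))]

    def avg_sim(c1, c2):
        total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, best_pair = s, (x, y)
        # Stop merging once the closest pair is too dissimilar.
        if best < threshold:
            break
        x, y = best_pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


if __name__ == "__main__":
    hits = [
        "apache lucene search index",
        "lucene index search engine",
        "summer holiday beach travel",
        "beach travel summer sun",
    ]
    print(cluster_top_n(hits, n=4))
```

The pairwise search is deliberately naive (roughly cubic in n), which is only acceptable because n is capped; a general purpose library would want a smarter merge strategy and would read term statistics from the index instead of re-tokenising the text.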
On 6/8/05, Daniel Stephan <[EMAIL PROTECTED]> wrote:
>
> I am currently writing something about text retrieval using EM clustering.
> The approach represents documents as high-dimensional vectors, but still it
> is not related to Lucene (yet?).
> How would you add clustering to Lucene? I think it may be a very
> interesting technique to improve search results. If it works. My current
> experience shows that it scales rather badly for larger document
> collections.
>
> I don't think I will take part in Google's SoC, as I have my own "summer
> of code" right now. But I would surely like to take part in discussions
> about that topic, or at least read them and throw 2 cents in now and then.
>
> cheers
> Daniel
>
>
> Lorenzo wrote:
>
> >Some people just replied, but I forgot the most important thing...
> >I'm thinking of this project as part of Google's Summer of Code program,
> >so I'm looking for other students.
> >I've sent an email to Erik and he told me that we can propose this as part
> >of Google's SoC if we find some other people interested in it.
> >Lorenzo
> >
> >On 6/7/05, Lorenzo <[EMAIL PROTECTED]> wrote:
> >
> >>I'm writing this message trying to find some people interested in creating
> >>a 'general purpose' Lucene search results clustering extension.
> >>I wrote a simple implementation of clustering, and I would like to
> >>contribute to Lucene development by releasing an open source clustering
> >>implementation. I know that maybe each project needs a different
> >>implementation, but that would be a useful basis for everyone to develop
> >>their own project.
> >>Is anyone interested in it?
> >>Lorenzo