-----Original Message-----
From: Joaquin Delgado
Sent: Friday, January 28, 2005 4:41 PM
To: 'Lucene Developers List'; [EMAIL PROTECTED]
Subject: RE: -> Grouping Search Results by Clustering Snippets:

This is a very interesting thread. Below is a link to a paper I published many years ago (1998) about RAAP, a bookmark recommender system:

DELGADO, J., ISHII, N. and URA, T., "Content-based Collaborative Information Filtering: Actively Learning to Classify and Recommend Documents" in M. Klusch, G. Weiß (Eds.): (1998) Cooperative Information Agents II. Learning, Mobility and Electronic Commerce for Information Discovery on the Internet. Springer-Verlag, Lecture Notes in Artificial Intelligence Series No. 1435.
http://www.triplehop.com/pdf/cia-final.pdf

Regarding the clustering technique, I'd like your opinion on the topic clustering you can find at
http://www.find.com
This one uses title and snippets from the external engines and "concepts" extracted from documents at indexing time.

And for those interested (and willing to read a big chunk of good old stuff about information filtering and recommender systems :-) you can also access my 2000 PhD thesis at:
http://www.triplehop.com/pdf/Doctoral_Thesis.pdf

Cheers,
-- Joaquin

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, January 28, 2005 2:47 PM
To: Lucene Developers List; [EMAIL PROTECTED]
Subject: RE: -> Grouping Search Results by Clustering Snippets:

This is very much of interest to me. Although it's not in the UI, I did integrate Lucene and Carrot2 in Simpy ( http://www.simpy.com ). Clustering is currently triggered only by a search. Although you may not be able to tell (again, sucky UI), Simpy is designed in a way that will let me hook in a recommender system, much like the one you describe. Users store links in their Simpy accounts, tag them, perform searches, find other users, add them to their Topics (a Simpy-specific thing), and so on, so there is a lot of knowledge about a user that can be derived from all that.
Currently, the only quasi-smart thing that goes beyond a simple search is 'More users like this', and even that has a small bug that I need to fix for the next release, but what you are describing sounds very much like one of the directions in which I want to take Simpy and its users. :)

Otis

--- Adam Saltiel <[EMAIL PROTECTED]> wrote:
> This has been implemented in open source, but not with Lucene?
> http://www.cs.put.poznan.pl/dweiss/carrot/
> and
> http://carrot2.sourceforge.net/
> David Weiss is a Polish academic at Poznan University, Poland. He and others have implemented a servlet-based web app that uses pipelined components that communicate over HTTP and implement a couple of clustering algorithms.
> Clustering, of course, can go way beyond search result presentation, and there are some very suggestive examples at
> http://www.sics.se/humle/socialcomputing/
> where the encore project (Martin Svennson) is based on orthogonal transformations of a large sparse matrix (a possible method for matrix dimension reduction). I think it would be interesting to hook a recommender system into Lucene, so that clustering would take place on the basis of a user profile, which may be built up automatically by accumulating clicks and comparing them to those of other visitors, with some intelligent weighting of node inputs.
> This calls into question what a search really is: does it have to be instigated by the user, or might their context and history suggest enough to pull in additional material? So this would be on top of snippets and would also influence which snippets are returned as well as their presentation.
> Cooler still would be to be able to recognise the user without a login. This might be implemented with cookies, but to deal with the user in terms of types of interests, a series of faceted profiles, so that portals could become fluidly dynamic. Sounds far-flung, but I actually think it is just round the corner.
> Let me know if this is of interest.
>
> Adam
>
> > -----Original Message-----
> > From: integer [daniel prawdzik] [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, January 26, 2005 5:17 PM
> > To: lucene-dev@jakarta.apache.org
> > Subject: -> Grouping Search Results by Clustering Snippets:
> >
> > Grouping Search Results by Clustering Snippets:
> >
> > Search engines typically present their results as long unsorted lists. Finding the page you're looking for is often time-consuming and unsatisfying.
> > Showing the results in groups of similar topics is a much more suitable way to give a user a quick overview of the results. This can be done with a technique called cluster analysis. I'm currently working on my diploma (master's) thesis on this topic. In my view, it's too nice to be written just for the archive, so I want to implement this feature in open-source software. The coding of this program has already gone pretty far; I've done some tests, and the results are impressive and might still get better [you can see some results at http://www.trist.de/CV/Text-Mining/ -> sorry, only in German].
> >
> > To make a long story short:
> > I'm wondering if this is an attractive feature for the Lucene community?
> >
> > regards,
> > integer
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
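
The idea that runs through this thread, grouping search results by clustering their snippets, can be sketched in a few lines. The Python below is only an illustration under stated assumptions; it is not the algorithm used by Carrot2, find.com, or the thesis project mentioned above. It assumes snippets are plain strings, represents each one as a bag-of-words term-count vector, and uses simple single-pass ("leader") clustering with cosine similarity. The stop-word list, the `threshold` value, the function names, and the example snippets are all hypothetical choices made for the sketch.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def term_vector(snippet):
    """Lower-case the snippet, tokenize it, drop stop words, count terms."""
    tokens = re.findall(r"[a-z0-9]+", snippet.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_snippets(snippets, threshold=0.3):
    """Single-pass clustering of snippets.

    Each snippet joins the most similar existing cluster if the similarity
    to that cluster's summed term vector reaches `threshold`; otherwise it
    starts a new cluster. Returns a list of clusters, each a list of
    indices into `snippets`.
    """
    clusters = []  # each entry: (summed term vector, list of snippet indices)
    for i, snippet in enumerate(snippets):
        vec = term_vector(snippet)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append((Counter(vec), [i]))
        else:
            best[0].update(vec)  # fold the snippet into the cluster vector
            best[1].append(i)
    return [indices for _, indices in clusters]


# Made-up snippets for an ambiguous query such as "jaguar".
EXAMPLE_SNIPPETS = [
    "Jaguar car dealers and luxury car prices",
    "Jaguar big cat habitat in the rainforest",
    "Used Jaguar car prices and dealers near you",
    "The jaguar is a big cat of the rainforest",
]

print(cluster_snippets(EXAMPLE_SNIPPETS))  # [[0, 2], [1, 3]]
```

In a Lucene-based setup the snippets would presumably come from the titles and highlighted fragments of the top hits; that wiring is left out here. A single-pass scheme is order-dependent and crude compared with the approaches discussed in the thread, but it shows the basic shape: vectorize snippets, compare them, and present the groups instead of one long list.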