Relevance Feedback (2)
Hello group,

I would like to implement relevance feedback functionality for my system. From previous discussion in this group I know that this is not implemented in Lucene.

Relevance feedback has two parts: 1) term reweighting and 2) query expansion. I am interested in doing both. My first thought was that term reweighting could be handled with term boosting, and expansion by basically generating a new query. Looking closely at one of the classic term reweighting formulas (Rocchio), however, reveals that I need access to the term vectors of the relevant as well as the non-relevant documents. Translated to Lucene, this means I need the weight of each term in the relevant and non-relevant documents in order to evaluate the reweighting formula.

Concretely, I would need to extract Documents from the Hits object after the search, and from those Documents get all terms and their weights. However, Lucene does not provide this: only Documents and their scores can be retrieved. There is no access to a document's terms, and therefore no access to term weights.

Does anybody have an idea for a workaround for term reweighting and query expansion that does not go through Hits? Has somebody already produced such a workaround and could provide it to me?

Thank you very much in advance,
Karl

--
+++ GMX - die erste Adresse für Mail, Message, More +++
Bis 31.1.: TopMail + Digicam für nur 29 EUR http://www.gmx.net/topmail

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
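[For readers of the archive: the Rocchio update Karl refers to can be computed entirely outside Lucene once term vectors are obtained from somewhere. Below is a minimal sketch in modern Java; the class and method names are made up for illustration, plain maps stand in for document term vectors, and the lists are assumed non-empty.]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Rocchio {
    // Classic Rocchio reweighting:
    //   q' = alpha*q + beta*avg(relevant vectors) - gamma*avg(non-relevant vectors)
    // Each vector maps a term to its weight in that document or query.
    public static Map<String, Double> reweight(
            Map<String, Double> query,
            List<Map<String, Double>> relevant,
            List<Map<String, Double>> nonRelevant,
            double alpha, double beta, double gamma) {
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : query.entrySet())
            result.merge(e.getKey(), alpha * e.getValue(), Double::sum);
        for (Map<String, Double> doc : relevant)
            for (Map.Entry<String, Double> e : doc.entrySet())
                result.merge(e.getKey(), beta * e.getValue() / relevant.size(), Double::sum);
        for (Map<String, Double> doc : nonRelevant)
            for (Map.Entry<String, Double> e : doc.entrySet())
                result.merge(e.getKey(), -gamma * e.getValue() / nonRelevant.size(), Double::sum);
        // Terms pushed to zero or below by non-relevant evidence are usually dropped.
        result.values().removeIf(w -> w <= 0.0);
        return result;
    }
}
```

The resulting term-to-weight map can then be turned back into a boosted query, which is exactly the step where Lucene's lack of term-vector access bites.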
Re: Relevance Feedback (2)
Hello all,

Oh, I just found a mail from Doug where he wrote that Dmitry Serebrennikov developed something that provides document vector access:

"Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a substantial extension to Lucene which should help folks doing this sort of research. It provides an explicit vector representation for documents. This way you can, e.g., retrieve a number of documents, efficiently sum their vectors, then derive a new query from the sum. This code was posted to the list a long while back, but is now out of date. As soon as the 1.2 release is final, and Dmitry has time, he intends to merge it into Lucene."

Who has this code? Could somebody email it to me? I would highly appreciate it. Is there any attempt by Dmitry or somebody else to adapt it to Lucene 1.3?

I wish you all a nice weekend,
Karl
Query reformulation (Relevance Feedback) in Lucene?
Hello group of Lucene users,

Query reformulation is understood to be an effective way to improve retrieval power significantly. The theory teaches us that it consists of two basic steps: a) query expansion (with new terms), and b) reweighting of the terms in the expanded query. User relevance feedback is the most popular strategy for performing query reformulation, because it is user centered.

Does Lucene generally support this approach? In particular I am wondering whether 1) there are classes which directly support query expansion, or 2) I would need to do some programming on top of more generic parts?

I do not know about 1). All I know about 2) is what I think could work, with no evidence that it actually does :-) I think query expansion with new terms is easy: I would just need to create a new QueryParser object with the existing terms plus the top n (most frequent) terms of the documents the user judged relevant. Then I would have an expanded query (a). However, I do not know how I can reweight these terms. When I formulate the query I do not actually know their weights, since that is done internally.

Does anybody have any ideas? Did anybody try to solve this and have some examples which he/she would like to provide?

Cheers,
Ralf
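[The expansion step Ralf sketches, picking the top n most frequent terms from the user-judged relevant documents and appending them to the query, can be illustrated without any Lucene-specific code. In this sketch the class and method names are invented, and documents are assumed to already be tokenized into term lists by whatever analyzer was used at index time:]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryExpansion {
    // Pick the top-n most frequent terms across the user-marked relevant documents.
    public static List<String> topTerms(List<List<String>> relevantDocTokens, int n) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> doc : relevantDocTokens)
            for (String t : doc)
                freq.merge(t, 1, Integer::sum);
        return freq.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Append the expansion terms to the original query string for re-parsing.
    public static String expand(String originalQuery, List<String> newTerms) {
        return originalQuery + " " + String.join(" ", newTerms);
    }
}
```

The expanded string can then be handed back to a QueryParser; the open question in the thread is only step (b), how to reweight the added terms.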
RE: Query reformulation (Relevance Feedback) in Lucene?
There is no direct support in Lucene for this. There are several strategies for automatic query expansion, and most of them rely on either extensive domain-specific analysis of the top N documents (on the assumption that the search engine performs well enough to guarantee that the top N documents are all relevant), or on a special domain-specific corpus of good documents, where the initial search is run against these hand-picked documents and their terms are mined to augment the initial query before resubmitting it to the original corpus. All of these are things you have to do yourself.

Term reweighting happens by using term boost. How much you boost by is an open question.

Herb...

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 03, 2003 6:55 AM
To: [EMAIL PROTECTED]
Subject: Query reformulation (Relevance Feedback) in Lucene?

[...]
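[Herb's "term reweighting happens by using term boost" can be done with Lucene's standard query syntax, where `term^weight` attaches a boost that QueryParser understands. A small illustrative helper (the class name is invented, and weights are assumed to be positive) that renders a reweighted term map as such a query string:]

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoostedQuery {
    // Render term weights as a query string using the "term^boost" syntax,
    // ready to be handed back to a query parser.
    public static String toQueryString(Map<String, Double> weights) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }
}
```

How the weights themselves are chosen (Rocchio, a probabilistic weight, or something ad hoc) remains, as Herb says, an open question.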
Re: Relevance Feedback with Lucene
Hi Melissa,

Please ask these questions on the mailing list. Also, I have not seen the code that Dmitry has written.

--Peter

On 4/2/02 4:44 AM, Melissa Mifsud [EMAIL PROTECTED] wrote:

> Hi, I'm an undergraduate student also doing some research in IR for my Honours thesis. I haven't looked into how I'm going to add relevance feedback to Lucene; however, my plan was to use the Rocchio method for term reweighting. I followed the topic thread in the mailing list and was wondering if you guys are still discussing the issue. Also, has anyone got hold of the extension that Doug Cutting mentioned, or got hold of Dmitry Serebrennikov [[EMAIL PROTECTED]]? Any help/tips will be appreciated!
>
> Melissa
Re: Getting the terms that matched the HitDoc? Relevance Feedback
Subject: RE: Relevance Feedback
From: Doug Cutting [EMAIL PROTECTED]
Date: Sat, 30 Mar 2002 08:51:39 -0800
To: Lucene Users List [EMAIL PROTECTED]

> Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a substantial extension to Lucene which should help folks doing this sort of research. It provides an explicit vector representation for documents. This way you can, e.g., retrieve a number of documents, efficiently sum their vectors, then derive a new query from the sum. This code was posted to the list a long while back, but is now out of date. As soon as the 1.2 release is final, and Dmitry has time, he intends to merge it into Lucene.
>
> Doug

Thanks, Doug. This is true. The code was actually intended, and is being used, for something more like what is being discussed on the Relevance Feedback thread: retrieving lists of terms based on matching documents. Terms as well as term positions are available from the API given a document.

There is also a cost for using this API. It adds three or four files per index segment, and one of them is as large as the .prx file (provided that you choose to vectorize every indexed field in the document). Another issue with the code is that it does not (yet) support use with unoptimized indexes (those with more than one segment).

Dmitry.
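[The pipeline Doug describes, retrieve a number of documents, sum their vectors, then derive a new query from the sum, is simple to sketch once per-document term vectors exist. A minimal illustration in plain Java (class and method names invented; maps stand in for the vector representation Dmitry's extension provides):]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class VectorSum {
    // Sum per-document term vectors (term -> weight) into one combined vector.
    public static Map<String, Double> sum(List<Map<String, Double>> vectors) {
        Map<String, Double> total = new HashMap<>();
        for (Map<String, Double> v : vectors)
            for (Map.Entry<String, Double> e : v.entrySet())
                total.merge(e.getKey(), e.getValue(), Double::sum);
        return total;
    }

    // Derive query terms from the summed vector: keep the k highest-weighted terms.
    public static List<String> topK(Map<String, Double> vector, int k) {
        return vector.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```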
Relevance Feedback
Hello,

I'm a graduate student in the Information School at the University of Washington. I'm currently in the process of developing a prototype online IR system, and I have been making use of Lucene's API. I'm just beginning to plan out some mechanisms for query expansion and relevance feedback. I haven't seen any mention of manual or automated query expansion or user relevance feedback in the Lucene documentation or FAQs.

Does anyone have any experience with these processes in general and/or specifically with Lucene? If so, did you have to reimplement Lucene's Scorer to incorporate the relevance measures? Any pointers to information on how practical this redesign would be? Will I be able to implement query expansion and/or relevance feedback without reconstructing some of Lucene's underlying scoring code?

Any information pointers you can give would be great.

Thanks,
Nathan
Re: Relevance Feedback
On Fri, Mar 29, 2002 at 12:11:03PM -0800, Joshua O'Madadhain wrote:

> On Fri, 29 Mar 2002, Nathan G. Freier wrote:
>> I'm just beginning to plan out some mechanisms for query expansion and relevance feedback.
>
> I've also been doing research in IR using the Lucene API. [...] If you'd like to discuss this offline (since we may be getting off the list topic), feel free to email me.

I'm curious about this topic, although I have absolutely no familiarity with it (beyond reading, many years ago, about the real estate browsing UI experiment where they let users click on inappropriate listings and refined the search based on that - a feature I've often wished for with web search engines). If you could either include me in the CC list, or send me a summary, or possibly (if others are also interested) continue the discussion here, I'd appreciate it.

Steven J. Owens [EMAIL PROTECTED]
Re: Relevance Feedback
[If this conversation is off-topic for this list, we can move it offline, but I figure since at least one other person is interested I'll keep it here for now.]

Thanks for the reply, Joshua. I'm really just beginning my studies in IR, so forgive any naivete. Please see my comments below.

> I've also been doing research in IR using the Lucene API. Regarding query expansion, I wrote my own code to do that, based on an algorithm I developed (which is a type of thesaurus-based expansion). Basically I ran the original query terms through the appropriate analyzers, decided what terms I wanted to add, and constructed my own Query from these terms (using TermQuery and BooleanQuery).

This makes sense to me. I'm looking to do something similar, but with a final step of allowing the user to select which terms (of those found to be relevant through a document relevance feedback mechanism) to actually include in the expansion. (I'm thinking OKAPI/Giraffe style here... I suppose it's been done elsewhere too, but Efthimis Efthimiadis is my reference point for this project.) Either way, I can see how this is done without interacting with Lucene's scoring/ranking code.

> While I did weight documents based on the terms they used (take a look at TermQuery.setBoost()), I didn't do relevance feedback per se. Of course, how you want to implement relevance feedback will depend on what mechanism you want to use.

I'm planning on implementing an array of term weighting/document ranking algorithms such that the user can choose which algorithm to use. (The system is for educational use, so I'm trying to develop some transparency here.) In the near future, I'd like to implement term weighting using: (1) Porter, (2) W_P-Q, (3) F4, and (4) EMIM. Each of these relies on (user-based) relevance feedback.

Now that I look at this, I wonder if these algorithms could also be implemented outside of the scoring. Perhaps I can request the relevance judgments from the user, recalculate the term weights, use the TermQuery.setBoost() method, and reiterate the search. Am I missing something? Will the term boost function properly given the term weights calculated by these algorithms?

> PS: Say hi to Wanda Pratt for me.

I will... I haven't had the opportunity to meet her yet. Have you worked with her? [Off topic... sorry.]

Nathan
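[The approach Nathan outlines, compute weights outside the scorer and feed them in via setBoost, works for the weights he lists. As one example, the F4 scheme he mentions (the Robertson/Sparck Jones relevance weight) needs only simple document counts from the relevance judgments. A sketch, with an invented class name:]

```java
public class RsjWeight {
    // Robertson/Sparck Jones "F4" relevance weight for one term:
    //   N = docs in the collection, n = docs containing the term,
    //   R = docs the user judged relevant, r = relevant docs containing the term.
    // The 0.5 corrections keep the ratios finite when counts are zero.
    public static double f4(int N, int n, int R, int r) {
        double relOdds    = (r + 0.5) / (R - r + 0.5);
        double nonRelOdds = (n - r + 0.5) / (N - n - R + r + 0.5);
        return Math.log(relOdds / nonRelOdds);
    }
}
```

Terms concentrated in the judged-relevant documents get a positive weight (suitable as a boost), while terms that appear only outside them go negative and would simply be left unboosted or dropped when the query is reiterated.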