Relevance Feedback
Hello, I'm a graduate student in the Information School at the University of Washington. I'm currently in the process of developing a prototype online IR system and I have been making use of Lucene's API. I'm just beginning to plan out some mechanisms for query expansion and relevance feedback. I haven't seen any mention of manual or automated query expansion or user relevance feedback in the Lucene documentation or FAQs. Does anyone have any experience with these processes in general and/or specifically with Lucene? If so, did you have to reimplement Lucene's Scorer to incorporate the relevance measures? Any pointers to information on how practical this redesign would be? Will I be able to implement query expansion and/or relevance feedback without reconstructing some of Lucene's underlying scoring code? Any information pointers you can give would be great. Thanks, Nathan -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
Re: Relevance Feedback
On Fri, Mar 29, 2002 at 12:11:03PM -0800, Joshua O'Madadhain wrote:
> On Fri, 29 Mar 2002, Nathan G. Freier wrote:
> > I'm just beginning to plan out some mechanisms for query expansion and relevance feedback.
>
> I've also been doing research in IR using the Lucene API. [...] If you'd like to discuss this offline (since we may be getting off the list topic), feel free to email me.

I'm curious about this topic, although I have absolutely no familiarity with it (beyond reading, many years ago, about the real estate browsing UI experiment where they let users click on inappropriate listings and refined the search based on that - a feature I've often wished for with web search engines).

If you could either include me in the CC list, or send me a summary, or possibly (if others are also interested) continue the discussion here, I'd appreciate it.

Steven J. Owens
[EMAIL PROTECTED]
Getting the terms that matched the HitDoc?
I've been looking around the org.apache.lucene.search.* code and can't seem to find an answer to this. I would like to present the terms that matched for each document in the Hits. For example, to the user it would look like:

***
Search Phrase: double blind study found injections
Search Terms: (doubl blind studi found inject)
Results:
  Doc1 doubl blind studi found inject
  Doc2 doubl blind studi found inject
  Doc3 doubl blind studi inject
  Doc4 studi inject
  ...
***

Is there a way to get the search terms that were used in the relevance scoring?

thanks,
rob
http://www.robdecker.com/
http://www.planetside.com/
Re: Relevance Feedback
[If this conversation is off-topic for this list, we can move it offline, but I figure since at least one other person is interested I'll keep it here for now.]

Thanks for the reply, Joshua. I'm really just beginning my studies in IR, so forgive any naivete. Please see my comments below.

> I've also been doing research in IR using the Lucene API. Regarding query expansion, I wrote my own code to do that, based on an algorithm I developed (which is a type of thesaurus-based expansion). Basically I ran the original query terms through the appropriate analyzers, decided what terms I wanted to add, and constructed my own Query from these terms (using TermQuery and BooleanQuery).

This makes sense to me. I'm looking to do something similar, but with a final step that lets the user select which terms (of those found to be relevant through a document relevance feedback mechanism) to actually include in the expansion. (I'm thinking OKAPI/Giraffe style here... I suppose it's been done elsewhere too, but Efthimis Efthimiadis is my reference point for this project.) Either way, I can see how this is done without interacting with Lucene's scoring/ranking code.

> While I did weight documents based on the terms they used (take a look at TermQuery.setBoost()), I didn't do relevance feedback per se. Of course, how you want to implement relevance feedback will depend on what mechanism you want to use.

I'm planning on implementing an array of term weighting/document ranking algorithms so that the user can choose which algorithm to use. (The system is for educational use, so I'm trying to develop some transparency here.) In the near future, I'd like to implement term weighting using: (1) Porter, (2) W_P-Q, (3) F4, and (4) EMIM. Each of these relies on (user-based) relevance feedback.

Now that I look at this, I wonder if these algorithms could also be implemented outside of the scoring. Perhaps I can request the relevance judgments from the user, recalculate the term weights, apply them with the TermQuery.setBoost() method, and rerun the search. Am I missing something? Will the term boost function properly given the term weights calculated by these algorithms?

> PS: Say hi to Wanda Pratt for me.

I will... I haven't had the opportunity to meet her yet. Have you worked with her? [Off topic.. sorry.]

Nathan
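The feedback loop proposed in this message (collect judgments, recompute term weights, apply them as boosts, rerun the search) can be sketched outside of any scoring code. The block below is a minimal illustration in plain Java, not Lucene code: it assumes "F4" refers to the Robertson/Sparck Jones relevance weight with the 0.5 correction, and it emits a query string in Lucene's term^boost syntax rather than calling setBoost(). The class name, method names, and counts are hypothetical. Dropping terms with non-positive weights is a simplification made here, and whether Lucene's scoring combines such boosts the way these weighting models intend remains the open question from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class FeedbackWeights {

    /**
     * Robertson/Sparck Jones relevance weight with the 0.5 correction
     * (assumed here to be what the thread calls "F4").
     * N = documents in the collection, R = documents judged relevant,
     * n = documents containing the term, r = relevant documents containing it.
     */
    static double f4(int N, int R, int n, int r) {
        double relevantOdds = (r + 0.5) / (R - r + 0.5);
        double nonRelevantOdds = (n - r + 0.5) / (N - n - R + r + 0.5);
        return Math.log(relevantOdds / nonRelevantOdds);
    }

    /**
     * Builds a "term^boost" query string from per-term counts.
     * counts maps term -> {n, r}. Terms whose weight is not positive are
     * skipped; that is this sketch's own choice, not part of the F4 formula.
     */
    static String boostedQuery(Map<String, int[]> counts, int N, int R) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            double w = f4(N, R, e.getValue()[0], e.getValue()[1]);
            if (w <= 0) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append('^')
              .append(String.format(Locale.ROOT, "%.2f", w));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical counts: 1000 documents, 10 judged relevant by the user.
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("inject", new int[] {50, 8});  // in 50 docs, 8 of them relevant
        counts.put("studi", new int[] {400, 4});  // common term, weak evidence
        counts.put("found", new int[] {50, 0});   // never seen in a relevant doc
        System.out.println(boostedQuery(counts, 1000, 10));
    }
}
```

With these made-up counts, the rare term that appears mostly in relevant documents receives by far the largest boost, and the term never seen in a relevant document is left out of the reformulated query.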
Re: Getting the terms that matched the HitDoc?
Hi Rob,

One possibility might be to use the IndexReader's termDocs(Term) method and then use the skipTo(int) method on the TermDocs object. Iterate over your document list and then your term list for each document. Each time skipTo puts you at the document you are looking for, you can add that term to the document's term list. This probably isn't the most efficient method but it should work.

Nathan

Robert A. Decker wrote:
> I would like to present the terms that matched for each document in the Hits. [...] Is there a way to get the search terms that were used in the relevance scoring?
Re: Getting the terms that matched the HitDoc?
Sorry... That should be:

1) iterate over your term list
2) for each term, iterate over your document list using the document number in the skipTo() method
3) if skipTo puts you on the current document, add term to (document,term) list

Nathan
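The three steps in this message can be sketched in plain Java. This is not Lucene code: the Postings class is a hypothetical stand-in for a TermDocs cursor over one term's sorted document numbers, and its skipTo is written on the assumption that the real method always advances past the current entry before comparing. Under that assumption the loop only skips when the cursor is still behind the wanted document; otherwise a hit the cursor already sits on would be stepped over. The hit list is assumed to be sorted by document number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MatchedTerms {

    /** Hypothetical stand-in for a cursor over one term's sorted doc numbers. */
    static class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int[] sortedDocs) { this.docs = sortedDocs; }

        boolean next() { return ++pos < docs.length; }

        int doc() { return docs[pos]; }

        /** Moves to the first later entry with doc >= target; false if none. */
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (doc() < target);
            return true;
        }
    }

    /** Returns, for each hit, the query terms that occur in that document. */
    static Map<Integer, List<String>> match(Map<String, int[]> index, int[] sortedHits) {
        Map<Integer, List<String>> byDoc = new LinkedHashMap<>();
        for (int hit : sortedHits) byDoc.put(hit, new ArrayList<>());

        for (Map.Entry<String, int[]> e : index.entrySet()) {       // step 1: each term
            Postings p = new Postings(e.getValue());
            boolean positioned = false;
            for (int hit : sortedHits) {                            // step 2: each hit
                if (!positioned || p.doc() < hit) {
                    if (!p.skipTo(hit)) break;  // term occurs in no later document
                    positioned = true;
                }
                if (p.doc() == hit) byDoc.get(hit).add(e.getKey()); // step 3: record
            }
        }
        return byDoc;
    }

    public static void main(String[] args) {
        // Made-up postings mirroring the example in the original question.
        Map<String, int[]> index = new LinkedHashMap<>();
        index.put("doubl", new int[] {1, 2, 3});
        index.put("blind", new int[] {1, 2, 3});
        index.put("studi", new int[] {1, 2, 3, 4, 7});
        index.put("found", new int[] {1, 2});
        index.put("inject", new int[] {1, 2, 3, 4, 9});
        System.out.println(match(index, new int[] {1, 2, 3, 4}));
    }
}
```

With the made-up postings in main, documents 1 and 2 list all five stemmed terms, document 3 lists everything except "found", and document 4 lists only "studi" and "inject", matching the layout in the original question.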
Re: Getting the terms that matched the HitDoc?
Thanks. This seems to work for stored fields (although I haven't completed all of the steps). However, I want it to work with an indexed, but unstored field, and I'm not getting anything back in my termDocs. There must be some way to do it because this information is what is being used in the scoring.

I'll keep poking around the source code, and if you or anyone else has any more suggestions they would be greatly appreciated.

thanks,
rob
http://www.robdecker.com/
http://www.planetside.com/