Relevance Feedback

2002-03-29 Thread Nathan G. Freier

Hello,

I'm a graduate student in the Information School at the University of 
Washington.  I'm currently in the process of developing a prototype 
online IR system and I have been making use of Lucene's API.  I'm just 
beginning to plan out some mechanisms for query expansion and relevance 
feedback.  I haven't seen any mention of manual or automated query 
expansion or user relevance feedback in the Lucene documentation or FAQs.  

Does anyone have any experience with these processes in general and/or 
specifically with Lucene?  If so, did you have to reimplement Lucene's 
Scorer to incorporate the relevance measures?  Any pointers to 
information on how practical this redesign would be?  Will I be able to 
implement query expansion and/or relevance feedback without 
reconstructing some of Lucene's underlying scoring code?

Any information or pointers you can give would be great.

Thanks,
Nathan


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: Relevance Feedback

2002-03-29 Thread Steven J. Owens

On Fri, Mar 29, 2002 at 12:11:03PM -0800, Joshua O'Madadhain wrote:
> On Fri, 29 Mar 2002, Nathan G. Freier wrote:
> > I'm just beginning to plan out some mechanisms for query expansion
> > and relevance feedback.
>
> I've also been doing research in IR using the Lucene API.
> [...]
> If you'd like to discuss this offline (since we may be getting off the
> list topic), feel free to email me.

 I'm curious about this topic, although I have absolutely no
familiarity with it (beyond reading, many years ago, about the real
estate browsing UI experiment where they let users click on
inappropriate listings and refined the search based on that - a
feature I've often wished for with web search engines).

 If you could either include me in the CC list, or send me a
summary, or possibly (if others are also interested), continue the
discussion here, I'd appreciate it.

Steven J. Owens
[EMAIL PROTECTED]






Getting the terms that matched the HitDoc?

2002-03-29 Thread Robert A. Decker

I've been looking around the org.apache.lucene.search.* code and can't
seem to find an answer to this.

I would like to present the terms that matched for each document in the
Hits. For example, to the user it would look like:

***
Search Phrase: double blind study found injections
Search Terms: (doubl blind studi found inject)

Results:
Doc1 doubl blind studi found inject
Doc2 doubl blind studi found inject
Doc3 doubl blind studi inject
Doc4 studi inject
...
***

Is there a way to get the search terms that were used in the relevance
scoring?


thanks,
rob

http://www.robdecker.com/
http://www.planetside.com/






Re: Relevance Feedback

2002-03-29 Thread Nathan G. Freier

[If this conversation is off-topic for this list, we can move it offline 
but I figure since at least one other is interested I'll keep it here 
for now.]

Thanks for the reply, Joshua.  I'm really just beginning my studies in 
IR, so forgive any naivete.  Please see my comments below.

> I've also been doing research in IR using the Lucene API.  Regarding query
> expansion, I wrote my own code to do that, based on an algorithm I
> developed (which is a type of thesaurus-based expansion).  Basically I ran
> the original query terms through the appropriate analyzers, decided what
> terms I wanted to add, and constructed my own Query from these terms
> (using TermQuery and BooleanQuery).
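
The expansion step described above could be sketched roughly as follows.
This is only an illustration in plain Java, not Joshua's algorithm: the
QueryExpander class, the thesaurus map, and the expansionBoost value are
all hypothetical names chosen for the example, and the input is assumed
to be terms that have already been run through the analyzer.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: expands analyzed query terms with thesaurus entries. */
final class QueryExpander {
    private final Map<String, List<String>> thesaurus; // term -> related terms (assumed input)
    private final float expansionBoost;                // weight given to added terms (assumed)

    QueryExpander(Map<String, List<String>> thesaurus, float expansionBoost) {
        this.thesaurus = thesaurus;
        this.expansionBoost = expansionBoost;
    }

    /** Returns term -> boost. Original terms come first and keep boost 1.0. */
    Map<String, Float> expand(List<String> analyzedTerms) {
        Map<String, Float> weighted = new LinkedHashMap<>();
        for (String term : analyzedTerms) {
            weighted.put(term, 1.0f);
        }
        for (String term : analyzedTerms) {
            List<String> related = thesaurus.getOrDefault(term, Collections.<String>emptyList());
            for (String candidate : related) {
                // Never downgrade a term the user actually typed.
                weighted.putIfAbsent(candidate, expansionBoost);
            }
        }
        return weighted;
    }
}
```

Each entry of the returned map would then become one TermQuery with its
boost set, and the TermQuery objects would be combined as optional
clauses of a single BooleanQuery.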


This makes sense to me.  I'm looking to do something similar but add in 
a final step of allowing the user to select which terms (of those that 
are found to be relevant through a document relevance feedback 
mechanism) to actually include in the expansion.  (I'm thinking 
OKAPI/Giraffe style here... I suppose it's been done elsewhere too but 
Efthimis Efthimiadis is my reference point for this project.)  Either 
way, I can see how this is done without interacting with Lucene's 
scoring/ranking code.


> While I did weight documents based on the terms they used (take a look at
> TermQuery.setBoost()), I didn't do relevance feedback per se.  Of course,
> how you want to implement relevance feedback will depend on what mechanism
> you want to use.


I'm planning on implementing an array of term weighting/document ranking 
algorithms such that the user can choose which algorithm to use.  (The 
system is for educational use so I'm trying to develop some transparency 
here.)  In the near future, I'd like to implement term weighting using: 
(1) Porter, (2) W_P-Q, (3) F4, and (4) EMIM.  Each of these relies on 
(user-based) relevance feedback.  

Now that I look at this, I wonder if these algorithms could also be 
implemented outside of the scoring.  Perhaps I can request the relevance 
judgments from the user, recalculate the term weights, use the 
TermQuery.setBoost() method, and reiterate the search.  Am I missing 
something?  Will the term boost function properly given the term weights 
calculated by these algorithms?
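
As a rough sketch of the "recalculate the term weights" step, here is the
F4 weight in its usual form with the 0.5 correction, written in plain
Java.  The class and method names are hypothetical.  One thing the sketch
makes visible is that F4 can come out negative for a term that is common
in the collection but absent from the relevant documents.  Clamping such
weights to zero before using them as a boost is an assumption made for
the example, not documented Lucene behavior.  Note also that a boost
multiplies into Lucene's own score for the term rather than replacing it,
so the final ranking would not be a pure F4 ranking.

```java
/** Hypothetical helper computing the F4 relevance weight for one term. */
final class F4Weight {
    /**
     * @param r    relevant documents that contain the term
     * @param n    all documents that contain the term
     * @param bigR relevant documents judged by the user
     * @param bigN documents in the collection
     */
    static double f4(int r, int n, int bigR, int bigN) {
        double numerator = (r + 0.5) * (bigN - n - bigR + r + 0.5);
        double denominator = (n - r + 0.5) * (bigR - r + 0.5);
        return Math.log(numerator / denominator);
    }

    /** Assumption for this sketch: negative weights are clamped to 0 before boosting. */
    static float toBoost(double weight) {
        return (float) Math.max(weight, 0.0);
    }
}
```

The value returned by toBoost() is what would be passed to setBoost() on
the term's query before rerunning the search.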


> PS: Say hi to Wanda Pratt for me.

I will... I haven't had the opportunity to meet her yet.  Have you 
worked with her? [Off topic.. sorry.]

Nathan






Re: Getting the terms that matched the HitDoc?

2002-03-29 Thread Nathan G. Freier

Hi Rob,

One possibility might be to use the IndexReader's termDocs(Term) method 
and then use the skipTo(int) method on the TermDocs object.  Iterate 
over your document list and then your term list for each document.  Each 
time skipTo puts you at the document you are looking for, you can add 
that term to the document's term list.  This probably isn't the most 
efficient method but it should work.

Nathan





Re: Getting the terms that matched the HitDoc?

2002-03-29 Thread Nathan G. Freier

Sorry... That should be:

1) iterate over your term list
2) for each term, iterate over your document list using the document 
number in the skipTo() method
3) if skipTo puts you on the current document, add term to 
(document,term) list
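
The three steps above can be sketched in plain Java as follows.
PostingsCursor is a hypothetical stand-in for the TermDocs object, not
Lucene code.  It assumes that skipTo() always moves forward to the first
later entry whose document number is at least the target, which is why
the loop only calls it when the cursor is behind the wanted document.
Because the cursor only moves forward, the document numbers of the hits
are assumed to be sorted in ascending order first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-term postings cursor (not Lucene's TermDocs). */
final class PostingsCursor {
    private final int[] docs; // ascending document numbers that contain the term
    private int pos = -1;

    PostingsCursor(int[] docs) { this.docs = docs; }

    /** Moves past the current entry to the first one >= target; false when exhausted. */
    boolean skipTo(int target) {
        do {
            pos++;
            if (pos >= docs.length) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

final class MatchedTerms {
    /** hitDocs must be sorted ascending; postings maps each query term to its documents. */
    static Map<Integer, List<String>> collect(Map<String, int[]> postings, int[] hitDocs) {
        Map<Integer, List<String>> matched = new LinkedHashMap<>();
        for (int d : hitDocs) matched.put(d, new ArrayList<>());

        for (Map.Entry<String, int[]> entry : postings.entrySet()) {   // 1) each term
            PostingsCursor cursor = new PostingsCursor(entry.getValue());
            int current = -1;
            for (int d : hitDocs) {                                     // 2) each hit document
                if (current < d) {
                    if (!cursor.skipTo(d)) break;  // term has no more documents
                    current = cursor.doc();
                }
                if (current == d) {                                     // 3) landed on the hit
                    matched.get(d).add(entry.getKey());
                }
            }
        }
        return matched;
    }
}
```

With the real API the cursor for each term would come from the
IndexReader's termDocs(Term) method, and it should be closed once the
inner loop finishes.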

Nathan





Re: Getting the terms that matched the HitDoc?

2002-03-29 Thread Robert A. Decker

Thanks. This seems to work for stored fields (although I haven't completed
all of the steps). However, I want it to work with an indexed, but
unstored field, and I'm not getting anything back in my termDocs.

There must be some way to do it because this information is what is being
used in the scoring.

I'll keep poking around the source code, and if you or anyone else has any
more suggestions they would be greatly appreciated.

thanks,
rob

http://www.robdecker.com/
http://www.planetside.com/
