Relevance Feedback (2)

2004-01-17 Thread Karl Koch
Hello group,

I would like to implement Relevance Feedback functionality for my system.
From the previous discussion in this group I know that this is not implemented
in Lucene. 

We all know that Relevance Feedback has two components, which are 
1) Term Reweighting
2) Query Expansion

I am interested in doing both. 

My first thought was that Term Reweighting could be handled with term boosting,
and expansion, well, basically by generating a new query. Looking closely at
one of the classic term reweighting formulas (Rocchio), however, reveals that I
need access to the term vector of the relevant as well as the term vector of
the non-relevant documents. Bringing this to Lucene, it would mean that I
need the score of each term in the relevant and non-relevant documents
to apply the reweighting formula.
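
For reference, a minimal sketch of that Rocchio step in plain Java, assuming the
per-document term weights are already available as maps (getting them out of
Lucene is exactly the problem below); the class and method names are made up,
and alpha, beta and gamma are the usual tuning constants:

import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Rocchio reweighting sketch (not Lucene-specific): the new query vector is
 * q' = alpha*q + beta/|Dr| * sum(relevant) - gamma/|Dnr| * sum(non-relevant).
 */
public class RocchioSketch {

    public static Map reweight(Map queryVector, List relevantVectors,
                               List nonRelevantVectors,
                               float alpha, float beta, float gamma) {
        Map result = new HashMap();
        // alpha * original query weights
        addScaled(result, queryVector, alpha);
        // + beta/|Dr| * sum of relevant document vectors
        for (Iterator it = relevantVectors.iterator(); it.hasNext();) {
            addScaled(result, (Map) it.next(), beta / relevantVectors.size());
        }
        // - gamma/|Dnr| * sum of non-relevant document vectors
        for (Iterator it = nonRelevantVectors.iterator(); it.hasNext();) {
            addScaled(result, (Map) it.next(), -gamma / nonRelevantVectors.size());
        }
        return result;   // term -> new weight; negative weights are usually dropped
    }

    private static void addScaled(Map target, Map source, float factor) {
        for (Iterator it = source.entrySet().iterator(); it.hasNext();) {
            Map.Entry e = (Map.Entry) it.next();
            String term = (String) e.getKey();
            float w = ((Float) e.getValue()).floatValue() * factor;
            Float old = (Float) target.get(term);
            target.put(term, new Float(old == null ? w : old.floatValue() + w));
        }
    }
}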

Coming back to Lucene, this would mean that I need to extract Documents from
the Hits object after the search. From these Documents I would need to get
all terms and their scores.

However, Lucene does not provide this. Only Documents and their scores can be
retrieved; there is no access to a Document's terms and therefore no access to
term scores.

Does anybody have ideas for a workaround for Term Reweighting and Query
Expansion that does not go through Hits? Has anybody already produced such a
workaround and could provide it to me?
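
One possible workaround, sketched against the Lucene 1.x IndexReader API
instead of Hits: walk the inverted index with terms()/termDocs() and collect
term frequencies for the documents that were judged relevant (or non-relevant).
The field name is a placeholder, and the document numbers are the internal ids
of the judged documents, however you obtained them.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch: recover term-frequency vectors for a set of internal document
// numbers by scanning the inverted index, since Hits itself does not
// expose per-document terms.
public class TermFrequencySketch {

    public static Map termFrequencies(IndexReader reader, Set docNumbers, String field)
            throws IOException {
        Map docToVector = new HashMap();    // Integer doc -> Map(term -> Integer freq)
        TermEnum terms = reader.terms();    // enumerate every indexed term
        try {
            while (terms.next()) {
                Term term = terms.term();
                if (!term.field().equals(field)) continue;
                TermDocs docs = reader.termDocs(term);
                try {
                    while (docs.next()) {
                        Integer doc = new Integer(docs.doc());
                        if (!docNumbers.contains(doc)) continue;
                        Map vector = (Map) docToVector.get(doc);
                        if (vector == null) {
                            vector = new HashMap();
                            docToVector.put(doc, vector);
                        }
                        vector.put(term.text(), new Integer(docs.freq()));
                    }
                } finally {
                    docs.close();
                }
            }
        } finally {
            terms.close();
        }
        return docToVector;
    }
}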

Thank you very much in advance,
Karl





Re: Relevance Feedback (2)

2004-01-17 Thread Karl Koch
Hello all,

Oh, I just found a mail from Doug in which he wrote that Dmitry Serebrennikov
has developed something that provides document vector access:

 Dmitry Serebrennikov [dmitrys@earthlink.net] has implemented a substantial
 extension to Lucene which should help folks doing this sort of research.  It
 provides an explicit vector representation for documents.  This way you can,
 e.g., retrieve a number of documents, efficiently sum their vectors, then
 derive a new query from the sum.  This code was posted to the list a long
 while back, but is now out of date.  As soon as the 1.2 release is final,
 and Dmitry has time, he intends to merge it into Lucene.

Who has this code? Could somebody email it to me? I would highly appreciate
it.

Has there been any attempt by Dmitry or somebody else to adapt it to Lucene 1.3?


I wish you all a nice weekend,
Karl




Query reformulation (Relevance Feedback) in Lucene?

2003-12-03 Thread ambiesense
Hello Group of Lucene users,

Query reformulation is understood as an effective way to improve retrieval
power significantly. The theory teaches us that it consists of two basic steps:

a) Query expansion (with new terms)
b) Reweighting of the terms in the expanded query

User relevance feedback is the most popular strategy for query reformulation
because it is user-centered. 

Does Lucene generally support this approach? In particular, I am wondering
whether:

1) there are classes which directly support query expansion OR
2) I would need to do some programming on top of more generic parts? 

I do not know about 1). All I know about 2) is what I think could work, with
no evidence that it actually does :-) I think query expansion with new terms is
easy: I would just need to build a new QueryParser query from the existing
terms plus the top n (most frequent) terms of the documents that are relevant
from the user's point of view. Then I would have an expanded query (a). However,
I do not know how I can reweight these terms. When I formulate the Query I do
not actually know their weights, since that is handled internally. Does anybody
have any idea? Has anybody tried to solve this and has some examples he/she
would like to provide?
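
For what it is worth, a rough sketch of the expansion step, using Lucene 1.x's
static QueryParser.parse() and the "^" boost syntax; the field names, index
path, expansion terms and boost values below are only placeholders for whatever
your reweighting scheme produces.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Sketch: expand the original query string with extra terms and per-term
// boosts (the "^" syntax), then search again.
public class ExpandQuerySketch {
    public static void main(String[] args) throws Exception {
        String original = "information retrieval";
        // Terms mined from documents the user marked relevant; the boosts are
        // whatever weights your reweighting scheme produced.
        String expanded = original + " feedback^1.4 rocchio^0.8";

        Query query = QueryParser.parse(expanded, "contents", new StandardAnalyzer());
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.score(i) + "\t" + hits.doc(i).get("title"));
        }
        searcher.close();
    }
}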

Cheers,
Ralf




RE: Query reformulation (Relevance Feedback) in Lucene?

2003-12-03 Thread Chong, Herb
There is no direct support in Lucene for this. There are several strategies for 
automatic query expansion, and most of them rely either on extensive domain-specific 
analysis of the top N documents, on the assumption that the search engine performs well 
enough to guarantee that the top N documents are all relevant, or on a 
special domain-specific corpus of good documents, where the initial search is run against 
these hand-picked documents and their terms are mined to augment the initial query before 
resubmitting it to the original corpus. All of this is something you have to do 
yourself. Term reweighting happens by using term boost; how much to boost by is an 
open question.
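
As a minimal sketch of that reweighting-via-boost idea, using the 1.x
BooleanQuery.add(query, required, prohibited) signature and Query.setBoost();
the field name and the weights are placeholders for whatever your formula
produces:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Sketch: build the reweighted query programmatically, one boosted
// TermQuery per term, instead of going through the query parser.
public class BoostedQuerySketch {
    public static BooleanQuery build(String field, String[] terms, float[] weights) {
        BooleanQuery query = new BooleanQuery();
        for (int i = 0; i < terms.length; i++) {
            TermQuery tq = new TermQuery(new Term(field, terms[i]));
            tq.setBoost(weights[i]);        // weight from your reweighting formula
            query.add(tq, false, false);    // optional clause, not required, not prohibited
        }
        return query;
    }
}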

Herb...

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 03, 2003 6:55 AM
To: [EMAIL PROTECTED]
Subject: Query reformulation (Relevance Feedback) in Lucene?





Re: Relevance Feedback with Lucene

2002-04-02 Thread Peter Carlson


Hi Melissa,

Please ask these questions on the mailing list.

Also, I have not seen the code that Dmitry has written.

--Peter

On 4/2/02 4:44 AM, Melissa Mifsud [EMAIL PROTECTED] wrote:

 Hi, 
  
 I'm an undergraduate student also doing some research in IR for my Honours
 thesis.
  
 I haven't looked into how I'm going to add Relevance Feedback to Lucene yet;
 however, my plan is to use the Rocchio method for term reweighting.
  
 I followed the topic thread in the mailing list and was wondering if you guys
 are still discussing the issue.
  
 Also, has anyone got hold of the extension that Doug Cutting mentioned or got
 hold of Dmitry Serebrennikov [[EMAIL PROTECTED]] ?
  
 Any help/tips will be appreciated!
  
 Melissa
 





Re: Getting the terms that matched the HitDoc? Relevance Feedback

2002-03-31 Thread Dmitry Serebrennikov



Subject: RE: Relevance Feedback
From: Doug Cutting [EMAIL PROTECTED]
Date: Sat, 30 Mar 2002 08:51:39 -0800
To: Lucene Users List [EMAIL PROTECTED]


Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a substantial
extension to Lucene which should help folks doing this sort of research.  It
provides an explicit vector representation for documents.  This way you can,
e.g., retrieve a number of documents, efficiently sum their vectors, then
derive a new query from the sum.  This code was posted to the list a long
while back, but is now out of date.  As soon as the 1.2 release is final,
and Dmitry has time, he intends to merge it into Lucene.

Doug

Thanks, Doug.

This is true. The code was actually intended, and is being used, for 
something more like what is being discussed on the Relevance Feedback 
thread - retrieving lists of terms based on matching documents. Terms as 
well as Term Positions are available from the API given a document. 
There is also a cost for using this API. It adds three or four files per 
index segment, and one of them is as large as the .prx file (provided 
that you choose to vectorize every indexed field in the document). 
Another issue with the code is that it does not (yet) support use with 
unoptimized indexes (those with more than one segment).

Dmitry.
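
Until that extension is merged, the "sum their vectors, then derive a new query
from the sum" step Doug describes can be approximated with plain maps, once
per-document term frequencies are available by some other means. The sketch
below is a rough stand-in and has nothing to do with Dmitry's actual API; the
class name, field name and topN cutoff are all made up.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Rough stand-in: add up per-document term-frequency maps and turn the
// heaviest terms into a new boosted query.
public class SumVectorsSketch {

    public static BooleanQuery queryFromSum(List termFreqMaps, String field, int topN) {
        // 1. Sum the vectors.
        Map sum = new HashMap();   // term -> Integer total frequency
        for (Iterator it = termFreqMaps.iterator(); it.hasNext();) {
            Map vector = (Map) it.next();
            for (Iterator e = vector.entrySet().iterator(); e.hasNext();) {
                Map.Entry entry = (Map.Entry) e.next();
                Integer old = (Integer) sum.get(entry.getKey());
                int add = ((Integer) entry.getValue()).intValue();
                sum.put(entry.getKey(), new Integer(old == null ? add : old.intValue() + add));
            }
        }
        // 2. Sort terms by summed frequency, descending.
        Map.Entry[] entries = (Map.Entry[]) sum.entrySet().toArray(new Map.Entry[sum.size()]);
        Arrays.sort(entries, new Comparator() {
            public int compare(Object a, Object b) {
                return ((Integer) ((Map.Entry) b).getValue()).intValue()
                     - ((Integer) ((Map.Entry) a).getValue()).intValue();
            }
        });
        // 3. Build a query from the top N terms, boosted by frequency.
        BooleanQuery query = new BooleanQuery();
        for (int i = 0; i < Math.min(topN, entries.length); i++) {
            TermQuery tq = new TermQuery(new Term(field, (String) entries[i].getKey()));
            tq.setBoost(((Integer) entries[i].getValue()).floatValue());
            query.add(tq, false, false);
        }
        return query;
    }
}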









Relevance Feedback

2002-03-29 Thread Nathan G. Freier

Hello,

I'm a graduate student in the Information School at the University of 
Washington.  I'm currently in the process of developing a prototype 
online IR system and I have been making use of Lucene's API.  I'm just 
beginning to plan out some mechanisms for query expansion and relevance 
feedback.  I haven't seen any mention of manual or automated query 
expansion or user relevance feedback in the Lucene documentation or FAQs.  

Does anyone have any experience with these processes in general and/or 
specifically with Lucene?  If so, did you have to reimplement Lucene's 
Scorer to incorporate the relevance measures?  Any pointers to 
information on how practical this redesign would be?  Will I be able to 
implement query expansion and/or relevance feedback without 
reconstructing some of Lucene's underlying scoring code?

Any pointers to information you can give would be great.

Thanks,
Nathan






Re: Relevance Feedback

2002-03-29 Thread Steven J. Owens

On Fri, Mar 29, 2002 at 12:11:03PM -0800, Joshua O'Madadhain wrote:
 On Fri, 29 Mar 2002, Nathan G. Freier wrote:
  I'm just beginning to plan out some mechanisms for query expansion
  and relevance feedback.
 
 I've also been doing research in IR using the Lucene API.
 [...]
 If you'd like to discuss this offline (since we may be getting off the
 list topic), feel free to email me.

 I'm curious about this topic, although I have absolutely no
familiarity with it (beyond reading, many years ago, about the real
estate browsing UI experiment where they let users click on
inappropriate listings and refined the search based on that - a
feature I've often wished for with web search engines).

 If you could either include me in the CC list, or send me a
summary, or possibly (if others are also interested), continue the
discussion here, I'd appreciate it.

Steven J. Owens
[EMAIL PROTECTED]






Re: Relevance Feedback

2002-03-29 Thread Nathan G. Freier

[If this conversation is off-topic for this list, we can move it offline 
but I figure since at least one other is interested I'll keep it here 
for now.]

Thanks for the reply, Joshua.  I'm really just beginning my studies in 
IR, so forgive any naivete.  Please see my comments below.

I've also been doing research in IR using the Lucene API.  Regarding query
expansion, I wrote my own code to do that, based on an algorithm I
developed (which is a type of thesaurus-based expansion).  Basically I ran
the original query terms through the appropriate analyzers, decided what
terms I wanted to add, and constructed my own Query from these terms
(using TermQuery and BooleanQuery).


This makes sense to me.  I'm looking to do something similar but add in 
a final step of allowing the user to select which terms (of those that 
are found to be relevant through a document relevance feedback 
mechanism) to actually include in the expansion.  (I'm thinking 
OKAPI/Giraffe style here... I suppose it's been done elsewhere too but 
Efthimis Efthimiadis is my reference point for this project.)  Either 
way, I can see how this is done without interacting with Lucene's 
scoring/ranking code.
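
A small sketch of the "ran the original query terms through the appropriate
analyzers" step Joshua describes, using the 1.x TokenStream API; the analyzer
and field name are placeholders and must match whatever was used at index time.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch: normalize candidate expansion terms with the same analyzer that
// was used at index time, so the expanded query matches the indexed tokens.
public class AnalyzeTermsSketch {
    public static List analyze(Analyzer analyzer, String field, String text) throws IOException {
        List tokens = new ArrayList();
        TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
        for (Token token = stream.next(); token != null; token = stream.next()) {
            tokens.add(token.termText());
        }
        stream.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(analyze(new StandardAnalyzer(), "contents", "Relevance Feedback Methods"));
    }
}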


While I did weight documents based on the terms they used (take a look at
TermQuery.setBoost()), I didn't do relevance feedback per se.  Of course,
how you want to implement relevance feedback will depend on what mechanism
you want to use.


I'm planning on implementing an array of term weighting/document ranking 
algorithms such that the user can choose which algorithm to use.  (The 
system is for educational use so I'm trying to develop some transparency 
here.)  In the near future, I'd like to implement term weighting using: 
(1) porter, (2) W_P-Q, (3) F4, and (4) EMIM.  Each of these relies on 
(user-based) relevance feedback.  

Now that I look at this, I wonder if these algorithms could also be 
implemented outside of the scoring.  Perhaps I can request the relevance 
judgments from the user, recalculate the term weights, use the 
TermQuery.setBoost() method, and reiterate the search.  Am I missing 
something?  Will the term boost function properly given the term weights 
calculated by these algorithms?
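
A compact sketch of the loop described above, assuming the weights are computed
outside Lucene; recomputeWeight() is purely hypothetical and stands in for
whichever scheme (W_P-Q, F4, EMIM, ...) the user selected, and the field name
is a placeholder.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// One feedback iteration: recompute each term's weight from the user's
// relevance judgments, push the weights in as boosts, and search again.
public class FeedbackLoopSketch {

    public static Hits iterate(IndexSearcher searcher, String field, String[] terms,
                               int[] relevantDocs, int[] nonRelevantDocs) throws Exception {
        BooleanQuery reweighted = new BooleanQuery();
        for (int i = 0; i < terms.length; i++) {
            TermQuery tq = new TermQuery(new Term(field, terms[i]));
            // recomputeWeight() is a hypothetical hook for the chosen weighting scheme.
            tq.setBoost(recomputeWeight(terms[i], relevantDocs, nonRelevantDocs));
            reweighted.add(tq, false, false);   // 1.x add(query, required, prohibited)
        }
        return searcher.search(reweighted);
    }

    private static float recomputeWeight(String term, int[] rel, int[] nonRel) {
        return 1.0f;   // placeholder: a real implementation uses the judged documents
    }
}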


PS: Say hi to Wanda Pratt for me.  

I will... I haven't had the opportunity to meet her yet.  Have you 
worked with her? [Off topic.. sorry.]

Nathan

