Re: Passage Search

Paul Elschot Fri, 28 Jan 2005 00:26:47 -0800

On Friday 28 January 2005 01:10, Joaquin Delgado wrote:
> What is described here as "Passage Search" is nothing more than a
> PhraseQuery with a large slope. I think it's a UI problem rather than a
> ranking algorithm problem. For example, you may want to translate simple
> multi-term queries into a PhraseQuery by default (instead of AND or OR).
> Let's say you convert the query into <"man bites dog"~1000> every
> time someone types just <man bites dog>, so as to give more weight to
> documents/passages that contain the words near each other, like
> Google does.
> 
> What I would like however is to have query variables that can be used
> when you don't have a good way of estimating the slope. It would be
> ideal to have a query syntax implementation that would allow doing
> something like:
> <"man bites dog"~DOCSIZE> that would execute a phrasequery with the
> slope being the individual document size in number of characters of each
> hit.
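
A minimal sketch of the query rewriting described in the quote, in plain Java with no Lucene dependency. The class name SloppyPhraseRewriter and the slop value are illustrative assumptions; the helper assumes the user typed bare words (no quotes or query operators) and only produces the query-parser string, which would still have to be handed to Lucene's QueryParser.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: rewrites a plain multi-term user query into
// query-parser syntax for a sloppy phrase query, e.g.
//   man bites dog  ->  "man bites dog"~1000
// Assumes plain words: quotes or operators in the input are not escaped.
final class SloppyPhraseRewriter {
    private SloppyPhraseRewriter() {}

    static String toSloppyPhrase(String userInput, int slop) {
        // Split on runs of whitespace and drop empty tokens.
        String joined = Arrays.stream(userInput.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.joining(" "));
        if (joined.isEmpty()) {
            throw new IllegalArgumentException("empty query");
        }
        // A single term needs no phrase wrapping.
        if (!joined.contains(" ")) {
            return joined;
        }
        return "\"" + joined + "\"~" + slop;
    }
}
```

Calling toSloppyPhrase("man bites dog", 1000) yields "man bites dog"~1000 (with the quotes), the exact form given in the quote.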


By default Lucene uses 1/(1 + slop) to dampen the frequency of a proximity
match, where slop is the number of indexed terms in the document
between the matching terms.
It's not really a "slope", but there could be a bit of confusion between slop
and slope.
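
As a worked example of the 1/(1 + slop) damping: in Lucene this is the sloppyFreq(int) method of Similarity, and DefaultSimilarity computes it as below. The class here is a standalone copy for illustration, not Lucene's own code.

```java
// Standalone copy of the default proximity damping: a match whose terms
// are `distance` positions out of place contributes 1 / (1 + distance).
final class SloppyFreq {
    private SloppyFreq() {}

    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }
}
```

So an exact phrase match (distance 0) counts as 1.0, a match with one position of slack as 0.5, and one with three positions of slack as 0.25.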

This means that the query slop parameter itself (DOCSIZE above)
has no influence on the score of the document. The query slop parameter
only determines which documents will be in the resulting hits.

So, a query with a very large slop (larger than the max doc size)
might do what you need.
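
A toy model (plain Java, not Lucene's scorer) makes the point about the query slop concrete: the slop is only a cutoff on which matches count, and each counted match contributes 1/(1 + distance) regardless of the slop. The per-match distances are assumed to be given; real scoring also applies tf(), idf and length normalization on top of this frequency sum.

```java
import java.util.List;

// Toy model: each match in a document is summarized by its distance
// (how far the terms are out of phrase order).
final class SlopFilterDemo {
    private SlopFilterDemo() {}

    static float score(List<Integer> matchDistances, int querySlop) {
        float sum = 0f;
        for (int d : matchDistances) {
            if (d <= querySlop) {          // the query slop is a hard cutoff
                sum += 1.0f / (d + 1);     // the damping ignores querySlop
            }
        }
        return sum;
    }
}
```

With match distances 0 and 4, a slop of 10 and a slop of 1000 both give 1.0 + 0.2 = 1.2, while a slop of 2 drops the second match and gives 1.0.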

Lucene tries to find the best scoring document, not
the best scoring passage. It does this by summing the sloppy frequencies
of the matches in a document.
To find the document with the best passage, one could replace this sum
by a maximum operation, and maybe adapt the document length weighting.
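
The difference between summing and taking the maximum shows up with two hypothetical documents: one with many loosely spread matches and one with a single tight match. This is plain Java with made-up distances; it only compares the two aggregation rules and leaves out document length weighting.

```java
import java.util.List;

// Toy comparison: a document's proximity evidence as the sum of its
// sloppy frequencies (document-oriented) versus their maximum
// (passage-oriented).
final class PassageAggregation {
    private PassageAggregation() {}

    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    static float sumAggregate(List<Integer> distances) {
        float total = 0f;
        for (int d : distances) total += sloppyFreq(d);
        return total;
    }

    static float maxAggregate(List<Integer> distances) {
        float best = 0f;
        for (int d : distances) best = Math.max(best, sloppyFreq(d));
        return best;
    }
}
```

Document A with six matches at distance 3 gets 6 * 0.25 = 1.5 under the sum but only 0.25 under the maximum, while document B with one exact match gets 1.0 under both, so switching to the maximum moves B ahead of A.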

To find the best passage within that document, Lucene is of no direct help,
but a highlighted view of the whole document probably goes a long way.
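
If something more targeted than highlighting the whole document is wanted, one common technique (swapped in here, not something Lucene provides directly) is to find the shortest run of tokens that contains every query term. The sketch assumes the document has already been split into normalized tokens; BestPassage and shortestWindow are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sliding-window search for the shortest token span covering all query terms.
final class BestPassage {
    private BestPassage() {}

    /** Returns {start, end} (inclusive token indexes), or null if some term is missing. */
    static int[] shortestWindow(List<String> tokens, Set<String> queryTerms) {
        Map<String, Integer> counts = new HashMap<>();
        int covered = 0;     // number of distinct query terms inside the window
        int[] best = null;
        int left = 0;
        for (int right = 0; right < tokens.size(); right++) {
            String t = tokens.get(right);
            if (!queryTerms.contains(t)) continue;
            if (counts.merge(t, 1, Integer::sum) == 1) covered++;
            // Shrink from the left while the window still covers every term.
            while (covered == queryTerms.size()) {
                if (best == null || right - left < best[1] - best[0]) {
                    best = new int[] {left, right};
                }
                String l = tokens.get(left);
                if (queryTerms.contains(l) && counts.merge(l, -1, Integer::sum) == 0) {
                    covered--;
                }
                left++;
            }
        }
        return best;
    }
}
```

For the tokens of "the dog saw a man and the man bites the dog" and the terms man, bites and dog, it returns {7, 10}, the passage "man bites the dog". Each token enters and leaves the window at most once, so the search is linear in the document length.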

> Has anyone thought of how to introduce variables into the query
> language?

It's already possible to use a different Similarity implementation for each
(sub)query. One could extend the query language to associate a different
Similarity with each (sub)query, but that would require quite a bit of
knowledge to use well. It's a tradeoff between using
the parser of the query language and using the parser of the (Java) compiler.
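
The Java side of that tradeoff can be modelled without Lucene: each clause carries its own damping function, chosen in code rather than in the query syntax. In Lucene the per-query hook would be a Similarity; PerClauseDamping, Clause, DEFAULT and FLAT below are hypothetical names, and the model covers only the sloppy-frequency part of scoring.

```java
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical model, not the Lucene API: per-clause proximity damping.
final class PerClauseDamping {
    private PerClauseDamping() {}

    // One (sub)query: distances of its phrase matches in a document,
    // its slop cutoff, and the damping function chosen for it.
    static final class Clause {
        final List<Integer> matchDistances;
        final int slop;
        final IntToDoubleFunction damping;

        Clause(List<Integer> matchDistances, int slop, IntToDoubleFunction damping) {
            this.matchDistances = matchDistances;
            this.slop = slop;
            this.damping = damping;
        }
    }

    // The default 1 / (1 + distance) damping.
    static final IntToDoubleFunction DEFAULT = d -> 1.0 / (d + 1);
    // Ignores proximity once the slop admits a match.
    static final IntToDoubleFunction FLAT = d -> 1.0;

    static double score(List<Clause> clauses) {
        double total = 0.0;
        for (Clause c : clauses) {
            for (int d : c.matchDistances) {
                if (d <= c.slop) {
                    total += c.damping.applyAsDouble(d);
                }
            }
        }
        return total;
    }
}
```

A clause with one match at distance 1 using DEFAULT plus a clause with one match at distance 5 using FLAT scores 0.5 + 1.0 = 1.5; the price is that whoever builds the query must know which damping suits which clause.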

Regards,
Paul Elschot.

 
> J.D.
> 
> -----Original Message-----
> From: Giulio Cesare Solaroli [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, January 27, 2005 5:39 PM
> To: Lucene Developers List
> Subject: Passage Search
> 
> Hi all,
> 
> reading some posts in Steve Green's weblog, I found the description of
> a "Passage search"
> (http://blogs.sun.com/roller/page/searchguy/20050126).
> 
> Translated into Lucene terms, this looks like a nice scoring algorithm
> that could be applied to rank the matching documents.
> 
> Does anybody have any idea how the suggested approach compares with
> Lucene's current algorithm, and how difficult it would be to implement
> the "Passage search" scoring as well?
> 
> Thanks for your attention.
> 
> Regards,
> 
> Giulio Cesare Solaroli
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

