Re: Can I do boosting based on term positions?

vini Thu, 16 Aug 2007 05:41:00 -0700

Hi Shailendra,

Could you please send the same class file to my Gmail account too?


Regards
vini

Shailendra Sharma wrote:
> 
> Ah, good way!
> 
> On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>>
>> On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
>> > Paul,
>> >
>> > If I understand Cedric right, he wants different boosts depending
>> > on where the search terms occur in the document. With SpanFirstQuery
>> > he will only be able to consider terms up to a particular position,
>> > but he won't be able to do something like the following:
>> >   a) Give 100% boosting to matches in the first 100 words.
>> >   b) Give 80% boosting to matches in the next 100 words.
>> >   c) Give 60% boosting to matches in the next 100 words.
>> >
>> > It can be done by writing a DisjunctionMaxQuery with multiple
>> > SpanFirstQuery clauses that have different boosts, but I see that as
>> > a workaround only, not a direct and efficient solution.
>>
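To make the workaround concrete, here is a minimal plain-Java model of how a DisjunctionMaxQuery over several SpanFirstQuery clauses would end up picking a boost. It does not use the Lucene API: the class name, the window table, and maxBoost() are hypothetical, and real scoring also involves tf, idf, norms, and the tie-breaker, all of which this sketch ignores. It assumes one clause per window and that a clause matches when the span's end position is at most the window end.

```java
final class WindowBoostSketch {
    // Hypothetical windows, one per SpanFirstQuery clause: end position and boost.
    static final int[] WINDOW_ENDS = {100, 200, 300};
    static final float[] WINDOW_BOOSTS = {1.0f, 0.8f, 0.6f};

    /** Best boost among the windows that contain the match, or 0 if none does. */
    static float maxBoost(int matchEnd) {
        float best = 0f;
        for (int i = 0; i < WINDOW_ENDS.length; i++) {
            // A clause matches when the span ends at or before its window end.
            if (matchEnd <= WINDOW_ENDS[i]) {
                best = Math.max(best, WINDOW_BOOSTS[i]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] samples = {50, 150, 250, 400};
        for (int end : samples) {
            System.out.println("match ending at " + end + " -> boost " + maxBoost(end));
        }
    }
}
```

A match ending at position 150 falls inside the 200 and 300 windows, so the maximum picks 0.8. Every clause is still evaluated for every match, which is presumably why this reads as a workaround rather than a direct solution.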
>> You're right, but SpanFirstQuery needs only a minor modification
>> for this to work.
>>
>> This modification to SpanFirstQuery would be that the Spans
>> returned by SpanFirstQuery.getSpans() must always return 0
>> from its start() method. Then the slop passed to sloppyFreq(slop)
>> would be the distance from the beginning of the indexed field
>> to the end of the Spans of the SpanQuery passed to SpanFirstQuery.
>>
>> Then the following should work:
>>
>> Term firstTerm = .... ;
>>
>> SpanFirstQuery sfq = new SpanFirstQuery(
>>     new SpanTermQuery(firstTerm),
>>     Integer.MAX_VALUE) {
>>   ...
>>   public Similarity getSimilarity(Searcher searcher) {
>>     return new Similarity() {
>>       ...
>>       // slop: distance from the field start to the end of the match
>>       public float sloppyFreq(int slop) {
>>         return (slop < 100) ? 1.0f
>>              : (slop < 200) ? 0.8f
>>              : (slop < 300) ? 0.6f
>>              : 0.4f; // etc. etc.
>>       }
>>     };
>>   }
>> };
>>
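The effect of the proposed change can be sketched in plain Java without Lucene. This assumes the span scorer adds sloppyFreq(end - start) for each match (worth verifying against the Lucene version in use); SpanMatch, anchoredAtFieldStart(), and freq() are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class PositionBoostSketch {
    /** Simplified stand-in for one span match; not the real Lucene Spans API. */
    static final class SpanMatch {
        final int start; // position of the first matching term
        final int end;   // one past the position of the last matching term

        SpanMatch(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Models the proposed change: report the match as starting at position 0. */
        SpanMatch anchoredAtFieldStart() {
            return new SpanMatch(0, end);
        }
    }

    /** Tiered weights from the message above, keyed on distance from the field start. */
    static float sloppyFreq(int slop) {
        return (slop < 100) ? 1.0f
             : (slop < 200) ? 0.8f
             : (slop < 300) ? 0.6f
             : 0.4f;
    }

    /** Sums sloppyFreq(end - start) over all matches after anchoring each at 0. */
    static float freq(List<SpanMatch> matches) {
        float total = 0f;
        for (SpanMatch m : matches) {
            SpanMatch anchored = m.anchoredAtFieldStart();
            total += sloppyFreq(anchored.end - anchored.start);
        }
        return total;
    }

    public static void main(String[] args) {
        // One match near the start of the field and one around position 250.
        List<SpanMatch> matches =
            Arrays.asList(new SpanMatch(10, 11), new SpanMatch(250, 251));
        System.out.println("freq = " + freq(matches));
    }
}
```

Without the anchoring step, a single-term match always has end - start == 1, so sloppyFreq() would see the same slop no matter where the term occurs. That is why start() returning 0 matters here.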
>>
>> Actually, I'm a bit surprised that SpanFirstQuery does not work that
>> way now.
>>
>> Regards,
>> Paul Elschot
>>
>>
>> >
>> > Cedric,
>> >
>> > I am sending my implementation of SpanTermQuery to your Gmail
>> > account (the Lucene mailing list bounces email with attachments).
>> > I have named the class VSpanTermQuery (following the same package
>> > hierarchy as Lucene). You also need to extend the VSimilarity class,
>> > which requires implementing the method scoreSpan(..).
>> >
>> > Let me know how it goes. I have done some testing, but I need to
>> > test it extensively before submitting it to contrib.
>> >
>> > Thanks,
>> > Shailendra
>> >
>> > On 8/3/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>> > >
>> > > Cedric,
>> > >
>> > > You can choose the end limit for SpanFirstQuery yourself.
>> > >
>> > > Regards,
>> > > Paul Elschot
>> > >
>> > >
>> > > On Friday 03 August 2007 05:38, Cedric Ho wrote:
>> > > > Hi Paul,
>> > > >
>> > > > Doesn't SpanFirstQuery only match terms at positions less than a
>> > > > certain end position?
>> > > >
>> > > > I am rather looking for a query that scores a document higher when
>> > > > terms appear near the start, without totally discarding documents
>> > > > whose terms appear near the end.
>> > > >
>> > > > Regards,
>> > > > Cedric
>> > > >
>> > > > On 8/2/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>> > > > > Cedric,
>> > > > >
>> > > > > SpanFirstQuery could be a solution without payloads.
>> > > > > You may want to give it your own Similarity.sloppyFreq() .
>> > > > >
>> > > > > Regards,
>> > > > > Paul Elschot
>> > > > >
>> > > > > On Thursday 02 August 2007 04:07, Cedric Ho wrote:
>> > > > > > Thanks for the quick response =)
>> > > > > >
>> > > > > > On 8/1/07, Shailendra Sharma <[EMAIL PROTECTED]> wrote:
>> > > > > > > Yes, it is easily doable through the "Payload" facility. During the
>> > > > > > > indexing process (mainly tokenization), you need to push this extra
>> > > > > > > information into each token. Then you can use BoostingTermQuery to
>> > > > > > > include the Payload value in the score. You also need to implement
>> > > > > > > Similarity for this (mainly the scorePayload method).
>> > > > > >
>> > > > > > If I store, say, a custom boost factor as a Payload, does that mean
>> > > > > > I will store one more byte per term per document in the index
>> > > > > > file? So the index file would be much larger?
>> > > > > >
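On the storage question, here is a minimal sketch of squeezing a boost into a one-byte payload, assuming boosts lie between 0 and 1. The encoding and the names encode() and decode() are hypothetical and not part of Lucene. In a real setup the byte would be attached to each token during analysis and decoded in the Similarity's scorePayload method. Payloads are attached to individual term occurrences (positions), so the added size grows with the number of occurrences rather than with the number of distinct terms.

```java
final class BoostPayloadSketch {
    /** Quantizes a boost in [0, 1] to one byte (values 0..255 held in a signed byte). */
    static byte encode(float boost) {
        float clamped = Math.max(0f, Math.min(1f, boost));
        return (byte) Math.round(clamped * 255f);
    }

    /** Recovers the approximate boost from the stored byte. */
    static float decode(byte payload) {
        // Mask to undo Java's sign extension before scaling back to [0, 1].
        return (payload & 0xFF) / 255f;
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 0.8f, 0.6f, 0.4f};
        for (float b : boosts) {
            System.out.println(b + " -> " + decode(encode(b)));
        }
    }
}
```

One byte gives 256 distinct boost levels, which is more than enough for a handful of position tiers.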
>> > > > > > >
>> > > > > > > Another way is to extend SpanTermQuery, which already calculates
>> > > > > > > the position of the match. You just need to use this position
>> > > > > > > value in the score calculation.
>> > > > > >
>> > > > > > I see that SpanTermQuery takes a TermPositions from the
>> > > > > > IndexReader, and I can get the term position from there. However,
>> > > > > > I am not sure how to incorporate it into the score calculation.
>> > > > > > Would you mind giving a little more detail on this?
>> > > > > >
>> > > > > > >
>> > > > > > > One possible advantage of the SpanTermQuery approach is that you
>> > > > > > > can experiment without re-creating the indices every time.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Shailendra Sharma,
>> > > > > > > CTO, Ver se' Innovation Pvt. Ltd.
>> > > > > > > Bangalore, India
>> > > > > > >
>> > > > > > > On 8/1/07, Cedric Ho <[EMAIL PROTECTED]> wrote:
>> > > > > > > >
>> > > > > > > > Hi all,
>> > > > > > > >
>> > > > > > > > I was wondering if it is possible to do boosting by the search
>> > > > > > > > terms' positions in the document.
>> > > > > > > >
>> > > > > > > > For example, search terms that appear in the first 100 words, the
>> > > > > > > > first 10% of the words, or the first two paragraphs would be given
>> > > > > > > > a higher score.
>> > > > > > > >
>> > > > > > > > Is it achievable using the new Payload feature in Lucene 2.2?
>> > > > > > > > Or are there any easier ways to achieve this?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Cedric
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > ---------------------------------------------------------------------
>> > > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> > > > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Cedric
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > [EMAIL PROTECTED]
>> > > >
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> >
>>
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Can-I-do-boosting-based-on-term-postions--tf4197947.html#a12180448
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


