Re: Can I do boosting based on term positions?

Shailendra Sharma Fri, 03 Aug 2007 11:36:24 -0700

Paul,

If I understand Cedric correctly, he wants different boosts depending on
where the search terms occur in the document. With SpanFirstQuery he can
only match terms up to a particular position; he won't be able to do
something like the following:
  a) Give a 100% boost to matches in the first 100 words.
  b) Give an 80% boost to matches in the next 100 words.
  c) Give a 60% boost to matches in the next 100 words.


It can be done with a DisjunctionMaxQuery containing multiple
SpanFirstQuery clauses with different boosts, but I see that as a
workaround rather than a direct and efficient solution.
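
To illustrate why the workaround behaves like tiered boosting, here is a
minimal sketch in plain Java with no Lucene dependency. It only models the
boost part of the score: each clause stands in for a SpanFirstQuery with its
own boost, and the maximum over the matching clauses stands in for
DisjunctionMaxQuery. The class name, the 0-based positions and the rule that
a clause matches when position < end are assumptions of this sketch; real
Lucene scoring also multiplies in other factors.

```java
/** Toy model of tiered position boosts; not Lucene code. */
final class TieredPositionBoost {
    // Hypothetical tiers: exclusive end position of each window and its boost.
    static final int[] ENDS = {100, 200, 300};
    static final float[] BOOSTS = {1.0f, 0.8f, 0.6f};

    /** Direct lookup: boost of the first window that contains the position. */
    static float direct(int position) {
        for (int i = 0; i < ENDS.length; i++) {
            if (position < ENDS[i]) {
                return BOOSTS[i];
            }
        }
        return 0.0f; // beyond every window: no clause matches at all
    }

    /** Workaround model: every clause with position < end matches; keep the max boost. */
    static float viaMaxOfClauses(int position) {
        float best = 0.0f;
        for (int i = 0; i < ENDS.length; i++) {
            if (position < ENDS[i]) {
                best = Math.max(best, BOOSTS[i]);
            }
        }
        return best;
    }
}
```

Both methods agree because the boosts decrease as the windows grow. Note that
a match past the last window gets no boost at all in this model, so a further
unrestricted clause would be needed to keep such documents in the results.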

Cedric,

I am sending my implementation of SpanTermQuery to your gmail account
(the lucene mailing list bounces email with attachments). I have named the
class VSpanTermQuery (following the same package hierarchy as lucene). You
also need to extend the VSimilarity class, which requires implementing the
method scoreSpan(..).
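
As a rough idea of what a scoreSpan(..) implementation could compute, here is
a minimal sketch in plain Java. It does not use the real VSimilarity class:
PositionAwareSimilarity, ReciprocalDecaySimilarity, the (start, end)
parameters and the scale value are all hypothetical names chosen for
illustration, and the reciprocal decay is just one possible choice. It scores
matches near the start higher without discarding matches near the end.

```java
/** Hypothetical stand-in for a similarity with a per-span scoring hook. */
abstract class PositionAwareSimilarity {
    /** Returns a score multiplier for a span covering positions [start, end). */
    abstract float scoreSpan(int start, int end);
}

/** Favors early matches but never reduces late matches to zero. */
final class ReciprocalDecaySimilarity extends PositionAwareSimilarity {
    private final float scale;

    ReciprocalDecaySimilarity(float scale) {
        if (scale <= 0f) {
            throw new IllegalArgumentException("scale must be positive");
        }
        this.scale = scale;
    }

    @Override
    float scoreSpan(int start, int end) {
        // 1.0 at position 0, 0.5 when start == scale, then slowly towards 0.
        return 1f / (1f + start / scale);
    }
}
```

With a scale of 100, a match at position 0 keeps its full score, a match at
position 100 gets half, and a match at position 300 gets a quarter.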

Let me know how it goes. I have done some testing, but I need to test it
extensively before submitting it to contrib.

Thanks,
Shailendra

On 8/3/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> Cedric,
>
> You can choose the end limit for SpanFirstQuery yourself.
>
> Regards,
> Paul Elschot
>
>
> On Friday 03 August 2007 05:38, Cedric Ho wrote:
> > Hi Paul,
> >
> > Doesn't SpanFirstQuery only match terms at positions less than a
> > certain end position?
> >
> > I am rather looking for a query that would score a document higher for
> > terms that appear near the start, but not totally discard those with
> > terms that appear near the end.
> >
> > Regards,
> > Cedric
> >
> > On 8/2/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > > Cedric,
> > >
> > > SpanFirstQuery could be a solution without payloads.
> > > You may want to give it your own Similarity.sloppyFreq().
> > >
> > > Regards,
> > > Paul Elschot
> > >
> > > On Thursday 02 August 2007 04:07, Cedric Ho wrote:
> > > > Thanks for the quick response =)
> > > >
> > > > On 8/1/07, Shailendra Sharma <[EMAIL PROTECTED]> wrote:
> > > > > > Yes, it is easily doable through the "Payload" facility. During
> > > > > > the indexing process (mainly tokenization), you need to push
> > > > > > this extra information into each token. You can then use
> > > > > > BoostingTermQuery to include the Payload value in the score. You
> > > > > > also need to implement Similarity for this (mainly the
> > > > > > scorePayload method).
> > > >
> > > > > If I store, say, a custom boost factor as a Payload, does it mean
> > > > > that I will store one more byte per term per document in the index
> > > > > file? So the index file would be much larger?
> > > >
> > > > >
> > > > > > Another way is to extend SpanTermQuery, which already calculates
> > > > > > the position of the match. You just need to use this position
> > > > > > value in the score calculation.
> > > >
> > > > > I see that SpanTermQuery takes a TermPositions from the
> > > > > indexReader and I can get the term position from there. However,
> > > > > I am not sure how to incorporate it into the score calculation.
> > > > > Would you mind giving a little more detail on this?
> > > >
> > > > >
> > > > > > One possible advantage of the SpanTermQuery approach is that you
> > > > > > can play around without re-creating the indices every time.
> > > > >
> > > > > Thanks,
> > > > > Shailendra Sharma,
> > > > > CTO, Ver se' Innovation Pvt. Ltd.
> > > > > Bangalore, India
> > > > >
> > > > > On 8/1/07, Cedric Ho <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > > I was wondering if it is possible to boost by the search terms'
> > > > > > > position in the document.
> > > > > > >
> > > > > > > For example: search terms that appear in the first 100 words,
> > > > > > > the first 10% of words, or the first two paragraphs would be
> > > > > > > given a higher score.
> > > > > > >
> > > > > > > Is this achievable using the new Payload function in Lucene 2.2?
> > > > > > > Or are there any easier ways to achieve this?
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Cedric
> > > > > >
> > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > > > >
> > > > > >
> > > > >
> > > >
> > > > Thanks,
> > > > Cedric
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> > --
> > [EMAIL PROTECTED]
> >
> >
> >
>
>
>
