Paul,

Hm... even as a Lucene newbie, I can understand your solution easily. Thanks =)
Shailendra,

Thank you as well for your efforts in helping me with this. I learned a
lot more about the inner workings of Lucene through your examples =)

Thanks,
Cedric

On 8/4/07, Shailendra Sharma <[EMAIL PROTECTED]> wrote:
> Ah, good way!
>
> On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
> >
> > On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
> > > Paul,
> > >
> > > If I understand Cedric right, he wants to have different boosting
> > > depending on search term positions in the document. By using
> > > SpanFirstQuery he will only be able to consider terms up to a
> > > particular position; but he won't be able to do something like the
> > > following:
> > > a) Give 100% boosting to matches in the first 100 words.
> > > b) Give 80% boosting to matches in the next 100 words.
> > > c) Give 60% boosting to matches in the next 100 words.
> > >
> > > It can be done by writing a DisjunctionMaxQuery containing multiple
> > > SpanFirstQuery clauses with different boosts, but I see that as a
> > > workaround only and not the direct and efficient solution.
> >
> > You're right, but SpanFirstQuery needs only a minor modification
> > for this to work.
> >
> > This modification to SpanFirstQuery would be that the Spans
> > returned by SpanFirstQuery.getSpans() must always return 0
> > from its start() method. Then the slop passed to sloppyFreq(slop)
> > would be the distance from the beginning of the indexed field
> > to the end of the Spans of the SpanQuery passed to SpanFirstQuery.
> >
> > Then the following should work:
> >
> >   Term firstTerm = .... ;
> >
> >   SpanFirstQuery sfq = new SpanFirstQuery(
> >       new SpanTermQuery(firstTerm),
> >       Integer.MAX_VALUE) {
> >     ...
> >     public Similarity getSimilarity() {
> >       return new Similarity() {
> >         ...
> >         public float sloppyFreq(int slop) {
> >           return (slop < 100) ? 1.0f
> >                : (slop < 200) ? 0.8f
> >                : (slop < 300) ? 0.6f
> >                : 0.4f; // etc. etc.
> >         }
> >       };
> >     }
> >   };
> >
> > Actually, I'm a bit surprised that SpanFirstQuery does not work that
> > way now.
> >
> > Regards,
> > Paul Elschot
> >
> > > Cedric,
> > >
> > > I am sending the implementation of SpanTermQuery to your gmail
> > > account (the lucene mailing list is bouncing email with
> > > attachments). I have named the class VSpanTermQuery (I have followed
> > > the same package hierarchy as lucene). You also need to extend the
> > > VSimilarity class, which requires implementing the method
> > > scoreSpan(..).
> > >
> > > Let me know how it goes. I have done some testing, but I need to
> > > test it extensively before submitting it to contrib.
> > >
> > > Thanks,
> > > Shailendra
> > >
> > > On 8/3/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Cedric,
> > > >
> > > > You can choose the end limit for SpanFirstQuery yourself.
> > > >
> > > > Regards,
> > > > Paul Elschot
> > > >
> > > > On Friday 03 August 2007 05:38, Cedric Ho wrote:
> > > > > Hi Paul,
> > > > >
> > > > > Doesn't SpanFirstQuery only match those with a position less
> > > > > than a certain end position?
> > > > >
> > > > > I am rather looking for a query that would score a document
> > > > > higher when terms appear near the start, but not totally
> > > > > discard documents whose terms appear near the end.
> > > > >
> > > > > Regards,
> > > > > Cedric
> > > > >
> > > > > On 8/2/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > > > > > Cedric,
> > > > > >
> > > > > > SpanFirstQuery could be a solution without payloads.
> > > > > > You may want to give it your own Similarity.sloppyFreq().
> > > > > >
> > > > > > Regards,
> > > > > > Paul Elschot
> > > > > >
> > > > > > On Thursday 02 August 2007 04:07, Cedric Ho wrote:
> > > > > > > Thanks for the quick response =)
> > > > > > >
> > > > > > > On 8/1/07, Shailendra Sharma <[EMAIL PROTECTED]> wrote:
> > > > > > > > Yes, it is easily doable through the "Payload" facility.
> > > > > > > > During the indexing process (mainly tokenization), you
> > > > > > > > need to push this extra information into each token. You
> > > > > > > > can then use BoostingTermQuery to include the payload
> > > > > > > > value in the score. You also need to implement
> > > > > > > > Similarity for this (mainly the scorePayload method).
> > > > > > >
> > > > > > > If I store, say, a custom boost factor as a payload, does
> > > > > > > it mean that I will store one more byte per term per
> > > > > > > document in the index file? So the index file would be
> > > > > > > much larger?
> > > > > > >
> > > > > > > > Another way is to extend SpanTermQuery, which already
> > > > > > > > calculates the position of the match. You just need to
> > > > > > > > use this position value in the score calculation.
> > > > > > >
> > > > > > > I see that SpanTermQuery takes a TermPositions from the
> > > > > > > indexReader and I can get the term position from there.
> > > > > > > However, I am not sure how to incorporate it into the
> > > > > > > score calculation. Would you mind giving a little more
> > > > > > > detail on this?
> > > > > > >
> > > > > > > > One possible advantage of the SpanTermQuery approach is
> > > > > > > > that you can play around without re-creating indices
> > > > > > > > every time.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Shailendra Sharma,
> > > > > > > > CTO, Ver se' Innovation Pvt. Ltd.
> > > > > > > > Bangalore, India
> > > > > > > >
> > > > > > > > On 8/1/07, Cedric Ho <[EMAIL PROTECTED]> wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I was wondering if it is possible to do boosting by
> > > > > > > > > the search terms' position in the document.
> > > > > > > > >
> > > > > > > > > For example: search terms that appear in the first
> > > > > > > > > 100 words, or the first 10% of words, or the first
> > > > > > > > > two paragraphs would be given a higher score.
> > > > > > > > >
> > > > > > > > > Is it achievable using the new Payload function in
> > > > > > > > > Lucene 2.2? Or are there any easier ways to achieve
> > > > > > > > > this?
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Cedric
> > > > > > > > >
> > > > > > > > > ----------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Cedric
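
For anyone reading this thread later, here is a minimal sketch of the tiered position weighting Paul describes, written as plain Java with no Lucene dependency so it can be run on its own. The class and method names (PositionTierBoost, boostForDistance) are hypothetical and not part of Lucene. The tiers are the ones from Paul's example. The assumption, as in his message, is that the value passed in is the distance from the start of the field to the end of the match, which only holds if SpanFirstQuery's Spans always report a start() of 0.

```java
// Hypothetical helper, not a Lucene class: maps a match's distance from the
// start of the field to a weight, using the tiers from the thread.
final class PositionTierBoost {

    // distance: number of positions from the beginning of the field to the
    // end of the match (assumed; this is what sloppyFreq would receive if
    // the span always started at 0).
    static float boostForDistance(int distance) {
        return (distance < 100) ? 1.0f   // first 100 words: full weight
             : (distance < 200) ? 0.8f   // next 100 words
             : (distance < 300) ? 0.6f   // next 100 words
             : 0.4f;                     // everything after that
    }

    public static void main(String[] args) {
        // Print the weight for a few sample distances.
        int[] samples = {0, 99, 100, 250, 5000};
        for (int d : samples) {
            System.out.println("distance " + d + " -> " + boostForDistance(d));
        }
    }
}
```

In a real index the body of boostForDistance would presumably go inside an overridden Similarity.sloppyFreq, as in Paul's snippet. Unlike a hard SpanFirstQuery end limit, late matches still score; they just get the lowest tier.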