Hi Shailendra,

Could you please send the same class file to my Gmail account too?

Regards
vini

Shailendra Sharma wrote:
>
> Ah, Good way !
>
> On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>>
>> On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
>> > Paul,
>> >
>> > If I understand Cedric right, he wants different boosting depending
>> > on search term positions in the document. With SpanFirstQuery he will
>> > only be able to consider terms up to a particular position;
>> > but he won't be able to do something like the following:
>> > a) Give 100% boosting to matches in the first 100 words.
>> > b) Give 80% boosting to matches in the next 100 words.
>> > c) Give 60% boosting to matches in the next 100 words.
>> >
>> > It can be done by writing a DisjunctionMaxQuery with multiple
>> > SpanFirstQuery clauses with different boosts - but I see that as a
>> > workaround only, not a direct and efficient solution.
>>
>> You're right, but SpanFirstQuery needs only a minor modification
>> for this to work.
>>
>> This modification to SpanFirstQuery would be that the Spans
>> returned by SpanFirstQuery.getSpans() must always return 0
>> from its start() method. Then the slop passed to sloppyFreq(slop)
>> would be the distance from the beginning of the indexed field
>> to the end of the Spans of the SpanQuery passed to SpanFirstQuery.
>>
>> Then the following should work:
>>
>> Term firstTerm = .... ;
>>
>> SpanFirstQuery sfq = new SpanFirstQuery(
>>     new SpanTermQuery(firstTerm),
>>     Integer.MAX_VALUE) {
>>   ...
>>   public Similarity getSimilarity(Searcher searcher) {
>>     return new Similarity() {
>>       ...
>>       public float sloppyFreq(int slop) {
>>         return (slop < 100) ? 1.0f
>>              : (slop < 200) ? 0.8f
>>              : (slop < 300) ? 0.6f
>>              : 0.4f; // etc. etc.
>>       }
>>     };
>>   }
>> };
>>
>> Actually, I'm a bit surprised that SpanFirstQuery does not work that
>> way now.
>>
>> Regards,
>> Paul Elschot
>>
>> >
>> > Cedric,
>> >
>> > I am sending the implementation of SpanTermQuery to your gmail
>> > account (the lucene mailing list is bouncing email with attachments).
>> > I have named the class VSpanTermQuery (I have followed the same
>> > package hierarchy as lucene). You also need to extend the VSimilarity
>> > class, which requires implementing the method scoreSpan(..).
>> >
>> > Let me know how it went. I have done some testing, but I need to
>> > test it extensively before submitting to contrib.
>> >
>> > Thanks,
>> > Shailendra
>> >
>> > On 8/3/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>> > >
>> > > Cedric,
>> > >
>> > > You can choose the end limit for SpanFirstQuery yourself.
>> > >
>> > > Regards,
>> > > Paul Elschot
>> > >
>> > > On Friday 03 August 2007 05:38, Cedric Ho wrote:
>> > > > Hi Paul,
>> > > >
>> > > > Doesn't SpanFirstQuery only match those with position less than a
>> > > > certain end position?
>> > > >
>> > > > I am rather looking for a query that would score a document higher
>> > > > for terms that appear near the start, but not totally discard those
>> > > > with terms that appear near the end.
>> > > >
>> > > > Regards,
>> > > > Cedric
>> > > >
>> > > > On 8/2/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>> > > > > Cedric,
>> > > > >
>> > > > > SpanFirstQuery could be a solution without payloads.
>> > > > > You may want to give it your own Similarity.sloppyFreq().
>> > > > >
>> > > > > Regards,
>> > > > > Paul Elschot
>> > > > >
>> > > > > On Thursday 02 August 2007 04:07, Cedric Ho wrote:
>> > > > > > Thanks for the quick response =)
>> > > > > >
>> > > > > > On 8/1/07, Shailendra Sharma <[EMAIL PROTECTED]> wrote:
>> > > > > > > Yes, it is easily doable through the "Payload" facility. During
>> > > > > > > the indexing process (mainly tokenization), you need to push
>> > > > > > > this extra information into each token. Then you can use
>> > > > > > > BoostingTermQuery to include the Payload value in the score.
>> > > > > > > You also need to implement Similarity for this
>> > > > > > > (mainly the scorePayload method).
>> > > > > >
>> > > > > > If I store, say, a custom boost factor as Payload, does it mean
>> > > > > > that I will store one more byte per term per document in the
>> > > > > > index file? So the index file would be much larger?
>> > > > > >
>> > > > > > >
>> > > > > > > The other way can be to extend SpanTermQuery, which already
>> > > > > > > calculates the position of the match. You just need to do
>> > > > > > > something to use this position value in the score calculation.
>> > > > > >
>> > > > > > I see that SpanTermQuery takes a TermPositions from the
>> > > > > > indexReader and I can get the term position from there. However,
>> > > > > > I am not sure how to incorporate it into the score calculation.
>> > > > > > Would you mind giving a little more detail on this?
>> > > > > >
>> > > > > > >
>> > > > > > > One possible advantage of the SpanTermQuery approach is that
>> > > > > > > you can play around without re-creating indices every time.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Shailendra Sharma,
>> > > > > > > CTO, Ver se' Innovation Pvt. Ltd.
>> > > > > > > Bangalore, India
>> > > > > > >
>> > > > > > > On 8/1/07, Cedric Ho <[EMAIL PROTECTED]> wrote:
>> > > > > > > >
>> > > > > > > > Hi all,
>> > > > > > > >
>> > > > > > > > I was wondering if it is possible to do boosting by search
>> > > > > > > > terms' position in the document.
>> > > > > > > >
>> > > > > > > > For example: search terms that appear in the first 100 words,
>> > > > > > > > or the first 10% of words, or in the first two paragraphs
>> > > > > > > > would be given a higher score.
>> > > > > > > >
>> > > > > > > > Is it achievable using the new Payload function in lucene 2.2?
>> > > > > > > > Or are there any easier ways to achieve this?
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Cedric
>> > > > > > > >
>> > > > > > > > ---------------------------------------------------------------------
>> > > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> > > > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
>> > > > > > > >
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Cedric
>> > > > > >
>> > > >
>> > > > --
>> > > > [EMAIL PROTECTED]
>> > > >
>>
>

--
View this message in context: http://www.nabble.com/Can-I-do-boosting-based-on-term-postions--tf4197947.html#a12180448
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
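
For reference, the tiered boost discussed in this thread (100% for the first 100 words, 80% for the next 100, 60% for the next 100) can be written as a small stand-alone function. This is only a sketch of the arithmetic, using the tier values from Paul's example. The class and method names (PositionBoost, boostFor) are made up for illustration and are not part of Lucene. It assumes the caller already has the distance from the start of the field to the end of the match, which per Paul's message would only be what sloppyFreq() receives after the suggested change to SpanFirstQuery.

```java
// Hypothetical helper, not a Lucene class: maps a match position to a boost tier.
final class PositionBoost {
    private PositionBoost() {}

    /**
     * Returns the boost for a match that ends `distance` token positions
     * from the start of the field (assumed non-negative).
     * Tier values are the ones used as an example in the thread.
     */
    static float boostFor(int distance) {
        if (distance < 100) return 1.0f; // first 100 words
        if (distance < 200) return 0.8f; // next 100 words
        if (distance < 300) return 0.6f; // next 100 words
        return 0.4f;                     // everything later still scores, just lower
    }
}
```

For example, PositionBoost.boostFor(150) returns 0.8f. Because late matches get a reduced boost instead of being dropped, this matches Cedric's requirement of not discarding terms that appear near the end.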