Shai, his wrapper Scorer will just look like: DISI getDISI() { return delegate.getDISI(); }
float score(int doc) { return calcMyAwesomeScore(doc); } this saves delegate.nextDoc(), delegate.advance() indirection calls. But I already offered a better alternative :) On Tue, Jun 8, 2010 at 21:09, Shai Erera <ser...@gmail.com> wrote: > I guess I must be missing something fundamental here :). > > If Scorer is defined as you propose, and I create my Scorer which impls > getDISI() as "return this" - what do I lose? What's wrong w/ Scorer already > being a DISI? > > You mention "it is just inefficient to pay the method call overhead ..." - > what overhead? Are you talking about the decorator delegating the call to > the wrapped scorer? I really think the compiler can handle that, no? > Especially if you make your nextDoc/advance final (which probably you > should) ... > > That doesn't seem to justify an API change, break bw completely (even if we > do it in 4.0 only) and change all the current Scorers ... > > Shai > > On Tue, Jun 8, 2010 at 8:01 PM, John Wang <john.w...@gmail.com> wrote: >> >> re: But Scorer is itself an iterator, so what prevents you from calling >> nextDoc and advance on it without score() >> >> Nothing. It is just inefficient to pay the method call overhead just to >> overload score. >> >> re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ >> is not good enough I'd just develop my own. >> >> That is what I am doing. I am just proposing the change (see my first >> email) as an improvement. >> >> re: Scorer is itself an iterator >> >> yes, that is the current definition. The point of the proposal is to make >> this change. >> >> -John >> >> On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera <ser...@gmail.com> wrote: >>> >>> Well … I don't know the reason as well and always thought Scorer and >>> Similarity are confusing. >>> >>> But Scorer is itself an iterator, so what prevents you from calling >>> nextDoc and advance on it without score(). And what would the returned >>> DISI do when nextDoc is called, if not delegate to its subs? >>> >>> If I were in your shoes, I'd simply provider a Query wrapper. If CSQ >>> is not good enough I'd just develop my own. >>> >>> But perhaps others think differently? >>> >>> Shai >>> >>> On Tuesday, June 8, 2010, John Wang <john.w...@gmail.com> wrote: >>> > Hi Shai: >>> > I am not sure I understand how changing Similarity would solve this >>> > problem, wouldn't you need the reader? >>> > As for PayloadTermQuery, payload is not always the most efficient >>> > way of storing such data, especially when number of terms << numdocs. (I >>> > am >>> > not sure accessing the payload when you iterate is a good idea, but that >>> > is >>> > another discussion) >>> > >>> > Yes, what I described is exactly a simple CustomScoreQuery for a >>> > special use-case. The problem is also in CustomScoreQuery, where nextDoc >>> > and >>> > advance are calling the sub-scorers as a wrapper. This can be avoided if >>> > the >>> > Scorer returns an iterator instead. >>> > >>> > Separating scoring and doc iteration is a good idea anyway. I don't >>> > know the reason to combine them originally. >>> > Thanks >>> > -John >>> > >>> > >>> > On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera <ser...@gmail.com> wrote: >>> > >>> > So wouldn't it make sense to add some method to Similarity? Which >>> > receives the doc Id in question maybe ... just thinking here. >>> > >>> > Factoring Scorer like you propose would create 3 objects for >>> > scoring/iterating: Scorer (which really becomes an iterator), Similarity >>> > and >>> > CustomScoreFunction ... >>> > >>> > Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how >>> > you compute your age decay function (where you pull the data about the age >>> > of the document). >>> > >>> > Shai >>> > >>> > >>> > On Tue, Jun 8, 2010 at 6:41 PM, John Wang <john.w...@gmail.com> wrote: >>> > Hi Shai: >>> > Similarity in many cases is not sufficient for scoring. For >>> > example, to implement age decaying of a document (very useful for corpuses >>> > like news or tweets), you want to project the raw tfidf score onto a time >>> > curve, say f(x), to do this, you'd have a custom scorer that decorates the >>> > underlying scorer from your say, boolean query: >>> > >>> > >>> > >>> > public float score(){ return myFunc(innerScorer.score());} >>> > This is fine, but then you would have to do this as well: >>> > public int nextDoc(){ >>> > >>> > >>> > return innerScorer.nextDoc();} >>> > and also: >>> > public int advance(int target){ return innerScorer.advance();} >>> > The difference here is that nextDoc and advance are called far more times >>> > as >>> > score. And you are introducing an extra method call for them, which is not >>> > insignificant for queries result in large recall sets. >>> > >>> > >>> > >>> > Hope this makes sense. >>> > Thanks >>> > -John >>> > On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <ser...@gmail.com> wrote: >>> > I'm not sure I understand what you mean - Scorer is a DISI itself, and >>> > the scoring formula is mostly controlled by Similarity. >>> > >>> > What will be the benefits of the proposed change? >>> > >>> > Shai >>> > >>> > On Tue, Jun 8, 2010 at 8:25 AM, John Wang <john.w...@gmail.com> wrote: >>> > >>> > >>> > >>> > >>> > Hi guys: >>> > >>> > I'd like to make a proposal to change the Scorer class/api to the >>> > following: >>> > >>> > >>> > public abstract class Scorer{ >>> > DocIdSetIterator getDocIDSetIterator(); >>> > float score(int docid); >>> > } >>> > >>> > Reasons: >>> > >>> > 1) To build a Scorer from an existing Scorer (e.g. that produces raw >>> > scores from tfidf), one would decorate it, and it would introduce overhead >>> > (in function calls) around nextDoc and advance, even if you just want to >>> > augment the score method which is called much fewer times. >>> > >>> > 2) The current contract forces scoring on the currentDoc in the >>> > underlying iterator. So once you pass "current", you can no longer score. >>> > In >>> > one of our use-cases, it is very inconvenient. >>> > >>> > What do you think? I can go ahead and open an issue and work on a patch >>> > if I get some agreement. >>> > >>> > Thanks >>> > >>> > -John >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >> > > -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Phone: +7 (495) 683-567-4 ICQ: 104465785 --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org