Shai: Java cannot inline in this case.
Actually there is an urban legend around using final to hint to underlying compiler to inline :) (turns out to be false, one reason being dynamic classloading) write a simple pgm and try and see for yourself (remember to turn on -server on VM options) -John On Tue, Jun 8, 2010 at 10:28 AM, Shai Erera <ser...@gmail.com> wrote: > What do you mean "we are not inlining"? The compiler inlines methods .. at > least it tries. > > Shai > > > On Tue, Jun 8, 2010 at 8:21 PM, John Wang <john.w...@gmail.com> wrote: > >> Shai: >> >> method call overhead in this case is not insignificant because it is >> in a very tight loop, and no, compiler cannot optimize it for you, we are >> not inline-ing cuz we are in a java world. >> >> You are right, this breaks backward compatibility. But from 2.4 - >> 2.9, we have done MUCH worse. :) >> >> -John >> >> >> On Tue, Jun 8, 2010 at 10:09 AM, Shai Erera <ser...@gmail.com> wrote: >> >>> I guess I must be missing something fundamental here :). >>> >>> If Scorer is defined as you propose, and I create my Scorer which impls >>> getDISI() as "return this" - what do I lose? What's wrong w/ Scorer already >>> being a DISI? >>> >>> You mention "it is just inefficient to pay the method call overhead ..." >>> - what overhead? Are you talking about the decorator delegating the call to >>> the wrapped scorer? I really think the compiler can handle that, no? >>> Especially if you make your nextDoc/advance final (which probably you >>> should) ... >>> >>> That doesn't seem to justify an API change, break bw completely (even if >>> we do it in 4.0 only) and change all the current Scorers ... >>> >>> Shai >>> >>> >>> On Tue, Jun 8, 2010 at 8:01 PM, John Wang <john.w...@gmail.com> wrote: >>> >>>> re: But Scorer is itself an iterator, so what prevents you from calling >>>> >>>> nextDoc and advance on it without score() >>>> >>>> Nothing. It is just inefficient to pay the method call overhead just to >>>> overload score. >>>> >>>> re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ >>>> >>>> is not good enough I'd just develop my own. >>>> >>>> That is what I am doing. I am just proposing the change (see my first >>>> email) as an improvement. >>>> >>>> re: Scorer is itself an iterator >>>> >>>> yes, that is the current definition. The point of the proposal is to >>>> make this change. >>>> >>>> -John >>>> >>>> On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera <ser...@gmail.com> wrote: >>>> >>>>> Well … I don't know the reason as well and always thought Scorer and >>>>> Similarity are confusing. >>>>> >>>>> But Scorer is itself an iterator, so what prevents you from calling >>>>> nextDoc and advance on it without score(). And what would the returned >>>>> DISI do when nextDoc is called, if not delegate to its subs? >>>>> >>>>> If I were in your shoes, I'd simply provider a Query wrapper. If CSQ >>>>> is not good enough I'd just develop my own. >>>>> >>>>> But perhaps others think differently? >>>>> >>>>> Shai >>>>> >>>>> On Tuesday, June 8, 2010, John Wang <john.w...@gmail.com> wrote: >>>>> > Hi Shai: >>>>> > I am not sure I understand how changing Similarity would solve >>>>> this problem, wouldn't you need the reader? >>>>> > As for PayloadTermQuery, payload is not always the most efficient >>>>> way of storing such data, especially when number of terms << numdocs. (I >>>>> am >>>>> not sure accessing the payload when you iterate is a good idea, but that >>>>> is >>>>> another discussion) >>>>> > >>>>> > Yes, what I described is exactly a simple CustomScoreQuery for a >>>>> special use-case. The problem is also in CustomScoreQuery, where nextDoc >>>>> and >>>>> advance are calling the sub-scorers as a wrapper. This can be avoided if >>>>> the >>>>> Scorer returns an iterator instead. >>>>> > >>>>> > Separating scoring and doc iteration is a good idea anyway. I >>>>> don't know the reason to combine them originally. >>>>> > Thanks >>>>> > -John >>>>> > >>>>> > >>>>> > On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera <ser...@gmail.com> wrote: >>>>> > >>>>> > So wouldn't it make sense to add some method to Similarity? Which >>>>> receives the doc Id in question maybe ... just thinking here. >>>>> > >>>>> > Factoring Scorer like you propose would create 3 objects for >>>>> scoring/iterating: Scorer (which really becomes an iterator), Similarity >>>>> and >>>>> CustomScoreFunction ... >>>>> > >>>>> > Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how >>>>> you compute your age decay function (where you pull the data about the age >>>>> of the document). >>>>> > >>>>> > Shai >>>>> > >>>>> > >>>>> > On Tue, Jun 8, 2010 at 6:41 PM, John Wang <john.w...@gmail.com> >>>>> wrote: >>>>> > Hi Shai: >>>>> > Similarity in many cases is not sufficient for scoring. For >>>>> example, to implement age decaying of a document (very useful for corpuses >>>>> like news or tweets), you want to project the raw tfidf score onto a time >>>>> curve, say f(x), to do this, you'd have a custom scorer that decorates the >>>>> underlying scorer from your say, boolean query: >>>>> > >>>>> > >>>>> > >>>>> > public float score(){ return myFunc(innerScorer.score());} >>>>> > This is fine, but then you would have to do this as well: >>>>> > public int nextDoc(){ >>>>> > >>>>> > >>>>> > return innerScorer.nextDoc();} >>>>> > and also: >>>>> > public int advance(int target){ return innerScorer.advance();} >>>>> The difference here is that nextDoc and advance are called far more times >>>>> as >>>>> score. And you are introducing an extra method call for them, which is not >>>>> insignificant for queries result in large recall sets. >>>>> > >>>>> > >>>>> > >>>>> > Hope this makes sense. >>>>> > Thanks >>>>> > -John >>>>> > On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <ser...@gmail.com> wrote: >>>>> > I'm not sure I understand what you mean - Scorer is a DISI itself, >>>>> and the scoring formula is mostly controlled by Similarity. >>>>> > >>>>> > What will be the benefits of the proposed change? >>>>> > >>>>> > Shai >>>>> > >>>>> > On Tue, Jun 8, 2010 at 8:25 AM, John Wang <john.w...@gmail.com> >>>>> wrote: >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > Hi guys: >>>>> > >>>>> > I'd like to make a proposal to change the Scorer class/api to the >>>>> following: >>>>> > >>>>> > >>>>> > public abstract class Scorer{ >>>>> > DocIdSetIterator getDocIDSetIterator(); >>>>> > float score(int docid); >>>>> > } >>>>> > >>>>> > Reasons: >>>>> > >>>>> > 1) To build a Scorer from an existing Scorer (e.g. that produces raw >>>>> scores from tfidf), one would decorate it, and it would introduce overhead >>>>> (in function calls) around nextDoc and advance, even if you just want to >>>>> augment the score method which is called much fewer times. >>>>> > >>>>> > 2) The current contract forces scoring on the currentDoc in the >>>>> underlying iterator. So once you pass "current", you can no longer score. >>>>> In >>>>> one of our use-cases, it is very inconvenient. >>>>> > >>>>> > What do you think? I can go ahead and open an issue and work on a >>>>> patch if I get some agreement. >>>>> > >>>>> > Thanks >>>>> > >>>>> > -John >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>>>> For additional commands, e-mail: dev-h...@lucene.apache.org >>>>> >>>>> >>>> >>> >> >