So just to make sure I understand: a Matcher is paired w/ a Scorer, and this pairing is done at Query construction time. E.g., if I use QP to construct the Query, I'd need to extend QP to provide my custom Scorer for the relevant Matchers (and reuse the existing Scorer logic for the other fragments), and if I create a Query programmatically, I'll need to pair its Matcher w/ a Scorer myself. Is that what you meant?
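A minimal Java sketch of the pairing described above, under stated assumptions: `Matcher`, `DocScorer`, `PostingsMatcher`, `PairedQuery` and `PairingDemo` are hypothetical names invented for illustration, not Lucene classes. It only assumes that a matcher walks doc ids in increasing order and that the scorer, chosen when the query is built, reads whatever state its matcher exposes (here, a term frequency).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical types for illustration only; they are NOT Lucene's API.

/** Walks matching doc ids in increasing order; knows nothing about scores. */
interface Matcher {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Moves to the next matching doc and returns its id, or NO_MORE_DOCS. */
    int nextDoc();
}

/** Scores the doc its paired matcher is currently positioned on. */
interface DocScorer {
    float score();
}

/** Matches a fixed postings list and exposes the term frequency of the current doc. */
final class PostingsMatcher implements Matcher {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1;

    PostingsMatcher(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    @Override
    public int nextDoc() {
        pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Term frequency in the current doc; valid only while positioned on a real doc. */
    int freq() {
        return freqs[pos];
    }
}

/** One query fragment: a matcher paired with a scorer at construction time. */
final class PairedQuery {
    private final Matcher matcher;
    private final DocScorer scorer;

    PairedQuery(Matcher matcher, DocScorer scorer) {
        this.matcher = matcher;
        this.scorer = scorer;
    }

    /** Drives the matcher and asks the paired scorer to score each matching doc. */
    Map<Integer, Float> run() {
        Map<Integer, Float> scores = new LinkedHashMap<>();
        for (int doc = matcher.nextDoc(); doc != Matcher.NO_MORE_DOCS; doc = matcher.nextDoc()) {
            scores.put(doc, scorer.score());
        }
        return scores;
    }
}

final class PairingDemo {
    private static PostingsMatcher sampleMatcher() {
        return new PostingsMatcher(new int[] {1, 4, 9}, new int[] {1, 4, 9});
    }

    /** Programmatic construction: the caller picks a custom scorer for this matcher. */
    static Map<Integer, Float> tfScores() {
        PostingsMatcher m = sampleMatcher();
        // The scorer reads the "current state" (freq) of the matcher it is paired with.
        return new PairedQuery(m, () -> (float) Math.sqrt(m.freq())).run();
    }

    /** Same kind of matcher, different scorer: which docs match is unchanged. */
    static Map<Integer, Float> constantScores() {
        return new PairedQuery(sampleMatcher(), () -> 1.0f).run();
    }
}
```

In this sketch, a QP extension would make the same choice of scorer for each fragment it builds; swapping the lambda changes the scores but not the set of matching docs.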
How is that different from today's API? At a high level, someone can extend BQ and override createScorer ... if Scorer were just the Scorer and BQ had a Matcher ...

BTW, re the note on BM25BQ: do you think a BM25 Scorer can fit all query types? I.e., would you reuse the same code for Boolean/Term/Phrase/SpanQuery, or would you need to write a dedicated BM25 scoring algorithm for each Query type? I'm asking this assuming we have a Matcher and Scorer decoupling. If you can indeed have one BM25 scoring algorithm that fits all Query types, which means it's quite agnostic to the Query executed and only cares about the doc id (and maybe some independent data it can fetch about it from elsewhere), then I agree that the current API is not nicely extensible. But if not, then I don't see how the Matcher/Scorer change would improve that.

Perhaps we should describe 2-3 queries, the resulting query trees, and how they are evaluated today vs. with the Matcher/Scorer approach? It's always easier to talk about something when you have an example :)

Shai

On Wed, Jun 9, 2010 at 3:16 PM, Earwin Burrfoot <[email protected]> wrote:
> What I have in mind is basically having two parallel trees: one for matching, one for scoring.
> The matching tree is completely independent and can be used as a filter with a sort-by-field approach, for example.
> Scoring tree nodes have references to the corresponding matching tree nodes, so they can exploit their "current state".
>
> Both trees are built with a visitor over some AST produced from the textual query, or programmatically. So what you have to do is write said visitors. Some of the basic scorers can be reused by your custom visitor, so voila - we have nice extensibility by composition instead of extensibility by inheritance (which sucks). Also, all this custom code is gathered in a single class, instead of being spread over your query derivatives.
> This is not a final design; lots of things can differ. E.g., the trees don't have to be parallel. If we want some query branch to do matching but not affect the score, we currently wrap it in ConstantScoreQuery; in my design the matcher tree will look as is, but the corresponding scorer tree branch will be replaced by ConstantScore.
>
> 2010/6/9 Shai Erera <[email protected]>:
> > I don't feel comfortable with the statement "these visitors are then free to specialize on matchers or not ...". Let's think about how this API will be used. Today, the user has two hooks: the QueryParser and the Collector. Collector allows you to plug in your own, and by extending QP you can return your own Query for different fragments.
> >
> > The Query is a full set though: Query + Weight + Scorer. Whether you extend an existing query and just override one of its methods is up to you, but the Query is still self-contained.
> >
> > If we break the Query API down into a Matcher and a Scorer, how will you provide your own Scorer? Collector is independent of the Query; it just collects the results. Will the Scorer be independent of Query too (and become an IndexSearcher.search() argument)? I don't think so, 'cause you want to know which Matcher you're up against in order to write a good Scorer. There's no point passing in a PhraseScorer if the query does not include any PhraseMatcher. So will you need to extend Query to return your own custom Scorer for certain fragments? Can't you do that today already (provided the API is not final, is public/protected etc.)?
> >
> > Earwin - is that what you had in mind? If so, let's first think about whether the current API is sufficient, given that we 'open' it for extension ... e.g., can someone achieve that by extending PhraseQuery, overriding createScorer and returning his own? Do we need more than that?
> >
> > I'm not saying we should refactor the API into Matcher + Scorer, just thinking about what we really need to do and what's the best way to achieve that.
> >
> > Shai
> >
> > On Wed, Jun 9, 2010 at 2:24 PM, Earwin Burrfoot <[email protected]> wrote:
> >>
> >> > Can we represent the Query state in some general structure, so that no matter which Query you get, you'll know how to score it?
> >>
> >> No. You could go for a unified interface that allows you to express different query states, like a set of untyped key-values, but you'll end up switching on these key-values in the end.
> >>
> >> It's better to define a set of matchers, and then produce visitors that compute scores. These visitors are then free to specialize on matchers or not, or ignore the whole tree completely.
> >>
> >> --
> >> Kirill Zakharenko/Кирилл Захаренко ([email protected])
> >> Phone: +7 (495) 683-567-4
> >> ICQ: 104465785
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
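Earwin's "two parallel trees" proposal can be sketched in Java roughly as follows. This is a minimal illustration under assumptions, not his actual design: the query AST (`QNode`, `TermNode`, `AndNode`, `ConstNode`), the visitors `MatcherBuilder` and `ScorerBuilder`, the toy `int[][]` postings index and the raw-term-frequency scoring are all hypothetical. The matching tree is built first and never computes scores; a second visitor then builds the scoring tree, whose nodes hold references to the corresponding matchers and read their current state. A `ConstNode` branch matches exactly like its child but contributes only a constant to the score.

```java
import java.util.*;

// Hypothetical sketch of the "two parallel trees" idea; none of these types are Lucene API.
interface Matcher {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
    int advance(int target); // first doc >= target; never moves backwards
}
interface ScoreNode { float score(); } // scores the current doc via its matcher's state
final class TermMatcher implements Matcher {
    private final int[] docs, freqs;
    private int pos = -1;
    TermMatcher(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }
    private int doc() { return pos < 0 ? -1 : pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
    public int nextDoc() { pos++; return doc(); }
    public int advance(int target) { while (doc() < target) pos++; return doc(); }
    int freq() { return freqs[pos]; } // "current state" a scorer can exploit
}
final class ConjunctionMatcher implements Matcher {
    private final List<Matcher> subs;
    ConjunctionMatcher(List<Matcher> subs) { this.subs = subs; }
    public int nextDoc() { return align(subs.get(0).nextDoc()); }
    public int advance(int target) { return align(subs.get(0).advance(target)); }
    // Leapfrog: move each clause to the candidate doc; restart when one overshoots it.
    private int align(int target) {
        for (int i = 0; i < subs.size() && target != NO_MORE_DOCS; ) {
            int d = subs.get(i).advance(target);
            if (d > target) { target = d; i = 0; } else { i++; }
        }
        return target;
    }
}
interface QueryVisitor<R> { R term(TermNode n); R and(AndNode n); R constant(ConstNode n); }
abstract class QNode { abstract <R> R accept(QueryVisitor<R> v); }
final class TermNode extends QNode {
    final String term;
    TermNode(String term) { this.term = term; }
    <R> R accept(QueryVisitor<R> v) { return v.term(this); }
}
final class AndNode extends QNode {
    final List<QNode> clauses;
    AndNode(QNode... clauses) { this.clauses = Arrays.asList(clauses); }
    <R> R accept(QueryVisitor<R> v) { return v.and(this); }
}
final class ConstNode extends QNode { // matches like its child; never affects the score
    final QNode child; final float value;
    ConstNode(QNode child, float value) { this.child = child; this.value = value; }
    <R> R accept(QueryVisitor<R> v) { return v.constant(this); }
}

// Builds the matching tree; it never computes scores, so it could also serve as a filter.
final class MatcherBuilder implements QueryVisitor<Matcher> {
    final Map<QNode, Matcher> byNode = new IdentityHashMap<>();
    private final Map<String, int[][]> index; // hypothetical: term -> {doc ids, term freqs}
    MatcherBuilder(Map<String, int[][]> index) { this.index = index; }
    public Matcher term(TermNode n) {
        int[][] p = index.getOrDefault(n.term, new int[][] {{}, {}});
        return keep(n, new TermMatcher(p[0], p[1]));
    }
    public Matcher and(AndNode n) {
        List<Matcher> subs = new ArrayList<>();
        for (QNode c : n.clauses) subs.add(c.accept(this));
        return keep(n, new ConjunctionMatcher(subs));
    }
    // The matching tree stays "as is": a constant-score branch matches exactly like its child.
    public Matcher constant(ConstNode n) { return keep(n, n.child.accept(this)); }
    private Matcher keep(QNode n, Matcher m) { byNode.put(n, m); return m; }
}

// Builds the scoring tree; each node holds a reference to its matching-tree counterpart.
final class ScorerBuilder implements QueryVisitor<ScoreNode> {
    private final Map<QNode, Matcher> byNode;
    ScorerBuilder(Map<QNode, Matcher> byNode) { this.byNode = byNode; }
    public ScoreNode term(TermNode n) {
        TermMatcher m = (TermMatcher) byNode.get(n);
        return () -> (float) m.freq(); // hypothetical scoring: raw term frequency
    }
    public ScoreNode and(AndNode n) {
        List<ScoreNode> subs = new ArrayList<>();
        for (QNode c : n.clauses) subs.add(c.accept(this));
        return () -> { float s = 0f; for (ScoreNode x : subs) s += x.score(); return s; };
    }
    // Only the scoring branch becomes a constant; the matching branch is untouched.
    public ScoreNode constant(ConstNode n) { return () -> n.value; }
}

final class TwoTrees {
    static Map<Integer, Float> search(QNode query, Map<String, int[][]> index) {
        MatcherBuilder mb = new MatcherBuilder(index);
        Matcher matcher = query.accept(mb);                            // matching tree
        ScoreNode scorer = query.accept(new ScorerBuilder(mb.byNode)); // scoring tree
        Map<Integer, Float> hits = new LinkedHashMap<>();
        for (int d = matcher.nextDoc(); d != Matcher.NO_MORE_DOCS; d = matcher.nextDoc()) {
            hits.put(d, scorer.score()); // scorers read the matchers' current positions
        }
        return hits;
    }
}
```

For example, with `lucene` in docs 1, 2 and 5 (term frequencies 2, 1 and 3) and `scoring` in docs 2, 5 and 7 (term frequencies 4, 1 and 1), searching `new AndNode(new TermNode("lucene"), new TermNode("scoring"))` returns `{2=5.0, 5=4.0}`, while `new AndNode(new TermNode("lucene"), new ConstNode(new TermNode("scoring"), 0f))` matches the same two docs but returns `{2=1.0, 5=3.0}`. A different scoring scheme would be another `QueryVisitor<ScoreNode>` over the same matching tree, which is the composition-over-inheritance point made above.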
