Op maandag 16 november 2009 19:09:52 schreef J. Delgado: > On Mon, Nov 16, 2009 at 9:44 AM, Earwin Burrfoot <ear...@gmail.com> wrote: > > This algo is strictly tied to sort-by-score, if I understand it correctly. > > Lucene has queries and sorting decoupled (except for allowOutOfOrder > > mess), so implementing it would require some really fat hacks. > > > > According to the paper on Indexing Boolean Expression (using the WAND > algo), sorting can be done based on scores that are determined based > weight assignment to key-value pairs: > > http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf > > So I believe this can be generalized to sorting by any doc attributes > given the proper weight assignment model > > Of course, the devil-is-in-the-details :-(
Certainly. There is also the somewhat related issue on avoiding the use of positions when not all required terms are present. For the moment there are only ideas there: https://issues.apache.org/jira/browse/LUCENE-1252 Regards, Paul Elschot > > -- Joaquin > > > > On Mon, Nov 16, 2009 at 20:26, J. Delgado <joaquin.delg...@gmail.com> wrote: > >> As I understood it setMinimumNumberShouldMatch(int min) Is used to > >> specify a minimum number of the optional BooleanClauses which must be > >> satisfied. > >> > >> I haven't seen the implementation of setMinimumNumberShouldMatch but > >> it seems a bit different than what is intended with the WAND operator, > >> which can take any real number as threshold θ > >> > >> As stated in the paper: > >> > >> WAND(X1,w1, . . . Xk,wk, θ) is true iff X 1≤i≤k and SUM(xiwi)≥ θ > >> > >> where xi is the indicator variable for Xi, that is xi = 1, if Xi is > >> true 0, otherwise. > >> > >> Observe that WAND can be used to implement AND > >> and OR via > >> AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k), > >> and > >> OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1). > >> > >> What I find interesting is the idea of using a first pass using the > >> upper bound (maximal) contribution of a term on any document score and > >> the dynamic setting of the threshold θ to skip or to fully evaluate a > >> document.. > >> > >> As stated in the paper: > >> > >> "Given this setup our preliminary scoring consists of evaluating > >> for each document d > >> WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ), > >> where Xi is an indicator variable for the presence of query term i in > >> document d and the threshold θ is varied during > >> the algorithm as explained below. If WAND evaluates to true, then the > >> document d undergoes a full evaluation. > >> The threshold θ is set dynamically by the algorithm based on the > >> minimum score m among the top n results found so > >> far, where n is the number of requested documents. The larger the > >> threshold, the more documents will be skipped > >> and thus we will need to compute full scores for fewer documents." > >> > >> I think its worth a try... > >> > >> -- Joaquin > >> > >> On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <a...@getopt.org> wrote: > >>> > >>> J. Delgado wrote: > >>>> > >>>> Here is the link to the paper. > >>>> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf > >>>> > >>>> A more recent application of the use and extension of the WAND operator > >>>> for > >>>> indexing of Boolean expressions: > >>>> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf > >>>> > >>>> -- Joaquin > >>>> > >>>> > >>>> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar < > >>>> shalinman...@gmail.com> wrote: > >>>> > >>>>> Hey Joaquin, > >>>>> > >>>>> The mailing list strips off attachments. Can you please upload it > >>>>> somewhere > >>>>> and give us the link? > >>>>> > >>>>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delg...@gmail.com > >>>>>> > >>>>>> wrote: > >>>>>> Please find attached the paper on "Efficient Query Evaluation using a > >>>>>> Two-Level Retrieval Process". I believe that such approach may improve > >>>>> > >>>>> the > >>>>>> > >>>>>> way Lucene/Solr evaluates queries today. > >>> > >>> The functionality of WAND (weak AND) is already implemented in Lucene, if > >>> I understand it correctly - this is the > >>> BooleanQuery.setMinShouldMatch(int). Lucene implements this probably > >>> differently from the algorithm described in the paper, so there may be > >>> still some benefits from comparing the algorithms in Lucene's > >>> BooleanScorer[2] with this one ... > >>> > >>> > >>> -- > >>> Best regards, > >>> Andrzej Bialecki <>< > >>> ___. ___ ___ ___ _ _ __________________________________ > >>> [__ || __|__/|__||\/| Information Retrieval, Semantic Web > >>> ___|||__|| \| || | Embedded Unix, System Integration > >>> http://www.sigram.com Contact: info at sigram dot com > >>> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: java-dev-h...@lucene.apache.org > >> > >> > > > > > > > > -- > > Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) > > Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 > > ICQ: 104465785 > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org