On Mon, Nov 16, 2009 at 9:44 AM, Earwin Burrfoot <ear...@gmail.com> wrote: > This algo is strictly tied to sort-by-score, if I understand it correctly. > Lucene has queries and sorting decoupled (except for allowOutOfOrder > mess), so implementing it would require some really fat hacks. >
According to the paper on Indexing Boolean Expression (using the WAND algo), sorting can be done based on scores that are determined based weight assignment to key-value pairs: http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf So I believe this can be generalized to sorting by any doc attributes given the proper weight assignment model Of course, the devil-is-in-the-details :-( -- Joaquin > On Mon, Nov 16, 2009 at 20:26, J. Delgado <joaquin.delg...@gmail.com> wrote: >> As I understood it setMinimumNumberShouldMatch(int min) Is used to >> specify a minimum number of the optional BooleanClauses which must be >> satisfied. >> >> I haven't seen the implementation of setMinimumNumberShouldMatch but >> it seems a bit different than what is intended with the WAND operator, >> which can take any real number as threshold θ >> >> As stated in the paper: >> >> WAND(X1,w1, . . . Xk,wk, θ) is true iff X 1≤i≤k and SUM(xiwi)≥ θ >> >> where xi is the indicator variable for Xi, that is xi = 1, if Xi is >> true 0, otherwise. >> >> Observe that WAND can be used to implement AND >> and OR via >> AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k), >> and >> OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1). >> >> What I find interesting is the idea of using a first pass using the >> upper bound (maximal) contribution of a term on any document score and >> the dynamic setting of the threshold θ to skip or to fully evaluate a >> document.. >> >> As stated in the paper: >> >> "Given this setup our preliminary scoring consists of evaluating >> for each document d >> WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ), >> where Xi is an indicator variable for the presence of query term i in >> document d and the threshold θ is varied during >> the algorithm as explained below. If WAND evaluates to true, then the >> document d undergoes a full evaluation. >> The threshold θ is set dynamically by the algorithm based on the >> minimum score m among the top n results found so >> far, where n is the number of requested documents. The larger the >> threshold, the more documents will be skipped >> and thus we will need to compute full scores for fewer documents." >> >> I think its worth a try... >> >> -- Joaquin >> >> On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <a...@getopt.org> wrote: >>> >>> J. Delgado wrote: >>>> >>>> Here is the link to the paper. >>>> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf >>>> >>>> A more recent application of the use and extension of the WAND operator for >>>> indexing of Boolean expressions: >>>> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf >>>> >>>> -- Joaquin >>>> >>>> >>>> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar < >>>> shalinman...@gmail.com> wrote: >>>> >>>>> Hey Joaquin, >>>>> >>>>> The mailing list strips off attachments. Can you please upload it >>>>> somewhere >>>>> and give us the link? >>>>> >>>>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delg...@gmail.com >>>>>> >>>>>> wrote: >>>>>> Please find attached the paper on "Efficient Query Evaluation using a >>>>>> Two-Level Retrieval Process". I believe that such approach may improve >>>>> >>>>> the >>>>>> >>>>>> way Lucene/Solr evaluates queries today. >>> >>> The functionality of WAND (weak AND) is already implemented in Lucene, if I >>> understand it correctly - this is the BooleanQuery.setMinShouldMatch(int). >>> Lucene implements this probably differently from the algorithm described in >>> the paper, so there may be still some benefits from comparing the >>> algorithms in Lucene's BooleanScorer[2] with this one ... >>> >>> >>> -- >>> Best regards, >>> Andrzej Bialecki <>< >>> ___. ___ ___ ___ _ _ __________________________________ >>> [__ || __|__/|__||\/| Information Retrieval, Semantic Web >>> ___|||__|| \| || | Embedded Unix, System Integration >>> http://www.sigram.com Contact: info at sigram dot com >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-dev-h...@lucene.apache.org >> >> > > > > -- > Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) > Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 > ICQ: 104465785 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org