For full disclosure, there is a patent application that Yahoo! has filed on using inverted indexes over complex predicates to match contracts and opportunities in advertising: http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
However, I believe there are many more applications that can benefit from similar matching techniques (e.g. recommender systems, e-commerce, recruiting, etc.) to make it worthwhile to implement the ideas described in the original VLDB'09 paper (which is public) in Lucene. As a Yahoo! employee, I might not be able to contribute directly to this project, but I will be happy to point to any publicly available material that can help.

Cheers,
-- Joaquin

On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <[email protected]> wrote:
> Hello Joaquin,
>
> I looked through the paper several times, and I see no problem implementing
> it in Lucene (the trivial case at least):
>
> Let's index a conjunctive condition as
> {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>
> then form a query from the incoming fact (event):
> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>
> To enforce overlap between the condition and the event, wrap the query above
> in a custom query whose scorer checks that numClauses for the matched doc is
> equal to the number of matched clauses.
> To get "numClauses for the matched doc" you can use FieldCache, which is damn
> fast; and "number of matched clauses" can be obtained from
> DisjunctionSumScorer.nrMatchers()
>
> Negative clauses and multivalued fields can be covered too, I believe.
>
> WDYT?
>
>
> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <[email protected]> wrote:
>
>> I looked at LUCENE-2987 and its work on the query side (changes to the
>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>> related to my proposal.
>>
>> What I'm proposing is to be able to index complex boolean expressions
>> using Lucene. This can be viewed as the opposite of the regular search
>> task. The objective here is to find the set of relevant queries given a
>> document (an assignment of values to fields).
>>
>> This by itself may not sound that interesting, but it's a key piece in
>> efficiently implementing any MATCHING system, which is effectively a
>> two-way search where constraints are defined both ways. Examples of this
>> would be:
>>
>> 1) Job matching: Potential employers define their "job posting" as a
>> document along with complex boolean expressions used to narrow down
>> potential candidates. Job seekers upload their "profile" and may formulate
>> complex queries when executing a search. Once a search is initiated from
>> either side, constraints need to be satisfied both ways.
>> 2) Advertising: Publishers define constraints on the type of
>> advertisers/ads they are willing to show on their sites. On the other hand,
>> advertisers define constraints (typically at the campaign level) on the
>> publisher sites they want their ads to show on, as well as on the user
>> audiences they are targeting. While some attribute values are known at
>> definition time, others are only instantiated once the user visits a given
>> page, which triggers a matching request that must be satisfied within a
>> few milliseconds to select "valid" ads, which are then scored based on
>> "relevance".
>>
>> So in a matching system a MATCH QUERY is considered to be a tuple that
>> consists of a value assignment to attributes/fields (doc) + a boolean
>> expression (query) that goes against a double index, also built on tuples
>> that simultaneously hold boolean expressions and their associated documents.
>>
>> To do this efficiently we need to be able to build indexes on Boolean
>> expressions (Lucene queries) and retrieve the set of matching expressions
>> given a doc (typically a few attributes with values assigned), which is the
>> core of what is described in this paper: "Indexing Boolean Expressions"
>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>
>> -- J
>>
>>
>> So to effectively resolve the problem of real-time matching one can
>>
>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <[email protected]> wrote:
>>
>>> On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>
>>>> So if Aayush Kothari is interested in working on this as a student,
>>>> all we need is a formal mentor (I can be the informal one).
>>>>
>>>> Anyone up for the task?
>>>>
>>>> Completely interested in working on and learning about the
>>>> aforementioned subject/project. +1.
>>>
>>> This may be related to the work I'm doing with LUCENE-2987:
>>> basically changing the grammar to accept the conjunctions AND and OR in
>>> the query text.
>>> I would be interested in working with you on some of the details.
>>>
>>> However, I too am not a formal committer.
>>>
>>> --
>>> Joe Cabrera
>>> eminorlabs.com
>>>
>>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <[email protected]>
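Mikhail's counting idea can be illustrated without Lucene at all. Below is a minimal, hypothetical sketch in plain Java (the names ConjunctionIndex, add and match are invented for illustration and are not Lucene APIs): each indexed conjunction records its numClauses, the event's field:value pairs are looked up in an inverted index as one big OR, and a conjunction matches only when the number of hit clauses equals its numClauses. The handling of negated clauses (any hit vetoes the conjunction) is an assumption based on the remark that negative clauses can be covered, and the sketch assumes at most one value per field in an event.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory sketch of "index conjunctions, match events by
 * counting matched clauses". Plain Java only; no Lucene classes are used.
 */
class ConjunctionIndex {
    // Inverted index: "field:value" -> ids of conjunctions requiring that clause.
    private final Map<String, List<Integer>> positivePostings = new HashMap<>();
    // Inverted index: "field:value" -> ids of conjunctions negating that clause (assumption).
    private final Map<String, List<Integer>> negativePostings = new HashMap<>();
    // Per-conjunction count of positive clauses; plays the role of the numClauses field.
    private final List<Integer> numClauses = new ArrayList<>();

    /** Indexes a conjunction and returns its id (its position in insertion order). */
    int add(Map<String, String> mustHave, Map<String, String> mustNotHave) {
        int id = numClauses.size();
        numClauses.add(mustHave.size());
        for (Map.Entry<String, String> e : mustHave.entrySet()) {
            positivePostings.computeIfAbsent(term(e.getKey(), e.getValue()), k -> new ArrayList<>()).add(id);
        }
        for (Map.Entry<String, String> e : mustNotHave.entrySet()) {
            negativePostings.computeIfAbsent(term(e.getKey(), e.getValue()), k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    /**
     * Returns ids of conjunctions satisfied by the event (one value per field).
     * Limitation: a conjunction with zero positive clauses is never returned.
     */
    Set<Integer> match(Map<String, String> event) {
        Map<Integer, Integer> hits = new HashMap<>();
        Set<Integer> vetoed = new HashSet<>();
        for (Map.Entry<String, String> e : event.entrySet()) {
            String t = term(e.getKey(), e.getValue());
            // OR over the event's terms: count matched clauses per conjunction.
            for (int id : positivePostings.getOrDefault(t, Collections.emptyList())) {
                hits.merge(id, 1, Integer::sum);
            }
            // A matched negated clause disqualifies the conjunction.
            for (int id : negativePostings.getOrDefault(t, Collections.emptyList())) {
                vetoed.add(id);
            }
        }
        Set<Integer> result = new HashSet<>();
        for (Map.Entry<Integer, Integer> h : hits.entrySet()) {
            int id = h.getKey();
            int matched = h.getValue();
            // Keep a conjunction only if every one of its positive clauses was hit.
            if (matched == numClauses.get(id) && !vetoed.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    private static String term(String field, String value) {
        return field + ":" + value;
    }
}
```

With Mikhail's example, indexing {fieldA:valA, fieldB:valB, fieldC:valC} (numClauses = 3) and matching the event {fieldA:valA, fieldB:valB, fieldC:valC, fieldD:valD} returns that conjunction, because all three of its clauses are hit; an event without fieldC would hit only two clauses and be rejected. In Lucene itself the per-document numClauses would come from something like FieldCache and the hit count from the disjunction scorer, as suggested in the thread; the VLDB'09 paper's own algorithm is more elaborate (for instance, it also handles conjunctions made only of negated clauses), which this sketch does not attempt.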
