Re: Indexing Boolean Expressions

Walter Underwood Mon, 26 Mar 2012 10:18:06 -0700

Efficient rule matching goes further back, at least to "alerting" in Verity K2.


wunder
Search Guy, Chegg

On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:

> BTW, the idea of indexing Boolean Expressions inside a text indexing engine 
> is not new. For example, Oracle Text provides the CTXRULE index and the 
> MATCHES operator within its indexing stack, which are primarily used for 
> rule-based text classification.
> 
> See:
> 
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
> 
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
> 
> -- J
> 
> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado wrote:
> In full disclosure, Yahoo! has filed a patent application covering the use 
> of inverted indexes with complex predicates to match contracts and 
> opportunities in advertising:
> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
> 
> However, I believe enough other applications can benefit from similar 
> matching techniques (e.g. recommender systems, e-commerce, recruiting) to 
> make it worthwhile to implement the ideas presented in the original VLDB'09 
> paper (which is public) in Lucene.
> 
> As a Yahoo! employee, I might not be able to contribute directly to this 
> project, but I will be happy to share any publicly available pointers that 
> can help.
> 
> Cheers,
> 
> -- Joaquin
> 
> 
> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev wrote:
> Hello Joaquin,
> 
> I looked through the paper several times and see no problem implementing it 
> in Lucene (the trivial case, at least):
> 
> Let's index a conjunctive condition as a document:
>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
> 
> Then, form a query from the incoming fact (event):
> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
> 
> To enforce that the condition is fully covered by the event, wrap the query 
> above in a custom query whose scorer checks that numClauses for the matched 
> doc equals the number of matched clauses. 
> To get "numClauses for the matched doc" you can use FieldCache, which is damn 
> fast; the "number of matched clauses" can be obtained from 
> DisjunctionSumScorer.nrMatchers().
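The counting check described above can be simulated outside Lucene to see that the logic holds. Below is a minimal Python sketch under stated assumptions: each conjunction has at most one value per field, a dict of posting lists stands in for the inverted index, a plain list stands in for the numClauses values that would be read through FieldCache, and the per-conjunction hit counter plays the role of DisjunctionSumScorer.nrMatchers(). ConjunctionIndex and its methods are hypothetical names, not Lucene API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (field, value)


class ConjunctionIndex:
    """Each stored conjunction acts like a Lucene document whose fields are the
    condition's field:value pairs plus a numClauses value (hypothetical class)."""

    def __init__(self) -> None:
        self.postings: Dict[Pair, List[int]] = defaultdict(list)  # inverted index
        self.num_clauses: List[int] = []  # stands in for the FieldCache lookup

    def add(self, conjunction: Dict[str, str]) -> int:
        doc_id = len(self.num_clauses)
        for pair in conjunction.items():
            self.postings[pair].append(doc_id)
        self.num_clauses.append(len(conjunction))
        return doc_id

    def match(self, event: Dict[str, str]) -> List[int]:
        # OR together every field:value pair of the event and count, per stored
        # conjunction, how many of its clauses matched (cf. nrMatchers()).
        matched: Dict[int, int] = defaultdict(int)
        for pair in event.items():
            for doc_id in self.postings.get(pair, []):
                matched[doc_id] += 1
        # Keep only conjunctions whose every clause is satisfied by the event.
        return sorted(d for d, n in matched.items() if n == self.num_clauses[d])


if __name__ == "__main__":
    idx = ConjunctionIndex()
    idx.add({"fieldA": "valA", "fieldB": "valB", "fieldC": "valC"})  # id 0
    idx.add({"fieldA": "valA", "fieldE": "valE"})                    # id 1
    event = {"fieldA": "valA", "fieldB": "valB", "fieldC": "valC", "fieldD": "valD"}
    print(idx.match(event))  # [0]
```

Conjunction 1 is rejected because only one of its two clauses (fieldA:valA) is present in the event, which is exactly the numClauses == matched-clauses test.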
> 
> Negative clauses and multivalued fields can also be covered, I believe.  
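One way the negative and multivalued cases might be handled is sketched below in Python. The rules are assumptions introduced here, not taken from the thread or the paper: an event may carry several values per field; a condition is rejected if any of its negative clauses is hit; and a condition with no positive clauses is always a candidate, which in Lucene would likely need something like an extra match-all clause. ConditionIndex is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (field, value)


class ConditionIndex:
    """Conditions made of required (positive) and forbidden (negative) terms."""

    def __init__(self) -> None:
        self.pos: Dict[Pair, List[int]] = defaultdict(list)
        self.neg: Dict[Pair, List[int]] = defaultdict(list)
        self.num_pos: List[int] = []   # positive clause count per condition
        self.only_neg: List[int] = []  # conditions with no positive clauses

    def add(self, positive: Iterable[Pair], negative: Iterable[Pair] = ()) -> int:
        cid = len(self.num_pos)
        pos = set(positive)  # de-duplicate so the clause count is exact
        for pair in pos:
            self.pos[pair].append(cid)
        for pair in set(negative):
            self.neg[pair].append(cid)
        self.num_pos.append(len(pos))
        if not pos:
            self.only_neg.append(cid)
        return cid

    def match(self, event: Dict[str, Set[str]]) -> List[int]:
        # Multivalued event: every (field, value) pair counts as a separate term.
        pairs = [(f, v) for f, values in event.items() for v in values]
        hits: Dict[int, int] = defaultdict(int)
        for cid in self.only_neg:
            hits[cid] = 0  # candidate even though no positive term can hit it
        excluded: Set[int] = set()
        for pair in pairs:
            for cid in self.pos.get(pair, []):
                hits[cid] += 1
            excluded.update(self.neg.get(pair, []))
        return sorted(
            cid for cid, n in hits.items()
            if n == self.num_pos[cid] and cid not in excluded
        )


if __name__ == "__main__":
    idx = ConditionIndex()
    idx.add([("state", "CA")], [("age", "3")])      # 0: state:CA AND NOT age:3
    idx.add([("color", "red"), ("color", "blue")])  # 1: color:red AND color:blue
    idx.add([], [("state", "NY")])                  # 2: NOT state:NY
    print(idx.match({"state": {"CA"}, "age": {"4"}, "color": {"red", "blue"}}))  # [0, 1, 2]
    print(idx.match({"state": {"CA"}, "age": {"3"}}))  # [2]
```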
> 
> WDYT?
> 
> 
> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado wrote:
> I looked at LUCENE-2987 and its work on the query side (changing the 
> accepted syntax to allow lowercase 'or' and 'and'), which isn't really 
> related to my proposal.
> 
> What I'm proposing is to be able to index complex boolean expressions using 
> Lucene. This can be viewed as the inverse of the regular search task: the 
> objective here is to find the set of relevant queries given a document (an 
> assignment of values to fields).
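As a baseline, the inverted task can be written as a brute-force loop that evaluates every stored expression against the incoming document. The Python sketch below assumes single-valued fields and exact-match terms; Term, And, Or, Not, evaluate, and reverse_search are hypothetical names, not Lucene classes. An index over the expressions exists precisely to avoid this linear scan.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# A document is an assignment of values to fields (single-valued here).
Doc = Dict[str, str]


@dataclass(frozen=True)
class Term:
    field: str
    value: str


@dataclass(frozen=True)
class And:
    children: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Expr", ...]


@dataclass(frozen=True)
class Not:
    child: "Expr"


Expr = Union[Term, And, Or, Not]


def evaluate(expr: Expr, doc: Doc) -> bool:
    """Recursively evaluate a boolean expression against one document."""
    if isinstance(expr, Term):
        return doc.get(expr.field) == expr.value
    if isinstance(expr, And):
        return all(evaluate(c, doc) for c in expr.children)
    if isinstance(expr, Or):
        return any(evaluate(c, doc) for c in expr.children)
    if isinstance(expr, Not):
        return not evaluate(expr.child, doc)
    raise TypeError(f"unknown expression node: {expr!r}")


def reverse_search(queries: Dict[str, Expr], doc: Doc) -> List[str]:
    """Return the ids of all stored queries the document satisfies.

    Linear in the number of stored queries: every expression is evaluated.
    """
    return sorted(qid for qid, q in queries.items() if evaluate(q, doc))


if __name__ == "__main__":
    queries = {
        "q1": And((Term("state", "CA"), Term("gender", "F"))),
        "q2": Or((Term("state", "NY"), Term("age", "3"))),
        "q3": And((Term("state", "CA"), Not(Term("age", "3")))),
    }
    print(reverse_search(queries, {"state": "CA", "gender": "F", "age": "3"}))  # ['q1', 'q2']
```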
> 
> This by itself may not sound that interesting, but it's a key piece in 
> efficiently implementing any MATCHING system, which is effectively a two-way 
> search where constraints are defined in both directions. Examples include:
> 
> 1) Job matching: Potential employers define their "job posting" as a 
> document along with complex boolean expressions used to narrow down potential 
> candidates. Job seekers upload their "profile" and may formulate complex 
> queries when executing a search. Once a search is initiated from either 
> side, constraints need to be satisfied both ways. 
> 2) Advertising: Publishers define constraints on the type of advertisers/ads 
> they are willing to show on their sites. On the other hand, advertisers 
> define constraints (typically at the campaign level) on the publisher sites 
> where they want their ads to appear, as well as on the user audiences they 
> are targeting. While some attribute values are known at definition time, 
> others are only instantiated once the user visits a given page, which 
> triggers a matching request that must be served within a few milliseconds: 
> "valid" ads are selected and then scored based on "relevance".
> 
> So in a matching system, a MATCH QUERY is considered to be a tuple that 
> consists of a value assignment to attributes/fields (doc) plus a boolean 
> expression (query), evaluated against a double index that is also built on 
> tuples simultaneously holding boolean expressions and their associated 
> documents.
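The tuple definition can be made concrete with a brute-force sketch of the two-way check. In this Python sketch, plain callables stand in for the boolean expressions, and Party and two_way_match are hypothetical names; a real system would presumably serve each direction from an index (a regular inverted index over the documents, an expression index over the constraints) instead of scanning every candidate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, str]  # value assignment to attributes/fields


@dataclass
class Party:
    """One side of a match: what it is (doc) and what it requires (wants)."""
    name: str
    doc: Doc
    wants: Callable[[Doc], bool]  # boolean expression over the other side's doc


def two_way_match(query: Party, candidates: List[Party]) -> List[str]:
    """A candidate matches only if each side satisfies the other's expression."""
    return [
        c.name
        for c in candidates
        if query.wants(c.doc) and c.wants(query.doc)
    ]


if __name__ == "__main__":
    jobs = [
        Party("job1", {"title": "engineer", "city": "SF"}, lambda d: d.get("degree") == "CS"),
        Party("job2", {"title": "engineer", "city": "NY"}, lambda d: d.get("years") == "5+"),
        Party("job3", {"title": "designer", "city": "SF"}, lambda d: True),
    ]
    seeker = Party("alice", {"degree": "CS", "years": "2"}, lambda d: d.get("title") == "engineer")
    print(two_way_match(seeker, jobs))  # ['job1']
```

In the job-matching example, job2 passes the seeker's constraint but rejects the seeker's profile, and job3 fails the seeker's constraint, so only job1 satisfies both directions.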
> 
> To do this efficiently, we need to be able to build indexes on Boolean 
> expressions (Lucene queries) and retrieve the set of matching expressions 
> given a doc (typically a few attributes with assigned values), which is the 
> core of what is described in this paper: "Indexing Boolean Expressions" (see 
> http://www.vldb.org/pvldb/2/vldb09-83.pdf)
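One idea from the linked paper, roughly, is to partition conjunctions by their size K (number of predicates) so that, for an assignment with fewer than K attributes, the size-K partition can be skipped without reading any postings. The Python sketch below shows only that pruning combined with simple counting, not the paper's full algorithm; it assumes equality predicates on distinct fields and a single-valued assignment, and SizePartitionedIndex is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Pair = Tuple[str, str]  # (field, value)


class SizePartitionedIndex:
    """Conjunctions of equality predicates, grouped by their size K.

    Empty conjunctions are never stored, so they never match in this sketch.
    """

    def __init__(self) -> None:
        # K -> (field, value) -> ids of size-K conjunctions containing that pair
        self.partitions: DefaultDict[int, DefaultDict[Pair, List[int]]] = defaultdict(
            lambda: defaultdict(list)
        )
        self.next_id = 0

    def add(self, conjunction: Dict[str, str]) -> int:
        cid = self.next_id
        self.next_id += 1
        k = len(conjunction)
        for pair in conjunction.items():
            self.partitions[k][pair].append(cid)
        return cid

    def match(self, assignment: Dict[str, str]) -> List[int]:
        result: List[int] = []
        for k, postings in self.partitions.items():
            # A size-K conjunction constrains K distinct fields, so an assignment
            # with fewer than K fields cannot satisfy it: skip the partition.
            if k > len(assignment):
                continue
            counts: Dict[int, int] = defaultdict(int)
            for pair in assignment.items():
                for cid in postings.get(pair, []):
                    counts[cid] += 1
            result.extend(cid for cid, n in counts.items() if n == k)
        return sorted(result)


if __name__ == "__main__":
    idx = SizePartitionedIndex()
    idx.add({"state": "CA"})                               # 0
    idx.add({"state": "CA", "age": "3"})                   # 1
    idx.add({"state": "CA", "age": "3", "gender": "F"})    # 2
    print(idx.match({"state": "CA", "age": "3"}))  # [0, 1]; size-3 partition skipped
```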
> 
> -- J
> 
> 
> So to effectively resolve the problem of realtime matching one can 
> 
> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera wrote:
> On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>> 
>> 
>> 
>> 
>> So if Aayush Kothari is interested in working on this as a student, all we 
>> need is a formal mentor (I can be the informal one). 
>> 
>> Anyone up for the task?
>> 
>> 
>> Completely interested in working on and learning about the aforementioned 
>> subject/project. +1.  
> This may be related to the work I'm doing on LUCENE-2987, which basically 
> changes the grammar to accept the conjunctions AND and OR in the query 
> text.
> I would be interested in working with you on some of the details.
> 
> However, I too am not a formal committer.
> 
> -- 
> Joe Cabrera
> eminorlabs.com
> 
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
