Re: Indexing Boolean Expressions

J. Delgado Mon, 11 Feb 2013 20:10:03 -0800

Yes and no.

Most of the sponsored search and display ad systems have a distributed
"matching service" that select valid ads from each of the distributed node
and then computes actual bids locally for each ad based on bidding models.
Only the top bids per node are sent over to the "auction service"


Now, there has been some work on how to bake the bid model into a relevance
score to be able to select top N from the index, however it is not as
simple as it seems as some of these prediction/bidding models
use sophisticated machine learning algorithms based on training samples and
somewhat complex objective functions.

-- Joaquin


On Mon, Feb 11, 2013 at 10:02 AM, Otis Gospodnetic <
[email protected]> wrote:

> Yeah.... except with Percolator (which uses MemoryIndex under the hood, I
> believe), there is no relevancy score, so you just get match/no match, and
> if you need to figure out top N best matching ads against queries derived
> from page content, you need that relevancy score to get not all matching
> docs, but just those top N.
>
> No?
>
> Otis
> --
> http://sematext.com/
>
>
>
>
>
> On Mon, Feb 11, 2013 at 11:22 AM, J. Delgado <[email protected]>wrote:
>
>> I guess ElasticSearch went ahead of SOLR with the percolate API, which is
>> exactly what is needed for two-way constraint+doc matching problem present
>> in Advertising systems and other use cases:
>>
>> http://www.elasticsearch.org/guide/reference/api/percolate.html
>>
>> Cheers,
>>
>> Joaquin Delgado, PhD.
>> http://www.linkedin.com/pub/profile/0/04b/277
>>
>> On Mon, Mar 26, 2012 at 10:17 AM, Walter Underwood <[email protected]
>> > wrote:
>>
>>> Efficient rule matching goes further back, at least to "alerting" in
>>> Verity K2.
>>>
>>> wunder
>>> Search Guy, Chegg
>>>
>>> On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:
>>>
>>> BTW, the idea of indexing Boolean Expressions inside a text indexing
>>> engine is not new. For example Oracle Text provides the CTXRULE index and
>>> the MATCHES operator within their indexing stack, which is primarily used
>>> for Rule-based text classification.
>>>
>>> See:
>>>
>>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
>>>
>>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
>>>
>>> -- J
>>>
>>> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado 
>>> <[email protected]>wrote:
>>>
>>>> In full dislosure, there is a patent application that Yahoo! has filed
>>>> for the use of inverted indexes for using complex  predicates for matching
>>>> contracts and opportunities in advertising:
>>>>
>>>> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
>>>>
>>>> However I believe there are many more applications that can benefit
>>>> from similar matching techniques (i.e. recommender systems,
>>>> e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
>>>> exposed in the original VLDB'09 paper (which is public) in Lucene.
>>>>
>>>> As a Yahoo! employee, I might not be able to directly contribute to
>>>> this project but will be happy to point to any publicly available pointer
>>>> that can help.
>>>>
>>>> Cheers,
>>>>
>>>> -- Joaquin
>>>>
>>>>
>>>> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello Joaquin,
>>>>>
>>>>> I looked through the paper several times, and see no problem to
>>>>> implement it in Lucene (the trivial case at least):
>>>>>
>>>>> Let's index conjunctive condition as
>>>>>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>>>>>
>>>>> then, form query from the incoming fact (event):
>>>>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>>>>>
>>>>> to enforce overlap between condition and event, wrap the query above
>>>>> into own query whose scorer will check that numClauses for the matched doc
>>>>> is equal to number of matched clauses.
>>>>> To get "numClauses for the matched doc" you can use FieldCache that's
>>>>> damn fast; and "number of matched clauses" can be obtained from
>>>>> DisjunctionSumScorer.nrMatchers()
>>>>>
>>>>> Negative clauses, and multivalue can be covered also, I believe.
>>>>>
>>>>> WDYT?
>>>>>
>>>>>
>>>>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> I looked at LUCENE-2987 and its work on the query side (changes to
>>>>>> the accepted syntax to accept lower case 'or' and 'and'), which isn't
>>>>>> really related to my proposal.
>>>>>>
>>>>>> What I'm proposing is to be able to index complex boolean expressions
>>>>>> using Lucene. This can be viewed as the opposite of the regular search
>>>>>> task. The objective here is find a set of relevant queries given a 
>>>>>> document
>>>>>> (assignment of values to fields).
>>>>>>
>>>>>> This by itself may not sound that interesting but its a key piece
>>>>>> to efficiently implementing any MATCHING system which is effectively a
>>>>>> two-way search where constraints are defined both-ways. An example of 
>>>>>> this
>>>>>> would be:
>>>>>>
>>>>>> 1) Job matching: Potential employers define their "job posting" as a
>>>>>> documents along with complex boolean expressions used to narrow potential
>>>>>> candidates. Job searchers upload their "profile" and may formulate 
>>>>>> complex
>>>>>> queries when executing a search. Once a is search initiated from any of 
>>>>>> the
>>>>>> sides constraints need to satisfied both ways.
>>>>>> 2) Advertising: Publishers define constraints on the type of
>>>>>> advertisers/ads they are willing to show in their sites. On the other 
>>>>>> hand,
>>>>>> advertisers define constraints (typically at the campaign level) on
>>>>>> publisher sites they want their ads to show at as well as on the user
>>>>>> audiences they are targeting to. While some attribute values are known at
>>>>>> definition time, others are only instantiated once the user visits a 
>>>>>> given
>>>>>> page which triggers a matching request that must be satisfied in
>>>>>> few milliseconds to select "valid" ads and then scored based on 
>>>>>> "relevance".
>>>>>>
>>>>>> So in a matching system a MATCH QUERY is considered to to be a tuple
>>>>>> that consists of a value assignment to attributes/fields (doc) + a 
>>>>>> boolean
>>>>>> expression (query) that goes against a double index also built on tuples
>>>>>> that  simultaneously boolean expressions and associated documents.
>>>>>>
>>>>>> To do this efficiently we need to be able to build indexes on Boolean
>>>>>> expressions (Lucene queries) and retrieve the set of matching expressions
>>>>>> given a doc (typically few attributes with values assigned), which is the
>>>>>> core of what is described in this paper: "Indexing Boolean
>>>>>> Expressions" (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>>>>>
>>>>>> -- J
>>>>>>
>>>>>>
>>>>>> So to effectively resolve the problem of realtime matching one can
>>>>>>
>>>>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera 
>>>>>> <[email protected]>wrote:
>>>>>>
>>>>>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>  So if Aayush Kothari is interested in working on this as a
>>>>>>>> Student, all we need is a formal mentor (I can be the informal one).
>>>>>>>>
>>>>>>>>  Anyone up for the task?
>>>>>>>>
>>>>>>>>
>>>>>>>>   Completely interested in working for and learning about the
>>>>>>> aforementioned subject/project. +1.
>>>>>>>
>>>>>>> This may be related to the work I'm doing with LUCENE-2987
>>>>>>> Basically changing the grammar to accepts conjunctions AND and OR in
>>>>>>> the query text.
>>>>>>> I would be interested in working with you on some of the details.
>>>>>>>
>>>>>>> However, I too am not a formal committer.
>>>>>>>
>>>>>>> --
>>>>>>> Joe Cabreraeminorlabs.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>> Lucid Certified
>>>>> Apache Lucene/Solr Developer
>>>>> Grid Dynamics
>>>>>
>>>>>
>>>
>>>
>>>
>>
>

Re: Indexing Boolean Expressions

Reply via email to