[jira] [Commented] (LUCENE-5294) Suggester Dictionary implementation that takes expressions as term weights

Michael McCandless (JIRA) Sun, 20 Oct 2013 03:26:56 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800118#comment-13800118
 ]


Michael McCandless commented on LUCENE-5294:
--------------------------------------------

bq. Does it make sense for the new dictionary implementation to support 
CompositeReader?

A CompositeReader is the common case, i.e. an index that has multiple segments 
... I think we should support it?

The easiest way is to just wrap the incoming reader using 
SlowCompositeReaderWrapper.wrap.  However, this adds some unnecessary cost, 
because on each NDV lookup, there is a binary search to locate the right 
sub-reader.  In fact, we are already paying this cost in DocumentInputIterator 
when we use liveDocs (MultiFields.getLiveDocs).  But I suspect that, in the grand 
scheme of things, this cost is relatively minor, and a suggester is built once 
and used many times, so we may just want to go with this option.
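
To make the per-lookup cost concrete, here is a rough plain-Java model of what a composite view has to do for each global docID. This is not Lucene code: the class name, the docStarts layout, and the per-segment value arrays are assumptions for illustration only, and the sketch assumes no empty segments.

```java
import java.util.Arrays;

// Plain-Java model (NOT Lucene code) of the lookup a composite view performs.
final class SubReaderLookup {

    // docStarts[i] is the first global docID of segment i (hypothetical layout,
    // strictly increasing, i.e. no empty segments).
    static int subIndex(int globalDoc, int[] docStarts) {
        int pos = Arrays.binarySearch(docStarts, globalDoc);
        // Exact hit: globalDoc is the first doc of segment pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the owning
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Every value lookup pays for the binary search before reading the value.
    static long lookup(int globalDoc, int[] docStarts, long[][] perSegmentValues) {
        int seg = subIndex(globalDoc, docStarts);
        return perSegmentValues[seg][globalDoc - docStarts[seg]];
    }
}
```

The search is O(log n) in the number of segments, which is why the overhead is likely small next to the rest of a suggester build.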

The other option is to pull the leaves and step through them yourself; I guess 
you'd need to fix DocumentInputIterator to go segment by segment instead.
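
A rough plain-Java sketch of the segment-by-segment idea follows. Again this is not Lucene code: Segment is a hypothetical stand-in for one leaf, holding per-document weights and a live-docs mask, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (NOT Lucene code) of stepping through the leaves directly.
final class SegmentWalk {

    // Hypothetical stand-in for one segment (leaf) of the index.
    static final class Segment {
        final long[] weights;   // per-document weight, indexed by local docID
        final boolean[] live;   // false marks a deleted document

        Segment(long[] weights, boolean[] live) {
            this.weights = weights;
            this.live = live;
        }
    }

    // Visits each segment in order using local docIDs, so no per-document
    // search for the owning segment is needed.
    static List<Long> liveWeights(List<Segment> segments) {
        List<Long> out = new ArrayList<>();
        for (Segment seg : segments) {
            for (int doc = 0; doc < seg.weights.length; doc++) {
                if (seg.live[doc]) {
                    out.add(seg.weights[doc]);
                }
            }
        }
        return out;
    }
}
```

The trade-off is that the iterator has to track which segment it is in, which is the change to DocumentInputIterator mentioned above.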

bq. What are your thoughts on having a DocumentDictionary setting that will 
collect terms only from documents that have all the required fields and 
ignore the others (rather than erroring out)? Is that too much flexibility?

Sure, we could add such leniency?  We could even just make the whole thing 
lenient (i.e., no separate setting)?
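
As a rough illustration of the lenient behaviour, here is a minimal plain-Java sketch. It is not the DocumentDictionary code: documents are modelled as plain maps, and the class, method, and field names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model (NOT Lucene code) of lenient term collection.
final class LenientCollect {

    // Returns "term=weight" entries for documents that carry both fields;
    // documents missing either field are skipped instead of raising an error.
    static List<String> collect(List<Map<String, Object>> docs,
                                String termField, String weightField) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object term = doc.get(termField);
            Object weight = doc.get(weightField);
            if (term == null || weight == null) {
                continue; // lenient: ignore incomplete documents
            }
            out.add(term + "=" + weight);
        }
        return out;
    }
}
```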

> Suggester Dictionary implementation that takes expressions as term weights
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-5294
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5294
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>            Reporter: Areek Zillur
>             Fix For: 4.6, 5.0
>
>         Attachments: LUCENE-5294.patch
>
>
> It would be nice to have a Suggester Dictionary implementation that could 
> compute the weights of the terms consumed by the suggester based on a 
> user-defined expression (using Lucene's expression module).
> It could be an extension of the existing DocumentDictionary (which takes 
> terms, weights and (optionally) payloads from the stored documents in the 
> index). The only difference would be that, instead of taking the weights for the 
> terms from the specified weight fields, it could compute the weights using a 
> user-defined expression that uses one or more NumericDocValuesField from the 
> document.
> Example:
>   let the document have
>      - product_id
>      - product_name
>      - product_popularity
>      - product_profit
>   Then this implementation could be used with an expression of     
> "0.2*product_popularity + 0.8*product_profit" to determine the weights of the 
> terms for the corresponding documents (optionally along with a payload 
> (product_id))



--
This message was sent by Atlassian JIRA
(v6.1#6144)

