[jira] [Updated] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

Remi Melisson (JIRA) Wed, 18 Dec 2013 10:14:43 -0800

[ https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Remi Melisson updated LUCENE-5354:
----------------------------------

    Attachment: LUCENE-5354_2.patch

Hey Michael, thanks for the in-depth code review!
I attached another patch that simplifies things and addresses your 
suggestions.

The remaining points are:
bq. Have you done any performance testing?
Not really. I've seen that you did some for the infix suggester, but I couldn't 
find the code. Is there something already, or should I test the performance my 
own way?


bq. Visiting term vectors for each hit can be costly. It should be more 
performant to pull a DocsAndPositionsEnum up front and then do .advance to each 
(sorted) docID to get the position ... but this is likely more complex (it 
inverts the "stride", so you'd do term by term on the outer loop, then docs on 
the inner loop, vs the opposite that you have now).
For now, the only way I know to get a DocsAndPositionsEnum is from the 
TermsEnum, which implies iterating over the term vector (the Javadoc says "Get 
DocsAndPositionsEnum for the current term").
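A rough sketch of the inverted "stride" from the review, using a toy in-memory postings structure rather than Lucene classes. ToyPostings, InvertedStride and minPositions are hypothetical names, and advance() only mimics the DocIdSetIterator contract (move to the first doc >= target) with a linear scan instead of skip lists. The outer loop goes term by term, the inner loop walks the sorted hit docIDs, and each hit keeps the earliest position of any query term. In Lucene 4.x, a per-term enum over the index postings (not the term vector) should be reachable through something like AtomicReader.termPositionsEnum(Term), but that is an assumption worth checking against the 4.x Javadocs.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy postings for one term: sorted docIDs, each with its sorted positions. Not Lucene's API. */
final class ToyPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;        // ascending docIDs containing the term
    private final int[][] positions; // positions[i] are the term's positions in docs[i]
    private int idx = -1;            // current entry; -1 before the first advance

    ToyPostings(int[] docs, int[][] positions) {
        this.docs = docs;
        this.positions = positions;
    }

    /** Moves to the first doc >= target (target must exceed the current doc); linear scan for simplicity. */
    int advance(int target) {
        idx++;
        while (idx < docs.length && docs[idx] < target) {
            idx++;
        }
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** First position of the term in the current doc. */
    int firstPosition() {
        return positions[idx][0];
    }
}

final class InvertedStride {
    /** Earliest position of any query term per hit; hits containing no query term are absent. */
    static Map<Integer, Integer> minPositions(int[] sortedHits, ToyPostings... termPostings) {
        Map<Integer, Integer> best = new HashMap<>();
        for (ToyPostings postings : termPostings) {   // outer loop: term by term
            int doc = -1;
            for (int hit : sortedHits) {              // inner loop: hits in increasing docID order
                if (doc < hit) {
                    doc = postings.advance(hit);      // the enum only ever moves forward
                }
                if (doc == ToyPostings.NO_MORE_DOCS) {
                    break;                            // this term occurs in no further docs
                }
                if (doc == hit) {
                    best.merge(hit, postings.firstPosition(), Integer::min);
                }
            }
        }
        return best;
    }
}
```

Each term's enum is visited once, in increasing docID order, which is what makes advance() cheap compared with opening a term vector per hit.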

> Blended score in AnalyzingInfixSuggester
> ----------------------------------------
>
>                 Key: LUCENE-5354
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5354
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>    Affects Versions: 4.4
>            Reporter: Remi Melisson
>            Priority: Minor
>              Labels: suggester
>         Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch
>
>
> I'm working on a custom suggester derived from the AnalyzingInfix. I require 
> what is called a "blended score" (//TODO ln.399 in AnalyzingInfixSuggester) 
> to transform the suggestion weights depending on the position of the searched 
> term(s) in the text.
> Right now, I'm using a simple solution:
> If I want 10 suggestions, I search the current ordered index for the first 
> 100 results and transform the weight:
> bq. a) by using the term position in the text (found with TermVector and 
> DocsAndPositionsEnum)
> or
> bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
> searching
> and return the updated 10 most weighted suggestions.
> Since we usually don't need many suggestions, the overhead of the larger 
> search plus rescoring is not significant, but I agree that this is not the 
> most elegant solution.
> We could include this factor (here, the term position) directly in the 
> index.
> I can contribute this if you think it's worth adding.
> Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
> dedicated class?
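A minimal sketch of approach a) from the description: over-fetch candidates (e.g. 100 for 10 suggestions), rescore each one from its weight and the position of its first matched term, then keep the top N. The decay 1 / (1 + position) is only an assumed blending function, not necessarily the one in the patch, and Suggestion / BlendedRescorer are hypothetical names. For approach b), blendedScore would instead return the weight multiplied by the SpanQuery score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical candidate returned by the over-fetched infix search. */
final class Suggestion {
    final String text;
    final long weight;             // original suggestion weight
    final int firstMatchPosition;  // position of the first matched query term in the text

    Suggestion(String text, long weight, int firstMatchPosition) {
        this.text = text;
        this.weight = weight;
        this.firstMatchPosition = firstMatchPosition;
    }
}

final class BlendedRescorer {
    /** Assumed blending: the earlier the match, the more of its weight a suggestion keeps. */
    static double blendedScore(Suggestion s) {
        return s.weight / (1.0 + s.firstMatchPosition);
    }

    /** Rescores the over-fetched candidates and returns the num best by blended score. */
    static List<Suggestion> topN(List<Suggestion> candidates, int num) {
        Comparator<Suggestion> byBlended = Comparator.comparingDouble(BlendedRescorer::blendedScore);
        List<Suggestion> sorted = new ArrayList<>(candidates);
        sorted.sort(byBlended.reversed()); // highest blended score first
        return new ArrayList<>(sorted.subList(0, Math.min(num, sorted.size())));
    }
}
```

Sorting the whole candidate list is cheap at this size (about 100 hits); a bounded priority queue would avoid the full sort if the over-fetch factor grew.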



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
