[ https://issues.apache.org/jira/browse/LUCENE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Areek Zillur updated LUCENE-6459:
---------------------------------
    Attachment: LUCENE-6459.patch

Thanks [~mikemccand] for the feedback :)

{quote}
I thought we accept long (not int) as index-time weight? (But, I think that really is overkill... maybe they should just floats, like per-field boosting at index time)
{quote}
IMO, a suggestion weight is just an index-time boost for the associated entry.

{quote}
bq. one possibility would be to use index-weight + (Int.MAX * boost) instead of using MaxWeight of suggestions

Sorry I don't understand the idea here?
{quote}
After a query automaton has been intersected with the FST in {{NRTSuggester}}, boosts and/or contexts are computed/extracted from each of the partially matched paths by the {{CompletionWeight}} before performing a TopN search. For example, {{FuzzyCompletionWeight}} counts the number of prefix characters a matched input shares with the analyzed query prefix and sets the boost for it to that common prefix length.
Calculating a suggestion score of {{weight + (maxWeight * boost)}} makes sure that entries with a higher boost (longer common prefix w.r.t. query prefix) will always be scored higher regardless of the index-time weight of suggestion entries. The segment-level {{maxWeight}} is stored in {{CompletionPostingsFormat}} (CompletionIndex), and the {{maxWeight}} is computed across all segments at query-time.
Since the maximum weight for any suggestion entry will be <= Integer.MAX_VALUE, we can just replace the {{maxWeight}} for a suggestField with Integer.MAX_VALUE? One problem might be the loss of precision when converting the long score to a float?

{quote}
It seems like we could detect this mis-use, since CompletionTerms seems to know whether the field was indexed with contexts or not? I.e, if I accidentally try to run a ContextQuery against a field indexed with only SuggestField, it seems like I should get an exception saying I screwed up ...
(similar to trying to run a PhraseQuery on a field that did not index positions)? Maybe add a simple test case?
{quote}
I updated the patch to error out at rewrite when a {{ContextQuery}} is used against a {{SuggestField}}, and added a test.

{quote}
Can we rename {{TopSuggestDocsCollector.num()}} to maybe .getCountToCollect or something a bit more verbose?
{quote}
Changed {{TopSuggestDocsCollector.num()}} to {{getCountToCollect()}}.

> [suggest] Query Interface for suggest API
> -----------------------------------------
>
> Key: LUCENE-6459
> URL: https://issues.apache.org/jira/browse/LUCENE-6459
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 5.1
> Reporter: Areek Zillur
> Assignee: Areek Zillur
> Fix For: Trunk, 5.x, 5.1
>
> Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch,
> LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch
>
>
> This patch factors out common indexing/search API used by the recently
> introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339].
> The motivation is to provide a query interface for FST-based fields
> (*SuggestField* and *ContextSuggestField*) for enabling suggestion scoring
> and more powerful automaton queries.
> Previously, only prefix ‘queries’ with index-time weights were supported but
> we can also support:
> * Prefix queries expressed as regular expressions: get suggestions that match multiple prefixes
> ** Example: _star\[wa\|tr\]_ matches _starwars_ and _startrek_
> * Fuzzy Prefix queries supporting scoring: get typo tolerant suggestions scored by how close they are to the query prefix
> ** Example: querying for _seper_ will score _separate_ higher than _superstitious_
> * Context Queries: get suggestions boosted and/or filtered based on their indexed contexts (metadata)
> ** Example: get typo tolerant suggestions on song names with prefix _like a roling_ boosting songs with genre _rock_ and _indie_
> ** Example: get suggestion on all file names starting with _finan_ only for _user1_ and _user2_
> h3. Suggest API
> {code}
> SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
> CompletionQuery query = ...
> TopSuggestDocs suggest = searcher.suggest(query, num);
> {code}
> h3. CompletionQuery
> *CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*.
> A *CompletionQuery* produces a *CompletionWeight*, which allows
> *CompletionQuery* implementations to pass in an automaton that will be
> intersected with an FST and allows boosting and metadata extraction from the
> intersected partial paths. A *CompletionWeight* produces a
> *CompletionScorer*. A *CompletionScorer* executes a Top N search against the
> FST with the provided automaton, scoring and filtering all matched paths.
> h4. PrefixCompletionQuery
> Return documents with values that match the prefix of an analyzed term text.
> Documents are sorted according to their suggest field weight.
> {code}
> PrefixCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h4. RegexCompletionQuery
> Return documents with values that match the prefix of a regular expression.
> Documents are sorted according to their suggest field weight.
> {code}
> RegexCompletionQuery(Term term)
> {code}
> h4. FuzzyCompletionQuery
> Return documents with values that have prefixes within a specified edit
> distance of an analyzed term text.
> Documents are ‘boosted’ by the number of matching prefix letters of the
> suggestion with respect to the original term text.
> {code}
> FuzzyCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all
> integers.
> {{boost = # of prefix characters matched}}
> h4. ContextQuery
> Return documents that match a {{CompletionQuery}} filtered and/or boosted by
> provided context(s).
> {code}
> ContextQuery(CompletionQuery query)
> contextQuery.addContext(CharSequence context, int boost, boolean exact)
> {code}
> *NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query
> suggestions boosted and/or filtered by contexts.
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * context_boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}}
> are all integers.
> When used with {{FuzzyCompletionQuery}},
> {{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}
> h3. Context Suggest Field
> To use {{ContextQuery}}, use {{ContextSuggestField}} instead of
> {{SuggestField}}. Any {{CompletionQuery}} can be used with
> {{ContextSuggestField}}; the default behaviour is to return suggestions from
> *all* contexts. The {{Context}} for every completion hit can be accessed through
> {{SuggestScoreDoc#context}}.
> {code}
> ContextSuggestField(String name, Collection<CharSequence> contexts, String value, int weight)
> {code}
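To make the scoring discussion concrete, here is a minimal standalone sketch of the formula {{suggestion_weight + (global_maximum_weight * boost)}} and of the float precision concern raised in the comment. It does not use Lucene at all: the class {{SuggestScoreSketch}} and its methods {{commonPrefixLength}} and {{score}} are hypothetical names made up for illustration, and the prefix comparison is done on raw strings rather than on analyzed FST paths as the patch would do.

```java
// Standalone illustration only; none of these names exist in Lucene.
final class SuggestScoreSketch {

    // Hypothetical stand-in for the fuzzy boost: the number of leading
    // characters the suggestion shares with the query prefix.
    static int commonPrefixLength(String query, String suggestion) {
        int limit = Math.min(query.length(), suggestion.length());
        int i = 0;
        while (i < limit && query.charAt(i) == suggestion.charAt(i)) {
            i++;
        }
        return i;
    }

    // weight + (maxWeight * boost), computed in long to avoid int overflow.
    static long score(int weight, int maxWeight, int boost) {
        return (long) weight + (long) maxWeight * boost;
    }

    public static void main(String[] args) {
        int maxWeight = Integer.MAX_VALUE; // assumed upper bound on any weight

        // Fuzzy example from the description: "seper" vs two suggestions.
        int boostSeparate = commonPrefixLength("seper", "separate");
        int boostSuperstitious = commonPrefixLength("seper", "superstitious");

        // Give the weaker match the largest possible index-time weight.
        long separate = score(1, maxWeight, boostSeparate);
        long superstitious = score(maxWeight, maxWeight, boostSuperstitious);
        System.out.println(separate > superstitious);

        // Edge case: a weight of 0 with boost b+1 ties with a weight of
        // maxWeight with boost b, so the ordering is "never lower" rather
        // than strictly higher.
        System.out.println(score(0, maxWeight, 3) == score(maxWeight, maxWeight, 2));

        // Precision concern: two distinct long scores can collapse to the
        // same float because a float carries only 24 significant bits.
        long a = score(1, maxWeight, 3);
        long b = score(2, maxWeight, 3);
        System.out.println(a != b && (float) a == (float) b);
    }
}
```

Under these assumptions the boost dominates the index-time weight, while suggestions whose weights differ only slightly can become indistinguishable once the long score is narrowed to a float.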