[ 
https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384631#comment-14384631
 ] 

Areek Zillur commented on LUCENE-6339:
--------------------------------------

Hi [~thetaphi],
Thanks for the review!
If two documents have the same suggestion for the same SuggestField, the 
results will contain that suggestion text more than once, but because the hits 
come from two documents (different doc ids) they are not considered duplicates.
Maybe we can add a boolean flag to the NRTSuggester to collect only unique 
suggestions, but then we would have to decide which suggestion to throw out, 
since each one is now tied to a doc id.
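
To make the "which one to throw out" question concrete, here is a minimal 
sketch of one possible policy, written in plain Java with no Lucene 
dependency. The names UniqueSuggestions, Hit and dedupe are hypothetical and 
not part of the patch. The assumed policy keeps the highest-weighted hit for 
each suggestion text and breaks ties by the lowest doc id:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniqueSuggestions {

    /** Hypothetical stand-in for a collected suggestion: text, weight, and owning doc id. */
    static final class Hit {
        final String text;
        final long weight;
        final int docId;

        Hit(String text, long weight, int docId) {
            this.text = text;
            this.weight = weight;
            this.docId = docId;
        }
    }

    /**
     * Keeps one hit per suggestion text: the highest weight wins,
     * and ties go to the lowest doc id (an arbitrary but deterministic choice).
     */
    static List<Hit> dedupe(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit hit : hits) {
            Hit current = best.get(hit.text);
            if (current == null
                    || hit.weight > current.weight
                    || (hit.weight == current.weight && hit.docId < current.docId)) {
                best.put(hit.text, hit);
            }
        }
        List<Hit> unique = new ArrayList<>(best.values());
        // Report survivors in descending weight order; break ties by text.
        unique.sort(Comparator.comparingLong((Hit h) -> h.weight).reversed()
                .thenComparing(h -> h.text));
        return unique;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("title1", 2, 0),
                new Hit("title1", 5, 1),
                new Hit("title2", 3, 2));
        for (Hit h : dedupe(hits)) {
            System.out.println(h.text + " weight=" + h.weight + " doc=" + h.docId);
        }
    }
}
```

Other policies (first doc id seen, most recently indexed document) would be 
just as easy to plug in, which is why the choice probably needs to be explicit 
rather than hidden behind a single boolean.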


> [suggest] Near real time Document Suggester
> -------------------------------------------
>
>                 Key: LUCENE-6339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6339
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.0
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: 5.0
>
>         Attachments: LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch, 
> LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch
>
>
> The idea is to index documents with one or more *SuggestField*(s) and be able 
> to suggest documents with a *SuggestField* value that matches a given key.
> A SuggestField can be assigned a numeric weight to be used to score the 
> suggestion at query time.
> Document suggestion can be done on an indexed *SuggestField*. The document 
> suggester can filter out deleted documents in near real-time. The suggester 
> can filter out documents based on a Filter (note: may change to a non-scoring 
> query?) at query time.
> A custom postings format (CompletionPostingsFormat) is used to index 
> SuggestField(s) and perform document suggestions.
> h4. Usage
> {code:java}
>   // hook up custom postings format
>   // indexAnalyzer for SuggestField
>   Analyzer analyzer = ...
>   IndexWriterConfig config = new IndexWriterConfig(analyzer);
>   Codec codec = new Lucene50Codec() {
>     PostingsFormat completionPostingsFormat = new Completion50PostingsFormat();
>     @Override
>     public PostingsFormat getPostingsFormatForField(String field) {
>       if (isSuggestField(field)) {
>         return completionPostingsFormat;
>       }
>       return super.getPostingsFormatForField(field);
>     }
>   };
>   config.setCodec(codec);
>   IndexWriter writer = new IndexWriter(dir, config);
>   // index some documents with suggestions
>   Document doc = new Document();
>   doc.add(new SuggestField("suggest_title", "title1", 2));
>   doc.add(new SuggestField("suggest_name", "name1", 3));
>   writer.addDocument(doc);
>   ...
>   // open an nrt reader for the directory
>   DirectoryReader reader = DirectoryReader.open(writer, false);
>   // SuggestIndexSearcher is a thin wrapper over IndexSearcher
>   // queryAnalyzer will be used to analyze the query string
>   SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
>   
>   // suggest 10 documents for "titl" on "suggest_title" field
>   TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
> {code}
> h4. Indexing
> The index analyzer is set through *IndexWriterConfig*.
> {code:java}
> SuggestField(String name, String value, long weight) 
> {code}
> h4. Query
> The query analyzer is set through *SuggestIndexSearcher*.
> Hits are collected in descending order of the suggestion's weight.
> {code:java}
> // full options for TopSuggestDocs (TopDocs)
> TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter)
> // full options for Collector
> // note: only collects, does not score
> void suggest(String field, CharSequence key, int num, Filter filter, TopSuggestDocsCollector collector)
> {code}
> h4. Analyzer
> *CompletionAnalyzer* can be used to wrap another analyzer in order to tune 
> parameters that apply only to suggest fields.
> {code:java}
> CompletionAnalyzer(Analyzer analyzer, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
> {code}



