[jira] [Commented] (SOLR-5773) CollapsingQParserPlugin should make elevated documents the group head

Joel Bernstein (JIRA) Tue, 04 Mar 2014 16:41:31 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920294#comment-13920294
 ]


Joel Bernstein commented on SOLR-5773:
--------------------------------------

Yes, you could implement the filter in the collect method, but it would be less 
efficient than the approach I took, for a couple of reasons. First, the collect 
method is called for every document, so adding a filter there means that each 
document needs to be tested by the filter. By filtering in the finish stage, 
only collapsed documents need to be tested. 

Second, because the documents are collapsed into an array based on ord value, 
there is an opportunity to do a very efficient merge join when applying the 
filter. You can only do a merge join on two sorted sets. It turns out that the 
larger set was already sorted by the group ord value, so all I had to do was 
sort the group ords of the boosted documents. Take a look at the contains 
method of the inner class SortedBoostSet to see the merge join. You'll see how 
few operations it takes to filter the larger result, with very little memory 
used. 
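
To make the merge join idea concrete, here is a minimal, self-contained sketch. It is not the code from the patch: SortedOrdSet, filterHeads and the array layout are hypothetical names and assumptions chosen for illustration, and it assumes that a collapsed head is dropped when its group ord belongs to a boosted document. The point it shows is that, because the group heads are walked in ascending ord order, a single forward-moving cursor over the sorted boosted ords is enough to test each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergeJoinSketch {

    /** Hypothetical stand-in for a sorted set of boosted group ords; not Solr's actual class. */
    static final class SortedOrdSet {
        private final int[] sortedOrds;
        private int cursor = 0; // index of the next candidate in sortedOrds

        SortedOrdSet(int[] ords) {
            this.sortedOrds = ords.clone();
            Arrays.sort(this.sortedOrds); // only the small side needs sorting
        }

        /** Callers must pass non-decreasing ord values; the cursor never moves backwards. */
        boolean contains(int ord) {
            while (cursor < sortedOrds.length && sortedOrds[cursor] < ord) {
                cursor++;
            }
            return cursor < sortedOrds.length && sortedOrds[cursor] == ord;
        }
    }

    /**
     * groupHeads[ord] is the doc id collapsed into group ord, or -1 if the group is empty.
     * Returns the heads of all groups whose ord is not in boostedOrds.
     */
    static List<Integer> filterHeads(int[] groupHeads, int[] boostedOrds) {
        SortedOrdSet boosted = new SortedOrdSet(boostedOrds);
        List<Integer> kept = new ArrayList<>();
        for (int ord = 0; ord < groupHeads.length; ord++) { // already in ascending ord order
            if (groupHeads[ord] == -1) {
                continue; // nothing was collapsed into this group
            }
            if (!boosted.contains(ord)) {
                kept.add(groupHeads[ord]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] groupHeads = {10, -1, 42, 7, 99}; // one slot per group ord
        int[] boostedOrds = {3, 0};             // group ords of the boosted documents
        System.out.println(filterHeads(groupHeads, boostedOrds)); // prints [42, 99]
    }
}
```

In this sketch the filter costs one pass over the group array plus one pass over the boosted ords, and the only extra memory is the sorted copy of the boosted ords.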

> CollapsingQParserPlugin should make elevated documents the group head
> ---------------------------------------------------------------------
>
>                 Key: SOLR-5773
>                 URL: https://issues.apache.org/jira/browse/SOLR-5773
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>    Affects Versions: 4.6.1
>            Reporter: David
>            Assignee: Joel Bernstein
>              Labels: collapse, solr
>             Fix For: 4.8
>
>         Attachments: SOLR-5773.patch, SOLR-5773.patch, SOLR-5773.patch, 
> SOLR-5773.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Hi Joel,
> I sent you an email but I'm not sure if you received it or not. I ran into a 
> bit of trouble using the CollapsingQParserPlugin with elevated documents. To 
> explain it simply, I want to exclude grouped documents when one of the 
> members of the group is contained in the elevated document set. I'm not sure 
> this is possible currently because, as you explain above, elevated documents 
> are added to the request context after the original query is constructed.
> To better illustrate the problem: suppose I have 2 documents, docid=1 and 
> docid=2, and both have a groupid of 'a'. If a grouped query scores docid 2 
> first in the results but I have elevated docid 1, then both documents are 
> shown in the results when I really only want the elevated document to be 
> shown.
> Is this something that would be difficult to implement? Any help is 
> appreciated.
> I think the solution would be to remove the documents from liveDocs that 
> share the same groupid in the getBoostDocs() function. Let me know if this 
> makes any sense. I'll continue working towards a solution in the meantime.
> {code}
> private IntOpenHashSet getBoostDocs(SolrIndexSearcher indexSearcher,
>     Set<String> boosted) throws IOException {
>       IntOpenHashSet boostDocs = null;
>       if(boosted != null) {
>         SchemaField idField = indexSearcher.getSchema().getUniqueKeyField();
>         String fieldName = idField.getName();
>         HashSet<BytesRef> localBoosts = new HashSet<BytesRef>(boosted.size()*2);
>         Iterator<String> boostedIt = boosted.iterator();
>         while(boostedIt.hasNext()) {
>           localBoosts.add(new BytesRef(boostedIt.next()));
>         }
>         boostDocs = new IntOpenHashSet(boosted.size()*2);
>         List<AtomicReaderContext> leaves =
>             indexSearcher.getTopReaderContext().leaves();
>         TermsEnum termsEnum = null;
>         DocsEnum docsEnum = null;
>         for(AtomicReaderContext leaf : leaves) {
>           AtomicReader reader = leaf.reader();
>           int docBase = leaf.docBase;
>           Bits liveDocs = reader.getLiveDocs();
>           Terms terms = reader.terms(fieldName);
>           termsEnum = terms.iterator(termsEnum);
>           Iterator<BytesRef> it = localBoosts.iterator();
>           while(it.hasNext()) {
>             BytesRef ref = it.next();
>             if(termsEnum.seekExact(ref)) {
>               docsEnum = termsEnum.docs(liveDocs, docsEnum);
>               int doc = docsEnum.nextDoc();
>               if(doc != DocsEnum.NO_MORE_DOCS) {
>                 //Found the document.
>                 boostDocs.add(doc+docBase);
>                 // HERE REMOVE ANY DOCUMENTS THAT SHARE THE GROUPID, NOT ONLY
>                 // THE DOCID
>                 it.remove();
>               }
>             }
>           }
>         }
>       }
>       return boostDocs;
>     }
> {code}
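
The behaviour requested in the description above can be sketched independently of Solr. This is only an illustration under assumptions: Doc, groupHeads and the in-memory collections are hypothetical and do not correspond to anything in CollapsingQParserPlugin. It shows the intended rule: an elevated document becomes the head of its group regardless of score, and every other group keeps its highest-scoring document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ElevatedGroupHeadSketch {

    /** Hypothetical document record: id, collapse field value, and query score. */
    static final class Doc {
        final int id;
        final String groupId;
        final float score;

        Doc(int id, String groupId, float score) {
            this.id = id;
            this.groupId = groupId;
            this.score = score;
        }
    }

    /** Returns groupId -> head doc id; an elevated document always wins its group. */
    static Map<String, Integer> groupHeads(List<Doc> docs, Set<Integer> elevatedIds) {
        Map<String, Doc> heads = new LinkedHashMap<>();
        Set<String> elevatedGroups = new HashSet<>();
        for (Doc d : docs) {
            if (elevatedGroups.contains(d.groupId)) {
                continue; // this group already has an elevated head
            }
            if (elevatedIds.contains(d.id)) {
                heads.put(d.groupId, d); // replaces any score-based head
                elevatedGroups.add(d.groupId);
            } else {
                Doc current = heads.get(d.groupId);
                if (current == null || d.score > current.score) {
                    heads.put(d.groupId, d);
                }
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Doc> e : heads.entrySet()) {
            result.put(e.getKey(), e.getValue().id);
        }
        return result;
    }

    public static void main(String[] args) {
        // docid 2 scores higher, but docid 1 is elevated, so 1 heads group 'a'.
        List<Doc> docs = Arrays.asList(
            new Doc(2, "a", 5.0f), new Doc(1, "a", 1.0f), new Doc(3, "b", 2.0f));
        Set<Integer> elevated = new HashSet<>(Arrays.asList(1));
        System.out.println(groupHeads(docs, elevated)); // prints {a=1, b=3}
    }
}
```

If several elevated documents share a group, this sketch keeps the first one it encounters; that tie-break is an arbitrary choice made for the example.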


