[ 
https://issues.apache.org/jira/browse/SOLR-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371599#comment-16371599
 ] 

Michael Gibney commented on SOLR-7798:
--------------------------------------

Although [~joergr]'s initial description mentions an NPE in ExpandComponent "if 
_accidentally_ used without prior collapsing of results" (italics mine), there 
are applications of ExpandComponent that _intentionally_ do not involve prior 
collapsing of results on the expand field. For example, I'm using cached Join 
queries to implement tiered deduplication of the search domain across multiple 
document sources, but do not wish to deduplicate documents against other 
documents from the same source (and specifically wish to deduplicate the search 
domain, as opposed to the set of results). The approach is described in a bit 
more detail [here|https://github.com/upenn-libraries/solr-source-deduplication] 
(bullet points 3, 4, and 7 are particularly relevant).

[^expand-component.patch] looks good to me, as I can't see a reason why 
{{count}} is being tracked separately, rather than relying on 
{{ordBytes.size()}}. The only potential issue I see with it is that where 
{{count}} is used to determine whether {{groupQuery}} is initialized, {{count}} 
now represents a different concept than {{ordBytes.size()}}. I'm not sure what 
the desired behavior would be (or for that matter, what the explanation is for 
the magic "200" ceiling on {{count}}).

I've uploaded an alternative, [^expand-npe.patch], which differs only in that 
it leaves the separate tracking of {{count}} in place (though I don't think it 
should have to), and also in that it checks for duplication on addition of ord 
to groupBits/groupSet, thereby avoiding unnecessary {{BytesRef.deepCopyOf()}} 
in the (normally rare) case where duplicate terms are encountered.
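
To illustrate the duplicate check, here is a minimal standalone sketch of the idea, not the patch itself. It assumes plain {{java.util}} collections as stand-ins for the component's internal structures, and uses {{byte[].clone()}} in place of {{BytesRef.deepCopyOf()}}; the class, method, and {{copies}} counter are hypothetical names added only for the example.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the collection step; not Solr code.
class OrdDedupSketch {
    // Stand-in for groupBits: marks which ordinals have been seen.
    final BitSet groupBits = new BitSet();
    // Stand-in for ordBytes: ordinal -> private copy of the term bytes.
    final Map<Integer, byte[]> ordBytes = new HashMap<>();
    // Counts how many deep copies were actually made (illustration only).
    int copies = 0;

    void collect(int ord, byte[] term) {
        // Check for duplication before copying, so a repeated ordinal
        // costs one bit lookup instead of another copy of the term.
        if (groupBits.get(ord)) {
            return;
        }
        groupBits.set(ord);
        ordBytes.put(ord, term.clone()); // stand-in for BytesRef.deepCopyOf()
        copies++;
    }

    public static void main(String[] args) {
        OrdDedupSketch s = new OrdDedupSketch();
        s.collect(3, new byte[] {'c'});
        s.collect(1, new byte[] {'a'});
        s.collect(3, new byte[] {'c'}); // duplicate term: skipped, no copy
        System.out.println("copies=" + s.copies + ", terms=" + s.ordBytes.size());
    }
}
```

In this sketch the number of copies always equals the number of distinct ordinals, regardless of how many documents share a term.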

> Improve robustness of ExpandComponent
> -------------------------------------
>
>                 Key: SOLR-7798
>                 URL: https://issues.apache.org/jira/browse/SOLR-7798
>             Project: Solr
>          Issue Type: Improvement
>          Components: SearchComponents - other
>            Reporter: Jörg Rathlev
>            Priority: Minor
>         Attachments: expand-component.patch, expand-npe.patch
>
>
> The {{ExpandComponent}} causes a {{NullPointerException}} if accidentally 
> used without prior collapsing of results.
> If there are multiple documents in the result which have the same term value 
> in the expand field, the size of the {{ordBytes}}/{{groupSet}} differs from 
> the {{count}} value, and the {{getGroupQuery}} method creates an incompletely 
> filled {{bytesRef}} array, which later causes a {{NullPointerException}} when 
> trying to sort the terms.
> The attached patch extends the test to demonstrate the error, and modifies 
> the {{getGroupQuery}} methods to create the array based on the size of the 
> input maps.
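
The failure mode in the description above can be reproduced in isolation. The following is a simplified sketch, not the {{ExpandComponent}} code: it assumes a {{LinkedHashMap}} of strings as a stand-in for {{ordBytes}}, and the class and method names ({{byCount}}, {{bySize}}) are hypothetical. It shows that an array sized by a separately tracked {{count}} keeps trailing {{null}} slots when a term repeats, while an array sized by the map does not.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reproduction of the count/size mismatch; not Solr code.
class CountMismatchSketch {
    // Problematic pattern: array sized by a separately tracked count.
    static String[] byCount(Map<Integer, String> ordBytes, int count) {
        String[] terms = new String[count];
        int i = 0;
        for (String t : ordBytes.values()) {
            terms[i++] = t;
        }
        return terms; // trailing slots stay null if count > ordBytes.size()
    }

    // Pattern described for the attached patch: size taken from the map itself.
    static String[] bySize(Map<Integer, String> ordBytes) {
        String[] terms = new String[ordBytes.size()];
        int i = 0;
        for (String t : ordBytes.values()) {
            terms[i++] = t;
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Integer, String> ordBytes = new LinkedHashMap<>();
        int count = 0;
        int[] ords = {3, 1, 3};            // ordinal 3 occurs twice
        String[] values = {"c", "a", "c"};
        for (int i = 0; i < ords.length; i++) {
            ordBytes.put(ords[i], values[i]); // the map deduplicates
            count++;                           // the counter does not
        }

        String[] ok = bySize(ordBytes);
        Arrays.sort(ok);
        System.out.println(Arrays.toString(ok));

        try {
            Arrays.sort(byCount(ordBytes, count)); // array holds a null entry
        } catch (NullPointerException e) {
            System.out.println("NullPointerException while sorting");
        }
    }
}
```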


