[jira] [Commented] (SOLR-14365) CollapsingQParser - Avoiding always allocate int[] and float[] with size equals to number of unique values

Shalin Shekhar Mangar (Jira) Sat, 28 Mar 2020 00:32:11 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069285#comment-17069285
 ]


Shalin Shekhar Mangar commented on SOLR-14365:
----------------------------------------------

I think we should add another method and make it configurable.

> CollapsingQParser - Avoiding always allocate int[] and float[] with size 
> equals to number of unique values
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14365
>                 URL: https://issues.apache.org/jira/browse/SOLR-14365
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.4.1
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>
> Since Collapsing is a PostFilter, documents reach Collapsing must match with 
> all filters and queries, so the number of documents Collapsing need to 
> collect/compute score is a small fraction of the total number documents in 
> the index. So why do we need to always consume the memory (for int[] and 
> float[] array) for all unique values of the collapsed field? If the number of 
> unique values of the collapsed field found in the documents that match 
> queries and filters is 300 then we only need int[] and float[] array with 
> size of 300 and not 1.2 million in size. However, we don't know which value 
> of the collapsed field will show up in the results so we cannot use a smaller 
> array.
> The easy fix for this problem is using as much as we need by using IntIntMap 
> and IntFloatMap that hold primitives and are much more space efficient than 
> the Java HashMap. These maps can be slower (10x or 20x) than plain int[] and 
> float[] if matched documents is large (almost all documents matched queries 
> and other filters). But our belief is that does not happen that frequently 
> (how frequently do we run collapsing on the entire index?).
> For this issue I propose adding 2 methods for collapsing which is
> * array : which is current implementation
> * hash : which is new approach and will be default method
> later we can add another method {{smart}} which is automatically pick method 
> based on comparision between {{number of docs matched queries and filters}} 
> and {{number of unique values of the field}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (SOLR-14365) CollapsingQParser - Avoiding always allocate int[] and float[] with size equals to number of unique values

Reply via email to