[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256 ]

Martin Grotzke commented on SOLR-2583:
--------------------------------------

I just compared the memory consumption of the three approaches for different
numbers of puts (number of scores) and sizes (number of docs):

{noformat}
Puts 1.000,     size 1.000.000:   CompactFloatArray    898.136,  float[]  4.000.016,  HashMap     72.192
Puts 10.000,    size 1.000.000:   CompactFloatArray  3.724.376,  float[]  4.000.016,  HashMap    702.784
Puts 100.000,   size 1.000.000:   CompactFloatArray  4.016.472,  float[]  4.000.016,  HashMap  6.607.808
Puts 1.000.000, size 1.000.000:   CompactFloatArray  4.016.472,  float[]  4.000.016,  HashMap 44.644.032
Puts 1.000,     size 5.000.000:   CompactFloatArray  1.128.536,  float[] 20.000.016,  HashMap     72.256
Puts 10.000,    size 5.000.000:   CompactFloatArray  8.168.536,  float[] 20.000.016,  HashMap    704.832
Puts 100.000,   size 5.000.000:   CompactFloatArray 20.013.144,  float[] 20.000.016,  HashMap  7.385.152
Puts 1.000.000, size 5.000.000:   CompactFloatArray 20.131.160,  float[] 20.000.016,  HashMap 66.395.584
Puts 1.000,     size 10.000.000:  CompactFloatArray  1.275.992,  float[] 40.000.016,  HashMap     72.256
Puts 10.000,    size 10.000.000:  CompactFloatArray  9.289.816,  float[] 40.000.016,  HashMap    705.280
Puts 100.000,   size 10.000.000:  CompactFloatArray 37.130.328,  float[] 40.000.016,  HashMap  7.418.112
Puts 1.000.000, size 10.000.000:  CompactFloatArray 40.262.232,  float[] 40.000.016,  HashMap 69.282.496
{noformat}
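The float[] column is independent of the number of puts and appears to be in bytes: each row equals 4 bytes per doc plus a 16-byte array header. The small sketch below reproduces that column. It assumes a 64-bit HotSpot JVM with compressed oops (16-byte array header); the class and method names are hypothetical, and it says nothing about how CompactFloatArray or the HashMap variant were measured.

```java
// Hypothetical helper that reproduces the float[] column of the table above.
class DenseFootprint {
    // Assumption: 64-bit HotSpot with compressed oops, where an array header
    // is 16 bytes (12-byte object header + 4-byte length field).
    static final long ARRAY_HEADER_BYTES = 16;

    // A dense score array holds one float per doc, whether or not the doc
    // has an entry in the external scoring file.
    static long denseFloatArrayBytes(int numDocs) {
        return ARRAY_HEADER_BYTES + (long) Float.BYTES * numDocs;
    }

    public static void main(String[] args) {
        int[] sizes = {1_000_000, 5_000_000, 10_000_000};
        for (int size : sizes) {
            System.out.println("size " + size + ": float[] "
                    + denseFloatArrayBytes(size) + " bytes");
        }
    }
}
```

For size 1.000.000 this gives 4.000.016 bytes, matching the measured value regardless of how many scores were put.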

I'm sharing these intermediate results without further interpretation or
conclusions for now (I just need to catch the train).

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -------------------------------------------------------------------------
>
>                 Key: SOLR-2583
>                 URL: https://issues.apache.org/jira/browse/SOLR-2583
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Martin Grotzke
>            Priority: Minor
>         Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring uses a lot of memory, depending on the number of documents in
> the index. The ExternalFileField (used for external scoring) uses
> FileFloatSource, and one FileFloatSource is created per external scoring
> file. FileFloatSource creates a float array sized to the total number of
> docs (even if the file to load is not found). If the scoring file has far
> fewer entries than there are docs in total, the big float array wastes a
> lot of memory.
> This could be optimized by using a map of doc -> score, so that the map
> contains only as many entries as there are scoring entries in the external
> file.
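The doc -> score map proposed in the issue can be sketched as follows: only docs that appear in the external file are stored, and every other doc falls back to a default score. This is a minimal illustration, not Solr's FileFloatSource API; the class `SparseScores`, its methods, and the default-score parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse score store: memory grows with the number of scored
// docs rather than with the total number of docs in the index.
class SparseScores {
    private final Map<Integer, Float> scores = new HashMap<>();
    private final float defaultScore;

    SparseScores(float defaultScore) {
        this.defaultScore = defaultScore;
    }

    // Record a score read from the external file (doc id -> score).
    void put(int doc, float score) {
        scores.put(doc, score);
    }

    // Docs without an entry get the default score; nothing is stored for them.
    float get(int doc) {
        return scores.getOrDefault(doc, defaultScore);
    }

    int size() {
        return scores.size();
    }

    public static void main(String[] args) {
        SparseScores s = new SparseScores(0f);
        s.put(42, 3.5f);
        System.out.println(s.get(42)); // 3.5
        System.out.println(s.get(7));  // 0.0
        System.out.println(s.size());  // 1
    }
}
```

Each `HashMap<Integer, Float>` entry carries a node object plus boxed key and value, so this only saves memory when the external file covers a small fraction of the docs.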

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
