[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256 ]
Martin Grotzke commented on SOLR-2583:
--------------------------------------
I just compared the memory consumption of the 3 different approaches, for
different numbers of puts (number of scores) and sizes (number of docs):
{noformat}
Puts       Size        CompactFloatArray  float[]     HashMap
1.000      1.000.000   898.136            4.000.016   72.192
10.000     1.000.000   3.724.376          4.000.016   702.784
100.000    1.000.000   4.016.472          4.000.016   6.607.808
1.000.000  1.000.000   4.016.472          4.000.016   44.644.032
1.000      5.000.000   1.128.536          20.000.016  72.256
10.000     5.000.000   8.168.536          20.000.016  704.832
100.000    5.000.000   20.013.144         20.000.016  7.385.152
1.000.000  5.000.000   20.131.160         20.000.016  66.395.584
1.000      10.000.000  1.275.992          40.000.016  72.256
10.000     10.000.000  9.289.816          40.000.016  705.280
100.000    10.000.000  37.130.328         40.000.016  7.418.112
1.000.000  10.000.000  40.262.232         40.000.016  69.282.496
{noformat}
I wanted to share these intermediate results without further interpretation or
conclusions for now (I need to catch my train).
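
To make the trade-off in the numbers above concrete, here is a minimal Java
sketch of the dense (float[]) and sparse (HashMap) variants. It is not the code
from the attached patches: the names ScoreStoreSketch, ScoreStore, DenseStore,
SparseStore and denseBytes are hypothetical. The 16-byte array header in
denseBytes is an assumption read off the float[] column above and depends on
the JVM. CompactFloatArray is left out because its implementation is not shown
here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of two ways to hold doc -> score; not the patch code. */
final class ScoreStoreSketch {

    /** Minimal lookup interface, assumed for this sketch only. */
    interface ScoreStore {
        float get(int doc);
    }

    /** Dense variant: one float slot per document, whether or not it has a score. */
    static final class DenseStore implements ScoreStore {
        private final float[] scores;

        DenseStore(int maxDoc, float defaultValue) {
            scores = new float[maxDoc];
            Arrays.fill(scores, defaultValue);
        }

        void put(int doc, float score) {
            scores[doc] = score;
        }

        @Override
        public float get(int doc) {
            return scores[doc];
        }
    }

    /** Sparse variant: only documents that have a score occupy memory. */
    static final class SparseStore implements ScoreStore {
        private final Map<Integer, Float> scores = new HashMap<>();
        private final float defaultValue;

        SparseStore(float defaultValue) {
            this.defaultValue = defaultValue;
        }

        void put(int doc, float score) {
            scores.put(doc, score);
        }

        @Override
        public float get(int doc) {
            Float s = scores.get(doc);
            return s == null ? defaultValue : s;
        }
    }

    /**
     * Estimated size of a float[] in bytes: 4 bytes per element plus an
     * assumed 16-byte array header (JVM dependent).
     */
    static long denseBytes(int maxDoc) {
        return 16L + 4L * maxDoc;
    }

    public static void main(String[] args) {
        SparseStore sparse = new SparseStore(0f);
        sparse.put(42, 1.5f);
        System.out.println(sparse.get(42));        // score that was put
        System.out.println(sparse.get(43));        // default for a doc without a score
        System.out.println(denseBytes(1_000_000)); // 4000016
    }
}
```

The dense variant costs the same regardless of how many scores are put, while
the sparse variant grows with the number of puts, which is the pattern in the
float[] and HashMap columns above.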
> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -------------------------------------------------------------------------
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Martin Grotzke
> Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring consumes a lot of memory, depending on the number of
> documents in the index. The ExternalFileField (used for external scoring) uses
> FileFloatSource, where one FileFloatSource is created per external scoring
> file. FileFloatSource creates a float array whose size is the number of docs
> (this is done even if the file to load is not found). If the scoring file
> contains far fewer entries than there are docs in total, the big float array
> wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map
> contains as many entries as there are scoring entries in the external file,
> but not more.