[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

Martin Grotzke (JIRA) Tue, 14 Jun 2011 04:54:53 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049143#comment-13049143 ]


Martin Grotzke commented on SOLR-2583:
--------------------------------------

{quote}
See: http://www.strchr.com/multi-stage_tables

i attached a patch, of a (not great) implementation i was sorta kinda trying to 
clean up for other reasons... maybe you can use it.
{quote}

Thanks, interesting approach!

I just tried to create a CompactFloatArray based on the CompactByteArray so that 
I can compare memory consumption. One change was more than just replacing byte 
with float, and I'm not sure what the right adaptation is in this case:

{code}
diff -w solr/src/java/org/apache/solr/util/CompactByteArray.java solr/src/java/org/apache/solr/util/CompactFloatArray.java
57c57
...
202,203c202,203
<   private void touchBlock(int i, int value) {
<     hashes[i] = (hashes[i] + (value << 1)) | 1;
---
>   private void touchBlock(int i, float value) {
>     hashes[i] = (hashes[i] + (Float.floatToIntBits(value) << 1)) | 1;
{code}

The adapted test passes, so the change at least appears to be correct. I'll also 
attach the full patch for CompactFloatArray.java and TestCompactFloatArray.java.
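
To make the open question about touchBlock concrete, here is a minimal standalone sketch of the hash update from the diff above. The class name FloatBlockHash and the method touchWithCast are hypothetical and exist only for comparison; they are not part of the attached patch. The sketch shows that folding in Float.floatToIntBits(value) keeps fractional values apart, whereas a plain (int) cast would not, and that the left shift discards the sign bit, so x and -x contribute the same amount to the hash.

```java
final class FloatBlockHash {
    // Same update as touchBlock in the diff: add the shifted bit pattern of the
    // value to the running hash and set the low bit so the result is never 0.
    static int touch(int hash, float value) {
        return (hash + (Float.floatToIntBits(value) << 1)) | 1;
    }

    // Hypothetical alternative, for comparison only: a narrowing cast drops the
    // fractional part, so 0.25f and 0.75f would both be folded in as 0.
    static int touchWithCast(int hash, float value) {
        return (hash + (((int) value) << 1)) | 1;
    }
}
```

Whether the lost sign bit matters depends on how the hashes array is used elsewhere in the class, which is not shown here. If it only serves as a quick pre-check before an exact block comparison, extra collisions would cost some time but not correctness.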

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -------------------------------------------------------------------------
>
>                 Key: SOLR-2583
>                 URL: https://issues.apache.org/jira/browse/SOLR-2583
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Martin Grotzke
>            Priority: Minor
>         Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring consumes a lot of memory, depending on the number of 
> documents in the index. ExternalFileField (used for external scoring) uses 
> FileFloatSource, and one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array sized to the number of docs 
> (this happens even if the file to load is not found). If the scoring file 
> has far fewer entries than there are docs in total, the big float array 
> wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains only as many entries as there are scoring entries in the external 
> file.
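
The doc -> score map proposed in the description could look roughly like the sketch below. SparseScores and its methods are hypothetical names for illustration, not Solr API, and the default score for docs without an entry is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class SparseScores {
    // Holds one entry per doc that actually appears in the external file.
    private final Map<Integer, Float> scores = new HashMap<>();
    // Assumed fallback for docs that have no entry.
    private final float defaultScore;

    SparseScores(float defaultScore) {
        this.defaultScore = defaultScore;
    }

    void put(int doc, float score) {
        scores.put(doc, score);
    }

    float get(int doc) {
        Float s = scores.get(doc);
        return s != null ? s : defaultScore;
    }

    int size() {
        return scores.size();
    }
}
```

A HashMap with boxed keys and values costs far more per entry than the 4 bytes of a float array slot, so a map like this only saves memory when the scoring file covers a small fraction of the docs.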

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
