[jira] [Commented] (LUCENE-6779) Reduce memory allocated by CompressingStoredFieldsWriter to write large strings

Robert Muir (JIRA) Thu, 03 Sep 2015 16:34:05 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730033#comment-14730033 ]


Robert Muir commented on LUCENE-6779:
-------------------------------------

Also, I still think this buffering doesn't really provide any benefit here, 
as the output in question is just a byte array anyway (going to compression). 
I don't think buffering to a separate byte[] really saves anything when we are 
talking about a ByteArrayDataOutput; this is an entirely different beast than 
the "FastOutputStream" used in the benchmark.

So because it's already going to a byte[], I think it's not so useful to stream 
it this way for a huge document. For such a huge doc (10MB), the current scheme 
probably has a number of disadvantages anyway, both in that buffering and in 
the underlying compression parameters.

I think it would be good to do real indexing benchmarks to see if this really 
helps your case. I suspect that improving it would really require other things, 
as many aspects of the current codec may be suboptimal: not just flush, but 
merge too.

> Reduce memory allocated by CompressingStoredFieldsWriter to write large 
> strings
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-6779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6779
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Shalin Shekhar Mangar
>         Attachments: LUCENE-6779.patch
>
>
> In SOLR-7927, I am trying to reduce the memory required to index very large 
> documents (between 10 and 100MB), and one of the places that allocates a lot 
> of heap is the UTF-8 encoding in CompressingStoredFieldsWriter. The same 
> problem existed in JavaBinCodec, and we reduced its memory allocation by 
> falling back to a double-pass approach in SOLR-7971 when the UTF-8 size of 
> the string is greater than 64KB.
> I propose to make the same changes to CompressingStoredFieldsWriter as we 
> made to JavaBinCodec in SOLR-7971.
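
For readers unfamiliar with the "double pass" idea in the description above, here 
is a minimal sketch of what such an encoder could look like. It is not the code 
from the attached patch or from SOLR-7971. The class name DoublePassUtf8, the 
8KB scratch size, and the 4-byte length prefix are all assumptions made for 
illustration; only the 64KB cutoff comes from the description. Pass one counts 
the UTF-8 bytes without allocating, and pass two encodes through a small fixed 
buffer instead of a temporary byte[] as large as the whole string.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Lucene or Solr.
final class DoublePassUtf8 {
    // 64KB cutoff taken from the issue description.
    static final int THRESHOLD = 64 * 1024;
    // Scratch size for the streaming path; an arbitrary choice for this sketch.
    static final int CHUNK = 8 * 1024;

    /** Pass 1: count the UTF-8 bytes of s without allocating anything. */
    static int utf8Length(CharSequence s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            int cp = Character.codePointAt(s, i);
            if (cp < 0x80) n += 1;
            else if (cp < 0x800) n += 2;
            else if (cp >= 0x10000) { n += 4; i++; }            // surrogate pair: skip the low half
            else if (Character.isSurrogate((char) cp)) n += 1;  // lone surrogate is written as '?'
            else n += 3;
        }
        return n;
    }

    /** Pass 2: encode through a small fixed scratch buffer, flushing as it fills. */
    static void encodeChunked(CharSequence s, OutputStream out) throws IOException {
        byte[] buf = new byte[CHUNK];
        int upto = 0;
        for (int i = 0; i < s.length(); i++) {
            // Keep room for the longest (4-byte) sequence before encoding the next code point.
            if (upto > buf.length - 4) { out.write(buf, 0, upto); upto = 0; }
            int cp = Character.codePointAt(s, i);
            if (cp < 0x80) {
                buf[upto++] = (byte) cp;
            } else if (cp < 0x800) {
                buf[upto++] = (byte) (0xC0 | (cp >> 6));
                buf[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp >= 0x10000) {
                i++; // consumed both halves of the surrogate pair
                buf[upto++] = (byte) (0xF0 | (cp >> 18));
                buf[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                buf[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate((char) cp)) {
                buf[upto++] = (byte) '?';
            } else {
                buf[upto++] = (byte) (0xE0 | (cp >> 12));
                buf[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[upto++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        out.write(buf, 0, upto);
    }

    /** Writes a 4-byte big-endian length (an assumption of this sketch), then the UTF-8 bytes. */
    static void writeString(String s, OutputStream out) throws IOException {
        int len = utf8Length(s);
        out.write(len >>> 24);
        out.write(len >>> 16);
        out.write(len >>> 8);
        out.write(len);
        if (len <= THRESHOLD) {
            // Small string: one pass, with a temporary byte[] of the full encoded size.
            out.write(s.getBytes(StandardCharsets.UTF_8));
        } else {
            // Large string: second pass over the chars, with only CHUNK bytes of scratch.
            encodeChunked(s, out);
        }
    }
}
```

The trade-off is that a large string is scanned twice in exchange for a bounded 
scratch allocation. Whether that helps when the destination is itself an 
in-memory byte[] is exactly the point questioned in the comment above, so the 
sketch only shows the mechanism and says nothing about its benefit.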



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
