[ https://issues.apache.org/jira/browse/LUCENE-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730033#comment-14730033 ]
Robert Muir commented on LUCENE-6779:
-------------------------------------

Also, I still think this buffering doesn't really provide any benefit here, as the output in question is just a byte array anyway (going to compression). I don't think buffering to a separate byte[] really saves anything when we are talking about a ByteArrayDataOutput; this is an entirely different beast than the "FastOutputStream" used in the benchmark.

So because it's already going to a byte[], I think it's not so useful to stream it this way for a huge document. For such a huge doc (10MB), the current scheme probably has a number of disadvantages anyway, both in that buffering and in the underlying compression parameters.

I think it would be good to do real indexing benchmarks to see if this really helps your case. I suspect that improving it would really require other things, as many aspects of the current codec may be suboptimal: not just flush, but merge too.

> Reduce memory allocated by CompressingStoredFieldsWriter to write large
> strings
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-6779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6779
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Shalin Shekhar Mangar
>         Attachments: LUCENE-6779.patch
>
>
> In SOLR-7927, I am trying to reduce the memory required to index very large
> documents (between 10 to 100MB) and one of the places which allocate a lot of
> heap is the UTF8 encoding in CompressingStoredFieldsWriter. The same problem
> existed in JavaBinCodec and we reduced its memory allocation by falling back
> to a double pass approach in SOLR-7971 when the utf8 size of the string is
> greater than 64KB.
> I propose to make the same changes to CompressingStoredFieldsWriter as we
> made to JavaBinCodec in SOLR-7971.
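
For readers unfamiliar with the "double pass approach" mentioned in the issue description, here is a minimal sketch of the general idea. It is not the code from LUCENE-6779.patch or SOLR-7971. The class name, method names, 8KB chunk size, 4-byte length prefix, and the use of 3 * length as a cheap upper bound on the UTF-8 size are all assumptions made for illustration. Only the 64KB threshold comes from the description.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration only; not Lucene or Solr code.
final class TwoPassUtf8 {
    // 64KB threshold taken from the issue description; the constant name is made up.
    static final int LARGE_THRESHOLD_BYTES = 64 * 1024;
    // Size of the small reusable buffer; 8KB is an arbitrary assumption.
    static final int CHUNK_SIZE = 8 * 1024;

    // Returns the code point starting at index i, mapping unpaired surrogates to U+FFFD.
    static int nextCodePoint(CharSequence s, int i) {
        int cp = Character.codePointAt(s, i);
        return (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) ? 0xFFFD : cp;
    }

    // Pass 1: count the UTF-8 bytes without allocating any output buffer.
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = nextCodePoint(s, i);
            i += Character.charCount(cp);
            bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        }
        return bytes;
    }

    // Encodes one code point into buf at offset upto and returns the new offset.
    static int encode(int cp, byte[] buf, int upto) {
        if (cp < 0x80) {
            buf[upto++] = (byte) cp;
        } else if (cp < 0x800) {
            buf[upto++] = (byte) (0xC0 | (cp >> 6));
            buf[upto++] = (byte) (0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            buf[upto++] = (byte) (0xE0 | (cp >> 12));
            buf[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            buf[upto++] = (byte) (0x80 | (cp & 0x3F));
        } else {
            buf[upto++] = (byte) (0xF0 | (cp >> 18));
            buf[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
            buf[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            buf[upto++] = (byte) (0x80 | (cp & 0x3F));
        }
        return upto;
    }

    // Writes a 4-byte big-endian length (an assumption; real codecs may use a variable-length int).
    static void writeInt(int v, OutputStream out) throws IOException {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    // Writes a length-prefixed UTF-8 string, choosing single or double pass by size.
    static void writeString(CharSequence s, OutputStream out) throws IOException {
        if ((long) s.length() * 3 <= LARGE_THRESHOLD_BYTES) {
            // Small string: encode once into a worst-case-sized buffer, then write length and bytes.
            byte[] buf = new byte[s.length() * 3];
            int upto = 0;
            for (int i = 0; i < s.length(); ) {
                int cp = nextCodePoint(s, i);
                i += Character.charCount(cp);
                upto = encode(cp, buf, upto);
            }
            writeInt(upto, out);
            out.write(buf, 0, upto);
            return;
        }
        // Large string: pass 1 computes the length, pass 2 streams through a small buffer.
        writeInt(utf8Length(s), out);
        byte[] buf = new byte[CHUNK_SIZE];
        int upto = 0;
        for (int i = 0; i < s.length(); ) {
            if (upto + 4 > buf.length) { // keep room for the longest (4-byte) sequence
                out.write(buf, 0, upto);
                upto = 0;
            }
            int cp = nextCodePoint(s, i);
            i += Character.charCount(cp);
            upto = encode(cp, buf, upto);
        }
        out.write(buf, 0, upto);
    }
}
```

The trade-off is extra CPU (the string is scanned twice) in exchange for a scratch buffer that stays at CHUNK_SIZE no matter how large the string is. As the comment above points out, when the destination is itself an in-memory byte[] the saving may be small, so this is best judged with real indexing benchmarks.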