[ https://issues.apache.org/jira/browse/LUCENE-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730434#comment-14730434 ]
Dawid Weiss commented on LUCENE-6779:
-------------------------------------

I like what Robert suggested. I'd still use Character constants/methods where applicable, though. It's not that I don't know what the code means, but for somebody not familiar with UTF-8 it may be easier on the eyes.

> Reduce memory allocated by CompressingStoredFieldsWriter to write large
> strings
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-6779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6779
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Shalin Shekhar Mangar
>         Attachments: LUCENE-6779.patch, LUCENE-6779_alt.patch
>
> In SOLR-7927, I am trying to reduce the memory required to index very large
> documents (between 10 and 100 MB), and one of the places that allocates a lot
> of heap is the UTF-8 encoding in CompressingStoredFieldsWriter. The same
> problem existed in JavaBinCodec, and in SOLR-7971 we reduced its memory
> allocation by falling back to a double-pass approach when the UTF-8 size of
> the string is greater than 64KB.
> I propose to make the same changes to CompressingStoredFieldsWriter as we
> made to JavaBinCodec in SOLR-7971.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
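For context, a minimal sketch of the double-pass idea discussed here (not the actual patch or Lucene's UnicodeUtil code): a first pass counts the UTF-8 bytes so the exact length can be written up front, and a second pass streams the bytes out without ever materializing one large intermediate byte[]. It also uses Character methods/constants (Character.charCount, Character.MIN_SUPPLEMENTARY_CODE_POINT) in the spirit of Dawid's suggestion, instead of raw surrogate arithmetic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DoublePassUtf8 {

    /** First pass: count UTF-8 bytes without allocating an encode buffer. */
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                bytes += 1;                                        // ASCII
            } else if (cp < 0x800) {
                bytes += 2;                                        // 2-byte sequence
            } else if (cp < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
                bytes += 3;                                        // 3-byte sequence (BMP)
            } else {
                bytes += 4;                                        // 4-byte sequence (supplementary)
            }
            i += Character.charCount(cp);                          // 1 or 2 chars per code point
        }
        return bytes;
    }

    /** Second pass: encode directly to the output, keeping heap usage bounded. */
    static void writeUtf8(String s, OutputStream out) throws IOException {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.write(cp);
            } else if (cp < 0x800) {
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
            i += Character.charCount(cp);
        }
    }
}
```

The trade-off is the one described in the issue: the string's chars are walked twice, but for a 100 MB document that is far cheaper than allocating a transient byte[] several times the document size, so falling back to this path only above a threshold (64KB in SOLR-7971) keeps the common small-string case on the single-pass fast path.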