[ https://issues.apache.org/jira/browse/SOLR-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711591#comment-14711591 ]
ASF subversion and git services commented on SOLR-7971:
-------------------------------------------------------

Commit 1697727 from sha...@apache.org in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1697727 ]

SOLR-7971: Reduce memory allocated by JavaBinCodec to encode large strings by an amount equal to the string.length()

> Reduce memory allocated by JavaBinCodec to encode large strings
> ---------------------------------------------------------------
>
>                 Key: SOLR-7971
>                 URL: https://issues.apache.org/jira/browse/SOLR-7971
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Response Writers, SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: Trunk, 5.4
>
>         Attachments: SOLR-7971.patch
>
>
> As discussed in SOLR-7927, we can reduce the buffer memory allocated by
> JavaBinCodec while writing large strings.
> https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700420#comment-14700420
> {quote}
> The maximum Unicode code point (as of Unicode 8 anyway) is U+10FFFF
> ([http://www.unicode.org/glossary/#code_point]). This is encoded in UTF-16
> as surrogate pair {{\uDBFF\uDFFF}}, which takes up two Java chars, and is
> represented in UTF-8 as the 4-byte sequence {{F4 8F BF BF}}. This is likely
> where the mistaken 4-bytes-per-Java-char formulation came from: the maximum
> number of UTF-8 bytes required to represent a Unicode *code point* is 4.
> The maximum Java char is {{\uFFFF}}, which is represented in UTF-8 as the
> 3-byte sequence {{EF BF BF}}.
> So I think it's safe to switch to using 3 bytes per Java char (the unit of
> measurement returned by {{String.length()}}), like
> {{CompressingStoredFieldsWriter.writeField()}} does.
> {quote}
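
The byte counts in the quoted comment can be checked with a small standalone sketch. This is not the JavaBinCodec code: the class name Utf8Bound and its methods are hypothetical, and it relies on the JDK's String.getBytes(StandardCharsets.UTF_8), which replaces an unpaired surrogate with a single '?' byte, so a hand-rolled encoder may count such chars differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class Utf8Bound {

    /** Number of bytes the JDK's UTF-8 encoder produces for s. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Worst-case UTF-8 size under the 3-bytes-per-Java-char bound. */
    static int maxUtf8Bytes(String s) {
        return Math.multiplyExact(s.length(), 3);
    }

    /** Bytes saved relative to a 4-bytes-per-char bound: 4n - 3n = n. */
    static int savedBytes(String s) {
        return Math.multiplyExact(s.length(), 4) - maxUtf8Bytes(s);
    }

    /** Largest UTF-8 length over every one-char string. */
    static int maxBytesForAnySingleChar() {
        int max = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            max = Math.max(max, utf8Length(String.valueOf((char) c)));
        }
        return max;
    }

    public static void main(String[] args) {
        String maxChar = "\uFFFF";            // largest single Java char
        String maxCodePoint = "\uDBFF\uDFFF"; // U+10FFFF as a surrogate pair

        System.out.println(utf8Length(maxChar));        // 3 (EF BF BF)
        System.out.println(utf8Length(maxCodePoint));   // 4 (F4 8F BF BF)
        System.out.println(maxBytesForAnySingleChar()); // 3
        System.out.println(maxUtf8Bytes(maxCodePoint)); // 6, enough for the 4 bytes needed
    }
}
```

A surrogate pair costs 4 bytes but occupies two chars, so it averages 2 bytes per char and stays under the bound. Sizing a buffer at 3 * length() rather than 4 * length() is what saves length() bytes, which matches the amount named in the commit message.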