[ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690041#comment-13690041 ]
Thiruvalluvan M. G. commented on AVRO-1348:
-------------------------------------------

The patch seems fine, but it leads to subtle bugs:

- The patch caches the string output in {{toString()}}. Since Utf8 exposes the underlying byte array through {{getBytes()}}, any change made to the contents of the array after the first invocation of {{toString()}} will not be reflected in later output of {{toString()}}. I don't think there is any simple way to intercept changes to the byte array. One way to handle this: (a) don't cache if someone has ever called {{getBytes()}} in the past; (b) invalidate the cache if {{getBytes()}} is called later; (c) if the Utf8 is constructed using {{Utf8(byte[] bytes)}}, do not cache. Hopefully, in the most common cases, the byte array is not exposed and hence the cache would still work. If all this appears too complicated, we can just drop caching.
- Thread-safety. CharsetDecoder is not thread-safe. If two threads invoke {{toString()}} simultaneously, the behavior is undefined. Thread-safety needs to be brought in. I'm not sure how expensive {{Charset.newDecoder()}} is. Since we need to serialize access to {{decode()}}, we can have a single static CharsetDecoder and get some additional performance.

Apart from these, there are some minor coding-style violations.

> Improve Utf8 to String conversion
> ---------------------------------
>
>                 Key: AVRO-1348
>                 URL: https://issues.apache.org/jira/browse/AVRO-1348
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: AVRO1348v1.patch
>
>
> AVRO-1241 found that the existing method of creating Strings from Utf8 byte
> arrays could be made faster. The same method is being used in
> Utf8.toString(), and could likely be sped up by doing the same thing.
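
To make the two points in the comment above concrete, here is a minimal sketch of caching rules (a) to (c) combined with a single static CharsetDecoder whose use is serialized. It is not the code from AVRO1348v1.patch and not Avro's actual Utf8 class: the class name {{CachedUtf8Sketch}}, the {{exposed}} flag, and the String constructor are hypothetical, and the instance-level {{synchronized}} is an assumption made only to keep the sketch self-consistent.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not Avro's Utf8 implementation. */
final class CachedUtf8Sketch {
  // One shared decoder. CharsetDecoder is not thread-safe, so every use
  // happens while holding the decoder's monitor (see decode below).
  private static final CharsetDecoder DECODER = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE);

  private final byte[] bytes;
  private boolean exposed; // true once callers may hold a reference to bytes
  private String cached;   // null when nothing is cached

  /** Rule (c): the caller keeps a reference to the array, so never cache. */
  CachedUtf8Sketch(byte[] bytes) {
    this.bytes = bytes;
    this.exposed = true;
  }

  /** The array stays private to this instance until getBytes() is called. */
  CachedUtf8Sketch(String s) {
    this.bytes = s.getBytes(StandardCharsets.UTF_8);
    this.exposed = false;
  }

  /** Rules (a) and (b): mark the array as exposed and drop any cached string. */
  synchronized byte[] getBytes() {
    exposed = true;
    cached = null;
    return bytes;
  }

  @Override
  public synchronized String toString() {
    if (cached != null) {
      return cached;
    }
    String s = decode(bytes);
    if (!exposed) {
      cached = s; // safe to cache: nobody outside can mutate the array
    }
    return s;
  }

  private static String decode(byte[] b) {
    synchronized (DECODER) { // serialize access to the shared decoder
      try {
        // decode(ByteBuffer) resets the decoder and decodes the whole buffer.
        return DECODER.decode(ByteBuffer.wrap(b)).toString();
      } catch (CharacterCodingException e) {
        throw new IllegalStateException(e);
      }
    }
  }
}
```

In this sketch an instance built from a String returns the same cached String on repeated {{toString()}} calls until {{getBytes()}} is called; after that, every call decodes again, so later changes to the array are always visible. Whether one lock-guarded static decoder actually beats creating a decoder per call would need measuring, as the comment notes.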