[ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690041#comment-13690041 ]

Thiruvalluvan M. G. commented on AVRO-1348:
-------------------------------------------

The patch seems fine, but it introduces some subtle bugs:

- The patch caches the string output in {{toString()}}. Since Utf8 exposes the 
underlying byte array through {{getBytes()}}, any change made to the array's 
contents after the first invocation of {{toString()}} will not be reflected in 
later output of {{toString()}}. I don't think there is a simple way to intercept 
changes to the byte array. One option is: (a) don't cache if {{getBytes()}} has 
ever been called, (b) invalidate the cache if {{getBytes()}} is called later, and 
(c) don't cache if the Utf8 was constructed using {{Utf8(byte[] bytes)}}. 
Hopefully, in the most common cases, the byte array is not exposed, so the cache 
would still work. If all this seems too complicated, we can simply drop 
caching.
- Thread-safety. CharsetDecoder is not thread-safe. If two threads invoke 
{{toString()}} simultaneously, the behavior is undefined, so thread-safety needs 
to be addressed. I'm not sure how expensive {{Charset.newDecoder()}} is. Since 
access to {{decode()}} has to be serialized anyway, we could use a single static 
CharsetDecoder and gain some additional performance.
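A minimal sketch of how rules (a), (b) and (c) could combine with a single static decoder guarded by a lock. {{CachingUtf8}} is a hypothetical stand-in, not Avro's actual Utf8 class. The REPLACE error actions are an assumption, chosen so malformed input behaves like the JDK String constructor instead of throwing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for Avro's Utf8, illustrating both review points. */
final class CachingUtf8 {
  // One decoder shared by all instances. CharsetDecoder is not thread-safe,
  // so every use below is guarded by synchronizing on DECODER.
  // REPLACE actions are an assumption, mirroring the String constructor.
  private static final CharsetDecoder DECODER = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE);

  private final byte[] bytes;
  private final int length;
  private String cached;   // decoded value; only kept while bytes are private
  private boolean exposed; // true once a caller may hold a reference to bytes

  /** (c) The caller supplied the array and may still modify it: never cache. */
  CachingUtf8(byte[] bytes) {
    this.bytes = bytes;
    this.length = bytes.length;
    this.exposed = true;
  }

  /** The array is created internally, so caching is safe until getBytes(). */
  CachingUtf8(String s) {
    this.bytes = s.getBytes(StandardCharsets.UTF_8);
    this.length = bytes.length;
    this.cached = s;
  }

  /** (a) and (b): exposing the array disables caching and drops any cached value. */
  public synchronized byte[] getBytes() {
    exposed = true;
    cached = null;
    return bytes;
  }

  @Override
  public synchronized String toString() {
    if (cached != null) {
      return cached;
    }
    String s;
    synchronized (DECODER) {
      try {
        // decode(ByteBuffer) resets the decoder, decodes all input, and flushes.
        s = DECODER.decode(ByteBuffer.wrap(bytes, 0, length)).toString();
      } catch (CharacterCodingException e) {
        // Not expected with REPLACE actions, but decode() declares it.
        throw new IllegalStateException(e);
      }
    }
    if (!exposed) {
      cached = s;
    }
    return s;
  }
}
```

Locks are always taken in the same order (instance first, then the shared decoder), so the two levels of synchronization cannot deadlock. One static decoder serializes decoding across all instances. A per-thread decoder held in a ThreadLocal would avoid that contention, at the cost of one decoder per thread.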

Apart from these, there are some minor coding-style violations.
                
> Improve Utf8 to String conversion
> ---------------------------------
>
>                 Key: AVRO-1348
>                 URL: https://issues.apache.org/jira/browse/AVRO-1348
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: AVRO1348v1.patch
>
>
> AVRO-1241 found that the existing method of creating Strings from Utf8 byte 
> arrays could be made faster. The same method is being used in the 
> Utf8.toString(), and could likely be sped up by doing the same thing.
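The comment does not say which conversion AVRO-1241 settled on. For illustration only, the sketch below uses the JDK's {{String(byte[], int, int, Charset)}} constructor, which keeps no decoder state shared with other callers and is safe to call from several threads at once. Whether it is faster than the patch's approach would need benchmarking. {{Utf8Strings}} and {{decode}} are hypothetical names. The assumption that only the first {{length}} bytes of the array hold the string is also made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of Avro's API. */
final class Utf8Strings {
  private Utf8Strings() {}

  /**
   * Decodes the first {@code length} bytes of {@code bytes} as UTF-8.
   * Safe for concurrent callers because nothing mutable is shared.
   * Malformed input is replaced with U+FFFD instead of throwing.
   */
  static String decode(byte[] bytes, int length) {
    return new String(bytes, 0, length, StandardCharsets.UTF_8);
  }
}
```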

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
