[ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121674#comment-13121674 ]
Scott Carey commented on AVRO-911:
----------------------------------
Object reuse is also hard in some of the work that I am doing (or... have not
had time to do in months) in AVRO-859. Trying to apply object reuse to
complicated object graphs is not very beneficial. Additionally, making such
object graphs as immutable as possible has performance gains of its own and
simplifies code.
In simple cases, reuse can have big gains. These mostly boil down to avoiding
boxing of small primitives. Here, you go from allocating a boxed object per
value to allocating nothing.
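To make the boxing point concrete, here is a minimal sketch of the three styles
side by side. All of the names (MutableLong, BoxingSketch, readInto and so on)
are hypothetical and are not Avro classes; the long[] simply stands in for a
decoder.

```java
// Hypothetical names throughout; none of these are Avro classes.
final class MutableLong {
    long value; // overwritten in place on every read
}

final class BoxingSketch {
    // Allocating style: the value is autoboxed into a Long on return.
    // Long.valueOf caches only a small range, so larger values allocate.
    static Long readBoxed(long[] source, int i) {
        return source[i];
    }

    // Reusing style: the caller passes in a holder, which is filled and returned.
    // A new holder is created only when the caller has none to offer.
    static MutableLong readInto(long[] source, int i, MutableLong reuse) {
        MutableLong target = (reuse != null) ? reuse : new MutableLong();
        target.value = source[i];
        return target;
    }

    // Primitive style: nothing to box and nothing to reuse.
    static long readPrimitive(long[] source, int i) {
        return source[i];
    }

    // One holder serves the whole loop, so the loop body allocates nothing.
    static long sumWithReuse(long[] source) {
        MutableLong holder = new MutableLong();
        long sum = 0;
        for (int i = 0; i < source.length; i++) {
            holder = readInto(source, i, holder);
            sum += holder.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = {1000L, 2000L, 3000L};
        System.out.println(sumWithReuse(data));
    }
}
```

The primitive-returning variant gets the same saving without any holder at all,
which is why it may be the simpler specialization where the API allows it.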
For Utf8, we have to copy out a byte[] from the stream, so the Utf8 object
allocation is only a small portion of the total allocated, unless it is an
empty string and we were to reuse an empty byte[].
Delaying or avoiding Utf8 <-> String conversion is very beneficial, however. I
use Utf8 in many places now for this purpose.
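As a rough illustration of what delaying the conversion buys, the sketch below
keeps the raw UTF-8 bytes and decodes them only on the first toString() call.
LazyUtf8 is a hypothetical stand-in written for this example; it is not Avro's
Utf8 class and makes no claim about how Utf8 is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a UTF-8 byte holder; this is not Avro's Utf8 class.
final class LazyUtf8 {
    private final byte[] bytes;
    private String decoded; // stays null until toString() is first called

    LazyUtf8(byte[] bytes) {
        this.bytes = bytes;
    }

    boolean isDecoded() {
        return decoded != null;
    }

    // Equality and hashing work on the raw bytes, so no String is created.
    @Override
    public boolean equals(Object other) {
        return other instanceof LazyUtf8
            && Arrays.equals(bytes, ((LazyUtf8) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    // The decode cost is paid once, and only if a String is actually needed.
    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(bytes, StandardCharsets.UTF_8);
        }
        return decoded;
    }
}

final class LazyUtf8Demo {
    public static void main(String[] args) {
        LazyUtf8 a = new LazyUtf8("hello".getBytes(StandardCharsets.UTF_8));
        LazyUtf8 b = new LazyUtf8("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.equals(b) + " " + a.isDecoded());
        System.out.println(a + " " + a.isDecoded());
    }
}
```

Values that are only compared, hashed, or written back out never pay for the
char[] and String that decoding would allocate.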
I support Avro removing object re-use for the general case. Specializations
for mutable boxed primitives or even simply returning / accepting primitives
are something we can add later.
The low level read and write should have options for dealing with String as
well as Utf8. Higher level APIs can choose either (for example, one might have
two different SpecificCompiler templates, or switch the type based on an
annotation in AvroIDL).
As for escape analysis eliding object allocations, this won't affect most use
cases here. It applies when you create a new object, call a method on it, and
then throw it away within the scope of a method or loop, and in a few slightly
larger scopes.
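The two cases can be sketched as follows. Point and EscapeSketch are
hypothetical names for illustration. Whether the JIT actually removes the
allocation in the first method depends on the JVM, its flags, and inlining
decisions, so treat the comments as a description of what is eligible, not a
guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type used only for this illustration.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int manhattan() {
        return Math.abs(x) + Math.abs(y);
    }
}

final class EscapeSketch {
    // The Point is created, used once, and dropped inside the loop body.
    // It never escapes, so the JIT may be able to avoid allocating it.
    static long localOnly(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, -i);
            total += p.manhattan();
        }
        return total;
    }

    // Each Point is stored in a list that is handed back to the caller.
    // These objects escape, much like the results of deserialization do,
    // so they have to be allocated on the heap.
    static List<Point> decodeAll(int n) {
        List<Point> out = new ArrayList<Point>(n);
        for (int i = 0; i < n; i++) {
            out.add(new Point(i, -i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(localOnly(4));
        System.out.println(decodeAll(4).size());
    }
}
```

Datum reading looks like the second method: the objects it creates are returned
to the caller, which is why escape analysis does little for it.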
> remove object reuse from Java APIs
> ----------------------------------
>
> Key: AVRO-911
> URL: https://issues.apache.org/jira/browse/AVRO-911
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.6.0
>
> Attachments: perf-reuse.patch
>
>
> Avro's Java APIs were designed to permit object reuse when reading with the
> assumption that would provide performance advantages. In particular, the old
> parameter in DatumReader<T>.read(T old, Decoder), the Utf8 class, and the
> GenericArray.peek() method were all designed for this purpose. But I am
> unable to see significant performance improvements when objects are reused.
> I tried modifying Perf.java's GenericTest to reuse records, and its
> StringTest to not reuse Utf8 instances and, in both cases, performance is not
> substantially altered.
> If we were to remove these then issues such as AVRO-803 would disappear.
> Always using java.lang.String instead of Utf8 would remove a lot of user
> confusion.