[ 
https://issues.apache.org/jira/browse/AVRO-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820731#comment-16820731
 ] 

Felix GV commented on AVRO-2307:
--------------------------------

FYI, the following project contains a bunch of performance optimizations 
(including #2 above) as well as more robust compatibility across versions of 
Avro: https://github.com/linkedin/avro-util

> Opt-in setting to improve GC behavior during deserialization?
> -------------------------------------------------------------
>
>                 Key: AVRO-2307
>                 URL: https://issues.apache.org/jira/browse/AVRO-2307
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: compatibility, java
>    Affects Versions: 1.7.7
>            Reporter: Felix GV
>            Priority: Minor
>              Labels: performance
>
> We have a performance-sensitive project that leverages Avro for an online 
> data store, and I wanted to report on a couple of Avro deserialization 
> optimizations we have implemented. On one hand, it is great that Avro’s code 
> is clean and modular enough to have allowed us to achieve this easily. But on 
> the other hand, we are leveraging parts of the API which are probably not 
> typically used by most users, and thus we are exposing ourselves to ongoing 
> maintenance costs as those “ambiguously-public” APIs might change in future 
> versions. For this reason, I wanted to gauge the appetite of the Avro 
> community for taking in those optimizations upstream into the main project.
> The minor challenge is that the optimizations we’ve made are not completely 
> invisible, and therefore should probably be presented as opt-in settings, 
> rather than new defaults. Below is a summary of both changes.
> *1. Re-use of byte arrays when instantiating ByteBuffers*
> When deserializing a record that contains a bytes field (exposed as a 
> ByteBuffer) from an input byte array, the relevant portion of the input array 
> is copied into a new byte array, which is then used as the backing array of a 
> new ByteBuffer. This is completely safe, but inefficient from our point of 
> view, since it burns extra CPU cycles and generates garbage.
> In our case, we have a few schemas which contain some general metadata and an 
> opaque byte array payload, which often ends up being a significant portion of 
> the total byte length. Recopying these bytes results in up to 2x the byte 
> allocation (assuming the schema contained just a ByteBuffer field and nothing 
> else). The ByteBuffer API, however, provides an alternative behavior where 
> the backing array can be larger than needed, with an offset and length 
> provided to indicate the internal boundaries of the payload. In our 
> implementation, we re-use the input byte array as the ByteBuffer’s backing 
> array, thereby avoiding both the allocation and the copy.
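> For illustration, the difference boils down to roughly the following sketch 
> (input, pos and len are just stand-ins for the decoder's input array and the 
> field's boundaries within it, not actual Avro identifiers):
> {code:java}
> import java.nio.ByteBuffer;
>
> class ByteBufferReuseSketch {
>   // Current behavior: allocate a new backing array and copy the payload into it.
>   static ByteBuffer copying(byte[] input, int pos, int len) {
>     byte[] copy = new byte[len];
>     System.arraycopy(input, pos, copy, 0, len);
>     return ByteBuffer.wrap(copy);
>   }
>
>   // Opt-in behavior: wrap the shared input array directly; the buffer's position
>   // and limit delimit the payload, so there is no allocation and no copy.
>   static ByteBuffer wrapping(byte[] input, int pos, int len) {
>     return ByteBuffer.wrap(input, pos, len);
>   }
> }
> {code}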
> The caveat in this case is that this only works safely for use cases that 
> don’t mutate the content of the bytes (neither the input nor the deserialized 
> object). In our case this assumption is valid.
> If this was implemented in the open-source project, there are a few ways this 
> could be achieved:
>  # We could return a special read-only ByteBuffer implementation that throws 
> an exception if any mutation is attempted, indicating that this special mode 
> ought to be turned off to support mutations (a rough sketch follows below).
>  # We could return a modified ByteBuffer implementation which defers the copy 
> of the content lazily until (and only if) one of the mutation APIs is called. 
> This is perhaps more user-friendly, but it could introduce hidden overhead 
> which the application designer might prefer to be alerted to, rather than 
> have it silently shoved under the rug.
> Either way, doing this cleanly probably requires a custom ByteBuffer 
> implementation with special behavior. In both cases, however, I don’t see a 
> way to guard against mutations of the input byte array; therefore, this 
> should probably be an opt-in setting for users who know what they’re doing 
> and are comfortable with the read-only assumption in their workloads.
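> To make option 1 more concrete: one way to approximate it with the stock JDK, 
> rather than a fully custom implementation, would be to hand back a read-only 
> view (this is just an illustrative sketch, not necessarily what a patch would 
> look like, and it still cannot protect the shared input array itself):
> {code:java}
> import java.nio.ByteBuffer;
>
> class ReadOnlyViewSketch {
>   // Mutation attempts through this view throw java.nio.ReadOnlyBufferException,
>   // but callers can still modify the shared input array behind its back.
>   static ByteBuffer readOnlyView(byte[] input, int pos, int len) {
>     return ByteBuffer.wrap(input, pos, len).asReadOnlyBuffer();
>   }
> }
> {code}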
> *2. Primitive (non-boxing) implementation of lists*
> Another challenge we’ve come across is that lists of primitive types (floats 
> in our case) are always boxed into Float objects by Avro. For us, this 
> resulted in millions of objects per second getting allocated, causing 
> pathological GC pressure.
> To solve this, we have implemented an alternative version of the Avro Array 
> class which, instead of internally holding an array of generically-typed 
> Objects, holds an array of primitive floats. This avoids boxing at 
> deserialization time, but there is a further challenge: since Avro array 
> fields are exposed as instances of the Java List interface, the regular 
> functions of that API all return Objects, which merely defers the boxing to a 
> slightly later point in time. To get around this, we have added a 
> getPrimitive(index) function which returns primitive items directly. In order 
> to use this more optimized function, we need to cast the list to our own 
> type, otherwise the new function is not visible. The end result is quite 
> dramatic performance-wise, reducing our p99 latencies to between a quarter 
> and a third of their original values, depending on the workload.
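> A heavily simplified sketch of the shape of such a class is below (the real 
> implementation mirrors Avro's Array class and its full List contract; only 
> the parts relevant to boxing are shown, and addPrimitive is just an 
> illustrative name for the write path):
> {code:java}
> import java.util.AbstractList;
> import java.util.Arrays;
>
> // Simplified sketch: backs the Avro array with a float[] instead of boxed Floats.
> public class PrimitiveFloatArray extends AbstractList<Float> {
>   private float[] elements;
>   private int size;
>
>   public PrimitiveFloatArray(int capacity) {
>     this.elements = new float[capacity];
>   }
>
>   // Called while deserializing; no boxing happens on this path.
>   public void addPrimitive(float value) {
>     if (size == elements.length) {
>       elements = Arrays.copyOf(elements, size * 2 + 1);
>     }
>     elements[size++] = value;
>   }
>
>   // Boxing-free accessor; callers must cast the List to this type to reach it.
>   public float getPrimitive(int index) {
>     return elements[index];
>   }
>
>   @Override
>   public Float get(int index) {
>     return elements[index]; // boxes, but only when the plain List API is used
>   }
>
>   @Override
>   public int size() {
>     return size;
>   }
> }
> {code}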
> One challenge here is that the “PrimitiveFloatArray” class is an almost 
> complete copy of Avro’s Array class, basically just stripping away the 
> generics and adding the primitive-returning function. If we were to 
> contribute this upstream to the open-source project, I imagine we might want 
> to do this not only for floats but for boolean, int, long and double arrays 
> as well. This would mean roughly five copy-pasted variants of essentially the 
> same implementation, which is not ideal from a maintenance standpoint. The 
> generic types are nicer
> in that sense, but unfortunately, Java generics do not support primitives 
> (AFAIK, unless one of the latest Java releases introduced support for this 
> recently?). In our case, we are willing to pay that maintenance cost in 
> exchange for the dramatic GC reduction it gives us, but I don’t know if the 
> Avro project would come to the same conclusion.
> If implemented, this could potentially be the new default behavior, though 
> boxing would still happen (merely deferred) unless users cast the lists to 
> the new types and used the new primitive-returning functions. In use cases 
> where an entire primitive array is deserialized but only a subset of its 
> items is accessed, this boxing deferral could still provide benefits even 
> without any code changes. I imagine there might be a counter-case, though, 
> where a primitive array gets deserialized and then fully accessed multiple 
> times, in which case deferred on-demand boxing might be even worse...
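> For illustration, the two access patterns discussed above look roughly like 
> this (getScores() is a hypothetical generated getter returning the 
> PrimitiveFloatArray sketched earlier; the actual field name depends on the 
> schema):
> {code:java}
> List<Float> field = record.getScores(); // hypothetical generated getter
>
> // Default access through the List interface: compiles everywhere,
> // but boxes a Float on every get() call.
> float sumBoxed = 0f;
> for (int i = 0; i < field.size(); i++) {
>   sumBoxed += field.get(i);
> }
>
> // Opt-in access: cast to the primitive-aware type to avoid boxing entirely.
> PrimitiveFloatArray floats = (PrimitiveFloatArray) field;
> float sumPrimitive = 0f;
> for (int i = 0; i < floats.size(); i++) {
>   sumPrimitive += floats.getPrimitive(i);
> }
> {code}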
> —
> So, to reiterate: the goal of this ticket is to share what we’ve found and 
> what we’ve done about it, in order to understand whether there would be 
> interest in reviewing and merging a patch providing the above opt-in 
> benefits.
> Thanks for your feedback.


