[ https://issues.apache.org/jira/browse/AVRO-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix GV updated AVRO-2307: --------------------------- Component/s: java compatibility > Opt-in setting to improve GC behavior during deserialization? > ------------------------------------------------------------- > > Key: AVRO-2307 > URL: https://issues.apache.org/jira/browse/AVRO-2307 > Project: Apache Avro > Issue Type: Improvement > Components: compatibility, java > Affects Versions: 1.7.7 > Reporter: Felix GV > Priority: Minor > > We have a performance-sensitive project that leverages Avro for an online > data store, and I wanted to report on a couple of Avro deserialization > optimizations we have implemented. On one hand, it is great that Avro’s code > is clean and modular enough to have allowed us to achieve this easily. But on > the other hand, we are leveraging parts of the API which are probably not > typically used by most users, and thus we are exposing ourselves to ongoing > maintenance costs as those “ambiguously-public” APIs might change in future > versions. For this reason, I wanted to gauge the appetite of the Avro > community for taking in those optimizations upstream into the main project. > The minor challenge is that the optimizations we’ve made are not completely > invisible, and therefore should probably be presented as opt-in settings, > rather than new defaults. Below is a summary of both changes. > *1. Re-use of byte arrays when instantiating ByteBuffers* > When deserializing a byte array that contains a ByteBuffer field, the > relevant portion of the input byte array is copied into a new byte array, > which is then used as the backing array of a new ByteBuffer. This is > completely safe, but appears to be inefficient from our point of view, > requiring extra CPU cycles and generating garbage. > In our case, we have a few schemas which contain some general metadata and an > opaque byte array payload, which often ends up being a significant portion of > the total byte length. Recopying these bytes results in up to 2x the byte > allocation (assuming the schema contained just a ByteBuffer field and nothing > else). The ByteBuffer API, however, provides an alternative behavior where > the backing array can be larger than needed, with an offset and length > provided to indicate the internal boundaries of the payload. In our > implementation, we re-use the input byte array as the ButeBuffer’s backing > array, therefore avoiding both the allocation and the copy. > The caveat in this case is that this only works safely for use cases that > don’t mutate the content of the bytes (neither the input nor the deserialized > object). In our case this assumption is valid. > If this was implemented in the open-source project, there are a few ways this > could be achieved: > # We could return a special read-only ByteBuffer implementation that throws > an exception if any mutation is attempted, indicating that this special mode > ought to be turned off to support mutations. > # We could return a modified ByteBuffer implementation which defers the copy > of the content lazily until (and only if) one of the mutation API is called. > More user-friendly, perhaps, but could introduce hidden overhead which the > application designer might prefer to be alerted to rather than being silently > shoved under the rug. > Either way, this probably requires a custom ByteBuffer implementation with > special behavior in order to be cleaner. In both cases, however, I don’t see > a way to guard against mutations to the input byte array, therefore, this > should probably be an opt-in setting for users who know what they’re doing > and are comfortable with the read-only assumption in their workloads. > *2. Primitive (non-boxing) implementation of lists* > Another challenge we’ve come across is that lists of primitive types (floats > in our case) are always boxed into Object Floats by Avro. In our case, this > resulted in millions of Objects / second getting allocated, causing > pathological GC pressure. > To solve this, we have implemented an alternative version of the Avro Array > class, but which instead of hanging on to an array of generically-typed > Objects, internally, hangs on to an array of primitive floats. This causes no > boxing at deserialization time, but there is a further challenge which is > that since Avro array fields are exposed as instances of the Java List > interface, the regular functions of the API all return Objects, therefore > merely deferring the boxing to a slightly later point in time. To get around > this further complication, we have added a getPrimitive(index) function which > returns primitive items directly. In order to be able to use this more > optimized function, it is necessary for us to cast the list into our own > type, otherwise we wouldn’t see the new function. The end result is quite > dramatic, performance-wise, reducing our p99 latencies down to a quarter to a > third of their original values, depending on workloads. > One challenge here is that the “PrimitiveFloatArray” class is an almost > complete copy of Avro’s Array class, basically just stripping away the > generics and adding the primitive-returning function. If we were to > contribute this upstream to the open-source project, I imagine we might want > to do this not only for floats but for boolean, int, long and double arrays > as well. This would mean roughly 5x the same copy-pasted implementation, > which is not ideal from a maintenance standpoint. The generic types are nicer > in that sense, but unfortunately, Java generics do not support primitives > (AFAIK, unless one of the latest Java releases introduced support for this > recently?). In our case, we are willing to pay that maintenance cost in > exchange for the dramatic GC reduction it gives us, but I don’t know if the > Avro project would come to the same conclusion. > If implemented, this could potentially be the new default behavior, though > boxing would still happen (merey deferred) unless users casted the lists into > the new types and used the new primitive-returning functions. In use cases > where an entire primitive array is deserialized but only a subset of its > items are accessed, this boxing deferral could still provide benefits, > however. > — > So again, I re-iterate: the goal of this ticket is to share what we’ve found > and what we’ve done about it, with the intent to understand if there would be > interest in reviewing and merging a patch which provided the above opt-in > benefits. > Thanks for your feedback. -- This message was sent by Atlassian JIRA (v7.6.3#76005)