David Mollitor created AVRO-4069:
------------------------------------
Summary: Remove Reader String Cache from Generic Datum Reader
Key: AVRO-4069
URL: https://issues.apache.org/jira/browse/AVRO-4069
Project: Apache Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.12.0
Reporter: David Mollitor
Assignee: David Mollitor
Fix For: 1.13.0
I was doing some profiling, and this "ReaderCache" code lit up:
{code:java}
  // Tail of the ReaderCache class (earlier members elided)
  public Class getStringClass(final Schema s) {
    // A new key object is allocated on every lookup
    final IdentitySchemaKey key = new IdentitySchemaKey(s);
    return this.stringClassCache.computeIfAbsent(key,
        (IdentitySchemaKey k) -> this.findStringClass.apply(k.schema));
  }
} // closes ReaderCache

private final ReaderCache readerCache = new ReaderCache(this::findStringClass);

// The mapping that the cache wraps
protected Class findStringClass(Schema schema) {
  String name = schema.getProp(GenericData.STRING_PROP);
  if (name == null)
    return CharSequence.class;

  switch (GenericData.StringType.valueOf(name)) {
  case String:
    return String.class;
  default:
    return CharSequence.class;
  }
}
{code}
The string cache here caches a single value per schema: the class selected by
the {{STRING_PROP}} property of the Schema. That is a lot of overhead for
caching a relatively simple mapping. Consider that every lookup must create a
new {{IdentitySchemaKey}} object, and this is a HOT path. It would take less
time, and add less heap pressure, to perform the simple mapping on each
invocation.
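A rough sketch of what "perform the simple mapping for each invocation" could look like, with no key object and no cache map involved. It is written so that it compiles without Avro on the classpath, so the names are assumptions: {{DirectStringClassLookup}} is a hypothetical class, its nested {{StringType}} enum is a stand-in for {{GenericData.StringType}}, and the {{stringProp}} argument stands in for the result of {{schema.getProp(GenericData.STRING_PROP)}}.

```java
// Hypothetical sketch, not the actual patch.
class DirectStringClassLookup {

    // Stand-in for GenericData.StringType so the sketch compiles without Avro.
    enum StringType { CharSequence, String, Utf8 }

    // Stand-in for findStringClass(Schema). The argument plays the role of
    // schema.getProp(GenericData.STRING_PROP). Nothing is allocated here.
    static Class<?> findStringClass(String stringProp) {
        if (stringProp == null) {
            return CharSequence.class;
        }
        // Like the original, valueOf throws IllegalArgumentException
        // for a name that is not an enum constant.
        switch (StringType.valueOf(stringProp)) {
            case String:
                return String.class;
            default:
                return CharSequence.class;
        }
    }

    public static void main(String[] args) {
        System.out.println(findStringClass(null));
        System.out.println(findStringClass("String"));
        System.out.println(findStringClass("Utf8"));
    }
}
```

With this shape, the per-call cost is the property read plus an enum lookup, instead of an allocation plus a hash map probe.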
Follow-on work: the Map in the Schema is synchronized. Maybe the map can be
made non-synchronized, or the Schema can explicitly cache this value in a
non-synchronized way, to make this one property load faster.
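One possible shape for the second idea, purely as an illustration. {{CachedStringPropSketch}} is a hypothetical stand-in for the Schema, its {{props}} map is assumed to mimic the synchronized property map, and the {{"String".equals(...)}} check is a simplification of the enum lookup (it does not reject unknown names). The resolved class is stored when the property is written, so reads never touch the map:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Avro's Schema class.
class CachedStringPropSketch {

    static final String STRING_PROP = "avro.java.string";

    // Stand-in for the schema's synchronized property map.
    private final Map<String, String> props =
            Collections.synchronizedMap(new HashMap<>());

    // Resolved string class, or null while STRING_PROP is unset.
    // Volatile: reads take no lock but still see the latest write.
    private volatile Class<?> stringClass;

    void addProp(String name, String value) {
        props.put(name, value);
        if (STRING_PROP.equals(name)) {
            // Resolve once, at write time, rather than on every read.
            stringClass = "String".equals(value) ? String.class : CharSequence.class;
        }
    }

    // Hot-path read: one volatile load, no map access, no allocation.
    Class<?> getStringClass() {
        Class<?> c = stringClass;
        return c != null ? c : CharSequence.class;
    }
}
```

Resolving at write time rather than lazily on first read avoids a window in which a reader could store a stale value after the property changes.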