[ https://issues.apache.org/jira/browse/NIFI-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630957#comment-16630957 ]

ASF subversion and git services commented on NIFI-5640:
-------------------------------------------------------

Commit 2e1005e884cef70ea9c2eb1152d70e546ad2b5c3 in nifi's branch refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=2e1005e ]

NIFI-5640: Improved efficiency of Avro Reader and some methods of AvroTypeUtil. Also switched ServiceStateTransition to using read/write locks instead of synchronized blocks because profiling showed that significant time was spent in determining state of a Controller Service when attempting to use it. Switching to a ReadLock should provide better performance there.

Signed-off-by: Matthew Burgess <mattyb...@apache.org>

This closes #3036

> Improve efficiency of Avro Record Reader
> ----------------------------------------
>
>                 Key: NIFI-5640
>                 URL: https://issues.apache.org/jira/browse/NIFI-5640
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.8.0
>
>
> There are a few things that we are doing in the Avro Reader that cause subpar
> performance. Firstly, in the AvroTypeUtil, when converting an Avro
> GenericRecord to our Record, the building of the RecordSchema is slow because
> we call toString() (which is quite expensive) on the Avro schema in order to
> provide a textual version to RecordSchema. However, the text is typically not
> used and it is optional to provide the schema text, so we should avoid
> calling Schema#toString() whenever possible.
>
> The AvroTypeUtil class also calls #getNonNullSubSchemas() a lot. In some
> cases we don't really need to do this and can avoid creating the sublist. In
> other cases, we do need to call it. However, the method uses the stream()
> method on an existing List just to filter out 0 or 1 elements. While use of
> the stream() method makes the code very readable, it is quite a bit more
> expensive than just iterating over the existing list and adding to an
> ArrayList. We should avoid use of the {{stream()}} method for trivial pieces
> of code in time-critical parts of the codebase.
>
> Additionally, I've found that Avro's GenericDatumReader is extremely
> inefficient, at least in some cases, when reading Strings because it uses an
> IdentityHashMap to cache details about the schema. But IdentityHashMap is far
> slower than if it were to just use HashMap so we could subclass the reader in
> order to avoid the slow caching.
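
The stream()-versus-loop point in the description can be illustrated with a minimal sketch. This is not the code from the commit: `NonNullFilterSketch` and its `Kind` enum are hypothetical stand-ins for a union schema's member types, so that the example runs without Avro on the classpath. Both methods return the same list; the second just iterates over the existing list and adds to an ArrayList, as the description suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a stand-in for filtering the null member out of a union.
// Kind is a hypothetical enum, not Avro's Schema.Type or any NiFi class.
final class NonNullFilterSketch {

    enum Kind { NULL, STRING, INT, LONG }

    private NonNullFilterSketch() {
    }

    // Readable, but builds a stream pipeline and a collector on every call.
    static List<Kind> nonNullWithStream(final List<Kind> unionMembers) {
        return unionMembers.stream()
                .filter(kind -> kind != Kind.NULL)
                .collect(Collectors.toList());
    }

    // Same result using a plain loop over the existing list.
    static List<Kind> nonNullWithLoop(final List<Kind> unionMembers) {
        final List<Kind> nonNull = new ArrayList<>(unionMembers.size());
        for (final Kind kind : unionMembers) {
            if (kind != Kind.NULL) {
                nonNull.add(kind);
            }
        }
        return nonNull;
    }
}
```

Whether the loop is measurably faster depends on the workload and JVM, so a change like this is best confirmed by profiling, as was done for this issue.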