[ https://issues.apache.org/jira/browse/NIFI-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630957#comment-16630957 ]

ASF subversion and git services commented on NIFI-5640:
-------------------------------------------------------

Commit 2e1005e884cef70ea9c2eb1152d70e546ad2b5c3 in nifi's branch 
refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=2e1005e ]

NIFI-5640: Improved efficiency of Avro Reader and some methods of AvroTypeUtil. 
Also switched ServiceStateTransition to using read/write locks instead of 
synchronized blocks because profiling showed that significant time was spent in 
determining state of a Controller Service when attempting to use it. Switching 
to a ReadLock should provide better performance there.

Signed-off-by: Matthew Burgess <mattyb...@apache.org>

This closes #3036
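
As a rough illustration of the locking change described in the commit message, the sketch below guards a state field with a read/write lock so that concurrent readers do not block one another. The class name ServiceStateHolder, its State enum, and its methods are hypothetical stand-ins, not NiFi's actual ServiceStateTransition code.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a controller service state holder.
// This is NOT NiFi's ServiceStateTransition; names and states are assumptions.
class ServiceStateHolder {
    enum State { DISABLED, ENABLING, ENABLED, DISABLING }

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock readLock = rwLock.readLock();
    private final Lock writeLock = rwLock.writeLock();
    private State state = State.DISABLED;

    // Many threads may hold the read lock at once, so frequent state checks
    // do not serialize behind each other as they would with synchronized.
    State getState() {
        readLock.lock();
        try {
            return state;
        } finally {
            readLock.unlock();
        }
    }

    // State changes are rare and take the exclusive write lock.
    // Returns true only if the current state matched the expected one.
    boolean transition(State expected, State next) {
        writeLock.lock();
        try {
            if (state != expected) {
                return false;
            }
            state = next;
            return true;
        } finally {
            writeLock.unlock();
        }
    }
}
```

Whether this beats synchronized in practice depends on how read-heavy the workload is; the commit message reports that profiling pointed at state checks, which is the case a read lock is suited to.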


> Improve efficiency of Avro Record Reader
> ----------------------------------------
>
>                 Key: NIFI-5640
>                 URL: https://issues.apache.org/jira/browse/NIFI-5640
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.8.0
>
>
> The Avro Reader does a few things that cause subpar performance. First, when
> AvroTypeUtil converts an Avro GenericRecord to our Record, building the
> RecordSchema is slow because we call toString() (which is quite expensive) on
> the Avro schema in order to provide a textual version to RecordSchema.
> However, the text is typically not used, and providing the schema text is
> optional, so we should avoid calling Schema#toString() whenever possible.
>
> The AvroTypeUtil class also calls #getNonNullSubSchemas() a lot. In some
> cases we don't really need to do this and can avoid creating the sublist. In
> other cases we do need to call it, but the method uses stream() on an
> existing List just to filter out 0 or 1 elements. While stream() makes the
> code very readable, it is quite a bit more expensive than iterating over the
> existing list and adding to an ArrayList. We should avoid stream() for
> trivial pieces of code in time-critical parts of the codebase.
>
> Additionally, I've found that Avro's GenericDatumReader is extremely
> inefficient, at least in some cases, when reading Strings because it uses an
> IdentityHashMap to cache details about the schema. This is far slower than
> using a HashMap would be, so we could subclass the reader in order to avoid
> the slow caching.
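
To make the stream() versus loop point in the description concrete, here is a minimal sketch of the two ways to drop the null branch from a union's member list. It deliberately uses a hypothetical Type enum rather than Avro's Schema class so that it stays self-contained; the class and method names are assumptions and do not mirror AvroTypeUtil.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-in for filtering the null type out of a union.
// Type is a hypothetical enum, not org.apache.avro.Schema.Type.
class UnionFilter {
    enum Type { NULL, STRING, INT, LONG }

    // Readable, but sets up a stream pipeline for what is usually a
    // two-element list.
    static List<Type> nonNullWithStream(List<Type> unionTypes) {
        return unionTypes.stream()
                .filter(t -> t != Type.NULL)
                .collect(Collectors.toList());
    }

    // Plain iteration into a pre-sized ArrayList: same result, less overhead
    // per call, which matters when this runs once per field per record.
    static List<Type> nonNullWithLoop(List<Type> unionTypes) {
        List<Type> result = new ArrayList<>(unionTypes.size());
        for (Type t : unionTypes) {
            if (t != Type.NULL) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Both methods return equal lists for the same input; the difference is only in per-call cost, which would need to be measured in the real code path rather than assumed from this sketch.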



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)