[ https://issues.apache.org/jira/browse/NIFI-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675407#comment-16675407 ]
ASF GitHub Bot commented on NIFI-5757:
--------------------------------------

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/3111

    @arkadius thanks for compiling that list. Sorry it took so long to reply! Looking through the list, I do think you're right - these all appear to be the same pattern. I certainly didn't realize that we were making such prolific use of this pattern. Reading through the Caffeine docs, it probably does make sense to update these as well.


> AvroRecordSetWriter synchronize every access to compiledAvroSchemaCache
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5757
>                 URL: https://issues.apache.org/jira/browse/NIFI-5757
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.7.1
>            Reporter: Arek Burdach
>            Priority: Major
>
> Avro record serialization is a quite expensive operation.
> This stack trace I very often see in thread dumps:
> {noformat}
> Thread 48583: (state = BLOCKED)
>  - org.apache.nifi.avro.AvroRecordSetWriter.compileAvroSchema(java.lang.String) @bci=9, line=124 (Compiled frame)
>  - org.apache.nifi.avro.AvroRecordSetWriter.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=96, line=92 (Compiled frame)
>  - sun.reflect.GeneratedMethodAccessor183.invoke(java.lang.Object, java.lang.Object[]) @bci=56 (Compiled frame)
>  - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
>  - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
>  - org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=309, line=89 (Compiled frame)
>  - com.sun.proxy.$Proxy100.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=24 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublisherLease.publish(org.apache.nifi.flowfile.FlowFile, org.apache.nifi.serialization.record.RecordSet, org.apache.nifi.serialization.RecordSetWriterFactory, org.apache.nifi.serialization.record.RecordSchema, java.lang.String, java.lang.String) @bci=71, line=169 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_1_0$1.process(java.io.InputStream) @bci=94, line=412 (Compiled frame)
> {noformat}
> The reason this happens is that {{AvroRecordSetWriter}} synchronizes every access to its cache of compiled schemas.
> I've prepared a PR that fixes this issue by using {{ConcurrentHashMap}} instead: https://github.com/apache/nifi/pull/3111
> It is not a perfect fix because it removes the cache size limitation, which, by the way, was hardcoded to {{20}}. Services can be reused by many flows, so such a hard limit is not a good choice.
> What do you think about such an improvement?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
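For readers following the discussion, here is a minimal, hypothetical sketch of the trade-off at stake: replacing a fully synchronized cache with {{ConcurrentHashMap.computeIfAbsent}} so that cache hits no longer block, while keeping a best-effort size cap in the spirit of the hardcoded limit of 20 mentioned above (when the cap is reached, the value is computed but not cached). Class and method names are illustrative assumptions, not NiFi's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical stand-in for the synchronized schema cache discussed in the
// ticket. This is NOT NiFi's implementation, just the pattern.
public class BoundedConcurrentCache<K, V> {

    private final int maxSize;
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

    public BoundedConcurrentCache(int maxSize) {
        this.maxSize = maxSize;
    }

    // Lock-free for cache hits; the mapping function runs at most once per
    // key. When the cap is reached, compute the value without caching it,
    // mirroring the hardcoded-limit behavior described in the issue.
    public V get(K key, Function<K, V> compute) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        if (cache.size() >= maxSize) {
            return compute.apply(key);   // over the cap: compute, don't cache
        }
        return cache.computeIfAbsent(key, compute);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedConcurrentCache<String, String> schemas = new BoundedConcurrentCache<>(2);
        String a = schemas.get("s1", k -> "compiled:" + k);
        String b = schemas.get("s1", k -> "compiled:" + k);   // cache hit, no lock
        System.out.println(a.equals(b) && schemas.size() == 1);  // true
        schemas.get("s2", k -> "compiled:" + k);
        schemas.get("s3", k -> "compiled:" + k);   // cap reached: not cached
        System.out.println(schemas.size());        // 2
    }
}
```

Note the size check is only best-effort: concurrent inserts can briefly overshoot the cap, and there is no eviction, so repeatedly-used keys that arrive after the cap is hit are recompiled every time. A library with proper size-based eviction, such as the Caffeine cache markap14 refers to, avoids both problems.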