[ https://issues.apache.org/jira/browse/NIFI-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422674#comment-17422674 ]
Matthieu RÉ commented on NIFI-8760:
-----------------------------------

Today I have two simple fixes that are equivalent in terms of performance (tested on GenerateFlowFile, MergeRecord, SplitJson and QueryRecord):

* The first follows [the idea of the first implementation|https://github.com/apache/nifi/blob/528fce2407d092d4ced1a58fcc14d0bc6e660b89/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/VolatileContentRepository.java#L473], in which a read on a ResourceClaim is delegated to the corresponding ContentClaim at offset 0. This does not work when the stored ContentClaim has a length, because ContentClaim implements an "equalsTo" that takes the length into account, and the constructor called by read(ResourceClaim) initializes the length to -1. A fix could therefore be to search the map for the ContentClaim that matches the ResourceClaim at offset 0. This implementation seems poor, since it does not benefit from the structure of the Map of Comparable keys when searching for a ContentClaim, yet its performance seems equivalent to that of the second fix.
* The second is to simply treat the VolatileContentRepository as incompatible with read(ResourceClaim) and to allow only read(ContentClaim), as is the case for the EncryptedFileSystemRepository.

Since the data in this implementation is stored as Map<ContentClaim, ContentBlock>, I lack the experience to answer these questions:

* Does it make sense to use the ResourceClaim to retrieve ContentBlock(s) in the case of a VolatileContentRepository?
* If yes, could there be a benefit in retrieving the ContentBlocks at every offset matching the ResourceClaim, instead of only offset 0 as was originally intended?
* Otherwise, the second fix is probably the right one.

Please don't hesitate to correct me if I'm wrong or have misunderstood something.
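To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the NiFi classes: {{Claim}}, {{lookupExact}} and {{lookupByResourceAtOffsetZero}} are hypothetical stand-ins, and the only assumption carried over from the description above is that key equality includes the length while the key built for a resource-level read carries length -1. The linear scan is only an illustration of the idea behind the first fix, not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

final class ClaimLookupSketch {

    // Hypothetical, simplified stand-in for a content claim (NOT the NiFi class).
    // Equality deliberately includes the length, as described in the comment.
    static final class Claim {
        final long resourceId;
        final long offset;
        final long length;

        Claim(long resourceId, long offset, long length) {
            this.resourceId = resourceId;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Claim)) {
                return false;
            }
            Claim that = (Claim) other;
            return resourceId == that.resourceId
                    && offset == that.offset
                    && length == that.length;
        }

        @Override
        public int hashCode() {
            return Objects.hash(resourceId, offset, length);
        }
    }

    // Mimics the failing path: the key is built with offset 0 and length -1,
    // so it never equals a stored key whose length has been set.
    static Optional<String> lookupExact(Map<Claim, String> store, long resourceId) {
        return Optional.ofNullable(store.get(new Claim(resourceId, 0L, -1L)));
    }

    // Sketch of the first fix: scan for the entry that matches the resource
    // at offset 0 and ignore the length. This is O(n) in the number of entries.
    static Optional<String> lookupByResourceAtOffsetZero(Map<Claim, String> store, long resourceId) {
        for (Map.Entry<Claim, String> entry : store.entrySet()) {
            Claim key = entry.getKey();
            if (key.resourceId == resourceId && key.offset == 0L) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<Claim, String> store = new HashMap<>();
        // Stored with a real length, like length=5655 in the logs of this issue.
        store.put(new Claim(6L, 0L, 5655L), "content of resource 6");

        System.out.println(lookupExact(store, 6L).isPresent());                  // false
        System.out.println(lookupByResourceAtOffsetZero(store, 6L).isPresent()); // true
    }
}
```

The exact lookup misses because the two keys differ only in length, which matches the "length=5655" versus "length=-1" pair visible in the stack traces of this issue.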
For now, I will attach the second fix as a Git patch here: [^0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch], to help anyone in need of a fix.

> VolatileContentRepository fails to retrieve content from claims with several processors
> ---------------------------------------------------------------------------------------
>
>                 Key: NIFI-8760
>                 URL: https://issues.apache.org/jira/browse/NIFI-8760
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.13.1, 1.13.2
>            Reporter: Matthieu RÉ
>            Priority: Major
>              Labels: content-repository, volatile
>         Attachments: 0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch, flow.xml.gz, nifi.properties
>
>
> For several processors such as MergeRecord, QueryRecord and SplitJson, the use of the VolatileContentRepository implementation causes errors while retrieving FlowFiles from claims. The following logs were generated using NiFi 1.13.1 from Docker with the attached flow.xml.gz and nifi.properties files.
> MergeRecord (with JsonTreeReader, JsonRecordSetWriter with default configuration):
> {{2021-07-06 10:15:09,170 ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.MergeRecord MergeRecord[id=7b425cff-017a-1000-6a20-58c4e064df3d] Failed to bin StandardFlowFileRecord[uuid=3e894a96-883a-4ac2-8121-b8200964cf20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=6, container=in-memory, section=section], offset=0, length=5655],offset=0,name=b2c7cf61-b421-477d-902e-daeb2ed58f0d,size=5655] due to org.apache.nifi.controller.repository.ContentNotFoundException: Could not find content for StandardContentClaim [resourceClaim=StandardResourceClaim[id=6, container=in-memory, section=section], offset=0, length=-1]: org.apache.nifi.controller.repository.ContentNotFoundException: Could not find content for StandardContentClaim [resourceClaim=StandardResourceClaim[id=6, container=in-memory, section=section], offset=0, length=-1]}}
> {{org.apache.nifi.controller.repository.ContentNotFoundException: Could not find content for StandardContentClaim [resourceClaim=StandardResourceClaim[id=6, container=in-memory, section=section], offset=0, length=-1]}}
> {{at org.apache.nifi.controller.repository.VolatileContentRepository.getContent(VolatileContentRepository.java:445)}}
> {{at org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:468)}}
> {{at org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:473)}}
> {{at org.apache.nifi.controller.repository.StandardProcessSession.getInputStream(StandardProcessSession.java:2302)}}
> {{at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2409)}}
> {{at org.apache.nifi.processors.standard.MergeRecord.binFlowFile(MergeRecord.java:383)}}
> {{at org.apache.nifi.processors.standard.MergeRecord.onTrigger(MergeRecord.java:346)}}
> {{at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)}}
> {{at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)}}
> {{at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)}}
> {{at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)}}
> {{at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
> {{at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
> {{at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
> {{at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{at java.lang.Thread.run(Thread.java:748)}}
>
> QueryRecord:
> {{2021-07-06 10:15:09,174 ERROR [Timer-Driven Process Thread-4] o.a.nifi.processors.standard.QueryRecord QueryRecord[id=673fe9f6-017a-1000-8041-dfde9d02d976] Failed to determine Record Schema from StandardFlowFileRecord[uuid=090e3058-67e6-4436-bea9-d511132848e3,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=2, container=in-memory, section=section], offset=0, length=5655],offset=0,name=090e3058-67e6-4436-bea9-d511132848e3,size=5655]; routing to failure: org.apache.nifi.controller.repository.ContentNotFoundException: Could not find content for StandardContentClaim [resourceClaim=StandardResourceClaim[id=2, container=in-memory, section=section], offset=0, length=-1]}}
> {{org.apache.nifi.controller.repository.ContentNotFoundException: Could not find content for StandardContentClaim [resourceClaim=StandardResourceClaim[id=2, container=in-memory, section=section], offset=0, length=-1]}}
> {{at org.apache.nifi.controller.repository.VolatileContentRepository.getContent(VolatileContentRepository.java:445)}}
> {{at org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:468)}}
> {{at org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:473)}}
> {{at org.apache.nifi.controller.repository.StandardProcessSession.getInputStream(StandardProcessSession.java:2302)}}
> {{at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2409)}}
> {{at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:294)}}
> {{at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)}}
> {{at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)}}
> {{at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)}}
> {{at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)}}
> {{at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)}}
> {{at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
> {{at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
> {{at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
> {{at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{at java.lang.Thread.run(Thread.java:748)}}
>
> SplitJson:
> {{2021-07-06 10:15:10,178 ERROR [Timer-Driven Process Thread-5] o.a.nifi.processors.standard.SplitJson SplitJson[id=7b411bdc-017a-1000-0f48-53d6a2ad5ee9] SplitJson[id=7b411bdc-017a-1000-0f48-53d6a2ad5ee9] failed to process session due to java.lang.NullPointerException; Processor Administratively Yielded for 1 sec: java.lang.NullPointerException}}
> {{java.lang.NullPointerException: null}}
> {{at org.apache.nifi.processors.standard.SplitJson.onTrigger(SplitJson.java:199)}}
> {{at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)}}
> {{at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)}}
> {{at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)}}
> {{at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)}}
> {{at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)}}
> {{at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
> {{at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
> {{at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
> {{at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{at java.lang.Thread.run(Thread.java:748)}}
>
> This issue is not reproducible at 1.13.0 by my side, so it could correlate with the commit [528fce2407d092d4ced1a58fcc14d0bc6e660b89|https://github.com/apache/nifi/commit/528fce2407d092d4ced1a58fcc14d0bc6e660b89].
> With some support I would be glad to help investigate and solve the issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)