[ https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379266#comment-16379266 ]
ASF GitHub Bot commented on NIFI-4872:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2475#discussion_r171065265

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitXml.java ---
    @@ -82,6 +84,7 @@
                 description = "The number of split FlowFiles generated from the parent FlowFile"),
         @WritesAttribute(attribute = "segment.original.filename ", description = "The filename of the parent FlowFile") })
    +@SystemResourceConsideration(resource = SystemResource.MEMORY)
    --- End diff --

    In this particular context, we are buffering the entirety of the FlowFile's content (as a Document object, which can take approximately 10 times as much heap as the size of the XML, i.e., a 1 MB XML document may take 10 MB of heap), in addition to all of the generated FlowFile objects. A two-stage approach may well be necessary when there are many splits, but even then, if the XML is large, you could potentially run out of heap space.


> NIFI component high resource usage annotation
> ---------------------------------------------
>
>                 Key: NIFI-4872
>                 URL: https://issues.apache.org/jira/browse/NIFI-4872
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework, Core UI
>    Affects Versions: 1.5.0
>            Reporter: Jeff Storck
>            Assignee: Jeff Storck
>            Priority: Critical
>
> NiFi Processors currently have no means to relay whether or not they may
> be resource intensive. The idea here would be to introduce an
> Annotation that can be added to Processors to indicate that they may cause high
> memory, disk, CPU, or network usage. For instance, any Processor that reads
> the FlowFile contents into memory (like many XML Processors) may
> cause high memory usage. What ultimately determines whether there is high
> memory/disk/cpu/network usage will depend on the FlowFiles being processed.
> Having many of these components in the dataflow increases the risk of
> OutOfMemoryErrors and performance degradation.
> The annotation should support one value from a fixed list of: CPU, Disk,
> Memory, Network. It should also allow the developer to provide a custom
> description of the scenario in which the component would fall under the high
> usage category. The annotation should be able to be specified multiple
> times, once for each resource for which the component has the potential
> for high usage.
> By marking components with this new Annotation, we can update the generated
> Processor documentation to include this fact.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
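
The requirements in the issue description (one resource from a fixed list, an optional custom description, and the ability to repeat the annotation once per resource) can be sketched with a repeatable Java annotation. This is only an illustration under stated assumptions: the names ResourceConsideration, ResourceConsiderations, Resource, ExampleXmlSplitter and describe are hypothetical and are not the API from the pull request, and the fallback description text is made up for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class ResourceAnnotationSketch {

    // Fixed list of resources named in the issue description.
    enum Resource { CPU, DISK, MEMORY, NETWORK }

    // Hypothetical annotation (not NiFi's actual API): one resource plus an
    // optional free-text description of when usage may be high.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @Repeatable(ResourceConsiderations.class)
    @interface ResourceConsideration {
        Resource resource();
        String description() default "";
    }

    // Container annotation that @Repeatable requires.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ResourceConsiderations {
        ResourceConsideration[] value();
    }

    // Hypothetical component that declares two resource considerations.
    @ResourceConsideration(resource = Resource.MEMORY,
            description = "Parses the whole document into memory before splitting.")
    @ResourceConsideration(resource = Resource.CPU)
    static class ExampleXmlSplitter { }

    // Collects one documentation line per declared consideration, which is
    // roughly what a documentation generator would need to do.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (ResourceConsideration c : type.getAnnotationsByType(ResourceConsideration.class)) {
            String detail = c.description().isEmpty()
                    ? "may be high under some workloads" // assumed default wording
                    : c.description();
            lines.add(c.resource() + ": " + detail);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(ExampleXmlSplitter.class).forEach(System.out::println);
    }
}
```

Runtime retention is chosen here so that a documentation generator could read the annotations reflectively. Whether the real implementation does it that way is not stated in this thread.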