[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379266#comment-16379266
 ] 

ASF GitHub Bot commented on NIFI-4872:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2475#discussion_r171065265
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitXml.java ---
    @@ -82,6 +84,7 @@
                     description = "The number of split FlowFiles generated from the parent FlowFile"),
             @WritesAttribute(attribute = "segment.original.filename ", description = "The filename of the parent FlowFile")
     })
    +@SystemResourceConsideration(resource = SystemResource.MEMORY)
    --- End diff --
    
    In this particular context, we are buffering the entirety of the FlowFile's 
content as a Document object, in addition to all of the generated FlowFile 
objects. A Document can take approximately 10 times as much heap as the size 
of the XML (e.g., a 1 MB XML document may take 10 MB of heap). A two-stage 
approach may well be necessary for lots of splits, but even then, if the XML 
is large, you could still run out of heap space.
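
To make the arithmetic above concrete, here is a minimal sketch that applies the "roughly 10 times" rule of thumb to an input size and compares the result with an available heap budget. The class and method names are hypothetical, the factor of 10 is only the approximation quoted in the comment, and the estimate ignores the heap used by the generated FlowFile objects.

```java
// Hypothetical helper, not part of NiFi: applies the ~10x rule of thumb
// from the comment above to estimate DOM heap usage.
class DomHeapEstimate {
    // Assumption: a parsed Document takes about 10x the raw XML size.
    static final long DOM_EXPANSION_FACTOR = 10;

    // Estimated heap, in bytes, needed to hold the parsed Document.
    static long estimateDomHeapBytes(long xmlBytes) {
        return Math.multiplyExact(xmlBytes, DOM_EXPANSION_FACTOR);
    }

    // True if the estimated Document size fits in the given heap budget.
    static boolean fitsInHeap(long xmlBytes, long availableHeapBytes) {
        return estimateDomHeapBytes(xmlBytes) <= availableHeapBytes;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("1 MB XML needs about "
                + estimateDomHeapBytes(oneMb) / oneMb + " MB of heap");
        System.out.println("100 MB XML fits in this JVM's max heap: "
                + fitsInHeap(100 * oneMb, maxHeap));
    }
}
```

Under this estimate, a 100 MB XML file would need on the order of 1000 MB for the Document alone, so it would not fit in a 512 MB heap no matter how the splits are staged.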


> NIFI component high resource usage annotation
> ---------------------------------------------
>
>                 Key: NIFI-4872
>                 URL: https://issues.apache.org/jira/browse/NIFI-4872
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework, Core UI
>    Affects Versions: 1.5.0
>            Reporter: Jeff Storck
>            Assignee: Jeff Storck
>            Priority: Critical
>
> NiFi Processors currently have no means to indicate that they may be 
> resource intensive. The idea here would be to introduce an Annotation that 
> can be added to Processors to indicate that they may cause high memory, 
> disk, CPU, or network usage. For instance, any Processor that reads the 
> FlowFile contents into memory (like many XML Processors) may cause high 
> memory usage. Whether there is actually high memory/disk/CPU/network usage 
> will ultimately depend on the FlowFiles being processed. Having many of 
> these components in a dataflow increases the risk of OutOfMemoryErrors and 
> performance degradation.
> The annotation should support one value from a fixed list of: CPU, Disk, 
> Memory, Network.  It should also allow the developer to provide a custom 
> description of the scenario in which the component would fall into the 
> high-usage category.  The annotation should be able to be specified multiple 
> times, once for each resource the component may use heavily.
> By marking components with this new Annotation, we can update the generated 
> Processor documentation to include this fact.
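
The requirements in the description above (one value from a fixed list, an optional custom description, and the ability to repeat the annotation) map naturally onto a Java repeatable annotation. The following is a minimal sketch under that assumption. All type names (`Resource`, `ResourceConsideration`, `ResourceConsiderations`, `ExampleSplitter`, `ConsiderationDocs`) are hypothetical and chosen for illustration; this is not the code from the pull request.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not NiFi source.
// The fixed list of resources named in the issue description.
enum Resource { CPU, DISK, MEMORY, NETWORK }

// One consideration per resource; @Repeatable allows several per class.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Repeatable(ResourceConsiderations.class)
@interface ResourceConsideration {
    Resource resource();
    // Optional custom description of the high-usage scenario.
    String description() default "";
}

// Container annotation required by @Repeatable.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ResourceConsiderations {
    ResourceConsideration[] value();
}

// A stand-in component marked for two resources.
@ResourceConsideration(resource = Resource.MEMORY,
        description = "Buffers the whole XML document on the heap")
@ResourceConsideration(resource = Resource.CPU)
class ExampleSplitter { }

// Reads the annotations reflectively, as a documentation generator might.
class ConsiderationDocs {
    static List<String> describe(Class<?> component) {
        List<String> lines = new ArrayList<>();
        for (ResourceConsideration c
                : component.getAnnotationsByType(ResourceConsideration.class)) {
            lines.add(c.description().isEmpty()
                    ? c.resource().name()
                    : c.resource().name() + ": " + c.description());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(ExampleSplitter.class).forEach(System.out::println);
    }
}
```

Runtime retention is what lets a documentation generator discover the markers by reflection. `getAnnotationsByType` looks through the container annotation, so the same loop works whether a class carries one consideration or several.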



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)