[ 
https://issues.apache.org/jira/browse/NIFI-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Secules updated NIFI-10553:
--------------------------------
    Description: 
When NiFi's merge processors are configured to defragment, the user wants 
flowfiles merged in a specific way according to the `fragment.` attributes. 
However, when MergeContent is handling many unique values for 
`fragment.identifier` it opens one bin per value until it reaches the 
`MAX_BIN_COUNT` parameter configured on the processor. This parameter exists 
to limit the memory used by merging too many things at once. The user cannot 
always set it to an appropriate value for every flow, and the consequence is 
that a partially filled bin gets evicted, which can cause downstream issues 
and leave flowfiles stuck in the input connection of MergeContent.

 

Instead of this behaviour, the merge processor should penalize and requeue 
flowfiles that don't fit in any of the existing bins if we have reached the max 
number of bins already. Penalizing non-matching flowfiles will give time for 
the ones needed to complete the existing bins to arrive.
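
The proposed behaviour can be sketched roughly as follows. This is a minimal, 
standalone illustration and not NiFi code: the class `BinningSketch`, its 
`offer` and `complete` methods, and the `Outcome` enum are hypothetical names 
invented for this example, and flowfiles are stood in for by plain strings. It 
assumes that a flowfile whose `fragment.identifier` matches an open bin is 
always accepted, and that a new bin is opened only while fewer than the 
configured maximum are open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not NiFi's API): bins keyed by fragment.identifier. */
class BinningSketch {
    enum Outcome { BINNED, PENALIZED }

    private final int maxBinCount;
    private final Map<String, List<String>> bins = new LinkedHashMap<>();
    private final List<String> penalized = new ArrayList<>();

    BinningSketch(int maxBinCount) {
        this.maxBinCount = maxBinCount;
    }

    /** Adds the flowfile to its bin, or penalizes it when no bin can be opened. */
    Outcome offer(String fragmentId, String flowFile) {
        List<String> bin = bins.get(fragmentId);
        if (bin == null) {
            if (bins.size() >= maxBinCount) {
                // At the limit: leave the existing bins alone and hold this
                // flowfile back instead of evicting a partially filled bin.
                penalized.add(flowFile);
                return Outcome.PENALIZED;
            }
            bin = new ArrayList<>();
            bins.put(fragmentId, bin);
        }
        bin.add(flowFile);
        return Outcome.BINNED;
    }

    /** Removes and returns a finished bin, freeing a slot for a new identifier. */
    List<String> complete(String fragmentId) {
        return bins.remove(fragmentId);
    }

    int openBinCount() {
        return bins.size();
    }

    List<String> penalized() {
        return penalized;
    }
}
```

With a limit of two bins, a flowfile for a third identifier is held back while 
later fragments for the first two identifiers are still accepted. Once one of 
those bins is completed, the held-back flowfile can be offered again and opens 
its own bin.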

I wrote a unit test on my fork of NiFi which covers this bug: 
https://github.com/esecules/nifi/blob/2e5074eabfc0be100491fa007329ce9492382af7/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestMergeContent.java#L1091

  was:
When NiFi's merge processors are configured to defragment, the user wants 
flowfiles merged in a specific way according to the `fragment.` attributes. 
However, when MergeDocuments is handling many unique values for 
`fragment.identifier` it opens up one bin per value until it reaches the 
`MAX_BIN_COUNT` parameter configured on this processor. This parameter is there 
to limit memory used by merging too many things all at once. It is not certain 
that the user will be able to set this to an appropriate value for every flow, 
and the consequence is that evicting a partially filled bin will result in 
possible downstream issues and flowfiles stuck in the input connection of 
MergeDocuments.

 

Instead of this behaviour, the merge processor should penalize and requeue 
flowfiles that don't fit in any of the existing bins if we have reached the max 
number of bins already. Penalizing non-matching flowfiles will give time for 
the ones needed to complete the existing bins to arrive.


> MergeContent Prematurely Evicts Bins
> ------------------------------------
>
>                 Key: NIFI-10553
>                 URL: https://issues.apache.org/jira/browse/NIFI-10553
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.14.0, 1.16.3
>            Reporter: Eric Secules
>            Priority: Major


