[ https://issues.apache.org/jira/browse/NIFI-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582162#comment-14582162 ]
Michael Moser commented on NIFI-378:
------------------------------------

I think this will help explain what I saw. Suppose MergeContent is merging 3 files whose fragment.index attributes are:

file #1 fragment.index=1
file #2 fragment.index=3
file #3 fragment.index=2

It will sort the files by fragment.index and put them in the correct order (file #1, then file #3, then file #2).

For my use case, I had one processor generating fragment.index=1 and sending to MergeContent, and another processor generating fragment.index=2 and sending to MergeContent. So if MergeContent sees these files:

file #1 fragment.index=1
file #2 fragment.index=1
file #3 fragment.index=1
file #4 fragment.index=2
file #5 fragment.index=2
file #6 fragment.index=2

I expected file #1 and file #4 to be merged, then file #2 and file #5, then file #3 and file #6. However, the processor actually merged file #1 and file #2, then file #3 and file #4, then file #5 and file #6.

Your understanding of our use case is correct. Maybe this is a violation of the contract of this processor. I just didn't understand the contract and my expectations were incorrect. But when MergeContent does a merge, it does not actually check that all fragment.index attributes are unique; it just sorts them.

> MergeContent in Defragment mode will merge fragments without checking index
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-378
>                 URL: https://issues.apache.org/jira/browse/NIFI-378
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 0.0.1
>            Reporter: Michael Moser
>            Priority: Minor
>
> When in Defragment mode, the MergeContent processor looks for
> fragment.identifier and fragment.count attributes in order to place FlowFiles
> in the correct bin. The fragment.index attribute is ignored.
> If you happen to have many FlowFiles in the queue to MergeContent, and they
> all have fragment.identifier=foo and fragment.count=2, then it will merge two
> FlowFiles that have fragment.index=1 or it will merge two FlowFiles that have
> fragment.index=2.
> Granted this may seem odd. The use case is to give the MergeContent
> processor two input queues. We configure one queue to contain files with
> fragment.index=1 and the other queue to contain files with fragment.index=2.
> We want one file from each queue to be merged. Instead it will merge two
> files from the same queue.
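
To make the difference between the observed and the expected behaviour concrete, here is a minimal Python sketch. It is a toy model written for this thread, not the actual MergeContent code: the function names, the dict layout for a FlowFile, and the "first open bin without this index" rule in the second function are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_ignoring_index(flowfiles):
    """Toy model of the reported behaviour (not NiFi code).

    Files are binned by (fragment.identifier, fragment.count) in arrival
    order. A bin is treated as complete once it holds fragment.count files;
    fragment.index is only used to sort the files inside the bin.
    """
    bins = defaultdict(list)
    merged = []
    for ff in flowfiles:
        key = (ff["fragment.identifier"], ff["fragment.count"])
        bins[key].append(ff)
        if len(bins[key]) == ff["fragment.count"]:
            bundle = sorted(bins.pop(key), key=lambda f: f["fragment.index"])
            merged.append([f["name"] for f in bundle])
    return merged


def merge_checking_index(flowfiles):
    """Toy model of the expected behaviour (hypothetical, not NiFi code).

    A bin accepts at most one file per fragment.index. A file whose index
    is already present in every open bin starts a new bin instead.
    """
    open_bins = defaultdict(list)  # key -> list of {index: flowfile} dicts
    merged = []
    for ff in flowfiles:
        key = (ff["fragment.identifier"], ff["fragment.count"])
        index = ff["fragment.index"]
        for candidate in open_bins[key]:
            if index not in candidate:
                candidate[index] = ff
                target = candidate
                break
        else:
            target = {index: ff}
            open_bins[key].append(target)
        if len(target) == ff["fragment.count"]:
            # Drop the completed bin by identity, then emit it in index order.
            open_bins[key] = [b for b in open_bins[key] if b is not target]
            merged.append([target[i]["name"] for i in sorted(target)])
    return merged


def make_file(number, index, count=2, identifier="foo"):
    """Build a FlowFile-like dict; the layout is an assumption of this sketch."""
    return {
        "name": f"file #{number}",
        "fragment.identifier": identifier,
        "fragment.count": count,
        "fragment.index": index,
    }


# The six files from the comment, in the order MergeContent saw them.
six_files = [make_file(n, 1) for n in (1, 2, 3)] + [make_file(n, 2) for n in (4, 5, 6)]

print(merge_ignoring_index(six_files))
print(merge_checking_index(six_files))
```

With the six files above, the first function pairs files as reported in the comment (two files from the same queue end up together), while the second pairs one file from each queue. Whether MergeContent should adopt something like the second rule is the open question in this issue; the sketch only shows what such a uniqueness check would change.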