[ 
https://issues.apache.org/jira/browse/NIFI-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582162#comment-14582162
 ] 

Michael Moser commented on NIFI-378:
------------------------------------

I think this will help explain what I saw.  Suppose MergeContent is merging 3 
files whose fragment.index attributes are:
file #1 fragment.index=1
file #2 fragment.index=3
file #3 fragment.index=2
Then it sorts the files by fragment.index and puts them in the correct order 
(file #1, then file #3, then file #2).
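
As a rough illustration of that sorting step, here is a minimal Python sketch. 
It is not the MergeContent source; the (name, attributes) tuples and the 
sort_by_fragment_index helper are hypothetical stand-ins for FlowFiles.

```python
# Hypothetical stand-ins for FlowFiles: (name, attributes) pairs.
files = [
    ("file #1", {"fragment.index": "1"}),
    ("file #2", {"fragment.index": "3"}),
    ("file #3", {"fragment.index": "2"}),
]


def sort_by_fragment_index(flow_files):
    """Return the files ordered by their numeric fragment.index attribute."""
    # Attributes are strings, so convert to int to get a numeric sort.
    return sorted(flow_files, key=lambda f: int(f[1]["fragment.index"]))


ordered = [name for name, _ in sort_by_fragment_index(files)]
print(ordered)
```

With the three files above, the order comes out as file #1, file #3, file #2.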

For my use case, I had 1 processor generating fragment.index=1 and sending to 
MergeContent.  I had another processor generating fragment.index=2 and sending 
to MergeContent.  So if MergeContent sees these files:
file #1  fragment.index=1
file #2  fragment.index=1
file #3  fragment.index=1
file #4  fragment.index=2
file #5  fragment.index=2
file #6  fragment.index=2
I expected file #1 and file #4 to be merged, then file #2 and file #5, then 
file #3 and file #6.  However, the processor actually merged file #1 and file 
#2, then file #3 and file #4, then file #5 and file #6.
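
To make the difference concrete, here is a small Python sketch of the two 
behaviors.  It is only a model of what I observed, not the processor's code: 
bin_by_arrival and bin_by_index are hypothetical helpers, and I am assuming 
every file shares one fragment.identifier with fragment.count=2.

```python
from collections import defaultdict, deque

# (name, fragment.index) in the order MergeContent sees the files.
arrivals = [
    ("file #1", 1), ("file #2", 1), ("file #3", 1),
    ("file #4", 2), ("file #5", 2), ("file #6", 2),
]


def bin_by_arrival(files, count):
    """Observed: fill each bin with the next `count` files, ignoring index."""
    return [
        [name for name, _ in files[i:i + count]]
        for i in range(0, len(files), count)
    ]


def bin_by_index(files, count):
    """Expected: each bin takes one file for every index 1..count."""
    queues = defaultdict(deque)
    for name, index in files:
        queues[index].append(name)
    bins = []
    # Stop as soon as any index has no file left to contribute.
    while all(queues[i] for i in range(1, count + 1)):
        bins.append([queues[i].popleft() for i in range(1, count + 1)])
    return bins


print(bin_by_arrival(arrivals, 2))
print(bin_by_index(arrivals, 2))
```

The first call pairs #1/#2, #3/#4, #5/#6; the second pairs #1/#4, #2/#5, 
#3/#6, which is what I had expected.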

Your understanding of our use case is correct.  Maybe this is a violation of 
the contract of this processor.  I just didn't understand the contract and my 
expectations were incorrect.  But when MergeContent does a merge, it does not 
actually check that all fragment.index attributes are unique; it just sorts 
them.
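
For what it's worth, the kind of uniqueness check I had in mind could look 
something like this Python sketch.  The duplicate_indices helper is 
hypothetical and only illustrates the idea; it is not a proposed patch.

```python
def duplicate_indices(indices):
    """Return the fragment.index values that occur more than once in a bin."""
    seen = set()
    dupes = set()
    for index in indices:
        if index in seen:
            dupes.add(index)
        seen.add(index)
    return sorted(dupes)


# A bin holding two files that both carry fragment.index=1 would be flagged.
print(duplicate_indices([1, 1]))
# A bin holding fragment.index=1 and fragment.index=2 would pass.
print(duplicate_indices([1, 2]))
```

A bin with a non-empty result could be held back instead of merged.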

> MergeContent in Defragment mode will merge fragments without checking index
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-378
>                 URL: https://issues.apache.org/jira/browse/NIFI-378
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 0.0.1
>            Reporter: Michael Moser
>            Priority: Minor
>
> When in Defragment mode, the MergeContent processor looks for 
> fragment.identifier and fragment.count attributes in order to place FlowFiles 
> in the correct bin.  The fragment.index attribute is ignored.
> If you happen to have many FlowFiles in the queue to MergeContent, and they 
> all have fragment.identifier=foo and fragment.count=2, then it will merge two 
> FlowFiles that have fragment.index=1 or it will merge two FlowFiles that have 
> fragment.index=2.
> Granted this may seem odd.  The use case is to give the MergeContent 
> processor two input queues.  We configure one queue to contain files with 
> fragment.index=1 and the other queue to contain files with fragment.index=2.  
> We want one file from each queue to be merged.  Instead it will merge two 
> files from the same queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
