[jira] [Commented] (NIFI-5918) MergeRecord works wrong with Defragment strategy

Andres Garagiola (JIRA) Mon, 25 Feb 2019 11:59:00 -0800


    [ 
https://issues.apache.org/jira/browse/NIFI-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777249#comment-16777249
 ]


Andres Garagiola commented on NIFI-5918:
----------------------------------------

Hello all,

I just created a [PR|https://github.com/apache/nifi/pull/3334] regarding this 
issue. 

Let me know your comments.

Regards

> MergeRecord works wrong with Defragment strategy
> ------------------------------------------------
>
>                 Key: NIFI-5918
>                 URL: https://issues.apache.org/jira/browse/NIFI-5918
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.8.0
>            Reporter: Alexander Bukarev
>            Assignee: Alexander Bukarev
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Steps*
> # Create the simple flow: 
> #* {{GenerateFlowFile}} (with constant payload "txt1,txt2" and 10-second 
> scheduling) 
> #* -> {{SplitContent}} (with comma as a separator)
> #* -> some chain of processors which take "txt1" and "txt2" as inbound 
> parameters and produce flow files with more than 1 record ((!) that's important). 
> For example, I use {{ExtractText}} (to get "txt1" and "txt2" as 
> attributes), then {{ExecuteSQLRecord}} (to execute SQL using "txt1" and "txt2" 
> as parameters)
> #* -> {{MergeRecord}} (with *Defragment* merge strategy - (!) that's 
> important)
> #* -> {{LogAttribute}} or whatever you prefer to observe the merge result
> # Now just run the flow
> *Result:* we'll see an error in logs like {panel}Could not merge bin with 1 
> FlowFiles because of the 'fragment.count' attribute had a value of '2' but 
> only 1 of 2 FlowFiles were encountered before this bin was evicted (due to to 
> Max Bin Age being reached or due to the Maximum Number of Bins being 
> exceeded).{panel}
> *Expected result:* a single flow file containing the records from both SQL 
> queries (for "txt1" and "txt2")
> The cause is that {{RecordBinManager}} uses the {{fragment.count}} flow file 
> attribute to calculate the number of *records* required to release the bin. 
> However, the attribute contains the number of *flow files* instead. In the 
> above scenario each flow file contains more than 1 record (at least 2), so 
> {{RecordBin}} considers the bin "full enough" when the first flow file arrives 
> (because it contains >= 2 records and {{fragment.count}} is equal to 2 in the 
> scenario). So the bin is released prematurely.
> I think this is a mistake: in *Defragment* mode we are interested in the 
> number of flow files and never in the number of records. Conversely, we should 
> care about the number of records when using the Bin-Packing Algorithm.
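
To illustrate the cause described in the issue, here is a minimal Java sketch. 
It is not the actual NiFi code: the class and method names (DefragmentSketch, 
isCompleteByRecordCount, isCompleteByFlowFileCount) are hypothetical, and the 
record counts are made up for the example. It only contrasts comparing 
fragment.count against the number of records with comparing it against the 
number of flow files.

```java
public class DefragmentSketch {

    // Behavior described in the report (hypothetical reconstruction): the bin
    // is treated as complete once the total number of RECORDS in it reaches
    // the value of fragment.count.
    static boolean isCompleteByRecordCount(int[] recordsPerFlowFile, int fragmentCount) {
        int totalRecords = 0;
        for (int records : recordsPerFlowFile) {
            totalRecords += records;
        }
        return totalRecords >= fragmentCount;
    }

    // Behavior the reporter expects in Defragment mode: the bin is complete
    // once the number of FLOW FILES in it reaches fragment.count.
    static boolean isCompleteByFlowFileCount(int[] recordsPerFlowFile, int fragmentCount) {
        return recordsPerFlowFile.length >= fragmentCount;
    }

    public static void main(String[] args) {
        int fragmentCount = 2;               // value of fragment.count after SplitContent
        int[] afterFirstFragment = {3};      // one flow file holding 3 records (assumed)
        int[] afterBothFragments = {3, 4};   // both flow files have arrived (assumed counts)

        // true: the bin looks complete after one flow file, so it is released too early
        System.out.println(isCompleteByRecordCount(afterFirstFragment, fragmentCount));
        // false: only 1 of 2 fragments has arrived, so the bin keeps waiting
        System.out.println(isCompleteByFlowFileCount(afterFirstFragment, fragmentCount));
        // true: both fragments have arrived, so the bin can be merged
        System.out.println(isCompleteByFlowFileCount(afterBothFragments, fragmentCount));
    }
}
```

With the record-based check, the first flow file alone satisfies the condition, 
which matches the "only 1 of 2 FlowFiles were encountered" error in the report.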



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
