[ 
https://issues.apache.org/jira/browse/NIFI-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770711#comment-16770711
 ] 

Adam Fisher commented on NIFI-6050:
-----------------------------------

As I think about this more, the merge rules could get messy. In my previous 
example, the data field was an array for the mail1 record because it had 
multiple values, but just a string for mail3 because it had only one value 
during the merge operation. *MergeRecord* may be sufficient, in which case 
this processor would be considered too specialized. Feel free to close if that 
is the case. I'm just wondering whether it gets expensive to create tens of 
thousands of tiny FlowFiles, each containing maybe 2-4 records on average.
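
To make the type mismatch concrete, here is a minimal Python sketch of a naive 
merge rule. The previous example is not reproduced in this message, so the 
records, the "id" key, and the helper names are all hypothetical; this is not 
NiFi code, only an illustration of how the same field can end up as an array 
in one merged record and a plain string in another.

```python
def merge_values(existing, incoming):
    """Naive rule (assumed): a lone value stays scalar, a second value promotes it to a list."""
    if existing is None:
        return incoming
    if isinstance(existing, list):
        return existing + [incoming]
    return [existing, incoming]


def merge_group(records, key):
    """Merge records that share the same value for `key` (hypothetical helper)."""
    merged = {}
    for record in records:
        group = merged.setdefault(record[key], {key: record[key]})
        for name, value in record.items():
            if name != key:
                group[name] = merge_values(group.get(name), value)
    return list(merged.values())


# Hypothetical input: mail1 appears twice, mail3 only once.
records = [
    {"id": "mail1", "data": "a"},
    {"id": "mail1", "data": "b"},
    {"id": "mail3", "data": "c"},
]
result = merge_group(records, "id")
# mail1 ends up with a list for "data", mail3 with a plain string.
```

A downstream schema would have to accept both shapes for "data", which is the 
messiness I mean.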

> Add FlattenRecord Processor
> ---------------------------
>
>                 Key: NIFI-6050
>                 URL: https://issues.apache.org/jira/browse/NIFI-6050
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>            Reporter: Adam Fisher
>            Priority: Minor
>              Labels: features
>
> h1. *FlattenRecord*
> h3. *Tags:*
> record, recordpath, rpath, merge, group, content, correlation, stream, bin, 
> organize, flatten
> *CapabilityDescription:* 
>  Receives record-oriented data (i.e., data that can be read by the configured 
> Record Reader) and evaluates one or more RecordPaths determined by 
> user-defined properties against each record in the incoming FlowFile. Each 
> record is then merged with other "like records", and the resulting FlowFile 
> content will contain one merged record per group, carrying all of the fields 
> from the records in that group. Two records are considered alike if they 
> have the same value for all configured RecordPaths.
> Array and plain record properties are merged recursively. Other complex types 
> and value types are overwritten by assignment. FlowFile content is scanned 
> from top to bottom. Subsequent records with matching field names overwrite 
> field assignments of previous records in the same matching group. All other 
> unique fields appearing in a matched group of records will be merged together.
> See Additional Details on the Usage page for more information and examples.
> *Properties:*
>  * *Record Reader* - Specifies the Controller Service to use for reading 
> incoming data
>  * *Record Writer* - Specifies the Controller Service to use for writing out 
> the records
>  * *Scanning Strategy* - Specifies how the FlowFile content is searched to 
> find matching records to merge based on the user-defined RecordPath 
> properties.
>  ** *Global* - Records with matching RecordPath values may appear anywhere 
> in the file. This can be less performant than the Linear strategy for large 
> files, since the matching records for each unique combination of RecordPath 
> values are held in memory before being written to the outgoing FlowFile.
>  ** *Linear* - Records with matching RecordPath values appear next to each 
> other in the file. If the incoming FlowFile has this characteristic, this 
> strategy is better suited to processing large files. Records that match but 
> are not adjacent in the FlowFile will be placed in separate matching groups 
> and therefore become separate records in the outgoing FlowFile content.
>  * *Merge Strategy* - Specifies the algorithm used to merge records.
>  ** *Defragment* - Combines fragments that are associated by RecordPaths back 
> into a single record.
>  ** _... Need ideas for other merge strategies otherwise this property can go 
> away._
>  
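
The difference between the two proposed Scanning Strategy values can be 
sketched in a few lines of Python. This is only an illustration under 
assumptions: the function names, the "id" field, and the sample records are 
hypothetical, and nothing here reflects an actual NiFi implementation.

```python
from itertools import groupby


def group_global(records, key_fn):
    """Global: matches may appear anywhere, so every group stays in memory until the end."""
    groups = {}
    for record in records:
        groups.setdefault(key_fn(record), []).append(record)
    return list(groups.values())


def group_linear(records, key_fn):
    """Linear: only adjacent matches are grouped, so one run is held at a time."""
    return [list(run) for _, run in groupby(records, key=key_fn)]


# Hypothetical input: the last "A" record is not adjacent to the first two.
records = [
    {"id": "A", "n": 1},
    {"id": "A", "n": 2},
    {"id": "B", "n": 3},
    {"id": "A", "n": 4},
]


def by_id(record):
    return record["id"]


global_groups = group_global(records, by_id)   # two groups: A (n=1,2,4) and B
linear_groups = group_linear(records, by_id)   # three groups: A (n=1,2), B, A (n=4)
```

With Linear, the trailing "A" record lands in its own group, which matches the 
description of non-adjacent matches becoming separate records in the outgoing 
FlowFile.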



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
