[ https://issues.apache.org/jira/browse/FLINK-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588850#comment-15588850 ]
Greg Hogan commented on FLINK-4854:
-----------------------------------

Hi and thanks for submitting a ticket for this idea! Has this been discussed on the dev mailing list (if yes, please add a link)? It looks like a major addition to the DataStream API and should be fully vetted before starting.

> Efficient Batch Operator in Streaming
> -------------------------------------
>
>                 Key: FLINK-4854
>                 URL: https://issues.apache.org/jira/browse/FLINK-4854
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Xiaowei Jiang
>            Assignee: MaGuowei
>              Labels: features
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Very often, it's more efficient to process a batch of records at once instead
> of processing them one by one. We can use a window to achieve this
> functionality. However, a window stores all records in state, which can be
> costly. It's desirable to have an efficient implementation of a batch operator.
> The batch operator works per task and behaves similarly to aligned windows.
> Here is an example of what the interface looks like to a user.
> {code}
> interface BatchFunction {
>     // add the record to the buffer
>     // returns true if the batch is ready to be flushed
>     boolean addRecord(T record);
>     // process all pending records in the buffer
>     void flush(Collector collector) ;
> }
> DataStream ds = ...
> BatchFunction func = ...
> ds.batch(func);
> {code}
> The operator calls addRecord for each record. The batch function saves the
> record in its own buffer. addRecord returns whether the pending buffer should
> be flushed. If it returns true, the operator invokes flush.
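
The addRecord/flush contract described in the ticket can be illustrated with a small, dependency-free Java sketch. This is not Flink code and not the proposed implementation: the Collector interface here is a hypothetical stand-in for Flink's own, the type parameters T and O are assumed (the ticket's snippet does not show them), and SumBatch and runBatch are made-up names that only mimic what the operator would do per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSketch {

    /** Hypothetical stand-in for Flink's Collector, so the sketch has no dependencies. */
    public interface Collector<O> {
        void collect(O record);
    }

    /** The interface from the ticket; the type parameters T and O are assumptions. */
    public interface BatchFunction<T, O> {
        // Adds the record to the buffer; returns true if the batch is ready to be flushed.
        boolean addRecord(T record);

        // Processes all pending records in the buffer.
        void flush(Collector<O> collector);
    }

    /** Hypothetical count-based batch function: emits one sum per batchSize records. */
    public static final class SumBatch implements BatchFunction<Integer, Integer> {
        private final int batchSize;
        private final List<Integer> buffer = new ArrayList<>();

        public SumBatch(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public boolean addRecord(Integer record) {
            buffer.add(record);
            return buffer.size() >= batchSize;
        }

        @Override
        public void flush(Collector<Integer> collector) {
            int sum = 0;
            for (int value : buffer) {
                sum += value;
            }
            collector.collect(sum);
            buffer.clear();
        }
    }

    /** Mimics the operator loop: call addRecord per record, flush when it returns true. */
    public static <T, O> List<O> runBatch(Iterable<T> input, BatchFunction<T, O> func) {
        List<O> out = new ArrayList<>();
        for (T record : input) {
            if (func.addRecord(record)) {
                func.flush(out::add);
            }
        }
        // Records still buffered at the end are not flushed here; the ticket does
        // not say how a final partial batch should be handled.
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runBatch(Arrays.asList(1, 2, 3, 4, 5, 6, 7), new SumBatch(3)));
    }
}
```

Because the function owns its buffer, it decides both what to keep per record and when a batch is complete, which is the difference from a window that stores every record in state. How the buffer would interact with checkpointing is left open in the ticket and is not modeled here.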