Hi All,

While the design and implementation of custom control tuples is ongoing, I thought it would be a good idea to consider their usefulness in one of the use cases: batch applications.
This is a proposal to adapt or extend existing operators in the Apache Apex Malhar library so that they are easy to use in batch use cases. Naturally, this would apply only to a subset of operators, such as those for files, JDBC and NoSQL databases.

For example, for a file-based store (say, HDFS), we could have FileBatchInput and FileBatchOutput operators which allow easy integration into a batch application. These operators would extend their existing implementations and would be "Batch Aware", in that they would understand the meaning of some specific control tuples that flow through the DAG. Start batch and end batch seem to be the obvious candidates. On receipt of such a control tuple, an operator could modify its behavior, for example by reinitializing some metrics or finalizing an output file.

We can discuss the potential control tuples and actions in detail, but first I would like to understand the community's views on this proposal.

~ Bhupesh
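
P.S. To make the idea a little more concrete, here is a rough, self-contained Java sketch of what "batch aware" could mean for an output operator. It deliberately does not use any Apex or Malhar API, since the control tuple design is still open. The names BatchControl, BatchAwareOutputSketch, process and onControl are all hypothetical, and "files" are just in-memory lists of lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical control tuple types; the real set is still to be discussed.
enum BatchControl { START_BATCH, END_BATCH }

// Hypothetical stand-in for a batch-aware output operator.
// Not an Apex operator: it only illustrates the intended behavior.
class BatchAwareOutputSketch {
    // Each finalized "file" is modeled as a list of lines.
    private final List<List<String>> finalizedFiles = new ArrayList<>();
    private List<String> currentFile = null;
    private long tuplesInBatch = 0; // example of a per-batch metric

    // Called for ordinary data tuples.
    void process(String tuple) {
        if (currentFile == null) {
            // One possible policy (an assumption): reject data outside a batch.
            throw new IllegalStateException("data tuple received outside a batch");
        }
        currentFile.add(tuple);
        tuplesInBatch++;
    }

    // Called when a control tuple reaches the operator.
    void onControl(BatchControl control) {
        switch (control) {
            case START_BATCH:
                // Open a fresh output and reinitialize the metric.
                currentFile = new ArrayList<>();
                tuplesInBatch = 0;
                break;
            case END_BATCH:
                // Finalize the current output, if any.
                if (currentFile != null) {
                    finalizedFiles.add(currentFile);
                    currentFile = null;
                }
                break;
        }
    }

    long getTuplesInBatch() { return tuplesInBatch; }

    List<List<String>> getFinalizedFiles() { return finalizedFiles; }
}
```

In this sketch, feeding START_BATCH, some data tuples and then END_BATCH leaves exactly one finalized output per batch, and the metric starts from zero again on the next START_BATCH. How such tuples would actually be delivered to an operator depends on the control tuple design.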