Hi All,

While the design and implementation of custom control tuples is ongoing, I
thought it would be a good idea to consider their usefulness in one of the
use cases: batch applications.

This is a proposal to adapt / extend existing operators in the Apache Apex
Malhar library so that it is easy to use them in batch use cases.
Naturally, this would apply only to a subset of operators, such as those
for files, JDBC and NoSQL databases.
For example, for a file-based store (say, HDFS), we could have
FileBatchInput and FileBatchOutput operators which allow easy integration
into a batch application. These operators would extend their existing
implementations and would be "Batch Aware", in that they would understand
the meaning of specific control tuples that flow through the DAG. Start
batch and end batch seem to be the obvious candidates. On receipt of such
control tuples, they may modify the behavior of the operator, for example
to reinitialize some metrics or to finalize an output file.
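
To make the idea a bit more concrete, here is a rough sketch of how a batch
aware output operator might react to start batch and end batch tuples. It is
only an illustration under assumptions: the class, the ControlTuple enum and
the method names are all hypothetical and are not part of the Apex or Malhar
API, and the "file" is just an in-memory buffer so that the example stays
self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Apex or Malhar.
class BatchAwareOutputSketch {

  // Assumed control tuples; the actual set is still open for discussion.
  enum ControlTuple { START_BATCH, END_BATCH }

  // Stand-in for finalized output files, one entry per completed batch.
  private final List<String> finalizedFiles = new ArrayList<>();
  // Stand-in for the file currently being written.
  private StringBuilder currentFile = new StringBuilder();
  // Example of a per-batch metric that is reset on every start batch.
  private long tuplesInBatch = 0;

  // Called when a control tuple reaches the operator.
  void processControl(ControlTuple control) {
    switch (control) {
      case START_BATCH:
        // Reinitialize metrics and begin a fresh output.
        tuplesInBatch = 0;
        currentFile = new StringBuilder();
        break;
      case END_BATCH:
        // Finalize the output written during this batch.
        finalizedFiles.add(currentFile.toString());
        currentFile = new StringBuilder();
        break;
    }
  }

  // Called for every regular data tuple.
  void processData(String line) {
    currentFile.append(line).append('\n');
    tuplesInBatch++;
  }

  long getTuplesInBatch() {
    return tuplesInBatch;
  }

  List<String> getFinalizedFiles() {
    return finalizedFiles;
  }

  public static void main(String[] args) {
    BatchAwareOutputSketch op = new BatchAwareOutputSketch();
    op.processControl(ControlTuple.START_BATCH);
    op.processData("a");
    op.processData("b");
    op.processControl(ControlTuple.END_BATCH);
    System.out.println("Finalized files: " + op.getFinalizedFiles().size());
    System.out.println("Tuples in last batch: " + op.getTuplesInBatch());
  }
}
```

In a real operator the buffer would be replaced by the existing file handling
of the parent operator, and the control tuples would arrive through whatever
mechanism the custom control tuple work ends up providing.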

We can discuss the potential control tuples and actions in detail, but
first I would like to understand the views of the community on this
proposal.

~ Bhupesh
