Yes, if the user needs to develop a batch application, then batch-aware operators need to be used in it. The nature of the application is mostly controlled by the input and output operators it uses.
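To illustrate how the input and output operators carry the batch behavior, here is a rough sketch in plain Java. It does not use the Apex or Malhar APIs, and every name in it (BatchSketch, BatchFileOutput, startBatch, endBatch, run) is hypothetical. The output side writes to a .tmp file and renames it when told the batch is over; the input side stops once its source is exhausted; the filter in between has no batch-specific logic.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: none of these types are Apex or Malhar classes.
final class BatchSketch {

  /** Output side: writes to "<name>.tmp" and finalizes it when the batch ends. */
  static final class BatchFileOutput {
    private final Path target;
    private final Path tmp;
    private BufferedWriter writer;

    BatchFileOutput(Path target) {
      this.target = target;
      this.tmp = target.resolveSibling(target.getFileName() + ".tmp");
    }

    // Stands in for reacting to a "start batch" control tuple.
    void startBatch() throws IOException {
      writer = Files.newBufferedWriter(tmp);
    }

    void process(String record) throws IOException {
      writer.write(record);
      writer.newLine();
    }

    // Stands in for reacting to an "end batch" control tuple:
    // close the .tmp file and move it to its final name.
    void endBatch() throws IOException {
      writer.close();
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  /**
   * Input side: reads the file, passes records through the filter, and
   * returns once the input is exhausted, which is what ends the "application".
   * Returns the number of records written.
   */
  static long run(Path input, Path output, Predicate<String> filter) throws IOException {
    BatchFileOutput out = new BatchFileOutput(output);
    long kept = 0;
    out.startBatch();
    try (BufferedReader reader = Files.newBufferedReader(input)) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (filter.test(line)) { // the processing step is unaware of batches
          out.process(line);
          kept++;
        }
      }
    }
    out.endBatch(); // input is over, so signal the output side to finalize
    return kept;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("batch-sketch");
    Path in = dir.resolve("input.txt");
    Path out = dir.resolve("filtered.txt");
    Files.write(in, List.of("keep a", "drop b", "keep c"));
    long kept = run(in, out, line -> line.startsWith("keep"));
    System.out.println(kept + " records written to " + out);
  }
}
```

In a real DAG the two signals would travel as control tuples rather than direct method calls; the direct calls here only keep the sketch short.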
For example, consider an application which needs to filter records in an input file and store the filtered records in another file. The nature of this app is to end once the entire file is processed. The following things are expected of the application:

1. Once the input data is over, finalize the output file from the .tmp files. - Responsibility of the output operator
2. End the application once the data is read and processed. - Responsibility of the input operator

These functions are essential to allow the user to do higher-level operations like scheduling or running a workflow of batch applications.

I am not sure about intermediate (processing) operators, as there is no change in their functionality for batch use cases. Perhaps allowing multiple batches to run in a single application may require similar changes in processing operators as well.

~ Bhupesh

On Mon, Jan 16, 2017 at 2:19 PM, Priyanka Gugale <pri...@apache.org> wrote:

> Will it make an impression on user that, if he has a batch usecase he has
> to use batch aware operators only? If so, is that what we expect? I am not
> aware of how do we implement batch scenario so this might be a basic
> question.
>
> -Priyanka
>
> On Mon, Jan 16, 2017 at 12:02 PM, Bhupesh Chawda <bhup...@datatorrent.com>
> wrote:
>
> > Hi All,
> >
> > While design / implementation for custom control tuples is ongoing, I
> > thought it would be a good idea to consider its usefulness in one of the
> > use cases - batch applications.
> >
> > This is a proposal to adapt / extend existing operators in the Apache Apex
> > Malhar library so that it is easy to use them in batch use cases.
> > Naturally, this would be applicable for only a subset of operators like
> > File, JDBC and NoSQL databases.
> > For example, for a file based store, (say HDFS store), we could have
> > FileBatchInput and FileBatchOutput operators which allow easy integration
> > into a batch application. These operators would be extended from their
> > existing implementations and would be "Batch Aware", in that they may
> > understand the meaning of some specific control tuples that flow through
> > the DAG. Start batch and end batch seem to be the obvious candidates that
> > come to mind. On receipt of such control tuples, they may try to modify the
> > behavior of the operator - to reinitialize some metrics or finalize an
> > output file for example.
> >
> > We can discuss the potential control tuples and actions in detail, but
> > first I would like to understand the views of the community for this
> > proposal.
> >
> > ~ Bhupesh