Hi Chandni,

How about restricting the number of tuples that can be processed per window? If someone wants to process small, frequent batches, they can set the batch size to a small value and also reduce the window size. This would build up some back pressure, of course, but that could be acceptable if one really wants to restrict the batch size.

The thought was triggered while working on the Cassandra output operator. Cassandra has problems processing batches larger than a certain size (I don't recall the exact number right now). Other databases may want to restrict the batch size for similar or other reasons.
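
To make the idea concrete, here is a minimal sketch of a per-window cap. It is only an illustration under assumptions: the class CappedWindowBuffer and its methods are hypothetical and are not part of Malhar or of AbstractBatchTransactionableStoreOutputOperator. Each window hands at most maxTuplesPerWindow tuples to the store as one batch, and anything beyond the cap stays queued for a later window.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; none of these names exist in Malhar.
final class CappedWindowBuffer<T> {
    private final int maxTuplesPerWindow;
    private final Queue<T> pending = new ArrayDeque<>();

    CappedWindowBuffer(int maxTuplesPerWindow) {
        if (maxTuplesPerWindow <= 0) {
            throw new IllegalArgumentException("maxTuplesPerWindow must be positive");
        }
        this.maxTuplesPerWindow = maxTuplesPerWindow;
    }

    // Queue an incoming tuple; it is not sent to the store yet.
    void add(T tuple) {
        pending.add(tuple);
    }

    // Intended to be called once per window: removes and returns at most
    // maxTuplesPerWindow tuples, in arrival order, as that window's batch.
    List<T> nextBatch() {
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxTuplesPerWindow && !pending.isEmpty()) {
            batch.add(pending.poll());
        }
        return batch;
    }

    // Number of tuples still waiting for a later window.
    int pendingCount() {
        return pending.size();
    }
}
```

With a cap of 2 and five queued tuples, three consecutive windows would each get one batch, and pendingCount() shows how much work is still waiting in between.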

-Priyanka

On Mon, Dec 28, 2015 at 2:46 PM, Chandni Singh <[email protected]> wrote:

> Priyanka,
>
> AbstractBatchTransactionableStore assumes all tuples in one application as
> a batch because it needs to store the tuples in the store exactly-once.
>
> If there is more than one batch in an application window, then to store the
> tuples exactly once the window Id needs to be written with every tuple as
> well which is not that efficient. Therefore we take advantage of the
> transaction support by saving just the window id once (not with every
> tuple) but this necessitates all the tuples to be considered as a batch.
>
> Every operator in a DAG can have its own application window size. So to
> reduce the size per batch, the application window attribute needs to be
> modified.
>
> Chandni
>
> On Mon, Dec 28, 2015 at 1:01 AM, Chinmay Kolhatkar <[email protected]> wrote:
>
> > +1 for this.
> >
> > ~ Chinmay.
> >
> > On Mon, Dec 28, 2015 at 2:27 PM, Priyanka Gugale <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > In Malhar we have an operator
> > > AbstractBatchTransactionableStoreOutputOperator which creates
> > > batches based on tuples received in a window. At the end of the window
> > > these batches are sent to database for processing.
> > > There is no way to configure MAX_SIZE on these batches. Based on input
> > > rate the batch sizes can grow very high, and we might want to restrict
> > > batch size.
> > >
> > > Any operator can extend and do batch management on their own, but I see
> > > it as generic requirement and IMO we should change base class i.e.
> > > AbstractBatchTransactionableStoreOutputOperator class to accept MAX_SIZE
> > > for batch from outside.
> > >
> > > Any opinion on this?
> > >
> > > -Priyanka
