Hi,

Sorry if I was not clear. What I am proposing is a MAX_SIZE per window, i.e. the maximum number of tuples the operator would process in one window. A window may contain fewer tuples than MAX_SIZE; there is no restriction on that.
-Priyanka

On Mon, Dec 28, 2015 at 3:22 PM, Chandni Singh <[email protected]> wrote:

> How do you propose to restrict the number of tuples processed in an
> application window so that it stays below the batch size?
>
> I don't see a way to enforce that the batch size is never smaller than the
> number of tuples processed in an application window.
>
> On Mon, Dec 28, 2015 at 1:25 AM, Priyanka Gugale <[email protected]>
> wrote:
>
> > Hi Chandni,
> >
> > How about restricting the number of tuples that can be processed per
> > window? If someone wants to process small, frequent batches, they can set
> > the batch size to a small value and also reduce the window size. This
> > would of course build up some back pressure, but that could be acceptable
> > if one really wants to restrict the batch size.
> > The thought came up while I was working on the Cassandra output operator.
> > Cassandra has problems processing batches larger than a certain size (I
> > don't recall the exact number right now). Other databases may want to
> > restrict the batch size for similar or other reasons.
> >
> > -Priyanka
> >
> > On Mon, Dec 28, 2015 at 2:46 PM, Chandni Singh <[email protected]>
> > wrote:
> >
> > > Priyanka,
> > >
> > > AbstractBatchTransactionableStore treats all tuples in one application
> > > window as a single batch because it needs to write the tuples to the
> > > store exactly once.
> > >
> > > If there were more than one batch in an application window, then to
> > > store the tuples exactly once the window ID would have to be written
> > > with every tuple as well, which is not very efficient. Instead we take
> > > advantage of the store's transaction support and save the window ID just
> > > once per transaction (not with every tuple), but this requires all the
> > > tuples of the window to be treated as one batch.
> > >
> > > Every operator in a DAG can have its own application window size, so to
> > > reduce the size of each batch, the application window attribute needs
> > > to be modified.
> > >
> > > Chandni
> > >
> > > On Mon, Dec 28, 2015 at 1:01 AM, Chinmay Kolhatkar <[email protected]>
> > > wrote:
> > >
> > > > +1 for this.
> > > >
> > > > ~ Chinmay.
> > > >
> > > > On Mon, Dec 28, 2015 at 2:27 PM, Priyanka Gugale <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > In Malhar we have an operator,
> > > > > AbstractBatchTransactionableStoreOutputOperator, which creates
> > > > > batches from the tuples received in a window. At the end of the
> > > > > window these batches are sent to the database for processing.
> > > > > There is no way to configure a MAX_SIZE for these batches. Depending
> > > > > on the input rate, batch sizes can grow very large, and we might want
> > > > > to restrict them.
> > > > >
> > > > > Any operator can extend the class and do its own batch management,
> > > > > but I see this as a generic requirement, and IMO we should change the
> > > > > base class, i.e. AbstractBatchTransactionableStoreOutputOperator, to
> > > > > accept a MAX_SIZE for batches as external configuration.
> > > > >
> > > > > Any opinion on this?
> > > > >
> > > > > -Priyanka
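
The exactly-once scheme Chandni describes (one batch per application window, with the window ID saved once in the same transaction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Malhar's implementation: WindowBatchSketch and FakeTransactionalStore are hypothetical names, the store is an in-memory stand-in for a database that can commit rows and a window-ID marker atomically, and although the method names mirror the Apex operator lifecycle, the class does not implement any Apex interface.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the Malhar API): all tuples received in one
 * application window form a single batch, and the window ID is written
 * once per batch as part of the same "transaction".
 */
public class WindowBatchSketch {

  /** Hypothetical in-memory stand-in for a transactional database. */
  static class FakeTransactionalStore {
    final List<String> rows = new ArrayList<>();
    long committedWindowId = -1;           // stored once per batch, not per tuple

    void commitBatch(List<String> batch, long windowId) {
      // A real store would write the rows and the window ID atomically.
      rows.addAll(batch);
      committedWindowId = windowId;
    }
  }

  private final FakeTransactionalStore store;
  private final List<String> batch = new ArrayList<>();
  private long currentWindowId;

  WindowBatchSketch(FakeTransactionalStore store) {
    this.store = store;
  }

  void beginWindow(long windowId) {
    currentWindowId = windowId;
    batch.clear();
  }

  void process(String tuple) {
    batch.add(tuple);                      // buffer until the window ends
  }

  void endWindow() {
    // When a window is replayed after recovery and was already committed,
    // skip it. This check is what makes the write exactly-once.
    if (currentWindowId <= store.committedWindowId) {
      return;
    }
    store.commitBatch(batch, currentWindowId);
  }

  public static void main(String[] args) {
    FakeTransactionalStore store = new FakeTransactionalStore();
    WindowBatchSketch op = new WindowBatchSketch(store);

    op.beginWindow(1); op.process("a"); op.process("b"); op.endWindow();
    // Simulated replay of window 1 after a failure: nothing is written twice.
    op.beginWindow(1); op.process("a"); op.process("b"); op.endWindow();
    op.beginWindow(2); op.process("c"); op.endWindow();

    // prints "[a, b, c] committedWindowId=2"
    System.out.println(store.rows + " committedWindowId=" + store.committedWindowId);
  }
}
```

The sketch also shows why a mid-window MAX_SIZE flush conflicts with this design. After the first partial flush, the stored window ID would already equal the current window. If the operator then failed before the window ended, the replayed window would be skipped and its remaining tuples lost. Splitting a window into several batches would therefore require storing the window ID with every tuple, as the thread points out.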

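Chandni's suggestion to shrink the application window can be turned into a rough sizing rule. Assume a steady input rate of r tuples/s and streaming windows of T ms. An application window spanning W streaming windows then yields about r·T·W/1000 tuples, so the largest W that keeps a batch at or below MAX_SIZE is floor(MAX_SIZE·1000/(r·T)). The helper below is a hypothetical sketch of that arithmetic. BatchSizing and its methods are illustrative names, and the numbers in main are examples, not measurements.

```java
/**
 * Hypothetical sizing helper: estimates the batch produced by one
 * application window and picks the largest application window count
 * whose expected batch stays within a chosen maximum.
 * Assumes a steady input rate; bursts are not accounted for.
 */
public final class BatchSizing {

  private BatchSizing() {}

  /** Expected tuples in one application window (i.e. one batch). */
  static long expectedBatchSize(double tuplesPerSecond, long streamingWindowMillis, int appWindowCount) {
    return (long) Math.ceil(tuplesPerSecond * streamingWindowMillis * appWindowCount / 1000.0);
  }

  /** Largest application window count whose expected batch is <= maxBatchSize. */
  static int maxAppWindowCount(double tuplesPerSecond, long streamingWindowMillis, long maxBatchSize) {
    double tuplesPerStreamingWindow = tuplesPerSecond * streamingWindowMillis / 1000.0;
    int count = (int) Math.floor(maxBatchSize / tuplesPerStreamingWindow);
    // An application window must span at least one streaming window.
    return Math.max(1, count);
  }

  public static void main(String[] args) {
    // Example values only: 10,000 tuples/s, 500 ms streaming windows, MAX_SIZE of 20,000.
    int count = maxAppWindowCount(10_000, 500, 20_000);
    // prints "application window count = 4, expected batch = 20000"
    System.out.println("application window count = " + count
        + ", expected batch = " + expectedBatchSize(10_000, 500, count));
  }
}
```

The computed count would go into the operator's application window attribute (in Apex, the `APPLICATION_WINDOW_COUNT` operator attribute). The rule relies on an average rate, so a burst can still produce a batch larger than MAX_SIZE. This matches Chandni's concern that there is no obvious way for the operator itself to enforce the bound.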