Hey Chinmay/Priyanka,

We need to write tuples exactly once in the store. Please address the failure scenarios on how to achieve exactly once and idempotency. I mentioned in my previous mail why multiple batches in a window is a problem with exactly once.

> Control via app window would mean tuning the functionality by controlling
> the platform params. I think it's best one gets option to separate the
> concerns of platform and that of app logic.

Application window is a unit of aggregation. Every operator in a DAG can have a
different application window, which is the support the platform provides for
application logic.

Chandni

On Mon, Dec 28, 2015 at 10:35 AM, Chinmay Kolhatkar <[email protected]> wrote:

> Hi,
>
> Just a thought on how it can possibly be done.
>
> The pseudo code might look like this:
>
> processTuple()
> {
>   if (batchSize < configuredBatchSize) {
>     // add to the batch
>   }
>   else {
>     // process the batch as a transaction
>     // empty the data structure of batch.
>   }
> }
>
> endWindow()
> {
>   // process the batch as transaction.
>   // empty the data structure of batch.
> }
>
> This way, user can get better/direct control over what transaction means.
>
> As Chandni rightly said, one can reduce the application window size for the
> operator, and that would reduce the batch size. But that's not something
> which looks intuitive from user's perspective.
> Control via app window would mean tuning the functionality by controlling
> the platform params. I think it's best one gets option to separate the
> concerns of platform and that of app logic.
>
> If one wants to control the batch size, he/she should be able to do that by
> just setting the property of batch size (a number), and not by changing app
> window size (an indirect time unit).
>
> ~ Chinmay
>
> On 28 Dec 2015 22:53, "Chandni Singh" <[email protected]> wrote:
>
> > But you will not allow multiple batches in the same window?
> > Can you please elaborate on failure scenarios and how it affects
> > idempotency.
> >
> > Chandni
> >
> > On Mon, Dec 28, 2015 at 2:32 AM, Priyanka Gugale <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Sorry if I was not clear, but I am trying to propose the MAX_SIZE per
> > > window which the operator could process.
> > > The size could be less than the MAX_SIZE, no restriction about that.
> > >
> > > -Priyanka
> > >
> > > On Mon, Dec 28, 2015 at 3:22 PM, Chandni Singh <[email protected]>
> > > wrote:
> > >
> > > > How do you propose to restrict the no. of tuples processed in an
> > > > application window < batch size.
> > > >
> > > > I don't see a way to enforce that batch size can never be less than the
> > > > tuples processed in an application window.
> > > >
> > > > On Mon, Dec 28, 2015 at 1:25 AM, Priyanka Gugale <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Chandni,
> > > > >
> > > > > How about restricting tuples which can be processed per window. If
> > > > > someone wants to process small and frequent batches, he can set batch
> > > > > size to some small value and also reduce the window size. This would
> > > > > build some back pressure of course. But that could be acceptable if one
> > > > > really wants to restrict batch size.
> > > > > The thought was triggered while working on Cassandra output operator.
> > > > > Cassandra creates problem in processing batches of size greater than
> > > > > some value (don't recall exact number right now). Other databases may
> > > > > want to restrict the batch size for similar or other reasons.
> > > > >
> > > > > -Priyanka
> > > > >
> > > > > On Mon, Dec 28, 2015 at 2:46 PM, Chandni Singh <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Priyanka,
> > > > > >
> > > > > > AbstractBatchTransactionableStore assumes all tuples in one
> > > > > > application window as a batch because it needs to store the tuples
> > > > > > in the store exactly-once.
> > > > > >
> > > > > > If there is more than one batch in an application window, then to
> > > > > > store the tuples exactly once the window Id needs to be written with
> > > > > > every tuple as well which is not that efficient.
> > > > > > Therefore we take advantage of the transaction support by saving
> > > > > > just the window id once (not with every tuple) but this necessitates
> > > > > > all the tuples to be considered as a batch.
> > > > > >
> > > > > > Every operator in a DAG can have its own application window size. So
> > > > > > to reduce the size per batch, the application window attribute needs
> > > > > > to be modified.
> > > > > >
> > > > > > Chandni
> > > > > >
> > > > > > On Mon, Dec 28, 2015 at 1:01 AM, Chinmay Kolhatkar <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 for this.
> > > > > > >
> > > > > > > ~ Chinmay.
> > > > > > >
> > > > > > > On Mon, Dec 28, 2015 at 2:27 PM, Priyanka Gugale <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > In Malhar we have an operator
> > > > > > > > AbstractBatchTransactionableStoreOutputOperator which creates
> > > > > > > > batches based on tuples received in a window. At the end of the
> > > > > > > > window these batches are sent to database for processing.
> > > > > > > > There is no way to configure MAX_SIZE on these batches. Based on
> > > > > > > > input rate the batch sizes can grow very high, and we might want
> > > > > > > > to restrict batch size.
> > > > > > > >
> > > > > > > > Any operator can extend and do batch management on their own,
> > > > > > > > but I see it as generic requirement and IMO we should change
> > > > > > > > base class i.e. AbstractBatchTransactionableStoreOutputOperator
> > > > > > > > class to accept MAX_SIZE for batch from outside.
> > > > > > > >
> > > > > > > > Any opinion on this?
> > > > > > > >
> > > > > > > > -Priyanka
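
For reference, the window-id scheme described in the thread (one transaction per application window, with the window id saved once rather than with every tuple) can be sketched in Java roughly as follows. This is a minimal illustration and not Malhar code: `InMemoryTxStore`, `WindowBatchWriter`, `ExactlyOnceSketch`, and all of their methods are hypothetical names, and an in-memory list stands in for a real transactional database.

```java
import java.util.ArrayList;
import java.util.List;

/** Small demo: write window 1, replay it after a simulated restart, then write window 2. */
class ExactlyOnceSketch {
    public static void main(String[] args) {
        InMemoryTxStore store = new InMemoryTxStore();
        WindowBatchWriter writer = new WindowBatchWriter(store);
        writer.beginWindow(1);
        writer.processTuple("a");
        writer.processTuple("b");
        writer.endWindow();

        // Simulated recovery: a fresh writer sees window 1 replayed from a checkpoint.
        WindowBatchWriter recovered = new WindowBatchWriter(store);
        recovered.beginWindow(1);
        recovered.processTuple("a");
        recovered.processTuple("b");
        recovered.endWindow(); // skipped, window 1 is already committed

        recovered.beginWindow(2);
        recovered.processTuple("c");
        recovered.endWindow();
        System.out.println(store.rows);
    }
}

/** Hypothetical stand-in for a transactional store (assumption, not a Malhar class). */
class InMemoryTxStore {
    final List<String> rows = new ArrayList<>();
    long committedWindowId = -1;

    /** Models one transaction: the batch and the window id become visible together. */
    void commit(long windowId, List<String> batch) {
        rows.addAll(batch);
        committedWindowId = windowId;
    }
}

/** Treats all tuples of one application window as a single batch. */
class WindowBatchWriter {
    private final InMemoryTxStore store;
    private final List<String> batch = new ArrayList<>();
    private long currentWindowId;

    WindowBatchWriter(InMemoryTxStore store) {
        this.store = store;
    }

    void beginWindow(long windowId) {
        currentWindowId = windowId;
        batch.clear();
    }

    void processTuple(String tuple) {
        batch.add(tuple);
    }

    void endWindow() {
        // Only windows newer than the last committed one are written,
        // which makes a replayed window a no-op.
        if (currentWindowId > store.committedWindowId) {
            store.commit(currentWindowId, batch);
        }
        batch.clear();
    }
}
```

The skip in `endWindow()` only works because a window is either fully committed or not committed at all. If the same window were written in several transactions, a failure between them would leave the single stored window id unable to say which tuples had already been written, which is the case where the id would have to be stored with every tuple.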
