+1 for having this as a platform feature. On Mon, Apr 4, 2016 at 12:27 PM, Bhupesh Chawda <[email protected]> wrote:
> +1 on making this an Apex feature. It would be good to have all output > stores supported with this kind of functionality. > > ~Bhupesh > > On Thu, Mar 31, 2016 at 11:15 AM, Priyanka Gugale < > [email protected]> > wrote: > > > +1 for this feature for all output operators. I have faced this issue > while > > working on CassandraOutput operator as well. > > > > -Priyanka > > > > On Wed, Mar 30, 2016 at 11:37 AM, Thomas Weise <[email protected]> > > wrote: > > > > > Ashwin, abstract class isn't suitable. > > > > > > The basic ingredient is WAL, from which the operator consumes as > > > appropriate. We already have the WAL implementation. > > > > > > Incoming tuples are added to the WAL, consumption is async. WAL offset > is > > > checkpointed. > > > > > > -- > > > sent from mobile > > > On Mar 29, 2016 2:03 PM, "Ashwin Chandra Putta" < > > [email protected]> > > > wrote: > > > > > > > @Chinmay I went through your code and related comments on pull > > request. I > > > > was originally thinking of using AbstractReconciler but seems like > > > > AbstractAsyncProcessor will be better because 1. it has multithreaded > > > > output 2. it does not wait for committed window to process queued > > tuples. > > > > One question though, is it fault tolerant? > > > > > > > > @Thomas Current design creates an abstract implementation, so it > needs > > to > > > > be extended to create specific output writers. Need to think more > about > > > how > > > > to make this pluggable. > > > > > > > > @Others I am planning to get the common functionality out to make it > > > > reusable for outputs to any external system. > > > > > > > > Regards, > > > > Ashwin. > > > > > > > > On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise < > [email protected] > > > > > > > wrote: > > > > > > > > > Can this be solved through a pluggable component of operator, > similar > > > to > > > > > window data manager? > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua < > [email protected] > > > > > > > > wrote: > > > > > > > > > > > +1 on making it a feature for all output operators that depend on > > > > > external > > > > > > system > > > > > > > > > > > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh < > > > > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > +1 on making it apex feature. > > > > > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" < > [email protected] > > > > > > > > wrote: > > > > > > > > > > > > > > > +1 for the idea, love to have this feature available for > > > operators > > > > > > > writing > > > > > > > > to external systems. > > > > > > > > > > > > > > > > - Tushar. > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar < > > > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Ashwin, > > > > > > > > > > > > > > > > > > This is indeed a very required functionality. > > > > > > > > > > > > > > > > > > I think this is applicable for all three types of > operators: > > > > Input > > > > > > > (e.g. > > > > > > > > > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), > Output > > > > > > > > (JDBCOutput, > > > > > > > > > HDFSOutput, etc) > > > > > > > > > > > > > > > > > > I tried to do something similar: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java > > > > > > > > > > > > > > > > > > But this had limitation related extending another class > other > > > > than > > > > > > this > > > > > > > > > abstract class etc.... > > > > > > > > > > > > > > > > > > From what you have mentioned above, I see the functionality > > > > > basically > > > > > > > > > provides asynchronous processing of tuples. > > > > > > > > > > > > > > > > > > Considering this is a common functionality, Can this > feature > > > > > instead > > > > > > be > > > > > > > > > added in platform (apex-core) itself? > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Chinmay. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani < > > > > > > [email protected] > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Dear Ashwin, > > > > > > > > > > > > > > > > > > > > The approach sounds good. I am assuming that this will be > > > done > > > > > for > > > > > > > all > > > > > > > > > the > > > > > > > > > > output data stores and not limited to JDBC. > > > > > > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Mohit > > > > > > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta < > > > > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > > > > > > > There are many use cases in which we are writing tuples > > to > > > > > > external > > > > > > > > > > system > > > > > > > > > > > using JDBC etc. There are instances when the external > > > system > > > > > > might > > > > > > > be > > > > > > > > > > slow > > > > > > > > > > > and down for some time. In those cases, the current > > > > > > implementation > > > > > > > of > > > > > > > > > > jdbc > > > > > > > > > > > output operators fail and restart until the external > > system > > > > is > > > > > up > > > > > > > > > again. > > > > > > > > > > > Meanwhile, the DAG is slowed down by this operator. To > > deal > > > > > with > > > > > > > such > > > > > > > > > > > scenarios, we should write the output in a reconciled > > > fashion > > > > > > where > > > > > > > > the > > > > > > > > > > > reconciler thread is writing at the pace of external > > > system. > > > > We > > > > > > > > should > > > > > > > > > > also > > > > > > > > > > > provide an ability to spool the data to disk when the > > > > external > > > > > > > system > > > > > > > > > is > > > > > > > > > > > down or the output operators queue is full. > > > > > > > > > > > > > > > > > > > > > > Here are the proposed features for the output operator. > > > > > > > > > > > > > > > > > > > > > > 1. Write to external system in a separate reconciler > > > thread. > > > > > > > > > > > 2. Queue the tuples in memory for reconciler thread to > > > > consume. > > > > > > > > > > > 3. Spool the incoming tuples to hdfs using a WAL when > the > > > > queue > > > > > > is > > > > > > > > > full. > > > > > > > > > > > 4. Read from WAL and write to queue as queue is being > > > > consumed. > > > > > > > > > > > 5. When external system is able to consume as fast as > > > > incoming > > > > > > > > > > throughput, > > > > > > > > > > > WAL is not written. The queue will just buffer the > tuples > > > > > before > > > > > > > > > writing > > > > > > > > > > to > > > > > > > > > > > external system. > > > > > > > > > > > > > > > > > > > > > > Here is the JIRA: > > > > > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2037 > > > > > > > > > > > > > > > > > > > > > > Please let me know if you have any feedback on the > > design. > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > Ashwin. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Regards, > > > > Ashwin. > > > > > > > > > >
