Created https://issues.apache.org/jira/browse/APEXMALHAR-2090 to track this. Will be opening the PR soon.

Regards,
Sandeep

On Tue, May 10, 2016 at 10:07 PM, Sandeep Deshmukh <[email protected]> wrote:

> The current implementation does query every window. I would be fixing
> the existing operator for polling only. Work is being done on an optimized
> JDBC connector: https://issues.apache.org/jira/browse/APEXMALHAR-2066 ,
> so will not be spending time improving this operator.
>
> FetchDirection is a hint and can't be guaranteed. So, it would be good not
> to base the logic on it.
>
> Regards,
> Sandeep
>
> On Tue, May 10, 2016 at 12:35 AM, Bhupesh Chawda <[email protected]> wrote:
>
>> +1 to incremental scan.
>>
>> In the case where newly added data is also ingested, are we querying
>> multiple times with the same query? Or is the resultset of the first
>> query updated continuously with the newer records? In the latter case,
>> the Resultset can effectively be an infinitely iterable set.
>>
>> ~Bhupesh
>>
>> On Mon, May 9, 2016 at 10:34 AM, Priyanka Gugale <[email protected]> wrote:
>>
>> > Incremental scan was not available with jdbc operator till now. +1 for
>> > adding that.
>> >
>> > -Priyanka
>> >
>> > On Mon, May 9, 2016 at 8:56 AM, Mohit Jotwani <[email protected]> wrote:
>> >
>> > > +1 for incremental data.
>> > >
>> > > Regards,
>> > > Mohit
>> > >
>> > > On 9 May 2016 19:59, "Yogi Devendra" <[email protected]> wrote:
>> > >
>> > > > +1 for incremental data fetching.
>> > > > For the fetchDirection variable, it is better to get inputs from
>> > > > the original author (if possible).
>> > > >
>> > > > ~ Yogi
>> > > >
>> > > > On 9 May 2016 at 19:04, Akshay Gore <[email protected]> wrote:
>> > > >
>> > > > > +1 for incremental data fetching. This is a must-have feature.
>> > > > >
>> > > > > -Akshay
>> > > > >
>> > > > > On 09-May-2016 3:39 pm, "Sandeep Deshmukh" <[email protected]> wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > I am using JdbcPOJOInputOperator to ingest data from mysql to
>> > > > > > HDFS. I observed that once the existing data is ingested, newly
>> > > > > > added data in mysql is not ingested. At the same time, if I add
>> > > > > > some data to mysql when the ingestion is still going on, the
>> > > > > > newly added data is also ingested on HDFS.
>> > > > > >
>> > > > > > In the code, fetching data in batches is achieved using the
>> > > > > > fetchSize parameter that limits the number of tuples to fetch
>> > > > > > per result set, and pageNumber is used internally to manage the
>> > > > > > offset calculation as (fetchSize * pageNumber). The pageNumber
>> > > > > > is incremented per window.
>> > > > > >
>> > > > > > When the existing tuples are ingested, there is no further data
>> > > > > > ingest but the pageNumber variable is still incremented. This
>> > > > > > results in trying to fetch data that is beyond the number of
>> > > > > > tuples in the table/queryresult.
>> > > > > >
>> > > > > > Changing offset calculations to tuples read so far will fix
>> > > > > > this issue and the operator can then be used to poll for newer
>> > > > > > data in the table.
>> > > > > >
>> > > > > > If you need to have a quick look at the code:
>> > > > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
>> > > > > >
>> > > > > > Side observation: fetchDirection variable is unused in the
>> > > > > > code. Will remove it from the class.
>> > > > > >
>> > > > > > Would like to get your thoughts on my observations. I will
>> > > > > > create a JIRA and open a PR based on inputs received on this
>> > > > > > thread.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Sandeep
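
For anyone following the offset discussion in the original message, here is a minimal sketch of the two offset calculations. It is not the code from JdbcPOJOInputOperator: the class OffsetPollingSketch, the methods fetch and simulate, and the in-memory list standing in for the mysql table are all hypothetical, chosen only to illustrate why (fetchSize * pageNumber) runs past the end of the table while "tuples read so far" does not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names are not taken from the Malhar code base.
public class OffsetPollingSketch {

    // Stand-in for "SELECT ... LIMIT fetchSize OFFSET offset" on an in-memory table.
    static List<Integer> fetch(List<Integer> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(table.size(), offset + fetchSize);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Runs 9 windows; 2 new rows are added to the table before window 6,
    // i.e. after the initial 5 rows have already been drained.
    static List<Integer> simulate(boolean offsetFromRowsRead) {
        final int fetchSize = 2;
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            table.add(i);
        }

        List<Integer> ingested = new ArrayList<>();
        int pageNumber = 0;
        int rowsRead = 0;

        for (int window = 0; window < 9; window++) {
            if (window == 6) {
                table.add(5);
                table.add(6);
            }
            // The two strategies differ only in how the offset is derived.
            int offset = offsetFromRowsRead ? rowsRead : fetchSize * pageNumber;
            List<Integer> batch = fetch(table, offset, fetchSize);
            ingested.addAll(batch);
            rowsRead += batch.size();
            pageNumber++; // incremented every window, even when the batch is empty
        }
        return ingested;
    }

    public static void main(String[] args) {
        System.out.println("page-based offset: " + simulate(false));
        System.out.println("rows-read offset:  " + simulate(true));
    }
}
```

In this sketch the page-based run ingests only rows 0 to 4, because by the time rows 5 and 6 arrive the offset has already moved to 12. The rows-read run stays at offset 5 while the table is idle and picks up both new rows in the next window.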
