+1 for incremental fetching of data.

On Mon, May 9, 2016 at 6:51 PM, Shubham Pathak <[email protected]> wrote:
> Ability to ingest newly added data is desirable.
> +1 for the proposal.
>
> Thanks,
> Shubham
>
> On Mon, May 9, 2016 at 3:38 PM, Sandeep Deshmukh <[email protected]>
> wrote:
>
> > Hi All,
> >
> > I am using JdbcPOJOInputOperator to ingest data from MySQL to HDFS. I
> > observed that once the existing data is ingested, newly added data in
> > MySQL is not ingested. However, if I add data to MySQL while the
> > ingestion is still going on, the newly added data is also ingested
> > into HDFS.
> >
> > In the code, fetching data in batches is achieved using the fetchSize
> > parameter, which limits the number of tuples to fetch per result set.
> > pageNumber is used internally to manage the offset calculation as
> > (fetchSize * pageNumber). The pageNumber is incremented per window.
> >
> > Once the existing tuples are ingested, no further data is ingested,
> > but the pageNumber variable is still incremented. This results in
> > trying to fetch data that is beyond the number of tuples in the
> > table/query result.
> >
> > Changing the offset calculation to use the number of tuples read so
> > far will fix this issue, and the operator can then be used to poll
> > for newer data in the table.
> >
> > If you need to have a quick look at the code:
> > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
> >
> > Side observation: the fetchDirection variable is unused in the code.
> > I will remove it from the class.
> >
> > I would like to get your thoughts on my observations. I will create a
> > JIRA and open a PR based on inputs received on this thread.
> >
> > Regards,
> > Sandeep
> >
>

-- 
*regards,*
*~pradeep*
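
For reference, the offset behaviour described in the thread can be sketched in a few lines of Java. This is only an illustration, not the operator's actual code: the class name OffsetPollingSketch, the fetch() helper, and the in-memory list standing in for the MySQL table are all assumptions made for the example. Only fetchSize, pageNumber, and the (fetchSize * pageNumber) formula come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from JdbcPOJOInputOperator.
class OffsetPollingSketch {

    // Stand-in for a "LIMIT fetchSize OFFSET offset" query against the table.
    static List<Integer> fetch(List<Integer> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(table.size(), offset + fetchSize);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Simulates `windows` streaming windows over a table that starts with
    // rows 1..5 and receives rows 6 and 7 after it has been fully drained.
    static List<Integer> run(boolean useRowsRead, int fetchSize, int windows) {
        List<Integer> table = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            table.add(i);
        }
        List<Integer> ingested = new ArrayList<>();
        int pageNumber = 0; // incremented every window, as described in the thread
        int rowsRead = 0;   // proposed alternative: count of tuples read so far

        for (int w = 0; w < windows; w++) {
            if (w == 6) {
                // New rows arrive several windows after the table was drained.
                table.add(6);
                table.add(7);
            }
            int offset = useRowsRead ? rowsRead : fetchSize * pageNumber;
            List<Integer> batch = fetch(table, offset, fetchSize);
            ingested.addAll(batch);
            rowsRead += batch.size();
            pageNumber++; // advances even when the batch was empty
        }
        return ingested;
    }

    public static void main(String[] args) {
        System.out.println("page-based offset: " + run(false, 2, 10));
        System.out.println("rows-read offset:  " + run(true, 2, 10));
    }
}
```

With the page-based offset, the empty windows keep pushing the offset past the end of the table, so rows 6 and 7 are never reached. With the rows-read offset, the offset stays at 5 until new rows appear, and they are picked up in the next window.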
