The ability to ingest newly added data is desirable. +1 for the proposal.

Thanks,
Shubham
On Mon, May 9, 2016 at 3:38 PM, Sandeep Deshmukh <[email protected]> wrote:
> Hi All,
>
> I am using JdbcPOJOInputOperator to ingest data from MySQL to HDFS. I
> observed that once the existing data is ingested, newly added data in
> MySQL is not ingested. At the same time, if I add some data to MySQL while
> the ingestion is still going on, the newly added data is also ingested into
> HDFS.
>
> In the code, fetching data in batches is achieved using the fetchSize
> parameter, which limits the number of tuples fetched per result set, and
> pageNumber is used internally to compute the offset as (fetchSize *
> pageNumber). The pageNumber is incremented per window.
>
> Once the existing tuples are ingested, no further data is ingested, but
> the pageNumber variable is still incremented. This results in trying to
> fetch data beyond the number of tuples in the table/query result.
>
> Changing the offset calculation to use the number of tuples read so far
> will fix this issue, and the operator can then be used to poll for newer
> data in the table.
>
> If you need to have a quick look at the code:
> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
>
> Side observation: the fetchDirection variable is unused in the code. I will
> remove it from the class.
>
> I would like to get your thoughts on my observations. I will create a JIRA
> and open a PR based on the inputs received on this thread.
>
> Regards,
> Sandeep
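To make the offset issue concrete, here is a minimal Java simulation. It does not use the real JdbcPOJOInputOperator or a database: the class OffsetSketch and the methods fetchPage, ingestByPageNumber and ingestByTuplesRead are hypothetical, and a List stands in for a table read with LIMIT/OFFSET semantics. It assumes rows are only ever appended and always come back in the same order (in SQL that would need a stable ORDER BY).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation of paged reads from a growing table.
 * A List stands in for the table; fetchPage mimics "LIMIT fetchSize OFFSET offset".
 * This is NOT the Malhar JdbcPOJOInputOperator code.
 */
public class OffsetSketch {

    /** Returns at most fetchSize rows starting at offset, like LIMIT/OFFSET. */
    public static List<Integer> fetchPage(List<Integer> table, int offset, int fetchSize) {
        int from = Math.min(offset, table.size());
        int to = Math.min(offset + fetchSize, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    /** Behaviour described in the thread: offset = fetchSize * pageNumber, pageNumber++ every window. */
    public static List<Integer> ingestByPageNumber(List<Integer> table, int fetchSize, int windows,
                                                   int insertAtWindow, List<Integer> newRows) {
        List<Integer> ingested = new ArrayList<>();
        int pageNumber = 0;
        for (int window = 0; window < windows; window++) {
            if (window == insertAtWindow) {
                table.addAll(newRows); // simulate rows inserted into the source table
            }
            ingested.addAll(fetchPage(table, fetchSize * pageNumber, fetchSize));
            pageNumber++; // advances even when the page was empty, so the offset runs past the data
        }
        return ingested;
    }

    /** Proposed behaviour: offset = number of tuples actually read so far. */
    public static List<Integer> ingestByTuplesRead(List<Integer> table, int fetchSize, int windows,
                                                   int insertAtWindow, List<Integer> newRows) {
        List<Integer> ingested = new ArrayList<>();
        int tuplesRead = 0;
        for (int window = 0; window < windows; window++) {
            if (window == insertAtWindow) {
                table.addAll(newRows); // simulate rows inserted into the source table
            }
            List<Integer> page = fetchPage(table, tuplesRead, fetchSize);
            ingested.addAll(page);
            tuplesRead += page.size(); // an empty page leaves the offset where it is
        }
        return ingested;
    }

    public static void main(String[] args) {
        List<Integer> newRows = List.of(6, 7);
        // Five existing rows, fetchSize 2, new rows arrive at window 4 (after the table was drained).
        System.out.println("pageNumber offset: "
                + ingestByPageNumber(new ArrayList<>(List.of(1, 2, 3, 4, 5)), 2, 6, 4, newRows));
        // prints "pageNumber offset: [1, 2, 3, 4, 5]"
        System.out.println("tuplesRead offset: "
                + ingestByTuplesRead(new ArrayList<>(List.of(1, 2, 3, 4, 5)), 2, 6, 4, newRows));
        // prints "tuplesRead offset: [1, 2, 3, 4, 5, 6, 7]"
    }
}
```

With the page-based offset, rows 6 and 7 are never read because by window 4 the offset (8) is already past the end of the table. If the rows are inserted at window 1 instead, while ingestion is still running, both versions pick them up, which matches the behaviour described in the original message.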
