Hi All,

I am using JdbcPOJOInputOperator to ingest data from MySQL to HDFS. I observed that once the existing data has been ingested, data added to MySQL afterwards is not ingested. However, if I add data to MySQL while the ingestion is still in progress, the newly added data is ingested into HDFS as well.
In the code, fetching data in batches is achieved using the fetchSize parameter, which limits the number of tuples fetched per result set. pageNumber is used internally to compute the offset as (fetchSize * pageNumber), and it is incremented every window. Once the existing tuples have been ingested, no further data is read, but pageNumber keeps being incremented. As a result, the operator tries to fetch data from an offset that is beyond the number of tuples in the table/query result, so rows added later are never reached.

Changing the offset calculation to use the number of tuples read so far will fix this issue, and the operator can then be used to poll for newer data in the table.

If you would like a quick look at the code:
https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java

Side observation: the fetchDirection variable is unused in the code. I will remove it from the class.

I would like to get your thoughts on my observations. I will create a JIRA and open a PR based on the inputs received on this thread.

Regards,
Sandeep
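
P.S. To make the offset problem concrete, here is a minimal sketch of the two offset strategies. It is a toy simulation, not the operator's actual code: the class OffsetSketch, the methods fetch and ingest, and the in-memory list standing in for the MySQL table are all made up for illustration. It also assumes rows are only appended and are returned in a stable order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the paging logic; not taken from JdbcPOJOInputOperator.
class OffsetSketch {

    // Stand-in for a "LIMIT fetchSize OFFSET offset" query against the table.
    static List<Integer> fetch(List<Integer> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>(); // offset is past the end: empty result set
        }
        int end = Math.min(offset + fetchSize, table.size());
        return new ArrayList<>(table.subList(offset, end));
    }

    // Runs the given number of windows and returns every row that was ingested.
    // The table starts with 5 rows; 3 more rows are added just before window 5,
    // i.e. after the initial rows have already been exhausted.
    static List<Integer> ingest(boolean useRowsRead, int windows) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            table.add(i);
        }
        List<Integer> ingested = new ArrayList<>();
        int fetchSize = 2;
        int pageNumber = 0;
        for (int w = 0; w < windows; w++) {
            if (w == 5) {
                // Late-arriving data.
                table.add(5);
                table.add(6);
                table.add(7);
            }
            // Current behaviour: offset grows every window, even on empty fetches.
            // Proposed behaviour: offset is the number of rows read so far.
            int offset = useRowsRead ? ingested.size() : fetchSize * pageNumber;
            ingested.addAll(fetch(table, offset, fetchSize));
            pageNumber++; // incremented per window regardless of what was fetched
        }
        return ingested;
    }

    public static void main(String[] args) {
        System.out.println(ingest(false, 7).size()); // prints 5: late rows are missed
        System.out.println(ingest(true, 7).size());  // prints 8: late rows are picked up
    }
}
```

With the page-based offset, the empty windows push the offset to 10 and then 12 while the table only ever grows to 8 rows, so the three late rows are skipped. With the rows-read offset, the offset stays at 5 until new rows appear.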
