Hi All,

I am using JdbcPOJOInputOperator to ingest data from MySQL to HDFS. I
observed that once the existing data is ingested, data newly added to
MySQL is not ingested. However, if I add data to MySQL while the
ingestion is still in progress, the newly added data is also ingested
into HDFS.

In the code, fetching data in batches is achieved using the fetchSize
parameter, which limits the number of tuples to fetch per result set.
pageNumber is used internally to calculate the offset as
(fetchSize * pageNumber). The pageNumber is incremented per window.

Once the existing tuples are ingested, no further data is ingested, but
the pageNumber variable is still incremented. This results in trying to
fetch data beyond the number of tuples in the table/query result.
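To make the behaviour concrete, here is a minimal sketch that emulates the page-based offset against an in-memory list. It is not the operator's actual code: the class PageOffsetSketch, the fetch helper (a stand-in for a LIMIT/OFFSET query) and the run method are all hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PageOffsetSketch {
    // Hypothetical stand-in for "SELECT ... LIMIT fetchSize OFFSET offset".
    static List<String> fetch(List<String> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>(); // offset is past the end: empty result set
        }
        int end = Math.min(table.size(), offset + fetchSize);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Ingests the initial rows, lets some idle windows pass, then adds one
    // row ("late") and polls once more. Returns everything that was emitted.
    static List<String> run(int initialRows, int fetchSize, int idleWindows) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < initialRows; i++) {
            table.add("row" + i);
        }
        List<String> emitted = new ArrayList<>();
        int pageNumber = 0;
        int windowsForInitial = (initialRows + fetchSize - 1) / fetchSize;
        for (int w = 0; w < windowsForInitial + idleWindows; w++) {
            emitted.addAll(fetch(table, fetchSize * pageNumber, fetchSize));
            pageNumber++; // incremented every window, even when nothing came back
        }
        table.add("late");
        emitted.addAll(fetch(table, fetchSize * pageNumber, fetchSize));
        return emitted;
    }
}
```

With 5 initial rows, fetchSize 2 and 2 idle windows, the last poll uses offset 10 on a 6-row table, so the "late" row is never returned.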

Changing the offset calculation to use the number of tuples read so far
will fix this issue, and the operator can then be used to poll for newer
data in the table.
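A rough sketch of the proposed change, again against an in-memory list rather than the real operator. The names TuplesReadOffsetSketch, fetch, run and tuplesRead are hypothetical, and the sketch assumes that rows are only appended and that the query returns them in a stable order.

```java
import java.util.ArrayList;
import java.util.List;

class TuplesReadOffsetSketch {
    // Hypothetical stand-in for "SELECT ... LIMIT fetchSize OFFSET offset".
    static List<String> fetch(List<String> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(table.size(), offset + fetchSize);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Same scenario as a page-based run: ingest, idle, add one row, poll again.
    static List<String> run(int initialRows, int fetchSize, int idleWindows) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < initialRows; i++) {
            table.add("row" + i);
        }
        List<String> emitted = new ArrayList<>();
        int tuplesRead = 0; // offset advances only by what was actually read
        int windowsForInitial = (initialRows + fetchSize - 1) / fetchSize;
        for (int w = 0; w < windowsForInitial + idleWindows; w++) {
            List<String> batch = fetch(table, tuplesRead, fetchSize);
            tuplesRead += batch.size();
            emitted.addAll(batch);
        }
        table.add("late");
        List<String> batch = fetch(table, tuplesRead, fetchSize);
        tuplesRead += batch.size();
        emitted.addAll(batch);
        return emitted;
    }
}
```

Because the offset stays at 5 during the idle windows, the poll after the insert starts exactly at the new row and picks it up.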

If you need to have a quick look at the code:
https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java

Side observation: the fetchDirection variable is unused in the code. I
will remove it from the class.

I would like to get your thoughts on my observations. I will create a JIRA
and open a PR based on the inputs received on this thread.

Regards,
Sandeep
