The ability to ingest newly added data is desirable.
+1 for the proposal.

Thanks,
Shubham

On Mon, May 9, 2016 at 3:38 PM, Sandeep Deshmukh <[email protected]>
wrote:

> Hi All,
>
> I am using JdbcPOJOInputOperator to ingest data from MySQL to HDFS. I
> observed that once the existing data is ingested, newly added data in
> MySQL is not ingested. However, if I add data to MySQL while the
> ingestion is still in progress, the newly added data is also ingested
> into HDFS.
>
> In the code, fetching data in batches is achieved using the fetchSize
> parameter, which limits the number of tuples fetched per result set.
> pageNumber is used internally to compute the offset as
> (fetchSize * pageNumber). The pageNumber is incremented per window.
>
> Once the existing tuples are ingested, no further data is ingested, but
> the pageNumber variable is still incremented. This results in trying to
> fetch data beyond the number of tuples in the table/query result.
>
> Changing the offset calculation to use the number of tuples read so far
> will fix this issue, and the operator can then be used to poll for newer
> data in the table.
>
> If you need to have a quick look at the code:
> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
>
> Side observation: the fetchDirection variable is unused in the code. I
> will remove it from the class.
>
> I would like to get your thoughts on my observations. I will create a
> JIRA and open a PR based on the inputs received on this thread.
>
> Regards,
> Sandeep
>
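
For illustration, the difference between the two offset calculations described in the quoted message can be sketched with a small simulation. This is not the operator's code: the class OffsetPollingSketch, the methods fetch and run, and the in-memory list standing in for the MySQL table are all hypothetical. It also assumes rows come back in a stable order (for example via ORDER BY), which the thread does not state.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetPollingSketch {

    // Hypothetical stand-in for a "LIMIT fetchSize OFFSET offset" query
    // against a table whose rows are returned in a stable order.
    static List<Integer> fetch(List<Integer> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(table.size(), offset + fetchSize);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Simulates `windows` streaming windows with one fetch per window.
    // `lateRows` extra rows are appended to the table at window `lateWindow`.
    static List<Integer> run(boolean offsetFromTuplesRead, int fetchSize, int windows,
                             int initialRows, int lateRows, int lateWindow) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < initialRows; i++) {
            table.add(i);
        }
        List<Integer> ingested = new ArrayList<>();
        int pageNumber = 0;
        int tuplesRead = 0;
        for (int w = 0; w < windows; w++) {
            if (w == lateWindow) {
                for (int i = 0; i < lateRows; i++) {
                    table.add(initialRows + i);
                }
            }
            // The behavior described in the thread versus the proposed change.
            int offset = offsetFromTuplesRead ? tuplesRead : fetchSize * pageNumber;
            List<Integer> batch = fetch(table, offset, fetchSize);
            ingested.addAll(batch);
            tuplesRead += batch.size();
            pageNumber++; // advances every window, even when the batch is empty
        }
        return ingested;
    }

    public static void main(String[] args) {
        // 3 rows initially, 2 more rows added at window 4, fetchSize = 2.
        System.out.println(run(false, 2, 6, 3, 2, 4)); // prints [0, 1, 2]
        System.out.println(run(true, 2, 6, 3, 2, 4));  // prints [0, 1, 2, 3, 4]
    }
}
```

In this sketch the page-based offset keeps growing during empty windows, so rows added later fall below the offset and are never read, while the offset based on tuples read stays at the end of the data already ingested.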
