+1 for incremental fetching of data
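
To make the offset change proposed in the quoted message below concrete, here is a minimal sketch. It does not use the actual JdbcPOJOInputOperator code; the class OffsetSketch, the in-memory "table", and the helpers fetch() and run() are hypothetical stand-ins for a LIMIT/OFFSET style query. It only compares an offset of (fetchSize * pageNumber) with an offset equal to the tuples read so far, when rows are added after the existing data has been consumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation, not the Malhar operator itself.
class OffsetSketch {
    static final int FETCH_SIZE = 2;

    // Stand-in for a query that returns up to FETCH_SIZE rows starting at offset.
    static List<Integer> fetch(List<Integer> table, int offset) {
        if (offset >= table.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(table.size(), offset + FETCH_SIZE);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Runs 8 windows; two rows are appended at window 4, after the
    // initial three rows have already been ingested.
    static List<Integer> run(boolean offsetFromTuplesRead) {
        List<Integer> table = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> ingested = new ArrayList<>();
        int pageNumber = 0;
        int tuplesRead = 0;
        for (int window = 0; window < 8; window++) {
            if (window == 4) {
                table.add(4);
                table.add(5);
            }
            int offset = offsetFromTuplesRead ? tuplesRead : FETCH_SIZE * pageNumber;
            List<Integer> batch = fetch(table, offset);
            ingested.addAll(batch);
            tuplesRead += batch.size();
            pageNumber++; // incremented every window, even when nothing was read
        }
        return ingested;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [1, 2, 3]
        System.out.println(run(true));  // prints [1, 2, 3, 4, 5]
    }
}
```

With the page-based offset, the offset has already moved past the end of the table by the time the new rows arrive, so they are never fetched. The tuples-read offset picks them up, assuming rows are only appended and the query returns them in a stable order.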

On Mon, May 9, 2016 at 6:51 PM, Shubham Pathak <[email protected]>
wrote:

> Ability to ingest newly added data is desirable.
> +1 for the proposal.
>
> Thanks,
> Shubham
>
> On Mon, May 9, 2016 at 3:38 PM, Sandeep Deshmukh <[email protected]>
> wrote:
>
> > Hi All,
> >
> > I am using JdbcPOJOInputOperator to ingest data from MySQL to HDFS. I
> > observed that once the existing data is ingested, newly added data in
> > MySQL is not ingested. However, if I add some data to MySQL while
> > the ingestion is still going on, the newly added data is also ingested
> > into HDFS.
> >
> > In the code, fetching data in batches is achieved using the fetchSize
> > parameter, which limits the number of tuples to fetch per result set, and
> > pageNumber is used internally to manage the offset calculation as
> > (fetchSize * pageNumber). The pageNumber is incremented per window.
> >
> > When the existing tuples are ingested, there is no further data to ingest,
> > but the pageNumber variable is still incremented. This results in trying to
> > fetch data that is beyond the number of tuples in the table/query result.
> >
> > Changing the offset calculation to the number of tuples read so far will
> > fix this issue, and the operator can then be used to poll for newer data
> > in the table.
> >
> > If you need to have a quick look at the code:
> > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
> >
> > Side observation: the fetchDirection variable is unused in the code. I
> > will remove it from the class.
> >
> > I would like to get your thoughts on my observations. I will create a JIRA
> > and open a PR based on the input received on this thread.
> >
> > Regards,
> > Sandeep
> >
>



-- 
*regards,*
*~pradeep*
