Re: JdbcPOJOInputOperator Behaviour

Sandeep Deshmukh Tue, 10 May 2016 09:39:42 -0700

The current implementation does query every window. I will fix the
existing operator for polling only. Work is in progress on an optimized
JDBC connector (https://issues.apache.org/jira/browse/APEXMALHAR-2066),
so I will not be spending time improving this operator further.


FetchDirection is only a hint and can't be guaranteed, so it would be
good not to base the logic on it.
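
As a rough illustration of the offset issue described in the original
message quoted below, here is a minimal sketch that compares a
page-based offset (fetchSize * pageNumber) with an offset based on the
tuples read so far. It is not the operator's code: OffsetSketch, fetch,
PageBasedReader and CountBasedReader are hypothetical names, and an
in-memory list stands in for a table queried with a limit and an offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the class and method names are illustrative and
// are not taken from JdbcPOJOInputOperator.
final class OffsetSketch {

    // Stand-in for a "LIMIT fetchSize OFFSET offset" style query over an
    // in-memory table with a stable row order (an assumption).
    static List<Integer> fetch(List<Integer> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(table.size(), offset + fetchSize);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Behaviour described in the thread: offset = fetchSize * pageNumber,
    // with pageNumber incremented every window even if nothing was read.
    static final class PageBasedReader {
        private int pageNumber = 0;

        List<Integer> nextWindow(List<Integer> table, int fetchSize) {
            List<Integer> rows = fetch(table, fetchSize * pageNumber, fetchSize);
            pageNumber++;
            return rows;
        }
    }

    // Proposed change: the offset is the number of tuples read so far,
    // so empty windows do not move it.
    static final class CountBasedReader {
        private int tuplesRead = 0;

        List<Integer> nextWindow(List<Integer> table, int fetchSize) {
            List<Integer> rows = fetch(table, tuplesRead, fetchSize);
            tuplesRead += rows.size();
            return rows;
        }
    }
}
```

With a three-row table and a fetchSize of 2, both readers drain the
table in two windows. If rows are appended after a few idle windows,
the page-based reader has already moved its offset past them, while the
count-based reader picks them up in the next window. This assumes rows
are only appended and that the query returns them in a stable order.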

Regards,
Sandeep

On Tue, May 10, 2016 at 12:35 AM, Bhupesh Chawda <[email protected]>
wrote:

> +1 to incremental scan.
>
> In the case where newly added data is also ingested, are we querying
> multiple times with the same query? Or is the resultset of the first query
> updated continuously with the newer records? In the latter case, the
> Resultset can effectively be an infinitely iterable set.
>
>
> ~Bhupesh
>
> On Mon, May 9, 2016 at 10:34 AM, Priyanka Gugale <[email protected]>
> wrote:
>
> > Incremental scan was not available with the JDBC operator until now.
> > +1 for adding that.
> >
> > -Priyanka
> >
> > On Mon, May 9, 2016 at 8:56 AM, Mohit Jotwani <[email protected]>
> > wrote:
> >
> > > +1 for incremental data.
> > >
> > > Regards,
> > > Mohit
> > > On 9 May 2016 19:59, "Yogi Devendra" <[email protected]>
> > > wrote:
> > >
> > > > +1 for incremental data fetching.
> > > > For the fetchDirection variable, it is better to get inputs from the
> > > > original author (if possible).
> > > >
> > > > ~ Yogi
> > > >
> > > > On 9 May 2016 at 19:04, Akshay Gore <[email protected]> wrote:
> > > >
> > > > > +1 for incremental data fetching. This is a must-have feature.
> > > > >
> > > > > -Akshay
> > > > > On 09-May-2016 3:39 pm, "Sandeep Deshmukh" <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > > I am using JdbcPOJOInputOperator to ingest data from MySQL to
> > > > > > > HDFS. I observed that once the existing data is ingested, newly
> > > > > > > added data in MySQL is not ingested. However, if I add some data
> > > > > > > to MySQL while the ingestion is still going on, the newly added
> > > > > > > data is also ingested into HDFS.
> > > > > >
> > > > > > > In the code, fetching data in batches is achieved using the
> > > > > > > fetchSize parameter, which limits the number of tuples to fetch
> > > > > > > per result set. pageNumber is used internally to calculate the
> > > > > > > offset as (fetchSize * pageNumber), and it is incremented per
> > > > > > > window.
> > > > > >
> > > > > > > Once the existing tuples are ingested, no further data is
> > > > > > > ingested, but the pageNumber variable is still incremented. This
> > > > > > > results in trying to fetch data beyond the number of tuples in
> > > > > > > the table/query result.
> > > > > >
> > > > > > > Changing the offset calculation to use the number of tuples read
> > > > > > > so far will fix this issue, and the operator can then be used to
> > > > > > > poll for newer data in the table.
> > > > > >
> > > > > > > If you need to have a quick look at the code:
> > > > > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
> > > > > >
> > > > > > > Side observation: the fetchDirection variable is unused in the
> > > > > > > code. I will remove it from the class.
> > > > > >
> > > > > > > I would like to get your thoughts on my observations. I will
> > > > > > > create a JIRA and open a PR based on inputs received on this
> > > > > > > thread.
> > > > > >
> > > > > > Regards,
> > > > > > Sandeep
> > > > > >
> > > > >
> > > >
> > >
> >
>
