Re: JdbcPOJOInputOperator Behaviour

Sandeep Deshmukh Thu, 12 May 2016 22:17:05 -0700

Created https://issues.apache.org/jira/browse/APEXMALHAR-2090 to track
this. Will be opening the PR soon.


Regards,
Sandeep

On Tue, May 10, 2016 at 10:07 PM, Sandeep Deshmukh <[email protected]>
wrote:

> The current implementation does query every window. I will fix the
> existing operator for polling only. Work is under way on an optimized JDBC
> connector: https://issues.apache.org/jira/browse/APEXMALHAR-2066 , so I will
> not spend time improving this operator further.
>
> FetchDirection is a hint and can't be guaranteed. So, would be good not to
> base the logic on it.
>
> Regards,
> Sandeep
>
> On Tue, May 10, 2016 at 12:35 AM, Bhupesh Chawda <[email protected]>
> wrote:
>
>> +1 to incremental scan.
>>
>> In the case where newly added data is also ingested, are we querying
>> multiple times with the same query? Or is the resultset of the first query
>> updated continuously with the newer records? In the latter case, the
>> Resultset can effectively be an infinitely iterable set.
>>
>>
>> ~Bhupesh
>>
>> On Mon, May 9, 2016 at 10:34 AM, Priyanka Gugale <
>> [email protected]>
>> wrote:
>>
>> > Incremental scan was not available with jdbc operator till now. +1 for
>> > adding that.
>> >
>> > -Priyanka
>> >
>> > On Mon, May 9, 2016 at 8:56 AM, Mohit Jotwani <[email protected]>
>> > wrote:
>> >
>> > > +1 for incremental data.
>> > >
>> > > Regards,
>> > > Mohit
>> > > On 9 May 2016 19:59, "Yogi Devendra" <[email protected]> wrote:
>> > >
>> > > > +1 for incremental data fetching.
>> > > > As for the fetchDirection variable, it is better to get inputs from
>> > > > the original author (if possible).
>> > > >
>> > > > ~ Yogi
>> > > >
>> > > > On 9 May 2016 at 19:04, Akshay Gore <[email protected]> wrote:
>> > > >
>> > > > > +1 for incremental data fetching. This is a must-have feature.
>> > > > >
>> > > > > -Akshay
>> > > > > On 09-May-2016 3:39 pm, "Sandeep Deshmukh" <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > I am using JdbcPOJOInputOperator to ingest data from MySQL to
>> > > > > > HDFS. I observed that once the existing data is ingested, newly
>> > > > > > added data in MySQL is not ingested. However, if I add data to
>> > > > > > MySQL while the ingestion is still in progress, the newly added
>> > > > > > data is also ingested into HDFS.
>> > > > > >
>> > > > > > In the code, fetching data in batches is achieved using the
>> > > > > > fetchSize parameter, which limits the number of tuples to fetch
>> > > > > > per result set, and pageNumber is used internally to manage the
>> > > > > > offset calculation as (fetchSize * pageNumber). The pageNumber
>> > > > > > is incremented per window.
>> > > > > >
>> > > > > > When the existing tuples are ingested, there is no further data
>> > > > > > to ingest, but the pageNumber variable is still incremented.
>> > > > > > This results in trying to fetch data that is beyond the number
>> > > > > > of tuples in the table/query result.
>> > > > > >
>> > > > > > Changing the offset calculation to use the number of tuples
>> > > > > > read so far will fix this issue, and the operator can then be
>> > > > > > used to poll for newer data in the table.
>> > > > > >
>> > > > > > If you need to have a quick look at the code:
>> > > > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
>> > > > > >
>> > > > > > Side observation: the fetchDirection variable is unused in the
>> > > > > > code. I will remove it from the class.
>> > > > > >
>> > > > > > I would like to get your thoughts on my observations. I will
>> > > > > > create a JIRA and open a PR based on inputs received on this
>> > > > > > thread.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Sandeep
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
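
The offset behaviour described in the original message can be illustrated with a small, self-contained sketch. It is not the operator's code: the in-memory list standing in for the MySQL table, the fetch helper standing in for a LIMIT/OFFSET query, and the names OffsetSketch, simulate and tuplesRead are all assumptions made for illustration. Only fetchSize and pageNumber come from the thread. The sketch drains a three-row table, appends three more rows, and compares the two offset calculations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetSketch {

    // Hypothetical stand-in for "SELECT ... LIMIT fetchSize OFFSET offset" on an in-memory table.
    static List<Integer> fetch(List<Integer> table, int offset, int fetchSize) {
        if (offset >= table.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(table.size(), offset + fetchSize);
        return new ArrayList<>(table.subList(offset, end));
    }

    // Runs four windows, appends three rows, runs four more windows,
    // and returns every row that was ingested along the way.
    static List<Integer> simulate(boolean offsetFromTuplesRead) {
        final int fetchSize = 2;
        List<Integer> table = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> ingested = new ArrayList<>();
        int pageNumber = 0; // incremented every window, as described in the thread
        int tuplesRead = 0; // advanced only by the rows actually returned

        for (int window = 0; window < 8; window++) {
            if (window == 4) {
                // New rows arrive after the existing data has been drained.
                table.addAll(Arrays.asList(4, 5, 6));
            }
            int offset = offsetFromTuplesRead ? tuplesRead : fetchSize * pageNumber;
            List<Integer> rows = fetch(table, offset, fetchSize);
            ingested.addAll(rows);
            tuplesRead += rows.size();
            pageNumber++;
        }
        return ingested;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints [1, 2, 3]
        System.out.println(simulate(true));  // prints [1, 2, 3, 4, 5, 6]
    }
}
```

With the page-based offset, the empty windows keep pushing the offset past the end of the table, so the rows added later are never reached. With the offset taken from the tuples read so far, polling resumes exactly where the last non-empty fetch ended. Against a real database this presumably also relies on the query returning rows in a stable order.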
