The current implementation does issue a query every window. I will be fixing the existing operator for polling only. Work is being done on an optimized JDBC connector: https://issues.apache.org/jira/browse/APEXMALHAR-2066 , so I will not be spending time improving this operator beyond that.
FetchDirection is a hint and can't be guaranteed. So, it would be good not to base the logic on it.

Regards,
Sandeep

On Tue, May 10, 2016 at 12:35 AM, Bhupesh Chawda <[email protected]> wrote:

> +1 to incremental scan.
>
> In the case where newly added data is also ingested, are we querying
> multiple times with the same query? Or is the resultset of the first query
> updated continuously with the newer records? In the latter case, the
> Resultset can effectively be an infinitely iterable set.
>
> ~Bhupesh
>
> On Mon, May 9, 2016 at 10:34 AM, Priyanka Gugale <[email protected]>
> wrote:
>
> > Incremental scan was not available with jdbc operator till now. +1 for
> > adding that.
> >
> > -Priyanka
> >
> > On Mon, May 9, 2016 at 8:56 AM, Mohit Jotwani <[email protected]>
> > wrote:
> >
> > > +1 for incremental data.
> > >
> > > Regards,
> > > Mohit
> > >
> > > On 9 May 2016 19:59, "Yogi Devendra" <[email protected]>
> > > wrote:
> > >
> > > > +1 for incremental data fetching.
> > > > For the fetchDirection variable, it is better to get inputs from the
> > > > original author (if possible).
> > > >
> > > > ~ Yogi
> > > >
> > > > On 9 May 2016 at 19:04, Akshay Gore <[email protected]> wrote:
> > > >
> > > > > +1 for incremental data fetching. This is a must-have feature.
> > > > >
> > > > > -Akshay
> > > > >
> > > > > On 09-May-2016 3:39 pm, "Sandeep Deshmukh" <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I am using JdbcPOJOInputOperator to ingest data from mysql to
> > > > > > HDFS. I observed that once the existing data is ingested, newly
> > > > > > added data in mysql is not ingested. At the same time, if I add
> > > > > > some data to mysql when the ingestion is still going on, the
> > > > > > newly added data is also ingested on HDFS.
> > > > > >
> > > > > > In the code, fetching data in batches is achieved using the
> > > > > > fetchSize parameter that limits the number of tuples to fetch per
> > > > > > result set, and pageNumber is used internally to manage the
> > > > > > offset calculation as (fetchSize * pageNumber). The pageNumber is
> > > > > > incremented per window.
> > > > > >
> > > > > > When the existing tuples are ingested, there is no further data
> > > > > > ingest but the pageNumber variable is still incremented. This
> > > > > > results in trying to fetch data that is beyond the number of
> > > > > > tuples in the table/queryresult.
> > > > > >
> > > > > > Changing the offset calculation to tuples read so far will fix
> > > > > > this issue and the operator can then be used to poll for newer
> > > > > > data in the table.
> > > > > >
> > > > > > If you need to have a quick look at the code:
> > > > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
> > > > > >
> > > > > > Side observation: fetchDirection variable is unused in the code.
> > > > > > Will remove it from the class.
> > > > > >
> > > > > > Would like to get your thoughts on my observations. I will create
> > > > > > a JIRA and open a PR based on inputs received on this thread.
> > > > > >
> > > > > > Regards,
> > > > > > Sandeep
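For anyone following the thread, the offset issue described in the original message can be sketched roughly as follows. This is a standalone simulation and not the operator's actual code: the class `OffsetSketch`, the method `simulate`, and the flag `useRowsRead` are hypothetical names, and the table is modeled only as a row count per window (an assumption made to keep the example free of any JDBC setup). Only `fetchSize` and `pageNumber` come from the description above.

```java
// Hypothetical simulation of the two offset strategies; not JdbcPOJOInputOperator code.
class OffsetSketch {

    /**
     * Simulates one fetch per window against a table whose row count in each
     * window is given by tableSizePerWindow. Returns the total rows ingested.
     *
     * useRowsRead == false: offset = fetchSize * pageNumber (behaviour described in the thread)
     * useRowsRead == true:  offset = rows read so far (the proposed change)
     */
    static int simulate(int[] tableSizePerWindow, int fetchSize, boolean useRowsRead) {
        int pageNumber = 0;
        int rowsRead = 0;
        for (int size : tableSizePerWindow) {
            long offset = useRowsRead ? rowsRead : (long) fetchSize * pageNumber;
            // Stand-in for "LIMIT fetchSize OFFSET offset" on a table with `size` rows.
            int fetched = (int) Math.max(0, Math.min(fetchSize, size - offset));
            rowsRead += fetched;
            pageNumber++; // incremented every window, even when nothing was fetched
        }
        return rowsRead;
    }

    public static void main(String[] args) {
        // 5 rows exist initially; 3 more rows appear in the fourth window.
        int[] sizes = {5, 5, 5, 8, 8};
        System.out.println("page-based offset: " + simulate(sizes, 5, false));
        System.out.println("rows-read offset:  " + simulate(sizes, 5, true));
    }
}
```

In this toy run the page-based offset has already moved past the end of the table by the time the new rows arrive, so they are never fetched, while the rows-read offset stays at 5 during the idle windows and picks up the 3 new rows. The sketch assumes rows are only appended and that the query returns them in a stable order; it says nothing about how the real operator orders its results.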
