All, We have created a JDBCPollInputOperator with the below features,
1. poll from external jdbc store asynchronously in the input operator. 2. polling frequency and batch size are configurable. 3.User can specify the polling query 4.User can specify the columns to fetch as a part of the result set. 5. It is idempotent and partition-able. 6. Supports both batch + polling behavior. With the above set of features there as some assumptions for idempotency & partitioning, 1.User needs to provide tableName,dbConnection,setEmitColumnList,look-up key. 2.Optionally batchSize,pollInterval,Look-up key and a where clause can be given. 3.This operator uses static partitioning to arrive at range queries for exactly once reads 4.Assumption is that there is an ordered column using which range queries can be formed. 5.If an emitColumnList is provided, please ensure that the keyColumn is the first column in the list 6.Range queries are formed using the JdbcMetaDataUtility Output - comma separated list of the emit columns eg columnA,columnB,columnC Per window the first and the last key processed is saved using the FSWindowDataManager - (<lowerBound,UpperBound>,operatorId,windowId).This (lowerBound,upperBoundPair) is then used for recovery.The queries are constructed using the JDBCMetaDataUtility. JDBCMetaDataUtility A utility class used to retrieve the metadata for a given unique key of a SQL table. This class would emit range queries based on a primary index given. Presently this operator has been tested with MySQL. In the later iterations we intend to support in-clause support to enable exactly once semantics for non-ordered key column(s). Here's a link to the PR, https://github.com/apache/apex-malhar/pull/282 Thanks, Dev