All,

We have created a JDBCPollInputOperator with the below features,

1. poll from external jdbc store asynchronously in the input operator.
2. polling frequency and batch size are configurable.
3.User can specify the polling query
4.User can specify the columns to fetch as a part of the result set.
5. It is idempotent and partition-able.
6. Supports both batch + polling behavior.

With the above set of features there as some assumptions for idempotency &
partitioning,
1.User needs to provide tableName,dbConnection,setEmitColumnList,look-up
key.
2.Optionally batchSize,pollInterval,Look-up key and a where clause can be
given.
3.This operator uses static partitioning to arrive at range queries for
exactly once reads
4.Assumption is that there is an ordered column using which range queries
can be formed.
5.If an emitColumnList is provided, please ensure that the keyColumn is the
first column in the list
6.Range queries are formed using the JdbcMetaDataUtility Output - comma
separated list of the emit columns eg columnA,columnB,columnC

Per window the first and the last key processed is saved using the
FSWindowDataManager - (<lowerBound,UpperBound>,operatorId,windowId).This
(lowerBound,upperBoundPair) is then used for recovery.The queries are
constructed using the JDBCMetaDataUtility.

JDBCMetaDataUtility
A utility class used to retrieve the metadata for a given unique key of a
SQL table. This class would emit range queries based on a primary index
given.

Presently this operator has been tested with MySQL.
In the later iterations we intend to support in-clause support to enable
exactly once semantics for non-ordered key column(s).

Here's a link to the PR,
https://github.com/apache/apex-malhar/pull/282

Thanks,
Dev

Reply via email to