Ashwin Chandra Putta created APEXMALHAR-2037:
------------------------------------------------
Summary: Reconciled jdbc output operator with ability to spool to
disk
Key: APEXMALHAR-2037
URL: https://issues.apache.org/jira/browse/APEXMALHAR-2037
Project: Apache Apex Malhar
Issue Type: New Feature
Reporter: Ashwin Chandra Putta
Assignee: Ashwin Chandra Putta
There are many use cases in which we write tuples to an external system
using JDBC or similar interfaces. At times the external system may be slow or
down. In those cases, the current implementation of the JDBC output operators
fails and restarts until the external system is up again. Meanwhile, the DAG
is slowed down by this operator. To deal with such scenarios, we can write the
output in a reconciled fashion, where a reconciler thread writes at the pace
of the external system. We should also provide the ability to spool the data
to disk when the external system is down or the output operator's queue is
full.
Here are the proposed features for the output operator.
1. Write to the external system in a separate reconciler thread.
2. Queue the tuples in memory for the reconciler thread to consume.
3. Spool the incoming tuples to HDFS using a WAL when the queue reaches its
size limit.
4. Read from the WAL and write to the queue as the queue is being consumed.
5. When the external system is able to consume as fast as the incoming
throughput, the WAL is not written. The queue just buffers the tuples before
they are written to the external system.
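
The five features above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the proposed operator: the class
name SpoolingReconciler and its methods are hypothetical, an in-memory deque
stands in for the HDFS-backed WAL, and a plain Consumer stands in for the
JDBC write.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch of a reconciled writer with spooling; not the Malhar API. */
class SpoolingReconciler<T> {
    private final int capacity;                          // size limit of the in-memory queue
    private final Queue<T> memory = new ArrayDeque<>();  // feature 2: in-memory queue
    private final Queue<T> wal = new ArrayDeque<>();     // assumption: stand-in for a WAL on HDFS
    private final Consumer<T> sink;                      // assumption: stand-in for the JDBC write
    private long spooled;                                // number of tuples that went through the WAL
    private boolean running;
    private Thread reconciler;

    SpoolingReconciler(int capacity, Consumer<T> sink) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.sink = sink;
    }

    /** Called by the operator thread for every incoming tuple. */
    synchronized void put(T tuple) {
        // Feature 3: spool when the queue is full. Also spool while the WAL
        // still holds older tuples, so that arrival order is preserved.
        if (!wal.isEmpty() || memory.size() >= capacity) {
            wal.add(tuple);
            spooled++;
        } else {
            memory.add(tuple);  // feature 5: the WAL is bypassed when the sink keeps up
        }
        notifyAll();
    }

    /** Returns the next tuple, or null once stopped and fully drained. */
    private synchronized T take() throws InterruptedException {
        while (memory.isEmpty() && running) {
            wait();
        }
        T next = memory.poll();
        if (next != null && !wal.isEmpty()) {
            memory.add(wal.poll());  // feature 4: refill the queue from the WAL
        }
        return next;
    }

    /** Feature 1: write to the external system in a separate thread. */
    synchronized void start() {
        running = true;
        reconciler = new Thread(() -> {
            try {
                // The sink is called outside the lock, so a slow external
                // system does not block put().
                for (T t = take(); t != null; t = take()) {
                    sink.accept(t);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reconciler.start();
    }

    /** Lets the reconciler drain what is left, then waits for it to exit. */
    void stop() {
        synchronized (this) {
            running = false;
            notifyAll();
        }
        try {
            reconciler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized long spooledCount() {
        return spooled;
    }
}
```

In this sketch the operator thread only ever calls put(), so it never blocks
on the external system. A real operator would additionally need the WAL to be
durable and tied to checkpointing so that tuples survive an operator restart,
which the sketch does not attempt.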