Github user zentol commented on the issue:
https://github.com/apache/flink/pull/2332
Don't you lose any guarantees regarding the order of mutations the moment you
use asynchronous updates anyway?
The WriteAheadSink should only be used if you want to deal with
non-deterministic programs or want to send data in atomic mini-batches and must
rely on the order of elements. Otherwise there are much simpler solutions.
If you do idempotent updates, the only thing you have to do is write the
data into HBase and make sure that every update sent for a given checkpoint is
acknowledged before that checkpoint is regarded as complete. If you don't wait
for the acknowledgements you lose at-least-once guarantees. This scheme does not
provide exactly-once *delivery* guarantees; however, at any given point in time
the table would be in a state as if the updates were only sent once. This is
the same guarantee that we provide for Cassandra.
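A minimal sketch of that idempotent scheme, in plain Java rather than against the real HBase or Flink APIs. `InMemoryStore`, `putAsync`, `IdempotentSinkSketch`, `invoke` and `snapshot` are hypothetical names chosen for illustration; the only point is that the checkpoint call blocks until every outstanding write has been acknowledged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an asynchronous key-value client (NOT the HBase API). */
class InMemoryStore {
    final Map<String, String> rows = new ConcurrentHashMap<>();

    /** The returned future completes once the write is "acknowledged". */
    CompletableFuture<Void> putAsync(String key, String value) {
        return CompletableFuture.runAsync(() -> rows.put(key, value));
    }
}

/** Sink that sends idempotent puts and waits for all acks at checkpoint time. */
class IdempotentSinkSketch {
    private final InMemoryStore store;
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();

    IdempotentSinkSketch(InMemoryStore store) {
        this.store = store;
    }

    /** Fire the write asynchronously and remember its acknowledgement future. */
    void invoke(String key, String value) {
        pending.add(store.putAsync(key, value));
    }

    /**
     * Called when a checkpoint is taken. join() blocks until every pending
     * write is acknowledged and throws if any of them failed, so the
     * checkpoint cannot be regarded as complete with unacknowledged writes.
     */
    void snapshot() {
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Replaying the same `invoke("k1", "v1")` after a failure simply overwrites the row with the same value, which is why no bookkeeping beyond the acknowledgements is needed here.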
For non-idempotent updates things get a lot more difficult.
If you can fire an entire checkpoint as a single atomic batch you just won
the lottery, as you can use the above scheme and a small auxiliary table to
track completed checkpoints per sink subtask.
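The atomic-batch case could look roughly like the following sketch. It assumes the store can apply a whole batch and record the checkpoint marker in one atomic step, which is modelled here with a `synchronized` method on an in-memory class. `AtomicBatchStore`, `commit` and the `committed` map (standing in for the auxiliary table) are hypothetical names, not part of any real client library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store with non-idempotent counters and an auxiliary checkpoint table. */
class AtomicBatchStore {
    final Map<String, Long> counters = new HashMap<>();
    /** Auxiliary table: sink subtask index -> last committed checkpoint ID. */
    final Map<Integer, Long> committed = new HashMap<>();

    /**
     * Applies every increment of the batch and records the checkpoint as
     * committed in one atomic step (assumption: the real system offers this).
     * Returns false and changes nothing if this subtask already committed
     * this checkpoint, e.g. when the batch is replayed after a failure.
     */
    synchronized boolean commit(int subtask, long checkpointId, List<String> keysToIncrement) {
        Long last = committed.get(subtask);
        if (last != null && last >= checkpointId) {
            return false; // already applied, skip the replayed batch
        }
        for (String key : keysToIncrement) {
            counters.merge(key, 1L, Long::sum);
        }
        committed.put(subtask, checkpointId);
        return true;
    }
}
```

If the marker were written in a separate step after the batch, a failure between the two writes would lead to the batch being applied twice on replay, so the sketch only holds under the stated atomicity assumption.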
If you can't do that you will have to use system-specific
features/guarantees to engineer a solution that provides exactly-once
guarantees. Versioning, rollbacks, unique IDs; something that either allows
you to revert the table to a clean state or to track precisely which updates
were applied and send only the remaining updates.
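As one illustration of the "unique IDs" option, the sketch below tracks which updates were already applied so that a replay only applies the remaining ones. It assumes each update carries a deterministic ID and that the store can check the ID and apply the update atomically; `DedupStore`, `incrementOnce` and the ID format are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical store that remembers the IDs of updates it has applied. */
class DedupStore {
    final Map<String, Long> counters = new HashMap<>();
    final Set<String> appliedIds = new HashSet<>();

    /**
     * Applies the increment unless an update with this ID was applied before.
     * The ID check and the increment are assumed to be atomic in the store.
     */
    synchronized boolean incrementOnce(String updateId, String key, long delta) {
        if (!appliedIds.add(updateId)) {
            return false; // duplicate from a replay, nothing to do
        }
        counters.merge(key, delta, Long::sum);
        return true;
    }
}
```

The set of applied IDs grows without bound in this form; a real solution would need some way to prune it, for example once a checkpoint is known to be complete.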