Arun Mahadevan created STORM-3292:
-------------------------------------

             Summary: Trident HiveState must flush writers when the batch commits
                 Key: STORM-3292
                 URL: https://issues.apache.org/jira/browse/STORM-3292
             Project: Apache Storm
          Issue Type: Improvement
            Reporter: Arun Mahadevan


For Trident, the Hive writer is flushed only once it reaches the batch size.

See:
https://github.com/apache/storm/blob/master/external/storm-hive/src/main/java/org/apache/storm/hive/trident/HiveState.java#L108

Trident HiveState does not flush during the batch commit, which appears to be 
an oversight. Without this flush, the Trident state cannot guarantee 
at-least-once processing. (E.g., if a Hive transaction is still open when 
Trident moves on to the next txid and a failure occurs later, the data in the 
open transaction is lost.)

So I think that, for at-least-once, HiveState must flush all the writers, 
irrespective of the batch size, when Trident invokes "commit(txid)".
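
A minimal sketch of the proposed behavior, to make the intent concrete. It is 
not the storm-hive code: SketchWriter and SketchHiveState are hypothetical 
stand-ins, and "flush" here just moves records from a pending list to a 
committed list, as an assumed model of committing an open Hive transaction. 
The point is only that commit(txid) flushes every writer even when the batch 
size has not been reached.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-partition Hive writer (not the storm-hive API).
class SketchWriter {
    private final List<String> pending = new ArrayList<>();   // written, transaction still open
    private final List<String> committed = new ArrayList<>(); // durable after a flush

    void write(String record) {
        pending.add(record);
    }

    // Models committing the open transaction: pending records become durable.
    void flush() {
        committed.addAll(pending);
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    int committedCount() {
        return committed.size();
    }
}

// Hypothetical, simplified model of a Trident state that buffers writes.
class SketchHiveState {
    private final int batchSize;
    private final Map<String, SketchWriter> writers = new LinkedHashMap<>();
    private int currentBatchSize = 0;

    SketchHiveState(int batchSize) {
        this.batchSize = batchSize;
    }

    // Existing behavior described in the issue: flush only once the batch size is hit.
    void updateState(String partition, List<String> records) {
        SketchWriter writer = writers.computeIfAbsent(partition, p -> new SketchWriter());
        for (String record : records) {
            writer.write(record);
            currentBatchSize++;
        }
        if (currentBatchSize >= batchSize) {
            flushAllWriters();
        }
    }

    // Proposed behavior: always flush on commit, irrespective of the batch size.
    // txId is unused in this sketch; it only mirrors the commit(txid) callback.
    void commit(long txId) {
        flushAllWriters();
    }

    private void flushAllWriters() {
        for (SketchWriter writer : writers.values()) {
            writer.flush();
        }
        currentBatchSize = 0;
    }

    int pendingCount() {
        int total = 0;
        for (SketchWriter writer : writers.values()) {
            total += writer.pendingCount();
        }
        return total;
    }

    int committedCount() {
        int total = 0;
        for (SketchWriter writer : writers.values()) {
            total += writer.committedCount();
        }
        return total;
    }
}
```

With a batch size of 100 and only two records written, nothing is pending in 
this model after commit(txid) returns, so a later failure cannot lose records 
that belong to an already committed txid.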



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
