Sankar Hariappan created HIVE-21530:
---------------------------------------

             Summary: Replicate Streaming ingest on ACID tables.
                 Key: HIVE-21530
                 URL: https://issues.apache.org/jira/browse/HIVE-21530
             Project: Hive
          Issue Type: Sub-task
          Components: repl, Transactions
    Affects Versions: 4.0.0
            Reporter: Sankar Hariappan
            Assignee: mahesh kumar behera
         Attachments: Hive ACID Replication_ Streaming Ingest Tables.pdf

Implement replication of Hive streaming ingest on ACID tables as described in [^Hive ACID Replication_ Streaming Ingest Tables.pdf].
Change txn_commit to include information about the transaction batch.
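
As a sketch of what "information about the transaction batch" might look like in a commit event, the message below carries an optional batch descriptor. Everything here (TxnBatchInfo, CommitTxnMessage, first_txn_id, last_txn_id, closes_batch) is hypothetical and illustrative; it is not Hive's actual txn_commit message format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TxnBatchInfo:
    """Hypothetical descriptor of the streaming transaction batch a commit belongs to."""
    first_txn_id: int
    last_txn_id: int


@dataclass
class CommitTxnMessage:
    """Hypothetical commit event payload; not Hive's actual message class."""
    txn_id: int
    files: List[str] = field(default_factory=list)
    # None for a normal (non-streaming) transaction.
    batch: Optional[TxnBatchInfo] = None

    def closes_batch(self) -> bool:
        # True if this commit is the last transaction of its batch, i.e. a
        # possible point at which the target could clean up side files.
        return self.batch is not None and self.txn_id == self.batch.last_txn_id

    def to_json(self) -> str:
        # asdict() also converts the nested TxnBatchInfo into a plain dict.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "CommitTxnMessage":
        d = json.loads(text)
        batch = TxnBatchInfo(**d["batch"]) if d.get("batch") is not None else None
        return CommitTxnMessage(txn_id=d["txn_id"], files=d["files"], batch=batch)
```

Keeping the batch field optional lets normal transactions keep emitting the same message shape with no batch information.
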
Change the copy task to copy a file only if its size or checksum differs from the copy on the target. This seems specific to transaction batches and should not be used for normal transactions.
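
A minimal sketch of the "copy only on size or checksum difference" rule, using local files as a stand-in for the warehouse filesystem. The helper names are hypothetical; against HDFS or cloud storage the comparison would more likely go through Hadoop's FileSystem.getFileChecksum rather than hashing the bytes locally.

```python
import hashlib
import os
import shutil


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_if_changed(src: str, dst: str) -> bool:
    """Copy src to dst only if dst is missing or differs in size or checksum.

    Returns True if a copy was made, False if dst was already identical.
    """
    if os.path.exists(dst):
        # Cheap check first: a size mismatch means the file definitely changed.
        if os.path.getsize(src) == os.path.getsize(dst):
            # Same size: fall back to comparing content checksums.
            if file_checksum(src) == file_checksum(dst):
                return False
    shutil.copyfile(src, dst)
    return True
```

The size comparison runs first because it is cheap, so checksums are only computed when sizes match. A streaming data file that has grown since the last copy is caught by the size check alone.
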
Copy the data file and its side file in the correct sequence.
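
One possible ordering, sketched under two assumptions: the writer only appends to the data file, and it records a length in the side file only after the corresponding bytes have been flushed. Under those assumptions, snapshotting the side file before copying the data file, and writing the side file on the target last, keeps the target's recorded length from exceeding the copied data. The function name is hypothetical.

```python
import shutil


def copy_data_and_side_file(src_data: str, src_side: str,
                            dst_data: str, dst_side: str) -> None:
    """Hypothetical ordering for copying a streaming data file with its side file."""
    # 1. Snapshot the side file first: every length it records refers to
    #    bytes that are already present in the source data file.
    with open(src_side, "rb") as f:
        side_snapshot = f.read()
    # 2. Copy the data file. Under the append-only assumption it is now at
    #    least as long as any length in the snapshot.
    shutil.copyfile(src_data, dst_data)
    # 3. Publish the side file last, so a reader on the target never sees a
    #    recorded length that exceeds the data copied so far.
    with open(dst_side, "wb") as f:
        f.write(side_snapshot)
```

Copying in the opposite order (data file first, then side file) could let the side file record a length written after the data copy finished, which the target's data file would not contain.
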
Remove the side files (which appear to be suffixed with _flush in their file names) when the batch is committed.
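
A minimal cleanup sketch, assuming side files can be recognized by a file-name suffix. The ticket only says the suffix looks like _flush; the default "_flush_length" below is an assumption based on how Hive's ORC ACID writer appears to name these files. The function name and the batch_committed flag are hypothetical.

```python
import os
from typing import List

# Assumption: side files end with this suffix (the ticket only says "_flush").
SIDE_FILE_SUFFIX = "_flush_length"


def remove_side_files(partition_dir: str, batch_committed: bool,
                      suffix: str = SIDE_FILE_SUFFIX) -> List[str]:
    """Delete side files under partition_dir once the batch has committed.

    Returns the sorted paths that were removed. Nothing is removed while the
    batch is still open, because readers need the side files to find the
    valid length of each data file.
    """
    if not batch_committed:
        return []
    removed = []
    for root, _dirs, files in os.walk(partition_dir):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```
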
Determine how to make these events idempotent: replaying an event should update the corresponding table and partition without copying a new version of the file.
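
A toy model of one way to get that idempotency, assuming each commit event records the committed length of every file it touched. A replayed event then updates metadata with a max (a no-op on repeat) and copies a file only when the event records more bytes than the target already holds, so replaying an older event never pulls in a newer version of a growing file. PartitionState and apply_commit_event are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartitionState:
    """Hypothetical target-side bookkeeping for one table partition."""
    last_txn_id: int = 0
    # File name -> number of bytes already copied to the target.
    copied_lengths: Dict[str, int] = field(default_factory=dict)


def apply_commit_event(part: PartitionState, txn_id: int,
                       committed_lengths: Dict[str, int]) -> List[str]:
    """Apply a commit event and return the names of files that needed copying.

    Replaying an already-applied event changes nothing: the metadata update
    takes a maximum, and a file is only copied when the event records more
    committed bytes than the target already holds.
    """
    part.last_txn_id = max(part.last_txn_id, txn_id)
    to_copy = []
    for name, length in committed_lengths.items():
        if part.copied_lengths.get(name, -1) < length:
            # Stand-in for copying bytes [0, length) of the source file.
            part.copied_lengths[name] = length
            to_copy.append(name)
    return to_copy
```
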
Validate that partially copied data files are handled correctly on the target warehouse, given the correct side file. Can the side file be left in place forever? If the primary warehouse fails during a transaction batch copy, after only some of the transactions have been copied, we won't be able to remove the _flush file. Do we have to handle this on failover?
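
To validate a partially copied data file against its side file, the target needs the last length recorded in the side file. The sketch below assumes, as Hive's ORC ACID writer appears to do, that the side file is a sequence of 8-byte lengths appended on each flush in big-endian order (the order Java's DataOutputStream.writeLong uses). A trailing partial record, for example from an interrupted copy, is ignored. The function names, and the choice to treat an empty side file as "no recorded length", are assumptions.

```python
import struct
from typing import Optional


def last_flush_length(side_file_bytes: bytes) -> Optional[int]:
    """Return the last complete 8-byte big-endian length, or None if there is none."""
    n = len(side_file_bytes) // 8  # number of complete 8-byte records
    if n == 0:
        return None
    (value,) = struct.unpack_from(">q", side_file_bytes, (n - 1) * 8)
    return value


def is_data_file_usable(data_file_size: int, side_file_bytes: bytes) -> bool:
    """A partially copied data file is readable up to the recorded length,
    provided it holds at least that many bytes."""
    length = last_flush_length(side_file_bytes)
    if length is None:
        # Assumption: no recorded length means the whole file is treated as valid.
        return True
    return data_file_size >= length
```

Because only complete records are read, a side file left behind after a failover still yields a usable length as long as its earlier records were copied intact.
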



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
