sankarh commented on a change in pull request #579: HIVE-21109 : Support stats
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269154738
##########
File path:
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
##########
@@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
 return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
}
+ /**
+ * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+ * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
+ * read the files, and thus treats both open and aborted transactions as invalid.
+ *
+ * This API is used by Hive replication which may have multiple transactions open at a time.
+ *
+ * @param txns open txn list from the metastore
+ * @param currentTxns Current transactions that the replication has opened. If any of the
+ * transactions is greater than 0 it will be removed from the exceptions
+ * list so that the replication sees its own transaction as valid.
+ * @return a valid txn list.
+ */
+ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
Review comment:
Treating every txn opened in a batch by an open txn event as a current txn
is incorrect.
The repl task opens multiple txns only when replicating the Hive Streaming
case, where we allocate a batch of txns but use one at a time. We also don't
update stats in that case. Even if we did update stats, they should refer to
a single txn as the current txn, with the rest of the txns left open.
We should also remove the replTxnIds cache in TxnManager. All callers should
create a hardcoded ValidWriteIdList using the writeId received from the event
msg.
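To illustrate the last point, here is a minimal sketch of what building a snapshot from a single event write ID could look like. SimpleWriteIdList and forEventWriteId are hypothetical stand-ins, not Hive's ValidWriteIdList API, and the choice of an empty exceptions list with the event's writeId as the high watermark is an assumption about what "hardcoded" means here.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a write ID snapshot; NOT Hive's ValidWriteIdList. */
final class SimpleWriteIdList {
    private final String tableName;
    private final long[] exceptions;   // sorted open/aborted write IDs that readers must skip
    private final long highWatermark;  // highest write ID visible to this snapshot

    SimpleWriteIdList(String tableName, long[] exceptions, long highWatermark) {
        this.tableName = tableName;
        this.exceptions = exceptions.clone();
        Arrays.sort(this.exceptions);
        this.highWatermark = highWatermark;
    }

    /**
     * Builds a snapshot pinned to the one write ID carried by a replication event.
     * Assumption: no exceptions are needed, and the event's write ID is the high watermark.
     */
    static SimpleWriteIdList forEventWriteId(String tableName, long writeId) {
        if (writeId <= 0) {
            throw new IllegalArgumentException("writeId must be positive: " + writeId);
        }
        return new SimpleWriteIdList(tableName, new long[0], writeId);
    }

    /** A write ID is visible if it is at or below the watermark and not an exception. */
    boolean isWriteIdValid(long writeId) {
        return writeId <= highWatermark && Arrays.binarySearch(exceptions, writeId) < 0;
    }

    String getTableName() {
        return tableName;
    }

    long getHighWatermark() {
        return highWatermark;
    }
}
```

With this shape the caller needs no cache of open replication txns: the only input is the table name and the writeId taken from the event msg.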
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services