[ https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-18772:
----------------------------------
    Description: 
Make the Acid Cleaner use MIN_HISTORY_LEVEL instead of Lock Manager state, which is what it currently relies on.
This will eliminate possible race conditions.
See this [comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]

Suppose A is the set of all ValidTxnLists across all active readers. Each ValidTxnList has a minOpenTxnId.
MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all currently active readers.
This means that no active transaction in the system sees any txn with txnid < X as open.
Therefore, if we construct a ValidTxnIdList with HWM = X-1 and use it in getAcidState(), any files that this call determines to be 'obsolete' will be seen as obsolete by every existing or future reader, i.e. they can be physically deleted.

This is also necessary for multi-statement transactions, where relying on the state of the Lock Manager is not sufficient. For example:
Suppose txn 17 starts at t1 and sees txn 13 (with writeId 13) as open.
13 commits (via its parent txn) at t2 > t1. (17 is still running.)
Compaction runs at t3 > t2 and produces base_14 (or, for example, delta_10_14) on Table1/Part1. (17 is still running.)
Now delta_13 may be cleaned, since it can be seen as obsolete and there may be no locks on it, i.e. no one is reading it.
Now, at t4 > t3, 17 (a multi-statement txn) may need to read Table1/Part1. It cannot use base_14, since that may have absorbed delete events from delete_delta_14.
Using MIN_HISTORY_LEVEL solves this: while 17 is running, X <= 13, so the Cleaner uses HWM <= 12, does not treat base_14 as visible, and therefore does not consider the files it replaced obsolete.
    was: 
Instead of using Lock Manager state as it currently does.
This will eliminate possible race conditions

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---------------------------------------
>
>                 Key: HIVE-18772
>                 URL: https://issues.apache.org/jira/browse/HIVE-18772
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
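The Cleaner rule described in this issue (X = min(minOpenTxnId) over active readers, HWM = X-1, delete only what getAcidState() then reports as obsolete) can be illustrated with a small Python model. This is a hypothetical sketch, not Hive's Cleaner or its getAcidState() implementation. MIN_HISTORY_LEVEL is modeled as a dict from reader txn id to that reader's minOpenTxnId. Write IDs are assumed equal to txn IDs, as in the example where txn 13 has writeId 13. `obsolete_dirs()` is a simplified stand-in for getAcidState(). The helper names, the `next_txn_id` fallback used when no reader is registered, and the Table1/Part1 directory listing (including a made-up earlier minor-compacted `delta_11_12`) are all illustrative assumptions.

```python
# Hypothetical sketch of the HIVE-18772 idea, not Hive's Cleaner code.
# Assumes write IDs equal txn IDs and dirs named base_N / delta_M_N /
# delete_delta_M_N (statement-id suffixes and zero padding are ignored).
from __future__ import annotations

import re
from dataclasses import dataclass

_DIR = re.compile(r"^(base|delta|delete_delta)_(\d+)(?:_(\d+))?$")


@dataclass(frozen=True)
class AcidDir:
    name: str
    kind: str  # "base", "delta" or "delete_delta"
    min_id: int
    max_id: int


def parse(name: str) -> AcidDir:
    m = _DIR.match(name)
    if m is None:
        raise ValueError(f"not an ACID directory: {name}")
    kind, lo = m.group(1), int(m.group(2))
    if kind == "base":
        return AcidDir(name, kind, 0, lo)  # base_N holds everything <= N
    hi = int(m.group(3)) if m.group(3) else lo  # delta_13 == delta_13_13
    return AcidDir(name, kind, lo, hi)


def cleaner_hwm(min_history: dict[int, int], next_txn_id: int) -> int:
    """Return X - 1, where X = min(minOpenTxnId) over all active readers.

    min_history models MIN_HISTORY_LEVEL as {reader txn id: minOpenTxnId}.
    With no active readers, X falls back to the next txn id to allocate.
    """
    x = min(min_history.values(), default=next_txn_id)
    return x - 1


def obsolete_dirs(names: list[str], hwm: int) -> list[str]:
    """Simplified stand-in for getAcidState(): dirs that a snapshot ending
    at hwm would never read. Nothing with max_id > hwm is ever returned."""
    dirs = [parse(n) for n in names]
    # Newest base fully visible at this HWM (-1 if there is none).
    best_base = max((d.max_id for d in dirs
                     if d.kind == "base" and d.max_id <= hwm), default=-1)
    result = []
    for d in dirs:
        if d.kind == "base":
            if d.max_id < best_base:  # superseded by a newer base
                result.append(d.name)
        elif d.max_id <= best_base:  # already absorbed into that base
            result.append(d.name)
        elif any(o.kind == d.kind and o.max_id <= hwm
                 and o.min_id <= d.min_id and d.max_id <= o.max_id
                 and (o.min_id, o.max_id) != (d.min_id, d.max_id)
                 for o in dirs):  # covered by a wider compacted delta
            result.append(d.name)
    return result


# The issue's timeline on Table1/Part1: txn 17 started at t1 seeing txn 13
# as open, and base_14 was produced at t3 while 17 is still running.
part1 = ["delta_11_11", "delta_12_12", "delta_11_12", "delta_13_13",
         "delta_14_14", "delete_delta_14_14", "base_14"]

# No reader registered (like finding no locks): everything under base_14 goes.
print(obsolete_dirs(part1, cleaner_hwm({}, next_txn_id=18)))
# -> ['delta_11_11', 'delta_12_12', 'delta_11_12', 'delta_13_13', 'delta_14_14', 'delete_delta_14_14']

# MIN_HISTORY_LEVEL = {17: 13} gives HWM 12: only deltas that delta_11_12 covers go.
print(obsolete_dirs(part1, cleaner_hwm({17: 13}, next_txn_id=18)))
# -> ['delta_11_11', 'delta_12_12']
```

The design choice this illustrates: the Cleaner never reasons about who holds locks. It only evaluates obsolescence against a snapshot old enough (HWM = X-1) that every current or future reader agrees with it, so txn 17 can still find the deltas it needs below base_14.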