[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-09-04 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Attachment: HIVE-18772.02.patch

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch
>
>
> Instead of using Lock Manager state as it currently does.
> This will eliminate possible race conditions.
> See this 
> [comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]
> Suppose A is the set of all ValidTxnList objects across all active readers.  Each 
> ValidTxnList has a minOpenTxnId.
> MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all 
> currently active readers.
> This means that no active transaction in the system sees any txn with txnid < 
> X as open.
> This means that if we construct a ValidTxnIdList with HWM=X-1 and use it in 
> getAcidState(), any files that this call determines to be 'obsolete' will be seen 
> as obsolete by any existing/future reader, i.e. they can be physically deleted.
> This is also necessary for multi-statement transactions, where relying on the 
> state of the Lock Manager is not sufficient.  For example:
> Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 as open.
> 13 commits (via its parent txn) at t2 > t1.  (17 is still running.)
> Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on 
> Table1/Part1.  (17 is still running.)
> Now delta_13 may be cleaned, since it can be seen as obsolete and there may be 
> no locks on it, i.e. no one is reading it.
> Now at t4 > t3, txn 17 (a multi-statement txn) may need to read Table1/Part1. It 
> cannot use base_14, as that may have absorbed delete events from delete_delta_14.
> Using MIN_HISTORY_LEVEL solves this.
> See the description of HIVE-18747 for more details on MIN_HISTORY_LEVEL.
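
The reasoning in the description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hive's implementation: ReaderSnapshot, cleaner_hwm, obsolete_under_hwm and the next_txn_id fallback are hypothetical names, directories are reduced to (min, max) id ranges, and txn ids and write ids are treated as interchangeable, as the txn 17 / txn 13 example does.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # (min id, max id) covered by a delta or a compacted file


@dataclass(frozen=True)
class ReaderSnapshot:
    """Hypothetical stand-in for one active reader's ValidTxnList."""
    txn_id: int
    min_open_txn_id: int


def cleaner_hwm(readers: Iterable[ReaderSnapshot], next_txn_id: int) -> int:
    """Return X - 1, where X = min(minOpenTxnId) over all active readers.

    With no active readers, fall back to next_txn_id (an assumption of this
    sketch) so that everything already allocated is below X.
    """
    x = min((r.min_open_txn_id for r in readers), default=next_txn_id)
    return x - 1


def obsolete_under_hwm(deltas: List[Range], compacted: List[Range], hwm: int) -> List[Range]:
    """Deltas fully covered by a compacted range that is visible at hwm."""
    visible = [c for c in compacted if c[1] <= hwm]
    return [d for d in deltas
            if any(c[0] <= d[0] and d[1] <= c[1] for c in visible)]


# Txn 17 is still running and saw 13 as open, so X = 13 and HWM = 12.
hwm_while_17_runs = cleaner_hwm([ReaderSnapshot(17, 13)], next_txn_id=18)
# The compacted range (10, 14) is not visible at HWM 12, so delta_13 is kept.
print(obsolete_under_hwm([(13, 13)], [(10, 14)], hwm_while_17_runs))

# Once 17 has ended, HWM moves to 17 and delta_13 becomes removable.
hwm_after_17 = cleaner_hwm([], next_txn_id=18)
print(obsolete_under_hwm([(13, 13)], [(10, 14)], hwm_after_17))
```

The point of the sketch is that the decision depends only on what the oldest active reader can see, not on whether a lock happens to be held at the moment the Cleaner runs.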



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-09-04 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Attachment: HIVE-18772.02.patch

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch, 
> HIVE-18772.02.patch
>
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-09-10 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Status: Open  (was: Patch Available)

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch, 
> HIVE-18772.02.patch, HIVE-18772.03.patch
>
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-09-10 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Attachment: HIVE-18772.03.patch

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch, 
> HIVE-18772.02.patch, HIVE-18772.03.patch
>
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-10-19 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Description: 
Instead of using Lock Manager state as it currently does.
This will eliminate possible race conditions.

See this 
[comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]

Suppose A is the set of all ValidTxnList objects across all active readers.  Each 
ValidTxnList has a minOpenTxnId.
MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all 
currently active readers.

This means that no active transaction in the system sees any txn with txnid < X 
as open.
This means that if we construct a ValidTxnIdList with HWM=X-1 and use it in 
getAcidState(), any files that this call determines to be 'obsolete' will be seen 
as obsolete by any existing/future reader, i.e. they can be physically deleted.

This is also necessary for multi-statement transactions, where relying on the 
state of the Lock Manager is not sufficient.  For example:

Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 as open.
13 commits (via its parent txn) at t2 > t1.  (17 is still running.)
Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on 
Table1/Part1.  (17 is still running.)
Now delta_13 may be cleaned, since it can be seen as obsolete and there may be 
no locks on it, i.e. no one is reading it.
Now at t4 > t3, txn 17 (a multi-statement txn) may need to read Table1/Part1. It 
cannot use base_14, as that may have absorbed delete events from delete_delta_14.

Another Use Case
There are delta_1_1 and delta_2_2 on disk, both created by committed txns.
T5 starts reading these.  At the same time, the Compactor creates delta_1_2.
Now the Cleaner sees delta_1_1 and delta_2_2 as obsolete and may remove them while 
the read is still in progress.  This is because the Compactor itself is not running 
in a txn, and the files that it produces are visible immediately.  If it ran in a 
txn, the new files would only be visible once this txn is visible to others 
(including the Cleaner).

Using MIN_HISTORY_LEVEL solves this.

See the description of HIVE-18747 for more details on MIN_HISTORY_LEVEL.
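
To make the directory naming in the use case above concrete, here is a small sketch of how a compacted delta covers the deltas it was built from. It assumes the simplified names used in the description (base_N, delta_N, delta_MIN_MAX, delete_delta_N); real directory names carry more detail, and write_id_range and covered_by are hypothetical helpers, not Hive APIs.

```python
import re
from typing import List, Optional, Tuple

# Simplified directory names as written in the description.
_DIR = re.compile(r"^(base|delta|delete_delta)_(\d+)(?:_(\d+))?$")


def write_id_range(name: str) -> Optional[Tuple[int, int]]:
    """Return the (min, max) id range a directory name stands for, or None."""
    m = _DIR.match(name)
    if m is None:
        return None
    kind, first, second = m.group(1), int(m.group(2)), m.group(3)
    if kind == "base":
        # A base contains everything up to and including its id.
        return (0, first)
    return (first, int(second) if second is not None else first)


def covered_by(compacted: str, candidates: List[str]) -> List[str]:
    """Names in candidates whose range lies entirely inside compacted's range."""
    rng = write_id_range(compacted)
    if rng is None:
        return []
    lo, hi = rng
    covered = []
    for name in candidates:
        r = write_id_range(name)
        if r is not None and name != compacted and lo <= r[0] and r[1] <= hi:
            covered.append(name)
    return covered


# delta_1_2 makes delta_1_1 and delta_2_2 redundant, but not delta_3_3.
print(covered_by("delta_1_2", ["delta_1_1", "delta_2_2", "delta_1_2", "delta_3_3"]))
```

Coverage alone only says which directories are redundant; whether they can be deleted while T5 is still reading is the question the description is about.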


  was:
Instead of using Lock Manager state as it currently does.
This will eliminate possible race conditions.

See this 
[comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]

Suppose A is the set of all ValidTxnList objects across all active readers.  Each 
ValidTxnList has a minOpenTxnId.
MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all 
currently active readers.

This means that no active transaction in the system sees any txn with txnid < X 
as open.
This means that if we construct a ValidTxnIdList with HWM=X-1 and use it in 
getAcidState(), any files that this call determines to be 'obsolete' will be seen 
as obsolete by any existing/future reader, i.e. they can be physically deleted.

This is also necessary for multi-statement transactions, where relying on the 
state of the Lock Manager is not sufficient.  For example:

Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 as open.
13 commits (via its parent txn) at t2 > t1.  (17 is still running.)
Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on 
Table1/Part1.  (17 is still running.)
Now delta_13 may be cleaned, since it can be seen as obsolete and there may be 
no locks on it, i.e. no one is reading it.
Now at t4 > t3, txn 17 (a multi-statement txn) may need to read Table1/Part1. It 
cannot use base_14, as that may have absorbed delete events from delete_delta_14.

Using MIN_HISTORY_LEVEL solves this.

See the description of HIVE-18747 for more details on MIN_HISTORY_LEVEL.



> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch, 
> HIVE-18772.02.patch, HIVE-18772.03.patch
>
>

[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-11-01 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Attachment: HIVE-18772.04.patch

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch, 
> HIVE-18772.02.patch, HIVE-18772.03.patch, HIVE-18772.04.patch
>
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-08-22 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Description: 
Instead of using Lock Manager state as it currently does.

This will eliminate possible race conditions



  was:
Instead of using Lock Manager state as it currently does.

This will eliminate possible race conditions


> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-08-22 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Target Version/s: 4.0.0  (was: 3.0.0)

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-08-22 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Description: 
Instead of using Lock Manager state as it currently does.
This will eliminate possible race conditions.

See this 
[comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]

Suppose A is the set of all ValidTxnList objects across all active readers.  Each 
ValidTxnList has a minOpenTxnId.
MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all 
currently active readers.

This means that no active transaction in the system sees any txn with txnid < X 
as open.
This means that if we construct a ValidTxnIdList with HWM=X-1 and use it in 
getAcidState(), any files that this call determines to be 'obsolete' will be seen 
as obsolete by any existing/future reader, i.e. they can be physically deleted.

This is also necessary for multi-statement transactions, where relying on the 
state of the Lock Manager is not sufficient.  For example:

Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 as open.
13 commits (via its parent txn) at t2 > t1.  (17 is still running.)
Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on 
Table1/Part1.  (17 is still running.)
Now delta_13 may be cleaned, since it can be seen as obsolete and there may be 
no locks on it, i.e. no one is reading it.
Now at t4 > t3, txn 17 (a multi-statement txn) may need to read Table1/Part1. It 
cannot use base_14, as that may have absorbed delete events from delete_delta_14.

Using MIN_HISTORY_LEVEL solves this.


  was:
Instead of using Lock Manager state as it currently does.

This will eliminate possible race conditions




> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-08-22 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Description: 
Instead of using Lock Manager state as it currently does.
This will eliminate possible race conditions.

See this 
[comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]

Suppose A is the set of all ValidTxnList objects across all active readers.  Each 
ValidTxnList has a minOpenTxnId.
MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all 
currently active readers.

This means that no active transaction in the system sees any txn with txnid < X 
as open.
This means that if we construct a ValidTxnIdList with HWM=X-1 and use it in 
getAcidState(), any files that this call determines to be 'obsolete' will be seen 
as obsolete by any existing/future reader, i.e. they can be physically deleted.

This is also necessary for multi-statement transactions, where relying on the 
state of the Lock Manager is not sufficient.  For example:

Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 as open.
13 commits (via its parent txn) at t2 > t1.  (17 is still running.)
Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on 
Table1/Part1.  (17 is still running.)
Now delta_13 may be cleaned, since it can be seen as obsolete and there may be 
no locks on it, i.e. no one is reading it.
Now at t4 > t3, txn 17 (a multi-statement txn) may need to read Table1/Part1. It 
cannot use base_14, as that may have absorbed delete events from delete_delta_14.

Using MIN_HISTORY_LEVEL solves this.

See the description of HIVE-18747 for more details on MIN_HISTORY_LEVEL.


  was:
Instead of using Lock Manager state as it currently does.
This will eliminate possible race conditions.

See this 
[comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]

Suppose A is the set of all ValidTxnList objects across all active readers.  Each 
ValidTxnList has a minOpenTxnId.
MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all 
currently active readers.

This means that no active transaction in the system sees any txn with txnid < X 
as open.
This means that if we construct a ValidTxnIdList with HWM=X-1 and use it in 
getAcidState(), any files that this call determines to be 'obsolete' will be seen 
as obsolete by any existing/future reader, i.e. they can be physically deleted.

This is also necessary for multi-statement transactions, where relying on the 
state of the Lock Manager is not sufficient.  For example:

Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 as open.
13 commits (via its parent txn) at t2 > t1.  (17 is still running.)
Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on 
Table1/Part1.  (17 is still running.)
Now delta_13 may be cleaned, since it can be seen as obsolete and there may be 
no locks on it, i.e. no one is reading it.
Now at t4 > t3, txn 17 (a multi-statement txn) may need to read Table1/Part1. It 
cannot use base_14, as that may have absorbed delete events from delete_delta_14.

Using MIN_HISTORY_LEVEL solves this.



> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
>

[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-08-24 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Attachment: HIVE-18772.01.patch

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch
>
>


[jira] [Updated] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-08-24 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18772:
--
Status: Patch Available  (was: Open)

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch
>
>