[ 
https://issues.apache.org/jira/browse/TEPHRA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15640677#comment-15640677
 ] 

ASF GitHub Bot commented on TEPHRA-35:
--------------------------------------

Github user anew commented on the issue:

    https://github.com/apache/incubator-tephra/pull/19
  
    Question: Suppose I start a transaction, which times out, and therefore 
goes into the invalid list. A little later HBase performs a major compaction. 
This transaction and all its writes are removed from the table by the 
DataJanitor. A little later TxManager prunes its invalid transactions, and 
because this tx has been removed from HBase, it removes it from the invalid 
list. 
    
    The problem arises if the program that started the transaction is still 
running. What if it performs another write after the invalid list has been 
pruned? That write belongs to an invalid transaction, but because the 
transaction is no longer in the invalid list, the write becomes visible. 
    
    Isn't that a problem?
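
    To make the sequence concrete, here is a minimal toy simulation of the 
scenario above. It is only a sketch: ToyTxManager, ToyTable and 
simulate_late_write are hypothetical names, not Tephra or HBase APIs, and 
visibility is reduced to "not in the invalid list" (real visibility checks 
also consider in-progress transactions and the read pointer).

```python
class ToyTxManager:
    """Hypothetical stand-in for the transaction manager's invalid list."""

    def __init__(self):
        self.invalid = set()

    def invalidate(self, tx_id):
        self.invalid.add(tx_id)

    def prune(self, tx_ids):
        self.invalid -= set(tx_ids)

    def is_visible(self, tx_id):
        # Simplifying assumption: a version is visible unless its tx is invalid.
        return tx_id not in self.invalid


class ToyTable:
    """Hypothetical stand-in for a table whose cell versions are tx ids."""

    def __init__(self):
        self.cells = []  # list of (row, tx_id, value)

    def put(self, row, tx_id, value):
        self.cells.append((row, tx_id, value))

    def major_compact(self, invalid):
        # Mimics the janitor: drop every cell written by an invalid tx.
        self.cells = [c for c in self.cells if c[1] not in invalid]

    def read(self, row, tx_manager):
        return [v for (r, t, v) in self.cells
                if r == row and tx_manager.is_visible(t)]


def simulate_late_write():
    tm, table = ToyTxManager(), ToyTable()
    tx = 42
    table.put("row1", tx, "early")     # client writes inside the transaction
    tm.invalidate(tx)                  # transaction times out
    before = table.read("row1", tm)    # invalid write is hidden
    table.major_compact(tm.invalid)    # compaction removes the tx's data
    tm.prune({tx})                     # tx is pruned from the invalid list
    table.put("row1", tx, "late")      # client is still running and writes again
    after = table.read("row1", tm)     # late write is no longer filtered
    return before, after
```

    In this toy model the write made before pruning is hidden, while the 
write made after pruning is returned to readers.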


> Prune invalid transaction set once all data for a given invalid transaction 
> has been dropped
> --------------------------------------------------------------------------------------------
>
>                 Key: TEPHRA-35
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-35
>             Project: Tephra
>          Issue Type: New Feature
>            Reporter: Gary Helmling
>            Assignee: Poorna Chandra
>            Priority: Blocker
>         Attachments: ApacheTephraAutomaticInvalidListPruning-v2.pdf
>
>
> In addition to dropping the data from invalid transactions, we need to be 
> able to prune from the invalid set any transactions whose data cleanup has 
> been completed. Without this, the invalid set will grow indefinitely and 
> impose an ever greater cost on in-progress transactions over time.
>
> To do this correctly, the TransactionDataJanitor coprocessor will need to 
> maintain some bookkeeping for the transaction data that it removes, so that 
> the transaction manager can reason about when all of a given transaction's 
> data has been removed. Only at this point can the transaction manager safely 
> drop the transaction ID from the invalid set.
>
> One approach would be for the TransactionDataJanitor to update a table 
> marking when a major compaction was performed on a region and which 
> transaction IDs were filtered out. Once all regions in a table containing the 
> transaction data have been compacted, we can remove the filtered-out 
> transaction IDs from the invalid set. However, this will need to cope with 
> changing region names due to splits, etc.
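
The bookkeeping approach described in the issue could be sketched roughly as 
follows. This is an illustration under stated assumptions, not the design in 
the attached document: prunable_ids and compaction_log are hypothetical names, 
and the sketch assumes each region records the set of invalid transaction IDs 
its last major compaction filtered against.

```python
def prunable_ids(invalid_ids, current_regions, compaction_log):
    """Return the invalid tx ids that every current region has compacted away.

    invalid_ids     -- set of tx ids currently in the invalid set
    current_regions -- set of region names that currently make up the table
    compaction_log  -- dict: region name -> set of invalid tx ids that the
                       region's last major compaction filtered against
                       (assumed bookkeeping written by the janitor)
    """
    if not current_regions:
        # Conservative choice for this sketch: with no region information,
        # prune nothing.
        return set()

    prunable = set(invalid_ids)
    for region in current_regions:
        # A region with no record (for example, one created by a split)
        # contributes an empty set and therefore blocks pruning.
        prunable &= compaction_log.get(region, set())
    return prunable
```

Taking the intersection over the current regions means that a newly split 
region without a compaction record holds back pruning until it has been 
compacted, which is one simple way to cope with changing region names.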



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
