[jira] [Updated] (RATIS-2094) TransactionContext's stateMachineLogEntry and stateMachineContext may cause corruption

Duong (Jira) Wed, 22 May 2024 18:07:05 -0700


     [ 
https://issues.apache.org/jira/browse/RATIS-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Duong updated RATIS-2094:
-------------------------
    Description: 
stateMachineLogEntry and stateMachineContext are parsed/created from 
RaftClientRequest or LogEntryProto and attached to TransactionContext in the 
StateMachine.startTransaction methods.

There are two variants of StateMachine.startTransaction:

1. startTransaction(RaftClientRequest): This is called only on the leader side. 
The result of this method is not cached and is passed temporarily alongside 
RaftClientRequest for further processing; for example, it is used by 
StateMachine.write. 

2. startTransaction(LogEntryProto, RaftPeerRole): This is called on both the 
leader and the follower side. The result of this call is cached on the node.
 * On the leader: this is called right before applyTransaction to produce a 
TransactionContext for StateMachine.applyTransaction.

 * On the follower: this is called when the appendEntries request is received. 
The resulting TransactionContext is cached to be used by StateMachine.write and 
then StateMachine.applyTransaction.
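
To make the two lifetimes concrete, here is a toy model with hypothetical names (TxLifecycleSketch, Ctx, Role are NOT the Ratis StateMachine API). It only captures the point that matters here: the request-based variant returns a context that nobody stores, while the entry-based variant stores the context by log index, so on the follower it lives from appendEntries through write until applyTransaction.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model with hypothetical names; it is NOT the Ratis StateMachine API.
class TxLifecycleSketch {
  enum Role { LEADER, FOLLOWER }

  // Stand-in for TransactionContext: records which variant created it.
  static final class Ctx {
    final long index;
    final String origin;
    Ctx(long index, String origin) { this.index = index; this.origin = origin; }
  }

  // Contexts built from log entries, cached by log index until applyTransaction.
  private final Map<Long, Ctx> cached = new HashMap<>();

  // Variant 1 (leader only): built from the client request and handed back to the
  // caller; nothing keeps it once the request has been processed.
  Ctx startTransaction(String clientRequest) {
    return new Ctx(-1, "request: " + clientRequest);
  }

  // Variant 2 (leader and follower): built from a log entry and cached on the node.
  Ctx startTransaction(long index, Role role) {
    return cached.computeIfAbsent(index, i -> new Ctx(i, "entry from " + role));
  }

  // Follower: write looks the cached context up again later...
  Ctx write(long index) { return cached.get(index); }

  // ...and applyTransaction is its last user, so the context is dropped here.
  Ctx applyTransaction(long index) { return cached.remove(index); }

  int cachedCount() { return cached.size(); }

  public static void main(String[] args) {
    TxLifecycleSketch sm = new TxLifecycleSketch();
    sm.startTransaction("put k v");          // leader request path: not cached
    sm.startTransaction(7L, Role.FOLLOWER);  // appendEntries received on a follower
    System.out.println(sm.cachedCount());    // 1: the context outlives the request
    sm.write(7L);
    sm.applyTransaction(7L);
    System.out.println(sm.cachedCount());    // 0
  }
}
```

Whatever the cached context points into must therefore stay valid for that whole window.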

The startTransaction methods are called with the RaftClientRequest or 
LogEntryProto parsed directly from the original zero-copy buffers. In turn, the 
stateMachineLogEntry and stateMachineContext (which are parsed/created from 
them) can contain data that references the original zero-copy buffer directly, 
without an explicit reference counter.
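
What "references the original buffer directly" means can be shown with plain java.nio.ByteBuffer (this is an illustration, not Ratis code): a slice shares memory with its source, so when the source buffer is recycled for another request, the slice silently shows the new bytes, whereas a copy keeps the old ones. The 3-byte "header" and the reuse() helper are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrates zero-copy views with java.nio.ByteBuffer; not Ratis code.
class ZeroCopyViewSketch {
  static ByteBuffer ascii(String s) {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));
  }

  // A view of [offset, offset + len) that shares memory with buf: no bytes are copied.
  static ByteBuffer zeroCopyView(ByteBuffer buf, int offset, int len) {
    ByteBuffer dup = buf.duplicate();  // own position/limit, same backing memory
    dup.position(offset);
    dup.limit(offset + len);
    return dup.slice();
  }

  // The same bytes copied into a fresh array: safe to keep after buf is reused.
  static ByteBuffer copyOf(ByteBuffer buf, int offset, int len) {
    byte[] out = new byte[len];
    ByteBuffer dup = buf.duplicate();
    dup.position(offset);
    dup.get(out);
    return ByteBuffer.wrap(out);
  }

  // Hypothetical helper: simulates the buffer being released and refilled by the next request.
  static void reuse(ByteBuffer buf, String next) {
    buf.clear();
    buf.put(next.getBytes(StandardCharsets.US_ASCII));
  }

  static String asString(ByteBuffer view) {
    ByteBuffer dup = view.duplicate();
    byte[] bytes = new byte[dup.remaining()];
    dup.get(bytes);
    return new String(bytes, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    ByteBuffer network = ascii("HDRhello");         // 3-byte header + payload
    ByteBuffer view = zeroCopyView(network, 3, 5);  // like a parsed stateMachineLogEntry
    ByteBuffer copy = copyOf(network, 3, 5);
    reuse(network, "HDRjelly");                     // original buffer recycled
    System.out.println(asString(view));             // jelly
    System.out.println(asString(copy));             // hello
  }
}
```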

 

For the stateMachineCache=false use case, this fortunately does not cause 
corruption, because the LogEntries linked to the original buffers are cached in 
LogCache, and the cached LogEntries (always) outlive the cached 
TransactionContexts (?).
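
The stateMachineCache=false case is safe only because of a lifetime ordering: the LogCache entry keeps the original buffer alive at least as long as the cached TransactionContext that points into it. An explicit reference counter, which the description notes is absent, would make that ownership explicit instead of implied. A minimal sketch of such a counter, assuming a hypothetical RefCountedBuffer with retain()/release() and an onFree callback standing in for "return the memory to the pool" (not the Ratis implementation):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical buffer with an explicit reference count; not a Ratis class.
class RefCountedBuffer {
  private final ByteBuffer data;
  private final AtomicInteger refCnt = new AtomicInteger(1); // the creator holds one reference
  private final Runnable onFree;                             // e.g. return memory to a pool

  RefCountedBuffer(ByteBuffer data, Runnable onFree) {
    this.data = data;
    this.onFree = onFree;
  }

  // Every holder that keeps the data beyond the current call must retain it first.
  RefCountedBuffer retain() {
    while (true) {
      int n = refCnt.get();
      if (n <= 0) throw new IllegalStateException("retain after free");
      if (refCnt.compareAndSet(n, n + 1)) return this;
    }
  }

  // Drops one reference; the memory is freed only when the last holder lets go.
  void release() {
    int left = refCnt.decrementAndGet();
    if (left == 0) onFree.run();
    else if (left < 0) throw new IllegalStateException("released too many times");
  }

  // Reading after the last release is detected instead of returning recycled bytes.
  ByteBuffer data() {
    if (refCnt.get() <= 0) throw new IllegalStateException("use after free");
    return data.duplicate();
  }

  public static void main(String[] args) {
    int[] frees = {0};
    RefCountedBuffer buf = new RefCountedBuffer(ByteBuffer.allocate(16), () -> frees[0]++);
    buf.retain();                  // the LogCache entry keeps its own reference
    buf.release();                 // the request handler is done with the buffer
    System.out.println(frees[0]);  // 0: still pinned by the cached entry
    buf.data();                    // a context reading the data here is safe
    buf.release();                 // the LogCache evicts the entry
    System.out.println(frees[0]);  // 1
  }
}
```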

 

For the stateMachineCache=true use case, this may cause corruption, because the 
cached LogEntries are decoupled from the original buffers, and it depends on 
stateMachineCache to determine when the original zero-copy buffer is released. 
One clear problem is with the TransactionContext created by 
startTransaction(LogEntryProto, RaftPeerRole) on the follower. It is created 
from the original LogEntries referring to the zero-copy buffers, then cached 
and used later, for example in applyTransaction. By the time it is used, the 
original buffer may already have been released.
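
The follower failure mode can be reproduced in a small simulation. All names are hypothetical, and it assumes that one network buffer is reused for every appendEntries request as soon as it is released, that the cached "context" is just the stateMachineLogEntry payload, and that nothing pins the buffer until applyTransaction (standing in for stateMachineCache=true). Copying the payload before caching is shown as one possible mitigation; it gives up the zero-copy saving, and the alternative would be for the cached context to hold its own reference to the buffer until applyTransaction completes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the follower path; names are not Ratis APIs.
class FollowerReuseSketch {
  // A single network buffer, reused for every appendEntries request once released.
  private final ByteBuffer networkBuffer = ByteBuffer.allocate(64);
  // Cached "TransactionContexts": log index -> stateMachineLogEntry payload.
  private final Map<Long, ByteBuffer> contexts = new HashMap<>();
  private final boolean copyOnCache;

  FollowerReuseSketch(boolean copyOnCache) { this.copyOnCache = copyOnCache; }

  // appendEntries: decode the entry from the network buffer and cache its context.
  void appendEntries(long index, String payload) {
    networkBuffer.clear();  // the previous request's buffer is reused immediately
    networkBuffer.put(payload.getBytes(StandardCharsets.US_ASCII));
    networkBuffer.flip();
    ByteBuffer view = networkBuffer.slice();  // zero-copy: shares memory with networkBuffer
    contexts.put(index, copyOnCache ? copy(view) : view);
  }

  // applyTransaction: reads the cached context, possibly long after appendEntries.
  String applyTransaction(long index) {
    ByteBuffer data = contexts.remove(index).duplicate();
    byte[] out = new byte[data.remaining()];
    data.get(out);
    return new String(out, StandardCharsets.US_ASCII);
  }

  private static ByteBuffer copy(ByteBuffer src) {
    ByteBuffer dup = src.duplicate();
    ByteBuffer dst = ByteBuffer.allocate(dup.remaining());
    dst.put(dup);
    dst.flip();
    return dst;
  }

  public static void main(String[] args) {
    FollowerReuseSketch zeroCopy = new FollowerReuseSketch(false);
    zeroCopy.appendEntries(1L, "set a=1");
    zeroCopy.appendEntries(2L, "set b=2");              // overwrites the shared buffer
    System.out.println(zeroCopy.applyTransaction(1L));  // set b=2 (corrupted)

    FollowerReuseSketch copying = new FollowerReuseSketch(true);
    copying.appendEntries(1L, "set a=1");
    copying.appendEntries(2L, "set b=2");
    System.out.println(copying.applyTransaction(1L));   // set a=1
  }
}
```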

  was:
stateMachineLogEntry and stateMachineContext are parsed/created from 
RaftClientRequest or LogEntryProto and attached to TransactionContext in the 
StateMachine.startTransaction methods.

There are two variants of StateMachine.startTransaction:

1. startTransaction(RaftClientRequest): This is called only on the leader side. 
The result of this method is not cached and is passed temporarily alongside 
RaftClientRequest for further processing, for example. 

2. startTransaction(LogEntryProto, RaftPeerRole): This is called on both the 
leader and the follower side. The result of this call is cached on the node.
 * On the leader: this is called right before applyTransaction to produce a 
TransactionContext for StateMachine.applyTransaction.

 * On the follower: this is called when the appendEntries request is received. 
The result is cached to be used by StateMachine.write and then 
StateMachine.applyTransaction.

The startTransaction methods are called with the RaftClientRequest or 
LogEntryProto parsed directly from the original zero-copy buffers. In turn, the 
stateMachineLogEntry and stateMachineContext (which are parsed/created from 
them) will have data that references the original zero-copy buffer directly, 
without an explicit reference counter. The fact that TransactionContext is 
cached makes it worse.


> TransactionContext's stateMachineLogEntry and stateMachineContext may cause 
> corruption
> --------------------------------------------------------------------------------------
>
>                 Key: RATIS-2094
>                 URL: https://issues.apache.org/jira/browse/RATIS-2094
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Duong
>            Assignee: Duong
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)