[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handle exceptions that occur when writing to
RocksDB for container V2 operations such as allocateContainer, deleteContainer
and updateContainerState.

In the non-HA case, allocateContainer reverts the in-memory state changes if it
hits an IOException from a db operation. deleteContainer and
updateContainerState just propagate the IOException and leave the in-memory
state in an inconsistent state.

After we enable SCM-HA, if the leader SCM succeeds in the operation while any
follower SCM fails due to a db exception, what can we do to ensure that the
states of the leader and followers won't diverge, i.e. keep the replicated
StateMachine consistent across leader and followers?

We have to ensure atomicity (the A in ACID) for state updates: if any exception
occurs, the SCM (leader or follower) should throw the exception and keep its
state unchanged. No partial change is allowed, so that the leader SCM can safely
revert the state change for the whole raft group.

The above analysis also applies to pipeline V2, etc.

  was:
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if leader SCM succeed the operation, meanwhile any 
follower SCM fails due to db exception, what can we do to ensure that states of 
leader and followers won't diverge, a.k.a. ensure the replicated StateMachine 
for leader and followers ?

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies to pipeline V2 and etc.


> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handle exceptions that occur when writing to
> RocksDB for container V2 operations such as allocateContainer,
> deleteContainer and updateContainerState.
> In the non-HA case, allocateContainer reverts the in-memory state changes if
> it hits an IOException from a db operation. deleteContainer and
> updateContainerState just propagate the IOException and leave the in-memory
> state in an inconsistent state.
> After we enable SCM-HA, if the leader SCM succeeds in the operation while any
> follower SCM fails due to a db exception, what can we do to ensure that the
> states of the leader and followers won't diverge, i.e. keep the replicated
> StateMachine consistent across leader and followers?
> We have to ensure atomicity (the A in ACID) for state updates: if any
> exception occurs, the SCM (leader or follower) should throw the exception and
> keep its state unchanged. No partial change is allowed, so that the leader
> SCM can safely revert the state change for the whole raft group.
> The above analysis also applies to pipeline V2, etc.
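
The atomicity requirement described in the issue can be sketched roughly as
follows. This is only an illustration, not the actual ContainerStateManagerV2
code: ContainerStore, FlakyStore and AtomicContainerState are hypothetical
names, a HashMap stands in for the in-memory container maps, and the store
interface stands in for the RocksDB table. The sketch applies the revert
pattern the issue attributes to allocateContainer to deleteContainer and
updateContainerState as well, so a failed db write leaves the in-memory state
exactly as it was.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the RocksDB-backed container table. */
interface ContainerStore {
  void put(long containerId, String state) throws IOException;
  void delete(long containerId) throws IOException;
}

/** Test double: fails the next write when failNext is set. */
class FlakyStore implements ContainerStore {
  final Map<Long, String> rows = new HashMap<>();
  boolean failNext = false;

  private void maybeFail() throws IOException {
    if (failNext) {
      failNext = false;
      throw new IOException("simulated db failure");
    }
  }

  public void put(long containerId, String state) throws IOException {
    maybeFail();
    rows.put(containerId, state);
  }

  public void delete(long containerId) throws IOException {
    maybeFail();
    rows.remove(containerId);
  }
}

/** Hypothetical state manager: every update is all-or-nothing. */
class AtomicContainerState {
  private final Map<Long, String> containers = new HashMap<>();
  private final ContainerStore store;

  AtomicContainerState(ContainerStore store) {
    this.store = store;
  }

  synchronized void allocateContainer(long id) throws IOException {
    if (containers.containsKey(id)) {
      throw new IOException("Container " + id + " already exists");
    }
    containers.put(id, "OPEN");
    try {
      store.put(id, "OPEN");
    } catch (IOException e) {
      containers.remove(id);       // revert the in-memory change
      throw e;
    }
  }

  synchronized void updateContainerState(long id, String newState)
      throws IOException {
    String previous = containers.get(id);
    if (previous == null) {
      throw new IOException("Unknown container " + id);
    }
    containers.put(id, newState);
    try {
      store.put(id, newState);
    } catch (IOException e) {
      containers.put(id, previous); // restore the old state
      throw e;
    }
  }

  synchronized void deleteContainer(long id) throws IOException {
    String previous = containers.remove(id);
    if (previous == null) {
      throw new IOException("Unknown container " + id);
    }
    try {
      store.delete(id);
    } catch (IOException e) {
      containers.put(id, previous); // put the entry back
      throw e;
    }
  }

  synchronized String getState(long id) {
    return containers.get(id);
  }
}
```

With this shape, a caller that sees an IOException can rely on the local state
being unchanged, whether the SCM is the leader or a follower. How the failure
is then reported back through raft is outside the scope of this sketch.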



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handle exceptions that occur when writing to
RocksDB for container V2 operations such as allocateContainer, deleteContainer
and updateContainerState.

In the non-HA case, allocateContainer reverts the in-memory state changes if it
hits an IOException from a db operation. deleteContainer and
updateContainerState just propagate the IOException and leave the in-memory
state in an inconsistent state.

After we enable SCM-HA, if the leader SCM succeeds in the operation while any
follower SCM fails due to a db exception, what can we do to ensure that the
states of the leader and followers won't diverge, i.e. keep the replicated
StateMachine consistent across leader and followers?

We have to ensure atomicity (the A in ACID) for state updates: if any exception
occurs, the SCM (leader or follower) should throw the exception and keep its
state unchanged. No partial change is allowed, so that the leader SCM can safely
revert the state change for the whole raft group.

The above analysis also applies to pipeline V2 and to issues other than disk
failure.

  was:
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if leader SCM succeed the operation, meanwhile any 
follower SCM fails due to db exception, what can we do to ensure that states of 
leader and followers won't diverge, a.k.a. ensure the replicated StateMachine 
for leader and followers ?

We have to ensure Atomicity of ACID for state update: If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged. No partial change is allowed so that leader SCM can safely 
revert the state change for the whole raft groups.

Above analysis also applies to pipeline V2 and etc.


> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handle exceptions that occur when writing to
> RocksDB for container V2 operations such as allocateContainer,
> deleteContainer and updateContainerState.
> In the non-HA case, allocateContainer reverts the in-memory state changes if
> it hits an IOException from a db operation. deleteContainer and
> updateContainerState just propagate the IOException and leave the in-memory
> state in an inconsistent state.
> After we enable SCM-HA, if the leader SCM succeeds in the operation while any
> follower SCM fails due to a db exception, what can we do to ensure that the
> states of the leader and followers won't diverge, i.e. keep the replicated
> StateMachine consistent across leader and followers?
> We have to ensure atomicity (the A in ACID) for state updates: if any
> exception occurs, the SCM (leader or follower) should throw the exception and
> keep its state unchanged. No partial change is allowed, so that the leader
> SCM can safely revert the state change for the whole raft group.
> The above analysis also applies to pipeline V2 and to issues other than disk
> failure.






[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if leader SCM succeed the operation, meanwhile any 
follower SCM fails due to db exception, what can we do to ensure that states of 
leader and followers won't diverge, a.k.a. ensure the replicated StateMachine 
for leader and followers ?

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies to pipeline V2 and etc.

  was:
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if leader SCM succeed the operation, meanwhile any 
follower SCM fails due to db exception, what can we do to ensure that states of 
leader and followers won't diverge, a.k.a. ensure the replicated StateMachine 
for leader and followers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies to pipeline V2 and etc.


> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handle exceptions occurred in writing RocksDB 
> for container V2, such as allocateContainer, deleteContainer and 
> updateContainerState.
> For non-HA case, allocateContainer reverts the memory state changes if meet 
> IOException for db operations. deleteContainer and updateContainerState just 
> throw out the IOException and leave the memory state in an inconsistency 
> state.
> After we enable SCM-HA, if leader SCM succeed the operation, meanwhile any 
> follower SCM fails due to db exception, what can we do to ensure that states 
> of leader and followers won't diverge, a.k.a. ensure the replicated 
> StateMachine for leader and followers ?
> We have to ensure Atomicity of ACID for state update. If any exception 
> occurred, SCM (no matter leader or follower) should throw exception and keep 
> states unchanged, so that leader SCM can safely revert the state change for 
> the whole raft groups.
> Above analysis also applies to pipeline V2 and etc.






[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if leader SCM succeed the operation, meanwhile any 
follower SCM fails due to db exception, what can we do to ensure that states of 
leader and followers won't diverge, a.k.a. ensure the replicated StateMachine 
for leader and followers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies to pipeline V2 and etc.

  was:
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies to pipeline V2 and etc.


> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handle exceptions occurred in writing RocksDB 
> for container V2, such as allocateContainer, deleteContainer and 
> updateContainerState.
> For non-HA case, allocateContainer reverts the memory state changes if meet 
> IOException for db operations. deleteContainer and updateContainerState just 
> throw out the IOException and leave the memory state in an inconsistency 
> state.
> After we enable SCM-HA, if leader SCM succeed the operation, meanwhile any 
> follower SCM fails due to db exception, what can we do to ensure that states 
> of leader and followers won't diverge, a.k.a. ensure the replicated 
> StateMachine for leader and followers.
> We have to ensure Atomicity of ACID for state update. If any exception 
> occurred, SCM (no matter leader or follower) should throw exception and keep 
> states unchanged, so that leader SCM can safely revert the state change for 
> the whole raft groups.
> Above analysis also applies to pipeline V2 and etc.






[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies to pipeline V2 and etc.

  was:
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies ot pipeline V2 and etc.


> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handle exceptions occurred in writing RocksDB 
> for container V2, such as allocateContainer, deleteContainer and 
> updateContainerState.
> For non-HA case, allocateContainer reverts the memory state changes if meet 
> IOException for db operations. deleteContainer and updateContainerState just 
> throw out the IOException and leave the memory state in an inconsistency 
> state.
> After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
> Follower SCM fails due to db exception, what can we do to ensure that states 
> of leader and follower won't diverge, a.k.a., ensure the replicated state 
> machine for leader and folowers.
> We have to ensure Atomicity of ACID for state update. If any exception 
> occurred, SCM (no matter leader or follower) should throw exception and keep 
> states unchanged, so that leader SCM can safely revert the state change for 
> the whole raft groups.
> Above analysis also applies to pipeline V2 and etc.






[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state changes if meet 
IOException for db operations. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies ot pipeline V2 and etc.

  was:
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state change if meet 
IOException for db operation. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies ot pipeline V2 and etc.


> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handle exceptions occurred in writing RocksDB 
> for container V2, such as allocateContainer, deleteContainer and 
> updateContainerState.
> For non-HA case, allocateContainer reverts the memory state changes if meet 
> IOException for db operations. deleteContainer and updateContainerState just 
> throw out the IOException and leave the memory state in an inconsistency 
> state.
> After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
> Follower SCM fails due to db exception, what can we do to ensure that states 
> of leader and follower won't diverge, a.k.a., ensure the replicated state 
> machine for leader and folowers.
> We have to ensure Atomicity of ACID for state update. If any exception 
> occurred, SCM (no matter leader or follower) should throw exception and keep 
> states unchanged, so that leader SCM can safely revert the state change for 
> the whole raft groups.
> Above analysis also applies ot pipeline V2 and etc.






[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handle exceptions occurred in writing RocksDB for 
container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state change if meet 
IOException for db operation. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies ot pipeline V2 and etc.

  was:
I have a concern about how to handling exceptions occurred in writing RocksDB 
for container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state change if meet 
IOException for db operation. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies ot pipeline V2 and etc.


> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handle exceptions occurred in writing RocksDB 
> for container V2, such as allocateContainer, deleteContainer and 
> updateContainerState.
> For non-HA case, allocateContainer reverts the memory state change if meet 
> IOException for db operation. deleteContainer and updateContainerState just 
> throw out the IOException and leave the memory state in an inconsistency 
> state.
> After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
> Follower SCM fails due to db exception, what can we do to ensure that states 
> of leader and follower won't diverge, a.k.a., ensure the replicated state 
> machine for leader and folowers.
> We have to ensure Atomicity of ACID for state update. If any exception 
> occurred, SCM (no matter leader or follower) should throw exception and keep 
> states unchanged, so that leader SCM can safely revert the state change for 
> the whole raft groups.
> Above analysis also applies ot pipeline V2 and etc.






[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: (was:  

Fix a bug in https://issues.apache.org/jira/browse/HDDS-3895 

In ContainerStateManagerV2, both disk state (column families in RocksDB) and 
memory state (container maps in memory) are protected by raft, and should keep 
their consistency upon each modification.)

> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>







[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Description: 
I have a concern about how to handling exceptions occurred in writing RocksDB 
for container V2, such as allocateContainer, deleteContainer and 
updateContainerState.

For non-HA case, allocateContainer reverts the memory state change if meet 
IOException for db operation. deleteContainer and updateContainerState just 
throw out the IOException and leave the memory state in an inconsistency state.

After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
Follower SCM fails due to db exception, what can we do to ensure that states of 
leader and follower won't diverge, a.k.a., ensure the replicated state machine 
for leader and folowers.

We have to ensure Atomicity of ACID for state update. If any exception 
occurred, SCM (no matter leader or follower) should throw exception and keep 
states unchanged, so that leader SCM can safely revert the state change for the 
whole raft groups.

Above analysis also applies ot pipeline V2 and etc.

> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
> I have a concern about how to handling exceptions occurred in writing RocksDB 
> for container V2, such as allocateContainer, deleteContainer and 
> updateContainerState.
> For non-HA case, allocateContainer reverts the memory state change if meet 
> IOException for db operation. deleteContainer and updateContainerState just 
> throw out the IOException and leave the memory state in an inconsistency 
> state.
> After we enable SCM-HA, if Leader SCM succeed the operation, meanwhile any 
> Follower SCM fails due to db exception, what can we do to ensure that states 
> of leader and follower won't diverge, a.k.a., ensure the replicated state 
> machine for leader and folowers.
> We have to ensure Atomicity of ACID for state update. If any exception 
> occurred, SCM (no matter leader or follower) should throw exception and keep 
> states unchanged, so that leader SCM can safely revert the state change for 
> the whole raft groups.
> Above analysis also applies ot pipeline V2 and etc.






[jira] [Updated] (HDDS-4136) Design for Error/Exception handling in state update for container/pipeline V2

2020-08-24 Thread Glen Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4136:

Summary: Design for Error/Exception handling in state update for 
container/pipeline V2  (was: Design for Error/Exception handling in state 
updates for container/pipeline V2)

> Design for Error/Exception handling in state update for container/pipeline V2
> -
>
> Key: HDDS-4136
> URL: https://issues.apache.org/jira/browse/HDDS-4136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>
>  
> Fix a bug in https://issues.apache.org/jira/browse/HDDS-3895 
> In ContainerStateManagerV2, both disk state (column families in RocksDB) and 
> memory state (container maps in memory) are protected by raft, and should 
> keep their consistency upon each modification.


