errose28 commented on code in PR #6967:
URL: https://github.com/apache/ozone/pull/6967#discussion_r1684881500
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/AbstractContainerReportHandler.java:
##########
@@ -349,8 +349,13 @@ private boolean updateContainerState(final DatanodeDetails
datanode,
break;
case DELETING:
/*
- * The container is under deleting. do nothing.
+ HDDS-11136: If a DELETING container has a non-empty CLOSED replica, the
container should be moved back to CLOSED
+ state.
*/
+ if (replica.getState() == State.CLOSED && replica.hasIsEmpty() &&
!replica.getIsEmpty()) {
+ containerManager.transitionDeletingToClosedState(containerId);
Review Comment:
Let's put an INFO log here, similar to [this
one](https://github.com/apache/ozone/blob/930d3f0552b86d6605625cabe5e8961652d14fa2/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/AbstractContainerReportHandler.java#L337).
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManagerImpl.java:
##########
@@ -367,6 +368,28 @@ public void updateContainerState(final
HddsProtos.ContainerID containerID,
}
}
+ @Override
+ public void transitionDeletingToClosedState(HddsProtos.ContainerID
containerID) throws IOException {
+ final ContainerID id = ContainerID.getFromProtobuf(containerID);
+
+ try (AutoCloseableLock ignored = writeLock(id)) {
+ if (containers.contains(id)) {
+ final ContainerInfo oldInfo = containers.getContainerInfo(id);
+ final LifeCycleState oldState = oldInfo.getState();
+ if (oldState != DELETING) {
+ throw new InvalidContainerStateException("Container " + oldInfo +
"must be in DELETING state to " +
+ "transition to CLOSED.");
Review Comment:
I think the `ContainerInfo` object is a bit more
[verbose](https://github.com/apache/ozone/blob/1322add415ca9093886a28a86e75844feb0be1ed/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerInfo.java#L285)
than we need here. Also I think we can touch up the message to indicate we are
moving it "back" to closed. Otherwise it looks like a container isn't supposed
to move to closed from any state other than deleting.
```suggestion
throw new InvalidContainerStateException("Cannot transition
container " + id + " from " + oldState + " back to CLOSED. The container must
be in the DELETING state.");
```
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManager.java:
##########
@@ -131,6 +131,16 @@ void updateContainerState(ContainerID containerID,
LifeCycleEvent event)
throws IOException, InvalidStateTransitionException;
+ /**
+ * Bypasses the container state machine to change a container's state from
DELETING to CLOSED. This API was
+ * specially introduced to fix a bug (HDDS-11136), and should NOT be used in
any other context.
Review Comment:
Maybe we can make this description a little more general? We can reference
the initial bug, but it's also a decent defensive mechanism against accidental
deletion if we end up in this situation another way.
##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerManagerImpl.java:
##########
@@ -132,6 +134,62 @@ void testUpdateContainerState() throws Exception {
assertEquals(LifeCycleState.CLOSED,
containerManager.getContainer(cid).getState());
}
+ @Test
+ void testTransitionDeletingToClosedState() throws IOException,
InvalidStateTransitionException {
Review Comment:
nit. Can we parameterize this by container type instead of duplicating each
line for each container?
##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerReportHandler.java:
##########
@@ -100,7 +101,6 @@ void setup() throws IOException,
InvalidStateTransitionException {
dbStore = DBStoreBuilder.createDBStore(
conf, new SCMDBDefinition());
scmhaManager = SCMHAManagerStub.getInstance(true);
- nodeManager = new MockNodeManager(true, 10);
Review Comment:
Why was this line removed?
##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerStateManager.java:
##########
@@ -148,6 +149,42 @@ public void checkReplicationStateMissingReplica()
assertEquals(3, c1.getReplicationConfig().getRequiredNodes());
}
+ @Test
+ public void testTransitionDeletingToClosedState() throws IOException {
+ HddsProtos.ContainerInfoProto.Builder builder =
HddsProtos.ContainerInfoProto.newBuilder();
+ builder.setContainerID(1)
+ .setState(HddsProtos.LifeCycleState.DELETING)
+ .setUsedBytes(0)
+ .setNumberOfKeys(0)
+ .setOwner("root")
+ .setReplicationType(HddsProtos.ReplicationType.RATIS)
+ .setReplicationFactor(ReplicationFactor.THREE);
+
+ HddsProtos.ContainerInfoProto container = builder.build();
+ HddsProtos.ContainerID cid =
HddsProtos.ContainerID.newBuilder().setId(container.getContainerID()).build();
+ containerStateManager.addContainer(container);
+ containerStateManager.transitionDeletingToClosedState(cid);
+ assertEquals(HddsProtos.LifeCycleState.CLOSED,
containerStateManager.getContainer(ContainerID.getFromProtobuf(cid))
+ .getState());
+ }
+
+ @Test
+ public void testTransitionDeletingToClosedStateAllowsOnlyDeletingContainer()
throws IOException {
+ HddsProtos.ContainerInfoProto.Builder builder =
HddsProtos.ContainerInfoProto.newBuilder();
+ builder.setContainerID(1)
+ .setState(HddsProtos.LifeCycleState.QUASI_CLOSED)
+ .setUsedBytes(0)
+ .setNumberOfKeys(0)
+ .setOwner("root")
+ .setReplicationType(HddsProtos.ReplicationType.RATIS)
+ .setReplicationFactor(ReplicationFactor.THREE);
+
+ HddsProtos.ContainerInfoProto container = builder.build();
+ HddsProtos.ContainerID cid =
HddsProtos.ContainerID.newBuilder().setId(container.getContainerID()).build();
+ containerStateManager.addContainer(container);
+ assertThrows(IOException.class, () ->
containerStateManager.transitionDeletingToClosedState(cid));
Review Comment:
We can test for the more specific `InvalidContainerStateException` here.
##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerReportHandler.java:
##########
@@ -442,6 +446,85 @@ public void testClosingToClosedForECContainer()
assertEquals(LifeCycleState.CLOSED,
containerManager.getContainer(container2.containerID()).getState());
}
+ /**
+ * Tests that a DELETING container transitions to CLOSED if a non-empty
CLOSED replica is reported. It does not
+ * transition if a non-empty CLOSING (or any other state) replica is
reported.
+ */
+ @Test
+ public void shouldTransitionFromDeletingToClosedWhenNonEmptyClosedReplica()
throws IOException {
+ ContainerInfo container = getContainer(LifeCycleState.DELETING);
+ containerStateManager.addContainer(container.getProtobuf());
+
+ // set up a non-empty CLOSED replica
+ DatanodeDetails dnWithClosedReplica =
nodeManager.getNodes(NodeStatus.inServiceHealthy()).get(0);
+ ContainerReplicaProto.Builder builder = ContainerReplicaProto.newBuilder();
+ ContainerReplicaProto closedReplica =
builder.setContainerID(container.getContainerID())
+ .setIsEmpty(false)
+ .setState(ContainerReplicaProto.State.CLOSED)
+ .setKeyCount(0)
+ .setBlockCommitSequenceId(123)
+ .setOriginNodeId(dnWithClosedReplica.getUuidString()).build();
+
+ // set up a non-empty CLOSING replica
+ DatanodeDetails dnWithClosingReplica =
nodeManager.getNodes(NodeStatus.inServiceHealthy()).get(1);
+ ContainerReplicaProto closingReplica =
builder.setState(ContainerReplicaProto.State.CLOSING)
+ .setOriginNodeId(dnWithClosingReplica.getUuidString()).build();
+
+ // should not transition on processing the CLOSING replica's report
+ ContainerReportHandler containerReportHandler = new
ContainerReportHandler(nodeManager, containerManager);
+ ContainerReportsProto closingContainerReport =
getContainerReports(closingReplica);
+ containerReportHandler
+ .onMessage(new ContainerReportFromDatanode(dnWithClosingReplica,
closingContainerReport), publisher);
+
+ assertEquals(LifeCycleState.DELETING,
containerStateManager.getContainer(container.containerID()).getState());
+
+ // should transition on processing the CLOSED replica's report
+ ContainerReportsProto closedContainerReport =
getContainerReports(closedReplica);
+ containerReportHandler
+ .onMessage(new ContainerReportFromDatanode(dnWithClosedReplica,
closedContainerReport), publisher);
+ assertEquals(LifeCycleState.CLOSED,
containerStateManager.getContainer(container.containerID()).getState());
+ }
+
+ /**
+ * Tests that a DELETING EC container transitions to CLOSED if a non-empty
CLOSED replica is reported. It does not
+ * transition if a non-empty CLOSING (or any other state) replica is
reported.
+ */
+ @Test
+ public void
ecContainerShouldTransitionFromDeletingToClosedWhenNonEmptyClosedReplica()
throws IOException {
Review Comment:
Can this also be covered with a parameterized version of the previous test?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]