sumitagrawl commented on code in PR #7249:
URL: https://github.com/apache/ozone/pull/7249#discussion_r1798831831


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java:
##########
@@ -369,12 +372,13 @@ public DatanodeDeletedBlockTransactions getTransactions(
               if (checkInadequateReplica(replicas, txn)) {
                 continue;
               }
+              metrics.setNumBlockDeletionTransactionDataNodes(dnList.size());
+              metrics.incrProcessedTransaction();

Review Comment:
   getTransaction() can have skip of the transaction for metrics, so this 
increment needs to be done in getTransaction() only if not skipped.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java:
##########
@@ -369,12 +372,13 @@ public DatanodeDeletedBlockTransactions getTransactions(
               if (checkInadequateReplica(replicas, txn)) {
                 continue;
               }
+              metrics.setNumBlockDeletionTransactionDataNodes(dnList.size());

Review Comment:
   DnList **do not** have DNs which are unhealthy Or overloaded causing pending 
request as present. Normally these are not case.
   `NumBlockDeletionTransactionDataNodes` metrics may not be useful at this 
point. This either can be added after DN filter is done at 
SCMBlockDeletingService.call() Or may not be required, since here, its called 
with same value in loop.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java:
##########
@@ -300,11 +285,29 @@ private void getTransaction(DeletedBlocksTransaction tx,
             .setCount(transactionStatusManager.getOrDefaultRetryCount(
               tx.getTxID(), 0))
             .build();
+
+    // We have made an improvement here, and we expect that all replicas
+    // of the Container being sent will be included in the dnList.
+    // This change benefits ACK confirmation and improves deletion speed.
+    // The principle behind it is that
+    // DN can receive the command to delete a certain Container at the same 
time and provide
+    // feedback to SCM at roughly the same time.
+    // This avoids the issue of deletion blocking,
+    // where some replicas of a Container are deleted while others do not 
receive the delete command.
+    long containerId = tx.getContainerID();
     for (ContainerReplica replica : replicas) {
-      DatanodeDetails details = replica.getDatanodeDetails();
-      if (!dnList.contains(details)) {
-        continue;
+      DatanodeDetails datanodeDetails = replica.getDatanodeDetails();
+      if (!dnList.contains(datanodeDetails)) {

Review Comment:
   This logic can be moved inside checkInadequateReplica() with renaming method 
checkInadequateReplicaAndDnList() as the purpose of introducing this method is 
to avoid partial deletion / consideration of the transaction in that loop.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java:
##########
@@ -300,11 +285,29 @@ private void getTransaction(DeletedBlocksTransaction tx,
             .setCount(transactionStatusManager.getOrDefaultRetryCount(
               tx.getTxID(), 0))
             .build();
+
+    // We have made an improvement here, and we expect that all replicas
+    // of the Container being sent will be included in the dnList.
+    // This change benefits ACK confirmation and improves deletion speed.
+    // The principle behind it is that
+    // DN can receive the command to delete a certain Container at the same 
time and provide
+    // feedback to SCM at roughly the same time.
+    // This avoids the issue of deletion blocking,
+    // where some replicas of a Container are deleted while others do not 
receive the delete command.
+    long containerId = tx.getContainerID();
     for (ContainerReplica replica : replicas) {
-      DatanodeDetails details = replica.getDatanodeDetails();
-      if (!dnList.contains(details)) {
-        continue;
+      DatanodeDetails datanodeDetails = replica.getDatanodeDetails();
+      if (!dnList.contains(datanodeDetails)) {
+        DatanodeDetails dnDetail = replica.getDatanodeDetails();
+        LOG.debug("Skip Container = {}, because DN = {} is not in dnList.",
+            containerId, dnDetail.getUuid());
+        metrics.incrSkippedTransaction();

Review Comment:
   SkippedTransaction metrics need to be updated for checkInadequateReplica() 
also as check done at caller.
   We may need this also for other cases where transaction is not considered, 
like open container, as check at caller.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to