Pratyush Bhatt created HDDS-10652:
-------------------------------------

             Summary: [Upgrade][EC] Reconstruction failing with 
"java.io.IOException: None of the block data have checksum"
                 Key: HDDS-10652
                 URL: https://issues.apache.org/jira/browse/HDDS-10652
             Project: Apache Ozone
          Issue Type: Bug
          Components: EC, ECOfflineRecovery
            Reporter: Pratyush Bhatt


{color:#172b4d}*Upgrade versions:*
Pre-upgrade hash:
[https://github.com/apache/ozone/commit/6ee6c357678676661ebb3181a56622c79b487bc1]

Post-upgrade hash:
[https://github.com/apache/ozone/commit/46b6f3def1d84ca769affb4d3f0d84dece6e8567]

{color}{color:#172b4d}*Scenario:*
Write an EC file (5 GB) with the RS-3-2-1024K policy (in this case) before the 
upgrade. After the upgrade, shut down either 2 parity nodes (this case) or 2 
data nodes, since the policy tolerates 2 DN failures. Check whether 
reconstruction happens after some time.

*Observed Behavior:*
1. Data was successfully written pre-upgrade using Freon. 
File name: 
_o3://ozone1711558189/ec-construct-vol/ec-construct-buck/ec-construction/0_
2. Post-upgrade, stop two of the DNs, in this case the parity nodes, which we 
identified from one of the containers storing the above file's 
data.{color}
{code:java}
ozone admin container info 1004 --json
2024-03-27 21:35:15,065|INFO|MainThread|machine.py:232 - 
run()||GUID=183f2d10-e3a7-407f-adb5-b87f3e3af53b|Exit Code: 0
2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:723 - 
find_ec_data_parity_hosts()|parity hosts: ['ccycloud-4.quasar-ftvxjz.xyz', 
'ccycloud-3.quasar-ftvxjz.xyz']
2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:724 - 
find_ec_data_parity_hosts()|data hosts: ['ccycloud-8.quasar-ftvxjz.xyz', 
'ccycloud-5.quasar-ftvxjz.xyz', 'ccycloud-1.quasar-ftvxjz.xyz'] {code}
{code:java}
2024-03-27 21:35:15,311|INFO|MainThread|cm_apilib.py:1214 - 
stopComponent()|Initiating stop of OZONE_DATANODE at host 
ccycloud-4.quasar-ftvxjz.xyz
2024-03-27 21:35:15,349|INFO|MainThread|cm_apilib.py:1218 - 
stopComponent()|Command name = Stop , ID = 2860  
2024-03-27 21:35:15,580|INFO|MainThread|cm_apilib.py:1214 - 
stopComponent()|Initiating stop of OZONE_DATANODE at host 
ccycloud-3.quasar-ftvxjz.xyz
2024-03-27 21:35:15,609|INFO|MainThread|cm_apilib.py:1218 - 
stopComponent()|Command name = Stop , ID = 2862  {code}
{color:#172b4d}Nodes ccycloud-3.quasar-ftvxjz.xyz and 
ccycloud-4.quasar-ftvxjz.xyz are stopped.

3. Read the file's data (online reconstruction) and compute its checksum: it 
matched.
4. Wait for reconstruction to happen. The test waited for 20 minutes, but only 
3 DNs were present even then:{color}
{code:java}
['ccycloud-5.quasar-ftvxjz.xyz', 'ccycloud-1.quasar-ftvxjz.xyz', 
'ccycloud-8.quasar-ftvxjz.xyz']{code}
In fact, even after 10 hours (at the time of writing), there are still only 3 DNs:
{code:java}
date
Thu Mar 28 08:39:16 UTC 2024
ozone admin container info 1004 --json
{
  "containerInfo" : {
    "state" : "CLOSED",
    "stateEnterTime" : "2024-03-27T18:43:51.934Z",
    "replicationConfig" : {
      "data" : 3,
      "parity" : 2,
      "ecChunkSize" : 1048576,
      "codec" : "RS",
      "requiredNodes" : 5,
      "replicationType" : "EC"
    },
    "usedBytes" : 1342177280,
    "numberOfKeys" : 5,
    "lastUsed" : "2024-03-28T08:39:24.535189Z",
    "owner" : "om1",
    "containerID" : 1004,
    "deleteTransactionId" : 0,
    "sequenceId" : 0,
    "deleted" : false,
    "open" : false
  },
  "pipeline" : {
    "id" : {
      "id" : "73532c14-40ac-4924-9353-2f18ab0d63f2"
    },
    "replicationConfig" : {
      "data" : 3,
      "parity" : 2,
      "ecChunkSize" : 1048576,
      "codec" : "RS",
      "requiredNodes" : 5,
      "replicationType" : "EC"
    },
    "nodesInOrder" : [ {
      "level" : 0,
      "cost" : 0,
      "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "ipAddress" : "10.140.37.12",
      "hostName" : "ccycloud-5.quasar-ftvxjz.xyz",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -662262523,
      "networkLocation" : "/default",
      "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
      "numOfLeaves" : 1
    }, {
      "level" : 0,
      "cost" : 0,
      "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "ipAddress" : "10.140.40.9",
      "hostName" : "ccycloud-1.quasar-ftvxjz.xyz",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -1387859873,
      "networkLocation" : "/default",
      "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "numOfLeaves" : 1
    }, {
      "level" : 0,
      "cost" : 0,
      "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "ipAddress" : "10.140.137.128",
      "hostName" : "ccycloud-8.quasar-ftvxjz.xyz",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : 1098159392,
      "networkLocation" : "/default",
      "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "numOfLeaves" : 1
    } ],
    "creationTimestamp" : "2024-03-28T08:39:24.480Z",
    "stateEnterTime" : "2024-03-28T08:39:24.545517Z",
    "leaderNode" : {
      "level" : 0,
      "cost" : 0,
      "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "ipAddress" : "10.140.37.12",
      "hostName" : "ccycloud-5.quasar-ftvxjz.xyz",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -662262523,
      "networkLocation" : "/default",
      "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
      "numOfLeaves" : 1
    },
    "firstNode" : {
      "level" : 0,
      "cost" : 0,
      "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "ipAddress" : "10.140.37.12",
      "hostName" : "ccycloud-5.quasar-ftvxjz.xyz",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -662262523,
      "networkLocation" : "/default",
      "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
      "numOfLeaves" : 1
    },
    "closestNode" : {
      "level" : 0,
      "cost" : 0,
      "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "ipAddress" : "10.140.37.12",
      "hostName" : "ccycloud-5.quasar-ftvxjz.xyz",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -662262523,
      "networkLocation" : "/default",
      "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
      "numOfLeaves" : 1
    },
    "allocationTimeout" : false,
    "healthy" : true,
    "pipelineState" : "ALLOCATED",
    "nodes" : [ {
      "level" : 0,
      "cost" : 0,
      "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "ipAddress" : "10.140.37.12",
      "hostName" : "ccycloud-5.quasar-ftvxjz.root.comops.site",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -662262523,
      "networkLocation" : "/default",
      "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
      "numOfLeaves" : 1
    }, {
      "level" : 0,
      "cost" : 0,
      "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "ipAddress" : "10.140.40.9",
      "hostName" : "ccycloud-1.quasar-ftvxjz.root.comops.site",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -1387859873,
      "networkLocation" : "/default",
      "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "numOfLeaves" : 1
    }, {
      "level" : 0,
      "cost" : 0,
      "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "ipAddress" : "10.140.137.128",
      "hostName" : "ccycloud-8.quasar-ftvxjz.root.comops.site",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : 1098159392,
      "networkLocation" : "/default",
      "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "numOfLeaves" : 1
    } ],
    "empty" : false,
    "type" : "EC"
  },
  "replicas" : [ {
    "containerID" : 1004,
    "state" : "CLOSED",
    "datanodeDetails" : {
      "level" : 0,
      "cost" : 0,
      "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "ipAddress" : "10.140.37.12",
      "hostName" : "ccycloud-5.quasar-ftvxjz.xyz",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -662262523,
      "networkLocation" : "/default",
      "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
      "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
      "numOfLeaves" : 1
    },
    "placeOfBirth" : "6179347f-5824-41d4-b722-f1dbc5f14880",
    "sequenceId" : 0,
    "keyCount" : 5,
    "bytesUsed" : 1342177280,
    "replicaIndex" : 2
  }, {
    "containerID" : 1004,
    "state" : "CLOSED",
    "datanodeDetails" : {
      "level" : 0,
      "cost" : 0,
      "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "ipAddress" : "10.140.40.9",
      "hostName" : "ccycloud-1.quasar-ftvxjz.root.comops.site",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : -1387859873,
      "networkLocation" : "/default",
      "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
      "numOfLeaves" : 1
    },
    "placeOfBirth" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
    "sequenceId" : 0,
    "keyCount" : 5,
    "bytesUsed" : 1342177280,
    "replicaIndex" : 3
  }, {
    "containerID" : 1004,
    "state" : "CLOSED",
    "datanodeDetails" : {
      "level" : 0,
      "cost" : 0,
      "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "ipAddress" : "10.140.137.128",
      "hostName" : "ccycloud-8.quasar-ftvxjz.root.comops.site",
      "ports" : [ {
        "name" : "HTTPS",
        "value" : 9883
      }, {
        "name" : "CLIENT_RPC",
        "value" : 9864
      }, {
        "name" : "REPLICATION",
        "value" : 9886
      }, {
        "name" : "RATIS",
        "value" : 9858
      }, {
        "name" : "RATIS_ADMIN",
        "value" : 9857
      }, {
        "name" : "RATIS_SERVER",
        "value" : 9856
      }, {
        "name" : "STANDALONE",
        "value" : 9859
      } ],
      "setupTime" : 0,
      "persistedOpState" : "IN_SERVICE",
      "persistedOpStateExpiryEpochSec" : 0,
      "initialVersion" : 0,
      "currentVersion" : 1,
      "decommissioned" : false,
      "maintenance" : false,
      "signature" : 1098159392,
      "networkLocation" : "/default",
      "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
      "numOfLeaves" : 1
    },
    "placeOfBirth" : "711656cf-a99e-4b2c-8c35-f015ee94889c",
    "sequenceId" : 0,
    "keyCount" : 5,
    "bytesUsed" : 1342177280,
    "replicaIndex" : 1
  } ]
} {code}
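The missing replica indexes can be read directly off this output. The following is a minimal Python sketch, not part of Ozone: it assumes the JSON printed by the command above has been captured as a string, and it relies only on field names visible in that output ({{containerInfo.replicationConfig.data}}, {{parity}}, and {{replicas[].replicaIndex}}). The helper name {{missing_replica_indexes}} is hypothetical.

```python
import json


def missing_replica_indexes(container_info_json: str) -> list[int]:
    """Return the EC replica indexes for which no replica is reported."""
    info = json.loads(container_info_json)
    cfg = info["containerInfo"]["replicationConfig"]
    # EC replica indexes run from 1 to data + parity (1..5 for RS-3-2).
    expected = set(range(1, cfg["data"] + cfg["parity"] + 1))
    present = {replica["replicaIndex"] for replica in info["replicas"]}
    return sorted(expected - present)


# Trimmed-down stand-in for the output above: only the fields the helper reads.
sample = json.dumps({
    "containerInfo": {"replicationConfig": {"data": 3, "parity": 2}},
    "replicas": [{"replicaIndex": 2}, {"replicaIndex": 3}, {"replicaIndex": 1}],
})
print(missing_replica_indexes(sample))  # [4, 5]
```

For container 1004 this yields indexes 4 and 5, i.e. the two parity replicas that lived on the stopped nodes.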
Checked the SCM logs; SCM is still sending reconstructECContainersCommand:
{code:java}
2024-03-28 08:36:56,748 INFO [Under Replicated 
Processor]-org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: 
Sending command [reconstructECContainersCommand: containerID: 1004, 
replicationConfig: EC{rs-3-2-1024k}, sources: 
[ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(ccycloud-8.quasar-ftvxjz.root.comops.site/10.140.137.128)
 replicaIndex: 1, 
6179347f-5824-41d4-b722-f1dbc5f14880(ccycloud-5.quasar-ftvxjz.root.comops.site/10.140.37.12)
 replicaIndex: 2, 
d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(ccycloud-1.quasar-ftvxjz.root.comops.site/10.140.40.9)
 replicaIndex: 3], targets: 
[572ed33d-a834-4d80-be35-7b1b19c8bd74(ccycloud-7.quasar-ftvxjz.root.comops.site/10.140.234.130),
 
711656cf-a99e-4b2c-8c35-f015ee94889c(ccycloud-2.quasar-ftvxjz.root.comops.site/10.140.45.129)],
 missingIndexes: [4, 5]] for container ContainerInfo{id=#1004, state=CLOSED, 
stateEnterTime=2024-03-27T18:43:51.934Z, 
pipelineID=PipelineID=53f5587f-9e6c-465d-a0cb-b82d10c227d3, owner=om1} to 
572ed33d-a834-4d80-be35-7b1b19c8bd74(ccycloud-7.quasar-ftvxjz.root.comops.site/10.140.234.130)
 with datanode deadline 1711615886747 and scm deadline 1711615916747 {code}
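The two deadlines in this log line are epoch timestamps in milliseconds. As a small aid for reading them, here is a Python sketch (the helper name {{epoch_ms_to_utc}} is hypothetical) that converts both values to UTC and prints the gap between them.

```python
from datetime import datetime, timedelta, timezone


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    whole_seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(whole_seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


datanode_deadline = epoch_ms_to_utc(1711615886747)
scm_deadline = epoch_ms_to_utc(1711615916747)

print(datanode_deadline.isoformat(timespec="milliseconds"))  # 2024-03-28T08:51:26.747+00:00
print(scm_deadline.isoformat(timespec="milliseconds"))       # 2024-03-28T08:51:56.747+00:00
print((scm_deadline - datanode_deadline).total_seconds())    # 30.0
```

Assuming the SCM log timestamps are also in UTC, the datanode deadline falls roughly 14.5 minutes after the command was sent, and the SCM deadline 30 seconds after that.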
Checked one of the target DNs, ccycloud-7.quasar-ftvxjz.root.comops.site; it is 
throwing the warnings below.
{code:java}
2024-03-28 08:37:14,982 WARN 
[ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
 FAILED reconstructECContainersCommand: containerID=1004, 
replication=rs-3-2-1024k, missingIndexes=[4, 5], 
sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(ccycloud-8.quasar-ftvxjz.xyz/10.140.137.128),
 
2=6179347f-5824-41d4-b722-f1dbc5f14880(ccycloud-5.quasar-ftvxjz.xyz/10.140.37.12),
 
3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(ccycloud-1.quasar-ftvxjz.xyz/10.140.40.9)},
 
targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(ccycloud-7.quasar-ftvxjz.xyz/10.140.234.130),
 
5=711656cf-a99e-4b2c-8c35-f015ee94889c(ccycloud-2.quasar-ftvxjz.xyz/10.140.45.129)}
 after 10639 ms
java.io.IOException: None of the block data have checksum which means 2(parity)+1 blocks are not present
        at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:156)
        at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:325)
        at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:171)
        at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
        at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2024-03-28 08:37:14,982 WARN 
[ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.replication.ReplicationSupervisor:
 Failed FAILED reconstructECContainersCommand: containerID=1004, 
replication=rs-3-2-1024k, missingIndexes=[4, 5], 
sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(ccycloud-8.quasar-ftvxjz.xyz/10.140.137.128),
 
2=6179347f-5824-41d4-b722-f1dbc5f14880(ccycloud-5.quasar-ftvxjz.xyz/10.140.37.12),
 
3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(ccycloud-1.quasar-ftvxjz.xyz/10.140.40.9)},
 
targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(ccycloud-7.quasar-ftvxjz.xyz/10.140.234.130),
 
5=711656cf-a99e-4b2c-8c35-f015ee94889c(ccycloud-2.quasar-ftvxjz.xyz/10.140.45.129)}
 {code}
*Expected Behavior:* Reconstruction should have happened.
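
This expectation follows from the policy itself: a Reed-Solomon 3+2 group can be rebuilt from any 3 of its 5 replica indexes, and indexes 1, 2 and 3 were still present. A minimal Python sketch of that check is below; the helper name {{is_recoverable}} is hypothetical and this is not how Ozone implements it.

```python
def is_recoverable(data: int, parity: int, available_indexes: set[int]) -> bool:
    """An RS(data, parity) group can be rebuilt from any `data` distinct replica indexes."""
    # Ignore anything outside the valid index range 1..data+parity.
    valid = {i for i in available_indexes if 1 <= i <= data + parity}
    return len(valid) >= data


print(is_recoverable(3, 2, {1, 2, 3}))  # True: the surviving replicas in this report
print(is_recoverable(3, 2, {1, 2}))     # False: three losses exceed the parity count
```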

Note: This is fairly reproducible every time.

cc: [~siddhant] 


