[ https://issues.apache.org/jira/browse/HDDS-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Kumar Asawa resolved HDDS-7909.
---------------------------------------
    Resolution: Resolved

Resolving as sub tasks are done

> When DN is offline Read of EC data is failing [Failed to execute command GetBlock on the Pipeline]
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-7909
>                 URL: https://issues.apache.org/jira/browse/HDDS-7909
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Arun Sarin
>            Priority: Major
>
> When DN is offline Read of EC data is failing
> Getting the below error message:
> {code:java}
> GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block
> {code}
> Stack Trace:
> {code:java}
> 2023-02-03 14:05:31,610|INFO|MainThread|machine.py:188 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|RUNNING: /opt/cloudera/parcels/CDH/bin/ozone sh key get o3://ozone1/vol-x20w7/enc-buck-3yp31/decom_1675432802 /tmp/Get_file1675433131
> 2023-02-03 14:05:35,968|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:35 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
> 2023-02-03 14:05:36,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2023-02-03 14:05:36,041|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
> 2023-02-03 14:05:36,937|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 4b386868-0719-4e2f-bd3b-bda45c921f97, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.904Z[Etc/UTC]].
> 2023-02-03 14:05:36,938|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=4b386868-0719-4e2f-bd3b-bda45c921f97: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:36,980|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 34a3c677-ed98-428f-a0d9-a19f73f93116, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.970Z[Etc/UTC]].
> 2023-02-03 14:05:36,981|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=34a3c677-ed98-428f-a0d9-a19f73f93116: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,014|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: aee4853d-cc99-43c8-a682-2dc4ad322242, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.003Z[Etc/UTC]].
> 2023-02-03 14:05:37,016|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=aee4853d-cc99-43c8-a682-2dc4ad322242: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,039|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,034 [main] WARN io.ECBlockInputStreamProxy (ECBlockInputStreamProxy.java:read(180)) - Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockInputStreamProxy: Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,057|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
> 2023-02-03 14:05:37,058|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
> 2023-02-03 14:05:37,185|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: b3a482a5-33c9-40dd-8614-bfc136ec4479, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.163Z[Etc/UTC]].
> 2023-02-03 14:05:37,187|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=b3a482a5-33c9-40dd-8614-bfc136ec4479: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,229|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d18620b2-70cb-4f07-95b7-45d69980f100, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.220Z[Etc/UTC]].
> 2023-02-03 14:05:37,230|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d18620b2-70cb-4f07-95b7-45d69980f100: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,260|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 08dd8828-a81e-44ac-8757-aa1b66df2c72, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.250Z[Etc/UTC]].
> 2023-02-03 14:05:37,261|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=08dd8828-a81e-44ac-8757-aa1b66df2c72: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,282|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,279 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,284|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,331|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 59920096-eac8-40bd-86c6-4a2fb44edfc7, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.290Z[Etc/UTC]].
> 2023-02-03 14:05:37,333|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=59920096-eac8-40bd-86c6-4a2fb44edfc7: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,362|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d8b18f5b-1fbe-4493-b370-08e22eb0e64d, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.351Z[Etc/UTC]].
> 2023-02-03 14:05:37,364|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d8b18f5b-1fbe-4493-b370-08e22eb0e64d: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,390|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 78e3a9ff-df9d-4cbf-a584-b73254e06ce8, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.380Z[Etc/UTC]].
> 2023-02-03 14:05:37,392|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=78e3a9ff-df9d-4cbf-a584-b73254e06ce8: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,411|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,409 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,413|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,442|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block
> {code}
> Additional debugging (RCA) was done and found that a sufficient number of DNs were available at the time of the key get operation.
> Below are the details:
>
> EC DNs are supposed to be 7, and there are 7.
> RATIS DNs have to be 3, and there are 3.
> EC Data node -
> Datanodes':[u'hostname-1.hostname.root.hwx.site', u'hostname-7.hostname.root.hwx.site', u'hostname-2.hostname.root.hwx.site', u'hostname-6.hostname.root.hwx.site', u'hostname-3.hostname.root.hwx.site', u'hostname-5.hostname.root.hwx.site', u'hostname-8.hostname.root.hwx.site'],
> Ratis DNs available at this point: 5
> [u'hostname-2.hostname.root.hwx.site', u'hostname-3.hostname.root.hwx.site', u'hostname-1.hostname.root.hwx.site', u'hostname-7.hostname.root.hwx.site', u'hostname-6.hostname.root.hwx.site']
> Adding the log files
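As a rough aid for reading the trace in this report, here is a minimal Python sketch that tallies which datanodes failed GetBlock and which EC indexes the reconstruction read excluded. It is not part of Ozone: the function name `summarize_getblock_failures` and both regular expressions are assumptions derived only from the log lines quoted here, so other Ozone versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern assumed from the XceiverClientGrpc error lines quoted in this report.
GETBLOCK_FAILURE = re.compile(
    r"Failed to execute command GetBlock on the pipeline\s+"
    r"Pipeline\[ Id: [0-9a-f-]{36}, Nodes:\s*"
    r"[0-9a-f-]{36}\(([^/\s]+)/[\d.]+\)"
)
# Pattern assumed from the ECBlockReconstructedStripeInputStream warnings.
EXCLUDED_INDEX = re.compile(r"EC index (\d+)\. Excluding the block")


def summarize_getblock_failures(log_text):
    """Return (Counter of failing hosts, sorted list of excluded EC indexes)."""
    # Drop mail quote markers ("> ") that wrap long lines in the Jira email.
    text = re.sub(r"\s+>\s+", " ", log_text)
    hosts = Counter(m.group(1) for m in GETBLOCK_FAILURE.finditer(text))
    excluded = sorted({int(i) for i in EXCLUDED_INDEX.findall(text)})
    return hosts, excluded
```

In the trace quoted in this report, GetBlock fails three times each against quasar-tgmmij-1, quasar-tgmmij-2 and quasar-tgmmij-3, and EC indexes 5 and 4 are excluded before the client reports insufficient datanodes.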