[ https://issues.apache.org/jira/browse/HDDS-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar Asawa resolved HDDS-7909.
---------------------------------------
    Resolution: Resolved

Resolving, as the sub-tasks are done.

> When DN is offline Read of EC data is failing [Failed to execute command 
> GetBlock on the Pipeline]
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-7909
>                 URL: https://issues.apache.org/jira/browse/HDDS-7909
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Arun Sarin
>            Priority: Major
>
> When a DN is offline, reading EC data fails with the following error message:
> {code:java}
> GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block
> {code}
> Stack Trace:
> {code:java}
> 2023-02-03 14:05:31,610|INFO|MainThread|machine.py:188 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|RUNNING: /opt/cloudera/parcels/CDH/bin/ozone sh key get o3://ozone1/vol-x20w7/enc-buck-3yp31/decom_1675432802 /tmp/Get_file1675433131
> 2023-02-03 14:05:35,968|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:35 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
> 2023-02-03 14:05:36,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2023-02-03 14:05:36,041|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
> 2023-02-03 14:05:36,937|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 4b386868-0719-4e2f-bd3b-bda45c921f97, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.904Z[Etc/UTC]].
> 2023-02-03 14:05:36,938|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=4b386868-0719-4e2f-bd3b-bda45c921f97: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:36,980|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 34a3c677-ed98-428f-a0d9-a19f73f93116, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.970Z[Etc/UTC]].
> 2023-02-03 14:05:36,981|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=34a3c677-ed98-428f-a0d9-a19f73f93116: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,014|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: aee4853d-cc99-43c8-a682-2dc4ad322242, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.003Z[Etc/UTC]].
> 2023-02-03 14:05:37,016|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=aee4853d-cc99-43c8-a682-2dc4ad322242: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,039|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,034 [main] WARN io.ECBlockInputStreamProxy (ECBlockInputStreamProxy.java:read(180)) - Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockInputStreamProxy: Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,057|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
> 2023-02-03 14:05:37,058|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
> 2023-02-03 14:05:37,185|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: b3a482a5-33c9-40dd-8614-bfc136ec4479, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.163Z[Etc/UTC]].
> 2023-02-03 14:05:37,187|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=b3a482a5-33c9-40dd-8614-bfc136ec4479: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,229|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d18620b2-70cb-4f07-95b7-45d69980f100, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.220Z[Etc/UTC]].
> 2023-02-03 14:05:37,230|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d18620b2-70cb-4f07-95b7-45d69980f100: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,260|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 08dd8828-a81e-44ac-8757-aa1b66df2c72, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.250Z[Etc/UTC]].
> 2023-02-03 14:05:37,261|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=08dd8828-a81e-44ac-8757-aa1b66df2c72: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,282|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,279 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,284|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,331|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 59920096-eac8-40bd-86c6-4a2fb44edfc7, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.290Z[Etc/UTC]].
> 2023-02-03 14:05:37,333|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=59920096-eac8-40bd-86c6-4a2fb44edfc7: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,362|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d8b18f5b-1fbe-4493-b370-08e22eb0e64d, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.351Z[Etc/UTC]].
> 2023-02-03 14:05:37,364|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d8b18f5b-1fbe-4493-b370-08e22eb0e64d: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,390|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 78e3a9ff-df9d-4cbf-a584-b73254e06ce8, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.380Z[Etc/UTC]].
> 2023-02-03 14:05:37,392|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=78e3a9ff-df9d-4cbf-a584-b73254e06ce8: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,411|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,409 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,413|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-02-03 14:05:37,442|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block
> {code}
> Additional debugging (RCA) found that a sufficient number of DNs was available at the time of the key get operations. Details:
>
> EC requires 7 DNs, and 7 are present.
> Ratis requires 3 DNs, and 3 are present.
> EC datanodes (7):
> [u'hostname-1.hostname.root.hwx.site', u'hostname-7.hostname.root.hwx.site', u'hostname-2.hostname.root.hwx.site', u'hostname-6.hostname.root.hwx.site', u'hostname-3.hostname.root.hwx.site', u'hostname-5.hostname.root.hwx.site', u'hostname-8.hostname.root.hwx.site']
> Ratis datanodes available at this point (5):
> [u'hostname-2.hostname.root.hwx.site', u'hostname-3.hostname.root.hwx.site', u'hostname-1.hostname.root.hwx.site', u'hostname-7.hostname.root.hwx.site', u'hostname-6.hostname.root.hwx.site']
> The log files are attached.
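The last line of the client log means the reader could not collect enough healthy replicas to reconstruct the block group. Before that, GetBlock failed against three distinct datanodes (quasar-tgmmij-1, -2 and -3).

The sketch below is a rough model only. It assumes a Reed-Solomon RS(d, p) layout in which any d of the d + p replicas are enough to decode, and that each failing node held one replica of the group. It also ignores the partial-stripe case of blocks smaller than a full stripe. Under those assumptions, the sufficiency condition reduces to a count.

The function `can_reconstruct` is hypothetical and is not Ozone's client API. The report also does not state the bucket's EC replication config, so the sketch compares two Ozone schemes, rs-3-2 and rs-6-3.

```python
def can_reconstruct(data: int, parity: int, unreachable: int) -> bool:
    """Hypothetical check: is an RS(data, parity) block group still readable?

    Assumes a full stripe (each of the data + parity indexes holds one
    replica) and that any `data` reachable replicas are enough to decode.
    """
    total = data + parity
    if not 0 <= unreachable <= total:
        raise ValueError(f"unreachable must be between 0 and {total}")
    # Decoding needs at least `data` surviving replicas.
    return total - unreachable >= data


if __name__ == "__main__":
    # The log shows GetBlock failing against three distinct datanodes.
    for data, parity in [(3, 2), (6, 3)]:
        ok = can_reconstruct(data, parity, unreachable=3)
        verdict = "readable" if ok else "insufficient datanodes"
        print(f"rs-{data}-{parity} with 3 unreachable replicas: {verdict}")
```

Under this simplification, three unreachable replicas are fatal for rs-3-2 (2 of 5 remain, 3 needed) but not for rs-6-3 (6 of 9 remain). This only restates the decoding threshold. It does not explain why datanodes that the RCA reported as available returned UNAVAILABLE to the client.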



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
