[ https://issues.apache.org/jira/browse/HDDS-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-3559:
--------------------------------
    Target Version/s: 0.6.0
              Labels: Triaged pull-request-available  (was: pull-request-available)

> Datanode doesn't handle java heap OutOfMemory exception 
> --------------------------------------------------------
>
>                 Key: HDDS-3559
>                 URL: https://issues.apache.org/jira/browse/HDDS-3559
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.5.0
>            Reporter: Li Cheng
>            Priority: Major
>              Labels: Triaged, pull-request-available
>
> 2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 for past 0 seconds.
> java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118)
>         at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148)
>         at org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145)
>         at org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy38.submitRequest(Unknown Source)
>         at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)
>  
> On a cluster, one datanode stops reporting to SCM and its state remains
> unknown, even though the datanode process is still running. The log shows a
> Java heap OOM while serializing the protobuf RPC message. Instead of failing,
> the datanode silently stops sending reports to SCM and the process becomes
> stale.
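The trace suggests why the failure goes unnoticed. By the time the OutOfMemoryError reaches the heartbeat task, it has been wrapped in a ServiceException and then an IOException. It is therefore logged as an ordinary "Unable to communicate to SCM" warning.

Below is a minimal sketch, assuming one possible fix: detect an OOM anywhere in the cause chain and treat it as fatal rather than as a retryable communication error. Some parts are hypothetical:
- The names `HeartbeatOomCheck` and `causedByOom` are hypothetical and not part of Ozone.
- A plain `Exception` stands in for `com.google.protobuf.ServiceException`, so the example needs only the JDK.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of Ozone: decides whether a heartbeat
// failure was ultimately caused by an OutOfMemoryError.
public class HeartbeatOomCheck {

  /** Returns true if t or any throwable in its cause chain is an OutOfMemoryError. */
  static boolean causedByOom(Throwable t) {
    // Identity-based set guards against (unusual) cyclic cause chains.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
      if (cur instanceof OutOfMemoryError) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Mimic the chain from the log: IOException <- ServiceException <- OOM.
    // A plain Exception stands in for com.google.protobuf.ServiceException.
    Throwable fromLog = new IOException(
        new Exception(new OutOfMemoryError("Java heap space")));
    // An ordinary connectivity failure that should stay retryable.
    Throwable ordinary = new IOException("Connection refused");

    System.out.println(causedByOom(fromLog));   // true
    System.out.println(causedByOom(ordinary));  // false
  }
}
```

In a real datanode, the fatal branch would need to stop the process, for example with Hadoop's `ExitUtil.terminate`. SCM would then eventually mark the node dead, instead of being left with a live process that no longer heartbeats. The JVM option `-XX:+ExitOnOutOfMemoryError` (JDK 8u92 and later) is a coarser alternative: it exits on the first OOM thrown by any thread.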



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
