[ https://issues.apache.org/jira/browse/HDDS-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arpit Agarwal updated HDDS-3559: -------------------------------- Target Version/s: 0.6.0 Labels: Triaged pull-request-available (was: pull-request-available) > Datanode doesn't handle java heap OutOfMemory exception > -------------------------------------------------------- > > Key: HDDS-3559 > URL: https://issues.apache.org/jira/browse/HDDS-3559 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode > Affects Versions: 0.5.0 > Reporter: Li Cheng > Priority: Major > Labels: Triaged, pull-request-available > > 2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN > org.apache.hadoop.ozone.container.common.statemachine.Endpoi > ntStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 > for past 0 seconds. > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.OutOfMemoryError: Java heap space > at > org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148) > at > org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145) > at > org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: > Java heap space > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy38.submitRequest(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116) > > On a cluster, one datanode stops reporting to SCM while being kept unknown. > The datanode process is still working. Log shows Java heap OOM when it's > serializing protobuf for rpc message. However, datanode silently stops > reports to SCM and the process becomes stale. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org