[
https://issues.apache.org/jira/browse/TEZ-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor resolved TEZ-4440.
-------------------------------
Resolution: Fixed
> When tez app run in yarn fed cluster, may throw NPE
> ---------------------------------------------------
>
> Key: TEZ-4440
> URL: https://issues.apache.org/jira/browse/TEZ-4440
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Fix For: 0.9.3, 0.10.3
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> On Hadoop versions that predate YARN-8933, when a Tez app runs in a YARN
> federation cluster, getAvailableResources may return null, which causes an NPE.
> {code:java}
> 2022-08-03 01:40:12,069 [ERROR] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Got Error from RMClient
> java.lang.NullPointerException
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
> 2022-08-03 01:40:12,075 [ERROR] [AMRM Callback Handler Thread] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[AMRM Callback Handler Thread,5,main] threw an Exception.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:432)
> Caused by: java.lang.NullPointerException
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428){code}
> In YARN federation, AMRMProxy connects to multiple RMs asynchronously, so
> AllocateResponse::getAvailableResources may return null, which leads to the NPE.
> In my PR, I replace null with Resource.newInstance(0, 0). Because null may mean
> that YARN is busy, returning 0 is reasonable.
>
>
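The null-to-zero substitution described above can be illustrated with a minimal sketch. This is not the actual Tez patch: `HeadroomGuard`, `Res`, `orZero`, and this simplified `fitsIn` are hypothetical names invented for illustration, and `Res` is only a stand-in for YARN's `Resource` so that the snippet compiles without Hadoop on the classpath.

```java
final class HeadroomGuard {

    // Hypothetical stand-in for org.apache.hadoop.yarn.api.records.Resource.
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    private static final Res ZERO = new Res(0, 0);

    // Treat a missing headroom report as "no free resources right now".
    static Res orZero(Res reported) {
        return reported != null ? reported : ZERO;
    }

    // True when `needed` fits inside `available` on both dimensions.
    // Passing a null `available` here would throw a NullPointerException,
    // which is the kind of failure the guard above is meant to avoid.
    static boolean fitsIn(Res needed, Res available) {
        return needed.memoryMb <= available.memoryMb
            && needed.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        Res reported = null;            // assumed case: no headroom reported yet
        Res needed = new Res(1024, 1);  // illustrative request size
        System.out.println(fitsIn(needed, orZero(reported)));
    }
}
```

With the guard in place, a null report is handled like an empty cluster: nothing fits, and the caller can proceed along its normal "no headroom" path instead of crashing the callback thread.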