Re: flink on yarn job crashes and restarts repeatedly
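You could check whether the JobManager ran out of memory and was OOM-killed. If you have more logs, please post them as well.

Best,
Weihua

On Mon, Jul 18, 2022 at 8:41 PM SmileSmile wrote:
> hi, all
> I've run into the following scenario: Flink on YARN, parallelism 3000, and the job contains multiple agg operations. When the job recovers from a checkpoint or savepoint, it consistently fails to restore and keeps restarting.
> The JM reports: org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
>
> Are there any good approaches for troubleshooting this?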
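To make the "was the JobManager OOM-killed?" check a bit more concrete, here is a minimal Python sketch that scans collected log lines for a few markers. Only the SIGTERM line comes from this thread. The other patterns (a YARN "running beyond ... memory limits" message, `java.lang.OutOfMemoryError`, and exit code 143) are assumptions about what the JobManager or NodeManager logs might contain, and the names `MARKERS` and `scan_log` are made up for illustration.

```python
import re

# Markers to look for. Only "sigterm" is taken from the thread; the rest are
# assumed patterns that may or may not appear in your logs.
MARKERS = {
    "sigterm": re.compile(r"RECEIVED\s+SIGNAL\s+15:\s+SIGTERM"),
    "yarn_memory_limit": re.compile(r"running beyond (physical|virtual) memory limits"),
    "java_oom": re.compile(r"java\.lang\.OutOfMemoryError"),
    "exit_code_143": re.compile(r"exit code(?: is)? 143", re.IGNORECASE),
}


def scan_log(lines):
    """Return {marker name: [1-based line numbers where the marker matched]}."""
    hits = {name: [] for name in MARKERS}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py app.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for name, linenos in scan_log(f).items():
            print(f"{name}: {len(linenos)} hit(s) at lines {linenos[:10]}")
```

The aggregated logs could be dumped first with something like `yarn logs -applicationId <application id> > app.log`. If the memory-limit marker shows up next to the SIGTERM, one thing to try would be raising `jobmanager.memory.process.size`; if it does not, the SIGTERM probably has another cause and more logs are needed.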
flink on yarn job crashes and restarts repeatedly
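hi, all

I've run into the following scenario: Flink on YARN, parallelism 3000, and the job contains multiple agg operations. When the job recovers from a checkpoint or savepoint, it consistently fails to restore and keeps restarting.

The JM reports: org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Are there any good approaches for troubleshooting this?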