Yarn deployment takes long on some networks

Gyula Fóra Tue, 21 Nov 2017 06:42:40 -0800

Hi all!

Today we started noticing that deploying our jobs takes over 3 minutes from
some machines, but only the usual few seconds from others.


Looking at the logs, it seems that in the slow case the client can't find
the job ID for a few minutes:

...
2017-11-21 15:23:00,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
2017-11-21 15:23:04,528 DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x25eb8e005b7971b after 0ms
2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat sending #38
2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat got value #38
2017-11-21 15:23:04,651 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: allocate took 16ms
2017-11-21 15:23:05,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
2017-11-21 15:23:06,409 DEBUG akka.remote.RemoteWatcher - Sending Heartbeat to [akka.tcp://fl...@splat33.sto.midasplayer.com:56045]
2017-11-21 15:23:06,413 DEBUG akka.remote.RemoteWatcher - Received heartbeat rsp from [akka.tcp://fl...@splat33.sto.midasplayer.com:56045]
2017-11-21 15:23:07,665 DEBUG akka.serialization.Serialization(akka://flink) - Using serializer[akka.serialization.JavaSerializer] for message [org.apache.flink.runtime.clusterframework.messages.GetClusterStatusResponse]
2017-11-21 15:23:07,824 INFO  org.apache.flink.yarn.YarnJobManager - Submitting job 179d67bfab7c4c0b9f00ea772f6e4f0c (event-bifrost-log).
2017
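To compare the slow and fast machines with numbers rather than by eye, one option is to measure, per job ID, the time from the first "not found in JobManager" line to the "Submitting job" line. The snippet below is a minimal sketch, not a Flink tool: it assumes one log entry per line (as in the JobManager log file itself, not the wrapped excerpt) and the timestamp format shown above; the function name submission_delay and the SAMPLE lines are hypothetical illustrations.

```python
import re
from datetime import datetime

# Assumption: each entry starts with a timestamp like "2017-11-21 15:23:00,880",
# matching the excerpt above.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s")


def submission_delay(lines, job_id):
    """Hypothetical helper: seconds between the first 'not found in JobManager'
    line for job_id and the following 'Submitting job' line for it.
    Returns None if either line is missing."""
    first_missing = None
    for line in lines:
        m = TS.match(line)
        if not m or job_id not in line:
            continue  # skip continuation lines and other jobs
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if "not found in JobManager" in line and first_missing is None:
            first_missing = ts
        elif "Submitting job" in line and first_missing is not None:
            return (ts - first_missing).total_seconds()
    return None


JOB = "179d67bfab7c4c0b9f00ea772f6e4f0c"
# Two lines taken from the excerpt above, used only as a demo input.
SAMPLE = [
    "2017-11-21 15:23:00,880 DEBUG org.apache.flink.yarn.YarnJobManager - "
    "Job with ID " + JOB + " not found in JobManager",
    "2017-11-21 15:23:07,824 INFO  org.apache.flink.yarn.YarnJobManager - "
    "Submitting job " + JOB + " (event-bifrost-log).",
]

if __name__ == "__main__":
    print(submission_delay(SAMPLE, JOB))
```

On a full log file, passing the open file object (for example `submission_delay(open(path), JOB)`) works the same way, since iterating a file yields one line at a time. Running it on logs from a slow and a fast machine would show whether the whole extra time is spent in this lookup phase.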

Interestingly enough, nothing like this shows up when deploying from the
other servers.
We suspect some strange network issue (one that doesn't seem to affect jar
upload times) that interferes with Akka in some way.

Any idea how to debug this?
Thank you!

Gyula
