I upgraded Go's build system to GKE 1.4.6 the other day.

Since then, Docker has been dying and the GKE nodes have been going unhealthy
(their kubelet.service crash-loops on boot, looking for Docker).

$ kubectl get nodes
NAME                                       STATUS     AGE
gke-buildlets-default-pool-a4a240f8-0pf0   NotReady   10h
gke-buildlets-default-pool-a4a240f8-7uwv   NotReady   10h
gke-buildlets-default-pool-a4a240f8-spbd   NotReady   10h

..

  Ready                 Unknown         Wed, 23 Nov 2016 10:12:50 +0000
    Wed, 23 Nov 2016 10:13:35 +0000         NodeStatusUnknown
Kubelet stopped posting node status.

....

If I ssh into one of the nodes and run sudo journalctl -f, I see
kubelet.service repeatedly crash looping with:

Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv systemd[602]:
kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv systemd[602]:
kubelet.service: Unit entered failed state.
Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv systemd[602]:
kubelet.service: Failed with result 'exit-code'.
Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv kubelet[9136]:
error: failed to run Kubelet: failed to create kubelet: failed to get
runtime version: docker: failed to get docker version: Cannot connect to the
Docker daemon. Is the docker daemon running on this host?


And yup, Docker is dead:

# journalctl -u docker.service --since="10 hours ago"
....
....
Nov 23 11:40:28 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]:
time="2016-11-23T11:40:28.546703577Z" level=warning msg="container
464da48e2869e518e7bc5e052ced1e25b8ce70ba15168d1d34599b825dc61519 restart
canceled"
Nov 23 11:40:28 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]:
time="2016-11-23T11:40:28.766556953Z" level=warning msg="container
d1f1425cfe29a34783668a39bb50ad5db3f218a7d0caae12d8ba81387776b708 restart
canceled"
*Nov 23 11:40:33 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]:
time="2016-11-23T11:40:33.516533433Z" level=error msg="Force shutdown
daemon"*
*Nov 23 11:40:33 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]:
time="2016-11-23T11:40:33Z" level=info msg="stopping containerd after
receiving terminated"*
*Nov 23 11:40:33 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]:
time="2016-11-23T11:40:33Z" level=fatal msg="containerd: serve grpc"
error="accept unix /var/run/docker/libcontainerd/docker-containerd.sock:
use of closed network connection"*
Nov 23 11:40:34 gke-buildlets-default-pool-a4a240f8-7uwv sh[4983]: + [ ! -s
/var/lib/docker/repositories-overlay ]
Nov 23 11:40:34 gke-buildlets-default-pool-a4a240f8-7uwv sh[4983]: + rm -f
/var/lib/docker/repositories-overlay
Nov 23 11:40:59 gke-buildlets-default-pool-a4a240f8-7uwv sh[5113]: + [ ! -s
/var/lib/docker/repositories-overlay ]
Nov 23 11:40:59 gke-buildlets-default-pool-a4a240f8-7uwv sh[5113]: + rm -f
/var/lib/docker/repositories-overlay
....
Nov 23 11:47:35 gke-buildlets-default-pool-a4a240f8-7uwv sh[15220]: + [ !
-s /var/lib/docker/repositories-overlay ]
Nov 23 11:47:35 gke-buildlets-default-pool-a4a240f8-7uwv sh[15220]: + rm -f
/var/lib/docker/repositories-overlay
Nov 23 11:47:36 gke-buildlets-default-pool-a4a240f8-7uwv docker[15226]:
time="2016-11-23T11:47:36.494367508Z" level=fatal msg=*"Error starting
daemon: Error initializing network controller: Error creating default
\"bridge\" network: failed to allocate gateway (169.254.123.1): Address
already in use"*
Nov 23 11:47:36 gke-buildlets-default-pool-a4a240f8-7uwv sh[15264]: + [ !
-s /var/lib/docker/repositories-overlay ]
Nov 23 11:47:36 gke-buildlets-default-pool-a4a240f8-7uwv sh[15264]: + rm -f
/var/lib/docker/repositories-overlay
Nov 23 11:47:37 gke-buildlets-default-pool-a4a240f8-7uwv docker[15270]:
time="2016-11-23T11:47:37.607995725Z" level=fatal msg="Error starting
daemon: Error initializing network controller: Error creating default
\"bridge\" network: failed to allocate gateway (169.254.123.1): Address
already in use"

And I can't restart docker:

# systemctl start docker.service
Nov 23 16:53:55 gke-buildlets-default-pool-a4a240f8-7uwv sh[15731]: + [ !
-s /var/lib/docker/repositories-overlay ]
Nov 23 16:53:55 gke-buildlets-default-pool-a4a240f8-7uwv sh[15731]: + rm -f
/var/lib/docker/repositories-overlay
Nov 23 16:53:56 gke-buildlets-default-pool-a4a240f8-7uwv docker[15737]:
time="2016-11-23T16:53:56.413732453Z" level=fatal msg="Error starting
daemon: Error initializing network controller: Error creating default
\"bridge\" network: failed to allocate gateway (169.254.123.1): Address
already in use"
Job for docker.service failed because the control process exited with error
code. See "systemctl status docker.service" and "journalctl -xe" for
details.
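
One check that might narrow this down, as a sketch only: the fatal error says
the gateway address 169.254.123.1 is already in use, so it may be worth seeing
which interface (if any) currently holds it. The helper name find_addr_owner
and the docker0 sample line in the comment are made up for illustration; the
only real command assumed is "ip -o -4 addr show" from iproute2.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print the name of every
# interface that holds the given IPv4 address.
# Reads `ip -o -4 addr show` style lines on stdin, one address per line, e.g.
#   3: docker0    inet 169.254.123.1/24 scope global docker0
find_addr_owner() {
    awk -v want="$1" '
        $3 == "inet" {
            split($4, parts, "/")      # strip the /prefix-length
            if (parts[1] == want) print $2
        }'
}

# On a node, as root:
#   ip -o -4 addr show | find_addr_owner 169.254.123.1
```

If this prints nothing, no interface currently has the address and the
conflict would have to come from somewhere else.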


# docker --version
Docker version 1.11.2, build 4dc5990


Any clues?

Is this a known issue?

  • [kubernetes-use... Brad Fitzpatrick