[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516371#comment-15516371
 ] 

Ian Babrou commented on MESOS-6118:
-----------------------------------

I had to rework your patch a bit to get it to apply on top of master. I then built 1.0.1 
with the resulting fs.cpp:

{noformat}
Sep 23 12:56:49 36com72 mesos-agent[15633]: Failed to perform recovery: Collect 
failed: Unable to unmount volumes for Docker container 
'5ec94354-f785-4d13-b3ef-fb1a37eac007': Failed to get mount table: Cycle found 
in mount table hierarchy through entry '1': 1 1 0:2 / / rw shared:1 - rootfs 
rootfs rw,size=65513288k,nr_inodes=16378322
Sep 23 12:56:49 36com72 mesos-agent[15633]: 17 1 0:17 / /sys 
rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 18 1 0:5 / /proc 
rw,nosuid,nodev,noexec,relatime shared:7 - proc proc rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 19 1 0:6 / /dev rw,nosuid shared:8 
- devtmpfs devtmpfs rw,size=65513304k,nr_inodes=16378326,mode=755
Sep 23 12:56:49 36com72 mesos-agent[15633]: 20 17 0:18 / /sys/kernel/security 
rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 21 17 0:16 / /sys/fs/selinux 
rw,relatime shared:4 - selinuxfs selinuxfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 22 19 0:19 / /dev/shm 
rw,nosuid,nodev shared:9 - tmpfs tmpfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 23 19 0:13 / /dev/pts 
rw,nosuid,noexec,relatime shared:10 - devpts devpts 
rw,gid=5,mode=620,ptmxmode=000
Sep 23 12:56:49 36com72 mesos-agent[15633]: 24 1 0:20 / /run rw,nosuid,nodev 
shared:11 - tmpfs tmpfs rw,mode=755
Sep 23 12:56:49 36com72 mesos-agent[15633]: 25 24 0:21 / /run/lock 
rw,nosuid,nodev,noexec,relatime shared:12 - tmpfs tmpfs rw,size=5120k
Sep 23 12:56:49 36com72 mesos-agent[15633]: 26 17 0:22 / /sys/fs/cgroup 
ro,nosuid,nodev,noexec shared:5 - tmpfs tmpfs ro,mode=755
Sep 23 12:56:49 36com72 mesos-agent[15633]: 27 26 0:23 / /sys/fs/cgroup/systemd 
rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup 
rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd
Sep 23 12:56:49 36com72 mesos-agent[15633]: 28 26 0:24 / /sys/fs/cgroup/cpuset 
rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,cpuset
Sep 23 12:56:49 36com72 mesos-agent[15633]: 29 26 0:25 / 
/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:14 - cgroup 
cgroup rw,cpu,cpuacct
Sep 23 12:56:49 36com72 mesos-agent[15633]: 30 26 0:26 / /sys/fs/cgroup/blkio 
rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,blkio
Sep 23 12:56:49 36com72 mesos-agent[15633]: 31 26 0:27 / /sys/fs/cgroup/memory 
rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,memory
Sep 23 12:56:49 36com72 mesos-agent[15633]: 32 26 0:28 / /sys/fs/cgroup/devices 
rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,devices
Sep 23 12:56:49 36com72 mesos-agent[15633]: 33 26 0:29 / /sys/fs/cgroup/freezer 
rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,freezer
Sep 23 12:56:49 36com72 mesos-agent[15633]: 34 26 0:30 / 
/sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:19 - 
cgroup cgroup rw,net_cls,net_prio
Sep 23 12:56:49 36com72 mesos-agent[15633]: 35 26 0:31 / 
/sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:20 - cgroup 
cgroup rw,perf_event
Sep 23 12:56:49 36com72 mesos-agent[15633]: 36 26 0:32 / /sys/fs/cgroup/hugetlb 
rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,hugetlb
Sep 23 12:56:49 36com72 mesos-agent[15633]: 37 26 0:33 / /sys/fs/cgroup/pids 
rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,pids
Sep 23 12:56:49 36com72 mesos-agent[15633]: 38 18 0:34 / 
/proc/sys/fs/binfmt_misc rw,relatime shared:23 - autofs systemd-1 
rw,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
Sep 23 12:56:49 36com72 mesos-agent[15633]: 39 19 0:35 / /dev/hugepages 
rw,relatime shared:24 - hugetlbfs hugetlbfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 40 17 0:8 / /sys/kernel/debug 
rw,relatime shared:25 - debugfs debugfs rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 41 19 0:15 / /dev/mqueue 
rw,relatime shared:26 - mqueue mqueue rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: 42 1 9:127 / /state rw,relatime 
shared:27 - ext4 /dev/md127 rw,stripe=384,data=ordered
Sep 23 12:56:49 36com72 mesos-agent[15633]: 43 1 0:37 / /srv rw,relatime 
shared:28 - nfs4 10.36.14.18:/srv/hosts/36com72 
rw,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.36.23.25,local_lock=none,addr=10.36.14.18
Sep 23 12:56:49 36com72 mesos-agent[15633]: 44 1 0:37 / /srv-master rw,relatime 
shared:29 - nfs4 10.36.14.18:/srv 
rw,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.36.23.25,local_lock=none,addr=10.36.14.18
Sep 23 12:56:49 36com72 mesos-agent[15633]: 45 38 0:36 / 
/proc/sys/fs/binfmt_misc rw,relatime shared:30 - binfmt_misc binfmt_misc rw
Sep 23 12:56:49 36com72 mesos-agent[15633]: To remedy this do as follows:
Sep 23 12:56:49 36com72 mesos-agent[15633]: Step 1: rm -f 
/state/var/lib/mesos/meta/slaves/latest
Sep 23 12:56:49 36com72 mesos-agent[15633]: This ensures agent doesn't recover 
old live executors.
Sep 23 12:56:49 36com72 mesos-agent[15633]: Step 2: Restart the agent.
{noformat}

I'm on Debian Jessie and Linux 4.4.17.
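
Note that the rootfs entry in the log above lists itself as its own parent (mount ID 1, parent ID 1), and that is the entry the error names. The following is a minimal Python sketch of a parent-chain walk over mountinfo-style lines, written to illustrate how such an entry can be reported as a cycle. It is not the code from fs.cpp or from the patch: the names `parse_entry`, `find_cycle`, `SAMPLE` and the `self_parent_is_root` flag are hypothetical, and treating a self-parented entry as the hierarchy root is only an assumption about one possible way to handle it.

```python
from typing import Dict, List, Optional, Tuple

# Three entries copied from the log above (unwrapped onto single lines).
SAMPLE = [
    "1 1 0:2 / / rw shared:1 - rootfs rootfs rw,size=65513288k,nr_inodes=16378322",
    "17 1 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw",
    "20 17 0:18 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw",
]


def parse_entry(line: str) -> Tuple[int, int, str]:
    """Return (mount ID, parent ID, mount point) from one mountinfo line."""
    fields = line.split()
    return int(fields[0]), int(fields[1]), fields[4]


def find_cycle(lines: List[str], self_parent_is_root: bool = True) -> Optional[int]:
    """Return the ID of the first entry whose parent chain loops, or None.

    With self_parent_is_root=True (an assumption, not confirmed Mesos
    behaviour), an entry that is its own parent ends the walk as the root.
    With False, the same entry is reported as a cycle.
    """
    parents: Dict[int, int] = {}
    for line in lines:
        mount_id, parent_id, _ = parse_entry(line)
        parents[mount_id] = parent_id

    for start in parents:
        visited = {start}
        current = start
        while True:
            parent = parents.get(current)
            if parent is None:
                break  # parent is not in the table: reached the top
            if self_parent_is_root and parent == current:
                break  # self-parented entry treated as the root
            if parent in visited:
                return start  # chain came back to an entry already seen
            visited.add(parent)
            current = parent
    return None


if __name__ == "__main__":
    print(find_cycle(SAMPLE))
    print(find_cycle(SAMPLE, self_parent_is_root=False))
```

With the strict setting the walk flags entry 1, the same entry named in the error message; with the lenient setting the three sample entries form a plain tree.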

> Agent would crash with docker container tasks due to host mount table read.
> ---------------------------------------------------------------------------
>
>                 Key: MESOS-6118
>                 URL: https://issues.apache.org/jira/browse/MESOS-6118
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 1.0.1
>         Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Jamie Briant
>            Assignee: Kevin Klues
>            Priority: Critical
>              Labels: linux, slave
>             Fix For: 1.1.0, 1.0.2
>
>         Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few 
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the 
> slave process crashes every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)