[ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-244:
-----------------------------

    Description: 
This appeared in one of our clusters at Twitter.

It looks like the slave webui process (which is a fork of mesos-slave) is not 
shutting down properly. 
A couple of things need to happen to fix this:

1) Set FD_CLOEXEC on any opened pipes, because we see a number of file 
descriptors shared between the slave webui and the executors.
2) Explicitly call executor shutdown, to give the executor a chance to clean up.
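
As a rough illustration of item 1, the sketch below marks both ends of a pipe
close-on-exec so that a program started via exec() does not inherit them. It is
written in Python for brevity; mesos-slave itself is C++, where the equivalent
call would be fcntl(fd, F_SETFD, flags | FD_CLOEXEC). The helper names
(set_cloexec, is_cloexec, open_private_pipe) are hypothetical and are not taken
from the Mesos code base.

```python
import fcntl
import os


def set_cloexec(fd):
    """Set FD_CLOEXEC on fd while preserving any other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Return True if fd will be closed automatically on exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def open_private_pipe():
    """Create a pipe whose two ends are not passed on across exec()."""
    read_fd, write_fd = os.pipe()
    # Recent Python versions already create non-inheritable pipes; setting
    # the flag explicitly keeps the intent visible and is harmless.
    set_cloexec(read_fd)
    set_cloexec(write_fd)
    return read_fd, write_fd
```

Note that FD_CLOEXEC only takes effect at exec(), not at fork(): a child that
is forked but never calls exec() still holds copies of the parent's
descriptors and has to close them itself.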


[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
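
The sharing can be read off the two listings above by comparing the NODE
column: a pipe inode that appears under both PIDs is open in both processes.
A small sketch of that comparison, using the listings verbatim; the function
name pipe_inodes is made up for illustration, and the parser assumes the
eight-column layout shown here.

```python
SLAVE_LSOF = """\
mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe
"""

EXECUTOR_LSOF = """\
python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
"""


def pipe_inodes(lsof_output):
    """Return the set of pipe inode numbers found in `lsof | grep FIFO` output."""
    inodes = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Columns as they appear above: COMMAND PID USER FD TYPE DEVICE NODE NAME
        if len(fields) == 8 and fields[4] == "FIFO":
            inodes.add(int(fields[6]))
    return inodes


# Inodes open in both the mesos-slave process and the python2.6 process.
shared = sorted(pipe_inodes(SLAVE_LSOF) & pipe_inodes(EXECUTOR_LSOF))
print(shared)  # [72770782, 72770807, 72770808]
```
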
 

  was:
This appeared in one of our clusters at Twitter.

It looks like the slave webui process (which is a fork of mesos-slave) is not 
properly setting FD_CLOEXEC, because we see a number of file descriptors shared 
between the slave webui and the executors.

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
 

        Summary: Mesos slave process is not shutting down cleanly  (was: Mesos 
webui process is not doing FD_CLOEXEC)
    
> Mesos slave process is not shutting down cleanly
> ------------------------------------------------
>
>                 Key: MESOS-244
>                 URL: https://issues.apache.org/jira/browse/MESOS-244
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>
> This appeared in one of our clusters at Twitter.
> It looks like the slave webui process (which is a fork of mesos-slave) is not 
> shutting down properly. 
> A couple of things need to happen to fix this:
> 1) Set FD_CLOEXEC on any opened pipes, because we see a number of file 
> descriptors shared between the slave webui and the executors.
> 2) Explicitly call executor shutdown, to give the executor a chance to clean up.
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
> mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
> mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
> mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
> mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
> python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
> python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
