[jira] [Commented] (MESOS-4279) Graceful restart of docker task

2016-04-15 Thread Tyson Norris (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243232#comment-15243232 ]

Tyson Norris commented on MESOS-4279:
-------------------------------------

Thanks for the updates.
One note I wanted to add: we see exactly what [~bydga] describes above in the 
"there are actually 2 bugs" comment:
- task stdout is truncated (compared to the docker container's json.log)
- task status ends up KILLED (instead of FINISHED)

For example, regarding "You are calling the run->discard method (which causes 
the stderr/stdout streams to be closed) too early - during the "stopping 
period" the container can (and usually will) write something about the 
termination": if I check the docker container's log file on disk, it contains 
a series of lines emitted during shutdown, so I can see that "docker stop" is 
called and the container does perform a graceful shutdown. HOWEVER, the task 
stdout receives none of those lines after docker stop is called.
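
To make this concrete, here is the kind of check we ran - a minimal sketch, 
not Mesos code; the container id, the sandbox path, and the default json-file 
log driver location are all placeholders/assumptions:

{code}
#!/usr/bin/python
# Hypothetical helper: compare what docker captured in its json-file log
# against what actually reached the Mesos task sandbox stdout.
# Both paths below are placeholders for the real ones on the slave.
import json

CONTAINER_ID = "abc123"  # placeholder - full docker container id
DOCKER_LOG = "/var/lib/docker/containers/%s/%s-json.log" % (
    CONTAINER_ID, CONTAINER_ID)
SANDBOX_STDOUT = "/tmp/mesos/slaves/S1/frameworks/F1/executors/E1/runs/latest/stdout"  # placeholder

# each json.log line is an object like {"log": "...", "stream": "stdout", "time": "..."}
with open(DOCKER_LOG) as f:
    docker_lines = [json.loads(line)["log"] for line in f if line.strip()]

with open(SANDBOX_STDOUT) as f:
    sandbox_lines = f.readlines()

# crude offset comparison: the tail docker saw but the sandbox never received
missing = docker_lines[len(sandbox_lines):]
print "%d lines missing from task stdout:" % len(missing)
for line in missing:
    print line.rstrip()
{code}

Against our containers, it is exactly the shutdown-time lines (everything 
after docker stop) that show up as missing.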



> Graceful restart of docker task
> -------------------------------------
>
>                 Key: MESOS-4279
>                 URL: https://issues.apache.org/jira/browse/MESOS-4279
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0, 0.26.0, 0.27.2
>            Reporter: Martin Bydzovsky
>            Assignee: Qian Zhang
>              Labels: docker, mesosphere
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I 
> came across the following issue:
> (it was already discussed in 
> https://github.com/mesosphere/marathon/issues/2876, and the guys from 
> mesosphere got to the point that it's probably a docker containerizer 
> problem...)
> To sum it up:
> When I deploy a simple Python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like this:
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result - the task receives SIGTERM 
> and dies peacefully (within my script-specified 2-second period).
> But when I wrap this python script in a docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application via Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>     type: "DOCKER",
>     docker: {
>       image: "bydga/marathon-test-api"
>     },
>     forcePullImage: true
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> During a restart (issued from Marathon), the task dies immediately without 
> having a chance to do any cleanup.





[jira] [Commented] (MESOS-4279) Graceful restart of docker task

2016-04-14 Thread Tyson Norris (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241564#comment-15241564 ]

Tyson Norris commented on MESOS-4279:
-------------------------------------

We are seeing this as well.
* on the mesos-slave we use: --docker_stop_timeout=50secs
* outside of Mesos, running "docker stop <container>" produces some logged 
output from the container, based on the container process handling the 
SIGTERM signal (see the reproduction sketch below)
* inside of Mesos, when the task is stopped by Marathon, no output is 
generated

Is there any issue reproducing this?
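
For anyone trying to reproduce this outside of Mesos, a minimal sketch of 
what we did - only the standard docker CLI is used; the image is the one from 
the issue description, and the 50s grace period mirrors our 
--docker_stop_timeout:

{code}
#!/usr/bin/python
# Reproduction sketch outside of Mesos: start the container, stop it with a
# grace period, and check whether the shutdown lines made it into the log.
import subprocess
import time

IMAGE = "bydga/marathon-test-api"  # image from the issue description

container = subprocess.check_output(
    ["docker", "run", "-d", IMAGE, "./script.py"]).strip()
time.sleep(5)  # let it print a few iterations first

# equivalent of --docker_stop_timeout=50secs: SIGTERM, then SIGKILL after 50s
subprocess.check_call(["docker", "stop", "-t", "50", container])

# run this way, the "got 15" / "ending" lines from the SIGTERM handler do
# appear in the captured output - unlike the task stdout under Mesos
print subprocess.check_output(["docker", "logs", container])
{code}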






[jira] [Commented] (MESOS-2587) libprocess should allow configuration of ip/port separate from the ones it binds to

2015-06-21 Thread Tyson Norris (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595313#comment-14595313 ]

Tyson Norris commented on MESOS-2587:
-------------------------------------

We see a similar problem with slaves, where the slave MESOS_HOSTNAME is either 
reachable by the master (using the private IP) or the browser UI works (using 
the public IP) - but we cannot make both work properly at the same time. 
Generally, anywhere that HOSTNAME or IP is configurable there should ideally 
be both a public and a private value, for networks that expose different 
addresses depending on the actual client.
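
As a sketch of the bind/advertise split being asked for here - assuming the 
LIBPROCESS_ADVERTISE_IP / LIBPROCESS_ADVERTISE_PORT environment variables 
that came out of this ticket; all addresses, ports, and the framework binary 
below are placeholders:

{code}
#!/usr/bin/python
# Launch a framework that binds to the container-local address but
# advertises the externally reachable host address to the master.
import os
import subprocess

env = dict(os.environ)
env["LIBPROCESS_IP"] = "172.17.0.2"          # bridge address to bind inside the container
env["LIBPROCESS_PORT"] = "9090"              # port bound inside the container
env["LIBPROCESS_ADVERTISE_IP"] = "10.0.1.5"  # host address reachable by the master
env["LIBPROCESS_ADVERTISE_PORT"] = "31090"   # host port mapped to container port 9090

subprocess.check_call(["./my-framework"], env=env)  # placeholder scheduler binary
{code}

The same public/private split would help the slave case above: report one 
address to the master and expose another to browser clients.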

> libprocess should allow configuration of ip/port separate from the ones it 
> binds to
> -------------------------------------
>
>                 Key: MESOS-2587
>                 URL: https://issues.apache.org/jira/browse/MESOS-2587
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>            Reporter: Cosmin Lehene
>
> Currently libprocess will advertise {{LIBPROCESS_IP}}:{{LIBPROCESS_PORT}}, 
> but if a framework runs in a container without an interface that has a 
> publicly accessible IP (e.g. a container in bridge mode), it will advertise 
> an IP that will not be reachable by the master.
> With this, we could advertise the external IP of the bridge (reachable from 
> the master) from within a container.
> This should allow frameworks running in containers to work in the safer 
> bridged mode.


