[jira] [Updated] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environmnent_variables

2015-10-16 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3751:
-
Summary: MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with 
--executor_environmnent_variables  (was: MESOS_NATIVE_JAVA_LIBRARY not set on 
MesosContainerizre tasks with --executor_environmnent_variables)

> MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with 
> --executor_environmnent_variables
> ---
>
> Key: MESOS-3751
> URL: https://issues.apache.org/jira/browse/MESOS-3751
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.24.1, 0.25.0
>Reporter: Cody Maloney
>Assignee: Gilbert Song
>  Labels: mesosphere, newbie
>
> When --executor_environment_variables is used and MESOS_NATIVE_JAVA_LIBRARY is 
> set in the mesos-slave's environment, the mesos containerizer does not set 
> MESOS_NATIVE_JAVA_LIBRARY itself.
> Relevant code: 
> https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281
> It checks whether the variable is present in the mesos-slave's own environment 
> (via os::getenv), rather than checking whether it is set in the configured 
> environment variable set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`

2015-10-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3506:
-
Story Points: 1
  Labels: documentation mesosphere  (was: )

> Build instructions for CentOS 6.6 should include `sudo yum update`
> --
>
> Key: MESOS-3506
> URL: https://issues.apache.org/jira/browse/MESOS-3506
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: documentation, mesosphere
>
> Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the 
> build to break when building {{mesos-0.25.0.jar}}. The build instructions for 
> this platform on the Getting Started page should be changed accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3706) Tasks stuck in staging.

2015-10-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961026#comment-14961026
 ] 

haosdent commented on MESOS-3706:
-

Do you have stdout and stderr logs?

> Tasks stuck in staging.
> ---
>
> Key: MESOS-3706
> URL: https://issues.apache.org/jira/browse/MESOS-3706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, slave
>Affects Versions: 0.23.0, 0.24.1
>Reporter: Jord Sonneveld
> Attachments: Screen Shot 2015-10-12 at 9.08.30 AM.png, Screen Shot 
> 2015-10-12 at 9.24.32 AM.png, mesos-slave.INFO, mesos-slave.INFO.2, 
> mesos-slave.INFO.3
>
>
> I have a docker image which starts fine on all my slaves except for one.  On 
> that one, it is stuck in STAGING for a long time and never starts.  The INFO 
> log is full of messages like this:
> I1012 16:02:09.210306 34905 slave.cpp:1768] Asked to kill task 
> kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72 of framework 
> 20150109-172016-504433162-5050-19367-0002
> E1012 16:02:09.211272 34907 socket.hpp:174] Shutdown failed on fd=12: 
> Transport endpoint is not connected [107]
> kwe-vinland-work is the task that is stuck in staging.  It is launched by 
> marathon.  I have launched 161 instances successfully on my cluster.  But it 
> refuses to launch on this specific slave.
> These machines are all managed via ansible so their configurations are / 
> should be identical.  I have re-run my ansible scripts and rebooted the 
> machines to no avail.
> It's been in this state for almost 30 minutes.  You can see the mesos docker 
> executor is still running:
> jord@dalstgmesos03:~$ date
> Mon Oct 12 16:13:55 UTC 2015
> jord@dalstgmesos03:~$ ps auwx | grep kwe-vinland
> root 35360  0.0  0.0 1070576 21476 ?   Ssl  15:46   0:00 
> mesos-docker-executor 
> --container=mesos-20151012-082619-4145023498-5050-22623-S0.0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --docker=docker --help=false --mapped_directory=/mnt/mesos/sandbox 
> --sandbox_directory=/data/mesos/mesos/work/slaves/20151012-082619-4145023498-5050-22623-S0/frameworks/20150109-172016-504433162-5050-19367-0002/executors/kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72/runs/0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --stop_timeout=0ns
> According to docker ps -a, nothing was ever even launched:
> jord@dalstgmesos03:/data/mesos$ sudo docker ps -a
> CONTAINER IDIMAGE  
> COMMAND  CREATED STATUS  PORTS
> NAMES
> 5c858b90b0a0registry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   39 minutes ago  Up 39 minutes   
> 0.0.0.0:9125->8125/udp, 0.0.0.0:9126->8126/tcp   statsd-fe-influxdb
> d765ba3829fdregistry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   41 minutes ago  Up 41 minutes   
> 0.0.0.0:8125->8125/udp, 0.0.0.0:8126->8126/tcp   statsd-repeater
> Those are the only two entries. Nothing about the kwe-vinland job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3752) CentOS 6 dependency install fails at Maven

2015-10-16 Thread Greg Mann (JIRA)
Greg Mann created MESOS-3752:


 Summary: CentOS 6 dependency install fails at Maven
 Key: MESOS-3752
 URL: https://issues.apache.org/jira/browse/MESOS-3752
 Project: Mesos
  Issue Type: Documentation
Reporter: Greg Mann
Assignee: Greg Mann


It seems the Apache Maven dependencies have changed such that following the 
Getting Started docs for CentOS 6.6 will fail at Maven installation:

{code}
---> Package apache-maven.noarch 0:3.3.3-2.el6 will be installed
--> Processing Dependency: java-devel >= 1:1.7.0 for package: 
apache-maven-3.3.3-2.el6.noarch
--> Finished Dependency Resolution
Error: Package: apache-maven-3.3.3-2.el6.noarch (epel-apache-maven)
   Requires: java-devel >= 1:1.7.0
   Available: java-1.5.0-gcj-devel-1.5.0.0-29.1.el6.x86_64 (base)
   java-devel = 1.5.0
   Available: 1:java-1.6.0-openjdk-devel-1.6.0.35-1.13.7.1.el6_6.x86_64 
(base)
   java-devel = 1:1.6.0
   Available: 1:java-1.6.0-openjdk-devel-1.6.0.36-1.13.8.1.el6_7.x86_64 
(updates)
   java-devel = 1:1.6.0
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2079) IO.Write test is flaky on OS X 10.10.

2015-10-16 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961039#comment-14961039
 ] 

James Peach commented on MESOS-2079:


As per [~bmahler] on IRC, we should ignore {{SIGPIPE}} globally at libprocess 
initialization and remove the various {{SIGPIPE}} suppression dances.
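
For reference, a minimal sketch of the global suppression being proposed; 
exactly where this would hook into libprocess initialization is an assumption 
here, not a description of the actual code:
{code}
#include <signal.h>

// Ignore SIGPIPE process-wide: writes to a closed socket or pipe then fail
// with EPIPE instead of terminating the process.
static void ignoreSigpipe()
{
  struct sigaction action = {};
  action.sa_handler = SIG_IGN;
  sigaction(SIGPIPE, &action, nullptr);
}

int main()
{
  ignoreSigpipe();
  return 0;
}
{code}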

> IO.Write test is flaky on OS X 10.10.
> -
>
> Key: MESOS-2079
> URL: https://issues.apache.org/jira/browse/MESOS-2079
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, technical debt, test
> Environment: OS X 10.10
> {noformat}
> $ clang++ --version
> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
> Target: x86_64-apple-darwin14.0.0
> Thread model: posix
> {noformat}
>Reporter: Benjamin Mahler
>Assignee: James Peach
>  Labels: flaky
>
> [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. 
> Unfortunately, we don't have a stacktrace for SIGPIPE currently:
> {noformat}
> [ RUN  ] IO.Write
> make[5]: *** [check-local] Broken pipe: 13
> {noformat}
> Running in gdb, seems to always occur here:
> {code}
> Program received signal SIGPIPE, Broken pipe.
> [Switching to process 56827 thread 0x60b]
> 0x7fff9a011132 in __psynch_cvwait ()
> (gdb) where
> #0  0x7fff9a011132 in __psynch_cvwait ()
> #1  0x7fff903e7ea0 in _pthread_cond_wait ()
> #2  0x00010062f27c in Gate::arrive (this=0x101908a10, old=14780) at 
> gate.hpp:82
> #3  0x000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
> #4  0x7fff903e72fc in _pthread_body ()
> #5  0x7fff903e7279 in _pthread_start ()
> #6  0x7fff903e54b1 in thread_start ()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3365) Export per container SNMP statistics

2015-10-16 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961098#comment-14961098
 ] 

Ian Downes commented on MESOS-3365:
---

Reviewed. [~wangcong], please address the comments and I'll review again.

> Export per container SNMP statistics
> 
>
> Key: MESOS-3365
> URL: https://issues.apache.org/jira/browse/MESOS-3365
> Project: Mesos
>  Issue Type: Task
>Reporter: Cong Wang
>Assignee: Cong Wang
>Priority: Minor
>  Labels: twitter
>
> We need to export the per container SNMP statistics too, from its 
> /proc/net/snmp.
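
For illustration, a minimal standalone sketch of reading such counters; this is 
not the patch under review, just the file format: /proc/net/snmp pairs a header 
line of counter names with a line of values for each protocol, and reading it 
inside a container's network namespace yields per-container counters.
{code}
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
  std::ifstream snmp("/proc/net/snmp");

  std::string header, values;
  while (std::getline(snmp, header) && std::getline(snmp, values)) {
    std::istringstream names(header);   // e.g. "Tcp: ... InSegs OutSegs ..."
    std::istringstream counts(values);  // e.g. "Tcp: ... 12345 6789 ..."

    std::string protocol, name, count;
    names >> protocol;   // protocol prefix, e.g. "Tcp:"
    counts >> protocol;  // same prefix on the value line

    while (names >> name && counts >> count) {
      std::cout << protocol << name << " = " << count << std::endl;
    }
  }

  return 0;
}
{code}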



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3706) Tasks stuck in staging.

2015-10-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961026#comment-14961026
 ] 

haosdent edited comment on MESOS-3706 at 10/16/15 5:14 PM:
---

Do you have stdout and stderr logs for your task?


was (Author: haosd...@gmail.com):
Do you have stdout and stderr logs?

> Tasks stuck in staging.
> ---
>
> Key: MESOS-3706
> URL: https://issues.apache.org/jira/browse/MESOS-3706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, slave
>Affects Versions: 0.23.0, 0.24.1
>Reporter: Jord Sonneveld
> Attachments: Screen Shot 2015-10-12 at 9.08.30 AM.png, Screen Shot 
> 2015-10-12 at 9.24.32 AM.png, mesos-slave.INFO, mesos-slave.INFO.2, 
> mesos-slave.INFO.3
>
>
> I have a docker image which starts fine on all my slaves except for one.  On 
> that one, it is stuck in STAGING for a long time and never starts.  The INFO 
> log is full of messages like this:
> I1012 16:02:09.210306 34905 slave.cpp:1768] Asked to kill task 
> kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72 of framework 
> 20150109-172016-504433162-5050-19367-0002
> E1012 16:02:09.211272 34907 socket.hpp:174] Shutdown failed on fd=12: 
> Transport endpoint is not connected [107]
> kwe-vinland-work is the task that is stuck in staging.  It is launched by 
> marathon.  I have launched 161 instances successfully on my cluster.  But it 
> refuses to launch on this specific slave.
> These machines are all managed via ansible so their configurations are / 
> should be identical.  I have re-run my ansible scripts and rebooted the 
> machines to no avail.
> It's been in this state for almost 30 minutes.  You can see the mesos docker 
> executor is still running:
> jord@dalstgmesos03:~$ date
> Mon Oct 12 16:13:55 UTC 2015
> jord@dalstgmesos03:~$ ps auwx | grep kwe-vinland
> root 35360  0.0  0.0 1070576 21476 ?   Ssl  15:46   0:00 
> mesos-docker-executor 
> --container=mesos-20151012-082619-4145023498-5050-22623-S0.0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --docker=docker --help=false --mapped_directory=/mnt/mesos/sandbox 
> --sandbox_directory=/data/mesos/mesos/work/slaves/20151012-082619-4145023498-5050-22623-S0/frameworks/20150109-172016-504433162-5050-19367-0002/executors/kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72/runs/0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --stop_timeout=0ns
> According to docker ps -a, nothing was ever even launched:
> jord@dalstgmesos03:/data/mesos$ sudo docker ps -a
> CONTAINER IDIMAGE  
> COMMAND  CREATED STATUS  PORTS
> NAMES
> 5c858b90b0a0registry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   39 minutes ago  Up 39 minutes   
> 0.0.0.0:9125->8125/udp, 0.0.0.0:9126->8126/tcp   statsd-fe-influxdb
> d765ba3829fdregistry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   41 minutes ago  Up 41 minutes   
> 0.0.0.0:8125->8125/udp, 0.0.0.0:8126->8126/tcp   statsd-repeater
> Those are the only two entries. Nothing about the kwe-vinland job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3752) CentOS 6 dependency install fails at Maven

2015-10-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3752:
-
Story Points: 1

> CentOS 6 dependency install fails at Maven
> --
>
> Key: MESOS-3752
> URL: https://issues.apache.org/jira/browse/MESOS-3752
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: documentation, installation, mesosphere
>
> It seems the Apache Maven dependencies have changed such that following the 
> Getting Started docs for CentOS 6.6 will fail at Maven installation:
> {code}
> ---> Package apache-maven.noarch 0:3.3.3-2.el6 will be installed
> --> Processing Dependency: java-devel >= 1:1.7.0 for package: 
> apache-maven-3.3.3-2.el6.noarch
> --> Finished Dependency Resolution
> Error: Package: apache-maven-3.3.3-2.el6.noarch (epel-apache-maven)
>Requires: java-devel >= 1:1.7.0
>Available: java-1.5.0-gcj-devel-1.5.0.0-29.1.el6.x86_64 (base)
>java-devel = 1.5.0
>Available: 
> 1:java-1.6.0-openjdk-devel-1.6.0.35-1.13.7.1.el6_6.x86_64 (base)
>java-devel = 1:1.6.0
>Available: 
> 1:java-1.6.0-openjdk-devel-1.6.0.36-1.13.8.1.el6_7.x86_64 (updates)
>java-devel = 1:1.6.0
>  You could try using --skip-broken to work around the problem
>  You could try running: rpm -Va --nofiles --nodigest
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961061#comment-14961061
 ] 

Marco Massenzio commented on MESOS-3738:


[~tnachen] is not going to be around for the next couple of weeks, 
unfortunately.

cc: [~mcypark] - could you please have a look and see if you can shepherd?

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
> Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, 
> MESOS-3738-0_25_0.patch
>
>
> When Mesos slave is within the container, the COMMAND health check from 
> Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.127950    56 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.130627    62 exec.cpp:208] Executor registered on slave 

[jira] [Created] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerizre tasks with --executor_environmnent_variables

2015-10-16 Thread Cody Maloney (JIRA)
Cody Maloney created MESOS-3751:
---

 Summary: MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerizre 
tasks with --executor_environmnent_variables
 Key: MESOS-3751
 URL: https://issues.apache.org/jira/browse/MESOS-3751
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.25.0, 0.24.1
Reporter: Cody Maloney


When --executor_environment_variables is used and MESOS_NATIVE_JAVA_LIBRARY is 
set in the mesos-slave's environment, the mesos containerizer does not set 
MESOS_NATIVE_JAVA_LIBRARY itself.

Relevant code: 
https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281

It checks whether the variable is present in the mesos-slave's own environment 
(via os::getenv), rather than checking whether it is set in the configured 
environment variable set.
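
To make the distinction concrete, here is a minimal standalone sketch of the 
two checks; the names are illustrative, not the actual containerizer.cpp logic:
{code}
#include <cstdlib>
#include <map>
#include <string>

int main()
{
  // Environment configured via --executor_environment_variables (empty here).
  std::map<std::string, std::string> executorEnvironment;

  const std::string libraryPath = "/path/to/libmesos.so";  // placeholder

  // Reported behavior: the variable is skipped whenever the agent's *own*
  // process environment has it, even though the configured executor
  // environment above does not.
  if (std::getenv("MESOS_NATIVE_JAVA_LIBRARY") == nullptr) {
    executorEnvironment["MESOS_NATIVE_JAVA_LIBRARY"] = libraryPath;
  }

  // Suggested check: consult the configured environment set instead.
  if (executorEnvironment.count("MESOS_NATIVE_JAVA_LIBRARY") == 0) {
    executorEnvironment["MESOS_NATIVE_JAVA_LIBRARY"] = libraryPath;
  }

  return 0;
}
{code}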



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerizre tasks with --executor_environmnent_variables

2015-10-16 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-3751:
---

Assignee: Gilbert Song

> MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerizre tasks with 
> --executor_environmnent_variables
> 
>
> Key: MESOS-3751
> URL: https://issues.apache.org/jira/browse/MESOS-3751
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.24.1, 0.25.0
>Reporter: Cody Maloney
>Assignee: Gilbert Song
>  Labels: mesosphere, newbie
>
> When --executor_environment_variables is used and MESOS_NATIVE_JAVA_LIBRARY is 
> set in the mesos-slave's environment, the mesos containerizer does not set 
> MESOS_NATIVE_JAVA_LIBRARY itself.
> Relevant code: 
> https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281
> It checks whether the variable is present in the mesos-slave's own environment 
> (via os::getenv), rather than checking whether it is set in the configured 
> environment variable set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3750) ContentType/SchedulerTest.ShutdownExecutor/0 is flaky

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961305#comment-14961305
 ] 

Anand Mazumdar commented on MESOS-3750:
---

The root cause of the flakiness is the following race condition:

Bad Run
- The master sends 1 offer to the framework.
- The framework accepts the offer and launches a task.
- The executor, when the task completes, sends a {{TASK_FINISHED}} status 
update to the master.
- The master, upon noticing the terminal status update, marks the resources as 
available. Note that these resources can now be offered back to the 
framework.
- The master sends an {{Event::Offer}} to the framework.
- This fails the test, as the framework was expecting an {{Event::Failure}}.

Good Run
- The master sends 1 offer to the framework.
- The framework accepts the offer and launches a task.
- The executor, when the task completes, sends a {{TASK_FINISHED}} status 
update to the master.
- The framework, on getting the status update, sends an {{Event::SHUTDOWN}} to 
the master, leading to the master destroying the executor.
- No subsequent offers are sent to the framework in this case. An 
{{Event::FAILURE}} is sent by the master to the framework, letting it know that 
an active executor has finished.

This seems easily fixable, but I wonder if we should wait for a resolution on 
MESOS-3339 before fixing this, as that enables us to filter subsequent offers 
quite easily and set expectations based on that.
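
For illustration, one way to make such a test robust is to tolerate the racing 
re-offer in the expectations; a self-contained gmock sketch with a stand-in 
scheduler interface (not the actual Mesos test harness):
{code}
#include <gmock/gmock.h>
#include <gtest/gtest.h>

// Stand-in for the scheduler callbacks the real test mocks.
struct Scheduler
{
  virtual ~Scheduler() {}
  virtual void offers() = 0;
  virtual void failure() = 0;
};

struct MockScheduler : Scheduler
{
  MOCK_METHOD0(offers, void());
  MOCK_METHOD0(failure, void());
};

TEST(ShutdownExecutorRace, ToleratesReofferedResources)
{
  MockScheduler scheduler;

  // The first offer drives the test; any racing re-offers after
  // TASK_FINISHED are tolerated instead of failing the expectation.
  EXPECT_CALL(scheduler, offers())
    .Times(::testing::AtLeast(1));

  // The event the test actually asserts on.
  EXPECT_CALL(scheduler, failure())
    .Times(1);

  scheduler.offers();   // initial offer: launch the task
  scheduler.offers();   // re-offer of recovered resources (the "bad run")
  scheduler.failure();  // Event::FAILURE once the executor is destroyed
}
{code}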

> ContentType/SchedulerTest.ShutdownExecutor/0 is flaky
> -
>
> Key: MESOS-3750
> URL: https://issues.apache.org/jira/browse/MESOS-3750
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: flaky-test
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/942/COMPILER=gcc,CONFIGURATION=--verbose,OS=centos:7,label_exp=docker%7C%7CHadoop/consoleFull
> {code}
> [ RUN  ] ContentType/SchedulerTest.ShutdownExecutor/0
> Using temporary directory 
> '/tmp/ContentType_SchedulerTest_ShutdownExecutor_0_AEwZqa'
> I1016 12:51:41.421211 30336 leveldb.cpp:176] Opened db in 131.548926ms
> I1016 12:51:41.480257 30336 leveldb.cpp:183] Compacted db in 58.993935ms
> I1016 12:51:41.480355 30336 leveldb.cpp:198] Created db iterator in 28351ns
> I1016 12:51:41.480376 30336 leveldb.cpp:204] Seeked to beginning of db in 
> 2740ns
> I1016 12:51:41.480388 30336 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 493ns
> I1016 12:51:41.480445 30336 replica.cpp:746] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1016 12:51:41.481050 30359 recover.cpp:449] Starting replica recovery
> I1016 12:51:41.481324 30359 recover.cpp:475] Replica is in EMPTY status
> I1016 12:51:41.482493 30360 replica.cpp:642] Replica in EMPTY status received 
> a broadcasted recover request from (9945)@172.17.7.238:34368
> I1016 12:51:41.482924 30363 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1016 12:51:41.483568 30361 recover.cpp:566] Updating replica status to 
> STARTING
> I1016 12:51:41.485028 30365 master.cpp:376] Master 
> 3684704c-615f-4f62-b45c-b7ae0cb8176f (635f798fc895) started on 
> 172.17.7.238:34368
> I1016 12:51:41.485051 30365 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ContentType_SchedulerTest_ShutdownExecutor_0_AEwZqa/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/ContentType_SchedulerTest_ShutdownExecutor_0_AEwZqa/master" 
> --zk_session_timeout="10secs"
> I1016 12:51:41.485404 30365 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1016 12:51:41.485419 30365 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I1016 12:51:41.485429 30365 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/ContentType_SchedulerTest_ShutdownExecutor_0_AEwZqa/credentials'
> I1016 12:51:41.485692 30365 master.cpp:467] Using default 'crammd5' 
> authenticator
> I1016 12:51:41.485827 30365 master.cpp:504] Authorization enabled
> I1016 12:51:41.486111 30360 whitelist_watcher.cpp:79] No 

[jira] [Commented] (MESOS-3752) CentOS 6 dependency install fails at Maven

2015-10-16 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961304#comment-14961304
 ] 

Greg Mann commented on MESOS-3752:
--

I found the maintainer of these packages, Ding-Yi Chen, and it looks like I'm 
not the only one seeing this issue. I left a comment on his blog where he 
announced his release: 
https://dingyichen.wordpress.com/2015/10/08/dchens-apache-maven-is-updated-to-3-3-3-for-el7-and-el6/

> CentOS 6 dependency install fails at Maven
> --
>
> Key: MESOS-3752
> URL: https://issues.apache.org/jira/browse/MESOS-3752
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: documentation, installation, mesosphere
>
> It seems the Apache Maven dependencies have changed such that following the 
> Getting Started docs for CentOS 6.6 will fail at Maven installation:
> {code}
> ---> Package apache-maven.noarch 0:3.3.3-2.el6 will be installed
> --> Processing Dependency: java-devel >= 1:1.7.0 for package: 
> apache-maven-3.3.3-2.el6.noarch
> --> Finished Dependency Resolution
> Error: Package: apache-maven-3.3.3-2.el6.noarch (epel-apache-maven)
>Requires: java-devel >= 1:1.7.0
>Available: java-1.5.0-gcj-devel-1.5.0.0-29.1.el6.x86_64 (base)
>java-devel = 1.5.0
>Available: 
> 1:java-1.6.0-openjdk-devel-1.6.0.35-1.13.7.1.el6_6.x86_64 (base)
>java-devel = 1:1.6.0
>Available: 
> 1:java-1.6.0-openjdk-devel-1.6.0.36-1.13.8.1.el6_7.x86_64 (updates)
>java-devel = 1:1.6.0
>  You could try using --skip-broken to work around the problem
>  You could try running: rpm -Va --nofiles --nodigest
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD

2015-10-16 Thread Alex Clemmer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961302#comment-14961302
 ] 

Alex Clemmer commented on MESOS-1563:
-

Hey folks. Just saw review #39345 today. I just wanted to pop my head in 
here to let you all know that the Windows integration work that is ongoing will 
involve pretty substantial changes to libprocess and stout. I mention this only 
because I'd like to put it on your radar so that (to the extent possible) we 
can avoid breaking the Windows build. :)

I can give a quick once-over to the reviews if you like; please just make sure 
you include me when relevant. My handle is `hausdorff`.

I also hope you will let me know if I can help in any other way.

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3750) ContentType/SchedulerTest.ShutdownExecutor/0 is flaky

2015-10-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961305#comment-14961305
 ] 

Vinod Kone edited comment on MESOS-3750 at 10/16/15 9:04 PM:
-

The root cause of the flakiness is the following race condition:

Bad Run
- The master sends 1 offer to the framework.
- The framework accepts the offer and launches a task.
- The executor, when the task completes, sends a {{TASK_FINISHED}} status 
update to the master.
- The master, upon noticing the terminal status update, marks the resources as 
available. Note that these resources can now be offered back to the 
framework.
- The master sends an {{Event::Offer}} to the framework.
- This fails the test, as the framework was expecting an {{Event::Failure}}.

Good Run
- The master sends 1 offer to the framework.
- The framework accepts the offer and launches a task.
- The executor, when the task completes, sends a {{TASK_FINISHED}} status 
update to the master.
- The framework, on getting the status update, sends a {{Call::SHUTDOWN}} to the 
master, leading to the master destroying the executor.
- No subsequent offers are sent to the framework in this case. An 
{{Event::FAILURE}} is sent by the master to the framework, letting it know that 
an active executor has finished.

This seems easily fixable, but I wonder if we should wait for a resolution on 
MESOS-3339 before fixing this, as that enables us to filter subsequent offers 
quite easily and set expectations based on that.


was (Author: anandmazumdar):
The root cause of the flakiness is the following race condition:

Bad Run
- The master sends 1 offer to the framework.
- The framework accepts the offer and launches a task.
- The executor, when the task completes, sends a {{TASK_FINISHED}} status 
update to the master.
- The master, upon noticing the terminal status update, marks the resources as 
available. Note that these resources can now be offered back to the 
framework.
- The master sends an {{Event::Offer}} to the framework.
- This fails the test, as the framework was expecting an {{Event::Failure}}.

Good Run
- The master sends 1 offer to the framework.
- The framework accepts the offer and launches a task.
- The executor, when the task completes, sends a {{TASK_FINISHED}} status 
update to the master.
- The framework, on getting the status update, sends an {{Event::SHUTDOWN}} to 
the master, leading to the master destroying the executor.
- No subsequent offers are sent to the framework in this case. An 
{{Event::FAILURE}} is sent by the master to the framework, letting it know that 
an active executor has finished.

This seems easily fixable, but I wonder if we should wait for a resolution on 
MESOS-3339 before fixing this, as that enables us to filter subsequent offers 
quite easily and set expectations based on that.

> ContentType/SchedulerTest.ShutdownExecutor/0 is flaky
> -
>
> Key: MESOS-3750
> URL: https://issues.apache.org/jira/browse/MESOS-3750
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: flaky-test
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/942/COMPILER=gcc,CONFIGURATION=--verbose,OS=centos:7,label_exp=docker%7C%7CHadoop/consoleFull
> {code}
> [ RUN  ] ContentType/SchedulerTest.ShutdownExecutor/0
> Using temporary directory 
> '/tmp/ContentType_SchedulerTest_ShutdownExecutor_0_AEwZqa'
> I1016 12:51:41.421211 30336 leveldb.cpp:176] Opened db in 131.548926ms
> I1016 12:51:41.480257 30336 leveldb.cpp:183] Compacted db in 58.993935ms
> I1016 12:51:41.480355 30336 leveldb.cpp:198] Created db iterator in 28351ns
> I1016 12:51:41.480376 30336 leveldb.cpp:204] Seeked to beginning of db in 
> 2740ns
> I1016 12:51:41.480388 30336 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 493ns
> I1016 12:51:41.480445 30336 replica.cpp:746] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1016 12:51:41.481050 30359 recover.cpp:449] Starting replica recovery
> I1016 12:51:41.481324 30359 recover.cpp:475] Replica is in EMPTY status
> I1016 12:51:41.482493 30360 replica.cpp:642] Replica in EMPTY status received 
> a broadcasted recover request from (9945)@172.17.7.238:34368
> I1016 12:51:41.482924 30363 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1016 12:51:41.483568 30361 recover.cpp:566] Updating replica status to 
> STARTING
> I1016 12:51:41.485028 30365 master.cpp:376] Master 
> 3684704c-615f-4f62-b45c-b7ae0cb8176f (635f798fc895) started on 
> 172.17.7.238:34368
> I1016 12:51:41.485051 30365 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> 

[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD

2015-10-16 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961376#comment-14961376
 ] 

Ian Downes commented on MESOS-1563:
---

The changes to support FreeBSD are relatively minor, and in many circumstances 
it behaves like OSX. If these changes get in before your integration is 
complete, then the responsibility will be reversed :-)

So...  that brings us to the point: if we merge support for FreeBSD then we 
should (must?) include it in CI somewhere. Does anyone know if Apache can 
provide this? [~hausdorff], what are you doing for Windows CI?

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-16 Thread Jay Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961311#comment-14961311
 ] 

Jay Taylor commented on MESOS-3738:
---

I've rebuilt with the 0.25.0 patch on this ticket and confirmed that all 
previously failing health-check configurations now work:

[OK] Using launcher_dir flag
[OK] Using MESOS_LAUNCHER_DIR environment variable
[OK] Not setting the flag or variable, health-checks now launch fine!

Thanks Haosdent et al.!

Best,
Jay

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
> Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, 
> MESOS-3738-0_25_0.patch
>
>
> When Mesos slave is within the container, the COMMAND health check from 
> Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat 

[jira] [Commented] (MESOS-3712) --launcher_dir flag is not picked up when running health-checks

2015-10-16 Thread Jay Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961314#comment-14961314
 ] 

Jay Taylor commented on MESOS-3712:
---

This ticket should be merged into 
https://issues.apache.org/jira/browse/MESOS-3738.  I've tested the patch in 
3738 and it fixed all broken aspects of health-check invocation.

> --launcher_dir flag is not picked up when running health-checks
> ---
>
> Key: MESOS-3712
> URL: https://issues.apache.org/jira/browse/MESOS-3712
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.24.1, 0.25.0
> Environment: Ubuntu Linux x64
>Reporter: Jay Taylor
>  Labels: containerizer, mesosphere, tech-debt
>
> Mesos configuration flags are one-way and aren't exported to their 
> corresponding {{MESOS_*}} environment variables.
> The {{MESOS_LAUNCHER_DIR}} variable, however, is needed 
> [here|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576]:
> {code}
>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>   string path =
>     envPath.isSome() ? envPath.get()
>                      : os::realpath(Path(argv[0]).dirname()).get();
> {code}
> when the executor needs to resolve the path from which to run, for example, 
> health-checks.
> Instead, the passed-in argument {{argv[0]}} (which points into the Agent's 
> {{--work_dir}}) is the path that ends up being used.
> How can the requisite MESOS_LAUNCHER_DIR env var be available when 
> {{docker/executor.cpp}} (a child process of {{mesos-slave}}) attempts to read 
> it?
> 
> The relevant email thread is here:
> http://www.mail-archive.com/user@mesos.apache.org/msg04794.html
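
For illustration, a minimal sketch of the kind of fix implied above: the agent 
exporting its --launcher_dir flag into the executor's environment so the quoted 
os::getenv fallback finds a real value. Names here are illustrative, not the 
actual agent code:
{code}
#include <map>
#include <string>

// Build the executor's environment, carrying the agent's --launcher_dir
// flag across the process boundary as MESOS_LAUNCHER_DIR.
std::map<std::string, std::string> executorEnvironment(
    const std::string& launcherDir)
{
  std::map<std::string, std::string> environment;
  environment["MESOS_LAUNCHER_DIR"] = launcherDir;
  return environment;
}

int main()
{
  // e.g. the flag's configured value
  executorEnvironment("/usr/libexec/mesos");
  return 0;
}
{code}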



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD

2015-10-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961397#comment-14961397
 ] 

Vinod Kone commented on MESOS-1563:
---

[~idownes] If there is a Docker image for FreeBSD, we can easily add support 
for it in our CI. Make sure to add FreeBSD support to support/jenkins_build.sh 
script and test locally that it works.

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environmnent_variables

2015-10-16 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-3751:

Sprint: Mesosphere Sprint 21

> MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with 
> --executor_environmnent_variables
> ---
>
> Key: MESOS-3751
> URL: https://issues.apache.org/jira/browse/MESOS-3751
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.24.1, 0.25.0
>Reporter: Cody Maloney
>Assignee: Gilbert Song
>  Labels: mesosphere, newbie
>
> When --executor_environment_variables is used and MESOS_NATIVE_JAVA_LIBRARY is 
> set in the mesos-slave's environment, the mesos containerizer does not set 
> MESOS_NATIVE_JAVA_LIBRARY itself.
> Relevant code: 
> https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281
> It checks whether the variable is present in the mesos-slave's own environment 
> (via os::getenv), rather than checking whether it is set in the configured 
> environment variable set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3599) COMMAND health checks with marathon running in slave context broken

2015-10-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960327#comment-14960327
 ] 

haosdent commented on MESOS-3599:
-

[~ekesken] If you have time, please take a look at whether the use case in 
https://reviews.apache.org/r/39387/diff/1#2 matches your expectation or not.

> COMMAND health checks with marathon running in slave context broken
> ---
>
> Key: MESOS-3599
> URL: https://issues.apache.org/jira/browse/MESOS-3599
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Erhan Kesken
>Assignee: haosdent
>Priority: Critical
>
> When deploying Mesos 0.23rc4 with the latest Marathon 0.10.0 RC3, COMMAND 
> health checks stop working. Rolling back to Mesos 0.22.1 fixes the problem.
> The containerizer is Docker.
> All packages are from the official Mesosphere Ubuntu 14.04 sources.
> The issue must be analyzed further.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960287#comment-14960287
 ] 

Anand Mazumdar edited comment on MESOS-3747 at 10/16/15 7:20 AM:
-

[~liqlin] Thanks for taking this up. It's perfectly fine to drop these silently 
with a warning when using the C++ Scheduler Library as we do for other failed 
validations as you already spoke about earlier.

This JIRA was referring to what should be the general behavior of the Mesos 
master in the case of clients that might be different than the Scheduler Library.


was (Author: anandmazumdar):
[~liqiang] Thanks for taking this up. It's perfectly fine to drop these 
silently with a warning when using the C++ Scheduler Library as we do for other 
failed validations as you already spoke about earlier.

This JIRA was referring to what should be the general behavior of the Mesos 
master in the case of clients that might be different than the Scheduler Library.

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Assignee: Liqiang Lin
>Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH, 
> launch { 
> task_infos [
> {
> name: "task-1",
> task_id: { value: "task-1" },
> agent_id: { value: 
> "20151015-125949-16777343-5050-20146-S0" },
> resources [
> { name: "cpus", type: SCALAR, scalar: { value: 
> 0.1 },  role: "*" },
> { name: "mem",  type: SCALAR, scalar: { value: 
> 64.0 }, role: "*" },
> { name: "disk", type: SCALAR, scalar: { value: 
> 0.0 },  role: "*" },
> ],
> command: { 
> environment { 
> variables [ 
> { name: "SLEEP_SECONDS" value: "15" } 
> ] 
> },
> value: "env | sort && sleep $SLEEP_SECONDS"
> }
> }
> ]
>  }
>  }
>  }
> }
> {code}
> {code:title=Update Status}
> event: {
> type: UPDATE,
> update: { 
> status: { 
> task_id: { value: "task-1" }, 
> state: TASK_FAILED,
> message: "Container destroyed while preparing isolators",
> agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
> timestamp: 1.444939217401241E9,
> executor_id: { value: "task-1" },
> source: SOURCE_AGENT, 
> reason: REASON_MEMORY_LIMIT,
> uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
> } 
> }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
>  Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources 
> cpus(*):0.1; mem(*):32 in work directory 
> 

[jira] [Commented] (MESOS-3589) Mesos tasks with many ports cause significant master performance problems

2015-10-16 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960366#comment-14960366
 ] 

Klaus Ma commented on MESOS-3589:
-

[~jamesmulcahy], glad to see the performance is acceptable :). It's reasonable 
that the performance of random ports is lower, but I'll add a UT case to 
measure the gap and see if there is any room for improvement.
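
For reference, the operation being profiled amounts to interval coalescing; a 
minimal standalone sketch of the O(n log n) sort-and-merge approach 
(illustrative code, not the actual Mesos {{Value_Range}} implementation):
{code}
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Coalesce closed integer ranges: sort by start, then merge each range into
// the result whenever it overlaps or directly abuts the previous one.
std::vector<std::pair<uint64_t, uint64_t>> coalesce(
    std::vector<std::pair<uint64_t, uint64_t>> ranges)
{
  std::sort(ranges.begin(), ranges.end());

  std::vector<std::pair<uint64_t, uint64_t>> result;
  for (const std::pair<uint64_t, uint64_t>& range : ranges) {
    if (!result.empty() && range.first <= result.back().second + 1) {
      result.back().second = std::max(result.back().second, range.second);
    } else {
      result.push_back(range);
    }
  }

  return result;
}

int main()
{
  // [1,3] + [3,5] + [7,8] coalesces to [1,5] + [7,8].
  const auto merged = coalesce({{1, 3}, {7, 8}, {3, 5}});
  return merged.size() == 2 ? 0 : 1;
}
{code}
Coalescing once per batch, rather than re-coalescing on every {{+=}}, is the 
kind of saving the attached profile suggests.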

> Mesos tasks with many ports cause significant master performance problems
> -
>
> Key: MESOS-3589
> URL: https://issues.apache.org/jira/browse/MESOS-3589
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: James Mulcahy
> Attachments: mesos-master-perf-top-call-expanded.txt, 
> mesos-master-perf.txt
>
>
> Today, I used a framework to fire off some tasks which each requested a lot 
> of ports.  When doing so, the mesos-master performance drops heavily.
> With 70 tasks each requesting 100 ports, the master seems to spend the 
> majority (~90%) of its time dealing with merging and coalescing Value_Range 
> objects.
> I'll attach some text views of the output of 'perf' record/report from the 
> mesos-master during this.  The call-graph trace for the most frequently 
> sampled call is:
> {code}
> -   4.42%  mesos-master  mesos-master  [.] google::protobuf::internal::RepeatedPtrFieldBase::size() const
>    - google::protobuf::internal::RepeatedPtrFieldBase::size() const
>       - 37.05% mesos::Value_Ranges::range_size() const
>          - 91.07% mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>             - mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Ranges const&)
>                - 68.35% mesos::operator+=(mesos::Value_Ranges&, mesos::Value_Ranges const&)
>                   - 99.46% Option mesos::Resources::get(std::string const&) const
>                      - mesos::internal::model(mesos::Resources const&)
>                         - 97.58% mesos::internal::model(mesos::internal::Task const&)
>                              mesos::internal::master::model(mesos::internal::master::Framework const&)
>                              mesos::internal::master::Master::Http::state(process::http::Request const&) const
>                              mesos::internal::master::Master::initialize()::{lambda(process::http::Request const&)#9}::operator()(process::http::Request const&) const
> {code}

[jira] [Commented] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960287#comment-14960287
 ] 

Anand Mazumdar commented on MESOS-3747:
---

[[~liqiang] Thanks for taking this up. It's perfectly fine to drop these 
silently with a warning when using the C++ Scheduler Library as we do for other 
failed validations as you already spoke about earlier.

This JIRA was referring to what should be the general behavior of the Mesos 
master in the case of clients that might be different than the Scheduler Library.
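
For context, a minimal standalone sketch of the empty-user convention the 
reporter relies on (illustrative code, not the master's actual validation 
logic):
{code}
#include <pwd.h>
#include <unistd.h>

#include <string>

// An empty FrameworkInfo.user conventionally means "inherit the user the
// agent process is running as".
std::string effectiveUser(const std::string& frameworkUser)
{
  if (!frameworkUser.empty()) {
    return frameworkUser;
  }

  // Look up the name of the user this process runs as.
  const struct passwd* pw = getpwuid(getuid());
  return pw != nullptr ? pw->pw_name : "";
}

int main()
{
  effectiveUser("");  // resolves to the current (agent) user
  return 0;
}
{code}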

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Assignee: Liqiang Lin
>Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH, 
> launch { 
> task_infos [
> {
> name: "task-1",
> task_id: { value: "task-1" },
> agent_id: { value: 
> "20151015-125949-16777343-5050-20146-S0" },
> resources [
> { name: "cpus", type: SCALAR, scalar: { value: 
> 0.1 },  role: "*" },
> { name: "mem",  type: SCALAR, scalar: { value: 
> 64.0 }, role: "*" },
> { name: "disk", type: SCALAR, scalar: { value: 
> 0.0 },  role: "*" },
> ],
> command: { 
> environment { 
> variables [ 
> { name: "SLEEP_SECONDS" value: "15" } 
> ] 
> },
> value: "env | sort && sleep $SLEEP_SECONDS"
> }
> }
> ]
>  }
>  }
>  }
> }
> {code}
> {code:title=Update Status}
> event: {
> type: UPDATE,
> update: { 
> status: { 
> task_id: { value: "task-1" }, 
> state: TASK_FAILED,
> message: "Container destroyed while preparing isolators",
> agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
> timestamp: 1.444939217401241E9,
> executor_id: { value: "task-1" },
> source: SOURCE_AGENT, 
> reason: REASON_MEMORY_LIMIT,
> uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
> } 
> }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
>  Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources 
> cpus(*):0.1; mem(*):32 in work directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for 
> executor task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.262684 19638 docker.cpp:734] No container info found, skipping 
> launch
> I1015 13:15:34.263478 19638 containerizer.cpp:640] 

[jira] [Comment Edited] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960287#comment-14960287
 ] 

Anand Mazumdar edited comment on MESOS-3747 at 10/16/15 7:18 AM:
-

[~liqiang] Thanks for taking this up. It's perfectly fine to drop these 
silently with a warning when using the C++ Scheduler Library, as we do for 
other failed validations, as you mentioned earlier.

This JIRA was referring to what the general behavior of the Mesos master 
should be for clients that might be different from the Scheduler Library.


was (Author: anandmazumdar):
[[~liqiang] Thanks for taking this up. It's perfectly fine to drop these 
silently with a warning when using the C++ Scheduler Library, as we do for 
other failed validations, as you mentioned earlier.

This JIRA was referring to what the general behavior of the Mesos master 
should be for clients that might be different from the Scheduler Library.

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Assignee: Liqiang Lin
>Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs are below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH, 
> launch { 
> task_infos [
> {
> name: "task-1",
> task_id: { value: "task-1" },
> agent_id: { value: 
> "20151015-125949-16777343-5050-20146-S0" },
> resources [
> { name: "cpus", type: SCALAR, scalar: { value: 
> 0.1 },  role: "*" },
> { name: "mem",  type: SCALAR, scalar: { value: 
> 64.0 }, role: "*" },
> { name: "disk", type: SCALAR, scalar: { value: 
> 0.0 },  role: "*" },
> ],
> command: { 
> environment { 
> variables [ 
> { name: "SLEEP_SECONDS" value: "15" } 
> ] 
> },
> value: "env | sort && sleep $SLEEP_SECONDS"
> }
> }
> ]
>  }
>  }
>  }
> }
> {code}
> {code:title=Update Status}
> event: {
> type: UPDATE,
> update: { 
> status: { 
> task_id: { value: "task-1" }, 
> state: TASK_FAILED,
> message: "Container destroyed while preparing isolators",
> agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
> timestamp: 1.444939217401241E9,
> executor_id: { value: "task-1" },
> source: SOURCE_AGENT, 
> reason: REASON_MEMORY_LIMIT,
> uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
> } 
> }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
>  Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources 
> cpus(*):0.1; mem(*):32 in work directory 
> 

[jira] [Commented] (MESOS-3599) COMMAND health checks with marathon running in slave context broken

2015-10-16 Thread Erhan Kesken (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960350#comment-14960350
 ] 

Erhan Kesken commented on MESOS-3599:
-

Yes, it does. Thank you.

> COMMAND health checks with marathon running in slave context broken
> ---
>
> Key: MESOS-3599
> URL: https://issues.apache.org/jira/browse/MESOS-3599
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Erhan Kesken
>Assignee: haosdent
>Priority: Critical
>
> When deploying Mesos 0.23 RC4 with the latest Marathon 0.10.0 RC3, COMMAND 
> health checks stop working. Rolling back to Mesos 0.22.1 fixes the problem.
> The containerizer is Docker.
> All packages are from the official Mesosphere Ubuntu 14.04 sources.
> The issue must be analyzed further.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3589) Mesos tasks with many ports cause significant master performance problems

2015-10-16 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960371#comment-14960371
 ] 

Joerg Schad commented on MESOS-3589:


[~jamesmulcahy]: Good to hear the patch helped. We still have ideas for 
further improvements. Until then, is it OK if I close this as a duplicate of 
MESOS-3051?

> Mesos tasks with many ports cause significant master performance problems
> -
>
> Key: MESOS-3589
> URL: https://issues.apache.org/jira/browse/MESOS-3589
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: James Mulcahy
> Attachments: mesos-master-perf-top-call-expanded.txt, 
> mesos-master-perf.txt
>
>
> Today, I used a framework to fire off some tasks which each requested a lot 
> of ports.  When doing so, the mesos-master performance drops heavily.
> With 70 tasks each requesting 100 ports, the master seems to spend the 
> majority (~90%) of its time merging and coalescing Value_Range 
> objects.
> I'll attach some text views of the output of 'perf' record/report from the 
> mesos-master during this.  The call-graph trace for the most frequently 
> sampled call is:
> {code}
> -   4.42%  mesos-master  mesos-master [.] 
> google::protobuf::internal::RepeatedPtrFieldBase::size() const
>   - google::protobuf::internal::RepeatedPtrFieldBase::size() const
>     - 37.05% mesos::Value_Ranges::range_size() const
>       - 91.07% mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>         - mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Ranges const&)
>           - 68.35% mesos::operator+=(mesos::Value_Ranges&, mesos::Value_Ranges const&)
>             - 99.46% Option mesos::Resources::get(std::string const&) const
>               - mesos::internal::model(mesos::Resources const&)
>                 - 97.58% mesos::internal::model(mesos::internal::Task const&)
>                   mesos::internal::master::model(mesos::internal::master::Framework const&)
>                   mesos::internal::master::Master::Http::state(process::http::Request const&) const
>                   mesos::internal::master::Master::initialize()::{lambda(process::http::Request const&)#9}::operator()(process::http::Request const&) const
>                   std::_Function_handler (process::http::Request
[jira] [Updated] (MESOS-3732) Speed up FaultToleranceTest.FrameworkReregister test

2015-10-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3732:
---
Labels: mesosphere newbie  (was: mesosphere)

> Speed up FaultToleranceTest.FrameworkReregister test
> 
>
> Key: MESOS-3732
> URL: https://issues.apache.org/jira/browse/MESOS-3732
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere, newbie
>
> FaultToleranceTest.FrameworkReregister test takes more than one second to 
> complete:
> {code}
> [ RUN  ] FaultToleranceTest.FrameworkReregister
> [   OK ] FaultToleranceTest.FrameworkReregister (1056 ms)
> {code}
> There must be a {{1s}} timeout somewhere, which we should mitigate via 
> {{Clock::advance()}}.
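
For illustration, a minimal sketch of the libprocess test-clock pattern this 
ticket suggests, assuming the standard {{Clock}} API from 
{{<process/clock.hpp>}}; the test body and fixture are elided, so this is a 
fragment rather than the actual test:
{code}
#include <process/clock.hpp>
#include <stout/duration.hpp>

void advancePastTimeout()
{
  process::Clock::pause();              // stop real time inside libprocess
  // ... drive the test to the point where the 1s timer has been armed ...
  process::Clock::advance(Seconds(1));  // fire the timer without waiting
  process::Clock::settle();             // drain all events the timer triggered
  process::Clock::resume();
}
{code}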



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2015-10-16 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960576#comment-14960576
 ] 

Bernd Mathiske commented on MESOS-3235:
---

This is not a fix yet, but it should help with debugging when the bug occurs 
again.

commit 7b53bb110f560ae366bad24d6a51b39d1e4ce43b
Author: Bernd Mathiske 
Date:   Fri Oct 16 14:14:19 2015 +0200

Added additional diagnostic output when a fetcher cache test fails.

Dumps all involved task/executor sandbox contents in test tear down
only if a failure occurred.

Review: https://reviews.apache.org/r/37813


> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: Bernd Mathiske
>  Labels: mesosphere
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
> @0x1143768cf std::__1::function<>::operator()()
> @0x11435ca7f process::ProcessBase::visit()
> @0x1143ed6fe process::DispatchEvent::visit()
> @0x11271 process::ProcessBase::serve()
> @0x114343b4e process::ProcessManager::resume()
> @0x1143431ca process::internal::schedule()
> @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
> @ 0x7fff95090268 _pthread_body
> @ 0x7fff950901e5 _pthread_start
> @ 

[jira] [Comment Edited] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2015-10-16 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960668#comment-14960668
 ] 

Bernd Mathiske edited comment on MESOS-2858 at 10/16/15 1:21 PM:
-

Puzzling. When accessing the files via a terminal, they exist and can be read. 
But when the fetcher tries to access them programmatically running as root, the 
operation is not permitted.

The "access" that fails merely checks whether the downloaded archive is 
executable (and expects to find it non-executable). We could skip this check 
and the test would succeed. It is not essential to Mesos' operations at all.

However, I'd still like to know what is going on here.



was (Author: bernd-mesos):
Puzzling. When accessing the files via a terminal, they exist and can be read. 
But when the fetcher tries to access them programmatically running as root, the 
operation is not permitted.

> FetcherCacheHttpTest.HttpMixed is flaky.
> 
>
> Key: MESOS-2858
> URL: https://issues.apache.org/jira/browse/MESOS-2858
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Bernd Mathiske
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheHttpTest.HttpMixed
> Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
> I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
> I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
> I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
> I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
> 2112ns
> I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 392ns
> I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
> I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
> I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
> STARTING
> I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 590673ns
> I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
> STARTING
> I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
> I0611 00:40:28.214774 26061 master.cpp:363] Master 
> 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
> 172.17.0.116:33349
> I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" 
> --zk_session_timeout="10secs"
> I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
> I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
> I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
> I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
> I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 374189ns
> I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica 

[jira] [Updated] (MESOS-3566) Add a section to the Scheduler HTTP API docs around RecordIO specification

2015-10-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3566:
--
Description: 
Since the {{RecordIO}} format is not that widely used, searching for it online 
does not offer much help. 
- It would be good if we could add to the docs a small section on its 
specification for framework developers. 
- Also, add details on why the {{RecordIO}} format is being used and why just 
using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
suffice.
- Add info about the rationale behind using the {{RecordIO}} format and not 
just relying on encoding an event per chunk.
- Bonus points if we can have a simple code snippet in C++/Java on reading a 
{{RecordIO}} response to help developers.

  was:
Since the {{RecordIO}} format is not that widely used, searching for it online 
does not offer much help. 
- It would be good if we could add to the docs a small section on its 
specification for framework developers. 
- Also, add details on why the {{RecordIO}} format is being used and why just 
using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
suffice.
- Bonus points if we can have a simple code snippet in C++/Java on reading a 
{{RecordIO}} response to help developers.


> Add a section to the Scheduler HTTP API docs around RecordIO specification
> --
>
> Key: MESOS-3566
> URL: https://issues.apache.org/jira/browse/MESOS-3566
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Since the {{RecordIO}} format is not that widely used, searching for it 
> online does not offer much help. 
> - It would be good if we could add to the docs a small section on its 
> specification for framework developers. 
> - Also, add details on why the {{RecordIO}} format is being used and why just 
> using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
> suffice.
> - Add info about the rationale behind using the {{RecordIO}} format and not 
> just relying on encoding an event per chunk.
> - Bonus points if we can have a simple code snippet in C++/Java on reading a 
> {{RecordIO}} response to help developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3562) Anomalous bytes in stream from HTTPI Api

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939194#comment-14939194
 ] 

Anand Mazumdar edited comment on MESOS-3562 at 10/16/15 1:50 PM:
-

[~BenWhitehead] There seems to be some confusion here. Comments inline.

> This isn't really standard chunks though, there are chunks within chunks and 
> the configuration of the client would have to know that.
Can you elaborate a bit more on what you mean by chunks within chunks here? 
We strictly adhere to the standard chunked encoding format defined in RFC 
2616. The only difference here is that the {{data}} in the chunks is itself 
encoded in {{RecordIO}} format.

> What is the motivation behind using recordio format ?
Intermediaries on the network, e.g. proxies, are free to change the chunk 
boundaries, and this should not have any effect on the recipient application. 
We wanted a way to delimit/encode events for JSON/Protobuf responses 
consistently, and the RecordIO format allowed us to do that. 

We could have done away with RecordIO for JSON responses by just delimiting 
on {{\n}}, but that would have made the behavior inconsistent compared to 
Protobuf responses.

>  If standard encoding were used then every HTTP client would already have the 
> necessary understanding to know how to deal with the chunks.
We use standard chunked encoding as defined in the RFC. What do you mean here?

> Where is the specification for what recordio format is? I have not been able 
> to find anything online.
We should add more information on this in our docs. Until we do, here is a 
brief description of what the format looks like:
{code}
5\n
hello
6\n
world!
{code}

Ideally, whatever client you are using should do the de-chunking for you, so 
you should get back just the {{RecordIO}}-encoded data:
{code}
104\n{"subscribed":{"framework_id":{"value":"20150930-103028-16777343-5050-11742-0028"}},"type":"SUBSCRIBED"}
{code}

cc'ing [~bmahler] in case I missed anything.
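
To make the framing concrete, a hedged sketch of decoding the {{RecordIO}} 
framing described above (each record is "<decimal length>\n" followed by 
exactly that many bytes); a real client would also buffer partial records 
across network reads:
{code}
#include <iostream>
#include <string>
#include <vector>

// Decode complete "<length>\n<bytes>" records from an in-memory stream.
std::vector<std::string> decodeRecordIO(const std::string& stream)
{
  std::vector<std::string> records;
  size_t pos = 0;
  while (pos < stream.size()) {
    size_t newline = stream.find('\n', pos);
    if (newline == std::string::npos) break;          // incomplete header
    size_t length = std::stoull(stream.substr(pos, newline - pos));
    if (newline + 1 + length > stream.size()) break;  // incomplete record
    records.push_back(stream.substr(newline + 1, length));
    pos = newline + 1 + length;
  }
  return records;
}

int main()
{
  for (const std::string& record : decodeRecordIO("5\nhello6\nworld!"))
    std::cout << record << "\n";  // prints "hello" then "world!"
}
{code}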


was (Author: anandmazumdar):
[~BenWhitehead] There seems to be some confusion here. Comments inline.

> This isn't really standard chunks though, there are chunks within chunks and 
> the configuration of the client would have to know that.
Can you elaborate a bit more on what you mean by chunks within chunks here? 
We strictly adhere to the standard chunked encoding format defined in RFC 
2616. The only difference here is that the {{data}} in the chunks is itself 
encoded in {{RecordIO}} format.

> What is the motivation behind using recordio format ?
We wanted a way to delimit two events for JSON/Protobuf responses, and the 
RecordIO format allowed us to do that. We could have done away with RecordIO 
for JSON by just delimiting on {{\n}}, but that would have made the behavior 
inconsistent compared to Protobuf responses.

>  If standard encoding were used then every HTTP client would already have the 
> necessary understanding to know how to deal with the chunks.
We use standard chunked encoding as defined in the RFC. What do you mean here?

> Where is the specification for what recordio format is? I have not been able 
> to find anything online.
We should add more information on this in our docs. Until we do, here is a 
brief description of what the format looks like:
{code}
5\n
hello
6\n
world!
{code}

Ideally, whatever client you are using should do the de-chunking for you, so 
you should get back just the {{RecordIO}}-encoded data:
{code}
104\n{"subscribed":{"framework_id":{"value":"20150930-103028-16777343-5050-11742-0028"}},"type":"SUBSCRIBED"}
{code}

cc'ing [~bmahler] in case I missed anything.

> Anomalous bytes in stream from HTTPI Api
> 
>
> Key: MESOS-3562
> URL: https://issues.apache.org/jira/browse/MESOS-3562
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: Linux 3.16.7-24-desktop #1 SMP PREEMPT Mon Aug 3 
> 14:37:06 UTC 2015 (ec183cc) x86_64 x86_64 GNU/Linux
> Mesos 0.24.0
> gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
>Reporter: Ben Whitehead
>Priority: Blocker
>  Labels: http, mesosphere, wireprotocol
> Attachments: app.log, tcpdump.log
>
>
> When connecting to the new HTTP Api and attempting to {{SUBSCRIBE}} there are 
> some anomalous bytes contained in the chunked stream that appear to be 
> causing problems when I attempting to integrate.
> Attached are two log files. app.log represents my application trying to 
> connect to mesos using RxNetty. Netty has been configured to log all data it 
> sends/receives over the wire this can be seen in the byte blocks in the log. 
> The client is constructing a protobuf in java for the subscribe call  
> {code:java}

[jira] [Commented] (MESOS-3566) Add a section to the Scheduler HTTP API docs around RecordIO specification

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960739#comment-14960739
 ] 

Anand Mazumdar commented on MESOS-3566:
---

Thanks, updated description to reflect this now.

> Add a section to the Scheduler HTTP API docs around RecordIO specification
> --
>
> Key: MESOS-3566
> URL: https://issues.apache.org/jira/browse/MESOS-3566
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Since the {{RecordIO}} format is not that widely used, searching for it 
> online does not offer much help. 
> - It would be good if we could add to the docs a small section on its 
> specification for framework developers. 
> - Also, add details on why the {{RecordIO}} format is being used and why just 
> using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
> suffice.
> - Add info about the rationale behind using the {{RecordIO}} format and not 
> just relying on encoding an event per chunk.
> - Bonus points if we can have a simple code snippet in C++/Java on reading a 
> {{RecordIO}} response to help developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3566) Add a section to the Scheduler HTTP API docs around RecordIO specification

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960739#comment-14960739
 ] 

Anand Mazumdar edited comment on MESOS-3566 at 10/16/15 1:54 PM:
-

Thanks, updated description to reflect this now. Until we do that, I have 
edited my earlier comment on MESOS-3562 to reflect the reasoning:

https://issues.apache.org/jira/browse/MESOS-3562?focusedCommentId=14939194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14939194


was (Author: anandmazumdar):
Thanks, updated description to reflect this now.

> Add a section to the Scheduler HTTP API docs around RecordIO specification
> --
>
> Key: MESOS-3566
> URL: https://issues.apache.org/jira/browse/MESOS-3566
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Since the {{RecordIO}} format is not that widely used, searching for it 
> online does not offer much help. 
> - It would be good if we could add to the docs a small section on its 
> specification for framework developers. 
> - Also, add details on why the {{RecordIO}} format is being used and why just 
> using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
> suffice.
> - Add info about the rationale behind using the {{RecordIO}} format and not 
> just relying on encoding an event per chunk.
> - Bonus points if we can have a simple code snippet in C++/Java on reading a 
> {{RecordIO}} response to help developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2015-10-16 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960668#comment-14960668
 ] 

Bernd Mathiske commented on MESOS-2858:
---

Puzzling. When accessing the files via a terminal, they exist and can be read. 
But when the fetcher tries to access them programmatically running as root, the 
operation is not permitted.

> FetcherCacheHttpTest.HttpMixed is flaky.
> 
>
> Key: MESOS-2858
> URL: https://issues.apache.org/jira/browse/MESOS-2858
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Bernd Mathiske
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheHttpTest.HttpMixed
> Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
> I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
> I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
> I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
> I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
> 2112ns
> I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 392ns
> I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
> I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
> I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
> STARTING
> I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 590673ns
> I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
> STARTING
> I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
> I0611 00:40:28.214774 26061 master.cpp:363] Master 
> 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
> 172.17.0.116:33349
> I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" 
> --zk_session_timeout="10secs"
> I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
> I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
> I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
> I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
> I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 374189ns
> I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to 
> VOTING
> I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos 
> group
> I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is 
> master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042
> I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master!
> I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar
> I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar
> I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated
> 

[jira] [Commented] (MESOS-2918) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky

2015-10-16 Thread Chi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960689#comment-14960689
 ] 

Chi Zhang commented on MESOS-2918:
--

[~jieyu] do you mind taking a look at these?

> CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky
> --
>
> Key: MESOS-2918
> URL: https://issues.apache.org/jira/browse/MESOS-2918
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.23.0
>Reporter: Paul Brett
>Assignee: Chi Zhang
>  Labels: test, twitter
>
> This test fails when swap is enabled on the platform: it creates a 
> memory hog with the expectation that the OOM killer will kill the hog, but 
> with swap enabled, the hog is just swapped out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3706) Tasks stuck in staging.

2015-10-16 Thread Jord Sonneveld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960890#comment-14960890
 ] 

Jord Sonneveld commented on MESOS-3706:
---

I am also seeing this in my syslog:

Oct 15 22:39:44 dalstgmesos03 mesos-slave[35681]: I1015 22:39:44.047500 35711 
docker.cpp:1000] Ignoring updating container 
'926f9e35-f0d4-4041-a146-d9899462b9ed' with resources passed to update is 
identical to existing resources

That message seems to come from this line of code: 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L1010

> Tasks stuck in staging.
> ---
>
> Key: MESOS-3706
> URL: https://issues.apache.org/jira/browse/MESOS-3706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, slave
>Affects Versions: 0.23.0, 0.24.1
>Reporter: Jord Sonneveld
> Attachments: Screen Shot 2015-10-12 at 9.08.30 AM.png, Screen Shot 
> 2015-10-12 at 9.24.32 AM.png, mesos-slave.INFO, mesos-slave.INFO.2, 
> mesos-slave.INFO.3
>
>
> I have a docker image which starts fine on all my slaves except for one.  On 
> that one, it is stuck in STAGING for a long time and never starts.  The INFO 
> log is full of messages like this:
> I1012 16:02:09.210306 34905 slave.cpp:1768] Asked to kill task 
> kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72 of framework 
> 20150109-172016-504433162-5050-19367-0002
> E1012 16:02:09.211272 34907 socket.hpp:174] Shutdown failed on fd=12: 
> Transport endpoint is not connected [107]
> kwe-vinland-work is the task that is stuck in staging.  It is launched by 
> marathon.  I have launched 161 instances successfully on my cluster.  But it 
> refuses to launch on this specific slave.
> These machines are all managed via ansible so their configurations are / 
> should be identical.  I have re-run my ansible scripts and rebooted the 
> machines to no avail.
> It's been in this state for almost 30 minutes.  You can see the mesos docker 
> executor is still running:
> jord@dalstgmesos03:~$ date
> Mon Oct 12 16:13:55 UTC 2015
> jord@dalstgmesos03:~$ ps auwx | grep kwe-vinland
> root 35360  0.0  0.0 1070576 21476 ?   Ssl  15:46   0:00 
> mesos-docker-executor 
> --container=mesos-20151012-082619-4145023498-5050-22623-S0.0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --docker=docker --help=false --mapped_directory=/mnt/mesos/sandbox 
> --sandbox_directory=/data/mesos/mesos/work/slaves/20151012-082619-4145023498-5050-22623-S0/frameworks/20150109-172016-504433162-5050-19367-0002/executors/kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72/runs/0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --stop_timeout=0ns
> According to docker ps -a, nothing was ever even launched:
> jord@dalstgmesos03:/data/mesos$ sudo docker ps -a
> CONTAINER IDIMAGE  
> COMMAND  CREATED STATUS  PORTS
> NAMES
> 5c858b90b0a0registry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   39 minutes ago  Up 39 minutes   
> 0.0.0.0:9125->8125/udp, 0.0.0.0:9126->8126/tcp   statsd-fe-influxdb
> d765ba3829fdregistry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   41 minutes ago  Up 41 minutes   
> 0.0.0.0:8125->8125/udp, 0.0.0.0:8126->8126/tcp   statsd-repeater
> Those are the only two entries. Nothing about the kwe-vinland job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3566) Add a section to the Scheduler HTTP API docs around RecordIO specification

2015-10-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3566:
--
Description: 
Since the {{RecordIO}} format is not that widely used, searching for it online 
does not offer much help. 
- It would be good if we could add to the docs a small section on its 
specification for framework developers. 
- Also, add details on why the {{RecordIO}} format is being used and why just 
using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
suffice (see 
[here|https://issues.apache.org/jira/browse/MESOS-3562?focusedCommentId=14939194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14939194])
- Add info about the rationale behind using the {{RecordIO}} format and not 
just relying on encoding an event per chunk.
- Bonus points if we can have a simple code snippet in C++/Java on reading a 
{{RecordIO}} response to help developers.

  was:
Since the {{RecordIO}} format is not that widely used, searching for it online 
does not offer much help. 
- It would be good if we could add to the docs a small section on its 
specification for framework developers. 
- Also, add details on why the {{RecordIO}} format is being used and why just 
using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
suffice.
- Add info about the rationale behind using the {{RecordIO}} format and not 
just relying on encoding an event per chunk.
- Bonus points if we can have a simple code snippet in C++/Java on reading a 
{{RecordIO}} response to help developers.


> Add a section to the Scheduler HTTP API docs around RecordIO specification
> --
>
> Key: MESOS-3566
> URL: https://issues.apache.org/jira/browse/MESOS-3566
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Since the {{RecordIO}} format is not that widely used, searching for it 
> online does not offer much help. 
> - It would be good if we could add to the docs a small section on its 
> specification for framework developers. 
> - Also, add details on why the {{RecordIO}} format is being used and why just 
> using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
> suffice (see 
> [here|https://issues.apache.org/jira/browse/MESOS-3562?focusedCommentId=14939194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14939194])
> - Add info about the rationale behind using the {{RecordIO}} format and not 
> just relying on encoding an event per chunk.
> - Bonus points if we can have a simple code snippet in C++/Java on reading a 
> {{RecordIO}} response to help developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3566) Add a section to the Scheduler HTTP API docs around RecordIO specification

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960739#comment-14960739
 ] 

Anand Mazumdar edited comment on MESOS-3566 at 10/16/15 2:11 PM:
-

Thanks, updated description to reflect this now. Until we do that, I have 
edited my earlier comment on MESOS-3562 to reflect the 
[reasoning|https://issues.apache.org/jira/browse/MESOS-3562?focusedCommentId=14939194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14939194].




was (Author: anandmazumdar):
Thanks, updated description to reflect this now. Until we do that, I have 
edited my earlier comment on MESOS-3562 to reflect the reasoning.

https://issues.apache.org/jira/browse/MESOS-3562?focusedCommentId=14939194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14939194

> Add a section to the Scheduler HTTP API docs around RecordIO specification
> --
>
> Key: MESOS-3566
> URL: https://issues.apache.org/jira/browse/MESOS-3566
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Since the {{RecordIO}} format is not that widely used, searching for it 
> online does not offer much help. 
> - It would be good if we could add to the docs a small section on its 
> specification for framework developers. 
> - Also, add details on why the {{RecordIO}} format is being used and why just 
> using vanilla {{ChunkedEncoding}} and encoding one event per chunk won't 
> suffice (see 
> [here|https://issues.apache.org/jira/browse/MESOS-3562?focusedCommentId=14939194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14939194])
> - Add info about the rationale behind using the {{RecordIO}} format and not 
> just relying on encoding an event per chunk.
> - Bonus points if we can have a simple code snippet in C++/Java on reading a 
> {{RecordIO}} response to help developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-16 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960954#comment-14960954
 ] 

Yong Tang commented on MESOS-3738:
--

Thanks. I was going to spend more time investigating this issue, but I am glad 
a fix is already there.

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
> Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, 
> MESOS-3738-0_25_0.patch
>
>
> When the Mesos slave is within a container, the COMMAND health check from 
> Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.13062762 exec.cpp:208] Executor registered on slave 
> e20f8959-cd9f-40ae-987d-809401309361-S0
> WARNING: Your kernel does not 

[jira] [Updated] (MESOS-3740) LIBPROCESS_IP not passed to Docker containers

2015-10-16 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-3740:
--
Shepherd: Niklas Quarfot Nielsen  (was: Michael Park)

> LIBPROCESS_IP not passed to Docker containers
> -
>
> Key: MESOS-3740
> URL: https://issues.apache.org/jira/browse/MESOS-3740
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Mesos 0.24.1
>Reporter: Cody Maloney
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>
> Docker containers aren't currently passed all the same environment variables 
> that Mesos Containerizer tasks are. See: 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L254
>  for all the environment variables explicitly set for mesos containers.
> While some of them don't necessarily make sense for docker containers, when 
> the docker container has a libprocess process inside it (a Mesos framework 
> scheduler) and is using {{--net=host}}, the task needs to have LIBPROCESS_IP 
> set; otherwise the same sort of problems that happen because of MESOS-3553 
> can happen (libprocess will try to guess the machine's IP address, with 
> likely bad results in a number of operating environments).
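
A hedged sketch (not the actual containerizer code) of the forwarding the 
ticket asks for: copy LIBPROCESS_IP from the agent's own environment into the 
environment handed to a {{--net=host}} docker task, so an in-container 
libprocess binds the right address:
{code}
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>

// Build the subset of the task environment relevant to this ticket.
std::map<std::string, std::string> taskEnvironment()
{
  std::map<std::string, std::string> env;
  if (const char* ip = std::getenv("LIBPROCESS_IP")) {
    env["LIBPROCESS_IP"] = ip;  // assumption: the agent exports its advertised IP
  }
  return env;
}

int main()
{
  for (const auto& entry : taskEnvironment())
    std::cout << entry.first << "=" << entry.second << "\n";
}
{code}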



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3740) LIBPROCESS_IP not passed to Docker containers

2015-10-16 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-3740:
--
Assignee: Michael Park  (was: Niklas Quarfot Nielsen)

> LIBPROCESS_IP not passed to Docker containers
> -
>
> Key: MESOS-3740
> URL: https://issues.apache.org/jira/browse/MESOS-3740
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Mesos 0.24.1
>Reporter: Cody Maloney
>Assignee: Michael Park
>  Labels: mesosphere
>
> Docker containers aren't currently passed all the same environment variables 
> that Mesos Containerizer tasks are. See: 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L254
>  for all the environment variables explicitly set for mesos containers.
> While some of them don't necessarily make sense for docker containers, when 
> the docker container has a libprocess process inside it (a Mesos framework 
> scheduler) and is using {{--net=host}}, the task needs to have LIBPROCESS_IP 
> set; otherwise the same sort of problems that happen because of MESOS-3553 
> can happen (libprocess will try to guess the machine's IP address, with 
> likely bad results in a number of operating environments).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3752) CentOS 6 dependency install fails at Maven

2015-10-16 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-3752:
--
Target Version/s: 0.26.0  (was: 0.25.0)

> CentOS 6 dependency install fails at Maven
> --
>
> Key: MESOS-3752
> URL: https://issues.apache.org/jira/browse/MESOS-3752
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: documentation, installation, mesosphere
>
> It seems the Apache Maven dependencies have changed such that following the 
> Getting Started docs for CentOS 6.6 will fail at Maven installation:
> {code}
> ---> Package apache-maven.noarch 0:3.3.3-2.el6 will be installed
> --> Processing Dependency: java-devel >= 1:1.7.0 for package: 
> apache-maven-3.3.3-2.el6.noarch
> --> Finished Dependency Resolution
> Error: Package: apache-maven-3.3.3-2.el6.noarch (epel-apache-maven)
>Requires: java-devel >= 1:1.7.0
>Available: java-1.5.0-gcj-devel-1.5.0.0-29.1.el6.x86_64 (base)
>java-devel = 1.5.0
>Available: 
> 1:java-1.6.0-openjdk-devel-1.6.0.35-1.13.7.1.el6_6.x86_64 (base)
>java-devel = 1:1.6.0
>Available: 
> 1:java-1.6.0-openjdk-devel-1.6.0.36-1.13.8.1.el6_7.x86_64 (updates)
>java-devel = 1:1.6.0
>  You could try using --skip-broken to work around the problem
>  You could try running: rpm -Va --nofiles --nodigest
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3736) Support docker local store pull same image simultaneously

2015-10-16 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-3736:
--
Target Version/s: 0.26.0  (was: 0.25.0)

> Support docker local store pull same image simultaneously 
> --
>
> Key: MESOS-3736
> URL: https://issues.apache.org/jira/browse/MESOS-3736
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>
> The current local store implements get() using the local puller. When several 
> requests pull the same docker image at the same time, the local puller untars 
> the image tarball once per request and copies each result to the same 
> directory, which wastes time and computation. The local store/puller should 
> do this work only for the first request; simultaneous pull requests should 
> wait on the promised future and get the result once the first pull finishes.
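
A minimal sketch (not the Mesos implementation) of this de-duplication: the 
first request stores a shared future keyed by image name, and later requests 
for the same image wait on that future instead of starting another pull:
{code}
#include <future>
#include <iostream>
#include <map>
#include <mutex>
#include <string>

class ImageStore
{
public:
  std::shared_future<std::string> get(const std::string& image)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = pulls_.find(image);
    if (it != pulls_.end()) {
      return it->second;  // a pull for this image is already in flight
    }
    auto pull = std::async(std::launch::async, [image] {
      return "/store/" + image;  // stand-in for the untar + copy work
    }).share();
    pulls_.emplace(image, pull);
    return pull;
  }

private:
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> pulls_;
};

int main()
{
  ImageStore store;
  auto first = store.get("ubuntu");
  auto second = store.get("ubuntu");  // shares the first pull
  std::cout << first.get() << " " << second.get() << "\n";
}
{code}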



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3735) Mesos master should expose the version of registered agents/schedulers

2015-10-16 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961552#comment-14961552
 ] 

Klaus Ma commented on MESOS-3735:
-

Including the API version in this endpoint seems OK :).

> Mesos master should expose the version of registered agents/schedulers 
> ---
>
> Key: MESOS-3735
> URL: https://issues.apache.org/jira/browse/MESOS-3735
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Guangya Liu
>
> Currently the Mesos master doesn't expose the (release and API) version 
> information of clients (agents and schedulers) registered with it. It would 
> be useful to have this information in the WebUI and in an HTTP endpoint. The 
> latter would be especially useful during deploys/upgrades.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3085) Make failed on Ubuntu 14.04 ppc64le

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961582#comment-14961582
 ] 

Marco Massenzio commented on MESOS-3085:


As mentioned several times on the dev list, it's always good practice to find a 
willing shepherd so that patches can be committed (and doing so before starting 
the work is even better, so you can make sure your approach is sound, which 
will make committing the patch much easier).

Having said that, this is a one-line change, a single {{#include}}, so it 
should be easy to commit.

[~nnielsen] - could you please help get this committed?

[~jihun] - could you also please make sure the task status is correct? (It was 
still in the "Open" state; it should be "Reviewable".)

> Make failed on Ubuntu 14.04 ppc64le
> ---
>
> Key: MESOS-3085
> URL: https://issues.apache.org/jira/browse/MESOS-3085
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.23.0, 0.24.0
> Environment: Ubuntu 14.04 ppc64le
>Reporter: Jihun Kang
>Assignee: Jihun Kang
>
> When trying to compile linux/fs.cpp, make failed with the following message.
> {noformat}
> /bin/bash ../libtool  --tag=CXX   --mode=compile g++ -DPACKAGE_NAME=\"mesos\" 
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.24.0\" 
> -DPACKAGE_STRING=\"mesos\ 0.24.0\" -DPACKAGE_BUGREPORT=\"\" 
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.24.0\" -DSTDC_HEADERS=1 
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" 
> -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 
> -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 
> -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 
> -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" 
> -DMESOS_HAS_PYTHON=1 -I. -I../../src   -Wall -Werror 
> -DLIBDIR=\"/usr/local/lib\" -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
> -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0  
> -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -MT 
> linux/libmesos_no_3rdparty_la-fs.lo -MD -MP -MF 
> linux/.deps/libmesos_no_3rdparty_la-fs.Tpo -c -o 
> linux/libmesos_no_3rdparty_la-fs.lo `test -f 'linux/fs.cpp' || echo 
> '../../src/'`linux/fs.cpp
> libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
> -DPACKAGE_VERSION=\"0.24.0\" "-DPACKAGE_STRING=\"mesos 0.24.0\"" 
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
> -DVERSION=\"0.24.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
> -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
> -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 
> -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
> -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
> -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 
> -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -I. -I../../src 
> -Wall -Werror -DLIBDIR=\"/usr/local/lib\" 
> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
> -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 
> -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -MT 
> linux/libmesos_no_3rdparty_la-fs.lo -MD -MP -MF 
> linux/.deps/libmesos_no_3rdparty_la-fs.Tpo -c ../../src/linux/fs.cpp  -fPIC 
> 

[jira] [Commented] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961590#comment-14961590
 ] 

Anand Mazumdar commented on MESOS-3747:
---

+1 Marco; see my initial comment about returning an error and clarifying this 
in the documentation.
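
For illustration, here is a minimal sketch of the subscribe-time validation 
being discussed ({{validateFrameworkUser}} is a hypothetical helper, not the 
actual Mesos call path): if the HTTP API will not inherit the agent's user for 
an empty {{user}} field, the call can be rejected up front with a descriptive 
error instead of surfacing a misleading {{REASON_MEMORY_LIMIT}} later.

{code}
#include <pwd.h>

#include <iostream>
#include <string>

// Hypothetical check: reject an empty (or, on the agent, unknown) user
// with a descriptive message, rather than letting the task fail later
// with an unrelated reason.
std::string validateFrameworkUser(const std::string& user)
{
  if (user.empty()) {
    return "FrameworkInfo.user is empty; the HTTP API does not inherit "
           "the agent's user, so an explicit user is required";
  }
  if (::getpwnam(user.c_str()) == nullptr) {
    return "Failed to get user information for '" + user + "'";
  }
  return "";  // Empty result means the user is valid.
}

int main()
{
  std::cout << validateFrameworkUser("") << std::endl;
  return 0;
}
{code}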

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Assignee: Liqiang Lin
>Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH, 
> launch { 
> task_infos [
> {
> name: "task-1",
> task_id: { value: "task-1" },
> agent_id: { value: 
> "20151015-125949-16777343-5050-20146-S0" },
> resources [
> { name: "cpus", type: SCALAR, scalar: { value: 
> 0.1 },  role: "*" },
> { name: "mem",  type: SCALAR, scalar: { value: 
> 64.0 }, role: "*" },
> { name: "disk", type: SCALAR, scalar: { value: 
> 0.0 },  role: "*" },
> ],
> command: { 
> environment { 
> variables [ 
> { name: "SLEEP_SECONDS" value: "15" } 
> ] 
> },
> value: "env | sort && sleep $SLEEP_SECONDS"
> }
> }
> ]
>  }
>  }
>  }
> }
> {code}
> {code:title=Update Status}
> event: {
> type: UPDATE,
> update: { 
> status: { 
> task_id: { value: "task-1" }, 
> state: TASK_FAILED,
> message: "Container destroyed while preparing isolators",
> agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
> timestamp: 1.444939217401241E9,
> executor_id: { value: "task-1" },
> source: SOURCE_AGENT, 
> reason: REASON_MEMORY_LIMIT,
> uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
> } 
> }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
>  Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources 
> cpus(*):0.1; mem(*):32 in work directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for 
> executor task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.262684 19638 docker.cpp:734] No container info found, skipping 
> launch
> I1015 13:15:34.263478 19638 containerizer.cpp:640] Starting container 
> '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 
> 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-'
> E1015 13:15:34.264516 19641 slave.cpp:3342] Container 
> '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 

[jira] [Commented] (MESOS-3708) improve process::subprocess ABORT message

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961592#comment-14961592
 ] 

Marco Massenzio commented on MESOS-3708:


Can you please see whether the use of {{Subprocess::execute()}} as proposed in 
[r/37336|https://reviews.apache.org/r/37336] would address this?

(The error message in the 'Failed' clause prints the invoked command, and in 
case of error we give access to {{stderr}}.)

> improve process::subprocess ABORT message
> -
>
> Key: MESOS-3708
> URL: https://issues.apache.org/jira/browse/MESOS-3708
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>Assignee: James Peach
>Priority: Trivial
>
> When the {{exec}} fails in {{process::subprocess}}, you get an {{execvpe 
> failed ...}} abort message with no indication of what failed to exec. Let's 
> print the path in the message to give the operator a hint.
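
A minimal sketch of the proposed message, assuming a hypothetical child-side 
wrapper (the real code path lives in libprocess's {{subprocess.cpp}}): include 
the attempted path and the {{errno}} string so the operator can tell what 
failed to exec.

{code}
#include <unistd.h>

#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

extern char** environ;

// Hypothetical child-side helper: if execvpe(3) returns, it failed, so
// report what we tried to run and why before aborting.
void execOrAbort(char** argv, char** envp)
{
  ::execvpe(argv[0], argv, envp);

  std::fprintf(
      stderr,
      "ABORT: Failed to execvpe '%s': %s\n",
      argv[0],
      std::strerror(errno));
  std::abort();
}

int main()
{
  char* args[] = {const_cast<char*>("/no/such/binary"), nullptr};
  execOrAbort(args, environ);  // Prints the path and the errno string.
  return 0;
}
{code}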



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-16 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3738:
---
Description: 
When the Mesos slave runs inside a container, the COMMAND health check from 
Marathon is invoked incorrectly.

In such a scenario, the sandbox directory (instead of the launcher/health-check 
directory) is used. This results in an error within the container.

Command to invoke the Mesos slave container:
{noformat}
sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
/usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
 -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
--executor_registration_timeout=5mins --docker_stop_timeout=10secs 
--launcher=posix
{noformat}
Marathon JSON file:
{code}
{
  "id": "ubuntu",
  "container":
  {
"type": "DOCKER",
"docker":
{
  "image": "ubuntu",
  "network": "BRIDGE",
  "parameters": []
}
  },
  "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
  "uris": [],
  "healthChecks":
  [
{
  "protocol": "COMMAND",
  "command": { "value": "echo Success" },
  "gracePeriodSeconds": 3000,
  "intervalSeconds": 5,
  "timeoutSeconds": 5,
  "maxConsecutiveFailures": 300
}
  ],
  "instances": 1
}
{code}
{noformat}
STDOUT:

root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
--container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
 --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
--initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
--mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
--sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
 --stop_timeout="10secs"
--container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
 --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
--initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
--mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
--sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
 --stop_timeout="10secs"
Registered docker executor on b01e2e75afcb
Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
1
Launching health check process: 
/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
 --executor=(1)@10.2.1.7:40695 
--health_check_json={"command":{"shell":true,"value":"docker exec 
mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
 sh -c \" echo Success 
\""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
 --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
Health check process launched at pid: 94
1
1
1
1
1

STDERR:

root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0
I1014 23:15:58.13062762 exec.cpp:208] Executor registered on slave 
e20f8959-cd9f-40ae-987d-809401309361-S0
WARNING: Your kernel does not support swap limit capabilities, memory limited 
without swap.
ABORT: 
(/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177): 
Failed to os::execvpe in childMain: No such file or directory*** Aborted at 
1444864558 (unix time) try "date -d @1444864558" if you are using GNU date ***
PC: @ 0x7fc8c5975107 (unknown)
*** SIGABRT (@0x5e) received by PID 94 (TID 0x7fc8bee5e700) from PID 94; stack 
trace: ***
@ 0x7fc8c5cf88d0 (unknown)
@ 0x7fc8c5975107 (unknown)
@ 0x7fc8c59764e8 (unknown)
@   0x419142 _Abort()
@   0x41917c _Abort()
@ 0x7fc8c7745780 process::childMain()
@ 0x7fc8c7747a49 std::_Function_handler<>::_M_invoke()
@ 0x7fc8c774561c process::defaultClone()
@ 0x7fc8c7745f81 process::subprocess()
@   0x43c58d 
mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
@ 0x7fc8c771b424 process::ProcessManager::resume()
@ 0x7fc8c771b74f process::internal::schedule()
@ 0x7fc8c64d3970 (unknown)
@ 0x7fc8c5cf10a4 start_thread
@ 0x7fc8c5a2604d (unknown)
{noformat}
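
To make the mismatch concrete, here is a hedged sketch (names hypothetical, 
not the actual executor code): the health-check path logged above is a 
host-side sandbox path, which does not exist inside the container; the same 
directory is only visible there at the mapped directory.

{code}
#include <iostream>
#include <string>

// Illustration of the path mismatch: inside the container, the host
// sandbox path is not mounted; the sandbox appears at the mapped
// directory (/mnt/mesos/sandbox) instead.
std::string helperPath(
    const std::string& hostSandbox,
    const std::string& mappedDirectory,
    bool insideContainer)
{
  const std::string& base = insideContainer ? mappedDirectory : hostSandbox;
  return base + "/mesos-health-check";
}

int main()
{
  const std::string host =
      "/tmp/mesos/slaves/S0/frameworks/F0/executors/E0/runs/R0";

  // Only the mapped path resolves inside the container.
  std::cout << helperPath(host, "/mnt/mesos/sandbox", true) << std::endl;
  return 0;
}
{code}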

  was:
When Mesos slave is within the container, the COMMAND health check from 
Marathon is invoked incorrectly.

In such a scenario, 

[jira] [Commented] (MESOS-3724) subprocess fails to process "docker inspect" output if >64KB

2015-10-16 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961539#comment-14961539
 ] 

Bhuvan Arumugam commented on MESOS-3724:


I'll verify against a newer Mesos ...

> subprocess fails to process "docker inspect" output if >64KB
> ---
>
> Key: MESOS-3724
> URL: https://issues.apache.org/jira/browse/MESOS-3724
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.21.1
>Reporter: Bhuvan Arumugam
>
> When running a task with Docker, if the {{docker inspect}} output is larger 
> than 64KB, it fails: the {{docker inspect}} command blocks, the task remains 
> in the ASSIGNED state, and after 15 minutes the task is KILLED. The subprocess 
> library [1] used in Mesos to run this command does not handle output beyond 
> this size.
> {code}
> docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
> inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
> inspect.out && rm -f inspect.out
>  76K  inspect.out
> {code}
> You can reproduce it using the above image with any framework. I tested it 
> with aurora.
> Here is a sample failure: http://pastebin.com/w1Ty41rb
> [1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804
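
For context, 64KB matches the default Linux pipe buffer: a parent that waits 
for the child to exit before draining the child's stdout will deadlock once 
the child has written more than the buffer holds. A hedged sketch of the 
remedy follows (not the actual docker.cpp code; the image name is a 
placeholder): read the pipe to EOF first, then reap the child.

{code}
#include <sys/wait.h>

#include <array>
#include <cstdio>
#include <string>

// Drain a stream to EOF so the child never blocks in write(2).
std::string readAll(FILE* stream)
{
  std::array<char, 4096> buffer;
  std::string output;
  size_t n;
  while ((n = std::fread(buffer.data(), 1, buffer.size(), stream)) > 0) {
    output.append(buffer.data(), n);
  }
  return output;
}

int main()
{
  FILE* pipe = ::popen("docker inspect some-image 2>/dev/null", "r");
  if (pipe == nullptr) {
    return 1;
  }

  std::string output = readAll(pipe);  // Drain fully *before* reaping.
  int status = ::pclose(pipe);

  std::printf("read %zu bytes, exit status %d\n", output.size(), status);
  return 0;
}
{code}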



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3724) subprocess fails to process "docker inspect" output if >64KB

2015-10-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam reassigned MESOS-3724:
--

Assignee: Bhuvan Arumugam

> subprocess fails to process "docker inspect" output if >64KB
> ---
>
> Key: MESOS-3724
> URL: https://issues.apache.org/jira/browse/MESOS-3724
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.21.1
>Reporter: Bhuvan Arumugam
>Assignee: Bhuvan Arumugam
>
> When running a task with Docker, if the {{docker inspect}} output is larger 
> than 64KB, it fails: the {{docker inspect}} command blocks, the task remains 
> in the ASSIGNED state, and after 15 minutes the task is KILLED. The subprocess 
> library [1] used in Mesos to run this command does not handle output beyond 
> this size.
> {code}
> docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
> inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
> inspect.out && rm -f inspect.out
>  76K  inspect.out
> {code}
> You can reproduce it using the above image with any framework. I tested it 
> with aurora.
> Here is a sample failure: http://pastebin.com/w1Ty41rb
> [1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3569) Typos in Mesos Monitoring doc page

2015-10-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam reassigned MESOS-3569:
--

Assignee: Bhuvan Arumugam

> Typos in Mesos Monitoring doc page
> --
>
> Key: MESOS-3569
> URL: https://issues.apache.org/jira/browse/MESOS-3569
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Hari Sekhon
>Assignee: Bhuvan Arumugam
>Priority: Trivial
>
> Minor typos in the Mesos Monitoring docs page: "udpates" instead of "updates" 
> in several of the monitoring metric keys on this page:
> http://mesos.apache.org/documentation/latest/monitoring/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3085) Make failed on Ubuntu 14.04 ppc64le

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961582#comment-14961582
 ] 

Marco Massenzio edited comment on MESOS-3085 at 10/17/15 12:34 AM:
---

As mentioned several times on the dev list, it's always good practice to find a 
willing shepherd so that patches can be committed (and doing so before starting 
the work is even better: you can make sure your approach is sound, which will 
make committing the patch much easier).

Having said that, this is a one-line change, a single {{#include}}, so it should 
be easy to commit.

[~nnielsen] - could you please help getting this committed?

[~jihun] - for future reference, could you also please make sure, when you work 
on a task, that the Jira status is correct (it was still in the "Open" state; 
it should be "Reviewable").
I've already updated this one.

Thanks.


was (Author: marco-mesos):
As mentioned several times on the dev list, it's always good practice to find a 
willing shepherd so that patches can be committed (and doing so before starting 
the work is even better, so you can make sure your approach is sound and will 
make committing the patch much easier).

Having said that, this is a one-line change, a single {{#include}} so should be 
easy to commit.

[~nnielsen] - could you please help getting this committed?

[~jihun] - could you also please make sure the task status is correct (It was 
still in the "Open" state, it should be "Reviewable").

> Make failed on Ubuntu 14.04 ppc64le
> ---
>
> Key: MESOS-3085
> URL: https://issues.apache.org/jira/browse/MESOS-3085
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.23.0, 0.24.0
> Environment: Ubuntu 14.04 ppc64le
>Reporter: Jihun Kang
>Assignee: Jihun Kang
>
> When trying to compile linux/fs.cpp, make failed with the following message.
> {noformat}
> /bin/bash ../libtool  --tag=CXX   --mode=compile g++ -DPACKAGE_NAME=\"mesos\" 
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.24.0\" 
> -DPACKAGE_STRING=\"mesos\ 0.24.0\" -DPACKAGE_BUGREPORT=\"\" 
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.24.0\" -DSTDC_HEADERS=1 
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" 
> -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 
> -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 
> -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 
> -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" 
> -DMESOS_HAS_PYTHON=1 -I. -I../../src   -Wall -Werror 
> -DLIBDIR=\"/usr/local/lib\" -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
> -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0  
> -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -MT 
> linux/libmesos_no_3rdparty_la-fs.lo -MD -MP -MF 
> linux/.deps/libmesos_no_3rdparty_la-fs.Tpo -c -o 
> linux/libmesos_no_3rdparty_la-fs.lo `test -f 'linux/fs.cpp' || echo 
> '../../src/'`linux/fs.cpp
> libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
> -DPACKAGE_VERSION=\"0.24.0\" "-DPACKAGE_STRING=\"mesos 0.24.0\"" 
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
> -DVERSION=\"0.24.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
> -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
> -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 
> -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
> -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
> -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 
> -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -I. -I../../src 
> -Wall -Werror -DLIBDIR=\"/usr/local/lib\" 
> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
> -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include 

[jira] [Assigned] (MESOS-2935) Documentation Needs Clarification about Compressed artifacts

2015-10-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam reassigned MESOS-2935:
--

Assignee: Bhuvan Arumugam

> Documentation Needs Clarification about Compressed artifacts
> 
>
> Key: MESOS-2935
> URL: https://issues.apache.org/jira/browse/MESOS-2935
> Project: Mesos
>  Issue Type: Documentation
>  Components: fetcher
>Reporter: Sargun Dhillon
>Assignee: Bhuvan Arumugam
>Priority: Trivial
>  Labels: newbie
>
> Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR 
> -xf" 
> In addition, only the following file suffixes / extensions result in 
> decompression:
> -tgz
> -tar.gz
> -tbz2
> -tar.bz2
> -tar.xz
> -txz
> -zip
> Alternatively, change the fetcher to accept .tar as a valid suffix to trigger 
> the tarball code.
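
A small sketch of the suffix test described above ({{shouldExtract}} is a 
hypothetical helper, not the actual fetcher code), marking where a plain 
{{.tar}} case would slot in:

{code}
#include <string>
#include <vector>

// Only these extensions trigger extraction per the description above;
// adding ".tar" here is the alternative the ticket proposes.
bool shouldExtract(const std::string& uri)
{
  const std::vector<std::string> suffixes = {
      ".tgz", ".tar.gz", ".tbz2", ".tar.bz2", ".tar.xz", ".txz", ".zip"};

  for (const std::string& suffix : suffixes) {
    if (uri.size() >= suffix.size() &&
        uri.compare(uri.size() - suffix.size(), suffix.size(), suffix) == 0) {
      return true;
    }
  }
  return false;
}

int main()
{
  // A plain .tar artifact is currently left unextracted.
  return shouldExtract("artifact.tar") ? 1 : 0;
}
{code}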



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3725) shared library loading depends on environment variable updates

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961599#comment-14961599
 ] 

Marco Massenzio commented on MESOS-3725:


[~jpe...@apache.org] I'm not sure what to do with this one (in view of your 
latest comment).

Do you want me to leave it "Open", or should it be "In Progress" (or even 
"Reviewable")?

Could you please update its status accordingly?

Thanks!

> shared library loading depends on environment variable updates
> --
>
> Key: MESOS-3725
> URL: https://issues.apache.org/jira/browse/MESOS-3725
> Project: Mesos
>  Issue Type: Bug
>  Components: modules, stout
>Reporter: James Peach
>Assignee: James Peach
>
> {{ModuleTest::SetUpTestCase()}} and the various {{libraries::paths()}} in 
> stout assume that updating {{LD_LIBRARY_PATH}} or {{DYLD_LIBRARY_PATH}} is 
> sufficient to alter the search path used by dlopen(3). It is not; those 
> environment variables are only bound at program load.
> My preference is to fix this by requiring the clients of {{DynamicLibrary}} 
> to always pass in an absolute path and to remove all mention of these 
> environment variables.
> FWIW, the tests in {{ModuleTest::SetUpTestCase()}} only work because the 
> libtool wrapper script sets up the library path to the expected value prior 
> to running the tests.
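
A minimal sketch of the underlying behavior (library paths are placeholders; 
link with {{-ldl}}): mutating {{LD_LIBRARY_PATH}} at runtime does not change 
the dlopen(3) search path, while an absolute path bypasses the search 
entirely, which is the fix proposed above.

{code}
#include <dlfcn.h>

#include <cstdio>
#include <cstdlib>

int main()
{
  // No effect on dlopen(3): the dynamic loader bound LD_LIBRARY_PATH
  // at program load time.
  ::setenv("LD_LIBRARY_PATH", "/opt/hypothetical/modules", 1);

  // Relative name: resolved against the *original* search path only.
  void* relative = ::dlopen("libexample.so", RTLD_NOW);

  // Absolute path: no search path involved, so runtime environment
  // mutations are irrelevant -- the proposed fix.
  void* absolute =
      ::dlopen("/opt/hypothetical/modules/libexample.so", RTLD_NOW);

  std::printf("relative: %p, absolute: %p\n", relative, absolute);

  if (relative != nullptr) { ::dlclose(relative); }
  if (absolute != nullptr) { ::dlclose(absolute); }
  return 0;
}
{code}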



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2503) Check TaskStatus::Reason in tests

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961577#comment-14961577
 ] 

Marco Massenzio commented on MESOS-2503:


Thanks for suggesting this.
As a first step, could you (even in broad strokes) create the necessary tasks 
and link them to this epic?

> Check TaskStatus::Reason in tests
> -
>
> Key: MESOS-2503
> URL: https://issues.apache.org/jira/browse/MESOS-2503
> Project: Mesos
>  Issue Type: Epic
>  Components: test
>Reporter: Alexander Rukletsov
>Assignee: adyat...@mesosphere.io
>  Labels: mesosphere, tech-debt, tests
>
> We have a number of reasons for task status updates, but we do not test all 
> of them. On the other hand, some failures that we check in our tests do not 
> have the reason field set. It would be nice to have a check for state, reason, and 
> source for every failed or lost task. E.g.
> {code}
>   AWAIT_READY(status4);
>   EXPECT_EQ(TASK_LOST, status4.get().state());
>   EXPECT_EQ(TaskStatus::SOURCE_SLAVE, status4.get().source());
>   EXPECT_EQ(TaskStatus::REASON_EXECUTOR_TERMINATED, status4.get().reason());
> {code}
> To solve this problem, several changes in different parts of the codebase 
> should be made. Hence we make an epic out of it and do it step by step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3468) Improve apply_reviews.sh script to apply chain of reviews

2015-10-16 Thread Artem Harutyunyan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961506#comment-14961506
 ] 

Artem Harutyunyan commented on MESOS-3468:
--

Posted another review (adding GitHub support):

https://reviews.apache.org/r/38705/
https://reviews.apache.org/r/38883/
https://reviews.apache.org/r/39410/

> Improve apply_reviews.sh script to apply chain of reviews
> -
>
> Key: MESOS-3468
> URL: https://issues.apache.org/jira/browse/MESOS-3468
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Artem Harutyunyan
>  Labels: mesosphere
>
> Currently the support/apply-review.sh script allows a user (typically a 
> committer) to apply a single review on top of HEAD. Since Mesos contributors 
> typically submit a chain of reviews for a given issue, it makes sense for the 
> script to apply the whole chain recursively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2015-10-16 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3753:


 Summary: Test the HTTP Scheduler library with SSL enabled
 Key: MESOS-3753
 URL: https://issues.apache.org/jira/browse/MESOS-3753
 Project: Mesos
  Issue Type: Story
  Components: framework, HTTP API, test
Reporter: Joseph Wu
Assignee: Joseph Wu


Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  

We need to add tests that check the scheduler library against SSL-enabled 
Mesos:
* with downgrade support,
* with/without verification of certificates (framework-side),
* with required framework/client-side certificates,
* with/without verification of certificates (master-side),
* with a custom certificate authority (CA).

These options should be controlled by the same environment variables found on 
the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
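
As an illustration, a minimal sketch of wiring one cell of that matrix through 
the environment, assuming the {{SSL_*}} variable names from the linked doc 
(key/cert paths are placeholders):

{code}
#include <cstdlib>

// Configure one cell of the SSL test matrix via the environment,
// assuming the SSL_* variables described in the SSL user doc.
void configureSSL(bool downgrade, bool verifyCert)
{
  ::setenv("SSL_ENABLED", "true", 1);
  ::setenv("SSL_SUPPORT_DOWNGRADE", downgrade ? "true" : "false", 1);
  ::setenv("SSL_VERIFY_CERT", verifyCert ? "true" : "false", 1);
  ::setenv("SSL_KEY_FILE", "/path/to/test_key.pem", 1);
  ::setenv("SSL_CERT_FILE", "/path/to/test_cert.pem", 1);
}

int main()
{
  // E.g. the "downgrade allowed, no certificate verification" cell;
  // a test would then spawn the master and assert a successful
  // SUBSCRIBE round-trip via the HTTP scheduler library.
  configureSSL(true, false);
  return 0;
}
{code}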

Note: This issue will be broken down into smaller sub-issues as bugs/problems 
are discovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3569) Typos in Mesos Monitoring doc page

2015-10-16 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961551#comment-14961551
 ] 

Bhuvan Arumugam commented on MESOS-3569:


Several master metrics, like {{allocator/event_queue_dispatches}} and 
{{master/invalid_executor_to_framework_messages}}, are not documented. I'll fix 
them when fixing the typo.

The same is true of the slave metrics.

> Typos in Mesos Monitoring doc page
> --
>
> Key: MESOS-3569
> URL: https://issues.apache.org/jira/browse/MESOS-3569
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Hari Sekhon
>Assignee: Bhuvan Arumugam
>Priority: Trivial
>
> Minor typos in the Mesos Monitoring docs page: "udpates" instead of "updates" 
> in several of the monitoring metric keys on this page:
> http://mesos.apache.org/documentation/latest/monitoring/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2331) MasterSlaveReconciliationTest.ReconcileRace is flaky

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961575#comment-14961575
 ] 

Marco Massenzio commented on MESOS-2331:


Can you please also report what OS, distro, etc. you were using for this test?
Also, why is it considered "flaky"? Did it pass some times and fail others?

Not contesting that this may be flaky, just trying to understand why it wasn't 
noticed before and what the repro steps are.

Thanks!

> MasterSlaveReconciliationTest.ReconcileRace is flaky
> 
>
> Key: MESOS-2331
> URL: https://issues.apache.org/jira/browse/MESOS-2331
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Qian Zhang
>  Labels: flaky
>
> {noformat:title=}
> [ RUN  ] MasterSlaveReconciliationTest.ReconcileRace
> Using temporary directory 
> '/tmp/MasterSlaveReconciliationTest_ReconcileRace_NE9nhV'
> I0206 19:09:44.196542 32362 leveldb.cpp:175] Opened db in 38.230192ms
> I0206 19:09:44.206826 32362 leveldb.cpp:182] Compacted db in 9.988493ms
> I0206 19:09:44.207164 32362 leveldb.cpp:197] Created db iterator in 29979ns
> I0206 19:09:44.207641 32362 leveldb.cpp:203] Seeked to beginning of db in 
> 4478ns
> I0206 19:09:44.207929 32362 leveldb.cpp:272] Iterated through 0 keys in the 
> db in 737ns
> I0206 19:09:44.208222 32362 replica.cpp:743] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0206 19:09:44.209132 32384 recover.cpp:448] Starting replica recovery
> I0206 19:09:44.209524 32384 recover.cpp:474] Replica is in EMPTY status
> I0206 19:09:44.211094 32384 replica.cpp:640] Replica in EMPTY status received 
> a broadcasted recover request
> I0206 19:09:44.211385 32384 recover.cpp:194] Received a recover response from 
> a replica in EMPTY status
> I0206 19:09:44.211902 32384 recover.cpp:565] Updating replica status to 
> STARTING
> I0206 19:09:44.236177 32381 master.cpp:344] Master 
> 20150206-190944-16842879-36452-32362 (lucid) started on 127.0.1.1:36452
> I0206 19:09:44.236291 32381 master.cpp:390] Master only allowing 
> authenticated frameworks to register
> I0206 19:09:44.236305 32381 master.cpp:395] Master only allowing 
> authenticated slaves to register
> I0206 19:09:44.236327 32381 credentials.hpp:35] Loading credentials for 
> authentication from 
> '/tmp/MasterSlaveReconciliationTest_ReconcileRace_NE9nhV/credentials'
> I0206 19:09:44.236601 32381 master.cpp:439] Authorization enabled
> I0206 19:09:44.238539 32381 hierarchical_allocator_process.hpp:284] 
> Initialized hierarchical allocator process
> I0206 19:09:44.238662 32381 whitelist_watcher.cpp:64] No whitelist given
> I0206 19:09:44.239364 32381 master.cpp:1350] The newly elected leader is 
> master@127.0.1.1:36452 with id 20150206-190944-16842879-36452-32362
> I0206 19:09:44.239392 32381 master.cpp:1363] Elected as the leading master!
> I0206 19:09:44.239413 32381 master.cpp:1181] Recovering from registrar
> I0206 19:09:44.239645 32381 registrar.cpp:312] Recovering registrar
> I0206 19:09:44.241142 32384 leveldb.cpp:305] Persisting metadata (8 bytes) to 
> leveldb took 29.029117ms
> I0206 19:09:44.241189 32384 replica.cpp:322] Persisted replica status to 
> STARTING
> I0206 19:09:44.241478 32384 recover.cpp:474] Replica is in STARTING status
> I0206 19:09:44.243075 32384 replica.cpp:640] Replica in STARTING status 
> received a broadcasted recover request
> I0206 19:09:44.243398 32384 recover.cpp:194] Received a recover response from 
> a replica in STARTING status
> I0206 19:09:44.243964 32384 recover.cpp:565] Updating replica status to VOTING
> I0206 19:09:44.255692 32384 leveldb.cpp:305] Persisting metadata (8 bytes) to 
> leveldb took 11.502759ms
> I0206 19:09:44.255765 32384 replica.cpp:322] Persisted replica status to 
> VOTING
> I0206 19:09:44.256009 32384 recover.cpp:579] Successfully joined the Paxos 
> group
> I0206 19:09:44.256253 32384 recover.cpp:463] Recover process terminated
> I0206 19:09:44.257669 32384 log.cpp:659] Attempting to start the writer
> I0206 19:09:44.259944 32377 replica.cpp:476] Replica received implicit 
> promise request with proposal 1
> I0206 19:09:44.268805 32377 leveldb.cpp:305] Persisting metadata (8 bytes) to 
> leveldb took 8.45858ms
> I0206 19:09:44.269067 32377 replica.cpp:344] Persisted promised to 1
> I0206 19:09:44.277974 32383 coordinator.cpp:229] Coordinator attemping to 
> fill missing position
> I0206 19:09:44.279767 32383 replica.cpp:377] Replica received explicit 
> promise request for position 0 with proposal 2
> I0206 19:09:44.288940 32383 leveldb.cpp:342] Persisting action (8 bytes) to 
> leveldb took 9.128603ms
> I0206 19:09:44.289294 32383 replica.cpp:678] Persisted action at 0
> I0206 19:09:44.296417 32377 replica.cpp:510] Replica received 

[jira] [Commented] (MESOS-3735) Mesos master should expose the version of registered agents/schedulers

2015-10-16 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961605#comment-14961605
 ] 

Marco Massenzio commented on MESOS-3735:


IMO, this should be on a separate endpoint.
It is likely to be a potentially *very* verbose output (when you may have 
hundreds of frameworks and thousands of agents registered with a Master).

We should also consider having something like a {{filter}} URI parameter to 
limit the returned values (e.g. {{/api/v1/agents/version?filter=zone:US1}}), 
where the filter may be applied to agents' attributes, for example.
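
To make the suggestion concrete, a hedged sketch of attribute-based filtering 
(all names hypothetical; assumes a well-formed {{key:value}} filter):

{code}
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct AgentInfo
{
  std::string id;
  std::string version;
  std::map<std::string, std::string> attributes;
};

// Keep only agents whose attributes match a "key:value" filter,
// e.g. "zone:US1".
std::vector<AgentInfo> filterAgents(
    const std::vector<AgentInfo>& agents,
    const std::string& filter)
{
  const auto colon = filter.find(':');
  const std::string key = filter.substr(0, colon);
  const std::string value = filter.substr(colon + 1);

  std::vector<AgentInfo> matched;
  for (const AgentInfo& agent : agents) {
    auto it = agent.attributes.find(key);
    if (it != agent.attributes.end() && it->second == value) {
      matched.push_back(agent);
    }
  }
  return matched;
}

int main()
{
  const std::vector<AgentInfo> agents = {
      {"S0", "0.25.0", {{"zone", "US1"}}},
      {"S1", "0.24.1", {{"zone", "EU1"}}}};

  for (const AgentInfo& agent : filterAgents(agents, "zone:US1")) {
    std::cout << agent.id << " -> " << agent.version << std::endl;
  }
  return 0;
}
{code}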

> Mesos master should expose the version of registered agents/schedulers 
> ---
>
> Key: MESOS-3735
> URL: https://issues.apache.org/jira/browse/MESOS-3735
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Guangya Liu
>
> Currently Mesos master doesn't expose the (release and API) version  
> information of clients (agents and schedulers) registered with it. It would 
> be useful to have this information in the WebUI and in an HTTP endpoint. The 
> latter would be especially useful during deploys/upgrades.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3706) Tasks stuck in staging.

2015-10-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961751#comment-14961751
 ] 

haosdent commented on MESOS-3706:
-

Could you find stdout or stderr in the sandbox directory 
/data/mesos/mesos/work/slaves/20151012-082619-4145023498-5050-22623-S0/frameworks/20150109-172016-504433162-5050-19367-0002/executors/kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72/runs/0695c9e0-0adf-4dfb-bc2a-6060245dcabe
 ?

> Tasks stuck in staging.
> ---
>
> Key: MESOS-3706
> URL: https://issues.apache.org/jira/browse/MESOS-3706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, slave
>Affects Versions: 0.23.0, 0.24.1
>Reporter: Jord Sonneveld
> Attachments: Screen Shot 2015-10-12 at 9.08.30 AM.png, Screen Shot 
> 2015-10-12 at 9.24.32 AM.png, mesos-slave.INFO, mesos-slave.INFO.2, 
> mesos-slave.INFO.3
>
>
> I have a docker image which starts fine on all my slaves except for one.  On 
> that one, it is stuck in STAGING for a long time and never starts.  The INFO 
> log is full of messages like this:
> I1012 16:02:09.210306 34905 slave.cpp:1768] Asked to kill task 
> kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72 of framework 
> 20150109-172016-504433162-5050-19367-0002
> E1012 16:02:09.211272 34907 socket.hpp:174] Shutdown failed on fd=12: 
> Transport endpoint is not connected [107]
> kwe-vinland-work is the task that is stuck in staging.  It is launched by 
> marathon.  I have launched 161 instances successfully on my cluster.  But it 
> refuses to launch on this specific slave.
> These machines are all managed via ansible so their configurations are / 
> should be identical.  I have re-run my ansible scripts and rebooted the 
> machines to no avail.
> It's been in this state for almost 30 minutes.  You can see the mesos docker 
> executor is still running:
> jord@dalstgmesos03:~$ date
> Mon Oct 12 16:13:55 UTC 2015
> jord@dalstgmesos03:~$ ps auwx | grep kwe-vinland
> root 35360  0.0  0.0 1070576 21476 ?   Ssl  15:46   0:00 
> mesos-docker-executor 
> --container=mesos-20151012-082619-4145023498-5050-22623-S0.0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --docker=docker --help=false --mapped_directory=/mnt/mesos/sandbox 
> --sandbox_directory=/data/mesos/mesos/work/slaves/20151012-082619-4145023498-5050-22623-S0/frameworks/20150109-172016-504433162-5050-19367-0002/executors/kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72/runs/0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --stop_timeout=0ns
> According to docker ps -a, nothing was ever even launched:
> jord@dalstgmesos03:/data/mesos$ sudo docker ps -a
> CONTAINER IDIMAGE  
> COMMAND  CREATED STATUS  PORTS
> NAMES
> 5c858b90b0a0registry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   39 minutes ago  Up 39 minutes   
> 0.0.0.0:9125->8125/udp, 0.0.0.0:9126->8126/tcp   statsd-fe-influxdb
> d765ba3829fdregistry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   41 minutes ago  Up 41 minutes   
> 0.0.0.0:8125->8125/udp, 0.0.0.0:8126->8126/tcp   statsd-repeater
> Those are the only two entries. Nothing about the kwe-vinland job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3749) Configuration docs are missing --enable-libevent and --enable-ssl

2015-10-16 Thread Greg Mann (JIRA)
Greg Mann created MESOS-3749:


 Summary: Configuration docs are missing --enable-libevent and 
--enable-ssl
 Key: MESOS-3749
 URL: https://issues.apache.org/jira/browse/MESOS-3749
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Affects Versions: 0.25.0
Reporter: Greg Mann
Assignee: Greg Mann


The {{--enable-libevent}} and {{--enable-ssl}} config flags are currently not 
documented in the "Configuration" docs with the rest of the flags. They should 
be added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3749) Configuration docs are missing --enable-libevent and --enable-ssl

2015-10-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3749:
-
Description: The {{\-\-enable-libevent}} and {{\-\-enable-ssl}} config 
flags are currently not documented in the "Configuration" docs with the rest of 
the flags. They should be added.  (was: The {{--enable-libevent}} and 
{{--enable-ssl}} config flags are currently not documented in the 
"Configuration" docs with the rest of the flags. They should be added.)

> Configuration docs are missing --enable-libevent and --enable-ssl
> -
>
> Key: MESOS-3749
> URL: https://issues.apache.org/jira/browse/MESOS-3749
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.25.0
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: configuration, documentaion, installation, mesosphere
>
> The {{\-\-enable-libevent}} and {{\-\-enable-ssl}} config flags are currently 
> not documented in the "Configuration" docs with the rest of the flags. They 
> should be added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3706) Tasks stuck in staging.

2015-10-16 Thread Jord Sonneveld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961034#comment-14961034
 ] 

Jord Sonneveld commented on MESOS-3706:
---

There are no such files because the task never starts.  It is not in my 
'docker ps -a' history.  It never gets past 'STAGING'.

> Tasks stuck in staging.
> ---
>
> Key: MESOS-3706
> URL: https://issues.apache.org/jira/browse/MESOS-3706
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, slave
>Affects Versions: 0.23.0, 0.24.1
>Reporter: Jord Sonneveld
> Attachments: Screen Shot 2015-10-12 at 9.08.30 AM.png, Screen Shot 
> 2015-10-12 at 9.24.32 AM.png, mesos-slave.INFO, mesos-slave.INFO.2, 
> mesos-slave.INFO.3
>
>
> I have a docker image which starts fine on all my slaves except for one.  On 
> that one, it is stuck in STAGING for a long time and never starts.  The INFO 
> log is full of messages like this:
> I1012 16:02:09.210306 34905 slave.cpp:1768] Asked to kill task 
> kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72 of framework 
> 20150109-172016-504433162-5050-19367-0002
> E1012 16:02:09.211272 34907 socket.hpp:174] Shutdown failed on fd=12: 
> Transport endpoint is not connected [107]
> kwe-vinland-work is the task that is stuck in staging.  It is launched by 
> marathon.  I have launched 161 instances successfully on my cluster.  But it 
> refuses to launch on this specific slave.
> These machines are all managed via ansible so their configurations are / 
> should be identical.  I have re-run my ansible scripts and rebooted the 
> machines to no avail.
> It's been in this state for almost 30 minutes.  You can see the mesos docker 
> executor is still running:
> jord@dalstgmesos03:~$ date
> Mon Oct 12 16:13:55 UTC 2015
> jord@dalstgmesos03:~$ ps auwx | grep kwe-vinland
> root 35360  0.0  0.0 1070576 21476 ?   Ssl  15:46   0:00 
> mesos-docker-executor 
> --container=mesos-20151012-082619-4145023498-5050-22623-S0.0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --docker=docker --help=false --mapped_directory=/mnt/mesos/sandbox 
> --sandbox_directory=/data/mesos/mesos/work/slaves/20151012-082619-4145023498-5050-22623-S0/frameworks/20150109-172016-504433162-5050-19367-0002/executors/kwe-vinland-work.6c939697-70f8-11e5-845c-0242e054dd72/runs/0695c9e0-0adf-4dfb-bc2a-6060245dcabe
>  --stop_timeout=0ns
> According to docker ps -a, nothing was ever even launched:
> jord@dalstgmesos03:/data/mesos$ sudo docker ps -a
> CONTAINER IDIMAGE  
> COMMAND  CREATED STATUS  PORTS
> NAMES
> 5c858b90b0a0registry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   39 minutes ago  Up 39 minutes   
> 0.0.0.0:9125->8125/udp, 0.0.0.0:9126->8126/tcp   statsd-fe-influxdb
> d765ba3829fdregistry.roger.dal.moz.com:5000/moz-statsd-v0.22   
> "/bin/sh -c ./start.s"   41 minutes ago  Up 41 minutes   
> 0.0.0.0:8125->8125/udp, 0.0.0.0:8126->8126/tcp   statsd-repeater
> Those are the only two entries. Nothing about the kwe-vinland job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)