
[jira] [Commented] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers

2017-11-01 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234831#comment-16234831
 ] 

Charles Allen commented on MESOS-8158:
--

[~jieyu] the socket is exposed.

> Mesos Agent in docker neglects to retry discovering Task docker containers
> --
>
> Key: MESOS-8158
> URL: https://issues.apache.org/jira/browse/MESOS-8158
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization, docker, executor
>Affects Versions: 1.4.0
> Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
>Reporter: Charles Allen
>Priority: Normal
>
> I have attempted to launch Mesos agents inside of a docker container in such 
> a way where the agent docker can be replaced and recovered. Unfortunately I 
> hit a major snag in the way the mesos docker launching works.
> To test simple functionality a marathon app is set up that simply has the 
> following command: {{date && python -m SimpleHTTPServer $PORT0}} 
> That way the HTTP port can be accessed to assure things are being assigned 
> correctly, and the date is printed out in the log.
> When I attempt to start this marathon app, the mesos agent (inside a docker 
> container) properly launches an executor which properly creates a second task 
> that launches the python code. Here's the output from the executor logs (this 
> looks correct):
> {code}
> I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
> I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent 
> d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
> I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 
> 10.0.75.2
> I1101 20:34:03.428680 68281 executor.cpp:160] Starting task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002
> I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e 
> HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e 
> MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e 
> MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e 
> MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_RESOURCE_MEM=128.0 -e 
> MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e 
> MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e 
> PORT0=31464 -e PORTS=31464 -e PORT_1=31464 -e PORT_HTTP=31464 -v 
> /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox 
> --net host --entrypoint /bin/sh --name 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 
> --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 
> -c date && python -m SimpleHTTPServer $PORT0
> I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container 
> not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> Wed Nov  1 20:34:06 UTC 2017
> {code}
> But, somehow there is a TASK_FAILED message sent to marathon.
> Upon further investigation, the following snippet can be found in the agent 
> logs (running in a docker container)
> {code}
> I1101 20:34:00.949129 9 slave.cpp:1736] Got assigned task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.950150 9 gc.cpp:93] Unscheduling 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
>  from gc
> I1101 20:34:00.950225 9 gc.cpp:93]

[jira] [Commented] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers

2017-11-01 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234830#comment-16234830
 ] 

Charles Allen commented on MESOS-8158:
--

Here's the docker command used to run the agent:

{code}
docker run --network=host -d --rm \
  -v /var/run/mesos:/var/run/mesos \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /usr/bin/docker:/usr/bin/docker \
  -v /lib64:/host/lib64 \
  -e "LD_LIBRARY_PATH=/host/lib64" \
  -e "MESOS_ADVERTISE_IP=10.0.75.2" \
  -e "MESOS_HOSTNAME=10.0.75.2" \
  -e "MESOS_IP=10.0.75.2" \
  -e "MESOS_MASTER=10.0.75.2:5050" \
  -e "MESOS_WORK_DIR=/var/run/mesos" \
  -e "MESOS_CONTAINERIZERS=mesos,docker" \
  -e "MESOS_ISOLATORS=appc/runtime,environment_secret,cgroups/blkio,cgroups/cpu,cgroups/cpuset,cgroups/devices,cgroups/hugetlb,cgroups/mem,cgroups/perf_event,cgroups/pids,docker/runtime,docker/volume,linux/capabilities,volume/secret,volume/sandbox_path,volume/image,volume/host_path" \
  -e "MESOS_AGENT_SUBSYSTEMS=blkio,cpu,cpuacct,cpuset,devices,hugetlb,memory,net_cls,net_prio,perf_event,pids" \
  --name=mesos-slave --privileged \
  -e "GLOG_vmodule=docker*=2" \
  -e "MESOS_DOCKER_MESOS_IMAGE=mesos-docker" \
  -e "MESOS_EXECUTOR_REGISTRATION_RETRY_INTERVAL=30secs" \
  mesosphere/mesos:1.4.0 \
  mesos-slave --no-systemd_enable_support
{code}
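
Because the agent container reaches the host's Docker daemon through the bind-mounted {{/var/run/docker.sock}}, one quick sanity check from inside the agent container is to confirm that the path really is a Unix socket. A minimal sketch in Python, assuming only the socket path from the command above; {{is_unix_socket}} is a hypothetical helper name, not part of Mesos or Docker:

```python
import os
import stat


def is_unix_socket(path):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or permission problem: treat as "not exposed".
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    # Path taken from the -v /var/run/docker.sock mount in the command above.
    print(is_unix_socket("/var/run/docker.sock"))
```

A {{True}} result only shows that the mount is in place; it does not prove the daemon behind the socket is answering.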


[jira] [Commented] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers

2017-11-01 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234809#comment-16234809
 ] 

Charles Allen commented on MESOS-8158:
--

Here's the marathon app in question:

{code}
{
  "id": "/testapp",
  "cmd": "date && python -m SimpleHTTPServer $PORT0",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 0,
  "acceptedResourceRoles": [
    "*"
  ],
  "container": {
    "type": "DOCKER",
    "docker": {
      "forcePullImage": false,
      "image": "python:2",
      "parameters": [],
      "privileged": false
    },
    "volumes": []
  },
  "healthChecks": [
    {
      "gracePeriodSeconds": 300,
      "ignoreHttp1xx": false,
      "intervalSeconds": 60,
      "maxConsecutiveFailures": 3,
      "path": "/",
      "portIndex": 0,
      "protocol": "HTTP",
      "timeoutSeconds": 20,
      "delaySeconds": 15
    }
  ],
  "portDefinitions": [
    {
      "port": 1,
      "name": "http",
      "protocol": "tcp"
    }
  ]
}
{code}
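
For anyone trying to reproduce this, an app definition like the one above can be submitted through Marathon's {{POST /v2/apps}} endpoint. The sketch below only builds the request and does not send it; the Marathon address (host from the logs, port 8080) is an assumption, {{build_create_request}} is a hypothetical helper, and the dictionary is a trimmed subset of the definition above:

```python
import json
import urllib.request

# Subset of the app definition above; extend with the remaining fields as needed.
APP = {
    "id": "/testapp",
    "cmd": "date && python -m SimpleHTTPServer $PORT0",
    "cpus": 1,
    "mem": 128,
    "instances": 0,
    "container": {"type": "DOCKER", "docker": {"image": "python:2"}},
}


def build_create_request(marathon_url, app):
    """Build (but do not send) a POST /v2/apps request for `app`."""
    return urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical Marathon address; adjust to your cluster.
    req = build_create_request("http://10.0.75.2:8080", APP)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually submit the app.
```

Note that with {{"instances": 0}} Marathon keeps the app defined but launches no tasks until it is scaled up.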


[jira] [Created] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers

2017-11-01 Thread Charles Allen (JIRA)

Charles Allen created MESOS-8158:


 Summary: Mesos Agent in docker neglects to retry discovering Task 
docker containers
 Key: MESOS-8158
 URL: https://issues.apache.org/jira/browse/MESOS-8158
 Project: Mesos
  Issue Type: Bug
  Components: agent, containerization, docker, executor
Affects Versions: 1.4.0
 Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
Reporter: Charles Allen
Priority: Normal


I have attempted to launch Mesos agents inside of a docker container in such a 
way where the agent docker can be replaced and recovered. Unfortunately I hit a 
major snag in the way the mesos docker launching works.

To test simple functionality a marathon app is set up that simply has the 
following command: {{date && python -m SimpleHTTPServer $PORT0}} 

That way the HTTP port can be accessed to assure things are being assigned 
correctly, and the date is printed out in the log.
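
As a rough illustration of that port check, here is a minimal Python sketch that issues a single HTTP GET against the assigned port. {{port_answers}} is a hypothetical helper name, and the host and port in the usage comment are simply the {{HOST}} and {{PORT0}} values that show up in the executor log in this report:

```python
import urllib.request


def port_answers(host, port, timeout=5.0):
    """Return True if an HTTP GET to http://host:port/ returns status 200."""
    url = "http://{}:{}/".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP errors
        # (URLError and HTTPError are OSError subclasses).
        return False


# Example: port_answers("10.0.75.2", 31464)
```

The Marathon health check in the app definition does essentially the same thing against path {{/}} on port index 0.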

When I attempt to start this marathon app, the mesos agent (inside a docker 
container) properly launches an executor which properly creates a second task 
that launches the python code. Here's the output from the executor logs (this 
looks correct):

{code}
I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent 
d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 
10.0.75.2
I1101 20:34:03.428680 68281 executor.cpp:160] Starting task 
testapp.fe35282f-bf43-11e7-a24b-0242ac110002
I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e 
HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e 
MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e 
MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e 
MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_RESOURCE_MEM=128.0 -e 
MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e 
MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e 
MESOS_SANDBOX=/mnt/mesos/sandbox -e 
MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e 
PORT0=31464 -e PORTS=31464 -e PORT_1=31464 -e PORT_HTTP=31464 -v 
/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox 
--net host --entrypoint /bin/sh --name 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 
--label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 -c 
date && python -m SimpleHTTPServer $PORT0
I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero 
status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero 
status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container 
not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H 
unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
Wed Nov  1 20:34:06 UTC 2017
{code}
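
The retry pattern visible in the executor log above (run {{docker inspect}}, wait 500ms on a non-zero exit or a not-yet-started container, try again) can be sketched roughly as follows. This is an illustration of the polling idea only, not the code in {{docker.cpp}}; the function names and the {{max_attempts}} budget are assumptions:

```python
import subprocess
import time


def docker_is_running(name):
    """Ask the docker CLI whether container `name` is running.

    Returns None when `docker inspect` exits non-zero (container not found yet).
    """
    proc = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Running}}", name],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        return None
    return proc.stdout.strip() == "true"


def wait_until_running(name, inspect=docker_is_running, interval=0.5,
                       max_attempts=20, sleep=time.sleep):
    """Poll `inspect(name)` every `interval` seconds until it reports True.

    Returns the number of attempts used, or None if the budget ran out.
    """
    for attempt in range(1, max_attempts + 1):
        if inspect(name):
            return attempt
        if attempt < max_attempts:
            sleep(interval)
    return None
```

In the log above the executor side behaves like this: two non-zero exits, one "not yet started", and then the task's {{date}} output appears.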


But, somehow there is a TASK_FAILED message sent to marathon.

Upon further investigation, the following snippet can be found in the agent 
logs (running in a docker container)

{code}
I1101 20:34:00.949129 9 slave.cpp:1736] Got assigned task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:00.950150 9 gc.cpp:93] Unscheduling 
'/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
 from gc
I1101 20:34:00.950225 9 gc.cpp:93] Unscheduling 
'/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
 from gc
I1101 20:34:00.950472 12 slave.cpp:2003] Authorizing task 
'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
I1101 20:34:00.951210 12 slave.cpp:2171] Launching task

[jira] [Commented] (MESOS-8127) Static build fails

2017-11-01 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234336#comment-16234336
 ] 

Charles Allen commented on MESOS-8127:
--

{code}
[ 87%] Linking CXX shared library ../../.libs/liblogrotate_container_logger.so
cd /usr/src/mesos/build/src/slave/container_loggers && /usr/bin/cmake -E 
cmake_link_script CMakeFiles/logrotate_container_logger.dir/link.txt --verbose=1
/usr/bin/c++  -fPIC  -std=c++11  -shared 
-Wl,-soname,liblogrotate_container_logger.so -o 
../../.libs/liblogrotate_container_logger.so 
CMakeFiles/logrotate_container_logger.dir/lib_logrotate.cpp.o  
-L/usr/src/mesos/build/3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-lib/lib/lib  
-L/usr/src/mesos/build/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib  
-L/usr/src/mesos/build/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2-build  
-L/usr/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build/.libs  
-L/usr/src/mesos/build/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c 
../../libmesos-1.4.0.a ../../libmesos-protobufs.a 
../../../3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/lib/libzookeeper_mt.a
 ../../../3rdparty/libprocess/src/libprocess-0.0.1.so.0.0.1 -lapr-1 -lcurl 
-lglog -lsvn_delta-1 -lsvn_diff-1 -lsvn_subr-1 -lprotobuf -ldl -lapr-1 
../../../3rdparty/leveldb-1.19/src/leveldb-1.19/out-static/libleveldb.a -lsasl2 
-lrt -lhttp_parser -lev -lrt -lhttp_parser -lev -lz -lpthread 
-Wl,-rpath,/usr/src/mesos/build/3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-lib/lib/lib:/usr/src/mesos/build/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib:/usr/src/mesos/build/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2-build:/usr/src/mesos/build/3rdparty/libev-4.22/src/libev-4.22-build/.libs:/usr/src/mesos/build/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c:/usr/src/mesos/build/3rdparty/libprocess/src
/usr/bin/ld: ../../libmesos-1.4.0.a(logging.cpp.o): relocation R_X86_64_32 
against `.rodata' can not be used when making a shared object; recompile with 
-fPIC
../../libmesos-1.4.0.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
src/slave/container_loggers/CMakeFiles/logrotate_container_logger.dir/build.make:107:
 recipe for target 'src/.libs/liblogrotate_container_logger.so' failed
make[2]: Leaving directory '/usr/src/mesos/build'
make[2]: *** [src/.libs/liblogrotate_container_logger.so] Error 1
CMakeFiles/Makefile2:2748: recipe for target 
'src/slave/container_loggers/CMakeFiles/logrotate_container_logger.dir/all' 
failed
make[1]: Leaving directory '/usr/src/mesos/build'
make[1]: *** 
[src/slave/container_loggers/CMakeFiles/logrotate_container_logger.dir/all] 
Error 2
make: *** [all] Error 2
Makefile:119: recipe for target 'all' failed
The command '/bin/sh -c set -ex &&   cmake .. -DBUILD_SHARED_LIBS=FALSE 
-DCMAKE_INSTALL_PREFIX=/opt/mesos &&   cmake --build . --config Release' 
returned a non-zero code: 2
{code}

This also fails with the CMake build.
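
In both the autotools and the CMake logs the linker is building a shared object and rejects an object that lives inside the static libmesos archive, which suggests the archive was compiled without {{-fPIC}}. To see quickly which archive member trips the link, a small Python sketch can pull the archive, object, and relocation type out of a pasted log. The regular expression is written against the GNU ld messages quoted in this issue, and {{find_non_pic_objects}} is a hypothetical helper name:

```python
import re

# Matches GNU ld messages of the form:
#   /usr/bin/ld: ARCHIVE(OBJECT): relocation TYPE against `SYM' can not be
#   used when making a shared object; recompile with -fPIC
_RELOC_RE = re.compile(
    r"ld: (?P<archive>\S+)\((?P<object>[^)]+)\): "
    r"relocation (?P<reloc>\S+) against `(?P<symbol>[^']+)'"
)


def find_non_pic_objects(log_text):
    """Return (archive, object, relocation) tuples for non-PIC link errors."""
    # Mail archives wrap long lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (m.group("archive"), m.group("object"), m.group("reloc"))
        for m in _RELOC_RE.finditer(flat)
    ]
```

Fed the CMake log above, this should report {{logging.cpp.o}} inside {{../../libmesos-1.4.0.a}}.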

> Static build fails
> --
>
> Key: MESOS-8127
> URL: https://issues.apache.org/jira/browse/MESOS-8127
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.0
>Reporter: Charles Allen
>Priority: Major
>
> {code}
> /usr/bin/ld: ./.libs/libmesos.a(libry_http_parser_la-http_parser.o): 
> relocation R_X86_64_32S against `.rodata' can not be used when making a 
> shared object; recompile with -fPIC
> ./.libs/libmesos.a: error adding symbols: Bad value
> {code}
> The above is the error that results from this failing build step:
> {code}
> The command '/bin/sh -c set -ex &&   ./bootstrap &&   mkdir build && cd build 
> &&   ../configure --prefix=/opt/mesos --disable-java --disable-python 
> --enable-optimize --enable-static --disable-shared &&   make &&   make 
> install' returned a non-zero code: 2
> {code}
> Dockerfile:
> {code}
> FROM ubuntu:xenial
> WORKDIR /usr/src/mesos
> COPY . .
> RUN set -ex && \
>   apt-get update && \
>   apt-get install build-essential libapr1-dev libsasl2-dev python-dev \
>     libcurl4-nss-dev libsasl2-modules libsvn-dev libz-dev git autoconf libtool -y && \
>   ./bootstrap && \
>   mkdir build && cd build && \
>   ../configure --prefix=/opt/mesos --disable-java --disable-python \
>     --enable-optimize --enable-static --disable-shared && \
>   make && \
>   make install
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (MESOS-8127) Static build fails

2017-10-24 Thread Charles Allen (JIRA)

Charles Allen created MESOS-8127:


 Summary: Static build fails
 Key: MESOS-8127
 URL: https://issues.apache.org/jira/browse/MESOS-8127
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 1.4.0
Reporter: Charles Allen


{code}
/usr/bin/ld: ./.libs/libmesos.a(libry_http_parser_la-http_parser.o): relocation 
R_X86_64_32S against `.rodata' can not be used when making a shared object; 
recompile with -fPIC
./.libs/libmesos.a: error adding symbols: Bad value
{code}

The above is the error that results from this failing build step:

{code}
The command '/bin/sh -c set -ex &&   ./bootstrap &&   mkdir build && cd build 
&&   ../configure --prefix=/opt/mesos --disable-java --disable-python 
--enable-optimize --enable-static --disable-shared &&   make &&   make install' 
returned a non-zero code: 2
{code}


Dockerfile:

{code}
FROM ubuntu:xenial
WORKDIR /usr/src/mesos
COPY . .
RUN set -ex && \
  apt-get update && \
  apt-get install build-essential libapr1-dev libsasl2-dev python-dev \
    libcurl4-nss-dev libsasl2-modules libsvn-dev libz-dev git autoconf libtool -y && \
  ./bootstrap && \
  mkdir build && cd build && \
  ../configure --prefix=/opt/mesos --disable-java --disable-python \
    --enable-optimize --enable-static --disable-shared && \
  make && \
  make install
{code}





[jira] [Commented] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-25 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179286#comment-16179286
 ] 

Charles Allen commented on MESOS-7999:
--

Cool, thanks! If that is the approach the Mesos community wants to support going 
forward, then this can be considered closed.

> Add and document ability to expose new /monitor modules on agents
> -
>
> Key: MESOS-7999
> URL: https://issues.apache.org/jira/browse/MESOS-7999
> Project: Mesos
>  Issue Type: Wish
>  Components: agent, json api, modules, statistics
>Reporter: Charles Allen
>
> When looking at how to collect data about the cluster, the best way to 
> support functionality similar to Kubernetes DaemonSets is not completely 
> clear.
> One key use case for DaemonSets is a monitor for system metrics. The ask is 
> that agents are able to have a module which either exposes new endpoints in 
> {{/monitor}} or allows pluggable entries to be added to 
> {{/monitor/statistics}}.




[jira] [Commented] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-25 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179278#comment-16179278
 ] 

Charles Allen commented on MESOS-7999:
--

[~jamespeach] Thanks, just to make sure I understand, you are suggesting doing 
something like 
https://github.com/apache/mesos/blob/master/src/slave/metrics.cpp but with an 
anonymous module?


[jira] [Created] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-21 Thread Charles Allen (JIRA)

Charles Allen created MESOS-7999:


 Summary: Add and document ability to expose new /monitor modules 
on agents
 Key: MESOS-7999
 URL: https://issues.apache.org/jira/browse/MESOS-7999
 Project: Mesos
  Issue Type: Wish
  Components: agent, json api, modules, statistics
Reporter: Charles Allen


When looking at how to collect data about the cluster, the best way to support 
functionality similar to Kubernetes DaemonSets is not completely clear.

One key use case for DaemonSets is a monitor for system metrics. The ask is 
that agents are able to have a module which either exposes new endpoints in 
{{/monitor}} or allows pluggable entries to be added to {{/monitor/statistics}}.
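
For context, this is roughly how a metrics collector would consume {{/monitor/statistics}} today. A minimal Python sketch, assuming the endpoint returns a JSON array with one entry per executor; the sample payload is illustrative only (the field names are my understanding of the agent's output, the values are made up), and {{memory_usage}} is a hypothetical helper:

```python
import json

# Illustrative payload only: field names follow the agent's
# /monitor/statistics output to the best of my knowledge; values are made up.
SAMPLE = """
[
  {
    "executor_id": "testapp.example",
    "framework_id": "framework-0001",
    "statistics": {"cpus_limit": 1.1, "mem_limit_bytes": 167772160,
                   "mem_rss_bytes": 41943040}
  }
]
"""


def memory_usage(payload):
    """Map executor_id -> fraction of its memory limit currently in use."""
    usage = {}
    for entry in json.loads(payload):
        stats = entry.get("statistics", {})
        limit = stats.get("mem_limit_bytes")
        rss = stats.get("mem_rss_bytes")
        if limit and rss is not None:
            usage[entry["executor_id"]] = rss / limit
    return usage
```

A pluggable entry as requested here would presumably show up as extra keys or extra array entries that a consumer like this could pick up.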




[jira] [Commented] (MESOS-7006) Launch docker containers with --cpus instead of cpu-shares

2017-08-23 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139221#comment-16139221
 ] 

Charles Allen commented on MESOS-7006:
--

So previously you could lie about cpus on a system via static allocation. Like 
give {{*}} 8 CPUs but give {{cpu-hungry}} 8000 CPUs. This would cause things 
launched against the {{*}} offers to have much lower CPU priority than things 
against the {{cpu-hungry}} resources.

Like many things in mesos, what a "cpu" means (like what a "disk" means) has to 
be pre-arranged between the administrators and the application owners to ensure 
expectations are met.

Having "cpu" directly enforce CFS quotas will cause all these apps with "0.1 
cpu" to suddenly behave VERY differently.

Am I understanding it correctly that this is the intended change in behavior?
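
To make the difference concrete, here is a small Python sketch of the arithmetic as I understand it. It assumes 1024 shares per CPU (1 cpu maps to {{--cpu-shares 1024}} in the MESOS-8158 executor log) and a 100 ms CFS period, which is what docker's {{--cpus}} flag is documented to use; the function names are hypothetical:

```python
CPU_SHARES_PER_CPU = 1024   # assumed: 1 cpu -> --cpu-shares 1024
CFS_PERIOD_US = 100000      # assumed 100 ms CFS period


def cpu_shares(cpus):
    """Relative weight: only matters when CPUs are contended."""
    return int(cpus * CPU_SHARES_PER_CPU)


def cfs_quota_us(cpus):
    """Hard cap: CPU time (microseconds) allowed per CFS period."""
    return int(round(cpus * CFS_PERIOD_US))


def contended_fraction(own_shares, all_shares):
    """Fraction of CPU time a task gets when every task is busy."""
    return own_shares / sum(all_shares)
```

With shares, a "0.1 cpu" task gets a weight of 102 and can still use idle cores freely; with a quota, the same task is capped at 10000 microseconds of CPU time per 100000 microsecond period no matter how idle the host is.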

> Launch docker containers with --cpus instead of cpu-shares
> --
>
> Key: MESOS-7006
> URL: https://issues.apache.org/jira/browse/MESOS-7006
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Craig W
>Priority: Minor
> Fix For: 1.4.0
>
>
> docker 1.13 was recently released and it now has a new --cpus flag which 
> allows a user to specify how many cpus a container should have. This is much 
> simpler for users to reason about.
> mesos should switch to starting a container with --cpus instead of 
> --cpu-shares, or at least make it configurable.
> https://blog.docker.com/2017/01/cpu-management-docker-1-13/




[jira] [Comment Edited] (MESOS-7006) Launch docker containers with --cpus instead of cpu-shares

2017-08-23 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139221#comment-16139221
 ] 

Charles Allen edited comment on MESOS-7006 at 8/23/17 10:06 PM:


So previously you could lie about cpus on a system via static allocation. Like 
give {{\*}} 8 CPUs but give {{cpu-hungry}} 8000 CPUs. This would cause things 
launched against the {{*}} offers to have much lower CPU priority than things 
against the {{cpu-hungry}} resources.

Like many things in mesos, what a "cpu" means (like what a "disk" means) has to 
be pre-arranged between the administrators and the application owners to ensure 
expectations are met.

Having "cpu" directly enforce CFS quotas will cause all these apps with "0.1 
cpu" to suddenly behave VERY differently.

Am I understanding it correctly that this is the intended change in behavior?


was (Author: drcrallen):
So previously you could lie about cpus on a system via static allocation. Like 
give {{*}} 8 CPUs but give {{cpu-hungry}} 8000 CPUs. This would cause things 
launched against the {{*}} offers to have much lower CPU priority than things 
against the {{cpu-hungry}} resources.

Like many things in mesos, what a "cpu" means (like what a "disk" means) has to 
be pre-arranged between the administrators and the application owners to ensure 
expectations are met.

Having "cpu" directly enforce CFS quotas will cause all these apps with "0.1 
cpu" to suddenly behave VERY differently.

Am I understanding it correctly that this is the intended change in behavior?
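
To make the concern in the comment above concrete, here is a minimal sketch of 
the arithmetic. It assumes 1024 shares per CPU for --cpu-shares and a 100 ms 
CFS period for --cpus (Docker documents --cpus="1.5" as equivalent to 
--cpu-period="100000" --cpu-quota="150000"). The function names are 
hypothetical and this is not Mesos code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one CPU corresponds to 1024 cgroup CPU shares.
constexpr std::int64_t kSharesPerCpu = 1024;
// Assumption: CFS period of 100 ms, expressed in microseconds.
constexpr std::int64_t kCfsPeriodUs = 100000;

// Relative weight: only matters when CPUs are contended.
std::int64_t cpuShares(double cpus) {
  assert(cpus > 0.0);
  return static_cast<std::int64_t>(cpus * kSharesPerCpu);
}

// Hard cap: microseconds of CPU time allowed in each CFS period.
std::int64_t cfsQuotaUs(double cpus) {
  assert(cpus > 0.0);
  return static_cast<std::int64_t>(cpus * kCfsPeriodUs);
}

// Fraction of a fully contended host that a container with `mine` shares
// receives when competing against `others` shares.
double contendedFraction(std::int64_t mine, std::int64_t others) {
  assert(mine > 0 && others >= 0);
  return static_cast<double>(mine) / static_cast<double>(mine + others);
}
```

Under these assumptions a "0.1 cpu" task gets a weight of 102 shares, which 
limits it only under contention, but a quota of 10000 us per 100000 us period, 
which caps it at a tenth of a core even on an idle host. The 8 vs 8000 CPU 
example works out to roughly 1/1001 of contended CPU time for the {{*}} tasks.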

> Launch docker containers with --cpus instead of cpu-shares
> --
>
> Key: MESOS-7006
> URL: https://issues.apache.org/jira/browse/MESOS-7006
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Craig W
>Priority: Minor
> Fix For: 1.4.0
>
>
> docker 1.13 was recently released and it now has a new --cpus flag which 
> allows a user to specify how many cpus a container should have. This is much 
> simpler for users to reason about.
> mesos should switch to starting a container with --cpus instead of 
> --cpu-shares, or at least make it configurable.
> https://blog.docker.com/2017/01/cpu-management-docker-1-13/




[jira] [Commented] (MESOS-7603) longjmp error in libcurl

2017-06-27 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065463#comment-16065463
 ] 

Charles Allen commented on MESOS-7603:
--

This was resolved internally by linking libcurl against c-ares.

> longjmp error in libcurl
> 
>
> Key: MESOS-7603
> URL: https://issues.apache.org/jira/browse/MESOS-7603
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> We encountered the following error when the fetcher tries to run on a mesos 
> 1.2.0 agent through systemd:
> {code}
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: *** longjmp causes 
> uninitialized stack frame ***: /usr/sbin/mesos-agent terminated
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Backtrace: 
> =
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(+0x71c07)[0x7f8d08f5fc07]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(__fortify_fail+0x47)[0x7f8d08fedb17]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(+0xff56d)[0x7f8d08fed56d]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(__longjmp_chk+0x38)[0x7f8d08fed4c8]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libcurl.so.4(+0xae34)[0x7f8d08519e34]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libpthread.so.0(+0x116b0)[0x7f8d098386b0]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libpthread.so.0(pthread_cond_wait+0xbf)[0x7f8d0983448f]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libstdc++.so.6(_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE+0x2b)[0x7f8d095968ab]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libmesos-1.2.0.so(_ZN7process14ProcessManager4waitERKNS_4UPIDE+0x328)[0x7f8d0b47f3d8]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libmesos-1.2.0.so(_ZN7process4waitERKNS_4UPIDERK8Duration+0x2e7)[0x7f8d0b486117]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /usr/sbin/mesos-agent(+0x12810)[0x557e1d691810]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(__libc_start_main+0xfc)[0x7f8d08f0e93c]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /usr/sbin/mesos-agent(+0x139c9)[0x557e1d6929c9]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Memory map: 
> 
> {code}
> It looks like this error:
> https://stackoverflow.com/questions/9191668/error-longjmp-causes-uninitialized-stack-frame
>  
> The solution described there is either to set {{curl_easy_setopt(curl, 
> CURLOPT_NOSIGNAL, 1L)}} or to use a special config option for libcurl.




[jira] [Commented] (MESOS-7717) Compile Problem on coreos build in libprocess

2017-06-24 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062156#comment-16062156
 ] 

Charles Allen commented on MESOS-7717:
--

Thanks! 
https://github.com/apache/mesos/commit/775c38e9a7650d192914e39eb9023110c0d41237.patch
 seems to allow the build to continue.

> Compile Problem on coreos build in libprocess
> -
>
> Key: MESOS-7717
> URL: https://issues.apache.org/jira/browse/MESOS-7717
> Project: Mesos
>  Issue Type: Bug
>  Components: build, libprocess
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> I have a portage ebuild that is supposed to compile mesos for coreos, but 
> with the recent coreos stable I'm getting the following error at compile 
> time. It looks like a boost problem, but mesos is supposed to ship with its 
> own version of boost, so I don't quite understand why the template problems 
> are happening.
> {code}
> make[5]: Entering directory 
> '/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess'
> /bin/sh ../../libtool  --tag=CXX   --mode=compile x86_64-pc-linux-gnu-g++ 
> -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
> -DPACKAGE_VERSION=\"1.2.0\" -DPACKAGE_STRING=\"mesos\ 1.2.0\" 
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
> -DVERSION=\"1.2.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
> -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
> -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
> -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
> -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 
> -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 
> -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 
> -DHAVE_LIBZ=1 -I.  
> -DBUILD_DIR=\"/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess\"
>  -I./include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src  
> -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
> -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I./../stout/include  
> -I/build/amd64-usr//usr/include/subversion-1 -I/build/amd64-usr//usr/include 
> -I/build/amd64-usr//usr/include -I/build/amd64-usr//usr/include/apr-1 
> -I/build/amd64-usr//usr/include/apr-1.0  -Wall -Wsign-compare 
> -Wformat-security -fstack-protector-strong -fPIC -fPIE -O2 -pipe 
> -mtune=generic -g -Wno-unused-local-typedefs -Wno-maybe-uninitialized 
> -std=c++11 -c -o libprocess_la-process.lo `test -f 'src/process.cpp' || echo 
> './'`src/process.cpp
> libtool: compile:  x86_64-pc-linux-gnu-g++ -DPACKAGE_NAME=\"mesos\" 
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.2.0\" 
> "-DPACKAGE_STRING=\"mesos 1.2.0\"" -DPACKAGE_BUGREPORT=\"\" 
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.2.0\" -DSTDC_HEADERS=1 
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 
> -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 
> -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
> -DBUILD_DIR=\"/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess\"
>  -I./include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src 
> -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
> -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I./../stout/include 
> -I/build/amd64-usr//usr/include/subversion-1 -I/build/amd64-usr//usr/include 
> -I/build/amd64-usr//usr/include -I/build/amd64-usr//usr/include/apr-1 
> -I/build/amd64-usr//usr/include/apr-1.0 -Wall -Wsign-compare 
> -Wformat-security -fstack-protector-strong -fPIC -O2 -pipe -mtune=generic -g 
> -Wno-unused-local-typedefs -Wno-maybe-uninitialized -std=c++11 -c 
> src/process.cpp  -fPIC -DPIC -o .libs/libprocess_la-process.o
> In file included from ./include/process/http.hpp:38:0,
>  from ./include/process/event.hpp:19,
>  from ./include/process/process.hpp:24,
>  from ./include/process/dispatch.hpp:20,
>  from ./include/process/deferred.hpp:18,
>  from ./include/process/defer.hpp:19,
>  from src/process.cpp:66:
> ./../stout/include/stout/json.hpp: In instantiation of 
> 'JSON::Value::Value(const T&, typename 
> boost::disable_if::type) [with T = 
> std::basic_string; typename 
> boost::disable_if::type = int]':
> src/process.cpp:3525:27:

[jira] [Commented] (MESOS-7717) Compile Problem on coreos build in libprocess

2017-06-24 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062124#comment-16062124
 ] 

Charles Allen commented on MESOS-7717:
--

Related to https://issues.apache.org/jira/browse/MESOS-3799

> Compile Problem on coreos build in libprocess
> -
>
> Key: MESOS-7717
> URL: https://issues.apache.org/jira/browse/MESOS-7717
> Project: Mesos
>  Issue Type: Bug
>  Components: build, libprocess
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>

[jira] [Commented] (MESOS-7717) Compile Problem on coreos build in libprocess

2017-06-24 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062114#comment-16062114
 ] 

Charles Allen commented on MESOS-7717:
--

from

{code}
cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess\" -I./include -isystem 
../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src -I../http-parser-2.6.2 
-I../libev-4.22 
{code}

it looks like the {{boost}} include is missing an {{-I}}?
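
For what it's worth, {{-isystem}} takes the directory as its next argument, so 
{{-isystem ../boost-1.53.0}} already names the bundled boost. GCC's documented 
search order scans all {{-I}} directories before any {{-isystem}} directory, 
and the same command line also passes {{-I/build/amd64-usr//usr/include}}. The 
sketch below models that ordering. It is a simplified illustration and not a 
confirmed diagnosis; {{IncludeFlag}}, {{resolve}} and the {{existing}} set 
(a stand-in for the filesystem) are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One include-path flag from a compiler command line.
struct IncludeFlag {
  bool isSystem;    // true for -isystem, false for -I
  std::string dir;  // directory named by the flag
};

// Returns the directory in which `header` would be found first, or an empty
// string if no listed directory provides it. `existing` holds the
// (directory, header) pairs that are assumed to exist on disk.
std::string resolve(
    const std::vector<IncludeFlag>& flags,
    const std::set<std::pair<std::string, std::string>>& existing,
    const std::string& header) {
  assert(!header.empty());
  // Pass 0 scans the -I directories, pass 1 the -isystem directories,
  // each in command-line order.
  for (int pass = 0; pass < 2; ++pass) {
    for (const IncludeFlag& flag : flags) {
      if (flag.isSystem != (pass == 1)) continue;
      if (existing.count(std::make_pair(flag.dir, header)) > 0) {
        return flag.dir;
      }
    }
  }
  return "";
}
```

If the sysroot's /usr/include ships its own boost/variant.hpp, this model picks 
that copy over ../boost-1.53.0 even though the {{-isystem}} flag comes first on 
the command line, which would fit the /build/amd64-usr//usr/include/boost path 
that shows up in the error output.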

> Compile Problem on coreos build in libprocess
> -
>
> Key: MESOS-7717
> URL: https://issues.apache.org/jira/browse/MESOS-7717
> Project: Mesos
>  Issue Type: Bug
>  Components: build, libprocess
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>

[jira] [Updated] (MESOS-7717) Compile Problem on coreos build in libprocess

2017-06-24 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7717:
-
Description: 
I have a portage ebuild that is supposed to compile mesos for coreos, but with 
the recent coreos stable I'm getting the following error at compile time. It 
looks like a boost problem, but mesos is supposed to ship with its own version 
of boost, so I don't quite understand why the template problems are happening.

{code}
make[5]: Entering directory 
'/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess'
/bin/sh ../../libtool  --tag=CXX   --mode=compile x86_64-pc-linux-gnu-g++ 
-DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"1.2.0\" -DPACKAGE_STRING=\"mesos\ 1.2.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"1.2.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 
-DHAVE_LIBZ=1 -I.  
-DBUILD_DIR=\"/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess\"
 -I./include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src  
-I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I./../stout/include  
-I/build/amd64-usr//usr/include/subversion-1 -I/build/amd64-usr//usr/include 
-I/build/amd64-usr//usr/include -I/build/amd64-usr//usr/include/apr-1 
-I/build/amd64-usr//usr/include/apr-1.0  -Wall -Wsign-compare -Wformat-security 
-fstack-protector-strong -fPIC -fPIE -O2 -pipe -mtune=generic -g 
-Wno-unused-local-typedefs -Wno-maybe-uninitialized -std=c++11 -c -o 
libprocess_la-process.lo `test -f 'src/process.cpp' || echo './'`src/process.cpp
libtool: compile:  x86_64-pc-linux-gnu-g++ -DPACKAGE_NAME=\"mesos\" 
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.2.0\" 
"-DPACKAGE_STRING=\"mesos 1.2.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" 
-DPACKAGE=\"mesos\" -DVERSION=\"1.2.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 
-DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 
-DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 
-DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
-DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
-DBUILD_DIR=\"/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess\"
 -I./include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src 
-I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I./../stout/include 
-I/build/amd64-usr//usr/include/subversion-1 -I/build/amd64-usr//usr/include 
-I/build/amd64-usr//usr/include -I/build/amd64-usr//usr/include/apr-1 
-I/build/amd64-usr//usr/include/apr-1.0 -Wall -Wsign-compare -Wformat-security 
-fstack-protector-strong -fPIC -O2 -pipe -mtune=generic -g 
-Wno-unused-local-typedefs -Wno-maybe-uninitialized -std=c++11 -c 
src/process.cpp  -fPIC -DPIC -o .libs/libprocess_la-process.o
In file included from ./include/process/http.hpp:38:0,
 from ./include/process/event.hpp:19,
 from ./include/process/process.hpp:24,
 from ./include/process/dispatch.hpp:20,
 from ./include/process/deferred.hpp:18,
 from ./include/process/defer.hpp:19,
 from src/process.cpp:66:
./../stout/include/stout/json.hpp: In instantiation of 
'JSON::Value::Value(const T&, typename 
boost::disable_if::type) [with T = 
std::basic_string; typename boost::disable_if::type = int]':
src/process.cpp:3525:27:   required from here
./../stout/include/stout/json.hpp:261:30: error: no matching function for call 
to 'boost::variant::variant(const 
std::basic_string&)'
 : internal::Variant(value) {}
  ^
./../stout/include/stout/json.hpp:261:30: note: candidates are:
In file included from /build/amd64-usr//usr/include/boost/variant.hpp:17:0,
 from ./../stout/include/stout/json.hpp:35,
 from ./include/process/http.hpp:38,

[jira] [Created] (MESOS-7717) Compile Problem on coreos build in libprocess

2017-06-24 Thread Charles Allen (JIRA)

Charles Allen created MESOS-7717:


 Summary: Compile Problem on coreos build in libprocess
 Key: MESOS-7717
 URL: https://issues.apache.org/jira/browse/MESOS-7717
 Project: Mesos
  Issue Type: Bug
  Components: build, libprocess
Affects Versions: 1.2.0
Reporter: Charles Allen


I have a portage ebuild that is supposed to compile mesos for coreos, but with 
the recent stable I'm getting the following error at compile time. It looks 
like a boost problem, but mesos is supposed to ship with its own version of 
boost, so I don't quite understand why the template problems are happening.

{code}
make[5]: Entering directory 
'/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess'
/bin/sh ../../libtool  --tag=CXX   --mode=compile x86_64-pc-linux-gnu-g++ 
-DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"1.2.0\" -DPACKAGE_STRING=\"mesos\ 1.2.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"1.2.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 
-DHAVE_LIBZ=1 -I.  
-DBUILD_DIR=\"/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess\"
 -I./include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src  
-I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I./../stout/include  
-I/build/amd64-usr//usr/include/subversion-1 -I/build/amd64-usr//usr/include 
-I/build/amd64-usr//usr/include -I/build/amd64-usr//usr/include/apr-1 
-I/build/amd64-usr//usr/include/apr-1.0  -Wall -Wsign-compare -Wformat-security 
-fstack-protector-strong -fPIC -fPIE -O2 -pipe -mtune=generic -g 
-Wno-unused-local-typedefs -Wno-maybe-uninitialized -std=c++11 -c -o 
libprocess_la-process.lo `test -f 'src/process.cpp' || echo './'`src/process.cpp
libtool: compile:  x86_64-pc-linux-gnu-g++ -DPACKAGE_NAME=\"mesos\" 
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.2.0\" 
"-DPACKAGE_STRING=\"mesos 1.2.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" 
-DPACKAGE=\"mesos\" -DVERSION=\"1.2.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 
-DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 
-DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 
-DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
-DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
-DBUILD_DIR=\"/build/amd64-usr/var/tmp/portage/sys-cluster/mesos-1.2.0/work/mesos-1.2.0/3rdparty/libprocess\"
 -I./include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src 
-I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I./../stout/include 
-I/build/amd64-usr//usr/include/subversion-1 -I/build/amd64-usr//usr/include 
-I/build/amd64-usr//usr/include -I/build/amd64-usr//usr/include/apr-1 
-I/build/amd64-usr//usr/include/apr-1.0 -Wall -Wsign-compare -Wformat-security 
-fstack-protector-strong -fPIC -O2 -pipe -mtune=generic -g 
-Wno-unused-local-typedefs -Wno-maybe-uninitialized -std=c++11 -c 
src/process.cpp  -fPIC -DPIC -o .libs/libprocess_la-process.o
In file included from ./include/process/http.hpp:38:0,
 from ./include/process/event.hpp:19,
 from ./include/process/process.hpp:24,
 from ./include/process/dispatch.hpp:20,
 from ./include/process/deferred.hpp:18,
 from ./include/process/defer.hpp:19,
 from src/process.cpp:66:
./../stout/include/stout/json.hpp: In instantiation of 
'JSON::Value::Value(const T&, typename 
boost::disable_if::type) [with T = 
std::basic_string; typename boost::disable_if::type = int]':
src/process.cpp:3525:27:   required from here
./../stout/include/stout/json.hpp:261:30: error: no matching function for call 
to 'boost::variant::variant(const 
std::basic_string&)'
 : internal::Variant(value) {}
  ^
./../stout/include/stout/json.hpp:261:30: note: candidates are:
In file included from

[jira] [Assigned] (MESOS-2136) Expose per-cgroup memory pressure

2017-06-19 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen reassigned MESOS-2136:


Assignee: Charles Allen  (was: Chi Zhang)

> Expose per-cgroup memory pressure
> -
>
> Key: MESOS-2136
> URL: https://issues.apache.org/jira/browse/MESOS-2136
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Ian Downes
>Assignee: Charles Allen
>  Labels: twitter
> Fix For: 0.23.0
>
>
> The cgroup memory controller can provide information on the memory pressure 
> of a cgroup. This takes the form of an event-based notification: events at 
> one of three levels (low, medium, critical) are generated when the kernel 
> takes specific actions to allocate memory. This signal is probably more 
> informative than comparing memory usage to the memory limit.
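
As a rough illustration of the notification mechanism described above: with 
cgroup v1, a listener creates an eventfd, opens memory.pressure_level, and 
writes a line of the form "<event_fd> <pressure_fd> <level>" to 
cgroup.event_control. The sketch below only builds that registration line; the 
type and function names are hypothetical and this is not how Mesos implements 
it.

```cpp
#include <cassert>
#include <string>

// The three pressure levels named in the issue description.
enum class PressureLevel { Low, Medium, Critical };

// Maps a level to the keyword the cgroup v1 memory controller expects.
std::string levelName(PressureLevel level) {
  switch (level) {
    case PressureLevel::Low: return "low";
    case PressureLevel::Medium: return "medium";
    case PressureLevel::Critical: return "critical";
  }
  return "";  // not reached for valid enum values
}

// Builds the line to write to cgroup.event_control. `eventFd` is the eventfd
// that will be signalled; `pressureFd` is an open descriptor for the cgroup's
// memory.pressure_level file.
std::string eventControlLine(int eventFd, int pressureFd, PressureLevel level) {
  assert(eventFd >= 0 && pressureFd >= 0);
  return std::to_string(eventFd) + " " + std::to_string(pressureFd) + " " +
         levelName(level);
}
```

After the line is written, each 8-byte read from the eventfd returns a counter 
of events seen since the last read, which is what a per-cgroup pressure metric 
could be built on.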




[jira] [Updated] (MESOS-7649) GPF in mesos-executor

2017-06-09 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7649:
-
Description: 
We are running mesos 1.2.0 on a CoreOS system and having the following gpf show 
up:

{code}
[57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a 
sp:7ffdafce3500 error:0
[57807.648470]  in libstdc++.so.6.0.20[7f4bdfc2+155000]
{code}


Stack trace:

{code}
#0  0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from 
/media/root/lib64/libstdc++.so.6
#1  0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () 
from /media/root/lib64/libmesos-1.2.0.so
#2  0x7f59c403e623 in process::SocketManager::close(int) () from 
/media/root/lib64/libmesos-1.2.0.so
#3  0x7f59c403f904 in process::SocketManager::finalize() () from 
/media/root/lib64/libmesos-1.2.0.so
#4  0x7f59c403fc59 in process::finalize(bool) () from 
/media/root/lib64/libmesos-1.2.0.so
#5  0x55c02473c1bd in ?? ()
#6  0x7f59c172b93c in __libc_start_main () from /media/root/lib64/libc.so.6
#7  0x55c02473c789 in ?? ()
{code}

  was:
We are running mesos 1.2.0 on a CoreOS system and having the following gpf show 
up:

{code}
[57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a 
sp:7ffdafce3500 error:0
[57807.648470]  in libstdc++.so.6.0.20[7f4bdfc2+155000]
{code}

I have the core dumps and am working on getting more info.


> GPF in mesos-executor
> -
>
> Key: MESOS-7649
> URL: https://issues.apache.org/jira/browse/MESOS-7649
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> We are running mesos 1.2.0 on a CoreOS system and having the following gpf 
> show up:
> {code}
> [57807.639274] traps: mesos-executor[63400] general protection 
> ip:7f4bdfd1b05a sp:7ffdafce3500 error:0
> [57807.648470]  in libstdc++.so.6.0.20[7f4bdfc2+155000]
> {code}
> Stack trace:
> {code}
> #0  0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from 
> /media/root/lib64/libstdc++.so.6
> #1  0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () 
> from /media/root/lib64/libmesos-1.2.0.so
> #2  0x7f59c403e623 in process::SocketManager::close(int) () from 
> /media/root/lib64/libmesos-1.2.0.so
> #3  0x7f59c403f904 in process::SocketManager::finalize() () from 
> /media/root/lib64/libmesos-1.2.0.so
> #4  0x7f59c403fc59 in process::finalize(bool) () from 
> /media/root/lib64/libmesos-1.2.0.so
> #5  0x55c02473c1bd in ?? ()
> #6  0x7f59c172b93c in __libc_start_main () from 
> /media/root/lib64/libc.so.6
> #7  0x55c02473c789 in ?? ()
> {code}




[jira] [Updated] (MESOS-7649) GPF in mesos-executor

2017-06-09 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7649:
-
Description: 
We are running mesos 1.2.0 on a CoreOS system and having the following gpf show 
up on occasion:

{code}
[57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a 
sp:7ffdafce3500 error:0
[57807.648470]  in libstdc++.so.6.0.20[7f4bdfc2+155000]
{code}


Stack trace:

{code}
#0  0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from 
/media/root/lib64/libstdc++.so.6
#1  0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () 
from /media/root/lib64/libmesos-1.2.0.so
#2  0x7f59c403e623 in process::SocketManager::close(int) () from 
/media/root/lib64/libmesos-1.2.0.so
#3  0x7f59c403f904 in process::SocketManager::finalize() () from 
/media/root/lib64/libmesos-1.2.0.so
#4  0x7f59c403fc59 in process::finalize(bool) () from 
/media/root/lib64/libmesos-1.2.0.so
#5  0x55c02473c1bd in ?? ()
#6  0x7f59c172b93c in __libc_start_main () from /media/root/lib64/libc.so.6
#7  0x55c02473c789 in ?? ()
{code}

  was:
We are running Mesos 1.2.0 on a CoreOS system and seeing the following 
general protection fault (GPF):

{code}
[57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a 
sp:7ffdafce3500 error:0
[57807.648470]  in libstdc++.so.6.0.20[7f4bdfc2+155000]
{code}


Stack trace:

{code}
#0  0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from 
/media/root/lib64/libstdc++.so.6
#1  0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () 
from /media/root/lib64/libmesos-1.2.0.so
#2  0x7f59c403e623 in process::SocketManager::close(int) () from 
/media/root/lib64/libmesos-1.2.0.so
#3  0x7f59c403f904 in process::SocketManager::finalize() () from 
/media/root/lib64/libmesos-1.2.0.so
#4  0x7f59c403fc59 in process::finalize(bool) () from 
/media/root/lib64/libmesos-1.2.0.so
#5  0x55c02473c1bd in ?? ()
#6  0x7f59c172b93c in __libc_start_main () from /media/root/lib64/libc.so.6
#7  0x55c02473c789 in ?? ()
{code}


> GPF in mesos-executor
> -
>
> Key: MESOS-7649
> URL: https://issues.apache.org/jira/browse/MESOS-7649
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> We are running Mesos 1.2.0 on a CoreOS system and occasionally seeing the 
> following general protection fault (GPF):
> {code}
> [57807.639274] traps: mesos-executor[63400] general protection 
> ip:7f4bdfd1b05a sp:7ffdafce3500 error:0
> [57807.648470]  in libstdc++.so.6.0.20[7f4bdfc2+155000]
> {code}
> Stack trace:
> {code}
> #0  0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from 
> /media/root/lib64/libstdc++.so.6
> #1  0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () 
> from /media/root/lib64/libmesos-1.2.0.so
> #2  0x7f59c403e623 in process::SocketManager::close(int) () from 
> /media/root/lib64/libmesos-1.2.0.so
> #3  0x7f59c403f904 in process::SocketManager::finalize() () from 
> /media/root/lib64/libmesos-1.2.0.so
> #4  0x7f59c403fc59 in process::finalize(bool) () from 
> /media/root/lib64/libmesos-1.2.0.so
> #5  0x55c02473c1bd in ?? ()
> #6  0x7f59c172b93c in __libc_start_main () from 
> /media/root/lib64/libc.so.6
> #7  0x55c02473c789 in ?? ()
> {code}




[jira] [Created] (MESOS-7649) GPF in mesos-executor

2017-06-09 Thread Charles Allen (JIRA)

Charles Allen created MESOS-7649:


 Summary: GPF in mesos-executor
 Key: MESOS-7649
 URL: https://issues.apache.org/jira/browse/MESOS-7649
 Project: Mesos
  Issue Type: Bug
  Components: executor
Affects Versions: 1.2.0
Reporter: Charles Allen


We are running Mesos 1.2.0 on a CoreOS system and seeing the following 
general protection fault (GPF):

{code}
[57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a 
sp:7ffdafce3500 error:0
[57807.648470]  in libstdc++.so.6.0.20[7f4bdfc2+155000]
{code}

I have the core dumps and am working on getting more info.




[jira] [Commented] (MESOS-7607) Support for first-class fault domains.

2017-06-08 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043246#comment-16043246
 ] 

Charles Allen commented on MESOS-7607:
--

Can you explain in a bit more depth why agent attributes are insufficient for 
this capability?

> Support for first-class fault domains.
> --
>
> Key: MESOS-7607
> URL: https://issues.apache.org/jira/browse/MESOS-7607
> Project: Mesos
>  Issue Type: Epic
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Mesos should support a first-class notion of "fault domains", which 
> effectively provide a common vocabulary for describing the region and zone 
> where a node (either master or agent) is located.
> Design doc: 
> https://drive.google.com/open?id=1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8




[jira] [Updated] (MESOS-7621) Fetcher has unclear error message on HEAD fetch stage or cache sizing

2017-06-06 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7621:
-
Summary: Fetcher has unclear error message on HEAD fetch stage or cache 
sizing  (was: Fetcher does not handle content length and redirects)

> Fetcher has unclear error message on HEAD fetch stage or cache sizing
> -
>
> Key: MESOS-7621
> URL: https://issues.apache.org/jira/browse/MESOS-7621
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> {code}
> $ curl -L -v -O -s http://HOSTNAME_REDACTED/PATH_REDACTED.tar.gz
> *   Trying 172.17.4.10...
> * Connected to HOSTNAME_REDACTED (172.17.4.10) port 80 (#0)
> > GET /PATH_REDACTED.tar.gz HTTP/1.1
> > Host: HOSTNAME_REDACTED
> > User-Agent: curl/7.43.0
> > Accept: */*
> >
> < HTTP/1.1 302 FOUND
> < Server: nginx/1.4.6 (Ubuntu)
> < Date: Mon, 05 Jun 2017 17:58:04 GMT
> < Content-Type: text/html; charset=utf-8
> < Content-Length: 1947
> < Connection: keep-alive
> < Location: 
> https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED?Signature=REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
> <
> * Ignoring the response-body
> { [309 bytes data]
> * Connection #0 to host HOSTNAME_REDACTED left intact
> * Issue another request to this URL: 
> 'https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED.tar.gz?Signature=SIGNATURE_REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D'
> *   Trying 54.231.40.75...
> * Connected to BUCKET_REDACTED.s3.amazonaws.com (54.231.40.75) port 443 (#1)
> * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> * Server certificate: *.s3.amazonaws.com
> * Server certificate: DigiCert Baltimore CA-2 G2
> * Server certificate: Baltimore CyberTrust Root
> > GET 
> > /PATH_REDACTED.tar.gz?Signature=REDACTED=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
> >  HTTP/1.1
> > Host: BUCKET_REDACTED.s3.amazonaws.com
> > User-Agent: curl/7.43.0
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < x-amz-id-2: ID_REDACTED=
> < x-amz-request-id: REQUEST_ID_REDACTED
> < Date: Mon, 05 Jun 2017 17:58:07 GMT
> < Last-Modified: Thu, 01 Jun 2017 03:04:49 GMT
> < ETag: "ETAG_REDACTED"
> < Accept-Ranges: bytes
> < Content-Type: application/x-tar
> < Content-Length: 208245664
> < Server: AmazonS3
> <
> { [16360 bytes data]
> {code}
> We have a microservice that signs temporary URLs for services that can't 
> talk to S3 natively. The above is an example download using {{curl}}. When 
> the Mesos fetcher is used instead, the agent logs contain the following:
> {code}
> fetcher.cpp:479] Reverting to fetching directly into the sandbox for 
> 'http://HOST_REDACTED/PATH_REDACTED.tar.gz', due to failure to fetch through 
> the cache, with error: Could not determine size of cache file for 
> 'USER_REDACTED@http://HOST_REDACTED/PATH_REDACTED.tar.gz' with error: No URL 
> content-length available
> {code}
> Any idea why this error would occur?




[jira] [Commented] (MESOS-7621) Fetcher does not handle content length and redirects

2017-06-05 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037677#comment-16037677
 ] 

Charles Allen commented on MESOS-7621:
--

From some basic digging, it looks like 
https://github.com/apache/mesos/blob/1.2.0/3rdparty/stout/include/stout/net.hpp#L101
 makes a second request to the same location to get the content length.


I do see the following in the logs, even though the files download successfully:

{code}
[1B blob data]
HTTP/1.1 403 Forbidden
x-amz-request-id: REQUEST_ID_REDACTED
x-amz-id-2: ID_REDACTED=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Mon, 05 Jun 2017 18:25:45 GMT
Server: AmazonS3
{code}
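
To make the failure mode concrete, here is a minimal sketch of a size probe that only trusts a Content-Length on a successful final response. It assumes the size is determined by a separate request from the download itself; the function name `content_length` and the tuple representation of a redirect chain are hypothetical, not Mesos code. The header values are copied from the logs in this issue.

```python
from typing import Mapping, Optional, Sequence, Tuple

# One hop of a redirect chain: (HTTP status, response headers).
Response = Tuple[int, Mapping[str, str]]


def content_length(chain: Sequence[Response]) -> Optional[int]:
    """Return the Content-Length of the final response, or None if unusable."""
    if not chain:
        return None
    status, headers = chain[-1]
    # Only a successful final response describes the file being fetched.
    if not 200 <= status < 300:
        return None
    # Header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("content-length", "").strip()
    return int(value) if value.isdecimal() else None


# The curl download: 302 from the signing service, then 200 from S3.
download = [
    (302, {"Content-Length": "1947"}),
    (200, {"Content-Length": "208245664"}),
]

# The response seen in the agent logs: a 403 with chunked encoding,
# so there is no usable length.
probe = [
    (403, {"Transfer-Encoding": "chunked"}),
]

print(content_length(download))  # 208245664
print(content_length(probe))     # None
```

If the probe really is a separate request, one possible (unconfirmed) explanation for the 403 is that the temporary URL is only valid for the exact request it was signed for.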

> Fetcher does not handle content length and redirects
> 
>
> Key: MESOS-7621
> URL: https://issues.apache.org/jira/browse/MESOS-7621
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> {code}
> $ curl -L -v -O -s http://HOSTNAME_REDACTED/PATH_REDACTED.tar.gz
> *   Trying 172.17.4.10...
> * Connected to HOSTNAME_REDACTED (172.17.4.10) port 80 (#0)
> > GET /PATH_REDACTED.tar.gz HTTP/1.1
> > Host: HOSTNAME_REDACTED
> > User-Agent: curl/7.43.0
> > Accept: */*
> >
> < HTTP/1.1 302 FOUND
> < Server: nginx/1.4.6 (Ubuntu)
> < Date: Mon, 05 Jun 2017 17:58:04 GMT
> < Content-Type: text/html; charset=utf-8
> < Content-Length: 1947
> < Connection: keep-alive
> < Location: 
> https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED?Signature=REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
> <
> * Ignoring the response-body
> { [309 bytes data]
> * Connection #0 to host HOSTNAME_REDACTED left intact
> * Issue another request to this URL: 
> 'https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED.tar.gz?Signature=SIGNATURE_REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D'
> *   Trying 54.231.40.75...
> * Connected to BUCKET_REDACTED.s3.amazonaws.com (54.231.40.75) port 443 (#1)
> * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> * Server certificate: *.s3.amazonaws.com
> * Server certificate: DigiCert Baltimore CA-2 G2
> * Server certificate: Baltimore CyberTrust Root
> > GET 
> > /PATH_REDACTED.tar.gz?Signature=REDACTED=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
> >  HTTP/1.1
> > Host: BUCKET_REDACTED.s3.amazonaws.com
> > User-Agent: curl/7.43.0
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < x-amz-id-2: ID_REDACTED=
> < x-amz-request-id: REQUEST_ID_REDACTED
> < Date: Mon, 05 Jun 2017 17:58:07 GMT
> < Last-Modified: Thu, 01 Jun 2017 03:04:49 GMT
> < ETag: "ETAG_REDACTED"
> < Accept-Ranges: bytes
> < Content-Type: application/x-tar
> < Content-Length: 208245664
> < Server: AmazonS3
> <
> { [16360 bytes data]
> {code}
> We have a microservice that signs temporary URLs for services that can't 
> talk to S3 natively. The above is an example download using {{curl}}. When 
> the Mesos fetcher is used instead, the agent logs contain the following:
> {code}
> fetcher.cpp:479] Reverting to fetching directly into the sandbox for 
> 'http://HOST_REDACTED/PATH_REDACTED.tar.gz', due to failure to fetch through 
> the cache, with error: Could not determine size of cache file for 
> 'USER_REDACTED@http://HOST_REDACTED/PATH_REDACTED.tar.gz' with error: No URL 
> content-length available
> {code}
> Any idea why this error would occur?




[jira] [Updated] (MESOS-7621) Fetcher does not handle content length and redirects

2017-06-05 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7621:
-
Summary: Fetcher does not handle content length and redirects  (was: 
Fetcher does not handle content length in redirects)

> Fetcher does not handle content length and redirects
> 
>
> Key: MESOS-7621
> URL: https://issues.apache.org/jira/browse/MESOS-7621
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> {code}
> $ curl -L -v -O -s http://HOSTNAME_REDACTED/PATH_REDACTED.tar.gz
> *   Trying 172.17.4.10...
> * Connected to HOSTNAME_REDACTED (172.17.4.10) port 80 (#0)
> > GET /PATH_REDACTED.tar.gz HTTP/1.1
> > Host: HOSTNAME_REDACTED
> > User-Agent: curl/7.43.0
> > Accept: */*
> >
> < HTTP/1.1 302 FOUND
> < Server: nginx/1.4.6 (Ubuntu)
> < Date: Mon, 05 Jun 2017 17:58:04 GMT
> < Content-Type: text/html; charset=utf-8
> < Content-Length: 1947
> < Connection: keep-alive
> < Location: 
> https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED?Signature=REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
> <
> * Ignoring the response-body
> { [309 bytes data]
> * Connection #0 to host HOSTNAME_REDACTED left intact
> * Issue another request to this URL: 
> 'https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED.tar.gz?Signature=SIGNATURE_REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D'
> *   Trying 54.231.40.75...
> * Connected to BUCKET_REDACTED.s3.amazonaws.com (54.231.40.75) port 443 (#1)
> * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> * Server certificate: *.s3.amazonaws.com
> * Server certificate: DigiCert Baltimore CA-2 G2
> * Server certificate: Baltimore CyberTrust Root
> > GET 
> > /PATH_REDACTED.tar.gz?Signature=REDACTED=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
> >  HTTP/1.1
> > Host: BUCKET_REDACTED.s3.amazonaws.com
> > User-Agent: curl/7.43.0
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < x-amz-id-2: ID_REDACTED=
> < x-amz-request-id: REQUEST_ID_REDACTED
> < Date: Mon, 05 Jun 2017 17:58:07 GMT
> < Last-Modified: Thu, 01 Jun 2017 03:04:49 GMT
> < ETag: "ETAG_REDACTED"
> < Accept-Ranges: bytes
> < Content-Type: application/x-tar
> < Content-Length: 208245664
> < Server: AmazonS3
> <
> { [16360 bytes data]
> {code}
> We have a microservice that signs temporary URLs for services that can't 
> talk to S3 natively. The above is an example download using {{curl}}. When 
> the Mesos fetcher is used instead, the agent logs contain the following:
> {code}
> fetcher.cpp:479] Reverting to fetching directly into the sandbox for 
> 'http://HOST_REDACTED/PATH_REDACTED.tar.gz', due to failure to fetch through 
> the cache, with error: Could not determine size of cache file for 
> 'USER_REDACTED@http://HOST_REDACTED/PATH_REDACTED.tar.gz' with error: No URL 
> content-length available
> {code}
> Any idea why this error would occur?




[jira] [Created] (MESOS-7621) Fetcher does not handle content length in redirects

2017-06-05 Thread Charles Allen (JIRA)

Charles Allen created MESOS-7621:


 Summary: Fetcher does not handle content length in redirects
 Key: MESOS-7621
 URL: https://issues.apache.org/jira/browse/MESOS-7621
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Affects Versions: 1.2.0
Reporter: Charles Allen


{code}
$ curl -L -v -O -s http://HOSTNAME_REDACTED/PATH_REDACTED.tar.gz
*   Trying 172.17.4.10...
* Connected to HOSTNAME_REDACTED (172.17.4.10) port 80 (#0)
> GET /PATH_REDACTED.tar.gz HTTP/1.1
> Host: HOSTNAME_REDACTED
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 302 FOUND
< Server: nginx/1.4.6 (Ubuntu)
< Date: Mon, 05 Jun 2017 17:58:04 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 1947
< Connection: keep-alive
< Location: 
https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED?Signature=REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
<
* Ignoring the response-body
{ [309 bytes data]
* Connection #0 to host HOSTNAME_REDACTED left intact
* Issue another request to this URL: 
'https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED.tar.gz?Signature=SIGNATURE_REDACTED%3D=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D'
*   Trying 54.231.40.75...
* Connected to BUCKET_REDACTED.s3.amazonaws.com (54.231.40.75) port 443 (#1)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: *.s3.amazonaws.com
* Server certificate: DigiCert Baltimore CA-2 G2
* Server certificate: Baltimore CyberTrust Root
> GET 
> /PATH_REDACTED.tar.gz?Signature=REDACTED=1496689084=KEY_REDACTED=TOKEN_REDACTED%3D
>  HTTP/1.1
> Host: BUCKET_REDACTED.s3.amazonaws.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< x-amz-id-2: ID_REDACTED=
< x-amz-request-id: REQUEST_ID_REDACTED
< Date: Mon, 05 Jun 2017 17:58:07 GMT
< Last-Modified: Thu, 01 Jun 2017 03:04:49 GMT
< ETag: "ETAG_REDACTED"
< Accept-Ranges: bytes
< Content-Type: application/x-tar
< Content-Length: 208245664
< Server: AmazonS3
<
{ [16360 bytes data]
{code}

We have a microservice that signs temporary URLs for services that can't 
talk to S3 natively. The above is an example download using {{curl}}. When 
the Mesos fetcher is used instead, the agent logs contain the following:

{code}
fetcher.cpp:479] Reverting to fetching directly into the sandbox for 
'http://HOST_REDACTED/PATH_REDACTED.tar.gz', due to failure to fetch through 
the cache, with error: Could not determine size of cache file for 
'USER_REDACTED@http://HOST_REDACTED/PATH_REDACTED.tar.gz' with error: No URL 
content-length available
{code}

Any idea why this error would occur?




[jira] [Updated] (MESOS-7603) longjmp error in libcurl

2017-06-01 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7603:
-
Description: 
We encountered the following error when the fetcher tried to run on a Mesos 
1.2.0 agent through systemd:

{code}
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: *** longjmp causes 
uninitialized stack frame ***: /usr/sbin/mesos-agent terminated
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Backtrace: 
=
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(+0x71c07)[0x7f8d08f5fc07]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__fortify_fail+0x47)[0x7f8d08fedb17]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(+0xff56d)[0x7f8d08fed56d]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__longjmp_chk+0x38)[0x7f8d08fed4c8]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libcurl.so.4(+0xae34)[0x7f8d08519e34]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libpthread.so.0(+0x116b0)[0x7f8d098386b0]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libpthread.so.0(pthread_cond_wait+0xbf)[0x7f8d0983448f]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libstdc++.so.6(_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE+0x2b)[0x7f8d095968ab]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libmesos-1.2.0.so(_ZN7process14ProcessManager4waitERKNS_4UPIDE+0x328)[0x7f8d0b47f3d8]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libmesos-1.2.0.so(_ZN7process4waitERKNS_4UPIDERK8Duration+0x2e7)[0x7f8d0b486117]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/usr/sbin/mesos-agent(+0x12810)[0x557e1d691810]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__libc_start_main+0xfc)[0x7f8d08f0e93c]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/usr/sbin/mesos-agent(+0x139c9)[0x557e1d6929c9]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Memory map: 

{code}

This looks like the error described here:

https://stackoverflow.com/questions/9191668/error-longjmp-causes-uninitialized-stack-frame

where the solution is either to set {{curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 
1)}} or to use a special configuration option for libcurl.

  was:
We encountered the following error when the fetcher tried to run on a Mesos 
1.2.0 agent:

{code}
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: *** longjmp causes 
uninitialized stack frame ***: /usr/sbin/mesos-agent terminated
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Backtrace: 
=
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(+0x71c07)[0x7f8d08f5fc07]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__fortify_fail+0x47)[0x7f8d08fedb17]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(+0xff56d)[0x7f8d08fed56d]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__longjmp_chk+0x38)[0x7f8d08fed4c8]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libcurl.so.4(+0xae34)[0x7f8d08519e34]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libpthread.so.0(+0x116b0)[0x7f8d098386b0]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libpthread.so.0(pthread_cond_wait+0xbf)[0x7f8d0983448f]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libstdc++.so.6(_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE+0x2b)[0x7f8d095968ab]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libmesos-1.2.0.so(_ZN7process14ProcessManager4waitERKNS_4UPIDE+0x328)[0x7f8d0b47f3d8]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libmesos-1.2.0.so(_ZN7process4waitERKNS_4UPIDERK8Duration+0x2e7)[0x7f8d0b486117]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/usr/sbin/mesos-agent(+0x12810)[0x557e1d691810]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__libc_start_main+0xfc)[0x7f8d08f0e93c]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/usr/sbin/mesos-agent(+0x139c9)[0x557e1d6929c9]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Memory map: 

{code}

This looks like the error described here:

https://stackoverflow.com/questions/9191668/error-longjmp-causes-uninitialized-stack-frame

where the solution is either to set {{curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 
1)}} or to use a special configuration option for libcurl.


> longjmp error in libcurl
> 
>
> Key: MESOS-7603
> URL: https://issues.apache.org/jira/browse/MESOS-7603
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> We encountered the following error when the fetcher tried to run on a Mesos 
> 1.2.0 agent through systemd:
> {code}
> Jun

[jira] [Updated] (MESOS-7603) longjmp error in libcurl

2017-06-01 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7603:
-
Summary: longjmp error in libcurl  (was: lngjmp error in libcurl)

> longjmp error in libcurl
> 
>
> Key: MESOS-7603
> URL: https://issues.apache.org/jira/browse/MESOS-7603
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.2.0
>Reporter: Charles Allen
>
> We encountered the following error when the fetcher tried to run on a Mesos 
> 1.2.0 agent:
> {code}
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: *** longjmp causes 
> uninitialized stack frame ***: /usr/sbin/mesos-agent terminated
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Backtrace: 
> =
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(+0x71c07)[0x7f8d08f5fc07]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(__fortify_fail+0x47)[0x7f8d08fedb17]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(+0xff56d)[0x7f8d08fed56d]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(__longjmp_chk+0x38)[0x7f8d08fed4c8]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libcurl.so.4(+0xae34)[0x7f8d08519e34]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libpthread.so.0(+0x116b0)[0x7f8d098386b0]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libpthread.so.0(pthread_cond_wait+0xbf)[0x7f8d0983448f]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libstdc++.so.6(_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE+0x2b)[0x7f8d095968ab]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libmesos-1.2.0.so(_ZN7process14ProcessManager4waitERKNS_4UPIDE+0x328)[0x7f8d0b47f3d8]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libmesos-1.2.0.so(_ZN7process4waitERKNS_4UPIDERK8Duration+0x2e7)[0x7f8d0b486117]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /usr/sbin/mesos-agent(+0x12810)[0x557e1d691810]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /lib64/libc.so.6(__libc_start_main+0xfc)[0x7f8d08f0e93c]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
> /usr/sbin/mesos-agent(+0x139c9)[0x557e1d6929c9]
> Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Memory map: 
> 
> {code}
> This looks like the error described here:
> https://stackoverflow.com/questions/9191668/error-longjmp-causes-uninitialized-stack-frame
> where the solution is either to set {{curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 
> 1)}} or to use a special configuration option for libcurl.




[jira] [Created] (MESOS-7603) lngjmp error in libcurl

2017-06-01 Thread Charles Allen (JIRA)

Charles Allen created MESOS-7603:


 Summary: lngjmp error in libcurl
 Key: MESOS-7603
 URL: https://issues.apache.org/jira/browse/MESOS-7603
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Affects Versions: 1.2.0
Reporter: Charles Allen


We encountered the following error when the fetcher tried to run on a Mesos 
1.2.0 agent:

{code}
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: *** longjmp causes 
uninitialized stack frame ***: /usr/sbin/mesos-agent terminated
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Backtrace: 
=
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(+0x71c07)[0x7f8d08f5fc07]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__fortify_fail+0x47)[0x7f8d08fedb17]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(+0xff56d)[0x7f8d08fed56d]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__longjmp_chk+0x38)[0x7f8d08fed4c8]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libcurl.so.4(+0xae34)[0x7f8d08519e34]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libpthread.so.0(+0x116b0)[0x7f8d098386b0]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libpthread.so.0(pthread_cond_wait+0xbf)[0x7f8d0983448f]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libstdc++.so.6(_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE+0x2b)[0x7f8d095968ab]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libmesos-1.2.0.so(_ZN7process14ProcessManager4waitERKNS_4UPIDE+0x328)[0x7f8d0b47f3d8]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libmesos-1.2.0.so(_ZN7process4waitERKNS_4UPIDERK8Duration+0x2e7)[0x7f8d0b486117]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/usr/sbin/mesos-agent(+0x12810)[0x557e1d691810]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/lib64/libc.so.6(__libc_start_main+0xfc)[0x7f8d08f0e93c]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: 
/usr/sbin/mesos-agent(+0x139c9)[0x557e1d6929c9]
Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Memory map: 

{code}

This looks like the error described here:

https://stackoverflow.com/questions/9191668/error-longjmp-causes-uninitialized-stack-frame

where the solution is either to set {{curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 
1)}} or to use a special configuration option for libcurl.




[jira] [Updated] (MESOS-7169) Documentation still references `ContainerLogger::recover`

2017-03-27 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-7169:
-
Component/s: modules

> Documentation still references `ContainerLogger::recover`
> -
>
> Key: MESOS-7169
> URL: https://issues.apache.org/jira/browse/MESOS-7169
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, modules
>Affects Versions: 1.1.0
>Reporter: Charles Allen
>
> MESOS-6371 removed {{ContainerLogger::recover}} but 
> https://github.com/apache/mesos/blob/1.1.0/include/mesos/slave/container_logger.hpp#L143
>  still discusses the recovery process as being important.




[jira] [Created] (MESOS-7286) Move CRAM with MD5 support test to check rather than configure

2017-03-22 Thread Charles Allen (JIRA)

Charles Allen created MESOS-7286:


 Summary: Move CRAM with MD5 support test to check rather than 
configure
 Key: MESOS-7286
 URL: https://issues.apache.org/jira/browse/MESOS-7286
 Project: Mesos
  Issue Type: Wish
  Components: security, test, testing, tests
Reporter: Charles Allen


https://www.mail-archive.com/user@mesos.apache.org/msg04222.html

I recently ran into a cross-compiling issue very similar to the one described 
in the above thread: the SASL2 library with MD5 support had to be on 
{{LD_LIBRARY_PATH}} before {{configure}} would run.

The request is that this check be moved out of {{configure}} and run as one of 
the tests in {{make check}}.

The difference is that {{make check}} can assume the target environment is 
fully set up, whereas {{configure}} is handed the target environment's paths, 
which may not match the environment in which {{configure}} itself is run.




[jira] [Created] (MESOS-7169) Documentation still references `ContainerLogger::recover`

2017-02-24 Thread Charles Allen (JIRA)

Charles Allen created MESOS-7169:


 Summary: Documentation still references `ContainerLogger::recover`
 Key: MESOS-7169
 URL: https://issues.apache.org/jira/browse/MESOS-7169
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.1.0
Reporter: Charles Allen


MESOS-6371 removed {{ContainerLogger::recover}} but 
https://github.com/apache/mesos/blob/1.1.0/include/mesos/slave/container_logger.hpp#L143
 still discusses the recovery process as being important.




[jira] [Commented] (MESOS-6857) Mesos master UI resources per role

2017-01-06 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806302#comment-15806302
 ] 

Charles Allen commented on MESOS-6857:
--

[~bmahler] cool thanks!

> Mesos master UI resources per role
> --
>
> Key: MESOS-6857
> URL: https://issues.apache.org/jira/browse/MESOS-6857
> Project: Mesos
>  Issue Type: Wish
>  Components: master, webui
>Reporter: Charles Allen
>
> Currently, the Mesos master UI shows all resources jumbled together, which 
> makes it hard for operators to determine how different roles are using the 
> cluster's resources. The request is that the Mesos master web UI offer a 
> per-role view of resources, similar in function to the current global 
> resource view.
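
A per-role view like the one requested could be derived by grouping per-framework usage by role. The sketch below assumes each framework record carries a `role` string and a `used_resources` mapping with a `cpus` entry. The record shape, the sample data, and the function name `cpus_by_role` are assumptions for illustration, not a description of the web UI's data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def cpus_by_role(frameworks: Iterable[Mapping]) -> Dict[str, float]:
    """Sum the CPUs in use across frameworks, grouped by role."""
    totals: Dict[str, float] = defaultdict(float)
    for framework in frameworks:
        # Frameworks without an explicit role are grouped under "*".
        role = framework.get("role", "*")
        used = framework.get("used_resources", {})
        totals[role] += float(used.get("cpus", 0.0))
    return dict(totals)


# Hypothetical sample data: two frameworks in one role, one in another.
frameworks = [
    {"name": "spark-a", "role": "spark", "used_resources": {"cpus": 16.0}},
    {"name": "spark-b", "role": "spark", "used_resources": {"cpus": 8.0}},
    {"name": "druid", "role": "druid", "used_resources": {"cpus": 4.0}},
]
print(cpus_by_role(frameworks))  # {'spark': 24.0, 'druid': 4.0}
```

The same grouping would apply to memory or disk by swapping the resource key.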



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (MESOS-6861) Add endpoint for replicated log status on mesos master

2017-01-05 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6861:


 Summary: Add endpoint for replicated log status on mesos master
 Key: MESOS-6861
 URL: https://issues.apache.org/jira/browse/MESOS-6861
 Project: Mesos
  Issue Type: Wish
  Components: HTTP API, master
Reporter: Charles Allen


Parsing the metrics for {{registrar/log/recovered}} is less convenient than 
simply hitting an endpoint and checking whether it returns a 200.

The request is that an endpoint be added to the Mesos master whose response 
reflects {{registrar/log/recovered}}.
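
As a rough sketch of the requested behaviour, the check could be a thin wrapper over the metrics the master already publishes. This assumes the metrics snapshot is a flat JSON object in which {{registrar/log/recovered}} is 1 once the log has recovered. The function name `recovery_status` and the choice of 503 for "not recovered" are hypothetical.

```python
import json


def recovery_status(snapshot_json: str) -> int:
    """Map a metrics snapshot to an HTTP status: 200 if recovered, else 503."""
    try:
        metrics = json.loads(snapshot_json)
    except ValueError:
        # Unparseable input is treated as "not recovered".
        return 503
    if not isinstance(metrics, dict):
        return 503
    # A missing metric is treated the same as "not recovered".
    return 200 if metrics.get("registrar/log/recovered") == 1 else 503


print(recovery_status('{"registrar/log/recovered": 1.0}'))  # 200
print(recovery_status('{"registrar/log/recovered": 0.0}'))  # 503
print(recovery_status('{}'))                                # 503
```

With something like this behind an endpoint, a load balancer or orchestration script would only need to compare the status code against 200.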




[jira] [Created] (MESOS-6859) Document HA behavior during mesos master replacement

2017-01-05 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6859:


 Summary: Document HA behavior during mesos master replacement
 Key: MESOS-6859
 URL: https://issues.apache.org/jira/browse/MESOS-6859
 Project: Mesos
  Issue Type: Documentation
  Components: documentation, master
Reporter: Charles Allen


In a discussion at https://mesos.slack.com/archives/general/p1483637159001494 
the question came up of when a "new" master is actually fully ready.

Specifically, in the case where new masters can spin up faster than masters can 
sync their logs, it is unclear from the HA docs at 
http://mesos.apache.org/documentation/latest/high-availability/ how to ensure a 
freshly spawned master is ready to take over leadership.

There is documentation at 
http://mesos.apache.org/documentation/latest/monitoring/ about using 
{{registrar/log/recovered}} to gather this kind of information, but such 
information is very easy to overlook.

This ask is that the HA docs be amended to include more information about how 
to use {{registrar/log/recovered}} properly.




[jira] [Comment Edited] (MESOS-6857) Mesos master UI resources per role

2017-01-05 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801753#comment-15801753
 ] 

Charles Allen edited comment on MESOS-6857 at 1/5/17 4:15 PM:
--

This came about in our cluster when we started running significant Spark 
resources alongside Druid resources on the same Mesos master. We wanted to 
figure out "how many CPU cores is Spark using, and how many does it have 
available?"

In the Spark stand-alone cluster view, it was easy to determine the total pool 
of resources available and scale up and down appropriately (this is in a cloud 
env).


was (Author: drcrallen):
This came about in our cluster when we started having significant spark 
resources as well as druid resources on the same mesos master.

> Mesos master UI resources per role
> --
>
> Key: MESOS-6857
> URL: https://issues.apache.org/jira/browse/MESOS-6857
> Project: Mesos
>  Issue Type: Wish
>  Components: master, webui
>Reporter: Charles Allen
>
> Currently when viewing resources in the mesos master ui all resources are 
> jumbled together. This makes it challenging for operators to determine how 
> different roles are utilizing the cluster resources. This ask is that the 
> mesos master web ui have a per-role view of resources, similar in function to 
> the current global resource view.




[jira] [Commented] (MESOS-6857) Mesos master UI resources per role

2017-01-05 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801753#comment-15801753
 ] 

Charles Allen commented on MESOS-6857:
--

This came about in our cluster when we started having significant spark 
resources as well as druid resources on the same mesos master.

> Mesos master UI resources per role
> --
>
> Key: MESOS-6857
> URL: https://issues.apache.org/jira/browse/MESOS-6857
> Project: Mesos
>  Issue Type: Wish
>  Components: master, webui
>Reporter: Charles Allen
>
> Currently when viewing resources in the mesos master ui all resources are 
> jumbled together. This makes it challenging for operators to determine how 
> different roles are utilizing the cluster resources. This ask is that the 
> mesos master web ui have a per-role view of resources, similar in function to 
> the current global resource view.




[jira] [Created] (MESOS-6857) Mesos master UI resources per role

2017-01-05 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6857:


 Summary: Mesos master UI resources per role
 Key: MESOS-6857
 URL: https://issues.apache.org/jira/browse/MESOS-6857
 Project: Mesos
  Issue Type: Wish
  Components: master, webui
Reporter: Charles Allen


Currently when viewing resources in the mesos master ui all resources are 
jumbled together. This makes it challenging for operators to determine how 
different roles are utilizing the cluster resources. This ask is that the mesos 
master web ui have a per-role view of resources, similar in function to the 
current global resource view.
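
A rough sketch of the aggregation such a view would need. It assumes each framework entry carries a {{role}} string and a {{used_resources}} mapping with numeric {{cpus}}, {{mem}} and {{disk}} values. That shape is an assumption for illustration, not a verified schema of the master's state output, and the sample data is made up.

```python
from collections import defaultdict


def resources_by_role(frameworks):
    """Sum the used resources of every framework, grouped by role."""
    totals = defaultdict(lambda: {"cpus": 0.0, "mem": 0.0, "disk": 0.0})
    for framework in frameworks:
        role = framework.get("role", "*")
        used = framework.get("used_resources", {})
        for name in ("cpus", "mem", "disk"):
            totals[role][name] += float(used.get(name, 0.0))
    return dict(totals)


# Made-up sample data, not output from a real cluster.
sample = [
    {"role": "spark", "used_resources": {"cpus": 8, "mem": 16384, "disk": 0}},
    {"role": "spark", "used_resources": {"cpus": 4, "mem": 8192, "disk": 0}},
    {"role": "druid", "used_resources": {"cpus": 2, "mem": 4096, "disk": 1024}},
]
per_role = resources_by_role(sample)
```

Operators can do this client-side today, but having the web ui show it directly is the point of this ask.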




[jira] [Created] (MESOS-6570) Keep container information around for a time.

2016-11-09 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6570:


 Summary: Keep container information around for a time.
 Key: MESOS-6570
 URL: https://issues.apache.org/jira/browse/MESOS-6570
 Project: Mesos
  Issue Type: Wish
  Components: containerization, HTTP API, statistics
Reporter: Charles Allen


http://mesos.apache.org/documentation/latest/endpoints/slave/containers/ 
describes the stats that are available upon probing an agent. If tasks start 
and finish quickly, they might be missed between probes of the {{/containers}} 
endpoint. The endpoint documentation states that only running containers are 
described.

This ask is that recently terminated containers still report their stats for a 
configurable time, and an extra field be added to the json indicating if the 
container is currently running or not.
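
To make the requested behavior concrete, here is a small sketch of a retention cache. Everything in it is hypothetical: the class name, the {{running}} field name, and the retention parameter are illustrations of the ask, not agent code or an existing flag.

```python
import time


class ContainerStatsCache:
    """Keeps the last stats of terminated containers for a configurable time."""

    def __init__(self, retention_secs, clock=time.time):
        self.retention_secs = retention_secs
        self.clock = clock
        self.running = {}      # container id -> latest stats
        self.terminated = {}   # container id -> (stats, termination time)

    def update(self, container_id, stats):
        """Record the latest stats for a running container."""
        self.running[container_id] = stats

    def terminate(self, container_id):
        """Move a container from the running set to the retained set."""
        stats = self.running.pop(container_id, None)
        if stats is not None:
            self.terminated[container_id] = (stats, self.clock())

    def report(self):
        """Return stats for running and recently terminated containers."""
        now = self.clock()
        # Drop terminated containers whose retention window has passed.
        self.terminated = {
            cid: (stats, ended)
            for cid, (stats, ended) in self.terminated.items()
            if now - ended <= self.retention_secs
        }
        result = [dict(stats, container_id=cid, running=True)
                  for cid, stats in self.running.items()]
        result += [dict(stats, container_id=cid, running=False)
                   for cid, (stats, _) in self.terminated.items()]
        return result
```

A prober that polls less often than tasks finish would still see each short-lived container once, as long as the retention time exceeds the probe interval.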




[jira] [Commented] (MESOS-1763) Add support for frameworks to receive resources for multiple roles.

2016-11-09 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652742#comment-15652742
 ] 

Charles Allen commented on MESOS-1763:
--

Is a specific application required to only register one framework? Why would 
you need to modify frameworks to use multiple roles instead of just ensuring 
some application can have multiple parallel registrations with different 
framework IDs?

> Add support for frameworks to receive resources for multiple roles.
> ---
>
> Key: MESOS-1763
> URL: https://issues.apache.org/jira/browse/MESOS-1763
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, framework api, master
>Reporter: Vinod Kone
>Assignee: Benjamin Mahler
>  Labels: mesosphere, multi-tenancy
>
> Currently, a framework can only obtain resources for a single allocation 
> role. This design discusses allowing frameworks to obtain resources for 
> multiple allocation roles.
> Use cases:
> * Allow an instance of a framework to be “multi-tenant” (e.g. Marathon, 
> Aurora, etc). Currently, users run multiple instances of a framework under 
> different roles to support multiple tenants.
> * Allow a framework to further leverage the resource allocation primitives 
> within Mesos to ensure it has sufficient resource guarantees in place (e.g. a 
> framework may want to set different guarantees amongst the tasks it needs to 
> run, without necessarily being multi-tenant).




[jira] [Commented] (MESOS-5081) Posix disk isolator allows unrestricted sandbox disk usage if the executor/task doesn't specify disk resource

2016-10-05 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550653#comment-15550653
 ] 

Charles Allen commented on MESOS-5081:
--

Does fixing this mean that things that are kind of dumb about disk (like Spark) 
won't be able to run on slaves which specify disk resources?

> Posix disk isolator allows unrestricted sandbox disk usage if the 
> executor/task doesn't specify disk resource
> -
>
> Key: MESOS-5081
> URL: https://issues.apache.org/jira/browse/MESOS-5081
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Yan Xu
>  Labels: mesosphere
>
> This is the case even if {{flags.enforce_container_disk_quota}} is true. When 
> a task/executor doesn't specify a disk resource, it still gets to write to 
> the container sandbox. However the posix disk isolator doesn't limit it.
> Even though tasks always have access to the sandbox, it should be able to 
> write zero bytes if it doesn't have any {{disk}} resource (it can still touch 
> files). This likely will cause tasks to immediately fail due to 
> stdout/stderr/executor download, etc. but should be the correct behavior 
> (when {{flags.enforce_container_disk_quota}} is true).




[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra

2016-09-21 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510662#comment-15510662
 ] 

Charles Allen commented on MESOS-6213:
--

I'm really curious what changed in my build env that allowed this to pass :-/

> Build failure on macOS Sierra
> -
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}




[jira] [Commented] (MESOS-6210) Master redirect with suffix gets in redirect loop

2016-09-20 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508695#comment-15508695
 ] 

Charles Allen commented on MESOS-6210:
--

Submitted https://reviews.apache.org/r/52105/

> Master redirect with suffix gets in redirect loop
> -
>
> Key: MESOS-6210
> URL: https://issues.apache.org/jira/browse/MESOS-6210
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Charles Allen
>Assignee: Charles Allen
>  Labels: newbie
>
> Trying to go to a URI like 
> {{http://SOME_MASTER:5050/master/redirect/master/frameworks}} ends up in a 
> redirect loop.
> The expected behavior is to either not support anything after {{redirect}} in 
> the path (redirect must be handled by a smart client), or to redirect to the 
> suffix (redirect can be handled by a dumb client).




[jira] [Commented] (MESOS-6213) Build failure on OSX

2016-09-20 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507725#comment-15507725
 ] 

Charles Allen commented on MESOS-6213:
--

Something was stale in the configs; a fresh reboot solved this.

> Build failure on OSX
> 
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}




[jira] [Commented] (MESOS-6210) Master redirect with suffix gets in redirect loop

2016-09-20 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507132#comment-15507132
 ] 

Charles Allen commented on MESOS-6210:
--

[~haosd...@gmail.com] Thanks! I'll manually test it internally and report back 
here.

> Master redirect with suffix gets in redirect loop
> -
>
> Key: MESOS-6210
> URL: https://issues.apache.org/jira/browse/MESOS-6210
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Charles Allen
>Assignee: Charles Allen
>  Labels: newbie
>
> Trying to go to a URI like 
> {{http://SOME_MASTER:5050/master/redirect/master/frameworks}} ends up in a 
> redirect loop.
> The expected behavior is to either not support anything after {{redirect}} in 
> the path (redirect must be handled by a smart client), or to redirect to the 
> suffix (redirect can be handled by a dumb client).




[jira] [Commented] (MESOS-6210) Master redirect with suffix gets in redirect loop

2016-09-20 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507095#comment-15507095
 ] 

Charles Allen commented on MESOS-6210:
--

I meant to open that PR internally for internal review before going to 
apache-proper. Sorry for the noise.

> Master redirect with suffix gets in redirect loop
> -
>
> Key: MESOS-6210
> URL: https://issues.apache.org/jira/browse/MESOS-6210
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Charles Allen
>  Labels: newbie
>
> Trying to go to a URI like 
> {{http://SOME_MASTER:5050/master/redirect/master/frameworks}} ends up in a 
> redirect loop.
> The expected behavior is to either not support anything after {{redirect}} in 
> the path (redirect must be handled by a smart client), or to redirect to the 
> suffix (redirect can be handled by a dumb client).




[jira] [Commented] (MESOS-6210) Master redirect with suffix gets in redirect loop

2016-09-20 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507019#comment-15507019
 ] 

Charles Allen commented on MESOS-6210:
--

Thanks [~vinodkone]. I took a crack at a patch, but I can't seem to find any 
test coverage for routing or {{Master::Http}}. Is there a place where tests 
should go?

> Master redirect with suffix gets in redirect loop
> -
>
> Key: MESOS-6210
> URL: https://issues.apache.org/jira/browse/MESOS-6210
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Charles Allen
>  Labels: newbie
>
> Trying to go to a URI like 
> {{http://SOME_MASTER:5050/master/redirect/master/frameworks}} ends up in a 
> redirect loop.
> The expected behavior is to either not support anything after {{redirect}} in 
> the path (redirect must be handled by a smart client), or to redirect to the 
> suffix (redirect can be handled by a dumb client).




[jira] [Created] (MESOS-6213) Build failure on OSX

2016-09-20 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6213:


 Summary: Build failure on OSX
 Key: MESOS-6213
 URL: https://issues.apache.org/jira/browse/MESOS-6213
 Project: Mesos
  Issue Type: Bug
  Components: build
Reporter: Charles Allen


Building on OSX is giving the following error.

{code}
In file included from 
../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
 error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
  deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
from <atomic> instead [-Werror,-Wdeprecated-declarations]
if (OSAtomicCompareAndSwap64Barrier(
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
 note:
  'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
here
bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t __newValue,
^
{code}

Protobuf is not listed as a component so I just set it as {{build}}




[jira] [Created] (MESOS-6210) Master redirect with suffix gets in redirect loop

2016-09-19 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6210:


 Summary: Master redirect with suffix gets in redirect loop
 Key: MESOS-6210
 URL: https://issues.apache.org/jira/browse/MESOS-6210
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Reporter: Charles Allen


Trying to go to a URI like 
{{http://SOME_MASTER:5050/master/redirect/master/frameworks}} ends up in a 
redirect loop.

The expected behavior is to either not support anything after {{redirect}} in 
the path (redirect must be handled by a smart client), or to redirect to the 
suffix (redirect can be handled by a dumb client).
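
A minimal sketch of the second option (redirect to the suffix), written as a standalone function rather than as the master's actual routing code. The function name, the {{http://}} scheme, and the choice to reject a suffix that points back at the redirect endpoint are all assumptions for illustration.

```python
REDIRECT_PREFIX = "/master/redirect"


def redirect_location(leader, path):
    """Build the Location for a redirect request, or None to reject it.

    `leader` is the leading master as 'host:port'; `path` is the request path.
    """
    if not path.startswith(REDIRECT_PREFIX):
        return None
    suffix = path[len(REDIRECT_PREFIX):]
    if suffix and not suffix.startswith("/"):
        # e.g. '/master/redirectfoo' is not a redirect request at all.
        return None
    # Refuse a suffix that points back at the redirect endpoint, which
    # would otherwise reproduce the loop.
    if suffix == REDIRECT_PREFIX or suffix.startswith(REDIRECT_PREFIX + "/"):
        return None
    return "http://%s%s" % (leader, suffix or "/")
```

For example, a request for {{/master/redirect/master/frameworks}} would be sent to {{/master/frameworks}} on the leader, which a dumb client can follow without looping.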




[jira] [Commented] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-09-14 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491937#comment-15491937
 ] 

Charles Allen commented on MESOS-4697:
--

Cool thanks!

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), much of the logic is the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> because we wanted each subsystem to be mounted under a 
> different hierarchy. This has gradually become untrue with the unified cgroup 
> hierarchy introduced in kernel 3.16 ([The unified control group hierarchy in 
> 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt]).
> Also, on some popular linux distributions, some subsystems are co-mounted 
> within the same hierarchy (e.g., net_cls and net_prio, cpu and cpuacct). It 
> becomes very hard to co-manage a hierarchy with two isolators.
> We can still introduce subsystem-specific code under the unified cgroup 
> isolator by introducing a Subsystem abstraction. But we don't plan to support 
> cgroup v2 in this ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-09-14 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491529#comment-15491529
 ] 

Charles Allen commented on MESOS-4697:
--

Thanks [~jieyu]! As per my prior comment, I'm very interested in getting 
per-role cgroup hierarchies so I can tune cgroup settings better. I was under 
the impression that this would be much easier to accomplish with the 
consolidated isolator. If I were to try to find a way to do such a thing, 
where would be a good starting point?

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), much of the logic is the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> because we wanted each subsystem to be mounted under a 
> different hierarchy. This has gradually become untrue with the unified cgroup 
> hierarchy introduced in kernel 3.16 ([The unified control group hierarchy in 
> 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt]).
> Also, on some popular linux distributions, some subsystems are co-mounted 
> within the same hierarchy (e.g., net_cls and net_prio, cpu and cpuacct). It 
> becomes very hard to co-manage a hierarchy with two isolators.
> We can still introduce subsystem-specific code under the unified cgroup 
> isolator by introducing a Subsystem abstraction. But we don't plan to support 
> cgroup v2 in this ticket.




[jira] [Commented] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-09-14 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491402#comment-15491402
 ] 

Charles Allen commented on MESOS-4697:
--

Now that this is mostly in, are there any particular areas to look at (or 
examples) for how to tinker with the new isolator stuff?

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), much of the logic is the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> because we wanted each subsystem to be mounted under a 
> different hierarchy. This has gradually become untrue with the unified cgroup 
> hierarchy introduced in kernel 3.16 ([The unified control group hierarchy in 
> 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt]).
> Also, on some popular linux distributions, some subsystems are co-mounted 
> within the same hierarchy (e.g., net_cls and net_prio, cpu and cpuacct). It 
> becomes very hard to co-manage a hierarchy with two isolators.
> We can still introduce subsystem-specific code under the unified cgroup 
> isolator by introducing a Subsystem abstraction. But we don't plan to support 
> cgroup v2 in this ticket.




[jira] [Commented] (MESOS-6097) Empty /etc/mesos-slave/hostname causes strange registration behavior

2016-08-26 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439975#comment-15439975
 ] 

Charles Allen commented on MESOS-6097:
--

Filed https://github.com/mesosphere/mesos-deb-packaging/issues/89 , thanks!

> Empty /etc/mesos-slave/hostname causes strange registration behavior
> 
>
> Key: MESOS-6097
> URL: https://issues.apache.org/jira/browse/MESOS-6097
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Charles Allen
>
> Using the mesosphere packaged Mesos 1.0.0 I had a node that ended up with a 
> blank /etc/mesos-slave/hostname due to something going wrong during the 
> auto-config process.
> When the agent registered with the master, its reported state was very 
> strange: the agent IP was shown as the master's, and the master IP was 
> blank.
> The following items are in the /etc config:
> {code}
> $ ls /etc/mesos-slave/
> attributes  cgroups_hierarchy  cgroups_limit_swap  gc_delay  hostname  
> isolation  resources  slave_subsystems  work_dir
> {code}
> Not sure if this is the right place to report mesosphere package config 
> problems, but it seems odd the agent would start under such a broken config.




[jira] [Created] (MESOS-6097) Empty /etc/mesos-slave/hostname causes strange registration behavior

2016-08-26 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6097:


 Summary: Empty /etc/mesos-slave/hostname causes strange 
registration behavior
 Key: MESOS-6097
 URL: https://issues.apache.org/jira/browse/MESOS-6097
 Project: Mesos
  Issue Type: Bug
  Components: cli
Reporter: Charles Allen


Using the mesosphere packaged Mesos 1.0.0 I had a node that ended up with a 
blank /etc/mesos-slave/hostname due to something going wrong during the 
auto-config process.

When the agent registered with the master, its reported state was very 
strange: the agent IP was shown as the master's, and the master IP was 
blank.

The following items are in the /etc config:

{code}
$ ls /etc/mesos-slave/
attributes  cgroups_hierarchy  cgroups_limit_swap  gc_delay  hostname  
isolation  resources  slave_subsystems  work_dir
{code}

Not sure if this is the right place to report mesosphere package config 
problems, but it seems odd the agent would start under such a broken config.
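
As a stopgap, a pre-start sanity check along these lines would have caught the blank file. This is only a sketch: it assumes each file under the config directory supplies the value for the agent flag of the same name, and the function name is hypothetical.

```python
import os


def empty_flag_files(config_dir):
    """Return the names of files in config_dir that are empty or whitespace-only."""
    empty = []
    for name in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r") as handle:
            if not handle.read().strip():
                empty.append(name)
    return empty
```

Running it against {{/etc/mesos-slave/}} before starting the agent and refusing to start on a non-empty result would turn the strange registration into an obvious config error.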




[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431839#comment-15431839
 ] 

Charles Allen commented on MESOS-6055:
--

I'll close it as {{can't reproduce}} for now. May have just been an oddity of 
the system I was testing on.

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors. I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in the fetcher linking are not obvious as no 
> logs are produced from the fetcher sub process.




[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431840#comment-15431840
 ] 

Charles Allen commented on MESOS-6055:
--

Thanks for checking it out!

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors. I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in the fetcher linking are not obvious as no 
> logs are produced from the fetcher sub process.




[jira] [Commented] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-22 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431398#comment-15431398
 ] 

Charles Allen commented on MESOS-6055:
--

Have mesos installed in a way where the main shared library isn't found, e.g. 
launching a slave should fail by default with errors about not being able to 
find/bind the mesos library.

Change the library path via {{LD_LIBRARY_PATH}} such that the slave succeeds in 
running.

Try to launch something with a URI to be fetched; it will fail in confusing 
ways.
Try to launch something without a URI (like {{echo something}}); it will 
print out {{something}} as expected.

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors. I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in the fetcher linking are not obvious as no 
> logs are produced from the fetcher sub process.




[jira] [Updated] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors

2016-08-17 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-6055:
-
Summary: Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report 
errors  (was: Mesos libs in LD_LIBRARY_PATH cause fetcher to fail)

> Mesos libs in LD_LIBRARY_PATH cause fetcher to fail and not report errors
> -
>
> Key: MESOS-6055
> URL: https://issues.apache.org/jira/browse/MESOS-6055
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Charles Allen
>
> In 1.0.0, if the agent is launched such that the mesos libraries can only be 
> found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with 
> no output. The log will not show linker errors. I'm not sure where they are 
> swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
> the mesos libs can be found, the fetcher functions as expected.
> The problem is that the errors in the fetcher linking are not obvious as no 
> logs are produced from the fetcher sub process.




[jira] [Created] (MESOS-6055) Mesos libs in LD_LIBRARY_PATH cause fetcher to fail

2016-08-17 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6055:


 Summary: Mesos libs in LD_LIBRARY_PATH cause fetcher to fail
 Key: MESOS-6055
 URL: https://issues.apache.org/jira/browse/MESOS-6055
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Reporter: Charles Allen


In 1.0.0, if the agent is launched such that the mesos libraries can only be 
found under {{LD_LIBRARY_PATH}}, the fetcher will fail and simply exit with no 
output. The log will not show linker errors. I'm not sure where they are 
swallowed. If the task is launched with LD_LIBRARY_PATH set to include where 
the mesos libs can be found, the fetcher functions as expected.

The problem is that the errors in the fetcher linking are not obvious as no 
logs are produced from the fetcher sub process.
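
To illustrate the kind of reporting that would make this obvious, here is a generic sketch of launching a helper process and surfacing its stderr and exit code. It is not the fetcher's actual code; the function name is hypothetical, and the child below is a stand-in that only mimics a linker failure (a message on stderr and a non-zero exit).

```python
import subprocess
import sys


def run_and_report(argv, env=None):
    """Run a helper process and return (exit code, captured stderr text)."""
    proc = subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stderr


# Stand-in child: writes an error to stderr and exits non-zero.
code, err = run_and_report(
    [sys.executable, "-c",
     "import sys; sys.stderr.write('cannot open shared object file\\n'); "
     "sys.exit(127)"])
```

If the parent logged {{err}} whenever {{code}} is non-zero, a missing library would show up in the logs instead of being swallowed.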




[jira] [Commented] (MESOS-314) Support the cgroups 'cpusets' subsystem.

2016-08-16 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423043#comment-15423043
 ] 

Charles Allen commented on MESOS-314:
-

Needs more experimentation / validation though.

> Support the cgroups 'cpusets' subsystem.
> 
>
> Key: MESOS-314
> URL: https://issues.apache.org/jira/browse/MESOS-314
> Project: Mesos
>  Issue Type: Story
>Reporter: Benjamin Mahler
>  Labels: twitter
>
> We'd like to add support for the cpusets subsystem, in order to support 
> pinning to cpus.
> This has several potential benefits:
> 1. Improved isolation against other tenants, when given exclusive access to 
> cores.
> 2. Improved performance, if pinned to several cores with good locality in the 
> CPU topology.
> 3. An alternative / complement to CFS for applying an upper limit on CPU 
> usage.
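
As a rough illustration of what pinning to CPUs means, the Python sketch below parses a CPU list in the "0-3,8" style and pins the calling process with per-process affinity. This is an assumption-laden stand-in: it uses os.sched_setaffinity (Linux only), not the cgroups cpuset subsystem the ticket asks for, and the helper names parse_cpu_list and pin_current_process are hypothetical.

```python
import os


def parse_cpu_list(spec):
    """Parse a cpuset-style list such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(cpus):
    """Pin the calling process to the given CPUs; return True if applied."""
    if not hasattr(os, "sched_setaffinity"):
        return False                      # platform without affinity support
    allowed = os.sched_getaffinity(0)     # CPUs this process may already use
    target = set(cpus) & allowed
    if not target:
        return False                      # nothing requested is available
    os.sched_setaffinity(0, target)
    return True


print(sorted(parse_cpu_list("0-3,8")))
```

A cpuset-based isolator would apply the same kind of CPU list to a whole group of processes rather than to a single one.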



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-314) Support the cgroups 'cpusets' subsystem.

2016-08-16 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423040#comment-15423040
 ] 

Charles Allen commented on MESOS-314:
-

Druid.io is having big-node scaling issues reported by the community (see 
https://groups.google.com/d/msg/druid-user/l5grbWf2x9w/nF5fPyiaBQAJ ). I think 
having proper cpuset / memory controller isolation per task would help with 
this tremendously. 

> Support the cgroups 'cpusets' subsystem.
> 
>
> Key: MESOS-314
> URL: https://issues.apache.org/jira/browse/MESOS-314
> Project: Mesos
>  Issue Type: Story
>Reporter: Benjamin Mahler
>  Labels: twitter
>
> We'd like to add support for the cpusets subsystem, in order to support 
> pinning to cpus.
> This has several potential benefits:
> 1. Improved isolation against other tenants, when given exclusive access to 
> cores.
> 2. Improved performance, if pinned to several cores with good locality in the 
> CPU topology.
> 3. An alternative / complement to CFS for applying an upper limit on CPU 
> usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-6024) CORS issue in 1.0.0 master UI

2016-08-10 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416033#comment-15416033
 ] 

Charles Allen commented on MESOS-6024:
--

[~kaysoky] cool thanks

> CORS issue in 1.0.0 master UI
> -
>
> Key: MESOS-6024
> URL: https://issues.apache.org/jira/browse/MESOS-6024
> Project: Mesos
>  Issue Type: Bug
>  Components: master, webui
>Affects Versions: 1.0.0
>Reporter: Charles Allen
>
> My setup is such that I have a dns entry which points to one of a group of 
> mesos masters (3). The mesos masters are setup to use IP only stuff 
> (advertise and bind based on ip address), and not try to do things based on 
> hostname / DNS resolution.
> For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
> 192.168.1.103 behind master.mesos.internal
> Whenever I try and go to the "overall" dns endpoint, I get the error: 
> XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
> 'Access-Control-Allow-Origin' header is present on the requested resource. 
> Origin 'http://master.mesos.internal:5050' is therefore not allowed access.
> And the page itself simply gives me the retry modal of "Failed to connect to 
> master.mesos.internal:5050!"
> If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
> behaves as expected.
> Overall this is probably a good thing as I suspect proper security 
> enforcement may have been lacking previously.
> The expected behavior is that I am redirected to the leader.
> This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing 
> with OSX Chrome  51.0.2704.103 (64-bit)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-6024) CORS issue in 1.0.0 master UI

2016-08-10 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-6024:
-
Description: 
My setup is such that I have a dns entry which points to one of a group of 
mesos masters (3). The mesos masters are setup to use IP only stuff (advertise 
and bind based on ip address), and not try to do things based on hostname / DNS 
resolution.

For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
192.168.1.103 behind master.mesos.internal

Whenever I try and go to the "overall" dns endpoint, I get the error: 

XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://master.mesos.internal:5050' is therefore not allowed access.

And the page itself simply gives me the retry modal of "Failed to connect to 
master.mesos.internal:5050!"

If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
behaves as expected.

Overall this is probably a good thing as I suspect proper security enforcement 
may have been lacking previously.

The expected behavior is that I am redirected to the leader.

This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing with 
OSX Chrome  51.0.2704.103 (64-bit)

  was:
My setup is such that I have a dns entry which points to one of a group of 
mesos masters (3). The mesos masters are setup to use IP only stuff (advertise 
and bind based on ip address), and not try to do things based on hostname / DNS 
resolution.

For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
192.168.1.103 behind master.mesos.internal

Whenever I try and go to the "overall" dns endpoint, I get the name 

XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://master.mesos.internal:5050' is therefore not allowed access.

And the page itself simply gives me the retry modal of "Failed to connect to 
master.mesos.internal:5050!"

If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
behaves as expected.

Overall this is probably a good thing as I suspect proper security enforcement 
may have been lacking previously.

The expected behavior is that I am redirected to the leader.

This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing with 
OSX Chrome  51.0.2704.103 (64-bit)


> CORS issue in 1.0.0 master UI
> -
>
> Key: MESOS-6024
> URL: https://issues.apache.org/jira/browse/MESOS-6024
> Project: Mesos
>  Issue Type: Bug
>  Components: master, webui
>Affects Versions: 1.0.0
>Reporter: Charles Allen
>
> My setup is such that I have a dns entry which points to one of a group of 
> mesos masters (3). The mesos masters are setup to use IP only stuff 
> (advertise and bind based on ip address), and not try to do things based on 
> hostname / DNS resolution.
> For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
> 192.168.1.103 behind master.mesos.internal
> Whenever I try and go to the "overall" dns endpoint, I get the error: 
> XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
> 'Access-Control-Allow-Origin' header is present on the requested resource. 
> Origin 'http://master.mesos.internal:5050' is therefore not allowed access.
> And the page itself simply gives me the retry modal of "Failed to connect to 
> master.mesos.internal:5050!"
> If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
> behaves as expected.
> Overall this is probably a good thing as I suspect proper security 
> enforcement may have been lacking previously.
> The expected behavior is that I am redirected to the leader.
> This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing 
> with OSX Chrome  51.0.2704.103 (64-bit)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-6024) CORS issue in 1.0.0 master UI

2016-08-10 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-6024:
-
Description: 
My setup is such that I have a dns entry which points to one of a group of 
mesos masters (3). The mesos masters are setup to use IP only stuff (advertise 
and bind based on ip address), and not try to do things based on hostname / DNS 
resolution.

For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
192.168.1.103 behind master.mesos.internal

Whenever I try and go to the "overall" dns endpoint, I get the name 

XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://master.mesos.internal:5050' is therefore not allowed access.

And the page itself simply gives me the retry modal of "Failed to connect to 
master.mesos.internal:5050!"

If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
behaves as expected.

Overall this is probably a good thing as I suspect proper security enforcement 
may have been lacking previously.

The expected behavior is that I am redirected to the leader.

This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing with 
OSX Chrome  51.0.2704.103 (64-bit)

  was:
My setup is such that I have a dns entry which points to one of a group of 
mesos masters (3). The mesos masters are setup to use IP only stuff (advertise 
and bind based on ip address), and not try to do things based on hostname / DNS 
resolution.

For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
192.168.1.103 behind master.mesos.internal

Whenever I try and go to the "overall" dns endpoint, I get the name 

XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://master.mesos.internal:5050' is therefore not allowed access.

If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
behaves as expected.

Overall this is probably a good thing as I suspect proper security enforcement 
may have been lacking previously.

The expected behavior is that I am redirected to the leader.

This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing with 
OSX Chrome  51.0.2704.103 (64-bit)


> CORS issue in 1.0.0 master UI
> -
>
> Key: MESOS-6024
> URL: https://issues.apache.org/jira/browse/MESOS-6024
> Project: Mesos
>  Issue Type: Bug
>  Components: master, webui
>Affects Versions: 1.0.0
>Reporter: Charles Allen
>
> My setup is such that I have a dns entry which points to one of a group of 
> mesos masters (3). The mesos masters are setup to use IP only stuff 
> (advertise and bind based on ip address), and not try to do things based on 
> hostname / DNS resolution.
> For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
> 192.168.1.103 behind master.mesos.internal
> Whenever I try and go to the "overall" dns endpoint, I get the name 
> XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
> 'Access-Control-Allow-Origin' header is present on the requested resource. 
> Origin 'http://master.mesos.internal:5050' is therefore not allowed access.
> And the page itself simply gives me the retry modal of "Failed to connect to 
> master.mesos.internal:5050!"
> If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
> behaves as expected.
> Overall this is probably a good thing as I suspect proper security 
> enforcement may have been lacking previously.
> The expected behavior is that I am redirected to the leader.
> This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing 
> with OSX Chrome  51.0.2704.103 (64-bit)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-6024) CORS issue in 1.0.0 master UI

2016-08-10 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-6024:
-
Description: 
My setup is such that I have a dns entry which points to one of a group of 
mesos masters (3). The mesos masters are setup to use IP only stuff (advertise 
and bind based on ip address), and not try to do things based on hostname / DNS 
resolution.

For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
192.168.1.103 behind master.mesos.internal

Whenever I try and go to the "overall" dns endpoint, I get the name 

XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://master.mesos.internal:5050' is therefore not allowed access.

If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
behaves as expected.

Overall this is probably a good thing as I suspect proper security enforcement 
may have been lacking previously.

The expected behavior is that I am redirected to the leader.

This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing with 
OSX Chrome  51.0.2704.103 (64-bit)

  was:
My setup is such that I have a dns entry which points to one of a group of 
mesos masters (3). The mesos masters are setup to use IP only stuff (advertise 
and bind based on ip address), and not try to do things based on hostname / DNS 
resolution.

For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
192.168.1.103 behind master.mesos.internal

Whenever I try and go to the "overall" dns endpoint, I get the name 

XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://master.mesos.internal:5050' is therefore not allowed access.

If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
behaves as expected.

Overall this is probably a good thing as I suspect proper security enforcement 
may have been lacking previously.

The expected behavior is that I am redirected to the leader.

This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository


> CORS issue in 1.0.0 master UI
> -
>
> Key: MESOS-6024
> URL: https://issues.apache.org/jira/browse/MESOS-6024
> Project: Mesos
>  Issue Type: Bug
>  Components: master, webui
>Affects Versions: 1.0.0
>Reporter: Charles Allen
>
> My setup is such that I have a dns entry which points to one of a group of 
> mesos masters (3). The mesos masters are setup to use IP only stuff 
> (advertise and bind based on ip address), and not try to do things based on 
> hostname / DNS resolution.
> For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
> 192.168.1.103 behind master.mesos.internal
> Whenever I try and go to the "overall" dns endpoint, I get the name 
> XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
> 'Access-Control-Allow-Origin' header is present on the requested resource. 
> Origin 'http://master.mesos.internal:5050' is therefore not allowed access.
> If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
> behaves as expected.
> Overall this is probably a good thing as I suspect proper security 
> enforcement may have been lacking previously.
> The expected behavior is that I am redirected to the leader.
> This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository viewing 
> with OSX Chrome  51.0.2704.103 (64-bit)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (MESOS-6024) CORS issue in 1.0.0 master UI

2016-08-10 Thread Charles Allen (JIRA)

Charles Allen created MESOS-6024:


 Summary: CORS issue in 1.0.0 master UI
 Key: MESOS-6024
 URL: https://issues.apache.org/jira/browse/MESOS-6024
 Project: Mesos
  Issue Type: Bug
  Components: master, webui
Affects Versions: 1.0.0
Reporter: Charles Allen


My setup is such that I have a DNS entry which points to one of a group of 
mesos masters (3). The mesos masters are set up to use IP-only settings (advertise 
and bind based on IP address), and not try to do things based on hostname / DNS 
resolution.

For example, I might have 3 masters 192.168.1.100, 192.168.1.101, and 
192.168.1.103 behind master.mesos.internal

Whenever I try and go to the "overall" dns endpoint, I get the name 

XMLHttpRequest cannot load http://192.168.1.100:5050/master/state. No 
'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://master.mesos.internal:5050' is therefore not allowed access.

If I go to http://192.168.1.100:5050/ from the error message the whole web ui 
behaves as expected.

Overall this is probably a good thing as I suspect proper security enforcement 
may have been lacking previously.

The expected behavior is that I am redirected to the leader.

This is on 1.0.0-2.0.89.ubuntu1404 from the mesosphere repository
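
A minimal Python sketch of why the browser treats this as cross-origin: an origin is the (scheme, host, port) triple, and the host is compared as written, so a DNS name and the IP address it resolves to are different origins. The helper names origin and is_cross_origin are hypothetical; the URLs are the ones from the error above.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def is_cross_origin(page_url, request_url):
    """True when a request from page_url to request_url crosses origins."""
    return origin(page_url) != origin(request_url)


request = "http://192.168.1.100:5050/master/state"

# Page loaded via the DNS name, request sent to the leader's IP address.
print(is_cross_origin("http://master.mesos.internal:5050/", request))

# Page loaded directly from the leader's IP address.
print(is_cross_origin("http://192.168.1.100:5050/", request))
```

The first call reports a cross-origin request and the second does not, which matches the UI working when opened directly at the IP address.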



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-5929) Total cluster resources on master Mesos UI should have better spacing.

2016-08-05 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409987#comment-15409987
 ] 

Charles Allen commented on MESOS-5929:
--

Thanks!

> Total cluster resources on master Mesos UI should have better spacing.
> --
>
> Key: MESOS-5929
> URL: https://issues.apache.org/jira/browse/MESOS-5929
> Project: Mesos
>  Issue Type: Wish
>  Components: webui
>Affects Versions: 0.28.2
>Reporter: Charles Allen
>Assignee: Charles Allen
> Fix For: 1.1.0
>
> Attachments: Screen Shot 2016-07-29 at 9.45.25 AM.png
>
>
> The Resources for total cluster resources formats oddly even when there are 
> only a few terabytes of memory and disk across a cluster. I'll try to attach 
> a screenshot shortly.
> This ask is that the data be presented more cleanly.
> One approach could be to scale the number to the appropriate scale. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (MESOS-5929) Total cluster resources on master Mesos UI should have better spacing.

2016-08-05 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen reassigned MESOS-5929:


Assignee: Charles Allen

> Total cluster resources on master Mesos UI should have better spacing.
> --
>
> Key: MESOS-5929
> URL: https://issues.apache.org/jira/browse/MESOS-5929
> Project: Mesos
>  Issue Type: Wish
>  Components: webui
>Affects Versions: 0.28.2
>Reporter: Charles Allen
>Assignee: Charles Allen
> Attachments: Screen Shot 2016-07-29 at 9.45.25 AM.png
>
>
> The Resources for total cluster resources formats oddly even when there are 
> only a few terabytes of memory and disk across a cluster. I'll try to attach 
> a screenshot shortly.
> This ask is that the data be presented more cleanly.
> One approach could be to scale the number to the appropriate scale. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2021) Mesos will not execute on ARM CPU based System

2016-07-31 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401340#comment-15401340
 ] 

Charles Allen commented on MESOS-2021:
--

http://blog.haosdent.me/2016/04/24/mesos-on-arm/ has some more info as well.

> Mesos will not execute on ARM CPU based System
> --
>
> Key: MESOS-2021
> URL: https://issues.apache.org/jira/browse/MESOS-2021
> Project: Mesos
>  Issue Type: Bug
>  Components: cli, general
>Affects Versions: 0.20.1
> Environment: Linux c1-10-1-2-103.cloud.online.net 3.17.0-85 #2 SMP 
> Wed Oct 15 15:31:27 CEST 2014 armv7l Marvell Armada 370/XP (Device Tree) 
> GNU/Linux
>Reporter: Axel Etcheverry
>
> Configuration : ../configure --disable-java
> {noformat}
> c1-10-1-2-103 build # ./bin/mesos-master.sh --ip=127.0.0.1 
> --work_dir=/var/lib/mesos
> I1031 12:11:04.993352 21161 main.cpp:155] Build: 2014-10-31 01:04:48 by root
> I1031 12:11:04.995046 21161 main.cpp:157] Version: 0.20.1
> F1031 12:11:04.997396 21161 leveldb.cpp:160] Check failed: 
> leveldb::BytewiseComparator()->Compare(one, two) < 0 
> *** Check failure stack trace: ***
> @ 0xb63db678  google::LogMessage::Fail()
> @ 0xb63dd428  google::LogMessage::SendToLog()
> @ 0xb63db294  google::LogMessage::Flush()
> @ 0xb63ddbc4  google::LogMessageFatal::~LogMessageFatal()
> @ 0xb62c8fb0  mesos::internal::log::LevelDBStorage::restore()
> @ 0xb6337198  mesos::internal::log::ReplicaProcess::restore()
> @ 0xb6337920  mesos::internal::log::ReplicaProcess::ReplicaProcess()
> @ 0xb6337ad4  mesos::internal::log::Replica::Replica()
> @ 0xb62ca948  mesos::internal::log::LogProcess::LogProcess()
> @ 0xb62cabc8  mesos::internal::log::Log::Log()
> @0x41818  main
> @ 0xb51965f8  (unknown)
> Aborted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (MESOS-5929) Total cluster resources on master Mesos UI should have better spacing.

2016-07-29 Thread Charles Allen (JIRA)

Charles Allen created MESOS-5929:


 Summary: Total cluster resources on master Mesos UI should have 
better spacing.
 Key: MESOS-5929
 URL: https://issues.apache.org/jira/browse/MESOS-5929
 Project: Mesos
  Issue Type: Wish
  Components: webui
Affects Versions: 0.28.2
Reporter: Charles Allen
 Attachments: Screen Shot 2016-07-29 at 9.45.25 AM.png

The Resources for total cluster resources formats oddly even when there are 
only a few terabytes of memory and disk across a cluster. I'll try to attach a 
screenshot shortly.

This ask is that the data be presented more cleanly.

One approach could be to scale the number to the appropriate scale. 
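
A small Python sketch of the scaling idea, assuming the totals are reported in megabytes and that binary (1024-based) steps are wanted. The function name scale_megabytes and the one-decimal formatting are illustrative choices, not what the web UI does.

```python
UNITS = ["MB", "GB", "TB", "PB"]


def scale_megabytes(value_mb):
    """Format a megabyte count using the largest unit that keeps it >= 1."""
    value = float(value_mb)
    unit = 0
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    return f"{value:.1f} {UNITS[unit]}"


print(scale_megabytes(512))
print(scale_megabytes(1536))
print(scale_megabytes(3 * 1024 * 1024))
```

With this, a cluster total of a few terabytes is shown as a short number with a unit instead of a long megabyte count.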



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-5929) Total cluster resources on master Mesos UI should have better spacing.

2016-07-29 Thread Charles Allen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated MESOS-5929:
-
Attachment: Screen Shot 2016-07-29 at 9.45.25 AM.png

> Total cluster resources on master Mesos UI should have better spacing.
> --
>
> Key: MESOS-5929
> URL: https://issues.apache.org/jira/browse/MESOS-5929
> Project: Mesos
>  Issue Type: Wish
>  Components: webui
>Affects Versions: 0.28.2
>Reporter: Charles Allen
> Attachments: Screen Shot 2016-07-29 at 9.45.25 AM.png
>
>
> The Resources for total cluster resources formats oddly even when there are 
> only a few terabytes of memory and disk across a cluster. I'll try to attach 
> a screenshot shortly.
> This ask is that the data be presented more cleanly.
> One approach could be to scale the number to the appropriate scale. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-5749) Have maven run in batch mode

2016-06-30 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357809#comment-15357809
 ] 

Charles Allen commented on MESOS-5749:
--

discussion in https://reviews.apache.org/r/49422/

> Have maven run in batch mode
> 
>
> Key: MESOS-5749
> URL: https://issues.apache.org/jira/browse/MESOS-5749
> Project: Mesos
>  Issue Type: Improvement
>  Components: java api
>Reporter: Charles Allen
>Priority: Minor
>
> Currently when the Makefile invokes maven, it does not use the -B flag. This 
> ask is to have maven use the -B flag to make it friendly for automated build 
> scripts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-5749) Have maven run in batch mode

2016-06-29 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356339#comment-15356339
 ] 

Charles Allen commented on MESOS-5749:
--

patch incoming through reviewboard shortly

> Have maven run in batch mode
> 
>
> Key: MESOS-5749
> URL: https://issues.apache.org/jira/browse/MESOS-5749
> Project: Mesos
>  Issue Type: Improvement
>  Components: java api
>Reporter: Charles Allen
>Priority: Minor
>
> Currently when the Makefile invokes maven, it does not use the -B flag. This 
> ask is to have maven use the -B flag to make it friendly for automated build 
> scripts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2386) Provide full filesystem isolation as a native mesos isolator

2016-05-27 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304879#comment-15304879
 ] 

Charles Allen commented on MESOS-2386:
--

It still isn't :(

> Provide full filesystem isolation as a native mesos isolator
> 
>
> Key: MESOS-2386
> URL: https://issues.apache.org/jira/browse/MESOS-2386
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation
>Affects Versions: 0.22.1
>Reporter: Dominic Hamon
>Assignee: Ian Downes
>  Labels: mesosphere, twitter
>
> Design
> https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-05-22 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295596#comment-15295596
 ] 

Charles Allen commented on MESOS-4697:
--

Thank you [~haosd...@gmail.com], that sounds great!

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), much of the logic is the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> because we wanted each subsystem to be mounted under a 
> different hierarchy. This has gradually become untrue with the unified cgroup 
> hierarchy introduced in kernel 3.16 ([The unified control group hierarchy in 
> 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt]).
>  Also, on some popular Linux distributions, some subsystems are co-mounted 
> within the same hierarchy (e.g., net_cls and net_prio, cpu and cpuacct). It 
> becomes very hard to co-manage a hierarchy with two isolators.
> We can still introduce subsystem-specific code under the unified cgroup 
> isolator by introducing a Subsystem abstraction. But we don't plan to support 
> cgroup v2 in this ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-4697) Consolidate cgroup isolators into one single isolator.

2016-05-20 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293970#comment-15293970
 ] 

Charles Allen commented on MESOS-4697:
--

General feedback from someone who has written frameworks and deployed other 
frameworks.

Whenever I'm deploying a framework on Mesos, I rarely care about what isolation 
group it is using. Usually I simply want to have an understanding of how my 
resources are going to be requested/handled. This comes into play largely with 
frameworks that have different levels of resource awareness. Some know about 
memory and cpu, and a select few about disk needs.

As the capabilities of resource isolation expand, I do NOT want to have to go 
back and update older frameworks to make sure they play nice with more-modern 
frameworks with better resource awareness.

My current approach to handling this is through [roles | 
http://mesos.apache.org/documentation/latest/roles/] where a role is really a 
pre-agreed upon set of resource expectations.

What I would love to see is a way for me to have different cgroup roots per 
role. Or at least more clear expectations on how to have such a scenario. This 
way I can tune cgroups at a system level regardless of how aware mesos is of 
the node's capabilities.

As a discrete example, I would like to have blkio tuned on a node such that all 
tasks from a particular mesos role have some expectations of blkio, all tasks 
from a DIFFERENT mesos role have some other expectations, and a THIRD group of 
tasks which are NOT part of mesos might have a third set of tunings. This 
could be accomplished within mesos IFF mesos were aware of all potential 
cgroups my kernel supports, AND all my frameworks had ways of running through 
mesos, but neither one of those is a guaranteed assumption.

My ask here is that the intended behavior is clarified for when a cgroup is 
present on a system, but the version of mesos running is not aware of how to 
use such a cgroup (blkio or maybe 
https://issues.apache.org/jira/browse/MESOS-4424 or something else even).

> Consolidate cgroup isolators into one single isolator.
> --
>
> Key: MESOS-4697
> URL: https://issues.apache.org/jira/browse/MESOS-4697
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: haosdent
> Attachments: cgroup_v2.pdf
>
>
> There are two motivations for this:
> 1) It's very verbose to add a new isolator. For cgroup isolators (e.g., cpu, 
> mem, net_cls, etc.), much of the logic is the same. We are currently 
> duplicating a lot of the code.
> 2) Initially, we decided to use a separate isolator for each cgroup subsystem 
> because we wanted each subsystem to be mounted under a 
> different hierarchy. This has gradually become untrue with the unified cgroup 
> hierarchy introduced in kernel 3.16 ([The unified control group hierarchy in 
> 3.16|https://lwn.net/Articles/601840/], 
> [cgroup-v2|https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v2.txt]).
>  Also, on some popular Linux distributions, some subsystems are co-mounted 
> within the same hierarchy (e.g., net_cls and net_prio, cpu and cpuacct). It 
> becomes very hard to co-manage a hierarchy with two isolators.
> We can still introduce subsystem-specific code under the unified cgroup 
> isolator by introducing a Subsystem abstraction. But we don't plan to support 
> cgroup v2 in this ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-5041) Add cgroups unified isolator

2016-05-20 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293815#comment-15293815
 ] 

Charles Allen commented on MESOS-5041:
--

Since this is listed in the mesos roadmap, can there be a little more flavor in 
the master comment about why this is needed and what it is doing differently 
than the current cgroup impl?

> Add cgroups unified isolator
> 
>
> Key: MESOS-5041
> URL: https://issues.apache.org/jira/browse/MESOS-5041
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups, isolation
>Reporter: haosdent
>Assignee: haosdent
>
> Implement the cgroups unified isolator and enable it in Mesos containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (MESOS-4857) RFC5424 logging support

2016-03-03 Thread Charles Allen (JIRA)

Charles Allen created MESOS-4857:


 Summary: RFC5424 logging support
 Key: MESOS-4857
 URL: https://issues.apache.org/jira/browse/MESOS-4857
 Project: Mesos
  Issue Type: Wish
  Components: general
Reporter: Charles Allen


RFC5424 https://tools.ietf.org/html/rfc5424 is a standard for syslog.

Other logging implementations like log4j2 support an RFC5424 format when writing 
to syslog: 
https://logging.apache.org/log4j/2.x/manual/layouts.html#RFC5424Layout

This ask is that Mesos have an option for supporting the RFC5424 standard when 
logging so that multiple RFC5424 compliant services running on a single machine 
can log in a mutually parseable way, eliminating the need for log aggregators 
to support a wide variety of logging formats.
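
For reference, a minimal sketch of what an RFC5424 line looks like, following the header layout in the RFC (`<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG`, with `PRI = facility * 8 + severity`). The function name and its parameters are illustrative only and are not an existing Mesos or glog option.

```python
from datetime import datetime, timezone

NILVALUE = "-"  # RFC5424 placeholder for an absent field


def rfc5424_line(facility, severity, timestamp, hostname, app_name, procid,
                 msg, msgid=NILVALUE, structured_data=NILVALUE):
    """Format one syslog line; timestamp must be a timezone-aware datetime."""
    pri = facility * 8 + severity
    ts = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"
    return "<{}>1 {} {} {} {} {} {} {}".format(
        pri, ts, hostname, app_name, procid, msgid, structured_data, msg)


if __name__ == "__main__":
    # Facility 1 (user-level), severity 6 (informational).
    print(rfc5424_line(1, 6, datetime.now(timezone.utc),
                       "host1", "mesos-slave", "1234", "Executor registered"))
```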




[jira] [Commented] (MESOS-313) Report executor terminations to framework schedulers.

2016-01-05 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083942#comment-15083942
 ] 

Charles Allen commented on MESOS-313:
-

Can you expand a little more on what conditions must be met for an executor to 
be considered lost? Is it only considered lost if the slave/agent is reachable 
and reports the executor as terminated? 

> Report executor terminations to framework schedulers.
> -
>
> Key: MESOS-313
> URL: https://issues.apache.org/jira/browse/MESOS-313
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Charles Reiss
>Assignee: Zhitao Li
>  Labels: mesosphere, newbie
>
> The Scheduler interface has a callback for executorLost, but currently it is 
> never called.




[jira] [Commented] (MESOS-3908) Mesos slave happily freezes itself on startup.

2015-11-12 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002945#comment-15002945
 ] 

Charles Allen commented on MESOS-3908:
--

The workaround is to simply remove freezer from the slave_subsystems list.
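
A small sketch of that workaround for anyone generating the setting from a script. The helper name is made up for illustration; it drops `freezer` from a comma-separated subsystem list and, as a side effect, removes repeated entries.

```python
def without_freezer(slave_subsystems):
    """Return the comma-separated list minus 'freezer' and minus duplicates."""
    kept = []
    for name in slave_subsystems.split(","):
        name = name.strip()
        if name and name != "freezer" and name not in kept:
            kept.append(name)
    return ",".join(kept)


if __name__ == "__main__":
    print(without_freezer(
        "cpuacct,memory,blkio,cpuacct,cpuset,devices,freezer"))
```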

> Mesos slave happily freezes itself on startup.
> --
>
> Key: MESOS-3908
> URL: https://issues.apache.org/jira/browse/MESOS-3908
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.24.1
>Reporter: Charles Allen
>Priority: Minor
>
> If freezer is specified in slave_subsystems the slave will happily freeze 
> itself on startup. This will cause the thread to be locked until you manually 
> thaw it via 
> {code}
> echo THAWED | sudo tee /sys/fs/cgroup/freezer/mesos/slave/freezer.state
> {code} 
> at which point it gets confused and exits.
> These were the cgroup settings I was tinkering with
> {code}
> export MESOS_isolation=cgroups/cpu,cgroups/mem
> export MESOS_cgroups_limit_swap=false
> export MESOS_cgroups_hierarchy=/sys/fs/cgroup
> export MESOS_slave_subsystems=cpuacct,memory,blkio,cpuacct,cpuset,devices,freezer
> {code}




[jira] [Created] (MESOS-3908) Mesos slave happily freezes itself on startup.

2015-11-12 Thread Charles Allen (JIRA)

Charles Allen created MESOS-3908:


 Summary: Mesos slave happily freezes itself on startup.
 Key: MESOS-3908
 URL: https://issues.apache.org/jira/browse/MESOS-3908
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.24.1
Reporter: Charles Allen
Priority: Minor


If freezer is specified in slave_subsystems the slave will happily freeze 
itself on startup. This will cause the thread to be locked until you manually 
thaw it via 
{code}
echo THAWED | sudo tee /sys/fs/cgroup/freezer/mesos/slave/freezer.state
{code} 
at which point it gets confused and exits.

These were the cgroup settings I was tinkering with
{code}
export MESOS_isolation=cgroups/cpu,cgroups/mem
export MESOS_cgroups_limit_swap=false
export MESOS_cgroups_hierarchy=/sys/fs/cgroup
export MESOS_slave_subsystems=cpuacct,memory,blkio,cpuacct,cpuset,devices,freezer
{code}




[jira] [Commented] (MESOS-2044) Use one IP address per container for network isolation

2015-09-03 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730046#comment-14730046
 ] 

Charles Allen commented on MESOS-2044:
--

I'm a bit confused on why this needs to be so integrated into Mesos all through 
the stack instead of just being used as another type of Resource that any 
particular slave can expose, and exposing it as a pluggable resource on the 
slave. Then frameworks which know or care about such a resource can request it, 
and ones that don't know or care can simply ignore it.

From the proposals I've seen, this is trying to be a global resource that 
either must be supported by all nodes or not supported at all. Is that really 
required?

What use cases fail if per-container IP addresses are simply exposed as a slave 
resource?
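
To make the question concrete, here is a rough sketch of the alternative I have in mind. The offer layout and the `ip_addresses` resource name are invented for illustration and are not part of any Mesos API: a slave that has addresses advertises them like any other resource, frameworks that need one filter on it, and frameworks that don't simply ignore it.

```python
def usable_offers(offers, needs_ip):
    """Return the slaves whose offers satisfy this framework's IP requirement."""
    selected = []
    for offer in offers:
        has_ip = offer["resources"].get("ip_addresses", 0) > 0
        # IP-unaware frameworks accept every offer; IP-aware ones filter.
        if has_ip or not needs_ip:
            selected.append(offer["slave"])
    return selected


if __name__ == "__main__":
    offers = [
        {"slave": "s1", "resources": {"cpus": 4, "ip_addresses": 2}},
        {"slave": "s2", "resources": {"cpus": 8}},
    ]
    print(usable_offers(offers, needs_ip=True))
    print(usable_offers(offers, needs_ip=False))
```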

> Use one IP address per container for network isolation
> --
>
> Key: MESOS-2044
> URL: https://issues.apache.org/jira/browse/MESOS-2044
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cong Wang
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> If there are enough IP addresses, either IPv4 or IPv6, we should use one IP 
> address per container, instead of the ugly port range based solution. One 
> problem with this is IP address management. It is usually handled by a 
> DHCP server; maybe we need to manage the addresses in the Mesos master/slave.
> Also, maybe use macvlan instead of veth for better isolation.




[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-06-24 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600410#comment-14600410
 ] 

Charles Allen commented on MESOS-2340:
--

Is rolling back from 0.24 to 0.23, either during a failed rolling upgrade or 
immediately after one, something that is intended to be supported?

> Publish JSON in ZK instead of serialized MasterInfo
> ---
>
> Key: MESOS-2340
> URL: https://issues.apache.org/jira/browse/MESOS-2340
> Project: Mesos
>  Issue Type: Improvement
>  Components: leader election
>Reporter: Zameer Manji
>Assignee: Marco Massenzio
>
> Currently to discover the master a client needs the ZK node location and 
> access to the MasterInfo protobuf so it can deserialize the binary blob in 
> the node.
> I think it would be nice to publish JSON (like Twitter's ServerSets) so 
> clients are not tied to protobuf to do service discovery.
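
As a sketch of why JSON helps here: a client could resolve the master with nothing but a JSON parser. The `hostname` and `port` field names below are assumptions for illustration, not the schema Mesos publishes, and fetching the znode bytes from ZooKeeper is left out.

```python
import json


def master_endpoint(znode_data):
    """Parse the raw znode bytes and return 'host:port' for the master."""
    info = json.loads(znode_data.decode("utf-8"))
    # Field names are hypothetical; adjust to whatever is actually published.
    return "{}:{}".format(info["hostname"], info["port"])


if __name__ == "__main__":
    sample = b'{"hostname": "master1.example.com", "port": 5050}'
    print(master_endpoint(sample))
```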




[jira] [Commented] (MESOS-1796) Support multiple working paths

2014-09-15 Thread Charles Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134465#comment-14134465
 ] 

Charles Allen commented on MESOS-1796:
--

If there is a way to specify multiple working paths, then I have completely 
missed it in the docs.

> Support multiple working paths
> --
>
> Key: MESOS-1796
> URL: https://issues.apache.org/jira/browse/MESOS-1796
> Project: Mesos
>  Issue Type: Wish
>  Components: slave
>Reporter: Charles Allen
>Priority: Minor
>
> As a framework developer, I would like the ability to have multiple working 
> paths as part of a slave reporting its resources.
> Currently, if a slave (like an ec2 instance) has multiple disks, the disks 
> must be combined in a MD array or similar in order to be fully utilized in 
> Mesos. This ask is to allow multiple disks to be mounted on multiple paths, 
> and have the slave be able to support and report availability on these 
> various working paths.
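
A minimal sketch of the kind of per-path report being asked for, assuming each disk is mounted at its own path. The function name and the output layout are illustrative only; `shutil.disk_usage` is the standard-library call used to read capacity.

```python
import shutil


def disk_report(paths):
    """Return total and free bytes for each working path, in the order given."""
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)
        report.append({
            "path": path,
            "total_bytes": usage.total,
            "free_bytes": usage.free,
        })
    return report


if __name__ == "__main__":
    # Replace "." with the mount points of the individual disks.
    for entry in disk_report(["."]):
        print(entry)
```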



