[jira] [Updated] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")

2016-03-18 Thread Shuai Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Lin updated MESOS-4877:
-
Assignee: Gilbert Song  (was: Shuai Lin)

> Mesos containerizer can't handle top level docker image like "alpine" (must 
> use "library/alpine")
> -
>
> Key: MESOS-4877
> URL: https://issues.apache.org/jira/browse/MESOS-4877
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.27.0, 0.27.1
>Reporter: Shuai Lin
>Assignee: Gilbert Song
>
> This can be demonstrated with the {{mesos-execute}} command:
> # Docker containerizer with image {{alpine}}: success
> {code}
> sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=docker 
> --name=just-a-test --command="sleep 1000" --master=localhost:5050
> {code}
> # Mesos containerizer with image {{alpine}}: failure
> {code}
> sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=mesos 
> --name=just-a-test --command="sleep 1000" --master=localhost:5050
> {code}
> # Mesos containerizer with image {{library/alpine}}: success
> {code}
> sudo ./build/src/mesos-execute --docker_image=library/alpine 
> --containerizer=mesos --name=just-a-test --command="sleep 1000" 
> --master=localhost:5050
> {code}
> In the slave logs:
> {code}
> ea-4460-83
> 9c-838da86af34c-0007'
> I0306 16:32:41.418269  3403 metadata_manager.cpp:159] Looking for image 
> 'alpine:latest'
> I0306 16:32:41.418699  3403 registry_puller.cpp:194] Pulling image 
> 'alpine:latest' from 
> 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to 
> '/tmp/mesos-test
> /store/docker/staging/ka7MlQ'
> E0306 16:32:43.098131  3400 slave.cpp:3773] Container 
> '4bf9132d-9a57-4baa-a78c-e7164e93ace6' for executor 'just-a-test' of 
> framework 4f055c6f-1bea-4460-839c-838da86af34c-0
> 007 failed to start: Collect failed: Unexpected HTTP response '401 
> Unauthorized
> {code}
> curl command executed:
> {code}
> $ sudo sysdig -A -p "*%evt.time %proc.cmdline" evt.type=execve and 
> proc.name=curl
>16:42:53.198998042 curl -s -S -L -D - 
> https://registry-1.docker.io:443/v2/alpine/manifests/latest
> 16:42:53.784958541 curl -s -S -L -D - 
> https://auth.docker.io/token?service=registry.docker.io&scope=repository:alpine:pull
> 16:42:54.294192024 curl -s -S -L -D - -H Authorization: Bearer 
> eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCIsIng1YyI6WyJNSUlDTHpDQ0FkU2dBd0lCQWdJQkFEQUtCZ2dxaGtqT1BRUURBakJHTVVRd1FnWURWUVFERXp0Uk5Gb3pPa2RYTjBrNldGUlFSRHBJVFRSUk9rOVVWRmc2TmtGRlF6cFNUVE5ET2tGU01rTTZUMFkzTnpwQ1ZrVkJPa2xHUlVrNlExazFTekFlRncweE5UQTJNalV4T1RVMU5EWmFGdzB4TmpBMk1qUXhPVFUxTkRaYU1FWXhSREJDQmdOVkJBTVRPMGhHU1UwNldGZFZWam8yUVZkSU9sWlpUVEk2TTFnMVREcFNWREkxT2s5VFNrbzZTMVExUmpwWVRsSklPbFJMTmtnNlMxUkxOanBCUVV0VU1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXl2UzIvdEI3T3JlMkVxcGRDeFdtS1NqV1N2VmJ2TWUrWGVFTUNVMDByQjI0akNiUVhreFdmOSs0MUxQMlZNQ29BK0RMRkIwVjBGZGdwajlOWU5rL2pxT0JzakNCcnpBT0JnTlZIUThCQWY4RUJBTUNBSUF3RHdZRFZSMGxCQWd3QmdZRVZSMGxBREJFQmdOVkhRNEVQUVE3U0VaSlRUcFlWMVZXT2paQlYwZzZWbGxOTWpveldEVk1PbEpVTWpVNlQxTktTanBMVkRWR09saE9Va2c2VkVzMlNEcExWRXMyT2tGQlMxUXdSZ1lEVlIwakJEOHdQWUE3VVRSYU16cEhWemRKT2xoVVVFUTZTRTAwVVRwUFZGUllPalpCUlVNNlVrMHpRenBCVWpKRE9rOUdOemM2UWxaRlFUcEpSa1ZKT2tOWk5Vc3dDZ1lJS29aSXpqMEVBd0lEU1FBd1JnSWhBTXZiT2h4cHhrTktqSDRhMFBNS0lFdXRmTjZtRDFvMWs4ZEJOVGxuWVFudkFpRUF0YVJGSGJSR2o4ZlVSSzZ4UVJHRURvQm1ZZ3dZelR3Z3BMaGJBZzNOUmFvPSJdfQ.eyJhY2Nlc3MiOltdLCJhdWQiOiJyZWdpc3RyeS5kb2NrZXIuaW8iLCJleHAiOjE0NTcyODI4NzQsImlhdCI6MTQ1NzI4MjU3NCwiaXNzIjoiYXV0aC5kb2NrZXIuaW8iLCJqdGkiOiJaOGtyNXZXNEJMWkNIRS1IcVJIaCIsIm5iZiI6MTQ1NzI4MjU3NCwic3ViIjoiIn0.C2wtJq_P-m0buPARhmQjDfh6ztIAhcvgN3tfWIZEClSgXlVQ_sAQXAALNZKwAQL2Chj7NpHX--0GW-aeL_28Aw
>  https://registry-1.docker.io:443/v2/alpine/manifests/latest
> {code}
> Also got the same result with the {{ubuntu}} docker image.
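The {{401 Unauthorized}} follows from the unqualified image name: Docker Hub keeps official images under the implicit {{library}} namespace, so the token request above asks for scope {{repository:alpine:pull}} where the registry expects {{repository:library/alpine:pull}}. A minimal sketch of the missing normalization, with a hypothetical helper name rather than the actual registry puller code:
{code}
#include <string>

// Docker Hub hosts "official" images under the implicit "library"
// namespace. A single-component name such as "alpine" must be
// expanded to "library/alpine" before building the manifest URL and
// the token scope. Hypothetical helper, for illustration only.
std::string normalizeRepository(const std::string& repository)
{
  if (repository.find('/') == std::string::npos) {
    return "library/" + repository;
  }
  return repository;
}
{code}
With that applied, the puller would request {{https://registry-1.docker.io:443/v2/library/alpine/manifests/latest}} and a token scoped to {{repository:library/alpine:pull}}, which is effectively what the Docker containerizer already does.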





[jira] [Commented] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")

2016-03-18 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202611#comment-15202611
 ] 

Shuai Lin commented on MESOS-4877:
--

Never mind! I'll reassign this ticket to you.

> Mesos containerizer can't handle top level docker image like "alpine" (must 
> use "library/alpine")
> -
>
> Key: MESOS-4877
> URL: https://issues.apache.org/jira/browse/MESOS-4877
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.27.0, 0.27.1
>Reporter: Shuai Lin
>Assignee: Shuai Lin
>





[jira] [Commented] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")

2016-03-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199862#comment-15199862
 ] 

Gilbert Song commented on MESOS-4877:
-

Sorry [~lins05], I addressed those TODOs before I saw this JIRA. Would you mind 
taking those patches?

https://reviews.apache.org/r/44672/

> Mesos containerizer can't handle top level docker image like "alpine" (must 
> use "library/alpine")
> -
>
> Key: MESOS-4877
> URL: https://issues.apache.org/jira/browse/MESOS-4877
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.27.0, 0.27.1
>Reporter: Shuai Lin
>Assignee: Shuai Lin
>





[jira] [Updated] (MESOS-4979) os::rmdir does not handle special files (e.g., device, socket).

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4979:
--
Component/s: stout

> os::rmdir does not handle special files (e.g., device, socket).
> ---
>
> Key: MESOS-4979
> URL: https://issues.apache.org/jira/browse/MESOS-4979
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Affects Versions: 0.19.0, 0.20.0, 0.21.0, 0.22.0, 0.23.0, 0.24.0, 0.25.0, 
> 0.26.0, 0.27.0, 0.27.1, 0.27.2
>Reporter: Jie Yu
>Assignee: Jojy Varghese
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Stout os::rmdir does not handle special files like device files or socket 
> files. This could cause failures when garbage-collecting sandboxes.
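For reference, a minimal sketch of a recursive removal that copes with special files: anything that is not a directory (regular file, symlink, device, socket, FIFO) is removed with {{unlink}}, and only real directories recurse. Illustrative code, not the actual stout implementation:
{code}
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

#include <string>

static bool rmdirRecursive(const std::string& path)
{
  struct stat s;
  if (::lstat(path.c_str(), &s) < 0) {
    return false;
  }

  // unlink(2) removes special files (devices, sockets, FIFOs) and
  // symlinks just like regular files; only directories need rmdir(2).
  if (!S_ISDIR(s.st_mode)) {
    return ::unlink(path.c_str()) == 0;
  }

  DIR* dir = ::opendir(path.c_str());
  if (dir == nullptr) {
    return false;
  }

  struct dirent* entry;
  while ((entry = ::readdir(dir)) != nullptr) {
    const std::string name = entry->d_name;
    if (name == "." || name == "..") {
      continue;
    }
    if (!rmdirRecursive(path + "/" + name)) {
      ::closedir(dir);
      return false;
    }
  }

  ::closedir(dir);
  return ::rmdir(path.c_str()) == 0;
}
{code}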





[jira] [Updated] (MESOS-3573) Mesos does not kill orphaned docker containers

2016-03-18 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3573:
--
Shepherd: Timothy Chen

> Mesos does not kill orphaned docker containers
> --
>
> Key: MESOS-3573
> URL: https://issues.apache.org/jira/browse/MESOS-3573
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, slave
>Reporter: Ian Babrou
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> After upgrading to 0.24.0 we noticed hanging containers appearing. It looks 
> like there were changes between 0.23.0 and 0.24.0 that broke cleanup.
> Here's how to trigger this bug:
> 1. Deploy app in docker container.
> 2. Kill corresponding mesos-docker-executor process
> 3. Observe hanging container
> Here are the logs after kill:
> {noformat}
> slave_1| I1002 12:12:59.362002  7791 docker.cpp:1576] Executor for 
> container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' has exited
> slave_1| I1002 12:12:59.362284  7791 docker.cpp:1374] Destroying 
> container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8'
> slave_1| I1002 12:12:59.363404  7791 docker.cpp:1478] Running docker stop 
> on container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8'
> slave_1| I1002 12:12:59.363876  7791 slave.cpp:3399] Executor 
> 'sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c' of framework 
> 20150923-122130-2153451692-5050-1- terminated with signal Terminated
> slave_1| I1002 12:12:59.367570  7791 slave.cpp:2696] Handling status 
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- from @0.0.0.0:0
> slave_1| I1002 12:12:59.367842  7791 slave.cpp:5094] Terminating task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c
> slave_1| W1002 12:12:59.368484  7791 docker.cpp:986] Ignoring updating 
> unknown container: f083aaa2-d5c3-43c1-b6ba-342de8829fa8
> slave_1| I1002 12:12:59.368671  7791 status_update_manager.cpp:322] 
> Received status update TASK_FAILED (UUID: 
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> slave_1| I1002 12:12:59.368741  7791 status_update_manager.cpp:826] 
> Checkpointing UPDATE for status update TASK_FAILED (UUID: 
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> slave_1| I1002 12:12:59.370636  7791 status_update_manager.cpp:376] 
> Forwarding update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) 
> for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- to the slave
> slave_1| I1002 12:12:59.371335  7791 slave.cpp:2975] Forwarding the 
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- to master@172.16.91.128:5050
> slave_1| I1002 12:12:59.371908  7791 slave.cpp:2899] Status update 
> manager successfully handled status update TASK_FAILED (UUID: 
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> master_1   | I1002 12:12:59.37204711 master.cpp:4069] Status update 
> TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- from slave 
> 20151002-120829-2153451692-5050-1-S0 at slave(1)@172.16.91.128:5051 
> (172.16.91.128)
> master_1   | I1002 12:12:59.37253411 master.cpp:4108] Forwarding status 
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> master_1   | I1002 12:12:59.37301811 master.cpp:5576] Updating the latest 
> state of task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- to TASK_FAILED
> master_1   | I1002 12:12:59.37344711 hierarchical.hpp:814] Recovered 
> cpus(*):0.1; mem(*):16; ports(*):[31685-31685] (total: cpus(*):4; 
> mem(*):1001; disk(*):52869; ports(*):[31000-32000], allocated: 
> cpus(*):8.32667e-17) on slave 20151002-120829-2153451692-5050-1-S0 from 
> framework 20150923-122130-2153451692-5050-1-
> {noformat}
> Another issue: if you restart mesos-slave on a host with orphaned docker 
> containers, they do not get killed. This was the case before as well; I had 
> hoped this trick would kill the hanging containers, but it doesn't work now.
> Marking this as critical because it hoards cluster resources and blocks 
> scheduling.




[jira] [Commented] (MESOS-4033) Add a commit hook for non-ascii characters

2016-03-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201985#comment-15201985
 ] 

haosdent commented on MESOS-4033:
-

I have a question about this ticket: should we reject only those zero-width 
characters, or all non-ASCII characters? If we disable non-ASCII characters, we 
could not use emoji or languages that contain non-ASCII characters. For 
powered-by-mesos.md and user-groups.html.md, this hard limit seems 
inconvenient.

> Add a commit hook for non-ascii characters
> ---
>
> Key: MESOS-4033
> URL: https://issues.apache.org/jira/browse/MESOS-4033
> Project: Mesos
>  Issue Type: Task
>Reporter: Alexander Rukletsov
>Assignee: Yong Tang
>Priority: Minor
>  Labels: mesosphere
>
> Non-ascii characters invisible in some editors may sneak into the codebase 
> (see e.g. https://reviews.apache.org/r/40799/). To avoid this, a pre-commit 
> hook can be added.
> Quick searching suggested a simple perl script: 
> https://superuser.com/questions/417305/how-can-i-identify-non-ascii-characters-from-the-shell
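For illustration, a small C++ program implementing the check such a hook could run over the staged files; the argument and exit-code conventions are assumptions:
{code}
#include <cstdio>
#include <fstream>
#include <string>

// Scans each file named on the command line and reports every byte
// outside the 7-bit ASCII range. A non-zero exit would abort the
// commit when wired into a pre-commit hook.
int main(int argc, char** argv)
{
  bool clean = true;

  for (int i = 1; i < argc; i++) {
    std::ifstream file(argv[i], std::ios::binary);
    std::string line;
    for (int lineno = 1; std::getline(file, line); lineno++) {
      for (std::size_t col = 0; col < line.size(); col++) {
        const unsigned char c = line[col];
        if (c > 127) {
          std::printf("%s:%d:%zu: non-ASCII byte 0x%02x\n",
                      argv[i], lineno, col + 1, c);
          clean = false;
        }
      }
    }
  }

  return clean ? 0 : 1;
}
{code}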





[jira] [Commented] (MESOS-4621) --disable-optimize triggers optimized builds.

2016-03-18 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197746#comment-15197746
 ] 

Yong Tang commented on MESOS-4621:
--

I took a look at the review request. It feels like it is probably too big to be 
reviewed in one pass. Maybe it would be better if it were decomposed into 
several RRs?

> --disable-optimize triggers optimized builds.
> -
>
> Key: MESOS-4621
> URL: https://issues.apache.org/jira/browse/MESOS-4621
> Project: Mesos
>  Issue Type: Bug
>Reporter: Till Toenshoff
>Assignee: Yong Tang
>Priority: Minor
>
> The toggle-logic of the build configuration argument {{optimize}} appears to 
> be implemented incorrectly. When using the perfectly legal invocation:
> {noformat}
> ../configure --disable-optimize
> {noformat}
> What you actually get is optimization enabled ({{O2}}).
> {noformat}
> ccache g++ -Qunused-arguments -fcolor-diagnostics 
> -DPACKAGE_NAME=\"libprocess\" -DPACKAGE_TARNAME=\"libprocess\" 
> -DPACKAGE_VERSION=\"0.0.1\" -DPACKAGE_STRING=\"libprocess\ 0.0.1\" 
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"libprocess\" 
> -DVERSION=\"0.0.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
> -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
> -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
> -DLT_OBJDIR=\".libs/\" -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBCURL=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
> -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBDL=1 -I. 
> -I../../../../3rdparty/libprocess/3rdparty  
> -I../../../../3rdparty/libprocess/3rdparty/stout/include -Iprotobuf-2.5.0/src 
>  -Igmock-1.7.0/gtest/include -Igmock-1.7.0/include -isystem boost-1.53.0 
> -Ipicojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -Iglog-0.3.3/src 
> -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
> -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 
> -I/usr/include/apr-1.0   -O2 -Wno-unused-local-typedef -std=c++11 
> -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT 
> stout_tests-flags_tests.o -MD -MP -MF .deps/stout_tests-flags_tests.Tpo -c -o 
> stout_tests-flags_tests.o `test -f 'stout/tests/flags_tests.cpp' || echo 
> '../../../../3rdparty/libprocess/3rdparty/'`stout/tests/flags_tests.cpp
> {noformat}
> It would be more intuitive for the above argument to actually disable 
> optimization.





[jira] [Updated] (MESOS-4959) Enable support for mesos-style assertion macros in clang-tidy core analyzers

2016-03-18 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4959:

Description: 
clang-tidy has a number of core analyzers that analyze control flow to make 
sure that e.g., dereferenced pointers are not null. The clang control flow 
analysis framework uses e.g., the presence of {{assert}} to prune certain edges 
from the control flow graph.

Mesos uses a number of custom assertion macros from glog which are not 
understood by these analyzers. We should find a way to add support for these 
macros, either by redefining these macros in ways clang static analysis can 
understand, or by extending the framework.

  was:
clang-tidy has a number of core analyzers that analyze control flow to make 
sure that e.g., dereferenced pointers are not null. The clang control flow 
analysis framework uses e.g., the presence of `assert` to prune certain edges 
from the control flow graph.

Mesos uses a number of custom assertion macros from glog which are not 
understood by these analyzers. We should find a way to add support for these 
macros, either by redefining these macros in ways clang static analysis can 
understand, or by extending the framework.


> Enable support for mesos-style assertion macros in clang-tidy core analyzers
> 
>
> Key: MESOS-4959
> URL: https://issues.apache.org/jira/browse/MESOS-4959
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>
> clang-tidy has a number of core analyzers that analyze control flow to make 
> sure that e.g., dereferenced pointers are not null. The clang control flow 
> analysis framework uses e.g., the presence of {{assert}} to prune certain 
> edges from the control flow graph.
> Mesos uses a number of custom assertion macros from glog which are not 
> understood by these analyzers. We should find a way to add support for these 
> macros, either by redefining these macros in ways clang static analysis can 
> understand, or by extending the framework.
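A sketch of the "redefine the macros" option: the clang static analyzer defines {{__clang_analyzer__}}, so the glog-style macros can be given an analyzer-only definition whose failure branch visibly does not return, letting the analyzer prune that edge exactly as it does for {{assert}}. An illustration of the idea, not a vetted patch:
{code}
#include <cstdlib>

#ifdef __clang_analyzer__
// Analyzer-only definition: the failure path does not return, so
// control-flow analysis drops it, as it does for assert(3).
#define CHECK(condition) \
  do { if (!(condition)) { ::abort(); } } while (false)
#else
#include <glog/logging.h>  // Normal builds keep glog's CHECK.
#endif

int dereference(int* p)
{
  CHECK(p != nullptr);
  return *p;  // The analyzer now knows p is non-null here.
}
{code}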





[jira] [Commented] (MESOS-4740) Improve master metrics/snapshot performance

2016-03-18 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199863#comment-15199863
 ] 

Michael Park commented on MESOS-4740:
-

{noformat}
commit 8be869abab468706274e435247e8e22eef0dd0a0
Author: Cong Wang 
Date:   Thu Mar 17 12:14:00 2016 -0400

Updated `/metrics/snapshot` endpoint to use `jsonify`.

Review: https://reviews.apache.org/r/44675/
{noformat}
NOTE: this was committed primarily under: https://issues.apache.org/jira/browse/MESOS-4732

> Improve master metrics/snapshot performance
> --
>
> Key: MESOS-4740
> URL: https://issues.apache.org/jira/browse/MESOS-4740
> Project: Mesos
>  Issue Type: Task
>Reporter: Cong Wang
>Assignee: Cong Wang
>
> [~drobinson] noticed retrieving metrics/snapshot statistics could be very 
> inefficient.
> {noformat}
> [user@server ~]$ time curl -s localhost:5050/metrics/snapshot
> real  0m35.654s
> user  0m0.019s
> sys   0m0.011s
> {noformat}
> MESOS-1287 introduced a timeout parameter for this query, but metric 
> collectors like ours are not aware of such a URL-specific parameter, so:
> 1) We should always have a timeout and give it a sensible default value.
> 2) We should investigate why master metrics/snapshot can take such a long 
> time to complete under load.





[jira] [Commented] (MESOS-4976) Reject RESERVE on revocable resources

2016-03-18 Thread Jian Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201188#comment-15201188
 ] 

Jian Qiu commented on MESOS-4976:
-

It has been validated in the master:
https://github.com/apache/mesos/blob/master/src/master/validation.cpp#L151
Not sure whether it still needs to be checked in the allocator.
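For context, a toy model of the check performed there, using a stand-in struct instead of the Mesos protobufs:
{code}
#include <string>
#include <vector>

struct Resource
{
  std::string name;
  bool revocable;
};

// Returns an error message, or the empty string if the RESERVE
// operation is valid. Mirrors the shape of the master-side
// validation; not the actual Mesos code.
std::string validateReserve(const std::vector<Resource>& resources)
{
  for (const Resource& resource : resources) {
    if (resource.revocable) {
      return "Cannot reserve revocable resource '" + resource.name + "'";
    }
  }
  return "";
}
{code}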

> Reject RESERVE on revocable resources
> -
>
> Key: MESOS-4976
> URL: https://issues.apache.org/jira/browse/MESOS-4976
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Klaus Ma
>
> In {{Resources::apply}}, we did not check whether the resources are revocable 
> or not. It does not make sense to reserve revocable resources.





[jira] [Commented] (MESOS-4963) Compile error with GCC 6

2016-03-18 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199232#comment-15199232
 ] 

Benjamin Bannier commented on MESOS-4963:
-

If one turns off silent rules, one sees:
{code}
/Applications/Xcode.app/Contents/Developer/usr/bin/make  all-am
/bin/sh ../libtool  --tag=CXX   --mode=compile ccache g++-6 
-DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"0.29.0\" -DPACKAGE_STRING=\"mesos\ 0.29.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"0.29.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 
-DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
-DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
-DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. -I../../src   -Wall -Werror 
-DLIBDIR=\"/usr/local/lib\" -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
-DPKGDATADIR=\"/usr/local/share/mesos\" 
-DPKGMODULEDIR=\"/usr/local/lib/mesos/modules\" -I../../include 
-I../../3rdparty/libprocess/include 
-I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
-I../include/mesos -isystem ../3rdparty/libprocess/3rdparty/boost-1.53.0 
-I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
-I../3rdparty/leveldb-1.4/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
-I../3rdparty/zookeeper-3.4.5/src/c/generated 
-I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-I/Users/XYZ/src/homebrew/opt/openssl/include 
-I/Users/XYZ/src/homebrew/opt/libevent/include 
-I/Users/XYZ/src/homebrew/opt/subversion/include/subversion-1 
-I/usr/include/apr-1 -I/usr/include/apr-1.0  -D_THREAD_SAFE -pthread -g -O2 
-Wno-unused-local-typedefs -Wno-maybe-uninitialized -DGTEST_USE_OWN_TR1_TUPLE=1 
-DGTEST_LANG_CXX11 -MT appc/libmesos_no_3rdparty_la-spec.lo -MD -MP -MF 
appc/.deps/libmesos_no_3rdparty_la-spec.Tpo -c -o 
appc/libmesos_no_3rdparty_la-spec.lo `test -f 'appc/spec.cpp' || echo 
'../../src/'`appc/spec.cpp
libtool: compile:  ccache g++-6 -DPACKAGE_NAME=\"mesos\" 
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.29.0\" 
"-DPACKAGE_STRING=\"mesos 0.29.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" 
-DPACKAGE=\"mesos\" -DVERSION=\"0.29.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. -I../../src 
-Wall -Werror -DLIBDIR=\"/usr/local/lib\" 
-DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
-DPKGDATADIR=\"/usr/local/share/mesos\" 
-DPKGMODULEDIR=\"/usr/local/lib/mesos/modules\" -I../../include 
-I../../3rdparty/libprocess/include 
-I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
-I../include/mesos -isystem ../3rdparty/libprocess/3rdparty/boost-1.53.0 
-I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
-I../3rdparty/leveldb-1.4/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
-I../3rdparty/zookeeper-3.4.5/src/c/generated 
-I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-I/Users/XYZ/src/homebrew/opt/openssl/include 
-I/Users/XYZ/src/homebrew/opt/libevent/include 
-I/Users/XYZ/src/homebrew/opt/subversion/include/subversion-1 
-I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O2 
-Wno-unused-local-typedefs -Wno-maybe-uninitialized -DGTEST_USE_OWN_TR1_TUPLE=1 
-DGTEST_LANG_CXX11 -MT appc/libmesos_no_3rdparty_la-spec.lo -MD -MP -MF 
appc/.deps/libmesos_no_3rdparty_la-spec.Tpo -c ../../src/appc/spec.cpp  
-fno-common -DPIC -o appc/.libs/libmesos_no_3rdparty_la-spec.o
In file included from 
../../3rdparty/libprocess/3rdparty/stout/include/stout/os/shell.hpp:22:0,
 from 
../../3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:56,
 from ../../src/appc/spec.cpp:17:
../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/shell.hpp: In 
instantiation of 'int os::execlp(const char*, T ...) [with T = {const char*, 
const char*, const char*, char*}]':
../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/fork.hpp:371:5

[jira] [Commented] (MESOS-4621) --disable-optimize triggers optimized builds.

2016-03-18 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197471#comment-15197471
 ] 

Yong Tang commented on MESOS-4621:
--

Added a review request:
https://reviews.apache.org/r/44911/

The issue was that, in the original configure.ac,
{code}
AC_ARG_ENABLE([optimize],
  AS_HELP_STRING(...),
  [enable_optimize=yes], [])
{code}
The third field is "action-if-present"; it runs for any of:
# --enable-optimize
# --enable-optimize=yes
# --enable-optimize=no
# --disable-optimize
Yet the original code always set "[enable_optimize=yes]" there, so even 
--disable-optimize ended up enabling optimization.

This can be fixed by simply replacing "[enable_optimize=yes]" with "[]", since 
by default AC_ARG_ENABLE sets the value of "enable_optimize" correctly anyway.

> --disable-optimize triggers optimized builds.
> -
>
> Key: MESOS-4621
> URL: https://issues.apache.org/jira/browse/MESOS-4621
> Project: Mesos
>  Issue Type: Bug
>Reporter: Till Toenshoff
>Assignee: Yong Tang
>Priority: Minor
>





[jira] [Commented] (MESOS-3902) The Location header when non-leading master redirects to leading master is incomplete.

2016-03-18 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199959#comment-15199959
 ] 

Vinod Kone commented on MESOS-3902:
---

Great to hear. There are no redirect-related tests because bringing up two live 
masters is not currently possible with our testing abstractions :/ I'm not sure 
if that changed with the recent refactor that [~kaysoky] did. Feel free to send 
the review without a test. Also, you can reach me on the #mesos IRC channel for 
quick questions.

> The Location header when non-leading master redirects to leading master is 
> incomplete.
> --
>
> Key: MESOS-3902
> URL: https://issues.apache.org/jira/browse/MESOS-3902
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, master
>Affects Versions: 0.25.0
> Environment: 3 masters, 10 slaves
>Reporter: Ben Whitehead
>Assignee: Ashwin Murthy
>  Labels: mesosphere
>
> The master now sets a Location header, but it's incomplete: the path of the 
> URL isn't set. Consider an example:
> {code}
> > cat /tmp/subscribe-1072944352375841456 | httpp POST 
> > 127.1.0.3:5050/api/v1/scheduler Content-Type:application/x-protobuf
> POST /api/v1/scheduler HTTP/1.1
> Accept: application/json
> Accept-Encoding: gzip, deflate
> Connection: keep-alive
> Content-Length: 123
> Content-Type: application/x-protobuf
> Host: 127.1.0.3:5050
> User-Agent: HTTPie/0.9.0
> +-----------------------------------------+
> | NOTE: binary data not shown in terminal |
> +-----------------------------------------+
> HTTP/1.1 307 Temporary Redirect
> Content-Length: 0
> Date: Fri, 26 Feb 2016 00:54:41 GMT
> Location: //127.1.0.1:5050
> {code}
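A sketch of the expected behavior: the redirect should preserve the path (and query string, if any) of the original request so the scheduler can replay its POST against the leader verbatim. The names below are hypothetical stand-ins, not the master's actual code:
{code}
#include <string>

// Builds a scheme-relative redirect target, e.g.
//   redirectLocation("127.1.0.1:5050", "/api/v1/scheduler")
//     => "//127.1.0.1:5050/api/v1/scheduler"
std::string redirectLocation(
    const std::string& leaderHostport,
    const std::string& requestPath)
{
  return "//" + leaderHostport + requestPath;
}
{code}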





[jira] [Created] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.

2016-03-18 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4985:
-

 Summary: Destroy a container while it's provisioning can lead to 
leaked provisioned directories.
 Key: MESOS-4985
 URL: https://issues.apache.org/jira/browse/MESOS-4985
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.28.0
Reporter: Jie Yu
Priority: Critical
 Fix For: 0.28.1


Here is the possible sequence of events:
1) containerizer->launch
2) provisioner->provision is called; it starts fetching the image
3) executor registration timed out
4) containerizer->destroy is called
5) container->state is still in PREPARING
6) provisioner->destroy is called

So we can be calling provisioner->destroy while provisioner->provision hasn't 
finished yet. provisioner->destroy might simply skip the container since there 
is no information about it yet, and later the provisioner will prepare the root 
filesystem. That root filesystem will never be destroyed, because the destroy 
has already finished.
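A toy illustration of the fix direction, with {{std::future}} standing in for the provisioner's real asynchronous interface: destroy must wait for (or chain onto) any in-flight provision so the prepared root filesystem is always visible to the cleanup path:
{code}
#include <future>
#include <iostream>
#include <string>

struct Container
{
  std::future<std::string> provisioning;  // Yields the rootfs path.
};

void destroy(Container& container)
{
  // Block until the in-flight provision settles instead of skipping;
  // in Mesos this would be an asynchronous continuation on a
  // process::Future rather than a blocking get().
  const std::string rootfs = container.provisioning.get();
  std::cout << "removing provisioned rootfs: " << rootfs << std::endl;
}
{code}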





[jira] [Commented] (MESOS-1607) Introduce optimistic offers.

2016-03-18 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199503#comment-15199503
 ] 

Klaus Ma commented on MESOS-1607:
-

I updated this JIRA to align with its description. The feature we're working on 
is Oversubscription for Reservation, which has moved to MESOS-4967.

Thanks
Klaus

> Introduce optimistic offers.
> 
>
> Key: MESOS-1607
> URL: https://issues.apache.org/jira/browse/MESOS-1607
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, framework, master
>Reporter: Benjamin Hindman
>Assignee: Artem Harutyunyan
>  Labels: mesosphere
> Attachments: optimisitic-offers.pdf
>
>
> *Background*
> The current implementation of resource offers only enable a single framework 
> scheduler to make scheduling decisions for some available resources at a 
> time. In some circumstances, this is good, i.e., when we don't want other 
> framework schedulers to have access to some resources. However, in other 
> circumstances, there are advantages to letting multiple framework schedulers 
> attempt to make scheduling decisions for the _same_ allocation of resources 
> in parallel.
> If you think about this from a "concurrency control" perspective, the current 
> implementation of resource offers is _pessimistic_, the resources contained 
> within an offer are _locked_ until the framework scheduler that they were 
> offered to launches tasks with them or declines them. In addition to making 
> pessimistic offers we'd like to give out _optimistic_ offers, where the same 
> resources are offered to multiple framework schedulers at the same time, and 
> framework schedulers "compete" for those resources on a 
> first-come-first-serve basis (i.e., the first to launch a task "wins"). We've 
> always reserved the right to rescind resource offers using the 'rescind' 
> primitive in the API, and a framework scheduler should be prepared to launch 
> a task and have those tasks go lost because another framework already started 
> to use those resources.
> *Feature*
> We plan to take a step towards optimistic offers, by introducing primitives 
> that allow resources to be offered to multiple frameworks at once.  At first, 
> we will use these primitives to optimistically allocate resources that are 
> reserved for a particular framework/role but have not been allocated by that 
> framework/role.  
> The work with optimistic offers will closely resemble the existing 
> oversubscription feature.  Optimistically offered resources are likely to be 
> considered "revocable resources" (the concept that using resources not 
> reserved for you means you might get those resources revoked). In effect, we 
> may create something like a "spot" market for unused resources, driving 
> up utilization by letting frameworks that are willing to use revocable 
> resources run tasks.
> *Future Work*
> This ticket tracks the introduction of some aspects of optimistic offers.  
> Taken to the limit, one could imagine always making optimistic resource 
> offers. This bears a striking resemblance with the Google Omega model (an 
> isomorphism even). However, being able to configure what resources should be 
> allocated optimistically and what resources should be allocated 
> pessimistically gives even more control to a datacenter/cluster operator that 
> might want to, for example, never let multiple frameworks (roles) compete for 
> some set of resources.





[jira] [Created] (MESOS-4959) Enable support for mesos-style assertion macros in clang-tidy core analyzers

2016-03-18 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4959:
---

 Summary: Enable support for mesos-style assertion macros in 
clang-tidy core analyzers
 Key: MESOS-4959
 URL: https://issues.apache.org/jira/browse/MESOS-4959
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Bannier







[jira] [Commented] (MESOS-4828) XFS disk quota isolator

2016-03-18 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198694#comment-15198694
 ] 

James Peach commented on MESOS-4828:


Based on feedback from [~xujyan], I discarded the previous review request and 
restructured the branch into a series of smaller commits. I hope that this 
makes it easier to digest.

https://reviews.apache.org/r/44945/
https://reviews.apache.org/r/44946/
https://reviews.apache.org/r/44947/
https://reviews.apache.org/r/44948/
https://reviews.apache.org/r/44949/
https://reviews.apache.org/r/44950/

> XFS disk quota isolator
> ---
>
> Key: MESOS-4828
> URL: https://issues.apache.org/jira/browse/MESOS-4828
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: James Peach
>Assignee: James Peach
>
> Implement a disk resource isolator using XFS project quotas. Compared to the 
> {{posix/disk}} isolator, this doesn't need to scan the filesystem 
> periodically, and applications receive an {{ENOSPC}} error instead of being 
> summarily killed.
> This initial implementation only isolates sandbox directory resources, since 
> isolation doesn't have any visibility into the lifecycle of volumes, 
> which is needed to assign and track project IDs.
> The build dependencies for this are XFS header (from xfsprogs-devel) and 
> libblkid. We need libblkid or the equivalent to map filesystem paths to block 
> devices in order to apply quota.
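For reference, a rough sketch of the quota-setting step on Linux, assuming {{<linux/dqblk_xfs.h>}} and eliding error handling. The project ID would be attached to the sandbox directory separately (via {{FS_IOC_FSSETXATTR}}), and the block device path is what libblkid resolves from the sandbox path:
{code}
#include <sys/quota.h>
#include <sys/types.h>
#include <linux/dqblk_xfs.h>

#include <cstring>
#include <string>

// Caps the block usage of an XFS project; once the limit is hit the
// application sees ENOSPC instead of being killed.
bool setProjectQuota(
    const std::string& blockDevice,  // e.g. "/dev/sdb1" (from libblkid).
    unsigned int projectId,
    unsigned long long limitBytes)
{
  fs_disk_quota_t quota;
  std::memset(&quota, 0, sizeof(quota));

  quota.d_version = FS_DQUOT_VERSION;
  quota.d_id = projectId;
  quota.d_flags = FS_PROJ_QUOTA;
  quota.d_fieldmask = FS_DQ_BHARD;
  quota.d_blk_hardlimit = limitBytes / 512;  // Limits are in 512-byte blocks.

  return ::quotactl(
      QCMD(Q_XSETQLIM, PRJQUOTA),
      blockDevice.c_str(),
      projectId,
      reinterpret_cast<caddr_t>(&quota)) == 0;
}
{code}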





[jira] [Commented] (MESOS-4969) improve overlayfs detection

2016-03-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200046#comment-15200046
 ] 

haosdent commented on MESOS-4969:
-

How about using {{lsmod}} to detect this? CentOS 7 is the same in this regard.

> improve overlayfs detection
> ---
>
> Key: MESOS-4969
> URL: https://issues.apache.org/jira/browse/MESOS-4969
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, volumes
>Reporter: James Peach
>Priority: Minor
>
> On my Fedora 23, overlayfs is a module that is not loaded by default 
> (attempting to mount an overlayfs automatically triggers the module loading). 
> However {{mesos-slave}} won't start until I manually load the module, since it 
> is not listed in {{/proc/filesystems}} until it is loaded.
> It would be nice if there was a more reliable way to determine overlayfs 
> support.





[jira] [Updated] (MESOS-4971) Add unit tests for MOUNT persistent volumes

2016-03-18 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4971:
---
Assignee: Joris Van Remoortere

> Add unit tests for MOUNT persistent volumes
> ---
>
> Key: MESOS-4971
> URL: https://issues.apache.org/jira/browse/MESOS-4971
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: Neil Conway
>Assignee: Joris Van Remoortere
>  Labels: mesosphere, persistent-volumes, test
>
> We currently have unit tests for root and {{PATH}} disk types, but not 
> {{MOUNT}} disks.





[jira] [Commented] (MESOS-4772) TaskInfo/ExecutorInfo should include owner information

2016-03-18 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199191#comment-15199191
 ] 

Adam B commented on MESOS-4772:
---

Sorry for the hasty decision to add a TaskInfo.owner. After more discussion, 
[~vinodkone] and I now favor a namespace approach (similar to Marathon's 
app-groups or [~jdef]'s suggestion), instead of the previously discussed 
task-owner approach. Also notice that I've linked this ticket under the Epic 
MESOS-4931 where [~js84] is designing authorization-based filtering of state 
endpoints. Let's discard [~nfnt]'s previous patch while we confirm what we 
really want to do. Allow me to explain some of the recent thinking. 

What we have now: flat roles
1. Each framework can only have a single role, but multiple frameworks can 
share a role. A role does not have a parent role.
2. Resources may be reserved for a role, and volumes created only within the 
resources reserved for a role.
3. Quota and DRF weights may be assigned to each role, and resources for a 
framework are allocated to its role.
4. Problem: Every http user can see every framework/role's tasks and sandbox 
data.

We could (and should) add coarse-grained access control:
1. Add Authz around state json to restrict which users can see 
tasks/executors/frameworks from which roles.
2. Add Authz around sandbox access to restrict which users can see executor 
sandboxes belonging to which roles.
But coarse-grained per-role access control doesn't help the multi-user 
framework use case, since every user in the framework can see the entire 
role/framework.

We could (but shouldn't) have multi-user frameworks pass TaskInfo.owner, as 
previously proposed: 
1. Add Authz around state json to restrict which users can see 
tasks/executors/frameworks owned by which (other) users. 
2. Add Authz around sandbox access to restrict which users can see executor 
sandboxes owned by which (other) users.
TaskInfo.owner provides appropriate fine-grained access control if you only 
want a user to ever see their own tasks. But as soon as you want to share one 
user's task/sandbox with another, you are forced to grant the second user 
access to all tasks/sandboxes owned by the first user. This is not flexible 
enough for most real-world use cases, where users work on some projects 
together but not others.

In the long run, we've discussed hierarchies both above and below roles.
1. Hierarchical framework groups (outer roles): Although a framework only has a 
single role, that role may belong to another role, up a role hierarchy.
2. These hierarchical outer roles would be great for organizing quota and DRF 
weights across many frameworks.
3. Hierarchical task groups (inner roles): Although a framework has a single 
role, it can dynamically create roles underneath its role in the hierarchy.
4. These inner roles would be great for organizing quota/weights between groups 
of apps/projects/jobs in a framework.
5. These inner roles would be ideal for associating reserved resources and 
volumes to specific apps/projects/jobs, so the framework has less bookkeeping 
to do.
6. These inner roles could be used to grant users visibility of one group of 
apps/projects/jobs without granting access to all others with the same 
owner/role. 
The above situation would be great, but there are many pieces to that puzzle, 
and it'll take us a while to get there.

We could start by introducing TaskInfo.group (name TBD) for visibility, but not 
reservations/volumes/quota/DRF.
1. Multi-user frameworks would pass Mesos the app/project/job name (including 
hierarchy) when creating a task.
2. Add Authz around state json to restrict which users can see 
tasks/executors/frameworks from which "role:taskgroup".
3. Add Authz around sandbox access to restrict which users can see executor 
sandboxes belonging to which "role:taskgroup".
This model allows admins to assign users to projects within frameworks as they 
please, and those users will only be able to see the tasks/sandboxes for their 
projects. A particular project may be visible only to a single user, or to 
multiple users. Sharing one project between two users has no impact on their 
access to other projects.

We could easily add the new TaskInfo.group field now, but the difficult part 
(besides naming) is figuring out how we might integrate the concept with 
hierarchical roles in the future. Can we base ACLs on "role:taskgroup" now, and 
then change TaskInfo.group to TaskInfo.role and base ACLs on "role" in the 
future? Or does introducing a separate "taskgroup" concept now prevent us from 
incorporating it into "role" in the future?

> TaskInfo/ExecutorInfo should include owner information
> --
>
> Key: MESOS-4772
> URL: https://issues.apache.org/jira/browse/MESOS-4772
> Project: Mesos
>  Issue Type: Improvement
>  Components: 

[jira] [Commented] (MESOS-4969) improve overlayfs detection

2016-03-18 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200118#comment-15200118
 ] 

Yan Xu commented on MESOS-4969:
---

{{modprobe -q overlay}} plus checking the exit status sounds reasonable to me.
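A sketch of that probe, keeping the {{/proc/filesystems}} scan as a fallback; illustrative only:
{code}
#include <cstdlib>
#include <fstream>
#include <string>

bool overlayfsSupported()
{
  // Exit status 0 means the module loaded (or was already present).
  if (std::system("modprobe -q overlay") == 0) {
    return true;
  }

  // Fall back to the current check, e.g. for kernels with overlayfs
  // built in but no modprobe available.
  std::ifstream filesystems("/proc/filesystems");
  std::string line;
  while (std::getline(filesystems, line)) {
    if (line.find("overlay") != std::string::npos) {
      return true;
    }
  }
  return false;
}
{code}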

> improve overlayfs detection
> ---
>
> Key: MESOS-4969
> URL: https://issues.apache.org/jira/browse/MESOS-4969
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, volumes
>Reporter: James Peach
>Priority: Minor
>





[jira] [Updated] (MESOS-4878) Task stuck in TASK_STAGING when docker fetcher failed to fetch the image

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4878:
--
Affects Version/s: (was: 0.27.1)
   (was: 0.27.0)

> Task stuck in TASK_STAGING when docker fetcher failed to fetch the image
> 
>
> Key: MESOS-4878
> URL: https://issues.apache.org/jira/browse/MESOS-4878
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.28.0
>Reporter: Shuai Lin
>Assignee: Shuai Lin
>Priority: Critical
> Fix For: 0.28.1
>
>
> When a task is launched with the mesos containerizer and a docker image, if 
> the docker fetcher fails to pull the image, no more task updates are sent to 
> the scheduler.
> {code}
> I0306 17:28:57.627169 17647 registry_puller.cpp:194] Pulling image 
> 'alpine:latest' from 
> 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to 
> '/tmp/mesos-test/store/docker/staging/V2dqJv'
> E0306 17:29:00.749889 17651 slave.cpp:3773] Container 
> '6b98026b-a58d-434c-9432-b517012edc35' for executor 'just-a-test' of 
> framework a4ff93ba-2141-48e2-92a9-7354e4028282- failed to start: Collect 
> failed: Unexpected HTTP response '401 Unauthorized' when trying to get the 
> manifest
> I0306 17:29:00.751579 17646 containerizer.cpp:1392] Destroying container 
> '6b98026b-a58d-434c-9432-b517012edc35'
> I0306 17:29:00.752188 17646 containerizer.cpp:1395] Waiting for the isolators 
> to complete preparing before destroying the container
> I0306 17:29:57.618649 17649 slave.cpp:4322] Terminating executor 
> ''just-a-test' of framework a4ff93ba-2141-48e2-92a9-73
> {code}
> Scheduler logs:
> {code}
> sudo ./build/src/mesos-execute --docker_image=alpine:latest 
> --containerizer=mesos --name=just-a-test --command="sleep 1000" 
> --master=33.33.33.33:5050
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> W0306 17:28:57.491081 17740 sched.cpp:1642] 
> **
> Scheduler driver bound to loopback interface! Cannot communicate with remote 
> master(s). You might want to set 'LIBPROCESS_IP' environment variable to use 
> a routable IP address.
> **
> I0306 17:28:57.498028 17740 sched.cpp:222] Version: 0.29.0
> I0306 17:28:57.533071 17761 sched.cpp:326] New master detected at 
> master@33.33.33.33:5050
> I0306 17:28:57.536761 17761 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0306 17:28:57.557729 17759 sched.cpp:703] Framework registered with 
> a4ff93ba-2141-48e2-92a9-7354e4028282-
> Framework registered with a4ff93ba-2141-48e2-92a9-7354e4028282-
> task just-a-test submitted to slave a4ff93ba-2141-48e2-92a9-7354e4028282-S0
> {code}





[jira] [Commented] (MESOS-2154) Port CFS quota support to Docker Containerizer

2016-03-18 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198107#comment-15198107
 ] 

Jie Yu commented on MESOS-2154:
---

commit be9e86022ff86620800503c41d2fef0c6387aaba
Author: Steve Niemitz 
Date:   Wed Mar 16 13:36:33 2016 -0700

Fix for docker containerizer not configuring CFS quotas correctly.

It would be nice to refactor all this isolation code in a way that can
be shared between all containerizers, as this is basically just copied
from the CgroupsCpushareIsolator, but that's a much bigger undertaking.

Review: https://reviews.apache.org/r/33174/

> Port CFS quota support to Docker Containerizer
> --
>
> Key: MESOS-2154
> URL: https://issues.apache.org/jira/browse/MESOS-2154
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, isolation
>Affects Versions: 0.21.0
> Environment: Linux (Ubuntu 14.04.1)
>Reporter: Andrew Ortman
>Assignee: haosdent
>Priority: Minor
>
> Port the CFS quota support the Mesos Containerizer has to the Docker 
> Containerizer. Whenever the --cgroup_enable_cfs flag is set, the Docker 
> Containerizer should update the cfs_period_us and cfs_quota_us values to 
> allow hard CPU capping on the container. 
> The current workaround is to pass those values as LXC configuration parameters.
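For reference, the CFS arithmetic behind hard capping: a container limited to {{cpus}} CPUs is given a runtime quota of {{cpus * period}} per scheduling period. A small sketch; the constants mirror the usual kernel default and minimum, so treat the exact values as assumptions:
{code}
#include <algorithm>
#include <cstdint>

const uint64_t CFS_PERIOD_US = 100000;   // 100ms scheduling window.
const uint64_t MIN_CFS_QUOTA_US = 1000;  // Kernel-imposed floor (1ms).

// cfs_quota_us for a container limited to `cpus` CPUs; the matching
// cfs_period_us is CFS_PERIOD_US.
uint64_t cfsQuotaUs(double cpus)
{
  const uint64_t quota = static_cast<uint64_t>(cpus * CFS_PERIOD_US);
  return std::max(quota, MIN_CFS_QUOTA_US);
}

// Example: cfsQuotaUs(0.5) == 50000, i.e. at most 50ms of CPU time in
// every 100ms period, summed across all threads in the container.
{code}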





[jira] [Commented] (MESOS-4744) mesos-execute should allow setting role

2016-03-18 Thread Jian Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200811#comment-15200811
 ] 

Jian Qiu commented on MESOS-4744:
-

Opened another ticket https://issues.apache.org/jira/browse/MESOS-4974 for 
command_uris

> mesos-execute should allow setting role
> ---
>
> Key: MESOS-4744
> URL: https://issues.apache.org/jira/browse/MESOS-4744
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Jian Qiu
>Assignee: Jian Qiu
>Priority: Minor
>
> It will be quite useful if we can set role when running mesos-execute



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4909) Introduce kill policy for tasks.

2016-03-18 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195387#comment-15195387
 ] 

Alexander Rukletsov edited comment on MESOS-4909 at 3/18/16 5:18 PM:
-

https://reviews.apache.org/r/44656/
https://reviews.apache.org/r/44707/
https://reviews.apache.org/r/45040/
https://reviews.apache.org/r/44657/
https://reviews.apache.org/r/44660/


was (Author: alexr):
https://reviews.apache.org/r/44656/
https://reviews.apache.org/r/44707/
https://reviews.apache.org/r/44657/
https://reviews.apache.org/r/44660/

> Introduce kill policy for tasks.
> 
>
> Key: MESOS-4909
> URL: https://issues.apache.org/jira/browse/MESOS-4909
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> A task may require some time to clean up, or even a special mechanism to issue 
> a kill request (currently it's a SIGTERM followed by a SIGKILL). Introducing 
> kill policies per task will help address these issues.
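As a hedged sketch of what a per-task kill policy could look like from a framework's point of view (the {{kill_policy}} and {{grace_period}} fields follow the design under review above and may differ from what finally lands):

{code}
#include <mesos/mesos.hpp>

// Assumed fields for illustration: give the task 30 seconds between
// SIGTERM and SIGKILL instead of the executor-wide shutdown grace period.
void setKillPolicy(mesos::TaskInfo* task)
{
  mesos::KillPolicy* policy = task->mutable_kill_policy();
  policy->mutable_grace_period()->set_nanoseconds(
      static_cast<int64_t>(30) * 1000 * 1000 * 1000);
}
{code}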



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4979) os::rmdir does not handle special files (e.g., device, socket).

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4979:
--
Labels: mesosphere twitter  (was: mesosphere)

> os::rmdir does not handle special files (e.g., device, socket).
> ---
>
> Key: MESOS-4979
> URL: https://issues.apache.org/jira/browse/MESOS-4979
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Affects Versions: 0.19.0, 0.20.0, 0.21.0, 0.22.0, 0.23.0, 0.24.0, 0.25.0, 
> 0.26.0, 0.27.0, 0.27.1, 0.27.2
>Reporter: Jie Yu
>Assignee: Jojy Varghese
>Priority: Blocker
>  Labels: mesosphere, twitter
> Fix For: 0.28.0
>
>
> Stout os::rmdir does not handle special files like device files or socket 
> files. This could cause failures when garbage collecting sandboxes.
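A minimal sketch of the idea behind a fix (illustrative POSIX code, not the stout implementation): recursively delete a tree, unlink()-ing anything that is not a directory, which covers regular files, symlinks, sockets, fifos, and device nodes alike.

{code}
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

#include <string>

static bool rmdirAll(const std::string& path)
{
  struct stat s;
  if (::lstat(path.c_str(), &s) < 0) {  // lstat: don't follow symlinks
    return false;
  }

  if (!S_ISDIR(s.st_mode)) {
    // unlink() removes regular files, symlinks, sockets, fifos and
    // device nodes alike.
    return ::unlink(path.c_str()) == 0;
  }

  DIR* dir = ::opendir(path.c_str());
  if (dir == nullptr) {
    return false;
  }

  struct dirent* entry;
  while ((entry = ::readdir(dir)) != nullptr) {
    const std::string name = entry->d_name;
    if (name == "." || name == "..") {
      continue;
    }
    if (!rmdirAll(path + "/" + name)) {
      ::closedir(dir);
      return false;
    }
  }

  ::closedir(dir);
  return ::rmdir(path.c_str()) == 0;
}
{code}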



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4985:
--
Assignee: Gilbert Song

> Destroy a container while it's provisioning can lead to leaked provisioned 
> directories.
> ---
>
> Key: MESOS-4985
> URL: https://issues.apache.org/jira/browse/MESOS-4985
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0
>Reporter: Jie Yu
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
> Fix For: 0.28.1
>
>
> Here is the possible sequence of events:
> 1) containerizer->launch
> 2) provisioner->provision is called. it is fetching the image
> 3) executor registration timed out
> 4) containerizer->destroy is called
> 5) container->state is still in PREPARING
> 6) provisioner->destroy is called
> So we can be calling provisioner->destroy while provisioner->provision hasn't 
> finished yet. provisioner->destroy might just skip the cleanup since there's no 
> information about the container yet, and later, the provisioner will prepare the 
> root filesystem. This root filesystem will never be destroyed, since the destroy 
> has already finished.
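A minimal sketch of the race and of the ordering fix (hypothetical state and helpers, not the actual provisioner code): if provision() only records the container after the slow image fetch, a concurrent destroy() finds nothing and skips, leaking the rootfs prepared afterwards; recording the container before any slow work closes the window.

{code}
#include <map>
#include <string>

struct Info
{
  std::string rootfs;
  bool provisioned;
};

std::map<std::string, Info> infos;  // provisioner's in-memory state

// Hypothetical helpers standing in for the store/filesystem work.
std::string fetchImageAndPrepareRootfs(const std::string& image);
void removeRootfs(const std::string& rootfs);

void provision(const std::string& containerId, const std::string& image)
{
  // Fix: register the container *before* the slow fetch (step 2), so a
  // concurrent destroy() cannot conclude there is nothing to clean up.
  infos[containerId] = Info{"", false};

  const std::string rootfs = fetchImageAndPrepareRootfs(image);

  infos[containerId] = Info{rootfs, true};
}

bool destroy(const std::string& containerId)
{
  auto it = infos.find(containerId);
  if (it == infos.end()) {
    // Buggy behavior (step 6): nothing known about the container yet,
    // so destroy skips -- and the rootfs prepared later is never removed.
    return true;
  }

  if (!it->second.rootfs.empty()) {
    removeRootfs(it->second.rootfs);
  }
  infos.erase(it);
  return true;
}
{code}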



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4984) MasterTest.SlavesEndpointTwoSlaves is flaky

2016-03-18 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4984:
--

 Summary: MasterTest.SlavesEndpointTwoSlaves is flaky
 Key: MESOS-4984
 URL: https://issues.apache.org/jira/browse/MESOS-4984
 Project: Mesos
  Issue Type: Bug
  Components: tests
Reporter: Neil Conway


Observed on Arch Linux with GCC 6, running in a virtualbox VM:

[ RUN  ] MasterTest.SlavesEndpointTwoSlaves
/mesos-2/src/tests/master_tests.cpp:1710: Failure
Value of: array.get().values.size()
  Actual: 1
Expected: 2u
Which is: 2
[  FAILED  ] MasterTest.SlavesEndpointTwoSlaves (86 ms)

Hasn't repro'd yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4983) Segfault in ProcessTest.Spawn with GCC 6

2016-03-18 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202367#comment-15202367
 ] 

Benjamin Bannier commented on MESOS-4983:
-

Since not even the GCC maintainers feel comfortable calling GCC 6 
production-ready, I am not sure these kinds of bug reports are very useful. It is 
interesting to see whether, e.g., the added GCC 6 diagnostics find unknown issues 
in Mesos code (they don't currently), but I feel this kind of report would be 
more interesting for the GCC developers. Since they are currently preparing a 
release, they might even be very interested in reduced reproducers (as an 
example, the GCC tag you are using is the first one able to compile Mesos code 
without internal compiler errors, and the maintainers were quick to come up with 
fixes for two issues I raised over there).

> Segfault in ProcessTest.Spawn with GCC 6
> 
>
> Key: MESOS-4983
> URL: https://issues.apache.org/jira/browse/MESOS-4983
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, tests
>Reporter: Neil Conway
>  Labels: mesosphere
>
> {{ProcessTest.Spawn}} fails deterministically for me with GCC 6 and 
> {{--enable-optimize}}. Recent Arch Linux, GCC "6.0.0 20160227".
> {noformat}
> [ RUN  ] ProcessTest.Spawn
> *** Aborted at 145817 (unix time) try "date -d @145817" if you are 
> using GNU date ***
> PC: @   0x522926 SpawnProcess::initialize()
> *** SIGSEGV (@0x0) received by PID 11359 (TID 0x7faa6075f700) from PID 0; 
> stack trace: ***
> @ 0x7faa670dbe80 (unknown)
> @   0x522926 SpawnProcess::initialize()
> @   0x646fa6 process::ProcessManager::resume()
> @   0x6471ff 
> _ZNSt6thread11_State_implISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS7_EEEvEEE6_M_runEv
> @ 0x7faa6764a812 execute_native_thread_routine
> @ 0x7faa670d2424 start_thread
> @ 0x7faa65b04cbd __clone
> @0x0 (unknown)
> Makefile:1748: recipe for target 'check-local' failed
> make[5]: *** [check-local] Segmentation fault (core dumped)
> {noformat}
> Backtrace:
> {noformat}
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) 
> at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373
> 1373void GetValueAndDelete() const { delete this; }
> [Current thread is 1 (Thread 0x7faa6075f700 (LWP 11365))]
> (gdb) bt
> #0  testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) 
> at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373
> #1  testing::internal::FunctionMockerBase::InvokeWith(std::tuple<> 
> const&) (args=empty std::tuple, this=0x712a7c88) at 
> 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1530
> #2  testing::internal::FunctionMocker::Invoke() 
> (this=0x712a7c88) at 
> 3rdparty/gmock-1.7.0/include/gmock/gmock-generated-function-mockers.h:76
> #3  SpawnProcess::initialize (this=0x712a7c80) at 
> /mesos-2/3rdparty/libprocess/src/tests/process_tests.cpp:113
> #4  0x00646fa6 in process::ProcessManager::resume (this=0x25a2b60, 
> process=0x712a7d38) at /mesos-2/3rdparty/libprocess/src/process.cpp:2504
> #5  0x006471ff in process::ProcessManager:: atomic_bool&)>::operator() (__closure=, joining=...) at 
> /mesos-2/3rdparty/libprocess/src/process.cpp:2218
> #6  std::_Bind atomic_bool&)>(std::reference_wrapper 
> >)>::__call (__args=, this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:943
> #7  std::_Bind atomic_bool&)>(std::reference_wrapper 
> >)>::operator()<> (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:1002
> #8  
> std::_Bind_simple  atomic_bool&)>(std::reference_wrapper 
> >)>()>::_M_invoke<> (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:1400
> #9  
> std::_Bind_simple  atomic_bool&)>(std::reference_wrapper 
> >)>()>::operator() (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:1389
> #10 
> std::thread::_State_impl  atomic_bool&)>(std::reference_wrapper >)>()> 
> >::_M_run(void) (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/thread:196
> #11 0x7faa6764a812 in std::(anonymous 
> namespace)::execute_native_thread_routine (__p=0x25a3bf0) at 
> ../../../../../gcc-trunk/libstdc++-v3/src/c++11/thread.cc:83
> #12 0x7faa670d2424 in start_thread () from /usr/lib/libpthread.so.0
> #13 0x7faa65b04cbd in clone () from /usr/lib/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.

2016-03-18 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202285#comment-15202285
 ] 

Cong Wang commented on MESOS-4070:
--

Hi,

Do you have a use case for this? Currently the Mesos code base doesn't use 
negative hex numbers, so we deliberately didn't fix this. It is never too 
late to add it once there is a use case.


> numify() handles negative numbers inconsistently.
> -
>
> Key: MESOS-4070
> URL: https://issues.apache.org/jira/browse/MESOS-4070
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Jie Yu
>Assignee: Yong Tang
>  Labels: tech-debt
>
> As pointed by [~neilc] in this review:
> https://reviews.apache.org/r/40988
> {noformat}
> Try<int> num2 = numify<int>("-10");
> EXPECT_SOME_EQ(-10, num2);
> // TODO(neilc): This is inconsistent with the handling of non-hex numbers.
> EXPECT_ERROR(numify<int>("-0x10"));
> {noformat}
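For illustration, a hedged sketch of one way to make the two cases consistent: peel off the sign before looking for the base prefix, then negate. This illustrates the idea only; it is not the stout implementation.

{code}
#include <cstdlib>
#include <string>

bool numifyInt(const std::string& s, long long* out)
{
  std::string t = s;
  bool negative = false;

  if (!t.empty() && t[0] == '-') {
    negative = true;
    t = t.substr(1);
  }

  char* end = nullptr;
  // Base 0 lets strtoll accept "10", "0x10", "010", etc.
  const long long value = std::strtoll(t.c_str(), &end, 0);
  if (end == t.c_str() || *end != '\0') {
    return false;  // nothing parsed, or trailing garbage
  }

  *out = negative ? -value : value;
  return true;
}
{code}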



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.

2016-03-18 Thread SERGEY GALKIN (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201757#comment-15201757
 ] 

SERGEY GALKIN commented on MESOS-4977:
--

Logs from mesos-master

1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 - with 
"Cmd":["-c","echo 'No such file or directory'; exit 1"] (failed)

mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318
 15:14:27.224059  2638 master.hpp:176] Adding task 
1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 with 
resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 
5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318
 15:14:27.224105  2638 master.cpp:3621] Launching task 
1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of 
framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at 
scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources 
cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 
5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 
(172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:W0318
 15:14:33.154769  2656 master.cpp:4885] Ignoring unknown exited executor 
'1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0' of 
framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- on slave 
5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 
(172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318
 15:14:33.156250  2639 master.cpp:4789] Status update TASK_FAILED (UUID: 
7c90d238-fcc4-4ede-9238-200744693449) for task 
1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of 
framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 
5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 
(172.20.9.205)


1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 - with 
"Cmd":null (running)

mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318
 15:14:27.223767  2638 master.hpp:176] Adding task 
1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 with 
resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 
5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318
 15:14:27.223814  2638 master.cpp:3621] Launching task 
1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of 
framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at 
scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources 
cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 
5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 
(172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318
 15:14:33.200388  2648 master.cpp:4789] Status update TASK_RUNNING (UUID: 
563864b0-8780-4fd3-a106-041600599e2e) for task 
1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of 
framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 
5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 
(172.20.9.205)


> Sometime Cmd":["-c","echo 'No such file or directory'] in task.
> ---
>
> Key: MESOS-4977
> URL: https://issues.apache.org/jira/browse/MESOS-4977
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.2
> Environment: 189 mesos slaves on Ubuntu 14.04.3 LTS
>Reporter: SERGEY GALKIN
>
> mesos - 0.27.0
> marathon - 0.15.2
> I am trying to launch one simple Docker application (nginx) with 500 
> instances on a cluster with 189 HW nodes through Marathon
> {code}
> ID /1f532267a08494e3081c1acb42d273b7
> Command Unspecified
> Constraints Unspecified
> Dependencies Unspecified
> Labels Unspecified
> Resource Roles Unspecified
> Container
> {
>   "type": "DOCKER",
>   "volumes": [],
>   "docker": {
> "image": "nginx",
> "network": "BRIDGE",
> "portMappings": [
>   {
> "containerPort": 80,
> "hostPort": 0,
> "servicePort": 1,
> "protocol": "tcp"
>   }
> ],
> "privileged": false,
> "parameters": [],
> "forcePullI

[jira] [Updated] (MESOS-4835) CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky

2016-03-18 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4835:
-
Sprint:   (was: Mesosphere Sprint 31)

> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky
> -
>
> Key: MESOS-4835
> URL: https://issues.apache.org/jira/browse/MESOS-4835
> Project: Mesos
>  Issue Type: Bug
> Environment: Seen on Ubuntu 15 & Debian 8, GCC 4.9
>Reporter: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> Verbose logs: 
> {code}
> [ RUN  ] 
> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess
> I0302 00:43:14.127846 11755 cgroups.cpp:2427] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test
> I0302 00:43:14.267411 11758 cgroups.cpp:1409] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test after 139.46496ms
> I0302 00:43:14.409395 11751 cgroups.cpp:2445] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test
> I0302 00:43:14.551304 11751 cgroups.cpp:1438] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test after 141.811968ms
> ../../src/tests/containerizer/cgroups_tests.cpp:949: Failure
> Value of: ::waitpid(pid, &status, 0)
>   Actual: 23809
> Expected: -1
> ../../src/tests/containerizer/cgroups_tests.cpp:950: Failure
> Value of: (*__errno_location ())
>   Actual: 0
> Expected: 10
> [  FAILED  ] 
> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess (1055 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4878) Task stuck in TASK_STAGING when docker fetcher failed to fetch the image

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4878:
--
Fix Version/s: 0.28.1

> Task stuck in TASK_STAGING when docker fetcher failed to fetch the image
> 
>
> Key: MESOS-4878
> URL: https://issues.apache.org/jira/browse/MESOS-4878
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.28.0
>Reporter: Shuai Lin
>Assignee: Shuai Lin
> Fix For: 0.28.1
>
>
> When a task is launched with the mesos containerizer and a docker image, if 
> the docker fetcher fails to pull the image, no more task updates are sent to 
> the scheduler.
> {code}
> I0306 17:28:57.627169 17647 registry_puller.cpp:194] Pulling image 
> 'alpine:latest' from 
> 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to 
> '/tmp/mesos-test/store/docker/staging/V2dqJv'
> E0306 17:29:00.749889 17651 slave.cpp:3773] Container 
> '6b98026b-a58d-434c-9432-b517012edc35' for executor 'just-a-test' of 
> framework a4ff93ba-2141-48e2-92a9-7354e4028282- failed to start: Collect 
> failed: Unexpected HTTP response '401 Unauthorized' when trying to get the 
> manifest
> I0306 17:29:00.751579 17646 containerizer.cpp:1392] Destroying container 
> '6b98026b-a58d-434c-9432-b517012edc35'
> I0306 17:29:00.752188 17646 containerizer.cpp:1395] Waiting for the isolators 
> to complete preparing before destroying the container
> I0306 17:29:57.618649 17649 slave.cpp:4322] Terminating executor 
> ''just-a-test' of framework a4ff93ba-2141-48e2-92a9-73
> {code}
> Scheduler logs:
> {code}
> sudo ./build/src/mesos-execute --docker_image=alpine:latest 
> --containerizer=mesos --name=just-a-test --command="sleep 1000" 
> --master=33.33.33.33:5050
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> W0306 17:28:57.491081 17740 sched.cpp:1642] 
> **
> Scheduler driver bound to loopback interface! Cannot communicate with remote 
> master(s). You might want to set 'LIBPROCESS_IP' environment variable to use 
> a routable IP address.
> **
> I0306 17:28:57.498028 17740 sched.cpp:222] Version: 0.29.0
> I0306 17:28:57.533071 17761 sched.cpp:326] New master detected at 
> master@33.33.33.33:5050
> I0306 17:28:57.536761 17761 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0306 17:28:57.557729 17759 sched.cpp:703] Framework registered with 
> a4ff93ba-2141-48e2-92a9-7354e4028282-
> Framework registered with a4ff93ba-2141-48e2-92a9-7354e4028282-
> task just-a-test submitted to slave a4ff93ba-2141-48e2-92a9-7354e4028282-S0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4878) Task stuck in TASK_STAGING when docker fetcher failed to fetch the image

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4878:
--
Priority: Critical  (was: Major)

> Task stuck in TASK_STAGING when docker fetcher failed to fetch the image
> 
>
> Key: MESOS-4878
> URL: https://issues.apache.org/jira/browse/MESOS-4878
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.28.0
>Reporter: Shuai Lin
>Assignee: Shuai Lin
>Priority: Critical
> Fix For: 0.28.1
>
>
> When a task is launched with the mesos containerizer and a docker image, if 
> the docker fetcher fails to pull the image, no more task updates are sent to 
> the scheduler.
> {code}
> I0306 17:28:57.627169 17647 registry_puller.cpp:194] Pulling image 
> 'alpine:latest' from 
> 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to 
> '/tmp/mesos-test/store/docker/staging/V2dqJv'
> E0306 17:29:00.749889 17651 slave.cpp:3773] Container 
> '6b98026b-a58d-434c-9432-b517012edc35' for executor 'just-a-test' of 
> framework a4ff93ba-2141-48e2-92a9-7354e4028282- failed to start: Collect 
> failed: Unexpected HTTP response '401 Unauthorized' when trying to get the 
> manifest
> I0306 17:29:00.751579 17646 containerizer.cpp:1392] Destroying container 
> '6b98026b-a58d-434c-9432-b517012edc35'
> I0306 17:29:00.752188 17646 containerizer.cpp:1395] Waiting for the isolators 
> to complete preparing before destroying the container
> I0306 17:29:57.618649 17649 slave.cpp:4322] Terminating executor 
> ''just-a-test' of framework a4ff93ba-2141-48e2-92a9-73
> {code}
> Scheduler logs:
> {code}
> sudo ./build/src/mesos-execute --docker_image=alpine:latest 
> --containerizer=mesos --name=just-a-test --command="sleep 1000" 
> --master=33.33.33.33:5050
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> W0306 17:28:57.491081 17740 sched.cpp:1642] 
> **
> Scheduler driver bound to loopback interface! Cannot communicate with remote 
> master(s). You might want to set 'LIBPROCESS_IP' environment variable to use 
> a routable IP address.
> **
> I0306 17:28:57.498028 17740 sched.cpp:222] Version: 0.29.0
> I0306 17:28:57.533071 17761 sched.cpp:326] New master detected at 
> master@33.33.33.33:5050
> I0306 17:28:57.536761 17761 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0306 17:28:57.557729 17759 sched.cpp:703] Framework registered with 
> a4ff93ba-2141-48e2-92a9-7354e4028282-
> Framework registered with a4ff93ba-2141-48e2-92a9-7354e4028282-
> task just-a-test submitted to slave a4ff93ba-2141-48e2-92a9-7354e4028282-S0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4984) MasterTest.SlavesEndpointTwoSlaves is flaky

2016-03-18 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4984:
---
Description: 
Observed on Arch Linux with GCC 6, running in a virtualbox VM:

[ RUN  ] MasterTest.SlavesEndpointTwoSlaves
/mesos-2/src/tests/master_tests.cpp:1710: Failure
Value of: array.get().values.size()
  Actual: 1
Expected: 2u
Which is: 2
[  FAILED  ] MasterTest.SlavesEndpointTwoSlaves (86 ms)

Seems to fail non-deterministically, perhaps more often when there is 
concurrent CPU load on the machine.

  was:
Observed on Arch Linux with GCC 6, running in a virtualbox VM:

[ RUN  ] MasterTest.SlavesEndpointTwoSlaves
/mesos-2/src/tests/master_tests.cpp:1710: Failure
Value of: array.get().values.size()
  Actual: 1
Expected: 2u
Which is: 2
[  FAILED  ] MasterTest.SlavesEndpointTwoSlaves (86 ms)

Hasn't repro'd yet.


> MasterTest.SlavesEndpointTwoSlaves is flaky
> ---
>
> Key: MESOS-4984
> URL: https://issues.apache.org/jira/browse/MESOS-4984
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>  Labels: flaky-test, mesosphere
>
> Observed on Arch Linux with GCC 6, running in a virtualbox VM:
> [ RUN  ] MasterTest.SlavesEndpointTwoSlaves
> /mesos-2/src/tests/master_tests.cpp:1710: Failure
> Value of: array.get().values.size()
>   Actual: 1
> Expected: 2u
> Which is: 2
> [  FAILED  ] MasterTest.SlavesEndpointTwoSlaves (86 ms)
> Seems to fail non-deterministically, perhaps more often when there is 
> concurrent CPU load on the machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.

2016-03-18 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202312#comment-15202312
 ] 

Neil Conway commented on MESOS-4070:


Not at the moment, but it is still intended to be standalone and might be 
separated out in the future (along with libprocess). E.g., this is the reason 
that commits that span libprocess and/or stout and/or Mesos proper are not 
allowed.

> numify() handles negative numbers inconsistently.
> -
>
> Key: MESOS-4070
> URL: https://issues.apache.org/jira/browse/MESOS-4070
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Jie Yu
>Assignee: Yong Tang
>  Labels: tech-debt
>
> As pointed by [~neilc] in this review:
> https://reviews.apache.org/r/40988
> {noformat}
> Try<int> num2 = numify<int>("-10");
> EXPECT_SOME_EQ(-10, num2);
> // TODO(neilc): This is inconsistent with the handling of non-hex numbers.
> EXPECT_ERROR(numify<int>("-0x10"));
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4985:
--
Sprint: Mesosphere Sprint 31

> Destroy a container while it's provisioning can lead to leaked provisioned 
> directories.
> ---
>
> Key: MESOS-4985
> URL: https://issues.apache.org/jira/browse/MESOS-4985
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0
>Reporter: Jie Yu
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
> Fix For: 0.28.1
>
>
> Here is the possible sequence of events:
> 1) containerizer->launch
> 2) provisioner->provision is called. it is fetching the image
> 3) executor registration timed out
> 4) containerizer->destroy is called
> 5) container->state is still in PREPARING
> 6) provisioner->destroy is called
> So we can be calling provisioner->destroy while provisioner->provision hasn't 
> finished yet. provisioner->destroy might just skip the cleanup since there's no 
> information about the container yet, and later, the provisioner will prepare the 
> root filesystem. This root filesystem will never be destroyed, since the destroy 
> has already finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4985:
--
Component/s: containerization

> Destroy a container while it's provisioning can lead to leaked provisioned 
> directories.
> ---
>
> Key: MESOS-4985
> URL: https://issues.apache.org/jira/browse/MESOS-4985
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.28.0
>Reporter: Jie Yu
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
> Fix For: 0.28.1
>
>
> Here is the possible sequence of events:
> 1) containerizer->launch
> 2) provisioner->provision is called. it is fetching the image
> 3) executor registration timed out
> 4) containerizer->destroy is called
> 5) container->state is still in PREPARING
> 6) provisioner->destroy is called
> So we can be calling provisioner->destroy while provisioner->provision hasn't 
> finished yet. provisioner->destroy might just skip the cleanup since there's no 
> information about the container yet, and later, the provisioner will prepare the 
> root filesystem. This root filesystem will never be destroyed, since the destroy 
> has already finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4112) Clean up libprocess gtest macros

2016-03-18 Thread Yong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Tang reassigned MESOS-4112:


Assignee: Yong Tang

> Clean up libprocess gtest macros
> 
>
> Key: MESOS-4112
> URL: https://issues.apache.org/jira/browse/MESOS-4112
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Michael Park
>Assignee: Yong Tang
>
> This ticket is regarding the libprocess gtest helpers in 
> {{3rdparty/libprocess/include/process/gtest.hpp}}.
> The pattern in this file seems to be a set of macros:
> * {{AWAIT_ASSERT_<op>_FOR}}
> * {{AWAIT_ASSERT_<op>}} -- default of 15 seconds
> * {{AWAIT_<op>_FOR}} -- alias for {{AWAIT_ASSERT_<op>_FOR}}
> * {{AWAIT_<op>}} -- alias for {{AWAIT_ASSERT_<op>}}
> * {{AWAIT_EXPECT_<op>_FOR}}
> * {{AWAIT_EXPECT_<op>}} -- default of 15 seconds
> (1) {{AWAIT_EQ_FOR}} should be added for completeness.
> (2) In {{gtest}}, we've got {{EXPECT_EQ}} as well as the {{bool}}-specific 
> versions: {{EXPECT_TRUE}} and {{EXPECT_FALSE}}.
> We should adopt this pattern in these helpers as well. Keeping the pattern 
> above in mind, the following are missing:
> * {{AWAIT_ASSERT_TRUE_FOR}}
> * {{AWAIT_ASSERT_TRUE}}
> * {{AWAIT_ASSERT_FALSE_FOR}}
> * {{AWAIT_ASSERT_FALSE}}
> * {{AWAIT_EXPECT_TRUE_FOR}}
> * {{AWAIT_EXPECT_FALSE_FOR}}
> (3) There are HTTP response related macros at the bottom of the file, e.g. 
> {{AWAIT_EXPECT_RESPONSE_STATUS_EQ}}, however these are missing their 
> {{ASSERT}} counterparts.
> (4) The reason for (3) presumably is because we reach for {{EXPECT}} over 
> {{ASSERT}} in general due to the test suite crashing behavior of {{ASSERT}}. 
> If this is the case, it would be worthwhile considering whether macros such 
> as {{AWAIT_READY}} should alias {{AWAIT_EXPECT_READY}} rather than 
> {{AWAIT_ASSERT_READY}}.
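For illustration, a hedged sketch of what the missing bool-specific helpers could look like, assuming the existing {{AWAIT_ASSERT_EQ_FOR}} / {{AWAIT_EXPECT_EQ_FOR}} helpers follow the pattern above (the exact expansions in {{process/gtest.hpp}} may differ):

{code}
#define AWAIT_ASSERT_TRUE_FOR(actual, duration)   \
  AWAIT_ASSERT_EQ_FOR(true, actual, duration)

#define AWAIT_ASSERT_TRUE(actual)                 \
  AWAIT_ASSERT_TRUE_FOR(actual, Seconds(15))

#define AWAIT_ASSERT_FALSE_FOR(actual, duration)  \
  AWAIT_ASSERT_EQ_FOR(false, actual, duration)

#define AWAIT_EXPECT_TRUE_FOR(actual, duration)   \
  AWAIT_EXPECT_EQ_FOR(true, actual, duration)
{code}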



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver

2016-03-18 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4981:
--
Description: The counters {{master/messages_register_framework}} and 
{{master/messages_reregister_framework}} are no longer being incremented after 
the scheduler driver started sending {{Call}} messages to the master in Mesos 
0.23. Either, we should think about adding new counter(s) for {{Subscribe}} 
calls to the master for both PID/HTTP frameworks or modify the existing code to 
correctly increment the counters.  (was: The counters 
{{master/messages_register_framework}} and 
{master/messages_reregister_framework}} are no longer being incremented after 
the scheduler driver started sending {{Call}} messages to the master in Mesos 
0.23. Either, we should think about adding new counter(s) for {{Subscribe}} 
calls to the master for both PID/HTTP frameworks or modify the existing code to 
correctly increment the counters.)

> Framework (re-)register metric counters broken for calls made via scheduler 
> driver
> --
>
> Key: MESOS-4981
> URL: https://issues.apache.org/jira/browse/MESOS-4981
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> The counters {{master/messages_register_framework}} and 
> {{master/messages_reregister_framework}} are no longer being incremented 
> after the scheduler driver started sending {{Call}} messages to the master in 
> Mesos 0.23. Either, we should think about adding new counter(s) for 
> {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the 
> existing code to correctly increment the counters.
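A hedged sketch of the first option: a dedicated counter for {{Subscribe}} calls. The metric name and handler are illustrative, and registration via {{process::metrics::add}} is omitted.

{code}
#include <process/metrics/counter.hpp>

// Illustrative metric name; not an existing counter.
process::metrics::Counter subscribes("master/messages_subscribe");

void onSubscribe(/* const scheduler::Call::Subscribe& subscribe */)
{
  // Bump on every subscription path (HTTP and PID-based schedulers),
  // whether the framework is registering or re-registering.
  ++subscribes;
}
{code}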



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4984) MasterTest.SlavesEndpointTwoSlaves is flaky

2016-03-18 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4984:
---
Attachment: slaves_endpoint_flaky_4984_verbose_log.txt

> MasterTest.SlavesEndpointTwoSlaves is flaky
> ---
>
> Key: MESOS-4984
> URL: https://issues.apache.org/jira/browse/MESOS-4984
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>  Labels: flaky-test, mesosphere, tech-debt
> Attachments: slaves_endpoint_flaky_4984_verbose_log.txt
>
>
> Observed on Arch Linux with GCC 6, running in a virtualbox VM:
> [ RUN  ] MasterTest.SlavesEndpointTwoSlaves
> /mesos-2/src/tests/master_tests.cpp:1710: Failure
> Value of: array.get().values.size()
>   Actual: 1
> Expected: 2u
> Which is: 2
> [  FAILED  ] MasterTest.SlavesEndpointTwoSlaves (86 ms)
> Seems to fail non-deterministically, perhaps more often when there is 
> concurrent CPU load on the machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4969) improve overlayfs detection

2016-03-18 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200077#comment-15200077
 ] 

James Peach commented on MESOS-4969:


{{/sys/module/overlay}} would only exist once the module is loaded, right? AFAICT 
you need something to trigger loading the module in the first place.

> improve overlayfs detection
> ---
>
> Key: MESOS-4969
> URL: https://issues.apache.org/jira/browse/MESOS-4969
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, volumes
>Reporter: James Peach
>Priority: Minor
>
> On my Fedora 23, overlayfs is a module that is not loaded by default 
> (attempting to mount an overlayfs automatically triggers the module loading). 
> However {{mesos-slave}} won't start until I manually load the module, since it 
> is not listed in {{/proc/filesystems}} until it is loaded.
> It would be nice if there were a more reliable way to determine overlayfs 
> support.
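A hedged sketch of a slightly more tolerant check (illustrative only): consult {{/proc/filesystems}} first and fall back to {{/sys/module/overlay}}. Note that, per the comment above, neither catches a module that is merely loadable on demand but not yet loaded.

{code}
#include <sys/stat.h>

#include <fstream>
#include <string>

bool overlayfsSupported()
{
  // Listed here once the filesystem is registered with the kernel.
  std::ifstream filesystems("/proc/filesystems");
  std::string line;
  while (std::getline(filesystems, line)) {
    if (line.find("overlay") != std::string::npos) {
      return true;
    }
  }

  // Fallback: present once the module is loaded (but not before).
  struct stat s;
  return ::stat("/sys/module/overlay", &s) == 0 ||
         ::stat("/sys/module/overlayfs", &s) == 0;
}
{code}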



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4744) mesos-execute should allow setting role

2016-03-18 Thread Jian Qiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Qiu updated MESOS-4744:

Summary: mesos-execute should allow setting role  (was: mesos-execute 
should allow setting role and command uris)

> mesos-execute should allow setting role
> ---
>
> Key: MESOS-4744
> URL: https://issues.apache.org/jira/browse/MESOS-4744
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Jian Qiu
>Assignee: Jian Qiu
>Priority: Minor
>
> It would be quite useful if we could set the role and command URIs when running 
> mesos-execute



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4963) Incorrect CXXFLAGS with GCC 6

2016-03-18 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4963:
---
Shepherd: Joris Van Remoortere

> Incorrect CXXFLAGS with GCC 6
> -
>
> Key: MESOS-4963
> URL: https://issues.apache.org/jira/browse/MESOS-4963
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> {noformat}
> $ head config.log
> [...]
> /mesos-2/configure --enable-optimize --disable-python CC=ccache 
> /home/vagrant/local/gcc/bin/gcc CXX=ccache /home/vagrant/local/gcc/bin/g++
> $ ~/local/gcc/bin/g++ --version
> g++ (GCC) 6.0.0 20160227 (experimental)
> Copyright (C) 2016 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> $ make V=0
> make[2]: Entering directory '/home/vagrant/build-mesos-2-gcc6/src'
>   CXX  appc/libmesos_no_3rdparty_la-spec.lo
> In file included from 
> /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/shell.hpp:22:0,
>  from 
> /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:56,
>  from /mesos-2/src/appc/spec.cpp:17:
> /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/shell.hpp: 
> In instantiation of ‘int os::execlp(const char*, T ...) [with T = {const 
> char*, const char*, const char*, char*}]’:
> /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/fork.hpp:371:52:
>required from here
> /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/shell.hpp:151:18:
>  error: missing sentinel in function call [-Werror=format=]
>return ::execlp(file, t...);
>   ^~~~
> cc1plus: all warnings being treated as errors
> Makefile:5584: recipe for target 'appc/libmesos_no_3rdparty_la-spec.lo' failed
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4974) mesos-execute should allow setting command_uris

2016-03-18 Thread Jian Qiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Qiu updated MESOS-4974:

Description: 
Based on discussion in MESOS-4744, it will be helpful to let mesos-execute 
support setting uris in command info.

We can add a flag:
{code}
--uris=uri1,uri2..
{code} 
and set other values in CommandInfo::URI as default.

  was:
Based on discussion in MESOS-4744, it will be helpful to let mesos-execute 
support setting uris in command info.

We can add a flag:
{code}
--uris=uri1,uri2..
{code} 
and set other values in CommandInfo::URIS as default.


> mesos-execute should allow setting command_uris
> ---
>
> Key: MESOS-4974
> URL: https://issues.apache.org/jira/browse/MESOS-4974
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Reporter: Jian Qiu
>Priority: Minor
>
> Based on discussion in MESOS-4744, it will be helpful to let mesos-execute 
> support setting uris in command info.
> We can add a flag:
> {code}
> --uris=uri1,uri2..
> {code} 
> and set other values in CommandInfo::URI as default.
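A hedged sketch of the proposed flag handling (the flag itself is the proposal above, not existing CLI behavior; the proto accessors follow {{CommandInfo::URI}} in mesos.proto): split the comma-separated value and add one URI per entry, leaving the other fields at their defaults.

{code}
#include <mesos/mesos.hpp>

#include <sstream>
#include <string>

void addUris(const std::string& flagValue, mesos::CommandInfo* command)
{
  std::stringstream stream(flagValue);
  std::string value;

  while (std::getline(stream, value, ',')) {
    mesos::CommandInfo::URI* uri = command->add_uris();
    uri->set_value(value);
    // Leave executable/extract/cache at their proto defaults.
  }
}
{code}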



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4968) ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky

2016-03-18 Thread Greg Mann (JIRA)
Greg Mann created MESOS-4968:


 Summary: ResourceOffersTest.ResourceOfferWithMultipleSlaves is 
flaky
 Key: MESOS-4968
 URL: https://issues.apache.org/jira/browse/MESOS-4968
 Project: Mesos
  Issue Type: Bug
  Components: tests
 Environment: Ubuntu 14.04 with clang, without libevent/SSL
Reporter: Greg Mann


Just observed on the ASF CI:

{code}
[ RUN  ] ResourceOffersTest.ResourceOfferWithMultipleSlaves
I0317 16:31:52.635798 32063 cluster.cpp:139] Creating default 'local' authorizer
I0317 16:31:52.743732 32063 leveldb.cpp:174] Opened db in 107.706253ms
I0317 16:31:52.782537 32063 leveldb.cpp:181] Compacted db in 38.758479ms
I0317 16:31:52.782641 32063 leveldb.cpp:196] Created db iterator in 34392ns
I0317 16:31:52.782662 32063 leveldb.cpp:202] Seeked to beginning of db in 
10490ns
I0317 16:31:52.782675 32063 leveldb.cpp:271] Iterated through 0 keys in the db 
in 7177ns
I0317 16:31:52.782728 32063 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0317 16:31:52.783476 32094 recover.cpp:447] Starting replica recovery
I0317 16:31:52.783738 32094 recover.cpp:473] Replica is in EMPTY status
I0317 16:31:52.785109 32086 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (9482)@172.17.0.2:43540
I0317 16:31:52.785851 32081 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0317 16:31:52.786602 32085 recover.cpp:564] Updating replica status to STARTING
I0317 16:31:52.790009 32090 master.cpp:376] Master 
0196163d-91f7-4337-9dd7-9fef49e8cd75 (76df5a57a9ca) started on 172.17.0.2:43540
I0317 16:31:52.790082 32090 master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/Rxm8q7/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.29.0/_inst/share/mesos/webui" 
--work_dir="/tmp/Rxm8q7/master" --zk_session_timeout="10secs"
I0317 16:31:52.790520 32090 master.cpp:423] Master only allowing authenticated 
frameworks to register
I0317 16:31:52.790534 32090 master.cpp:428] Master only allowing authenticated 
slaves to register
I0317 16:31:52.790541 32090 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/Rxm8q7/credentials'
I0317 16:31:52.790952 32090 master.cpp:468] Using default 'crammd5' 
authenticator
I0317 16:31:52.791162 32090 master.cpp:537] Using default 'basic' HTTP 
authenticator
I0317 16:31:52.791335 32090 master.cpp:571] Authorization enabled
I0317 16:31:52.791538 32092 whitelist_watcher.cpp:77] No whitelist given
I0317 16:31:52.791584 32089 hierarchical.cpp:144] Initialized hierarchical 
allocator process
I0317 16:31:52.794270 32089 master.cpp:1806] The newly elected leader is 
master@172.17.0.2:43540 with id 0196163d-91f7-4337-9dd7-9fef49e8cd75
I0317 16:31:52.794325 32089 master.cpp:1819] Elected as the leading master!
I0317 16:31:52.794342 32089 master.cpp:1508] Recovering from registrar
I0317 16:31:52.794919 32093 registrar.cpp:307] Recovering registrar
I0317 16:31:52.815771 32081 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 28.36765ms
I0317 16:31:52.815862 32081 replica.cpp:320] Persisted replica status to 
STARTING
I0317 16:31:52.816234 32096 recover.cpp:473] Replica is in STARTING status
I0317 16:31:52.818017 32082 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (9484)@172.17.0.2:43540
I0317 16:31:52.818408 32090 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0317 16:31:52.819475 32090 recover.cpp:564] Updating replica status to VOTING
I0317 16:31:52.840878 32090 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 21.14905ms
I0317 16:31:52.840965 32090 replica.cpp:320] Persisted replica status to VOTING
I0317 16:31:52.841151 32090 recover.cpp:578] Successfully joined the Paxos group
I0317 16:31:52.841361 32090 recover.cpp:462] Recover process terminated
I0317 16:31:52.842133 32090 log.cpp:659] Attempting to start the writer
I0317 16:31:52.843859 32090 replica.cpp:493] Replica received implicit promise 
request from (9485)@172.17.0.2:43540 with proposal 1
I0317 16:31:52.86

[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.

2016-03-18 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202293#comment-15202293
 ] 

Neil Conway commented on MESOS-4070:


I didn't have an explicit use case in mind, but not supporting negative hex 
numbers seems needlessly inconsistent. {{stout}} is intended to be a 
general-purpose library, so only supporting the exact functionality that 
happens to be used by Mesos at the moment is generally not a good rule of thumb.


> numify() handles negative numbers inconsistently.
> -
>
> Key: MESOS-4070
> URL: https://issues.apache.org/jira/browse/MESOS-4070
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Jie Yu
>Assignee: Yong Tang
>  Labels: tech-debt
>
> As pointed by [~neilc] in this review:
> https://reviews.apache.org/r/40988
> {noformat}
> Try<int> num2 = numify<int>("-10");
> EXPECT_SOME_EQ(-10, num2);
> // TODO(neilc): This is inconsistent with the handling of non-hex numbers.
> EXPECT_ERROR(numify<int>("-0x10"));
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4885) Unzip should force overwrite

2016-03-18 Thread Tomasz Janiszewski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Janiszewski reassigned MESOS-4885:
-

Assignee: Tomasz Janiszewski

> Unzip should force overwrite
> 
>
> Key: MESOS-4885
> URL: https://issues.apache.org/jira/browse/MESOS-4885
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Tomasz Janiszewski
>Assignee: Tomasz Janiszewski
>Priority: Trivial
>
> Consider the situation when a zip file is malformed and contains duplicated 
> files. When the fetcher downloads such a malformed zip file (e.g., dist zips 
> generated by gradle can contain duplicated files in the libs dir) and tries to 
> uncompress it, the deployment hangs in the staging phase because unzip prompts 
> whether the file should be replaced. unzip should overwrite the file or break 
> with an error.
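A hedged sketch of the fix (illustrative command construction, not the exact fetcher code): pass {{-o}} so unzip overwrites duplicate entries instead of prompting on stdin, which is what hangs a non-interactive fetch.

{code}
#include <string>

std::string unzipCommand(const std::string& archive, const std::string& dir)
{
  // -o: overwrite existing files without prompting.
  return "unzip -o -d '" + dir + "' '" + archive + "'";
}
{code}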



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.

2016-03-18 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4985:
--
Labels: mesosphere  (was: )

> Destroy a container while it's provisioning can lead to leaked provisioned 
> directories.
> ---
>
> Key: MESOS-4985
> URL: https://issues.apache.org/jira/browse/MESOS-4985
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0
>Reporter: Jie Yu
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
> Fix For: 0.28.1
>
>
> Here is the possible sequence of events:
> 1) containerizer->launch
> 2) provisioner->provision is called. it is fetching the image
> 3) executor registration timed out
> 4) containerizer->destroy is called
> 5) container->state is still in PREPARING
> 6) provisioner->destroy is called
> So we can be calling provisioner->destroy while provisioner->provision hasn't 
> finished yet. provisioner->destroy might just skip the cleanup since there's no 
> information about the container yet, and later, the provisioner will prepare the 
> root filesystem. This root filesystem will never be destroyed, since the destroy 
> has already finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4983) Segfault in ProcessTest.Spawn with GCC 6

2016-03-18 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202268#comment-15202268
 ] 

Neil Conway commented on MESOS-4983:


Doesn't appear to repro if {{--enable-optimize}} is not specified. This might 
also be a GCC bug.

> Segfault in ProcessTest.Spawn with GCC 6
> 
>
> Key: MESOS-4983
> URL: https://issues.apache.org/jira/browse/MESOS-4983
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, tests
>Reporter: Neil Conway
>  Labels: mesosphere
>
> {{ProcessTest.Spawn}} fails deterministically for me with GCC 6 and 
> {{--enable-optimize}}. Recent Arch Linux, GCC "6.0.0 20160227".
> {noformat}
> [ RUN  ] ProcessTest.Spawn
> *** Aborted at 145817 (unix time) try "date -d @145817" if you are 
> using GNU date ***
> PC: @   0x522926 SpawnProcess::initialize()
> *** SIGSEGV (@0x0) received by PID 11359 (TID 0x7faa6075f700) from PID 0; 
> stack trace: ***
> @ 0x7faa670dbe80 (unknown)
> @   0x522926 SpawnProcess::initialize()
> @   0x646fa6 process::ProcessManager::resume()
> @   0x6471ff 
> _ZNSt6thread11_State_implISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS7_EEEvEEE6_M_runEv
> @ 0x7faa6764a812 execute_native_thread_routine
> @ 0x7faa670d2424 start_thread
> @ 0x7faa65b04cbd __clone
> @0x0 (unknown)
> Makefile:1748: recipe for target 'check-local' failed
> make[5]: *** [check-local] Segmentation fault (core dumped)
> {noformat}
> Backtrace:
> {noformat}
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) 
> at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373
> 1373void GetValueAndDelete() const { delete this; }
> [Current thread is 1 (Thread 0x7faa6075f700 (LWP 11365))]
> (gdb) bt
> #0  testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) 
> at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373
> #1  testing::internal::FunctionMockerBase::InvokeWith(std::tuple<> 
> const&) (args=empty std::tuple, this=0x712a7c88) at 
> 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1530
> #2  testing::internal::FunctionMocker::Invoke() 
> (this=0x712a7c88) at 
> 3rdparty/gmock-1.7.0/include/gmock/gmock-generated-function-mockers.h:76
> #3  SpawnProcess::initialize (this=0x712a7c80) at 
> /mesos-2/3rdparty/libprocess/src/tests/process_tests.cpp:113
> #4  0x00646fa6 in process::ProcessManager::resume (this=0x25a2b60, 
> process=0x712a7d38) at /mesos-2/3rdparty/libprocess/src/process.cpp:2504
> #5  0x006471ff in process::ProcessManager:: atomic_bool&)>::operator() (__closure=, joining=...) at 
> /mesos-2/3rdparty/libprocess/src/process.cpp:2218
> #6  std::_Bind atomic_bool&)>(std::reference_wrapper 
> >)>::__call (__args=, this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:943
> #7  std::_Bind atomic_bool&)>(std::reference_wrapper 
> >)>::operator()<> (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:1002
> #8  
> std::_Bind_simple  atomic_bool&)>(std::reference_wrapper 
> >)>()>::_M_invoke<> (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:1400
> #9  
> std::_Bind_simple  atomic_bool&)>(std::reference_wrapper 
> >)>()>::operator() (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/functional:1389
> #10 
> std::thread::_State_impl  atomic_bool&)>(std::reference_wrapper >)>()> 
> >::_M_run(void) (this=) at 
> /home/vagrant/local/gcc/include/c++/6.0.0/thread:196
> #11 0x7faa6764a812 in std::(anonymous 
> namespace)::execute_native_thread_routine (__p=0x25a3bf0) at 
> ../../../../../gcc-trunk/libstdc++-v3/src/c++11/thread.cc:83
> #12 0x7faa670d2424 in start_thread () from /usr/lib/libpthread.so.0
> #13 0x7faa65b04cbd in clone () from /usr/lib/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.

2016-03-18 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202306#comment-15202306
 ] 

Cong Wang commented on MESOS-4070:
--

Understood, but the fact is that stout is never actually shipped separately as a 
library, right?


> numify() handles negative numbers inconsistently.
> -
>
> Key: MESOS-4070
> URL: https://issues.apache.org/jira/browse/MESOS-4070
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Jie Yu
>Assignee: Yong Tang
>  Labels: tech-debt
>
> As pointed by [~neilc] in this review:
> https://reviews.apache.org/r/40988
> {noformat}
> Try<int> num2 = numify<int>("-10");
> EXPECT_SOME_EQ(-10, num2);
> // TODO(neilc): This is inconsistent with the handling of non-hex numbers.
> EXPECT_ERROR(numify<int>("-0x10"));
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4112) Clean up libprocess gtest macros

2016-03-18 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202511#comment-15202511
 ] 

Yong Tang commented on MESOS-4112:
--

Hi [~mcypark], I created a review request:
https://reviews.apache.org/r/45070/
and would appreciate it if you have a chance to take a look.
In this review request I got items (1) and (2) from your list done. For items 
(3) and (4) I would like your confirmation before moving forward. Let me know 
what you think and I will get it done. Thanks!

> Clean up libprocess gtest macros
> 
>
> Key: MESOS-4112
> URL: https://issues.apache.org/jira/browse/MESOS-4112
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Michael Park
>Assignee: Yong Tang
>
> This ticket is regarding the libprocess gtest helpers in 
> {{3rdparty/libprocess/include/process/gtest.hpp}}.
> The pattern in this file seems to be a set of macros:
> * {{AWAIT_ASSERT_<op>_FOR}}
> * {{AWAIT_ASSERT_<op>}} -- default of 15 seconds
> * {{AWAIT_<op>_FOR}} -- alias for {{AWAIT_ASSERT_<op>_FOR}}
> * {{AWAIT_<op>}} -- alias for {{AWAIT_ASSERT_<op>}}
> * {{AWAIT_EXPECT_<op>_FOR}}
> * {{AWAIT_EXPECT_<op>}} -- default of 15 seconds
> (1) {{AWAIT_EQ_FOR}} should be added for completeness.
> (2) In {{gtest}}, we've got {{EXPECT_EQ}} as well as the {{bool}}-specific 
> versions: {{EXPECT_TRUE}} and {{EXPECT_FALSE}}.
> We should adopt this pattern in these helpers as well. Keeping the pattern 
> above in mind, the following are missing:
> * {{AWAIT_ASSERT_TRUE_FOR}}
> * {{AWAIT_ASSERT_TRUE}}
> * {{AWAIT_ASSERT_FALSE_FOR}}
> * {{AWAIT_ASSERT_FALSE}}
> * {{AWAIT_EXPECT_TRUE_FOR}}
> * {{AWAIT_EXPECT_FALSE_FOR}}
> (3) There are HTTP response related macros at the bottom of the file, e.g. 
> {{AWAIT_EXPECT_RESPONSE_STATUS_EQ}}, however these are missing their 
> {{ASSERT}} counterparts.
> (4) The reason for (3) presumably is because we reach for {{EXPECT}} over 
> {{ASSERT}} in general due to the test suite crashing behavior of {{ASSERT}}. 
> If this is the case, it would be worthwhile considering whether macros such 
> as {{AWAIT_READY}} should alias {{AWAIT_EXPECT_READY}} rather than 
> {{AWAIT_ASSERT_READY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)