[jira] [Updated] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")
[ https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Lin updated MESOS-4877: - Assignee: Gilbert Song (was: Shuai Lin) > Mesos containerizer can't handle top level docker image like "alpine" (must > use "library/alpine") > - > > Key: MESOS-4877 > URL: https://issues.apache.org/jira/browse/MESOS-4877 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.27.0, 0.27.1 >Reporter: Shuai Lin >Assignee: Gilbert Song > > This can be demonstrated with the {{mesos-execute}} command: > # Docker containerizer with image {{alpine}}: success > {code} > sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=docker > --name=just-a-test --command="sleep 1000" --master=localhost:5050 > {code} > # Mesos containerizer with image {{alpine}}: failure > {code} > sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=mesos > --name=just-a-test --command="sleep 1000" --master=localhost:5050 > {code} > # Mesos containerizer with image {{library/alpine}}: success > {code} > sudo ./build/src/mesos-execute --docker_image=library/alpine > --containerizer=mesos --name=just-a-test --command="sleep 1000" > --master=localhost:5050 > {code} > In the slave logs: > {code} > ea-4460-83 > 9c-838da86af34c-0007' > I0306 16:32:41.418269 3403 metadata_manager.cpp:159] Looking for image > 'alpine:latest' > I0306 16:32:41.418699 3403 registry_puller.cpp:194] Pulling image > 'alpine:latest' from > 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to > '/tmp/mesos-test > /store/docker/staging/ka7MlQ' > E0306 16:32:43.098131 3400 slave.cpp:3773] Container > '4bf9132d-9a57-4baa-a78c-e7164e93ace6' for executor 'just-a-test' of > framework 4f055c6f-1bea-4460-839c-838da86af34c-0 > 007 failed to start: Collect failed: Unexpected HTTP response '401 > Unauthorized > {code} > curl command executed: > {code} > $ sudo sysdig -A -p "*%evt.time %proc.cmdline" evt.type=execve and > proc.name=curl >16:42:53.198998042 curl -s -S -L -D - > https://registry-1.docker.io:443/v2/alpine/manifests/latest > 16:42:53.784958541 curl -s -S -L -D - > https://auth.docker.io/token?service=registry.docker.io&scope=repository:alpine:pull > 16:42:54.294192024 curl -s -S -L -D - -H Authorization: Bearer > 
eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCIsIng1YyI6WyJNSUlDTHpDQ0FkU2dBd0lCQWdJQkFEQUtCZ2dxaGtqT1BRUURBakJHTVVRd1FnWURWUVFERXp0Uk5Gb3pPa2RYTjBrNldGUlFSRHBJVFRSUk9rOVVWRmc2TmtGRlF6cFNUVE5ET2tGU01rTTZUMFkzTnpwQ1ZrVkJPa2xHUlVrNlExazFTekFlRncweE5UQTJNalV4T1RVMU5EWmFGdzB4TmpBMk1qUXhPVFUxTkRaYU1FWXhSREJDQmdOVkJBTVRPMGhHU1UwNldGZFZWam8yUVZkSU9sWlpUVEk2TTFnMVREcFNWREkxT2s5VFNrbzZTMVExUmpwWVRsSklPbFJMTmtnNlMxUkxOanBCUVV0VU1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXl2UzIvdEI3T3JlMkVxcGRDeFdtS1NqV1N2VmJ2TWUrWGVFTUNVMDByQjI0akNiUVhreFdmOSs0MUxQMlZNQ29BK0RMRkIwVjBGZGdwajlOWU5rL2pxT0JzakNCcnpBT0JnTlZIUThCQWY4RUJBTUNBSUF3RHdZRFZSMGxCQWd3QmdZRVZSMGxBREJFQmdOVkhRNEVQUVE3U0VaSlRUcFlWMVZXT2paQlYwZzZWbGxOTWpveldEVk1PbEpVTWpVNlQxTktTanBMVkRWR09saE9Va2c2VkVzMlNEcExWRXMyT2tGQlMxUXdSZ1lEVlIwakJEOHdQWUE3VVRSYU16cEhWemRKT2xoVVVFUTZTRTAwVVRwUFZGUllPalpCUlVNNlVrMHpRenBCVWpKRE9rOUdOemM2UWxaRlFUcEpSa1ZKT2tOWk5Vc3dDZ1lJS29aSXpqMEVBd0lEU1FBd1JnSWhBTXZiT2h4cHhrTktqSDRhMFBNS0lFdXRmTjZtRDFvMWs4ZEJOVGxuWVFudkFpRUF0YVJGSGJSR2o4ZlVSSzZ4UVJHRURvQm1ZZ3dZelR3Z3BMaGJBZzNOUmFvPSJdfQ.eyJhY2Nlc3MiOltdLCJhdWQiOiJyZWdpc3RyeS5kb2NrZXIuaW8iLCJleHAiOjE0NTcyODI4NzQsImlhdCI6MTQ1NzI4MjU3NCwiaXNzIjoiYXV0aC5kb2NrZXIuaW8iLCJqdGkiOiJaOGtyNXZXNEJMWkNIRS1IcVJIaCIsIm5iZiI6MTQ1NzI4MjU3NCwic3ViIjoiIn0.C2wtJq_P-m0buPARhmQjDfh6ztIAhcvgN3tfWIZEClSgXlVQ_sAQXAALNZKwAQL2Chj7NpHX--0GW-aeL_28Aw > https://registry-1.docker.io:443/v2/alpine/manifests/latest > {code} > Also got the same result with {{ubuntu}} docker image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
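For context: Docker Hub implicitly namespaces official images under {{library/}}, which is why the manifest URL and the token scope above ({{repository:alpine:pull}} rather than {{repository:library/alpine:pull}}) come back {{401}} for the bare name. A minimal sketch of the normalization a puller has to apply (hypothetical helper, not the actual Mesos registry puller code):
{code}
#include <iostream>
#include <string>

// Hypothetical helper: official images such as "alpine" live in the
// "library" namespace on Docker Hub, so a repository name without a
// '/' must be rewritten to "library/<name>" before it is used in the
// manifest URL and in the auth token scope.
std::string normalizeRepository(const std::string& repository)
{
  if (repository.find('/') == std::string::npos) {
    return "library/" + repository;
  }
  return repository;
}

int main()
{
  std::cout << normalizeRepository("alpine") << std::endl;         // library/alpine
  std::cout << normalizeRepository("library/alpine") << std::endl; // library/alpine
  return 0;
}
{code}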
[jira] [Commented] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")
[ https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202611#comment-15202611 ] Shuai Lin commented on MESOS-4877: -- Never mind! I'll reassign this ticket to you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")
[ https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199862#comment-15199862 ] Gilbert Song commented on MESOS-4877: - Sorry [~lins05], I addressed those TODOs before I saw this JIRA. Would you mind taking those patches? https://reviews.apache.org/r/44672/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4979) os::rmdir does not handle special files (e.g., device, socket).
[ https://issues.apache.org/jira/browse/MESOS-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4979: -- Component/s: stout > os::rmdir does not handle special files (e.g., device, socket). > --- > > Key: MESOS-4979 > URL: https://issues.apache.org/jira/browse/MESOS-4979 > Project: Mesos > Issue Type: Bug > Components: stout >Affects Versions: 0.19.0, 0.20.0, 0.21.0, 0.22.0, 0.23.0, 0.24.0, 0.25.0, > 0.26.0, 0.27.0, 0.27.1, 0.27.2 >Reporter: Jie Yu >Assignee: Jojy Varghese >Priority: Blocker > Labels: mesosphere > Fix For: 0.28.0 > > > Stout os::rmdir does not handle special files like device files or socket > files. This could cause failures when garbage collecting sandboxes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
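The fix has to treat every non-directory entry uniformly, special files included. A self-contained sketch of the idea (not the stout implementation itself; the helper name is mine):
{code}
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

#include <string>

// Minimal sketch: recursively delete `path`. Anything that is not a
// directory -- regular files, symlinks, sockets, device and FIFO
// special files -- is removed with ::unlink(); only real directories
// are recursed into and ::rmdir()'d. lstat() (not stat()) is used so
// symlinks are never followed.
static bool removeAll(const std::string& path)
{
  struct stat s;
  if (::lstat(path.c_str(), &s) < 0) {
    return false;
  }

  if (!S_ISDIR(s.st_mode)) {
    return ::unlink(path.c_str()) == 0;
  }

  DIR* dir = ::opendir(path.c_str());
  if (dir == nullptr) {
    return false;
  }

  struct dirent* entry;
  while ((entry = ::readdir(dir)) != nullptr) {
    const std::string name = entry->d_name;
    if (name == "." || name == "..") {
      continue;
    }
    if (!removeAll(path + "/" + name)) {
      ::closedir(dir);
      return false;
    }
  }

  ::closedir(dir);
  return ::rmdir(path.c_str()) == 0;
}
{code}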
[jira] [Updated] (MESOS-3573) Mesos does not kill orphaned docker containers
[ https://issues.apache.org/jira/browse/MESOS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-3573: -- Shepherd: Timothy Chen > Mesos does not kill orphaned docker containers > -- > > Key: MESOS-3573 > URL: https://issues.apache.org/jira/browse/MESOS-3573 > Project: Mesos > Issue Type: Bug > Components: docker, slave >Reporter: Ian Babrou >Assignee: Anand Mazumdar > Labels: mesosphere > > After upgrade to 0.24.0 we noticed hanging containers appearing. Looks like > there were changes between 0.23.0 and 0.24.0 that broke cleanup. > Here's how to trigger this bug: > 1. Deploy app in docker container. > 2. Kill corresponding mesos-docker-executor process > 3. Observe hanging container > Here are the logs after kill: > {noformat} > slave_1| I1002 12:12:59.362002 7791 docker.cpp:1576] Executor for > container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' has exited > slave_1| I1002 12:12:59.362284 7791 docker.cpp:1374] Destroying > container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' > slave_1| I1002 12:12:59.363404 7791 docker.cpp:1478] Running docker stop > on container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' > slave_1| I1002 12:12:59.363876 7791 slave.cpp:3399] Executor > 'sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c' of framework > 20150923-122130-2153451692-5050-1- terminated with signal Terminated > slave_1| I1002 12:12:59.367570 7791 slave.cpp:2696] Handling status > update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- from @0.0.0.0:0 > slave_1| I1002 12:12:59.367842 7791 slave.cpp:5094] Terminating task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c > slave_1| W1002 12:12:59.368484 7791 docker.cpp:986] Ignoring updating > unknown container: f083aaa2-d5c3-43c1-b6ba-342de8829fa8 > slave_1| I1002 12:12:59.368671 7791 status_update_manager.cpp:322] > Received status update TASK_FAILED (UUID: > 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- > slave_1| I1002 12:12:59.368741 7791 status_update_manager.cpp:826] > Checkpointing UPDATE for status update TASK_FAILED (UUID: > 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- > slave_1| I1002 12:12:59.370636 7791 status_update_manager.cpp:376] > Forwarding update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) > for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- to the slave > slave_1| I1002 12:12:59.371335 7791 slave.cpp:2975] Forwarding the > update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- to master@172.16.91.128:5050 > slave_1| I1002 12:12:59.371908 7791 slave.cpp:2899] Status update > manager successfully handled status update TASK_FAILED (UUID: > 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- > master_1 | I1002 12:12:59.37204711 master.cpp:4069] Status update > TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- from slave > 20151002-120829-2153451692-5050-1-S0 at slave(1)@172.16.91.128:5051 > (172.16.91.128) > master_1 | 
I1002 12:12:59.37253411 master.cpp:4108] Forwarding status > update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task > sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- > master_1 | I1002 12:12:59.37301811 master.cpp:5576] Updating the latest > state of task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework > 20150923-122130-2153451692-5050-1- to TASK_FAILED > master_1 | I1002 12:12:59.37344711 hierarchical.hpp:814] Recovered > cpus(*):0.1; mem(*):16; ports(*):[31685-31685] (total: cpus(*):4; > mem(*):1001; disk(*):52869; ports(*):[31000-32000], allocated: > cpus(*):8.32667e-17) on slave 20151002-120829-2153451692-5050-1-S0 from > framework 20150923-122130-2153451692-5050-1- > {noformat} > Another issue: if you restart mesos-slave on the host with orphaned docker > containers, they are not getting killed. This was the case before and I hoped > for this trick to kill hanging containers, but it doesn't work now. > Marking this as critical because it hoards cluster resources and blocks > scheduling. --
[jira] [Commented] (MESOS-4033) Add a commit hook for non-ascii characters
[ https://issues.apache.org/jira/browse/MESOS-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201985#comment-15201985 ] haosdent commented on MESOS-4033: - I have a question about this ticket: should we avoid only those zero-width characters, or all non-ASCII characters? If we disallow non-ASCII characters, we cannot use emoji or languages that contain non-ASCII characters. For powered-by-mesos.md and user-groups.html.md, this hard limit seems inconvenient. > Add a commit hook for non-ascii characters > --- > > Key: MESOS-4033 > URL: https://issues.apache.org/jira/browse/MESOS-4033 > Project: Mesos > Issue Type: Task >Reporter: Alexander Rukletsov >Assignee: Yong Tang >Priority: Minor > Labels: mesosphere > > Non-ascii characters invisible in some editors may sneak into the codebase > (see e.g. https://reviews.apache.org/r/40799/). To avoid this, a pre-commit > hook can be added. > Quick searching suggested a simple perl script: > https://superuser.com/questions/417305/how-can-i-identify-non-ascii-characters-from-the-shell -- This message was sent by Atlassian JIRA (v6.3.4#6332)
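To make the distinction in the comment concrete: a zero-width check only flags invisible characters, while a blanket non-ASCII check also rejects emoji and non-English text. A sketch of both checks (hypothetical helpers; the byte sequences are the UTF-8 encodings of U+200B and U+FEFF):
{code}
#include <iostream>
#include <string>

// Rejects everything outside 7-bit ASCII -- including emoji and
// non-English text, which is the inconvenience raised above.
bool containsNonAscii(const std::string& s)
{
  for (unsigned char c : s) {
    if (c > 0x7F) {
      return true;
    }
  }
  return false;
}

// Rejects only invisible zero-width characters: U+200B ZERO WIDTH
// SPACE (E2 80 8B) and U+FEFF ZERO WIDTH NO-BREAK SPACE (EF BB BF).
bool containsZeroWidth(const std::string& s)
{
  return s.find("\xE2\x80\x8B") != std::string::npos ||
         s.find("\xEF\xBB\xBF") != std::string::npos;
}

int main()
{
  const std::string line = "hello\xE2\x80\x8Bworld"; // hidden U+200B
  std::cout << containsNonAscii(line) << " "
            << containsZeroWidth(line) << std::endl; // prints "1 1"
  return 0;
}
{code}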
[jira] [Commented] (MESOS-4621) --disable-optimize triggers optimized builds.
[ https://issues.apache.org/jira/browse/MESOS-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197746#comment-15197746 ] Yong Tang commented on MESOS-4621: -- I took a look at the review request. It feels too big to be reviewed in one pass; maybe it would be better to decompose it into several RRs? > --disable-optimize triggers optimized builds. > - > > Key: MESOS-4621 > URL: https://issues.apache.org/jira/browse/MESOS-4621 > Project: Mesos > Issue Type: Bug >Reporter: Till Toenshoff >Assignee: Yong Tang >Priority: Minor > > The toggle-logic of the build configuration argument {{optimize}} appears to > be implemented incorrectly. When using the perfectly legal invocation: > {noformat} > ../configure --disable-optimize > {noformat} > what you get is optimization ({{O2}}) enabled. > {noformat} > ccache g++ -Qunused-arguments -fcolor-diagnostics > -DPACKAGE_NAME=\"libprocess\" -DPACKAGE_TARNAME=\"libprocess\" > -DPACKAGE_VERSION=\"0.0.1\" -DPACKAGE_STRING=\"libprocess\ 0.0.1\" > -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"libprocess\" > -DVERSION=\"0.0.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 > -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 > -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 > -DLT_OBJDIR=\".libs/\" -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 > -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 > -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBCURL=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 > -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBDL=1 -I. > -I../../../../3rdparty/libprocess/3rdparty > -I../../../../3rdparty/libprocess/3rdparty/stout/include -Iprotobuf-2.5.0/src > -Igmock-1.7.0/gtest/include -Igmock-1.7.0/include -isystem boost-1.53.0 > -Ipicojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -Iglog-0.3.3/src > -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include > -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 > -I/usr/include/apr-1.0 -O2 -Wno-unused-local-typedef -std=c++11 > -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT > stout_tests-flags_tests.o -MD -MP -MF .deps/stout_tests-flags_tests.Tpo -c -o > stout_tests-flags_tests.o `test -f 'stout/tests/flags_tests.cpp' || echo > '../../../../3rdparty/libprocess/3rdparty/'`stout/tests/flags_tests.cpp > {noformat} > It seems more straightforward to actually disable optimizing for the above > argument. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4959) Enable support for mesos-style assertion macros in clang-tidy core analyzers
[ https://issues.apache.org/jira/browse/MESOS-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-4959: Description: clang-tidy has a number of core analyzers that analyze control flow to make sure that e.g., dereferenced pointers are not null. The clang control flow analysis framework uses e.g., the presence of {{assert}} to prune certain edges from the control flow graph. Mesos uses a number of custom assertion macros from glog which are not understood by these analyzers. We should find a way to add support for these macros, either by redefining these macros in ways clang static analysis can understand, or by extending the framework. was: clang-tidy has a number of core analyzers that analyze control flow to make sure that e.g., dereferenced pointers are not null. The clang control flow analysis framework uses e.g., the presence of `assert` to prune certain edges from the control flow graph. Mesos uses a number of custom assertion macros from glog which are not understood by these analyzers. We should find a way to add support for these macros, either by redefining these macros in ways clang static analysis can understand, or by extending the framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
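One shape the "redefining these macros" option could take (a sketch, not an agreed-upon patch; it assumes the {{__clang_analyzer__}} macro that the clang static analyzer defines, and deliberately ignores glog's streaming support):
{code}
#include <cassert>

// For analysis builds only, map a glog-style assertion macro onto
// plain assert(), which the analyzer understands and uses to prune
// the failing branch from the control flow graph.
//
// NOTE: glog's real CHECK also supports streaming ("CHECK(x) << msg"),
// which this simplified replacement ignores.
#ifdef __clang_analyzer__
  #undef CHECK
  #define CHECK(condition) assert(condition)
#endif
{code}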
[jira] [Commented] (MESOS-4740) Improve master metrics/snapshot performance
[ https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199863#comment-15199863 ] Michael Park commented on MESOS-4740: - {noformat} commit 8be869abab468706274e435247e8e22eef0dd0a0 Author: Cong Wang Date: Thu Mar 17 12:14:00 2016 -0400 Updated `/metrics/snapshot` endpoint to use `jsonify`. Review: https://reviews.apache.org/r/44675/ {noformat} NOTE: strictly speaking, this was committed under https://issues.apache.org/jira/browse/MESOS-4732 > Improve master metrics/snapshot performance > -- > > Key: MESOS-4740 > URL: https://issues.apache.org/jira/browse/MESOS-4740 > Project: Mesos > Issue Type: Task >Reporter: Cong Wang >Assignee: Cong Wang > > [~drobinson] noticed that retrieving metrics/snapshot statistics can be very > inefficient. > {noformat} > [user@server ~]$ time curl -s localhost:5050/metrics/snapshot > real 0m35.654s > user 0m0.019s > sys 0m0.011s > {noformat} > MESOS-1287 introduced a timeout parameter for this query, but metric > collectors like ours are not aware of such a URL-specific parameter, so we > need to: > 1) always have a timeout, with some sensible default value; > 2) investigate why master metrics/snapshot can take such a long time to > complete under load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4976) Reject RESERVE on revocable resources
[ https://issues.apache.org/jira/browse/MESOS-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201188#comment-15201188 ] Jian Qiu commented on MESOS-4976: - It is already validated in the master: https://github.com/apache/mesos/blob/master/src/master/validation.cpp#L151 Not sure whether it still needs to be checked in the allocator. > Reject RESERVE on revocable resources > - > > Key: MESOS-4976 > URL: https://issues.apache.org/jira/browse/MESOS-4976 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Klaus Ma > > In {{Resources::apply}}, we did not check whether the resources are revocable > or not. It does not make sense to reserve revocable resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
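The linked check boils down to the following shape (a simplified sketch of the code behind the link, not a verbatim copy; compiles against the Mesos tree):
{code}
#include <mesos/resources.hpp>

#include <stout/error.hpp>
#include <stout/foreach.hpp>
#include <stout/option.hpp>
#include <stout/stringify.hpp>

using mesos::Resource;
using mesos::Resources;

// Reject a RESERVE operation that touches any revocable resource.
Option<Error> validateReserve(const Resources& resources)
{
  foreach (const Resource& resource, resources) {
    if (Resources::isRevocable(resource)) {
      return Error(
          "A reserve operation was attempted on revocable resource '" +
          stringify(resource) + "', but reserving revocable resources "
          "is not supported");
    }
  }

  return None();
}
{code}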
[jira] [Commented] (MESOS-4963) Compile error with GCC 6
[ https://issues.apache.org/jira/browse/MESOS-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199232#comment-15199232 ] Benjamin Bannier commented on MESOS-4963: - If one turns off silent rules one could see {code} /Applications/Xcode.app/Contents/Developer/usr/bin/make all-am /bin/sh ../libtool --tag=CXX --mode=compile ccache g++-6 -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.29.0\" -DPACKAGE_STRING=\"mesos\ 0.29.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.29.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. -I../../src -Wall -Werror -DLIBDIR=\"/usr/local/lib\" -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DPKGDATADIR=\"/usr/local/share/mesos\" -DPKGMODULEDIR=\"/usr/local/lib/mesos/modules\" -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -isystem ../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb-1.4/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I/Users/XYZ/src/homebrew/opt/openssl/include -I/Users/XYZ/src/homebrew/opt/libevent/include -I/Users/XYZ/src/homebrew/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O2 -Wno-unused-local-typedefs -Wno-maybe-uninitialized -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT appc/libmesos_no_3rdparty_la-spec.lo -MD -MP -MF appc/.deps/libmesos_no_3rdparty_la-spec.Tpo -c -o appc/libmesos_no_3rdparty_la-spec.lo `test -f 'appc/spec.cpp' || echo '../../src/'`appc/spec.cpp libtool: compile: ccache g++-6 -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.29.0\" "-DPACKAGE_STRING=\"mesos 0.29.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.29.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
-I../../src -Wall -Werror -DLIBDIR=\"/usr/local/lib\" -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DPKGDATADIR=\"/usr/local/share/mesos\" -DPKGMODULEDIR=\"/usr/local/lib/mesos/modules\" -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -isystem ../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb-1.4/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I/Users/XYZ/src/homebrew/opt/openssl/include -I/Users/XYZ/src/homebrew/opt/libevent/include -I/Users/XYZ/src/homebrew/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O2 -Wno-unused-local-typedefs -Wno-maybe-uninitialized -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT appc/libmesos_no_3rdparty_la-spec.lo -MD -MP -MF appc/.deps/libmesos_no_3rdparty_la-spec.Tpo -c ../../src/appc/spec.cpp -fno-common -DPIC -o appc/.libs/libmesos_no_3rdparty_la-spec.o In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/os/shell.hpp:22:0, from ../../3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:56, from ../../src/appc/spec.cpp:17: ../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/shell.hpp: In instantiation of 'int os::execlp(const char*, T ...) [with T = {const char*, const char*, const char*, char*}]': ../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/fork.hpp:371:5
[jira] [Commented] (MESOS-4621) --disable-optimize triggers optimized builds.
[ https://issues.apache.org/jira/browse/MESOS-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197471#comment-15197471 ] Yong Tang commented on MESOS-4621: -- Added a review request: https://reviews.apache.org/r/44911/ The issue was that, in the original configure.ac, {code} AC_ARG_ENABLE([optimize], AS_HELP_STRING(...), [enable_optimize=yes], []) {code} The third field is "action-if-present", and it is invoked for any of # --enable-optimize # --enable-optimize=yes # --enable-optimize=no # --disable-optimize Yet the original code always set "[enable_optimize=yes]". This can be fixed by simply replacing "[enable_optimize=yes]" with "[]", since by default AC_ARG_ENABLE will set the value of "enable_optimize" correctly anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3902) The Location header when non-leading master redirects to leading master is incomplete.
[ https://issues.apache.org/jira/browse/MESOS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199959#comment-15199959 ] Vinod Kone commented on MESOS-3902: --- Great to hear. There are no redirect-related tests because bringing up two live masters is not currently possible with our testing abstractions :/ Not sure if that changed with the recent refactor that [~kaysoky] did. Feel free to send the review without a test. Also you can reach me on #mesos IRC for quick questions. > The Location header when non-leading master redirects to leading master is > incomplete. > -- > > Key: MESOS-3902 > URL: https://issues.apache.org/jira/browse/MESOS-3902 > Project: Mesos > Issue Type: Bug > Components: HTTP API, master >Affects Versions: 0.25.0 > Environment: 3 masters, 10 slaves >Reporter: Ben Whitehead >Assignee: Ashwin Murthy > Labels: mesosphere > > The master now sets a location header, but it's incomplete. The path of the > URL isn't set (the redirect presumably needs to carry the original request > path, e.g. {{Location: //127.1.0.1:5050/api/v1/scheduler}}). Consider an example: > {code} > > cat /tmp/subscribe-1072944352375841456 | httpp POST > > 127.1.0.3:5050/api/v1/scheduler Content-Type:application/x-protobuf > POST /api/v1/scheduler HTTP/1.1 > Accept: application/json > Accept-Encoding: gzip, deflate > Connection: keep-alive > Content-Length: 123 > Content-Type: application/x-protobuf > Host: 127.1.0.3:5050 > User-Agent: HTTPie/0.9.0 > +-+ > | NOTE: binary data not shown in terminal | > +-+ > HTTP/1.1 307 Temporary Redirect > Content-Length: 0 > Date: Fri, 26 Feb 2016 00:54:41 GMT > Location: //127.1.0.1:5050 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4985) Destroying a container while it's provisioning can lead to leaked provisioned directories.
Jie Yu created MESOS-4985: - Summary: Destroying a container while it's provisioning can lead to leaked provisioned directories. Key: MESOS-4985 URL: https://issues.apache.org/jira/browse/MESOS-4985 Project: Mesos Issue Type: Bug Affects Versions: 0.28.0 Reporter: Jie Yu Priority: Critical Fix For: 0.28.1 Here is the possible sequence of events: 1) containerizer->launch 2) provisioner->provision is called; it is fetching the image 3) the executor registration times out 4) containerizer->destroy is called 5) container->state is still in PREPARING 6) provisioner->destroy is called So we can be calling provisioner->destroy while provisioner->provision hasn't finished yet. provisioner->destroy might just skip since there's no information about the container yet, and later, the provisioner will prepare the root filesystem. This root filesystem will never be destroyed because destroy has already finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
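A self-contained sketch of the fix the sequence above implies (illustrative only; std::future stands in for libprocess futures): destroy() has to wait out an in-flight provision() and clean up its result, rather than returning early while the rootfs is still being prepared.
{code}
#include <future>
#include <iostream>
#include <string>

enum class State { PREPARING, RUNNING, DESTROYED };

struct Container
{
  State state = State::PREPARING;
  std::future<std::string> provisioning; // yields the rootfs path
};

void destroy(Container& container)
{
  if (container.state == State::PREPARING) {
    // Block until provisioning completes, so the rootfs cannot appear
    // after we have "finished" destroying the container.
    const std::string rootfs = container.provisioning.get();
    std::cout << "removing provisioned rootfs " << rootfs << std::endl;
  }
  container.state = State::DESTROYED;
}

int main()
{
  Container container;
  container.provisioning = std::async(std::launch::async, [] {
    return std::string("/tmp/provisioner/containers/xyz/rootfs");
  });

  destroy(container); // safe even though provisioning may be in flight
  return 0;
}
{code}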
[jira] [Commented] (MESOS-1607) Introduce optimistic offers.
[ https://issues.apache.org/jira/browse/MESOS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199503#comment-15199503 ] Klaus Ma commented on MESOS-1607: - I updated this JIRA to align with its description. The feature we're working on is Oversubscription for Reservation, which has been moved to MESOS-4967. Thanks, Klaus > Introduce optimistic offers. > > > Key: MESOS-1607 > URL: https://issues.apache.org/jira/browse/MESOS-1607 > Project: Mesos > Issue Type: Epic > Components: allocation, framework, master >Reporter: Benjamin Hindman >Assignee: Artem Harutyunyan > Labels: mesosphere > Attachments: optimisitic-offers.pdf > > > *Background* > The current implementation of resource offers only enables a single framework > scheduler to make scheduling decisions for some available resources at a > time. In some circumstances, this is good, i.e., when we don't want other > framework schedulers to have access to some resources. However, in other > circumstances, there are advantages to letting multiple framework schedulers > attempt to make scheduling decisions for the _same_ allocation of resources > in parallel. > If you think about this from a "concurrency control" perspective, the current > implementation of resource offers is _pessimistic_: the resources contained > within an offer are _locked_ until the framework scheduler that they were > offered to launches tasks with them or declines them. In addition to making > pessimistic offers we'd like to give out _optimistic_ offers, where the same > resources are offered to multiple framework schedulers at the same time, and > framework schedulers "compete" for those resources on a > first-come-first-serve basis (i.e., the first to launch a task "wins"). We've > always reserved the right to rescind resource offers using the 'rescind' > primitive in the API, and a framework scheduler should be prepared to launch > a task and have those tasks go lost because another framework already started > to use those resources. > *Feature* > We plan to take a step towards optimistic offers, by introducing primitives > that allow resources to be offered to multiple frameworks at once. At first, > we will use these primitives to optimistically allocate resources that are > reserved for a particular framework/role but have not been allocated by that > framework/role. > The work with optimistic offers will closely resemble the existing > oversubscription feature. Optimistically offered resources are likely to be > considered "revocable resources" (the concept that using resources not > reserved for you means you might get those resources revoked). In effect, we > may create something like a "spot" market for unused resources, driving > up utilization by letting frameworks that are willing to use revocable > resources run tasks. > *Future Work* > This ticket tracks the introduction of some aspects of optimistic offers. > Taken to the limit, one could imagine always making optimistic resource > offers. This bears a striking resemblance to the Google Omega model (an > isomorphism even). However, being able to configure what resources should be > allocated optimistically and what resources should be allocated > pessimistically gives even more control to a datacenter/cluster operator that > might want to, for example, never let multiple frameworks (roles) compete for > some set of resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4959) Enable support for mesos-style assertion macros in clang-tidy core analyzers
Benjamin Bannier created MESOS-4959: --- Summary: Enable support for mesos-style assertion macros in clang-tidy core analyzers Key: MESOS-4959 URL: https://issues.apache.org/jira/browse/MESOS-4959 Project: Mesos Issue Type: Improvement Reporter: Benjamin Bannier clang-tidy has a number of core analyzers that analyze control flow to make sure that e.g., dereferenced pointers are not null. The clang control flow analysis framework uses e.g., the presence of `assert` to prune certain edges from the control flow graph. Mesos uses a number of custom assertion macros from glog which are not understood by these analyzers. We should find a way to add support for these macros, either by redefining these macros in ways clang static analysis can understand, or by extending the framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4828) XFS disk quota isolator
[ https://issues.apache.org/jira/browse/MESOS-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198694#comment-15198694 ] James Peach commented on MESOS-4828: Based on feedback from [~xujyan], I discarded the previous review request and restructured the branch into a series of smaller commits. I hope that this makes it easier to digest. https://reviews.apache.org/r/44945/ https://reviews.apache.org/r/44946/ https://reviews.apache.org/r/44947/ https://reviews.apache.org/r/44948/ https://reviews.apache.org/r/44949/ https://reviews.apache.org/r/44950/ > XFS disk quota isolator > --- > > Key: MESOS-4828 > URL: https://issues.apache.org/jira/browse/MESOS-4828 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: James Peach >Assignee: James Peach > > Implement a disk resource isolator using XFS project quotas. Compared to the > {{posix/disk}} isolator, this doesn't need to scan the filesystem > periodically, and applications receive an {{ENOSPC}} error instead of being > summarily killed. > This initial implementation only isolates sandbox directory resources, since > isolation doesn't have any visibility into the lifecycle of volumes, > which is needed to assign and track project IDs. > The build dependencies for this are the XFS headers (from xfsprogs-devel) and > libblkid. We need libblkid or the equivalent to map filesystem paths to block > devices in order to apply quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
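For readers unfamiliar with XFS project quotas, the isolator builds on two primitives: tagging a directory with a project ID, and setting a block limit for that project on the backing device. A hedged sketch (helper name, signature, and error handling are mine; the real code lives in the reviews above; on older kernels the {{fsxattr}} definitions come from the XFS headers in xfsprogs-devel rather than {{linux/fs.h}}):
{code}
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/quota.h>
#include <sys/types.h>
#include <unistd.h>

#include <linux/dqblk_xfs.h> // Q_XSETQLIM, fs_disk_quota.
#include <linux/fs.h>        // fsxattr, FS_IOC_FSGETXATTR (kernels >= 4.5).

#ifndef PRJQUOTA
#define PRJQUOTA 2 // Project quota type; not exposed by all libc headers.
#endif

bool setProjectQuota(
    const char* directory,
    const char* blockDevice, // located via libblkid in the real isolator
    unsigned int projectId,
    unsigned long long limitBytes)
{
  // Step 1: tag the sandbox directory with the project ID; the
  // PROJINHERIT flag makes files created inside it inherit the ID.
  int fd = ::open(directory, O_RDONLY | O_DIRECTORY);
  if (fd < 0) {
    return false;
  }

  struct fsxattr attr;
  if (::ioctl(fd, FS_IOC_FSGETXATTR, &attr) < 0) {
    ::close(fd);
    return false;
  }

  attr.fsx_projid = projectId;
  attr.fsx_xflags |= FS_XFLAG_PROJINHERIT;

  if (::ioctl(fd, FS_IOC_FSSETXATTR, &attr) < 0) {
    ::close(fd);
    return false;
  }

  ::close(fd);

  // Step 2: set a hard block limit (in 512-byte basic blocks) for the
  // project on the backing block device.
  struct fs_disk_quota quota = {};
  quota.d_version = FS_DQUOT_VERSION;
  quota.d_flags = FS_PROJ_QUOTA;
  quota.d_id = projectId;
  quota.d_fieldmask = FS_DQ_BHARD;
  quota.d_blk_hardlimit = limitBytes / 512;

  return ::quotactl(
      QCMD(Q_XSETQLIM, PRJQUOTA),
      blockDevice,
      projectId,
      reinterpret_cast<caddr_t>(&quota)) == 0;
}
{code}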
[jira] [Commented] (MESOS-4969) improve overlayfs detection
[ https://issues.apache.org/jira/browse/MESOS-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200046#comment-15200046 ] haosdent commented on MESOS-4969: - How about using {{lsmod}} to detect this? CentOS 7 is the same in this respect. > improve overlayfs detection > --- > > Key: MESOS-4969 > URL: https://issues.apache.org/jira/browse/MESOS-4969 > Project: Mesos > Issue Type: Bug > Components: isolation, volumes >Reporter: James Peach >Priority: Minor > > On my Fedora 23, overlayfs is a module that is not loaded by default > (attempting to mount an overlayfs automatically triggers the module loading). > However {{mesos-slave}} won't start until I manually load the module since it > is not listed in {{/proc/filesystems}} until it is loaded. > It would be nice if there was a more reliable way to determine overlayfs > support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4971) Add unit tests for MOUNT persistent volumes
[ https://issues.apache.org/jira/browse/MESOS-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4971: --- Assignee: Joris Van Remoortere > Add unit tests for MOUNT persistent volumes > --- > > Key: MESOS-4971 > URL: https://issues.apache.org/jira/browse/MESOS-4971 > Project: Mesos > Issue Type: Task > Components: tests >Reporter: Neil Conway >Assignee: Joris Van Remoortere > Labels: mesosphere, persistent-volumes, test > > We currently have unit tests for root and {{PATH}} disk types, but not > {{MOUNT}} disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4772) TaskInfo/ExecutorInfo should include owner information
[ https://issues.apache.org/jira/browse/MESOS-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199191#comment-15199191 ] Adam B commented on MESOS-4772: --- Sorry for the hasty decision to add a TaskInfo.owner. After more discussion, [~vinodkone] and I now favor a namespace approach (similar to Marathon's app-groups or [~jdef]'s suggestion), instead of the previously discussed task-owner approach. Also notice that I've linked this ticket under the Epic MESOS-4931 where [~js84] is designing authorization-based filtering of state endpoints. Let's discard [~nfnt]'s previous patch while we confirm what we really want to do. Allow me to explain some of the recent thinking. What we have now: flat roles 1. Each framework can only have a single role, but multiple frameworks can share a role. A role does not have a parent role. 2. Resources may be reserved for a role, and volumes created only within the resources reserved for a role. 3. Quota and DRF weights may be assigned to each role, and resources for a framework are allocated to its role. 4. Problem: Every http user can see every framework/role's tasks and sandbox data. We could (and should) add coarse-grained access control: 1. Add Authz around state json to restrict which users can see tasks/executors/frameworks from which roles. 2. Add Authz around sandbox access to restrict which users can see executor sandboxes belonging to which roles. But coarse-grained per-role access control doesn't help the multi-user framework use case, since every user in the framework can see the entire role/framework. We could (but shouldn't) have multi-user frameworks pass TaskInfo.owner, as previously proposed: 1. Add Authz around state json to restrict which users can see tasks/executors/frameworks owned by which (other) users. 2. Add Authz around sandbox access to restrict which users can see executor sandboxes owned by which (other) users. TaskInfo.owner provides appropriate fine-grained access control if you only want a user to ever see their own tasks. But as soon as you want to share one user's task/sandbox with another, you are forced to grant the second user access to all tasks/sandboxes owned by the first user. This is not flexible enough for most real-world use cases, where users work on some projects together but not others. In the long run, we've discussed hierarchies both above and below roles. 1. Hierarchical framework groups (outer roles): Although a framework only has a single role, that role may belong to another role, up a role hierarchy. 2. These hierarchical outer roles would be great for organizing quota and DRF weights across many frameworks. 3. Hierarchical task groups (inner roles): Although a framework has a single role, it can dynamically create roles underneath its role in the hierarchy. 4. These inner roles would be great for organizing quota/weights between groups of apps/projects/jobs in a framework. 5. These inner roles would be ideal for associating reserved resources and volumes to specific apps/projects/jobs, so the framework has less bookkeeping to do. 6. These inner roles could be used to grant users visibility of one group of apps/projects/jobs without granting access to all others with the same owner/role. The above situation would be great, but there are many pieces to that puzzle, and it'll take us a while to get there. We could start by introducing TaskInfo.group (name TBD) for visibility, but not reservations/volumes/quota/DRF. 1. 
Multi-user frameworks would pass Mesos the app/project/job name (including hierarchy) when creating a task. 2. Add Authz around state json to restrict which users can see tasks/executors/frameworks from which "role:taskgroup". 3. Add Authz around sandbox access to restrict which users can see executor sandboxes belonging to which "role:taskgroup". This model allows admins to assign users to projects within frameworks as they please, and those users will only be able to see the tasks/sandboxes for their projects. A particular project may be visible only to a single user, or to multiple users. Sharing one project between two users has no impact on their access to other projects. We could easily add the new TaskInfo.group field now, but the difficult part (besides naming) is figuring out how we might integrate the concept with hierarchical roles in the future. Can we base ACLs on "role:taskgroup" now, and then change TaskInfo.group to TaskInfo.role and base ACLs on "role" in the future? Or does introducing a separate "taskgroup" concept now prevent us from incorporating it into "role" in the future? > TaskInfo/ExecutorInfo should include owner information > -- > > Key: MESOS-4772 > URL: https://issues.apache.org/jira/browse/MESOS-4772 > Project: Mesos > Issue Type: Improvement > Components:
[jira] [Commented] (MESOS-4969) improve overlayfs detection
[ https://issues.apache.org/jira/browse/MESOS-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200118#comment-15200118 ] Yan Xu commented on MESOS-4969: --- {{modprobe -q overlay}} and checking the exit status sounds reasonable to me. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
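Combining the two suggestions in this thread, a detection helper might look like the following (hypothetical sketch, not the actual slave code; {{modprobe}} needs the privileges the slave typically runs with):
{code}
#include <cstdlib>
#include <fstream>
#include <string>

// Check /proc/filesystems first; if overlayfs is merely an unloaded
// module, fall back to asking modprobe to load it and checking the
// exit status. The in-tree module is named "overlay" (since 3.18);
// some older out-of-tree versions were named "overlayfs", so both
// names are tried.
bool supportsOverlayfs()
{
  std::ifstream filesystems("/proc/filesystems");
  std::string line;
  while (std::getline(filesystems, line)) {
    if (line.find("overlay") != std::string::npos) {
      return true; // matches both "overlay" and "overlayfs"
    }
  }

  // Not listed: it may still be available as a loadable module.
  return std::system("modprobe -q overlay") == 0 ||
         std::system("modprobe -q overlayfs") == 0;
}
{code}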
[jira] [Updated] (MESOS-4878) Task stuck in TASK_STAGING when docker fetcher failed to fetch the image
[ https://issues.apache.org/jira/browse/MESOS-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4878: -- Affects Version/s: (was: 0.27.1) (was: 0.27.0) > Task stuck in TASK_STAGING when docker fetcher failed to fetch the image > > > Key: MESOS-4878 > URL: https://issues.apache.org/jira/browse/MESOS-4878 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.28.0 >Reporter: Shuai Lin >Assignee: Shuai Lin >Priority: Critical > Fix For: 0.28.1 > > > When a task is launched with the mesos containerizer and a docker image, if > the docker fetcher failed to pull the image, no more task updates are sent to > the scheduler. > {code} > I0306 17:28:57.627169 17647 registry_puller.cpp:194] Pulling image > 'alpine:latest' from > 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to > '/tmp/mesos-test/store/docker/staging/V2dqJv' > E0306 17:29:00.749889 17651 slave.cpp:3773] Container > '6b98026b-a58d-434c-9432-b517012edc35' for executor 'just-a-test' of > framework a4ff93ba-2141-48e2-92a9-7354e4028282- failed to start: Collect > failed: Unexpected HTTP response '401 Unauthorized' when trying to get the > manifest > I0306 17:29:00.751579 17646 containerizer.cpp:1392] Destroying container > '6b98026b-a58d-434c-9432-b517012edc35' > I0306 17:29:00.752188 17646 containerizer.cpp:1395] Waiting for the isolators > to complete preparing before destroying the container > I0306 17:29:57.618649 17649 slave.cpp:4322] Terminating executor > ''just-a-test' of framework a4ff93ba-2141-48e2-92a9-73 > {code} > Scheduler logs: > {code} > sudo ./build/src/mesos-execute --docker_image=alpine:latest > --containerizer=mesos --name=just-a-test --command="sleep 1000" > --master=33.33.33.33:5050 > WARNING: Logging before InitGoogleLogging() is written to STDERR > W0306 17:28:57.491081 17740 sched.cpp:1642] > ** > Scheduler driver bound to loopback interface! Cannot communicate with remote > master(s). You might want to set 'LIBPROCESS_IP' environment variable to use > a routable IP address. > ** > I0306 17:28:57.498028 17740 sched.cpp:222] Version: 0.29.0 > I0306 17:28:57.533071 17761 sched.cpp:326] New master detected at > master@33.33.33.33:5050 > I0306 17:28:57.536761 17761 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0306 17:28:57.557729 17759 sched.cpp:703] Framework registered with > a4ff93ba-2141-48e2-92a9-7354e4028282- > Framework registered with a4ff93ba-2141-48e2-92a9-7354e4028282- > task just-a-test submitted to slave a4ff93ba-2141-48e2-92a9-7354e4028282-S0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2154) Port CFS quota support to Docker Containerizer
[ https://issues.apache.org/jira/browse/MESOS-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198107#comment-15198107 ] Jie Yu commented on MESOS-2154: --- commit be9e86022ff86620800503c41d2fef0c6387aaba Author: Steve Niemitz Date: Wed Mar 16 13:36:33 2016 -0700 Fix for docker containerizer not configuring CFS quotas correctly. It would be nice to refactor all this isolation code in a way that can be shared between all containerizers, as this is basically just copied from the CgroupsCpushareIsolator, but that's a much bigger undertaking. Review: https://reviews.apache.org/r/33174/ > Port CFS quota support to Docker Containerizer > -- > > Key: MESOS-2154 > URL: https://issues.apache.org/jira/browse/MESOS-2154 > Project: Mesos > Issue Type: Improvement > Components: docker, isolation >Affects Versions: 0.21.0 > Environment: Linux (Ubuntu 14.04.1) >Reporter: Andrew Ortman >Assignee: haosdent >Priority: Minor > > Port the CFS quota support the Mesos Containerizer has to the Docker > Containerizer. Whenever the --cgroup_enable_cfs flag is set, the Docker > Containerizer should update the cfs_period_us and cfs_quota_us values to > allow hard CPU capping on the container. > Current workaround is to pass those values as LXC configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
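The CFS arithmetic being ported is small. A self-contained sketch (the 100ms period and 1ms minimum quota mirror the constants the Mesos slave uses, but treat the exact values here as illustrative):
{code}
#include <algorithm>
#include <cstdint>
#include <iostream>

// CFS caps CPU time to quota microseconds per period microseconds,
// so N cpus translates to a quota of N * period, clamped to the
// kernel's minimum.
constexpr int64_t CPU_CFS_PERIOD_US = 100000;  // 100ms
constexpr int64_t MIN_CPU_CFS_QUOTA_US = 1000; // 1ms

int64_t cfsQuotaUs(double cpus)
{
  return std::max(
      static_cast<int64_t>(cpus * CPU_CFS_PERIOD_US),
      MIN_CPU_CFS_QUOTA_US);
}

int main()
{
  // A task with 0.5 cpus may run 50ms of CPU time per 100ms period.
  std::cout << cfsQuotaUs(0.5) << std::endl; // 50000
  return 0;
}
{code}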
[jira] [Commented] (MESOS-4744) mesos-execute should allow setting role
[ https://issues.apache.org/jira/browse/MESOS-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200811#comment-15200811 ] Jian Qiu commented on MESOS-4744: - Opened another ticket, https://issues.apache.org/jira/browse/MESOS-4974, for command_uris. > mesos-execute should allow setting role > --- > > Key: MESOS-4744 > URL: https://issues.apache.org/jira/browse/MESOS-4744 > Project: Mesos > Issue Type: Bug > Components: cli >Reporter: Jian Qiu >Assignee: Jian Qiu >Priority: Minor > > It will be quite useful if we can set the role when running mesos-execute -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4909) Introduce kill policy for tasks.
[ https://issues.apache.org/jira/browse/MESOS-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195387#comment-15195387 ] Alexander Rukletsov edited comment on MESOS-4909 at 3/18/16 5:18 PM: - https://reviews.apache.org/r/44656/ https://reviews.apache.org/r/44707/ https://reviews.apache.org/r/45040/ https://reviews.apache.org/r/44657/ https://reviews.apache.org/r/44660/ was (Author: alexr): https://reviews.apache.org/r/44656/ https://reviews.apache.org/r/44707/ https://reviews.apache.org/r/44657/ https://reviews.apache.org/r/44660/ > Introduce kill policy for tasks. > > > Key: MESOS-4909 > URL: https://issues.apache.org/jira/browse/MESOS-4909 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > A task may require some time to clean up, or even a special mechanism to issue > a kill request (currently it's a SIGTERM followed by SIGKILL). Introducing > kill policies per task will help address these issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
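Based on the patches above, a framework would express the policy roughly as follows (a sketch against the proposed API; the field names are taken from the reviews and may change before they land):
{code}
#include <mesos/mesos.hpp>

#include <stout/duration.hpp>

// Sketch: give the task a 30-second grace period between the SIGTERM
// and the SIGKILL, instead of the executor's fixed default.
void setKillPolicy(mesos::TaskInfo* task)
{
  task->mutable_kill_policy()
    ->mutable_grace_period()
    ->set_nanoseconds(Seconds(30).ns());
}
{code}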
[jira] [Updated] (MESOS-4979) os::rmdir does not handle special files (e.g., device, socket).
[ https://issues.apache.org/jira/browse/MESOS-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4979: -- Labels: mesosphere twitter (was: mesosphere) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.
[ https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4985: -- Assignee: Gilbert Song > Destroy a container while it's provisioning can lead to leaked provisioned > directories. > --- > > Key: MESOS-4985 > URL: https://issues.apache.org/jira/browse/MESOS-4985 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.0 >Reporter: Jie Yu >Assignee: Gilbert Song >Priority: Critical > Labels: mesosphere > Fix For: 0.28.1 > > > Here is the possible sequence of events: > 1) containerizer->launch > 2) provisioner->provision is called. It is fetching the image. > 3) executor registration timed out > 4) containerizer->destroy is called > 5) container->state is still in PREPARING > 6) provisioner->destroy is called > So we can be calling provisioner->destroy while provisioner->provision hasn't > finished yet. provisioner->destroy might just skip since there's no > information about the container yet, and later, provisioner will prepare the > root filesystem. This root filesystem will not be destroyed since destroy > has already finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
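One way to picture the missing synchronization, using std::future as a stand-in for libprocess futures; every name here is hypothetical and this is not the actual fix, but it shows destroy() waiting on an in-flight provision() instead of skipping containers it has no record of:
{code}
// Toy model of the race: destroy() must not treat "no rootfs
// registered yet" as "nothing to clean up" while provision()
// is still in flight.
#include <future>
#include <map>
#include <mutex>
#include <string>

class Provisioner
{
public:
  std::shared_future<std::string> provision(const std::string& containerId)
  {
    auto rootfs = std::async(std::launch::async, [] {
      // Placeholder for fetching the image and preparing the rootfs.
      return std::string("/path/to/rootfs");
    }).share();

    std::lock_guard<std::mutex> lock(mutex_);
    inflight_[containerId] = rootfs;
    return rootfs;
  }

  void destroy(const std::string& containerId)
  {
    std::shared_future<std::string> rootfs;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = inflight_.find(containerId);
      if (it == inflight_.end()) {
        return; // Nothing provisioned *and* nothing in flight.
      }
      rootfs = it->second;
      inflight_.erase(it);
    }

    // Wait for provisioning to finish; rootfs.get() can then be
    // removed from disk instead of being leaked.
    rootfs.wait();
  }

private:
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> inflight_;
};
{code}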
[jira] [Created] (MESOS-4984) MasterTest.SlavesEndpointTwoSlaves is flaky
Neil Conway created MESOS-4984: -- Summary: MasterTest.SlavesEndpointTwoSlaves is flaky Key: MESOS-4984 URL: https://issues.apache.org/jira/browse/MESOS-4984 Project: Mesos Issue Type: Bug Components: tests Reporter: Neil Conway Observed on Arch Linux with GCC 6, running in a virtualbox VM: [ RUN ] MasterTest.SlavesEndpointTwoSlaves /mesos-2/src/tests/master_tests.cpp:1710: Failure Value of: array.get().values.size() Actual: 1 Expected: 2u Which is: 2 [ FAILED ] MasterTest.SlavesEndpointTwoSlaves (86 ms) Hasn't repro'd yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4983) Segfault in ProcessTest.Spawn with GCC 6
[ https://issues.apache.org/jira/browse/MESOS-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202367#comment-15202367 ] Benjamin Bannier commented on MESOS-4983: - Since not even the GCC maintainers feel comfortable calling GCC 6 production-ready, I am not sure these kinds of bug reports are very useful. It is interesting to see whether, e.g., the added GCC 6 diagnostics find unknown issues in Mesos code (they don't currently), but I feel this kind of report would be more interesting for GCC developers. Since they are currently preparing a release, they might even be very interested in reduced reproducers (as an example, the GCC tag you are using is the first one able to compile Mesos code without internal compiler errors, and the maintainers were quick to come up with fixes for two issues I raised over there). > Segfault in ProcessTest.Spawn with GCC 6 > > > Key: MESOS-4983 > URL: https://issues.apache.org/jira/browse/MESOS-4983 > Project: Mesos > Issue Type: Bug > Components: libprocess, tests >Reporter: Neil Conway > Labels: mesosphere > > {{ProcessTest.Spawn}} fails deterministically for me with GCC 6 and > {{--enable-optimize}}. Recent Arch Linux, GCC "6.0.0 20160227". > {noformat} > [ RUN ] ProcessTest.Spawn > *** Aborted at 145817 (unix time) try "date -d @145817" if you are > using GNU date *** > PC: @ 0x522926 SpawnProcess::initialize() > *** SIGSEGV (@0x0) received by PID 11359 (TID 0x7faa6075f700) from PID 0; > stack trace: *** > @ 0x7faa670dbe80 (unknown) > @ 0x522926 SpawnProcess::initialize() > @ 0x646fa6 process::ProcessManager::resume() > @ 0x6471ff > _ZNSt6thread11_State_implISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS7_EEEvEEE6_M_runEv > @ 0x7faa6764a812 execute_native_thread_routine > @ 0x7faa670d2424 start_thread > @ 0x7faa65b04cbd __clone > @0x0 (unknown) > Makefile:1748: recipe for target 'check-local' failed > make[5]: *** [check-local] Segmentation fault (core dumped) > {noformat} > Backtrace: > {noformat} > Program terminated with signal SIGSEGV, Segmentation fault. > #0 testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) > at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373 > 1373void GetValueAndDelete() const { delete this; } > [Current thread is 1 (Thread 0x7faa6075f700 (LWP 11365))] > (gdb) bt > #0 testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) > at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373 > #1 testing::internal::FunctionMockerBase::InvokeWith(std::tuple<> > const&) (args=empty std::tuple, this=0x712a7c88) at > 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1530 > #2 testing::internal::FunctionMocker::Invoke() > (this=0x712a7c88) at > 3rdparty/gmock-1.7.0/include/gmock/gmock-generated-function-mockers.h:76 > #3 SpawnProcess::initialize (this=0x712a7c80) at > /mesos-2/3rdparty/libprocess/src/tests/process_tests.cpp:113 > #4 0x00646fa6 in process::ProcessManager::resume (this=0x25a2b60, > process=0x712a7d38) at /mesos-2/3rdparty/libprocess/src/process.cpp:2504 > #5 0x006471ff in process::ProcessManager:: atomic_bool&)>::operator() (__closure=, joining=...)
at > /mesos-2/3rdparty/libprocess/src/process.cpp:2218 > #6 std::_Bind atomic_bool&)>(std::reference_wrapper > >)>::__call (__args=, this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:943 > #7 std::_Bind atomic_bool&)>(std::reference_wrapper > >)>::operator()<> (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:1002 > #8 > std::_Bind_simple atomic_bool&)>(std::reference_wrapper > >)>()>::_M_invoke<> (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:1400 > #9 > std::_Bind_simple atomic_bool&)>(std::reference_wrapper > >)>()>::operator() (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:1389 > #10 > std::thread::_State_impl atomic_bool&)>(std::reference_wrapper >)>()> > >::_M_run(void) (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/thread:196 > #11 0x7faa6764a812 in std::(anonymous > namespace)::execute_native_thread_routine (__p=0x25a3bf0) at > ../../../../../gcc-trunk/libstdc++-v3/src/c++11/thread.cc:83 > #12 0x7faa670d2424 in start_thread () from /usr/lib/libpthread.so.0 > #13 0x7faa65b04cbd in clone () from /usr/lib/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.
[ https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202285#comment-15202285 ] Cong Wang commented on MESOS-4070: -- Hi, do you have a use case for this? Currently the Mesos code base doesn't use negative hex numbers, so we deliberately left this unfixed. It is never too late to add it when there is a use case. > numify() handles negative numbers inconsistently. > - > > Key: MESOS-4070 > URL: https://issues.apache.org/jira/browse/MESOS-4070 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Jie Yu >Assignee: Yong Tang > Labels: tech-debt > > As pointed by [~neilc] in this review: > https://reviews.apache.org/r/40988 > {noformat} > Try num2 = numify("-10"); > EXPECT_SOME_EQ(-10, num2); > // TODO(neilc): This is inconsistent with the handling of non-hex numbers. > EXPECT_ERROR(numify("-0x10")); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.
[ https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201757#comment-15201757 ] SERGEY GALKIN commented on MESOS-4977: -- Logs from mesos-master 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 - with "Cmd":["-c","echo 'No such file or directory'; exit 1"] (failed) mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.224059 2638 master.hpp:176] Adding task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205) mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.224105 2638 master.cpp:3621] Launching task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205) mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:W0318 15:14:33.154769 2656 master.cpp:4885] Ignoring unknown exited executor '1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0' of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205) mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:33.156250 2639 master.cpp:4789] Status update TASK_FAILED (UUID: 7c90d238-fcc4-4ede-9238-200744693449) for task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205) 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 - with "Cmd":null (running) mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.223767 2638 master.hpp:176] Adding task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205) mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.223814 2638 master.cpp:3621] Launching task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205) mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:33.200388 2648 master.cpp:4789] Status update TASK_RUNNING (UUID: 563864b0-8780-4fd3-a106-041600599e2e) for task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205) > Sometime Cmd":["-c","echo 'No such file or directory'] in task. 
> --- > > Key: MESOS-4977 > URL: https://issues.apache.org/jira/browse/MESOS-4977 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.2 > Environment: 189 mesos slaves on Ubuntu 14.04.3 LTS >Reporter: SERGEY GALKIN > > mesos - 0.27.0 > marathon - 0.15.2 > I am trying to launch 1 simple docker application with nginx with 500 > instances on cluster with 189 HW nodes through Marathon > {code} > ID /1f532267a08494e3081c1acb42d273b7 > Command Unspecified > Constraints Unspecified > Dependencies Unspecified > Labels Unspecified > Resource Roles Unspecified > Container > { > "type": "DOCKER", > "volumes": [], > "docker": { > "image": "nginx", > "network": "BRIDGE", > "portMappings": [ > { > "containerPort": 80, > "hostPort": 0, > "servicePort": 1, > "protocol": "tcp" > } > ], > "privileged": false, > "parameters": [], > "forcePullI
[jira] [Updated] (MESOS-4835) CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky
[ https://issues.apache.org/jira/browse/MESOS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4835: - Sprint: (was: Mesosphere Sprint 31) > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky > - > > Key: MESOS-4835 > URL: https://issues.apache.org/jira/browse/MESOS-4835 > Project: Mesos > Issue Type: Bug > Environment: Seen on Ubuntu 15 & Debian 8, GCC 4.9 >Reporter: Joseph Wu > Labels: flaky, mesosphere, test > > Verbose logs: > {code} > [ RUN ] > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess > I0302 00:43:14.127846 11755 cgroups.cpp:2427] Freezing cgroup > /sys/fs/cgroup/freezer/mesos_test > I0302 00:43:14.267411 11758 cgroups.cpp:1409] Successfully froze cgroup > /sys/fs/cgroup/freezer/mesos_test after 139.46496ms > I0302 00:43:14.409395 11751 cgroups.cpp:2445] Thawing cgroup > /sys/fs/cgroup/freezer/mesos_test > I0302 00:43:14.551304 11751 cgroups.cpp:1438] Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos_test after 141.811968ms > ../../src/tests/containerizer/cgroups_tests.cpp:949: Failure > Value of: ::waitpid(pid, &status, 0) > Actual: 23809 > Expected: -1 > ../../src/tests/containerizer/cgroups_tests.cpp:950: Failure > Value of: (*__errno_location ()) > Actual: 0 > Expected: 10 > [ FAILED ] > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess (1055 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4878) Task stuck in TASK_STAGING when docker fetcher failed to fetch the image
[ https://issues.apache.org/jira/browse/MESOS-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4878: -- Fix Version/s: 0.28.1 > Task stuck in TASK_STAGING when docker fetcher failed to fetch the image > > > Key: MESOS-4878 > URL: https://issues.apache.org/jira/browse/MESOS-4878 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.28.0 >Reporter: Shuai Lin >Assignee: Shuai Lin > Fix For: 0.28.1 > > > When a task is launched with the mesos containerizer and a docker image, if > the docker fetcher failed to pull the image, no more task updates are sent to > the scheduler. > {code} > I0306 17:28:57.627169 17647 registry_puller.cpp:194] Pulling image > 'alpine:latest' from > 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to > '/tmp/mesos-test/store/docker/staging/V2dqJv' > E0306 17:29:00.749889 17651 slave.cpp:3773] Container > '6b98026b-a58d-434c-9432-b517012edc35' for executor 'just-a-test' of > framework a4ff93ba-2141-48e2-92a9-7354e4028282- failed to start: Collect > failed: Unexpected HTTP response '401 Unauthorized' when trying to get the > manifest > I0306 17:29:00.751579 17646 containerizer.cpp:1392] Destroying container > '6b98026b-a58d-434c-9432-b517012edc35' > I0306 17:29:00.752188 17646 containerizer.cpp:1395] Waiting for the isolators > to complete preparing before destroying the container > I0306 17:29:57.618649 17649 slave.cpp:4322] Terminating executor > ''just-a-test' of framework a4ff93ba-2141-48e2-92a9-73 > {code} > Scheduler logs: > {code} > sudo ./build/src/mesos-execute --docker_image=alpine:latest > --containerizer=mesos --name=just-a-test --command="sleep 1000" > --master=33.33.33.33:5050 > WARNING: Logging before InitGoogleLogging() is written to STDERR > W0306 17:28:57.491081 17740 sched.cpp:1642] > ** > Scheduler driver bound to loopback interface! Cannot communicate with remote > master(s). You might want to set 'LIBPROCESS_IP' environment variable to use > a routable IP address. > ** > I0306 17:28:57.498028 17740 sched.cpp:222] Version: 0.29.0 > I0306 17:28:57.533071 17761 sched.cpp:326] New master detected at > master@33.33.33.33:5050 > I0306 17:28:57.536761 17761 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0306 17:28:57.557729 17759 sched.cpp:703] Framework registered with > a4ff93ba-2141-48e2-92a9-7354e4028282- > Framework registered with a4ff93ba-2141-48e2-92a9-7354e4028282- > task just-a-test submitted to slave a4ff93ba-2141-48e2-92a9-7354e4028282-S0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4878) Task stuck in TASK_STAGING when docker fetcher failed to fetch the image
[ https://issues.apache.org/jira/browse/MESOS-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4878: -- Priority: Critical (was: Major) > Task stuck in TASK_STAGING when docker fetcher failed to fetch the image > > > Key: MESOS-4878 > URL: https://issues.apache.org/jira/browse/MESOS-4878 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.28.0 >Reporter: Shuai Lin >Assignee: Shuai Lin >Priority: Critical > Fix For: 0.28.1 > > > When a task is launched with the mesos containerizer and a docker image, if > the docker fetcher failed to pull the image, no more task updates are sent to > the scheduler. > {code} > I0306 17:28:57.627169 17647 registry_puller.cpp:194] Pulling image > 'alpine:latest' from > 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to > '/tmp/mesos-test/store/docker/staging/V2dqJv' > E0306 17:29:00.749889 17651 slave.cpp:3773] Container > '6b98026b-a58d-434c-9432-b517012edc35' for executor 'just-a-test' of > framework a4ff93ba-2141-48e2-92a9-7354e4028282- failed to start: Collect > failed: Unexpected HTTP response '401 Unauthorized' when trying to get the > manifest > I0306 17:29:00.751579 17646 containerizer.cpp:1392] Destroying container > '6b98026b-a58d-434c-9432-b517012edc35' > I0306 17:29:00.752188 17646 containerizer.cpp:1395] Waiting for the isolators > to complete preparing before destroying the container > I0306 17:29:57.618649 17649 slave.cpp:4322] Terminating executor > ''just-a-test' of framework a4ff93ba-2141-48e2-92a9-73 > {code} > Scheduler logs: > {code} > sudo ./build/src/mesos-execute --docker_image=alpine:latest > --containerizer=mesos --name=just-a-test --command="sleep 1000" > --master=33.33.33.33:5050 > WARNING: Logging before InitGoogleLogging() is written to STDERR > W0306 17:28:57.491081 17740 sched.cpp:1642] > ** > Scheduler driver bound to loopback interface! Cannot communicate with remote > master(s). You might want to set 'LIBPROCESS_IP' environment variable to use > a routable IP address. > ** > I0306 17:28:57.498028 17740 sched.cpp:222] Version: 0.29.0 > I0306 17:28:57.533071 17761 sched.cpp:326] New master detected at > master@33.33.33.33:5050 > I0306 17:28:57.536761 17761 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0306 17:28:57.557729 17759 sched.cpp:703] Framework registered with > a4ff93ba-2141-48e2-92a9-7354e4028282- > Framework registered with a4ff93ba-2141-48e2-92a9-7354e4028282- > task just-a-test submitted to slave a4ff93ba-2141-48e2-92a9-7354e4028282-S0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4984) MasterTest.SlavesEndpointTwoSlaves is flaky
[ https://issues.apache.org/jira/browse/MESOS-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4984: --- Description: Observed on Arch Linux with GCC 6, running in a virtualbox VM: [ RUN ] MasterTest.SlavesEndpointTwoSlaves /mesos-2/src/tests/master_tests.cpp:1710: Failure Value of: array.get().values.size() Actual: 1 Expected: 2u Which is: 2 [ FAILED ] MasterTest.SlavesEndpointTwoSlaves (86 ms) Seems to fail non-deterministically, perhaps more often when there is concurrent CPU load on the machine. was: Observed on Arch Linux with GCC 6, running in a virtualbox VM: [ RUN ] MasterTest.SlavesEndpointTwoSlaves /mesos-2/src/tests/master_tests.cpp:1710: Failure Value of: array.get().values.size() Actual: 1 Expected: 2u Which is: 2 [ FAILED ] MasterTest.SlavesEndpointTwoSlaves (86 ms) Hasn't repro'd yet. > MasterTest.SlavesEndpointTwoSlaves is flaky > --- > > Key: MESOS-4984 > URL: https://issues.apache.org/jira/browse/MESOS-4984 > Project: Mesos > Issue Type: Bug > Components: tests >Reporter: Neil Conway > Labels: flaky-test, mesosphere > > Observed on Arch Linux with GCC 6, running in a virtualbox VM: > [ RUN ] MasterTest.SlavesEndpointTwoSlaves > /mesos-2/src/tests/master_tests.cpp:1710: Failure > Value of: array.get().values.size() > Actual: 1 > Expected: 2u > Which is: 2 > [ FAILED ] MasterTest.SlavesEndpointTwoSlaves (86 ms) > Seems to fail non-deterministically, perhaps more often when there is > concurrent CPU load on the machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.
[ https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202312#comment-15202312 ] Neil Conway commented on MESOS-4070: Not at the moment, but it is still intended to be standalone and might be separated in the future (along with libprocess). For example, this is the reason that commits spanning libprocess and/or stout and/or Mesos proper are not allowed. > numify() handles negative numbers inconsistently. > - > > Key: MESOS-4070 > URL: https://issues.apache.org/jira/browse/MESOS-4070 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Jie Yu >Assignee: Yong Tang > Labels: tech-debt > > As pointed by [~neilc] in this review: > https://reviews.apache.org/r/40988 > {noformat} > Try num2 = numify("-10"); > EXPECT_SOME_EQ(-10, num2); > // TODO(neilc): This is inconsistent with the handling of non-hex numbers. > EXPECT_ERROR(numify("-0x10")); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.
[ https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4985: -- Sprint: Mesosphere Sprint 31 > Destroy a container while it's provisioning can lead to leaked provisioned > directories. > --- > > Key: MESOS-4985 > URL: https://issues.apache.org/jira/browse/MESOS-4985 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.0 >Reporter: Jie Yu >Assignee: Gilbert Song >Priority: Critical > Labels: mesosphere > Fix For: 0.28.1 > > > Here is the possible sequence of events: > 1) containerizer->launch > 2) provisioner->provision is called. It is fetching the image. > 3) executor registration timed out > 4) containerizer->destroy is called > 5) container->state is still in PREPARING > 6) provisioner->destroy is called > So we can be calling provisioner->destroy while provisioner->provision hasn't > finished yet. provisioner->destroy might just skip since there's no > information about the container yet, and later, provisioner will prepare the > root filesystem. This root filesystem will not be destroyed since destroy > has already finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.
[ https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4985: -- Component/s: containerization > Destroy a container while it's provisioning can lead to leaked provisioned > directories. > --- > > Key: MESOS-4985 > URL: https://issues.apache.org/jira/browse/MESOS-4985 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 0.28.0 >Reporter: Jie Yu >Assignee: Gilbert Song >Priority: Critical > Labels: mesosphere > Fix For: 0.28.1 > > > Here is the possible sequence of events: > 1) containerizer->launch > 2) provisioner->provision is called. It is fetching the image. > 3) executor registration timed out > 4) containerizer->destroy is called > 5) container->state is still in PREPARING > 6) provisioner->destroy is called > So we can be calling provisioner->destroy while provisioner->provision hasn't > finished yet. provisioner->destroy might just skip since there's no > information about the container yet, and later, provisioner will prepare the > root filesystem. This root filesystem will not be destroyed since destroy > has already finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4112) Clean up libprocess gtest macros
[ https://issues.apache.org/jira/browse/MESOS-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Tang reassigned MESOS-4112: Assignee: Yong Tang > Clean up libprocess gtest macros > > > Key: MESOS-4112 > URL: https://issues.apache.org/jira/browse/MESOS-4112 > Project: Mesos > Issue Type: Task > Components: libprocess, test >Reporter: Michael Park >Assignee: Yong Tang > > This ticket is regarding the libprocess gtest helpers in > {{3rdparty/libprocess/include/process/gtest.hpp}}. > The pattern in this file seems to be a set of macros: > * {{AWAIT_ASSERT__FOR}} > * {{AWAIT_ASSERT_}} -- default of 15 seconds > * {{AWAIT_\_FOR}} -- alias for {{AWAIT_ASSERT__FOR}} > * {{AWAIT_}} -- alias for {{AWAIT_ASSERT_}} > * {{AWAIT_EXPECT__FOR}} > * {{AWAIT_EXPECT_}} -- default of 15 seconds > (1) {{AWAIT_EQ_FOR}} should be added for completeness. > (2) In {{gtest}}, we've got {{EXPECT_EQ}} as well as the {{bool}}-specific > versions: {{EXPECT_TRUE}} and {{EXPECT_FALSE}}. > We should adopt this pattern in these helpers as well. Keeping the pattern > above in mind, the following are missing: > * {{AWAIT_ASSERT_TRUE_FOR}} > * {{AWAIT_ASSERT_TRUE}} > * {{AWAIT_ASSERT_FALSE_FOR}} > * {{AWAIT_ASSERT_FALSE}} > * {{AWAIT_EXPECT_TRUE_FOR}} > * {{AWAIT_EXPECT_FALSE_FOR}} > (3) There are HTTP response related macros at the bottom of the file, e.g. > {{AWAIT_EXPECT_RESPONSE_STATUS_EQ}}, however these are missing their > {{ASSERT}} counterparts. > (4) The reason for (3) presumably is because we reach for {{EXPECT}} over > {{ASSERT}} in general due to the test suite crashing behavior of {{ASSERT}}. > If this is the case, it would be worthwhile considering whether macros such > as {{AWAIT_READY}} should alias {{AWAIT_EXPECT_READY}} rather than > {{AWAIT_ASSERT_READY}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
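To make the layering described in the ticket concrete, here is a self-contained toy version of the pattern using std::future; the macro bodies are invented for illustration and differ from the real helpers in {{3rdparty/libprocess/include/process/gtest.hpp}}.
{code}
// Toy illustration of the _FOR / default-timeout / alias layering.
// (NDEBUG would compile the assert out; fine for a sketch.)
#include <cassert>
#include <chrono>
#include <future>

#define AWAIT_ASSERT_READY_FOR(future, timeout) \
  assert((future).wait_for(timeout) == std::future_status::ready)

// The plain form defaults the timeout (15 seconds in libprocess).
#define AWAIT_ASSERT_READY(future) \
  AWAIT_ASSERT_READY_FOR(future, std::chrono::seconds(15))

// The short name is an alias for the ASSERT variant.
#define AWAIT_READY(future) AWAIT_ASSERT_READY(future)

int main()
{
  std::future<int> f = std::async(std::launch::async, [] { return 42; });
  AWAIT_READY(f);
  return f.get() == 42 ? 0 : 1;
}
{code}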
[jira] [Updated] (MESOS-4981) Framework (re-)register metric counters broken for calls made via scheduler driver
[ https://issues.apache.org/jira/browse/MESOS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-4981: -- Description: The counters {{master/messages_register_framework}} and {{master/messages_reregister_framework}} are no longer being incremented after the scheduler driver started sending {{Call}} messages to the master in Mesos 0.23. Either we should think about adding new counter(s) for {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the existing code to correctly increment the counters. (was: The counters {{master/messages_register_framework}} and {master/messages_reregister_framework}} are no longer being incremented after the scheduler driver started sending {{Call}} messages to the master in Mesos 0.23. Either, we should think about adding new counter(s) for {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the existing code to correctly increment the counters.) > Framework (re-)register metric counters broken for calls made via scheduler > driver > -- > > Key: MESOS-4981 > URL: https://issues.apache.org/jira/browse/MESOS-4981 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Anand Mazumdar > Labels: mesosphere > > The counters {{master/messages_register_framework}} and > {{master/messages_reregister_framework}} are no longer being incremented > after the scheduler driver started sending {{Call}} messages to the master in > Mesos 0.23. Either we should think about adding new counter(s) for > {{Subscribe}} calls to the master for both PID/HTTP frameworks or modify the > existing code to correctly increment the counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
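A toy sketch of the counter semantics at issue; the types and handler are hypothetical, but they show the fix direction of incrementing the counters in the path that driver-based schedulers now actually exercise:
{code}
// Hypothetical sketch: the increment must live in the Call/SUBSCRIBE
// handler, not only in the legacy RegisterFrameworkMessage handler
// that driver-based schedulers stopped hitting in 0.23.
#include <atomic>
#include <cstdint>

struct MasterMetrics
{
  std::atomic<uint64_t> messages_register_framework{0};
  std::atomic<uint64_t> messages_reregister_framework{0};
};

void onSubscribe(MasterMetrics& metrics, bool isResubscription)
{
  if (isResubscription) {
    ++metrics.messages_reregister_framework;
  } else {
    ++metrics.messages_register_framework;
  }
}
{code}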
[jira] [Updated] (MESOS-4984) MasterTest.SlavesEndpointTwoSlaves is flaky
[ https://issues.apache.org/jira/browse/MESOS-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4984: --- Attachment: slaves_endpoint_flaky_4984_verbose_log.txt > MasterTest.SlavesEndpointTwoSlaves is flaky > --- > > Key: MESOS-4984 > URL: https://issues.apache.org/jira/browse/MESOS-4984 > Project: Mesos > Issue Type: Bug > Components: tests >Reporter: Neil Conway > Labels: flaky-test, mesosphere, tech-debt > Attachments: slaves_endpoint_flaky_4984_verbose_log.txt > > > Observed on Arch Linux with GCC 6, running in a virtualbox VM: > [ RUN ] MasterTest.SlavesEndpointTwoSlaves > /mesos-2/src/tests/master_tests.cpp:1710: Failure > Value of: array.get().values.size() > Actual: 1 > Expected: 2u > Which is: 2 > [ FAILED ] MasterTest.SlavesEndpointTwoSlaves (86 ms) > Seems to fail non-deterministically, perhaps more often when there is > concurrent CPU load on the machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4969) improve overlayfs detection
[ https://issues.apache.org/jira/browse/MESOS-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200077#comment-15200077 ] James Peach commented on MESOS-4969: {{/sys/module/overlay}} would only exist once the module is loaded, right? AFAICT you need something to trigger loading the module in the first place. > improve overlayfs detection > --- > > Key: MESOS-4969 > URL: https://issues.apache.org/jira/browse/MESOS-4969 > Project: Mesos > Issue Type: Bug > Components: isolation, volumes >Reporter: James Peach >Priority: Minor > > On my Fedora 23, overlayfs is a module that is not loaded by default > (attempting to mount an overlayfs automatically triggers the module loading). > However {{mesos-slave}} won't start until I manually load the module since it > is not listed in {{/proc/filesystems}} until it is loaded. > It would be nice if there was a more reliable way to determine overlayfs > support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
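The two probes discussed so far can be combined, as in this illustrative sketch (not the actual Mesos check); note that, per the comment above, neither probe triggers module auto-loading, which still requires a mount attempt or an explicit modprobe:
{code}
// Sketch combining the two probes from the discussion. Both only
// succeed once overlay support is actually registered with the
// kernel; neither triggers module autoload.
#include <fstream>
#include <string>
#include <sys/stat.h>

bool overlayfsListed()
{
  std::ifstream filesystems("/proc/filesystems");
  std::string line;
  while (std::getline(filesystems, line)) {
    if (line.find("overlay") != std::string::npos) {
      return true;
    }
  }

  struct stat s;
  return ::stat("/sys/module/overlay", &s) == 0 && S_ISDIR(s.st_mode);
}
{code}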
[jira] [Updated] (MESOS-4744) mesos-execute should allow setting role
[ https://issues.apache.org/jira/browse/MESOS-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Qiu updated MESOS-4744: Summary: mesos-execute should allow setting role (was: mesos-execute should allow setting role and command uris) > mesos-execute should allow setting role > --- > > Key: MESOS-4744 > URL: https://issues.apache.org/jira/browse/MESOS-4744 > Project: Mesos > Issue Type: Bug > Components: cli >Reporter: Jian Qiu >Assignee: Jian Qiu >Priority: Minor > > It will be quite useful if we can set role and command uris when running > mesos-execute -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4963) Incorrect CXXFLAGS with GCC 6
[ https://issues.apache.org/jira/browse/MESOS-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4963: --- Shepherd: Joris Van Remoortere > Incorrect CXXFLAGS with GCC 6 > - > > Key: MESOS-4963 > URL: https://issues.apache.org/jira/browse/MESOS-4963 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > {noformat} > $ head config.log > [...] > /mesos-2/configure --enable-optimize --disable-python CC=ccache > /home/vagrant/local/gcc/bin/gcc CXX=ccache /home/vagrant/local/gcc/bin/g++ > $ ~/local/gcc/bin/g++ --version > g++ (GCC) 6.0.0 20160227 (experimental) > Copyright (C) 2016 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > $ make V=0 > make[2]: Entering directory '/home/vagrant/build-mesos-2-gcc6/src' > CXX appc/libmesos_no_3rdparty_la-spec.lo > In file included from > /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/shell.hpp:22:0, > from > /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:56, > from /mesos-2/src/appc/spec.cpp:17: > /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/shell.hpp: > In instantiation of ‘int os::execlp(const char*, T ...) [with T = {const > char*, const char*, const char*, char*}]’: > /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/fork.hpp:371:52: >required from here > /mesos-2/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/shell.hpp:151:18: > error: missing sentinel in function call [-Werror=format=] >return ::execlp(file, t...); > ^~~~ > cc1plus: all warnings being treated as errors > Makefile:5584: recipe for target 'appc/libmesos_no_3rdparty_la-spec.lo' failed > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
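For reference, the diagnostic is about the POSIX requirement that execlp()'s variadic argument list end with a (char*)NULL sentinel; GCC 6 sees through the template forwarding in the stout wrapper and flags call sites where the sentinel is not visibly a pointer. A minimal correct direct call looks like this:
{code}
// execlp() only returns on failure; on success it replaces the
// process image. The cast matters: a literal 0 is an int, not a
// null char*, which is exactly what the sentinel warning is about.
#include <unistd.h>

int runShell()
{
  return ::execlp("sh", "sh", "-c", "echo hello",
                  static_cast<char*>(nullptr));
}
{code}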
[jira] [Updated] (MESOS-4974) mesos-execute should allow setting command_uris
[ https://issues.apache.org/jira/browse/MESOS-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Qiu updated MESOS-4974: Description: Based on discussion in MESOS-4744, it will be helpful to let mesos-execute support setting uris in command info. We can add a flag: {code} --uris=uri1,uri2.. {code} and set other values in CommandInfo::URI as default. was: Based on discussion in MESOS-4744, it will be helpful to let mesos-execute support setting uris in command info. We can add a flag: {code} --uris=uri1,uri2.. {code} and set other values in CommandInfo::URIS as default. > mesos-execute should allow setting command_uris > --- > > Key: MESOS-4974 > URL: https://issues.apache.org/jira/browse/MESOS-4974 > Project: Mesos > Issue Type: Bug > Components: cli >Reporter: Jian Qiu >Priority: Minor > > Based on discussion in MESOS-4744, it will be helpful to let mesos-execute > support setting uris in command info. > We can add a flag: > {code} > --uris=uri1,uri2.. > {code} > and set other values in CommandInfo::URI as default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
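A minimal sketch of the proposed flag handling; the stand-in Uri struct and its defaults are illustrative assumptions rather than the Mesos CommandInfo::URI protobuf itself:
{code}
// Split the comma-separated --uris value into one URI entry each,
// leaving the remaining fields at their defaults.
#include <sstream>
#include <string>
#include <vector>

struct Uri
{
  std::string value;
  bool extract = true; // Stand-in for CommandInfo::URI defaults.
  bool cache = false;
};

std::vector<Uri> parseUris(const std::string& flag)
{
  std::vector<Uri> uris;
  std::stringstream stream(flag);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty()) {
      uris.push_back(Uri{item});
    }
  }
  return uris;
}
{code}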
[jira] [Created] (MESOS-4968) ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky
Greg Mann created MESOS-4968: Summary: ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky Key: MESOS-4968 URL: https://issues.apache.org/jira/browse/MESOS-4968 Project: Mesos Issue Type: Bug Components: tests Environment: Ubuntu 14.04 with clang, without libevent/SSL Reporter: Greg Mann Just observed on the ASF CI: {code} [ RUN ] ResourceOffersTest.ResourceOfferWithMultipleSlaves I0317 16:31:52.635798 32063 cluster.cpp:139] Creating default 'local' authorizer I0317 16:31:52.743732 32063 leveldb.cpp:174] Opened db in 107.706253ms I0317 16:31:52.782537 32063 leveldb.cpp:181] Compacted db in 38.758479ms I0317 16:31:52.782641 32063 leveldb.cpp:196] Created db iterator in 34392ns I0317 16:31:52.782662 32063 leveldb.cpp:202] Seeked to beginning of db in 10490ns I0317 16:31:52.782675 32063 leveldb.cpp:271] Iterated through 0 keys in the db in 7177ns I0317 16:31:52.782728 32063 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0317 16:31:52.783476 32094 recover.cpp:447] Starting replica recovery I0317 16:31:52.783738 32094 recover.cpp:473] Replica is in EMPTY status I0317 16:31:52.785109 32086 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (9482)@172.17.0.2:43540 I0317 16:31:52.785851 32081 recover.cpp:193] Received a recover response from a replica in EMPTY status I0317 16:31:52.786602 32085 recover.cpp:564] Updating replica status to STARTING I0317 16:31:52.790009 32090 master.cpp:376] Master 0196163d-91f7-4337-9dd7-9fef49e8cd75 (76df5a57a9ca) started on 172.17.0.2:43540 I0317 16:31:52.790082 32090 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/Rxm8q7/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.29.0/_inst/share/mesos/webui" --work_dir="/tmp/Rxm8q7/master" --zk_session_timeout="10secs" I0317 16:31:52.790520 32090 master.cpp:423] Master only allowing authenticated frameworks to register I0317 16:31:52.790534 32090 master.cpp:428] Master only allowing authenticated slaves to register I0317 16:31:52.790541 32090 credentials.hpp:35] Loading credentials for authentication from '/tmp/Rxm8q7/credentials' I0317 16:31:52.790952 32090 master.cpp:468] Using default 'crammd5' authenticator I0317 16:31:52.791162 32090 master.cpp:537] Using default 'basic' HTTP authenticator I0317 16:31:52.791335 32090 master.cpp:571] Authorization enabled I0317 16:31:52.791538 32092 whitelist_watcher.cpp:77] No whitelist given I0317 16:31:52.791584 32089 hierarchical.cpp:144] Initialized hierarchical allocator process I0317 16:31:52.794270 32089 master.cpp:1806] The newly elected leader is master@172.17.0.2:43540 with id 0196163d-91f7-4337-9dd7-9fef49e8cd75 I0317 16:31:52.794325 32089 master.cpp:1819] Elected as the leading master! 
I0317 16:31:52.794342 32089 master.cpp:1508] Recovering from registrar I0317 16:31:52.794919 32093 registrar.cpp:307] Recovering registrar I0317 16:31:52.815771 32081 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 28.36765ms I0317 16:31:52.815862 32081 replica.cpp:320] Persisted replica status to STARTING I0317 16:31:52.816234 32096 recover.cpp:473] Replica is in STARTING status I0317 16:31:52.818017 32082 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (9484)@172.17.0.2:43540 I0317 16:31:52.818408 32090 recover.cpp:193] Received a recover response from a replica in STARTING status I0317 16:31:52.819475 32090 recover.cpp:564] Updating replica status to VOTING I0317 16:31:52.840878 32090 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 21.14905ms I0317 16:31:52.840965 32090 replica.cpp:320] Persisted replica status to VOTING I0317 16:31:52.841151 32090 recover.cpp:578] Successfully joined the Paxos group I0317 16:31:52.841361 32090 recover.cpp:462] Recover process terminated I0317 16:31:52.842133 32090 log.cpp:659] Attempting to start the writer I0317 16:31:52.843859 32090 replica.cpp:493] Replica received implicit promise request from (9485)@172.17.0.2:43540 with proposal 1 I0317 16:31:52.86
[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.
[ https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202293#comment-15202293 ] Neil Conway commented on MESOS-4070: I didn't have an explicit use case in mind, but not supporting negative hex numbers seems needlessly inconsistent. {{stout}} is intended to be a general-purpose library, so only supporting the exact functionality that happens to be used by Mesos at the moment is generally not a good rule of thumb. > numify() handles negative numbers inconsistently. > - > > Key: MESOS-4070 > URL: https://issues.apache.org/jira/browse/MESOS-4070 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Jie Yu >Assignee: Yong Tang > Labels: tech-debt > > As pointed by [~neilc] in this review: > https://reviews.apache.org/r/40988 > {noformat} > Try num2 = numify("-10"); > EXPECT_SOME_EQ(-10, num2); > // TODO(neilc): This is inconsistent with the handling of non-hex numbers. > EXPECT_ERROR(numify("-0x10")); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
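For illustration, a standalone parser (not stout's numify()) that treats negative hex consistently with negative decimals by stripping the sign before detecting the 0x prefix:
{code}
// Sketch: parse "-0x10" the same way as "-10" by handling the sign
// ourselves and letting strtoll() see only the unsigned body.
#include <cstdlib>
#include <optional>
#include <string>

std::optional<long long> numify(const std::string& s)
{
  if (s.empty()) {
    return std::nullopt;
  }

  const bool negative = s[0] == '-';
  const std::string body = negative ? s.substr(1) : s;
  if (body.empty()) {
    return std::nullopt; // A bare "-" is not a number.
  }

  // strtoll with base 16 accepts an optional "0x" prefix.
  const int base = body.rfind("0x", 0) == 0 ? 16 : 10;

  char* end = nullptr;
  const long long value = std::strtoll(body.c_str(), &end, base);
  if (end == nullptr || *end != '\0') {
    return std::nullopt; // Trailing garbage.
  }

  return negative ? -value : value;
}
{code}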
[jira] [Assigned] (MESOS-4885) Unzip should force overwrite
[ https://issues.apache.org/jira/browse/MESOS-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomasz Janiszewski reassigned MESOS-4885: - Assignee: Tomasz Janiszewski > Unzip should force overwrite > > > Key: MESOS-4885 > URL: https://issues.apache.org/jira/browse/MESOS-4885 > Project: Mesos > Issue Type: Bug > Components: fetcher >Reporter: Tomasz Janiszewski >Assignee: Tomasz Janiszewski >Priority: Trivial > > Consider the situation when a zip file is malformed and contains duplicate files. > When the fetcher downloads such a malformed zip file (e.g., dist zips generated > by Gradle can have duplicate files in the libs dir) and tries to uncompress it, > the deployment hangs in the staging phase because unzip prompts whether each file > should be replaced. unzip should overwrite the file or fail with an error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
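The requested behavior maps directly onto unzip's -o flag (overwrite without prompting); a hypothetical helper, not the actual fetcher code, might look like this:
{code}
// Sketch: non-interactive extraction. -o overwrites duplicates
// instead of prompting; a non-zero exit status surfaces as failure.
// Assumes paths contain no single quotes (no shell escaping here).
#include <cstdlib>
#include <string>

bool extractZip(const std::string& zip, const std::string& targetDir)
{
  const std::string command =
    "unzip -o -d '" + targetDir + "' '" + zip + "'";
  return std::system(command.c_str()) == 0;
}
{code}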
[jira] [Updated] (MESOS-4985) Destroy a container while it's provisioning can lead to leaked provisioned directories.
[ https://issues.apache.org/jira/browse/MESOS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4985: -- Labels: mesosphere (was: ) > Destroy a container while it's provisioning can lead to leaked provisioned > directories. > --- > > Key: MESOS-4985 > URL: https://issues.apache.org/jira/browse/MESOS-4985 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.0 >Reporter: Jie Yu >Assignee: Gilbert Song >Priority: Critical > Labels: mesosphere > Fix For: 0.28.1 > > > Here is the possible sequence of events: > 1) containerizer->launch > 2) provisioner->provision is called. It is fetching the image. > 3) executor registration timed out > 4) containerizer->destroy is called > 5) container->state is still in PREPARING > 6) provisioner->destroy is called > So we can be calling provisioner->destroy while provisioner->provision hasn't > finished yet. provisioner->destroy might just skip since there's no > information about the container yet, and later, provisioner will prepare the > root filesystem. This root filesystem will not be destroyed since destroy > has already finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4983) Segfault in ProcessTest.Spawn with GCC 6
[ https://issues.apache.org/jira/browse/MESOS-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202268#comment-15202268 ] Neil Conway commented on MESOS-4983: Doesn't appear to repro if {{--enable-optimize}} is not specified. This might also be a GCC bug. > Segfault in ProcessTest.Spawn with GCC 6 > > > Key: MESOS-4983 > URL: https://issues.apache.org/jira/browse/MESOS-4983 > Project: Mesos > Issue Type: Bug > Components: libprocess, tests >Reporter: Neil Conway > Labels: mesosphere > > {{ProcessTest.Spawn}} fails deterministically for me with GCC 6 and > {{--enable-optimize}}. Recent Arch Linux, GCC "6.0.0 20160227". > {noformat} > [ RUN ] ProcessTest.Spawn > *** Aborted at 145817 (unix time) try "date -d @145817" if you are > using GNU date *** > PC: @ 0x522926 SpawnProcess::initialize() > *** SIGSEGV (@0x0) received by PID 11359 (TID 0x7faa6075f700) from PID 0; > stack trace: *** > @ 0x7faa670dbe80 (unknown) > @ 0x522926 SpawnProcess::initialize() > @ 0x646fa6 process::ProcessManager::resume() > @ 0x6471ff > _ZNSt6thread11_State_implISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS7_EEEvEEE6_M_runEv > @ 0x7faa6764a812 execute_native_thread_routine > @ 0x7faa670d2424 start_thread > @ 0x7faa65b04cbd __clone > @0x0 (unknown) > Makefile:1748: recipe for target 'check-local' failed > make[5]: *** [check-local] Segmentation fault (core dumped) > {noformat} > Backtrace: > {noformat} > Program terminated with signal SIGSEGV, Segmentation fault. > #0 testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) > at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373 > 1373void GetValueAndDelete() const { delete this; } > [Current thread is 1 (Thread 0x7faa6075f700 (LWP 11365))] > (gdb) bt > #0 testing::internal::ActionResultHolder::GetValueAndDelete (this=0x0) > at 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1373 > #1 testing::internal::FunctionMockerBase::InvokeWith(std::tuple<> > const&) (args=empty std::tuple, this=0x712a7c88) at > 3rdparty/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1530 > #2 testing::internal::FunctionMocker::Invoke() > (this=0x712a7c88) at > 3rdparty/gmock-1.7.0/include/gmock/gmock-generated-function-mockers.h:76 > #3 SpawnProcess::initialize (this=0x712a7c80) at > /mesos-2/3rdparty/libprocess/src/tests/process_tests.cpp:113 > #4 0x00646fa6 in process::ProcessManager::resume (this=0x25a2b60, > process=0x712a7d38) at /mesos-2/3rdparty/libprocess/src/process.cpp:2504 > #5 0x006471ff in process::ProcessManager:: atomic_bool&)>::operator() (__closure=, joining=...) 
at > /mesos-2/3rdparty/libprocess/src/process.cpp:2218 > #6 std::_Bind atomic_bool&)>(std::reference_wrapper > >)>::__call (__args=, this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:943 > #7 std::_Bind atomic_bool&)>(std::reference_wrapper > >)>::operator()<> (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:1002 > #8 > std::_Bind_simple atomic_bool&)>(std::reference_wrapper > >)>()>::_M_invoke<> (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:1400 > #9 > std::_Bind_simple atomic_bool&)>(std::reference_wrapper > >)>()>::operator() (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/functional:1389 > #10 > std::thread::_State_impl atomic_bool&)>(std::reference_wrapper >)>()> > >::_M_run(void) (this=) at > /home/vagrant/local/gcc/include/c++/6.0.0/thread:196 > #11 0x7faa6764a812 in std::(anonymous > namespace)::execute_native_thread_routine (__p=0x25a3bf0) at > ../../../../../gcc-trunk/libstdc++-v3/src/c++11/thread.cc:83 > #12 0x7faa670d2424 in start_thread () from /usr/lib/libpthread.so.0 > #13 0x7faa65b04cbd in clone () from /usr/lib/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4070) numify() handles negative numbers inconsistently.
[ https://issues.apache.org/jira/browse/MESOS-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202306#comment-15202306 ] Cong Wang commented on MESOS-4070: -- Understood, but the fact is that stout has never been shipped separately as a library. > numify() handles negative numbers inconsistently. > - > > Key: MESOS-4070 > URL: https://issues.apache.org/jira/browse/MESOS-4070 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Jie Yu >Assignee: Yong Tang > Labels: tech-debt > > As pointed by [~neilc] in this review: > https://reviews.apache.org/r/40988 > {noformat} > Try num2 = numify("-10"); > EXPECT_SOME_EQ(-10, num2); > // TODO(neilc): This is inconsistent with the handling of non-hex numbers. > EXPECT_ERROR(numify("-0x10")); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4112) Clean up libprocess gtest macros
[ https://issues.apache.org/jira/browse/MESOS-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202511#comment-15202511 ] Yong Tang commented on MESOS-4112: -- Hi [~mcypark], I created a review request: https://reviews.apache.org/r/45070/ and would appreciate it if you have a chance to take a look. In this review request I got items (1) and (2) on your list done. For items (3) and (4) I would like your confirmation before moving forward. Let me know what you think and I will get it done. Thanks! > Clean up libprocess gtest macros > > > Key: MESOS-4112 > URL: https://issues.apache.org/jira/browse/MESOS-4112 > Project: Mesos > Issue Type: Task > Components: libprocess, test >Reporter: Michael Park >Assignee: Yong Tang > > This ticket is regarding the libprocess gtest helpers in > {{3rdparty/libprocess/include/process/gtest.hpp}}. > The pattern in this file seems to be a set of macros: > * {{AWAIT_ASSERT__FOR}} > * {{AWAIT_ASSERT_}} -- default of 15 seconds > * {{AWAIT_\_FOR}} -- alias for {{AWAIT_ASSERT__FOR}} > * {{AWAIT_}} -- alias for {{AWAIT_ASSERT_}} > * {{AWAIT_EXPECT__FOR}} > * {{AWAIT_EXPECT_}} -- default of 15 seconds > (1) {{AWAIT_EQ_FOR}} should be added for completeness. > (2) In {{gtest}}, we've got {{EXPECT_EQ}} as well as the {{bool}}-specific > versions: {{EXPECT_TRUE}} and {{EXPECT_FALSE}}. > We should adopt this pattern in these helpers as well. Keeping the pattern > above in mind, the following are missing: > * {{AWAIT_ASSERT_TRUE_FOR}} > * {{AWAIT_ASSERT_TRUE}} > * {{AWAIT_ASSERT_FALSE_FOR}} > * {{AWAIT_ASSERT_FALSE}} > * {{AWAIT_EXPECT_TRUE_FOR}} > * {{AWAIT_EXPECT_FALSE_FOR}} > (3) There are HTTP response related macros at the bottom of the file, e.g. > {{AWAIT_EXPECT_RESPONSE_STATUS_EQ}}, however these are missing their > {{ASSERT}} counterparts. > (4) The reason for (3) presumably is because we reach for {{EXPECT}} over > {{ASSERT}} in general due to the test suite crashing behavior of {{ASSERT}}. > If this is the case, it would be worthwhile considering whether macros such > as {{AWAIT_READY}} should alias {{AWAIT_EXPECT_READY}} rather than > {{AWAIT_ASSERT_READY}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)