[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory

2016-03-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181609#comment-15181609
 ] 

haosdent commented on MESOS-4869:
-

I tried
{code}
./src/mesos-health-check --executor=\(1\)@localhost:8000 
--health_check_json='{"command":{"shell":true,"value":"docker exec 
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
 sh -c \" curl --silent -f -X GET http:\/\/www.google.com > \/dev\/null 
\""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0}'
 --task_id=mesos-test
{code}

to reproduce your problem on my machine. But after one hour, I saw that the 
memory had barely increased at all. How long did it take before you saw it 
increase?

Or could you try the above command directly on your side? 
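To make the comparison concrete, memory growth of the health-check process can be sampled from {{/proc}} over time. A minimal sketch, assuming a Linux host; {{parse_rss}} is a made-up helper, not Mesos tooling:

```shell
# parse_rss extracts the resident set size (VmRSS, in kB) from the text of a
# /proc/<pid>/status file supplied on stdin.
parse_rss() {
  awk '/^VmRSS:/ {print $2}'
}

# Hypothetical polling loop (pid is whatever `pgrep -f mesos-health-check` returns):
#   while kill -0 "$pid" 2>/dev/null; do
#     parse_rss < "/proc/$pid/status"
#     sleep 60
#   done
```

Logging the value once a minute makes it easy to tell a slow leak from steady-state usage.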

> /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
> ---
>
> Key: MESOS-4869
> URL: https://issues.apache.org/jira/browse/MESOS-4869
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.1
>Reporter: Anthony Scalisi
>Priority: Critical
>
> We switched our health checks in Marathon from HTTP to COMMAND:
> {noformat}
> "healthChecks": [
> {
>   "protocol": "COMMAND",
>   "path": "/ops/ping",
>   "command": { "value": "curl --silent -f -X GET 
> http://$HOST:$PORT0/ops/ping > /dev/null" },
>   "gracePeriodSeconds": 90,
>   "intervalSeconds": 2,
>   "portIndex": 0,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 3
> }
>   ]
> {noformat}
> All our applications have the same health check (and /ops/ping endpoint).
> Even though we have the issue on all our Mesos slaves, I'm going to focus on a 
> particular one: *mesos-slave-i-e3a9c724*.
> The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:
> !https://i.imgur.com/gbRf804.png!
> Here is a *docker ps* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724 # docker ps
> CONTAINER ID   IMAGE    COMMAND                  CREATED        STATUS        PORTS                     NAMES
> 4f7c0aa8d03a   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago    Up 6 hours    0.0.0.0:31926->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
> 66f2fc8f8056   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago    Up 6 hours    0.0.0.0:31939->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
> f7382f241fce   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago    Up 6 hours    0.0.0.0:31656->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
> 880934c0049e   java:8   "/bin/sh -c 'JAVA_OPT"   24 hours ago   Up 24 hours   0.0.0.0:31371->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
> 5eab1f8dac4a   java:8   "/bin/sh -c 'JAVA_OPT"   46 hours ago   Up 46 hours   0.0.0.0:31500->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
> b63740fe56e7   java:8   "/bin/sh -c 'JAVA_OPT"   46 hours ago   Up 46 hours   0.0.0.0:31382->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
> 5c7a9ea77b0e   java:8   "/bin/sh -c 'JAVA_OPT"   2 days ago     Up 2 days     0.0.0.0:31186->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
> 53065e7a31ad   java:8   "/bin/sh -c 'JAVA_OPT"   2 days ago     Up 2 days     0.0.0.0:31839->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
> {noformat}
> Here is a *docker stats* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724  # docker stats
> CONTAINER      CPU %    MEM USAGE / LIMIT     MEM %    NET I/O               BLOCK I/O
> 4f7c0aa8d03a   2.93%    797.3 MB / 1.611 GB   49.50%   1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
> 53065e7a31ad   8.30%    738.9 MB / 1.611 GB   45.88%   419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
> 5c7a9ea77b0e   4.91%    1.081 GB / 1.611 GB   67.10%   423 MB / 526.5 MB     3.219 MB / 61.44 kB
> 5eab1f8dac4a   3.13%    1.007 GB / 1.611 GB   62.53%   2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
> 66f2fc8f8056   3.15%    768.1 MB / 1.611 GB   47.69%   258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
> 880934c0049e   10.07%   735.1 MB /

[jira] [Commented] (MESOS-4874) overlayfs does not work with kernel 4.2.3

2016-03-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181555#comment-15181555
 ] 

haosdent commented on MESOS-4874:
-

I think we could add a method {{OverlayBackend::supported()}}.
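The check such a {{supported()}} method would perform can be sketched in shell (illustration only; the real implementation would be C++ inside the overlay backend): accept either spelling in {{/proc/filesystems}}.

```shell
# Return success if the given filesystems listing advertises overlay support,
# accepting the pre-4.2 name "overlayfs" as well as the newer name "overlay".
overlay_supported() {
  grep -Ewq 'overlay(fs)?' "$1"
}

# On a live system: overlay_supported /proc/filesystems
```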

> overlayfs does not work with kernel 4.2.3
> --
>
> Key: MESOS-4874
> URL: https://issues.apache.org/jira/browse/MESOS-4874
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The current {{overlay}} support logic checks {{/proc/filesystems}} for the 
> field {{overlayfs}}, but this no longer holds for kernel 4.2 and later, as 
> {{overlayfs}} has been renamed to {{overlay}}.
> https://lists.launchpad.net/kernel-packages/msg102430.html
> {code}
> root@mesos002:/home/gyliu# uname -r
> 4.2.3-040203-generic
> root@mesos002:/home/gyliu# lsmod  | grep overlay
> overlay                45056  0 
> root@mesos002:/home/gyliu# cat /proc/filesystems | grep overlay
> nodev overlay
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4874) overlayfs does not work with kernel 4.2.3

2016-03-04 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4874:
--

 Summary: overlayfs does not work with kernel 4.2.3
 Key: MESOS-4874
 URL: https://issues.apache.org/jira/browse/MESOS-4874
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu



The current {{overlay}} support logic checks {{/proc/filesystems}} for the 
field {{overlayfs}}, but this no longer holds for kernel 4.2 and later, as 
{{overlayfs}} has been renamed to {{overlay}}.

https://lists.launchpad.net/kernel-packages/msg102430.html

{code}
root@mesos002:/home/gyliu# uname -r
4.2.3-040203-generic
root@mesos002:/home/gyliu# lsmod  | grep overlay
overlay                45056  0 
root@mesos002:/home/gyliu# cat /proc/filesystems | grep overlay
nodev   overlay
{code}





[jira] [Created] (MESOS-4873) Add documentation about container image support.

2016-03-04 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4873:
-

 Summary: Add documentation about container image support.
 Key: MESOS-4873
 URL: https://issues.apache.org/jira/browse/MESOS-4873
 Project: Mesos
  Issue Type: Documentation
Reporter: Jie Yu
Assignee: Jie Yu








[jira] [Assigned] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu reassigned MESOS-3505:
--

Assignee: Guangya Liu

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>Assignee: Guangya Liu
>  Labels: mesosphere
>
> A common way to specify a Docker image with the docker engine is through 
> {{repo:tag}}, which is convenient and sufficient for most people in most 
> scenarios. However this combination is neither precise nor immutable.
> For this reason, when an image with a given {{repo:tag}} is already cached 
> locally on an agent host and a task requiring this {{repo:tag}} arrives, the 
> task may run on an image different from the one the user intended.
> Docker CLI already supports referring to an image by {{repo@id}}, where the 
> ID can have two forms:
> * v1 Image ID
> * digest
> Native Mesos provisioner should support the same for Docker images. IMO it's 
> fine if image discovery by ID is not supported (and thus still requiring 
> {{repo:tag}} to be specified) (looks like [v2 
> registry|http://docs.docker.com/registry/spec/api/] does support it) but the 
> user can optionally specify an image ID and match it against the cached / 
> newly pulled image. If the ID doesn't match the cached image, the store can 
> re-pull it; if the ID doesn't match the newly pulled image (manifest), the 
> provisioner can fail the request without having the user unknowingly running 
> its task on the wrong image.
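To illustrate the two reference forms, the distinction can be made mechanically; {{classify_image_ref}} below is a hypothetical helper for this sketch, not part of Mesos or Docker:

```shell
# Classify a Docker image reference as "digest" (repo@id) or "tag" (repo:tag).
# A bare "repo" with no tag implies the default tag "latest", so it is
# classified as "tag" here.
classify_image_ref() {
  case "$1" in
    *@*) echo digest ;;
    *:*) echo tag ;;
    *)   echo tag ;;
  esac
}
```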





[jira] [Commented] (MESOS-4838) Update unavailable in batch to avoid several allocate(slaveId) call

2016-03-04 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181435#comment-15181435
 ] 

Klaus Ma commented on MESOS-4838:
-

[~jvanremoortere], Yes, I'd like to contribute some improvements around 
maintenance; I created a working group 
(mesos-maintenance-working-gr...@googlegroups.com) for more discussion.

> Update unavailable in batch to avoid several allocate(slaveId) call
> ---
>
> Key: MESOS-4838
> URL: https://issues.apache.org/jira/browse/MESOS-4838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> In {{/machine/schedule}}, every machine triggers a separate 
> {{allocate(slaveId)}} call in the master, which increases the master's 
> workload. This JIRA proposes updating unavailability in batch to avoid the 
> repeated {{allocate(slaveId)}} calls.





[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181429#comment-15181429
 ] 

Guangya Liu commented on MESOS-3505:


Thanks [~xujyan], got it. Yes, the docker tag is not immutable; it is more 
accurate to use a digest to pull a docker image.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>





[jira] [Assigned] (MESOS-4872) Dump the contents of the sandbox when a test fails

2016-03-04 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-4872:


Assignee: Joseph Wu

> Dump the contents of the sandbox when a test fails
> --
>
> Key: MESOS-4872
> URL: https://issues.apache.org/jira/browse/MESOS-4872
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere, newbie, test
>
> [~bernd-mesos] added this logic for extra info about a rare flaky test:
> https://github.com/apache/mesos/blob/d26baee1f377aedb148ad04cc004bb38b85ee4f6/src/tests/fetcher_cache_tests.cpp#L249-L259
> This information is useful regardless of the test type and should be 
> generalized for {{cluster::Slave}}.  i.e. 
> # When a {{cluster::Slave}} is destructed, it can detect if the test has 
> failed.  
> # If so, navigate through its own {{work_dir}} and print sandboxes and/or 
> other useful debugging info.
> Also see the refactor in [MESOS-4634].





[jira] [Created] (MESOS-4872) Dump the contents of the sandbox when a test fails

2016-03-04 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4872:


 Summary: Dump the contents of the sandbox when a test fails
 Key: MESOS-4872
 URL: https://issues.apache.org/jira/browse/MESOS-4872
 Project: Mesos
  Issue Type: Improvement
  Components: test
Reporter: Joseph Wu


[~bernd-mesos] added this logic for extra info about a rare flaky test:
https://github.com/apache/mesos/blob/d26baee1f377aedb148ad04cc004bb38b85ee4f6/src/tests/fetcher_cache_tests.cpp#L249-L259

This information is useful regardless of the test type and should be 
generalized for {{cluster::Slave}}.  i.e. 
# When a {{cluster::Slave}} is destructed, it can detect if the test has 
failed.  
# If so, navigate through its own {{work_dir}} and print sandboxes and/or other 
useful debugging info.
Also see the refactor in [MESOS-4634].
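The dump step itself is straightforward; in shell terms it amounts to the sketch below ({{dump_sandboxes}} is a made-up name, and in Mesos the logic would be C++ in the {{cluster::Slave}} destructor, guarded by something like gtest's {{::testing::Test::HasFailure()}}):

```shell
# Print the path and contents of every sandbox stdout/stderr file found
# under the given agent work directory.
dump_sandboxes() {
  find "$1" -type f \( -name stdout -o -name stderr \) | while read -r f; do
    echo "=== $f ==="
    cat "$f"
  done
}
```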





[jira] [Created] (MESOS-4871) Make use of C++11 `override` keyword

2016-03-04 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4871:
--

 Summary: Make use of C++11 `override` keyword
 Key: MESOS-4871
 URL: https://issues.apache.org/jira/browse/MESOS-4871
 Project: Mesos
  Issue Type: Improvement
  Components: general
Reporter: Neil Conway


Per Google C++ style guide (as well as general common sense), we should 
probably be using the {{override}} keyword to explicitly denote situations 
where we expect a virtual member function declaration to override a virtual 
function declared in a parent class.





[jira] [Updated] (MESOS-4849) Add agent flags for HTTP authentication

2016-03-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4849:
-
Description: 
Flags should be added to the agent to:
1. Enable HTTP authentication ({{--authenticate_http}})
2. Specify credentials ({{--http_credentials}})
3. Specify HTTP authenticators ({{--authenticators}})

  was:
Flags should be added to the agent to:
1. Enable HTTP authentication ({{--authenticate_http}})
2. Specify credentials ({{--credentials}})
3. Specify HTTP authenticators ({{--authenticators}})


> Add agent flags for HTTP authentication
> ---
>
> Key: MESOS-4849
> URL: https://issues.apache.org/jira/browse/MESOS-4849
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Flags should be added to the agent to:
> 1. Enable HTTP authentication ({{--authenticate_http}})
> 2. Specify credentials ({{--http_credentials}})
> 3. Specify HTTP authenticators ({{--authenticators}})
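Put together, an agent invocation using these flags might look like the following (a sketch only; the master address and credentials path are hypothetical, and flag names follow the description above):

```shell
mesos-slave --master=zk://localhost:2181/mesos \
  --authenticate_http \
  --http_credentials=file:///etc/mesos/http_credentials.json \
  --authenticators=basic
```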





[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180710#comment-15180710
 ] 

Ian Downes commented on MESOS-3505:
---

It's not super urgent, more that as we decide what to do in Aurora we need to 
determine if this is going to land in the near term or not.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>





[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180709#comment-15180709
 ] 

Yan Xu commented on MESOS-3505:
---

Sorry for the delay in my response.

1) Because between when the image is fetched by the agent and when the user 
launches a task, the meaning of "debian:latest" could have changed. (Somebody 
pushed some changes)

2) Out-of-band. If they want a specific image ID, they would know, right? If 
they don't, they can continue to use a tag.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>





[jira] [Updated] (MESOS-4849) Add agent flags for HTTP authentication

2016-03-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4849:
-
Description: 
Flags should be added to the agent to:
1. Enable HTTP authentication ({{--authenticate_http}})
2. Specify credentials ({{--credentials}})
3. Specify HTTP authenticators ({{--authenticators}})

  was:
`--authenticate_http`
`--http_authenticators`
etc.


> Add agent flags for HTTP authentication
> ---
>
> Key: MESOS-4849
> URL: https://issues.apache.org/jira/browse/MESOS-4849
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Flags should be added to the agent to:
> 1. Enable HTTP authentication ({{--authenticate_http}})
> 2. Specify credentials ({{--credentials}})
> 3. Specify HTTP authenticators ({{--authenticators}})





[jira] [Updated] (MESOS-4849) Add agent flags for HTTP authentication

2016-03-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4849:
-
Summary: Add agent flags for HTTP authentication  (was: Add agent flags for 
http authentication)

> Add agent flags for HTTP authentication
> ---
>
> Key: MESOS-4849
> URL: https://issues.apache.org/jira/browse/MESOS-4849
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> `--authenticate_http`
> `--http_authenticators`
> etc.





[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180704#comment-15180704
 ] 

Yan Xu commented on MESOS-3505:
---

+1

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>





[jira] [Updated] (MESOS-4849) Add agent flags for http authentication

2016-03-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4849:
-
  Sprint: Mesosphere Sprint 30
Story Points: 2

> Add agent flags for http authentication
> ---
>
> Key: MESOS-4849
> URL: https://issues.apache.org/jira/browse/MESOS-4849
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> `--authenticate_http`
> `--http_authenticators`
> etc.





[jira] [Commented] (MESOS-4427) Ensure ip_address in state.json (from NetworkInfo) is valid

2016-03-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180694#comment-15180694
 ] 

Jie Yu commented on MESOS-4427:
---

Yes. Let me close this one.


> Ensure ip_address in state.json (from NetworkInfo) is valid
> ---
>
> Key: MESOS-4427
> URL: https://issues.apache.org/jira/browse/MESOS-4427
> Project: Mesos
>  Issue Type: Bug
>Reporter: Sargun Dhillon
>Priority: Critical
>  Labels: mesosphere
>
> We have seen a master state.json where the state.json has a field that looks 
> similar to:
> ---REDACTED---
> {code:json}
> {
> "container": {
> "docker": {
> "force_pull_image": false,
> "image": "REDACTED",
> "network": "HOST",
> "privileged": false
> },
> "type": "DOCKER"
> },
> "executor_id": "",
> "framework_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-",
> "id": "ping-as-a-service.c2d1c17a-be22-11e5-b053-002590e56e25",
> "name": "ping-as-a-service",
> "resources": {
> "cpus": 0.1,
> "disk": 0,
> "mem": 64,
> "ports": "[7907-7907]"
> },
> "slave_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-S76043",
> "state": "TASK_RUNNING",
> "statuses": [
> {
> "container_status": {
> "network_infos": [
> {
> "ip_address": "",
> "ip_addresses": [
> {
> "ip_address": ""
> }
> ]
> }
> ]
> },
> "labels": [
> {
> "key": "Docker.NetworkSettings.IPAddress",
> "value": ""
> }
> ],
> "state": "TASK_RUNNING",
> "timestamp": 1453149270.95511
> }
> ]
> }
> {code}
> ---REDACTED---
> This is invalid, and mesos-core should filter it out.
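The requested filtering reduces to a validity check on the address before it is surfaced; a minimal sketch ({{is_valid_ipv4}} is a hypothetical helper, and real validation in mesos-core would happen where {{NetworkInfo}} is populated):

```shell
# Succeed only for a non-empty dotted-quad IPv4 address; reject the empty
# strings seen in the state.json above.
is_valid_ipv4() {
  echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}
```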





[jira] [Commented] (MESOS-4868) PersistentVolumeTests do not need to set up ACLs.

2016-03-04 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180690#comment-15180690
 ] 

Yong Tang commented on MESOS-4868:
--

Added Review Request: https://reviews.apache.org/r/44408/

> PersistentVolumeTests do not need to set up ACLs.
> -
>
> Key: MESOS-4868
> URL: https://issues.apache.org/jira/browse/MESOS-4868
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Joseph Wu
>Assignee: Yong Tang
>  Labels: mesosphere, newbie, test
>
> The {{PersistentVolumeTest}} s have a custom helper for setting up ACLs in 
> the {{master::Flags}}:
> {code}
> ACLs acls;
> hashset<string> roles;
> foreach (const FrameworkInfo& framework, frameworks) {
>   mesos::ACL::RegisterFramework* acl = acls.add_register_frameworks();
>   acl->mutable_principals()->add_values(framework.principal());
>   acl->mutable_roles()->add_values(framework.role());
>   roles.insert(framework.role());
> }
> flags.acls = acls;
> flags.roles = strings::join(",", roles);
> {code}
> This is no longer necessary with implicit roles.





[jira] [Assigned] (MESOS-4868) PersistentVolumeTests do not need to set up ACLs.

2016-03-04 Thread Yong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Tang reassigned MESOS-4868:


Assignee: Yong Tang

> PersistentVolumeTests do not need to set up ACLs.
> -
>
> Key: MESOS-4868
> URL: https://issues.apache.org/jira/browse/MESOS-4868
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Joseph Wu
>Assignee: Yong Tang
>  Labels: mesosphere, newbie, test
>





[jira] [Commented] (MESOS-4427) Ensure ip_address in state.json (from NetworkInfo) is valid

2016-03-04 Thread Sargun Dhillon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180686#comment-15180686
 ] 

Sargun Dhillon commented on MESOS-4427:
---

[~jieyu] This is closed, right?

> Ensure ip_address in state.json (from NetworkInfo) is valid
> ---
>
> Key: MESOS-4427
> URL: https://issues.apache.org/jira/browse/MESOS-4427
> Project: Mesos
>  Issue Type: Bug
>Reporter: Sargun Dhillon
>Priority: Critical
>  Labels: mesosphere
>





[jira] [Commented] (MESOS-4427) Ensure ip_address in state.json (from NetworkInfo) is valid

2016-03-04 Thread Sargun Dhillon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180685#comment-15180685
 ] 

Sargun Dhillon commented on MESOS-4427:
---

No.

> Ensure ip_address in state.json (from NetworkInfo) is valid
> ---
>
> Key: MESOS-4427
> URL: https://issues.apache.org/jira/browse/MESOS-4427
> Project: Mesos
>  Issue Type: Bug
>Reporter: Sargun Dhillon
>Priority: Critical
>  Labels: mesosphere
>





[jira] [Commented] (MESOS-4848) Agent Authn Research Spike

2016-03-04 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180681#comment-15180681
 ] 

Greg Mann commented on MESOS-4848:
--

These changes look pretty straightforward, so I didn't go for a full design 
doc, just a short report. Summary can be found 
[here|https://docs.google.com/a/mesosphere.io/document/d/1697N7jiDfyDbpz8tiEoKtQHykvYWmnUl_mea4j6GRvg/edit?usp=sharing],
 and the document is linked to this ticket.

> Agent Authn Research Spike
> --
>
> Key: MESOS-4848
> URL: https://issues.apache.org/jira/browse/MESOS-4848
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Research the master authentication flags to see what changes will be 
> necessary for agent http authentication.
> Write up a 1-2 page summary/design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180659#comment-15180659
 ] 

Jie Yu commented on MESOS-3505:
---

Yeah, we'll support it (and I'd like to). In fact, we already had that in 
mind when writing the code:
https://github.com/apache/mesos/blob/master/include/mesos/docker/spec.proto

I am just saying that there is a workaround if you guys need it urgently. 
[~idownes] Do you guys plan to do it? I can shepherd it for sure.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>
> A common way to specify a Docker image with the docker engine is through 
> {{repo:tag}}, which is convenient and sufficient for most people in most 
> scenarios. However this combination is neither precise nor immutable.
> For this reason, it's possible that when an image with a {{repo:tag}} is 
> already cached locally on an agent host and a task requiring this 
> {{repo:tag}} arrives, the task uses an image different from the one the user 
> intended.
> Docker CLI already supports referring to an image by {{repo@id}}, where the 
> ID can have two forms:
> * v1 Image ID
> * digest
> Native Mesos provisioner should support the same for Docker images. IMO it's 
> fine if image discovery by ID is not supported (and thus still requiring 
> {{repo:tag}} to be specified) (looks like [v2 
> registry|http://docs.docker.com/registry/spec/api/] does support it) but the 
> user can optionally specify an image ID and match it against the cached / 
> newly pulled image. If the ID doesn't match the cached image, the store can 
> re-pull it; if the ID doesn't match the newly pulled image (manifest), the 
> provisioner can fail the request without having the user unknowingly running 
> its task on the wrong image.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180650#comment-15180650
 ] 

Yan Xu commented on MESOS-3505:
---

Process is fine, but it's enforced by humans, who are error-prone.

The intention of this ticket was: since {{docker pull}} already supports the 
syntax of {{docker pull 
debian@sha256:cbbf2f9a99b47fc460d422812b6a5adff7dfee951d8fa2e4a98caa0382cfbdbf}}
 (see https://docs.docker.com/engine/reference/commandline/pull/), we should 
support it too.
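The two reference forms under discussion ({{repo:tag}} vs. {{repo@digest}}) can be told apart mechanically; a small illustrative parser (hypothetical, not part of Mesos or Docker):

```python
import re

# Illustrative only: distinguishes mutable tag references from immutable
# digest references. A digest pins content; a tag can be repointed later.
DIGEST_RE = re.compile(r"^(?P<repo>[^@]+)@sha256:(?P<hex>[0-9a-f]{64})$")

def is_immutable_reference(image):
    """True if the image is pinned by content digest rather than a mutable tag."""
    return DIGEST_RE.match(image) is not None

assert is_immutable_reference("debian@sha256:" + "cb" * 32)
assert not is_immutable_reference("debian:jessie")
```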

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180644#comment-15180644
 ] 

Ian Downes commented on MESOS-3505:
---

It's still not a guarantee, because there's no enforcement that the name 
matches the content digest, which you do have with the ID. With a suitable 
policy it could go some way towards it, though.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180628#comment-15180628
 ] 

Jie Yu edited comment on MESOS-3505 at 3/4/16 10:13 PM:


Yeah, docker itself does not enforce immutability. But you can enforce 
something like: the tag of an image should be the git commit sha, to guarantee 
immutability. Or, you can ask users to use a uuid for their image tag.


was (Author: jieyu):
Yeah, docker itself does not enforce immutability. But you can enforce 
something like: the tag of a image should be the git commit sha to guarantee 
immutability.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180628#comment-15180628
 ] 

Jie Yu commented on MESOS-3505:
---

Yeah, docker itself does not enforce immutability. But you can enforce 
something like: the tag of an image should be the git commit sha, to guarantee 
immutability.
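The convention described above amounts to deriving the tag from an immutable identifier (purely illustrative; the repo name and sha below are made up):

```python
# Illustrative sketch of an "immutable tag" convention: derive the Docker tag
# from the git commit sha so a given tag can only ever mean one build.
def immutable_tag(repo, commit_sha):
    """Build a repo:tag reference whose tag is the (shortened) commit sha."""
    return f"{repo}:{commit_sha[:12]}"

print(immutable_tag("myorg/myapp", "cbbf2f9a99b47fc460d422812b6a5adff7dfee95"))
# -> myorg/myapp:cbbf2f9a99b4
```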

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4870) As a developer I WANT Mesos to provide a channel for richly structured error messages to surface from events like TASK_FAILED

2016-03-04 Thread James DeFelice (JIRA)
James DeFelice created MESOS-4870:
-

 Summary: As a developer I WANT Mesos to provide a channel for 
richly structured error messages to surface from events like TASK_FAILED
 Key: MESOS-4870
 URL: https://issues.apache.org/jira/browse/MESOS-4870
 Project: Mesos
  Issue Type: Improvement
Reporter: James DeFelice


For example, a storage module attempts to mount a volume into my task's 
container. The mount operation fails because the file system driver required by 
the volume type isn't available on the host. Mesos generates a TASK_FAILED 
event and passes along the failure message generated by the module.

If I'm LUCKY then the module populates the failure message with some text that 
explains the nature of the problem and the rich Mesos console that I'm using 
surfaces the nicely formatted text message.

If I'm UNLUCKY then the module populates the failure message with something 
cryptic that doesn't help me understand what went wrong at all. I'm left with 
little context with which to troubleshoot the problem and my rich Mesos console 
can't help because there's very little additional context that shipped with the 
TASK_FAILED event.

What I WANT is additional context so that my rich Mesos console can offer 
features like:
a) tell me which subsystem/module failed (subsystem="storage", 
modulename="libfoobaz") and subsystem-specific details (storageprovider="foo" 
providerversion=0.1)
b) provide OS process details:
i) the OS command line that failed
ii) the UID of the process that failed
iii) the GID of the process that failed
iv) the environment of the command line that failed
v) the error code that the process exited with
c) how many times this type of error has happened, for this (or other) 
frameworks, and when
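One possible shape for such structured error context, built from the wishlist above (purely illustrative; every field name here is invented for the sketch, not a proposed Mesos API):

```python
import json

# Purely illustrative: a structured TASK_FAILED payload carrying the context
# wished for above. All field names and values are invented.
failure_context = {
    "subsystem": "storage",
    "modulename": "libfoobaz",
    "details": {"storageprovider": "foo", "providerversion": "0.1"},
    "process": {
        "cmdline": "mount -t foofs /dev/foo /mnt/task",  # (i) failed command
        "uid": 0,                                        # (ii) process UID
        "gid": 0,                                        # (iii) process GID
        "env": {"PATH": "/usr/bin:/bin"},                # (iv) environment
        "exit_code": 32,                                 # (v) exit code
    },
    "occurrences": {"this_framework": 3, "all_frameworks": 17},  # (c)
}

print(json.dumps(failure_context, indent=2))
```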



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Joshua Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180615#comment-15180615
 ] 

Joshua Cohen commented on MESOS-3505:
-

Maybe I don't fully understand the problem space, but I don't think unique tags 
solve the problem of immutability.

My understanding is that Docker tags can be updated to point to a new revision. 
Docker is aware of this, but their recommended solution is simply a process one 
(https://github.com/docker/docker/issues/3158). Other than explicitly 
identifying an image by its digest or image ID, I don't see how to guarantee 
immutability.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180605#comment-15180605
 ] 

Jie Yu edited comment on MESOS-3505 at 3/4/16 9:52 PM:
---

Can you ask users to make sure 'tag' is unique? That's the simplest workaround 
at this moment.



was (Author: jieyu):
Can you ask uses to make sure 'tag' is unique? That's the simplest work around 
at this moment.


> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180605#comment-15180605
 ] 

Jie Yu commented on MESOS-3505:
---

Can you ask users to make sure 'tag' is unique? That's the simplest workaround 
at this moment.


> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180602#comment-15180602
 ] 

Ian Downes commented on MESOS-3505:
---

Yes. We're looking at improving the support for container images in Aurora, 
both Docker and AppC images, and task immutability is required.

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4859) Add explicit upgrade instructions to the docs

2016-03-04 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180593#comment-15180593
 ] 

Vinod Kone commented on MESOS-4859:
---

I fixed some of it when doing the 0.28.0 release. 

But the bigger thing I realized was that people were describing API changes 
etc. in upgrades.md and the CHANGELOG, sometimes inconsistently. I think it 
would be great if we could clean up upgrades.md to keep it specific to upgrade 
instructions (e.g., the compatibility matrix we talked about) and move 
everything else to the CHANGELOG.

Thoughts?

> Add explicit upgrade instructions to the docs
> -
>
> Key: MESOS-4859
> URL: https://issues.apache.org/jira/browse/MESOS-4859
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Greg Mann
>  Labels: documentation, mesosphere
>
> The documentation currently contains per-version upgrade guidelines, which 
> for recent releases only outline the upgrade concerns for that version 
> without detailing explicit upgrade instructions.
> We should add explicit upgrade instructions to the top of the upgrades 
> documentation, which can be supplemented by the per-version concerns.
> This is done within the upgrade docs for some early versions, with text like:
> {code}
> In order to upgrade a running cluster:
> Install the new master binaries and restart the masters.
> Upgrade the schedulers by linking the latest native library and mesos jar (if 
> necessary).
> Restart the schedulers.
> Install the new slave binaries and restart the slaves.
> Upgrade the executors by linking the latest native library and mesos jar (if 
> necessary).
> {code}
> Instructions to this effect should be featured prominently in the doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180580#comment-15180580
 ] 

Jie Yu commented on MESOS-3505:
---

[~idownes] I already unassigned the ticket. Do you guys need that?

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4427) Ensure ip_address in state.json (from NetworkInfo) is valid

2016-03-04 Thread Martin Evgeniev (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180581#comment-15180581
 ] 

Martin Evgeniev commented on MESOS-4427:


Are you using a Docker libnetwork plugin with a custom network (overlay, etc.)?

> Ensure ip_address in state.json (from NetworkInfo) is valid
> ---
>
> Key: MESOS-4427
> URL: https://issues.apache.org/jira/browse/MESOS-4427
> Project: Mesos
>  Issue Type: Bug
>Reporter: Sargun Dhillon
>Priority: Critical
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-03-04 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180574#comment-15180574
 ] 

Ian Downes commented on MESOS-3505:
---

Is this work still underway? If [~xujyan] is no longer working on it then it 
should be unassigned. [~jieyu] do you plan to work on it?

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>  Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory

2016-03-04 Thread Anthony Scalisi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Scalisi updated MESOS-4869:
---
Description: 
We switched our health checks in Marathon from HTTP to COMMAND:

{noformat}
"healthChecks": [
{
  "protocol": "COMMAND",
  "path": "/ops/ping",
  "command": { "value": "curl --silent -f -X GET 
http://$HOST:$PORT0/ops/ping > /dev/null" },
  "gracePeriodSeconds": 90,
  "intervalSeconds": 2,
  "portIndex": 0,
  "timeoutSeconds": 5,
  "maxConsecutiveFailures": 3
}
  ]
{noformat}

All our applications have the same health check (and /ops/ping endpoint).

Even though we have the issue on all our Mesos slaves, I'm going to focus on a 
particular one: *mesos-slave-i-e3a9c724*.

The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:

!https://i.imgur.com/gbRf804.png!

Here is a *docker ps* on it:

{noformat}
root@mesos-slave-i-e3a9c724 # docker ps
CONTAINER IDIMAGE   COMMAND  CREATED
 STATUS  PORTS NAMES
4f7c0aa8d03ajava:8  "/bin/sh -c 'JAVA_OPT"   6 hours ago
 Up 6 hours  0.0.0.0:31926->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
66f2fc8f8056java:8  "/bin/sh -c 'JAVA_OPT"   6 hours ago
 Up 6 hours  0.0.0.0:31939->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
f7382f241fcejava:8  "/bin/sh -c 'JAVA_OPT"   6 hours ago
 Up 6 hours  0.0.0.0:31656->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
880934c0049ejava:8  "/bin/sh -c 'JAVA_OPT"   24 hours ago   
 Up 24 hours 0.0.0.0:31371->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
5eab1f8dac4ajava:8  "/bin/sh -c 'JAVA_OPT"   46 hours ago   
 Up 46 hours 0.0.0.0:31500->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
b63740fe56e7java:8  "/bin/sh -c 'JAVA_OPT"   46 hours ago   
 Up 46 hours 0.0.0.0:31382->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
5c7a9ea77b0ejava:8  "/bin/sh -c 'JAVA_OPT"   2 days ago 
 Up 2 days   0.0.0.0:31186->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
53065e7a31adjava:8  "/bin/sh -c 'JAVA_OPT"   2 days ago 
 Up 2 days   0.0.0.0:31839->8080/tcp   
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
{noformat}

Here is a *docker stats* on it:

{noformat}
root@mesos-slave-i-e3a9c724  # docker stats
CONTAINER   CPU %   MEM USAGE / LIMIT MEM % 
  NET I/O   BLOCK I/O
4f7c0aa8d03a   2.93%    797.3 MB / 1.611 GB   49.50%
  1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
53065e7a31ad   8.30%    738.9 MB / 1.611 GB   45.88%
  419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
5c7a9ea77b0e   4.91%    1.081 GB / 1.611 GB   67.10%
  423 MB / 526.5 MB     3.219 MB / 61.44 kB
5eab1f8dac4a   3.13%    1.007 GB / 1.611 GB   62.53%
  2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
66f2fc8f8056   3.15%    768.1 MB / 1.611 GB   47.69%
  258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
880934c0049e   10.07%   735.1 MB / 1.611 GB   45.64%
  1.451 GB / 1.399 GB   573.4 kB / 94.21 kB
b63740fe56e7   12.04%   629 MB / 1.611 GB     39.06%
  10.29 GB / 9.344 GB   8.102 MB / 61.44 kB
f7382f241fce   6.21%    505 MB / 1.611 GB     31.36%
  153.4 MB / 151.9 MB   5.837 MB / 94.21 kB
{noformat}

Not much else is running on the slave, yet the used memory doesn't match the 
tasks' memory:

{noformat}
Mem:16047M used:13340M buffers:1139M cache:776M
{noformat}
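Summing the per-container figures from *docker stats* against the host numbers makes the gap concrete. A rough back-of-the-envelope sketch (the figures are copied from the output above; treating buffers and cache as reclaimable is an assumption):

```python
# Back-of-the-envelope check: how much host memory is unaccounted for
# once the eight containers and the reclaimable buffers/cache are
# subtracted? Figures are taken from the `docker stats` and `Mem:`
# output above.
container_mem_mb = [797.3, 738.9, 1081, 1007, 768.1, 735.1, 629, 505]

total_containers_mb = sum(container_mem_mb)   # ~6261 MB across 8 tasks
host_used_mb = 13340                          # from `used:13340M`
buffers_cache_mb = 1139 + 776                 # from `buffers:` and `cache:`

gap_mb = host_used_mb - buffers_cache_mb - total_containers_mb
print(f"containers: {total_containers_mb:.0f} MB, unaccounted: {gap_mb:.0f} MB")
```

That leaves roughly 5 GB on the host that is not attributable to the task containers, consistent with long-lived health-check processes accumulating memory.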


If I exec into a container (*java:8* image), I can see the shell calls that 
execute the curl specified in the health check; they run and exit correctly, 
as expected.

The only change that coincided with the memory usage woes was moving the 
health checks to Mesos, so I decided to take a look:

{noformat}
root@mesos-slave-i-e3a9c724 # ps awwx | grep health_check | grep -v grep
 2504 ?        Sl     47:33 /usr/libexec/mesos/mesos-health-check 
--executor=(1)@10.92.32.63:53432 
--health_check_json={"command":{"shell":true,"value":"docker exec 
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/p
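A quick way to confirm whether the {{mesos-health-check}} process itself is leaking is to sample its resident set size over time. A minimal sketch (PID 2504 is taken from the {{ps}} output above; adjust for your host):

```python
# Sketch: read a process's VmRSS from /proc/<pid>/status and watch
# whether it keeps growing. A steadily increasing series suggests a
# leak rather than transient curl invocations.
import time

def vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None

def sample(pid):
    with open(f"/proc/{pid}/status") as f:
        return vmrss_kb(f.read())

# Uncomment to sample the health-check process once a minute:
# for _ in range(60):
#     print(time.strftime("%H:%M:%S"), sample(2504), "kB")
#     time.sleep(60)
```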

[jira] [Updated] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory

2016-03-04 Thread Anthony Scalisi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Scalisi updated MESOS-4869:
---
Description: 
We switched our health checks in Marathon from HTTP to COMMAND:

{noformat}
"healthChecks": [
{
  "protocol": "COMMAND",
  "path": "/ops/ping",
  "command": { "value": "curl --silent -f -X GET 
http://$HOST:$PORT0/ops/ping > /dev/null" },
  "gracePeriodSeconds": 90,
  "intervalSeconds": 2,
  "portIndex": 0,
  "timeoutSeconds": 5,
  "maxConsecutiveFailures": 3
}
  ]
{noformat}

All our applications have the same health check (and /ops/ping endpoint).

Even though we have the issue on all our Mesos slaves, I'm going to focus on a 
particular one: *mesos-slave-i-e3a9c724*.

The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:

!https://i.imgur.com/gbRf804.png!

Here is a *docker ps* on it:

{noformat}
root@mesos-slave-i-e3a9c724 # docker ps
CONTAINER IDIMAGE   COMMAND  CREATED
 STATUS  PORTS NAMES
4f7c0aa8d03a   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago
 Up 6 hours   0.0.0.0:31926->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
66f2fc8f8056   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago
 Up 6 hours   0.0.0.0:31939->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
f7382f241fce   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago
 Up 6 hours   0.0.0.0:31656->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
880934c0049e   java:8   "/bin/sh -c 'JAVA_OPT"   24 hours ago
 Up 24 hours   0.0.0.0:31371->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
5eab1f8dac4a   java:8   "/bin/sh -c 'JAVA_OPT"   46 hours ago
 Up 46 hours   0.0.0.0:31500->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
b63740fe56e7   java:8   "/bin/sh -c 'JAVA_OPT"   46 hours ago
 Up 46 hours   0.0.0.0:31382->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
5c7a9ea77b0e   java:8   "/bin/sh -c 'JAVA_OPT"   2 days ago
 Up 2 days   0.0.0.0:31186->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
53065e7a31ad   java:8   "/bin/sh -c 'JAVA_OPT"   2 days ago
 Up 2 days   0.0.0.0:31839->8080/tcp
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
{noformat}

Here is a *docker stats* on it:

{noformat}
root@mesos-slave-i-e3a9c724  # docker stats
CONTAINER   CPU %   MEM USAGE / LIMIT MEM % 
  NET I/O   BLOCK I/O
4f7c0aa8d03a   2.93%    797.3 MB / 1.611 GB   49.50%
  1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
53065e7a31ad   8.30%    738.9 MB / 1.611 GB   45.88%
  419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
5c7a9ea77b0e   4.91%    1.081 GB / 1.611 GB   67.10%
  423 MB / 526.5 MB     3.219 MB / 61.44 kB
5eab1f8dac4a   3.13%    1.007 GB / 1.611 GB   62.53%
  2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
66f2fc8f8056   3.15%    768.1 MB / 1.611 GB   47.69%
  258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
880934c0049e   10.07%   735.1 MB / 1.611 GB   45.64%
  1.451 GB / 1.399 GB   573.4 kB / 94.21 kB
b63740fe56e7   12.04%   629 MB / 1.611 GB     39.06%
  10.29 GB / 9.344 GB   8.102 MB / 61.44 kB
f7382f241fce   6.21%    505 MB / 1.611 GB     31.36%
  153.4 MB / 151.9 MB   5.837 MB / 94.21 kB
{noformat}

Not much else is running on the slave, yet the used memory doesn't match the 
tasks' memory:

{noformat}
Mem:16047M used:13340M buffers:1139M cache:776M
{noformat}


If I exec into a container (*java:8* image), I can see the shell calls that 
execute the curl specified in the health check; they run and exit correctly, 
as expected.

The only change that coincided with the memory usage woes was moving the 
health checks to Mesos, so I decided to take a look:

{noformat}
root@mesos-slave-i-e3a9c724 # ps awwx | grep health_check | grep -v grep
 2504 ?        Sl     47:33 /usr/libexec/mesos/mesos-health-check 
--executor=(1)@10.92.32.63:53432 
--health_check_json={"command":{"shell":true,"value":"docker exec 
mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/p



[jira] [Created] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory

2016-03-04 Thread Anthony Scalisi (JIRA)
Anthony Scalisi created MESOS-4869:
--

 Summary: /usr/libexec/mesos/mesos-health-check using/leaking a lot 
of memory
 Key: MESOS-4869
 URL: https://issues.apache.org/jira/browse/MESOS-4869
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.27.1
Reporter: Anthony Scalisi
Priority: Critical



[jira] [Commented] (MESOS-4856) Some Mesos 0.27.1 API endpoints are empty

2016-03-04 Thread Anthony Scalisi (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180532#comment-15180532
 ] 

Anthony Scalisi commented on MESOS-4856:


Not sure what happened, but the endpoints are "back".

Closing I guess.

> Some Mesos 0.27.1 API endpoints are empty
> -
>
> Key: MESOS-4856
> URL: https://issues.apache.org/jira/browse/MESOS-4856
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.1
>Reporter: Anthony Scalisi
>Priority: Critical
>
> - /master/slaves returns an empty array (even though slaves are configured)
> - /master/tasks returns an empty array as well (even though plenty of tasks 
> are up)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4853) Considering using libcurl multi interface to implement 'curl' in Mesos.

2016-03-04 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180520#comment-15180520
 ] 

Neil Conway commented on MESOS-4853:


Another option might be to arrange (via a new libprocess API?) to temporarily 
get a dedicated OS thread in which to use the synchronous API. Not sure which 
would be cleaner/simpler.

> Considering using libcurl multi interface to implement 'curl' in Mesos.
> ---
>
> Key: MESOS-4853
> URL: https://issues.apache.org/jira/browse/MESOS-4853
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Klaus Ma
>
> Reference:
> https://curl.haxx.se/libcurl/c/libcurl-multi.html
> Currently, some URI fetchers rely on 'curl' command directly (using 
> subprocess). This is not ideal. The libcurl easy interface is blocking, so it 
> does not compose well with our async environment. However, the multi 
> interface seems to be suitable for our async environment. The tricky part is 
> that we need to hook the fd selecting logic with our underlying 
> libev/libevent runtime, but this should be doable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4838) Update unavailable in batch to avoid several allocate(slaveId) call

2016-03-04 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180507#comment-15180507
 ] 

Joris Van Remoortere commented on MESOS-4838:
-

[~klaus1982] I'm not sure why we need to do this.
1. Are you seeing performance issues with the {{allocate(slaveId)}} calls 
generated by the maintenance schedule?
2. If this is the case, why wouldn't the general batching proposal for the 
allocator cover this case? Why do we need to implement batching in specific API 
entry points?
3. If this is being suggested because a maintenance schedule tends to update 
many agents simultaneously, then would it make more sense to consider calling 
the batch {{allocate()}} function in the allocator after updating all the agent 
availabilities?

If you are interested in considering some improvements around maintenance, 
let's set up a working group. I know others are also interested in this 
feature, and I know [~kaysoky] would love to help guide these discussions.
We should discuss these kinds of larger changes and ideas in terms of their 
operational and development consequences before posting patches. (Though if you 
just want to try it out to understand the performance implications or what code 
would need to be touched that's totally fine; we just may decide to go in a 
very different direction).

> Update unavailable in batch to avoid several allocate(slaveId) call
> ---
>
> Key: MESOS-4838
> URL: https://issues.apache.org/jira/browse/MESOS-4838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> In "/machine/schedule", all machines in master will trigger a 
> {{allocate(slaveId)}} which will increase the workload of master. The 
> proposal of this JIRA is to update unavailable in batch to avoid several 
> {{allocate(slaveId)}} call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4868) PersistentVolumeTests do not need to set up ACLs.

2016-03-04 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4868:


 Summary: PersistentVolumeTests do not need to set up ACLs.
 Key: MESOS-4868
 URL: https://issues.apache.org/jira/browse/MESOS-4868
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt, test
Reporter: Joseph Wu


The {{PersistentVolumeTest}}s have a custom helper for setting up ACLs in the 
{{master::Flags}}:
{code}
ACLs acls;
hashset<string> roles;

foreach (const FrameworkInfo& framework, frameworks) {
  mesos::ACL::RegisterFramework* acl = acls.add_register_frameworks();
  acl->mutable_principals()->add_values(framework.principal());
  acl->mutable_roles()->add_values(framework.role());

  roles.insert(framework.role());
}

flags.acls = acls;
flags.roles = strings::join(",", roles);
{code}

This is no longer necessary with implicit roles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4868) PersistentVolumeTests do not need to set up ACLs.

2016-03-04 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4868:
-
Description: 
The {{PersistentVolumeTest}} s have a custom helper for setting up ACLs in the 
{{master::Flags}}:
{code}
ACLs acls;
hashset<string> roles;

foreach (const FrameworkInfo& framework, frameworks) {
  mesos::ACL::RegisterFramework* acl = acls.add_register_frameworks();
  acl->mutable_principals()->add_values(framework.principal());
  acl->mutable_roles()->add_values(framework.role());

  roles.insert(framework.role());
}

flags.acls = acls;
flags.roles = strings::join(",", roles);
{code}

This is no longer necessary with implicit roles.

  was:
The {{PersistentVolumeTest}}s have a custom helper for setting up ACLs in the 
{{master::Flags}}:
{code}
ACLs acls;
hashset<string> roles;

foreach (const FrameworkInfo& framework, frameworks) {
  mesos::ACL::RegisterFramework* acl = acls.add_register_frameworks();
  acl->mutable_principals()->add_values(framework.principal());
  acl->mutable_roles()->add_values(framework.role());

  roles.insert(framework.role());
}

flags.acls = acls;
flags.roles = strings::join(",", roles);
{code}

This is no longer necessary with implicit roles.


> PersistentVolumeTests do not need to set up ACLs.
> -
>
> Key: MESOS-4868
> URL: https://issues.apache.org/jira/browse/MESOS-4868
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Joseph Wu
>  Labels: mesosphere, newbie, test
>
> The {{PersistentVolumeTest}} s have a custom helper for setting up ACLs in 
> the {{master::Flags}}:
> {code}
> ACLs acls;
> hashset<string> roles;
> foreach (const FrameworkInfo& framework, frameworks) {
>   mesos::ACL::RegisterFramework* acl = acls.add_register_frameworks();
>   acl->mutable_principals()->add_values(framework.principal());
>   acl->mutable_roles()->add_values(framework.role());
>   roles.insert(framework.role());
> }
> flags.acls = acls;
> flags.roles = strings::join(",", roles);
> {code}
> This is no longer necessary with implicit roles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4625) Implement Nvidia GPU isolation w/o filesystem isolation enabled.

2016-03-04 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-4625:
---
Assignee: Robert Todd  (was: Kevin Klues)

> Implement Nvidia GPU isolation w/o filesystem isolation enabled.
> 
>
> Key: MESOS-4625
> URL: https://issues.apache.org/jira/browse/MESOS-4625
> Project: Mesos
>  Issue Type: Task
>  Components: isolation
>Reporter: Benjamin Mahler
>Assignee: Robert Todd
>
> The Nvidia GPU isolator will need to use the device cgroup to restrict access 
> to GPU resources, and will need to recover this information after agent 
> failover. For now this will require that the operator specifies the GPU 
> devices via a flag.
> To handle filesystem isolation requires that we provide mechanisms for 
> operators to inject volumes with the necessary libraries into all containers 
> using GPU resources, we'll tackle this in a separate ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4867) Consider suggesting `make install` in "Getting Started"

2016-03-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180405#comment-15180405
 ] 

James Peach commented on MESOS-4867:


libtool is supposed to deal with locating the shared libraries

> Consider suggesting `make install` in "Getting Started"
> ---
>
> Key: MESOS-4867
> URL: https://issues.apache.org/jira/browse/MESOS-4867
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>  Labels: documentation, mesosphere
>
> The current "Getting Started" instructions suggest running the example 
> frameworks from the build directory. On many platforms (e.g., OSX), this 
> doesn't work by default because the dynamic linker can't find {{libmesos}}. 
> Possible remedies:
> 1. Change docs to recommend {{make install}}
> 2. Change docs to recommend adding the appropriate build dirs to the dynamic 
> linker search path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4867) Consider suggesting `make install` in "Getting Started"

2016-03-04 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4867:
--

 Summary: Consider suggesting `make install` in "Getting Started"
 Key: MESOS-4867
 URL: https://issues.apache.org/jira/browse/MESOS-4867
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Neil Conway


The current "Getting Started" instructions suggest running the example 
frameworks from the build directory. On many platforms (e.g., OSX), this 
doesn't work by default because the dynamic linker can't find {{libmesos}}. 
Possible remedies:

1. Change docs to recommend {{make install}}
2. Change docs to recommend adding the appropriate build dirs to the dynamic 
linker search path.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4867) Consider suggesting `make install` in "Getting Started"

2016-03-04 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4867:
---
Labels: documentation mesosphere  (was: )

> Consider suggesting `make install` in "Getting Started"
> ---
>
> Key: MESOS-4867
> URL: https://issues.apache.org/jira/browse/MESOS-4867
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>  Labels: documentation, mesosphere
>
> The current "Getting Started" instructions suggest running the example 
> frameworks from the build directory. On many platforms (e.g., OSX), this 
> doesn't work by default because the dynamic linker can't find {{libmesos}}. 
> Possible remedies:
> 1. Change docs to recommend {{make install}}
> 2. Change docs to recommend adding the appropriate build dirs to the dynamic 
> linker search path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4838) Update unavailable in batch to avoid several allocate(slaveId) call

2016-03-04 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180130#comment-15180130
 ] 

Klaus Ma commented on MESOS-4838:
-

RR: https://reviews.apache.org/r/44396/

> Update unavailable in batch to avoid several allocate(slaveId) call
> ---
>
> Key: MESOS-4838
> URL: https://issues.apache.org/jira/browse/MESOS-4838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> In "/machine/schedule", all machines in master will trigger a 
> {{allocate(slaveId)}} which will increase the workload of master. The 
> proposal of this JIRA is to update unavailable in batch to avoid several 
> {{allocate(slaveId)}} call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4866) Added document for overlayfs backend.

2016-03-04 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4866:
--

 Summary: Added document for overlayfs backend.
 Key: MESOS-4866
 URL: https://issues.apache.org/jira/browse/MESOS-4866
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


The overlayfs backend was finished in MESOS-2971; the documentation should be 
updated to reflect this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3070) Master CHECK failure if a framework uses duplicated task id.

2016-03-04 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179916#comment-15179916
 ] 

Klaus Ma commented on MESOS-3070:
-

[~vi...@twitter.com], after checking the documentation for {{task_id}}, it says 
a framework can only re-use a task id after the previous task has finished. But 
in this case, the task id is re-used before the task has finished (the task is 
running while the slave/master fails over). It seems we just need to highlight 
in the documentation that not following the {{task_id}} rule can cause 
unexpected behaviour?

> Master CHECK failure if a framework uses duplicated task id.
> 
>
> Key: MESOS-3070
> URL: https://issues.apache.org/jira/browse/MESOS-3070
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.22.1
>Reporter: Jie Yu
>Assignee: Klaus Ma
>
> We observed this in one of our testing cluster.
> One framework (under development) keeps launching tasks using the same 
> task_id. We don't expect the master to crash even if the framework is not 
> doing what it's supposed to do. However, under a series of events, this could 
> happen and keeps crashing the master.
> 1) frameworkA launches task 'task_id_1' on slaveA
> 2) master fails over
> 3) slaveA has not re-registered yet
> 4) frameworkA re-registered and launches task 'task_id_1' on slaveB
> 5) slaveA re-registering and add task "task_id_1' to frameworkA
> 6) CHECK failure in addTask
> {noformat}
> I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with 
> resources cpus(*):4; mem(*):32768 on slave 
> 20150417-232509-1735470090-5050-48870-S25 (hostname)
> ...
> ...
> F0716 21:52:50.760136 28805 master.hpp:362] Check failed: 
> !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework 
> 
> {noformat}
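A toy model of the master's per-framework task bookkeeping shows why step 5 trips the check (hypothetical names; the real check is the {{CHECK}} in {{master.hpp}}'s {{addTask}}):

```python
class Framework:
    """Toy per-framework task map, mirroring the master's bookkeeping."""

    def __init__(self):
        self.tasks = {}

    def add_task(self, task_id, slave_id):
        # Analogue of the CHECK in master.hpp's addTask(): fatal on duplicates.
        assert task_id not in self.tasks, "Duplicate task '%s'" % task_id
        self.tasks[task_id] = slave_id


fw = Framework()
fw.add_task("task_id_1", "slaveB")      # step 4: relaunch after master failover
try:
    fw.add_task("task_id_1", "slaveA")  # step 5: slaveA re-registers its copy
except AssertionError as e:
    print(e)                            # step 6: the master would crash here
```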





[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-03-04 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179762#comment-15179762
 ] 

Jan Schlicht commented on MESOS-3937:
-

Do you have logs for this? I don't think that the issues are related, because 
{{DockerContainerizerTest.ROOT_DOCKER_Kill}} doesn't launch a test executor, and 
it fails because it can't resolve the container's hostname.

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 1

[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-03-04 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179744#comment-15179744
 ] 

Jan Schlicht commented on MESOS-3937:
-

Seems to be related to 
https://github.com/docker/docker/issues/17190#issuecomment-149438025
The overwriting of {{/etc/hosts}} by Docker is inherently racy, and our test 
setup somehow triggers this behavior, so {{/etc/hosts}} isn't overwritten. When 
that happens, the host's hostname isn't in the container's {{/etc/hosts}}, and 
hostname resolution inside the container fails.
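The failure mode boils down to the container's {{/etc/hosts}} lacking an entry for its own hostname. A minimal sketch of that check (the file contents and hostname below are illustrative, not taken from the test):

```python
def hosts_has_entry(hosts_text, hostname):
    """Return True if a hosts-file body maps some address to hostname."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, tokenize
        if len(fields) >= 2 and hostname in fields[1:]:
            return True
    return False


rewritten = "127.0.0.1 localhost\n172.17.0.2 foo\n"  # Docker applied --hostname foo
stale = "127.0.0.1 localhost\n"                      # race: file was not rewritten

print(hosts_has_entry(rewritten, "foo"))  # True  -> resolution succeeds
print(hosts_has_entry(stale, "foo"))      # False -> resolution fails
```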

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}

[jira] [Issue Comment Deleted] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-03-04 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3937:

Comment: was deleted

(was: Do you have logs for that? I'm pretty sure that this issue is due to the 
test executor being written in Go. {{DockerContainerizerTest.ROOT_DOCKER_Kill}} 
doesn't use that executor, hence an issue there seems to be something different. 
You should probably file a separate JIRA for that.)

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}

[jira] [Issue Comment Deleted] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-03-04 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3937:

Comment: was deleted

(was: Running {{sudo docker run --rm --hostname foo -it tnachen/test-executor 
cat /etc/hosts}} shows the expected behavior ({{/etc/hosts}} being changed), but 
running {{docker run --rm --hostname foo -it tnachen/test-executor 
./bin/test-executor & cat /etc/hosts}} shows the problem.
As the executor is written in Go, it should be this Docker bug: 
https://github.com/docker/docker/issues/17190. A race condition is triggered 
because Docker and the executor are both written in Go, resulting in 
{{/etc/hosts}} not being changed by Docker and thus being the same as the 
host's. Because Mesos' containerizer reuses the agent's hostname for containers, 
hostname resolution in the container won't fail if the agent's {{/etc/hosts}} 
contains an entry for the hostname.)

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}

[jira] [Commented] (MESOS-4189) Dynamic weights

2016-03-04 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179691#comment-15179691
 ] 

Adam B commented on MESOS-4189:
---

I spoke to [~jvanremoortere] about committing dynamic weights in time for Mesos 
0.28, and his biggest concern was the lack of tests for the registry changes 
(MESOS-4797), since the registry is a critical and complex piece of Mesos.
Additional TODOs from my conversation with Joris:
- MESOS-4316: He'd prefer to see a GET for /weights in the same release as the 
PUT/POST. We can't change what's displayed in /roles without going through a 
deprecation cycle, but we can show the weights in both places for now.
- (no JIRA yet): Rescind all outstanding offers, to facilitate satisfying the 
updated weights. Since the current behavior requires restarting the master, all 
outstanding offers would be rescinded anyway. This can be done in a new patch. 
See how quota handles rescinding offers after updating the allocator: 
https://github.com/apache/mesos/blob/0.28.0-rc1/src/master/quota_handler.cpp#L319
- MESOS-3945: Update docs to reference the change for 0.29 instead of 0.28. The 
0.28.0-rc1 has already been cut, so any non-critical fixes/features have to 
wait until 0.29, which should only be a month away.

That said, I'll review the current patches soon so we can get them committed 
for 0.29 and work on landing the registry tests and rescind behavior asap.
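For background on what a weight changes: in the DRF sorter a role's effective share is roughly its dominant share divided by its weight, so a larger weight means the role is offered resources sooner. A simplified sketch (not the actual sorter code; role names and numbers are made up):

```python
def effective_share(allocated, total, weight):
    """Dominant share scaled down by weight; the smallest value is offered next."""
    dominant = max(allocated[r] / total[r] for r in total)
    return dominant / weight


total = {"cpus": 100.0, "mem": 1000.0}
roles = {
    # role: (current allocation, weight)
    "analytics": ({"cpus": 40.0, "mem": 200.0}, 2.0),
    "web":       ({"cpus": 30.0, "mem": 200.0}, 1.0),
}
shares = {name: effective_share(alloc, total, w)
          for name, (alloc, w) in roles.items()}
print(min(shares, key=shares.get))  # analytics (0.4/2 = 0.2 beats 0.3/1)
```

Updating a weight at runtime changes this ranking immediately, which is why rescinding outstanding offers after an update matters.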

> Dynamic weights
> ---
>
> Key: MESOS-4189
> URL: https://issues.apache.org/jira/browse/MESOS-4189
> Project: Mesos
>  Issue Type: Epic
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
>
> Mesos currently uses a static list of weights that is configured at master 
> startup (via the {{--weights}} flag). This places limitations on changing the 
> resource allocation priority for a role/framework (changing the set of 
> weights requires restarting all the masters). 
> This JIRA will add a new endpoint, {{/weights}}, to update/show the weight of 
> a role for authorized principals; non-default weights will be persisted in 
> the registry.





[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-03-04 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179687#comment-15179687
 ] 

Jan Schlicht commented on MESOS-3937:
-

Do you have logs for that? I'm pretty sure that this issue is due to the test 
executor being written in Go. {{DockerContainerizerTest.ROOT_DOCKER_Kill}} 
doesn't use that executor, hence an issue there seems to be something different. 
You should probably file a separate JIRA for that.

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}

[jira] [Commented] (MESOS-4730) test-framework exits with SIGABRT

2016-03-04 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179668#comment-15179668
 ] 

Klaus Ma commented on MESOS-4730:
-

Based on my test, the installation directory is used to execute binaries and 
load libraries. I'd suggest updating "Getting Started" to include "make install" 
and to execute the commands from the installation directory.

Any suggestions?

> test-framework exits with SIGABRT
> -
>
> Key: MESOS-4730
> URL: https://issues.apache.org/jira/browse/MESOS-4730
> Project: Mesos
>  Issue Type: Bug
> Environment: OSX 10.11.3 (El Cap).
>Reporter: Neil Conway
>  Labels: mesosphere, test-framework
> Attachments: mesos-master-console-log.txt, mesos-slave-console-log.txt
>
>
> Steps to repro:
> 1. Build mesos from git
> 2. ./src/mesos-master.sh --registry=in_memory
> 3. ./src/mesos-slave.sh --master=X:5050
> 4. ./src/test-framework --master=X:5050
> Observed behavior:
> {noformat}
> $ ./src/test-framework --master=10.0.0.11:5050
> I0221 16:55:39.760979 1933606912 sched.cpp:222] Version: 0.28.0
> I0221 16:55:39.768154 2674688 sched.cpp:326] New master detected at 
> master@10.0.0.11:5050
> I0221 16:55:39.768378 2674688 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0221 16:55:39.769650 2674688 sched.cpp:703] Framework registered with 
> 227af8fe-56b7-4853-bc65-4076bd7be95d-
> Registered!
> Received offer 227af8fe-56b7-4853-bc65-4076bd7be95d-O0 with cpus(*):8; 
> mem(*):15360; disk(*):233112; ports(*):[31000-32000]
> Launching task 0 using offer 227af8fe-56b7-4853-bc65-4076bd7be95d-O0
> Launching task 1 using offer 227af8fe-56b7-4853-bc65-4076bd7be95d-O0
> Launching task 2 using offer 227af8fe-56b7-4853-bc65-4076bd7be95d-O0
> Launching task 3 using offer 227af8fe-56b7-4853-bc65-4076bd7be95d-O0
> Launching task 4 using offer 227af8fe-56b7-4853-bc65-4076bd7be95d-O0
> Received offer 227af8fe-56b7-4853-bc65-4076bd7be95d-O1 with cpus(*):3; 
> mem(*):14720; disk(*):233112; ports(*):[31000-32000]
> Task 0 is in state TASK_FAILED
> Aborting because task 0 is in unexpected state TASK_FAILED with reason 1 from 
> source 1 with message 'Executor terminated'
> I0221 16:55:49.532826 528384 sched.cpp:1937] Asked to abort the driver
> I0221 16:55:49.532888 528384 sched.cpp:1173] Aborting framework 
> '227af8fe-56b7-4853-bc65-4076bd7be95d-'
> I0221 16:55:49.533144 1933606912 sched.cpp:1903] Asked to stop the driver
> {noformat}
> Content of agent's stderr file for the crashed executor:
> {noformat}
> ABORT: (../../../mesos/3rdparty/libprocess/src/subprocess.cpp:322): Failed to 
> os::execvpe on path '/usr/local/libexec/mesos/mesos-containerizer': No such 
> file or directory
> *** Aborted at 1456102539 (unix time) try "date -d @1456102539" if you are 
> using GNU date ***
> PC: @ 0x7fff9666f002 __pthread_kill
> *** SIGABRT (@0x7fff9666f002) received by PID 2637 (TID 0x70104000) stack 
> trace: ***
> @ 0x7fff926e6eaa _sigtramp
> @ 0x700ff7e0 (unknown)
> @ 0x7fff867d96e7 abort
> @0x10e087010 _Abort()
> @0x10e086e5b _Abort()
> @0x1109a3938 process::childMain()
> @0x1109b3269 
> _ZNSt3__128__invoke_void_return_wrapperIiE6__callIJRNS_6__bindIPFiRKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcPPcSD_RK6OptionINS_8functionIFivRKN7process10Subprocess2IO20InputFileDescriptorsERKNSN_21OutputFileDescriptorsEST_bPiEJSB_RSD_SX_SK_RSO_RSR_SZ_RKbRA2_iEEiDpOT_
> @0x1109b267c 
> _ZNSt3__110__function6__funcINS_6__bindIPFiRKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcPPcSC_RK6OptionINS_8functionIFivRKN7process10Subprocess2IO20InputFileDescriptorsERKNSM_21OutputFileDescriptorsESS_bPiEJSA_RSC_SW_SJ_RSN_RSQ_SY_RKbRA2_iEEENS6_IS13_EESF_EclEv
> @0x10f815721 std::__1::function<>::operator()()
> @0x1109a2f33 process::defaultClone()
> @0x1109b0fdd 
> _ZNSt3__128__invoke_void_return_wrapperIiE6__callIJRPFiRKNS_8functionIFivS7_EEEiDpOT_
> @0x1109b0e6c std::__1::__function::__func<>::operator()()
> @0x1109ad3b7 std::__1::function<>::operator()()
> @0x1109a0fd0 process::subprocess()
> @0x10f814653 mesos::internal::slave::PosixLauncher::fork()
> @0x10f658c7d 
> mesos::internal::slave::MesosContainerizerProcess::__launch()::$_5::operator()()
> @0x10f65d289 
> _ZZZNK7process9_DeferredIZN5mesos8internal5slave25MesosContainerizerProcess8__launchERKNS1_11ContainerIDERKNS1_12ExecutorInfoERKNSt3__112basic_stringIcNSB_11char_traitsIcEENSB_9allocatorIcRK6OptionISH_ERKNS1_7SlaveIDERKNS_3PIDINS3_5SlaveEEEbRKNSB_4listISK_INS1_5slave19ContainerLaunchInfoEENSF_ISZ_E3$_5EcvNSB_8functionIFT_T0_EEEINS_6FutureIbEERKNSX_15ContainerLogger14Subpro

[jira] [Commented] (MESOS-4802) Update leveldb patch file to support PowerPC LE

2016-03-04 Thread Chen Zhiwei (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179628#comment-15179628
 ] 

Chen Zhiwei commented on MESOS-4802:


https://reviews.apache.org/r/44382/

> Update leveldb patch file to support PowerPC LE
> --
>
> Key: MESOS-4802
> URL: https://issues.apache.org/jira/browse/MESOS-4802
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Chen Zhiwei
>
> See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements and 
> bug fixes.
> The motivation is that leveldb 1.18 officially supports IBM Power (ppc64le), 
> so this upgrade is needed by 
> [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].
> Update: Since someone has already updated leveldb to 1.4, I am only updating 
> the patch file to support PowerPC LE, because I don't think upgrading a 
> 3rdparty library frequently is a good thing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4802) Update leveldb patch file to support PowerPC LE

2016-03-04 Thread Chen Zhiwei (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhiwei updated MESOS-4802:
---
Description: 
See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements and 
bug fixes.
The motivation is that leveldb 1.18 officially supports IBM Power (ppc64le), so 
this upgrade is needed by 
[MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].

Update: Since someone has already updated leveldb to 1.4, I am only updating 
the patch file to support PowerPC LE, because I don't think upgrading a 
3rdparty library frequently is a good thing.

  was:
See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements / 
bug fixes.
The motivation is that leveldb 1.18 has officially supported IBM Power 
(ppc64le), so this is needed by 
[MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].


> Update leveldb patch file to support PowerPC LE
> --
>
> Key: MESOS-4802
> URL: https://issues.apache.org/jira/browse/MESOS-4802
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Chen Zhiwei
>
> See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements and 
> bug fixes.
> The motivation is that leveldb 1.18 officially supports IBM Power (ppc64le), 
> so this upgrade is needed by 
> [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].
> Update: Since someone has already updated leveldb to 1.4, I am only updating 
> the patch file to support PowerPC LE, because I don't think upgrading a 
> 3rdparty library frequently is a good thing.





[jira] [Commented] (MESOS-4803) Update vendored libev to 4.22

2016-03-04 Thread Chen Zhiwei (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179629#comment-15179629
 ] 

Chen Zhiwei commented on MESOS-4803:


https://reviews.apache.org/r/44378/

> Update vendored libev to 4.22
> -
>
> Key: MESOS-4803
> URL: https://issues.apache.org/jira/browse/MESOS-4803
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Chen Zhiwei
>
> The motivation is that libev 4.22 officially supports IBM Power (ppc64le), 
> so this upgrade is needed by 
> [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].





[jira] [Updated] (MESOS-4802) Update leveldb patch file to support PowerPC LE

2016-03-04 Thread Chen Zhiwei (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhiwei updated MESOS-4802:
---
Summary: Update leveldb patch file to support PowerPC LE  (was: Update 
vendored leveldb to 1.18)

> Update leveldb patch file to support PowerPC LE
> --
>
> Key: MESOS-4802
> URL: https://issues.apache.org/jira/browse/MESOS-4802
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Chen Zhiwei
>
> See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements and 
> bug fixes.
> The motivation is that leveldb 1.18 officially supports IBM Power (ppc64le), 
> so this upgrade is needed by 
> [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].





[jira] [Commented] (MESOS-4678) Upgrade vendored Protobuf

2016-03-04 Thread Chen Zhiwei (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179627#comment-15179627
 ] 

Chen Zhiwei commented on MESOS-4678:


Thanks, I think I need to create a PR on 
https://github.com/3rdparty/mesos-3rdparty

> Upgrade vendored Protobuf
> -
>
> Key: MESOS-4678
> URL: https://issues.apache.org/jira/browse/MESOS-4678
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Chen Zhiwei
>  Labels: 3rdParty, mesosphere, protobuf, tech-debt
>
> We currently vendor Protobuf 2.5.0. We should upgrade to Protobuf 2.6.1. This 
> introduces various bugfixes, performance improvements, and at least one new 
> feature we might want to eventually take advantage of ({{map}} data type). 
> AFAIK there should be no backward compatibility concerns.
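For illustration, the {{map}} data type mentioned above can be sketched as below. This is a hypothetical message, not from the Mesos sources, and whether {{map}} is usable depends on the exact protobuf release and syntax in use. On the wire, a map field is defined to be equivalent to a repeated entry message whose key uses field number 1 and whose value uses field number 2, which is why older readers can still parse such data:

```proto
// Hypothetical example -- not part of Mesos. A map field:
message TaskLabels {
  map<string, string> labels = 1;
}

// Wire-compatible equivalent using a repeated entry message,
// which is how map fields are encoded:
message LabelsEntry {
  optional string key = 1;
  optional string value = 2;
}
message TaskLabelsCompat {
  repeated LabelsEntry labels = 1;
}
```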





[jira] [Updated] (MESOS-4390) Shared Volumes Design Doc

2016-03-04 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4390:
--
Shepherd: Yan Xu  (was: Adam B)

> Shared Volumes Design Doc
> -
>
> Key: MESOS-4390
> URL: https://issues.apache.org/jira/browse/MESOS-4390
> Project: Mesos
>  Issue Type: Task
>Reporter: Adam B
>Assignee: Anindya Sinha
>  Labels: mesosphere
>
> Review & Approve design doc





[jira] [Updated] (MESOS-3421) Support sharing of resources across task instances

2016-03-04 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3421:
--
Shepherd: Yan Xu  (was: Adam B)

> Support sharing of resources across task instances
> --
>
> Key: MESOS-3421
> URL: https://issues.apache.org/jira/browse/MESOS-3421
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation, volumes
>Affects Versions: 0.23.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: external-volumes, persistent-volumes
>
> A service that needs a persistent volume may need access to the same 
> persistent volume (RW) from multiple task instances on the same agent node. 
> Currently, once a persistent volume is offered to a framework and scheduled 
> to a task, it cannot be used by another task until that task terminates.
> Explore providing the capability of sharing persistent volumes across task 
> instances scheduled on a single agent node.
> Based on discussion within the community, we would allow sharing of resources 
> in general, and add support to enable shareability for persistent volumes.





[jira] [Commented] (MESOS-3421) Support sharing of resources across task instances

2016-03-04 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179595#comment-15179595
 ] 

Adam B commented on MESOS-3421:
---

Sounds good. I've gotten busy with other work lately. I'll reassign 
shepherdship to you. Interested to see what you come up with.

> Support sharing of resources across task instances
> --
>
> Key: MESOS-3421
> URL: https://issues.apache.org/jira/browse/MESOS-3421
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation, volumes
>Affects Versions: 0.23.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: external-volumes, persistent-volumes
>
> A service that needs a persistent volume may need access to the same 
> persistent volume (RW) from multiple task instances on the same agent node. 
> Currently, once a persistent volume is offered to a framework and scheduled 
> to a task, it cannot be used by another task until that task terminates.
> Explore providing the capability of sharing persistent volumes across task 
> instances scheduled on a single agent node.
> Based on discussion within the community, we would allow sharing of resources 
> in general, and add support to enable shareability for persistent volumes.


