[jira] [Assigned] (MESOS-5255) Add GPUs to container resource consumption metrics.

2017-05-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao reassigned MESOS-5255:
--

Assignee: Chun-Hung Hsiao

> Add GPUs to container resource consumption metrics.
> ---
>
> Key: MESOS-5255
> URL: https://issues.apache.org/jira/browse/MESOS-5255
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Chun-Hung Hsiao
>  Labels: gpu
>
> Currently the usage callback in the Nvidia GPU isolator is unimplemented:
> {noformat}
> src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp
> {noformat}
> It should use functionality from NVML to gather the current GPU usage and add 
> it to a ResourceStatistics object. It is still an open question as to exactly 
> what information we want to expose here (power, memory consumption, current 
> load, etc.). Whatever we decide on should be standard across different GPU 
> types, different GPU vendors, etc.
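One possible shape for vendor-neutral GPU usage data is sketched below. It is a minimal illustration under stated assumptions: the {{GpuSample}} and {{GpuUsage}} structs and the {{aggregate()}} helper are hypothetical and are not part of Mesos's {{ResourceStatistics}}. A vendor backend would fill one {{GpuSample}} per device. For NVIDIA, NVML's {{nvmlDeviceGetMemoryInfo()}}, {{nvmlDeviceGetUtilizationRates()}} and {{nvmlDeviceGetPowerUsage()}} report memory, utilization and power (in milliwatts), respectively.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-device sample produced by a vendor backend (for example,
// one built on NVML). Field names are illustrative, not Mesos or NVML APIs.
struct GpuSample {
  uint64_t memoryUsedBytes;
  uint64_t memoryTotalBytes;
  uint32_t utilizationPercent;  // 0-100 over the vendor's sample period.
  uint32_t powerMilliwatts;
};

// Hypothetical container-level summary, standing in for GPU fields that
// could be added to ResourceStatistics.
struct GpuUsage {
  uint32_t devices = 0;
  uint64_t memoryUsedBytes = 0;
  uint64_t memoryTotalBytes = 0;
  uint64_t powerMilliwatts = 0;
  double averageUtilizationPercent = 0.0;
};

// Sums memory and power across the GPUs allocated to one container and
// averages their utilization.
GpuUsage aggregate(const std::vector<GpuSample>& samples) {
  GpuUsage usage;
  uint64_t utilizationSum = 0;
  for (const GpuSample& sample : samples) {
    usage.devices++;
    usage.memoryUsedBytes += sample.memoryUsedBytes;
    usage.memoryTotalBytes += sample.memoryTotalBytes;
    usage.powerMilliwatts += sample.powerMilliwatts;
    utilizationSum += sample.utilizationPercent;
  }
  if (usage.devices > 0) {
    usage.averageUtilizationPercent =
        static_cast<double>(utilizationSum) / usage.devices;
  }
  return usage;
}
```

Keeping the per-device sample free of vendor types is one way to meet the requirement that the exposed data be standard across GPU vendors.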



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7080) Expose GPU hardware information to schedulers.

2017-05-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao reassigned MESOS-7080:
--

Assignee: (was: Chun-Hung Hsiao)

> Expose GPU hardware information to schedulers.
> --
>
> Key: MESOS-7080
> URL: https://issues.apache.org/jira/browse/MESOS-7080
> Project: Mesos
>  Issue Type: Improvement
>  Components: gpu
>Reporter: Benjamin Mahler
>
> GPU hardware has many attributes that may impose scheduling constraints (e.g. 
> core count, total memory, topology (via PCI-E, NVLink, etc.), driver versions, 
> etc.). Tasks may require a particular card type to run correctly.
> With respect to topology, tasks that require access to more than one GPU may 
> need to be placed onto NVLink-connected GPUs for acceptable performance, 
> rather than communicating over PCI or QPI.
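As a rough illustration of the NVLink placement constraint, the sketch below picks a group of GPUs that are all pairwise NVLink-connected, given a hypothetical adjacency matrix. How that matrix is obtained (for example, from NVML topology queries) is left open, and {{Topology}} and {{findNvlinkGroup()}} are illustrative names, not Mesos APIs. Plain backtracking is assumed to be adequate because a single agent has only a handful of GPUs.

```cpp
#include <optional>
#include <vector>

// links[i][j] is true when GPU i and GPU j share a direct NVLink connection
// (hypothetical input, assumed symmetric).
using Topology = std::vector<std::vector<bool>>;

namespace {

// Tries to grow `chosen` to `k` GPUs, considering only GPUs with index >= `next`
// that are NVLink-connected to every GPU already chosen.
bool extendGroup(const Topology& links, int k, int next, std::vector<int>& chosen) {
  if (static_cast<int>(chosen.size()) == k) {
    return true;
  }
  for (int gpu = next; gpu < static_cast<int>(links.size()); ++gpu) {
    bool connectedToAll = true;
    for (int other : chosen) {
      if (!links[other][gpu]) {
        connectedToAll = false;
        break;
      }
    }
    if (!connectedToAll) {
      continue;
    }
    chosen.push_back(gpu);
    if (extendGroup(links, k, gpu + 1, chosen)) {
      return true;
    }
    chosen.pop_back();  // Backtrack and try the next candidate.
  }
  return false;
}

}  // namespace

// Returns the indices of `k` GPUs that are all pairwise NVLink-connected,
// or std::nullopt if no such group exists on this agent.
std::optional<std::vector<int>> findNvlinkGroup(const Topology& links, int k) {
  std::vector<int> chosen;
  if (extendGroup(links, k, 0, chosen)) {
    return chosen;
  }
  return std::nullopt;
}
```

A scheduler would only benefit from something like this if agents exposed the topology, which is the point of this issue.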





[jira] [Assigned] (MESOS-5255) Add GPUs to container resource consumption metrics.

2017-05-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao reassigned MESOS-5255:
--

Assignee: (was: Chun-Hung Hsiao)

> Add GPUs to container resource consumption metrics.
> ---
>
> Key: MESOS-5255
> URL: https://issues.apache.org/jira/browse/MESOS-5255
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>  Labels: gpu
>
> Currently the usage callback in the Nvidia GPU isolator is unimplemented:
> {noformat}
> src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp
> {noformat}
> It should use functionality from NVML to gather the current GPU usage and add 
> it to a ResourceStatistics object. It is still an open question as to exactly 
> what information we want to expose here (power, memory consumption, current 
> load, etc.). Whatever we decide on should be standard across different GPU 
> types, different GPU vendors, etc.





[jira] [Commented] (MESOS-7015) Frameworks should be able to (re)register in suppressed state

2017-05-04 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997669#comment-15997669
 ] 

Anindya Sinha commented on MESOS-7015:
--

Updated RRs:
https://reviews.apache.org/r/57815/
https://reviews.apache.org/r/57817/
https://reviews.apache.org/r/57818/

> Frameworks should be able to (re)register in suppressed state
> -
>
> Key: MESOS-7015
> URL: https://issues.apache.org/jira/browse/MESOS-7015
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation, framework
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>
> We should consider allowing frameworks to specify their "suppressed mode" 
> when they register or re-register with the Mesos master.
> This should help keep traffic and load on the cluster low, especially 
> when there is a high number of frameworks and/or agents in the cluster 
> during failovers.





[jira] [Commented] (MESOS-7015) Frameworks should be able to (re)register in suppressed state

2017-05-04 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997668#comment-15997668
 ] 

Anindya Sinha commented on MESOS-7015:
--

Based on discussion in the Slack channel, this modified review chain does the 
following:

1. Adds {{repeated string deactivated_roles}} to {{FrameworkInfo}}, which 
represents the subset of roles that are deactivated. Offers pertaining to 
deactivated roles will not be sent.
2. {{SUPPRESS}} and {{REVIVE}} calls will toggle the (de)activation mode of the 
roles.
3. The allocator's {{activateFramework()}} call will activate all roles of the 
framework that are not in {{FrameworkInfo::deactivated_roles}}. Similarly, the 
{{deactivateFramework()}} call will deactivate all roles that are not already 
deactivated.
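A minimal model of the role bookkeeping described in the list above, assuming a single set of deactivated roles that {{SUPPRESS}}/{{REVIVE}} edit plus a framework-level active flag. The class and method names are hypothetical, and keeping the per-role marks across a deactivation is an assumption of this model; the real allocator tracks this state differently.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical model of one framework's role state; not the Mesos allocator API.
class FrameworkRoles {
 public:
  FrameworkRoles(std::set<std::string> roles,
                 std::set<std::string> deactivatedRoles)
    : roles_(std::move(roles)), deactivated_(std::move(deactivatedRoles)) {}

  // SUPPRESS: mark one of the framework's roles as deactivated.
  void suppress(const std::string& role) {
    if (roles_.count(role) > 0) {
      deactivated_.insert(role);
    }
  }

  // REVIVE: clear the deactivated mark for a role.
  void revive(const std::string& role) { deactivated_.erase(role); }

  // activateFramework(): roles not in deactivated_roles become offerable.
  void activate() { active_ = true; }

  // deactivateFramework(): no role receives offers. In this model the
  // per-role marks are kept for the next activation.
  void deactivate() { active_ = false; }

  // Roles for which offers may currently be sent.
  std::set<std::string> offerableRoles() const {
    std::set<std::string> result;
    if (!active_) {
      return result;
    }
    for (const std::string& role : roles_) {
      if (deactivated_.count(role) == 0) {
        result.insert(role);
      }
    }
    return result;
  }

 private:
  std::set<std::string> roles_;
  std::set<std::string> deactivated_;
  bool active_ = false;
};
```

Registering "in suppressed state" then amounts to constructing the state with all roles already in the deactivated set, so no offers go out until a {{REVIVE}}.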






[jira] [Created] (MESOS-7461) balloon test and disk full framework test relies on possibly unavailable ports

2017-05-04 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-7461:


 Summary: balloon test and disk full framework test relies on 
possibly unavailable ports
 Key: MESOS-7461
 URL: https://issues.apache.org/jira/browse/MESOS-7461
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Zhitao Li


balloon_framework_test.sh and disk_full_framework_test.sh both contain code that 
listens directly on port {{5432}}, but in our environment that port is already 
reserved by something else.

A possible fix is to write a utility that finds an unused port and uses it for 
the master. This is not perfect, though: there is still a race condition, since 
another process could take the port between the time it is found and the time 
the master binds to it.
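A sketch of such a utility, assuming a POSIX system: bind a TCP socket to port 0 so the kernel picks a free ephemeral port, read the chosen port back with {{getsockname()}}, and close the socket. The function name {{findUnusedPort()}} is hypothetical. Because the socket is closed before the master starts, the race described above remains.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <optional>

// Asks the kernel for a currently unused TCP port. Returns std::nullopt if
// the socket cannot be created or bound.
std::optional<uint16_t> findUnusedPort() {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return std::nullopt;
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(0);  // Port 0: let the kernel choose.

  socklen_t len = sizeof(addr);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    ::close(fd);
    return std::nullopt;
  }

  // Releasing the port here is what leaves the window for the race.
  ::close(fd);
  return ntohs(addr.sin_port);
}
```

The test scripts could then pass the returned port to the master instead of the hard-coded {{5432}}.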

Another possible fix is to move the listening "port" to a domain socket, once 
that is supported.





[jira] [Assigned] (MESOS-7080) Expose GPU hardware information to schedulers.

2017-05-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao reassigned MESOS-7080:
--

Assignee: Chun-Hung Hsiao

> Expose GPU hardware information to schedulers.
> --
>
> Key: MESOS-7080
> URL: https://issues.apache.org/jira/browse/MESOS-7080
> Project: Mesos
>  Issue Type: Improvement
>  Components: gpu
>Reporter: Benjamin Mahler
>Assignee: Chun-Hung Hsiao
>
> GPU hardware has many attributes that may impose scheduling constraints (e.g. 
> core count, total memory, topology (via PCI-E, NVLink, etc.), driver versions, 
> etc.). Tasks may require a particular card type to run correctly.
> With respect to topology, tasks that require access to more than one GPU may 
> need to be placed onto NVLink-connected GPUs for acceptable performance, 
> rather than communicating over PCI or QPI.





[jira] [Created] (MESOS-7460) UpdateFrameworkMessage may send a Framework role(s) change to a non-MULTI_ROLE agent.

2017-05-04 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7460:
--

 Summary: UpdateFrameworkMessage may send a Framework role(s) 
change to a non-MULTI_ROLE agent.
 Key: MESOS-7460
 URL: https://issues.apache.org/jira/browse/MESOS-7460
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Benjamin Mahler
Assignee: Michael Park
Priority: Blocker


If a MULTI_ROLE-capable framework was previously running tasks on an old agent 
(one that is not MULTI_ROLE capable), the master *must* ensure that the 
UpdateFramework message sent to this old agent preserves the framework's 
original role. Otherwise the agent will interpret the role as having changed, 
which can break things (e.g. the agent may fail to locate reservations, 
volumes, etc.).
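A minimal sketch of this rule, using hypothetical simplified structs instead of the real protobufs. It assumes the master can look up the role the framework originally used on that agent ({{originalRole}} here); how that is tracked is not specified in this issue.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for FrameworkInfo and the agent's
// capabilities; not the real Mesos protobufs.
struct FrameworkInfoView {
  std::string role;                // Legacy single-role field.
  std::vector<std::string> roles;  // Used with the MULTI_ROLE capability.
};

struct AgentView {
  bool multiRoleCapable;
};

// Chooses the FrameworkInfo to put in the UpdateFramework message for `agent`.
// `originalRole` is the role under which the framework's existing tasks,
// reservations and volumes on that agent were created (assumed to be known).
FrameworkInfoView frameworkInfoForAgent(const FrameworkInfoView& current,
                                        const std::string& originalRole,
                                        const AgentView& agent) {
  if (agent.multiRoleCapable) {
    return current;  // New agents understand the `roles` field.
  }
  // Old agents only look at `role`, so it must never appear to change.
  FrameworkInfoView legacy = current;
  legacy.role = originalRole;
  legacy.roles.clear();
  return legacy;
}
```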

In addition, a framework without MULTI_ROLE is able to change its role. We'll 
need to change this so that the {{role}} field is immutable and frameworks must 
use the {{roles}} field with the MULTI_ROLE capability if they want to change 
their role.





[jira] [Created] (MESOS-7459) Fix the duration.hpp warning

2017-05-04 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-7459:
---

 Summary: Fix the duration.hpp warning
 Key: MESOS-7459
 URL: https://issues.apache.org/jira/browse/MESOS-7459
 Project: Mesos
  Issue Type: Bug
  Components: stout
 Environment: Windows
Reporter: Andrew Schwartzmeyer
Priority: Minor


When building Mesos on Windows, two warnings from `duration.hpp` are repeated 
hundreds (if not thousands) of times:

{{mesos\3rdparty\stout\include\stout/duration.hpp(102): warning C4244: '=': 
conversion from 'int64_t' to 'long', possible loss of data}}
and
{{mesos\3rdparty\stout\include\stout/duration.hpp(103): warning C4244: '=': 
conversion from 'int64_t' to 'long', possible loss of data}}
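Assuming the flagged lines assign an {{int64_t}} value to a {{long}} field (for example, the members of {{struct timeval}}, which Winsock declares as {{long}}; the actual lines are not quoted here), one way to silence C4244 without hiding real truncation is an explicit, range-checked conversion. On Windows {{long}} is 32 bits while {{int64_t}} is 64 bits, hence the warning. The helper name {{toLongChecked()}} is hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Explicit, range-checked int64_t -> long conversion. On Windows (LLP64),
// `long` is 32 bits, so an implicit assignment from int64_t may truncate and
// MSVC reports C4244. On LP64 platforms (Linux, macOS) `long` is 64 bits and
// the check never fails.
inline long toLongChecked(int64_t value) {
  if (value < static_cast<int64_t>(std::numeric_limits<long>::min()) ||
      value > static_cast<int64_t>(std::numeric_limits<long>::max())) {
    throw std::out_of_range("int64_t value does not fit in long");
  }
  return static_cast<long>(value);
}
```

A bare {{static_cast<long>}} would also remove the warning, but it would truncate silently on Windows if the value were ever out of range.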





[jira] [Updated] (MESOS-7458) webui display of framework resources is confusing

2017-05-04 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-7458:
---
Attachment: Screen Shot 2017-05-04 at 11.15.25 AM.png
Screen Shot 2017-05-04 at 11.15.12 AM.png

The first screenshot shows the framework's entry in the list of frameworks; the 
second shows resource usage on the per-framework page.

> webui display of framework resources is confusing
> -
>
> Key: MESOS-7458
> URL: https://issues.apache.org/jira/browse/MESOS-7458
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: Neil Conway
>Assignee: haosdent
>  Labels: mesosphere
> Attachments: Screen Shot 2017-05-04 at 11.15.12 AM.png, Screen Shot 
> 2017-05-04 at 11.15.25 AM.png
>
>
> In the webui, the list of frameworks displays the {{used_resources}} for each 
> framework. When you click on the framework to access the per-framework page, 
> the resources displayed are the *total* resources (the {{resources}} key in 
> state.json, which is {{used_resources}} + {{offered_resources}}). This is 
> confusing when the offered resources are very different from the used 
> resources.





[jira] [Created] (MESOS-7458) webui display of framework resources is confusing

2017-05-04 Thread Neil Conway (JIRA)
Neil Conway created MESOS-7458:
--

 Summary: webui display of framework resources is confusing
 Key: MESOS-7458
 URL: https://issues.apache.org/jira/browse/MESOS-7458
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Neil Conway
Assignee: haosdent


In the webui, the list of frameworks displays the {{used_resources}} for each 
framework. When you click on the framework to access the per-framework page, 
the resources displayed are the *total* resources (the {{resources}} key in 
state.json, which is {{used_resources}} + {{offered_resources}}). This is 
confusing when the offered resources are very different from the used 
resources.


