[jira] [Assigned] (MESOS-5255) Add GPUs to container resource consumption metrics.
[ https://issues.apache.org/jira/browse/MESOS-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao reassigned MESOS-5255:
--------------------------------------

Assignee: Chun-Hung Hsiao

> Add GPUs to container resource consumption metrics.
> ---------------------------------------------------
>
> Key: MESOS-5255
> URL: https://issues.apache.org/jira/browse/MESOS-5255
> Project: Mesos
> Issue Type: Task
> Reporter: Kevin Klues
> Assignee: Chun-Hung Hsiao
> Labels: gpu
>
> Currently the usage callback in the Nvidia GPU isolator is unimplemented:
> {noformat}
> src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp
> {noformat}
> It should use functionality from NVML to gather the current GPU usage and add
> it to a ResourceStatistics object. It is still an open question as to exactly
> what information we want to expose here (power, memory consumption, current
> load, etc.). Whatever we decide on should be standard across different GPU
> types, different GPU vendors, etc.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (MESOS-7080) Expose GPU hardware information to schedulers.
[ https://issues.apache.org/jira/browse/MESOS-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao reassigned MESOS-7080:
--------------------------------------

Assignee: (was: Chun-Hung Hsiao)

> Expose GPU hardware information to schedulers.
> ----------------------------------------------
>
> Key: MESOS-7080
> URL: https://issues.apache.org/jira/browse/MESOS-7080
> Project: Mesos
> Issue Type: Improvement
> Components: gpu
> Reporter: Benjamin Mahler
>
> GPU hardware has many attributes that may impose scheduling constraints (e.g.
> core count, total memory, topology (via PCI-E, NVLINK, etc), driver versions,
> etc). Tasks may require a particular card type to run correctly.
> With respect to topology, tasks that require access to more than 1 GPU may
> need to be placed onto NVLink connected GPUs for acceptable performance
> compared to having to communicate over PCI or QPI.
[jira] [Assigned] (MESOS-5255) Add GPUs to container resource consumption metrics.
[ https://issues.apache.org/jira/browse/MESOS-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao reassigned MESOS-5255:
--------------------------------------

Assignee: (was: Chun-Hung Hsiao)

> Add GPUs to container resource consumption metrics.
> ---------------------------------------------------
>
> Key: MESOS-5255
> URL: https://issues.apache.org/jira/browse/MESOS-5255
> Project: Mesos
> Issue Type: Task
> Reporter: Kevin Klues
> Labels: gpu
>
> Currently the usage callback in the Nvidia GPU isolator is unimplemented:
> {noformat}
> src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp
> {noformat}
> It should use functionality from NVML to gather the current GPU usage and add
> it to a ResourceStatistics object. It is still an open question as to exactly
> what information we want to expose here (power, memory consumption, current
> load, etc.). Whatever we decide on should be standard across different GPU
> types, different GPU vendors, etc.
[jira] [Commented] (MESOS-7015) Frameworks should be able to (re)register in suppressed state
[ https://issues.apache.org/jira/browse/MESOS-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997669#comment-15997669 ]

Anindya Sinha commented on MESOS-7015:
--------------------------------------

Updated RRs:
https://reviews.apache.org/r/57815/
https://reviews.apache.org/r/57817/
https://reviews.apache.org/r/57818/

> Frameworks should be able to (re)register in suppressed state
> -------------------------------------------------------------
>
> Key: MESOS-7015
> URL: https://issues.apache.org/jira/browse/MESOS-7015
> Project: Mesos
> Issue Type: Improvement
> Components: allocation, framework
> Reporter: Anindya Sinha
> Assignee: Anindya Sinha
>
> We should consider allowing frameworks to specify their "suppressed mode"
> when they register or re-register with the mesos master.
> This should help to keep traffic and the load on the cluster low, especially
> when there is a high number of frameworks and/or agents in the cluster during
> failovers.
[jira] [Commented] (MESOS-7015) Frameworks should be able to (re)register in suppressed state
[ https://issues.apache.org/jira/browse/MESOS-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997668#comment-15997668 ]

Anindya Sinha commented on MESOS-7015:
--------------------------------------

Based on discussion in the Slack channel, this modified review chain does the following:
1. Add {{repeated string deactivated_roles}} to {{FrameworkInfo}}, which represents the subset of roles that are deactivated. Offers pertaining to the deactivated roles shall not be sent out.
2. {{SUPPRESS}} and {{REVIVE}} calls will toggle the (de)activation mode of the roles.
3. The allocator's {{activateFramework()}} call will activate all roles of the framework which are not in {{FrameworkInfo::deactivated_roles}}. Similarly, the {{deactivateFramework()}} call will deactivate all roles that are not already deactivated.

> Frameworks should be able to (re)register in suppressed state
> -------------------------------------------------------------
>
> Key: MESOS-7015
> URL: https://issues.apache.org/jira/browse/MESOS-7015
> Project: Mesos
> Issue Type: Improvement
> Components: allocation, framework
> Reporter: Anindya Sinha
> Assignee: Anindya Sinha
>
> We should consider allowing frameworks to specify their "suppressed mode"
> when they register or re-register with the mesos master.
> This should help to keep traffic and the load on the cluster low, especially
> when there is a high number of frameworks and/or agents in the cluster during
> failovers.
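The role (de)activation scheme described in the MESOS-7015 comment above can be sketched roughly as follows. This is a toy model, not Mesos code: the class `FrameworkModel` and its methods are hypothetical names, and the choice that a call without explicit roles applies to all of the framework's roles is an assumption made only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FrameworkModel:
    """Hypothetical stand-in for FrameworkInfo; not the Mesos data structure."""

    roles: set
    deactivated_roles: set = field(default_factory=set)

    def _select(self, roles):
        # Assumption: no explicit roles means "all roles of this framework".
        if roles is None:
            return set(self.roles)
        return set(roles) & self.roles

    def suppress(self, roles=None):
        # Models SUPPRESS: mark the selected roles as deactivated.
        self.deactivated_roles |= self._select(roles)

    def revive(self, roles=None):
        # Models REVIVE: remove the selected roles from the deactivated set.
        self.deactivated_roles -= self._select(roles)

    def active_roles(self):
        # Roles that would receive offers: everything not listed as deactivated.
        return self.roles - self.deactivated_roles


# A framework that (re)registers with role "b" already suppressed.
fw = FrameworkModel(roles={"a", "b"}, deactivated_roles={"b"})
initially_active = fw.active_roles()

fw.revive(["b"])
after_revive = fw.active_roles()

fw.suppress()
after_suppress_all = fw.active_roles()
```

In this model, registering in a suppressed state amounts to supplying a non-empty `deactivated_roles` at construction time, so no offers would be generated for those roles until a matching revive.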
[jira] [Created] (MESOS-7461) balloon test and disk full framework test relies on possibly unavailable ports
Zhitao Li created MESOS-7461:

Summary: balloon test and disk full framework test relies on possibly unavailable ports
Key: MESOS-7461
URL: https://issues.apache.org/jira/browse/MESOS-7461
Project: Mesos
Issue Type: Bug
Components: test
Reporter: Zhitao Li

balloon_framework_test.sh and disk_full_framework_test.sh both contain code that listens directly on port {{5432}}, but in our environment that port is already reserved by something else.

A possible fix is to write a utility that finds an unused port and uses it for the master. It's not perfect, though, as there could still be a race condition.

Another possible fix is to move the listening "port" to a domain socket, when that's supported.
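A minimal sketch of the "find an unused port" utility suggested in MESOS-7461, assuming it is acceptable to let the operating system pick the port by binding to port 0. The function name `find_unused_port` is hypothetical and is not part of the Mesos test scripts.

```python
import socket


def find_unused_port(host="127.0.0.1"):
    """Ask the OS for a TCP port on `host` that is unused right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel choose a free ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_unused_port()
```

The socket is closed before the caller uses the port, so another process can claim it in between. That window is the kind of race the report mentions, which is why a domain socket would be the more robust option.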
[jira] [Assigned] (MESOS-7080) Expose GPU hardware information to schedulers.
[ https://issues.apache.org/jira/browse/MESOS-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun-Hung Hsiao reassigned MESOS-7080:
--------------------------------------

Assignee: Chun-Hung Hsiao

> Expose GPU hardware information to schedulers.
> ----------------------------------------------
>
> Key: MESOS-7080
> URL: https://issues.apache.org/jira/browse/MESOS-7080
> Project: Mesos
> Issue Type: Improvement
> Components: gpu
> Reporter: Benjamin Mahler
> Assignee: Chun-Hung Hsiao
>
> GPU hardware has many attributes that may impose scheduling constraints (e.g.
> core count, total memory, topology (via PCI-E, NVLINK, etc), driver versions,
> etc). Tasks may require a particular card type to run correctly.
> With respect to topology, tasks that require access to more than 1 GPU may
> need to be placed onto NVLink connected GPUs for acceptable performance
> compared to having to communicate over PCI or QPI.
[jira] [Created] (MESOS-7460) UpdateFrameworkMessage may send a Framework role(s) change to a non-MULTI_ROLE agent.
Benjamin Mahler created MESOS-7460:
-----------------------------------

Summary: UpdateFrameworkMessage may send a Framework role(s) change to a non-MULTI_ROLE agent.
Key: MESOS-7460
URL: https://issues.apache.org/jira/browse/MESOS-7460
Project: Mesos
Issue Type: Bug
Components: master
Reporter: Benjamin Mahler
Assignee: Michael Park
Priority: Blocker

When a framework is MULTI_ROLE capable, if the framework was previously running tasks on an old agent (non-MULTI_ROLE capable), the master *must* ensure the UpdateFramework message sent to this old agent preserves the framework's original role. Otherwise the agent will interpret the role as having changed, which can break things (e.g. it may fail to locate the reservations, volumes, etc).

In addition, a framework without MULTI_ROLE currently has the ability to change its role. We'll need to change this so that the {{role}} field is immutable, and frameworks that want to change their role must use the {{roles}} field with the MULTI_ROLE capability.
[jira] [Created] (MESOS-7459) Fix the duration.hpp warning
Andrew Schwartzmeyer created MESOS-7459:
----------------------------------------

Summary: Fix the duration.hpp warning
Key: MESOS-7459
URL: https://issues.apache.org/jira/browse/MESOS-7459
Project: Mesos
Issue Type: Bug
Components: stout
Environment: Windows
Reporter: Andrew Schwartzmeyer
Priority: Minor

When building Mesos on Windows, there are a pair of warnings from `duration.hpp` that are repeated hundreds (if not thousands) of times:

{{mesos\3rdparty\stout\include\stout/duration.hpp(102): warning C4244: '=': conversion from 'int64_t' to 'long', possible loss of data}}

and

{{mesos\3rdparty\stout\include\stout/duration.hpp(103): warning C4244: '=': conversion from 'int64_t' to 'long', possible loss of data}}
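The data loss that warning C4244 refers to can be illustrated as follows. With MSVC on Windows, `long` is 32 bits wide, so assigning an `int64_t` to it keeps only the low 32 bits. This sketch uses `ctypes.c_int32` as a stand-in for that 32-bit `long`; the helper name `as_windows_long` is hypothetical, and the values are illustrative rather than taken from `duration.hpp`.

```python
import ctypes


def as_windows_long(value):
    """Return `value` as a 32-bit signed integer would store it (no range check)."""
    return ctypes.c_int32(value).value


small = 1_000          # fits in 32 bits, survives the conversion unchanged
large = 5_000_000_000  # fits in int64_t, but not in a 32-bit long

small_converted = as_windows_long(small)
large_converted = as_windows_long(large)
```

Values that fit in 32 bits pass through unchanged, while larger ones silently wrap, which is why the compiler flags every such assignment.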
[jira] [Updated] (MESOS-7458) webui display of framework resources is confusing
[ https://issues.apache.org/jira/browse/MESOS-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil Conway updated MESOS-7458:
-------------------------------

Attachment: Screen Shot 2017-05-04 at 11.15.25 AM.png
            Screen Shot 2017-05-04 at 11.15.12 AM.png

The first screenshot is the framework's entry in the list of frameworks; the second screenshot is the resource usage on the per-framework page.

> webui display of framework resources is confusing
> -------------------------------------------------
>
> Key: MESOS-7458
> URL: https://issues.apache.org/jira/browse/MESOS-7458
> Project: Mesos
> Issue Type: Bug
> Components: webui
> Reporter: Neil Conway
> Assignee: haosdent
> Labels: mesosphere
> Attachments: Screen Shot 2017-05-04 at 11.15.12 AM.png, Screen Shot
> 2017-05-04 at 11.15.25 AM.png
>
>
> In the webui, the list of frameworks displays the {{used_resources}} for each
> framework. When you click on the framework to access the per-framework page,
> the resources displayed are the *total* resources (the {{resources}} key in
> state.json, which is {{used_resources}} + {{offered_resources}}). This is
> confusing in situations when the offered resources are very different from
> the used resources.
[jira] [Created] (MESOS-7458) webui display of framework resources is confusing
Neil Conway created MESOS-7458:
-------------------------------

Summary: webui display of framework resources is confusing
Key: MESOS-7458
URL: https://issues.apache.org/jira/browse/MESOS-7458
Project: Mesos
Issue Type: Bug
Components: webui
Reporter: Neil Conway
Assignee: haosdent

In the webui, the list of frameworks displays the {{used_resources}} for each framework. When you click on the framework to access the per-framework page, the resources displayed are the *total* resources (the {{resources}} key in state.json, which is {{used_resources}} + {{offered_resources}}). This is confusing in situations when the offered resources are very different from the used resources.
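To make the discrepancy in MESOS-7458 concrete, here is a small sketch of the relationship stated above, {{resources}} = {{used_resources}} + {{offered_resources}}. The framework entry is made-up illustrative data, not real state.json output, and `add_resources` is a hypothetical helper.

```python
def add_resources(used, offered):
    """Sum two resource dicts key by key; a key missing on one side counts as 0."""
    names = sorted(set(used) | set(offered))
    return {name: used.get(name, 0) + offered.get(name, 0) for name in names}


# Illustrative numbers only: a framework using little but holding large offers.
framework = {
    "used_resources": {"cpus": 1.0, "mem": 512},
    "offered_resources": {"cpus": 7.0, "mem": 15872},
}

# What the list of frameworks shows versus what the per-framework page shows.
list_view = framework["used_resources"]
detail_view = add_resources(
    framework["used_resources"], framework["offered_resources"]
)
```

With numbers like these, the list of frameworks would show 1 CPU while the per-framework page would show 8, even though nothing about the framework's actual usage differs between the two views.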