-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48364/#review137590
-----------------------------------------------------------

Fix it, then Ship it!


Nice to see the build-time dependency eliminated!

Kevin and I went over this and made some further adjustments (mostly inside nvml::initialize()).


src/Makefile.am (line 1265)
<https://reviews.apache.org/r/48364/#comment202781>

    Per our chat, we should remove this?


src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (lines 125 - 144)
<https://reviews.apache.org/r/48364/#comment202778>

    We should add some context to each of these error messages.


src/slave/containerizer/mesos/isolators/gpu/nvml.hpp (line 42)
<https://reviews.apache.org/r/48364/#comment202779>

    Should pull in stout/nothing.hpp


src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (lines 84 - 90)
<https://reviews.apache.org/r/48364/#comment203027>

    We should include some context in each of these errors, e.g.:

    ```
    *error = Error("Failed to load symbol 'nvmlInit_v2': " + symbol.error());
    ```

    Unfortunately `DynamicLibrary` is not following our error message composition convention and is already including caller information :(

    I'd still like to include context here for when we fix `DynamicLibrary` to not log caller-available information.


src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (lines 126 - 128)
<https://reviews.apache.org/r/48364/#comment203031>

    I guess this should say v2 now?


src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (lines 185 - 186)
<https://reviews.apache.org/r/48364/#comment203018>

    We should include glog to obtain CHECK


src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (line 216)
<https://reviews.apache.org/r/48364/#comment203019>

    To avoid double logging we should omit the caller available information here (the index).


- Benjamin Mahler


On June 11, 2016, 3:16 a.m., Kevin Klues wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48364/
> -----------------------------------------------------------
>
> (Updated June 11, 2016, 3:16 a.m.)
>
>
> Review request for mesos and Benjamin Mahler.
>
>
> Bugs: MESOS-5550
>     https://issues.apache.org/jira/browse/MESOS-5550
>
>
> Repository: mesos
>
>
> Description
> -------
>
> We now use a singleton class called `NvidiaManagementLibrary` that
> loads `libnvidia-ml` at runtime once it is initialized. By loading
> this library dynamically, `libmesos` no longer has a hard dependence
> on it, so it doesn't have to be installed on every machine where mesos
> is deployed.
>
> This was a problem previously, whereby the master and agents that
> didn't even have GPUs would unnecessarily need to have `libnvidia-ml`
> installed on their systems. This library is not easily installable
> (it's not bundled in standard apt-get or yum repositories), so this
> was a major inconvenience.
>
>
> Diffs
> -----
>
>   configure.ac e344c56e1be5e232ee331c933b8c04c4c2e55d1e
>   src/Makefile.am b656702d918e747cbd4b3d8f2c4257f61c83b385
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 181a2aad97da9ee0f6ffa42cdba9c93dc0077ff7
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp d7557a0c338e8c0e51461b2326600c03f89c2e8b
>   src/slave/containerizer/mesos/isolators/gpu/nvml.hpp PRE-CREATION
>   src/slave/containerizer/mesos/isolators/gpu/nvml.cpp PRE-CREATION
>
> Diff: https://reviews.apache.org/r/48364/diff/
>
>
> Testing
> -------
>
> GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
>
>
> Thanks,
>
> Kevin Klues
>
>
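
For reference, here is a minimal sketch of the runtime-loading pattern in the description, combined with the "add context to each error" suggestion from the review. It is not code from this patch: it calls dlopen(3) and dlsym(3) directly instead of going through stout's `DynamicLibrary`, and `LoadResult`, `openLibrary` and `loadSymbol` are hypothetical names made up for illustration.

```
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical stand-in for a Try<void*>-style result; not a stout type.
struct LoadResult
{
  void* address;      // Library handle or symbol address on success.
  std::string error;  // Empty on success, otherwise a contextual message.

  bool isError() const { return !error.empty(); }
};


// Opens a shared library at runtime, so the caller has no link-time
// dependency on it. The error names the library that failed to load.
LoadResult openLibrary(const std::string& path)
{
  void* handle = dlopen(path.c_str(), RTLD_NOW);
  if (handle == nullptr) {
    const char* reason = dlerror();
    return {nullptr,
            "Failed to open '" + path + "': " +
              (reason != nullptr ? reason : "unknown error")};
  }

  return {handle, ""};
}


// Looks up a symbol in an already opened library. The error is prefixed
// with the symbol name, in the style suggested in the review above.
LoadResult loadSymbol(void* handle, const std::string& name)
{
  assert(handle != nullptr);

  dlerror(); // Clear any stale error before calling dlsym.

  void* address = dlsym(handle, name.c_str());

  // A null address can be a valid symbol value, so dlerror() decides.
  const char* reason = dlerror();
  if (reason != nullptr) {
    return {nullptr, "Failed to load symbol '" + name + "': " + reason};
  }

  return {address, ""};
}
```

A caller would open the library once, then call `loadSymbol(handle, "nvmlInit_v2")` for each entry point it needs and propagate `error` unchanged. On a machine without the library, only `openLibrary` fails, which is what lets hosts without GPUs run without `libnvidia-ml` installed.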