Hi all, 

Has anyone tested this on Ubuntu 18.04 with nvidia-docker2? We are having
issues with the CUDA 10+ images when doing real processing. We still need to
check a few things, but basically we get:

kernel version 418.56.0 does not match DSO version 410.48.0 -- cannot find
working devices in this configuration

Logs:
I0424 13:27:14.000586    30 executor.cpp:726] Forked command at 73
Preparing rootfs at '/data0/mesos/work/provisioner/containers/548d3cae-30b5-4530-a8db-f94b00215718/backends/overlay/rootfses/e1ceb89e-3abc-4587-a87c-d63037b7ae8b'
Marked '/' as rslave
Executing pre-exec command '{"arguments":["ln","-s","/sys/fs/cgroup/cpu,cpuacct","/data0/mesos/work/provisioner/containers/548d3cae-30b5-4530-a8db-f94b00215718/backends/overlay/rootfses/e1ceb89e-3abc-4587-a87c-d63037b7ae8b/sys/fs/cgroup/cpuacct"],"shell":false,"value":"ln"}'
Executing pre-exec command '{"arguments":["ln","-s","/sys/fs/cgroup/cpu,cpuacct","/data0/mesos/work/provisioner/containers/548d3cae-30b5-4530-a8db-f94b00215718/backends/overlay/rootfses/e1ceb89e-3abc-4587-a87c-d63037b7ae8b/sys/fs/cgroup/cpu"],"shell":false,"value":"ln"}'
Changing root to /data0/mesos/work/provisioner/containers/548d3cae-30b5-4530-a8db-f94b00215718/backends/overlay/rootfses/e1ceb89e-3abc-4587-a87c-d63037b7ae8b
2019-04-24 13:27:18.346994: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-24 13:27:18.352203: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2019-04-24 13:27:18.352243: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:161] retrieving CUDA diagnostic information for host: __host__
2019-04-24 13:27:18.352252: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:168] hostname: __host__
2019-04-24 13:27:18.352295: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:192] libcuda reported version is: 410.48.0
2019-04-24 13:27:18.352329: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:196] kernel reported version is: 418.56.0
2019-04-24 13:27:18.352338: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:306] kernel version 418.56.0 does not match DSO version 410.48.0 -- cannot find working devices in this configuration
2019-04-24 13:27:18.374940: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2593920000 Hz
2019-04-24 13:27:18.378793: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4f41e10 executing computations on platform Host. Devices:
2019-04-24 13:27:18.378821: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
W0424 13:27:18.385210 140191267731200 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0424 13:27:18.399287 140191267731200 deprecation.py:323] From /user/tf-benchmarks-113/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0424 13:27:18.433226 140191267731200 deprecation.py:323] From /user/tf-benchmarks-113/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0424 13:27:20.197937 140191267731200 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W0424 13:27:20.312573 140191267731200 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W0424 13:27:21.082763 140191267731200 deprecation.py:323] From /user/tf-benchmarks-113/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: __init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
I0424 13:27:22.013817 140191267731200 session_manager.py:491] Running local_init_op.
I0424 13:27:22.193911 140191267731200 session_manager.py:493] Done running local_init_op.
2019-04-24 13:27:23.181740: E tensorflow/core/common_runtime/executor.cc:624] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
         [[{{node tower_0/v/cg/mpool0/MaxPool}}]]
I0424 13:27:23.262847 140191267731200 coordinator.py:224] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Default MaxPoolingOp only supports NHWC on device type CPU
         [[node tower_0/v/cg/mpool0/MaxPool (defined at /user/tf-benchmarks-113/scripts/tf_cnn_benchmarks/convnet_builder.py:261) ]

Running this with nvidia-docker2 works fine.

Image used: tensorflow/tensorflow:latest-gpu

Command:
python /user/tf-benchmarks-113/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server

On the host, nvidia-smi reports:
NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1
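
For anyone trying to reproduce this, the mismatch can be picked out of a task's stderr with a small helper along these lines. This is only a sketch: the function names are made up for illustration, and it assumes the TensorFlow diagnostic lines are worded exactly like the ones in the log above ("libcuda reported version is: ..." and "kernel reported version is: ...").

```python
import re

# Patterns for the two TensorFlow cuda_diagnostics lines seen in the log.
LIBCUDA_RE = re.compile(r"libcuda reported version is: (\d+(?:\.\d+)+)")
KERNEL_RE = re.compile(r"kernel reported version is: (\d+(?:\.\d+)+)")


def driver_versions(log_text):
    """Return (libcuda_version, kernel_version); None for a missing one."""
    # Collapse all whitespace so hard-wrapped log lines still match.
    flat = " ".join(log_text.split())
    lib = LIBCUDA_RE.search(flat)
    ker = KERNEL_RE.search(flat)
    return (lib.group(1) if lib else None, ker.group(1) if ker else None)


def versions_match(log_text):
    """True only if both versions were found and they are identical."""
    lib, ker = driver_versions(log_text)
    return lib is not None and lib == ker


# Excerpt from the log above, wrapped the way the mail client wrapped it.
sample = """
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:192] libcuda reported
version is: 410.48.0
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:196] kernel reported
version is: 418.56.0
"""

if __name__ == "__main__":
    lib, ker = driver_versions(sample)
    print("libcuda:", lib, "kernel:", ker, "match:", versions_match(sample))
```

In our case it would report libcuda 410.48.0 against kernel 418.56.0, i.e. the user-space library visible inside the container does not match the 418.56 driver that nvidia-smi shows on the host.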

Thanks,
Jorge 
> On 26 Apr 2019, at 18:28, Benno Evers <bev...@mesosphere.com> wrote:
> 
> Hi all,
> 
> Please vote on releasing the following candidate as Apache Mesos 1.8.0.
> 
> 
> 1.8.0 includes the following:
> --------------------------------------------------------------------------------
> * Greatly reduced allocator cycle time.
> * Operation feedback for v1 schedulers.
> * Per-framework minimum allocatable resources.
> * New CLI subcommands `task attach` and `task exec`.
> * New `linux/seccomp` isolator.
> * Support for Docker v2 Schema2 manifest format.
> * XFS quota for persistent volumes.
> * **Experimental** Support for the new CSI v1 API.
> 
> In addition, 1.8.0-rc2 includes the following changes:
> ---------------------------------------------------------------------------------
> * Docker manifest v2s2 config with image GC.
> * Expanded `highlights` section in the CHANGELOG.
> 
> In addition, 1.8.0-rc3 includes the following changes:
> ---------------------------------------------------------------------------------
> * Relaxed protobuf union validation strictness. (MESOS-9740)
> * Fixed a bug causing non-uniform random results in the random sorter.
> (MESOS-9733)
> 
> 
> The CHANGELOG for the release is available at:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.8.0-rc3
> --------------------------------------------------------------------------------
> 
> The candidate for Mesos 1.8.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.8.0-rc3/mesos-1.8.0.tar.gz
> 
> The tag to be voted on is 1.8.0-rc3:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.8.0-rc3
> 
> The SHA512 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.8.0-rc3/mesos-1.8.0.tar.gz.sha512
> 
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.8.0-rc3/mesos-1.8.0.tar.gz.asc
> 
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
> 
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1253
> 
> Please vote on releasing this package as Apache Mesos 1.8.0!
> 
> The vote is open until  and passes if a majority of at least 3 +1 PMC votes
> are cast.
> 
> [ ] +1 Release this package as Apache Mesos 1.8.0
> [ ] -1 Do not release this package because ...
> 
> Thanks,
> Benno and Joseph
