On 5/27/21 10:46 AM, Alexander Grund wrote:
/home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error: bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: No space left on device

What device might that be?  As shown above, I have quite a bit of disk space.  Is /tmp being used and getting full?

 > export EASYBUILD_BUILDPATH=/run/user/$UID/eb_build

 > tmpfs                               19G   19G   30M 100% /run/user/983

This clearly shows that your buildpath is full, so that is the issue. Try using another buildpath. Kenneth is right: we make sure Bazel doesn't use /tmp.

I have found out that the size of /run/user/$UID defaults to 10% of the system RAM, as set by RuntimeDirectorySize in /etc/systemd/logind.conf (see man 5 logind.conf). On my server, 10% amounts to 19 GB. It seems prudent to use /dev/shm instead:

export EASYBUILD_BUILDPATH=/dev/shm
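
As a rough cross-check of these limits, the following sketch estimates both tmpfs sizes from the total RAM. It assumes the defaults are unchanged (RuntimeDirectorySize=10% for /run/user/$UID, and the usual tmpfs default of 50% of RAM for /dev/shm); the helper name tmpfs_estimate_gib is made up for illustration. A site that overrides either setting will see different numbers.

```shell
#!/bin/sh
# Estimate default tmpfs limits from total RAM (MemTotal is reported in kB).
# Assumption: RuntimeDirectorySize=10% in logind.conf and size=50% for the
# tmpfs on /dev/shm have not been overridden on this system.
tmpfs_estimate_gib() {
    # $1 = percentage of RAM, $2 = meminfo file (defaults to /proc/meminfo)
    awk -v pct="$1" \
        '/^MemTotal:/ { printf "%.1f\n", $2 * pct / 100 / 1024 / 1024 }' \
        "${2:-/proc/meminfo}"
}

echo "/run/user/\$UID limit: about $(tmpfs_estimate_gib 10) GiB"
echo "/dev/shm limit:       about $(tmpfs_estimate_gib 50) GiB"
```

Comparing the two estimates with the Size column of df -Ph for each mount shows whether the defaults are actually in effect.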

While building TensorFlow, the usage of /dev/shm grows very large:

# df -Ph /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            94G   46G   48G  50% /dev/shm
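
Since the build needs tens of GB, it may help to check the free space before starting. The sketch below picks the first candidate directory with enough room. The function names and the 60 GiB threshold are my own assumptions for illustration (chosen only because the build above used 46G), not anything EasyBuild provides or recommends.

```shell
#!/bin/sh
# Print the available space (in KiB) on the filesystem holding path $1.
# Prints nothing if the path does not exist.
avail_kb() {
    df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $4}'
}

# Print the first candidate directory with at least MIN_GIB free.
# Usage: pick_buildpath MIN_GIB DIR [DIR...]
pick_buildpath() {
    min_kb=$(( $1 * 1024 * 1024 ))
    shift
    for dir in "$@"; do
        kb=$(avail_kb "$dir")
        if [ -n "$kb" ] && [ "$kb" -ge "$min_kb" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Example: require 60 GiB free (assumed threshold, adjust as needed).
if path=$(pick_buildpath 60 /dev/shm "/run/user/$(id -u)" /tmp); then
    echo "export EASYBUILD_BUILDPATH=$path"
else
    echo "no candidate has enough free space" >&2
fi
```

Note that this only checks free space at the start; a tmpfs shared with other users or jobs can still fill up during the build.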

Unfortunately, the build still fails and I need to look for the source of errors in the logfile:

== installing extension TensorFlow 2.4.1 (28/28)...
==      configuring...
==      building...
==      testing...
== FAILED: Installation ended unsuccessfully (build directory: /dev/shm/TensorFlow/2.4.1/fosscuda-2020b): build failed (first 300 chars): At least 2 gpu tests failed: //tensorflow/core/common_runtime/gpu:gpu_device_test, //tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu (took 55 min 27 sec)
== Results of the build can be found in the log file(s) /scratch/modules/eb-3l5Ptk/easybuild-TensorFlow-2.4.1-20210527.114011.EmOkP.log
ERROR: Build of /home/modules/software/EasyBuild/4.3.4/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.4.1-fosscuda-2020b.eb failed (err: 'build failed (first 300 chars): At least 2 gpu tests failed:\n//tensorflow/core/common_runtime/gpu:gpu_device_test, //tensorflow/core/common_runtime/gpu:gpu_device_unified_memory_test_gpu')


/Ole
