(please keep Alexander in the loop)

On 27/05/2021 10:34, Loris Bennett wrote:
Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:

On 5/27/21 9:48 AM, Alexander Grund wrote:

The EB log file reports an error:

//tensorflow/core/common_runtime:graph_constructor_test FAILED TO BUILD

and the log file ends with:

Executed 137 out of 814 tests: 137 tests pass, 1 fails to build and 676 were skipped.
FAILED: Build did NOT complete successfully

This is a build failure, so it is something we should fix, or at least find the
cause of.
Please check the log; there should be something in it about why/how it failed to
compile. Just search for the target name and scroll around a bit. If you attach it, I
can also take a look.
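
For a log this large, pulling out only the lines around the failing target may be easier than scrolling. A minimal sketch, assuming a POSIX shell with a grep that supports -C; the function name log_context is made up for illustration and is not part of EasyBuild:

```shell
# Hypothetical helper (not an EasyBuild command): print every line that
# contains a fixed string, with line numbers and some surrounding context.
# Usage: log_context STRING LOGFILE [CONTEXT_LINES]
log_context() {
    # -F: treat STRING literally, -n: show line numbers,
    # -C: number of context lines before and after each match (default 5)
    grep -n -F -C "${3:-5}" -- "$1" "$2"
}
```

For example, `log_context 'graph_constructor_test' /path/to/eb.log 20 > excerpt.txt` (the log path is a placeholder) would give a small excerpt that is much easier to attach to a mail than the full log.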

The EB log file is 205 MB, so it's hard to share :-(

I have this environment:

export EASYBUILD_BUILDPATH=/run/user/$UID/eb_build
ulimit -s 2000240
export EASYBUILD_TMPDIR=/scratch/$USER

and there is quite a bit of space available:

$ df -h /run/user/$UID/eb_build /scratch
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                               19G   19G   30M 100% /run/user/983
/dev/mapper/VolGroup00-lv_scratch  850G  675M  849G   1% /scratch
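
Note that the listing above reports the tmpfs under /run/user/983 as 100% used with 30M available, while /scratch is nearly empty. A minimal sketch for checking the free space of each candidate directory before starting a build, assuming a POSIX df and awk; the function names avail_kib and check_space and the threshold are made up for illustration:

```shell
# Hypothetical helpers (not part of EasyBuild).

# Print the available space, in KiB, of the filesystem holding path $1.
avail_kib() {
    # -P gives one line per filesystem, -k reports sizes in KiB;
    # column 4 of the second line is the "Available" value.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report whether path $1 has at least $2 KiB available.
# Returns 0 if it does, 1 if it does not, 2 if df failed.
check_space() {
    path=$1
    min_kib=$2
    avail=$(avail_kib "$path")
    [ -n "$avail" ] || return 2
    if [ "$avail" -lt "$min_kib" ]; then
        echo "WARNING: $path has only $avail KiB free (wanted $min_kib)"
        return 1
    fi
    echo "OK: $path has $avail KiB free"
}
```

Something like `check_space "$EASYBUILD_BUILDPATH" 20000000; check_space "$EASYBUILD_TMPDIR" 20000000` would then flag whichever of the two locations is short on space; the 20000000 KiB figure is an arbitrary example, not a measured requirement for TensorFlow.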

...

/home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error: bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: No space left on device

What device might that be?  As shown above, I have quite a bit of disk space.
Is /tmp being used and getting full?

This might be the case.  In the past I ran into this problem and solved
it with the following:

   eb TensorFlow-1.15.0-fosscuda-2019b-Python-3.7.4.eb --robot \
      --cuda-compute-capabilities=6.1,7.5 --buildpath=/dev/shm \
      --tmpdir=/scratch/eb-build

Hmm, this surprises me a bit, because I think we make an effort to keep Bazel from using /tmp for too many things, and we tell it to use the build directory instead...

Please try using --tmpdir to specify a directory other than /tmp, and see if that helps at all.

Alexander: should we look for patterns like "No space left on device" in the Bazel output and highlight them better, perhaps with a concrete suggestion to use --tmpdir to avoid using /tmp?
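
As a rough illustration of the idea, and not how EasyBuild actually does or would implement it, the check could look something like the sketch below. The function name hint_no_space is made up, and the hint text is only an example:

```shell
# Hypothetical sketch (not EasyBuild code): scan a build log for the
# "No space left on device" message and print a hint when it is found.
# Returns 0 if the message was found, 1 otherwise.
hint_no_space() {
    log=$1
    # -F: match the message as a fixed string
    if grep -q -F "No space left on device" -- "$log"; then
        echo "Found 'No space left on device' in $log:"
        # Show the first matching line so it is clear which file was affected.
        grep -n -F -m 1 "No space left on device" -- "$log"
        echo "Hint: try --tmpdir and/or --buildpath on a filesystem with more free space"
        return 0
    fi
    return 1
}
```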


regards,

Kenneth



YMMV

Cheers,

Loris

I'd also suggest joining Slack, as discussions there are potentially faster.

I'll take a look - are there instructions for Slack?

Thanks,
Ole
