Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:

> On 5/27/21 9:48 AM, Alexander Grund wrote:
>>
>>>> The EB log file reports an error:
>>>>
>>>> //tensorflow/core/common_runtime:graph_constructor_test FAILED TO BUILD
>>>>
>>>> and the log file ends with:
>>>>
>>>> Executed 137 out of 814 tests: 137 tests pass, 1 fails to build and 676
>>>> were skipped.
>>>> FAILED: Build did NOT complete successfully
>> This is a build failure, so something we should fix or at least find the
>> cause.
>> Please check the log, there should be something about why/how it failed to
>> compile. Just search for the name and scroll a bit around. If you attach it,
>> I can also take a look.
>
> The EB log file is 205 MB, so it's hard to share :-(
>
> I have this environment:
>
> export EASYBUILD_BUILDPATH=/run/user/$UID/eb_build
> ulimit -s 2000240
> export EASYBUILD_TMPDIR=/scratch/$USER
>
> and there is quite a bit of space available:
>
> $ df -h /run/user/$UID/eb_build /scratch
> Filesystem                          Size  Used Avail Use% Mounted on
> tmpfs                                19G   19G   30M 100% /run/user/983
> /dev/mapper/VolGroup00-lv_scratch   850G  675M  849G   1% /scratch
>
> Searching for FAIL in the log file, I noticed this section:
>
> == 2021-05-26 15:20:28,456 tensorflow.py:899 INFO Starting cpu test
> == 2021-05-26 15:20:28,457 run.py:225 INFO running cmd: bazel
> --output_user_root=/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/tmpkYJDaH-bazel-tf
> --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --config=noaws
> --config=nogcp --config=nohdfs --compilation_mode=opt --config=opt --subcommands --verbose_failures
> --jobs=64 --copt="-fPIC"
> --action_env=CPATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/include:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/include:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/include:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/include:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/include:/home/modules/software/ICU/67.1-GCCcore-10.2.0/include:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/include:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/include:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/include:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/include:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/include:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/include:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/include:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/include:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/include:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/include:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/include'
> --action_env=LIBRARY_PATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/lib:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/lib:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/lib:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/lib:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/lib:/home/modules/software/ICU/67.1-GCCcore-10.2.0/lib:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/lib:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/lib64:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/lib:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/lib:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/lib:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/lib:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/lib:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/lib:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/lib:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/lib:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/lib'
> --action_env=PYTHONPATH --action_env=PYTHONNOUSERSITE=1
> --distinct_host_configuration=false
> --config=mkl --test_output=errors
> --build_tests_only --local_test_jobs=64
> --test_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --build_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --test_env=CUDA_VISIBLE_DEVICES='-1' --test_timeout=3600 --test_size_filters=small --
> //tensorflow/core/... -//tensorflow/core:example_java_proto
> -//tensorflow/core/example:example_protos_closure
> //tensorflow/cc/... //tensorflow/c/... //tensorflow/python/...
> -//tensorflow/core/profiler/internal/gpu:device_tracer_test -//tensorflow/c/eager:c_api_test_gpu
> -//tensorflow/c/eager:c_api_distributed_test
> -//tensorflow/c/eager:c_api_distributed_test_gpu
> -//tensorflow/c/eager:c_api_cluster_test_gpu
> -//tensorflow/c/eager:c_api_remote_function_test_gpu -//tensorflow/c/eager:c_api_remote_test_gpu
> -//tensorflow/core/kernels:sparse_matmul_op_test
> -//tensorflow/core/kernels:sparse_matmul_op_test_gpu
> -//tensorflow/core/common_runtime:collective_param_resolver_local_test
> -//tensorflow/core/common_runtime:mkl_layout_pass_test -//tensorflow/core/kernels/mkl:mkl_fused_ops_test
> == 2021-05-26 15:30:49,144 run.py:595 INFO parse_log_for_error msg: Command used: bazel
> --output_user_root=/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/tmpkYJDaH-bazel-tf
> --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --config=noaws
> --config=nogcp --config=nohdfs --compilation_mode=opt --config=opt --subcommands
> --verbose_failures --jobs=64 --copt="-fPIC"
> --action_env=CPATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/include:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/include:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/include:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/include:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/include:/home/modules/software/ICU/67.1-GCCcore-10.2.0/include:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/include:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/include:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/include:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/include:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/include:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/include:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/include:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/include:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/include:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/include:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/include'
> --action_env=LIBRARY_PATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/lib:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/lib:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/lib:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/lib:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/lib:/home/modules/software/ICU/67.1-GCCcore-10.2.0/lib:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/lib:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/lib64:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/lib:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/lib:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/lib:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/lib:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/lib:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/lib:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/lib:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/lib:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/lib'
> --action_env=PYTHONPATH --action_env=PYTHONNOUSERSITE=1
> --distinct_host_configuration=false --config=mkl --test_output=errors
> --build_tests_only --local_test_jobs=64
> --test_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --build_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --test_env=CUDA_VISIBLE_DEVICES='-1' --test_timeout=3600 --test_size_filters=small --
> //tensorflow/core/... -//tensorflow/core:example_java_proto
> -//tensorflow/core/example:example_protos_closure
> //tensorflow/cc/... //tensorflow/c/... //tensorflow/python/...
> -//tensorflow/core/profiler/internal/gpu:device_tracer_test -//tensorflow/c/eager:c_api_test_gpu
> -//tensorflow/c/eager:c_api_distributed_test
> -//tensorflow/c/eager:c_api_distributed_test_gpu
> -//tensorflow/c/eager:c_api_cluster_test_gpu
> -//tensorflow/c/eager:c_api_remote_function_test_gpu -//tensorflow/c/eager:c_api_remote_test_gpu
> -//tensorflow/core/kernels:sparse_matmul_op_test
> -//tensorflow/core/kernels:sparse_matmul_op_test_gpu
> -//tensorflow/core/common_runtime:collective_param_resolver_local_test
> -//tensorflow/core/common_runtime:mkl_layout_pass_test
> -//tensorflow/core/kernels/mkl:mkl_fused_ops_test
> == 2021-05-26 15:30:49,145 run.py:597 INFO parse_log_for_error (some may be harmless) regExp
> (?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w) found:
> WARNING: Download from
> https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
> failed: class
> com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.so', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Compiling
> tensorflow/core/platform/error.cc', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
>
> external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
> -MD -MF bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.d
> '-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o'
> -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0'
> -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote .
> -iquote bazel-out/k8-opt/bin -iquote external/eigen_archive
> -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/com_google_absl
> -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync
> -iquote bazel-out/k8-opt/bin/external/nsync -iquote external/double_conversion
> -iquote bazel-out/k8-opt/bin/external/double_conversion
> -iquote external/com_google_protobuf
> -iquote bazel-out/k8-opt/bin/external/com_google_protobuf
> -isystem third_party/eigen3/mkl_include
> -isystem bazel-out/k8-opt/bin/third_party/eigen3/mkl_include
> -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive
> -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"'
> '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1'
> -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes
> -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections
> -fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize
> '-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14' -c
> tensorflow/core/platform/error.cc -o
> bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o)
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.a', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
> ERROR: /run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/TensorFlow/tensorflow-2.4.1/tensorflow/core/common_runtime/BUILD:2700:11:
> Linking of rule '//tensorflow/core/common_runtime:graph_constructor_test' failed
> (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
> /home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error:
> bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: No
> space left on device
> collect2: error: ld returned 1 exit status
> FAILED: Build did NOT complete successfully
> //tensorflow/core/common_runtime:graph_constructor_test FAILED TO BUILD
> FAILED: Build did NOT complete successfully
> == 2021-05-26 15:30:49,145 run.py:554 WARNING Found 11 errors in command output
> (output: WARNING: Download from
> https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
> failed: class
> com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException
> GET returned 404 Not Found
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.so', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Compiling
> tensorflow/core/platform/error.cc', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
>
> external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
> -MD -MF bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.d
> '-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o'
> -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0'
> -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote .
> -iquote bazel-out/k8-opt/bin -iquote external/eigen_archive
> -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/com_google_absl
> -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync
> -iquote bazel-out/k8-opt/bin/external/nsync -iquote external/double_conversion
> -iquote bazel-out/k8-opt/bin/external/double_conversion
> -iquote external/com_google_protobuf
> -iquote bazel-out/k8-opt/bin/external/com_google_protobuf
> -isystem third_party/eigen3/mkl_include
> -isystem bazel-out/k8-opt/bin/third_party/eigen3/mkl_include
> -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive
> -Wno-builtin-macro-redefined '-D__DATE__="redacted"'
> '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE
> '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer
> -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2
> -ffunction-sections -fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2
> -ftree-vectorize '-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14' -c
> tensorflow/core/platform/error.cc -o
> bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o)
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.a', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
> ERROR: /run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/TensorFlow/tensorflow-2.4.1/tensorflow/core/common_runtime/BUILD:2700:11:
> Linking of rule '//tensorflow/core/common_runtime:graph_constructor_test' failed
> (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
> /home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error:
> bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: No
> space left on device
> collect2: error: ld returned 1 exit status
> FAILED: Build did NOT complete successfully
> //tensorflow/core/common_runtime:graph_constructor_test FAILED TO BUILD
> FAILED: Build did NOT complete successfully)
>
> Please note these two errors:
>
>> WARNING: Download from
>> https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
>> failed: class
>> com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
>
> Is the URL outdated?
>
>> /home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error:
>> bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test:
>> No space left on device
>
> What device might that be? As shown above, I have quite a bit of disk space.
> Is /tmp being used and getting full?
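
One way to answer "what device might that be?" is to check free space in every directory the build writes to. Per the log above, Bazel's `--output_user_root`, and therefore the `bazel-out` tree that `ld.gold` was writing into, sits under `/run/user/983/eb_build`, which is the tmpfs that `df -h` reports as 100% used with 30M available. The script below is a minimal sketch and not part of EasyBuild: the function name `check_free_space`, the 10 GiB default threshold (`MIN_FREE_MB`), and the extra check of `/tmp` are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: report free space in each directory the build writes to and flag
# any that fall below a threshold. check_free_space and MIN_FREE_MB are
# hypothetical names chosen for this example.
min_free_mb=${MIN_FREE_MB:-10240}   # assumed threshold: 10 GiB

# check_free_space DIR MIN_MB
#   returns 0 if DIR has at least MIN_MB MiB available,
#   1 if it has less, 2 if the free space cannot be determined.
check_free_space() {
    local dir=$1 min_mb=$2 avail_kb avail_mb
    # df -P prints a header plus one unwrapped line per filesystem;
    # with -k, column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk -- "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    case $avail_kb in
        ''|*[!0-9]*)
            echo "$dir: cannot determine free space (does it exist?)" >&2
            return 2 ;;
    esac
    avail_mb=$(( avail_kb / 1024 ))
    if [ "$avail_mb" -lt "$min_mb" ]; then
        echo "$dir: only ${avail_mb} MiB free (want >= ${min_mb} MiB)"
        return 1
    fi
    echo "$dir: ${avail_mb} MiB free"
}

# Build path and tmpdir default to the values from the environment quoted
# above; /tmp is included in case some tool ignores TMPDIR.
status=0
for dir in "${EASYBUILD_BUILDPATH:-/run/user/$(id -u)/eb_build}" \
           "${EASYBUILD_TMPDIR:-/scratch/${USER:-$(id -un)}}" \
           /tmp; do
    check_free_space "$dir" "$min_free_mb" || status=1
done
if [ "$status" -ne 0 ]; then
    echo "At least one build location is short of space or missing." >&2
fi
```

Running it before `eb` (and again while a long build is in progress) would show a nearly full tmpfs build path as a one-line warning instead of a linker error deep in a 205 MB log.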

This might be the case. In the past I ran into this problem and solved it
with the following:

  eb TensorFlow-1.15.0-fosscuda-2019b-Python-3.7.4.eb --robot --cuda-compute-capabilities=6.1,7.5 --buildpath=/dev/shm --tmpdir=/scratch/eb-build

YMMV

Cheers,

Loris

>> I'd also suggest to join Slack as discussions there are potentially faster.
>
> I'll take a look - are there instructions for Slack?
>
> Thanks,
> Ole

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
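
Loris's command sets the build and temporary directories on the `eb` command line. Ole's environment already configures the same two settings through `EASYBUILD_BUILDPATH` and `EASYBUILD_TMPDIR` (EasyBuild maps each `--option-name` to an `EASYBUILD_OPTION_NAME` environment variable), so an equivalent setup can be sketched by changing those variables. Whether `/dev/shm` is large enough depends on the machine: it is a RAM-backed tmpfs that typically defaults to half of physical memory, whereas the systemd per-user `/run/user/$UID` tmpfs defaults to 10% of RAM (`RuntimeDirectorySize=` in `logind.conf`), which may explain why that filesystem is comparatively small. The easyconfig name and CUDA compute capabilities are copied from Loris's example, not a recommendation for the TensorFlow 2.4.1 build; the `df` pre-check and the `command -v eb` guard are additions for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: the --buildpath/--tmpdir choice from the eb command above, expressed
# through EasyBuild's EASYBUILD_* configuration variables instead.
export EASYBUILD_BUILDPATH=/dev/shm          # RAM-backed tmpfs
export EASYBUILD_TMPDIR=/scratch/eb-build    # assumed large local disk

# Bazel's output tree ends up under the build path, so look at the free space
# on both filesystems first; df reports missing paths, and the failure is
# ignored here.
df -h "$EASYBUILD_BUILDPATH" "$(dirname "$EASYBUILD_TMPDIR")" || true

# Only attempt the build where the eb command is available.
if command -v eb >/dev/null 2>&1; then
    mkdir -p "$EASYBUILD_TMPDIR"
    eb TensorFlow-1.15.0-fosscuda-2019b-Python-3.7.4.eb --robot \
        --cuda-compute-capabilities=6.1,7.5
else
    echo "eb not found in PATH; make EasyBuild available first" >&2
fi
```

Options given on the `eb` command line take precedence over these variables, so either form works; the environment-variable form simply keeps the choice in one place for repeated builds.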