Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:

> On 5/27/21 9:48 AM, Alexander Grund wrote:
>>
>>>> The EB log file reports an error:
>>>>
>>>> //tensorflow/core/common_runtime:graph_constructor_test FAILED TO BUILD
>>>>
>>>> and the log file ends with:
>>>>
>>>> Executed 137 out of 814 tests: 137 tests pass, 1 fails to build and 676
>>>> were skipped.
>>>> FAILED: Build did NOT complete successfully
>> This is a build failure, so something we should fix or at least find the
>> cause.
>> Please check the log, there should be something about why/how it failed to
>> compile. Just search for the name and scroll a bit around. If you attach it,
>> I can also take a look.
>
> The EB log file is 205 MB, so it's hard to share :-(
>
> I have this environment:
>
> export EASYBUILD_BUILDPATH=/run/user/$UID/eb_build
> ulimit -s 2000240
> export EASYBUILD_TMPDIR=/scratch/$USER
>
> and there is quite a bit of space available:
>
> $ df -h /run/user/$UID/eb_build /scratch
> Filesystem                         Size  Used Avail Use% Mounted on
> tmpfs                               19G   19G   30M 100% /run/user/983
> /dev/mapper/VolGroup00-lv_scratch  850G  675M  849G   1% /scratch
>
> Searching for FAIL in the log file, I noticed this section:
>
> == 2021-05-26 15:20:28,456 tensorflow.py:899 INFO Starting cpu test
> == 2021-05-26 15:20:28,457 run.py:225 INFO running cmd:  bazel
> --output_user_root=/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/tmpkYJDaH-bazel-tf
> --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --config=noaws
> --config=nogcp --config=nohdfs
> --compilation_mode=opt --config=opt --subcommands --verbose_failures
> --jobs=64 --copt="-fPIC"
> --action_env=CPATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/include:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/include:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/include:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/include:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/include:/home/modules/software/ICU/67.1-GCCcore-10.2.0/include:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/include:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/include:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/include:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/include:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/include:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/include:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/include:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/include:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/include:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/include:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/include'
> --action_env=LIBRARY_PATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/lib:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/lib:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/lib:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/lib:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/lib:/home/modules/software/ICU/67.1-GCCcore-10.2.0/lib:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/lib:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/lib64:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/lib:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/lib:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/lib:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/lib:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/lib:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/lib:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/lib:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/lib:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/lib'
> --action_env=PYTHONPATH --action_env=PYTHONNOUSERSITE=1
> --distinct_host_configuration=false --config=mkl --test_output=errors
> --build_tests_only --local_test_jobs=64
> --test_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --build_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --test_env=CUDA_VISIBLE_DEVICES='-1' --test_timeout=3600
> --test_size_filters=small --
> //tensorflow/core/... -//tensorflow/core:example_java_proto
> -//tensorflow/core/example:example_protos_closure
> //tensorflow/cc/... //tensorflow/c/... //tensorflow/python/... 
> -//tensorflow/core/profiler/internal/gpu:device_tracer_test
> -//tensorflow/c/eager:c_api_test_gpu
> -//tensorflow/c/eager:c_api_distributed_test
> -//tensorflow/c/eager:c_api_distributed_test_gpu
> -//tensorflow/c/eager:c_api_cluster_test_gpu
> -//tensorflow/c/eager:c_api_remote_function_test_gpu
> -//tensorflow/c/eager:c_api_remote_test_gpu
> -//tensorflow/core/kernels:sparse_matmul_op_test
> -//tensorflow/core/kernels:sparse_matmul_op_test_gpu
> -//tensorflow/core/common_runtime:collective_param_resolver_local_test
> -//tensorflow/core/common_runtime:mkl_layout_pass_test
> -//tensorflow/core/kernels/mkl:mkl_fused_ops_test
> == 2021-05-26 15:30:49,144 run.py:595 INFO parse_log_for_error msg: Command
> used:  bazel
> --output_user_root=/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/tmpkYJDaH-bazel-tf
> --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --config=noaws
> --config=nogcp --config=nohdfs --compilation_mode=opt --config=opt --subcommands
> --verbose_failures --jobs=64 --copt="-fPIC"
> --action_env=CPATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/include:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/include:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/include:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/include:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/include:/home/modules/software/ICU/67.1-GCCcore-10.2.0/include:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/include:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/include:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/include:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/include:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/include:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/include:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/include:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/include:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/include:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/include:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/include'
> --action_env=LIBRARY_PATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/lib:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/lib:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/lib:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/lib:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/lib:/home/modules/software/ICU/67.1-GCCcore-10.2.0/lib:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/lib:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/lib64:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/lib:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/lib:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/lib:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/lib:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/lib:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/lib:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/lib:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/lib:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/lib'
> --action_env=PYTHONPATH --action_env=PYTHONNOUSERSITE=1
> --distinct_host_configuration=false --config=mkl --test_output=errors
> --build_tests_only --local_test_jobs=64
> --test_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --build_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
> --test_env=CUDA_VISIBLE_DEVICES='-1' --test_timeout=3600
> --test_size_filters=small --
> //tensorflow/core/... -//tensorflow/core:example_java_proto
> -//tensorflow/core/example:example_protos_closure
> //tensorflow/cc/... //tensorflow/c/... //tensorflow/python/... 
> -//tensorflow/core/profiler/internal/gpu:device_tracer_test
> -//tensorflow/c/eager:c_api_test_gpu
> -//tensorflow/c/eager:c_api_distributed_test
> -//tensorflow/c/eager:c_api_distributed_test_gpu
> -//tensorflow/c/eager:c_api_cluster_test_gpu
> -//tensorflow/c/eager:c_api_remote_function_test_gpu
> -//tensorflow/c/eager:c_api_remote_test_gpu
> -//tensorflow/core/kernels:sparse_matmul_op_test
> -//tensorflow/core/kernels:sparse_matmul_op_test_gpu
> -//tensorflow/core/common_runtime:collective_param_resolver_local_test
> -//tensorflow/core/common_runtime:mkl_layout_pass_test
> -//tensorflow/core/kernels/mkl:mkl_fused_ops_test
> == 2021-05-26 15:30:49,145 run.py:597 INFO parse_log_for_error (some may be
> harmless) regExp
> (?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w)
> found:
> WARNING: Download from
> https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
> failed: class
> com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException
> GET returned 404 Not Found
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.so', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Compiling
> tensorflow/core/platform/error.cc', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
>
> external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
> -MD -MF bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.d
> '-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o'
> -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0'
> -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/k8-opt/bin
> -iquote external/eigen_archive -iquote
> bazel-out/k8-opt/bin/external/eigen_archive -iquote external/com_google_absl
> -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync
> -iquote
> bazel-out/k8-opt/bin/external/nsync -iquote external/double_conversion -iquote
> bazel-out/k8-opt/bin/external/double_conversion -iquote
> external/com_google_protobuf -iquote
> bazel-out/k8-opt/bin/external/com_google_protobuf -isystem
> third_party/eigen3/mkl_include -isystem
> bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem
> external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive
> -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"'
> '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1'
> -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes
> -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections
> -fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize
> '-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14' -c
> tensorflow/core/platform/error.cc -o 
> bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o)
> SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.a', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
> ERROR:
> /run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/TensorFlow/tensorflow-2.4.1/tensorflow/core/common_runtime/BUILD:2700:11:
> Linking of rule '//tensorflow/core/common_runtime:graph_constructor_test' failed
> (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
> /home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error:
> bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: No
> space left on device
> collect2: error: ld returned 1 exit status
> FAILED: Build did NOT complete successfully
> //tensorflow/core/common_runtime:graph_constructor_test         FAILED TO 
> BUILD
> FAILED: Build did NOT complete successfully
> == 2021-05-26 15:30:49,145 run.py:554 WARNING Found 11 errors in command 
> output
> (output: WARNING: Download from
> https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
> failed: class
> com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException
> GET returned 404 Not Found
>         SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.so', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
>         SUBCOMMAND: # //tensorflow/core/platform:error [action 'Compiling
> tensorflow/core/platform/error.cc', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
>
> external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
> -MD -MF bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.d
> '-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o'
> -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0'
> -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/k8-opt/bin
> -iquote external/eigen_archive -iquote
> bazel-out/k8-opt/bin/external/eigen_archive -iquote external/com_google_absl
> -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync
> -iquote bazel-out/k8-opt/bin/external/nsync -iquote external/double_conversion
> -iquote bazel-out/k8-opt/bin/external/double_conversion -iquote
> external/com_google_protobuf -iquote
> bazel-out/k8-opt/bin/external/com_google_protobuf -isystem
> third_party/eigen3/mkl_include -isystem
> bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem
> external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive
> -Wno-builtin-macro-redefined '-D__DATE__="redacted"'
> '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE
> '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer
> -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2
> -ffunction-sections -fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2
> -ftree-vectorize '-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14' -c 
> tensorflow/core/platform/error.cc -o 
> bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o)
>         SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
> tensorflow/core/platform/liberror.a', configuration:
> f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6, execution
> platform: @local_execution_config_platform//:platform]
>         ERROR:
> /run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/TensorFlow/tensorflow-2.4.1/tensorflow/core/common_runtime/BUILD:2700:11:
> Linking of rule '//tensorflow/core/common_runtime:graph_constructor_test' 
> failed
> (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
>         /home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal
> error:
> bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: No
> space left on device
>         collect2: error: ld returned 1 exit status
>         FAILED: Build did NOT complete successfully
>         //tensorflow/core/common_runtime:graph_constructor_test FAILED TO 
> BUILD
>         FAILED: Build did NOT complete successfully)
>
>
> Please note these two errors:
>
>> WARNING: Download from
>> https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
>> failed: class
>> com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException
>> GET returned 404 Not Found
>
> Is the URL outdated?
>
>> /home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error:
>> bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test:
>> No space left on device
>
> What device might that be?  As shown above, I have quite a bit of disk space.
> Is /tmp being used and getting full?

This might be the case.  In the past I ran into this problem and solved
it with the following:

  eb TensorFlow-1.15.0-fosscuda-2019b-Python-3.7.4.eb --robot \
     --cuda-compute-capabilities=6.1,7.5 --buildpath=/dev/shm \
     --tmpdir=/scratch/eb-build

YMMV
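
For what it's worth, the df output quoted above lists the tmpfs mounted on
/run/user/983 as 100% used with only 30M available, and EASYBUILD_BUILDPATH
points there, so that is most likely the device ld.gold ran out of space on.
Both /run/user/$UID and /dev/shm are normally RAM-backed tmpfs mounts with a
size cap, so it may help to check the free space before starting a build.
A minimal sketch of such a check in POSIX shell follows. The helper names
avail_kib and has_space, the candidate list and the 50 GiB threshold are
assumptions made for illustration; they are not EasyBuild features or
measured TensorFlow requirements.

```shell
#!/bin/sh
# Print the available space, in KiB, on the filesystem holding the given path.
# Relies on POSIX `df -P -k`, whose 4th column is "Available".
avail_kib() {
    df -P -k "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# Succeed only if the path has at least the requested number of GiB free.
# A missing path yields empty output from avail_kib and therefore fails.
has_space() {
    path=$1
    need_gib=$2
    have=$(avail_kib "$path")
    [ -n "$have" ] && [ "$have" -ge $((need_gib * 1024 * 1024)) ]
}

# Example: report the first candidate build path with enough room.
# The 50 GiB threshold is a guess, not a measured requirement.
for dir in "/run/user/$(id -u)" /dev/shm /scratch; do
    if has_space "$dir" 50; then
        echo "build path candidate: $dir"
        break
    fi
done
```

The directory it prints could then be passed to eb via --buildpath, as in the
command above.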

Cheers,

Loris

>> I'd also suggest joining Slack, as discussions there are potentially faster.
>
> I'll take a look - are there instructions for Slack?
>
> Thanks,
> Ole
-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
