"Loris Bennett" <loris.bennett-j/[email protected]>
writes:
> Hi,
>
> With
>
>   TensorFlow-2.15.1-foss-2023a-CUDA-12.1.1.eb
>
> the clean-up step fails with the following error:
>
>   == 2025-05-12 22:17:33,931 easyblock.py:4251 INFO Running method 
> cleanup_step part of step cleanup
>   == 2025-05-12 22:17:33,932 easyblock.py:3978 INFO Cleaning up
> builddir
> /trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1
> (in /trinity/home/build/slurm)
>   == 2025-05-12 22:17:38,935 filetools.py:1853 INFO Adjusting
> permissions recursively for
> /trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1
>   == 2025-05-12 23:58:00,825 filetools.py:1853 INFO Adjusting
> permissions recursively for
> /trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1
>   == 2025-05-13 02:42:45,424 build_log.py:226 ERROR EasyBuild
> encountered an error (at
> easybuild/software/EasyBuild/5.0.0/lib/python3.6/site-packages/easybuild/base/exceptions.py:126
> in __init__): Failed to chmod/chown several paths:
> ['/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/install/f1ec268a484023c283bf4c5d46927af2/.nfs000000010844b19b000003b0',
> '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/install/f1ec268a484023c283bf4c5d46927af2/.nfs000000010844b0fa000003af',
> '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/command_port',
> '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/server_info.rawproto',
> '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/request_cookie',
> '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/response_cookie',
> '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/server.pid.txt']
> (last error: [Errno 2] No such file or directory:
> '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/server.pid.txt')
> (at
> easybuild/software/EasyBuild/5.0.0/lib/python3.6/site-packages/easybuild/tools/filetools.py:1925
> in adjust_permissions)
>   == 2025-05-13 02:42:47,086 build_log.py:322 INFO ... (took 4 hours 25 mins 
> 13 secs)
>
> Is this potentially connected with having the build directory on an NFS
> share?

Looks like the problem was caused by the underlying file system running out
of inodes.  Building TensorFlow seems to create a lot of files 😬
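
In case it helps anyone else, here is a rough sketch of how one could check
inode headroom on the build file system before starting a large build. It
uses os.statvfs from the Python standard library, which reports the same
numbers as "df -i". The helper names and the min_free threshold are my own
illustrative choices, not anything provided by EasyBuild.

```python
import os


def inode_usage(path):
    """Return (total, free, used_fraction) inode counts for the file system holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes on the file system
    free = st.f_ffree    # number of free inodes
    used_fraction = (total - free) / total if total else 0.0
    return total, free, used_fraction


def has_inode_headroom(path, min_free=1_000_000):
    """Hypothetical pre-build check; min_free is an arbitrary example threshold."""
    total, free, _ = inode_usage(path)
    # Some file systems report 0 total inodes (dynamic allocation);
    # treat that as "no fixed limit" rather than "full".
    return total == 0 or free >= min_free


if __name__ == "__main__":
    # Replace "." with the build directory, e.g. the builddir from the log above.
    total, free, used = inode_usage(".")
    print(f"inodes: total={total} free={free} used={used:.1%}")
    print("enough headroom:", has_inode_headroom("."))
```

How many free inodes a TensorFlow build actually needs will depend on the
site and the Bazel cache, so the threshold would have to be tuned locally.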

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
