"Loris Bennett" <loris.bennett-j/[email protected]> writes: > Hi, > > With > > TensorFlow-2.15.1-foss-2023a-CUDA-12.1.1.eb > > the clean-up step fails with the following error: > > == 2025-05-12 22:17:33,931 easyblock.py:4251 INFO Running method > cleanup_step part of step cleanup > == 2025-05-12 22:17:33,932 easyblock.py:3978 INFO Cleaning up > builddir > /trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1 > (in /trinity/home/build/slurm) > == 2025-05-12 22:17:38,935 filetools.py:1853 INFO Adjusting > permissions recursively for > /trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1 > == 2025-05-12 23:58:00,825 filetools.py:1853 INFO Adjusting > permissions recursively for > /trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1 > == 2025-05-13 02:42:45,424 build_log.py:226 ERROR EasyBuild > encountered an error (at > easybuild/software/EasyBuild/5.0.0/lib/python3.6/site-packages/easybuild/base/exceptions.py:126 > in __init__): Failed to chmod/chown several paths: > ['/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/install/f1ec268a484023c283bf4c5d46927af2/.nfs000000010844b19b000003b0', > '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/install/f1ec268a484023c283bf4c5d46927af2/.nfs000000010844b0fa000003af', > '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/command_port', > '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/server_info.rawproto', > '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/request_cookie', > 
'/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/response_cookie', > '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/server.pid.txt'] > (last error: [Errno 2] No such file or directory: > '/trinity/shared/easybuild/build/TensorFlow/2.15.1/foss-2023a-CUDA-12.1.1/TensorFlow/bazel-root/c8200e0e7497c598b69103a2f9e76764/server/server.pid.txt') > (at > easybuild/software/EasyBuild/5.0.0/lib/python3.6/site-packages/easybuild/tools/filetools.py:1925 > in adjust_permissions) > == 2025-05-13 02:42:47,086 build_log.py:322 INFO ... (took 4 hours 25 mins > 13 secs) > > Is this potentially connected with having the build directory on an NFS > share?
Looks like the problem was caused by the underlying file system running out
of inodes. Building TensorFlow seems to create a lot of files 😬

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
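
In case it helps anyone hitting the same thing: inode exhaustion can be
checked before starting a long build, either with `df -i` on the build
directory or from Python. The following is only a minimal sketch using the
standard library's os.statvfs (POSIX only); the helper name inode_usage is
made up for illustration and is not part of EasyBuild. Some file systems
report a total of 0 inodes, in which case no usage fraction can be computed.

```python
import os


def inode_usage(path):
    """Return inode counts for the file system holding `path`.

    Hypothetical helper for illustration; not an EasyBuild function.
    """
    st = os.statvfs(path)      # POSIX only
    total = st.f_files         # total inodes (0 on some file systems)
    free = st.f_ffree          # free inodes
    used = total - free
    # Avoid dividing by zero when the file system reports no inode total.
    fraction = used / total if total else None
    return {
        "total": total,
        "free": free,
        "used": used,
        "used_fraction": fraction,
    }
```

For example, inode_usage("/trinity/shared/easybuild/build") would report the
counts for the file system behind the build directory from the log above; a
used_fraction close to 1.0 would suggest the build is likely to run out.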

