I have the beginnings of a Lustre filesystem, with a server, mds, hosting the MGS and MDS, and a storage node, oss1. The disks /mgt and /mdt on mds, and /ost on oss1, all mount, apparently without error.

I have set up a client, pxe, which mounts /lustre:

root@node080027eb24b8:~# mount -t lustre mds@tcp:/comind /lustre

This appears to succeed. From dmesg on pxe:

...
[Wed Feb 21 10:54:59 2024] libcfs: loading out-of-tree module taints kernel.
[Wed Feb 21 10:54:59 2024] libcfs: module verification failed: signature and/or required key missing - tainting kernel
[Wed Feb 21 10:54:59 2024] LNet: HW NUMA nodes: 1, HW CPU cores: 1, npartitions: 1
[Wed Feb 21 10:54:59 2024] alg: No test for adler32 (adler32-zlib)
[Wed Feb 21 10:55:00 2024] Key type ._llcrypt registered
[Wed Feb 21 10:55:00 2024] Key type .llcrypt registered
[Wed Feb 21 10:55:00 2024] Lustre: Lustre: Build Version: 2.15.4
[Wed Feb 21 10:55:00 2024] LNet: Added LNI 192.168.50.13@tcp [8/256/0/180]
[Wed Feb 21 10:55:00 2024] LNet: Accept secure, port 988
[Wed Feb 21 10:55:02 2024] Lustre: Mounted comind-client

After several attempts, I have managed to create a file (or at least a directory entry):

root@node080027eb24b8:~# ls /lustre
test

However, anything that tries to access a file in /lustre (e.g. 'ls -l') hangs indefinitely. I suspect it is waiting for a response on a network socket. An strace shows:

root@node080027eb24b8:~# strace -f /usr/bin/cat /lustre/test
...
fstat(3, {st_mode=S_IFREG|0644, st_size=346132, ...}) = 0
mmap(NULL, 346132, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb3d0994000
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
openat(AT_FDCWD, "/lustre/test", O_RDONLY) = 3
fstat(3,

I see no new messages in dmesg on pxe or oss1, but this appears on mds:

...
[Wed Feb 21 10:50:06 2024] LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Wed Feb 21 10:50:44 2024] LDISKFS-fs (sda): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Wed Feb 21 10:50:44 2024] Lustre: comind-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[Wed Feb 21 10:51:15 2024] Lustre: comind-OST0000-osc-MDT0000: Connection restored to (at 192.168.50.130@tcp)
[Wed Feb 21 10:57:04 2024] Lustre: comind-MDT0000: haven't heard from client 83befb6d-7ee2-4acb-997c-b15520dcb70d (at 192.168.50.13@tcp) in 240 seconds. I think it's dead, and I am evicting it. exp 00000000ddc96899, cur 1708513026 expire 1708512876 last 1708512786
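The eviction message ends with raw cur/expire/last values. Assuming these are Unix epoch seconds (they decode to the same date as the dmesg timestamps), a small sketch like the following turns them into readable UTC times and checks the interval. The function name eviction_times is purely illustrative.

```python
import re
from datetime import datetime, timezone

# The eviction message from the mds dmesg output above, copied verbatim.
LINE = ("comind-MDT0000: haven't heard from client "
        "83befb6d-7ee2-4acb-997c-b15520dcb70d (at 192.168.50.13@tcp) "
        "in 240 seconds. I think it's dead, and I am evicting it. "
        "exp 00000000ddc96899, cur 1708513026 expire 1708512876 "
        "last 1708512786")

def eviction_times(line):
    """Return the cur/expire/last fields of an eviction message as ints."""
    return {k: int(v) for k, v in re.findall(r"\b(cur|expire|last) (\d+)", line)}

t = eviction_times(LINE)
for name in ("last", "expire", "cur"):
    # Assumption: each field is seconds since the Unix epoch; shown in UTC.
    print(name, datetime.fromtimestamp(t[name], tz=timezone.utc).isoformat())
print("cur - last =", t["cur"] - t["last"], "seconds")
```

This prints last 2024-02-21T10:53:06, expire 10:54:36 and cur 10:57:06 (all UTC), and cur - last = 240 seconds, matching the "240 seconds" in the message. It may be worth comparing the decoded "last" time with the mount time in pxe's dmesg to see whether the two nodes' clocks agree.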


So something is wrong somewhere in the communication from pxe to mds, but what?
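One thing that can be ruled in or out quickly is plain TCP reachability of the LNet acceptor port (pxe's dmesg shows "Accept secure, port 988"). The sketch below only tests whether a TCP connection to host:988 can be opened, for example to spot a firewall; it says nothing about LNet itself, for which lctl ping with the server's NID is the proper check. The function name tcp_reachable is hypothetical, and running it in both directions (pxe to mds, and mds to 192.168.50.13) is an assumption about what is worth checking.

```python
import socket

LNET_PORT = 988  # acceptor port shown in pxe's dmesg ("Accept secure, port 988")

def tcp_reachable(host, port=LNET_PORT, timeout=5.0):
    """Return True if a plain TCP connection to host:port opens within timeout.

    Name-resolution failures, refusals and timeouts are all OSError
    subclasses, so any of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_reachable("mds") on pxe, or tcp_reachable("192.168.50.13") on mds; a False in either direction would point at the network or firewall rather than at Lustre.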


_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
  • [lustre-discuss] ope... Jan Andersen via lustre-discuss