Hello,

As part of my continuing effort to benchmark Lustre and ascertain what may be best suited for our needs here, I have re-created, at the LSI 8888ELP card level, some of the arrays from my earlier benchmark posts. The card is now presenting /dev/sdf (998999 MB, 128 kB stripe size) and /dev/sdg (6992995 MB, 128 kB stripe size) to my OSS. Both sdf and sdg formatted fine with Lustre and mounted without issue on the OSS.

Recycling the MDT on the MGS seems to have been a problem. When I mounted the MDT on the MGS after mounting the new OSTs, the mounts completed without error, but the bonnie benchmark test, run as before, would hang every time.
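For reference, if I recall the exact invocations right, the OSTs were formatted with commands of this form (the --mgsnode NID below is a placeholder; our real o2ib address is elided, and the fsname matches the crew5 labels in the logs):

```shell
# Format the two new OSTs for filesystem "crew5"; 10.0.0.1@o2ib stands in
# for our actual MGS NID.
mkfs.lustre --fsname=crew5 --ost --mgsnode=10.0.0.1@o2ib /dev/sdf
mkfs.lustre --fsname=crew5 --ost --mgsnode=10.0.0.1@o2ib /dev/sdg
```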
Sample of errors in /var/log/messages on the MGS:

Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Aug 13 12:39:30 mds1 kernel: LustreError: 167-0: This client was evicted by crew5-OST0001; in progress operations using this service will fail.
Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection restored to service crew5-OST0001 using nid [EMAIL PROTECTED]
Aug 13 12:39:30 mds1 kernel: Lustre: MDS crew5-MDT0000: crew5-OST0001_UUID now active, resetting orphans
Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) crew5-MDT0000: 50b043bb-0e8c-7a5b-b0fe-6bdb67d21e0b reconnecting
Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 24 previous similar messages
Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) crew5-MDT0000: refuse reconnection from [EMAIL PROTECTED]@o2ib to 0xffff81006994d000; still busy with 2 active RPCs
Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) Skipped 24 previous similar messages
Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x600107/t0 o38->[EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 24 previous similar messages
Aug 13 12:43:40 mds1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -19
Aug 13 12:43:40 mds1 kernel: LustreError: Skipped 7 previous similar messages
Aug 13 12:47:50 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.

Sample of errors in /var/log/messages on the OSS:

Aug 13 12:39:30 oss4 kernel: Lustre: crew5-OST0001: received MDS connection from [EMAIL PROTECTED]
Aug 13 12:43:57 oss4 kernel: Lustre: crew5-OST0001: haven't heard from client crew5-mdtlov_UUID (at [EMAIL PROTECTED]) in 267 seconds. I think it's dead, and I am evicting it.
Aug 13 12:46:27 oss4 kernel: LustreError: 137-5: UUID 'crew5-OST0000_UUID' is not available for connect (no target)
Aug 13 12:46:27 oss4 kernel: LustreError: Skipped 51 previous similar messages
Aug 13 12:46:27 oss4 kernel: LustreError: 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] x600171/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Aug 13 12:46:27 oss4 kernel: LustreError: 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 52 previous similar messages
Aug 13 12:47:50 oss4 kernel: Lustre: crew5-OST0001: received MDS connection from [EMAIL PROTECTED]

In lctl, all pings were successful. Additionally, files on the Lustre disks of our live system, which uses the same MGS, were all fine, with no errors in the log file.

I thought that perhaps changing the disk stripe size and reformatting the OSTs without reformatting the MDT was the problem, so I unmounted the OSTs and the MDT and reformatted the MDT on the MGS. All okay. The OSTs remounted without error, but the MDT on the MGS will not remount:

[EMAIL PROTECTED] ~]# mount -t lustre /dev/sdg1 /srv/lustre/mds/crew5-MDT0000
mount.lustre: mount /dev/sdg1 at /srv/lustre/mds/crew5-MDT0000 failed: Address already in use
The target service's index is already in use. (/dev/sdg1)

Again, the live systems on the MGS are still fine.
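If my reading of the tunefs.lustre man page is right, checking what the device thinks its index is, and regenerating the configuration logs, would be invoked like this (device path from my setup; I have not been able to confirm this behaves as described on 1.6.4):

```shell
# Show the label, index and parameters currently stored on the MDT device;
# --print only reads the device, it does not modify anything.
tunefs.lustre --print /dev/sdg1

# Ask the MGS to regenerate the configuration logs at next mount;
# --writeconf is a bare flag, and the device comes last on the line.
tunefs.lustre --writeconf /dev/sdg1
```

My understanding is that --writeconf takes no argument, so a form like "--writeconf=/dev/sdg1" would be unparseable, which could explain tunefs.lustre only reprinting its help.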
A web search for the error suggested I try "tunefs.lustre --reformat --index=0 --writeconf=/dev/sdg1", but I was unable to find a syntax for that command that would run for me. I tried various --index= values and tried adding /dev/sdg1 at the end of the line, but each attempt failed and only reprinted the help, without indicating which part of what I typed was unparseable.

My current thought is that a 128 kB stripe size from the LSI 8888ELP card is not usable on my Lustre 1.6.4 system. That does not seem to be an accurate statement from what I have read of Lustre, but it seems to be what is occurring on my systems. I will test one more time back at a 64 kB stripe size.

megan
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss