Hi,
Our MDS refuses to start after we tried to enable quotas.

What we did:

# umount /lustre/mds
# tunefs.lustre --param mdt.quota_type=ug /dev/md10
(as described in http://wiki.lustre.org/manual/LustreManual18_HTML/ConfiguringQuotas.html)
# sync
# mount -t lustre /dev/md10 /lustre/mds
---> at this point, the mds crashed <---

Now the MDS refuses to start up:

Lustre: OBD class driver, http://www.lustre.org/
Lustre: Lustre Version: 1.8.4
Lustre: Build Version: 1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
Lustre: Listener bound to ib0:10.201.62.11:987:mlx4_0
Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1
Lustre: Added LNI 10.201.62...@o2ib [8/64/0/180]
Lustre: Added LNI 10.201.30...@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
init dynlocks cache
ldiskfs created from ext3-2.6-rhel5
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: MGS MGS started
Lustre: mgc10.201.62...@o2ib: Reactivating import
Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available for connect (no target)
LustreError: 6440:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@ffff81021986a000 x1352839800570911/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available for connect (no target)
LustreError: Skipped 1 previous similar message
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@ffff81021986ac00 x1352839303546603/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 1 previous similar message
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available for connect (no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@ffff8101ee758400 x1352840769468288/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181454 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 17 previous similar messages
LustreError: 6423:0:(mgs_handler.c:671:mgs_handle()) MGS handle cmd=253 rc=-99
LustreError: 11-0: an error occurred while communicating with 0...@lo. The mgs_target_reg operation failed with -99
LustreError: 6177:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for lustre1-MDT0000: -99
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available for connect (no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@ffff8101ea921800 x1352839510145001/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181455 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 18 previous similar messages
LustreError: 6177:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -99
LustreError: 6177:0:(obd_mount.c:1438:server_put_super()) no obd lustre1-MDT0000
LustreError: 6177:0:(obd_mount.c:147:server_deregister_mount()) lustre1-MDT0000 not registered
Lustre: MGS has stopped.
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available for connect (no target)
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@ffff8101ec658000 x1352839459803293/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181457 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 50 previous similar messages
LustreError: Skipped 58 previous similar messages
Lustre: server umount lustre1-MDT0000 complete
LustreError: 6177:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-99)

Removing the quota params via

# tunefs.lustre --erase-params --param="failover.node=10.201.62...@o2ib,10.201.30...@tcp failover.node=10.201.62...@o2ib,10.201.30...@tcp mdt.group_upcall=/usr/sbin/l_getgroups" /dev/md10

did not help.

So what exactly does 'Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, specified as failover' mean?
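
As a reading aid for the log above: the numeric codes it reports are negated Linux errno values, so -19 is ENODEV ("No such device") and -99 is EADDRNOTAVAIL ("Cannot assign requested address"). A minimal Python sketch that pulls these codes out of such log lines follows. The function name, the regular expression, and the two-entry table are illustrative assumptions (standard Linux errno numbering is assumed); none of this is part of Lustre.

```python
import re

# Assumption: standard Linux errno numbering. Only the two codes that
# appear in the log above are listed; extend the table as needed.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    99: ("EADDRNOTAVAIL", "Cannot assign requested address"),
}

# Hypothetical pattern covering the forms seen in the log:
# "rc -19", "rc=-99", "(-99)", "with -99" and ": -99".
RC_PATTERN = re.compile(r"(?:rc[ =]|with |\(|: )-(\d+)\b")


def decode_return_codes(line):
    """Return (code, name, description) for each negative code found in a log line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        number = int(match.group(1))
        name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in table"))
        results.append((-number, name, text))
    return results


if __name__ == "__main__":
    sample = "LustreError: 6177:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-99)"
    for code, name, text in decode_return_codes(sample):
        print(f"{code}: {name} ({text})")
```

The table is hard-coded rather than taken from Python's errno module so that the result does not depend on the platform the script is run on.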

This *is* 10.201.62.11 and tunefs shows:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: lustre1-MDT0000
Index: 0
Lustre FS: lustre1
Mount type: ldiskfs
Flags: 0x45 (MDT MGS update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=10.201.62...@o2ib,10.201.30...@tcp failover.node=10.201.62...@o2ib,10.201.30...@tcp mdt.group_upcall=/usr/sbin/l_getgroups

Regards,
Adrian
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss