Hi,

Our MDS refuses to start after we tried to enable quotas:


What we did:
 # umount /lustre/mds
 # tunefs.lustre --param mdt.quota_type=ug /dev/md10 (as described in 
http://wiki.lustre.org/manual/LustreManual18_HTML/ConfiguringQuotas.html)
 # sync
 # mount -t lustre /dev/md10 /lustre/mds
---> at this point, the mds crashed <---

Now the MDS refuses to start up:

Lustre: OBD class driver, http://www.lustre.org/
Lustre:     Lustre Version: 1.8.4
Lustre:     Build Version: 
1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
Lustre: Listener bound to ib0:10.201.62.11:987:mlx4_0
Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1
Lustre: Added LNI 10.201.62...@o2ib [8/64/0/180]
Lustre: Added LNI 10.201.30...@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
init dynlocks cache
ldiskfs created from ext3-2.6-rhel5
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: MGS MGS started
Lustre: mgc10.201.62...@o2ib: Reactivating import
Lustre: Denying initial registration attempt from nid 10.201.62...@o2ib, 
specified as failover
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect 
(no target)
LustreError: 6440:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@ffff81021986a000 x1352839800570911/t0 o38-><?>@<?>:0/0 lens 
368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect 
(no target)
LustreError: Skipped 1 previous similar message
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@ffff81021986ac00 x1352839303546603/t0 o38-><?>@<?>:0/0 lens 
368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 1 
previous similar message
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect 
(no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@ffff8101ee758400 x1352840769468288/t0 o38-><?>@<?>:0/0 lens 
368/0 e 0 to 0 dl 1290181454 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 17 
previous similar messages
LustreError: 6423:0:(mgs_handler.c:671:mgs_handle()) MGS handle cmd=253 rc=-99
LustreError: 11-0: an error occurred while communicating with 0...@lo. The 
mgs_target_reg operation failed with -99
LustreError: 6177:0:(obd_mount.c:1097:server_start_targets()) Required 
registration failed for lustre1-MDT0000: -99
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect 
(no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@ffff8101ea921800 x1352839510145001/t0 o38-><?>@<?>:0/0 lens 
368/0 e 0 to 0 dl 1290181455 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 18 
previous similar messages
LustreError: 6177:0:(obd_mount.c:1655:server_fill_super()) Unable to start 
targets: -99
LustreError: 6177:0:(obd_mount.c:1438:server_put_super()) no obd lustre1-MDT0000
LustreError: 6177:0:(obd_mount.c:147:server_deregister_mount()) lustre1-MDT0000 
not registered
Lustre: MGS has stopped.
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect 
(no target)
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@ffff8101ec658000 x1352839459803293/t0 o38-><?>@<?>:0/0 lens 
368/0 e 0 to 0 dl 1290181457 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 50 
previous similar messages
LustreError: Skipped 58 previous similar messages
Lustre: server umount lustre1-MDT0000 complete
LustreError: 6177:0:(obd_mount.c:2050:lustre_fill_super()) Unable to
mount  (-99)
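
For reference, the numeric codes in the log above look like negated Linux errno values. A minimal sketch for decoding them, assuming the standard Linux numbering (19 = ENODEV, 99 = EADDRNOTAVAIL); the table and the helper name decode_rc are illustrative only and not part of any Lustre tool:

```python
import re

# Assumed Linux errno numbering; other platforms number these differently.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    99: ("EADDRNOTAVAIL", "Cannot assign requested address"),
}

# A minus sign followed by digits, not glued to a preceding word character,
# so version strings such as "1.8.4-2010..." or message ids like "137-5" are skipped.
NEGATIVE_CODE = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_rc(line):
    """Return a list of (code, name, description) for negative codes in a log line."""
    decoded = []
    for match in NEGATIVE_CODE.finditer(line):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("unknown", "not in the table above"))
        decoded.append((code, name, text))
    return decoded


if __name__ == "__main__":
    sample = "LustreError: 6423:0:(mgs_handler.c:671:mgs_handle()) MGS handle cmd=253 rc=-99"
    for code, name, text in decode_rc(sample):
        print(f"-{code}: {name} ({text})")
```

Under that assumption, the -19 replies sent to connecting clients would be ENODEV (consistent with "no target"), and the -99 from mgs_target_reg would be EADDRNOTAVAIL.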


Removing the quota params via
 # tunefs.lustre --erase-params 
--param="failover.node=10.201.62...@o2ib,10.201.30...@tcp 
failover.node=10.201.62...@o2ib,10.201.30...@tcp 
mdt.group_upcall=/usr/sbin/l_getgroups" /dev/md10

did not help.


So what exactly does 'Lustre: Denying initial registration attempt from nid 
10.201.62...@o2ib, specified as failover' mean?
This node *is* 10.201.62.11, and tunefs.lustre shows:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre1-MDT0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x45
              (MDT MGS update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=10.201.62...@o2ib,10.201.30...@tcp 
failover.node=10.201.62...@o2ib,10.201.30...@tcp 
mdt.group_upcall=/usr/sbin/l_getgroups
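
As an aside, the Parameters line above lists the same failover.node entry twice. A minimal sketch for spotting repeated key=value entries in such a line; the helper name repeated_params is hypothetical, the NIDs are copied in their truncated form exactly as shown above, and whether the duplicate matters to the MGS is not something this output settles:

```python
from collections import Counter


def repeated_params(parameters):
    """Return the whitespace-separated entries that occur more than once."""
    counts = Counter(parameters.split())
    return [entry for entry, seen in counts.items() if seen > 1]


if __name__ == "__main__":
    # Parameters string as printed by tunefs.lustre above (NIDs left truncated).
    params = (
        "failover.node=10.201.62...@o2ib,10.201.30...@tcp "
        "failover.node=10.201.62...@o2ib,10.201.30...@tcp "
        "mdt.group_upcall=/usr/sbin/l_getgroups"
    )
    for entry in repeated_params(params):
        print("repeated:", entry)
```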




Regards,
 Adrian
