So we have a cluster with one MGT and two MDTs.  Each is reachable via a NID
on o2ib and on tcp, and each is dual-connected to 2 MDSs.  We created the MGT
and MDTs with the following commands:
mkfs.lustre --mgs --reformat --failnode=10.4.20...@o2ib,10.4.20...@o2ib \
    --failnode=10.4.20...@tcp0,10.4.20...@tcp0 /dev/dm-0
mkfs.lustre --mdt --mgsnode=10.4.20...@o2ib --fsname=lrc --reformat \
    --failnode=10.4.20...@o2ib,10.4.20...@o2ib,10.4.20...@tcp0,10.4.20...@tcp0 \
    /dev/dm-1
mkfs.lustre --mdt --mgsnode=10.4.20...@o2ib --fsname=nano --reformat \
    --failnode=10.4.20...@o2ib,10.4.20...@o2ib,10.4.20...@tcp0,10.4.20...@tcp0 \
    /dev/dm-2

The host cluster starts and mounts the LUNs just fine.  I mount TCP-connected
clients with both MGS NIDs specified.  On failover, the client follows the MGT
to the secondary MDS just fine, but it never reconnects to the MDT.  It just
keeps retrying the old MDS NIDs:
Lustre: Changing connection for lrc-MDT0000-mdc-ffff8101d57ad400 to 10.4.20...@o2ib/10.0.20...@tcp
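
For reference, the client mount is of the usual two-MGS form, roughly as
sketched below.  The NIDs here are placeholders standing in for the real
addresses, the /mnt/lrc mount point is made up for illustration, and the
helper only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: assemble a Lustre client mount command that names two MGS NIDs.
# Colon-separated MGS NIDs give the client a failover MGS to try.
# All arguments used below are placeholders, not the real addresses.
client_mount_cmd() {
    mgs_primary=$1     # NID of the primary MGS (placeholder)
    mgs_failover=$2    # NID of the failover MGS (placeholder)
    fsname=$3          # filesystem name, e.g. lrc
    mountpoint=$4      # hypothetical client mount point
    printf 'mount -t lustre %s:%s:/%s %s\n' \
        "$mgs_primary" "$mgs_failover" "$fsname" "$mountpoint"
}

# Print (do not execute) the mount command for the lrc filesystem.
client_mount_cmd "MGS1_NID@tcp0" "MGS2_NID@tcp0" lrc /mnt/lrc
```

The nano filesystem is mounted the same way with its own fsname.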

Ideas?
----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
