We have a new set of hardware that we are configuring as a Lustre file system.
We are having a problem with MGS failover and could use some help.  The file
system was originally formatted with Lustre 2.8, but we have since upgraded to
2.9.  We are using a JBOD with server pairs for failover, with ZFS as the
backend.  All servers are dual-homed on both Ethernet and IB.  The combined
MGS/MDS is at X.X.X.30 (or .31 for the failover node), and the MDT was
formatted as:


     mkfs.lustre \
         --fsname=hpfs-fsl \
         --backfstype=zfs \
         --reformat \
         --verbose \
         --mgs --mdt --index=0 \
         --servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0 --servicenode=${LUSTRE_PEER_TCP_IP}@tcp0 \
         --servicenode=${LUSTRE_LOCAL_IB_IP}@o2ib0 --servicenode=${LUSTRE_PEER_IB_IP}@o2ib0 \
         metadata/meta-fsl


And the OSTs were formatted as:

        mkfs.lustre \
            --mgsnode=xxx.xxx.98.30@tcp0,xxx.xxx.0.30@o2ib0 \
            --fsname=hpfs-fsl \
            --backfstype=zfs \
            --reformat \
            --verbose \
            --ost --index=$num \
            --servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0 --servicenode=${LUSTRE_PEER_TCP_IP}@tcp0 \
            --servicenode=${LUSTRE_LOCAL_IB_IP}@o2ib0 --servicenode=${LUSTRE_PEER_IB_IP}@o2ib0 \
            $pool/ost-fsl



We realize now that there are a couple of mistakes in the above.  First, it
would have been better to put each tcp0/o2ib0 pair in the same --servicenode
option as a comma-separated list (for both the MDT and the OSTs).  Our clients
are on only one of the networks, so I don’t think this is a big problem.  The
second (bigger) problem is that we left out the failover MGS node in the
mkfs.lustre command when the OSTs were formatted.  To correct this we used the
following:

       tunefs.lustre \
           --verbose \
           --force-nohostid \
           --mgsnode=xxx.xxx.98.31@tcp0,xxx.xxx.0.31@o2ib0 \
           $pool/ost-fsl
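
As an aside on the first point: the comma-separated form we have in mind would
put both NIDs of one server into a single --servicenode option.  The sketch
below only prints the options and does not run mkfs.lustre.  servicenode_arg
is a hypothetical helper name, and the addresses are the masked ones used
elsewhere in this mail.

```shell
# Hypothetical helper: join all NIDs of ONE server into a single
# comma-separated --servicenode option. Runs in a subshell so the
# IFS change does not leak into the caller.
servicenode_arg() (
    IFS=,
    printf '%s%s\n' '--servicenode=' "$*"
)

# One option per server: primary first, then its failover peer.
servicenode_arg "xxx.xxx.98.30@tcp0" "xxx.xxx.0.30@o2ib0"
servicenode_arg "xxx.xxx.98.31@tcp0" "xxx.xxx.0.31@o2ib0"
```

This is not what we ran; it only illustrates the grouping we now think would
have been cleaner.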


I think the tunefs.lustre change worked: before running it, a “zfs get all |
grep mgs” showed this:

oss00-0/ost-fsl             lustre:mgsnode        xxx.xxx.98.30@tcp,xxx.xxx.0.30@o2ib  local

And afterward it shows this:

oss00-0/ost-fsl             lustre:mgsnode        xxx.xxx.98.30@tcp,xxx.xxx.0.30@o2ib:xxx.xxx.98.31@tcp,xxx.xxx.0.31@o2ib  local
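
To make the new value easier to read, here is a small sketch that splits it
into one MGS node per line.  It assumes, as we understand the format, that
colons separate the failover MGS nodes and commas separate the NIDs of a
single node.  list_mgs_nodes is a hypothetical helper name.

```shell
# Hypothetical helper: print one MGS node (with all of its NIDs) per line
# from a lustre:mgsnode property value passed as the first argument.
list_mgs_nodes() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# On a live server the value could be read with (not run here):
#   zfs get -H -o value lustre:mgsnode oss00-0/ost-fsl
value='xxx.xxx.98.30@tcp,xxx.xxx.0.30@o2ib:xxx.xxx.98.31@tcp,xxx.xxx.0.31@o2ib'
list_mgs_nodes "$value"
```

With the value shown above, this should list the .30 node first and the .31
node second, which is the order we intended.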


OST failover seems to work great: clients pick up again with no problems and
the logs on the servers don’t report any issues.  The MDT/MGC failover doesn’t
go as well.  The clients seem to do just fine, but the OSS logs start reporting
this:

Jan  4 11:24:42 hpfs-fsl-oss00 kernel: Lustre: 15713:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1483550635/real 1483550635]  req@ffff8807dc9f6300 x1555089580422192/t0(0) o250->MGCxxx.xxx.98.30@[email protected]@tcp:26/25 lens 520/544 e 0 to 1 dl 1483550681 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1


And ‘lctl dl’ on an OSS continues to show the primary MGC connection:



[root@hpfs-fsl-oss00 ~]# lctl dl
  0 UP osd-zfs hpfs-fsl-OST0000-osd hpfs-fsl-OST0000-osd_UUID 5
  1 UP mgc MGCxxx.xxx.98.30@tcp 6832efc6-4cc6-cd22-9d48-f7bc31d8930c 5
  2 UP ost OSS OSS_uuid 3
  3 UP obdfilter hpfs-fsl-OST0000 hpfs-fsl-OST0000_UUID 27
  4 UP lwp hpfs-fsl-MDT0000-lwp-OST0000 hpfs-fsl-MDT0000-lwp-OST0000_UUID 5
[root@hpfs-fsl-oss00 ~]#





We have transferred a lot of data to this LFS in preparation for going into
production, so we’d like to avoid reformatting it if possible, but that is
an option if needed.  Are we still missing something from the initial
mkfs.lustre missteps, or is there something else we are missing?

Thanks
Darby



_______________________________________________
lustre-discuss mailing list
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
