I tried using tunefs.lustre to re-set the failover parameter for my OST (in the tunefs.lustre --dryrun output I could see the parameter), but it didn't help. Does anyone else have any ideas?
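(For the record, the commands I ran were along these lines -- /dev/drbd6 is our drbd volume and 192.168.1.67@tcp is our backup OSS; treat this as a sketch rather than exact history:)

```shell
# Show the label and parameters currently stored on the OST device,
# without changing anything on disk.
tunefs.lustre --dryrun /dev/drbd6

# Re-declare the failover NID and ask for the configuration logs to be
# regenerated on the next mount (the filesystem has to be stopped first).
tunefs.lustre --failnode=192.168.1.67@tcp --writeconf /dev/drbd6
```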
Thank you in advance!

On Thu, Nov 19, 2009 at 5:33 AM, Dam Thanh Tung <[email protected]> wrote:
> On Thu, Nov 19, 2009 at 2:00 AM, <[email protected]> wrote:
>>
>> Today's Topics:
>>
>>    1. MDS doesn't switch to failover OST node (Dam Thanh Tung)
>>    2. Re: MDS doesn't switch to failover OST node (Brian J. Murrell)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 18 Nov 2009 22:54:28 +0700
>> From: Dam Thanh Tung <[email protected]>
>> Subject: [Lustre-discuss] MDS doesn't switch to failover OST node
>> To: [email protected]
>>
>> Hi list,
>>
>> I am encountering a problem with the OST-MDS connection. Because of a
>> RAID card hang, our OST went down this morning, and when I tried to
>> mount the failover node of that OST, a problem occurred:
>>
>> The MDS only sent requests to the OST that was down and did not connect
>> to our backup (failover) OST, so our backup solution was useless and we
>> lost access to all the data on that OST. It is really a disaster for me,
>> because we lost all of our data once before with the same kind of
>> problem: an OST that couldn't connect to the MDS.
>>
>> We use drbd between the OSTs to synchronize data.
>> The backup (failover) node was mounted successfully, without any error,
>> but it did not have any clients to recover, like this:
>>
>> cat /proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status
>> status: RECOVERING
>> recovery_start: 0
>> time_remaining: 0
>> connected_clients: 0/1
>> delayed_clients: 0/1
>> completed_clients: 0/1
>> replayed_requests: 0/??
>> queued_requests: 0
>> next_transno: 30064771073
>>
>> In the MDS's message log, we only saw requests to our dead OST:
>>
>> Nov 18 22:44:03 MDS1 kernel: Lustre: Request x1314965674069373 sent from
>> lustre-OST0006-osc to NID 192.168.1...@tcp 56s ago has timed out (limit
>> 56s).
>> ......
>>
>> The output of the lctl dl command on the MDS:
>>
>> lctl dl
>>   0 UP mgs MGS MGS 25
>>   1 UP mgc MGC192.168.1...@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
>>   2 UP mdt MDS MDS_uuid 3
>>   3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
>>   4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
>>   5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>>   6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
>>   7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
>>   8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
>>   9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
>>
>> I did activate OST0006 (lctl --device 7 activate), but it didn't help.
>>
>> Could anyone tell me how to point the MDS at our backup OST (with IP
>> address 192.168.1.67, for example), to bring our OST back up?
>>
>> Any help would be really appreciated! I hope I can receive your answers
>> or suggestions as soon as possible.
>>
>> Best Regards
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 18 Nov 2009 11:10:51 -0500
>> From: "Brian J. Murrell" <[email protected]>
>> Subject: Re: [Lustre-discuss] MDS doesn't switch to failover OST node
>> To: [email protected]
>>
>> On Wed, 2009-11-18 at 22:54 +0700, Dam Thanh Tung wrote:
>> > Hi list
>>
>> Hi,
>>
>> > MDS only sent request to the OST which was down and didn't connect to
>> > our backup (failover) OST, so our backup solution was useless, we lost
>> > all data from that OST.
>
> Hi Brian,
>
> Thank you for your fast reply.
>
>> I don't think you have actually lost any data. It's there. Your
>> clients (which the MDS is) just don't know to use the failover OSS that
>> you have set up (but not told Lustre about).
>>
>> > It's really a disaster for me because we even lost all of our data
>> > before with the same kind of problem: OST can't connect to MDS !!!!
>>
>> Failures to connect between nodes do not result in data loss. The
>> data is still there. You just need to have your clients access it.
>
> I know that the data is still there, but I say "lost" because I can no
> longer access it.
>
> On our clients, we mounted with parameters like this:
>
> mount -t lustre -o flock 192.168.1...@tcp:192.168.1...@tcp:/lustre /mnt/lustre/
>
> We didn't umount our clients; we just deactivated the dead OST and, after
> mounting the backup one, activated it again. But because the MDS couldn't
> connect to the backup (failover) OST and receive any information from it,
> the clients are in the same state.
>
>> > Could anyone tell me how to route MDS to connect to our backup OST
>> > ( with ip address 192.168.1.67 , for example ) ? , to bring our OST
>> > up ?
>>
>> It sounds like you need to review the failover section of the manual.
>>
>> In summary, you need to tell the clients about failover nodes
>> (--failnode) when you create the filesystem. You can add this feature
>> after-the-fact with tunefs.lustre.
>
> On our OST, before it went down because of the RAID card hang, we had
> created it with:
>
> mkfs.lustre --ost --mgsnode=192.168.1...@tcp --mgsnode=192.168.1...@tcp \
>     --failover=192.168.1.66@tcp --index=6 --verbose --writeconf /dev/drbd6
>
> Could you please give some suggestions? Do I need to provide any more
> information?
>
> Many thanks
>
>> b.
>>
>> ------------------------------
>>
>> End of Lustre-discuss Digest, Vol 46, Issue 33
>> **********************************************
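(Looking at the man page again, I see the option documented for mkfs.lustre is --failnode, not --failover -- perhaps that is part of the problem. With --failnode, the formatting command would presumably have looked something like this; <mgs1-nid> and <mgs2-nid> stand in for our real MGS addresses, which are truncated above:)

```shell
# Format the OST, declaring both MGS NIDs and the partner that should
# take over this target on failover (192.168.1.66@tcp, the drbd peer).
mkfs.lustre --ost \
    --mgsnode=<mgs1-nid> --mgsnode=<mgs2-nid> \
    --failnode=192.168.1.66@tcp \
    --index=6 --verbose /dev/drbd6
```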
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
