On Thu, Nov 19, 2009 at 2:00 AM, <[email protected]> wrote:
> Today's Topics:
>
>    1. MDS doesn't switch to failover OST node (Dam Thanh Tung)
>    2. Re: MDS doesn't switch to failover OST node (Brian J. Murrell)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 18 Nov 2009 22:54:28 +0700
> From: Dam Thanh Tung <[email protected]>
> Subject: [Lustre-discuss] MDS doesn't switch to failover OST node
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi list,
>
> I am encountering a problem with the OST-MDS connection. Because of a
> RAID card hang, our OST went down this morning, and when I tried to
> mount the failover node for that OST, this problem occurred:
>
> The MDS only sent requests to the OST that was down and didn't connect
> to our backup (failover) OST, so our backup solution was useless: we
> lost all data from that OST. It's really a disaster for me, because we
> even lost all of our data before with the same kind of problem: the OST
> can't connect to the MDS!
>
> We use drbd between OSTs to synchronize data. The backup (failover)
> node mounted successfully without any error, but it didn't have any
> clients to recover, like this:
>
> cat /proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status
> status: RECOVERING
> recovery_start: 0
> time_remaining: 0
> connected_clients: 0/1
> delayed_clients: 0/1
> completed_clients: 0/1
> replayed_requests: 0/??
> queued_requests: 0
> next_transno: 30064771073
>
> In the MDS's message log, we only saw connection attempts to our dead
> OST:
>
> Nov 18 22:44:03 MDS1 kernel: Lustre: Request x1314965674069373 sent from
> lustre-OST0006-osc to NID 192.168.1...@tcp 56s ago has timed out (limit
> 56s).
> ......
>
> The output of the lctl dl command on the MDS:
>
> lctl dl
>   0 UP mgs MGS MGS 25
>   1 UP mgc mgc192.168.1...@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
>   2 UP mdt MDS MDS_uuid 3
>   3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
>   4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
>   5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>   6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
>   7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
>   8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
>   9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
>
> I did activate OST6 (lctl --device 7 activate), but it didn't help.
>
> Could anyone tell me how to make the MDS connect to our backup OST
> (with IP address 192.168.1.67, for example) and bring our OST back up?
>
> Any help would be really appreciated! I hope I can receive your answers
> or suggestions as soon as possible.
>
> Best Regards
>
> ------------------------------
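(A note from our side: to see which NIDs the MDS-side OSC actually knows
for this OST, its import file can be read on the MDS. This is only a
minimal check, assuming the /proc layout of our 1.8-era Lustre and the
device name from the lctl dl output above:

# on the MDS: show the connection state and known NIDs for OST0006
cat /proc/fs/lustre/osc/lustre-OST0006-osc/import

If the failover NID, 192.168.1.67@tcp in our case, does not appear there,
then the MDS was never told about it.)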
> Message: 2
> Date: Wed, 18 Nov 2009 11:10:51 -0500
> From: "Brian J. Murrell" <[email protected]>
> Subject: Re: [Lustre-discuss] MDS doesn't switch to failover OST node
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> On Wed, 2009-11-18 at 22:54 +0700, Dam Thanh Tung wrote:
> > Hi list
>
> Hi,
>
> > MDS only sent request to the OST which was down and didn't connect to
> > our backup (failover) OST, so our backup solution was useless, we lost
> > all data from that OST.

Hi Brian,

Thank you for your fast reply.

> I don't think you have actually lost any data. It's there. Your
> clients (which the MDS is) just don't know to use the failover OSS that
> you have set up (but not told Lustre about).
>
> > It's really a disaster for me because we even lost all of our data
> > before with the same kind of problem: OST can't connect to MDS !!!!
>
> Failures to connect between nodes do not result in data loss. The
> data is still there. You just need to have your clients access it.

I know that the data is still there, but I say "lost" when I can no
longer access it. On our clients, we mounted with parameters like this:

mount -t lustre -o flock 192.168.1...@tcp:192.168.1...@tcp:/lustre /mnt/lustre/

We didn't unmount our clients; we just deactivated the dead OST, and
after mounting the backup one, we activated it again. But because the MDS
couldn't connect to the backup (failover) OST or receive any information
from it, the clients are in the same state as before.

> > Could anyone tell me how to route MDS to connect to our backup OST
> > ( with ip address 192.168.1.67 , for example ) ? , to bring our OST
> > up ?
>
> It sounds like you need to review the failover section of the manual.
>
> In summary, you need to tell the clients about failover nodes
> (--failnode) when you create the filesystem. You can add this feature
> after-the-fact with tunefs.lustre.

On our OST, before it went down because of the RAID card hang, we created
it with:

mkfs.lustre --ost --mgsnode=192.168.1...@tcp --mgsnode=192.168.1...@tcp --failover=192.168.1.66@tcp --index=6 --verbose --writeconf /dev/drbd6

Could you please give some suggestions? Do I need to provide more
information? Many thanks.

> b.
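P.S. In case it helps anyone following the thread, this is what I
understand Brian's tunefs.lustre suggestion would look like for our OST.
A sketch only, not something we have run yet; the mount point /mnt/ost6
is just an example, and the failover section of the manual should be
checked first:

# on the OSS, with the target unmounted
umount /mnt/ost6
# record the failover NID on the target after the fact
tunefs.lustre --failnode=192.168.1.66@tcp /dev/drbd6

If I read the manual correctly, a --writeconf may also be needed so that
the MGS regenerates the configuration logs and the MDS and clients pick
up the new NID on their next mount.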
