On Fri, Jun 26, 2009 at 12:51 PM, Kevin Van Maren<kevin.vanma...@sun.com> wrote:
> OSS is the server. It normally provides one or more OSTs.
>
> OST failover is done by configuring multiple OSS nodes to be able to serve
> the same OST. Only ONE OSS node may provide the OST at a time.
>

I understand that an OST can't be shared by two or more active OSSs at a
time, but that we can/should configure OSSs for failover mode. In my
interpretation, an OST failure was a disk/storage failure. So the failover you
are referring to is an OSS failover in my understanding (i.e., switching to
another failover OSS node if a particular OSS fails).

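As a rough illustration of the OSS failover described in this thread (only one OSS node serves a given OST at a time, and clients try each configured OSS until one answers with the OST active), here is a minimal Python sketch. It is a toy model, not Lustre code: the `OssNode` class, the `find_active_oss` function, and the names `oss1`, `oss2`, and `OST0000` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class OssNode:
    """Hypothetical stand-in for an OSS server node."""
    name: str
    alive: bool = True
    mounted: Set[str] = field(default_factory=set)  # OSTs currently mounted here

    def serves(self, ost: str) -> bool:
        # A node can answer for an OST only if it is up and has the OST mounted.
        return self.alive and ost in self.mounted


def find_active_oss(ost: str, failover_nodes: List[OssNode]) -> Optional[OssNode]:
    """Try each OSS configured for the OST; return the first one serving it."""
    for node in failover_nodes:
        if node.serves(ost):
            return node
    return None  # nobody has the OST active; a client would have to keep waiting


# Two OSS nodes that can both reach the same disk; only oss1 has it mounted.
oss1 = OssNode("oss1", mounted={"OST0000"})
oss2 = OssNode("oss2")
nodes = [oss1, oss2]
before = find_active_oss("OST0000", nodes)

# OSS *node* failure: oss1 goes down and the OST is mounted on oss2 instead
# (in the thread this is typically done by Linux-HA/Heartbeat).
oss1.alive = False
oss1.mounted.discard("OST0000")
oss2.mounted.add("OST0000")
after = find_active_oss("OST0000", nodes)

# OST (disk) failure: no node can mount the target, so no OSS serves it.
oss2.mounted.discard("OST0000")
nobody = find_active_oss("OST0000", nodes)
```

In this model an OSS node failure is survivable because another node can mount the same OST, while a failure of the OST's underlying storage leaves no node able to serve it, which is the distinction being asked about.
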
> Failover is accomplished by the clients attempting to connect to each OSS
> node configured to serve the OST, until one of them responds with it active.
>
>
> An OST can be moved back-and-forth between OSS nodes by umount/mount
> commands (assuming both servers can access the same disk!)
>
> If an OST "fails", meaning that the underlying HW has failed (or the
> connection to the storage has failed -- one reason to use multipath IO),
> then Lustre will return IO errors to the application (although there is an
> RFE to not do that). Normally what happens is the OSS _node_ fails, and the
> other node mounts the OST (typically done by using Linux-HA/Heartbeat).
>

Yeah, this is what I am curious about: OST/disk/storage-device failure. It
would be nice to have something on the wiki about the server and the target
being separate entities or the same machine. I have gone through the FAQ
entry, but it would be great if we could elaborate on it further.

> MDS/MDT failover/configuration is similar.
>
> Kevin
>
>
>
> Carlos Santana wrote:
>>
>> Sorry, but may be I am confused between OSS and OST.
>>
>> On Fri, Jun 26, 2009 at 11:24 AM, Brian J. Murrell<brian.murr...@sun.com>
>> wrote:
>>
>>>
>>> On Fri, 2009-06-26 at 10:56 -0500, Carlos Santana wrote:
>>>
>>>>
>>>> I was wondering what will happen during OST failure
>>>> - if client is making some read/write operation
>>>>
>>>
>>> Assuming the OST is configured for failover, the client will retry
>>> anything that didn't get committed to disk before the OST failure. It
>>> will try with all available failover targets for the OST.
>>>
>>
>> Can OST(disk) be configured for failover like an OSS(server node)?
>>
>>
>>>>
>>>> - if client requests read/write after OST fails
>>>>
>>>
>>> Same as above.
>>>
>>>
>>>>
>>>> When I made OSS unavailable the client waited/got delayed response
>>>> till OSS connected back.
>>>>
>>>
>>> Right. That's failover.
>>>
>>>
>>>>
>>>> I am not sure about OST failure though. Any
>>>> clues?
>>>>
>>>
>>> An OST fails if an OSS fails given that an OST is the disk in an OSS
>>> (which is the node).
>>>
>>
>> I thought an OST(disk) can fail without OSS(server) being failed.
>> And that's my question, what will happen in such scenario - while
>> client is in read/write operation and client requesting read/write
>> after the OST(disk) failure?
>>
>>
>>>
>>> b.
>>>
>>>
>
>
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss