Just to clarify one more point: failover is designed to handle temporary issues with the server. It is NOT designed to handle problems with either the storage or the network; Lustre assumes neither will fail.
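For concreteness, here is roughly what the two sides of that look like. This is only a sketch; the device path, NIDs, hostnames and mount points below are placeholders you would adjust for your own site:

    # Format an OST with a failover partner NID, so clients will look
    # for oss2 if oss1 goes away (device and NIDs are hypothetical):
    mkfs.lustre --fsname=testfs --ost --mgsnode=mgs@tcp0 \
        --failnode=oss2@tcp0 /dev/sdb

    # Or, on an existing OST, switch to "failout" so clients get EIO
    # instead of blocking while the OST is unavailable:
    tunefs.lustre --param="failover.mode=failout" /dev/sdb

    # /etc/ha.d/haresources (Heartbeat v1 style) on both OSS nodes:
    # oss1 normally owns the OST; on failure oss2 mounts and serves it.
    oss1 Filesystem::/dev/sdb::/mnt/lustre/ost0::lustre

Heartbeat (or an equivalent HA tool) is what actually moves the mount; Lustre itself just needs the failover NID recorded on the target so clients know where else to look.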
More below.

On Jun 26, 2009, at 12:09 PM, "Brian J. Murrell" <brian.murr...@sun.com> wrote:

> On Fri, 2009-06-26 at 11:51 -0600, Kevin Van Maren wrote:
>> If an OST "fails", meaning that the underlying HW has failed (or the
>> connection to the storage has failed -- one reason to use multipath IO),
>> then Lustre will return IO errors to the application (although there is
>> an RFE to not do that).
>
> This is not entirely true. It is only true when an OST is configured as
> "failout". When an OST is configured as failover however (which is the
> typical case), the application just blocks until the OST can be put back
> into service again on any of the defined failover nodes for that OST and
> the client can reconnect. At that time, pending operations are resumed
> and the application continues.

If the client connection to the server is lost, then yes. But I was referring to the storage returning an IO error to the server: when that happens, the server returns IO errors to the client, which are then passed to the application. The request to not forward those errors is in bugzilla -- basically, to give heartbeat a chance to do a failover if the path to storage is lost on the server.

>> Normally what happens is the OSS _node_ fails, and the other node
>> mounts the OST (typically done by using Linux-HA/Heartbeat).
>
> Right. And no applications see any errors while this happens.
>
> And it is worth noting that defining an OST for failover does not
> require that more than one OSS be defined for it. You can provide
> "failover service" (i.e. no EIOs to clients) using a single OSS. If it
> dies, then clients just block until it can be repaired.

Right, that lets you reboot the server semi-transparently (clients still see the delay/hang on the filesystem). But it does not handle the server getting IO errors from the storage.

> Kevin

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss