Hello all,

I've been trying to test the failout failover mode, but instead of getting
the "connection lost, in progress operations using this service will fail"
message and a failure, I receive the "in progress operations will wait for
recovery to complete" message and the operation hangs forever. I'm not
deploying in an environment where HA is possible, so I'd prefer in-progress
operations to fail instead of hanging indefinitely.

tunefs.lustre tells me that the ZFS OSDs have a failover.mode=failout
setting, as the manual suggests. On further investigation, I found an
unresolved ticket in the tracker that also states that failout is no longer
supported
(https://jira.whamcloud.com/browse/LUDOC-200?jql=text%20~%20%22failout%22).
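
For anyone who wants to repeat the check, here is a minimal Python sketch of
how the setting could be read back from the text that tunefs.lustre prints.
It assumes the parameters appear on a line starting with "Parameters:" as
space-separated key=value pairs. The helper name parse_target_params and the
sample text are made up for illustration and were not captured from my
system.

```python
def parse_target_params(dryrun_output: str) -> dict:
    """Collect key=value pairs from every 'Parameters:' line.

    Assumption: parameters are space-separated key=value tokens on a line
    that starts with 'Parameters:'. Later lines override earlier ones.
    """
    params = {}
    for line in dryrun_output.splitlines():
        line = line.strip()
        if not line.startswith("Parameters:"):
            continue
        for token in line[len("Parameters:"):].split():
            key, sep, value = token.partition("=")
            if sep:  # skip tokens that are not key=value
                params[key] = value
    return params


# Illustrative sample only; target name and address are hypothetical.
SAMPLE = """\
Target:     testfs-OST0000
Mount type: zfs
Parameters: mgsnode=192.168.0.1@tcp failover.mode=failout
"""

if __name__ == "__main__":
    mode = parse_target_params(SAMPLE).get("failover.mode", "(not set)")
    print("failover.mode =", mode)
```

This only confirms what is stored on the target. It says nothing about
whether the servers actually honor the setting at runtime, which is the part
I am unsure about.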

Does anyone know if it's still officially supported or not? I dove pretty
deep into the source and found it's all still in there, but it doesn't seem
to be marking my failout servers as non-replayable in the ptlrpc code paths.

Thanks for your time!
Best,
Christian

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
