Hi all,

Our MDT suffered a kernel panic (which I will post about separately). The OSSs stayed up, but the MDT was down for some time while nodes were still trying to interact with Lustre.
So I have several questions:

a. What happens to processes that are reading/writing during such an event (for instance, does it make a difference if they already have handles on the OSS)? I noticed that several of our compute nodes ended up filling their swap/RAM, so I assume some level of caching happens until the MDT returns...

b. What is the best/proper procedure now to ensure filesystem integrity? Should I take the filesystem offline and run an lfsck first on the MDT and then on the OSSs?

Most documents on the subject that I can find with Google are spread over the various old wikis, so it is not clear to me how relevant they still are...

Thanks,
Eli

Specs:
Server OS: CentOS 6.4 + Lustre 2.5.3 from RPMs (1 MGS/MDS + 3 OSS)
Clients: Debian testing/unstable, kernel 4.2.8 + Lustre 2.8.0 built from source
Network: InfiniBand FDR (o2ib)
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org