Assuming the rebuilt server keeps the same identity as the original host (same hostname, IP address, etc.), it should just be a matter of restoring the OS, re-installing the Lustre packages, configuring LNet (e.g. /etc/modprobe.d/lustre.conf) and remounting the storage targets. If you've got an HA setup (e.g. Pacemaker + Corosync), then you'll need to restore that as well, so keep a backup copy of its config that you can restore from :). There is no need to perform any "rebuild" of Lustre itself; just repair/restore the OS.
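As a rough illustration of the "keep a backup copy of the config" point, here is a minimal Python sketch that archives a handful of host-side files. The file list and the function name backup_configs are assumptions made for illustration only: /etc/modprobe.d/lustre.conf is the path mentioned above, the Corosync path is the usual default and may differ on your system, and your own list will almost certainly be longer (network config, Pacemaker config, LDAP client config, and so on).

```python
import os
import tarfile

# Hypothetical list of host-side files worth saving; adjust for your site.
CONFIG_PATHS = [
    "/etc/modprobe.d/lustre.conf",  # LNet configuration (path named in the text above)
    "/etc/corosync/corosync.conf",  # assumption: default Corosync config location
    "/etc/fstab",                   # mount entries, if the targets are listed there
    "/etc/passwd",                  # local identity data
    "/etc/group",
    "/etc/hostname",                # host identity
]


def backup_configs(paths, archive_path):
    """Archive every existing file in `paths` into a gzip-compressed tarball.

    Paths that do not exist on this host are skipped.
    Returns the list of paths that were actually archived.
    """
    saved = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.exists(path):
                tar.add(path)  # the leading "/" is dropped from the name stored in the archive
                saved.append(path)
    return saved
```

Calling backup_configs(CONFIG_PATHS, archive_path) with an archive_path on storage that does not live on the server's own root disk is the important part; a backup kept on the disk that fails is no help when rebuilding.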
Other than LNet, all the Lustre configuration information is held on the storage targets (MGT, MDT), so you can rebuild the root disks without affecting the Lustre config on the MGT and MDT.

So, in summary: rebuild the root disks (maybe use a provisioning system like kickstart for repeatability), restore the network config, restore LNet config, maybe restore the HA software, restore the identity management (e.g. LDAP, passwd, group) then mount the storage as before.

Malcolm Cowe
High Performance Data Division
Intel Corporation | www.intel.com

-----Original Message-----
From: lustre-discuss [mailto:[email protected]] On Behalf Of Jon Tegner
Sent: Friday, March 11, 2016 4:48 PM
To: [email protected]
Subject: [lustre-discuss] Rebuild server

Hi,

yesterday I had an incident where the system disk of one of my servers (MDT/MGS) went down, but the raid could be rebuilt and the system went up again.

However, in the event of a complete failure of the system disk (assuming all relevant "lustre disks" are still intact) is there a clear procedure to follow in order to rebuild the file system once the OS has been reinstalled on new disk?

Thanks,
/jon

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
