So, did you do the "writeconf"? And the OST mounted afterwards?
As I understand it, the MGS was under the impression that the OST being
re-mounted was actually a new one trying to use an old index.
So, what made your repaired OST look new/different?
I would probably have mounted it locally, as an ext4 file system, if
only to check that there is data still present (ok, "df" would do that,
too).
"tunefs.lustre --dryrun" will show the other identifying parameters that
_should not_ change when an OST is taken down and remounted.
And since "writeconf" has to be done on all targets, you have to take
down your MDS anyhow - so nothing is lost by simply trying an MDS restart?
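
One way to act on the "--dryrun" suggestion is to save the output before
and after the repair and compare the two. The following is only a minimal
sketch: it assumes the saved output consists of "Key: value" lines, and
the field names and values in the sample text are made up for
illustration, not captured from a real target. The function names
parse_key_values and changed_fields are hypothetical helpers.

```python
def parse_key_values(text):
    """Collect 'Key: value' lines into a dict, ignoring all other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def changed_fields(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    old, new = parse_key_values(before), parse_key_values(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Illustrative sample text only (assumed layout, invented values).
BEFORE = """Target: fs1-OST0021
Index: 33
Flags: 0x2
"""
AFTER = """Target: fs1-OST0021
Index: 33
Flags: 0x62
"""

print(changed_fields(BEFORE, AFTER))
```

Anything this reports for a target that was only taken down and
remounted would be worth a closer look.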
Regards
Thomas
On 11/5/23 17:11, Backer via lustre-discuss wrote:
Hi,
I am new to this email list. Looking to get some help on why an OST is
not getting mounted.
The cluster was running healthy until the OST experienced an issue and
Linux re-mounted the OST read-only. After I fixed the issue and rebooted
the node multiple times, the OST still would not mount.
When the mount is attempted, the mount command fails with an error
stating that the index is already in use. The index for the device is
33. No other target is mounted with this index.
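
The index 33 matches the target name in the MGS log, fs1-OST0021, because
the four digits at the end of a Lustre target name are the index written
in hexadecimal (0x21 = 33). A minimal sketch of that conversion follows;
parse_target is a hypothetical helper, and it assumes the usual
"fsname-OSTxxxx" naming shown in the log.

```python
def parse_target(name):
    """Split a name like 'fs1-OST0021' into (fsname, kind, index)."""
    fsname, _, tail = name.rpartition("-")
    kind, hex_index = tail[:3], tail[3:]
    # The trailing digits are hexadecimal, so '0021' is decimal 33.
    return fsname, kind, int(hex_index, 16)


print(parse_target("fs1-OST0021"))  # ('fs1', 'OST', 33)
```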
The debug log from the MGS during the mount is attached at the end
of this email. It asks for writeconf. After I ran writeconf,
the device mounted. I am looking for a couple of things here.
- I am hoping that the writeconf method is the right thing to do here.
- Why did the OST end up in this state after the write failure that
caused it to be re-mounted read-only? The write error occurred because
the iSCSI target went offline and came back a few seconds later.
20000000:01000000:17.0:1698240468.758487:0:91492:0:(mgs_handler.c:496:mgs_target_reg())
updating fs1-OST0021, index=33
20000000:00000001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:4403:mgs_write_log_target())
Process entered
20000000:00000001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:671:mgs_set_index())
Process entered
20000000:00000001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:572:mgs_find_or_make_fsdb())
Process entered
20000000:00000001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:551:mgs_find_or_make_fsdb_nolock())
Process entered
20000000:00000001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:565:mgs_find_or_make_fsdb_nolock())
Process leaving (rc=0 : 0 : 0)
20000000:00000001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:578:mgs_find_or_make_fsdb())
Process leaving (rc=0 : 0 : 0)
20000000:02020000:17.0:1698240468.758490:0:91492:0:(mgs_llog.c:711:mgs_set_index())
140-5: Server fs1-OST0021 requested index 33, but that index is already in
use. Use --writeconf to force
20000000:00000001:17.0:1698240468.772355:0:91492:0:(mgs_llog.c:712:mgs_set_index())
Process leaving via out_up (rc=18446744073709551518 : -98 : 0xffffffffffffff9e)
20000000:00000001:17.0:1698240468.772356:0:91492:0:(mgs_llog.c:4408:mgs_write_log_target())
Process leaving (rc=18446744073709551518 : -98 : ffffffffffffff9e)
20000000:00020000:17.0:1698240468.772357:0:91492:0:(mgs_handler.c:503:mgs_target_reg())
Failed to write fs1-OST0021 log (-98)
20000000:00000001:17.0:1698240468.783747:0:91492:0:(mgs_handler.c:504:mgs_target_reg())
Process leaving via out (rc=18446744073709551518 : -98 : 0xffffffffffffff9e)
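
The large rc values in the log are the same number printed three ways:
18446744073709551518 is the unsigned 64-bit form of -98, and
0xffffffffffffff9e is its hexadecimal form. A minimal sketch of the
conversion follows; to_signed64 is a hypothetical helper name. On Linux,
errno 98 is EADDRINUSE, which fits the "index is already in use" message.

```python
import errno
import os


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


rc = to_signed64(18446744073709551518)
print(rc)  # -98

# The errno name and message are platform-dependent lookups.
print(errno.errorcode.get(-rc), os.strerror(-rc))
```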
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org