So, did you do the "writeconf"? And the OST mounted afterwards?

As I understand it, the MGS was under the impression that the OST being remounted was actually a new one claiming an index that is already in use.
So, what made your repaired OST look new/different?
I would probably have mounted it locally, as an ext4 file system, if only to check that the data is still present (ok, "df" would do that, too). "tunefs.lustre --dryrun" will show the other identifying parameters (target name, index, flags) that _should not_ change when taking down and remounting an OST.
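
To make the "should not change" check concrete, here is a minimal Python sketch that compares two saved captures of "tunefs.lustre --dryrun" output and reports every field that differs. The sample text is invented for illustration and only approximates the real output layout; the helper names parse_dryrun and changed_fields are hypothetical, not part of any Lustre tool.

```python
def parse_dryrun(text):
    """Collect 'Key: value' lines from a saved tunefs.lustre --dryrun capture."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon and section headers with nothing after it.
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def changed_fields(before, after):
    """Return {key: (old, new)} for every field whose value differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


# Invented sample captures (assumption: layout is only approximate).
BEFORE = """Target:     fs1-OST0021
Index:      33
Lustre FS:  fs1
Flags:      0x2
"""
AFTER = """Target:     fs1-OST0021
Index:      33
Lustre FS:  fs1
Flags:      0x22
"""

for key, (old, new) in changed_fields(parse_dryrun(BEFORE), parse_dryrun(AFTER)).items():
    print(f"{key}: {old} -> {new}")
```

An empty result means the target still looks the same to Lustre; any reported field is a hint as to why the MGS might treat the target as new.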

And since "writeconf" has to be done on all targets, you have to take down your MDS anyhow - so nothing is lost by simply trying an MDS restart?
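
For reference, the usual order for a full writeconf can be sketched as a small Python helper that only prints the plan and does not run anything. The device paths are hypothetical placeholders and writeconf_plan is an illustrative name; the order (everything unmounted, MDT before OSTs, then remount MGS/MDT first) follows my reading of the Lustre manual, so check it against your version.

```python
def writeconf_plan(mdt_devices, ost_devices):
    """Build an ordered list of steps for regenerating the config logs."""
    plan = ["# unmount all clients, then all MDTs, then all OSTs"]
    # Assumption: the MDT(s) are rewritten before any OST.
    for dev in mdt_devices:
        plan.append(f"tunefs.lustre --writeconf {dev}")
    for dev in ost_devices:
        plan.append(f"tunefs.lustre --writeconf {dev}")
    plan.append("# remount: MGS/MDT first, then OSTs, then clients")
    return plan


# Hypothetical device paths, for illustration only.
for step in writeconf_plan(["/dev/mdt0"], ["/dev/ost0", "/dev/ost1"]):
    print(step)
```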

Regards
Thomas

On 11/5/23 17:11, Backer via lustre-discuss wrote:
Hi,

I am new to this email list. Looking to get some help on why an OST is not getting mounted.


The cluster was running healthy until the OST experienced an issue and Linux remounted the OST read-only. After fixing the issue and rebooting the node multiple times, the OST still would not mount.

When the mount is attempted, the mount command errors out stating that the index is already in use. The index for the device is 33. This index is not mounted anywhere else.

The debug message from the MGS during the mount is attached at the end of this email. It asks to use writeconf. After using writeconf, the device was mounted. I am looking for a couple of things here.

- I am hoping that the writeconf method is the right thing to do here.
- Why did the OST end up in this state after the write failure and the read-only remount? The write error was due to the iSCSI target going offline and coming back a few seconds later.

20000000:01000000:17.0:1698240468.758487:0:91492:0:(mgs_handler.c:496:mgs_target_reg())
 updating fs1-OST0021, index=33

20000000:00000001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:4403:mgs_write_log_target())
 Process entered

20000000:00000001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:671:mgs_set_index())
 Process entered

20000000:00000001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:572:mgs_find_or_make_fsdb())
 Process entered

20000000:00000001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:551:mgs_find_or_make_fsdb_nolock())
 Process entered

20000000:00000001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:565:mgs_find_or_make_fsdb_nolock())
 Process leaving (rc=0 : 0 : 0)

20000000:00000001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:578:mgs_find_or_make_fsdb())
 Process leaving (rc=0 : 0 : 0)

20000000:02020000:17.0:1698240468.758490:0:91492:0:(mgs_llog.c:711:mgs_set_index())
 140-5: Server fs1-OST0021 requested index 33, but that index is already in use. Use --writeconf to force

20000000:00000001:17.0:1698240468.772355:0:91492:0:(mgs_llog.c:712:mgs_set_index())
 Process leaving via out_up (rc=18446744073709551518 : -98 : 0xffffffffffffff9e)

20000000:00000001:17.0:1698240468.772356:0:91492:0:(mgs_llog.c:4408:mgs_write_log_target())
 Process leaving (rc=18446744073709551518 : -98 : ffffffffffffff9e)

20000000:00020000:17.0:1698240468.772357:0:91492:0:(mgs_handler.c:503:mgs_target_reg())
 Failed to write fs1-OST0021 log (-98)

20000000:00000001:17.0:1698240468.783747:0:91492:0:(mgs_handler.c:504:mgs_target_reg())
 Process leaving via out (rc=18446744073709551518 : -98 : 0xffffffffffffff9e)
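
The numbers in the log above can be decoded with a short Python sketch: the large rc value is the same -98 printed as an unsigned 64-bit integer, and the trailing four hex digits of the target name give the index. The helper names to_signed64 and target_index are made up for this illustration, and the errno lookup assumes it is run on Linux.

```python
import errno


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


def target_index(target_name):
    """Read the index from the trailing four hex digits of a target name."""
    return int(target_name[-4:], 16)


rc = to_signed64(18446744073709551518)
# The errno table is platform dependent; this lookup assumes Linux.
print(rc, errno.errorcode.get(-rc, "unknown"))
print(target_index("fs1-OST0021"))
```

So the rc on every "Process leaving" line is one and the same error, and fs1-OST0021 is indeed the target with index 33.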




_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org