Hi,

A number of changelog bugs have been fixed since 2.1 (some of them quite recently). There are some flaws in the way llogs are designed which could lead to this state. This reminds me of similar issues we ran into, but I could not find the corresponding tickets (there are definitely JIRA tickets for them).

What is your changelog creation rate?
What's your MDS size?

You probably had too many records created or too many records outstanding in your llog. Do you know the highest number of records you had in the changelog at any one time?
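
For reference, the negative return codes in the logs quoted below are Linux errno values, and -28 (ENOSPC) is what you would expect when the catalog has run out of slots. Here is a minimal sketch for decoding them from log lines. The lookup table and the `decode_rc` helper are only an illustration and are not part of Lustre or Robinhood; the table is hard-coded with the Linux values so the result does not depend on the platform running the script.

```python
import re

# Linux errno values for the return codes that appear in the quoted logs.
# Hard-coded on purpose: errno numbering differs between operating systems.
LINUX_ERRNO = {
    14: ("EFAULT", "Bad address"),
    28: ("ENOSPC", "No space left on device"),
    116: ("ESTALE", "Stale file handle"),
}

# Matches forms such as "rc=-28", "rc = -116", "failed: -28",
# "failed with -116" and "err -14".
RC_PATTERN = re.compile(r"(?:rc\s*=\s*|failed:\s*|failed with\s*|err\s+)-(\d+)")


def decode_rc(line):
    """Return a list of (code, name, description) for each rc found in a line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        results.append((-code, name, text))
    return results


if __name__ == "__main__":
    samples = [
        "(llog_obd.c:461:llog_obd_origin_add()) write one catalog record failed: -28",
        "(mdd_dir.c:665:mdd_changelog_ns_store()) changelog failed: rc=-28, op6",
        "operation llog_origin_handle_open failed with -116.",
        "[tname out24.dcd] [err -14]",
    ]
    for sample in samples:
        for code, name, text in decode_rc(sample):
            print(f"{code}: {name} ({text})")
```

Codes that are not in the table are reported as UNKNOWN rather than guessed.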

Aurélien

On 22/06/2015 09:54, Carmelo Ponti (CSCS) wrote:
Dear all

Last weekend we had a strange problem with the changelog on one of our Lustre file systems.

On Saturday Lustre stopped working, with the following errors on the MDS:

Jun 20 10:03:06 monchmds01 kernel: LustreError: 97356:0:(llog_cat.c:81:llog_cat_new_log()) no free catalog slots for log...
Jun 20 10:03:06 monchmds01 kernel: LustreError: 97356:0:(llog_obd.c:461:llog_obd_origin_add()) write one catalog record failed: -28
Jun 20 10:03:06 monchmds01 kernel: LustreError: 97331:0:(llog_cat.c:81:llog_cat_new_log()) no free catalog slots for log...
Jun 20 10:03:06 monchmds01 kernel: LustreError: 97331:0:(mdd_object.c:1330:mdd_changelog_data_store()) changelog failed: rc=-28 op17 t[0x20cc50b18:0x1e83:0x0]
Jun 20 10:03:06 monchmds01 kernel: LustreError: 97331:0:(llog_obd.c:461:llog_obd_origin_add()) write one catalog record failed: -28
Jun 20 10:03:06 monchmds01 kernel: LustreError: 97331:0:(llog_obd.c:461:llog_obd_origin_add()) Skipped 1 previous similar message
Jun 20 10:03:06 monchmds01 kernel: LustreError: 97331:0:(mdd_object.c:1330:mdd_changelog_data_store()) changelog failed: rc=-28 op17 t[0x20cc1a400:0x1f9a0:0x0]
Jun 20 10:03:07 monchmds01 kernel: LustreError: 114688:0:(mdd_dir.c:665:mdd_changelog_ns_store()) changelog failed: rc=-28, op6 monchc206_3250911.0 c[0x20cc345f8:0x1f909:0x0] p[0x200156a5e:0x36:0x0]
Jun 20 10:03:07 monchmds01 kernel: LustreError: 120659:0:(mdd_dir.c:665:mdd_changelog_ns_store()) changelog failed: rc=-28, op6 monchc205_3250911.0 c[0x20cc8cff0:0x1b9c:0x0] p[0x200156a5e:0x36:0x0]
Jun 20 10:03:07 monchmds01 kernel: LustreError: 114688:0:(mdd_dir.c:665:mdd_changelog_ns_store()) Skipped 3 previous similar messages
Jun 20 10:03:07 monchmds01 kernel: LustreError: 16776:0:(mdd_dir.c:747:mdd_changelog_ext_ns_store()) changelog failed: rc=-28, op8 out24.dcd c[0x20cc8c820:0x8bf1:0x0] p[0x20cc4dc38:0x1cf:0x0]
Jun 20 10:03:07 monchmds01 kernel: Lustre: 16776:0:(cmm_object.c:697:cml_rename_warn()) cml_rename failed for mdo_rename, should revoke: [mo_po [0x20cc4dc38:0x1cf:0x0]] [mo_pn [0x20cc4dc38:0x1cf:0x0]] [lf [0x20cc8c820:0x8bf1:0x0]] [sname out.dcd] [mo_t NULL] [tname out24.dcd] [err -14]
...

And /var/log/messages on the RBH server was full of messages like the following:

Jun 18 22:07:40 monchrbh01 kernel: LustreError: 11-0: lnec-MDT0000-mdc-ffff88014558e400: Communicating with 148.187.72.14@o2ib, operation llog_origin_handle_open failed with -116.
Jun 18 22:07:40 monchrbh01 kernel: LustreError: 89229:0:(llog_cat.c:192:llog_cat_id2handle()) lnec-MDT0000-mdc-ffff88014558e400: error opening log id 0x0:1315963671:2e9d340b: rc = -116
Jun 18 22:07:40 monchrbh01 kernel: LustreError: 89229:0:(llog_cat.c:565:llog_cat_process_cb()) lnec-MDT0000-mdc-ffff88014558e400: cannot find handle for llog 0x0:1315963671: -116
Jun 18 22:08:25 monchrbh01 kernel: Lustre: 89281:0:(llog_cat.c:615:llog_cat_process_or_fork()) catlog 0x21800004:1 crosses index zero
Jun 18 22:08:25 monchrbh01 kernel: Lustre: 89281:0:(llog_cat.c:615:llog_cat_process_or_fork()) Skipped 557 previous similar messages
Jun 18 22:18:26 monchrbh01 kernel: Lustre: 89939:0:(llog_cat.c:615:llog_cat_process_or_fork()) catlog 0x21800004:1 crosses index zero
Jun 18 22:18:26 monchrbh01 kernel: Lustre: 89939:0:(llog_cat.c:615:llog_cat_process_or_fork()) Skipped 557 previous similar messages
Jun 18 22:28:26 monchrbh01 kernel: Lustre: 90577:0:(llog_cat.c:615:llog_cat_process_or_fork()) catlog 0x21800004:1 crosses index zero

I don't understand why the changelog catalog was full (no free catalog slots for log...), so I would like to know whether anyone has had similar problems before.

I am also wondering whether this is a general Lustre problem or whether it is related to RBH.

For the moment we have deregistered the changelog and stopped RBH.

Thank you in advance
Carmelo Ponti

Additional information:

MDS lustre version: 2.1.6
MDS OS version: CentOS release 6.4 (Final)

RBH lustre client version: 2.5.4
RBH OS version: CentOS release 6.6 (Final)
RBH version: 2.5.4

--
----------------------------------------------------------------------
Carmelo Ponti           System Engineer
CSCS                    Swiss Center for Scientific Computing
Via Trevano 131         Email: [email protected]
CH-6900 Lugano          http://www.cscs.ch
                         Phone: +41 91 610 82 15/Fax: +41 91 610 82 82
----------------------------------------------------------------------


------------------------------------------------------------------------------
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors
network devices and physical & virtual servers, alerts via email & sms
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o


_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support
