Re: [ceph-users] cephfs kernel client instability
Upgrading to 4.15.0-43-generic fixed the problem.

Best,
Martin

On Fri, Jan 25, 2019 at 9:43 PM Ilya Dryomov wrote:
> [...]
>
> Looks like https://tracker.ceph.com/issues/23537 indeed. A kernel
> upgrade will fix it.
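For anyone else on Ubuntu 16.04 hitting the same loop: the 4.15.0-43 kernel mentioned above ships via Ubuntu's hardware-enablement (HWE) stack. A rough sketch, with the package name taken from Ubuntu's HWE documentation rather than from this thread, so verify it for your release before running:

  sudo apt-get update
  sudo apt-get install --install-recommends linux-generic-hwe-16.04
  sudo reboot
  # afterwards, confirm the client is on the new kernel
  uname -r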
Re: [ceph-users] cephfs kernel client instability
On Fri, Jan 25, 2019 at 9:40 AM Martin Palma wrote:
>
> > Do you see them repeating every 30 seconds?
>
> yes:
>
> [...]
>
> > Which kernel are you running?
>
> Current running kernel is 4.11.0-13-generic (Ubuntu 16.04.5 LTS), and
> the latest that is provided is 4.15.0-43-generic

Looks like https://tracker.ceph.com/issues/23537 indeed. A kernel
upgrade will fix it.

Thanks,

Ilya
Re: [ceph-users] cephfs kernel client instability
> Do you see them repeating every 30 seconds?

yes:

Jan 25 09:34:37 sdccgw01 kernel: [6306813.737615] libceph: mon4 10.8.55.203:6789 session lost, hunting for new mon
Jan 25 09:34:37 sdccgw01 kernel: [6306813.737620] libceph: mon3 10.8.55.202:6789 session lost, hunting for new mon
Jan 25 09:34:37 sdccgw01 kernel: [6306813.737728] libceph: mon2 10.8.55.201:6789 session lost, hunting for new mon
Jan 25 09:34:37 sdccgw01 kernel: [6306813.739711] libceph: mon1 10.7.55.202:6789 session established
Jan 25 09:34:37 sdccgw01 kernel: [6306813.739899] libceph: mon1 10.7.55.202:6789 session established
Jan 25 09:34:37 sdccgw01 kernel: [6306813.740015] libceph: mon3 10.8.55.202:6789 session established
Jan 25 09:34:43 sdccgw01 kernel: [6306819.881560] libceph: mon2 10.8.55.201:6789 session lost, hunting for new mon
Jan 25 09:34:43 sdccgw01 kernel: [6306819.883730] libceph: mon4 10.8.55.203:6789 session established
Jan 25 09:34:47 sdccgw01 kernel: [6306823.977566] libceph: mon0 10.7.55.201:6789 session lost, hunting for new mon
Jan 25 09:34:47 sdccgw01 kernel: [6306823.980033] libceph: mon1 10.7.55.202:6789 session established
Jan 25 09:35:07 sdccgw01 kernel: [6306844.457449] libceph: mon1 10.7.55.202:6789 session lost, hunting for new mon
Jan 25 09:35:07 sdccgw01 kernel: [6306844.457450] libceph: mon3 10.8.55.202:6789 session lost, hunting for new mon
Jan 25 09:35:07 sdccgw01 kernel: [6306844.457612] libceph: mon1 10.7.55.202:6789 session lost, hunting for new mon
Jan 25 09:35:07 sdccgw01 kernel: [6306844.459168] libceph: mon3 10.8.55.202:6789 session established
Jan 25 09:35:07 sdccgw01 kernel: [6306844.459537] libceph: mon4 10.8.55.203:6789 session established
Jan 25 09:35:07 sdccgw01 kernel: [6306844.459792] libceph: mon4 10.8.55.203:6789 session established

> Which kernel are you running?

Current running kernel is 4.11.0-13-generic (Ubuntu 16.04.5 LTS), and
the latest that is provided is 4.15.0-43-generic

Best,
Martin
Re: [ceph-users] cephfs kernel client instability
On Fri, Jan 25, 2019 at 8:37 AM Martin Palma wrote:
>
> Hi Ilya,
>
> thank you for the clarification. After setting
> "osd_map_message_max" to 10 the io errors and the MDS error
> "MDS_CLIENT_LATE_RELEASE" are gone.
>
> The messages of "mon session lost, hunting for new mon" didn't go
> away... can it be that this is related to
> https://tracker.ceph.com/issues/23537 ?

Do you see them repeating every 30 seconds?

Which kernel are you running?

Thanks,

Ilya
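A quick way to answer both of Ilya's questions on an affected client (assuming a systemd host; plain dmesg works just as well):

  uname -r
  # the timestamps show whether the libceph mon messages recur on a ~30 second cadence
  journalctl -k | grep 'libceph: mon' | tail -20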
Re: [ceph-users] cephfs kernel client instability
Hi Ilya,

thank you for the clarification. After setting
"osd_map_message_max" to 10 the io errors and the MDS error
"MDS_CLIENT_LATE_RELEASE" are gone.

The messages of "mon session lost, hunting for new mon" didn't go
away... can it be that this is related to
https://tracker.ceph.com/issues/23537 ?

Best,
Martin

On Thu, Jan 24, 2019 at 10:16 PM Ilya Dryomov wrote:
> [...]
Re: [ceph-users] cephfs kernel client instability
On Thu, Jan 24, 2019 at 6:21 PM Andras Pataki wrote:
>
> Hi Ilya,
>
> Thanks for the clarification - very helpful.
> I've lowered osd_map_message_max to 10, and this resolves the issue
> about the kernel being unhappy about large messages when the OSDMap
> changes. One comment here though: you mentioned that Luminous uses 40
> as the default, which is indeed the case. The documentation for
> Luminous (and master), however, says that the default is 100.

Looks like that page hasn't been kept up to date. I'll fix that
section.

>
> One other follow-up question on the kernel client about something I've
> been seeing while testing. Does the kernel client clean up when the MDS
> asks due to cache pressure? On a machine I ran something that touches a
> lot of files, so the kernel client accumulated over 4 million caps.
> Many hours after all the activity finished (i.e. many hours after
> anything accesses ceph on that node) the kernel client still holds
> millions of caps, and the MDS periodically complains about clients not
> responding to cache pressure. How is this supposed to be handled?
> Obviously asking the kernel to drop caches via /proc/sys/vm/drop_caches
> does a very thorough cleanup, but something in the middle would be better.

The kernel client sitting on way too many caps for way too long is
a long-standing issue. Adding Zheng, who has recently been doing some
work to facilitate cap releases and put a limit on the overall cap
count.

Thanks,

Ilya
Re: [ceph-users] cephfs kernel client instability
On Thu, Jan 24, 2019 at 8:16 PM Martin Palma wrote:
>
> We are experiencing the same issues on clients with CephFS mounted
> using the kernel client and 4.x kernels.
>
> The problem shows up when we add new OSDs, on reboots after
> installing patches and when changing the weight.
>
> Here are the logs of a misbehaving client:
>
> [...]
>
> In addition to the MON io errors we also got some OSD io errors.

This isn't surprising -- the kernel client can receive osdmaps from
both monitors and OSDs.

>
> Moreover, when the error occurs several clients cause an
> "MDS_CLIENT_LATE_RELEASE" error on the MDS server.
>
> We are currently running on Luminous 12.2.10 and have around 580 OSDs
> and 5 monitor nodes. The cluster is running on CentOS 7.6.
>
> The 'osd_map_message_max' setting is set to the default value of 40,
> but we are still getting these errors.

My advice is the same: set it to 20 or even 10. The problem is that
this setting is expressed as a number of osdmaps rather than as the
size of the resulting message. I've filed
http://tracker.ceph.com/issues/38040

Thanks,

Ilya
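A minimal sketch of rolling that change out on a Luminous cluster, using injectargs for the running mons and ceph.conf for persistence; the option name and scope are as discussed in this thread, but verify them against `config show` on your own mons:

  # apply to the running monitors without a restart
  ceph tell mon.* injectargs '--osd_map_message_max 10'

  # and persist it in ceph.conf on the mon hosts, e.g.
  #   [global]
  #   osd map message max = 10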
Re: [ceph-users] cephfs kernel client instability
Hi Ilya,

Thanks for the clarification - very helpful.

I've lowered osd_map_message_max to 10, and this resolves the issue
about the kernel being unhappy about large messages when the OSDMap
changes. One comment here though: you mentioned that Luminous uses 40
as the default, which is indeed the case. The documentation for
Luminous (and master), however, says that the default is 100.

One other follow-up question on the kernel client about something I've
been seeing while testing. Does the kernel client clean up when the MDS
asks due to cache pressure? On a machine I ran something that touches a
lot of files, so the kernel client accumulated over 4 million caps.
Many hours after all the activity finished (i.e. many hours after
anything accesses ceph on that node) the kernel client still holds
millions of caps, and the MDS periodically complains about clients not
responding to cache pressure. How is this supposed to be handled?
Obviously asking the kernel to drop caches via /proc/sys/vm/drop_caches
does a very thorough cleanup, but something in the middle would be
better.

Andras

On 1/16/19 1:45 PM, Ilya Dryomov wrote:
> [...]
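Two hedged one-liners for the situation described above: checking how many caps a client actually holds via the MDS admin socket (the daemon name cephmds00 is taken from the `ceph -s` output later in this thread and will differ per cluster), and the heavy-handed workaround of dropping all clean VFS caches on the client:

  # on the MDS host: list client sessions and their cap counts
  ceph daemon mds.cephmds00 session ls | grep -E '"id"|num_caps'

  # on the client: drop clean page, dentry and inode caches, which also releases unused CephFS caps
  echo 3 | sudo tee /proc/sys/vm/drop_caches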
Re: [ceph-users] cephfs kernel client instability
We are experiencing the same issues on clients with CephFS mounted
using the kernel client and 4.x kernels.

The problem shows up when we add new OSDs, on reboots after
installing patches and when changing the weight.

Here are the logs of a misbehaving client:

[6242967.890611] libceph: mon4 10.8.55.203:6789 session established
[6242968.010242] libceph: osd534 10.7.55.23:6814 io error
[6242968.259616] libceph: mon1 10.7.55.202:6789 io error
[6242968.259658] libceph: mon1 10.7.55.202:6789 session lost, hunting for new mon
[6242968.359031] libceph: mon4 10.8.55.203:6789 session established
[6242968.622692] libceph: osd534 10.7.55.23:6814 io error
[6242968.692274] libceph: mon4 10.8.55.203:6789 io error
[6242968.692337] libceph: mon4 10.8.55.203:6789 session lost, hunting for new mon
[6242968.694216] libceph: mon0 10.7.55.201:6789 session established
[6242969.099862] libceph: mon0 10.7.55.201:6789 io error
[6242969.099888] libceph: mon0 10.7.55.201:6789 session lost, hunting for new mon
[6242969.224565] libceph: osd534 10.7.55.23:6814 io error

In addition to the MON io errors we also got some OSD io errors.

Moreover, when the error occurs several clients cause an
"MDS_CLIENT_LATE_RELEASE" error on the MDS server.

We are currently running on Luminous 12.2.10 and have around 580 OSDs
and 5 monitor nodes. The cluster is running on CentOS 7.6.

The 'osd_map_message_max' setting is set to the default value of 40,
but we are still getting these errors.

Best,
Martin

On Wed, Jan 16, 2019 at 7:46 PM Ilya Dryomov wrote:
> [...]
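To confirm what a monitor is actually running with, the admin socket can be queried directly; this assumes the mon id matches the short hostname, so adjust as needed:

  # on a monitor host
  ceph daemon mon.$(hostname -s) config get osd_map_message_max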
Re: [ceph-users] cephfs kernel client instability
On Wed, Jan 16, 2019 at 7:12 PM Andras Pataki wrote:
>
> Hi Ilya/Kjetil,
>
> I've done some debugging and tcpdump-ing to see what the interaction
> between the kernel client and the mon looks like. Indeed -
> CEPH_MSG_MAX_FRONT_LEN, defined as 16Mb, seems low for the default mon
> messages for our cluster (with osd_map_message_max at 100). We have
> about 3500 OSDs, and the kernel advertises itself as older than

This is too big, especially for a fairly large cluster such as yours.
The default was reduced to 40 in luminous. Given about 3500 OSDs, you
might want to set it to 20 or even 10.

> Luminous, so it gets full map updates. The FRONT message size on the
> wire I saw was over 24Mb. I'll try setting osd_map_message_max to 30
> and do some more testing, but from the debugging it definitely seems
> like the issue.
>
> Is the kernel driver really not up to date enough to be considered at
> least a Luminous client by the mon (i.e. is it really missing some
> feature)? I looked at the bits, and what the MON seems to want is bit 59
> in ceph features, shared by FS_BTIME, FS_CHANGE_ATTR and MSG_ADDR2. Can
> the kernel client be used when setting require-min-compat-client to
> luminous (either with the 4.19.x kernel or the RedHat/CentOS 7.6
> kernel)? Some background here would be helpful.

Yes, the kernel client is missing support for that feature bit, however
4.13+ and RHEL 7.5+ _can_ be used with require-min-compat-client set to
luminous. See

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/027002.html

Thanks,

Ilya
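For completeness, the cluster-side commands behind Ilya's answer (Luminous syntax). Note that raising the requirement is normally refused while clients that do not advertise the luminous feature bit are connected, which is exactly the kernel-client case the linked post discusses:

  # see which feature releases currently connected clients advertise
  ceph features

  # raise the minimum required client release
  ceph osd set-require-min-compat-client luminous
  # add --yes-i-really-mean-it only after verifying that the "old-looking" clients
  # (e.g. 4.13+ or RHEL 7.5+ kernels) are in fact compatible

  # confirm the setting
  ceph osd dump | grep require_min_compat_client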
Re: [ceph-users] cephfs kernel client instability
Hi Ilya/Kjetil,

I've done some debugging and tcpdump-ing to see what the interaction
between the kernel client and the mon looks like. Indeed -
CEPH_MSG_MAX_FRONT_LEN, defined as 16Mb, seems low for the default mon
messages for our cluster (with osd_map_message_max at 100). We have
about 3500 OSDs, and the kernel advertises itself as older than
Luminous, so it gets full map updates. The FRONT message size on the
wire I saw was over 24Mb. I'll try setting osd_map_message_max to 30
and do some more testing, but from the debugging it definitely seems
like the issue.

Is the kernel driver really not up to date enough to be considered at
least a Luminous client by the mon (i.e. is it really missing some
feature)? I looked at the bits, and what the MON seems to want is bit 59
in ceph features, shared by FS_BTIME, FS_CHANGE_ATTR and MSG_ADDR2. Can
the kernel client be used when setting require-min-compat-client to
luminous (either with the 4.19.x kernel or the RedHat/CentOS 7.6
kernel)? Some background here would be helpful.

Thanks for your help and suggestions!

Andras

On 1/16/19 5:20 AM, Ilya Dryomov wrote:
> [...]
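The sort of capture used for the debugging above can be reproduced with something along these lines (assuming the default v1 messenger port 6789; adjust the interface and filter for your mon addresses):

  # on the client: record full packets of mon traffic for later inspection
  sudo tcpdump -i any -s 0 -w mon-traffic.pcap 'tcp port 6789'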
Re: [ceph-users] cephfs kernel client instability
On Wed, Jan 16, 2019 at 1:27 AM Kjetil Joergensen wrote:
>
> Hi,
>
> you could try reducing "osd map message max"; some code paths that end
> up as -EIO (kernel: libceph: mon1 *** io error) are hit when a message
> exceeds include/linux/ceph/libceph.h:CEPH_MSG_MAX_{FRONT,MIDDLE,DATA}_LEN.
>
> This "worked for us" - YMMV.

Kjetil, how big is your cluster? Do you remember the circumstances of
when you started seeing these errors?

Andras, please let us know if this resolves the issue. Decreasing
"osd map message max" for large clusters can help with the overall
memory consumption and is probably a good idea in general, but then
these kernel client limits are pretty arbitrary, so we can look at
bumping them.

Thanks,

Ilya
Re: [ceph-users] cephfs kernel client instability
Hi,

you could try reducing "osd map message max"; some code paths that end
up as -EIO (kernel: libceph: mon1 *** io error) are hit when a message
exceeds include/linux/ceph/libceph.h:CEPH_MSG_MAX_{FRONT,MIDDLE,DATA}_LEN.

This "worked for us" - YMMV.

-KJ

On Tue, Jan 15, 2019 at 6:14 AM Andras Pataki wrote:
> [...]
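The limits Kjetil refers to are compile-time constants in the kernel source; only CEPH_MSG_MAX_FRONT_LEN (16 MB) is quoted elsewhere in this thread, so check the exact values against the tree matching your running kernel:

  # from a checkout of the kernel source for the running kernel
  grep -n 'CEPH_MSG_MAX_.*_LEN' include/linux/ceph/libceph.h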
Re: [ceph-users] cephfs kernel client instability
An update on our cephfs kernel client troubles. After doing some heavier
testing with a newer kernel, 4.19.13, it seems like it also gets into a bad
state when it can't connect to monitors (all back end processes are on
12.2.8):

Jan 15 08:49:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Jan 15 08:49:01 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Jan 15 08:49:01 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Jan 15 08:49:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Jan 15 08:49:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Jan 15 08:49:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Jan 15 08:49:04 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Jan 15 08:49:04 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Jan 15 08:49:04 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Jan 15 08:49:06 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Jan 15 08:49:07 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Jan 15 08:49:07 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
... repeating forever ...

# uname -r
4.19.13

and on the mon node (10.128.150.10) at log level 20, I see that it is
building/encoding a lot of maps (10.128.36.18 is the client in question):

2019-01-15 08:49:01.355017 7fffee40c700 10 mon.cephmon00@0(leader) e40 _ms_dispatch new session 0x62dc6c00 MonSession(client.36800361 10.128.36.18:0/2911716500 is open , features 0x27018fb86aa42ada (jewel)) features 0x27018fb86aa42ada
2019-01-15 08:49:01.355021 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2019-01-15 08:49:01.355026 7fffee40c700 10 mon.cephmon00@0(leader).auth v58457 preprocess_query auth(proto 0 34 bytes epoch 0) from client.36800361 10.128.36.18:0/2911716500
2019-01-15 08:49:01.355817 7fffee40c700 10 mon.cephmon00@0(leader).osd e1254390 check_osdmap_sub 0x65373340 next 1254102 (onetime)
2019-01-15 08:49:01.355819 7fffee40c700 5 mon.cephmon00@0(leader).osd e1254390 send_incremental [1254102..1254390] to client.36800361 10.128.36.18:0/2911716500
2019-01-15 08:49:01.355821 7fffee40c700 10 mon.cephmon00@0(leader).osd e1254390 build_incremental [1254102..1254141] with features 27018fb86aa42ada
2019-01-15 08:49:01.364859 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254141 with features 504412504116439552
2019-01-15 08:49:01.372131 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254141 1237271 bytes
2019-01-15 08:49:01.372180 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254140 with features 504412504116439552
2019-01-15 08:49:01.372187 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254140 260 bytes
2019-01-15 08:49:01.380981 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254139 with features 504412504116439552
2019-01-15 08:49:01.387983 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254139 1237351 bytes
2019-01-15 08:49:01.388043 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254138 with features 504412504116439552
2019-01-15 08:49:01.388049 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254138 232 bytes
2019-01-15 08:49:01.396781 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254137 with features 504412504116439552
... a lot more of similar messages ...
2019-01-15 08:49:04.210936 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254382 with features 504412504116439552
2019-01-15 08:49:04.211032 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254382 232 bytes
2019-01-15 08:49:04.211066 7fffee40c700 10 mon.cephmon00@0(leader) e40 ms_handle_reset 0x6450f800 10.128.36.18:0/2911716500
2019-01-15 08:49:04.211070 7fffee40c700 10 mon.cephmon00@0(leader) e40 reset/close on session client.36800361 10.128.36.18:0/2911716500
2019-01-15 08:49:04.211073 7fffee40c700 10 mon.cephmon00@0(leader) e40 remove_session 0x62dc6c00 client.36800361 10.128.36.18:0/2911716500 features 0x27018fb86aa42ada

It looks like the client disconnects (either for waiting too long, or for
some protocol error?). Any hints on why so many maps need to be re-encoded
(to jewel), or how to improve this behavior, would be much appreciated.

We would really be interested in using the kernel client instead of fuse,
but this seems to be a stumbling block.

Thanks,

Andras

On 1/3/19 6:49 AM, Andras Pataki wrote:
> [...]
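Back-of-the-envelope arithmetic from the debug log above shows why the client returns an io error: large incrementals for a cluster this size are roughly 1.2 MB each (e.g. "inc 1254141 1237271 bytes"), the mon batches up to osd_map_message_max of them into a single MOSDMap message, and that message has to fit under the kernel client's 16 MB front-segment limit. This is only a worst-case bound, since many incrementals are just a few hundred bytes:

  echo $(( 40 * 1237271 ))      # ~47 MB with the Luminous default of 40 maps per message
  echo $(( 10 * 1237271 ))      # ~12 MB with the setting lowered to 10
  echo $(( 16 * 1024 * 1024 ))  # the 16 MB kernel-side limit, for comparison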
Re: [ceph-users] cephfs kernel client instability
I wonder if anyone could offer any insight on the issue below, regarding
the CentOS 7.6 kernel cephfs client connecting to a Luminous cluster. I
have since tried a much newer 4.19.13 kernel, which did not show the same
issue (but unfortunately, for various reasons unrelated to ceph, we can't
go to such a new kernel).

Am I reading it right that the monitor thinks this kernel is old and needs
to prepare special maps in some older format for it, that this takes too
long, and that the kernel just gives up, or is there perhaps some other
communication protocol error? It seems like one of these mon communication
sessions only lasts half a second. Then it reconnects to another mon, gets
the same result, and so on. Any way around this?

Andras

On 12/26/18 7:55 PM, Andras Pataki wrote:

We've been using ceph-fuse with a pretty good stability record (against
the Luminous 12.2.8 back end). Unfortunately ceph-fuse has extremely poor
small file performance (understandably), so we've been testing the kernel
client. The latest RedHat kernel 3.10.0-957.1.3.el7.x86_64 seems to work
pretty well, as long as the cluster is running in a completely clean
state.

However, it seems that as soon as there is something happening to the
cluster, the kernel client crashes pretty badly. Today's example: I've
reweighted some OSDs to balance the disk usage a bit (set nobackfill,
reweight the OSDs, check the new hypothetical space usage, then unset
nobackfill). As soon as the reweighting procedure started, the kernel
client went into an infinite loop trying unsuccessfully to connect to
mons:

Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:28:53 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:58 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
... etc ...

seemingly never recovering. The cluster is healthy, all other clients are
successfully doing I/O:

[root@cephmon00 ceph]# ceph -s
  cluster:
    id:     d7b33135-0940-4e48-8aa6-1d2026597c2f
    health: HEALTH_WARN
            noout flag(s) set
            1 backfillfull osd(s)
            4 pool(s) backfillfull
            239119058/12419244975 objects misplaced (1.925%)

  services:
    mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
    mgr: cephmon00(active)
    mds: cephfs-1/1/1 up {0=cephmds00=up:active}, 1 up:standby
    osd: 3534 osds: 3534 up, 3534 in; 5040 remapped pgs
         flags noout
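For reference, the reweighting procedure described in the quoted message corresponds to something like the following; the OSD id and weight are placeholders, and `ceph osd df tree` is one way to inspect the resulting distribution before backfill is allowed to start:

  ceph osd set nobackfill
  ceph osd reweight 123 0.95    # placeholder id/weight; repeat per OSD being rebalanced
  ceph osd df tree              # sanity-check the projected space usage
  ceph osd unset nobackfill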