Re: [ceph-users] RGW: ERROR: failed to distribute cache
> On 6 November 2017 at 20:17, Yehuda Sadeh-Weinraub wrote:
>
> On Mon, Nov 6, 2017 at 7:29 AM, Wido den Hollander wrote:
> > Hi,
> >
> > On a Ceph Luminous (12.2.1) environment I'm seeing RGWs stall and about
> > the same time I see these errors in the RGW logs:
> >
> > 2017-11-06 15:50:24.859919 7f8f5fa1a700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.20
> > 2017-11-06 15:50:41.768881 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:X
> > 2017-11-06 15:55:15.781739 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.meta:.meta:bucket.instance:X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32:_XK5LExyX6EEIXxCD5Cws:1
> > 2017-11-06 15:55:25.784404 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32
> >
> > I see one message from a year ago:
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010531.html
> >
> > The setup has two RGWs running:
> >
> > - ceph-rgw1
> > - ceph-rgw2
> >
> > While trying to figure this out I see that a "radosgw-admin period pull"
> > hangs forever.
> >
> > I don't know if that is related, but it's something I've noticed.
> >
> > Mainly I see that at random times the RGW stalls for about 30 seconds and
> > while that happens these messages show up in the RGW's log.
>
> Do you happen to know if dynamic resharding is happening? Dynamic
> resharding should only affect writes to the specific bucket, though, and
> should not affect cache distribution. Originally I thought it could be a
> HUP-signal-related issue, but that seems to be fixed in 12.2.1.

No, it doesn't seem to be that:

$ radosgw-admin reshard list

That's empty.
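Since the cache is distributed over RADOS watch/notify on the notify objects in the control pool, one sanity check (an editor's sketch, not from the original thread; the pool name gn1-pf.rgw.control is an assumption based on the zone prefix in the logs) is to verify that every running RGW shows up as a watcher on each notify object:

```shell
# Hypothetical diagnostic sketch: list the watchers on each notify object in
# the RGW control pool. Each running radosgw instance watches all of them, so
# every RGW should appear for every notify.N; a stale or missing watcher on a
# given notify.N would line up with the failing notifies.
pool=gn1-pf.rgw.control   # assumption: <zone>.rgw.control; adjust to your setup
if command -v rados >/dev/null 2>&1; then
    for i in 0 1 2 3 4 5 6 7; do
        echo "== notify.$i =="
        rados -p "$pool" listwatchers "notify.$i"
    done
else
    echo "rados CLI not found; this sketch only runs on a Ceph node"
fi
```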
Looking at the logs I see this happening:

2017-11-07 09:45:12.147335 7f985b34f700 10 cache put: name=gn1-pf.rgw.data.root++.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9 info.flags=0x17
2017-11-07 09:45:12.147357 7f985b34f700 10 adding gn1-pf.rgw.data.root++.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9 to cache LRU end
2017-11-07 09:45:12.147364 7f985b34f700 10 updating xattr: name=user.rgw.acl bl.length()=155
2017-11-07 09:45:12.147376 7f985b34f700 10 distributing notification oid=notify.6 bl.length()=708
2017-11-07 09:45:22.148361 7f985b34f700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9
2017-11-07 09:45:22.150273 7f985b34f700 10 cache put: name=gn1-pf.rgw.meta++.meta:bucket:XXX-mon-bucket:_iaUdq4vufCpgnMlapZCm169:1 info.flags=0x17
2017-11-07 09:45:22.150283 7f985b34f700 10 adding gn1-pf.rgw.meta++.meta:bucket:XXX-mon-bucket:_iaUdq4vufCpgnMlapZCm169:1 to cache LRU end
2017-11-07 09:45:22.150291 7f985b34f700 10 distributing notification oid=notify.1 bl.length()=407
2017-11-07 09:45:31.881703 7f985b34f700 10 cache put: name=gn1-pf.rgw.data.root++XXX-mon-bucket info.flags=0x17
2017-11-07 09:45:31.881720 7f985b34f700 10 moving gn1-pf.rgw.data.root++XXX-mon-bucket to cache LRU end
2017-11-07 09:45:31.881733 7f985b34f700 10 distributing notification oid=notify.1 bl.length()=372

As you can see, for OID 'gn1-pf.rgw.data.root++.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9' the cache notify failed, but for 'gn1-pf.rgw.data.root++XXX-mon-bucket' it went just fine.

Skimming through the logs I see that notifies fail when one of these objects is used:

- notify.4
- notify.6

In total there are 8 notify objects in the 'control' pool:

- notify.0
- notify.1
- notify.2
- notify.3
- notify.4
- notify.5
- notify.6
- notify.7

I don't know if that's something which might relate to it.
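The pattern above can be pulled out of a longer debug log mechanically: pair each "distributing notification" line with a following ERROR on the same thread and count failures per notify object. A rough sketch (editor's addition; the sample lines are taken from the snippet above):

```shell
# Rough sketch: count failed cache notifies per notify.N object in an RGW
# debug log (debug rgw = 10). Sample lines are from the log snippet above.
cat > /tmp/rgw-debug.log <<'EOF'
2017-11-07 09:45:12.147376 7f985b34f700 10 distributing notification oid=notify.6 bl.length()=708
2017-11-07 09:45:22.148361 7f985b34f700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9
2017-11-07 09:45:22.150291 7f985b34f700 10 distributing notification oid=notify.1 bl.length()=407
2017-11-07 09:45:31.881733 7f985b34f700 10 distributing notification oid=notify.1 bl.length()=372
EOF
failed=$(awk '
    /distributing notification/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^oid=/) last = substr($i, 5)   # remember the notify object
    }
    /ERROR: failed to distribute cache/ {
        if (last != "") fail[last]++                  # blame the last notify oid
        last = ""
    }
    END { for (o in fail) print o, fail[o] }
' /tmp/rgw-debug.log)
echo "$failed"
```

On the sample above this prints `notify.6 1`, matching the failing notify objects called out in the mail.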
I created this issue in the tracker: http://tracker.ceph.com/issues/22060

Wido

> Yehuda
>
> > Is anybody else running into this issue?
> >
> > Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW: ERROR: failed to distribute cache
On Mon, Nov 6, 2017 at 7:29 AM, Wido den Hollander wrote:
> Hi,
>
> On a Ceph Luminous (12.2.1) environment I'm seeing RGWs stall and about
> the same time I see these errors in the RGW logs:
>
> 2017-11-06 15:50:24.859919 7f8f5fa1a700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.20
> 2017-11-06 15:50:41.768881 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:X
> 2017-11-06 15:55:15.781739 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.meta:.meta:bucket.instance:X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32:_XK5LExyX6EEIXxCD5Cws:1
> 2017-11-06 15:55:25.784404 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32
>
> I see one message from a year ago:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010531.html
>
> The setup has two RGWs running:
>
> - ceph-rgw1
> - ceph-rgw2
>
> While trying to figure this out I see that a "radosgw-admin period pull"
> hangs forever.
>
> I don't know if that is related, but it's something I've noticed.
>
> Mainly I see that at random times the RGW stalls for about 30 seconds and
> while that happens these messages show up in the RGW's log.

Do you happen to know if dynamic resharding is happening? Dynamic resharding should only affect writes to the specific bucket, though, and should not affect cache distribution. Originally I thought it could be a HUP-signal-related issue, but that seems to be fixed in 12.2.1.

Yehuda

> Is anybody else running into this issue?
>
> Wido
Re: [ceph-users] RGW: ERROR: failed to distribute cache
I see this once on both my RGWs today:

rgw01:

2017-11-06 10:36:35.070068 7f4a4f300700 0 ERROR: failed to distribute cache for default.rgw.meta:.meta:bucket.instance:XXX/YYY:ZZZ.30636654.1::0
2017-11-06 10:36:45.139068 7f4a4f300700 0 ERROR: failed to distribute cache for default.rgw.data.root:.bucket.meta.XXX:YYY:ZZZ.30636654.1

rgw02:

2017-11-06 10:38:29.606736 7f2463658700 0 ERROR: failed to distribute cache for default.rgw.meta:.meta:bucket.instance:XXX/YYY:ZZZ.30636741.1::0
2017-11-06 10:38:39.647266 7f2463658700 0 ERROR: failed to distribute cache for default.rgw.data.root:.bucket.meta.XXX:YYY:ZZZ.30636741.1

Not sure if it's a coincidence, but it is the bucket that should be dynamically reindexed for resharding, and resharding is broken (issue #22046).

With kind regards,

--
Mark Schouten | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl

From: Wido den Hollander
Sent: 6-11-2017 16:29
Subject: [ceph-users] RGW: ERROR: failed to distribute cache

Hi,

On a Ceph Luminous (12.2.1) environment I'm seeing RGWs stall and about the same time I see these errors in the RGW logs:

2017-11-06 15:50:24.859919 7f8f5fa1a700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.20
2017-11-06 15:50:41.768881 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:X
2017-11-06 15:55:15.781739 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.meta:.meta:bucket.instance:X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32:_XK5LExyX6EEIXxCD5Cws:1
2017-11-06 15:55:25.784404 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32

I see one message from a year ago:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010531.html

The setup has two RGWs running:

- ceph-rgw1
- ceph-rgw2

While trying to figure this out I see that a "radosgw-admin period pull" hangs forever.

I don't know if that is related, but it's something I've noticed.

Mainly I see that at random times the RGW stalls for about 30 seconds and while that happens these messages show up in the RGW's log.

Is anybody else running into this issue?

Wido
[ceph-users] RGW: ERROR: failed to distribute cache
Hi,

On a Ceph Luminous (12.2.1) environment I'm seeing RGWs stall and about the same time I see these errors in the RGW logs:

2017-11-06 15:50:24.859919 7f8f5fa1a700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.20
2017-11-06 15:50:41.768881 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:X
2017-11-06 15:55:15.781739 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.meta:.meta:bucket.instance:X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32:_XK5LExyX6EEIXxCD5Cws:1
2017-11-06 15:55:25.784404 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.X:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32

I see one message from a year ago:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010531.html

The setup has two RGWs running:

- ceph-rgw1
- ceph-rgw2

While trying to figure this out I see that a "radosgw-admin period pull" hangs forever.

I don't know if that is related, but it's something I've noticed.

Mainly I see that at random times the RGW stalls for about 30 seconds and while that happens these messages show up in the RGW's log.

Is anybody else running into this issue?

Wido
Re: [ceph-users] RGW: ERROR: failed to distribute cache
BTW, I have 10 RGWs load-balanced through Apache. When restarting one of them I get the following messages in the log:

2016-06-14 14:44:15.919801 7fd4728dea40 2 all 8 watchers are set, enabling cache
2016-06-14 14:44:15.919879 7fce370f7700 2 garbage collection: start
2016-06-14 14:44:15.919990 7fce368f6700 2 object expiration: start
2016-06-14 14:44:15.920534 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.15
2016-06-14 14:44:15.921257 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.16
2016-06-14 14:44:15.922145 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.17
2016-06-14 14:44:15.923772 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.18
2016-06-14 14:44:15.924557 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.19
2016-06-14 14:44:15.925400 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.20
2016-06-14 14:44:15.926349 7fd4728dea40 0 starting handler: fastcgi
2016-06-14 14:44:15.927125 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.21
2016-06-14 14:44:15.927897 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.22
2016-06-14 14:44:15.928412 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.23
2016-06-14 14:44:15.929042 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.24
2016-06-14 14:44:15.930752 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.25
2016-06-14 14:44:15.931313 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.26
2016-06-14 14:44:15.932482 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.27
2016-06-14 14:44:15.933237 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.28
2016-06-14 14:44:15.934097 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.29
2016-06-14 14:44:15.934660 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.30
2016-06-14 14:44:15.936322 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.31
2016-06-14 14:44:15.936979 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.0
2016-06-14 14:44:15.937559 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.1
2016-06-14 14:44:15.938222 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.2
2016-06-14 14:44:15.939000 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.3
2016-06-14 14:44:15.939622 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.4
2016-06-14 14:44:15.940135 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.5
2016-06-14 14:44:15.940669 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.6
2016-06-14 14:44:15.941227 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.7
2016-06-14 14:44:15.941854 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.8
2016-06-14 14:44:15.942333 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.9
2016-06-14 14:44:15.943036 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.10
2016-06-14 14:44:15.944708 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.11
2016-06-14 14:44:15.946347 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.12
2016-06-14 14:44:15.947001 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.13
2016-06-14 14:44:15.947610 7fce370f7700 0 RGWGC::process() failed to acquire lock on gc.14
2016-06-14 14:44:15.947615 7fce370f7700 2 garbage collection: stop
2016-06-14 14:44:15.947949 7fd4728dea40 -1 rgw realm watcher: Failed to watch realms.87abf44e-cab3-48c4-b012-0a9247519a5b.control with (2) No such file or directory
2016-06-14 14:44:15.948370 7fd4728dea40 -1 rgw realm watcher: Failed to establish a watch on RGWRealm, disabling dynamic reconfiguration.
2016-06-14 17:34 GMT+03:00 Василий Ангапов:
> I also get the following:
>
> $ radosgw-admin period update --commit
> 2016-06-14 14:32:28.982847 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging
> 2016-06-14 14:32:38.991846 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging.latest_epoch
> 2016-06-14 14:32:49.002380 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.3
> 2016-06-14 14:32:59.013307 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch
> 2016-06-14 14:33:09.023554 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch
> 2016-06-14 14:33:19.034593 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroup_info.bef0aa4e-6670-4c39-8520-ee51140424cc
> 2016-06-14 14:33:29.043825 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroups_names.ed
> 2016-06-14 14:33:29.046386 7fed392baa40 0 Realm notify failed with -2
> {
>     "id": "af0b6743-82ba-4517-bd51-36bdfbe48f9f",
>     "epoch": 3,
>
Re: [ceph-users] RGW: ERROR: failed to distribute cache
I also get the following:

$ radosgw-admin period update --commit
2016-06-14 14:32:28.982847 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging
2016-06-14 14:32:38.991846 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.87abf44e-cab3-48c4-b012-0a9247519a5b:staging.latest_epoch
2016-06-14 14:32:49.002380 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.3
2016-06-14 14:32:59.013307 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch
2016-06-14 14:33:09.023554 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:periods.af0b6743-82ba-4517-bd51-36bdfbe48f9f.latest_epoch
2016-06-14 14:33:19.034593 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroup_info.bef0aa4e-6670-4c39-8520-ee51140424cc
2016-06-14 14:33:29.043825 7fed392baa40 0 ERROR: failed to distribute cache for .rgw.root:zonegroups_names.ed
2016-06-14 14:33:29.046386 7fed392baa40 0 Realm notify failed with -2
{
    "id": "af0b6743-82ba-4517-bd51-36bdfbe48f9f",
    "epoch": 3,
    "predecessor_uuid": "f2645d83-b1b4-4045-bf26-2b762c71937b",
    "sync_status": [
        "",
        "",

2016-06-14 17:12 GMT+03:00 Василий Ангапов:
> Hello,
>
> I have Ceph 10.2.1 and when creating a user in RGW I get the following error:
>
> $ radosgw-admin user create --uid=test --display-name="test"
> 2016-06-14 14:07:32.332288 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.meta:.meta:user:test:_dW3fzQ3UX222SWQvr3qeHYR:1
> 2016-06-14 14:07:42.338251 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.uid:test
> 2016-06-14 14:07:52.362768 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.keys:3J7DOREPC0ZLVFTMIW75
> {
>     "user_id": "test",
>     "display_name": "test",
>     "email": "",
>     "suspended": 0,
>     "max_buckets": 1000,
>     "auid": 0,
>     "subusers": [],
>     "keys": [
>         {
>             "user": "melesta",
>             "access_key": "***",
>             "secret_key": "***"
>         }
>     ],
>     "swift_keys": [],
>     "caps": [],
>     "op_mask": "read, write, delete",
>     "default_placement": "",
>     "placement_tags": [],
>     "bucket_quota": {
>         "enabled": false,
>         "max_size_kb": -1,
>         "max_objects": -1
>     },
>     "user_quota": {
>         "enabled": false,
>         "max_size_kb": -1,
>         "max_objects": -1
>     },
>     "temp_url_keys": []
> }
>
> What does it mean? Is something wrong?
>
> Thanks!
[ceph-users] RGW: ERROR: failed to distribute cache
Hello,

I have Ceph 10.2.1 and when creating a user in RGW I get the following error:

$ radosgw-admin user create --uid=test --display-name="test"
2016-06-14 14:07:32.332288 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.meta:.meta:user:test:_dW3fzQ3UX222SWQvr3qeHYR:1
2016-06-14 14:07:42.338251 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.uid:test
2016-06-14 14:07:52.362768 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.keys:3J7DOREPC0ZLVFTMIW75
{
    "user_id": "test",
    "display_name": "test",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "auid": 0,
    "subusers": [],
    "keys": [
        {
            "user": "melesta",
            "access_key": "***",
            "secret_key": "***"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "temp_url_keys": []
}

What does it mean? Is something wrong?

Thanks!
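One detail worth noticing in the log above (an editor's observation, not from the original mail): the three ERROR timestamps are almost exactly 10 seconds apart, so each failed cache distribution appears to stall the command for about 10 seconds before giving up, which looks like a notify timeout rather than an instant failure. The gaps can be checked mechanically:

```shell
# Sketch: compute the gaps (in whole seconds) between the ERROR timestamps
# from the radosgw-admin output above. Each gap is ~10 s, suggesting a
# per-notify timeout.
cat > /tmp/rgw-errors.log <<'EOF'
2016-06-14 14:07:32.332288 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.meta:.meta:user:test:_dW3fzQ3UX222SWQvr3qeHYR:1
2016-06-14 14:07:42.338251 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.uid:test
2016-06-14 14:07:52.362768 7f00a4487a40 0 ERROR: failed to distribute cache for ed-1.rgw.users.keys:3J7DOREPC0ZLVFTMIW75
EOF
gaps=$(awk '{
    split($2, t, /[:.]/)                  # "14:07:32.332288" -> h, m, s
    s = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
    if (prev != "") print s - prev
    prev = s
}' /tmp/rgw-errors.log)
echo "$gaps"
```

On the three lines above this prints two gaps of 10 seconds each.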