Re: [ceph-users] Large OMAP object in RGW GC pool
On 6/11/19 9:48 PM, J. Eric Ivancich wrote:
> Hi Wido,
>
> Interleaving below
>
> On 6/11/19 3:10 AM, Wido den Hollander wrote:
>>
>> I thought it was resolved, but it isn't.
>>
>> I counted all the OMAP values for the GC objects and I got back:
>>
>> gc.0: 0
>> gc.11: 0
>> gc.14: 0
>> gc.15: 0
>> gc.16: 0
>> gc.18: 0
>> gc.19: 0
>> gc.1: 0
>> gc.20: 0
>> gc.21: 0
>> gc.22: 0
>> gc.23: 0
>> gc.24: 0
>> gc.25: 0
>> gc.27: 0
>> gc.29: 0
>> gc.2: 0
>> gc.30: 0
>> gc.3: 0
>> gc.4: 0
>> gc.5: 0
>> gc.6: 0
>> gc.7: 0
>> gc.8: 0
>> gc.9: 0
>> gc.13: 110996
>> gc.10: 04
>> gc.26: 42
>> gc.28: 111292
>> gc.17: 111314
>> gc.12: 111534
>> gc.31: 111956
>
> Casey Bodley mentioned to me that he's seen similar behavior to what
> you're describing when RGWs are upgraded but not all OSDs are upgraded
> as well. Is it possible that the OSDs hosting gc.13, gc.10, and so forth
> are running a different version of ceph?
>

Yes, the OSDs are still on 13.2.5. As this is a big (2500 OSD) production
environment, we only created a temporary machine with 13.2.6 (just a few
hours before its release) to run the GC. We did not upgrade the cluster
itself, as that will have to wait until we have validated the release on
the testing cluster.

Wido

> Eric
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Large OMAP object in RGW GC pool
Hi Wido,

Interleaving below

On 6/11/19 3:10 AM, Wido den Hollander wrote:
>
> I thought it was resolved, but it isn't.
>
> I counted all the OMAP values for the GC objects and I got back:
>
> gc.0: 0
> gc.11: 0
> gc.14: 0
> gc.15: 0
> gc.16: 0
> gc.18: 0
> gc.19: 0
> gc.1: 0
> gc.20: 0
> gc.21: 0
> gc.22: 0
> gc.23: 0
> gc.24: 0
> gc.25: 0
> gc.27: 0
> gc.29: 0
> gc.2: 0
> gc.30: 0
> gc.3: 0
> gc.4: 0
> gc.5: 0
> gc.6: 0
> gc.7: 0
> gc.8: 0
> gc.9: 0
> gc.13: 110996
> gc.10: 04
> gc.26: 42
> gc.28: 111292
> gc.17: 111314
> gc.12: 111534
> gc.31: 111956

Casey Bodley mentioned to me that he's seen similar behavior to what
you're describing when RGWs are upgraded but not all OSDs are upgraded
as well. Is it possible that the OSDs hosting gc.13, gc.10, and so forth
are running a different version of ceph?

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
Re: [ceph-users] Large OMAP object in RGW GC pool
On 6/4/19 8:00 PM, J. Eric Ivancich wrote:
> On 6/4/19 7:37 AM, Wido den Hollander wrote:
>> I've set up a temporary machine next to the 13.2.5 cluster with the
>> 13.2.6 packages from Shaman.
>>
>> On that machine I'm running:
>>
>> $ radosgw-admin gc process
>>
>> That seems to work as intended! So the PR seems to have fixed it.
>>
>> Should be fixed permanently when 13.2.6 is officially released.
>>
>> Wido
>
> Thank you, Wido, for sharing the results of your experiment. I'm happy
> to learn that it was successful. And v13.2.6 was just released about 2
> hours ago.
>

I thought it was resolved, but it isn't.

I counted all the OMAP values for the GC objects and I got back:

gc.0: 0
gc.11: 0
gc.14: 0
gc.15: 0
gc.16: 0
gc.18: 0
gc.19: 0
gc.1: 0
gc.20: 0
gc.21: 0
gc.22: 0
gc.23: 0
gc.24: 0
gc.25: 0
gc.27: 0
gc.29: 0
gc.2: 0
gc.30: 0
gc.3: 0
gc.4: 0
gc.5: 0
gc.6: 0
gc.7: 0
gc.8: 0
gc.9: 0
gc.13: 110996
gc.10: 04
gc.26: 42
gc.28: 111292
gc.17: 111314
gc.12: 111534
gc.31: 111956

So as you can see, a few remain.

I ran:

$ radosgw-admin gc process --debug-rados=10

That finishes within 10 seconds.

Then I tried:

$ radosgw-admin gc process --debug-rados=10 --include-all

That also finishes within 10 seconds.

What I noticed in the logs was this:

2019-06-11 09:06:58.711 7f8ffb876240 10 librados: call oid=gc.17 nspace=
2019-06-11 09:06:58.717 7f8ffb876240 10 librados: Objecter returned from call r=-16

The return value is '-16' for gc.17, whereas for gc.18 or any other
object with 0 OMAP values it is:

2019-06-11 09:06:58.717 7f8ffb876240 10 librados: call oid=gc.18 nspace=
2019-06-11 09:06:58.720 7f8ffb876240 10 librados: Objecter returned from call r=0

So I set --debug-rgw=10 and saw:

RGWGC::process failed to acquire lock on gc.17

I haven't tried stopping all the RGWs yet as that will impact the
services, but might that be the root cause here?

Wido

> Eric
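The r=-16 in the librados debug output above is just a negated errno value. A quick sanity check (plain Python, no cluster needed) shows it maps to EBUSY, which is consistent with the "failed to acquire lock" message: another client, presumably one of the running radosgw processes, still holds the advisory lock on that GC shard.

```python
import errno
import os

# librados surfaces errors as negative errno values; the cls call on
# gc.17 returned r=-16, while the idle shards returned r=0.
r = -16

# -16 corresponds to EBUSY ("Device or resource busy"), i.e. the
# advisory lock on the shard is held by someone else.
assert -r == errno.EBUSY
print(os.strerror(-r))  # -> Device or resource busy
```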
Re: [ceph-users] Large OMAP object in RGW GC pool
On 6/4/19 7:37 AM, Wido den Hollander wrote:
> I've set up a temporary machine next to the 13.2.5 cluster with the
> 13.2.6 packages from Shaman.
>
> On that machine I'm running:
>
> $ radosgw-admin gc process
>
> That seems to work as intended! So the PR seems to have fixed it.
>
> Should be fixed permanently when 13.2.6 is officially released.
>
> Wido

Thank you, Wido, for sharing the results of your experiment. I'm happy
to learn that it was successful. And v13.2.6 was just released about 2
hours ago.

Eric
Re: [ceph-users] Large OMAP object in RGW GC pool
On 5/30/19 2:45 PM, Wido den Hollander wrote:
>
>
> On 5/29/19 11:22 PM, J. Eric Ivancich wrote:
>> Hi Wido,
>>
>> When you run `radosgw-admin gc list`, I assume you are *not* using the
>> "--include-all" flag, right? If you're not using that flag, then
>> everything listed should be expired and be ready for clean-up. If after
>> running `radosgw-admin gc process` the same entries appear in
>> `radosgw-admin gc list` then gc apparently stalled.
>>
>
> Not using the --include-all in both cases.
>
> GC seems to stall and doesn't do anything when looking at it with
> --debug-rados=10
>
>> There were a few bugs within gc processing that could prevent it from
>> making forward progress. They were resolved with a PR (master:
>> https://github.com/ceph/ceph/pull/26601 ; mimic backport:
>> https://github.com/ceph/ceph/pull/27796). Unfortunately that code was
>> backported after the 13.2.5 release, but it is in place for the 13.2.6
>> release of mimic.
>>
>
> Thanks! I might grab some packages from Shaman to give GC a try.
>

I've set up a temporary machine next to the 13.2.5 cluster with the
13.2.6 packages from Shaman.

On that machine I'm running:

$ radosgw-admin gc process

That seems to work as intended! So the PR seems to have fixed it.

Should be fixed permanently when 13.2.6 is officially released.

Wido

> Wido
>
>> Eric
>>
>>
>> On 5/29/19 3:19 AM, Wido den Hollander wrote:
>>> Hi,
>>>
>>> I've got a Ceph cluster with this status:
>>>
>>>     health: HEALTH_WARN
>>>             3 large omap objects
>>>
>>> After looking into it I see that the issue comes from objects in the
>>> '.rgw.gc' pool.
>>>
>>> Investigating it I found that the gc.* objects have a lot of OMAP keys:
>>>
>>> for OBJ in $(rados -p .rgw.gc ls); do
>>>   echo $OBJ
>>>   rados -p .rgw.gc listomapkeys $OBJ | wc -l
>>> done
>>>
>>> I then found out that on average these objects have about 100k OMAP
>>> keys each, but two stand out and have about 3M OMAP keys.
>>>
>>> I can list the GC with 'radosgw-admin gc list' and this yields a JSON
>>> which is a couple of MB in size.
>>>
>>> I ran:
>>>
>>> $ radosgw-admin gc process
>>>
>>> That runs for hours and then finishes, but the large list of OMAP keys
>>> stays.
>>>
>>> Running Mimic 13.2.5 on this cluster.
>>>
>>> Has anybody seen this before?
>>>
>>> Wido
Re: [ceph-users] Large OMAP object in RGW GC pool
On 5/29/19 11:22 PM, J. Eric Ivancich wrote:
> Hi Wido,
>
> When you run `radosgw-admin gc list`, I assume you are *not* using the
> "--include-all" flag, right? If you're not using that flag, then
> everything listed should be expired and be ready for clean-up. If after
> running `radosgw-admin gc process` the same entries appear in
> `radosgw-admin gc list` then gc apparently stalled.
>

Not using the --include-all in both cases.

GC seems to stall and doesn't do anything when looking at it with
--debug-rados=10

> There were a few bugs within gc processing that could prevent it from
> making forward progress. They were resolved with a PR (master:
> https://github.com/ceph/ceph/pull/26601 ; mimic backport:
> https://github.com/ceph/ceph/pull/27796). Unfortunately that code was
> backported after the 13.2.5 release, but it is in place for the 13.2.6
> release of mimic.
>

Thanks! I might grab some packages from Shaman to give GC a try.

Wido

> Eric
>
>
> On 5/29/19 3:19 AM, Wido den Hollander wrote:
>> Hi,
>>
>> I've got a Ceph cluster with this status:
>>
>>     health: HEALTH_WARN
>>             3 large omap objects
>>
>> After looking into it I see that the issue comes from objects in the
>> '.rgw.gc' pool.
>>
>> Investigating it I found that the gc.* objects have a lot of OMAP keys:
>>
>> for OBJ in $(rados -p .rgw.gc ls); do
>>   echo $OBJ
>>   rados -p .rgw.gc listomapkeys $OBJ | wc -l
>> done
>>
>> I then found out that on average these objects have about 100k OMAP
>> keys each, but two stand out and have about 3M OMAP keys.
>>
>> I can list the GC with 'radosgw-admin gc list' and this yields a JSON
>> which is a couple of MB in size.
>>
>> I ran:
>>
>> $ radosgw-admin gc process
>>
>> That runs for hours and then finishes, but the large list of OMAP keys
>> stays.
>>
>> Running Mimic 13.2.5 on this cluster.
>>
>> Has anybody seen this before?
>>
>> Wido
Re: [ceph-users] Large OMAP object in RGW GC pool
Hi Wido,

When you run `radosgw-admin gc list`, I assume you are *not* using the
"--include-all" flag, right? If you're not using that flag, then
everything listed should be expired and be ready for clean-up. If after
running `radosgw-admin gc process` the same entries appear in
`radosgw-admin gc list` then gc apparently stalled.

There were a few bugs within gc processing that could prevent it from
making forward progress. They were resolved with a PR (master:
https://github.com/ceph/ceph/pull/26601 ; mimic backport:
https://github.com/ceph/ceph/pull/27796). Unfortunately that code was
backported after the 13.2.5 release, but it is in place for the 13.2.6
release of mimic.

Eric

On 5/29/19 3:19 AM, Wido den Hollander wrote:
> Hi,
>
> I've got a Ceph cluster with this status:
>
>     health: HEALTH_WARN
>             3 large omap objects
>
> After looking into it I see that the issue comes from objects in the
> '.rgw.gc' pool.
>
> Investigating it I found that the gc.* objects have a lot of OMAP keys:
>
> for OBJ in $(rados -p .rgw.gc ls); do
>   echo $OBJ
>   rados -p .rgw.gc listomapkeys $OBJ | wc -l
> done
>
> I then found out that on average these objects have about 100k OMAP
> keys each, but two stand out and have about 3M OMAP keys.
>
> I can list the GC with 'radosgw-admin gc list' and this yields a JSON
> which is a couple of MB in size.
>
> I ran:
>
> $ radosgw-admin gc process
>
> That runs for hours and then finishes, but the large list of OMAP keys
> stays.
>
> Running Mimic 13.2.5 on this cluster.
>
> Has anybody seen this before?
>
> Wido
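For what it's worth, the per-shard counts produced by a listomapkeys loop like the one quoted above are easy to post-process. The sketch below is only illustrative: the threshold value is an assumption modeled on the OSD's osd_deep_scrub_large_omap_object_key_threshold option (its default has changed between releases, so check your cluster's setting).

```python
# Illustrative sketch: flag GC shards whose omap key count exceeds a
# threshold. The default threshold here is an assumption; compare it
# against osd_deep_scrub_large_omap_object_key_threshold on your cluster.
def large_omap_shards(counts, threshold=2_000_000):
    """Return the names of shards with at least `threshold` omap keys."""
    return sorted(name for name, n in counts.items() if n >= threshold)

# Example input, shaped like the output of the listomapkeys loop:
# most shards around 100k keys, two outliers around 3M.
counts = {"gc.0": 100_000, "gc.17": 3_000_000, "gc.12": 3_100_000}
print(large_omap_shards(counts))  # -> ['gc.12', 'gc.17']
```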