Re: [ceph-users] Dynamic bucket index resharding bug? - rgw.none showing unreal number of objects
On 11/22/19 11:50 AM, David Monschein wrote:
> Hi all. Running an Object Storage cluster with Ceph Nautilus 14.2.4.
>
> We are running into what appears to be a serious bug that is affecting
> our fairly new object storage cluster. While investigating some
> performance issues -- seeing abnormally high IOPS, extremely slow bucket
> stat listings (over 3 minutes) -- we noticed some dynamic bucket
> resharding jobs running. Strangely enough they were resharding buckets
> that had very few objects. Even more worrying was the number of new
> shards Ceph was planning: 65521
>
> [root@os1 ~]# radosgw-admin reshard list
> [
>     {
>         "time": "2019-11-22 00:12:40.192886Z",
>         "tenant": "",
>         "bucket_name": "redacted",
>         "bucket_id": "c0d0b8a5-c63c-4c24-9dab-8deee88dbf0b.7000639.20",
>         "new_instance_id": "redacted:c0d0b8a5-c63c-4c24-9dab-8deee88dbf0b.7552496.28",
>         "old_num_shards": 1,
>         "new_num_shards": 65521
>     }
> ]
>
> Upon further inspection we noticed a seemingly impossible number of
> objects (18446744073709551603) in rgw.none for the same bucket:
>
> [root@os1 ~]# radosgw-admin bucket stats --bucket=redacted
> {
>     "bucket": "redacted",
>     "tenant": "",
>     "zonegroup": "dbb69c5b-b33f-4af2-950c-173d695a4d2c",
>     "placement_rule": "default-placement",
>     "explicit_placement": {
>         "data_pool": "",
>         "data_extra_pool": "",
>         "index_pool": ""
>     },
>     "id": "c0d0b8a5-c63c-4c24-9dab-8deee88dbf0b.7000639.20",
>     "marker": "c0d0b8a5-c63c-4c24-9dab-8deee88dbf0b.7000639.20",
>     "index_type": "Normal",
>     "owner": "d52cb8cc-1f92-47f5-86bf-fb28bc6b592c",
>     "ver": "0#12623",
>     "master_ver": "0#0",
>     "mtime": "2019-11-22 00:18:41.753188Z",
>     "max_marker": "0#",
>     "usage": {
>         "rgw.none": {
>             "size": 0,
>             "size_actual": 0,
>             "size_utilized": 0,
>             "size_kb": 0,
>             "size_kb_actual": 0,
>             "size_kb_utilized": 0,
>             "num_objects": 18446744073709551603
>         },
>         "rgw.main": {
>             "size": 63410030,
>             "size_actual": 63516672,
>             "size_utilized": 63410030,
>             "size_kb": 61924,
>             "size_kb_actual": 62028,
>             "size_kb_utilized": 61924,
>             "num_objects": 27
>         },
>         "rgw.multimeta": {
>             "size": 0,
>             "size_actual": 0,
>             "size_utilized": 0,
>             "size_kb": 0,
>             "size_kb_actual": 0,
>             "size_kb_utilized": 0,
>             "num_objects": 0
>         }
>     },
>     "bucket_quota": {
>         "enabled": false,
>         "check_on_raw": false,
>         "max_size": -1,
>         "max_size_kb": 0,
>         "max_objects": -1
>     }
> }
>
> It would seem that the unreal number of objects in rgw.none is driving
> the resharding process, making ceph reshard the bucket 65521 times. I am
> assuming 65521 is the limit.
>
> I have seen only a couple of references to this issue, none of which had
> a resolution or much of a conversation around them:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030791.html
> https://tracker.ceph.com/issues/37942
>
> For now we are cancelling these resharding jobs since they seem to be
> causing performance issues with the cluster, but this is an untenable
> solution. Does anyone know what is causing this? Or how to prevent
> it/fix it?

2^64 (2 to the 64th power) is 18446744073709551616, which is 13 greater
than your value of 18446744073709551603. So this likely represents the
value of -13, but displayed in an unsigned format. Obviously it should not
calculate a value of -13.

I'm guessing it's a bug: when bucket index entries that are categorized as
rgw.none are found, we're not adding to the stats, but when they're
removed they are being subtracted from the stats. Interestingly,
resharding recalculates these, so you'll likely have a much smaller value
when you're done.

It seems the operations that result in rgw.none bucket index entries are
cancelled operations and removals.

We're currently looking at how best to deal with rgw.none stats here:
https://github.com/ceph/ceph/pull/29062

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
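Eric's arithmetic can be checked directly. A small sketch in Python (the counter value is the one from the bucket stats above):

```python
# The "impossible" num_objects from the bucket stats is exactly what a
# value of -13 looks like when held in an unsigned 64-bit counter.
REPORTED = 18446744073709551603

# -13 taken modulo 2^64 reproduces the reported value.
assert (-13) % 2**64 == REPORTED
assert 2**64 - REPORTED == 13

# The same wraparound occurs when decrements outnumber increments on an
# unsigned counter, e.g. 13 more removals than additions were counted:
count = 0
adds, removals = 100, 113
count = (count + adds - removals) % 2**64
print(count)  # 18446744073709551603
```

This is why a stats-accounting bug of the kind Eric describes shows up as a near-2^64 object count rather than a small negative number.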
Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred
Good morning, Vladimir,

Please create a tracker for this
(https://tracker.ceph.com/projects/rgw/issues/new) and include the link to
it in an email reply. And if you can include any more potentially relevant
details, please do so.

I’ll add my initial analysis to it. But the threads do seem to be stuck,
at least for a while, in get_obj_data::flush despite a lack of traffic.
And sometimes it self-resolves, so it’s not a true “infinite loop”.

Thank you,

Eric

> On Aug 22, 2019, at 9:12 PM, Eric Ivancich wrote:
>
> Thank you for providing the profiling data, Vladimir. There are 5078
> threads and most of them are waiting. Here is a list of the deepest call
> of each thread with duplicates removed.
>
> + 100.00% epoll_wait
> + 100.00% get_obj_data::flush(rgw::OwningList&&)
> + 100.00% poll
> + 100.00% poll
> + 100.00% poll
> + 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
> + 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
> + 100.00% pthread_cond_wait@@GLIBC_2.3.2
> + 100.00% pthread_cond_wait@@GLIBC_2.3.2
> + 100.00% read
> + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_
>
> The only interesting ones are the second and last:
>
> * get_obj_data::flush(rgw::OwningList&&)
> * _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_
>
> They are essentially part of the same call stack that results from
> processing a GetObj request, and five threads are in this call stack
> (the only difference is whether or not they include the call into the
> boost intrusive list).
>
> Here’s the full call stack of those threads:
>
> + 100.00% clone
>   + 100.00% start_thread
>     + 100.00% worker_thread
>       + 100.00% process_new_connection
>         + 100.00% handle_request
>           + 100.00% RGWCivetWebFrontend::process(mg_connection*)
>             + 100.00% process_request(RGWRados*, RGWREST*, RGWRequest*, std::string const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)
>               + 100.00% rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)
>                 + 100.00% RGWGetObj::execute()
>                   + 100.00% RGWRados::Object::Read::iterate(long, long, RGWGetDataCB*)
>                     + 100.00% RGWRados::iterate_obj(RGWObjectCtx&, RGWBucketInfo const&, rgw_obj const&, long, long, unsigned long, int (*)(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*)
>                       + 100.00% _get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
>                         + 100.00% RGWRados::get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
>                           + 100.00% get_obj_data::flush(rgw::OwningList&&)
>                             + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_
>
> So this isn’t background processing but request processing. I’m not
> clear why these requests are consuming so much CPU for so long.
>
> From your initial message:
>> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
>> radosgw process on those machines starts consuming 100% of 5 CPU cores
>> for days at a time, even though the machine is not being used for data
>> transfers (nothing in radosgw logs, couple of KB/s of network).
>>
>> This situation can affect any number of our rados gateways, lasts from
>> few hours to few days and stops if radosgw process is restarted or on
>> its own.
>
> I’m going to check with others who’re more familiar with this code path.
>
>> Begin forwarded message:
>>
>> From: Vladimir Brik <vladimir.b...@icecube.wisc.edu>
>> Subject: Re: [ceph-users] radosgw pegging down 5 CPU cores when no data
>> is being transferred
>> Date: August 21, 2019 at 4:47:01 PM EDT
>> To: "J. Eric Ivancich" <ivanc...@redhat.com>, Mark Nelson <mnel.
[ceph-users] Fwd: radosgw pegging down 5 CPU cores when no data is being transferred
Thank you for providing the profiling data, Vladimir. There are 5078
threads and most of them are waiting. Here is a list of the deepest call
of each thread with duplicates removed.

+ 100.00% epoll_wait
+ 100.00% get_obj_data::flush(rgw::OwningList&&)
+ 100.00% poll
+ 100.00% poll
+ 100.00% poll
+ 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
+ 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
+ 100.00% pthread_cond_wait@@GLIBC_2.3.2
+ 100.00% pthread_cond_wait@@GLIBC_2.3.2
+ 100.00% read
+ 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_

The only interesting ones are the second and last:

* get_obj_data::flush(rgw::OwningList&&)
* _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_

They are essentially part of the same call stack that results from
processing a GetObj request, and five threads are in this call stack (the
only difference is whether or not they include the call into the boost
intrusive list).

Here’s the full call stack of those threads:

+ 100.00% clone
  + 100.00% start_thread
    + 100.00% worker_thread
      + 100.00% process_new_connection
        + 100.00% handle_request
          + 100.00% RGWCivetWebFrontend::process(mg_connection*)
            + 100.00% process_request(RGWRados*, RGWREST*, RGWRequest*, std::string const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)
              + 100.00% rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)
                + 100.00% RGWGetObj::execute()
                  + 100.00% RGWRados::Object::Read::iterate(long, long, RGWGetDataCB*)
                    + 100.00% RGWRados::iterate_obj(RGWObjectCtx&, RGWBucketInfo const&, rgw_obj const&, long, long, unsigned long, int (*)(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*)
                      + 100.00% _get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
                        + 100.00% RGWRados::get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
                          + 100.00% get_obj_data::flush(rgw::OwningList&&)
                            + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_

So this isn’t background processing but request processing. I’m not clear
why these requests are consuming so much CPU for so long.

From your initial message:
> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
> radosgw process on those machines starts consuming 100% of 5 CPU cores
> for days at a time, even though the machine is not being used for data
> transfers (nothing in radosgw logs, couple of KB/s of network).
>
> This situation can affect any number of our rados gateways, lasts from
> few hours to few days and stops if radosgw process is restarted or on
> its own.

I’m going to check with others who’re more familiar with this code path.

> Begin forwarded message:
>
> From: Vladimir Brik
> Subject: Re: [ceph-users] radosgw pegging down 5 CPU cores when no data
> is being transferred
> Date: August 21, 2019 at 4:47:01 PM EDT
> To: "J. Eric Ivancich" , Mark Nelson , ceph-users@lists.ceph.com
>
>> Are you running multisite?
> No
>
>> Do you have dynamic bucket resharding turned on?
> Yes. "radosgw-admin reshard list" prints "[]"
>
>> Are you using lifecycle?
> I am not sure. How can I check? "radosgw-admin lc list" says "[]"
>
>> And just to be clear -- sometimes all 3 of your rados gateways are
>> simultaneously in this state?
> Multiple, but I have not seen all 3 being in this state simultaneously.
> Currently one gateway has 1 thread using 100% of CPU, and another has 5
> threads each using 100% CPU.
>
> Here are the fruits of my attempts to capture the call graph using perf
> and gdbpmp:
> https://icecube.wisc.edu/~vbrik/perf.data
> https://icecube.wisc.edu/~vbrik/gdbpmp.data
>
> These are the commands that I ran and their outputs (note I couldn't get
> perf not to generate the warning):
> rgw-3 gdbpmp # ./gdbpmp.py -n 100 -p 73688 -o gdbpmp.data
> Attaching to proces
Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred
On 8/21/19 10:22 AM, Mark Nelson wrote:
> Hi Vladimir,
>
> On 8/21/19 8:54 AM, Vladimir Brik wrote:
>> Hello
>> [much elided]
>
> You might want to try grabbing a callgraph from perf instead of just
> running perf top or using my wallclock profiler to see if you can drill
> down and find out where in that method it's spending the most time.

I agree with Mark -- a call graph would be very helpful in tracking down
what's happening.

There are background tasks that run. Are you running multisite? Do you
have dynamic bucket resharding turned on? Are you using lifecycle? And
garbage collection is another background task.

And just to be clear -- sometimes all 3 of your rados gateways are
simultaneously in this state?

But the call graph would be incredibly helpful.

Thank you,

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
Re: [ceph-users] Adventures with large RGW buckets [EXT]
A few interleaved responses below.

On 8/1/19 10:20 AM, Matthew Vernon wrote:
> Hi,
>
> On 31/07/2019 19:02, Paul Emmerich wrote:
>
> Some interesting points here, thanks for raising them :)
>
> We've had some problems with large buckets (from around the 70M-object
> mark).
>
> One you don't mention is that multipart uploads break during resharding
> - so if our users are filling up a bucket with many writers uploading
> multipart objects, some of these will fail (rather than blocking) when
> the bucket is resharded.

Is there a tracker for that already? If not, would you mind adding one?

> We've also seen bucket deletion via radosgw-admin failing because of
> oddities in the bucket itself (e.g. missing shadow objects, omap objects
> that still exist when the related object is gone); sorting that was a
> bit fiddly (with some help from Canonical, who I think are working on
> patches).

There was a recently merged PR that addressed bucket deletion with missing
shadow objects: https://tracker.ceph.com/issues/40590

Thank you for reporting your experience w/ rgw,

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
Re: [ceph-users] Adventures with large RGW buckets
Hi Paul,

I’ve turned the following idea of yours into a tracker:
https://tracker.ceph.com/issues/41051

> 4. Common prefixes could be filtered in the rgw class on the OSD instead
> of in radosgw
>
> Consider a bucket with 100 folders with 1000 objects in each and only
> one shard:
>
> /p1/1, /p1/2, ..., /p1/1000, /p2/1, /p2/2, ..., /p2/1000, ... /p100/1000
>
> Now a user wants to list / with aggregating common prefixes on the
> delimiter / and wants up to 1000 results.
> So there'll be 100 results returned to the client: the common prefixes
> p1 to p100.
>
> How much data will be transferred between the OSDs and radosgw for this
> request? How many omap entries does the OSD scan?
>
> radosgw will ask the (single) index object to list the first 1000
> objects. It'll return 1000 objects in a quite unhelpful way: /p1/1,
> /p1/2, ..., /p1/1000
>
> radosgw will discard 999 of these and detect one common prefix and
> continue the iteration at /p1/\xFF to skip the remaining entries in
> /p1/ if there are any. The OSD will then return everything in /p2/ in
> that next request and so on.
>
> So it'll internally list every single object in that bucket. That will
> be a problem for large buckets and having lots of shards doesn't help
> either.
>
> This shouldn't be too hard to fix: add an option "aggregate prefixes" to
> the RGW class method and duplicate the fast-forward logic from radosgw
> in cls_rgw. It doesn't even need to change the response type or
> anything, it just needs to limit entries in common prefixes to one
> result. Is this a good idea or am I missing something?
>
> IO would be reduced by a factor of 100 for that particular pathological
> case. I've unfortunately seen a real-world setup that I think hits a
> case like that.

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
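Paul's cost argument can be illustrated with a small sketch in Python. The key names and the request loop are made up for illustration; `list_chunk` stands in for one round trip to the bucket index object, and the skip marker mimics radosgw continuing the iteration at prefix + '\xff':

```python
# Sketch of delimiter-based listing as described above: the "index" returns
# raw keys in lexical order, and the caller (radosgw's role here) collapses
# them into common prefixes, fast-forwarding past each prefix it detects.
import bisect

# A flat, lexically sorted index: 100 prefixes x 1000 keys each.
index = sorted(f"p{i:03d}/{j:04d}" for i in range(100) for j in range(1000))

def list_chunk(marker, count=1000):
    """Return up to `count` keys strictly after `marker` (one index request)."""
    pos = bisect.bisect_right(index, marker)
    return index[pos:pos + count]

delimiter = "/"
marker, prefixes, requests = "", [], 0
while True:
    chunk = list_chunk(marker)
    requests += 1
    if not chunk:
        break
    # Detect the common prefix of the first key, discard the rest of the
    # chunk, and skip ahead past everything under that prefix.
    prefix = chunk[0].split(delimiter)[0] + delimiter
    prefixes.append(prefix)
    marker = prefix + "\xff"

print(len(prefixes), requests)  # 100 prefixes from ~101 index requests
```

Even with the fast-forward, each round trip ships a full chunk of keys from which all but one entry is discarded, which is the factor-of-100 waste Paul describes; his proposal moves the prefix aggregation into cls_rgw on the OSD so only one entry per prefix is returned at all.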
Re: [ceph-users] Adventures with large RGW buckets
> omap entries does the OSD scan?
>
> radosgw will ask the (single) index object to list the first 1000
> objects. It'll return 1000 objects in a quite unhelpful way: /p1/1,
> /p1/2, ..., /p1/1000
>
> radosgw will discard 999 of these and detect one common prefix and
> continue the iteration at /p1/\xFF to skip the remaining entries in
> /p1/ if there are any. The OSD will then return everything in /p2/ in
> that next request and so on.
>
> So it'll internally list every single object in that bucket. That will
> be a problem for large buckets and having lots of shards doesn't help
> either.
>
> This shouldn't be too hard to fix: add an option "aggregate prefixes" to
> the RGW class method and duplicate the fast-forward logic from radosgw
> in cls_rgw. It doesn't even need to change the response type or
> anything, it just needs to limit entries in common prefixes to one
> result. Is this a good idea or am I missing something?

On the face it looks good. I’ll raise this with other RGW developers.

I do know that there was a related bug that was recently addressed with
this PR: https://github.com/ceph/ceph/pull/28192

But your suggestion seems to go farther.

> IO would be reduced by a factor of 100 for that particular pathological
> case. I've unfortunately seen a real-world setup that I think hits a
> case like that.

Thank you for sharing your experiences and your ideas.

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
Re: [ceph-users] RGW - Multisite setup -> question about Bucket - Sharding, limitations and synchronization
ilding of that sync - process)

That’s about right.

> And If I understand it correct, how would look the exact strategy in a
> multisite - setup to resync e.g. a single bucket at which one zone got
> corrupted and must be get back into a synchronous state?

Be aware that there are full syncs and incremental syncs. Full syncs just
copy every object. Incremental syncs use logs to sync selectively. Perhaps
Casey will weigh in and discuss the state transitions.

> Hope that's the correct place to ask such questions.
>
> Best Regards,
> Daly

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
Re: [ceph-users] Cannot delete bucket
> On Jun 27, 2019, at 4:53 PM, David Turner wrote:
>
> I'm still going at 452M incomplete uploads. There are guides online for
> manually deleting buckets kinda at the RADOS level that tend to leave
> data stranded. That doesn't work for what I'm trying to do so I'll keep
> going with this and wait for that PR to come through and hopefully help
> with bucket deletion.
>
> On Thu, Jun 27, 2019 at 2:58 PM Sergei Genchev <sgenc...@gmail.com> wrote:
> @David Turner
> Did your bucket delete ever finish? I am up to 35M incomplete uploads,
> and I doubt that I actually had that many upload attempts. I could be
> wrong though.
> Is there a way to force bucket deletion, even at the cost of not
> cleaning up space?

Just a quick update… The PR merged and backports are underway for
luminous, mimic, and nautilus:

http://tracker.ceph.com/issues/40526

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
Re: [ceph-users] Cannot delete bucket
On 6/24/19 1:49 PM, David Turner wrote:
> It's aborting incomplete multipart uploads that were left around. First
> it will clean up the cruft like that and then it should start actually
> deleting the objects visible in stats. That's my understanding of it
> anyway. I'm in the middle of cleaning up some buckets right now doing
> this same thing. I'm up to `WARNING : aborted 108393000 incomplete
> multipart uploads`. This bucket had a client uploading to it constantly
> with a very bad network connection.

There's a PR to better deal with this situation:
https://github.com/ceph/ceph/pull/28724

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
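For clients that want to clear such leftovers themselves rather than waiting on bucket deletion, stale multipart uploads can be aborted through the normal S3 API. A hedged sketch using boto3-style calls (`list_multipart_uploads` and `abort_multipart_upload` are standard S3 client operations; the bucket name and client construction are assumptions):

```python
# Sketch: abort all in-progress multipart uploads in a bucket via the S3
# API. Pass any boto3-style S3 client; pagination follows the standard S3
# key-marker / upload-id-marker protocol.
def abort_incomplete_uploads(s3, bucket):
    aborted = 0
    kwargs = {"Bucket": bucket}
    while True:
        resp = s3.list_multipart_uploads(**kwargs)
        for up in resp.get("Uploads", []):
            s3.abort_multipart_upload(
                Bucket=bucket, Key=up["Key"], UploadId=up["UploadId"])
            aborted += 1
        if not resp.get("IsTruncated"):
            return aborted
        # Resume listing after the last upload returned in this page.
        kwargs["KeyMarker"] = resp["NextKeyMarker"]
        kwargs["UploadIdMarker"] = resp["NextUploadIdMarker"]
```

Usage would look something like `abort_incomplete_uploads(boto3.client("s3", endpoint_url="http://rgw.example.com"), "mybucket")` (endpoint and bucket hypothetical). Note that for buckets with hundreds of millions of stale uploads, as in this thread, the server-side cleanup in the linked PR is the more practical route.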
Re: [ceph-users] Large OMAP object in RGW GC pool
Hi Wido,

Interleaving below.

On 6/11/19 3:10 AM, Wido den Hollander wrote:
>
> I thought it was resolved, but it isn't.
>
> I counted all the OMAP values for the GC objects and I got back:
>
> gc.0: 0
> gc.11: 0
> gc.14: 0
> gc.15: 0
> gc.16: 0
> gc.18: 0
> gc.19: 0
> gc.1: 0
> gc.20: 0
> gc.21: 0
> gc.22: 0
> gc.23: 0
> gc.24: 0
> gc.25: 0
> gc.27: 0
> gc.29: 0
> gc.2: 0
> gc.30: 0
> gc.3: 0
> gc.4: 0
> gc.5: 0
> gc.6: 0
> gc.7: 0
> gc.8: 0
> gc.9: 0
> gc.13: 110996
> gc.10: 04
> gc.26: 42
> gc.28: 111292
> gc.17: 111314
> gc.12: 111534
> gc.31: 111956

Casey Bodley mentioned to me that he's seen similar behavior to what
you're describing when RGWs are upgraded but not all OSDs are upgraded as
well. Is it possible that the OSDs hosting gc.13, gc.10, and so forth are
running a different version of ceph?

Eric

--
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
Re: [ceph-users] Large OMAP object in RGW GC pool
On 6/4/19 7:37 AM, Wido den Hollander wrote:
> I've set up a temporary machine next to the 13.2.5 cluster with the
> 13.2.6 packages from Shaman.
>
> On that machine I'm running:
>
> $ radosgw-admin gc process
>
> That seems to work as intended! So the PR seems to have fixed it.
>
> Should be fixed permanently when 13.2.6 is officially released.
>
> Wido

Thank you, Wido, for sharing the results of your experiment. I'm happy to
learn that it was successful. And v13.2.6 was just released about 2 hours
ago.

Eric
Re: [ceph-users] Large OMAP object in RGW GC pool
Hi Wido,

When you run `radosgw-admin gc list`, I assume you are *not* using the
"--include-all" flag, right? If you're not using that flag, then
everything listed should be expired and be ready for clean-up. If after
running `radosgw-admin gc process` the same entries appear in
`radosgw-admin gc list` then gc apparently stalled.

There were a few bugs within gc processing that could prevent it from
making forward progress. They were resolved with a PR (master:
https://github.com/ceph/ceph/pull/26601 ; mimic backport:
https://github.com/ceph/ceph/pull/27796). Unfortunately that code was
backported after the 13.2.5 release, but it is in place for the 13.2.6
release of mimic.

Eric

On 5/29/19 3:19 AM, Wido den Hollander wrote:
> Hi,
>
> I've got a Ceph cluster with this status:
>
>     health: HEALTH_WARN
>             3 large omap objects
>
> After looking into it I see that the issue comes from objects in the
> '.rgw.gc' pool.
>
> Investigating it I found that the gc.* objects have a lot of OMAP keys:
>
> for OBJ in $(rados -p .rgw.gc ls); do
>     echo $OBJ
>     rados -p .rgw.gc listomapkeys $OBJ | wc -l
> done
>
> I then found out that on average these objects have about 100k of OMAP
> keys each, but two stand out and have about 3M OMAP keys.
>
> I can list the GC with 'radosgw-admin gc list' and this yields a JSON
> which is a couple of MB in size.
>
> I ran:
>
> $ radosgw-admin gc process
>
> That runs for hours and then finishes, but the large list of OMAP keys
> stays.
>
> Running Mimic 13.2.5 on this cluster.
>
> Has anybody seen this before?
>
> Wido
Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.
Hi Manuel,

My response is interleaved below.

On 5/8/19 3:17 PM, EDH - Manuel Rios Fernandez wrote:
> Eric,
>
> Yes we do:
>
> time s3cmd ls s3://[BUCKET]/ --no-ssl and we get near 2min 30 secs for
> list the bucket.

We're adding an --allow-unordered option to `radosgw-admin bucket list`.
That would likely speed up your listing. If you want to follow the
trackers, they are:

https://tracker.ceph.com/issues/39637 [feature added to master]
https://tracker.ceph.com/issues/39730 [nautilus backport]
https://tracker.ceph.com/issues/39731 [mimic backport]
https://tracker.ceph.com/issues/39732 [luminous backport]

> If we instantly hit again the query it normally timeouts.

That's interesting. I don't have an explanation for that behavior. I
would suggest creating a tracker for the issue, ideally with the minimal
steps to reproduce the issue. My concern is that your bucket has so many
objects, and if that's related to the issue, it would not be easy to
reproduce.

> Could you explain a little more:
>
> "With respect to your earlier message in which you included the output
> of `ceph df`, I believe the reason that default.rgw.buckets.index shows
> as 0 bytes used is that the index uses the metadata branch of the
> object to store its data."

Each object in ceph has three components: the data itself plus two types
of metadata (omap and xattr). The `ceph df` command doesn't count the
metadata. The bucket indexes that track the objects in each bucket use
only the metadata. So you won't see that reported in `ceph df`.

> I read in IRC today that in Nautilus release now is well calculated and
> no show more 0B. Is it correct?

I don't know. I wasn't aware of any changes in nautilus that report
metadata in `ceph df`.

> Thanks for your response.

You're welcome,

Eric
Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.
Hi Manuel,

I’ve interleaved responses below.

> On May 8, 2019, at 3:17 PM, EDH - Manuel Rios Fernandez wrote:
>
> Eric,
>
> Yes we do:
>
> time s3cmd ls s3://[BUCKET]/ --no-ssl and we get near 2min 30 secs for
> list the bucket.
>
> If we instantly hit again the query it normally timeouts.
>
> Could you explain a little more:
>
> "With respect to your earlier message in which you included the output
> of `ceph df`, I believe the reason that default.rgw.buckets.index shows
> as 0 bytes used is that the index uses the metadata branch of the
> object to store its data."

Each object stored in ceph is composed of 3 distinct parts — the data, the
xattr metadata (older), and the omap metadata (newer). For the system
objects that manage RGW on top of ceph we often use the omap metadata. We
use this for bucket indexes and for various types of logs, for example.
`ceph df` reports only the data’s size and not the two types of metadata
sizes. So that would explain why you see 0B for the bucket index objects.

> I read in IRC today that in Nautilus release now is well calculated and
> no show more 0B. Is it correct?

I am having difficulty understanding that sentence. Would you be so kind
as to rewrite it? I don’t want to create confusion by guessing.

Eric

> Thanks for your response.
>
> -----Original Message-----
> From: J. Eric Ivancich
> Sent: Wednesday, May 8, 2019 21:00
> To: EDH - Manuel Rios Fernandez; 'Casey Bodley'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and
> marker diferent.
>
> Hi Manuel,
>
> My response is interleaved.
>
> On 5/7/19 7:32 PM, EDH - Manuel Rios Fernandez wrote:
>> Hi Eric,
>>
>> This looks like something the software developer must do, not something
>> than Storage provider must allow no?
>
> True -- so you're using `radosgw-admin bucket list --bucket=XYZ` to list
> the bucket? Currently we do not allow for a "--allow-unordered" flag,
> but there's no reason we could not. I'm working on the PR now, although
> it might take some time before it gets to v13.
>
>> Strange behavior is that sometimes bucket is list fast in less than 30
>> secs and other time it timeout after 600 secs, the bucket contains 875
>> folders with a total object number of 6Millions.
>>
>> I don’t know how a simple list of 875 folder can timeout after 600 secs
>
> Burkhard Linke's comment is on target. The "folders" are a trick using
> delimiters. A bucket is really entirely flat without a hierarchy.
>
>> We bought several NVMe Optane for do 4 partitions in each PCIe card and
>> get up 1.000.000 IOPS for Index. Quite expensive because we calc that
>> our index is just 4GB (100-200M objects), waiting those cards. Any more
>> idea?
>
> With respect to your earlier message in which you included the output of
> `ceph df`, I believe the reason that default.rgw.buckets.index shows as
> 0 bytes used is that the index uses the metadata branch of the object to
> store its data.
>
>> Regards
>
> Eric
Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.
Hi Manuel,

My response is interleaved.

On 5/7/19 7:32 PM, EDH - Manuel Rios Fernandez wrote:
> Hi Eric,
>
> This looks like something the software developer must do, not something
> than Storage provider must allow no?

True -- so you're using `radosgw-admin bucket list --bucket=XYZ` to list
the bucket? Currently we do not allow for a "--allow-unordered" flag, but
there's no reason we could not. I'm working on the PR now, although it
might take some time before it gets to v13.

> Strange behavior is that sometimes bucket is list fast in less than 30
> secs and other time it timeout after 600 secs, the bucket contains 875
> folders with a total object number of 6Millions.
>
> I don’t know how a simple list of 875 folder can timeout after 600 secs

Burkhard Linke's comment is on target. The "folders" are a trick using
delimiters. A bucket is really entirely flat without a hierarchy.

> We bought several NVMe Optane for do 4 partitions in each PCIe card and
> get up 1.000.000 IOPS for Index. Quite expensive because we calc that
> our index is just 4GB (100-200M objects), waiting those cards. Any more
> idea?

With respect to your earlier message in which you included the output of
`ceph df`, I believe the reason that default.rgw.buckets.index shows as 0
bytes used is that the index uses the metadata branch of the object to
store its data.

> Regards

Eric
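The "folders are a delimiter trick" point can be shown concretely. A sketch in Python with made-up key names, mimicking how an S3 listing with `delimiter=/` groups a flat key space into common prefixes:

```python
# A bucket's index is a flat, sorted list of keys; "folders" only appear
# when a listing request supplies a delimiter and the server groups keys
# by their common prefix up to that delimiter.
keys = [
    "docs/a.txt", "docs/b.txt",
    "logs/2019/05/07.log", "logs/2019/05/08.log",
    "readme.md",
]

def list_with_delimiter(keys, delimiter="/"):
    common_prefixes, contents = [], []
    for key in sorted(keys):
        if delimiter in key:
            prefix = key.split(delimiter)[0] + delimiter
            if prefix not in common_prefixes:
                common_prefixes.append(prefix)
        else:
            contents.append(key)
    return common_prefixes, contents

prefixes, objects = list_with_delimiter(keys)
print(prefixes)  # ['docs/', 'logs/']
print(objects)   # ['readme.md']
```

So a listing that returns "875 folders" may still require the gateway to walk a large share of the bucket's millions of index entries to compute those prefixes, which is why it can be so slow.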
Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.
On 5/7/19 11:24 AM, EDH - Manuel Rios Fernandez wrote:
> Hi Casey
>
> ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
> (stable)
>
> Reshard is something than don’t allow us customer to list index?
>
> Regards

Listing of buckets with a large number of objects is notoriously slow,
because the entries are not stored in lexical order but the default
behavior is to list the objects in lexical order. If your use case allows
for an unordered listing it would likely perform better. You can see some
documentation here under the S3 API / GET BUCKET:

http://docs.ceph.com/docs/mimic/radosgw/s3/bucketops/

Are you using S3?

Eric
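Per the bucketops documentation Eric links, the unordered listing is requested with an RGW-specific `allow-unordered` query parameter on GET bucket. A sketch of building such a request URL (the endpoint and bucket name are made up, and authentication signing is omitted):

```python
from urllib.parse import urlencode, urlunsplit

# RGW extension: 'allow-unordered=true' asks the gateway not to sort the
# listing, avoiding the expensive ordered scan over the bucket index.
# Pagination via markers still works, but results arrive in no defined order.
endpoint = "rgw.example.com"   # assumption: your RGW endpoint
bucket = "mybucket"            # assumption: your bucket name
query = urlencode({"allow-unordered": "true", "max-keys": "1000"})
url = urlunsplit(("http", endpoint, f"/{bucket}", query, ""))
print(url)  # http://rgw.example.com/mybucket?allow-unordered=true&max-keys=1000
```

Note this parameter cannot be combined with a delimiter/prefix grouping, since computing common prefixes is exactly what requires the ordered scan.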
Re: [ceph-users] How to config mclock_client queue?
So I do not think mclock_client queue works the way you’re hoping it does. For categorization purposes it joins the operation class and the client identifier, with the intent of distributing execution more evenly among clients (i.e., not favoring one client over another). However, it was not designed for distinct per-client configurations, which seems to be what you’re after. I started an effort to update librados (and the path all the way back to the OSDs) to allow per-client QoS configuration. However, I was pulled off of that for other priorities. I believe Mark Kogan is working on that as he has time. That might be closer to what you’re after. See: https://github.com/ceph/ceph/pull/20235 . Eric > On Mar 26, 2019, at 8:14 AM, Wang Chuanwen wrote: > > I am now trying to run tests to see how mclock_client queue works on mimic. > But when I tried to config tag (r,w,l) of each client, I found there are no > options to distinguish different clients. > All I got are following options for mclock_opclass, which are used to > distinguish different types of operations. > > [root@ceph-node1 ~]# ceph daemon osd.0 config show | grep mclock > "osd_op_queue": "mclock_opclass", > "osd_op_queue_mclock_client_op_lim": "100.00", > "osd_op_queue_mclock_client_op_res": "100.00", > "osd_op_queue_mclock_client_op_wgt": "500.00", > "osd_op_queue_mclock_osd_subop_lim": "0.00", > "osd_op_queue_mclock_osd_subop_res": "1000.00", > "osd_op_queue_mclock_osd_subop_wgt": "500.00", > "osd_op_queue_mclock_recov_lim": "0.001000", > "osd_op_queue_mclock_recov_res": "0.00", > "osd_op_queue_mclock_recov_wgt": "1.00", > "osd_op_queue_mclock_scrub_lim": "100.00", > "osd_op_queue_mclock_scrub_res": "100.00", > "osd_op_queue_mclock_scrub_wgt": "500.00", > "osd_op_queue_mclock_snap_lim": "0.001000", > "osd_op_queue_mclock_snap_res": "0.00", > "osd_op_queue_mclock_snap_wgt": "1.00" > > I am wondering if ceph mimic provide any configuration interfaces for > mclock_client queue?
Re: [ceph-users] Omap issues - metadata creating too many
If you can wait a few weeks until the next release of luminous there will be tooling to do this safely. Abhishek Lekshmanan of SUSE contributed the PR. It adds some sub-commands to radosgw-admin: radosgw-admin reshard stale-instances list radosgw-admin reshard stale-instances rm If you do it manually you should proceed with extreme caution as you could do some damage that you might not be able to recover from. Eric On 1/3/19 11:31 AM, Bryan Stillwell wrote: > Josef, > > > > I've noticed that when dynamic resharding is on it'll reshard some of > our bucket indices daily (sometimes more). This causes a lot of wasted > space in the .rgw.buckets.index pool which might be what you are seeing. > > > > You can get a listing of all the bucket instances in your cluster with > this command: > > > > radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort > > > > Give that a try and see if you see the same problem. It seems that once > you remove the old bucket instances the omap dbs don't reduce in size > until you compact them. > > > > Bryan > > > > *From: *Josef Zelenka > *Date: *Thursday, January 3, 2019 at 3:49 AM > *To: *"J. Eric Ivancich" > *Cc: *"ceph-users@lists.ceph.com" , Bryan > Stillwell > *Subject: *Re: [ceph-users] Omap issues - metadata creating too many > > > > Hi, i had the default - so it was on(according to ceph kb). turned it > > off, but the issue persists. i noticed Bryan Stillwell(cc-ing him) had > > the same issue (reported about it yesterday) - tried his tips about > > compacting, but it doesn't do anything, however i have to add to his > > last point, this happens even with bluestore. Is there anything we can > > do to clean up the omap manually? > > > > Josef > > > > On 18/12/2018 23:19, J. 
Eric Ivancich wrote: > > On 12/17/18 9:18 AM, Josef Zelenka wrote: > > Hi everyone, i'm running a Luminous 12.2.5 cluster with 6 hosts on > > ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs(three > > nodes have an additional SSD i added to have more space to > rebalance the > > metadata). Currently, the cluster is used mainly as a radosgw > storage, > > with 28tb data in total, replication 2x for both the metadata > and data > > pools(a cephfs instance is running alongside there, but i don't > think > > it's the perpetrator - this happened likely before we had it). All > > pools aside from the data pool of the cephfs and data pool of the > > radosgw are located on the SSD's. Now, the interesting thing - > at random > > times, the metadata OSD's fill up their entire capacity with > OMAP data > > and go to r/o mode and we have no other option currently than > deleting > > them and re-creating. The fillup comes at a random time, it > doesn't seem > > to be triggered by anything and it isn't caused by some data > influx. It > > seems like some kind of a bug to me to be honest, but i'm not > certain - > > anyone else seen this behavior with their radosgw? Thanks a lot > > Hi Josef, > > > > Do you have rgw_dynamic_resharding turned on? Try turning it off and see > > if the behavior continues. > > > > One theory is that dynamic resharding is triggered and possibly not > > completing. This could add a lot of data to omap for the incomplete > > bucket index shards. After a delay it tries resharding again, possibly > > failing again, and adding more data to the omap. This continues. > > > > If this is the ultimate issue we have some commits on the upstream > > luminous branch that are designed to address this set of issues. > > > > But we should first see if this is the cause. > > > > Eric
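Bryan's `metadata list` one-liner can be extended to count how many instances each bucket has accumulated, which makes repeat resharders easy to spot. A sketch against an invented instance list (in practice the input would come from `radosgw-admin metadata list bucket.instance | jq -r '.[]'`):

```shell
# Count bucket.instance entries per bucket; a bucket resharded daily
# accumulates one leftover entry per reshard. The names below are made up
# for illustration.
counts=$(cut -d : -f 1 <<'EOF' | sort | uniq -c | sort -rn
bucketA:default.188839135.327804
bucketA:default.617828918.2898
bucketA:default.617828918.4
bucketB:default.928133068.2899
EOF
)
echo "$counts"
```

Buckets at the top of the output with large counts are the ones being resharded repeatedly, and their stale instances are the omap space that compaction later reclaims.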
Re: [ceph-users] Removing orphaned radosgw bucket indexes from pool
On 11/29/18 6:58 PM, Bryan Stillwell wrote: > Wido, > > I've been looking into this large omap objects problem on a couple of our > clusters today and came across your script during my research. > > The script has been running for a few hours now and I'm already over 100,000 > 'orphaned' objects! > > It appears that ever since upgrading to Luminous (12.2.5 initially, followed > by 12.2.8) this cluster has been resharding the large bucket indexes at least > once a day and not cleaning up the previous bucket indexes: > > for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | > grep go-test-dashboard); do > mtime=$(radosgw-admin metadata get bucket.instance:${instance} | grep mtime) > num_shards=$(radosgw-admin metadata get bucket.instance:${instance} | grep > num_shards) > echo "${instance}: ${mtime} ${num_shards}" > done | column -t | sort -k3 > go-test-dashboard:default.188839135.327804: "mtime": "2018-06-01 > 22:35:28.693095Z", "num_shards": 0, > go-test-dashboard:default.617828918.2898:"mtime": "2018-06-02 > 22:35:40.438738Z", "num_shards": 46, > go-test-dashboard:default.617828918.4: "mtime": "2018-06-02 > 22:38:21.537259Z", "num_shards": 46, > go-test-dashboard:default.617663016.10499: "mtime": "2018-06-03 > 23:00:04.185285Z", "num_shards": 46, > [...snip...] > go-test-dashboard:default.891941432.342061: "mtime": "2018-11-28 > 01:41:46.777968Z", "num_shards": 7, > go-test-dashboard:default.928133068.2899:"mtime": "2018-11-28 > 20:01:49.390237Z", "num_shards": 46, > go-test-dashboard:default.928133068.5115:"mtime": "2018-11-29 > 01:54:17.788355Z", "num_shards": 7, > go-test-dashboard:default.928133068.8054:"mtime": "2018-11-29 > 20:21:53.733824Z", "num_shards": 7, > go-test-dashboard:default.891941432.359004: "mtime": "2018-11-29 > 20:22:09.201965Z", "num_shards": 46, > > The num_shards is typically around 46, but looking at all 288 instances of > that bucket index, it has varied between 3 and 62 shards. 
> > Have you figured anything more out about this since you posted this > originally two weeks ago? > > Thanks, > Bryan > > From: ceph-users on behalf of Wido den > Hollander > Date: Thursday, November 15, 2018 at 5:43 AM > To: Ceph Users > Subject: [ceph-users] Removing orphaned radosgw bucket indexes from pool > > Hi, > > Recently we've seen multiple messages on the mailinglists about people > seeing HEALTH_WARN due to large OMAP objects on their cluster. This is > due to the fact that starting with 12.2.6 OSDs warn about this. > > I've got multiple people asking me the same questions and I've done some > digging around. > > Somebody on the ML wrote this script: > > for bucket in `radosgw-admin metadata list bucket | jq -r '.[]' | sort`; do > actual_id=`radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id'` > for instance in `radosgw-admin metadata list bucket.instance | jq -r > '.[]' | grep ${bucket}: | cut -d ':' -f 2` > do > if [ "$actual_id" != "$instance" ] > then > radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance} > radosgw-admin metadata rm bucket.instance:${bucket}:${instance} > fi > done > done > > That partially works, but it does not catch 'orphaned' objects in the index pool. > > So I wrote my own script [0]: > > #!/bin/bash > INDEX_POOL=$1 > > if [ -z "$INDEX_POOL" ]; then > echo "Usage: $0 <index-pool>" > exit 1 > fi > > INDEXES=$(mktemp) > METADATA=$(mktemp) > > trap "rm -f ${INDEXES} ${METADATA}" EXIT > > radosgw-admin metadata list bucket.instance|jq -r '.[]' > ${METADATA} > rados -p ${INDEX_POOL} ls > $INDEXES > > for OBJECT in $(cat ${INDEXES}); do > MARKER=$(echo ${OBJECT}|cut -d '.' -f 3,4,5) > grep ${MARKER} ${METADATA} > /dev/null > if [ "$?" 
-ne 0 ]; then > echo $OBJECT > fi > done > > It does not remove anything, but for example, it returns these objects: > > .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752 > .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162 > .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186 > > The output of: > > $ radosgw-admin metadata list|jq -r '.[]' > > Does not contain: > - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752 > - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162 > - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186 > > So for me these objects do not seem to be tied to any bucket and seem to > be leftovers which were not cleaned up. > > For example, I see these objects tied to a bucket: > > - b32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160 > - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6188 > - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6167 > > But notice the difference: 6160, 6188, 6167, but not 6162 nor 6186 > > Before I remove these objects I want to verify with other users if they > see the same and if my thinking is correct. > > Wido > > [0]: https://gist.github.com/wido/6650e66b09770ef02df8963
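Wido's marker-matching loop can be exercised locally with canned inputs to confirm what it flags. The sketch below reuses the redacted marker from the thread purely for illustration; the bucket name `bucket1` is invented:

```shell
# Simulate the script's core check: an index object ".dir.<marker>" whose
# marker appears in no bucket.instance metadata entry is flagged as a
# candidate orphan. Inputs are canned; nothing touches a cluster.
indexes=$(mktemp)
metadata=$(mktemp)
cat > "$indexes" <<'EOF'
.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160
.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
EOF
# The metadata listing only knows about instance ...6160.
echo 'bucket1:eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160' > "$metadata"
orphans=""
while read -r obj; do
  # fields 3,4,5 of ".dir.<zone-id>.<instance>.<n>" reassemble the marker
  marker=$(echo "$obj" | cut -d '.' -f 3,4,5)
  grep -q "$marker" "$metadata" || orphans="$orphans $obj"
done < "$indexes"
echo "orphaned:$orphans"   # only the ...6162 object is flagged
rm -f "$indexes" "$metadata"
```

Running it this way before pointing the real script at a pool is a cheap sanity check that the marker extraction matches your object naming.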
Re: [ceph-users] Omap issues - metadata creating too many
On 12/17/18 9:18 AM, Josef Zelenka wrote: > Hi everyone, i'm running a Luminous 12.2.5 cluster with 6 hosts on > ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs(three > nodes have an additional SSD i added to have more space to rebalance the > metadata). Currently, the cluster is used mainly as a radosgw storage, > with 28tb data in total, replication 2x for both the metadata and data > pools(a cephfs instance is running alongside there, but i don't think > it's the perpetrator - this happened likely before we had it). All > pools aside from the data pool of the cephfs and data pool of the > radosgw are located on the SSD's. Now, the interesting thing - at random > times, the metadata OSD's fill up their entire capacity with OMAP data > and go to r/o mode and we have no other option currently than deleting > them and re-creating. The fillup comes at a random time, it doesn't seem > to be triggered by anything and it isn't caused by some data influx. It > seems like some kind of a bug to me to be honest, but i'm not certain - > anyone else seen this behavior with their radosgw? Thanks a lot Hi Josef, Do you have rgw_dynamic_resharding turned on? Try turning it off and see if the behavior continues. One theory is that dynamic resharding is triggered and possibly not completing. This could add a lot of data to omap for the incomplete bucket index shards. After a delay it tries resharding again, possibly failing again, and adding more data to the omap. This continues. If this is the ultimate issue we have some commits on the upstream luminous branch that are designed to address this set of issues. But we should first see if this is the cause. Eric
Re: [ceph-users] inexplicably slow bucket listing at top level
I did make an inquiry and someone here does have some experience w/ the mc command -- minio client. We're curious how "ls -r" is implemented under mc. Does it need to get a full listing and then do some path parsing to produce nice output? If so, it may be playing a role in the delay as well. Eric On 9/26/18 5:27 PM, Graham Allan wrote: > I have one user bucket, where inexplicably (to me), the bucket takes an > eternity to list, though only on the top level. There are two > subfolders, each of which lists individually at a completely normal > speed... > > eg (using minio client): > >> [~] % time ./mc ls fried/friedlab/ >> [2018-09-26 16:15:48 CDT] 0B impute/ >> [2018-09-26 16:15:48 CDT] 0B wgs/ >> >> real 1m59.390s >> >> [~] % time ./mc ls -r fried/friedlab/ >> ... >> real 3m18.013s >> >> [~] % time ./mc ls -r fried/friedlab/impute >> ... >> real 0m13.512s >> >> [~] % time ./mc ls -r fried/friedlab/wgs >> ... >> real 0m6.437s > > The bucket has about 55k objects total, with 32 index shards on a > replicated ssd pool. It shouldn't be taking this long but I can't > imagine what could be causing this. I haven't found any others behaving > this way. I'd think it has to be some problem with the bucket index, but > what...? > > I did naively try some "radosgw-admin bucket check [--fix]" commands > with no change. > > Graham
Re: [ceph-users] inexplicably slow bucket listing at top level
The numbers you're reporting strike me as surprising as well. Which version are you running? In case you're not aware, listing of buckets is not a very efficient operation given that the listing is required to return with objects in lexical order. They are distributed across the shards via a hash, which is not in lexical order. So every shard has to have a chunk read and brought to the rgw and the top elements are sorted and returned. For example, in order to return the first 1000 object names, it asks each of the 32 shards for their first 1000 object names, and then does a selection process to get the first 1000 among the 32000. It returns that, and the process is then repeated. I'm unfamiliar with your mc script/command, so I don't know if that might be contributing to the issue. We have added the ability to list buckets in unsorted order and made that accessible via s3 and swift and that's been backported all the way to upstream luminous. Eric On 9/26/18 5:27 PM, Graham Allan wrote: > I have one user bucket, where inexplicably (to me), the bucket takes an > eternity to list, though only on the top level. There are two > subfolders, each of which lists individually at a completely normal > speed... > > eg (using minio client): > > > [~] % time ./mc ls fried/friedlab/ > > [2018-09-26 16:15:48 CDT] 0B impute/ > > [2018-09-26 16:15:48 CDT] 0B wgs/ > > > > real 1m59.390s > > > > [~] % time ./mc ls -r fried/friedlab/ > > ... > > real 3m18.013s > > > > [~] % time ./mc ls -r fried/friedlab/impute > > ... > > real 0m13.512s > > > > [~] % time ./mc ls -r fried/friedlab/wgs > > ... > > real 0m6.437s > > The bucket has about 55k objects total, with 32 index shards on a > replicated ssd pool. It shouldn't be taking this long but I can't > imagine what could be causing this. I haven't found any others behaving > this way. I'd think it has to be some problem with the bucket index, but > what...? 
> > I did naively try some "radosgw-admin bucket check [--fix]" commands > with no change. > > Graham
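Eric's per-shard merge description in the previous message can be sketched with plain shell tools: each shard sorts its own chunk, then a k-way merge yields the global lexical head. The object names and 4-shard split below are invented for the demo:

```shell
# Simulate 4 index shards; hashing spreads names across shards, so each
# shard holds an arbitrary, non-contiguous subset of the namespace.
dir=$(mktemp -d)
printf '%s\n' obj-07 obj-03 obj-11 obj-01 obj-09 obj-05 obj-02 obj-10 |
  awk -v d="$dir" '{ print > (d "/shard" NR % 4) }'
# Step 1: each shard returns its own chunk in lexical order.
for f in "$dir"/shard*; do sort -o "$f" "$f"; done
# Step 2: merge the already-sorted chunks and keep only the head of the
# combined listing -- analogous to fetching the first N names from each of
# the 32 shards and returning the first N overall.
first=$(sort -m "$dir"/shard* | head -n 3 | xargs)
echo "$first"   # -> obj-01 obj-02 obj-03
rm -rf "$dir"
```

The merge itself is cheap; the cost Eric describes comes from having to read a chunk from every shard for every page of results, which repeats as the client walks the listing.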