Re: [ceph-users] rocksdb mon stores growing until restart
On Wed, Sep 19, 2018 at 7:01 PM Bryan Stillwell wrote:
>
> > On 08/30/2018 11:00 AM, Joao Eduardo Luis wrote:
> > > On 08/30/2018 09:28 AM, Dan van der Ster wrote:
> > > Hi,
> > >
> > > Is anyone else seeing rocksdb mon stores slowly growing to >15GB,
> > > eventually triggering the 'mon is using a lot of disk space' warning?
> > >
> > > Since upgrading to luminous, we've seen this happen at least twice.
> > > Each time, we restart all the mons and then the stores slowly trim down
> > > to <500MB. We have 'mon compact on start = true', but it's not the
> > > compaction that's shrinking the rocksdbs -- the space used seems to
> > > decrease over a few minutes only after *all* mons have been restarted.
> > >
> > > This reminds me of a hammer-era issue where references to trimmed maps
> > > were leaking -- I can't find that bug at the moment, though.
> >
> > Next time this happens, mind listing the store contents and checking
> > whether you are holding way too many osdmaps? You shouldn't be holding
> > more osdmaps than the default IF the cluster is healthy and all the pgs
> > are clean.
> >
> > I chased a bug pertaining to this last year, and even got a patch, but
> > then was unable to reproduce it. I didn't pursue merging the patch any
> > further (I think I may still have an open PR for it, though), simply
> > because it was no longer clear whether it was needed.
>
> I just had this happen to me while using ceph-gentle-split on a 12.2.5
> cluster with 1,370 OSDs. Unfortunately, I restarted the mon nodes, which
> fixed the problem, before finding this thread. I'm only halfway done with
> the split, so I'll see if the problem resurfaces.

I think I've understood what's causing this -- it's related to the issue
we've seen where osdmaps are not being trimmed on the OSDs. It seems that
once oldest_map and newest_map are within 500 of each other, they are never
trimmed again until the mons are restarted.
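The 500-epoch behaviour described above matches the monitor's default
`mon_min_osdmap_epochs = 500`: the mon keeps at least that many epochs, so
trimming has nothing to do once the window between oldest_map and newest_map
closes to 500. A rough illustrative sketch of that condition (not the actual
monitor code):

```python
# Illustrative sketch of the osdmap trim floor described above -- the mon
# retains at least MON_MIN_OSDMAP_EPOCHS maps, so once newest - oldest
# is within that window there is nothing left to trim.
MON_MIN_OSDMAP_EPOCHS = 500  # default for mon_min_osdmap_epochs

def trim_target(oldest_map: int, newest_map: int) -> int:
    """First epoch that must be kept; everything below it may be trimmed."""
    floor = newest_map - MON_MIN_OSDMAP_EPOCHS
    return max(oldest_map, floor)

def maps_to_trim(oldest_map: int, newest_map: int) -> int:
    """How many epochs are currently eligible for trimming."""
    return trim_target(oldest_map, newest_map) - oldest_map

# A cluster far behind on trimming has plenty of work:
assert maps_to_trim(1000, 5000) == 3500
# Once the window is within 500 epochs, nothing trims:
assert maps_to_trim(4600, 5000) == 0
```

This is only the healthy-path arithmetic; the bug in the tracker is about the
trim never being triggered at all once the window closes.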
I updated this tracker: http://tracker.ceph.com/issues/37875

-- dan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rocksdb mon stores growing until restart
On 8/30/18 10:28 AM, Dan van der Ster wrote:
> Hi,
>
> Is anyone else seeing rocksdb mon stores slowly growing to >15GB,
> eventually triggering the 'mon is using a lot of disk space' warning?
>
> Since upgrading to luminous, we've seen this happen at least twice.
> Each time, we restart all the mons and then the stores slowly trim down
> to <500MB. We have 'mon compact on start = true', but it's not the
> compaction that's shrinking the rocksdbs -- the space used seems to
> decrease over a few minutes only after *all* mons have been restarted.
>
> This reminds me of a hammer-era issue where references to trimmed maps
> were leaking -- I can't find that bug at the moment, though.

I just saw your message in the other thread and thought I'd reply here.

I have seen this recently as well with Luminous 12.2.8 after a large
migration. The cluster grew from ~2000 OSDs to ~2500, and the rebalance
took about 4 days. Afterwards all the mons were 15~16GB in size and were
issuing the warning.

I stopped the mons, compacted their stores using ceph-monstore-tool, and
started them again; that worked. I'm usually cautious about doing an
online compaction, as it sometimes hurts mon performance.

I'm not sure yet why this is happening, as the mons should compact during
normal operations.

Wido

> Cheers, Dan
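For reference, the offline compaction Wido describes comes down to a
stop/compact/start cycle per mon. The store path, systemd unit names, and the
use of `ceph-kvstore-tool rocksdb <store.db> compact` (an equivalent to the
ceph-monstore-tool compaction he mentions) are assumptions for a typical
Luminous systemd deployment; this dry-run helper only builds the command list:

```python
# Hypothetical helper sketching the per-mon offline compaction sequence.
# Paths and unit names are assumptions for a stock systemd deployment;
# with dry_run=True nothing is executed, the command list is just returned.
import subprocess

def compact_mon_store(mon_id: str, dry_run: bool = True):
    store = f"/var/lib/ceph/mon/ceph-{mon_id}/store.db"
    cmds = [
        ["systemctl", "stop", f"ceph-mon@{mon_id}"],
        ["ceph-kvstore-tool", "rocksdb", store, "compact"],
        ["systemctl", "start", f"ceph-mon@{mon_id}"],
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.check_call(cmd)
    return cmds

cmds = compact_mon_store("mon01")
assert cmds[1] == ["ceph-kvstore-tool", "rocksdb",
                   "/var/lib/ceph/mon/ceph-mon01/store.db", "compact"]
```

Doing this one mon at a time keeps quorum while each store is compacted, which
avoids the online-compaction performance hit Wido mentions.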
Re: [ceph-users] rocksdb mon stores growing until restart
> On 08/30/2018 11:00 AM, Joao Eduardo Luis wrote:
> > On 08/30/2018 09:28 AM, Dan van der Ster wrote:
> > Hi,
> >
> > Is anyone else seeing rocksdb mon stores slowly growing to >15GB,
> > eventually triggering the 'mon is using a lot of disk space' warning?
> >
> > Since upgrading to luminous, we've seen this happen at least twice.
> > Each time, we restart all the mons and then the stores slowly trim down
> > to <500MB. We have 'mon compact on start = true', but it's not the
> > compaction that's shrinking the rocksdbs -- the space used seems to
> > decrease over a few minutes only after *all* mons have been restarted.
> >
> > This reminds me of a hammer-era issue where references to trimmed maps
> > were leaking -- I can't find that bug at the moment, though.
>
> Next time this happens, mind listing the store contents and checking
> whether you are holding way too many osdmaps? You shouldn't be holding
> more osdmaps than the default IF the cluster is healthy and all the pgs
> are clean.
>
> I chased a bug pertaining to this last year, and even got a patch, but
> then was unable to reproduce it. I didn't pursue merging the patch any
> further (I think I may still have an open PR for it, though), simply
> because it was no longer clear whether it was needed.

I just had this happen to me while using ceph-gentle-split on a 12.2.5
cluster with 1,370 OSDs. Unfortunately, I restarted the mon nodes, which
fixed the problem, before finding this thread. I'm only halfway done with
the split, so I'll see if the problem resurfaces.

Bryan
Re: [ceph-users] rocksdb mon stores growing until restart
The Hammer ticket was https://tracker.ceph.com/issues/13990. The problem
there was that when OSDs asked each other which map they needed to keep, a
leak could set the value to NULL, and then that OSD would never delete an
osdmap again until it was restarted.

On Thu, Aug 30, 2018 at 3:09 AM Joao Eduardo Luis wrote:
> On 08/30/2018 09:28 AM, Dan van der Ster wrote:
> > Hi,
> >
> > Is anyone else seeing rocksdb mon stores slowly growing to >15GB,
> > eventually triggering the 'mon is using a lot of disk space' warning?
> >
> > Since upgrading to luminous, we've seen this happen at least twice.
> > Each time, we restart all the mons and then the stores slowly trim down
> > to <500MB. We have 'mon compact on start = true', but it's not the
> > compaction that's shrinking the rocksdbs -- the space used seems to
> > decrease over a few minutes only after *all* mons have been restarted.
> >
> > This reminds me of a hammer-era issue where references to trimmed maps
> > were leaking -- I can't find that bug at the moment, though.
>
> Next time this happens, mind listing the store contents and checking
> whether you are holding way too many osdmaps? You shouldn't be holding
> more osdmaps than the default IF the cluster is healthy and all the pgs
> are clean.
>
> I chased a bug pertaining to this last year, and even got a patch, but
> then was unable to reproduce it. I didn't pursue merging the patch any
> further (I think I may still have an open PR for it, though), simply
> because it was no longer clear whether it was needed.
>
> -Joao
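That failure mode can be pictured as a min() over the epoch each daemon still
needs: a single leaked/NULL report pins the cluster-wide floor and stalls
trimming everywhere. This is an illustrative sketch of the description above,
not Ceph source:

```python
# Illustrative sketch (not Ceph code) of the Hammer-era stall: osdmaps can
# only be trimmed up to the minimum epoch still required by any daemon.
# A leaked reference behaves like "needs everything" (floor 0), so one
# bad report blocks trimming until that daemon restarts.
def cluster_trim_floor(reported_floors):
    """Lowest epoch any daemon still needs; None models the leaked/NULL case."""
    return min((f if f is not None else 0) for f in reported_floors)

# All daemons reporting sane floors: trim up to the lowest one.
assert cluster_trim_floor([4800, 4900, 4750]) == 4750
# One leaked report poisons the minimum: nothing is ever trimmed.
assert cluster_trim_floor([4800, None, 4750]) == 0
```

Restarting the affected daemon replaces the NULL with a real epoch, which
matches the observation in this thread that everything trims only after the
restarts.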
Re: [ceph-users] rocksdb mon stores growing until restart
On 08/30/2018 09:28 AM, Dan van der Ster wrote:
> Hi,
>
> Is anyone else seeing rocksdb mon stores slowly growing to >15GB,
> eventually triggering the 'mon is using a lot of disk space' warning?
>
> Since upgrading to luminous, we've seen this happen at least twice.
> Each time, we restart all the mons and then the stores slowly trim down
> to <500MB. We have 'mon compact on start = true', but it's not the
> compaction that's shrinking the rocksdbs -- the space used seems to
> decrease over a few minutes only after *all* mons have been restarted.
>
> This reminds me of a hammer-era issue where references to trimmed maps
> were leaking -- I can't find that bug at the moment, though.

Next time this happens, mind listing the store contents and checking
whether you are holding way too many osdmaps? You shouldn't be holding
more osdmaps than the default IF the cluster is healthy and all the pgs
are clean.

I chased a bug pertaining to this last year, and even got a patch, but
then was unable to reproduce it. I didn't pursue merging the patch any
further (I think I may still have an open PR for it, though), simply
because it was no longer clear whether it was needed.

-Joao
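One way to do the check Joao suggests is to list the store keys offline and
count the full osdmap epochs held. The line format sketched here (a
"prefix<TAB>key" layout with `full_<epoch>` keys, as printed by
`ceph-kvstore-tool rocksdb <store.db> list osdmap` against a stopped mon) is
an assumption about the mon store; a small parser:

```python
# Hypothetical parser for `ceph-kvstore-tool rocksdb <store.db> list osdmap`
# output. The "osdmap<TAB>full_<epoch>" key naming is an assumption about
# the mon store layout; incremental maps and metadata keys are skipped.
def count_full_osdmaps(lines):
    count = 0
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] == "osdmap" and parts[1].startswith("full_"):
            count += 1
    return count

sample = [
    "osdmap\tfull_41000",
    "osdmap\t41000",           # incremental map, not counted
    "osdmap\tfull_41001",
    "osdmap\tlast_committed",  # metadata key, not counted
]
assert count_full_osdmaps(sample) == 2
```

On a healthy cluster with clean pgs, the count should stay near the default
of 500 (mon_min_osdmap_epochs); a count in the tens of thousands would point
at the trimming bug discussed in this thread.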