[ceph-users] Re: data usage growing despite data being written

Wyll Ingersoll Wed, 07 Sep 2022 07:40:04 -0700

I'm sure we probably have but I'm not sure what else to do.  We are desperate 
to get data off of these 99%+ OSDs and the cluster by itself isn't doing it.

The CRUSH map appears OK.  We have replicated pools and a large EC pool, all 
using host-based failure domains.  The new OSDs on the newly added hosts are 
slowly filling, just not as fast as we expected.

We have far too many OSDs at 99%+ and they continue to fill up.  How do we 
remove the excess OSDMap data?  Is that even possible?
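
For anyone following along, here is a minimal sketch of how the overfull OSDs
could be listed from a saved copy of `ceph osd df --format json`. It assumes
the output has a top-level "nodes" list whose entries carry "name" and
"utilization" (a percentage); check those field names against your release.
The function name, the 95% threshold, and the sample data are hypothetical.

```python
import json


def overfull_osds(osd_df, threshold_pct=95.0):
    """Return (name, utilization) pairs at or above threshold_pct, fullest first.

    osd_df is the parsed JSON from `ceph osd df --format json`; the "nodes",
    "name" and "utilization" keys are assumed, not guaranteed.
    """
    nodes = osd_df.get("nodes", [])
    hot = [
        (node["name"], node["utilization"])
        for node in nodes
        if node.get("utilization", 0.0) >= threshold_pct
    ]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, not taken from the cluster in this thread.
sample = json.loads("""
{"nodes": [
  {"id": 3,  "name": "osd.3",  "utilization": 99.1},
  {"id": 7,  "name": "osd.7",  "utilization": 99.4},
  {"id": 12, "name": "osd.12", "utilization": 41.0}
]}
""")

for name, used in overfull_osds(sample):
    print(f"{name}: {used:.1f}% used")
```

This only reads a report; it does not move or remove any data.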

If we shouldn't be migrating PGs and we cannot remove data, what are our 
options to get it to balance again and stop filling up with OSDMaps and other 
internal Ceph data?

thanks!

________________________________
From: Gregory Farnum <gfar...@redhat.com>
Sent: Wednesday, September 7, 2022 10:01 AM
To: Wyll Ingersoll <wyllys.ingers...@keepertech.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] data usage growing despite data being written

On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll
<wyllys.ingers...@keepertech.com> wrote:
>
>
> Our cluster has not had any data written to it externally in several weeks, 
> yet the overall data usage has been growing.
> Is this due to heavy recovery activity?  If so, what can be done (if 
> anything) to reduce the data generated during recovery?
>
> We've been trying to move PGs away from high-usage OSDs (many over 99%), but 
> it's like playing whack-a-mole: the cluster keeps sending new data to already 
> overly full OSDs, making further recovery nearly impossible.

I may be missing something, but I think you've really slowed things
down by continually migrating PGs around while the cluster is already
unhealthy. It forces a lot of new OSDMap generation and general churn
(which itself slows down data movement).

I'd also examine your CRUSH map carefully, since it sounded like you'd
added some new hosts and they weren't getting the data you expected
them to. Perhaps there's some kind of imbalance (e.g., they aren't in
racks, and selecting those is part of your CRUSH rule?).
-Greg
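
One way to check the CRUSH rule point above is to look at which bucket types
each rule selects on. The following is a rough sketch, assuming the JSON from
`ceph osd crush rule dump` is a list of rules, each with a "rule_name" and a
"steps" list whose choose/chooseleaf steps carry a "type" field. Those field
names should be verified against your release; the function name and the
sample rules are hypothetical.

```python
def rule_failure_domains(rules):
    """Map each rule name to the bucket types its choose/chooseleaf steps use.

    rules is the parsed JSON list from `ceph osd crush rule dump`; the
    "rule_name", "steps", "op" and "type" keys are assumed.
    """
    result = {}
    for rule in rules:
        types = [
            step["type"]
            for step in rule.get("steps", [])
            # "set_*" tuning steps do not start with "choose" and are skipped.
            if step.get("op", "").startswith("choose") and "type" in step
        ]
        result[rule["rule_name"]] = types
    return result


# Hypothetical sample rules, not taken from the cluster in this thread.
sample_rules = [
    {"rule_name": "replicated_rule", "steps": [
        {"op": "take", "item_name": "default"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
        {"op": "emit"},
    ]},
    {"rule_name": "ec_rule", "steps": [
        {"op": "set_chooseleaf_tries", "num": 5},
        {"op": "take", "item_name": "default"},
        {"op": "choose_indep", "num": 0, "type": "rack"},
        {"op": "chooseleaf_indep", "num": 1, "type": "host"},
        {"op": "emit"},
    ]},
]

for name, types in rule_failure_domains(sample_rules).items():
    print(f"{name}: {' -> '.join(types)}")
```

If a rule selects on "rack" and the new hosts are not placed under any rack
bucket, that rule cannot send data to them, which would match the imbalance
described above.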

>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
