What got lost is that I need to change the pool’s m/k parameters, which is only possible by creating a new pool and moving all data from the old pool. Changing the crush rule doesn’t allow you to do that.
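For the record, creating the replacement pool itself is the easy part. Roughly something like this (untested sketch; profile name, pool name, PG count and failure domain are placeholders for our actual values):

  # new EC profile with the smaller k/m and host as the failure domain
  ceph osd erasure-code-profile set rgw-ec-k4-m3 k=4 m=3 crush-failure-domain=host
  # new data pool using that profile
  ceph osd pool create default.rgw.buckets.data.new 1024 1024 erasure rgw-ec-k4-m3
  ceph osd pool application enable default.rgw.buckets.data.new rgw

The hard part is getting the 215TiB of existing objects - and the bucket index entries that point at them - moved into that pool, which is what the rest of this thread is about.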
> On 16. Jun 2023, at 23:32, Nino Kotur <ninoko...@gmail.com> wrote:
>
> If you create a new crush rule for ssd/nvme/hdd and attach it to the existing
> pool, you should be able to do the migration seamlessly while everything is
> online... However, the impact on users will depend on storage device load and
> network utilization, as it will create chaos on the cluster network.
>
> Or did I get something wrong?
>
> Kind regards,
> Nino
>
> On Wed, Jun 14, 2023 at 5:44 PM Christian Theune <c...@flyingcircus.io> wrote:
> Hi,
>
> further note to self and for posterity … ;)
>
> This turned out to be a no-go as well, because you can't silently switch the
> pools to a different storage class: the objects will be found, but the index
> still refers to the old storage class, and lifecycle migrations won't work.
>
> I've brainstormed further options and it appears that the last resort is to
> use placement targets and copy the buckets explicitly - twice, because on
> Nautilus I don't have renames available yet. :(
>
> This will require temporary downtimes prohibiting users from accessing their
> buckets. Fortunately we only have a few very large buckets (200T+) that will
> take a while to copy. We can pre-sync them of course, so the downtime will
> only be during the second copy.
>
> Christian
>
> > On 13. Jun 2023, at 14:52, Christian Theune <c...@flyingcircus.io> wrote:
> >
> > Following up to myself and for posterity:
> >
> > I'm going to try to perform a switch here using (temporary) storage classes
> > and renaming of the pools, so that I can quickly point the STANDARD class
> > at a better EC pool and have new objects located there. After that we'll
> > add (temporary) lifecycle rules to all buckets to ensure their objects get
> > migrated to the STANDARD class.
> >
> > Once that is finished we should be able to delete the old pool and the
> > temporary storage class.
> >
> > First tests appear successful, but I'm struggling a bit to get the bucket
> > rules working (apparently 0 days isn't a real rule …) and the debug
> > interval setting causes very frequent LC runs but doesn't seem to move
> > objects just yet. I'll play around with that setting a bit more, though; I
> > think I might have tripped something that only wants to process objects
> > every so often, and with an interval of 10, a day is still 2.4 hours …
> >
> > Cheers,
> > Christian
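For anyone trying to reproduce the lifecycle approach quoted above: a minimal transition rule would look roughly like this (untested sketch; bucket name and endpoint are placeholders, and "Days" apparently has to be at least 1, since 0 days isn't accepted):

  aws s3api put-bucket-lifecycle-configuration \
    --endpoint-url https://rgw.example.com \
    --bucket my-bucket \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "migrate-to-standard",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [{"Days": 1, "StorageClass": "STANDARD"}]
      }]
    }'

As far as I understand it, rgw_lc_debug_interval shrinks what LC treats as a "day", and `radosgw-admin lc list` / `radosgw-admin lc process` let you watch and kick processing instead of waiting for the next scheduled run.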
> >> On 9. Jun 2023, at 11:16, Christian Theune <c...@flyingcircus.io> wrote:
> >>
> >> Hi,
> >>
> >> we are running a cluster that has been alive for a long time and we tread
> >> carefully regarding updates. We are still a bit lagging and our cluster
> >> (which started around Firefly) is currently at Nautilus. We're updating and
> >> we know we're still behind, but we keep running into challenges along the
> >> way that typically are still unfixed on main, and - as I started with - we
> >> have to tread carefully.
> >>
> >> Nevertheless, mistakes happen, and we found ourselves in this situation:
> >> we converted our RGW data pool from replicated (n=3) to erasure coded
> >> (k=10, m=3, with 17 hosts), but when doing the EC profile selection we
> >> missed that our hosts are not evenly balanced (this is a growing cluster;
> >> some machines have around 20TiB capacity for the RGW data pool, whereas
> >> newer machines have around 160TiB) and we rather should have gone with
> >> k=4, m=3. In any case, having 13 chunks causes too many hosts to
> >> participate in each object. Going for k+m=7 allows distribution to be more
> >> effective, as we have 7 hosts with the 160TiB sizing.
> >>
> >> Our original migration used the "cache tiering" approach, but that only
> >> works once, when moving from replicated to EC, and cannot be used for
> >> further migrations.
> >>
> >> The amount of data - 215TiB - is somewhat significant, so we need an
> >> approach that scales when copying data[1] to avoid ending up with months
> >> of migration.
> >>
> >> I've run out of ideas for doing this on a low level (i.e. trying to fix it
> >> on a rados/pool level) and I guess we can only fix this on an application
> >> level using multi-zone replication.
> >>
> >> I have the setup nailed in general, but I'm running into issues with
> >> buckets in our staging and production environments that have
> >> `explicit_placement` pools attached. AFAICT this is an outdated mechanism,
> >> but there are no migration tools around. I've seen some people talk about
> >> patched versions of `radosgw-admin metadata put`, which (still) prohibits
> >> removing explicit placements.
> >>
> >> AFAICT those explicit placements will be synced to the secondary zone, and
> >> the effect that I'm seeing underpins that theory: the sync runs for a
> >> while and only a few hundred objects show up in the new zone, as the
> >> buckets/objects are already found in the old pool that the new zone uses
> >> due to the explicit placement rule.
> >>
> >> I'm currently running out of ideas, but am open to any other options.
> >>
> >> Looking at
> >> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ULKK5RU2VXLFXNUJMZBMUG7CQ5UCWJCB/#R6CPZ2TEWRFL2JJWP7TT5GX7DPSV5S7Z
> >> I'm wondering whether the relevant patch is available somewhere, or
> >> whether I'll have to try building that patch again on my own.
> >>
> >> Going through the docs and the code, I'm actually wondering whether
> >> `explicit_placement` is a really crufty residual mechanism that won't get
> >> used in newer clusters, but that older clusters don't really have an
> >> option to get away from?
> >>
> >> In my specific case, the placement rules are identical to the explicit
> >> placements stored on (apparently older) buckets, and the only thing I need
> >> to do is remove them. I can accept a bit of downtime to avoid any race
> >> conditions if needed, so maybe having a small tool that just removes those
> >> entries while all RGWs are down would be fine. A call to
> >> `radosgw-admin bucket stats` takes about 18s for all buckets in production,
> >> and I guess that would be a good indication of what timing to expect when
> >> running an update on the metadata.
> >>
> >> I'll also be in touch with colleagues from Heinlein and 42on, but I'm open
> >> to other suggestions.
> >>
> >> Hugs,
> >> Christian
> >>
> >> [1] We currently have 215TiB of data in 230M objects. Using the "official"
> >> "cache-flush-evict-all" approach was unfeasible here, as it only yielded
> >> around 50MiB/s. Using cache limits and setting the cache size targets to 0
> >> caused proper parallelization and was able to flush/evict at an almost
> >> constant 1GiB/s in the cluster.
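For context on the `explicit_placement` part above, the workflow that keeps coming up is roughly the following (bucket name and instance ID are placeholders). Note that the `metadata put` step is exactly where an unpatched radosgw-admin refuses to drop the explicit placement, which is why people resort to patched builds:

  # dump the bucket instance metadata
  radosgw-admin metadata get bucket.instance:BUCKET:INSTANCE_ID > bucket-instance.json
  # edit bucket-instance.json and empty the explicit_placement pool entries
  # (data_pool, data_extra_pool, index_pool)
  radosgw-admin metadata put bucket.instance:BUCKET:INSTANCE_ID < bucket-instance.json

As mentioned above, this would be done with all RGWs stopped to avoid race conditions.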
Kind regards,
Christian Theune

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io