Hi,

I've previously discussed some issues I've had with RGW lifecycle
processing. I've now narrowed the problem down to the following sequence:

  *   I'm running a multisite configuration.
     *   Lifecycle processing is done on the master site each night.
         `radosgw-admin lc list` correctly returns all buckets with an lc config.
  *   I simulate the master site being destroyed from my VM host.
  *   I promote the secondary site to master, following the instructions here:
      https://docs.ceph.com/docs/master/radosgw/multisite/
     *   The new master site isn't doing any lifecycle processing.
         `radosgw-admin lc list` returns empty.
  *   I recreate a cluster and pair it with the new master site to get back to
      having multisite redundancy.
     *   Neither site is doing any lifecycle processing.
         `radosgw-admin lc list` returns empty.

So, in the process of failover and recovery, I have gone from having two paired
clusters performing lifecycle processing to two paired clusters NOT performing
lifecycle processing.
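
To make the before/after comparison concrete, here is a minimal Python sketch
that reports which buckets dropped out of `radosgw-admin lc list` across a
failover. It assumes the command prints a JSON array of objects with a
`bucket` key (an assumption about the output format, so check it against your
version); the function names are hypothetical.

```python
import json
import subprocess


def lc_buckets(lc_list_json: str) -> set:
    """Return the bucket identifiers found in `radosgw-admin lc list` output.

    Assumption: the output is a JSON array of objects with a "bucket" key.
    """
    return {entry["bucket"] for entry in json.loads(lc_list_json)}


def dropped_buckets(before_json: str, after_json: str) -> list:
    """Buckets that had an lc entry before failover but have none afterwards."""
    return sorted(lc_buckets(before_json) - lc_buckets(after_json))


def capture_lc_list() -> str:
    """Run `radosgw-admin lc list` on the local cluster and return its stdout."""
    result = subprocess.run(
        ["radosgw-admin", "lc", "list"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

The idea would be to save the output of `capture_lc_list()` on the master
before the failover, capture it again on the promoted site afterwards, and
pass both strings to `dropped_buckets()`.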

Is this behaviour expected? I've found that `radosgw-admin lc reshard fix` will
"remind" the cluster I run it on that it needs to do lifecycle processing.
However, I found no mention in the docs of needing it after a failover, and the
docs for that command state it's only relevant to earlier Ceph versions. I'm
running Nautilus 14.2.9.

In addition, if I have two healthy clusters paired in a multisite system and
swap the master by promoting the non-master, the demoted cluster seems to
continue doing lifecycle processing, while the promoted cluster does not. If I
run `radosgw-admin lc reshard fix` on the promoted cluster, then both clusters
seem to claim they are doing the processing. Is this a happy state to be in?
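
As a rough way to see which buckets both sites claim, the `lc list` output
from each cluster could be merged into one table. This is only a sketch: it
assumes each entry has `bucket` and `status` keys and that the bucket
identifier is the same on both sites, neither of which I have confirmed. The
site labels and function names are hypothetical.

```python
import json


def lc_status_by_bucket(outputs_by_site: dict) -> dict:
    """Map each bucket to {site: status} from per-site `lc list` JSON output.

    Assumption: each entry is an object with "bucket" and "status" keys.
    """
    table = {}
    for site, raw in outputs_by_site.items():
        for entry in json.loads(raw):
            status = entry.get("status", "unknown")
            table.setdefault(entry["bucket"], {})[site] = status
    return table


def tracked_on_all_sites(outputs_by_site: dict) -> list:
    """Buckets that appear in the lc list of every site supplied."""
    table = lc_status_by_bucket(outputs_by_site)
    site_count = len(outputs_by_site)
    return sorted(b for b, sites in table.items() if len(sites) == site_count)
```

Any bucket returned by `tracked_on_all_sites()` is one that both the demoted
and the promoted cluster list for lifecycle processing.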

Does anyone have any experience with this?

Thanks,
Alex
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io