Hi all,

Apologies for all the messages to the list over the past few days.
After an upgrade from 12.2.7 to 12.2.12 (inherited cluster) on an RGW multisite active/active setup, I am almost constantly seeing 1-10 "recovering shards" when running "radosgw-admin sync status", i.e.:

----------
# radosgw-admin sync status
          realm 8f7fd3fd-f72d-411d-b06b-7b4b579f5f2f (prod)
      zonegroup 60a2cb75-6978-46a3-b830-061c8be9dc75 (prod)
           zone 7fe96e52-d6f7-4ad6-b66e-ecbbbffbc18e (us-east-2)
  metadata sync no sync (zone is master)
      data sync source: ffce148e-3b24-462d-98bf-8c212de31de5 (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [102]
                        8 shards are recovering
                        recovering shards: [31,37,56,60,83,92,95,127]
----------

Every once in a while it will go to full sync. This is seen on both the master and the secondary side.

There were a bunch of stale reshard instances (on both ends) that I was able to remove with:

-----
radosgw-admin reshard stale-instances rm
-----

What exactly is a "recovering shard", and what can be done to troubleshoot/fix this condition? I have verified that rgw_num_rados_handles is 1.

Additionally, what exactly do:

# radosgw-admin metadata sync init
# radosgw-admin data sync init

do, and do they tend to take a long time to run?

thx
Frank
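P.S. In case it helps anyone suggest diagnostics: a minimal sketch of what I was planning to look at next, namely the sync error log and the state of one recovering shard. I'm assuming "data sync status" accepts --source-zone/--shard-id on 12.2.12; I haven't confirmed that, so treat this as a sketch rather than a recipe.

-----
# list recent entries in the sync error log
radosgw-admin sync error list

# query the data sync state of a single recovering shard (31 from the
# output above) against the source zone
radosgw-admin data sync status --source-zone=us-east-1 --shard-id=31
-----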