Hi,
According to [1], error 125 indicates a race condition:
failed to sync bucket instance: (125) Operation canceled
A racing condition exists between writes to the same RADOS object.
Can you rewrite just the affected object? I'm not sure about the other
error (5, Input/output error); maybe try rewriting that object as well?
But I don't see how that would lead to a 25 TB difference. Or could
this condition affect the entire sync? Hopefully someone with more
multisite knowledge can comment. Is Ceph healthy? No inactive PGs or
anything?
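In case it helps with the rewrite, here is a rough Python sketch that
takes a "name" field from the sync error list and builds a
"radosgw-admin object rewrite" command line from it. It only prints
the command and does not run anything. The helper names are made up
for illustration, the sample name is a placeholder, and the
bucket:instance_id:shard[/object_key] layout is only inferred from the
two entries quoted below, so please check it against your own output
and against "radosgw-admin --help" on your version first:

```python
import shlex
from typing import Optional


def split_error_name(name: str) -> tuple[str, Optional[str]]:
    """Split a sync-error "name" into (bucket, object_key).

    Assumes the layout bucket:instance_id:shard[/object_key], as seen
    in the pasted entries. Tenant-qualified bucket names are not handled.
    """
    head, sep, key = name.partition("/")
    parts = head.split(":")
    if len(parts) != 3:
        raise ValueError(f"unexpected name format: {name!r}")
    # No "/" means the entry refers to a bucket shard, not an object.
    return parts[0], (key if sep else None)


def build_rewrite_cmd(bucket: str, object_key: str) -> list[str]:
    """Build argv for rewriting a single object (hypothetical helper)."""
    return ["radosgw-admin", "object", "rewrite",
            f"--bucket={bucket}", f"--object={object_key}"]


# Placeholder name, shaped like the second entry in the error list.
bucket, key = split_error_name("bucket1:INSTANCE-ID.1:1232/S01/1/120/example.txt")
if key is not None:
    print(shlex.join(build_rewrite_cmd(bucket, key)))
```

An entry without an object key (like the error 125 one) yields None
for the key, so no rewrite command is printed for it.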
[1]
https://www.ibm.com/docs/en/storage-ceph/6?topic=gateway-error-code-definitions-ceph-object
Quoting ankit raikwar <ankit199999raik...@gmail.com>:
Hello users,
Our environment is described below. Both sites are zones of a single
RGW multisite zonegroup; the DC zone is currently the primary and the
DR zone is the secondary.
DC
Ceph Version: 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
quincy (stable)
Number of rgw daemons : 25
DR
Ceph Version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
quincy (stable)
Number of rgw daemons : 25
Environment description:
Both zones are in production, and RGW multisite traffic runs over an
MPLS link of around 3 Gbps.
Issue description :
We enabled multisite between DC and DR about a month ago. The total
data in the DC zone is around 159 TiB, and the sync was progressing as
expected. But once roughly 120 TiB had synced, the speed dropped
drastically: it had been running at around 2 Gbps and fell well below
10 Mbps, even though the link is not saturated. "radosgw-admin sync
status" reports "metadata is caught up with master" and "data is
caught up with source", yet the DR zone is still almost 25 TB behind
the DC zone. "radosgw-admin bucket sync status --bucket=<bucket-name>"
also shows that the bucket is still behind on some shards.
The logs and command output are included below.
A full resync of the data from the beginning is not feasible in our
case. The "radosgw-admin sync error list" output is also included
below, with some information redacted, and it shows errors.
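To put the slowdown in perspective, here is a rough back-of-the-envelope
calculation in Python. It assumes decimal units (1 TB = 10^12 bytes), a
constant transfer rate, and no protocol overhead, so the numbers are
only indicative:

```python
TB = 10**12  # bytes; assumes the 25 TB figure is in decimal units


def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Time to move size_bytes at a constant rate, ignoring overhead."""
    return size_bytes * 8 / rate_bits_per_s


backlog = 25 * TB
slow = transfer_seconds(backlog, 10e6)  # 10 Mbps
fast = transfer_seconds(backlog, 2e9)   # 2 Gbps

print(f"at 10 Mbps: {slow / 86400:.0f} days")
print(f"at 2 Gbps:  {fast / 3600:.1f} hours")
```

At the current rate the remaining backlog would take on the order of
231 days, compared with roughly 27.8 hours at the earlier rate.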
radosgw-sync status
radosgw-admin sync status
realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
radosgw-admin bucket sync status --bucket=<bucket-name>
realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
bucket :tc******rc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])
source zone d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
source bucket
:tc*******arc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])
full sync: 14/9221 shards
full sync: 49448693 objects completed
incremental sync: 9207/9221 shards
bucket is behind on 25 shards
behind shards:
[9,111,590,826,1774,2968,3132,3382,3386,3409,3685,3820,4174,4544,4708,4811,5733,6285,6558,7288,7417,7443,7876,8151,8878]
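For tracking whether the backlog shrinks over time, the list of behind
shards can be pulled out of this output with a few lines of Python. This
is only a sketch: it assumes the plain-text layout shown above (a
"behind shards:" line followed by a bracketed, comma-separated list),
and the function name is made up for illustration:

```python
import re


def behind_shards(status_text: str) -> list[int]:
    """Extract shard IDs from the bracketed list after 'behind shards:'."""
    m = re.search(r"behind shards:\s*\[([\d,\s]*)\]", status_text)
    if not m:
        return []
    return [int(s) for s in m.group(1).split(",") if s.strip()]


# Shortened sample in the same layout as the output above.
sample = """bucket is behind on 3 shards
behind shards:
[9,111,590]"""

shards = behind_shards(sample)
print(len(shards), shards)
```

Running this against the saved output at intervals and comparing the
lists would show whether any shards are catching up.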
Error: radosgw-admin sync error list
"id": "1_1690799008.725414_3926410.1",
"section": "data",
"name": "bucket0:d09d3d16-8601-448b-bf3d-609b8a29647d.89871.1:1949",
"timestamp": "2023-07-31T10:23:28.725414Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 125,
"message": "failed to sync bucket instance: (125) Operation canceled"

"id": "1_1690804503.144829_3759212.1",
"section": "data",
"name": "bucket1:d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1:1232/S01/1/120/2b7ea802-efad-41d3-9d90-9**************523.txt",
"timestamp": "2023-07-31T11:54:53.233451Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 5,
"message": "failed to sync object(5) Input/output error"
Thanks
Ankit
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io