On Mon, Feb 27, 2017 at 11:40 AM, Marius Vaitiekunas <mariusvaitieku...@gmail.com> wrote:
> On Mon, Feb 27, 2017 at 9:59 AM, Marius Vaitiekunas <mariusvaitieku...@gmail.com> wrote:
>
>> On Fri, Feb 24, 2017 at 6:35 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote:
>>
>>> On Fri, Feb 24, 2017 at 3:59 AM, Marius Vaitiekunas
>>> <mariusvaitieku...@gmail.com> wrote:
>>> >
>>> > On Wed, Feb 22, 2017 at 8:33 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com>
>>> > wrote:
>>> >>
>>> >> On Wed, Feb 22, 2017 at 6:19 AM, Marius Vaitiekunas
>>> >> <mariusvaitieku...@gmail.com> wrote:
>>> >> > Hi Cephers,
>>> >> >
>>> >> > We are testing the rgw multisite solution between two DCs. We have one
>>> >> > zonegroup and two zones. At the moment all writes/deletes are done only
>>> >> > to the primary zone.
>>> >> >
>>> >> > Sometimes not all the objects are replicated. We've written a Prometheus
>>> >> > exporter to check replication status. It gives us each bucket's object
>>> >> > count from the user's perspective, because we have millions of objects
>>> >> > and hundreds of buckets. We just want to be sure that everything is
>>> >> > replicated, without using Ceph internals like the rgw admin API for now.
>>> >> >
>>> >> > Is it possible to initiate a full resync of only one rgw bucket from the
>>> >> > master zone? What are the options for resync when things go wrong and
>>> >> > replication misses some objects?
>>> >> >
>>> >> > We run the latest jewel, 10.2.5.
>>> >>
>>> >> There's the 'radosgw-admin bucket sync init' command that you can run
>>> >> on the specific bucket on the target zone. This will reinitialize the
>>> >> sync state, so that when it starts syncing it will go through the
>>> >> whole full sync process. Note that it shouldn't actually copy data
>>> >> that already exists on the target. Also, in order to actually start
>>> >> the sync, you'll need to have some change that would trigger the sync
>>> >> on that bucket, e.g., create a new object there.
>>> >>
>>> >> Yehuda
>>> >
>>> > Hi,
>>> >
>>> > I've tried to resync a bucket, but it didn't manage to resync a missing
>>> > object. If I try to copy the missing object by hand into the secondary
>>> > zone, I get asked to overwrite an existing object. It looks like the
>>> > object is replicated, but is not in the bucket index. I've tried to check
>>> > the bucket index with the --fix and --check-objects flags, but nothing
>>> > changes. What else should I try?
>>>
>>> That's weird. Do you see anything when you run 'radosgw-admin bi list
>>> --bucket=<bucket>'?
>>>
>>> Yehuda
>>
>> 'radosgw-admin bi list --bucket=<bucket>' gives me an error:
>> 2017-02-27 08:55:30.861659 7f20c15779c0 0 error in read_id for id : (2) No such file or directory
>> 2017-02-27 08:55:30.861991 7f20c15779c0 0 error in read_id for id : (2) No such file or directory
>> ERROR: bi_list(): (5) Input/output error
>>
>> 'radosgw-admin bucket list --bucket=<bucket>' successfully lists all the
>> files except the missing ones.
>
> I've done some more investigation. These missing objects can be found in the
> "rgw.buckets.data" pool, but the bucket index is not aware of them.
> How does 'radosgw-admin bucket check -b <bucket> --fix --check-objects'
> work?
> I guess that it's not scanning the "rgw.buckets.data" pool for "leaked"
> objects? These unreplicated objects look to me the same as leaked ones :)

By the way, in the rgw logs I can find all the missing files with an HTTP 304
return code. For example:

"GET /go84/WRWRDGROWKFKROTWKHXXIBHERRLHBK HTTP/1.1" 304 0 - -

All the gateways in both sites are behind haproxies.

Any ideas?

--
Marius Vaitiekūnas
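
For anyone wanting to repeat the cross-check described above, here is a minimal sketch. It assumes the object keys of a bucket have already been listed from each zone (the listing step is not shown) and that the gateway log lines look like the single example quoted in this message. The names `missing_keys` and `paths_with_status` are hypothetical helpers, not part of any Ceph tooling.

```python
import re

# Matches the request/status part of a log line shaped like the example in
# this thread: "GET /bucket/key HTTP/1.1" 304 0 - -
# The log format is an assumption based on that one example.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def missing_keys(primary_keys, secondary_keys):
    """Return, sorted, the keys listed in the primary zone but not in the secondary."""
    return sorted(set(primary_keys) - set(secondary_keys))


def paths_with_status(log_lines, status):
    """Return the request paths of log lines whose HTTP status equals `status`."""
    paths = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and int(match.group("status")) == status:
            paths.append(match.group("path"))
    return paths
```

Feeding the secondary zone's gateway log into `paths_with_status(lines, 304)` and comparing the result with the output of `missing_keys` would show whether every unreplicated object was also answered with a 304, which is the pattern reported here.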
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com