I am actively working through the code and debugging the problem. I suspect the issue is in how RGW lists the parts of a multipart upload when it completes or aborts the upload (read: it's not getting *all* the parts, just those that are either the most recent or tagged with the upload ID). As soon as I can work out a patch or, more importantly, a way to address the problem manually, I will respond with instructions.
The reported bug contains detailed instructions on reproducing the problem, so it's trivial to reproduce and test on a small and/or new cluster.

Brian

On Tue, Aug 2, 2016 at 8:53 AM, Tyler Bishop <tyler.bis...@beyondhosting.net> wrote:

> We're having the same issues. I have a 1200TB pool at 90% utilization
> however disk utilization is only 40%
>
> *Tyler Bishop*
> Chief Technical Officer
> 513-299-7108 x10
>
> tyler.bis...@beyondhosting.net
>
> If you are not the intended recipient of this transmission you are
> notified that disclosing, copying, distributing or taking any action in
> reliance on the contents of this information is strictly prohibited.
>
> ------------------------------
> *From: *"Brian Felton" <bjfel...@gmail.com>
> *To: *"ceph-users" <ceph-us...@ceph.com>
> *Sent: *Wednesday, July 27, 2016 9:24:30 AM
> *Subject: *[ceph-users] Cleaning Up Failed Multipart Uploads
>
> Greetings,
>
> Background: If an object storage client re-uploads parts to a multipart
> object, RadosGW does not clean up all of the parts properly when the
> multipart upload is aborted or completed. You can read all of the gory
> details (including reproduction steps) in this bug report:
> http://tracker.ceph.com/issues/16767.
>
> My setup: Hammer 0.94.6 cluster only used for S3-compatible object
> storage. RGW stripe size is 4MiB.
>
> My problem: I have buckets that are reporting TB more utilization (and, in
> one case, 200k more objects) than they should report. I am trying to
> remove the detritus from the multipart uploads, but removing the leftover
> parts directly from the .rgw.buckets pool is having no effect on bucket
> utilization (i.e. neither the object count nor the space used are
> declining).
>
> To give an example, I have a client that uploaded a very large multipart
> object (8000 15MiB parts). Due to a bug in the client, it uploaded each of
> the 8000 parts 6 times.
> After the sixth attempt, it gave up and aborted
> the upload, at which point RGW removed the 8000 parts from the sixth
> attempt. When I list the bucket's contents with radosgw-admin
> (radosgw-admin bucket list --bucket=<bucket> --max-entries=<size of
> bucket>), I see all of the object's 8000 parts five separate times, each
> under a namespace of 'multipart'.
>
> Since the multipart upload was aborted, I can't remove the object by name
> via the S3 interface. Since my RGW stripe size is 4MiB, I know that each
> part of the object will be stored across 4 entries in the .rgw.buckets pool
> -- 4 MiB in a 'multipart' file, and 4, 4, and 3 MiB in three successive
> 'shadow' files. I've created a script to remove these parts (rados -p
> .rgw.buckets rm <bucket_id>__multipart_<object+prefix>.<part> and rados -p
> .rgw.buckets rm <bucket_id>__shadow_<object+prefix>.<part>.[1-3]). The
> removes are completing successfully (in that additional attempts to remove
> the object result in a failure), but I'm not seeing any decrease in the
> bucket's space used, nor am I seeing a decrease in the bucket's object
> count. In fact, if I do another 'bucket list', all of the removed parts
> are still included.
>
> I've looked at the output of 'gc list --include-all', and the removed
> parts are never showing up for garbage collection. Garbage collection is
> otherwise functioning normally and will successfully remove data for any
> object properly removed via the S3 interface.
>
> I've also gone so far as to write a script to list the contents of bucket
> shards in the .rgw.buckets.index pool, check for the existence of the entry
> in .rgw.buckets, and remove entries that cannot be found, but that is also
> failing to decrement the size/object count counters.
>
> What am I missing here? Where, aside from .rgw.buckets and
> .rgw.buckets.index is RGW looking to determine object count and space used
> for a bucket?
>
> Many thanks to any and all who can assist.
>
> Brian Felton
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
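
For anyone following the arithmetic in the original report, here is a rough Python sketch of how one 15 MiB part maps onto entries in the .rgw.buckets pool with a 4 MiB stripe size, and how much is left behind by the five stale upload attempts. The naming pattern is copied from the rados commands quoted in the message and has not been checked against the RGW source. The function names, the bucket id "b1", and the object prefix "obj.UPLOAD" are made up for illustration. The code only builds strings and counts; it does not talk to a cluster.

```python
import math

MIB = 1024 * 1024


def stripe_sizes(part_size, stripe_size):
    """Split one part into stripe-sized chunks; the last chunk may be short."""
    full, rest = divmod(part_size, stripe_size)
    return [stripe_size] * full + ([rest] if rest else [])


def rados_names_for_part(bucket_id, object_prefix, part, part_size, stripe_size):
    """Build the pool entry names for one part, following the pattern in the
    quoted rados commands (assumed, not verified against RGW):
      <bucket_id>__multipart_<object+prefix>.<part>      first stripe
      <bucket_id>__shadow_<object+prefix>.<part>.<n>     remaining stripes
    """
    stripes = math.ceil(part_size / stripe_size)
    names = [f"{bucket_id}__multipart_{object_prefix}.{part}"]
    names += [
        f"{bucket_id}__shadow_{object_prefix}.{part}.{n}"
        for n in range(1, stripes)
    ]
    return names


# Figures from the report: 8000 parts of 15 MiB, 4 MiB stripes,
# five stale attempts left after RGW cleaned up the sixth.
PARTS, STALE_ATTEMPTS = 8000, 5
PART_SIZE, STRIPE_SIZE = 15 * MIB, 4 * MIB

# "b1" and "obj.UPLOAD" are hypothetical placeholders.
names = rados_names_for_part("b1", "obj.UPLOAD", 1, PART_SIZE, STRIPE_SIZE)
sizes_mib = [s // MIB for s in stripe_sizes(PART_SIZE, STRIPE_SIZE)]

leftover_entries = PARTS * STALE_ATTEMPTS * len(names)    # 160000
leftover_mib = PARTS * STALE_ATTEMPTS * PART_SIZE // MIB  # 600000

for name, size in zip(names, sizes_mib):
    print(f"{size} MiB  {name}")
print(f"{leftover_entries} leftover pool entries, {leftover_mib} MiB")
```

This matches the 4, 4, 4, and 3 MiB split described in the message: one 'multipart' entry plus three 'shadow' entries per part, so the five stale attempts account for 160,000 pool entries and roughly 586 GiB of data.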