I am actively working through the code and debugging.  I suspect the issue
is with how RGW lists the parts of a multipart upload when it completes or
aborts the upload (read: it's not getting *all* the parts, just those that
are either most recent or tagged with the upload id).  As soon as I can
figure out a patch, or, more importantly, how to manually address the
problem, I will respond with instructions.

The reported bug contains detailed instructions on reproducing the problem,
so it's trivial to reproduce and test on a small and/or new cluster.

Brian

On Tue, Aug 2, 2016 at 8:53 AM, Tyler Bishop
<tyler.bis...@beyondhosting.net> wrote:

> We're having the same issues.  I have a 1200TB pool at 90% utilization;
> however, disk utilization is only 40%.
>
> *Tyler Bishop*, Chief Technical Officer
> 513-299-7108 x10
>
> tyler.bis...@beyondhosting.net
>
> If you are not the intended recipient of this transmission you are
> notified that disclosing, copying, distributing or taking any action in
> reliance on the contents of this information is strictly prohibited.
>
>
>
> ------------------------------
> From: "Brian Felton" <bjfel...@gmail.com>
> To: "ceph-users" <ceph-us...@ceph.com>
> Sent: Wednesday, July 27, 2016 9:24:30 AM
> Subject: [ceph-users] Cleaning Up Failed Multipart Uploads
>
> Greetings,
>
> Background: If an object storage client re-uploads parts to a multipart
> object, RadosGW does not clean up all of the parts properly when the
> multipart upload is aborted or completed.  You can read all of the gory
> details (including reproduction steps) in this bug report:
> http://tracker.ceph.com/issues/16767.
>
> My setup: Hammer 0.94.6 cluster only used for S3-compatible object
> storage.  RGW stripe size is 4MiB.
>
> My problem: I have buckets that are reporting TB more utilization (and, in
> one case, 200k more objects) than they should report.  I am trying to
> remove the detritus from the multipart uploads, but removing the leftover
> parts directly from the .rgw.buckets pool is having no effect on bucket
> utilization (i.e. neither the object count nor the space used is
> declining).
>
> To give an example, I have a client that uploaded a very large multipart
> object (8000 15MiB parts).  Due to a bug in the client, it uploaded each of
> the 8000 parts 6 times.  After the sixth attempt, it gave up and aborted
> the upload, at which point RGW removed the 8000 parts from the sixth
> attempt.  When I list the bucket's contents with radosgw-admin
> (radosgw-admin bucket list --bucket=<bucket> --max-entries=<size of
> bucket>), I see all of the object's 8000 parts five separate times, each
> under a namespace of 'multipart'.
>
> Since the multipart upload was aborted, I can't remove the object by name
> via the S3 interface.  Since my RGW stripe size is 4MiB, I know that each
> part of the object will be stored across 4 entries in the .rgw.buckets pool
> -- 4 MiB in a 'multipart' file, and 4, 4, and 3 MiB in three successive
> 'shadow' files.  I've created a script to remove these parts (rados -p
> .rgw.buckets rm <bucket_id>__multipart_<object+prefix>.<part> and rados -p
> .rgw.buckets rm <bucket_id>__shadow_<object+prefix>.<part>.[1-3]).  The
> removes are completing successfully (in that additional attempts to remove
> the object result in a failure), but I'm not seeing any decrease in the
> bucket's space used, nor am I seeing a decrease in the bucket's object
> count.  In fact, if I do another 'bucket list', all of the removed parts
> are still included.
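
As a rough sketch of the layout described above (not taken from the RGW
source), the Python below enumerates the RADOS object names and sizes that
one part would map to, assuming the naming pattern quoted in the rados
commands and a 4 MiB stripe size.  The function name, the bucket id
"default.1234.1" and the prefix "bigfile.2~abc" are made up for
illustration.

```python
MIB = 1024 * 1024


def part_rados_objects(bucket_id, object_prefix, part, part_size,
                       stripe_size=4 * MIB):
    """Return (name, size) pairs for the RADOS objects assumed to back one part.

    Assumption: the first stripe lives in a '__multipart_' object and each
    following stripe in a numbered '__shadow_' object, as in the message.
    """
    if part_size <= 0 or stripe_size <= 0:
        raise ValueError("part_size and stripe_size must be positive")

    # First stripe: the 'multipart' head object.
    head_size = min(part_size, stripe_size)
    objects = [(f"{bucket_id}__multipart_{object_prefix}.{part}", head_size)]

    # Remaining stripes: 'shadow' objects numbered from 1.
    remaining = part_size - head_size
    index = 1
    while remaining > 0:
        size = min(remaining, stripe_size)
        name = f"{bucket_id}__shadow_{object_prefix}.{part}.{index}"
        objects.append((name, size))
        remaining -= size
        index += 1
    return objects


if __name__ == "__main__":
    # Hypothetical bucket id and object prefix, one 15 MiB part.
    for name, size in part_rados_objects("default.1234.1", "bigfile.2~abc",
                                         1, 15 * MIB):
        print(name, size // MIB, "MiB")
```

With 15 MiB parts this yields four entries of 4, 4, 4 and 3 MiB, so the
five leftover attempts of 8000 parts would account for 8000 * 5 * 4 =
160,000 RADOS objects and 8000 * 5 * 15 MiB = 600,000 MiB (roughly 586 GiB)
of data under these assumptions.
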
>
> I've looked at the output of 'gc list --include-all', and the removed
> parts are never showing up for garbage collection.  Garbage collection is
> otherwise functioning normally and will successfully remove data for any
> object properly removed via the S3 interface.
>
> I've also gone so far as to write a script to list the contents of bucket
> shards in the .rgw.buckets.index pool, check for the existence of the entry
> in .rgw.buckets, and remove entries that cannot be found, but that is also
> failing to decrement the size/object count counters.
>
> What am I missing here?  Where, aside from .rgw.buckets and
> .rgw.buckets.index, is RGW looking to determine object count and space
> used for a bucket?
>
> Many thanks to any and all who can assist.
>
> Brian Felton
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>