Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-03 Thread Brian Felton
>>>> Having dug and dug and dug through the code, I've come to the following
>>>> realizations:
>>>>
>>>>1. When a multipart upload is completed, the function
>>>>list_multipart_parts in rgw_op.cc is called.  This seems to be the 
>>>> start of
>>>>the problems, as it will only return those parts in the 'multipart'
>>>>namespace that include the upload id in the name, irrespective of how 
>>>> many
>>>>copies of parts exist on the system with non-upload id prefixes
>>>>2. In the course of writing to the OSDs, a list (remove_objs) is
>>>>processed in cls_rgw.cc:unaccount_entry(), causing bucket stats to be
>>>>decremented
>>>>3. These decremented stats are written to the bucket's index
>>>>entry/entries in .rgw.buckets.index via the CEPH_OSD_OP_OMAPSETHEADER 
>>>> case
>>>>in ReplicatedPG::do_osd_ops
>>>>
>>>> So this explains why manually removing the multipart entries from
>>>> .rgw.buckets and cleaning the shadow entries in .rgw.buckets.index does not
>>>> cause the bucket's stats to be updated.  What I don't know how to do is
>>>> force an update of the bucket's stats from the CLI.  I can retrieve the
>>>> omap header from each of the bucket's shards in .rgw.buckets.index, but I
>>>> don't have the first clue how to read the data or rebuild it into something
>>>> valid.  I've searched the docs and mailing list archives, but I didn't find
>>>> any solution to this problem.  For what it's worth, I've tried 'bucket
>>>> check' with all combinations of '--check-objects' and '--fix' after
>>>> cleaning up .rgw.buckets and .rgw.buckets.index.
>>>>
>>>> From a long-term perspective, it seems there are two possible fixes
>>>> here:
>>>>
>>>>1. Update the logic in list_multipart_parts to return all the parts
>>>>for a multipart object, so that *all* parts in the 'multipart' namespace
>>>>can be properly removed
>>>>2. Update the logic in RGWPutObj::execute() to not restart a write
>>>>if the put_data_and_throttle() call returns -EEXIST but instead put the
>>>>data in the original file(s)
>>>>
>>>> While I think 2 would involve the least amount of yak shaving with the
>>>> multipart logic since the MP logic already assumes a happy path where all
>>>> objects have a prefix of the multipart upload id, I'm all but certain this
>>>> is going to horribly break many other parts of the system that I don't
>>>> fully understand.
>>>>
>>>
>>> #2 is dangerous. That was the original behavior, and it is racy and
>>> *will* lead to data corruption.  OTOH, I don't think #1 is an easy option.
>>> We only keep a single entry per part, so we don't really have a good way to
>>> see all the uploaded pieces. We could extend the meta object to keep record
>>> of all the uploaded parts, and at the end, when assembling everything
>>> remove the parts that aren't part of the final assembly.
>>>
>>>> The good news is that the assembly of the multipart object is being
>>>> done correctly; what I can't figure out is how it knows about the
>>>> non-upload id prefixes when creating the metadata on the multipart object
>>>> in .rgw.buckets.  My best guess is that it's copying the metadata from the
>>>> 'meta' object in .rgw.buckets.extra (which is correctly updated with the
>>>> new part prefixes after each successful upload), but I haven't absolutely
>>>> confirmed that.
>>>>
>>>
>>> Yeah, something along these lines.
>>>
>>>
>>>> If one of the developer folk that are more familiar with this could
>>>> weigh in, I would be greatly appreciative.
>>>>
>>>
>>> btw, did you try to run the radosgw-admin orphan find tool?
>>>
>>> Yehuda
>>>
>>>> Brian
>>>>
>>>> On Tue, Aug 2, 2016 at 8:59 AM, Brian Felton <bjfel...@gmail.com>
>>>> wrote:
>>>>
>>>>> I am actively working through the code and debugging everything.  I
>>>>> figure the issue is with how RGW is listing the parts of a multipart 
>>>>> upload
>>>>> when it completes or aborts the upload (read: it's not getting *all* the
>>>>> parts, just those that are either most recent or

Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-03 Thread Yehuda Sadeh-Weinraub
>>> I've searched the docs and mailing list archives, but I didn't find
>>> any solution to this problem.  For what it's worth, I've tried 'bucket
>>> check' with all combinations of '--check-objects' and '--fix' after
>>> cleaning up .rgw.buckets and .rgw.buckets.index.
>>>
>>> From a long-term perspective, it seems there are two possible fixes here:
>>>
>>>1. Update the logic in list_multipart_parts to return all the parts
>>>for a multipart object, so that *all* parts in the 'multipart' namespace
>>>can be properly removed
>>>2. Update the logic in RGWPutObj::execute() to not restart a write
>>>if the put_data_and_throttle() call returns -EEXIST but instead put the
>>>data in the original file(s)
>>>
>>> While I think 2 would involve the least amount of yak shaving with the
>>> multipart logic since the MP logic already assumes a happy path where all
>>> objects have a prefix of the multipart upload id, I'm all but certain this
>>> is going to horribly break many other parts of the system that I don't
>>> fully understand.
>>>
>>
>> #2 is dangerous. That was the original behavior, and it is racy and
>> *will* lead to data corruption.  OTOH, I don't think #1 is an easy option.
>> We only keep a single entry per part, so we don't really have a good way to
>> see all the uploaded pieces. We could extend the meta object to keep record
>> of all the uploaded parts, and at the end, when assembling everything
>> remove the parts that aren't part of the final assembly.
>>
>>> The good news is that the assembly of the multipart object is being done
>>> correctly; what I can't figure out is how it knows about the non-upload id
>>> prefixes when creating the metadata on the multipart object in
>>> .rgw.buckets.  My best guess is that it's copying the metadata from the
>>> 'meta' object in .rgw.buckets.extra (which is correctly updated with the
>>> new part prefixes after each successful upload), but I haven't absolutely
>>> confirmed that.
>>>
>>
>> Yeah, something along these lines.
>>
>>
>>> If one of the developer folk that are more familiar with this could
>>> weigh in, I would be greatly appreciative.
>>>
>>
>> btw, did you try to run the radosgw-admin orphan find tool?
>>
>> Yehuda
>>
>>> Brian
>>>
>>> On Tue, Aug 2, 2016 at 8:59 AM, Brian Felton <bjfel...@gmail.com> wrote:
>>>
>>>> I am actively working through the code and debugging everything.  I
>>>> figure the issue is with how RGW is listing the parts of a multipart upload
>>>> when it completes or aborts the upload (read: it's not getting *all* the
>>>> parts, just those that are either most recent or tagged with the upload
>>>> id).  As soon as I can figure out a patch, or, more importantly, how to
>>>> manually address the problem, I will respond with instructions.
>>>>
>>>> The reported bug contains detailed instructions on reproducing the
>>>> problem, so it's trivial to reproduce and test on a small and/or new
>>>> cluster.
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Tue, Aug 2, 2016 at 8:53 AM, Tyler Bishop <
>>>> tyler.bis...@beyondhosting.net> wrote:
>>>>
>>>>> We're having the same issues.   I have a 1200TB pool at 90%
>>>>> utilization however disk utilization is only 40%
>>>>>
>>>>>
>>>>>
>>>>>  [image: http://static.beyondhosting.net/img/bh-small.png]
>>>>>
>>>>>
>>>>> *Tyler Bishop *Chief Technical Officer
>>>>> 513-299-7108 x10
>>>>>
>>>>> tyler.bis...@beyondhosting.net
>>>>>
>>>>> If you are not the intended recipient of this transmission you are
>>>>> notified that disclosing, copying, distributing or taking any action in
>>>>> reliance on the contents of this information is strictly prohibited.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *From: *"Brian Felton" <bjfel...@gmail.com>
>>>>> *To: *"ceph-users" <ceph-us...@ceph.com>
>>>>> *Sent: *Wednesday, July 27, 2016 9:24:30 AM
>>>>> *Subject: *[ceph-users] Cleaning Up Failed Multipart Uploads
>>>>>
>>>>> Greetings,
>>>>>
>>

Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-03 Thread Brian Felton
> OTOH, I don't think #1 is an easy option. We only
> keep a single entry per part, so we don't really have a good way to see all
> the uploaded pieces. We could extend the meta object to keep record of all
> the uploaded parts, and at the end, when assembling everything remove the
> parts that aren't part of the final assembly.
>
>> The good news is that the assembly of the multipart object is being done
>> correctly; what I can't figure out is how it knows about the non-upload id
>> prefixes when creating the metadata on the multipart object in
>> .rgw.buckets.  My best guess is that it's copying the metadata from the
>> 'meta' object in .rgw.buckets.extra (which is correctly updated with the
>> new part prefixes after each successful upload), but I haven't absolutely
>> confirmed that.
>>
>
> Yeah, something along these lines.
>
>
>> If one of the developer folk that are more familiar with this could weigh
>> in, I would be greatly appreciative.
>>
>
> btw, did you try to run the radosgw-admin orphan find tool?
>
> Yehuda
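
For anyone else following the thread, my understanding of that tool's
invocation is roughly the following; the flag names are as I read them for
Hammer, so double-check them before relying on this:

  # Scan the data pool for RADOS objects that no bucket index appears to
  # reference.  This walks the whole pool, so expect it to take a while.
  radosgw-admin orphans find --pool=.rgw.buckets --job-id=mp-cleanup

  # The scan only reports candidates; it does not delete anything itself.
  # Once done reviewing the output, clean up the scan's own bookkeeping:
  radosgw-admin orphans finish --job-id=mp-cleanup
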
>
>> Brian
>>
>> On Tue, Aug 2, 2016 at 8:59 AM, Brian Felton <bjfel...@gmail.com> wrote:
>>
>>> I am actively working through the code and debugging everything.  I
>>> figure the issue is with how RGW is listing the parts of a multipart upload
>>> when it completes or aborts the upload (read: it's not getting *all* the
>>> parts, just those that are either most recent or tagged with the upload
>>> id).  As soon as I can figure out a patch, or, more importantly, how to
>>> manually address the problem, I will respond with instructions.
>>>
>>> The reported bug contains detailed instructions on reproducing the
>>> problem, so it's trivial to reproduce and test on a small and/or new
>>> cluster.
>>>
>>> Brian
>>>
>>>
>>> On Tue, Aug 2, 2016 at 8:53 AM, Tyler Bishop <
>>> tyler.bis...@beyondhosting.net> wrote:
>>>
>>>> We're having the same issues.   I have a 1200TB pool at 90% utilization
>>>> however disk utilization is only 40%
>>>>
>>>>
>>>>
>>>>  [image: http://static.beyondhosting.net/img/bh-small.png]
>>>>
>>>>
>>>> *Tyler Bishop *Chief Technical Officer
>>>> 513-299-7108 x10
>>>>
>>>> tyler.bis...@beyondhosting.net
>>>>
>>>> If you are not the intended recipient of this transmission you are
>>>> notified that disclosing, copying, distributing or taking any action in
>>>> reliance on the contents of this information is strictly prohibited.
>>>>
>>>>
>>>>
>>>> --
>>>> *From: *"Brian Felton" <bjfel...@gmail.com>
>>>> *To: *"ceph-users" <ceph-us...@ceph.com>
>>>> *Sent: *Wednesday, July 27, 2016 9:24:30 AM
>>>> *Subject: *[ceph-users] Cleaning Up Failed Multipart Uploads
>>>>
>>>> Greetings,
>>>>
>>>> Background: If an object storage client re-uploads parts to a multipart
>>>> object, RadosGW does not clean up all of the parts properly when the
>>>> multipart upload is aborted or completed.  You can read all of the gory
>>>> details (including reproduction steps) in this bug report:
>>>> http://tracker.ceph.com/issues/16767.
>>>>
>>>> My setup: Hammer 0.94.6 cluster only used for S3-compatible object
>>>> storage.  RGW stripe size is 4MiB.
>>>>
>>>> My problem: I have buckets that are reporting TB more utilization (and,
>>>> in one case, 200k more objects) than they should report.  I am trying to
>>>> remove the detritus from the multipart uploads, but removing the leftover
>>>> parts directly from the .rgw.buckets pool is having no effect on bucket
>>>> utilization (i.e. neither the object count nor the space used are
>>>> declining).
>>>>
>>>> To give an example, I have a client that uploaded a very large
>>>> multipart object (8000 15MiB parts).  Due to a bug in the client, it
>>>> uploaded each of the 8000 parts 6 times.  After the sixth attempt, it gave
>>>> up and aborted the upload, at which point RGW removed the 8000 parts from
>>>> the sixth attempt.  When I list the bucket's contents with radosgw-admin
>>>> (radosgw-admin bucket list --bucket=<bucket> --max-entries=<count>), I see all of the object's 8000 parts five separate times, each

Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-03 Thread Yehuda Sadeh-Weinraub
>>> 513-299-7108 x10
>>>
>>> tyler.bis...@beyondhosting.net
>>>
>>> If you are not the intended recipient of this transmission you are
>>> notified that disclosing, copying, distributing or taking any action in
>>> reliance on the contents of this information is strictly prohibited.
>>>
>>>
>>>
>>> --
>>> *From: *"Brian Felton" <bjfel...@gmail.com>
>>> *To: *"ceph-users" <ceph-us...@ceph.com>
>>> *Sent: *Wednesday, July 27, 2016 9:24:30 AM
>>> *Subject: *[ceph-users] Cleaning Up Failed Multipart Uploads
>>>
>>> Greetings,
>>>
>>> Background: If an object storage client re-uploads parts to a multipart
>>> object, RadosGW does not clean up all of the parts properly when the
>>> multipart upload is aborted or completed.  You can read all of the gory
>>> details (including reproduction steps) in this bug report:
>>> http://tracker.ceph.com/issues/16767.
>>>
>>> My setup: Hammer 0.94.6 cluster only used for S3-compatible object
>>> storage.  RGW stripe size is 4MiB.
>>>
>>> My problem: I have buckets that are reporting TB more utilization (and,
>>> in one case, 200k more objects) than they should report.  I am trying to
>>> remove the detritus from the multipart uploads, but removing the leftover
>>> parts directly from the .rgw.buckets pool is having no effect on bucket
>>> utilization (i.e. neither the object count nor the space used are
>>> declining).
>>>
>>> To give an example, I have a client that uploaded a very large multipart
>>> object (8000 15MiB parts).  Due to a bug in the client, it uploaded each of
>>> the 8000 parts 6 times.  After the sixth attempt, it gave up and aborted
>>> the upload, at which point RGW removed the 8000 parts from the sixth
>>> attempt.  When I list the bucket's contents with radosgw-admin
>>> (radosgw-admin bucket list --bucket=<bucket> --max-entries=<count>), I see all of the object's 8000 parts five separate times, each
>>> under a namespace of 'multipart'.
>>>
>>> Since the multipart upload was aborted, I can't remove the object by
>>> name via the S3 interface.  Since my RGW stripe size is 4MiB, I know that
>>> each part of the object will be stored across 4 entries in the .rgw.buckets
>>> pool -- 4 MiB in a 'multipart' file, and 4, 4, and 3 MiB in three
>>> successive 'shadow' files.  I've created a script to remove these parts
>>> (rados -p .rgw.buckets rm __multipart_<object+prefix>. and
>>> rados -p .rgw.buckets rm
>>> __shadow_<object+prefix>..[1-3]).  The removes are
>>> completing successfully (in that additional attempts to remove the object
>>> result in a failure), but I'm not seeing any decrease in the bucket's space
>>> used, nor am I seeing a decrease in the bucket's object count.  In fact, if
>>> I do another 'bucket list', all of the removed parts are still included.
>>>
>>> I've looked at the output of 'gc list --include-all', and the removed
>>> parts are never showing up for garbage collection.  Garbage collection is
>>> otherwise functioning normally and will successfully remove data for any
>>> object properly removed via the S3 interface.
>>>
>>> I've also gone so far as to write a script to list the contents of
>>> bucket shards in the .rgw.buckets.index pool, check for the existence of
>>> the entry in .rgw.buckets, and remove entries that cannot be found, but
>>> that is also failing to decrement the size/object count counters.
>>>
>>> What am I missing here?  Where, aside from .rgw.buckets and
>>> .rgw.buckets.index is RGW looking to determine object count and space used
>>> for a bucket?
>>>
>>> Many thanks to any and all who can assist.
>>>
>>> Brian Felton
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-03 Thread Brian Felton
This may just be me having a conversation with myself, but maybe this will
be helpful to someone else.

Having dug and dug and dug through the code, I've come to the following
realizations:

   1. When a multipart upload is completed, the function
   list_multipart_parts in rgw_op.cc is called.  This seems to be the start of
   the problems, as it will only return those parts in the 'multipart'
   namespace that include the upload id in the name, irrespective of how many
   copies of parts exist on the system with non-upload id prefixes
   2. In the course of writing to the OSDs, a list (remove_objs) is
   processed in cls_rgw.cc:unaccount_entry(), causing bucket stats to be
   decremented
   3. These decremented stats are written to the bucket's index
   entry/entries in .rgw.buckets.index via the CEPH_OSD_OP_OMAPSETHEADER case
   in ReplicatedPG::do_osd_ops

So this explains why manually removing the multipart entries from
.rgw.buckets and cleaning the shadow entries in .rgw.buckets.index does not
cause the bucket's stats to be updated.  What I don't know how to do is
force an update of the bucket's stats from the CLI.  I can retrieve the
omap header from each of the bucket's shards in .rgw.buckets.index, but I
don't have the first clue how to read the data or rebuild it into something
valid.  I've searched the docs and mailing list archives, but I didn't find
any solution to this problem.  For what it's worth, I've tried 'bucket
check' with all combinations of '--check-objects' and '--fix' after
cleaning up .rgw.buckets and .rgw.buckets.index.
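
For anyone who wants to poke at the same data, this is roughly what I've been
doing; the shard object names follow the .dir.<bucket_id>.<shard> pattern I
see in my .rgw.buckets.index pool, and the last step assumes your build's
ceph-dencoder knows the rgw_bucket_dir_header type, so treat it as a sketch
rather than a recipe:

  # What RGW thinks the bucket holds; these numbers come from the index
  # shard omap headers, not from the data pool itself.
  radosgw-admin bucket stats --bucket=<bucket>

  # Find the bucket id/marker so the index shard objects can be located.
  radosgw-admin metadata get bucket:<bucket>

  # Dump one shard's omap header (unsharded indexes are just .dir.<bucket_id>).
  rados -p .rgw.buckets.index getomapheader .dir.<bucket_id>.0 header.bin

  # If the type is registered, this pretty-prints the same stats that
  # 'bucket stats' reports.
  ceph-dencoder type rgw_bucket_dir_header import header.bin decode dump_json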

From a long-term perspective, it seems there are two possible fixes here:

   1. Update the logic in list_multipart_parts to return all the parts for
   a multipart object, so that *all* parts in the 'multipart' namespace can be
   properly removed
   2. Update the logic in RGWPutObj::execute() to not restart a write if
   the put_data_and_throttle() call returns -EEXIST but instead put the data
   in the original file(s)

While I think 2 would involve the least amount of yak shaving with the
multipart logic since the MP logic already assumes a happy path where all
objects have a prefix of the multipart upload id, I'm all but certain this
is going to horribly break many other parts of the system that I don't
fully understand.

The good news is that the assembly of the multipart object is being done
correctly; what I can't figure out is how it knows about the non-upload id
prefixes when creating the metadata on the multipart object in
.rgw.buckets.  My best guess is that it's copying the metadata from the
'meta' object in .rgw.buckets.extra (which is correctly updated with the
new part prefixes after each successful upload), but I haven't absolutely
confirmed that.
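
A rough way to check that guess, assuming the meta object follows the
<marker>__multipart_<object>.<upload_id>.meta naming I see in my
.rgw.buckets.extra pool:

  # Locate the upload's meta object in the extra pool.
  rados -p .rgw.buckets.extra ls | grep "__multipart_<object>"

  # Its omap is where I'd expect the per-part bookkeeping (and therefore the
  # current part prefixes) to live.
  rados -p .rgw.buckets.extra listomapkeys "<marker>__multipart_<object>.<upload_id>.meta"
  rados -p .rgw.buckets.extra listomapvals "<marker>__multipart_<object>.<upload_id>.meta"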

If one of the developer folk that are more familiar with this could weigh
in, I would be greatly appreciative.

Brian

On Tue, Aug 2, 2016 at 8:59 AM, Brian Felton <bjfel...@gmail.com> wrote:

> I am actively working through the code and debugging everything.  I figure
> the issue is with how RGW is listing the parts of a multipart upload when
> it completes or aborts the upload (read: it's not getting *all* the parts,
> just those that are either most recent or tagged with the upload id).  As
> soon as I can figure out a patch, or, more importantly, how to manually
> address the problem, I will respond with instructions.
>
> The reported bug contains detailed instructions on reproducing the
> problem, so it's trivial to reproduce and test on a small and/or new
> cluster.
>
> Brian
>
>
> On Tue, Aug 2, 2016 at 8:53 AM, Tyler Bishop <
> tyler.bis...@beyondhosting.net> wrote:
>
>> We're having the same issues.   I have a 1200TB pool at 90% utilization
>> however disk utilization is only 40%
>>
>>
>>
>>  [image: http://static.beyondhosting.net/img/bh-small.png]
>>
>>
>> *Tyler Bishop *Chief Technical Officer
>> 513-299-7108 x10
>>
>> tyler.bis...@beyondhosting.net
>>
>> If you are not the intended recipient of this transmission you are
>> notified that disclosing, copying, distributing or taking any action in
>> reliance on the contents of this information is strictly prohibited.
>>
>>
>>
>> --------------
>> *From: *"Brian Felton" <bjfel...@gmail.com>
>> *To: *"ceph-users" <ceph-us...@ceph.com>
>> *Sent: *Wednesday, July 27, 2016 9:24:30 AM
>> *Subject: *[ceph-users] Cleaning Up Failed Multipart Uploads
>>
>> Greetings,
>>
>> Background: If an object storage client re-uploads parts to a multipart
>> object, RadosGW does not clean up all of the parts properly when the
>> multipart upload is aborted or completed.  You can read all of the gory
>> details (including reproduction steps) in this bug report:
>

Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-02 Thread Tyler Bishop
We're having the same issues.  I have a 1200TB pool at 90% utilization;
however, disk utilization is only 40%.
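
For comparing notes, this is roughly how we have been sizing the gap; the jq
path assumes 'bucket stats' reports usage under rgw.main with a
size_kb_actual field, which may vary by release:

  # Usage as the cluster sees it.
  ceph df detail

  # Usage as RGW accounts it, summed across all buckets (in KB).
  radosgw-admin bucket stats | jq '[.[].usage["rgw.main"].size_kb_actual // 0] | add'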







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "Brian Felton" <bjfel...@gmail.com> 
To: "ceph-users" <ceph-us...@ceph.com> 
Sent: Wednesday, July 27, 2016 9:24:30 AM 
Subject: [ceph-users] Cleaning Up Failed Multipart Uploads 

Greetings, 

Background: If an object storage client re-uploads parts to a multipart object, 
RadosGW does not clean up all of the parts properly when the multipart upload 
is aborted or completed. You can read all of the gory details (including 
reproduction steps) in this bug report: http://tracker.ceph.com/issues/16767 . 

My setup: Hammer 0.94.6 cluster only used for S3-compatible object storage. RGW 
stripe size is 4MiB. 

My problem: I have buckets that are reporting TB more utilization (and, in one 
case, 200k more objects) than they should report. I am trying to remove the 
detritus from the multipart uploads, but removing the leftover parts directly 
from the .rgw.buckets pool is having no effect on bucket utilization (i.e. 
neither the object count nor the space used are declining). 

To give an example, I have a client that uploaded a very large multipart object 
(8000 15MiB parts). Due to a bug in the client, it uploaded each of the 8000 
parts 6 times. After the sixth attempt, it gave up and aborted the upload, at 
which point RGW removed the 8000 parts from the sixth attempt. When I list the 
bucket's contents with radosgw-admin (radosgw-admin bucket list 
--bucket=<bucket> --max-entries=<count>), I see all of the object's 
8000 parts five separate times, each under a namespace of 'multipart'. 

Since the multipart upload was aborted, I can't remove the object by name via 
the S3 interface. Since my RGW stripe size is 4MiB, I know that each part of 
the object will be stored across 4 entries in the .rgw.buckets pool -- 4 MiB in 
a 'multipart' file, and 4, 4, and 3 MiB in three successive 'shadow' files. 
I've created a script to remove these parts (rados -p .rgw.buckets rm 
__multipart_<object+prefix>. and rados -p .rgw.buckets rm 
__shadow_<object+prefix>..[1-3]). The removes are completing 
successfully (in that additional attempts to remove the object result in a 
failure), but I'm not seeing any decrease in the bucket's space used, nor am I 
seeing a decrease in the bucket's object count. In fact, if I do another 
'bucket list', all of the removed parts are still included. 

I've looked at the output of 'gc list --include-all', and the removed parts are 
never showing up for garbage collection. Garbage collection is otherwise 
functioning normally and will successfully remove data for any object properly 
removed via the S3 interface. 

I've also gone so far as to write a script to list the contents of bucket 
shards in the .rgw.buckets.index pool, check for the existence of the entry in 
.rgw.buckets, and remove entries that cannot be found, but that is also failing 
to decrement the size/object count counters. 

What am I missing here? Where, aside from .rgw.buckets and .rgw.buckets.index 
is RGW looking to determine object count and space used for a bucket? 

Many thanks to any and all who can assist. 

Brian Felton 



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-02 Thread Brian Felton
I am actively working through the code and debugging everything.  I figure
the issue is with how RGW is listing the parts of a multipart upload when
it completes or aborts the upload (read: it's not getting *all* the parts,
just those that are either most recent or tagged with the upload id).  As
soon as I can figure out a patch, or, more importantly, how to manually
address the problem, I will respond with instructions.

The reported bug contains detailed instructions on reproducing the problem,
so it's trivial to reproduce and test on a small and/or new cluster.

Brian

On Tue, Aug 2, 2016 at 8:53 AM, Tyler Bishop <tyler.bis...@beyondhosting.net
> wrote:

> We're having the same issues.   I have a 1200TB pool at 90% utilization
> however disk utilization is only 40%
>
>
>
>  [image: http://static.beyondhosting.net/img/bh-small.png]
>
>
> *Tyler Bishop *Chief Technical Officer
> 513-299-7108 x10
>
> tyler.bis...@beyondhosting.net
>
> If you are not the intended recipient of this transmission you are
> notified that disclosing, copying, distributing or taking any action in
> reliance on the contents of this information is strictly prohibited.
>
>
>
> --
> *From: *"Brian Felton" <bjfel...@gmail.com>
> *To: *"ceph-users" <ceph-us...@ceph.com>
> *Sent: *Wednesday, July 27, 2016 9:24:30 AM
> *Subject: *[ceph-users] Cleaning Up Failed Multipart Uploads
>
> Greetings,
>
> Background: If an object storage client re-uploads parts to a multipart
> object, RadosGW does not clean up all of the parts properly when the
> multipart upload is aborted or completed.  You can read all of the gory
> details (including reproduction steps) in this bug report:
> http://tracker.ceph.com/issues/16767.
>
> My setup: Hammer 0.94.6 cluster only used for S3-compatible object
> storage.  RGW stripe size is 4MiB.
>
> My problem: I have buckets that are reporting TB more utilization (and, in
> one case, 200k more objects) than they should report.  I am trying to
> remove the detritus from the multipart uploads, but removing the leftover
> parts directly from the .rgw.buckets pool is having no effect on bucket
> utilization (i.e. neither the object count nor the space used are
> declining).
>
> To give an example, I have a client that uploaded a very large multipart
> object (8000 15MiB parts).  Due to a bug in the client, it uploaded each of
> the 8000 parts 6 times.  After the sixth attempt, it gave up and aborted
> the upload, at which point RGW removed the 8000 parts from the sixth
> attempt.  When I list the bucket's contents with radosgw-admin
> (radosgw-admin bucket list --bucket=<bucket> --max-entries=<count>), I see all of the object's 8000 parts five separate times, each
> under a namespace of 'multipart'.
>
> Since the multipart upload was aborted, I can't remove the object by name
> via the S3 interface.  Since my RGW stripe size is 4MiB, I know that each
> part of the object will be stored across 4 entries in the .rgw.buckets pool
> -- 4 MiB in a 'multipart' file, and 4, 4, and 3 MiB in three successive
> 'shadow' files.  I've created a script to remove these parts (rados -p
> .rgw.buckets rm __multipart_<object+prefix>. and rados -p
> .rgw.buckets rm __shadow_<object+prefix>..[1-3]).  The
> removes are completing successfully (in that additional attempts to remove
> the object result in a failure), but I'm not seeing any decrease in the
> bucket's space used, nor am I seeing a decrease in the bucket's object
> count.  In fact, if I do another 'bucket list', all of the removed parts
> are still included.
>
> I've looked at the output of 'gc list --include-all', and the removed
> parts are never showing up for garbage collection.  Garbage collection is
> otherwise functioning normally and will successfully remove data for any
> object properly removed via the S3 interface.
>
> I've also gone so far as to write a script to list the contents of bucket
> shards in the .rgw.buckets.index pool, check for the existence of the entry
> in .rgw.buckets, and remove entries that cannot be found, but that is also
> failing to decrement the size/object count counters.
>
> What am I missing here?  Where, aside from .rgw.buckets and
> .rgw.buckets.index is RGW looking to determine object count and space used
> for a bucket?
>
> Many thanks to any and all who can assist.
>
> Brian Felton
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cleaning Up Failed Multipart Uploads

2016-07-27 Thread Brian Felton
Greetings,

Background: If an object storage client re-uploads parts to a multipart
object, RadosGW does not clean up all of the parts properly when the
multipart upload is aborted or completed.  You can read all of the gory
details (including reproduction steps) in this bug report:
http://tracker.ceph.com/issues/16767.

My setup: Hammer 0.94.6 cluster only used for S3-compatible object
storage.  RGW stripe size is 4MiB.

My problem: I have buckets that are reporting TB more utilization (and, in
one case, 200k more objects) than they should report.  I am trying to
remove the detritus from the multipart uploads, but removing the leftover
parts directly from the .rgw.buckets pool is having no effect on bucket
utilization (i.e. neither the object count nor the space used are
declining).

To give an example, I have a client that uploaded a very large multipart
object (8000 15MiB parts).  Due to a bug in the client, it uploaded each of
the 8000 parts 6 times.  After the sixth attempt, it gave up and aborted
the upload, at which point RGW removed the 8000 parts from the sixth
attempt.  When I list the bucket's contents with radosgw-admin
(radosgw-admin bucket list --bucket=<bucket> --max-entries=<count>), I see all of the object's 8000 parts five separate times, each
under a namespace of 'multipart'.
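
(A crude way to see the duplication, if you know the object's name; this just
greps the raw bucket-list output rather than parsing it properly:)

  radosgw-admin bucket list --bucket=<bucket> --max-entries=1000000 \
      | grep "<object name>" | grep -c multipart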

Since the multipart upload was aborted, I can't remove the object by name
via the S3 interface.  Since my RGW stripe size is 4MiB, I know that each
part of the object will be stored across 4 entries in the .rgw.buckets pool
-- 4 MiB in a 'multipart' file, and 4, 4, and 3 MiB in three successive
'shadow' files.  I've created a script to remove these parts (rados -p
.rgw.buckets rm __multipart_<object+prefix>. and rados -p
.rgw.buckets rm __shadow_<object+prefix>..[1-3]).  The
removes are completing successfully (in that additional attempts to remove
the object result in a failure), but I'm not seeing any decrease in the
bucket's space used, nor am I seeing a decrease in the bucket's object
count.  In fact, if I do another 'bucket list', all of the removed parts
are still included.
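
For the record, the script boils down to something like this, where <marker>
is the bucket's marker from 'radosgw-admin bucket stats' and <object> is the
S3 key; it bypasses RGW entirely, so only point it at parts you are certain
are orphaned:

  # Collect the leftover multipart and shadow pieces for the object.
  rados -p .rgw.buckets ls | grep "<marker>__multipart_<object>" >  /tmp/parts
  rados -p .rgw.buckets ls | grep "<marker>__shadow_<object>"    >> /tmp/parts

  # Remove them one at a time; the removes succeed, but the bucket stats
  # never move.
  while read -r obj; do
      rados -p .rgw.buckets rm "$obj"
  done < /tmp/parts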

I've looked at the output of 'gc list --include-all', and the removed parts
are never showing up for garbage collection.  Garbage collection is
otherwise functioning normally and will successfully remove data for any
object properly removed via the S3 interface.

I've also gone so far as to write a script to list the contents of bucket
shards in the .rgw.buckets.index pool, check for the existence of the entry
in .rgw.buckets, and remove entries that cannot be found, but that is also
failing to decrement the size/object count counters.
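
That script is essentially the following; the .dir.<bucket_id>.<shard> naming
and the <marker>_<index key> mapping are what I observe on my cluster, so
adjust as needed:

  # Each omap key in a shard is one bucket index entry.
  rados -p .rgw.buckets.index listomapkeys .dir.<bucket_id>.<shard>

  # Check whether the backing object still exists in the data pool.
  rados -p .rgw.buckets stat "<marker>_<index key>"

  # Dropping a stale key works, but the counters live in the shard's omap
  # *header*, which this never touches.
  rados -p .rgw.buckets.index rmomapkey .dir.<bucket_id>.<shard> "<index key>"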

What am I missing here?  Where, aside from .rgw.buckets and
.rgw.buckets.index is RGW looking to determine object count and space used
for a bucket?

Many thanks to any and all who can assist.

Brian Felton
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com