On Mon, Aug 1, 2016 at 11:03 PM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
> Alex Gorbachev wrote on 08/01/2016 04:05 PM:
>> Hi Ilya,
>>
>> On Mon, Aug 1, 2016 at 3:07 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
>>> On Mon, Aug 1, 2016 at 7:55 PM, Alex Gorbachev <a...@iss-integration.com> 
>>> wrote:
>>>> RBD illustration showing RBD ignoring discard until a certain
>>>> threshold - why is that?  This behavior is unfortunately incompatible
>>>> with ESXi discard (UNMAP) behavior.
>>>>
>>>> Is there a way to lower the discard sensitivity on RBD devices?
>>>>
>> <snip>
>>>>
>>>> root@e1:/var/log# blkdiscard -o 0 -l 4096000 /dev/rbd28
>>>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END {
>>>> print SUM/1024 " KB" }'
>>>> 819200 KB
>>>>
>>>> root@e1:/var/log# blkdiscard -o 0 -l 40960000 /dev/rbd28
>>>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END {
>>>> print SUM/1024 " KB" }'
>>>> 782336 KB
>>>
>>> Think about it in terms of underlying RADOS objects (4M by default).
>>> There are three cases:
>>>
>>>     discard range       | command
>>>     -----------------------------------------
>>>     whole object        | delete
>>>     object's tail       | truncate
>>>     object's head       | zero
>>>
>>> Obviously, only delete and truncate free up space.  In all of your
>>> examples, except the last one, you are attempting to discard the head
>>> of the (first) object.
>>>
>>> You can free up as little as a sector, as long as it's the tail:
>>>
>>> Offset    Length  Type
>>> 0         4194304 data
>>>
>>> # blkdiscard -o $(((4 << 20) - 512)) -l 512 /dev/rbd28
>>>
>>> Offset    Length  Type
>>> 0         4193792 data
>>
>> Looks like ESXi is sending each discard/unmap with a fixed
>> granularity of 8192 sectors, which is passed verbatim by SCST.  There
>> is a slight reduction in size via the rbd diff method, but now I
>> understand that an actual truncate only takes effect when the discard
>> happens to clip the tail of an object.
>>
>> So far looking at
>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513
>>
>> ...the only variable we can control is the count of 8192-sector chunks
>> and not their size, which means that most of the ESXi discard
>> commands will be disregarded by Ceph.
>>
>> Vlad, is 8192 sectors coming from ESXi, as in the debug:
>>
>> Aug  1 19:01:36 e1 kernel: [168220.570332] Discarding (start_sector
>> 1342099456, nr_sects 8192)
>
> Yes, correct. However, to make sure that VMware is not (erroneously) being forced
> to do this, you need to perform one more check.
>
> 1. Run cat /sys/block/rbd28/queue/discard*. Ceph should report the correct
> granularity and alignment here (4M, I guess?)

This seems to reflect the granularity (4194304), which matches the
8192 sectors (8192 x 512 = 4194304).  However, there is no alignment
value.
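
As a rough alignment sanity check (this assumes the image uses the
default 4M object size and that /dev/rbd28 maps linearly from object 0
-- my assumption, not verified), the discard from the debug line above
starts mid-object:

root@e1:~# echo $((1342099456 / 8192)) $((1342099456 % 8192))
163830 4096

So that 8192-sector UNMAP would cover the tail half of object 163830
(truncate, space freed) and the head half of object 163831 (zero, no
space freed), which would explain why rbd diff only shrinks slightly.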

Can discard_alignment be specified with RBD?
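
For reference, the granularity attributes live under queue/, but if I
am reading the generic block layer sysfs layout correctly (my
assumption from the kernel docs, not anything rbd-specific), the
alignment is exported one level up as /sys/block/<dev>/discard_alignment
rather than under queue/:

root@e1:~# cat /sys/block/rbd28/queue/discard_granularity
4194304
root@e1:~# cat /sys/block/rbd28/discard_alignment

Whatever the second file reports should be what the kernel thinks the
discard alignment is for this device.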

>
> 2. Connect to this iSCSI device from a Linux box and run sg_inq -p 0xB0
> /dev/<device>
>
> SCST should correctly report those values for unmap parameters (in blocks).
>
> If in both cases you see the same correct values, then this is a VMware issue,
> because it is ignoring what it is told to do (generate appropriately sized
> and aligned UNMAP requests). If either Ceph or SCST doesn't show correct
> numbers, then the broken party should be fixed.
>
> Vlad
>
>
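
Will also run the sg_inq check from a Linux initiator.  Going from
memory of the sg3_utils Block Limits VPD decode (the field names and
the example values below are my expectation of what SCST should export
for a 4M granularity, not an actual capture), the relevant lines should
look roughly like:

# sg_inq -p 0xb0 /dev/<device>
VPD INQUIRY: Block limits page (SBC)
  ...
  Optimal unmap granularity: 8192
  Unmap granularity alignment: 0

Both values are in logical blocks, so 8192 x 512 bytes would line up
with the 4M RADOS object size.
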
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
