Re: [ceph-users] RBD Cache and rbd-nbd

2018-05-14 Thread Jason Dillaman
On Mon, May 14, 2018 at 12:15 AM, Marc Schöchlin  wrote:
> Hello Jason,
>
> many thanks for your informative response!
>
> On 11.05.2018 at 17:02, Jason Dillaman wrote:
>> I cannot speak for Xen, but in general IO to a block device will hit
>> the pagecache unless the IO operation is flagged as direct (e.g.
>> O_DIRECT) to bypass the pagecache and directly send it to the block
>> device.
> Sure, but it seems that XenServer just forwards IO from the virtual machines
> (vm: blkfront, dom-0: blkback) to the nbd device in dom-0.
>>> Sorry, my question was a bit imprecise: I was searching for usage statistics
>>> of the rbd cache.
>>> Is there also a possibility to gather rbd_cache usage statistics as a source
>>> of verification for optimizing the cache settings?
>> You can run "perf dump" instead of "config show" to dump out the
>> current performance counters. There are some stats from the in-memory
>> cache included in there.
> Great, I was not aware of that.
> There are really a lot of statistics which might be useful for analyzing
> what's going on or whether the optimizations improve the performance of our
> systems.
>>> Can you provide some hints about adequate cache settings for a write
>>> intensive environment (70% write, 30% read)?
>>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
>>> of 10 seconds?
>> Depends on your workload and your testing results. I suspect a
>> database on top of RBD is going to do its own read caching and will be
>> issuing lots of flush calls to the block device, potentially negating
>> the need for a large cache.
>
> Sure, reducing flushes while accepting a degraded level of
> reliability seems to be one important key to improved performance.
>
>>>
>>> More than 70 percent of our typical workload consists of database write
>>> operations in the virtual machines.
>>> Therefore collecting write operations with rbd cache and writing them in
>>> chunks to ceph might be a good thing.
>>> A higher limit for "rbd cache max dirty" might be adequate here.
>>> On the other hand, our read workload typically reads huge files
>>> sequentially.
>>>
>>> Therefore it might be useful to start with a configuration like this:
>>>
>>> rbd cache size = 64MB
>>> rbd cache max dirty = 48MB
>>> rbd cache target dirty = 32MB
>>> rbd cache max dirty age = 10
>>>
>>> What is the strategy of librbd to write data to the storage from rbd_cache
>>> if "rbd cache max dirty = 48MB" is reached?
>>> Is there a reduction of io operations (merging of ios) compared to the
>>> granularity of writes of my virtual machines?
>> If the cache is full, incoming IO will be stalled as the dirty bits
>> are written back to the backing RBD image to make room available for
>> the new IO request.
> Sure, I will have a look at the statistics and the throughput.
> Is there any consolidation of write requests in the rbd cache?
>
> Example:
> If a VM writes small IO requests to the nbd device which belong to the
> same RADOS object, does librbd consolidate these requests into a single
> ceph IO?
> What strategies does librbd use for that?

The librbd cache will consolidate sequential dirty extents within the
same object, but it does not consolidate all dirty extents within the
same object to the same write request.
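
To illustrate the distinction, here is a minimal Python sketch. It is not
librbd's actual code: the function name and the (offset, length) extent
representation are assumptions made purely for illustration. It merges only
dirty extents that are contiguous within one object, so two dirty ranges
separated by a clean gap still end up as two writes.

```python
def merge_sequential_extents(extents):
    """Merge dirty (offset, length) extents that touch or overlap.

    Extents separated by a clean gap stay separate, so the object is
    not flushed with one single write.
    """
    merged = []
    for offset, length in sorted(extents):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # The extent starts inside or right at the end of the previous one.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Three 4 KiB writes to one object: the first two are adjacent,
# the third lies 1 MiB into the object.
dirty = [(0, 4096), (4096, 4096), (1048576, 4096)]
print(merge_sequential_extents(dirty))
```

In this example the two adjacent 4 KiB writes become one 8 KiB extent, while
the isolated write at the 1 MiB offset remains its own request.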

> Regards
> Marc
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Cache and rbd-nbd

2018-05-14 Thread Marc Schöchlin
Hello Jason,

many thanks for your informative response!

On 11.05.2018 at 17:02, Jason Dillaman wrote:
> I cannot speak for Xen, but in general IO to a block device will hit
> the pagecache unless the IO operation is flagged as direct (e.g.
> O_DIRECT) to bypass the pagecache and directly send it to the block
> device.
Sure, but it seems that XenServer just forwards IO from the virtual machines
(vm: blkfront, dom-0: blkback) to the nbd device in dom-0.
>> Sorry, my question was a bit imprecise: I was searching for usage statistics
>> of the rbd cache.
>> Is there also a possibility to gather rbd_cache usage statistics as a source
>> of verification for optimizing the cache settings?
> You can run "perf dump" instead of "config show" to dump out the
> current performance counters. There are some stats from the in-memory
> cache included in there.
Great, I was not aware of that.
There are really a lot of statistics which might be useful for analyzing
what's going on or whether the optimizations improve the performance of our
systems.
>> Can you provide some hints about adequate cache settings for a write
>> intensive environment (70% write, 30% read)?
>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
>> of 10 seconds?
> Depends on your workload and your testing results. I suspect a
> database on top of RBD is going to do its own read caching and will be
> issuing lots of flush calls to the block device, potentially negating
> the need for a large cache.

Sure, reducing flushes while accepting a degraded level of
reliability seems to be one important key to improved performance.

>>
>> More than 70 percent of our typical workload consists of database write
>> operations in the virtual machines.
>> Therefore collecting write operations with rbd cache and writing them in
>> chunks to ceph might be a good thing.
>> A higher limit for "rbd cache max dirty" might be adequate here.
>> On the other hand, our read workload typically reads huge files
>> sequentially.
>>
>> Therefore it might be useful to start with a configuration like this:
>>
>> rbd cache size = 64MB
>> rbd cache max dirty = 48MB
>> rbd cache target dirty = 32MB
>> rbd cache max dirty age = 10
>>
>> What is the strategy of librbd to write data to the storage from rbd_cache
>> if "rbd cache max dirty = 48MB" is reached?
>> Is there a reduction of io operations (merging of ios) compared to the
>> granularity of writes of my virtual machines?
> If the cache is full, incoming IO will be stalled as the dirty bits
> are written back to the backing RBD image to make room available for
> the new IO request.
Sure, I will have a look at the statistics and the throughput.
Is there any consolidation of write requests in the rbd cache?

Example:
If a VM writes small IO requests to the nbd device which belong to the
same RADOS object, does librbd consolidate these requests into a single
ceph IO?
What strategies does librbd use for that?

Regards
Marc



Re: [ceph-users] RBD Cache and rbd-nbd

2018-05-11 Thread Jason Dillaman
On Fri, May 11, 2018 at 3:59 AM, Marc Schöchlin  wrote:
> Hello Jason,
>
> thanks for your response.
>
>
> On 10.05.2018 at 21:18, Jason Dillaman wrote:
>
>>> If I configure caches as described at
>>> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated
>>> caches per rbd-nbd/krbd device or is there only a single cache area?
>>
>> The librbd cache is per device, but if you aren't performing direct
>> IOs to the device, you would also have the unified Linux pagecache on
>> top of all the devices.
>
> XenServer directly uses nbd devices which, as I understand it, are
> connected to the virtual machines by blkback (dom-0) and blkfront (dom-U).
> As I understand it, the pagecache only comes into play if I use data on
> mounted filesystems (VFS usage).
> Therefore it would be a good thing to use rbd cache for rbd-nbd (/dev/nbdX).

I cannot speak for Xen, but in general IO to a block device will hit
the pagecache unless the IO operation is flagged as direct (e.g.
O_DIRECT) to bypass the pagecache and directly send it to the block
device.

>>> How can I identify the rbd cache with the tools provided by the operating
>>> system?
>>
>> Identify how? You can enable the admin sockets and use "ceph
>> --admin-daemon config show" to display the in-use settings.
>
>
> Ah OK, I discovered that I can gather configuration settings by executing:
> (xen_test is the identity of the xen rbd_nbd user)
>
> ceph --id xen_test --admin-daemon /var/run/ceph/ceph-client.xen_test.asok
> config show | less -p rbd_cache
>
> Sorry, my question was a bit imprecise: I was searching for usage statistics
> of the rbd cache.
> Is there also a possibility to gather rbd_cache usage statistics as a source
> of verification for optimizing the cache settings?

You can run "perf dump" instead of "config show" to dump out the
current performance counters. There are some stats from the in-memory
cache included in there.
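
As a rough sketch of how those counters could be pulled out for comparison
between test runs, the Python below runs "perf dump" against the admin socket
and keeps only entries whose section or counter name mentions "cache". The
helper names and the sample data are assumptions for illustration; the real
section and counter names depend on the Ceph version and the mapped image, so
check them against your own output.

```python
import json
import subprocess


def read_perf_dump(socket_path):
    """Run 'ceph --admin-daemon <socket> perf dump' and parse the JSON it prints."""
    output = subprocess.check_output(
        ["ceph", "--admin-daemon", socket_path, "perf", "dump"]
    )
    return json.loads(output)


def cache_counters(perf_dump):
    """Keep counters whose section name or counter name mentions 'cache'."""
    selected = {}
    for section, counters in perf_dump.items():
        if not isinstance(counters, dict):
            continue
        section_matches = "cache" in section.lower()
        hits = {
            name: value
            for name, value in counters.items()
            if section_matches or "cache" in name.lower()
        }
        if hits:
            selected[section] = hits
    return selected


# Hypothetical sample in the shape of a perf dump; these names and
# numbers are made up and only demonstrate the filtering.
sample = {
    "librbd-example-image": {"rd": 120, "wr": 480},
    "objectcacher-example-image": {"cache_ops_hit": 90, "cache_ops_miss": 30},
}
print(json.dumps(cache_counters(sample), indent=2))
```

On a live host one would call cache_counters(read_perf_dump(path)) with the
socket path of the rbd-nbd client, for example the
/var/run/ceph/ceph-client.xen_test.asok path quoted above.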

> Since an rbd cache is created for every device, I assume that
> the rbd cache is simply part of the rbd-nbd process memory.

Correct.

>
>>> Can you provide some hints about adequate cache settings for a write
>>> intensive environment (70% write, 30% read)?
>>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
>>> of 10 seconds?

Depends on your workload and your testing results. I suspect a
database on top of RBD is going to do its own read caching and will be
issuing lots of flush calls to the block device, potentially negating
the need for a large cache.

>> The librbd cache is really only useful for sequential read-ahead and
>> for small writes (assuming writeback is enabled). Assuming you aren't
>> using direct IO, I'd suspect your best performance would be to disable
>> the librbd cache and rely on the Linux pagecache to work its magic.
>
> As described, XenServer directly uses the nbd devices.
>
> More than 70 percent of our typical workload consists of database write
> operations in the virtual machines.
> Therefore collecting write operations with rbd cache and writing them in
> chunks to ceph might be a good thing.
> A higher limit for "rbd cache max dirty" might be adequate here.
> On the other hand, our read workload typically reads huge files
> sequentially.
>
> Therefore it might be useful to start with a configuration like this:
>
> rbd cache size = 64MB
> rbd cache max dirty = 48MB
> rbd cache target dirty = 32MB
> rbd cache max dirty age = 10
>
> What is the strategy of librbd to write data to the storage from rbd_cache
> if "rbd cache max dirty = 48MB" is reached?
> Is there a reduction of io operations (merging of ios) compared to the
> granularity of writes of my virtual machines?

If the cache is full, incoming IO will be stalled as the dirty bits
are written back to the backing RBD image to make room available for
the new IO request.

> Additionally, I would not set non-default readahead values at the nbd level,
> so that readahead can still be configured at the operating-system level of
> the VMs.
>
> The operating systems in our virtual machines currently use a readahead of
> 256 sectors (256*512 = 128KB).
> From my point of view it would be a good thing for sequential reads of big
> files to increase readahead to a higher value.
> We haven't changed the default rbd object size of 4MB - nevertheless it
> might be a good thing to increase the readahead to 1024 (=512KB) to decrease
> read requests by a factor of 4 for sequential reads.
>
> What do you think about this?

Depends on your workload.

> Regards
> Marc
>



-- 
Jason


Re: [ceph-users] RBD Cache and rbd-nbd

2018-05-11 Thread Marc Schöchlin
Hello Jason,

thanks for your response.


On 10.05.2018 at 21:18, Jason Dillaman wrote:

>> If I configure caches as described at
>> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated
>> caches per rbd-nbd/krbd device or is there only a single cache area?
> The librbd cache is per device, but if you aren't performing direct
> IOs to the device, you would also have the unified Linux pagecache on
> top of all the devices.
XenServer directly uses nbd devices which, as I understand it, are
connected to the virtual machines by blkback (dom-0) and blkfront (dom-U).
As I understand it, the pagecache only comes into play if I use data on
mounted filesystems (VFS usage).
Therefore it would be a good thing to use rbd cache for rbd-nbd (/dev/nbdX).
>> How can I identify the rbd cache with the tools provided by the operating
>> system?
> Identify how? You can enable the admin sockets and use "ceph
> --admin-daemon config show" to display the in-use settings.

Ah OK, I discovered that I can gather configuration settings by executing:
(xen_test is the identity of the xen rbd_nbd user)

ceph --id xen_test --admin-daemon \
/var/run/ceph/ceph-client.xen_test.asok config show | less -p rbd_cache

Sorry, my question was a bit imprecise: I was searching for usage
statistics of the rbd cache.
Is there also a possibility to gather rbd_cache usage statistics as a
source of verification for optimizing the cache settings?

Since an rbd cache is created for every device, I assume
that the rbd cache is simply part of the rbd-nbd process memory.


>> Can you provide some hints about adequate cache settings for a write
>> intensive environment (70% write, 30% read)?
>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
>> of 10 seconds?
> The librbd cache is really only useful for sequential read-ahead and
> for small writes (assuming writeback is enabled). Assuming you aren't
> using direct IO, I'd suspect your best performance would be to disable
> the librbd cache and rely on the Linux pagecache to work its magic.
As described, XenServer directly uses the nbd devices.

More than 70 percent of our typical workload consists of database write
operations in the virtual machines.
Therefore collecting write operations with rbd cache and writing them in
chunks to ceph might be a good thing.
A higher limit for "rbd cache max dirty" might be adequate here.
On the other hand, our read workload typically reads huge files
sequentially.

Therefore it might be useful to start with a configuration like this:

rbd cache size = 64MB
rbd cache max dirty = 48MB
rbd cache target dirty = 32MB
rbd cache max dirty age = 10
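
A quick sanity check of these numbers, as a hedged Python sketch. It assumes
that "MB" here means MiB, that the size options are given in bytes as the
Luminous config reference describes them, and that the reference's ordering
constraints apply (max dirty less than cache size, target dirty less than max
dirty). The names proposal and check_proposal are illustrative only.

```python
MIB = 1024 * 1024  # assumption: "MB" in the proposal means MiB

proposal = {
    "rbd cache size": 64 * MIB,
    "rbd cache max dirty": 48 * MIB,
    "rbd cache target dirty": 32 * MIB,
    "rbd cache max dirty age": 10,  # seconds
}


def check_proposal(cfg):
    """Return a list of violated ordering constraints (empty if none)."""
    problems = []
    if not cfg["rbd cache max dirty"] < cfg["rbd cache size"]:
        problems.append("max dirty must be less than cache size")
    if not cfg["rbd cache target dirty"] < cfg["rbd cache max dirty"]:
        problems.append("target dirty must be less than max dirty")
    return problems


# Print the values as they would be written in bytes.
for key, value in proposal.items():
    print(f"{key} = {value}")
print(check_proposal(proposal) or "ordering constraints satisfied")
```

In bytes the three sizes come out as 67108864, 50331648 and 33554432, and the
proposed ordering (32 < 48 < 64) satisfies both constraints.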

What is the strategy of librbd to write data to the storage from
rbd_cache if "rbd cache max dirty = 48MB" is reached?
Is there a reduction of io operations (merging of ios) compared to the
granularity of writes of my virtual machines?

Additionally, I would not set non-default readahead values at the nbd
level, so that readahead can still be configured at the operating-system
level of the VMs.

The operating systems in our virtual machines currently use a readahead
of 256 sectors (256*512 = 128KB).
From my point of view it would be a good thing for sequential reads of
big files to increase readahead to a higher value.
We haven't changed the default rbd object size of 4MB - nevertheless it
might be a good thing to increase the readahead to 1024 (=512KB) to
decrease read requests by a factor of 4 for sequential reads.
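
The arithmetic behind that estimate, as a small Python sketch. It assumes
that the readahead value counts 512-byte sectors and, as a simplification,
that every sequential read request covers exactly one readahead window; the
function names are illustrative only.

```python
SECTOR_BYTES = 512
OBJECT_BYTES = 4 * 1024 * 1024  # default rbd object size of 4MB


def readahead_bytes(sectors):
    """Size of one readahead window in bytes."""
    return sectors * SECTOR_BYTES


def requests_per_object(sectors):
    """Sequential read requests needed to cover one rbd object."""
    return OBJECT_BYTES // readahead_bytes(sectors)


for sectors in (256, 1024):
    kib = readahead_bytes(sectors) // 1024
    print(f"readahead {sectors}: {kib} KiB window, "
          f"{requests_per_object(sectors)} requests per object")
```

Under these assumptions a 4MB object takes 32 requests at a readahead of 256
and 8 requests at 1024, which is the factor of 4 mentioned above.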

What do you think about this?

Regards
Marc



Re: [ceph-users] RBD Cache and rbd-nbd

2018-05-10 Thread Jason Dillaman
On Thu, May 10, 2018 at 12:03 PM, Marc Schöchlin  wrote:
> Hello list,
>
> I map ~30 rbds per XenServer host by using rbd-nbd to run virtual machines
> on these devices.
>
> I have the following questions:
>
> Is it possible to use rbd cache for rbd-nbd? I assume that this is true, but
> the documentation does not make a clear statement about this.
> (http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/)

It's on by default since it's a librbd client and that's the default setting.

> If I configure caches as described at
> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated
> caches per rbd-nbd/krbd device or is there only a single cache area?

The librbd cache is per device, but if you aren't performing direct
IOs to the device, you would also have the unified Linux pagecache on
top of all the devices.

> How can I identify the rbd cache with the tools provided by the operating
> system?

Identify how? You can enable the admin sockets and use "ceph
--admin-daemon config show" to display the in-use settings.

> Can you provide some hints about adequate cache settings for a write
> intensive environment (70% write, 30% read)?
> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
> of 10 seconds?

The librbd cache is really only useful for sequential read-ahead and
for small writes (assuming writeback is enabled). Assuming you aren't
using direct IO, I'd suspect your best performance would be to disable
the librbd cache and rely on the Linux pagecache to work its magic.

>
> Regards
> Marc
>
> Our system:
>
> Luminous/12.2.5
> Ubuntu 16.04
> 5 OSD Nodes (24*8 TB HDD OSDs, 48*1TB SSD OSDs, Bluestore, 6GB Cache Size per
> OSD, 192GB RAM, 56 HT CPUs)
> 3 Mons (64 GB RAM, 200GB SSD, 4 visible CPUs)
> 2 * 10 GBIT, SFP+, bonded xmit_hash_policy layer3+4
>
>



-- 
Jason