I also should have mentioned that you’ll naturally have to remount your OSD 
filestores once you’ve made the change to ceph.conf. You can either restart 
each OSD after making the config file change or simply use the mount command 
yourself with the remount option to add the allocsize option live to each OSD’s 
filestore mount point.
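Here is a minimal sketch of the live-remount route. It assumes the default filestore layout, where each OSD's data directory is mounted at /var/lib/ceph/osd/ceph-<id>. It also assumes your kernel actually applies allocsize on an XFS remount rather than ignoring it, so check the mount options afterwards. OSD_ROOT and DRY_RUN are hypothetical knobs for this script, not Ceph settings. With DRY_RUN=1 (the default here) the script only prints the mount commands. If your OSDs run under the upstart jobs Ceph ships for Ubuntu 14.04, the restart route would instead be `restart ceph-osd id=<id>`, one OSD at a time.

```shell
#!/usr/bin/env bash
# Sketch only: remount each filestore OSD with allocsize=1m added live.
# Assumptions (not from the thread): OSD data dirs are mounted at
# /var/lib/ceph/osd/ceph-<id>, and the kernel applies allocsize on remount.
# OSD_ROOT and DRY_RUN are hypothetical knobs for this script, not Ceph options.
set -euo pipefail

OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Remount a single OSD mount point with the new allocation size.
remount_osd() {
    local dir="$1"
    local cmd=(mount -o remount,allocsize=1m "$dir")
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Walk every ceph-* directory under the given root (default: OSD_ROOT).
remount_all() {
    local root="${1:-$OSD_ROOT}" dir
    for dir in "$root"/ceph-*; do
        [ -d "$dir" ] || continue   # glob matched nothing
        remount_osd "$dir"
    done
}

remount_all
```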


________________________________

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |

________________________________

If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

________________________________
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve Taylor
Sent: Wednesday, November 30, 2016 8:50 AM
To: Thomas Bennett <tho...@ska.ac.za>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

We’re using Ubuntu 14.04 on x86_64. We just added ‘osd mount options xfs = 
rw,noatime,inode64,allocsize=1m’ to the [osd] section of our ceph.conf so that 
XFS allocates space for new files in 1M chunks (allocsize sets the preallocation 
size for buffered writes; it does not change the filesystem block size). That 
only affected new files, so manual defragmentation was still necessary to clean 
up older data, but once that was done everything got better and stayed better.
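The ceph.conf option only reaches the live mount once each OSD has been restarted or remounted, so it may be worth confirming what the kernel actually applied. This is a minimal sketch. It assumes util-linux findmnt and the default /var/lib/ceph/osd/ceph-<id> layout. It relies on XFS listing a non-default allocsize among the mount options, typically in KiB (e.g. allocsize=1024k). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether each OSD filestore is mounted with an allocsize option.
# Assumptions: default /var/lib/ceph/osd/ceph-<id> layout, util-linux findmnt.
set -euo pipefail

# Print the allocsize=... token from a comma-separated option string, if any.
allocsize_of() {
    tr ',' '\n' <<<"$1" | grep '^allocsize=' || true
}

# For every OSD mount point, show the allocsize the kernel reports.
check_all() {
    local root="${1:-/var/lib/ceph/osd}" dir opts size
    for dir in "$root"/ceph-*; do
        [ -d "$dir" ] || continue
        opts="$(findmnt -n -o OPTIONS "$dir" || true)"
        size="$(allocsize_of "$opts")"
        echo "$dir: ${size:-no allocsize set}"
    done
}

check_all
```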

You can use the xfs_db command to check fragmentation on an XFS volume and 
xfs_fsr to defragment it. xfs_fsr can run on a mounted filesystem too, so you 
don’t even have to rely on Ceph’s redundancy to avoid downtime. I probably 
wouldn’t run it everywhere at once, though, for performance reasons. A single 
OSD at a time would be ideal, but that’s a matter of preference.
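Here is a minimal sketch of the check-then-defragment workflow described above, processing one OSD at a time. It assumes the default /var/lib/ceph/osd/ceph-<id> mount points. It also assumes `xfs_db -r -c frag <device>` prints a line of the form `actual N, ideal M, fragmentation factor X%`; that is the usual format, but it may vary between xfsprogs versions. The 20% THRESHOLD is arbitrary and hypothetical. The script only reports unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: report XFS fragmentation per OSD and defragment them serially.
# Assumptions: default /var/lib/ceph/osd/ceph-<id> mount points, xfs_db's
# "fragmentation factor X%" output line, and a hypothetical 20% THRESHOLD.
set -euo pipefail

THRESHOLD="${THRESHOLD:-20}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only report, 0 = also run xfs_fsr

# Extract the fragmentation factor (without the %) from xfs_db output on stdin.
frag_factor() {
    sed -n 's/.*fragmentation factor \([0-9.]*\)%.*/\1/p'
}

# Succeed if factor $1 is strictly above threshold $2 (awk handles decimals).
above() {
    awk -v f="$1" -v t="$2" 'BEGIN { exit !(f > t) }'
}

defrag_all() {
    local root="${1:-/var/lib/ceph/osd}" dir dev factor
    for dir in "$root"/ceph-*; do
        [ -d "$dir" ] || continue
        dev="$(findmnt -n -o SOURCE "$dir")" || continue
        factor="$(xfs_db -r -c frag "$dev" | frag_factor)" || factor=""
        echo "$dir ($dev): fragmentation factor (%) ${factor:-unknown}"
        if [ -n "$factor" ] && above "$factor" "$THRESHOLD" && [ "$DRY_RUN" = "0" ]; then
            # xfs_fsr works on the mounted filesystem; the loop is serial,
            # so only one OSD is being reorganized at any moment.
            xfs_fsr -v "$dir"
        fi
    done
}

defrag_all
```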

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas Bennett
Sent: Wednesday, November 30, 2016 5:58 AM
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

Hi Kate and Steve,

Thanks for the replies. Always good to hear back from a community :)

I'm using Linux on x86_64, where the XFS block size is limited to the page 
size, which is 4k. So it looks like I'm hitting a hard limit on any attempt to 
increase the block size.

I found this out by running the following commands:

$ mkfs.xfs -f -b size=8192 /dev/sda1

$ mount -v /dev/sda1 /tmp/disk/
mount: Function not implemented #huh???

Checking out the man page:

$ man mkfs.xfs
 -b block_size_options
      ... XFS  on  Linux  currently  only  supports pagesize or smaller blocks.
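You can check for that limit before reformatting anything by comparing the block size you want with the kernel page size reported by `getconf PAGESIZE`. This is a minimal sketch. The 8192 default mirrors the mkfs.xfs attempt above, and block_size_fits is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check whether an XFS block size is mountable on this kernel, using
# the man page's rule that Linux XFS only supports blocks <= the page size.
set -euo pipefail

# Succeed if block size $1 fits within page size $2 (both in bytes).
block_size_fits() {
    [ "$1" -le "$2" ]
}

want="${1:-8192}"
page="$(getconf PAGESIZE)"
if block_size_fits "$want" "$page"; then
    echo "block size $want <= page size $page: should be mountable"
else
    echo "block size $want > page size $page: mkfs.xfs may succeed, but mount is expected to fail"
fi
```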

I'm hesitant to deploy btrfs as it's still experimental, and ext4 seems to 
have the same limitation.

Our current approach is to exclude the hard drive that gives us the poor read 
rates from our procurement process, but it would still be nice to find out how 
much control we have over how ceph-osd daemons read from the drives. I may 
attempt to strace an OSD daemon while we read, to see what read request size it 
actually passes to the kernel.
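Here is one way that strace could be done, as a sketch under stated assumptions. It assumes the filestore reads object data with pread, which strace on x86_64 reports as pread64. OSD_PID (the PID of one ceph-osd process) and TRACE_SECONDS are placeholders you set yourself. The helper tallies each call's return value, i.e. the bytes actually read. That is a close proxy for the requested size as long as reads are not short.

```shell
#!/usr/bin/env bash
# Sketch: histogram the pread sizes a running ceph-osd issues, via strace.
# Assumptions: the filestore reads object data with pread (pread64 on x86_64),
# and OSD_PID / TRACE_SECONDS are placeholders you set yourself.
set -euo pipefail

# Tally "size count" pairs from strace output on stdin, keyed on each call's
# return value (bytes actually read); failed calls ("= -1 ...") are skipped.
read_size_histogram() {
    awk '/pread64/ && / = [0-9]+$/ { n[$NF]++ }
         END { for (s in n) print s, n[s] }' | sort -n
}

if [ -n "${OSD_PID:-}" ]; then
    log="$(mktemp)"
    # -f follows every OSD thread; timeout stops tracing after TRACE_SECONDS.
    timeout "${TRACE_SECONDS:-30}" \
        strace -f -e trace=pread64 -p "$OSD_PID" -o "$log" || true
    read_size_histogram < "$log"
fi
```

Run it while the two benchmark readers are active. A histogram dominated by 4096 would point at the small-request pattern that made the drive misbehave under iozone.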

Cheers,
Tom


On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor <steve.tay...@storagecraft.com> wrote:
We configured XFS on our OSDs to allocate space in 1M chunks (our use case is RBDs with 1M 
blocks) due to massive fragmentation in our filestores a while back. We were 
having to defrag all the time and cluster performance was noticeably degraded. 
We also create and delete lots of RBD snapshots on a daily basis, so that 
likely contributed to the fragmentation as well. It’s been MUCH better since we 
switched XFS to use 1M allocations. Virtually no fragmentation and performance 
is consistently good.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kate Ward
Sent: Tuesday, November 29, 2016 2:02 PM
To: Thomas Bennett <tho...@ska.ac.za>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

I have no experience with XFS, but wouldn't expect poor behaviour with it. I 
use ZFS myself and know that it would combine writes, but btrfs might be an 
option.

Do you know what block size was used to create the XFS filesystem? It looks 
like 4k is the default (reasonable) with a max of 64k. Perhaps a larger block 
size will give better performance for your particular use case. (I use a 1M 
block size with ZFS.)
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch04s02.html
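To answer that for an existing OSD, xfs_info reports the data block size on its `data` line as `bsize=...`. This is a minimal sketch. It assumes the default /var/lib/ceph/osd/ceph-<id> mount point (ceph-0 below is only an example) and the usual xfs_info layout. data_bsize is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the XFS data block size of an OSD filestore.
# Assumption: the OSD is mounted at /var/lib/ceph/osd/ceph-<id>.
set -euo pipefail

# Pull the bsize=... value from the "data" section of xfs_info output on stdin.
data_bsize() {
    awk '/^data/ { for (i = 1; i <= NF; i++)
                       if ($i ~ /^bsize=/) { sub(/^bsize=/, "", $i); print $i } }'
}

osd_dir="${1:-/var/lib/ceph/osd/ceph-0}"
if mountpoint -q "$osd_dir"; then
    echo "$osd_dir: block size $(xfs_info "$osd_dir" | data_bsize) bytes"
fi
```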


On Tue, Nov 29, 2016 at 10:23 AM Thomas Bennett <tho...@ska.ac.za> wrote:
Hi Kate,

Thanks for your reply. We currently use xfs as created by ceph-deploy.

What would you recommend we try?

Kind regards,
Tom


On Tue, Nov 29, 2016 at 11:14 AM, Kate Ward <kate.w...@forestent.com> wrote:
What filesystem do you use on the OSD? Have you considered a different 
filesystem that is better at combining requests before they get to the drive?

k8

On Tue, Nov 29, 2016 at 9:52 AM Thomas Bennett <tho...@ska.ac.za> wrote:
Hi,

We have a use case where we are reading 128MB objects off spinning disks.

We've benchmarked a number of different hard drives and have noticed that for a 
particular hard drive, we're experiencing slow reads by comparison.

This occurs when we have multiple readers (even just 2) reading objects off the 
OSD.

We've recreated the effect using iozone and have noticed that once the record 
size drops to 4k, the hard drive misbehaves.
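That comparison could be scripted roughly as follows. This is a sketch assuming iozone's throughput mode. `-i 0 -i 1` runs the write test (which creates the files) followed by the read test. `-r` sets the record size, `-s` the per-file size, and `-t 2` with two `-F` files gives two concurrent threads. TEST_DIR, the 1g file size and the record sizes swept are placeholders, not values from the thread.

```shell
#!/usr/bin/env bash
# Sketch: sweep iozone record sizes with two concurrent threads on one drive.
# Assumptions: TEST_DIR is a hypothetical mount of the drive under test, and
# the record sizes and 1g file size are illustrative, not from the thread.
set -euo pipefail

TEST_DIR="${TEST_DIR:-/mnt/testdisk}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the iozone commands only

# Run (or print) one throughput-mode pass: write test (-i 0) to create the
# files, then read test (-i 1), at record size $1 with 2 threads (-t 2).
run_iozone() {
    local cmd=(iozone -i 0 -i 1 -r "$1" -s 1g -t 2 -F "$TEST_DIR/f1" "$TEST_DIR/f2")
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

for rec in 4k 64k 1m; do
    run_iozone "$rec"
done
```

On a machine with plenty of RAM the read pass may be served from the page cache. Using a file size larger than memory, or adding iozone's -I (O_DIRECT), keeps the drive itself in the measurement.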

Is there a setting in Ceph that we can change to fix the minimum read size when 
the ceph-osd daemon reads objects off the hard drives, to see if we can 
overcome the overall slow read rate?

Cheers,
Tom
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Thomas Bennett

SKA South Africa
Science Processing Team

Office: +27 21 5067341
Mobile: +27 79 5237105


