I can't comment directly on the relation XFS fragmentation has to Bluestore, 
but I had a similar issue probably 2-3 years ago where XFS fragmentation was 
causing a significant degradation in cluster performance. The use case was RBDs 
with lots of snapshots created and deleted at regular intervals. XFS got pretty 
severely fragmented and the cluster slowed down quickly.

The solution I found was to set the XFS allocsize to match the RBD object size 
via osd_mount_options_xfs. Of course I also had to defragment XFS to clear up 
the existing fragmentation, but that was fairly painless. XFS fragmentation 
hasn't been an issue since. That solution isn't as applicable in an object 
store use case where the object size is more variable, but increasing the XFS 
allocsize could still help.
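
For illustration, the relevant pieces look roughly like this (a sketch only; the 4M value assumes the default 4MB RBD object size and the mount flags are just the usual filestore defaults, so adjust both to your environment):

    [osd]
    # make xfs allocate space in RBD-object-sized chunks
    osd_mount_options_xfs = rw,noatime,inode64,allocsize=4M

The option only applies when the OSD filesystem is mounted, so it takes effect after restarting the OSDs. The existing fragmentation can then be cleaned up online with xfs_fsr from xfsprogs, e.g.

    xfs_fsr -v /var/lib/ceph/osd/ceph-0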

As far as Bluestore goes, I haven't deployed it in production yet, but I would 
expect that manipulating bluestore_min_alloc_size in a similar fashion would 
yield similar benefits. Of course, in both cases you then waste some disk space 
for every object that ends up smaller than the allocation size. That's the 
trade-off.
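
For Bluestore, a sketch only (again, untested by me) of the analogous ceph.conf knobs; note that Bluestore bakes the value in when the OSD is created, so changing it does nothing for existing OSDs:

    [osd]
    # defaults are roughly 64KB for HDD and 16KB for SSD at the time of writing;
    # raising them trades space amplification for less fragmentation
    bluestore_min_alloc_size_hdd = 131072
    bluestore_min_alloc_size_ssd = 32768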


________________________________



Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799


________________________________
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.
________________________________


On Thu, 2018-04-12 at 04:13 +0200, Marc Roos wrote:


Is that not obvious? The 8TB is handling twice as much as the 4TB. Afaik 
there is not a linear relationship between a disk's iops and its size.


But this xfs defragmentation is interesting; how does it relate/compare to 
bluestore?





-----Original Message-----
From: Yao Zongyou [mailto:yaozong...@outlook.com]
Sent: Thursday, 12 April 2018 4:36
To: ceph-users@lists.ceph.com
Subject: [ceph-users] osds with different disk sizes may killing performance
Importance: High

Hi,

For anybody who may be interested, here I share the process of locating the 
reason for a ceph cluster performance slowdown in our environment.

Internally, we have a cluster with 1.1PB capacity, 800TB used, and about 500TB 
of raw user data. Each day, 3TB of data is uploaded and the oldest 3TB is 
lifecycled (we are using the s3 object store, and bucket lifecycle is enabled). 
As time goes by, the cluster becomes slower and slower, and we suspected xfs 
fragmentation was the culprit.

After some testing, we did find that xfs fragmentation slows down filestore's 
performance: for example, at 15% fragmentation, performance is 85% of the 
original, and at 25%, performance is 74.73% of the original.
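
For reference, the fragmentation factor quoted above can be read with xfs_db 
from xfsprogs (the device name below is just an example for one osd's data 
partition):

    # -r opens the device read-only, so this is safe on a mounted filesystem
    xfs_db -c frag -r /dev/sdb1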

But the main reason for our cluster's performance deterioration is not xfs 
fragmentation.

Initially, our ceph cluster contained only osds with 4TB disks. As time went 
by, we scaled out the cluster by adding new osds with 8TB disks. Since each 
new disk's capacity is double that of the old disks, each new osd's weight is 
double that of an old osd. Consequently, each new osd holds twice as many pgs 
and uses twice as much disk space as an old osd. Everything looks good and fine.
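
To make the doubling concrete: by default ceph sets an osd's crush weight to 
the device capacity in TiB, so in a mixed cluster the tree looks roughly like 
this (values approximate):

    ceph osd tree
    # a 4TB osd shows a crush weight of about 3.64 (4TB expressed in TiB)
    # an 8TB osd shows about 7.28, so crush targets it with roughly twice
    # as many pgs and twice as much data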

But even though a new osd has double the capacity of an old osd, its 
performance is not double. After digging into our internal system stats, we 
found that the newly added disks' io util is about twice that of the old ones, 
and from time to time the new disks' io util rises to 100%. The newly added 
osds are the performance killers. They slow down the whole cluster's 
performance.
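
One simple way to confirm this without our internal stats system is to watch 
the devices directly (iostat is from the sysstat package; the interval is just 
an example):

    iostat -x 5          # %util near 100% on the 8TB devices is the red flag
    ceph osd perf        # per-osd commit/apply latency; the slow osds stand out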

Once the reason was found, the solution was very simple. After lowering the 
newly added osds' weight, the annoying slow request warnings died away.
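
For anyone hitting the same problem, the adjustment itself is a single command 
per osd (the osd id and target weight below are only an example; the right 
value depends on how much load your slower disks can actually take):

    # lower an 8TB osd's crush weight from its default ~7.28
    ceph osd crush reweight osd.12 5.5

Be aware that changing crush weights moves data, so it is best done gradually.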

So the conclusion is: in a cluster with different osd disk sizes, an osd's 
weight should not be determined by its capacity alone; we should also take its 
performance into account.

Best wishes,
Yao Zongyou