Hi all,

I'm a newbie to Ceph.  I'm an MSP and a small-scale cloud hosting provider, and 
I'm intending to use Ceph as production storage for a small-scale private 
hosting cloud.  We run ESXi as our hypervisors, so we want to present Ceph as 
iSCSI.

We've got Ceph Nautilus running on a 3-node cluster.  Each node contains a pair 
of Bronze Xeons, 128 GB RAM, 6 x 10G NICs, 8 x 10TB spinners, 2 x 2TB SATA SSDs 
and a 4TB NVMe.

The HDDs and the 2TB SSDs give me an rbd pool of 24 OSDs (1024 PGs), with the 
SSDs partitioned and used to hold the DB and WAL.  Each SSD holds the DB/WAL for 
4 HDDs.
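
(For reference, each OSD was deployed with its block.db on an SSD partition; 
the equivalent ceph-volume invocation would be something like the following -- 
the device names here are illustrative rather than my exact deployment 
commands:

[root@ceph00 ~]# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdi1

i.e. each HDD gets its DB/WAL on a partition of one of the SATA SSDs.)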

The NVMes give me a cache pool of 3 OSDs (128 PGs) which I want to use as a hot 
tier.  As far as I can tell I have followed the guidance given here: 
https://docs.ceph.com/docs/nautilus/rados/operations/cache-tiering/
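
The tier was wired up essentially as that page describes, i.e. something like:

[root@ceph00 ~]# ceph osd tier add rbd cache
[root@ceph00 ~]# ceph osd tier cache-mode cache writeback
[root@ceph00 ~]# ceph osd tier set-overlay rbd cache
[root@ceph00 ~]# ceph osd pool set cache hit_set_type bloom

(pool names as in the 'ceph osd pool ls detail' output further down).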

The cluster is working, the iSCSI is working, and generally everything is 
looking pretty good. My only problem at this stage is that the tiering is not 
handling writes in the way I expect, and I really can't get my brain around why.

For my test I start with the hot tier empty.  I drained it by setting 
dirty_ratio = dirty_high_ratio = full_ratio = 0.  I then set dirty_ratio = 0.5, 
dirty_high_ratio = 0.6 and full_ratio = 0.7, and started writing data to it at 
high speed (using a simple file copy) from a VM on ESXi.
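
In concrete terms, the knobs I'm setting are the cache_target_* pool values on 
the cache pool:

[root@ceph00 ~]# ceph osd pool set cache cache_target_dirty_ratio 0.5
[root@ceph00 ~]# ceph osd pool set cache cache_target_dirty_high_ratio 0.6
[root@ceph00 ~]# ceph osd pool set cache cache_target_full_ratio 0.7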

My expectation is that all inbound writes will land initially on the hot tier, 
resulting in low-latency writes as seen by the ESXi hosts.  I expect that once 
the hot tier fills to 50% dirty it will start to flush those writes down to the 
HDD storage.
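
(Putting numbers on that: the cache pool has target_bytes = 3078632557772, i.e. 
roughly 3.1 TB, so with dirty_ratio = 0.5 I'd expect flushing to begin only once 
roughly 1.5 TB of dirty data has built up in the hot tier -- assuming I've 
understood correctly that the ratios are taken relative to target_max_bytes.)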

What I actually see is that as soon as I start throwing data at the cluster, 
the Ceph dashboard shows writes going to both the NVMes and the HDDs, and the 
write latency seen by ESXi hits several hundred milliseconds.  It seems that 
the hot tier is absorbing only a fraction of the writes.
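
(If it helps with diagnosis, I can capture the dirty object count and the 
per-pool I/O rates while the copy runs, e.g. with:

[root@ceph00 ~]# ceph df detail
[root@ceph00 ~]# ceph osd pool stats

and post the output.)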

Here are my pool settings. 

[root@ceph00 ~]# ceph osd pool ls detail
pool 4 'rbd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins 
pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 6971 lfor 
6971/6971/6971 flags hashpspool,selfmanaged_snaps tiers 11 read_tier 11 
write_tier 11 stripe_width 0 application rbd
        removed_snaps [1~3]
pool 11 'cache' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins 
pg_num 128 pgp_num 128 autoscale_mode warn last_change 7065 lfor 6971/6971/6971 
flags hashpspool,incomplete_clones,selfmanaged_snaps tier_of 4 cache_mode 
writeback target_bytes 3078632557772 hit_set bloom{false_positive_probability: 
0.001, target_size: 0, seed: 0} 3600s x12 decay_rate 0 search_last_n 0 
min_read_recency_for_promote 2 stripe_width 0 application rbd
        removed_snaps [1~3]
pool 12 'pure_hdd' replicated size 3 min_size 2 crush_rule 1 object_hash 
rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 7059 flags 
hashpspool,selfmanaged_snaps stripe_width 0 application rbd
        removed_snaps [1~3]
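
The dirty/full ratios don't appear in the 'ls detail' output above; if the 
exact values are useful they can be read back individually, e.g.:

[root@ceph00 ~]# ceph osd pool get cache cache_target_dirty_ratio
[root@ceph00 ~]# ceph osd pool get cache cache_target_full_ratio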


Is anyone able to point me towards a solution?

Thanks,
Steve
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
