FYI
Having a 50GB block.db made no difference in performance.
- Rado
*From:* David Turner [mailto:drakonst...@gmail.com]
*Sent:* Tuesday, November 14, 2017 6:13 PM
*To:* Milanov, Radoslav Nikiforov <rad...@bu.edu>
*Cc:* Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
I'd probably say 50GB to leave some extra space over-provisioned.
50GB should definitely prevent any DB operations from spilling over to the HDD.
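For example, if ceph-disk is creating the OSDs, something along these lines in ceph.conf should do it (a sketch; the value is in bytes and only applies to newly prepared OSDs, so existing ones have to be redeployed to pick it up):

[osd]
# ~50 GB block.db partition for newly created bluestore OSDs
bluestore_block_db_size = 53687091200

and then redeploy each OSD, e.g. with something like "ceph-disk prepare --bluestore /dev/sdX --block.db /dev/sdY" (device names are just placeholders).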
On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov <rad...@bu.edu> wrote:
Thank you,
These are 4TB OSDs and they might become full someday; I'll try a 60GB db partition – this is the max OSD capacity.
- Rado
*From:* David Turner [mailto:drakonst...@gmail.com]
*Sent:* Tuesday, November 14, 2017 5:38 PM
*To:* Milanov, Radoslav Nikiforov <rad...@bu.edu>
*Cc:* Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
You have to configure the size of the db partition in the config file for the cluster. If your db partition is 1GB, then I can all but guarantee that you're using your HDD for your block.db very quickly into your testing. There have been multiple threads recently about what size the db partition should be, and it seems to depend on how many objects your OSD is likely to have on it. The recommendation has been to err on the side of bigger. If you're running 10TB OSDs and anticipate filling them up, then you probably want closer to an 80GB+ db partition. That's why I asked how full your cluster was and how large your HDDs are.
Here's a link to one of the recent ML threads on this topic:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
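One way to confirm whether the db has already spilled over onto the HDD is to look at the bluefs counters on the OSD host (a sketch, per OSD id):

ceph daemon osd.0 perf dump | python -m json.tool | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'

A non-zero slow_used_bytes means part of the DB is living on the slow (HDD) device.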
On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov <rad...@bu.edu> wrote:
The block-db partition is the default 1GB (is there a way to modify this? journals are 5GB in the filestore case) and usage is low:
[root@kumo-ceph02 ~]# ceph df
GLOBAL:
    SIZE        AVAIL      RAW USED     %RAW USED
    100602G     99146G     1455G        1.45
POOLS:
    NAME            ID     USED       %USED     MAX AVAIL     OBJECTS
    kumo-vms        1      19757M     0.02      31147G        5067
    kumo-volumes    2      214G       0.18      31147G        55248
    kumo-images     3      203G       0.17      31147G        66486
    kumo-vms3       11     45824M     0.04      31147G        11643
    kumo-volumes3   13     10837M     0         31147G        2724
    kumo-images3    15     82450M     0.09      31147G        10320
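For what it's worth, the actual block.db partition sizes can be double-checked on an OSD host with something like the following (a sketch; ceph-disk labels the partitions, and the device layout will differ):

lsblk -o NAME,SIZE,TYPE,PARTLABEL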
- Rado
*From:* David Turner [mailto:drakonst...@gmail.com]
*Sent:* Tuesday, November 14, 2017 4:40 PM
*To:* Mark Nelson <mnel...@redhat.com>
*Cc:* Milanov, Radoslav Nikiforov <rad...@bu.edu>; ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
How big was your block.db partition for each OSD and what size are your HDDs? Also, how full is your cluster? It's possible that your block.db partition wasn't large enough to hold the entire db and it had to spill over onto the HDD, which would definitely impact performance.
On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnel...@redhat.com> wrote:
How big were the writes in the Windows test and how much concurrency was there?

Historically bluestore does pretty well for us with small random writes, so your write results surprise me a bit. I suspect it's the low queue depth. Sometimes bluestore does worse with reads, especially if readahead isn't enabled on the client.
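If readahead is off, something along these lines in the [client] section of ceph.conf on the hypervisors should enable it (a sketch; option names as in Luminous, values to taste):

[client]
rbd cache = true
# readahead is normally disabled after the first 50MB read from an image; 0 keeps it on
rbd readahead disable after bytes = 0
# cap on a single readahead request (default is 512KB)
rbd readahead max bytes = 4194304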
Mark
On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes, RBD is in writeback mode, and the only thing that changed was converting the OSDs to bluestore. These are 7200 rpm drives and triple replication. I also get the same results (bluestore 2 times slower) testing continuous writes on a 40GB partition on a Windows VM, with a completely different tool.
>
> Right now I'm going back to filestore for the OSDs, so additional tests are possible if that helps.
>
> - Rado
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode? Do you have client-side readahead?
>
> Both are doing better for writes than you'd expect from the native performance of the disks, assuming they are typical 7200RPM drives and you are using 3X replication (~150 IOPS * 27 / 3 = ~1350 IOPS). Given the small file size, I'd expect that you might be getting better journal coalescing in filestore.
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be curious how it would look if you used libaio with a high iodepth and a much bigger partition to do random writes over.
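>
> Something like this, for example (a sketch; size and iodepth to taste):
>
> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=20G --ioengine=libaio --iodepth=32 --numjobs=2 --time_based --runtime=180 --group_reporting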
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>> Hi
>>
>> We have a 3-node, 27-OSD cluster running Luminous 12.2.1.
>>
>> In the filestore configuration there are 3 SSDs used for the journals of the 9 OSDs on each host (1 SSD holds 3 journal partitions for 3 OSDs).
>>
>> I've converted filestore to bluestore by wiping 1 host at a time and waiting for recovery. The SSDs now contain block.db - again one SSD serving 3 OSDs.
>>
>>
>>
>> The cluster is used as storage for OpenStack.
>>
>> Running fio on a VM in that OpenStack cloud shows bluestore performance almost twice as slow as filestore.
>>
>> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> Filestore
>>
>> write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>> write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>> write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
>>
>> read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>> read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>> read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
>>
>> Bluestore
>>
>> write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
>> write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
>> write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
>>
>> read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
>> read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
>> read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
>>
>> - Rado
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com