Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-18 Thread Nick Fisk
Just a couple of points.

There is no way you can be writing over 7000 iops to 27 x 7200rpm disks at a 
replica level of 3 - 27 drives at roughly 150 iops each, divided by 3 for 
replication, gives only around 1350 iops of raw backend capability. As Mark has 
suggested, with a 1GB test file you are only touching a tiny area on each 
physical disk, so you are probably getting a combination of short stroking from 
the disks and Filestore/XFS buffering up your writes, coalescing them and 
actually writing a lot less out to the disks than the benchmark suggests. 

I'm not 100% sure how the allocations work in Bluestore, especially when it 
comes to overwriting with tiny 4kb objects, but I'm wondering if Bluestore is 
starting to spread the data out further across the disk, so you lose some of the 
benefit of short stroking. There may be other factors coming into play with the 
deferred writes, which were implemented/fixed after the investigation Mark 
mentioned. The simple reproducer at the time was to coalesce a stream of small 
sequential writes; the scenario where a large number of small random writes 
potentially cover the same small area was not tested.
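
For what it's worth, the Luminous options I believe control that deferred-write 
behaviour are the ones below (these are the shipped defaults as far as I know, 
shown only so you can see which knobs are involved, not a tuning recommendation):

[osd]
# small I/Os at or below this size on HDD-backed OSDs are written to the
# WAL/db device first and applied to the slow device later (deferred)
bluestore prefer deferred size hdd = 32768
# minimum allocation unit for HDD-backed bluestore
bluestore min alloc size hdd = 65536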

I would suggest trying fio with the librbd engine directly, creating an RBD of 
around a TB in size, to rule out any disk locality issues first. If that brings 
the figures more into line, then that could steer the investigation towards why 
Bluestore struggles to coalesce as well as the Linux filesystem does.
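
As a rough sketch of what I mean (assuming your fio build has rbd support; the 
image and client names here are placeholders, the pool is one from your ceph df 
output, and the image would need to be created first):

# create a ~1TB test image, then let fio drive it directly via librbd
rbd create fio-bench --pool kumo-vms --size 1024G
fio --name=rbd-bench --ioengine=rbd --clientname=admin --pool=kumo-vms \
    --rbdname=fio-bench --rw=randwrite --bs=4k --iodepth=32 \
    --time_based --runtime=180 --group_reporting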

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Milanov, Radoslav Nikiforov
> Sent: 17 November 2017 22:56
> To: Mark Nelson <mnel...@redhat.com>; David Turner
> <drakonst...@gmail.com>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> 
> Here are some more results. I'm reading that 12.2.2 will have performance
> improvements for bluestore and should be released soon?
> 
> Iodepth=not specified
> Filestore
>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
> 
>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
> 
> Bluestore
>   write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
>   write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
>   write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
> 
>   read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
>   read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
>   read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
> 
> Iodepth=10
> Filestore
>   write: io=5045.1MB, bw=28706KB/s, iops=7176, runt=180001msec
>   write: io=4764.7MB, bw=27099KB/s, iops=6774, runt=180021msec
>   write: io=4626.2MB, bw=26318KB/s, iops=6579, runt=180031msec
> 
>   read : io=1745.3MB, bw=9928.6KB/s, iops=2482, runt=180001msec
>   read : io=1933.7MB, bw=11000KB/s, iops=2749, runt=180001msec
>   read : io=1952.7MB, bw=11108KB/s, iops=2777, runt=180001msec
> 
> Bluestore
>   write: io=1578.8MB, bw=8980.9KB/s, iops=2245, runt=180006msec
>   write: io=1583.9MB, bw=9010.2KB/s, iops=2252, runt=180002msec
>   write: io=1591.5MB, bw=9050.9KB/s, iops=2262, runt=180009msec
> 
>   read : io=412104KB, bw=2289.5KB/s, iops=572, runt=180002msec
>   read : io=718108KB, bw=3989.5KB/s, iops=997, runt=180003msec
>   read : io=968388KB, bw=5379.7KB/s, iops=1344, runt=180009msec
> 
> Iodepth=20
> Filestore
>   write: io=4671.2MB, bw=26574KB/s, iops=6643, runt=180001msec
>   write: io=4583.4MB, bw=26066KB/s, iops=6516, runt=180054msec
>   write: io=4641.6MB, bw=26347KB/s, iops=6586, runt=180395msec
> 
>   read : io=2094.3MB, bw=11914KB/s, iops=2978, runt=180001msec
>   read : io=1997.6MB, bw=11364KB/s, iops=2840, runt=180001msec
>   read : io=2028.4MB, bw=11539KB/s, iops=2884, runt=180001msec
> 
> Bluestore
>   write: io=1595.8MB, bw=9078.2KB/s, iops=2269, runt=180001msec
>   write: io=1596.2MB, bw=9080.6KB/s, iops=2270, runt=180001msec
>   write: io=1588.3MB, bw=9035.4KB/s, iops=2258, runt=180002msec
> 
>   read : io=1126.9MB, bw=6410.5KB/s, iops=1602, runt=180004msec
>   read : io=1282.4MB, bw=7295.3KB/s, iops=1823, runt=180003msec
>   read : io=1380.9MB, bw=7854.1KB/s, iops=1963, runt=180007msec
> 
> 
> - Rado
> 
> -Original Message-
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: Thursday, November 16, 2017 2:04 PM
> To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner
> <drakonst...@gmail.com>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> 
> It depends on what you expect your typical workload to be like.

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-17 Thread Milanov, Radoslav Nikiforov
Here are some more results. I'm reading that 12.2.2 will have performance improvements 
for bluestore and should be released soon? 

Iodepth=not specified
Filestore
  write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
  write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
  write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec

  read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
  read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
  read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec

Bluestore
  write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
  write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
  write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec

  read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
  read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
  read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec

Iodepth=10
Filestore
  write: io=5045.1MB, bw=28706KB/s, iops=7176, runt=180001msec
  write: io=4764.7MB, bw=27099KB/s, iops=6774, runt=180021msec
  write: io=4626.2MB, bw=26318KB/s, iops=6579, runt=180031msec

  read : io=1745.3MB, bw=9928.6KB/s, iops=2482, runt=180001msec
  read : io=1933.7MB, bw=11000KB/s, iops=2749, runt=180001msec
  read : io=1952.7MB, bw=11108KB/s, iops=2777, runt=180001msec

Bluestore
  write: io=1578.8MB, bw=8980.9KB/s, iops=2245, runt=180006msec
  write: io=1583.9MB, bw=9010.2KB/s, iops=2252, runt=180002msec
  write: io=1591.5MB, bw=9050.9KB/s, iops=2262, runt=180009msec

  read : io=412104KB, bw=2289.5KB/s, iops=572, runt=180002msec
  read : io=718108KB, bw=3989.5KB/s, iops=997, runt=180003msec
  read : io=968388KB, bw=5379.7KB/s, iops=1344, runt=180009msec

Iodepth=20
Filestore
  write: io=4671.2MB, bw=26574KB/s, iops=6643, runt=180001msec
  write: io=4583.4MB, bw=26066KB/s, iops=6516, runt=180054msec
  write: io=4641.6MB, bw=26347KB/s, iops=6586, runt=180395msec

  read : io=2094.3MB, bw=11914KB/s, iops=2978, runt=180001msec
  read : io=1997.6MB, bw=11364KB/s, iops=2840, runt=180001msec
  read : io=2028.4MB, bw=11539KB/s, iops=2884, runt=180001msec

Bluestore
  write: io=1595.8MB, bw=9078.2KB/s, iops=2269, runt=180001msec
  write: io=1596.2MB, bw=9080.6KB/s, iops=2270, runt=180001msec
  write: io=1588.3MB, bw=9035.4KB/s, iops=2258, runt=180002msec

  read : io=1126.9MB, bw=6410.5KB/s, iops=1602, runt=180004msec
  read : io=1282.4MB, bw=7295.3KB/s, iops=1823, runt=180003msec
  read : io=1380.9MB, bw=7854.1KB/s, iops=1963, runt=180007msec


- Rado

-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com] 
Sent: Thursday, November 16, 2017 2:04 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner 
<drakonst...@gmail.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

It depends on what you expect your typical workload to be like.  Ceph (and 
distributed storage in general) likes high io depths so writes can hit all of 
the drives at the same time.  There are tricks (like journals, writeahead logs, 
centralized caches, etc) that can help mitigate this, but I suspect you'll see 
much better performance with more concurrent writes.

Regarding file size, the smaller the file, the more likely those tricks 
mentioned above are to help you.  Based on your results, it appears filestore 
may be doing a better job of it than bluestore.  The question you have to ask 
is whether or not this kind of test represents what you are likely to see for 
real on your cluster.

Doing writes over a much larger file, say 3-4x over the total amount of RAM in 
all of the nodes, helps you get a better idea of what the behavior is like when 
those tricks are less effective.  I think that's probably a more likely 
scenario in most production environments, but it's up to you which workload you 
think better represents what you are going to see in practice.  A while back 
Nick Fisk showed some results where bluestore was slower than filestore at 
small sync writes and it could be that we simply have more work to do in this 
area.  On the other hand, we pretty consistently see bluestore doing better 
than filestore with 4k random writes and higher IO depths, which is why I'd be 
curious to see how it goes if you try that.

Mark

On 11/16/2017 10:11 AM, Milanov, Radoslav Nikiforov wrote:
> No,
> What test parameters (iodepth/file size/numjobs) would make sense  for 3 
> node/27OSD@4TB ?
> - Rado
>
> -Original Message-
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: Thursday, November 16, 2017 10:56 AM
> To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner 
> <drakonst...@gmail.com>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Did you happen to have a chance to try with a higher io depth?
>
> Mark
>
> On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-16 Thread Mark Nelson
It depends on what you expect your typical workload to be like.  Ceph 
(and distributed storage in general) likes high io depths so writes can 
hit all of the drives at the same time.  There are tricks (like 
journals, writeahead logs, centralized caches, etc) that can help 
mitigate this, but I suspect you'll see much better performance with 
more concurrent writes.


Regarding file size, the smaller the file, the more likely those tricks 
mentioned above are to help you.  Based on your results, it appears 
filestore may be doing a better job of it than bluestore.  The question 
you have to ask is whether or not this kind of test represents what you 
are likely to see for real on your cluster.


Doing writes over a much larger file, say 3-4x over the total amount of 
RAM in all of the nodes, helps you get a better idea of what the 
behavior is like when those tricks are less effective.  I think that's 
probably a more likely scenario in most production environments, but 
it's up to you which workload you think better represents what you are 
going to see in practice.  A while back Nick Fisk showed some results 
where bluestore was slower than filestore at small sync writes and it 
could be that we simply have more work to do in this area.  On the other 
hand, we pretty consistently see bluestore doing better than filestore 
with 4k random writes and higher IO depths, which is why I'd be curious 
to see how it goes if you try that.
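
A sketch of the kind of run I mean (libaio, higher iodepth, 4k random writes 
over a much larger file; the size below is only illustrative - pick something 
3-4x the combined RAM of your nodes):

# size is illustrative -- use roughly 3-4x the total RAM across the nodes
fio --name fio_big_test --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
    --size=512G --iodepth=32 --numjobs=2 --time_based --runtime=300 \
    --group_reporting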


Mark

On 11/16/2017 10:11 AM, Milanov, Radoslav Nikiforov wrote:

No,
What test parameters (iodepth/file size/numjobs) would make sense  for 3 
node/27OSD@4TB ?
- Rado

-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Thursday, November 16, 2017 10:56 AM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner 
<drakonst...@gmail.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

Did you happen to have a chance to try with a higher io depth?

Mark

On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:

FYI

Having a 50GB block.db made no difference on the performance.



- Rado



*From:*David Turner [mailto:drakonst...@gmail.com]
*Sent:* Tuesday, November 14, 2017 6:13 PM
*To:* Milanov, Radoslav Nikiforov <rad...@bu.edu>
*Cc:* Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore



I'd probably say 50GB to leave some extra space over-provisioned.
50GB should definitely prevent any DB operations from spilling over to the HDD.



On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov
<rad...@bu.edu <mailto:rad...@bu.edu>> wrote:

Thank you,

It is 4TB OSDs and they might become full someday, I’ll try 60GB db
partition – this is the max OSD capacity.



- Rado



*From:*David Turner [mailto:drakonst...@gmail.com
<mailto:drakonst...@gmail.com>]
*Sent:* Tuesday, November 14, 2017 5:38 PM


*To:* Milanov, Radoslav Nikiforov <rad...@bu.edu
<mailto:rad...@bu.edu>>

*Cc:*Mark Nelson <mnel...@redhat.com <mailto:mnel...@redhat.com>>;
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>


*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore



You have to configure the size of the db partition in the config
> file for the cluster.  If your db partition is 1GB, then I can all
but guarantee that you're using your HDD for your blocks.db very
quickly into your testing.  There have been multiple threads
recently about what size the db partition should be and it seems to
be based on how many objects your OSD is likely to have on it.  The
recommendation has been to err on the side of bigger.  If you're
running 10TB OSDs and anticipate filling them up, then you probably
want closer to an 80GB+ db partition.  That's why I asked how full
your cluster was and how large your HDDs are.



Here's a link to one of the recent ML threads on this
topic.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020
822.html

On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov
<rad...@bu.edu <mailto:rad...@bu.edu>> wrote:

Block-db partition is the default 1GB (is there a way to modify
this? journals are 5GB in filestore case) and usage is low:



[root@kumo-ceph02 ~]# ceph df

GLOBAL:

SIZE     AVAIL   RAW USED  %RAW USED
100602G  99146G     1455G       1.45

POOLS:

NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
kumo-vms         1  19757M   0.02     31147G     5067
kumo-volumes     2    214G   0.18     31147G    55248
kumo-images      3    203G   0.17     31147G    66486
kumo-vms3       11  45824M   0.04     31147G    11643

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-16 Thread Milanov, Radoslav Nikiforov
No,
What test parameters (iodepth/file size/numjobs) would make sense  for 3 
node/27OSD@4TB ?
- Rado

-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com] 
Sent: Thursday, November 16, 2017 10:56 AM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner 
<drakonst...@gmail.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

Did you happen to have a chance to try with a higher io depth?

Mark

On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:
> FYI
>
> Having a 50GB block.db made no difference on the performance.
>
>
>
> - Rado
>
>
>
> *From:*David Turner [mailto:drakonst...@gmail.com]
> *Sent:* Tuesday, November 14, 2017 6:13 PM
> *To:* Milanov, Radoslav Nikiforov <rad...@bu.edu>
> *Cc:* Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> I'd probably say 50GB to leave some extra space over-provisioned.  
> 50GB should definitely prevent any DB operations from spilling over to the 
> HDD.
>
>
>
> On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov 
> <rad...@bu.edu <mailto:rad...@bu.edu>> wrote:
>
> Thank you,
>
> It is 4TB OSDs and they might become full someday, I’ll try 60GB db
> partition – this is the max OSD capacity.
>
>
>
> - Rado
>
>
>
> *From:*David Turner [mailto:drakonst...@gmail.com
> <mailto:drakonst...@gmail.com>]
> *Sent:* Tuesday, November 14, 2017 5:38 PM
>
>
> *To:* Milanov, Radoslav Nikiforov <rad...@bu.edu 
> <mailto:rad...@bu.edu>>
>
>     *Cc:*Mark Nelson <mnel...@redhat.com <mailto:mnel...@redhat.com>>;
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> You have to configure the size of the db partition in the config
> file for the cluster.  If your db partition is 1GB, then I can all
> but guarantee that you're using your HDD for your blocks.db very
> quickly into your testing.  There have been multiple threads
> recently about what size the db partition should be and it seems to
> be based on how many objects your OSD is likely to have on it.  The
> recommendation has been to err on the side of bigger.  If you're
> running 10TB OSDs and anticipate filling them up, then you probably
> want closer to an 80GB+ db partition.  That's why I asked how full
> your cluster was and how large your HDDs are.
>
>
>
> Here's a link to one of the recent ML threads on this
> topic.  
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020
> 822.html
>
> On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov
> <rad...@bu.edu <mailto:rad...@bu.edu>> wrote:
>
> Block-db partition is the default 1GB (is there a way to modify
> this? journals are 5GB in filestore case) and usage is low:
>
>
>
> [root@kumo-ceph02 ~]# ceph df
>
> GLOBAL:
>
> SIZE     AVAIL   RAW USED  %RAW USED
> 100602G  99146G     1455G       1.45
>
> POOLS:
>
> NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
> kumo-vms         1  19757M   0.02     31147G     5067
> kumo-volumes     2    214G   0.18     31147G    55248
> kumo-images      3    203G   0.17     31147G    66486
> kumo-vms3       11  45824M   0.04     31147G    11643
> kumo-volumes3   13  10837M      0     31147G     2724
> kumo-images3    15  82450M   0.09     31147G    10320
>
>
>
> - Rado
>
>
>
> *From:*David Turner [mailto:drakonst...@gmail.com
> <mailto:drakonst...@gmail.com>]
> *Sent:* Tuesday, November 14, 2017 4:40 PM
> *To:* Mark Nelson <mnel...@redhat.com <mailto:mnel...@redhat.com>>
> *Cc:* Milanov, Radoslav Nikiforov <rad...@bu.edu
> <mailto:rad...@bu.edu>>; ceph-users@lists.ceph.com
> <mailto:ceph-users@lists.ceph.com>
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of 
> filestore
>
>
>
> How big was your blocks.db partition for each OSD and what size
> are your HDDs?  Also how full is your cluster?  It's possible
> that your blocks.db partition wasn't large enough to hold the
> entire db and it had to spill over onto the HDD.

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-16 Thread Mark Nelson

Did you happen to have a chance to try with a higher io depth?

Mark

On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:

FYI

Having a 50GB block.db made no difference on the performance.



- Rado



*From:*David Turner [mailto:drakonst...@gmail.com]
*Sent:* Tuesday, November 14, 2017 6:13 PM
*To:* Milanov, Radoslav Nikiforov <rad...@bu.edu>
*Cc:* Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore



I'd probably say 50GB to leave some extra space over-provisioned.  50GB
should definitely prevent any DB operations from spilling over to the HDD.



On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov <rad...@bu.edu
<mailto:rad...@bu.edu>> wrote:

Thank you,

It is 4TB OSDs and they might become full someday, I’ll try 60GB db
partition – this is the max OSD capacity.



- Rado



*From:*David Turner [mailto:drakonst...@gmail.com
<mailto:drakonst...@gmail.com>]
*Sent:* Tuesday, November 14, 2017 5:38 PM


*To:* Milanov, Radoslav Nikiforov <rad...@bu.edu <mailto:rad...@bu.edu>>

*Cc:*Mark Nelson <mnel...@redhat.com <mailto:mnel...@redhat.com>>;
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>


    *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore



You have to configure the size of the db partition in the config
file for the cluster.  If your db partition is 1GB, then I can all
but guarantee that you're using your HDD for your blocks.db very
quickly into your testing.  There have been multiple threads
recently about what size the db partition should be and it seems to
be based on how many objects your OSD is likely to have on it.  The
recommendation has been to err on the side of bigger.  If you're
running 10TB OSDs and anticipate filling them up, then you probably
want closer to an 80GB+ db partition.  That's why I asked how full
your cluster was and how large your HDDs are.



Here's a link to one of the recent ML threads on this
topic.  
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html

On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov
<rad...@bu.edu <mailto:rad...@bu.edu>> wrote:

Block-db partition is the default 1GB (is there a way to modify
this? journals are 5GB in filestore case) and usage is low:



[root@kumo-ceph02 ~]# ceph df

GLOBAL:

SIZE     AVAIL   RAW USED  %RAW USED
100602G  99146G     1455G       1.45

POOLS:

NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
kumo-vms         1  19757M   0.02     31147G     5067
kumo-volumes     2    214G   0.18     31147G    55248
kumo-images      3    203G   0.17     31147G    66486
kumo-vms3       11  45824M   0.04     31147G    11643
kumo-volumes3   13  10837M      0     31147G     2724
kumo-images3    15  82450M   0.09     31147G    10320



- Rado



*From:*David Turner [mailto:drakonst...@gmail.com
<mailto:drakonst...@gmail.com>]
*Sent:* Tuesday, November 14, 2017 4:40 PM
*To:* Mark Nelson <mnel...@redhat.com <mailto:mnel...@redhat.com>>
*Cc:* Milanov, Radoslav Nikiforov <rad...@bu.edu
<mailto:rad...@bu.edu>>; ceph-users@lists.ceph.com
    <mailto:ceph-users@lists.ceph.com>


*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore



How big was your blocks.db partition for each OSD and what size
are your HDDs?  Also how full is your cluster?  It's possible
that your blocks.db partition wasn't large enough to hold the
entire db and it had to spill over onto the HDD which would
definitely impact performance.



On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnel...@redhat.com
<mailto:mnel...@redhat.com>> wrote:

How big were the writes in the windows test and how much
concurrency was
there?

Historically bluestore does pretty well for us with small
random writes
so your write results surprise me a bit.  I suspect it's the
low queue
depth.  Sometimes bluestore does worse with reads, especially if
readahead isn't enabled on the client.

Mark

On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed
was converting OSDs to bluestore. It is 7200 rpm drives and
triple replication. I also get the same results (bluestore 2 times slower)
testing continuous writes on a 40GB partition on a Windows VM, completely
different tool.

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-16 Thread Milanov, Radoslav Nikiforov
FYI
Having a 50GB block.db made no difference on the performance.

- Rado

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Tuesday, November 14, 2017 6:13 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>
Cc: Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore


I'd probably say 50GB to leave some extra space over-provisioned.  50GB should 
definitely prevent any DB operations from spilling over to the HDD.

On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov 
<rad...@bu.edu<mailto:rad...@bu.edu>> wrote:
Thank you,
It is 4TB OSDs and they might become full someday, I’ll try 60GB db partition – 
this is the max OSD capacity.

- Rado

From: David Turner [mailto:drakonst...@gmail.com<mailto:drakonst...@gmail.com>]
Sent: Tuesday, November 14, 2017 5:38 PM

To: Milanov, Radoslav Nikiforov <rad...@bu.edu<mailto:rad...@bu.edu>>
Cc: Mark Nelson <mnel...@redhat.com<mailto:mnel...@redhat.com>>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] Bluestore performance 50% of filestore

You have to configure the size of the db partition in the config file for the 
cluster.  If your db partition is 1GB, then I can all but guarantee that 
you're using your HDD for your blocks.db very quickly into your testing.  There 
have been multiple threads recently about what size the db partition should be 
and it seems to be based on how many objects your OSD is likely to have on it.  
The recommendation has been to err on the side of bigger.  If you're running 
10TB OSDs and anticipate filling them up, then you probably want closer to an 
80GB+ db partition.  That's why I asked how full your cluster was and how large 
your HDDs are.

Here's a link to one of the recent ML threads on this topic.  
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov 
<rad...@bu.edu<mailto:rad...@bu.edu>> wrote:
Block-db partition is the default 1GB (is there a way to modify this? journals 
are 5GB in filestore case) and usage is low:

[root@kumo-ceph02 ~]# ceph df
GLOBAL:
SIZE     AVAIL   RAW USED  %RAW USED
100602G  99146G     1455G       1.45
POOLS:
NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
kumo-vms         1  19757M   0.02     31147G     5067
kumo-volumes     2    214G   0.18     31147G    55248
kumo-images      3    203G   0.17     31147G    66486
kumo-vms3       11  45824M   0.04     31147G    11643
kumo-volumes3   13  10837M      0     31147G     2724
kumo-images3    15  82450M   0.09     31147G    10320

- Rado

From: David Turner [mailto:drakonst...@gmail.com<mailto:drakonst...@gmail.com>]
Sent: Tuesday, November 14, 2017 4:40 PM
To: Mark Nelson <mnel...@redhat.com<mailto:mnel...@redhat.com>>
Cc: Milanov, Radoslav Nikiforov <rad...@bu.edu<mailto:rad...@bu.edu>>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] Bluestore performance 50% of filestore

How big was your blocks.db partition for each OSD and what size are your HDDs?  
Also how full is your cluster?  It's possible that your blocks.db partition 
wasn't large enough to hold the entire db and it had to spill over onto the HDD 
which would definitely impact performance.

On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson 
<mnel...@redhat.com<mailto:mnel...@redhat.com>> wrote:
How big were the writes in the windows test and how much concurrency was
there?

Historically bluestore does pretty well for us with small random writes
so your write results surprise me a bit.  I suspect it's the low queue
depth.  Sometimes bluestore does worse with reads, especially if
readahead isn't enabled on the client.

Mark

On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed was converting OSDs 
> to bluestore. It is 7200 rpm drives and triple replication. I also get same 
> results (bluestore 2 times slower) testing continuous writes on a 40GB 
> partition on a Windows VM, completely different tool.
>
> Right now I'm going back to filestore for the OSDs so additional tests are 
> possible if that helps.
>
> - Rado
>
> -Original Message-
> From: ceph-users 
> [mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
>  On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode?  Do you have client side readahead?

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-15 Thread Maged Mokhtar
On 2017-11-14 21:54, Milanov, Radoslav Nikiforov wrote:

> Hi 
> 
> We have 3 node, 27 OSDs cluster running Luminous 12.2.1 
> 
> In filestore configuration there are 3 SSDs used for journals of 9 OSDs on 
> each host (1 SSD has 3 journal partitions for 3 OSDs). 
> 
> I've converted filestore to bluestore by wiping 1 host a time and waiting for 
> recovery. SSDs now contain block-db - again one SSD serving 3 OSDs. 
> 
> Cluster is used as storage for Openstack. 
> 
> Running fio on a VM in that Openstack reveals bluestore performance almost 
> twice slower than filestore. 
> 
> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G 
> --numjobs=2 --time_based --runtime=180 --group_reporting 
> 
> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G 
> --numjobs=2 --time_based --runtime=180 --group_reporting 
> 
> Filestore 
> 
> write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec 
> 
> write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec 
> 
> write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec 
> 
> read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec 
> 
> read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec 
> 
> read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec 
> 
> Bluestore 
> 
> write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec 
> 
> write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec 
> 
> write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec 
> 
> read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec 
> 
> read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec 
> 
> read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec 
> 
> - Rado 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

It will be useful to see how this filestore edge would perform when you
increase your queue depth (threads/jobs). For example to 32 or 64. This
would represent a more practical load. 

I can see an extreme case if you have a cluster with a large number of
OSDs and only 1 client thread that filestore may be faster: in this case
when the client io hits an OSD it will not be as busy syncing its
journal to hdd (which is the case under normal load), but again this is
not a practical setup.  
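
For example, Rado's original command with only the concurrency raised
(everything else unchanged):

fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G \
    --numjobs=32 --time_based --runtime=180 --group_reporting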

/Maged

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread David Turner
I'd probably say 50GB to leave some extra space over-provisioned.  50GB
should definitely prevent any DB operations from spilling over to the HDD.

On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov <rad...@bu.edu>
wrote:

> Thank you,
>
> It is 4TB OSDs and they might become full someday, I’ll try 60GB db
> partition – this is the max OSD capacity.
>
>
>
> - Rado
>
>
>
> *From:* David Turner [mailto:drakonst...@gmail.com]
> *Sent:* Tuesday, November 14, 2017 5:38 PM
>
>
> *To:* Milanov, Radoslav Nikiforov <rad...@bu.edu>
>
> *Cc:* Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> You have to configure the size of the db partition in the config file for
> the cluster.  If your db partition is 1GB, then I can all but guarantee
> that you're using your HDD for your blocks.db very quickly into your
> testing.  There have been multiple threads recently about what size the db
> partition should be and it seems to be based on how many objects your OSD
> is likely to have on it.  The recommendation has been to err on the side of
> bigger.  If you're running 10TB OSDs and anticipate filling them up, then
> you probably want closer to an 80GB+ db partition.  That's why I asked how
> full your cluster was and how large your HDDs are.
>
>
>
> Here's a link to one of the recent ML threads on this topic.
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
>
> On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov <rad...@bu.edu>
> wrote:
>
> Block-db partition is the default 1GB (is there a way to modify this?
> journals are 5GB in filestore case) and usage is low:
>
>
>
> [root@kumo-ceph02 ~]# ceph df
>
> GLOBAL:
>
> SIZE     AVAIL   RAW USED  %RAW USED
> 100602G  99146G     1455G       1.45
>
> POOLS:
>
> NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
> kumo-vms         1  19757M   0.02     31147G     5067
> kumo-volumes     2    214G   0.18     31147G    55248
> kumo-images      3    203G   0.17     31147G    66486
> kumo-vms3       11  45824M   0.04     31147G    11643
> kumo-volumes3   13  10837M      0     31147G     2724
> kumo-images3    15  82450M   0.09     31147G    10320
>
>
>
> - Rado
>
>
>
> *From:* David Turner [mailto:drakonst...@gmail.com]
> *Sent:* Tuesday, November 14, 2017 4:40 PM
> *To:* Mark Nelson <mnel...@redhat.com>
> *Cc:* Milanov, Radoslav Nikiforov <rad...@bu.edu>;
> ceph-users@lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> How big was your blocks.db partition for each OSD and what size are your
> HDDs?  Also how full is your cluster?  It's possible that your blocks.db
> partition wasn't large enough to hold the entire db and it had to spill
> over onto the HDD which would definitely impact performance.
>
>
>
> On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnel...@redhat.com> wrote:
>
> How big were the writes in the windows test and how much concurrency was
> there?
>
> Historically bluestore does pretty well for us with small random writes
> so your write results surprise me a bit.  I suspect it's the low queue
> depth.  Sometimes bluestore does worse with reads, especially if
> readahead isn't enabled on the client.
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> > Hi Mark,
> > Yes RBD is in write back, and the only thing that changed was converting
> OSDs to bluestore. It is 7200 rpm drives and triple replication. I also get
> same results (bluestore 2 times slower) testing continuous writes on a 40GB
> partition on a Windows VM, completely different tool.
> >
> > Right now I'm going back to filestore for the OSDs so additional tests
> are possible if that helps.
> >
> > - Rado
> >
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Mark Nelson
> > Sent: Tuesday, November 14, 2017 4:04 PM
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> >
> > Hi Radoslav,
> >
> > Is RBD cache enabled and in writeback mode?  Do you have client side
> readahead?
> >
> > Both are doing better for writes than you'd expect from the native
> performance of the disks assuming they are typical 7200RPM drives and you
> are using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Thank you,
It is 4TB OSDs and they might become full someday, I’ll try 60GB db partition – 
this is the max OSD capacity.

- Rado

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Tuesday, November 14, 2017 5:38 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>
Cc: Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

You have to configure the size of the db partition in the config file for the 
cluster.  If your db partition is 1GB, then I can all but guarantee that 
you're using your HDD for your blocks.db very quickly into your testing.  There 
have been multiple threads recently about what size the db partition should be 
and it seems to be based on how many objects your OSD is likely to have on it.  
The recommendation has been to err on the side of bigger.  If you're running 
10TB OSDs and anticipate filling them up, then you probably want closer to an 
80GB+ db partition.  That's why I asked how full your cluster was and how large 
your HDDs are.

Here's a link to one of the recent ML threads on this topic.  
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov 
<rad...@bu.edu<mailto:rad...@bu.edu>> wrote:
Block-db partition is the default 1GB (is there a way to modify this? journals 
are 5GB in filestore case) and usage is low:

[root@kumo-ceph02 ~]# ceph df
GLOBAL:
SIZE     AVAIL   RAW USED  %RAW USED
100602G  99146G     1455G       1.45
POOLS:
NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
kumo-vms         1  19757M   0.02     31147G     5067
kumo-volumes     2    214G   0.18     31147G    55248
kumo-images      3    203G   0.17     31147G    66486
kumo-vms3       11  45824M   0.04     31147G    11643
kumo-volumes3   13  10837M      0     31147G     2724
kumo-images3    15  82450M   0.09     31147G    10320

- Rado

From: David Turner [mailto:drakonst...@gmail.com<mailto:drakonst...@gmail.com>]
Sent: Tuesday, November 14, 2017 4:40 PM
To: Mark Nelson <mnel...@redhat.com<mailto:mnel...@redhat.com>>
Cc: Milanov, Radoslav Nikiforov <rad...@bu.edu<mailto:rad...@bu.edu>>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] Bluestore performance 50% of filestore

How big was your blocks.db partition for each OSD and what size are your HDDs?  
Also how full is your cluster?  It's possible that your blocks.db partition 
wasn't large enough to hold the entire db and it had to spill over onto the HDD 
which would definitely impact performance.

On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson 
<mnel...@redhat.com<mailto:mnel...@redhat.com>> wrote:
How big were the writes in the windows test and how much concurrency was
there?

Historically bluestore does pretty well for us with small random writes
so your write results surprise me a bit.  I suspect it's the low queue
depth.  Sometimes bluestore does worse with reads, especially if
readahead isn't enabled on the client.

Mark

On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed was converting OSDs 
> to bluestore. It is 7200 rpm drives and triple replication. I also get same 
> results (bluestore 2 times slower) testing continuous writes on a 40GB 
> partition on a Windows VM, completely different tool.
>
> Right now I'm going back to filestore for the OSDs so additional tests are 
> possible if that helps.
>
> - Rado
>
> -Original Message-
> From: ceph-users 
> [mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
>  On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode?  Do you have client side 
> readahead?
>
> Both are doing better for writes than you'd expect from the native 
> performance of the disks assuming they are typical 7200RPM drives and you are 
> using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small file 
> size, I'd expect that you might be getting better journal coalescing in 
> filestore.
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be 
> curious how it would look if you used libaio with a high iodepth and a much 
> bigger partition to do random writes over.
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>> Hi
>>
>> We have 3 node, 27 OSDs cluster running Luminous 12.2.1
>>

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread David Turner
You have to configure the size of the db partition in the config file for
the cluster.  If your db partition is 1GB, then I can all but guarantee
that you're using your HDD for your blocks.db very quickly into your
testing.  There have been multiple threads recently about what size the db
partition should be and it seems to be based on how many objects your OSD
is likely to have on it.  The recommendation has been to err on the side of
bigger.  If you're running 10TB OSDs and anticipate filling them up, then
you probably want closer to an 80GB+ db partition.  That's why I asked how
full your cluster was and how large your HDDs are.

Here's a link to one of the recent ML threads on this topic.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
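
For reference, the knob I believe ceph-disk reads when it creates the block.db
partition at OSD prepare time is the one below; the value is in bytes and 50GB
is shown purely as an example - it has to be set before the OSDs are re-created:

[osd]
# size, in bytes, of the block.db partition ceph-disk creates at prepare time
# (53687091200 bytes = 50 GB)
bluestore block db size = 53687091200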
On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov <rad...@bu.edu>
wrote:

> Block-db partition is the default 1GB (is there a way to modify this?
> journals are 5GB in filestore case) and usage is low:
>
>
>
> [root@kumo-ceph02 ~]# ceph df
>
> GLOBAL:
>
> SIZE     AVAIL   RAW USED  %RAW USED
> 100602G  99146G     1455G       1.45
>
> POOLS:
>
> NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
> kumo-vms         1  19757M   0.02     31147G     5067
> kumo-volumes     2    214G   0.18     31147G    55248
> kumo-images      3    203G   0.17     31147G    66486
> kumo-vms3       11  45824M   0.04     31147G    11643
> kumo-volumes3   13  10837M      0     31147G     2724
> kumo-images3    15  82450M   0.09     31147G    10320
>
>
>
> - Rado
>
>
>
> *From:* David Turner [mailto:drakonst...@gmail.com]
> *Sent:* Tuesday, November 14, 2017 4:40 PM
> *To:* Mark Nelson <mnel...@redhat.com>
> *Cc:* Milanov, Radoslav Nikiforov <rad...@bu.edu>;
> ceph-users@lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> How big was your blocks.db partition for each OSD and what size are your
> HDDs?  Also how full is your cluster?  It's possible that your blocks.db
> partition wasn't large enough to hold the entire db and it had to spill
> over onto the HDD which would definitely impact performance.
>
>
>
> On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnel...@redhat.com> wrote:
>
> How big were the writes in the windows test and how much concurrency was
> there?
>
> Historically bluestore does pretty well for us with small random writes
> so your write results surprise me a bit.  I suspect it's the low queue
> depth.  Sometimes bluestore does worse with reads, especially if
> readahead isn't enabled on the client.
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> > Hi Mark,
> > Yes RBD is in write back, and the only thing that changed was converting
> OSDs to bluestore. It is 7200 rpm drives and triple replication. I also get
> same results (bluestore 2 times slower) testing continuous writes on a 40GB
> partition on a Windows VM, completely different tool.
> >
> > Right now I'm going back to filestore for the OSDs so additional tests
> are possible if that helps.
> >
> > - Rado
> >
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Mark Nelson
> > Sent: Tuesday, November 14, 2017 4:04 PM
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> >
> > Hi Radoslav,
> >
> > Is RBD cache enabled and in writeback mode?  Do you have client side
> readahead?
> >
> > Both are doing better for writes than you'd expect from the native
> performance of the disks assuming they are typical 7200RPM drives and you
> are using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small
> file size, I'd expect that you might be getting better journal coalescing
> in filestore.
> >
> > Sadly I imagine you can't do a comparison test at this point, but I'd be
> curious how it would look if you used libaio with a high iodepth and a much
> bigger partition to do random writes over.
> >
> > Mark
> >
> > On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
> >> Hi
> >>
> >> We have 3 node, 27 OSDs cluster running Luminous 12.2.1
> >>
> >> In filestore configuration there are 3 SSDs used for journals of 9
> >> OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).
> >>
> >> I've converted filestore to bluestore by wiping 1 host a time and
> >> waiting for recovery.

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Block-db partition is the default 1GB (is there a way to modify this? journals 
are 5GB in filestore case) and usage is low:

[root@kumo-ceph02 ~]# ceph df
GLOBAL:
SIZE     AVAIL   RAW USED  %RAW USED
100602G  99146G     1455G       1.45
POOLS:
NAME            ID  USED    %USED  MAX AVAIL  OBJECTS
kumo-vms         1  19757M   0.02     31147G     5067
kumo-volumes     2    214G   0.18     31147G    55248
kumo-images      3    203G   0.17     31147G    66486
kumo-vms3       11  45824M   0.04     31147G    11643
kumo-volumes3   13  10837M      0     31147G     2724
kumo-images3    15  82450M   0.09     31147G    10320

- Rado

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Tuesday, November 14, 2017 4:40 PM
To: Mark Nelson <mnel...@redhat.com>
Cc: Milanov, Radoslav Nikiforov <rad...@bu.edu>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

How big was your blocks.db partition for each OSD and what size are your HDDs?  
Also how full is your cluster?  It's possible that your blocks.db partition 
wasn't large enough to hold the entire db and it had to spill over onto the HDD 
which would definitely impact performance.

On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson 
<mnel...@redhat.com<mailto:mnel...@redhat.com>> wrote:
How big were the writes in the windows test and how much concurrency was
there?

Historically bluestore does pretty well for us with small random writes
so your write results surprise me a bit.  I suspect it's the low queue
depth.  Sometimes bluestore does worse with reads, especially if
readahead isn't enabled on the client.

Mark

On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed was converting OSDs 
> to bluestore. It is 7200 rpm drives and triple replication. I also get same 
> results (bluestore 2 times slower) testing continuous writes on a 40GB 
> partition on a Windows VM, completely different tool.
>
> Right now I'm going back to filestore for the OSDs so additional tests are 
> possible if that helps.
>
> - Rado
>
> -Original Message-
> From: ceph-users 
> [mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
>  On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode?  Do you have client side 
> readahead?
>
> Both are doing better for writes than you'd expect from the native 
> performance of the disks assuming they are typical 7200RPM drives and you are 
> using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small file 
> size, I'd expect that you might be getting better journal coalescing in 
> filestore.
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be 
> curious how it would look if you used libaio with a high iodepth and a much 
> bigger partition to do random writes over.
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>> Hi
>>
>> We have 3 node, 27 OSDs cluster running Luminous 12.2.1
>>
>> In filestore configuration there are 3 SSDs used for journals of 9
>> OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).
>>
>> I've converted filestore to bluestore by wiping 1 host a time and
>> waiting for recovery. SSDs now contain block-db - again one SSD
>> serving
>> 3 OSDs.
>>
>>
>>
>> Cluster is used as storage for Openstack.
>>
>> Running fio on a VM in that Openstack reveals bluestore performance
>> almost twice slower than filestore.
>>
>> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
>> --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
>> --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>>
>>
>>
>>
>> Filestore
>>
>>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>>
>>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>>
>>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
>>
>>
>>
>>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>>
>>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>>
>>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
>>
>>
>>
>> 

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
16 MB block, single thread, sequential writes - the results are in the attached chart (image001.emz).



- Rado



-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Tuesday, November 14, 2017 4:36 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore



How big were the writes in the windows test and how much concurrency was there?



Historically bluestore does pretty well for us with small random writes so your 
write results surprise me a bit.  I suspect it's the low queue depth.  
Sometimes bluestore does worse with reads, especially if readahead isn't 
enabled on the client.



Mark



On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:

> Hi Mark,

> Yes RBD is in write back, and the only thing that changed was converting OSDs 
> to bluestore. It is 7200 rpm drives and triple replication. I also get same 
> results (bluestore 2 times slower) testing continuous writes on a 40GB 
> partition on a Windows VM, completely different tool.

>

> Right now I'm going back to filestore for the OSDs so additional tests are 
> possible if that helps.

>

> - Rado

>

> -Original Message-

> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf

> Of Mark Nelson

> Sent: Tuesday, November 14, 2017 4:04 PM

> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

> Subject: Re: [ceph-users] Bluestore performance 50% of filestore

>

> Hi Radoslav,

>

> Is RBD cache enabled and in writeback mode?  Do you have client side 
> readahead?

>

> Both are doing better for writes than you'd expect from the native 
> performance of the disks assuming they are typical 7200RPM drives and you are 
> using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small file 
> size, I'd expect that you might be getting better journal coalescing in 
> filestore.

>

> Sadly I imagine you can't do a comparison test at this point, but I'd be 
> curious how it would look if you used libaio with a high iodepth and a much 
> bigger partition to do random writes over.

>

> Mark

>

> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:

>> Hi

>>

>> We have 3 node, 27 OSDs cluster running Luminous 12.2.1

>>

>> In filestore configuration there are 3 SSDs used for journals of 9

>> OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).

>>

>> I've converted filestore to bluestore by wiping 1 host a time and

>> waiting for recovery. SSDs now contain block-db - again one SSD

>> serving

>> 3 OSDs.

>>

>>

>>

>> Cluster is used as storage for Openstack.

>>

>> Running fio on a VM in that Openstack reveals bluestore performance

>> almost twice slower than filestore.

>>

>> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G

>> --numjobs=2 --time_based --runtime=180 --group_reporting

>>

>> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G

>> --numjobs=2 --time_based --runtime=180 --group_reporting

>>

>>

>>

>>

>>

>> Filestore

>>

>>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec

>>

>>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec

>>

>>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec

>>

>>

>>

>>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec

>>

>>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec

>>

>>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec

>>

>>

>>

>> Bluestore

>>

>>   write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec

>>

>>   write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec

>>

>>   write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec

>>

>>

>>

>>   read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec

>>

>>   read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec

>>

>>   read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec

>>

>>

>>

>>

>>

>> - Rado

>>

>>

>>

>>

>>

>> ___

>> ceph-users mailing list

>> ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>

> ___

> ceph-users mailing list

> ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread David Turner
How big was your blocks.db partition for each OSD and what size are your
HDDs?  Also how full is your cluster?  It's possible that your blocks.db
partition wasn't large enough to hold the entire db and it had to spill
over onto the HDD which would definitely impact performance.
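
If it helps to check that directly, the bluefs perf counters on each OSD should
show whether the db has spilled onto the HDD - something along these lines, run
on the OSD host (assuming the admin socket is in its default location):

# db_used_bytes vs db_total_bytes shows how full the SSD db partition is;
# a non-zero slow_used_bytes means rocksdb data has spilled onto the HDD
ceph daemon osd.0 perf dump | python -m json.tool | \
    grep -E '"db_total_bytes"|"db_used_bytes"|"slow_used_bytes"'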

On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnel...@redhat.com> wrote:

> How big were the writes in the windows test and how much concurrency was
> there?
>
> Historically bluestore does pretty well for us with small random writes
> so your write results surprise me a bit.  I suspect it's the low queue
> depth.  Sometimes bluestore does worse with reads, especially if
> readahead isn't enabled on the client.
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> > Hi Mark,
> > Yes RBD is in write back, and the only thing that changed was converting
> OSDs to bluestore. It is 7200 rpm drives and triple replication. I also get
> same results (bluestore 2 times slower) testing continuous writes on a 40GB
> partition on a Windows VM, completely different tool.
> >
> > Right now I'm going back to filestore for the OSDs so additional tests
> are possible if that helps.
> >
> > - Rado
> >
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Mark Nelson
> > Sent: Tuesday, November 14, 2017 4:04 PM
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> >
> > Hi Radoslav,
> >
> > Is RBD cache enabled and in writeback mode?  Do you have client side
> readahead?
> >
> > Both are doing better for writes than you'd expect from the native
> performance of the disks assuming they are typical 7200RPM drives and you
> are using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small
> file size, I'd expect that you might be getting better journal coalescing
> in filestore.
> >
> > Sadly I imagine you can't do a comparison test at this point, but I'd be
> curious how it would look if you used libaio with a high iodepth and a much
> bigger partition to do random writes over.
> >
> > Mark
> >
> > On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
> >> Hi
> >>
> >> We have 3 node, 27 OSDs cluster running Luminous 12.2.1
> >>
> >> In filestore configuration there are 3 SSDs used for journals of 9
> >> OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).
> >>
> >> I've converted filestore to bluestore by wiping 1 host a time and
> >> waiting for recovery. SSDs now contain block-db - again one SSD
> >> serving
> >> 3 OSDs.
> >>
> >>
> >>
> >> Cluster is used as storage for Openstack.
> >>
> >> Running fio on a VM in that Openstack reveals bluestore performance
> >> almost twice slower than filestore.
> >>
> >> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >>
> >>
> >>
> >>
> >> Filestore
> >>
> >>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
> >>
> >>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
> >>
> >>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
> >>
> >>
> >>
> >>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
> >>
> >>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
> >>
> >>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
> >>
> >>
> >>
> >> Bluestore
> >>
> >>   write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
> >>
> >>   write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
> >>
> >>   write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
> >>
> >>
> >>
> >>   read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
> >>
> >>   read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
> >>
> >>   read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
> >>
> >>
> >>
> >>
> >>
> >> - Rado
> >>
> >>
> >>
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Mark Nelson
How big were the writes in the windows test and how much concurrency was 
there?


Historically bluestore does pretty well for us with small random writes, 
so your write results surprise me a bit.  I suspect it's the low queue 
depth.  Sometimes bluestore does worse with reads, especially if 
readahead isn't enabled on the client.
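
If the queue depth is the main factor, a quick read test along these lines 
inside the VM should show it (only a sketch -- the same 4k random reads as 
before, but with libaio and a higher iodepth; the file name and size are the 
same placeholders used in the original runs):

fio --name fio_test_file --ioengine=libaio --iodepth=16 --direct=1 \
    --rw=randread --bs=4k --size=1G --numjobs=2 \
    --time_based --runtime=180 --group_reporting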


Mark



Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Hi Mark,
Yes, RBD is in writeback mode, and the only thing that changed was converting 
the OSDs to bluestore. They are 7200 rpm drives with triple replication. I also 
get the same results (bluestore roughly 2 times slower) when testing continuous 
writes on a 40GB partition in a Windows VM, with a completely different tool.

Right now I'm going back to filestore for the OSDs, so additional tests are 
possible if that helps.

- Rado



Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Mark Nelson

Hi Radoslav,

Is RBD cache enabled and in writeback mode?  Do you have client side 
readahead?
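
(For reference, these are librbd client options, normally set in the [client] 
section of ceph.conf on the hypervisor; the values below are purely 
illustrative, not recommendations:)

[client]
rbd cache = true
rbd cache writethrough until flush = true
# readahead knobs: setting "disable after" to 0 keeps readahead active for
# the whole run instead of switching off after the default 50 MB
rbd readahead trigger requests = 10
rbd readahead max bytes = 4194304
rbd readahead disable after bytes = 0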


Both are doing better for writes than you'd expect from the native 
performance of the disks assuming they are typical 7200RPM drives and 
you are using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given 
the small file size, I'd expect that you might be getting better journal 
coalescing in filestore.
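
Putting the measured numbers against that estimate (taking the write iops 
figures above at face value):

  raw disk estimate : 150 IOPS x 27 drives / 3 replicas ~= 1350 IOPS
  filestore writes  : ~5000 IOPS, roughly 3.7x the raw estimate
  bluestore writes  : ~2250 IOPS, roughly 1.7x the raw estimate

So both backends exceed what the spindles could do uncoalesced; filestore just 
by a much larger margin.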


Sadly I imagine you can't do a comparison test at this point, but I'd be 
curious how it would look if you used libaio with a high iodepth and a 
much bigger partition to do random writes over.
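
Something along these lines inside the guest would do that (only a sketch -- 
the job name is arbitrary and the size is a placeholder that needs enough free 
space on the RBD):

fio --name fio_big_area --ioengine=libaio --iodepth=32 --direct=1 \
    --rw=randwrite --bs=4k --size=100G --numjobs=2 \
    --time_based --runtime=180 --group_reporting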


Mark



[ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Hi
We have a 3-node, 27-OSD cluster running Luminous 12.2.1.
In the filestore configuration there are 3 SSDs used for the journals of 9 OSDs 
on each host (1 SSD holds 3 journal partitions for 3 OSDs).
I've converted filestore to bluestore by wiping one host at a time and waiting 
for recovery. The SSDs now contain block-db - again one SSD serving 3 OSDs.
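
(For anyone wanting to reproduce the layout: a bluestore OSD with its block-db 
on an SSD partition can be created with something along these lines -- device 
names are placeholders, and ceph-volume is only one of the possible tools:)

ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/sdb1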

The cluster is used as storage for OpenStack.
Running fio on a VM in that OpenStack shows bluestore performance at roughly 
half that of filestore.
fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G 
--numjobs=2 --time_based --runtime=180 --group_reporting
fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G --numjobs=2 
--time_based --runtime=180 --group_reporting



Filestore

  write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec

  write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec

  write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec



  read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec

  read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec

  read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec



Bluestore

  write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec

  write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec

  write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec



  read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec

  read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec

  read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec


- Rado
