[ceph-users] Re: Fwd: BlueFS spillover yet again

2020-02-05 Thread Janne Johansson
On Wed, 5 Feb 2020 at 17:27, Vladimir Prokofev wrote:

> Thank you for the insight.
> > If you're using the default options for rocksdb, then the size of L3 will
> > be 25GB
> Where does this number come from? Is there any documentation I can read?
> I want to have a better understanding of how the DB size is calculated.
>
>
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction

Those pages on the right-side menu have all (and more!) you need to know
about rocksdb, which Ceph uses.


> > If you're using the default options for rocksdb, then the size of L3 will
> > be 25GB. Since your block-db is only 20GB and L3 can only be filled if the
> > entire level's size is available, bluefs will begin spillover. Like Igor
> > said, having 30GB+ is recommended if you want to host up to 3 levels of
> > rocksdb in the SSD.
>
>
-- 
May the most significant bit of your life be positive.


[ceph-users] Re: Fwd: BlueFS spillover yet again

2020-02-05 Thread Wido den Hollander



On 2/5/20 2:21 PM, Vladimir Prokofev wrote:
> Cluster upgraded from 12.2.12 to 14.2.5. All went smoothly, except the BlueFS
> spillover warning.
> We create OSDs with ceph-deploy; the command goes like this:
> ceph-deploy osd create --bluestore --data /dev/sdf --block-db /dev/sdb5
> --block-wal /dev/sdb6 ceph-osd3
> where block-db and block-wal are SSD partitions.
> Default ceph-deploy settings created ~1GB partitions, which is, of course,
> too small. So we redeployed the OSDs using a manually partitioned SSD for
> block-db/block-wal with sizes of 20G/5G respectively.
> But we still get the BlueFS spillover warning for the redeployed OSDs:
>  osd.10 spilled over 2.4 GiB metadata from 'db' device (2.8 GiB used of
> 19 GiB) to slow device
>  osd.19 spilled over 3.7 GiB metadata from 'db' device (2.7 GiB used of
> 19 GiB) to slow device
>  osd.20 spilled over 4.2 GiB metadata from 'db' device (2.6 GiB used of
> 19 GiB) to slow device
> osd size is 1.8 TiB.
> 
> These OSDs are used primarily for RBD as backup drives, so a lot of
> snapshots are held there. They also have an RGW pool assigned to them, but it
> has no data.
> I know of the sizing recommendations[1] for block-db/block-wal, but I assumed
> that since it's primarily RBD, 1% (~20G) should be enough.
> Also, the compaction stats don't make sense to me[2]. They state that the sum
> of the DB is only 5.08GB, which should fit on block-db without a problem?
> Am I understanding all this wrong? Should the block-db size be greater in my
> case?

Have you tried a compaction?

I saw this today as well. I tried a compaction, and that reduced the DB to
under 28GB, causing it to fit on the DB device again.

The DB is 64GB, but at ~29GB it spilled over to the slow device.
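
For reference, a minimal sketch of triggering a manual compaction, assuming
osd.10 as the example id and the default OSD data path (adjust to your setup):

# online, via the admin socket on the host carrying the OSD
ceph daemon osd.10 compact

# offline alternative, with the OSD stopped
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-10 compact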

Wido

> 
> [1]
> https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
> [2] osd.10 logs as an example
> https://pastebin.com/hC6w6jSn


[ceph-users] Re: Fwd: BlueFS spillover yet again

2020-02-05 Thread Vladimir Prokofev
Thank you for the insight.
> If you're using the default options for rocksdb, then the size of L3 will
> be 25GB
Where does this number come from? Is there any documentation I can read?
I want to have a better understanding of how the DB size is calculated.

Wed, 5 Feb 2020 at 18:53, Moreno, Orlando :

> Hi Vladimir,
>
> If you're using the default options for rocksdb, then the size of L3 will
> be 25GB. Since your block-db is only 20GB and L3 can only be filled if the
> entire level's size is available, bluefs will begin spillover. Like Igor
> said, having 30GB+ is recommended if you want to host up to 3 levels of
> rocksdb in the SSD.
>
> Thanks,
> Orlando


[ceph-users] Re: Fwd: BlueFS spillover yet again

2020-02-05 Thread Moreno, Orlando
Hi Vladimir,

If you're using the default options for rocksdb, then the size of L3 will be 
25GB. Since your block-db is only 20GB and L3 can only be filled if the entire 
level's size is available, bluefs will begin spillover. Like Igor said, having 
30GB+ is recommended if you want to host up to 3 levels of rocksdb in the SSD.
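
For the arithmetic behind that 25GB figure, a minimal sketch, assuming Ceph's
default rocksdb tuning of max_bytes_for_level_base = 256MB and
max_bytes_for_level_multiplier = 10 (each level targets 10x the one before it):

l1=$((256 * 1024 * 1024))   # L1 target size: 256 MB
l2=$((l1 * 10))             # L2 target size: ~2.5 GB
l3=$((l2 * 10))             # L3 target size: ~25 GB
echo $(( (l1 + l2 + l3) / 1024 / 1024 ))   # prints 28416 (MB) for L1-L3 combined, hence the 30GB+ advice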

Thanks,
Orlando



[ceph-users] Re: Fwd: BlueFS spillover yet again

2020-02-05 Thread Igor Fedotov

Hi Vladimir,

there have been plenty of discussions/recommendations around DB volume size
selection here.

In short, it's advised to have a DB volume of 30-64GB for most use cases.
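
As a quick way to check how much of an OSD's DB volume is actually in use and
how much has spilled to the slow device, the BlueFS perf counters can be read
from the admin socket; a sketch, assuming osd.10 as the example id, run on the
host carrying that OSD:

ceph daemon osd.10 perf dump | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'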

Thanks,

Igor
