[ceph-users] Re: Fwd: BlueFS spillover yet again
On Wed, 5 Feb 2020 at 17:27, Vladimir Prokofev wrote:
> Thank you for the insight.
>
> > If you're using the default options for rocksdb, then the size of L3
> > will be 25GB
>
> Where does this number come from? Any documentation I can read?
> I want to have a better understanding of how the DB size is calculated.

https://github.com/facebook/rocksdb/wiki/Leveled-Compaction

The pages on the right-side menu have all (and more!) you need to know
about rocksdb, which Ceph uses.

> > If you're using the default options for rocksdb, then the size of L3
> > will be 25GB. Since your block-db is only 20GB and L3 can only be
> > filled if the entire level's size is available, bluefs will begin
> > spillover. Like Igor said, having 30GB+ is recommended if you want to
> > host up to 3 levels of rocksdb in the SSD.

--
May the most significant bit of your life be positive.
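For illustration, a rough sketch of where that ~25GB figure comes from,
assuming RocksDB's stock leveled-compaction defaults (a 256MB level-1
target and a 10x level multiplier), which Ceph's default
bluestore_rocksdb_options do not appear to override:

  # back-of-the-envelope RocksDB level sizes, in MB
  base=256   # max_bytes_for_level_base (L1 target)
  mult=10    # max_bytes_for_level_multiplier
  l1=$base
  l2=$(( l1 * mult ))   # 2560 MB, roughly 2.5 GiB
  l3=$(( l2 * mult ))   # 25600 MB, roughly 25 GiB
  echo "L1=${l1}MB L2=${l2}MB L3=${l3}MB total=$(( l1 + l2 + l3 ))MB"
  # L1+L2+L3 comes to roughly 28GB, so a 20GB block-db can never hold L3
  # in full and bluefs falls back to the slow device; hence the usual
  # 30GB+ recommendation.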
[ceph-users] Re: Fwd: BlueFS spillover yet again
On 2/5/20 2:21 PM, Vladimir Prokofev wrote:
> Cluster upgraded from 12.2.12 to 14.2.5. All went smoothly, except for a
> BlueFS spillover warning.
> We create OSDs with ceph-deploy; the command goes like this:
> ceph-deploy osd create --bluestore --data /dev/sdf --block-db /dev/sdb5
> --block-wal /dev/sdb6 ceph-osd3
> where block-db and block-wal are SSD partitions.
> Default ceph-deploy settings created ~1GB partitions, which is, of
> course, too small. So we redeployed the OSDs using a manually partitioned
> SSD for block-db/block-wal with sizes of 20G/5G respectively.
> But now we still get the BlueFS spillover warning for the redeployed OSDs:
>     osd.10 spilled over 2.4 GiB metadata from 'db' device (2.8 GiB used
>     of 19 GiB) to slow device
>     osd.19 spilled over 3.7 GiB metadata from 'db' device (2.7 GiB used
>     of 19 GiB) to slow device
>     osd.20 spilled over 4.2 GiB metadata from 'db' device (2.6 GiB used
>     of 19 GiB) to slow device
> OSD size is 1.8 TiB.
>
> These OSDs are used primarily for RBD as backup drives, so a lot of
> snapshots are held there. They also have an RGW pool assigned to them,
> but it has no data.
> I know of the sizing recommendations[1] for block-db/block-wal, but I
> assumed that since it's primarily RBD, 1% (~20G) should be enough.
> Also, the compaction stats don't make sense to me[2]. They state that the
> sum of the DB is only 5.08GB, which should fit on block-db without a
> problem?
> Am I understanding all this wrong? Should the block-db size be greater in
> my case?

Have you tried a compact? I saw this today as well. I tried a compact and
that reduced the DB to <28GB, causing it to fit on the DB device again.
The DB is 64GB, but at ~29GB it spilled over to the slow device.

Wido

> [1]
> https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
> [2] osd.10 logs as an example
> https://pastebin.com/hC6w6jSn
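A minimal sketch of the manual compaction Wido describes, using the OSD
ids from this thread as examples; the compaction can be triggered online
against a running OSD:

  # trigger an online RocksDB compaction on an affected OSD
  ceph tell osd.10 compact
  # or via the admin socket on the OSD host
  ceph daemon osd.10 compact
  # then check whether the spillover warning has cleared
  ceph health detail | grep -i spillover

Note that compaction only shrinks the DB for the moment; if the metadata
grows back past the level boundary, the spillover will return.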
[ceph-users] Re: Fwd: BlueFS spillover yet again
Thank you for the insight.

> If you're using the default options for rocksdb, then the size of L3
> will be 25GB

Where does this number come from? Any documentation I can read?
I want to have a better understanding of how the DB size is calculated.

On Wed, 5 Feb 2020 at 18:53, Moreno, Orlando wrote:
> Hi Vladimir,
>
> If you're using the default options for rocksdb, then the size of L3
> will be 25GB. Since your block-db is only 20GB and L3 can only be filled
> if the entire level's size is available, bluefs will begin spillover.
> Like Igor said, having 30GB+ is recommended if you want to host up to 3
> levels of rocksdb in the SSD.
>
> Thanks,
> Orlando
>
> -----Original Message-----
> From: Igor Fedotov
> Sent: Wednesday, February 5, 2020 7:04 AM
> To: Vladimir Prokofev ; ceph-users@ceph.io
> Subject: [ceph-users] Re: Fwd: BlueFS spillover yet again
>
> Hi Vladimir,
>
> there have been plenty of discussions/recommendations around DB volume
> size selection here.
>
> In short, it's advised to have a DB volume of 30-64GB for most use cases.
>
> Thanks,
>
> Igor
>
> On 2/5/2020 4:21 PM, Vladimir Prokofev wrote:
> > Cluster upgraded from 12.2.12 to 14.2.5. All went smoothly, except for
> > a BlueFS spillover warning.
> > We create OSDs with ceph-deploy; the command goes like this:
> > ceph-deploy osd create --bluestore --data /dev/sdf --block-db
> > /dev/sdb5 --block-wal /dev/sdb6 ceph-osd3
> > where block-db and block-wal are SSD partitions.
> > Default ceph-deploy settings created ~1GB partitions, which is, of
> > course, too small. So we redeployed the OSDs using a manually
> > partitioned SSD for block-db/block-wal with sizes of 20G/5G
> > respectively.
> > But now we still get the BlueFS spillover warning for the redeployed
> > OSDs:
> >     osd.10 spilled over 2.4 GiB metadata from 'db' device (2.8 GiB
> >     used of 19 GiB) to slow device
> >     osd.19 spilled over 3.7 GiB metadata from 'db' device (2.7 GiB
> >     used of 19 GiB) to slow device
> >     osd.20 spilled over 4.2 GiB metadata from 'db' device (2.6 GiB
> >     used of 19 GiB) to slow device
> > OSD size is 1.8 TiB.
> >
> > These OSDs are used primarily for RBD as backup drives, so a lot of
> > snapshots are held there. They also have an RGW pool assigned to them,
> > but it has no data.
> > I know of the sizing recommendations[1] for block-db/block-wal, but I
> > assumed that since it's primarily RBD, 1% (~20G) should be enough.
> > Also, the compaction stats don't make sense to me[2]. They state that
> > the sum of the DB is only 5.08GB, which should fit on block-db without
> > a problem?
> > Am I understanding all this wrong? Should the block-db size be greater
> > in my case?
> >
> > [1]
> > https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
> > [2] osd.10 logs as an example
> > https://pastebin.com/hC6w6jSn
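If you want to check which rocksdb options your OSDs are actually running
with, rather than assuming the defaults, the admin socket can show the
option string Ceph passes to rocksdb (osd.10 here is just one of the OSD
ids from this thread):

  # show the rocksdb option string for a running OSD
  ceph daemon osd.10 config get bluestore_rocksdb_options
  # anything not listed in that string falls back to RocksDB's built-in
  # defaults, including the 256MB level-1 target and the 10x multiplier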
[ceph-users] Re: Fwd: BlueFS spillover yet again
Hi Vladimir,

If you're using the default options for rocksdb, then the size of L3 will
be 25GB. Since your block-db is only 20GB and L3 can only be filled if the
entire level's size is available, bluefs will begin spillover. Like Igor
said, having 30GB+ is recommended if you want to host up to 3 levels of
rocksdb in the SSD.

Thanks,
Orlando

-----Original Message-----
From: Igor Fedotov
Sent: Wednesday, February 5, 2020 7:04 AM
To: Vladimir Prokofev ; ceph-users@ceph.io
Subject: [ceph-users] Re: Fwd: BlueFS spillover yet again

Hi Vladimir,

there have been plenty of discussions/recommendations around DB volume
size selection here.

In short, it's advised to have a DB volume of 30-64GB for most use cases.

Thanks,

Igor

On 2/5/2020 4:21 PM, Vladimir Prokofev wrote:
> Cluster upgraded from 12.2.12 to 14.2.5. All went smoothly, except for a
> BlueFS spillover warning.
> We create OSDs with ceph-deploy; the command goes like this:
> ceph-deploy osd create --bluestore --data /dev/sdf --block-db /dev/sdb5
> --block-wal /dev/sdb6 ceph-osd3
> where block-db and block-wal are SSD partitions.
> Default ceph-deploy settings created ~1GB partitions, which is, of
> course, too small. So we redeployed the OSDs using a manually partitioned
> SSD for block-db/block-wal with sizes of 20G/5G respectively.
> But now we still get the BlueFS spillover warning for the redeployed OSDs:
>     osd.10 spilled over 2.4 GiB metadata from 'db' device (2.8 GiB used
>     of 19 GiB) to slow device
>     osd.19 spilled over 3.7 GiB metadata from 'db' device (2.7 GiB used
>     of 19 GiB) to slow device
>     osd.20 spilled over 4.2 GiB metadata from 'db' device (2.6 GiB used
>     of 19 GiB) to slow device
> OSD size is 1.8 TiB.
>
> These OSDs are used primarily for RBD as backup drives, so a lot of
> snapshots are held there. They also have an RGW pool assigned to them,
> but it has no data.
> I know of the sizing recommendations[1] for block-db/block-wal, but I
> assumed that since it's primarily RBD, 1% (~20G) should be enough.
> Also, the compaction stats don't make sense to me[2]. They state that the
> sum of the DB is only 5.08GB, which should fit on block-db without a
> problem?
> Am I understanding all this wrong? Should the block-db size be greater in
> my case?
>
> [1]
> https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
> [2] osd.10 logs as an example
> https://pastebin.com/hC6w6jSn
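One way to act on that recommendation is to recreate the block-db
partition at 30GB or more and redeploy the OSD with the same ceph-deploy
invocation used earlier in this thread. The device names and partition
numbers below are just the ones from the original post, and sgdisk is
only one of several ways to carve the partition; the old OSD would have
to be destroyed and its partitions removed first:

  # hypothetical example: recreate /dev/sdb5 as a 30 GB DB partition
  sgdisk --new=5:0:+30G /dev/sdb
  # then redeploy the OSD with the larger block-db
  ceph-deploy osd create --bluestore --data /dev/sdf \
      --block-db /dev/sdb5 --block-wal /dev/sdb6 ceph-osd3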
[ceph-users] Re: Fwd: BlueFS spillover yet again
Hi Vladimir,

there have been plenty of discussions/recommendations around DB volume
size selection here.

In short, it's advised to have a DB volume of 30-64GB for most use cases.

Thanks,

Igor

On 2/5/2020 4:21 PM, Vladimir Prokofev wrote:
> Cluster upgraded from 12.2.12 to 14.2.5. All went smoothly, except for a
> BlueFS spillover warning.
> We create OSDs with ceph-deploy; the command goes like this:
> ceph-deploy osd create --bluestore --data /dev/sdf --block-db /dev/sdb5
> --block-wal /dev/sdb6 ceph-osd3
> where block-db and block-wal are SSD partitions.
> Default ceph-deploy settings created ~1GB partitions, which is, of
> course, too small. So we redeployed the OSDs using a manually partitioned
> SSD for block-db/block-wal with sizes of 20G/5G respectively.
> But now we still get the BlueFS spillover warning for the redeployed OSDs:
>     osd.10 spilled over 2.4 GiB metadata from 'db' device (2.8 GiB used
>     of 19 GiB) to slow device
>     osd.19 spilled over 3.7 GiB metadata from 'db' device (2.7 GiB used
>     of 19 GiB) to slow device
>     osd.20 spilled over 4.2 GiB metadata from 'db' device (2.6 GiB used
>     of 19 GiB) to slow device
> OSD size is 1.8 TiB.
>
> These OSDs are used primarily for RBD as backup drives, so a lot of
> snapshots are held there. They also have an RGW pool assigned to them,
> but it has no data.
> I know of the sizing recommendations[1] for block-db/block-wal, but I
> assumed that since it's primarily RBD, 1% (~20G) should be enough.
> Also, the compaction stats don't make sense to me[2]. They state that the
> sum of the DB is only 5.08GB, which should fit on block-db without a
> problem?
> Am I understanding all this wrong? Should the block-db size be greater in
> my case?
>
> [1]
> https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
> [2] osd.10 logs as an example
> https://pastebin.com/hC6w6jSn
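For reference, a quick way to see the numbers behind the spillover
warning on a running OSD; the bluefs counter names below are an
assumption from memory, so check them against your own output:

  # dump the BlueFS usage counters for one of the affected OSDs
  ceph daemon osd.10 perf dump | grep -A 20 '"bluefs"'
  # compare db_used_bytes against db_total_bytes; anything reported in
  # slow_used_bytes is DB metadata that has spilled onto the main device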