[ceph-users] Re: Low level bluestore usage

2020-09-23 Thread George Shuklin

On 23/09/2020 04:09, Alexander E. Patrakov wrote:


Sometimes this doesn't help. For data recovery purposes, the most
helpful step if you get the "bluefs enospc" error is to add a separate
db device, like this:

systemctl disable --now ceph-osd@${OSDID}
truncate -s 32G /junk/osd.${OSDID}-recover/block.db
sgdisk -n 0:0:0 /junk/osd.${OSDID}-recover/block.db
ceph-bluestore-tool \
 bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-${OSDID} \
 --dev-target /junk/osd.${OSDID}-recover/block.db \
 --bluestore-block-db-size=31G --bluefs-log-compact-min-size=31G

Of course you can use a real block device instead of just a file.

After that, export all PGs using ceph-objectstore-tool and re-import them
into a fresh OSD, then destroy or purge the full one.

Here is why these options are needed:

--bluestore-block-db-size=31G: ceph-bluestore-tool refuses to do
anything unless this option is set to some value.
--bluefs-log-compact-min-size=31G: makes absolutely sure that log
compaction doesn't happen, because it would hit "bluefs enospc" again.



Oh, you went this way... I solved my 'pocket ceph' needs by exporting disk 
images (backed by files) via iSCSI and mounting them back on localhost. That 
gives me proper 'scsi' devices which behave exactly as in production. I have a 
little playbook (iscsi_loopback) to set it up on random scrap (including VMs) 
for development purposes. After iSCSI is loopback-mounted, all other code works 
exactly the same as it would in production.
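
For the curious, the core of it is roughly the following (just a sketch with 
made-up names, not the actual iscsi_loopback playbook; it assumes targetcli-fb 
and open-iscsi are installed):

truncate -s 15G /srv/osd0.img
# export the file as an iSCSI LUN via LIO (no auth, demo mode)
targetcli /backstores/fileio create osd0 /srv/osd0.img
targetcli /iscsi create iqn.2020-09.local.lab:osd0
targetcli /iscsi/iqn.2020-09.local.lab:osd0/tpg1/luns create /backstores/fileio/osd0
targetcli /iscsi/iqn.2020-09.local.lab:osd0/tpg1 set attribute authentication=0 generate_node_acls=1 demo_mode_write_protect=0
# log in from the same host; the LUN shows up as a regular /dev/sdX
iscsiadm -m discovery -t sendtargets -p 127.0.0.1
iscsiadm -m node -T iqn.2020-09.local.lab:osd0 -p 127.0.0.1 --login

Depending on the targetcli version you may also have to create the 
0.0.0.0:3260 portal by hand.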


I've hit this issue a few times on small 10GB OSDs, so I moved to 15GB and it 
became less frequent. I have never had this in real-hardware tests with real 
disk sizes (>>100GB per OSD).



[ceph-users] Re: Low level bluestore usage

2020-09-22 Thread tri
You can also expand the OSD. ceph-bluestore-tool has an option for expanding 
the OSD. I'm not 100% sure whether that would solve the RocksDB out-of-space 
issue; I think it will, though. If not, you can move RocksDB to a separate block device.
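
For reference, a rough sketch of what that expansion could look like (assuming 
an LVM-backed OSD whose logical volume still has room to grow; names are 
illustrative):

systemctl stop ceph-osd@${OSDID}
# grow the backing logical volume first
lvextend -L +5G /dev/ceph-vg/osd-block-${OSDID}
# let bluestore/bluefs pick up the new size
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-${OSDID}
systemctl start ceph-osd@${OSDID}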

September 22, 2020 7:31 PM, "George Shuklin"  wrote:

> As far as I know, bluestore doesn't like super small sizes. Normally an OSD
> should stop doing funny things at the full mark, but if the device is too small
> it may be too late and bluefs runs out of space.
> 
> Two things:
> 1. Don't use too-small OSDs.
> 2. Have a spare area on the drive. I usually reserve 1% for emergency
> extension (and to give the SSD firmware a bit of space to breathe).
> 
> On Wed, Sep 23, 2020, 01:03 Ivan Kurnosov  wrote:
> 
>> Hi,
>> 
>> this morning I woke up to a degraded test ceph cluster (managed by rook,
>> but it does not really change anything for the question I'm about to ask).
>> 
>> After checking the logs I have found that bluestore on one of the OSDs ran out
>> of space.
>> 
>> Some cluster details:
>> 
>> ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
>> (stable)
>> it runs on 3 little OSDs, 10GB each
>> 
>> `ceph osd df` returned RAW USE of about 4.5GB on every node, happily
>> reporting about 5.5GB of AVAIL.
>> 
>> Yet:
>> 
>> ...
>> So, my question would be: how could I have prevented that? The monitoring I
>> have (prometheus) says the OSDs are healthy and have plenty of space, yet in
>> reality they are not.
>> 
>> What command (and prometheus metric) would help me understand the actual
>> real bluestore use? Or am I missing something?
>> 
>> Oh, and I "fixed" the cluster by expanding the broken osd.0 with a larger
>> 15GB volume. And 2 other OSDs still run on 10GB volumes.
>> 
>> Thanks in advance for any thoughts.
>> 
>> --
>> With best regards, Ivan Kurnosov


[ceph-users] Re: Low level bluestore usage

2020-09-22 Thread Alexander E. Patrakov
On Wed, Sep 23, 2020 at 3:03 AM Ivan Kurnosov  wrote:
>
> Hi,
>
> this morning I woke up to a degraded test ceph cluster (managed by rook,
> but it does not really change anything for the question I'm about to ask).
>
> After checking the logs I have found that bluestore on one of the OSDs ran out
> of space.

I think this is a consequence, and the real error is something else
that happened before.

The problem is that, if the cluster is unhealthy, the MON storage
accumulates a lot of osdmaps and pgmaps and is not cleaned up
automatically, because the MONs think that these old versions might be
needed. The OSDs also get a copy of these osdmaps and pgmaps, if I
understand correctly; that's why small OSDs fill up quickly if
the cluster stays unhealthy for a few hours.

> So, my question would be: how could I have prevented that? The monitoring I
> have (prometheus) says the OSDs are healthy and have plenty of space, yet in
> reality they are not.
>
> What command (and prometheus metric) would help me understand the actual
> real bluestore use? Or am I missing something?

You can fix monitoring by setting the "mon data size warn" to
something like 1 GB or even less.
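
For example (a sketch; mon_data_size_warn is given in bytes, the exact
threshold is up to you):

ceph config set mon mon_data_size_warn 1073741824   # warn once mon data exceeds 1 GiB

For the real bluefs/db usage of an individual OSD, the admin socket is one
place to look, e.g. "ceph daemon osd.0 perf dump bluefs", which reports
db_used_bytes and db_total_bytes.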

> Oh, and I "fixed" the cluster by expanding the broken osd.0 with a larger
> 15GB volume. And 2 other OSDs still run on 10GB volumes.

Sometimes this doesn't help. For data recovery purposes, the most
helpful step if you get the "bluefs enospc" error is to add a separate
db device, like this:

systemctl disable --now ceph-osd@${OSDID}
truncate -s 32G /junk/osd.${OSDID}-recover/block.db
sgdisk -n 0:0:0 /junk/osd.${OSDID}-recover/block.db
ceph-bluestore-tool \
bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-${OSDID} \
--dev-target /junk/osd.${OSDID}-recover/block.db \
--bluestore-block-db-size=31G --bluefs-log-compact-min-size=31G

Of course you can use a real block device instead of just a file.
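
To double-check that BlueFS actually picked up the new DB device, something
like this should show it (a sketch, same OSDID as above):

ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-${OSDID}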

After that, export all PGs using ceph-objectstore-tool and re-import them
into a fresh OSD, then destroy or purge the full one.
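
A rough sketch of that export/import pass (assuming both the full OSD and the
fresh one are stopped; the target directory is illustrative):

# on the full OSD
for PG in $(ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSDID} --op list-pgs); do
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSDID} \
        --op export --pgid ${PG} --file /junk/pg-${PG}.export
done
# then, on the fresh (stopped) OSD
for F in /junk/pg-*.export; do
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${NEWID} --op import --file ${F}
done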

Here is why these options are needed:

--bluestore-block-db-size=31G: ceph-bluestore-tool refuses to do
anything unless this option is set to some value.
--bluefs-log-compact-min-size=31G: makes absolutely sure that log
compaction doesn't happen, because it would hit "bluefs enospc" again.

-- 
Alexander E. Patrakov
CV: http://pc.cd/PLz7


[ceph-users] Re: Low level bluestore usage

2020-09-22 Thread George Shuklin
As far as I know, bluestore doesn't like super small sizes. Normally an OSD
should stop doing funny things at the full mark, but if the device is too small
it may be too late and bluefs runs out of space.

Two things:
1. Don't use too-small OSDs.
2. Have a spare area on the drive. I usually reserve 1% for emergency
extension (and to give the SSD firmware a bit of space to breathe); see the
sketch below.
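
A minimal sketch of that reserve, assuming LVM-backed OSDs (the volume group
name is made up):

# leave ~1% of the VG unallocated as an emergency reserve
lvcreate -l 99%FREE -n osd-block vg-ceph
# if an OSD ever hits the wall, grow it into the reserve and expand bluefs
lvextend -l +100%FREE /dev/vg-ceph/osd-block
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-${OSDID}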


On Wed, Sep 23, 2020, 01:03 Ivan Kurnosov  wrote:

> Hi,
>
> this morning I woke up to a degraded test ceph cluster (managed by rook,
> but it does not really change anything for the question I'm about to ask).
>
> After checking the logs I have found that bluestore on one of the OSDs ran out
> of space.
>
> Some cluster details:
>
> ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
> (stable)
> it runs on 3 little OSDs, 10GB each
>
> `ceph osd df` returned RAW USE of about 4.5GB on every node, happily
> reporting about 5.5GB of AVAIL.
>
> Yet:
>
> ...
> So, my question would be: how could I have prevented that? The monitoring I
> have (prometheus) says the OSDs are healthy and have plenty of space, yet in
> reality they are not.
>
> What command (and prometheus metric) would help me understand the actual
> real bluestore use? Or am I missing something?
>
> Oh, and I "fixed" the cluster by expanding the broken osd.0 with a larger
> 15GB volume. And 2 other OSDs still run on 10GB volumes.
>
> Thanks in advance for any thoughts.
>
>
> --
> With best regards, Ivan Kurnosov