[ceph-users] Error on Ceph Dashboard

2021-06-09 Thread Robert W. Eckert
Hi - this just started happening in the past few days on Ceph Pacific
16.2.4 deployed via cephadm (Podman containers).
The dashboard is returning

No active ceph-mgr instance is currently running the dashboard. A failover may 
be in progress. Retrying in 5 seconds...

And ceph status returns

  cluster:
id: fe3a7cb0-69ca-11eb-8d45-c86000d08867
health: HEALTH_WARN
Module 'dashboard' has failed dependency: cannot import name 'AuthManager'
clock skew detected on mon.cube

  services:
mon: 3 daemons, quorum story,cube,rhel1 (age 46h)
mgr: cube.tvlgnp(active, since 47h), standbys: rhel1.zpzsjc, story.gffann
mds: 2/2 daemons up, 1 standby
osd: 13 osds: 13 up (since 46h), 13 in (since 46h)
rgw: 3 daemons active (3 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   11 pools, 497 pgs
objects: 1.50M objects, 2.1 TiB
usage:   6.2 TiB used, 32 TiB / 38 TiB avail
pgs: 497 active+clean

  io:
client:   255 B/s rd, 2.7 KiB/s wr, 0 op/s rd, 0 op/s wr

The only thing that happened on the cluster was that one of the servers
was rebooted; no configuration changes were made.
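
For reference, a minimal sketch of the usual first steps for this symptom
(the mgr name is taken from the ceph status output above; these are
standard commands, not a confirmed fix for the AuthManager import error):

$ ceph mgr fail cube.tvlgnp           # force failover to a standby mgr
$ ceph mgr module disable dashboard   # or bounce the dashboard module
$ ceph mgr module enable dashboard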

Any suggestions?

Thanks,
rob


[ceph-users] Is it safe to mix Octopus and Pacific mons?

2021-06-09 Thread Vladimir Brik

Hello

My attempt to upgrade from Octopus to Pacific ran into issues, and I
currently have one 16.2.4 mon and two 15.2.12 mons. Is it safe to run the
cluster like this, or should I shut down the 16.2.4 mon until I figure out
what to do next with the upgrade?


Thanks,

Vlad


[ceph-users] Re: Performance (RBD) regression after upgrading beyond v15.2.8

2021-06-09 Thread Mark Nelson

On 6/9/21 10:48 AM, Wido den Hollander wrote:



On 09/06/2021 14:33, Ilya Dryomov wrote:

On Wed, Jun 9, 2021 at 1:38 PM Wido den Hollander wrote:


Hi,

While doing some benchmarks I have two identical Ceph clusters:

3x SuperMicro 1U
AMD Epyc 7302P 16C
256GB DDR
4x Samsung PM983 1.92TB
100Gbit networking

I tested such a setup on v16.2.4 with fio:

bs=4k
qd=1

IOps: 695

That was very low, as I was expecting more than 1000 IOps.

I checked with the second Ceph cluster which was still running v15.2.8,
the result: 1364 IOps.

I then upgraded from 15.2.8 to 15.2.13: 725 IOps

Looking at the differences between v15.2.8 and v15.2.13 in options.cc I
saw these options:

bluefs_buffered_io: false -> true
bluestore_cache_trim_max_skip_pinned: 1000 -> 64

The main difference seems to be 'bluefs_buffered_io', but in both cases
this was already explicitly set to 'true'.

So anything beyond 15.2.8 is right now giving me a much lower I/O
performance with Queue Depth = 1 and Block Size = 4k.

15.2.8: 1364 IOps
15.2.13: 725 IOps
16.2.4: 695 IOps

Has anybody else seen this as well? I'm trying to figure out where this
is going wrong.


Hi Wido,

Going by the subject, I assume these are rbd numbers?  If so, did you
run any RADOS-level benchmarks?


Yes, rbd benchmark using fio.

$ rados -p rbd -t 1 -O 4096 -b 4096 bench 60 write

Average IOPS:   1024
Stddev IOPS:    29.6598
Max IOPS:   1072
Min IOPS:   918
Average Latency(s): 0.00097
Stddev Latency(s):  0.000306557

So that seems kind of OK. Still roughly 1k IOps and a write latency of 
~1ms.


But that was ~0.75ms when writing through RBD.

I now have a 16.2.4 and 15.2.13 cluster with identical hardware to run 
some benchmarks on.


Wido



Good job narrowing it down so far.  Are you testing with fio on a real 
file system backed by RBD, or against librbd directly?  It would be worth 
trying librbd directly if possible, and also disabling rbd_cache.  Let's 
try to get this as close to the OSD as possible.
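
A minimal sketch of such a run against librbd directly (assumptions: an
existing test image named fio-test in the rbd pool, the admin client, and
'rbd cache = false' set under [client] in ceph.conf so librbd caching is
out of the picture):

$ fio --name=rbd-qd1 --ioengine=rbd --clientname=admin --pool=rbd \
      --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=1 --direct=1 \
      --time_based --runtime=60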



The OSD has been a little temperamental lately when using it, but gdbpmp 
(or Adam's wallclock profiler) might be helpful in figuring out what the 
OSD is spending time on in both cases.



Mark






Thanks,

 Ilya




[ceph-users] Re: Is it safe to mix Octopus and Pacific mons?

2021-06-09 Thread Wido den Hollander



On 6/9/21 8:51 PM, Vladimir Brik wrote:
> Hello
> 
> My attempt to upgrade from Octopus to Pacific ran into issues, and I
> currently have one 16.2.4 mon and two 15.2.12 mons. Is it safe to run
> the cluster like this, or should I shut down the 16.2.4 mon until I
> figure out what to do next with the upgrade?
> 

I would keep this mixed-version state as short as possible, but I would
*not* shut down a MON.

What is preventing you from updating the other MONs to v16?
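
A quick way to confirm which release every daemon is running (output
abbreviated and illustrative):

$ ceph versions
{
    "mon": {
        "ceph version 15.2.12 (...) octopus (stable)": 2,
        "ceph version 16.2.4 (...) pacific (stable)": 1
    },
    ...
}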

Wido

> Thanks,
> 
> Vlad


[ceph-users] Upgrade to 16 failed: wrong /sys/fs/cgroup path

2021-06-09 Thread Vladimir Brik

Hello

My upgrade from 15.2.12 to 16.2.4 is stuck because a mon daemon failed to
upgrade. systemctl status of the mon showed this error:


Error: open /sys/fs/cgroup/cpuacct,cpu/system.slice/...

It turns out there is no /sys/fs/cgroup/cpuacct,cpu directory on my system.
Instead, I have /sys/fs/cgroup/cpu,cpuacct. Symlinking them appears to have
solved the immediate problem, but if I proceed with the upgrade to 16.2.4,
all ceph daemons will probably fail to start after a reboot.
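
For reference, a sketch of the workaround described above (assumption:
this is the symlink that was created; it will not survive a reboot, since
/sys/fs/cgroup is a virtual filesystem):

$ ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu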


Is this an issue with ceph, podman (2.1.1), or the fact that I am running
CentOS 7?


Is it possible to upgrade from 15.2.12 to 16.2.4 on CentOS 7? I thought
that installing a version of podman compatible with both would suffice,
but apparently not...



Vlad


[ceph-users] delete stray OSD daemon after replacing disk

2021-06-09 Thread mabi
Hello,

I replaced an OSD disk on one of my Nautilus OSD nodes, which created a new
osd number. Now ceph shows one cephadm stray daemon (the old osd.1, which I
replaced) that I can't remove, as you can see below:

# ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon osd.1 on host ceph1e not managed by cephadm

# ceph orch daemon rm osd.1 --force
Error EINVAL: Unable to find daemon(s) ['osd.1']

Is there another command I am missing?
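
For context, a hedged sketch of what might help here ('ceph orch daemon rm'
only knows about cephadm-managed daemons; assumption: the leftover osd.1
may still exist in the OSD/CRUSH map and need removing at that level):

$ ceph osd tree                             # is osd.1 still listed?
$ ceph osd purge 1 --yes-i-really-mean-it   # if so, remove it (destructive)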

Best regards,
Mabi


[ceph-users] Re: Performance (RBD) regression after upgrading beyond v15.2.8

2021-06-09 Thread Wido den Hollander




On 09/06/2021 14:33, Ilya Dryomov wrote:

On Wed, Jun 9, 2021 at 1:38 PM Wido den Hollander wrote:


Hi,

While doing some benchmarks I have two identical Ceph clusters:

3x SuperMicro 1U
AMD Epyc 7302P 16C
256GB DDR
4x Samsung PM983 1.92TB
100Gbit networking

I tested such a setup on v16.2.4 with fio:

bs=4k
qd=1

IOps: 695

That was very low, as I was expecting more than 1000 IOps.

I checked with the second Ceph cluster which was still running v15.2.8,
the result: 1364 IOps.

I then upgraded from 15.2.8 to 15.2.13: 725 IOps

Looking at the differences between v15.2.8 and v15.2.13 in options.cc I
saw these options:

bluefs_buffered_io: false -> true
bluestore_cache_trim_max_skip_pinned: 1000 -> 64

The main difference seems to be 'bluefs_buffered_io', but in both cases
this was already explicitly set to 'true'.

So anything beyond 15.2.8 is right now giving me a much lower I/O
performance with Queue Depth = 1 and Block Size = 4k.

15.2.8: 1364 IOps
15.2.13: 725 IOps
16.2.4: 695 IOps

Has anybody else seen this as well? I'm trying to figure out where this
is going wrong.


Hi Wido,

Going by the subject, I assume these are rbd numbers?  If so, did you
run any RADOS-level benchmarks?


Yes, rbd benchmark using fio.

$ rados -p rbd -t 1 -O 4096 -b 4096 bench 60 write

Average IOPS:   1024
Stddev IOPS:    29.6598
Max IOPS:   1072
Min IOPS:   918
Average Latency(s): 0.00097
Stddev Latency(s):  0.000306557

So that seems kind of OK. Still roughly 1k IOps and a write latency of ~1ms.

But that was ~0.75ms when writing through RBD.

I now have a 16.2.4 and 15.2.13 cluster with identical hardware to run 
some benchmarks on.


Wido



Thanks,

 Ilya




[ceph-users] Re: Performance (RBD) regression after upgrading beyond v15.2.8

2021-06-09 Thread Ilya Dryomov
On Wed, Jun 9, 2021 at 1:38 PM Wido den Hollander wrote:
>
> Hi,
>
> While doing some benchmarks I have two identical Ceph clusters:
>
> 3x SuperMicro 1U
> AMD Epyc 7302P 16C
> 256GB DDR
> 4x Samsung PM983 1.92TB
> 100Gbit networking
>
> I tested such a setup on v16.2.4 with fio:
>
> bs=4k
> qd=1
>
> IOps: 695
>
> That was very low, as I was expecting more than 1000 IOps.
>
> I checked with the second Ceph cluster which was still running v15.2.8,
> the result: 1364 IOps.
>
> I then upgraded from 15.2.8 to 15.2.13: 725 IOps
>
> Looking at the differences between v15.2.8 and v15.2.13 in options.cc I
> saw these options:
>
> bluefs_buffered_io: false -> true
> bluestore_cache_trim_max_skip_pinned: 1000 -> 64
>
> The main difference seems to be 'bluefs_buffered_io', but in both cases
> this was already explicitly set to 'true'.
>
> So anything beyond 15.2.8 is right now giving me a much lower I/O
> performance with Queue Depth = 1 and Block Size = 4k.
>
> 15.2.8: 1364 IOps
> 15.2.13: 725 IOps
> 16.2.4: 695 IOps
>
> Has anybody else seen this as well? I'm trying to figure out where this
> is going wrong.

Hi Wido,

Going by the subject, I assume these are rbd numbers?  If so, did you
run any RADOS-level benchmarks?

Thanks,

Ilya


[ceph-users] Re: nautilus: rbd ls returns ENOENT for some images

2021-06-09 Thread Ilya Dryomov
On Wed, Jun 9, 2021 at 1:36 PM Peter Lieven wrote:
>
> > On 09.06.21 at 13:28, Ilya Dryomov wrote:
> > On Wed, Jun 9, 2021 at 11:24 AM Peter Lieven wrote:
> >> Hi,
> >>
> >>
> >> we currently run into an issue where an rbd ls for a namespace returns 
> >> ENOENT for some of the images in that namespace.
> >>
> >>
> >> /usr/bin/rbd --conf=XXX --id XXX ls 
> >> 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' -l --format=json
> >> 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
> >> 0x55ca2390 fail: (2) No such file or directory
> >> 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
> >> 0x55caccd2b920 fail: (2) No such file or directory
> >> 2021-06-09 11:03:34.920 7f2225ffb700 -1 librbd::io::AioCompletion: 
> >> 0x55caccd9b4e0 fail: (2) No such file or directory
> >> rbd: error opening 34810ac2-3112-4fef-938c-b76338b0eeaf.raw: (2) No such 
> >> file or directory
> >> rbd: error opening c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw: (2) No such 
> >> file or directory
> >> rbd: error opening 5d5251d1-f017-4382-845c-65e504683742.raw: (2) No such 
> >> file or directory
> >> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
> >> 0x55cacce07b00 fail: (2) No such file or directory
> >> rbd: error opening c625b898-ec34-4446-9455-d2b70d9e378f.raw: (2) No such 
> >> file or directory
> >> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
> >> 0x55caccd7cce0 fail: (2) No such file or directory
> >> rbd: error opening 990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw: (2) No such 
> >> file or directory
> >> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
> >> 0x55cacce336f0 fail: (2) No such file or directory
> >> rbd: error opening 7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw: (2) No such 
> >> file or directory
> >> [{"image":"108600c6-2312-4d61-9f5b-35b351112512.raw","size":3145728,"format":2,"lock_type":"exclusive"},{"image":"1292ef0c-2333-44f1-be30-39105f7d176e.raw","size":262149242880,"format":2,"lock_type":"exclusive"},{"image":"8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","size":262149242880,"format":2,"lock_type":"exclusive"}]
> >> rbd: listing images failed: (2) No such file or directory
> >>
> >>
> >> The way to trigger this state was that the images which show "No such file 
> >> or directory" were deleted with rbd rm, but the operation was interrupted 
> >> (rbd process was killed) due to a timeout.
> >>
> >> What is the best way to recover from this and how to properly clean up?
> >>
> >>
> >> Release is nautilus 14.2.20
> > Hi Peter,
> >
> > Does "rbd ls" without "-l" succeed?
>
>
> Yes, it does:
>
>
> /usr/bin/rbd --conf=XXX --id XXX ls 
> 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' --format=json
>
>  
> ["108600c6-2312-4d61-9f5b-35b351112512.raw","1292ef0c-2333-44f1-be30-39105f7d176e.raw","8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","34810ac2-3112-4fef-938c-b76338b0eeaf.raw","c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw","5d5251d1-f017-4382-845c-65e504683742.raw","c625b898-ec34-4446-9455-d2b70d9e378f.raw","990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw","7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw"]

I think simply re-running interrupted "rbd rm" commands would work and
clean up properly.
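
A sketch of what that could look like for the images above (the image
names are taken from the error output; assumption: the same pool/namespace
spec works for rbd rm):

$ for img in 34810ac2-3112-4fef-938c-b76338b0eeaf.raw \
             c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw \
             5d5251d1-f017-4382-845c-65e504683742.raw; do
      # rbd rm resumes and cleans up a partially deleted image
      rbd --conf=XXX --id XXX rm \
          "mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c/$img"
  done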

Thanks,

Ilya


[ceph-users] Re: OSD bootstrap time

2021-06-09 Thread Konstantin Shalygin
This is the new min_alloc_size for BlueStore: mkfs with a 4K allocation
size takes more time, and the process is single-threaded, I think.
It's normal.
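
A way to check what new OSDs would be created with (the option names are
real; the 4096 values are what I believe Pacific ships as defaults):

$ ceph config get osd bluestore_min_alloc_size_hdd   # 4096 on Pacific
$ ceph config get osd bluestore_min_alloc_size_ssd   # 4096 on Pacific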


k

> On 9 Jun 2021, at 14:21, Jan-Philipp Litza wrote:
> 
> I mean freshly deployed OSDs. Restarted OSDs don't exhibit that behavior.



[ceph-users] Performance (RBD) regression after upgrading beyond v15.2.8

2021-06-09 Thread Wido den Hollander

Hi,

While doing some benchmarks I have two identical Ceph clusters:

3x SuperMicro 1U
AMD Epyc 7302P 16C
256GB DDR
4x Samsung PM983 1.92TB
100Gbit networking

I tested such a setup on v16.2.4 with fio:

bs=4k
qd=1

IOps: 695

That was very low, as I was expecting more than 1000 IOps.

I checked with the second Ceph cluster which was still running v15.2.8, 
the result: 1364 IOps.


I then upgraded from 15.2.8 to 15.2.13: 725 IOps

Looking at the differences between v15.2.8 and v15.2.13 in options.cc I 
saw these options:


bluefs_buffered_io: false -> true
bluestore_cache_trim_max_skip_pinned: 1000 -> 64

The main difference seems to be 'bluefs_buffered_io', but in both cases 
this was already explicitly set to 'true'.
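
For reference, a way to double-check the effective value (osd.0 is just an
example id; the second command needs to run on the host with that OSD's
admin socket):

$ ceph config get osd.0 bluefs_buffered_io          # what the config db hands out
$ ceph daemon osd.0 config get bluefs_buffered_io   # what the running OSD uses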


So anything beyond 15.2.8 is right now giving me a much lower I/O 
performance with Queue Depth = 1 and Block Size = 4k.


15.2.8: 1364 IOps
15.2.13: 725 IOps
16.2.4: 695 IOps

Has anybody else seen this as well? I'm trying to figure out where this 
is going wrong.


Wido


[ceph-users] Re: nautilus: rbd ls returns ENOENT for some images

2021-06-09 Thread Peter Lieven
On 09.06.21 at 13:28, Ilya Dryomov wrote:
> On Wed, Jun 9, 2021 at 11:24 AM Peter Lieven wrote:
>> Hi,
>>
>>
>> we currently run into an issue where an rbd ls for a namespace returns ENOENT 
>> for some of the images in that namespace.
>>
>>
>> /usr/bin/rbd --conf=XXX --id XXX ls 
>> 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' -l --format=json
>> 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
>> 0x55ca2390 fail: (2) No such file or directory
>> 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
>> 0x55caccd2b920 fail: (2) No such file or directory
>> 2021-06-09 11:03:34.920 7f2225ffb700 -1 librbd::io::AioCompletion: 
>> 0x55caccd9b4e0 fail: (2) No such file or directory
>> rbd: error opening 34810ac2-3112-4fef-938c-b76338b0eeaf.raw: (2) No such 
>> file or directory
>> rbd: error opening c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw: (2) No such 
>> file or directory
>> rbd: error opening 5d5251d1-f017-4382-845c-65e504683742.raw: (2) No such 
>> file or directory
>> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
>> 0x55cacce07b00 fail: (2) No such file or directory
>> rbd: error opening c625b898-ec34-4446-9455-d2b70d9e378f.raw: (2) No such 
>> file or directory
>> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
>> 0x55caccd7cce0 fail: (2) No such file or directory
>> rbd: error opening 990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw: (2) No such 
>> file or directory
>> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
>> 0x55cacce336f0 fail: (2) No such file or directory
>> rbd: error opening 7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw: (2) No such 
>> file or directory
>> [{"image":"108600c6-2312-4d61-9f5b-35b351112512.raw","size":3145728,"format":2,"lock_type":"exclusive"},{"image":"1292ef0c-2333-44f1-be30-39105f7d176e.raw","size":262149242880,"format":2,"lock_type":"exclusive"},{"image":"8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","size":262149242880,"format":2,"lock_type":"exclusive"}]
>> rbd: listing images failed: (2) No such file or directory
>>
>>
>> The way to trigger this state was that the images which show "No such file 
>> or directory" were deleted with rbd rm, but the operation was interrupted 
>> (rbd process was killed) due to a timeout.
>>
>> What is the best way to recover from this and how to properly clean up?
>>
>>
>> Release is nautilus 14.2.20
> Hi Peter,
>
> Does "rbd ls" without "-l" succeed?


Yes, it does:


/usr/bin/rbd --conf=XXX --id XXX ls 
'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' --format=json

 
["108600c6-2312-4d61-9f5b-35b351112512.raw","1292ef0c-2333-44f1-be30-39105f7d176e.raw","8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","34810ac2-3112-4fef-938c-b76338b0eeaf.raw","c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw","5d5251d1-f017-4382-845c-65e504683742.raw","c625b898-ec34-4446-9455-d2b70d9e378f.raw","990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw","7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw"]


Peter




[ceph-users] Re: nautilus: rbd ls returns ENOENT for some images

2021-06-09 Thread Ilya Dryomov
On Wed, Jun 9, 2021 at 11:24 AM Peter Lieven wrote:
>
> Hi,
>
>
> we currently run into an issue where an rbd ls for a namespace returns ENOENT 
> for some of the images in that namespace.
>
>
> /usr/bin/rbd --conf=XXX --id XXX ls 
> 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' -l --format=json
> 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
> 0x55ca2390 fail: (2) No such file or directory
> 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
> 0x55caccd2b920 fail: (2) No such file or directory
> 2021-06-09 11:03:34.920 7f2225ffb700 -1 librbd::io::AioCompletion: 
> 0x55caccd9b4e0 fail: (2) No such file or directory
> rbd: error opening 34810ac2-3112-4fef-938c-b76338b0eeaf.raw: (2) No such file 
> or directory
> rbd: error opening c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw: (2) No such file 
> or directory
> rbd: error opening 5d5251d1-f017-4382-845c-65e504683742.raw: (2) No such file 
> or directory
> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
> 0x55cacce07b00 fail: (2) No such file or directory
> rbd: error opening c625b898-ec34-4446-9455-d2b70d9e378f.raw: (2) No such file 
> or directory
> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
> 0x55caccd7cce0 fail: (2) No such file or directory
> rbd: error opening 990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw: (2) No such file 
> or directory
> 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
> 0x55cacce336f0 fail: (2) No such file or directory
> rbd: error opening 7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw: (2) No such file 
> or directory
> [{"image":"108600c6-2312-4d61-9f5b-35b351112512.raw","size":3145728,"format":2,"lock_type":"exclusive"},{"image":"1292ef0c-2333-44f1-be30-39105f7d176e.raw","size":262149242880,"format":2,"lock_type":"exclusive"},{"image":"8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","size":262149242880,"format":2,"lock_type":"exclusive"}]
> rbd: listing images failed: (2) No such file or directory
>
>
> The way to trigger this state was that the images which show "No such file or 
> directory" were deleted with rbd rm, but the operation was interrupted (rbd 
> process was killed) due to a timeout.
>
> What is the best way to recover from this and how to properly clean up?
>
>
> Release is nautilus 14.2.20

Hi Peter,

Does "rbd ls" without "-l" succeed?

Thanks,

Ilya


[ceph-users] Re: OSD bootstrap time

2021-06-09 Thread Jan-Philipp Litza
Hi Konstantin,

I mean freshly deployed OSDs. Restarted OSDs don't exhibit that behavior.

Best regards,
Jan-Philipp


[ceph-users] Re: OSD bootstrap time

2021-06-09 Thread Jan-Philipp Litza
Hi Rich,

> I've noticed this a couple of times on Nautilus after doing some large
> backfill operations. It seems the osd maps don't get trimmed properly after
> the cluster returns to HEALTH_OK, and they build up on the mons. I do a
> "du" on the mon folder, e.g. du -shx /var/lib/ceph/mon/, and this shows
> several GB of data.

It does: almost 8 GB for <300 OSDs, which has increased several-fold over
the last few weeks (since we started upgrading Nautilus->Pacific). However,
I didn't think much of it after reading the hardware recommendations in the
docs, which call for at least 60 GB per ceph-mon [1].

> I give all my mgrs and mons a restart, and after a few minutes I can
> see this osd map data getting purged from the mons. After a while it
> should be back to a few hundred MB (depending on cluster size).
> This may not be the problem in your case, but it's an easy thing to try.
> Note: if your cluster is being held in WARN or ERR by something, that
> can also explain the osd maps not clearing. Make sure you get the
> cluster back to HEALTH_OK first.

Thanks for the suggestion, will try that once we reach HEALTH_OK.

Best regards,
Jan-Philipp

[1]:
https://docs.ceph.com/en/latest/start/hardware-recommendations/#minimum-hardware-recommendations
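
For anyone wanting to quantify this before/after restarting the mons, a
sketch that prints the range of osdmap epochs the mons are keeping (jq is
assumed to be installed; a large gap suggests maps are not being trimmed):

$ ceph report 2>/dev/null | \
      jq '.osdmap_first_committed, .osdmap_last_committed'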


[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2021-06-09 Thread Igor Fedotov

Should we file another ticket for that?

On 6/9/2021 8:39 AM, Konstantin Shalygin wrote:
Stored==used was resolved for this cluster. Actually, the problem is what 
you discovered last year: zeros. Filestore lacks the META counter, so it 
is always zero. When I purged the last drained OSD from the cluster, the 
statistics returned to normal immediately.




Thanks,
k

On 20 May 2021, at 21:22, Dan van der Ster wrote:


I can confirm that we still occasionally see stored==used even with 
14.2.21, but I haven't had time yet to debug the pattern behind the 
observations. I'll let you know if we find anything useful.





[ceph-users] omap sizes

2021-06-09 Thread Szabo, Istvan (Agoda)
Hi,

Is there a way to check the omap sizes in the index pool, both key counts
and total size?

This one doesn't work: https://ceph.com/geen-categorie/get-omap-keyvalue-size/
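
A minimal sketch that counts omap keys per object in the index pool
(listomapkeys is a standard rados subcommand; the pool name is an example,
and this can be slow on large indexes):

$ pool=default.rgw.buckets.index
$ for obj in $(rados -p "$pool" ls); do
      printf '%s: %s keys\n' "$obj" \
          "$(rados -p "$pool" listomapkeys "$obj" | wc -l)"
  done

For sizes, 'rados -p "$pool" listomapvals <obj> | wc -c' gives a rough
per-object value size.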

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---





[ceph-users] nautilus: rbd ls returns ENOENT for some images

2021-06-09 Thread Peter Lieven
Hi,


we currently run into an issue where an rbd ls for a namespace returns ENOENT 
for some of the images in that namespace.


/usr/bin/rbd --conf=XXX --id XXX ls 
'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' -l --format=json
2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
0x55ca2390 fail: (2) No such file or directory
2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
0x55caccd2b920 fail: (2) No such file or directory
2021-06-09 11:03:34.920 7f2225ffb700 -1 librbd::io::AioCompletion: 
0x55caccd9b4e0 fail: (2) No such file or directory
rbd: error opening 34810ac2-3112-4fef-938c-b76338b0eeaf.raw: (2) No such file 
or directory
rbd: error opening c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw: (2) No such file 
or directory
rbd: error opening 5d5251d1-f017-4382-845c-65e504683742.raw: (2) No such file 
or directory
2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
0x55cacce07b00 fail: (2) No such file or directory
rbd: error opening c625b898-ec34-4446-9455-d2b70d9e378f.raw: (2) No such file 
or directory
2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
0x55caccd7cce0 fail: (2) No such file or directory
rbd: error opening 990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw: (2) No such file 
or directory
2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
0x55cacce336f0 fail: (2) No such file or directory
rbd: error opening 7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw: (2) No such file 
or directory
[{"image":"108600c6-2312-4d61-9f5b-35b351112512.raw","size":3145728,"format":2,"lock_type":"exclusive"},{"image":"1292ef0c-2333-44f1-be30-39105f7d176e.raw","size":262149242880,"format":2,"lock_type":"exclusive"},{"image":"8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","size":262149242880,"format":2,"lock_type":"exclusive"}]
rbd: listing images failed: (2) No such file or directory


The way to trigger this state was that the images which show "No such file or 
directory" were deleted with rbd rm, but the operation was interrupted (rbd 
process was killed) due to a timeout.

What is the best way to recover from this and how to properly clean up?


Release is nautilus 14.2.20


Thanks,

Peter





[ceph-users] Re: OSD bootstrap time

2021-06-09 Thread Konstantin Shalygin
Hi,

Do you mean freshly deployed OSDs or old OSDs that were just restarted?


Thanks,
k


> On 8 Jun 2021, at 23:30, Jan-Philipp Litza wrote:
> 
> recently I've been noticing that starting OSDs for the first time takes ages
> (like, more than an hour) before they are even picked up by the monitors
> as "up" and start backfilling. I'm not entirely sure if this is a new
> phenomenon or if it was always that way. Either way, I'd like to
> understand why.