[ceph-users] Re: Octopus on Ubuntu 20.04.6 LTS with kernel 5

2023-05-10 Thread Szabo, Istvan (Agoda)
I can answer my own question: even the official Ubuntu repo ships the Octopus 
version by default, so it certainly works with kernel 5. 

https://packages.ubuntu.com/focal/allpackages
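For reference, checking with standard apt tooling on a focal host confirms this 
as well; the candidate version reported should be a 15.2.x (Octopus) build:

$ apt-cache policy ceph-osd ceph-common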


-Original Message-
From: Szabo, Istvan (Agoda)  
Sent: Thursday, May 11, 2023 11:20 AM
To: Ceph Users 
Subject: [ceph-users] Octopus on Ubuntu 20.04.6 LTS with kernel 5

Hi,

The Octopus documentation recommends kernel 4; however, yesterday we moved our 
test cluster from CentOS 7/8 to Ubuntu 20.04.6 LTS with kernel 5.4.0-148 and it 
seems to be working. Before I move to production, I just want to make sure there 
aren't any caveats.

Thank you


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Octopus on Ubuntu 20.04.6 LTS with kernel 5

2023-05-10 Thread Szabo, Istvan (Agoda)
Hi,

The Octopus documentation recommends kernel 4; however, yesterday we moved our 
test cluster from CentOS 7/8 to Ubuntu 20.04.6 LTS with kernel 5.4.0-148 and it 
seems to be working. Before I move to production, I just want to make sure there 
aren't any caveats.

Thank you


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Xiubo Li

Hey Frank,

On 5/10/23 21:44, Frank Schilder wrote:

The kernel message that shows up on boot on the file server in text format:

May 10 13:56:59 rit-pfile01 kernel: WARNING: CPU: 3 PID: 34 at 
fs/ceph/caps.c:689 ceph_add_cap+0x53e/0x550 [ceph]
May 10 13:56:59 rit-pfile01 kernel: Modules linked in: ceph libceph 
dns_resolver nls_utf8 isofs cirrus drm_shmem_helper intel_rapl_msr iTCO_wdt 
intel_rapl_common iTCO_vendor_support drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops pcspkr joydev virtio_net drm i2c_i801 net_failover 
virtio_balloon failover lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc 
sr_mod cdrom sg xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci 
libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console 
virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
May 10 13:56:59 rit-pfile01 kernel: CPU: 3 PID: 34 Comm: kworker/3:0 Not 
tainted 4.18.0-486.el8.x86_64 #1
May 10 13:56:59 rit-pfile01 kernel: Hardware name: Red Hat KVM/RHEL-AV, BIOS 
1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
May 10 13:56:59 rit-pfile01 kernel: Workqueue: ceph-msgr ceph_con_workfn 
[libceph]
May 10 13:56:59 rit-pfile01 kernel: RIP: 0010:ceph_add_cap+0x53e/0x550 [ceph]
May 10 13:56:59 rit-pfile01 kernel: Code: c0 48 c7 c7 c0 69 7f c0 e8 6c 4c 72 c3 0f 
0b 44 89 7c 24 04 e9 7e fc ff ff 44 8b 7c 24 04 e9 68 fe ff ff 0f 0b e9 c9 fc ff ff 
<0f> 0b e9 0a fe ff ff 0f 0b e9 12 fe ff ff 0f 0b 66 90 0f 1f 44 00
May 10 13:56:59 rit-pfile01 kernel: RSP: 0018:a4d000d87b48 EFLAGS: 00010217
May 10 13:56:59 rit-pfile01 kernel: RAX:  RBX: 0005 
RCX: dead0200
May 10 13:56:59 rit-pfile01 kernel: RDX: 92d7d7f6e7d0 RSI: 92d7d7f6e7d0 
RDI: 92d7d7f6e7c8
May 10 13:56:59 rit-pfile01 kernel: RBP: 92d7c5588970 R08: 92d7d7f6e7d0 
R09: 0001
May 10 13:56:59 rit-pfile01 kernel: R10: 92d80078cbb8 R11: 92c0 
R12: 0155
May 10 13:56:59 rit-pfile01 kernel: R13: 92d80078cbb8 R14: 92d80078cbc0 
R15: 0001
May 10 13:56:59 rit-pfile01 kernel: FS:  () 
GS:92d937d8() knlGS:
May 10 13:56:59 rit-pfile01 kernel: CS:  0010 DS:  ES:  CR0: 
80050033
May 10 13:56:59 rit-pfile01 kernel: CR2: 7f74435b9008 CR3: 0001099fa000 
CR4: 003506e0
May 10 13:56:59 rit-pfile01 kernel: Call Trace:
May 10 13:56:59 rit-pfile01 kernel: ceph_handle_caps+0xdf2/0x1780 [ceph]
May 10 13:56:59 rit-pfile01 kernel: mds_dispatch+0x13a/0x670 [ceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_process_message+0x79/0x140 
[libceph]
May 10 13:56:59 rit-pfile01 kernel: ? calc_signature+0xdf/0x110 [libceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_workfn+0x329/0x680 [libceph]
May 10 13:56:59 rit-pfile01 kernel: process_one_work+0x1a7/0x360
May 10 13:56:59 rit-pfile01 kernel: worker_thread+0x30/0x390
May 10 13:56:59 rit-pfile01 kernel: ? create_worker+0x1a0/0x1a0
May 10 13:56:59 rit-pfile01 kernel: kthread+0x134/0x150
May 10 13:56:59 rit-pfile01 kernel: ? set_kthread_struct+0x50/0x50
May 10 13:56:59 rit-pfile01 kernel: ret_from_fork+0x35/0x40
May 10 13:56:59 rit-pfile01 kernel: ---[ end trace 84e4b3694bbe9fde ]---


BTW, did you enable async dirops? Currently this is disabled by 
default in 4.18.0-486.el8.x86_64.
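For reference, async dirops correspond to the nowsync mount option of the kernel 
client, so assuming a standard kernel mount you can check whether it was enabled 
by looking at the mounted options; nowsync should only appear there if it was 
requested explicitly:

# grep ceph /proc/mounts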


The async dirop feature is buggy and we have hit a very similar bug before; 
please see https://tracker.ceph.com/issues/55857. That was a race between 
client requests and directory migration in the MDS, and it was fixed a 
long time ago.


If you didn't enable async dirops then it should be a different issue, but I 
would guess it is also a race between client requests and directory migration, 
just in the non-async-dirop case.


And from the kernel call trace, my guess is that the MDS was doing directory 
splitting and migration, and that the MDS daemon's crash was possibly caused by:


"dirfragtree.dump(f);"

Could you reproduce this with the MDS debug logs enabled?
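For example (standard config commands; level 20 is very verbose, so remember to 
revert it afterwards):

# ceph config set mds debug_mds 20
# ceph config set mds debug_ms 1
# ... reproduce the crash ...
# ceph config rm mds debug_mds
# ceph config rm mds debug_ms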


Thanks

- Xiubo



I can't interpret it, some help is appreciated.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Wednesday, May 10, 2023 3:36 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: mds dump inode crashes file system

For the "mds dump inode" command I could find the crash in the log; see below. 
Most of the log contents is the past OPS dump from the 3 MDS restarts that happened. It 
contains the 1 last OPS before the crash and I can upload the log if someone can use 
it. The crash stack trace somewhat truncated for readability:

2023-05-10T12:54:53.142+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
version 892464 from mon.4
2023-05-10T13:39:50.962+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 

[ceph-users] Re: Protected snapshot deleted when RBD image is deleted

2023-05-10 Thread Reto Gysi
I don't know about Octopus, but in Quincy there's a config option
rbd_move_to_trash_on_remove   Default: false

I've set this to true on my instance.
I can move an image with snapshots to the trash, but I can't purge the image
from the trash without first deleting the snapshots.
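For reference, the way I set it (it is a client-side librbd option, so I put it 
under the client section of the config database; adjust the scope to your setup):

# ceph config set client rbd_move_to_trash_on_remove true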

On Wed., May 10, 2023 at 23:39, Work Ceph <
work.ceph.user.mail...@gmail.com> wrote:

> Awesome! Thanks.
> What is the default then for RBD images? Is it the default to delete them
> and not to use the trash? Or, do we need a configuration to make Ceph use
> the trash?
> We are using Ceph Octopus.
>
> On Wed, May 10, 2023 at 6:33 PM Reto Gysi  wrote:
>
>> Hi
>>
>> For me with ceph version 17.2.6 rbd doesn't allow me to (delete (I've
>> configured that delete only moves image to trash)/) purge an image that
>> still has snapshots. I need to first delete all the snapshots.
>>
>> from man page:
>>
>> rbd rm image-spec
>>  Delete an rbd image (including all data blocks). If the
>> image has snapshots, this fails and nothing is deleted.
>>
>> rbd snap purge image-spec
>>  Remove all unprotected snapshots from an image.
>>
>> rbd snap rm [--force] snap-spec
>>  Remove the specified snapshot.
>> rbd snap unprotect snap-spec
>>  Unprotect a snapshot from deletion (undo snap protect).  If
>> cloned children remain, snap unprotect fails.  (Note that clones may
>> exist in different pools than the parent snapshot.)
>>
>> Regards
>>
>> Reto
>>
>>
>> Am Mi., 10. Mai 2023 um 20:58 Uhr schrieb Work Ceph <
>> work.ceph.user.mail...@gmail.com>:
>>
>>> Hello guys,
>>> We have a doubt regarding snapshot management, when a protected snapshot
>>> is
>>> created, should it be deleted when its RBD image is removed from the
>>> system?
>>>
>>> If not, how can we list orphaned snapshots in a pool?
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Protected snapshot deleted when RBD image is deleted

2023-05-10 Thread Work Ceph
Awesome! Thanks.
What is the default then for RBD images? Is it the default to delete them
and not to use the trash? Or, do we need a configuration to make Ceph use
the trash?
We are using Ceph Octopus.

On Wed, May 10, 2023 at 6:33 PM Reto Gysi  wrote:

> Hi
>
> For me with ceph version 17.2.6 rbd doesn't allow me to (delete (I've
> configured that delete only moves image to trash)/) purge an image that
> still has snapshots. I need to first delete all the snapshots.
>
> from man page:
>
> rbd rm image-spec
>  Delete an rbd image (including all data blocks). If the
> image has snapshots, this fails and nothing is deleted.
>
> rbd snap purge image-spec
>  Remove all unprotected snapshots from an image.
>
> rbd snap rm [--force] snap-spec
>  Remove the specified snapshot.
> rbd snap unprotect snap-spec
>  Unprotect a snapshot from deletion (undo snap protect).  If
> cloned children remain, snap unprotect fails.  (Note that clones may
> exist in different pools than the parent snapshot.)
>
> Regards
>
> Reto
>
>
> Am Mi., 10. Mai 2023 um 20:58 Uhr schrieb Work Ceph <
> work.ceph.user.mail...@gmail.com>:
>
>> Hello guys,
>> We have a doubt regarding snapshot management, when a protected snapshot
>> is
>> created, should it be deleted when its RBD image is removed from the
>> system?
>>
>> If not, how can we list orphaned snapshots in a pool?
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Protected snapshot deleted when RBD image is deleted

2023-05-10 Thread Reto Gysi
Hi

For me, with Ceph version 17.2.6, rbd doesn't allow me to delete (I've
configured delete to only move the image to the trash) or purge an image that
still has snapshots. I need to delete all the snapshots first.

from man page:

rbd rm image-spec
 Delete an rbd image (including all data blocks). If the image
has snapshots, this fails and nothing is deleted.

rbd snap purge image-spec
 Remove all unprotected snapshots from an image.

rbd snap rm [--force] snap-spec
 Remove the specified snapshot.
rbd snap unprotect snap-spec
 Unprotect a snapshot from deletion (undo snap protect).  If
cloned children remain, snap unprotect fails.  (Note that clones may exist
in different pools than the parent snapshot.)
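For completeness, the trash workflow then looks roughly like this ("mypool" and 
the image ID are placeholders): rbd trash ls shows trashed images with their IDs, 
restore brings one back, rm deletes it permanently:

$ rbd trash ls mypool
$ rbd trash restore -p mypool <image-id>
$ rbd trash rm -p mypool <image-id>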

Regards

Reto


On Wed., May 10, 2023 at 20:58, Work Ceph <
work.ceph.user.mail...@gmail.com> wrote:

> Hello guys,
> We have a doubt regarding snapshot management, when a protected snapshot is
> created, should it be deleted when its RBD image is removed from the
> system?
>
> If not, how can we list orphaned snapshots in a pool?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-10 Thread Patrick Donnelly
Hi Janek,

All this indicates is that you have some files with binary keys that
cannot be decoded as UTF-8. Unfortunately, the rados Python library
assumes that omap keys can be decoded this way. I have a ticket here:

https://tracker.ceph.com/issues/59716

I hope to have a fix soon.
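Just to illustrate the failure mode (a demonstration only, not a fix): a key 
containing a raw 0xff byte is simply not valid UTF-8, while a tolerant decode 
keeps the bytes visible:

$ python3 -c "print(b'some\xffkey'.decode('utf-8'))"
$ python3 -c "print(b'some\xffkey'.decode('utf-8', 'backslashreplace'))"

The first command raises the same UnicodeDecodeError you hit; the second prints 
some\xffkey.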

On Thu, May 4, 2023 at 3:15 AM Janek Bevendorff
 wrote:
>
> After running the tool for 11 hours straight, it exited with the
> following exception:
>
> Traceback (most recent call last):
>File "/home/webis/first-damage.py", line 156, in 
>  traverse(f, ioctx)
>File "/home/webis/first-damage.py", line 84, in traverse
>  for (dnk, val) in it:
>File "rados.pyx", line 1389, in rados.OmapIterator.__next__
>File "rados.pyx", line 318, in rados.decode_cstr
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8:
> invalid start byte
>
> Does that mean that the last inode listed in the output file is corrupt?
> Any way I can fix it?
>
> The output file has 14 million lines. We have about 24.5 million objects
> in the metadata pool.
>
> Janek
>
>
> On 03/05/2023 14:20, Patrick Donnelly wrote:
> > On Wed, May 3, 2023 at 4:33 AM Janek Bevendorff
> >  wrote:
> >> Hi Patrick,
> >>
> >>> I'll try that tomorrow and let you know, thanks!
> >> I was unable to reproduce the crash today. Even with
> >> mds_abort_on_newly_corrupt_dentry set to true, all MDS booted up
> >> correctly (though they took forever to rejoin with logs set to 20).
> >>
> >> To me it looks like the issue has resolved itself overnight. I had run a
> >> recursive scrub on the file system and another snapshot was taken, in
> >> case any of those might have had an effect on this. It could also be the
> >> case that the (supposedly) corrupt journal entry has simply been
> >> committed now and hence doesn't trigger the assertion any more. Is there
> >> any way I can verify this?
> > You can run:
> >
> > https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py
> >
> > Just do:
> >
> > python3 first-damage.py --memo run.1 
> >
> > No need to do any of the other steps if you just want a read-only check.
> >
> --
>
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Protected snapshot deleted when RBD image is deleted

2023-05-10 Thread Work Ceph
Hello guys,
We have a doubt regarding snapshot management: when a protected snapshot is
created, should it be deleted when its RBD image is removed from the system?

If not, how can we list orphaned snapshots in a pool?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: docker restarting lost all managers accidentally

2023-05-10 Thread Adam King
In /var/lib/ceph/<fsid>/<mgr daemon name>/ on the host with that mgr
reporting the error, there should be a unit.run file that shows what is
being done to start the mgr, as well as a few files that get mounted into
the mgr on startup, notably the "config" and "keyring" files. That config
file should include the mon host addresses. E.g.

[root@vm-01 ~]# cat
/var/lib/ceph/5a72983c-ef57-11ed-a389-525400e42d74/mgr.vm-01.ilfvis/config
# minimal ceph.conf for 5a72983c-ef57-11ed-a389-525400e42d74
[global]
fsid = 5a72983c-ef57-11ed-a389-525400e42d74
mon_host = [v2:192.168.122.75:3300/0,v1:192.168.122.75:6789/0] [v2:
192.168.122.246:3300/0,v1:192.168.122.246:6789/0] [v2:
192.168.122.97:3300/0,v1:192.168.122.97:6789/0]

The first thing I'd do is probably make sure that array of addresses is
correct.

Then you could probably check the keyring file as well and see if it
matches up with what you get running "ceph auth get <mgr daemon name>".
E.g. here

[root@vm-01 ~]# cat
/var/lib/ceph/5a72983c-ef57-11ed-a389-525400e42d74/mgr.vm-01.ilfvis/keyring
[mgr.vm-01.ilfvis]
key = AQDf01tk7mn/IRAAvZ+ZhUgT77uZsFBSzLGPyQ==

the key matches with

[ceph: root@vm-00 /]# ceph auth get mgr.vm-01.ilfvis
[mgr.vm-01.ilfvis]
key = AQDf01tk7mn/IRAAvZ+ZhUgT77uZsFBSzLGPyQ==
caps mds = "allow *"
caps mon = "profile mgr"
caps osd = "allow *"

I wouldn't post them for obvious reasons (these are just on a test cluster
I'll tear back down so it's fine for me) but those are the first couple
things I'd check. You could also try to make adjustments directly to the
unit.run file if you have other things you'd like to try.
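If the config and keyring contents look right, restarting the mgr through its 
systemd unit is a reasonable next step; with cephadm the unit name follows the 
ceph-<fsid>@<daemon>.service pattern, so on my test cluster above it would be 
something like:

[root@vm-01 ~]# systemctl restart ceph-5a72983c-ef57-11ed-a389-525400e42d74@mgr.vm-01.ilfvis.service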

On Wed, May 10, 2023 at 11:09 AM Ben  wrote:

> Hi,
> This cluster is deployed by cephadm 17.2.5,containerized.
> It ends up in this(no active mgr):
> [root@8cd2c0657c77 /]# ceph -s
>   cluster:
> id: ad3a132e-e9ee-11ed-8a19-043f72fb8bf9
> health: HEALTH_WARN
> 6 hosts fail cephadm check
> no active mgr
> 1/3 mons down, quorum h18w,h19w
> Degraded data redundancy: 781908/2345724 objects degraded
> (33.333%), 101 pgs degraded, 209 pgs undersized
>
>   services:
> mon: 3 daemons, quorum h18w,h19w (age 19m), out of quorum: h15w
> mgr: no daemons active (since 5h)
> mds: 1/1 daemons up, 1 standby
> osd: 9 osds: 6 up (since 5h), 6 in (since 5h)
> rgw: 2 daemons active (2 hosts, 1 zones)
>
>   data:
> volumes: 1/1 healthy
> pools:   8 pools, 209 pgs
> objects: 781.91k objects, 152 GiB
> usage:   312 GiB used, 54 TiB / 55 TiB avail
> pgs: 781908/2345724 objects degraded (33.333%)
>  108 active+undersized
>  101 active+undersized+degraded
>
> I checked the h20w, there is a manager container running with log:
>
> debug 2023-05-10T12:43:23.315+ 7f5e152ec000  0 monclient(hunting):
> authenticate timed out after 300
>
> debug 2023-05-10T12:48:23.318+ 7f5e152ec000  0 monclient(hunting):
> authenticate timed out after 300
>
> debug 2023-05-10T12:53:23.318+ 7f5e152ec000  0 monclient(hunting):
> authenticate timed out after 300
>
> debug 2023-05-10T12:58:23.319+ 7f5e152ec000  0 monclient(hunting):
> authenticate timed out after 300
>
> debug 2023-05-10T13:03:23.319+ 7f5e152ec000  0 monclient(hunting):
> authenticate timed out after 300
>
> debug 2023-05-10T13:08:23.319+ 7f5e152ec000  0 monclient(hunting):
> authenticate timed out after 300
>
> debug 2023-05-10T13:13:23.319+ 7f5e152ec000  0 monclient(hunting):
> authenticate timed out after 300
>
>
> any idea to get a mgr up running again through cephadm?
>
> Thanks,
> Ben
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Frank Schilder
Hi Gregory,

using the more complicated rados approach, I found the path. I assume you are 
referring to attributes I can read with getfattr. The output of a dump is:

# getfattr -d 
/mnt/cephfs/shares/rit-oil/Projects/CSP/Chalk/CSP1.A.03/99_Personal\ 
folders/Eugenio/Tests/Eclipse/19_imbLab/19_IMBLAB.EGRID
getfattr: Removing leading '/' from absolute path names
# file: mnt/cephfs/shares/rit-oil/Projects/CSP/Chalk/CSP1.A.03/99_Personal 
folders/Eugenio/Tests/Eclipse/19_imbLab/19_IMBLAB.EGRID
user.DOSATTRIB=0sAAAFAAURIIfMCneZfdkB
user.SAMBA_PAI=0sAgSEDwABgYYejLcxAAAC/wABgYYejLcxABAAlFExABABlFExABAALPgoABABLPgoABAADEUvABABDEUvABAAllExABABllExABAAE9AqABABE9AqAA==

#
An empty line is part of the output. These look all right to me. Can you tell 
me what I should look at? I will probably reply tomorrow, my time for today is 
almost up.

Thanks for your help and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Gregory Farnum 
Sent: Wednesday, May 10, 2023 4:37 PM
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: mds dump inode crashes file system

On Wed, May 10, 2023 at 7:33 AM Frank Schilder  wrote:
>
> Hi Gregory,
>
> thanks for your reply. Yes, I forgot, I can also inspect the rados head 
> object. My bad.
>
> The empty xattr might come from a crash of the SAMBA daemon. We export to 
> windows and this uses xattrs extensively to map to windows ACLs. It might be 
> possible that a crash at an inconvenient moment left an object in this state. 
> Do you think this is possible? Would it be possible to repair that?

I'm still a little puzzled that it's possible for the system to get
into this state, so we probably will need to generate some bugfixes.
And it might just be the dump function is being naughty. But I would
start by looking at what xattrs exist and if there's an obvious bad
one, deleting it.
-Greg

>
> I will report back what I find with the low-level access. Need to head home 
> now ...
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Gregory Farnum 
> Sent: Wednesday, May 10, 2023 4:26 PM
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: mds dump inode crashes file system
>
> This is a very strange assert to be hitting. From a code skim my best
> guess is the inode somehow has an xattr with no value, but that's just
> a guess and I've no idea how it would happen.
> Somebody recently pointed you at the (more complicated) way of
> identifying an inode path by looking at its RADOS object and grabbing
> the backtrace, which ought to let you look at the file in-situ.
> -Greg
>
>
> On Wed, May 10, 2023 at 6:37 AM Frank Schilder  wrote:
> >
> > For the "mds dump inode" command I could find the crash in the log; see 
> > below. Most of the log contents is the past OPS dump from the 3 MDS 
> > restarts that happened. It contains the 1 last OPS before the crash and 
> > I can upload the log if someone can use it. The crash stack trace somewhat 
> > truncated for readability:
> >
> > 2023-05-10T12:54:53.142+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map 
> > to version 892464 from mon.4
> > 2023-05-10T13:39:50.962+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] 
> > : client.205899841 isn't responding to mclientcaps(revoke), ino 
> > 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 
> > seconds ago
> > 2023-05-10T13:39:52.550+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map 
> > to version 892465 from mon.4
> > 2023-05-10T13:40:50.963+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] 
> > : client.205899841 isn't responding to mclientcaps(revoke), ino 
> > 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 121.706193 
> > seconds ago
> > 2023-05-10T13:42:50.966+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] 
> > : client.205899841 isn't responding to mclientcaps(revoke), ino 
> > 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 241.709072 
> > seconds ago
> > 2023-05-10T13:44:00.506+0200 7fe972ca8700  1 mds.ceph-23 asok_command: dump 
> > inode {number=2199322355147,prefix=dump inode} (starting...)
> > 2023-05-10T13:44:00.520+0200 7fe972ca8700 -1 
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
> >  In function 'const char* ceph::buffer::v15_2_0::ptr::c_str() const' thread 
> > 7fe972ca8700 time 2023-05-10T13:44:00.507652+0200
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
> >  501: FAILED ceph_assert(_raw)
> >
> >  ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) oct

[ceph-users] Ceph Leadership Team Meeting Minutes - May 10, 2023

2023-05-10 Thread Ernesto Puerta
Hi Cephers,

These are the topics that we just covered in today's meeting:

   - *Issues recording our meetings in Jitsi (Mike Perez)*
  - David Orman suggested using a self-hosted Jitsi instance:
  https://jitsi.github.io/handbook/docs/devops-guide/. Tested on a
  single container, with 4 cores for 4-5 attendees a call.
  - Help needed from Dan Mick & Adam Kraitman.
  - Ilya: this might scale for smaller meetings (dailies), but not for
  the larger ones (CDM, User-Devs, etc.).
   - *Reef release status review*
  - Josh: CentOS 9 blocked by Dashboard/Cephadm Python deps missing in
  CentOS 9. Casey: It's blocking teuthology testing.
 - Already discussed in devs mailing list a couple of months ago,
 for Quincy + Centos 9 (the missing packages would be the same).
 - Ken Dreyer preferred to keep the legacy approach (distro
 packages) instead of embedding Python deps.
 - Casey, Matt & Ernesto to resume that discussion with Ken Dreyer.
  - Other issues:
 - Radek:
- Performance issue with RocksDB config.
- Mismatching client-server features during upgrade. msgr
encoder-decoder issues, introduced feature bit for Squid
(https://github.com/ceph/ceph/commit/1049d3e5eff0b7fa4fc9e5853494cb21c10b290a).
Performance concerns. To be further discussed at Mark's perf
meeting (incl. Yuval)
 - Paul Cuzner perf testing Reef: higher CPU usage in Reef vs
 Quincy (more time spent in RocksDB get calls)
  - Target GA remains this June. Missing CentOS 9 packages becoming a
  blocker issue for the release.
   - *Perf regression in Pacific vs Nautilus (David Orman)*
  - https://tracker.ceph.com/issues/58530
  - Mark: Missing change in upstream Rocksdb project.
  - David Orman: Is this degradation still happening in newer Rocksdb
  versions (reef)? Mark: No reason to think otherwise.

Kind Regards,
Ernesto
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] docker restarting lost all managers accidentally

2023-05-10 Thread Ben
Hi,
This cluster is deployed by cephadm 17.2.5, containerized.
It ends up in this state (no active mgr):
[root@8cd2c0657c77 /]# ceph -s
  cluster:
id: ad3a132e-e9ee-11ed-8a19-043f72fb8bf9
health: HEALTH_WARN
6 hosts fail cephadm check
no active mgr
1/3 mons down, quorum h18w,h19w
Degraded data redundancy: 781908/2345724 objects degraded
(33.333%), 101 pgs degraded, 209 pgs undersized

  services:
mon: 3 daemons, quorum h18w,h19w (age 19m), out of quorum: h15w
mgr: no daemons active (since 5h)
mds: 1/1 daemons up, 1 standby
osd: 9 osds: 6 up (since 5h), 6 in (since 5h)
rgw: 2 daemons active (2 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   8 pools, 209 pgs
objects: 781.91k objects, 152 GiB
usage:   312 GiB used, 54 TiB / 55 TiB avail
pgs: 781908/2345724 objects degraded (33.333%)
 108 active+undersized
 101 active+undersized+degraded

I checked the h20w, there is a manager container running with log:

debug 2023-05-10T12:43:23.315+ 7f5e152ec000  0 monclient(hunting):
authenticate timed out after 300

debug 2023-05-10T12:48:23.318+ 7f5e152ec000  0 monclient(hunting):
authenticate timed out after 300

debug 2023-05-10T12:53:23.318+ 7f5e152ec000  0 monclient(hunting):
authenticate timed out after 300

debug 2023-05-10T12:58:23.319+ 7f5e152ec000  0 monclient(hunting):
authenticate timed out after 300

debug 2023-05-10T13:03:23.319+ 7f5e152ec000  0 monclient(hunting):
authenticate timed out after 300

debug 2023-05-10T13:08:23.319+ 7f5e152ec000  0 monclient(hunting):
authenticate timed out after 300

debug 2023-05-10T13:13:23.319+ 7f5e152ec000  0 monclient(hunting):
authenticate timed out after 300


Any idea how to get a mgr up and running again through cephadm?

Thanks,
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Gregory Farnum
On Wed, May 10, 2023 at 7:33 AM Frank Schilder  wrote:
>
> Hi Gregory,
>
> thanks for your reply. Yes, I forgot, I can also inspect the rados head 
> object. My bad.
>
> The empty xattr might come from a crash of the SAMBA daemon. We export to 
> windows and this uses xattrs extensively to map to windows ACLs. It might be 
> possible that a crash at an inconvenient moment left an object in this state. 
> Do you think this is possible? Would it be possible to repair that?

I'm still a little puzzled that it's possible for the system to get
into this state, so we probably will need to generate some bugfixes.
And it might just be the dump function is being naughty. But I would
start by looking at what xattrs exist and if there's an obvious bad
one, deleting it.
-Greg
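If it helps, a minimal way to inspect and clean up from the client side (getfattr 
only dumps the user.* namespace by default, so '-m -' is needed to see everything; 
setfattr -x removes a named attribute; the path and attribute name below are 
placeholders):

# getfattr -m - -d /mnt/cephfs/path/to/suspect-file
# setfattr -x user.SOME_BAD_ATTR /mnt/cephfs/path/to/suspect-file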

>
> I will report back what I find with the low-level access. Need to head home 
> now ...
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Gregory Farnum 
> Sent: Wednesday, May 10, 2023 4:26 PM
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: mds dump inode crashes file system
>
> This is a very strange assert to be hitting. From a code skim my best
> guess is the inode somehow has an xattr with no value, but that's just
> a guess and I've no idea how it would happen.
> Somebody recently pointed you at the (more complicated) way of
> identifying an inode path by looking at its RADOS object and grabbing
> the backtrace, which ought to let you look at the file in-situ.
> -Greg
>
>
> On Wed, May 10, 2023 at 6:37 AM Frank Schilder  wrote:
> >
> > For the "mds dump inode" command I could find the crash in the log; see 
> > below. Most of the log contents is the past OPS dump from the 3 MDS 
> > restarts that happened. It contains the 1 last OPS before the crash and 
> > I can upload the log if someone can use it. The crash stack trace somewhat 
> > truncated for readability:
> >
> > 2023-05-10T12:54:53.142+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map 
> > to version 892464 from mon.4
> > 2023-05-10T13:39:50.962+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] 
> > : client.205899841 isn't responding to mclientcaps(revoke), ino 
> > 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 
> > seconds ago
> > 2023-05-10T13:39:52.550+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map 
> > to version 892465 from mon.4
> > 2023-05-10T13:40:50.963+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] 
> > : client.205899841 isn't responding to mclientcaps(revoke), ino 
> > 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 121.706193 
> > seconds ago
> > 2023-05-10T13:42:50.966+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] 
> > : client.205899841 isn't responding to mclientcaps(revoke), ino 
> > 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 241.709072 
> > seconds ago
> > 2023-05-10T13:44:00.506+0200 7fe972ca8700  1 mds.ceph-23 asok_command: dump 
> > inode {number=2199322355147,prefix=dump inode} (starting...)
> > 2023-05-10T13:44:00.520+0200 7fe972ca8700 -1 
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
> >  In function 'const char* ceph::buffer::v15_2_0::ptr::c_str() const' thread 
> > 7fe972ca8700 time 2023-05-10T13:44:00.507652+0200
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
> >  501: FAILED ceph_assert(_raw)
> >
> >  ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus 
> > (stable)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> > const*)+0x158) [0x7fe979ae9b92]
> >  2: (()+0x27ddac) [0x7fe979ae9dac]
> >  3: (()+0x5ce831) [0x7fe979e3a831]
> >  4: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
> >  5: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
> >  6: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) 
> > [0x55c08c41e00c]
> >  7: (MDSRank::command_dump_inode(ceph::Formatter*, ..., 
> > std::ostream&)+0xb5) [0x55c08c353e75]
> >  8: (MDSRankDispatcher::handle_asok_command(std::basic_string_view > std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) 
> > [0x55c08c36c5f6]
> >  9: (MDSDaemon::asok_command(std::basic_string_view > ceph::buffer::v15_2_0::list&)>)+0x75b) [0x55c08c340eab]
> >  10: (MDSSocketHook::call_async(std::basic_string_view > std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x6a) 
> > [0x55c08c34f9ca]
> >  11: 
> > (AdminSocket::execute_command(std::vector > std::char_traits, ..., ceph::buffer::v15_2_0::list&)>)+0x6f9) 
> > [0x7fe979bece59]
> >  12: (AdminSocket::do_tell_queue()+0x289) [0x7fe979bed809]
> >  13

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Frank Schilder
Hi Gregory,

thanks for your reply. Yes, I forgot, I can also inspect the rados head object. 
My bad.

The empty xattr might come from a crash of the SAMBA daemon. We export to 
windows and this uses xattrs extensively to map to windows ACLs. It might be 
possible that a crash at an inconvenient moment left an object in this state. 
Do you think this is possible? Would it be possible to repair that?

I will report back what I find with the low-level access. Need to head home now 
...

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Gregory Farnum 
Sent: Wednesday, May 10, 2023 4:26 PM
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: mds dump inode crashes file system

This is a very strange assert to be hitting. From a code skim my best
guess is the inode somehow has an xattr with no value, but that's just
a guess and I've no idea how it would happen.
Somebody recently pointed you at the (more complicated) way of
identifying an inode path by looking at its RADOS object and grabbing
the backtrace, which ought to let you look at the file in-situ.
-Greg


On Wed, May 10, 2023 at 6:37 AM Frank Schilder  wrote:
>
> For the "mds dump inode" command I could find the crash in the log; see 
> below. Most of the log contents is the past OPS dump from the 3 MDS restarts 
> that happened. It contains the 1 last OPS before the crash and I can 
> upload the log if someone can use it. The crash stack trace somewhat 
> truncated for readability:
>
> 2023-05-10T12:54:53.142+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
> version 892464 from mon.4
> 2023-05-10T13:39:50.962+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
> client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago
> 2023-05-10T13:39:52.550+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
> version 892465 from mon.4
> 2023-05-10T13:40:50.963+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
> client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 121.706193 seconds ago
> 2023-05-10T13:42:50.966+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
> client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 241.709072 seconds ago
> 2023-05-10T13:44:00.506+0200 7fe972ca8700  1 mds.ceph-23 asok_command: dump 
> inode {number=2199322355147,prefix=dump inode} (starting...)
> 2023-05-10T13:44:00.520+0200 7fe972ca8700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
>  In function 'const char* ceph::buffer::v15_2_0::ptr::c_str() const' thread 
> 7fe972ca8700 time 2023-05-10T13:44:00.507652+0200
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
>  501: FAILED ceph_assert(_raw)
>
>  ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus 
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x158) [0x7fe979ae9b92]
>  2: (()+0x27ddac) [0x7fe979ae9dac]
>  3: (()+0x5ce831) [0x7fe979e3a831]
>  4: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
>  5: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
>  6: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) 
> [0x55c08c41e00c]
>  7: (MDSRank::command_dump_inode(ceph::Formatter*, ..., std::ostream&)+0xb5) 
> [0x55c08c353e75]
>  8: (MDSRankDispatcher::handle_asok_command(std::basic_string_view std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) 
> [0x55c08c36c5f6]
>  9: (MDSDaemon::asok_command(std::basic_string_view ceph::buffer::v15_2_0::list&)>)+0x75b) [0x55c08c340eab]
>  10: (MDSSocketHook::call_async(std::basic_string_view std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x6a) 
> [0x55c08c34f9ca]
>  11: 
> (AdminSocket::execute_command(std::vector std::char_traits, ..., ceph::buffer::v15_2_0::list&)>)+0x6f9) 
> [0x7fe979bece59]
>  12: (AdminSocket::do_tell_queue()+0x289) [0x7fe979bed809]
>  13: (AdminSocket::entry()+0x4d3) [0x7fe979beefd3]
>  14: (()+0xc2ba3) [0x7fe977afaba3]
>  15: (()+0x81ca) [0x7fe9786bf1ca]
>  16: (clone()+0x43) [0x7fe977111dd3]
>
> 2023-05-10T13:44:00.522+0200 7fe972ca8700 -1 *** Caught signal (Aborted) **
>  in thread 7fe972ca8700 thread_name:admin_socket
>
>  ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus 
> (stable)
>  1: (()+0x12ce0) [0x7fe9786c9ce0]
>  2: (gsignal()+0x10f) [0x7fe977126a9f]
>  3: (abort()+0x127) [0x7fe9770f9e05]
>  4: (ceph::__ceph_assert_fail(char const*, char 

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Gregory Farnum
This is a very strange assert to be hitting. From a code skim my best
guess is the inode somehow has an xattr with no value, but that's just
a guess and I've no idea how it would happen.
Somebody recently pointed you at the (more complicated) way of
identifying an inode path by looking at its RADOS object and grabbing
the backtrace, which ought to let you look at the file in-situ.
-Greg
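For reference, the backtrace lookup would look something like this for a regular 
file (the object name is the inode number in hex followed by .00000000 in the 
data pool, and its "parent" xattr holds an inode_backtrace_t that ceph-dencoder 
can decode; the pool name below is a placeholder):

# rados -p <cephfs-data-pool> getxattr 20011d3e5cb.00000000 parent > parent.bin
# ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json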


On Wed, May 10, 2023 at 6:37 AM Frank Schilder  wrote:
>
> For the "mds dump inode" command I could find the crash in the log; see 
> below. Most of the log contents is the past OPS dump from the 3 MDS restarts 
> that happened. It contains the 1 last OPS before the crash and I can 
> upload the log if someone can use it. The crash stack trace somewhat 
> truncated for readability:
>
> 2023-05-10T12:54:53.142+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
> version 892464 from mon.4
> 2023-05-10T13:39:50.962+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
> client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago
> 2023-05-10T13:39:52.550+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
> version 892465 from mon.4
> 2023-05-10T13:40:50.963+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
> client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 121.706193 seconds ago
> 2023-05-10T13:42:50.966+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
> client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 241.709072 seconds ago
> 2023-05-10T13:44:00.506+0200 7fe972ca8700  1 mds.ceph-23 asok_command: dump 
> inode {number=2199322355147,prefix=dump inode} (starting...)
> 2023-05-10T13:44:00.520+0200 7fe972ca8700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
>  In function 'const char* ceph::buffer::v15_2_0::ptr::c_str() const' thread 
> 7fe972ca8700 time 2023-05-10T13:44:00.507652+0200
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
>  501: FAILED ceph_assert(_raw)
>
>  ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus 
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x158) [0x7fe979ae9b92]
>  2: (()+0x27ddac) [0x7fe979ae9dac]
>  3: (()+0x5ce831) [0x7fe979e3a831]
>  4: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
>  5: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
>  6: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) 
> [0x55c08c41e00c]
>  7: (MDSRank::command_dump_inode(ceph::Formatter*, ..., std::ostream&)+0xb5) 
> [0x55c08c353e75]
>  8: (MDSRankDispatcher::handle_asok_command(std::basic_string_view std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) 
> [0x55c08c36c5f6]
>  9: (MDSDaemon::asok_command(std::basic_string_view ceph::buffer::v15_2_0::list&)>)+0x75b) [0x55c08c340eab]
>  10: (MDSSocketHook::call_async(std::basic_string_view std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x6a) 
> [0x55c08c34f9ca]
>  11: 
> (AdminSocket::execute_command(std::vector std::char_traits, ..., ceph::buffer::v15_2_0::list&)>)+0x6f9) 
> [0x7fe979bece59]
>  12: (AdminSocket::do_tell_queue()+0x289) [0x7fe979bed809]
>  13: (AdminSocket::entry()+0x4d3) [0x7fe979beefd3]
>  14: (()+0xc2ba3) [0x7fe977afaba3]
>  15: (()+0x81ca) [0x7fe9786bf1ca]
>  16: (clone()+0x43) [0x7fe977111dd3]
>
> 2023-05-10T13:44:00.522+0200 7fe972ca8700 -1 *** Caught signal (Aborted) **
>  in thread 7fe972ca8700 thread_name:admin_socket
>
>  ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus 
> (stable)
>  1: (()+0x12ce0) [0x7fe9786c9ce0]
>  2: (gsignal()+0x10f) [0x7fe977126a9f]
>  3: (abort()+0x127) [0x7fe9770f9e05]
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x1a9) [0x7fe979ae9be3]
>  5: (()+0x27ddac) [0x7fe979ae9dac]
>  6: (()+0x5ce831) [0x7fe979e3a831]
>  7: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
>  8: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
>  9: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) 
> [0x55c08c41e00c]
>  10: (MDSRank::command_dump_inode(ceph::Formatter*, ..., std::ostream&)+0xb5) 
> [0x55c08c353e75]
>  11: (MDSRankDispatcher::handle_asok_command(std::basic_string_view std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) 
> [0x55c08c36c5f6]
>  12: (MDSDaemon::asok_command(std::basic_string_view std::char_traits >, ..., ceph::buffer::v15_2_0::list&)>)+0x75b) 
> [0x55c08c340eab]
>  13: (MDSSocketHook::call_async(std::

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Frank Schilder
Now I tested whether an MDS fail-over to a stand-by changes anything. Unfortunately, 
it doesn't. The MDS ceph-23 failed over to ceph-10, and on this new MDS I 
observe the same crash/cache corruption after the fail-over completed:

# ceph tell "mds.ceph-10" dump inode 0x20011d3e5cb
2023-05-10T16:04:09.861+0200 7f5d0affd700  0 client.210152749 ms_handle_reset 
on v2:192.168.32.74:6800/3586644765
^C

MDS ceph-10 restarts at this point and goes into a restart loop until the 
tell command gets interrupted. To check whether it's only this MDS and only this 
specific inode:

# ceph tell "mds.ceph-14" dump inode 0x001
{
"path": "/",
...

# ceph tell "mds.ceph-10" dump inode 0x001
{
"path": "/",

I'm afraid this might be file system corruption. Please let me know what the 
best next steps would be. Right now I'm not confident about starting an fs 
scrub on the MDS that is serving this share (it's pinned to a rank); I'm afraid 
I would end up in a crash loop. Is it possible to pick a stand-by to do the 
scrubbing?
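For reference, the scrub command I have in mind would be along these lines 
(octopus syntax as far as I understand it; it has to be addressed to an MDS 
holding an active rank, which is exactly what I'm unsure about here, and the 
path is the path inside cephfs, not the client mount point):

# ceph tell mds.ceph-10 scrub start <path inside cephfs> recursive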

Many thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Wednesday, May 10, 2023 3:44 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: mds dump inode crashes file system

The kernel message that shows up on boot on the file server in text format:

May 10 13:56:59 rit-pfile01 kernel: WARNING: CPU: 3 PID: 34 at 
fs/ceph/caps.c:689 ceph_add_cap+0x53e/0x550 [ceph]
May 10 13:56:59 rit-pfile01 kernel: Modules linked in: ceph libceph 
dns_resolver nls_utf8 isofs cirrus drm_shmem_helper intel_rapl_msr iTCO_wdt 
intel_rapl_common iTCO_vendor_support drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops pcspkr joydev virtio_net drm i2c_i801 net_failover 
virtio_balloon failover lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc 
sr_mod cdrom sg xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci 
libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console 
virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
May 10 13:56:59 rit-pfile01 kernel: CPU: 3 PID: 34 Comm: kworker/3:0 Not 
tainted 4.18.0-486.el8.x86_64 #1
May 10 13:56:59 rit-pfile01 kernel: Hardware name: Red Hat KVM/RHEL-AV, BIOS 
1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
May 10 13:56:59 rit-pfile01 kernel: Workqueue: ceph-msgr ceph_con_workfn 
[libceph]
May 10 13:56:59 rit-pfile01 kernel: RIP: 0010:ceph_add_cap+0x53e/0x550 [ceph]
May 10 13:56:59 rit-pfile01 kernel: Code: c0 48 c7 c7 c0 69 7f c0 e8 6c 4c 72 
c3 0f 0b 44 89 7c 24 04 e9 7e fc ff ff 44 8b 7c 24 04 e9 68 fe ff ff 0f 0b e9 
c9 fc ff ff <0f> 0b e9 0a fe ff ff 0f 0b e9 12 fe ff ff 0f 0b 66 90 0f 1f 44 00
May 10 13:56:59 rit-pfile01 kernel: RSP: 0018:a4d000d87b48 EFLAGS: 00010217
May 10 13:56:59 rit-pfile01 kernel: RAX:  RBX: 0005 
RCX: dead0200
May 10 13:56:59 rit-pfile01 kernel: RDX: 92d7d7f6e7d0 RSI: 92d7d7f6e7d0 
RDI: 92d7d7f6e7c8
May 10 13:56:59 rit-pfile01 kernel: RBP: 92d7c5588970 R08: 92d7d7f6e7d0 
R09: 0001
May 10 13:56:59 rit-pfile01 kernel: R10: 92d80078cbb8 R11: 92c0 
R12: 0155
May 10 13:56:59 rit-pfile01 kernel: R13: 92d80078cbb8 R14: 92d80078cbc0 
R15: 0001
May 10 13:56:59 rit-pfile01 kernel: FS:  () 
GS:92d937d8() knlGS:
May 10 13:56:59 rit-pfile01 kernel: CS:  0010 DS:  ES:  CR0: 
80050033
May 10 13:56:59 rit-pfile01 kernel: CR2: 7f74435b9008 CR3: 0001099fa000 
CR4: 003506e0
May 10 13:56:59 rit-pfile01 kernel: Call Trace:
May 10 13:56:59 rit-pfile01 kernel: ceph_handle_caps+0xdf2/0x1780 [ceph]
May 10 13:56:59 rit-pfile01 kernel: mds_dispatch+0x13a/0x670 [ceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_process_message+0x79/0x140 
[libceph]
May 10 13:56:59 rit-pfile01 kernel: ? calc_signature+0xdf/0x110 [libceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_workfn+0x329/0x680 [libceph]
May 10 13:56:59 rit-pfile01 kernel: process_one_work+0x1a7/0x360
May 10 13:56:59 rit-pfile01 kernel: worker_thread+0x30/0x390
May 10 13:56:59 rit-pfile01 kernel: ? create_worker+0x1a0/0x1a0
May 10 13:56:59 rit-pfile01 kernel: kthread+0x134/0x150
May 10 13:56:59 rit-pfile01 kernel: ? set_kthread_struct+0x50/0x50
May 10 13:56:59 rit-pfile01 kernel: ret_from_fork+0x35/0x40
May 10 13:56:59 rit-pfile01 kernel: ---[ end trace 84e4b3694bbe9fde ]---

I can't interpret it, some help is appreciated.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Wednesday, May 10, 2023 3:36 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: mds dump inode crashes file system

For the "mds dump inode" command I could find the crash in the log; see below. 
Most of the log contents

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Frank Schilder
The kernel message that shows up on boot on the file server in text format:

May 10 13:56:59 rit-pfile01 kernel: WARNING: CPU: 3 PID: 34 at 
fs/ceph/caps.c:689 ceph_add_cap+0x53e/0x550 [ceph]
May 10 13:56:59 rit-pfile01 kernel: Modules linked in: ceph libceph 
dns_resolver nls_utf8 isofs cirrus drm_shmem_helper intel_rapl_msr iTCO_wdt 
intel_rapl_common iTCO_vendor_support drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops pcspkr joydev virtio_net drm i2c_i801 net_failover 
virtio_balloon failover lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc 
sr_mod cdrom sg xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci 
libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console 
virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
May 10 13:56:59 rit-pfile01 kernel: CPU: 3 PID: 34 Comm: kworker/3:0 Not 
tainted 4.18.0-486.el8.x86_64 #1
May 10 13:56:59 rit-pfile01 kernel: Hardware name: Red Hat KVM/RHEL-AV, BIOS 
1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
May 10 13:56:59 rit-pfile01 kernel: Workqueue: ceph-msgr ceph_con_workfn 
[libceph]
May 10 13:56:59 rit-pfile01 kernel: RIP: 0010:ceph_add_cap+0x53e/0x550 [ceph]
May 10 13:56:59 rit-pfile01 kernel: Code: c0 48 c7 c7 c0 69 7f c0 e8 6c 4c 72 
c3 0f 0b 44 89 7c 24 04 e9 7e fc ff ff 44 8b 7c 24 04 e9 68 fe ff ff 0f 0b e9 
c9 fc ff ff <0f> 0b e9 0a fe ff ff 0f 0b e9 12 fe ff ff 0f 0b 66 90 0f 1f 44 00
May 10 13:56:59 rit-pfile01 kernel: RSP: 0018:a4d000d87b48 EFLAGS: 00010217
May 10 13:56:59 rit-pfile01 kernel: RAX:  RBX: 0005 
RCX: dead0200
May 10 13:56:59 rit-pfile01 kernel: RDX: 92d7d7f6e7d0 RSI: 92d7d7f6e7d0 
RDI: 92d7d7f6e7c8
May 10 13:56:59 rit-pfile01 kernel: RBP: 92d7c5588970 R08: 92d7d7f6e7d0 
R09: 0001
May 10 13:56:59 rit-pfile01 kernel: R10: 92d80078cbb8 R11: 92c0 
R12: 0155
May 10 13:56:59 rit-pfile01 kernel: R13: 92d80078cbb8 R14: 92d80078cbc0 
R15: 0001
May 10 13:56:59 rit-pfile01 kernel: FS:  () 
GS:92d937d8() knlGS:
May 10 13:56:59 rit-pfile01 kernel: CS:  0010 DS:  ES:  CR0: 
80050033
May 10 13:56:59 rit-pfile01 kernel: CR2: 7f74435b9008 CR3: 0001099fa000 
CR4: 003506e0
May 10 13:56:59 rit-pfile01 kernel: Call Trace:
May 10 13:56:59 rit-pfile01 kernel: ceph_handle_caps+0xdf2/0x1780 [ceph]
May 10 13:56:59 rit-pfile01 kernel: mds_dispatch+0x13a/0x670 [ceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_process_message+0x79/0x140 
[libceph]
May 10 13:56:59 rit-pfile01 kernel: ? calc_signature+0xdf/0x110 [libceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
May 10 13:56:59 rit-pfile01 kernel: ceph_con_workfn+0x329/0x680 [libceph]
May 10 13:56:59 rit-pfile01 kernel: process_one_work+0x1a7/0x360
May 10 13:56:59 rit-pfile01 kernel: worker_thread+0x30/0x390
May 10 13:56:59 rit-pfile01 kernel: ? create_worker+0x1a0/0x1a0
May 10 13:56:59 rit-pfile01 kernel: kthread+0x134/0x150
May 10 13:56:59 rit-pfile01 kernel: ? set_kthread_struct+0x50/0x50
May 10 13:56:59 rit-pfile01 kernel: ret_from_fork+0x35/0x40
May 10 13:56:59 rit-pfile01 kernel: ---[ end trace 84e4b3694bbe9fde ]---

I can't interpret it, some help is appreciated.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Wednesday, May 10, 2023 3:36 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: mds dump inode crashes file system

For the "mds dump inode" command I could find the crash in the log; see below. 
Most of the log contents is the past OPS dump from the 3 MDS restarts that 
happened. It contains the 1 last OPS before the crash and I can upload the 
log if someone can use it. The crash stack trace somewhat truncated for 
readability:

2023-05-10T12:54:53.142+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
version 892464 from mon.4
2023-05-10T13:39:50.962+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago
2023-05-10T13:39:52.550+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
version 892465 from mon.4
2023-05-10T13:40:50.963+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
pending pAsLsXsFscr issued pAsLsXsFscr, sent 121.706193 seconds ago
2023-05-10T13:42:50.966+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
pending pAsLsXsFscr issued pAsLsXsFscr, sent 241.709072 seconds ago
2023-05-10T13:44:00.506+0200 7fe972ca8700  1 mds.ceph-23 asok_command: dump 
inode {number=2199322355147,prefix=dump inode} (starting...)
2023-05-10T13:44:00.520+0200 7fe972ca8700 -1 
/home/jenkins-build/build/workspace/ceph

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Frank Schilder
For the "mds dump inode" command I could find the crash in the log; see below. 
Most of the log contents is the past OPS dump from the 3 MDS restarts that 
happened. It contains the 1 last OPS before the crash and I can upload the 
log if someone can use it. The crash stack trace somewhat truncated for 
readability:

2023-05-10T12:54:53.142+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
version 892464 from mon.4
2023-05-10T13:39:50.962+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago
2023-05-10T13:39:52.550+0200 7fe971ca6700  1 mds.ceph-23 Updating MDS map to 
version 892465 from mon.4
2023-05-10T13:40:50.963+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
pending pAsLsXsFscr issued pAsLsXsFscr, sent 121.706193 seconds ago
2023-05-10T13:42:50.966+0200 7fe96fca2700  0 log_channel(cluster) log [WRN] : 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
pending pAsLsXsFscr issued pAsLsXsFscr, sent 241.709072 seconds ago
2023-05-10T13:44:00.506+0200 7fe972ca8700  1 mds.ceph-23 asok_command: dump 
inode {number=2199322355147,prefix=dump inode} (starting...)
2023-05-10T13:44:00.520+0200 7fe972ca8700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
 In function 'const char* ceph::buffer::v15_2_0::ptr::c_str() const' thread 
7fe972ca8700 time 2023-05-10T13:44:00.507652+0200
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc:
 501: FAILED ceph_assert(_raw)

 ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus 
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x158) [0x7fe979ae9b92]
 2: (()+0x27ddac) [0x7fe979ae9dac]
 3: (()+0x5ce831) [0x7fe979e3a831]
 4: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
 5: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
 6: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) [0x55c08c41e00c]
 7: (MDSRank::command_dump_inode(ceph::Formatter*, ..., std::ostream&)+0xb5) 
[0x55c08c353e75]
 8: (MDSRankDispatcher::handle_asok_command(std::basic_string_view >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) 
[0x55c08c36c5f6]
 9: (MDSDaemon::asok_command(std::basic_string_view)+0x75b) [0x55c08c340eab]
 10: (MDSSocketHook::call_async(std::basic_string_view >, ..., ceph::buffer::v15_2_0::list&)>)+0x6a) 
[0x55c08c34f9ca]
 11: (AdminSocket::execute_command(std::vector, ..., ceph::buffer::v15_2_0::list&)>)+0x6f9) 
[0x7fe979bece59]
 12: (AdminSocket::do_tell_queue()+0x289) [0x7fe979bed809]
 13: (AdminSocket::entry()+0x4d3) [0x7fe979beefd3]
 14: (()+0xc2ba3) [0x7fe977afaba3]
 15: (()+0x81ca) [0x7fe9786bf1ca]
 16: (clone()+0x43) [0x7fe977111dd3]

2023-05-10T13:44:00.522+0200 7fe972ca8700 -1 *** Caught signal (Aborted) **
 in thread 7fe972ca8700 thread_name:admin_socket

 ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus 
(stable)
 1: (()+0x12ce0) [0x7fe9786c9ce0]
 2: (gsignal()+0x10f) [0x7fe977126a9f]
 3: (abort()+0x127) [0x7fe9770f9e05]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x1a9) [0x7fe979ae9be3]
 5: (()+0x27ddac) [0x7fe979ae9dac]
 6: (()+0x5ce831) [0x7fe979e3a831]
 7: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
 8: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
 9: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) [0x55c08c41e00c]
 10: (MDSRank::command_dump_inode(ceph::Formatter*, ..., std::ostream&)+0xb5) 
[0x55c08c353e75]
 11: (MDSRankDispatcher::handle_asok_command(std::basic_string_view >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) 
[0x55c08c36c5f6]
 12: (MDSDaemon::asok_command(std::basic_string_view >, ..., ceph::buffer::v15_2_0::list&)>)+0x75b) 
[0x55c08c340eab]
 13: (MDSSocketHook::call_async(std::basic_string_view >, ..., ceph::buffer::v15_2_0::list&)>)+0x6a) 
[0x55c08c34f9ca]
 14: (AdminSocket::execute_command(std::vector, ..., ceph::buffer::v15_2_0::list&)>)+0x6f9) 
[0x7fe979bece59]
 15: (AdminSocket::do_tell_queue()+0x289) [0x7fe979bed809]
 16: (AdminSocket::entry()+0x4d3) [0x7fe979beefd3]
 17: (()+0xc2ba3) [0x7fe977afaba3]
 18: (()+0x81ca) [0x7fe9786bf1ca]
 19: (clone()+0x43) [0x7fe977111dd3]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Wednesday, May 10, 2023 2:33 PM
To: ceph-users@ceph.io
Subject: [ceph-users] mds dump inode crashes file system

[ceph-users] mds dump inode crashes file system

2023-05-10 Thread Frank Schilder
Hi all,

I have an annoying problem with a specific ceph fs client. We have a file 
server on which we re-export kernel mounts via samba (all mounts with noshare 
option). On one of these re-exports we have recurring problems. Today I caught 
it with

2023-05-10T13:39:50.963685+0200 mds.ceph-23 (mds.1) 1761 : cluster [WRN] 
client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb 
pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago

and I wanted to look up what path the inode 0x20011d3e5cb points to. 
Unfortunately, the command

ceph tell "mds.*" dump inode 0x20011d3e5cb

crashes an MDS in a way that it restarts itself, but doesn't seem to come back 
clean (it does not fail over to a stand-by). If I repeat the command above, it 
crashes the MDS again. Execution on other MDS daemons succeeds, for example:

# ceph tell "mds.ceph-24" dump inode 0x20011d3e5cb
2023-05-10T14:14:37.091+0200 7fa47700  0 client.210149523 ms_handle_reset 
on v2:192.168.32.88:6800/3216233914
2023-05-10T14:14:37.124+0200 7fa4857fa700  0 client.210374440 ms_handle_reset 
on v2:192.168.32.88:6800/3216233914
dump inode failed, wrong inode number or the inode is not cached
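
(As a possible workaround for mapping the inode to a path without asking the MDS, reading 
the backtrace xattr of the inode's first object directly from RADOS should work. A rough 
sketch, assuming the default data pool name "cephfs_data" - for a directory inode the 
object lives in the metadata pool instead:)

rados -p cephfs_data getxattr 20011d3e5cb.00000000 parent > parent.bin
ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json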

The caps recall gets the client evicted at some point but it doesn't manage to 
come back clean. On a single ceph fs mount point I see this

# ls /shares/samba/rit-oil
ls: cannot access '/shares/samba/rit-oil': Stale file handle

All other mount points are fine, just this one acts up. A "mount -o remount 
/shares/samba/rit-oil" crashed the entire server and I had to do a cold reboot. 
On reboot I see this message: https://imgur.com/a/bOSLxBb , which only occurs 
on this one file server (we are running a few of those). Does this point to a 
more serious problem, like a file system corruption? Should I try an fs scrub 
on the corresponding path?

Some info about the system:

The file server's kernel version is quite recent, updated two weeks ago:

$ uname -r
4.18.0-486.el8.x86_64
# cat /etc/redhat-release 
CentOS Stream release 8

Our ceph cluster is octopus latest and we use the packages from the octopus el8 
repo on this server.

We have several such shares and they all work fine. It is only on one share 
where we have persistent problems with the mount point hanging or the server 
freezing and crashing.

After working hours I will try a proper fail of the "broken" MDS to see if I 
can execute the dump inode command without it crashing everything.

In the mean time, any hints would be appreciated. I see that we have an 
exceptionally large MDS log for the problematic one. Any hint what to look for 
would be appreciated, it contains a lot from the recovery operations:

# pdsh -w ceph-[08-17,23-24] ls -lh "/var/log/ceph/ceph-mds.ceph-??.log"

ceph-23: -rw-r--r--. 1 ceph ceph 15M May 10 14:28 
/var/log/ceph/ceph-mds.ceph-23.log *** huge ***

ceph-24: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 
/var/log/ceph/ceph-mds.ceph-24.log
ceph-10: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 
/var/log/ceph/ceph-mds.ceph-10.log
ceph-13: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 
/var/log/ceph/ceph-mds.ceph-13.log
ceph-08: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 
/var/log/ceph/ceph-mds.ceph-08.log
ceph-15: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 
/var/log/ceph/ceph-mds.ceph-15.log
ceph-17: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 
/var/log/ceph/ceph-mds.ceph-17.log
ceph-14: -rw-r--r--. 1 ceph ceph 16K May 10 14:28 
/var/log/ceph/ceph-mds.ceph-14.log
ceph-09: -rw-r--r--. 1 ceph ceph 16K May 10 14:28 
/var/log/ceph/ceph-mds.ceph-09.log
ceph-16: -rw-r--r--. 1 ceph ceph 15K May 10 14:28 
/var/log/ceph/ceph-mds.ceph-16.log
ceph-11: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 
/var/log/ceph/ceph-mds.ceph-11.log
ceph-12: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 
/var/log/ceph/ceph-mds.ceph-12.log

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I use not-replicated pool (replication 1 or raid-0)

2023-05-10 Thread mhnx
I'm talking about bluestore db+wal caching. It's good to know the cache
tier is deprecated now, I should check why.

It's not possible because I don't have enough slots on the servers. I'm
considering buying an NVMe in PCIe card form.
Now I'm trying to speed up the rep 2 pool for millions of small files
with sizes between 10K and 700K.
With compression the write speed is reduced by 5% but the delete speed is
increased by 30%.
Do you have any tuning advice for me?

Best regards,

Frank Schilder  wrote on Tue, 9 May 2023 at 11:02:
>
> When you say cache device, do you mean a ceph cache pool as a tier to a rep-2 
> pool? If so, you might want to reconsider, cache pools are deprecated and 
> will be removed from ceph at some point.
>
> If you have funds to buy new drives, you can just as well deploy a beegfs (or 
> something else) on these. It is no problem to run ceph and beegfs on the same 
> hosts. The disks should not be shared, but that's all. This might still be a 
> simpler config than introducing a cache tier just to cover up for rep-2 
> overhead.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: mhnx 
> Sent: Friday, May 5, 2023 9:26 PM
> To: Frank Schilder
> Cc: Janne Johansson; Ceph Users
> Subject: Re: [ceph-users] Re: How can I use not-replicated pool (replication 
> 1 or raid-0)
>
> Hello Frank.
>
> >If your only tool is a hammer ...
> >Sometimes its worth looking around.
>
> You are absolutely right! But I have limitations because my customer
> is a startup and they want to create a hybrid system with current
> hardware for all their needs. That's why I'm spending time to find a
> work around. They are using cephfs on their Software and I moved them
> on this path from NFS. At the beginning they were only looking for a
> rep2 pool for their important data and Ceph was an absolutely great
> idea. Now the system is running smoothly but they also want to move
> the [garbage data] on the same system but as I told you, the data flow
> is different and the current hardware (non plp sata ssd's without
> bluestore cache) can not supply the required speed with replication 2.
> They are happy with replication 1 speed but I'm not because when any
> network, disk, or node goes down, the cluster will be suspended due to
> rep1.
>
> Now I have advised at least adding low-latency PCIe NVMes as a cache device
> to make the rep2 pool viable. I will solve the write latency with PLP
> low-latency NVMes, but I still need to solve the deletion speed too. Actually,
> with the random write-delete test I was trying to show the difference in
> delete speed. You are right, /dev/random requires CPU power, it will add
> latency and it should not be used for write speed tests.
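
(A minimal sketch of the RAM-first test approach Frank suggests further down in this
thread - the CephFS mount point and file size are placeholders:)

# generate the test data in RAM first, outside the timed region
dd if=/dev/urandom of=/dev/shm/testfile bs=1M count=1024
# then time only the copy to, and the delete from, the CephFS mount
time cp /dev/shm/testfile /mnt/cephfs/testfile
time rm /mnt/cephfs/testfile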
>
> Currently I'm working on development of an automation script to fix
> any problem for replication 1 pool.
> It is what it is.
>
> Best regards.
>
>
>
>
> Frank Schilder  wrote on Wed, 3 May 2023 at 11:50:
>
>
> >
> > Hi mhnx.
> >
> > > I also agree with you, Ceph is not designed for this kind of use case
> > > but I tried to continue what I know.
> > If your only tool is a hammer ...
> > Sometimes its worth looking around.
> >
> > While your tests show that a rep-1 pool is faster than a rep-2 pool, the 
> > values are not exactly impressive. There are 2 things that are relevant 
> > here: ceph is a high latency system, its software stack is quite 
> > heavy-weight. Even for a rep-1 pool its doing a lot to ensure data 
> > integrity. BeeGFS is a lightweight low-latency system skipping a lot of 
> > magic, which makes it very suited for performance critical tasks but less 
> > for long-term archival applications.
> >
> > The second is that the device /dev/urandom is actually very slow (and even 
> > unpredictable on some systems, it might wait for more entropy to be 
> > created). Your times are almost certainly affected by that. If you want to 
> > have comparable and close to native storage performance, create the files 
> > you want to write to storage first in RAM and then copy from RAM to 
> > storage. Using random data is a good idea to bypass potential built-in 
> > accelerations for special data, like all-zeros. However, exclude the random 
> > number generator from the benchmark and generate the data first before 
> > timing its use.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: mhnx 
> > Sent: Tuesday, May 2, 2023 5:25 PM
> > To: Frank Schilder
> > Cc: Janne Johansson; Ceph Users
> > Subject: Re: [ceph-users] Re: How can I use not-replicated pool 
> > (replication 1 or raid-0)
> >
> > Thank you for the explanation Frank.
> >
> > I also agree with you, Ceph is not designed for this kind of use case
> > but I tried to continue what I know.
> > My idea was exactly what you described, I was trying to automate
> > cleaning or recreating on any failure.

[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-10 Thread Frank Schilder
Hi Xiubo.

> IMO evicting the corresponding client could also resolve this issue
> instead of restarting the MDS.

Yes, it can get rid of the stuck caps release request, but it will also make 
any process accessing the file system crash. After a client eviction we usually 
have to reboot the server to get everything back clean. An MDS restart would 
achieve this in a transparent way and when replaying the journal execute the 
pending caps recall successfully without making processes crash - if there 
wasn't the wrong peer issue.
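
(For reference, the manual eviction suggested above would be something along these lines, 
using the client id from the earlier warning - note that on Octopus the blocklist commands 
are still called "blacklist":)

ceph tell mds.ceph-23 client evict id=205899841
ceph osd blacklist ls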

As far as I can tell, the operation is stuck in the MDS because it is never 
re-scheduled/re-tried, nor is it re-checked whether the condition still exists (the 
client still holds the caps requested). An MDS restart re-schedules all pending 
operations and then it succeeds. In every ceph version so far there were 
examples where hand-shaking between a client and an MDS had small flaws. For 
situations like that I would really like to have a light-weight MDS daemon 
command to force a re-schedule/re-play without having to restart the entire MDS 
and reconnect all its clients from scratch.

It would be great to have light-weight tools available to rectify such simple 
conditions in an as non-disruptive as possible way.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Wednesday, May 10, 2023 4:01 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] client isn't responding to mclientcaps(revoke), 
pending pAsLsXsFsc issued pAsLsXsFsc


On 5/9/23 16:23, Frank Schilder wrote:
> Dear Xiubo,
>
> both issues will cause problems, the one reported in the subject 
> (https://tracker.ceph.com/issues/57244) and the potential follow-up on MDS 
> restart 
> (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LYY7TBK63XPR6X6TD7372I2YEPJO2L6F).
>  Either one will cause compute jobs on our HPC cluster to hang and users will 
> need to run the jobs again. Our queues are full, so not very popular to loose 
> your spot.
>
> The process in D-state is a user process. Interestingly it is often possible 
> to kill it despite the D-state (if one can find the process) and the stuck 
> recall gets resolved. If I restart the MDS, the stuck process might continue 
> working, but we run a significant risk of other processed getting stuck due 
> to the libceph/MDS wrong peer issue. We actually have these kind of messages
>
> [Mon Mar  6 12:56:46 2023] libceph: mds1 192.168.32.87:6801 wrong peer at 
> address
> [Mon Mar  6 13:05:18 2023] libceph: wrong peer, want 
> 192.168.32.87:6801/-223958753, got
> 192.168.32.87:6801/-1572619386
>
> all over the HPC cluster and each of them means that some files/dirs are 
> inaccessible on the compute node and jobs either died or are/got stuck there. 
> Every MDS restart bears the risk of such events happening and with many nodes 
> this probability approaches 1 - every time we restart an MDS jobs get stuck.
>
> I have a reproducer for an instance of https://tracker.ceph.com/issues/57244. 
> Unfortunately, this is a big one that I would need to pack into a container. 
> I was not able to reduce it to something small, it seems to depend on a very 
> specific combination of codes with certain internal latencies between threads 
> that trigger a race.
>
> It sounds like you have a patch for https://tracker.ceph.com/issues/57244 
> although its not linked from the tracker item.

IMO evicting the corresponding client could also resolve this issue
instead of restarting the MDS.

Have you tried this ?

Thanks

- Xiubo

>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Xiubo Li 
> Sent: Friday, May 5, 2023 2:40 AM
> To: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] client isn't responding to mclientcaps(revoke), 
> pending pAsLsXsFsc issued pAsLsXsFsc
>
>
> On 5/1/23 17:35, Frank Schilder wrote:
>> Hi all,
>>
>> I think we might be hitting a known problem 
>> (https://tracker.ceph.com/issues/57244). I don't want to fail the mds yet, 
>> because we have troubles with older kclients that miss the mds restart and 
>> hold on to cache entries referring to the killed instance, leading to 
>> hanging jobs on our HPC cluster.
> Will this cause any issue in your case ?
>
>> I have seen this issue before and there was a process in D-state that 
>> dead-locked itself. Usually, killing this process succeeded and resolved the 
>> issue. However, this time I can't find such a process.
> BTW, what's the D-state process ? A ceph one ?
>
> Thanks
>
>> The tracker mentions that one can delete the file/folder. I have the inode 
>> number, but really don't want to start a find on a 1.5PB file system. Is 
>> there a better way to find what path is causing the issue (ask the MDS 
>> directly, look at a cache dump, or similar)? Is there an alternative to 
>> deletion or MDS fail?
>>

[ceph-users] Re: 17.2.6 Dashboard/RGW Signature Mismatch

2023-05-10 Thread Ondřej Kukla
Hello,

I have found out that the issue seems to be in this change - 
https://github.com/ceph/ceph/pull/47207

When I commented out the change and replaced it with the previous value, the 
dashboard worked as expected.

Ondrej
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: non root deploy ceph 17.2.5 failed

2023-05-10 Thread Eugen Block
Adding a second host worked as well after adding the ceph.pub key to  
the authorized_keys of the "deployer" user.


Zitat von Eugen Block :

I used the default to create a new user, so umask is 022. And the  
/tmp/var/lib/ceph directory belongs to the root user. I haven't  
tried to add another host yet, I understood that in your case it  
already failed during the initial bootstrap, but I can try to add one  
more host.


Zitat von Ben :


Out of curiosity, what are the umask and directory permissions in your case? Could you
add a host to the cluster for a further try?

Eugen Block  wrote on Tue, 9 May 2023 at 14:59:


Hi,

I just retried without the single-host option and it worked. Also
everything under /tmp/var belongs to root in my case. Unfortunately, I
can't use the curl-based cephadm but the contents are identical, I
compared. Not sure what it could be at the moment.

Zitat von Ben :


Hi, it is UOS v20 (with kernel 4.19), one Linux distribution among others.
It shouldn't matter much, since cephadm deploys things in containers by default; cephadm
is pulled by curl from the Quincy branch on GitHub.

I think you would see some sort of error too if you removed the parameter
--single-host-defaults.

More investigation shows it looks like a bug in cephadm.
During the deployment procedure,
/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new
is created remotely through a sudo ssh session (owned by root), while
/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/ is changed to the owner
of the ssh user deployer. The correct thing to do instead is to change /tmp/var/
to the owner deployer recursively, so that the following scp has access permission.
I will see if I have time to wire up a PR to fix it.
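
(As a stop-gap until a fix lands, the ownership change described above could be done by
hand on the target host before adding it - an untested sketch, with the user name taken
from this thread:)

sudo chown -R deployer:deployer /tmp/var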

Thanks for help on this.
Ben


Eugen Block  wrote on Mon, 8 May 2023 at 21:01:


Hi,

could you provide some more details about your host OS? Which cephadm
version is it? I was able to bootstrap a one-node cluster with both
17.2.5 and 17.2.6 with a non-root user with no such error on openSUSE
Leap 15.4:

quincy:~ # rpm -qa | grep cephadm
cephadm-17.2.6.248+gad656d572cb-lp154.2.1.noarch

deployer@quincy:~> sudo cephadm --image quay.io/ceph/ceph:v17.2.5
bootstrap --mon-ip 172.17.2.3 --skip-monitoring-stack --ssh-user
deployer --single-host-defaults
Verifying ssh connectivity ...
Adding key to deployer@localhost authorized_keys...
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.4.4 is present
[...]
Ceph version: ceph version 17.2.5
(98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
[...]
Adding key to deployer@localhost authorized_keys...
Adding host quincy...
Deploying mon service with default placement...
Deploying mgr service with default placement...
[...]
Bootstrap complete.

Zitat von Ben :

> Hi,
>
> with following command:
>
> sudo cephadm  --docker bootstrap --mon-ip 10.1.32.33
--skip-monitoring-stack
>   --ssh-user deployer
> the user deployer has passwordless sudo configuration.
> I can see the error below:
>
> debug 2023-05-04T12:46:43.268+ 7fc5ddc2e700  0 [cephadm ERROR
> cephadm.ssh] Unable to write
>


szhyf-xx1d002-hx15w:/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e:

> scp:
>


/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new:

> Permission denied
>
> Traceback (most recent call last):
>
>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 222, in
_write_remote_file
>
> await asyncssh.scp(f.name, (conn, tmp_path))
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in

scp

>
> await source.run(srcpath)
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in

run

>
> self.handle_error(exc)
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
> handle_error
>
> raise exc from None
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in

run

>
> await self._send_files(path, b'')
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in
> _send_files
>
> self.handle_error(exc)
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
> handle_error
>
> raise exc from None
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in
> _send_files
>
> await self._send_file(srcpath, dstpath, attrs)
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in
> _send_file
>
> await self._make_cd_request(b'C', attrs, size, srcpath)
>
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in
> _make_cd_re

[ceph-users] Re: Lua scripting in the rados gateway

2023-05-10 Thread Yuval Lifshitz
thanks Thomas!
opened this tracker: https://tracker.ceph.com/issues/59697 should cover the
missing dependencies for luarocks on the centos8 container (feel free to
add anything missing there...).
still trying to figure out the lib64 issue you found.
regarding the "script put" issue - I will add that to the lua documentation
page.

On Tue, May 9, 2023 at 11:09 PM Thomas Bennett  wrote:

> Hi Yuval,
>
> Just a follow up on this.
>
> An issue I’ve just resolved is getting scripts into the cephadm shell. As
> it turns out - I didn't know this, but it seems the host file system is
> mounted into the cephadm shell at /rootfs/.
>
> So I've been editing a /tmp/preRequest.lua on my host and then running:
>
> cephadm shell radosgw-admin script put --infile=/rootfs/tmp/preRequest.lua
> --context=preRequest
>
> This injects the lua script into the pre request context.
>
> Cheers,
> Tom
>
> On Fri, 28 Apr 2023 at 15:19, Thomas Bennett  wrote:
>
>> Hey Yuval,
>>
>> No problem. It was interesting to me to figure out how it all fits
>> together and works.  Thanks for opening an issue on the tracker.
>>
>> Cheers,
>> Tom
>>
>> On Thu, 27 Apr 2023 at 15:03, Yuval Lifshitz  wrote:
>>
>>> Hi Thomas,
>>> Thanks for the detailed info!
>>> RGW lua scripting was never tested in a cephadm deployment :-(
>>> Opened a tracker: https://tracker.ceph.com/issues/59574 to make sure
>>> this would work out of the box.
>>>
>>> Yuval
>>>
>>>
>>> On Tue, Apr 25, 2023 at 10:25 PM Thomas Bennett  wrote:
>>>
 Hi ceph users,

 I've been trying out the lua scripting for the rados gateway (thanks
 Yuval).

 As in my previous email I mentioned that there is an error when trying to
 load the luasocket module. However, I thought it was a good time to report
 on my progress.

 My 'hello world' example below, called *test.lua*, includes the
 following checks:

1. Can I write to the debug log?
2. Can I use the lua socket package to do something stupid but
    interesting, like connect to a webservice?

 Before you continue reading this, you might need to know that I run all
 ceph processes in a *CentOS Stream release 8* container deployed using
 ceph orchestrator running *Ceph v17.2.5*, so please view the information
 below in that context.

 For anyone looking for a reference, I suggest going to the ceph lua rados
 gateway documentation at radosgw/lua-scripting.

 There are two new switches you need to know about in the radosgw-admin:

- *script* -> loading your lua script
 - *script-package* -> loading supporting packages for your script,
   i.e. luasocket in this case (see the example commands just below).
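
 As used later in this thread, installing the luasocket package and loading a script
 would look roughly like this (a sketch based on the lua-scripting docs; the rados
 gateway needs a restart afterwards so the package actually gets installed):

 radosgw-admin script-package add --package='luasocket' --allow-compilation
 radosgw-admin script-package list
 radosgw-admin script put --infile=./test.lua --context=preRequest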

 For a basic setup, you'll need to have a few dependencies in your
 containers:

 - cephadm container: requires luarocks (I've checked the code - it runs
   a luarocks search command)
 - radosgw container: requires luarocks, gcc, make, m4, wget (wget just
   in case).

 To achieve the above, I updated the container image for our running
 system.
 I needed to do this because I needed to redeploy the rados gateway
 container to inject the lua script packages into the radosgw runtime
 process. This will start with a fresh container based on the global
 config
 *container_image* setting on your running system.

 For us this is currently captured in quay.io/tsolo/ceph:v17.2.5-3 and included the following extra
 steps (including installing the lua dev from an rpm because there is no
 centos package in yum):
 yum install luarocks gcc make wget m4
 rpm -i

 https://rpmfind.net/linux/centos/8-stream/PowerTools/x86_64/os/Packages/lua-devel-5.3.4-12.el8.x86_64.rpm

 You will notice that I've included a compiler and compiler support into the
 image. This is because luarocks on the radosgw needs to compile luasocket (the
 package I want to install). This will happen at start time when the radosgw
 is restarted from ceph orch.

 In the cephadm container I still need to update our cephadm shell, so I need
 to install luarocks by hand:
 yum install luarocks

 Then set the updated image to use:
 ceph config set global container_image quay.io/tsolo/ceph:v17.2.5-3

 I now create a file called *test.lua* in the cephadm container. This
 contains the following lines to write to the log and then do a get request
 to google. This is not practical in production, but it serves the purpose
 of testing the infrastructure:

 RGWDebugLog("Tsolo start lua script")
 local LuaSocket = require("socket")
 client = LuaSocket.connect("google.com", 80)
 client:send("GET / HTTP/1.0\r\nHost: google.com\r\n\r\n")
 while true do
   -- read the response line by line and log it (standard luasocket receive loop)
   local line, err = client:receive()
   if line then RGWDebugLog(line) end
   if err == "closed" then break end
 end
 client:close()

[ceph-users] Re: cephadm docker registry

2023-05-10 Thread Eugen Block

Hi,

I recommend to create your own (local) container registry to have full  
control over the images. I also don't use the latest tag but always a  
specific version tag, that's also what the docs suggest [1]. The docs  
for an isolated environment [2] briefly describe how to set up your  
local registry, and in [1] you can also find some information about  
container images, for example how to set the global ceph image (just  
an example):


ceph config set global container_image myregistry.domain:/ceph/ceph:v17.2.6
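
For reference, spinning up the local registry itself and mirroring the image can be as 
simple as this sketch (port, host name and tag are placeholders; an http-only registry 
also needs --tls-verify=false or an insecure-registry entry - see [2] for the full 
procedure):

podman run -d --restart=always --name local-registry -p 5000:5000 registry:2
podman pull quay.io/ceph/ceph:v17.2.6
podman tag quay.io/ceph/ceph:v17.2.6 myregistry.domain:5000/ceph/ceph:v17.2.6
podman push myregistry.domain:5000/ceph/ceph:v17.2.6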

The monitoring images can be set with these config options:

container_image_prometheus
container_image_grafana
container_image_alertmanager
container_image_node_exporter
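
For example (a sketch - the registry path and tag are placeholders, and depending on the 
release these options may live in the mgr/cephadm/ namespace):

ceph config set mgr mgr/cephadm/container_image_prometheus myregistry.domain:5000/prometheus/prometheus:v2.43.0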

Regards,
Eugen

[1] https://docs.ceph.com/en/latest/install/containers/#containers
[2]  
https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment



Zitat von Satish Patel :


Folks,

I am trying to install ceph on a 10-node cluster and planning to use
cephadm. My question is: if I add new nodes to this cluster next year,
which docker image version will cephadm use to add the new nodes?

Is there a local registry I can create to copy the images locally? How
does cephadm control images?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.13 Pacific released

2023-05-10 Thread Marc
What is going on with this latency issue? From what I have read here on the mailing list, 
this looks bad to me - until someone from Ceph/Red Hat says it is not. 

https://tracker.ceph.com/issues/58530
https://www.mail-archive.com/ceph-users@ceph.io/msg19012.html


> 
> We're happy to announce the 13th backport release in the Pacific series.
> 
> https://ceph.io/en/news/blog/2023/v16-2-13-pacific-released/
> 
> Notable Changes
> ---
> 
> * CEPHFS: Rename the `mds_max_retries_on_remount_failure` option to
>   `client_max_retries_on_remount_failure` and move it from mds.yaml.in
> to
>   mds-client.yaml.in because this option was only used by MDS client
> from its
>   birth.
> 
> * `ceph mgr dump` command now outputs `last_failure_osd_epoch` and
>   `active_clients` fields at the top level.  Previously, these fields
> were
>   output under `always_on_modules` field.
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: non root deploy ceph 17.2.5 failed

2023-05-10 Thread Eugen Block
I used the default to create a new user, so umask is 022. And the  
/tmp/var/lib/ceph directory belongs to the root user. I haven't tried  
to add another host yet, I understood that in your case it already  
failed during the initial bootstrap, but I can try to add one more host.


Zitat von Ben :


Out of curiosity, what are the umask and directory permissions in your case? Could you
add a host to the cluster for a further try?

Eugen Block  wrote on Tue, 9 May 2023 at 14:59:


Hi,

I just retried without the single-host option and it worked. Also
everything under /tmp/var belongs to root in my case. Unfortunately, I
can't use the curl-based cephadm but the contents are identical, I
compared. Not sure what it could be at the moment.

Zitat von Ben :

> Hi, It is uos v20(with kernel 4.19), one linux distribution among others.
> no matter since cephadm deploys things in containers by default. cephadm
is
> pulled by curl from Quincy branch of github.
>
> I think you could see some sort of errors if you remove parameter
> --single-host-defaults.
>
> More investigation shows it looks like a bug with cephadm.
> during the deploying procedure
>
,/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new
> is created through sudo ssh session remotely(with owner of root) and
> /tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/ is changed to
owner
> of ssh user deployer. The correct thing to do instead is,  /tmp/var/ be
> changed to the owner deployer recursively so that following scp can have
> access permission.
> I will see if having time to wire up a PR to fix it.
>
> Thanks for help on this.
> Ben
>
>
> Eugen Block  wrote on Mon, 8 May 2023 at 21:01:
>
>> Hi,
>>
>> could you provide some more details about your host OS? Which cephadm
>> version is it? I was able to bootstrap a one-node cluster with both
>> 17.2.5 and 17.2.6 with a non-root user with no such error on openSUSE
>> Leap 15.4:
>>
>> quincy:~ # rpm -qa | grep cephadm
>> cephadm-17.2.6.248+gad656d572cb-lp154.2.1.noarch
>>
>> deployer@quincy:~> sudo cephadm --image quay.io/ceph/ceph:v17.2.5
>> bootstrap --mon-ip 172.17.2.3 --skip-monitoring-stack --ssh-user
>> deployer --single-host-defaults
>> Verifying ssh connectivity ...
>> Adding key to deployer@localhost authorized_keys...
>> Verifying podman|docker is present...
>> Verifying lvm2 is present...
>> Verifying time synchronization is in place...
>> Unit chronyd.service is enabled and running
>> Repeating the final host check...
>> podman (/usr/bin/podman) version 4.4.4 is present
>> [...]
>> Ceph version: ceph version 17.2.5
>> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
>> Extracting ceph user uid/gid from container image...
>> Creating initial keys...
>> Creating initial monmap...
>> Creating mon...
>> Waiting for mon to start...
>> Waiting for mon...
>> mon is available
>> [...]
>> Adding key to deployer@localhost authorized_keys...
>> Adding host quincy...
>> Deploying mon service with default placement...
>> Deploying mgr service with default placement...
>> [...]
>> Bootstrap complete.
>>
>> Zitat von Ben :
>>
>> > Hi,
>> >
>> > with following command:
>> >
>> > sudo cephadm  --docker bootstrap --mon-ip 10.1.32.33
>> --skip-monitoring-stack
>> >   --ssh-user deployer
>> > the user deployer has passwordless sudo configuration.
>> > I can see the error below:
>> >
>> > debug 2023-05-04T12:46:43.268+ 7fc5ddc2e700  0 [cephadm ERROR
>> > cephadm.ssh] Unable to write
>> >
>>
szhyf-xx1d002-hx15w:/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e:
>> > scp:
>> >
>>
/tmp/var/lib/ceph/ad3a132e-e9ee-11ed-8a19-043f72fb8bf9/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e.new:
>> > Permission denied
>> >
>> > Traceback (most recent call last):
>> >
>> >   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 222, in
>> _write_remote_file
>> >
>> > await asyncssh.scp(f.name, (conn, tmp_path))
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in
scp
>> >
>> > await source.run(srcpath)
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in
run
>> >
>> > self.handle_error(exc)
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
>> > handle_error
>> >
>> > raise exc from None
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in
run
>> >
>> > await self._send_files(path, b'')
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in
>> > _send_files
>> >
>> > self.handle_error(exc)
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
>> > handle_error
>> >
>> > raise exc from None
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in
>> > _send_files
>> >
>> > await self._send_file(srcpath, dstpath, attrs)
>> >
>> >   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-10 Thread Igor Fedotov

Hey Zakhar,

You do need to restart OSDs to bring performance back to normal anyway, 
don't you? So yeah, we're not aware of a better way so far - all the 
information I have is from you and Nikola, and you both tell us about 
the need for a restart.


Apparently there is no need to restart every OSD, only the "degraded/slow" 
ones. We still need to verify that, so please identify the slowest OSDs 
(in terms of subop_w_lat) and restart them first. Hopefully just a 
fraction of your OSDs will require this.
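
For reference, the per-OSD collection/reset can be done via the admin socket; a sketch 
with a placeholder OSD id (the restart line assumes an orchestrator-managed cluster, 
otherwise use systemctl):

# on the host running osd.12:
ceph daemon osd.12 perf dump > /tmp/osd.12.perf.json   # write subop latencies are in the "osd" section
ceph daemon osd.12 perf reset all
# restart the slowest OSDs first:
ceph orch daemon restart osd.12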



Thanks,
Igor

On 5/10/2023 6:01 AM, Zakhar Kirpichenko wrote:
Thank you, Igor. I will try to see how to collect the perf values. Not 
sure about restarting all OSDs as it's a production cluster, is there 
a less invasive way?


/Z

On Tue, 9 May 2023 at 23:58, Igor Fedotov  wrote:

Hi Zakhar,

Let's leave questions regarding cache usage/tuning to a different
topic for now. And concentrate on performance drop.

Could you please do the same experiment I asked from Nikola once
your cluster reaches "bad performance" state (Nikola, could you
please use this improved scenario as well?):

- collect perf counters for every OSD

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf counters
again.

- Then restart OSDs one-by-one starting with the worst OSD (in
terms of subop_w_lat from the prev step). Wouldn't be sufficient
to reset just a few OSDs before the cluster is back to normal?

- if partial OSD restart is sufficient - please leave the
remaining OSDs run as-is without reboot.

- after the restart (no matter partial or complete one - the key
thing it's should successful) reset all the perf counters and
leave the cluster run for 30 mins and collect perf counters again.

- wait 24 hours and collect the counters one more time

- share all four counters snapshots.


Thanks,

Igor

On 5/8/2023 11:31 PM, Zakhar Kirpichenko wrote:

Don't mean to hijack the thread, but I may be observing something
similar with 16.2.12: OSD performance noticeably peaks after OSD
restart and then gradually reduces over 10-14 days, while commit
and apply latencies increase across the board.

Non-default settings are:

        "bluestore_cache_size_hdd": {
            "default": "1073741824",
            "mon": "4294967296",
            "final": "4294967296"
        },
        "bluestore_cache_size_ssd": {
            "default": "3221225472",
            "mon": "4294967296",
            "final": "4294967296"
        },
...
        "osd_memory_cache_min": {
            "default": "134217728",
            "mon": "2147483648",
            "final": "2147483648"
        },
        "osd_memory_target": {
            "default": "4294967296",
            "mon": "17179869184",
            "final": "17179869184"
        },
        "osd_scrub_sleep": {
            "default": 0,
            "mon": 0.10001,
            "final": 0.10001
        },
        "rbd_balance_parent_reads": {
            "default": false,
            "mon": true,
            "final": true
        },

All other settings are default, the usage is rather simple
Openstack / RBD.

I also noticed that OSD cache usage doesn't increase over time
(see my message "Ceph 16.2.12, bluestore cache doesn't seem to be
used much" dated 26 April 2023, which received no comments),
despite OSDs are being used rather heavily and there's plenty of
host and OSD cache / target memory available. It may be worth
checking if available memory is being used in a good way.

/Z

On Mon, 8 May 2023 at 22:35, Igor Fedotov 
wrote:

Hey Nikola,

On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
> OK, starting collecting those for all OSDs..
> I have hour samples of all OSDs perf dumps loaded in DB, so
I can easily examine,
> sort, whatever..
>
You didn't reset the counters every hour, do you? So having
average
subop_w_latency growing that way means the current values
were much
higher than before.

Curious if subop latencies were growing for every OSD or just
a subset
(may be even just a single one) of them?


Next time you reach the bad state please do the following if
possible:

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf
counters again.

- Then start restarting OSD one-by-one starting with the
worst OSD (in
terms of subop_w_lat from the prev step). Wouldn't be
sufficient to
reset just a few OSDs before the cluster is back to normal?

>> currently values for avgtime are around 0.0003 for
subop_w_lat an