[ceph-users] mds crash loop - Server.cc: 7503: FAILED ceph_assert(in->first <= straydn->first)

2022-02-08 Thread Arnaud MARTEL



Hi all, 

We have had a CephFS cluster in production for about 2 months and, for the past 2-3 
weeks, we have regularly been experiencing MDS crash loops (every 3-4 hours when there 
is some user activity). 
A temporary fix is to remove the MDSs in error (or unknown) state, stop the samba and 
nfs-ganesha gateways, then wipe all sessions. Sometimes we have to repeat this 
procedure 2 or 3 times to get our CephFS back and working... 
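For reference, this is roughly what that workaround looks like on our side (a rough 
sketch only; the rank, daemon and service names below are examples, not the exact 
commands we ran): 

ceph mds fail cephfsvol:0                        # fail the stuck rank so a standby takes over 
systemctl stop smbd nfs-ganesha                  # on the samba / nfs-ganesha gateway hosts 
ceph tell mds.cephfsvol:0 client ls              # list the remaining client sessions 
ceph tell mds.cephfsvol:0 client evict id=<id>   # then evict ("wipe") them one by one 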

When looking in the MDS log files, I noticed that all crashes have the following 
stack trace: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/mds/Server.cc:
 7503: FAILED ceph_assert(in->first <= straydn->first) 

ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable) 
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) 
[0x7eff2644bcce] 
2: /usr/lib64/ceph/libceph-common.so.2(+0x276ee8) [0x7eff2644bee8] 
3: (Server::_unlink_local(boost::intrusive_ptr&, CDentry*, 
CDentry*)+0x106a) [0x559c8f83331a] 
4: (Server::handle_client_unlink(boost::intrusive_ptr&)+0x4d9) 
[0x559c8f837fe9] 
5: 
(Server::dispatch_client_request(boost::intrusive_ptr&)+0xefb) 
[0x559c8f84e82b] 
6: (Server::handle_client_request(boost::intrusive_ptr 
const&)+0x3fc) [0x559c8f859aac] 
7: (Server::dispatch(boost::intrusive_ptr const&)+0x12b) 
[0x559c8f86258b] 
8: (MDSRank::handle_message(boost::intrusive_ptr const&)+0xbb4) 
[0x559c8f7bf374] 
9: (MDSRank::_dispatch(boost::intrusive_ptr const&, bool)+0x7bb) 
[0x559c8f7c19eb] 
10: (MDSRank::retry_dispatch(boost::intrusive_ptr const&)+0x16) 
[0x559c8f7c1f86] 
11: (MDSContext::complete(int)+0x56) [0x559c8fac0906] 
12: (MDSRank::_advance_queues()+0x84) [0x559c8f7c0a54] 
13: (MDSRank::_dispatch(boost::intrusive_ptr const&, 
bool)+0x204) [0x559c8f7c1434] 
14: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr 
const&)+0x55) [0x559c8f7c1fe5] 
15: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr const&)+0x128) 
[0x559c8f7b1f28] 
16: (DispatchQueue::entry()+0x126a) [0x7eff266894da] 
17: (DispatchQueue::DispatchThread::entry()+0x11) [0x7eff26739e21] 
18: /lib64/libpthread.so.0(+0x814a) [0x7eff2543214a] 
19: clone() 

I found a similar case on the Ceph tracker website 
(https://tracker.ceph.com/issues/41147), so I suspected inode corruption and started a 
cephfs scrub (ceph tell mds.cephfsvol:0 scrub start / recursive,repair). As we have a 
lot of files (about 200 million entries for 200 TB), I don't know how long it will 
take, nor: 
- if this will correct the situation 
- what to do to avoid the same situation in the future 
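In the meantime, I'm keeping an eye on the scrub with something like this (just a 
sketch using my filesystem name and rank 0): 

ceph tell mds.cephfsvol:0 scrub status     # progress of the running scrub 
ceph tell mds.cephfsvol:0 damage ls        # metadata damage recorded so far, if any 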

Some information about our ceph cluster (pacific 16.2.6 with containers): 

** 
ceph -s 
** 
cluster: 
id: 2943b4fe-2063-11ec-a560-e43d1a1bc30f 
health: HEALTH_WARN 
1 MDSs report oversized cache 

services: 
mon: 5 daemons, quorum cephp03,cephp06,cephp05,cephp01,cephp02 (age 12d) 
mgr: cephp01.smfvfd(active, since 12d), standbys: cephp02.equfuj 
mds: 2/2 daemons up, 4 standby 
osd: 264 osds: 264 up (since 12d), 264 in (since 9w) 
rbd-mirror: 1 daemon active (1 hosts) 

task status: 
scrub status: 
mds.cephfsvol.cephp02.wsokro: idle+waiting paths [/] 
mds.cephfsvol.cephp05.qneike: active paths [/] 

data: 
volumes: 1/1 healthy 
pools: 5 pools, 2176 pgs 
objects: 595.12M objects, 200 TiB 
usage: 308 TiB used, 3.3 PiB / 3.6 PiB avail 
pgs: 2167 active+clean 
7 active+clean+scrubbing+deep 
2 active+clean+scrubbing 

io: 
client: 39 KiB/s rd, 152 KiB/s wr, 27 op/s rd, 27 op/s wr 


** 
# ceph fs get cephfsvol 
** 
Filesystem 'cephfsvol' (1) 
fs_name cephfsvol 
epoch 106554 
flags 12 
created 2021-09-28T14:19:54.399567+ 
modified 2022-02-08T12:57:00.653514+ 
tableserver 0 
root 0 
session_timeout 60 
session_autoclose 300 
max_file_size 5497558138880 
required_client_features {} 
last_failure 0 
last_failure_osd_epoch 41205 
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no 
anchor table,9=file layout v2,10=snaprealm v2} 
max_mds 2 
in 0,1 
up {0=4044909,1=3325354} 
failed 
damaged 
stopped 
data_pools [3,4] 
metadata_pool 2 
inline_data disabled 
balancer 
standby_count_wanted 1 
[mds.cephfsvol.cephp05.qneike{0:4044909} state up:active seq 1789 export 
targets 1 join_fscid=1 addr 
[v2:10.2.100.5:6800/2702983829,v1:10.2.100.5:6801/2702983829] compat 
{c=[1],r=[1],i=[7ff]}] 
[mds.cephfsvol.cephp02.wsokro{1:32bdaa} state up:active seq 18a02 export 
targets 0 join_fscid=1 addr 
[v2:10.2.100.2:1a90/aa660301,v1:10.2.100.2:1a91/aa66

[ceph-users] Re: ceph_assert(start >= coll_range_start && start < coll_range_end)

2022-02-08 Thread Manuel Lausch
Okay, I definitely need some help here.

The crashing OSD moved with the PG, so the PG seems to have the issue.

I moved (via upmaps) all 4 replicas to filestore OSDs. After this the
error seemed to be solved; no OSD crashed after this.
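For reference, the moves were done with pg-upmap-items, roughly like this (the OSD 
ids below are only examples): 

ceph osd pg-upmap-items 1.7fff 410 123     # remap this PG's copy from osd.410 to osd.123 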

A deep-scrub of the PG didn't throw any error. So I moved the first
shard back to a bluestore OSD. This worked flawlessly as well.

A deep-scrub after this showed one object missing, the same one that was
obviously the cause of the prior crashes.

A repair seemed to fix the object, but a further deep-scrub brings back
the same error.

Even putting the object again with rados put didn't help. Now I have
two "missing" objects (the head and the snapshot created by the overwrite).


Here are the scrub error and repair entries from the OSD log:
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
shard 3 
1::::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_0040:head
 : missing
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
deep-scrub 1 missing, 0 inconsistent objects
2022-02-08 14:04:43.751 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
deep-scrub 1 errors

2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
shard 3 
1::::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_0040:head
 : missing
2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
repair 1 missing, 0 inconsistent objects
2022-02-08 13:52:09.111 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
repair 1 errors, 1 fixed


And here is the new scrub error with the two missing objects:
2022-02-08 14:19:10.990 7f600dfec700  0 log_channel(cluster) log [DBG] : 1.7fff 
deep-scrub starts
2022-02-08 14:25:17.749 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
shard 3 
1::::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_0040:974
 : missing
2022-02-08 14:25:17.749 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
shard 3 
1::::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_0040:head
 : missing
2022-02-08 14:25:17.750 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
deep-scrub 2 missing, 0 inconsistent objects
2022-02-08 14:25:17.750 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff 
deep-scrub 2 errors


Can someone help me here? I don't have any clue.


Regards
Manuel

On Mon, 7 Feb 2022 16:51:16 +0100
Manuel Lausch  wrote:

> Hi,
> 
> I am migrating from filestore to bluestore (the workflow is: drain the OSD,
> then reformat it with bluestore).
> 
> Now I have two OSDs which crash at the same time with the following
> error. Restarting the OSDs works for some time until they crash
> again.
> 
>-40> 2022-02-07 16:28:20.489 7f550723a700 20 
> bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600)  
> r 0 v.len 512
>-39> 2022-02-07 16:28:20.489 7f550723a700 15 
> bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head 
> #1:ffeb:::9b6886fa3639e64c892813ba7c9da9f4411f0a5fb73c89517b5f3f68acdaa388_0040:head#
>-38> 2022-02-07 16:28:20.489 7f550723a700 10 
> bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head 
> #1:ffeb:::9b6886fa3639e64c892813ba7c9da9f4411f0a5fb73c89517b5f3f68acdaa388_0040:head#
>  = 0
>-37> 2022-02-07 16:28:20.489 7f550723a700 10 
> bluestore(/var/lib/ceph/osd/ceph-410) stat 1.7fff_head 
> #1:ffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
>-36> 2022-02-07 16:28:20.489 7f550723a700 20 
> bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) 
> get_onode oid 
> #1:ffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
>  key 
> 0x7f8001ffef216264'a22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d!='0xfffe'o'
>-35> 2022-02-07 16:28:20.489 7f550723a700 20 
> bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600)  
> r 0 v.len 843
>-34> 2022-02-07 16:28:20.489 7f550723a700 15 
> bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head 
> #1:ffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
>-33> 2022-02-07 16:28:20.489 7f550723a700 10 
> bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head 
> #1:ffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
>  = 0
>-32> 2022-02-07 16:28:20.489 7f550723a700 10 
> bluestore(/var/lib/ceph/osd/ceph-410) stat 1.7fff_head 
> #1:fffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_0040:head#
>-31> 2022-02-07 16:28:20.489 7f550723a700 20 
> bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) 
> get_onode oid 
> #1:fffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_0040:head#
>  key 
> 0x7f8001ff

[ceph-users] Re: cephfs: [ERR] loaded dup inode

2022-02-08 Thread Dan van der Ster
On Tue, Feb 8, 2022 at 1:04 PM Frank Schilder  wrote:
> The reason for this seemingly strange behaviour was an old static snapshot 
> taken in an entirely different directory. Apparently, ceph fs snapshots are 
> not local to an FS directory sub-tree but always global on the entire FS 
> despite the fact that you can only access the sub-tree in the snapshot, which 
> easily leads to the wrong conclusion that only data below the directory is in 
> the snapshot. As a consequence, the static snapshot was accumulating the 
> garbage from the rotating snapshots even though these sub-trees were 
> completely disjoint.

So are you saying that if I do this I'll have 1M files in stray?

mkdir /a
cd /a
for i in {1..1000000}; do touch $i; done  # create 1M files in /a
cd ..
mkdir /b
mkdir /b/.snap/testsnap  # create a snap in the empty dir /b
rm -rf /a/
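For what it's worth, I'd check the resulting stray count on the MDS host with 
something like this (assuming jq is installed; num_strays lives in the mds_cache 
perf counters): 

ceph daemon mds.<name> perf dump | jq .mds_cache.num_strays 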


Cheers, Dan


[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2022-02-08 Thread Christian Rohmann

Hey there again,

There is now a question from Neha Ojha in https://tracker.ceph.com/issues/53663 
asking for OSD debug logs of a manual deep-scrub on the (inconsistent) PGs. 

I have already provided the logs of two such deep-scrubs via ceph-post-file. 

But since data inconsistencies are the worst kind of bug, and there is some 
unpredictability to their occurrence, we likely need more evidence to have a 
chance of narrowing this down. Since you seem to be observing something 
similar, could you perhaps gather and post debug info about your cases to the 
ticket as well?
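
What I did to capture them was roughly the following (OSD id, PG and log path are 
examples; don't forget to drop the debug override again afterwards): 

ceph config set osd.12 debug_osd 20      # raise logging on the acting primary 
ceph pg deep-scrub 1.2f                  # trigger the deep-scrub on the inconsistent PG 
ceph config rm osd.12 debug_osd          # remove the override again 
ceph-post-file /var/log/ceph/ceph-osd.12.log 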


Regards

Christian



[ceph-users] RGW automatic encryption - still testing only?

2022-02-08 Thread David Orman
Is RGW encryption for all objects at rest still testing only, and if not,
which version is it considered stable in?:

https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only

David


[ceph-users] Re: RGW automatic encryption - still testing only?

2022-02-08 Thread Casey Bodley
hi David,

that method of encryption based on rgw_crypt_default_encryption_key
will never be officially supported. however, support for SSE-S3
encryption [1] is nearly complete in [2] (cc Marcus), and we hope to
include that in the quincy release - and if not, we'll backport it to
quincy in an early point release

can SSE-S3 with PutBucketEncryption satisfy your use case?

[1] 
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
[2] https://github.com/ceph/ceph/pull/44494
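
for illustration, once PutBucketEncryption lands, enabling SSE-S3 on a bucket should 
look roughly like this with the aws cli (endpoint and bucket name are placeholders): 

aws --endpoint-url http://rgw.example.com:8000 s3api put-bucket-encryption \
  --bucket mybucket \
  --server-side-encryption-configuration \
  '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}' 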

On Tue, Feb 8, 2022 at 10:44 AM David Orman  wrote:
>
> Is RGW encryption for all objects at rest still testing only, and if not,
> which version is it considered stable in?:
>
> https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only
>
> David
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>



[ceph-users] Re: RGW automatic encryption - still testing only?

2022-02-08 Thread Casey Bodley
On Tue, Feb 8, 2022 at 11:11 AM Casey Bodley  wrote:
>
> hi David,
>
> that method of encryption based on rgw_crypt_default_encryption_key
> will never be officially supported.

to expand on why: rgw_crypt_default_encryption_key requires the key
material to be stored insecurely in ceph's config, and cannot support
key rotation
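
for reference, that mode is just a single static key in the ceph config, e.g. 
something like the following (the key value is a placeholder, and the exact config 
section depends on your deployment; the point is that the key sits in plain text in 
the config): 

ceph config set client.rgw rgw_crypt_default_encryption_key <base64-encoded 256-bit key> 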

> however, support for SSE-S3
> encryption [1] is nearly complete in [2] (cc Marcus), and we hope to
> include that in the quincy release - and if not, we'll backport it to
> quincy in an early point release
>
> can SSE-S3 with PutBucketEncryption satisfy your use case?
>
> [1] 
> https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
> [2] https://github.com/ceph/ceph/pull/44494
>
> On Tue, Feb 8, 2022 at 10:44 AM David Orman  wrote:
> >
> > Is RGW encryption for all objects at rest still testing only, and if not,
> > which version is it considered stable in?:
> >
> > https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only
> >
> > David
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >



[ceph-users] Re: RGW automatic encryption - still testing only?

2022-02-08 Thread David Orman
Totally understand, I'm not really a fan of service-managed encryption keys
as a general rule vs. client-managed. I just thought I'd probe about
capabilities considered stable before embarking on our own work. SSE-S3
would be a reasonable middle-ground. I appreciate the PR link, that's very
helpful.

On Tue, Feb 8, 2022 at 10:29 AM Casey Bodley  wrote:

> On Tue, Feb 8, 2022 at 11:11 AM Casey Bodley  wrote:
> >
> > hi David,
> >
> > that method of encryption based on rgw_crypt_default_encryption_key
> > will never be officially supported.
>
> to expand on why: rgw_crypt_default_encryption_key requires the key
> material to be stored insecurely in ceph's config, and cannot support
> key rotation
>
> > however, support for SSE-S3
> > encryption [1] is nearly complete in [2] (cc Marcus), and we hope to
> > include that in the quincy release - and if not, we'll backport it to
> > quincy in an early point release
> >
> > can SSE-S3 with PutBucketEncryption satisfy your use case?
> >
> > [1]
> https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
> > [2] https://github.com/ceph/ceph/pull/44494
> >
> > On Tue, Feb 8, 2022 at 10:44 AM David Orman 
> wrote:
> > >
> > > Is RGW encryption for all objects at rest still testing only, and if
> not,
> > > which version is it considered stable in?:
> > >
> > >
> https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only
> > >
> > > David
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
>
>


[ceph-users] Re: cephfs: [ERR] loaded dup inode

2022-02-08 Thread Gregory Farnum
On Tue, Feb 8, 2022 at 7:30 AM Dan van der Ster  wrote:
>
> On Tue, Feb 8, 2022 at 1:04 PM Frank Schilder  wrote:
> > The reason for this seemingly strange behaviour was an old static snapshot 
> > taken in an entirely different directory. Apparently, ceph fs snapshots are 
> > not local to an FS directory sub-tree but always global on the entire FS 
> > despite the fact that you can only access the sub-tree in the snapshot, 
> > which easily leads to the wrong conclusion that only data below the 
> > directory is in the snapshot. As a consequence, the static snapshot was 
> > accumulating the garbage from the rotating snapshots even though these 
> > sub-trees were completely disjoint.
>
> So are you saying that if I do this I'll have 1M files in stray?

No, happily.

The thing that's happening here post-dates my main previous stretch on
CephFS and I had forgotten it, but there's a note in the developer
docs: https://docs.ceph.com/en/latest/dev/cephfs-snapshots/#hard-links
(I fortuitously stumbled across this from an entirely different
direction/discussion just after seeing this thread and put the pieces
together!)

Basically, hard links are *the worst*. For everything in filesystems.
I spent a lot of time trying to figure out how to handle hard links
being renamed across snapshots[1] and never managed it, and the
eventual "solution" was to give up and do the degenerate thing:
If there's a file with multiple hard links, that file is a member of
*every* snapshot.

Doing anything about this will take a lot of time. There's probably an
opportunity to improve it for users of the subvolumes library, as
those subvolumes do get tagged a bit, so I'll see if we can look into
that. But for generic CephFS, I'm not sure what the solution will look
like at all.

Sorry folks. :/
-Greg

[1]: The issue is that, if you have a hard linked file in two places,
you would expect it to be snapshotted whenever a snapshot covering
either location occurs. But in CephFS the file can only live in one
location, and the other location has to just hold a reference to it
instead. So say you have inode Y at path A, and then hard link it in
at path B. Given how snapshots work, when you open up Y from A, you
would need to check all the snapshots that apply from both A and B's
trees. But 1) opening up other paths is a challenge all on its own,
and 2) without an inode and its backtrace to provide a lookup resolve
point, it's impossible to maintain a lookup that scales and is
possible to keep consistent.
(Oh, I did just have one idea, but I'm not sure if it would fix every
issue or just that scalable backtrace lookup:
https://tracker.ceph.com/issues/54205)
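
A minimal way to see the behaviour described above (just a sketch, assuming a CephFS 
mount at /mnt/cephfs): 

mkdir /mnt/cephfs/a /mnt/cephfs/b /mnt/cephfs/c 
echo data > /mnt/cephfs/a/file 
ln /mnt/cephfs/a/file /mnt/cephfs/b/link   # the file now has multiple hard links 
mkdir /mnt/cephfs/c/.snap/s1               # snapshot of an unrelated, empty directory 
rm /mnt/cephfs/a/file /mnt/cephfs/b/link 
# with the degenerate rule above, the multi-linked file counts as a member of every 
# snapshot, so its inode lingers as a stray until /mnt/cephfs/c/.snap/s1 is deleted 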

>
> mkdir /a
> cd /a
> for i in {1..1000000}; do touch $i; done  # create 1M files in /a
> cd ..
> mkdir /b
> mkdir /b/.snap/testsnap  # create a snap in the empty dir /b
> rm -rf /a/
>
>
> Cheers, Dan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>



[ceph-users] Re: RGW automatic encryption - still testing only?

2022-02-08 Thread Casey Bodley
On Tue, Feb 8, 2022 at 11:55 AM Stefan Schueffler
 wrote:
>
> Hi Casey,
>
> great news to hear about "SSE-S3 almost implemented" :-)
>
> One question about that - will the implementation have one key per bucket, or 
> one key per individual object?
>
> Amazon (as per the publicly available docs) is using one unique key per object 
> - and encrypts that key in turn with a per-bucket or master key that is rotated 
> regularly.

my understanding is that there are per-object keys, and
key-encryption-keys that can either be per-bucket, per-user, or global
depending on ceph config

>
> https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html
>
> Best
> Stefan
>
>
>
>
> Am 08.02.2022 um 17:11 schrieb Casey Bodley :
>
> hi David,
>
> that method of encryption based on rgw_crypt_default_encryption_key
> will never be officially supported. however, support for SSE-S3
> encryption [1] is nearly complete in [2] (cc Marcus), and we hope to
> include that in the quincy release - and if not, we'll backport it to
> quincy in an early point release
>
> can SSE-S3 with PutBucketEncryption satisfy your use case?
>
> [1] 
> https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
> [2] https://github.com/ceph/ceph/pull/44494
>
> On Tue, Feb 8, 2022 at 10:44 AM David Orman  wrote:
>
>
> Is RGW encryption for all objects at rest still testing only, and if not,
> which version is it considered stable in?:
>
> https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only
>
> David
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>



[ceph-users] R release naming

2022-02-08 Thread Josh Durgin
Hi folks,

As we near the end of the Quincy cycle, it's time to choose a name for
the next release.

This etherpad began a while ago, so there are some votes already; however,
we wanted to open it up for anyone who hasn't voted yet. Add
your +1 to the name you prefer here, or add a new option:

https://pad.ceph.com/p/r

Josh



[ceph-users] Monitoring slow ops

2022-02-08 Thread Trey Palmer
Hi all,

We have found that RGW access problems on our clusters almost always
coincide with slow ops in "ceph -s".

Is there any good way to monitor and alert on slow ops from prometheus
metrics?

We are running Nautilus but I'd be interested in any changes that might
help in newer versions, as well.

Thanks,

Trey Palmer


[ceph-users] Re: Monitoring slow ops

2022-02-08 Thread Konstantin Shalygin
Hi,

> On 9 Feb 2022, at 09:03, Benoît Knecht  wrote:
> 
> I don't remember in which Ceph release it was introduced, but on Pacific
> there's a metric called `ceph_healthcheck_slow_ops`.

At least in Nautilus this metric exists
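
An untested sketch of how I'd use it (mgr host and port are examples; 9283 is the 
default port of the mgr prometheus module): 

# quick manual check that the metric is exposed 
curl -s http://ceph-mgr-host:9283/metrics | grep ceph_healthcheck_slow_ops 
# a prometheus alert can then simply fire on: ceph_healthcheck_slow_ops > 0 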


k