[ceph-users] Re: ceph mds dump tree - root inode is not in cache

2022-08-04 Thread 胡 玮文
Hi Frank,

I have not experienced this before. Maybe mds.tceph-03 is in standby state? 
Could you show the output of “ceph fs status”?

You can also try “ceph tell mds.0 …” and let ceph find the correct daemon for 
you.

You may also try dumping “~mds0/stray0”.
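
Something along these lines should work (a sketch only; rank 0, the depth
argument and the counter name are assumptions based on a single-active-MDS
setup, and either "~mdsdir/stray0" or "~mds0/stray0" may be the form your
release accepts):

  ceph fs status
  ceph tell mds.0 dump tree '~mdsdir/stray0' 3
  ceph tell mds.0 perf dump mds_cache

The num_strays counter in the mds_cache section should roughly match the
number of stray entries you expect to find.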

Weiwen Hu

> On 4 Aug 2022, at 23:22, Frank Schilder  wrote:
> 
> Hi all,
> 
> I'm stuck with a very annoying problem with a ceph octopus test cluster 
> (latest stable version). I need to investigate the contents of the MDS stray 
> buckets and something like this should work:
> 
> [root@ceph-adm:tceph-03 ~]# ceph daemon mds.tceph-03 dump tree '~mdsdir' 3
> [root@ceph-adm:tceph-03 ~]# ceph tell mds.tceph-03 dump tree '~mdsdir/stray0'
> 2022-08-04T16:57:54.010+0200 7f3475ffb700  0 client.371437 ms_handle_reset on 
> v2:10.41.24.15:6812/2903519715
> 2022-08-04T16:57:54.052+0200 7f3476ffd700  0 client.371443 ms_handle_reset on 
> v2:10.41.24.15:6812/2903519715
> root inode is not in cache
> 
> However, I either get nothing or an error message. Whatever I try, I cannot 
> figure out how to pull the root inode into the MDS cache - if this is even 
> the problem here. I also don't understand why the annoying ms_handle_reset 
> messages are there. I found the second command in a script:
> 
> Code line: 
> https://gist.github.com/huww98/91cbff0782ad4f6673dcffccce731c05#file-cephfs-reintegrate-conda-stray-py-L11
> 
> that came up in this conversation: 
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/4TDASTSWF4UIURKUN2P7PGZZ3V5SCCEE/
> 
> The only place I can find "root inode is not in cache" is 
> https://tracker.ceph.com/issues/53597#note-14, where it says that the above 
> commands should return the tree. I have about 1 million stray entries and 
> they must be somewhere. mds.tceph-03 is the only active MDS.
> 
> Can someone help me out here?
> 
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm old spec Feature `crush_device_class` is not supported

2022-08-04 Thread David Orman
https://github.com/ceph/ceph/pull/46480 - you can see the backports/dates
there.

Perhaps it isn't in the version you're running?
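
To confirm which build the daemons are actually running, and therefore
whether the backport has landed there:

  ceph versions
  ceph orch upgrade status

If the spec field really isn't supported in 17.2.0, a possible stopgap
(assuming the OSDs are otherwise created as intended) is to set the class
afterwards with "ceph osd crush rm-device-class osd.<id>" followed by
"ceph osd crush set-device-class nvme osd.<id>".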

On Thu, Aug 4, 2022 at 7:51 AM Kenneth Waegeman 
wrote:

> Hi all,
>
> I’m trying to deploy this spec:
>
> spec:
>   data_devices:
> model: Dell Ent NVMe AGN MU U.2 6.4TB
> rotational: 0
>   encrypted: true
>   osds_per_device: 4
>   crush_device_class: nvme
> placement:
>   host_pattern: 'ceph30[1-3]'
> service_id: nvme_22_drive_group
> service_type: osd
>
>
> But it fails:
>
> ceph orch apply -i /etc/ceph/orch_osd.yaml --dry-run
> Error EINVAL: Failed to validate OSD spec "nvme_22_drive_group": Feature
> `crush_device_class` is not supported
>
> It’s in the docs
> https://docs.ceph.com/en/quincy/cephadm/services/osd/#ceph.deployment.drive_group.DriveGroupSpec.crush_device_class,
> and it’s even in the Pacific docs as well. I’m running Quincy 17.2.0.
>
> Is this option missing somehow?
>
> Thanks!!
>
> Kenneth
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: RGW Bucket Notifications and MultiPart Uploads

2022-08-04 Thread Yuval Lifshitz
Hi Mark,
It is always good to move forward with the ceph versions :-) However, this
was also backported to pacific [1].
So, if you want to wait for the next pacific release, the fix should be
there.

Yuval

[1] https://github.com/ceph/ceph/pull/47175
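
In the meantime, consumers can filter on the event name and only act on
finished writes. A minimal sketch, assuming the S3-style event records RGW
emits (Records[].eventName and .s3.object.key; verify the field names
against your actual payload, and event.json is just a placeholder file):

  jq -r '.Records[]
    | select(.eventName | test("ObjectCreated:(Put|CompleteMultipartUpload)$"))
    | .s3.object.key' event.json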


On Thu, Jul 21, 2022 at 11:10 PM Mark Selby  wrote:

> “Is there a usecase for sending a notification when the upload starts?” –
> Not for me. Only having to watch for ObjectCreated:CompleteMultipartUpload
> and ObjectCreated:Put works for me. Quincy here I come. Thanks!
>
>
>
>
>
> --
>
> Mark Selby
>
> Sr Linux Administrator, The Voleon Group
>
> mse...@voleon.com
>
>
>
>  This email is subject to important conditions and disclosures that are
> listed on this web page: https://voleon.com/disclaimer/.
>
>
>
>
>
> *From: *Yuval Lifshitz 
> *Date: *Thursday, July 21, 2022 at 12:21 AM
> *To: *Mark Selby 
> *Cc: *"d...@redhat.com" , "ceph-users@ceph.io" <
> ceph-users@ceph.io>
> *Subject: *Re: [ceph-users] Re: [EXTERNAL] Re: RGW Bucket Notifications
> and MultiPart Uploads
>
>
>
> Hi Mark,
>
> Starting from quincy, we send 1 notification,
> "ObjectCreated:CompleteMultipartUpload", when the upload is complete. See:
> https://docs.ceph.com/en/quincy/radosgw/s3-notification-compatibility/
> 
>
> We don't send the "ObjectCreated:Post" notification when the upload starts
> as it will be confusing with other objects "POST" uploads.
>
> Is there a usecase for sending a notification when the upload starts?
>
>
>
> Yuval
>
>
>
>
>
> Yuval
>
>
>
> On Thu, Jul 21, 2022 at 12:53 AM Mark Selby  wrote:
>
> I have not tested with Quincy/17.x yet so I do not know which
> notifications are sent for Multipart uploads in this release set.
>
> I know that for Pacific.16.x I needed to add some code/logic to only act
> on notifications that represented the end state of an Object creation.
>
> My tests show that while a multipart upload is in progress, a HEAD on the
> object before the final part is uploaded returns a 200 with the size the
> object will eventually have. The multiple notifications that occur during
> the multipart upload have their size set to the chunk being uploaded, not
> the total size.
>
> In order to get the behavior I wanted, i.e. act on an object only after it
> has been fully uploaded, I had to code the following:
>
> - get a notification and read its size
> - HEAD the object in RGW and get its size
> - if the sizes do not match, do nothing
> - when the size in the notification matches the size from the HEAD request,
> the event type is ObjectCreated:CompleteMultipartUpload and we know the
> upload is complete.
>
> This is a bunch of extra code and round trips that I wish I did not have
> to make.
>
> Hopefully Quincy only sends 2 notifications for a multipart upload: (1) the
> initial Post and (2) the final Put.
>
>
>
> --
>
>
> Mark Selby
> Sr Linux Administrator, The Voleon Group
> mse...@voleon.com
>
>  This email is subject to important conditions and disclosures that are
> listed on this web page: https://voleon.com/disclaimer/.
>
>
> On 7/20/22, 5:57 AM, "Daniel Gryniewicz"  wrote:
>
> Seems like the notification for a multipart upload should look
> different
> to a normal upload?
>
> Daniel
>
> On 7/20/22 08:53, Yehuda Sadeh-Weinraub wrote:
> > Can maybe leverage one of the other calls to check for upload
> completion:
> > list multipart uploads and/or list parts. The latter should work if
> you
> > have the upload id at hand.
> >
> > Yehuda
> >
> > On Wed, Jul 20, 2022, 8:40 AM Casey Bodley 
> wrote:
> >
> >> On Wed, Jul 20, 2022 at 12:57 AM Yuval Lifshitz <
> ylifs...@redhat.com>
> >> wrote:
> >>>
> >>> yes, that would work. you would get a "404" until the object is
> fully
> >>> uploaded.
> >>
> >> just note that you won't always get 404 before multipart complete,
> >> because multipart uploads can overwrite existing objects
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: OSDs crashing/flapping

2022-08-04 Thread Igor Fedotov

Hi Torkil,

it looks like you're facing a pretty well-known problem with RocksDB 
performance degradation caused by bulk data removal. This has been 
discussed multiple times on this mailing list.


And here is one of the relevant trackers: 
https://tracker.ceph.com/issues/40741


To eliminate the effect you might want to do a manual DB compaction using 
ceph-kvstore-tool for all your OSDs. This will cure the current degraded OSD 
state but unfortunately doesn't protect against similar cases in the future.
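
For a cephadm-managed OSD this boils down to roughly the following (a
sketch; osd.102 stands for whichever OSD is affected, the container
invocation depends on your deployment, and the OSD must be stopped for the
offline compaction):

  ceph orch daemon stop osd.102
  cephadm shell --name osd.102 -- ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-102 compact
  ceph orch daemon start osd.102

There is also an online variant, "ceph tell osd.<id> compact", which avoids
stopping the OSD at the cost of extra load while it runs.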



Thanks,

Igor

On 8/4/2022 10:17 AM, Torkil Svensgaard wrote:

Hi

We have a lot of OSDs flapping during recovery and eventually they 
don't come up again until kicked with "ceph orch daemon restart osd.x".


ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)


6 hosts connected by 2 x 10 GB. Most data in EC 2+2 rbd pool.

"
# ceph -s
  cluster:
    id: 3b7736c6-00e4-11ec-a3c5-3cecef467984
    health: HEALTH_WARN
    2 host(s) running different kernel versions
    noscrub,nodeep-scrub,nosnaptrim flag(s) set
    Degraded data redundancy: 95909/1023542135 objects 
degraded (0.009%), 10 pgs degraded, 6 pgs undersized

    484 pgs not deep-scrubbed in time
    725 pgs not scrubbed in time
    11 daemons have recently crashed
    4 slow ops, oldest one blocked for 178 sec, daemons 
[osd.13,osd.15,osd.19,osd.46,osd.50] have slow ops.


  services:
    mon:    5 daemons, quorum 
test-ceph-03,test-ceph-04,dcn-ceph-03,dcn-ceph-02,dcn-ceph-01 (age 51s)
    mgr:    dcn-ceph-01.dzercj(active, since 20h), standbys: 
dcn-ceph-03.lrhaxo

    mds:    1/1 daemons up, 2 standby
    osd:    118 osds: 118 up (since 3m), 118 in (since 86m); 137 
remapped pgs

    flags noscrub,nodeep-scrub,nosnaptrim
    rbd-mirror: 2 daemons active (2 hosts)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 2737 pgs
    objects: 257.30M objects, 334 TiB
    usage:   673 TiB used, 680 TiB / 1.3 PiB avail
    pgs: 95909/1023542135 objects degraded (0.009%)
 7498287/1023542135 objects misplaced (0.733%)
 2505 active+clean
 98   active+remapped+backfilling
 85   active+clean+snaptrim_wait
 32   active+remapped+backfill_wait
 6    active+clean+laggy
 5    active+undersized+degraded+remapped+backfilling
 4    active+recovering+degraded
 1    active+recovering+degraded+remapped
 1    active+undersized+remapped+backfilling

  io:
    client:   45 KiB/s rd, 1.3 MiB/s wr, 52 op/s rd, 91 op/s wr
    recovery: 1.2 GiB/s, 467 objects/s

  progress:
    Global Recovery Event (20h)
  [==..] (remaining: 65m)
"

Crash info for one OSD:

"
2022-08-03T10:34:50.179+ 7fdedd02f700 -1 *** Caught signal 
(Aborted) **

 in thread 7fdedd02f700 thread_name:tp_osd_tp

 ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) 
pacific (stable)

 1: /lib64/libpthread.so.0(+0x12c20) [0x7fdf00a78c20]
 2: pread64()
 3: (KernelDevice::read_random(unsigned long, unsigned long, char*, 
bool)+0x40d) [0x55701b4c0f0d]
 4: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned 
long, char*)+0x60d) [0x55701b05ee7d]
 5: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, 
rocksdb::Slice*, char*) const+0x24) [0x55701b08e6d4]
 6: (rocksdb::LegacyRandomAccessFileWrapper::Read(unsigned long, 
unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, 
rocksdb::IODebugContext*) const+0x26) [0x55701b529396]
 7: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned 
long, rocksdb::Slice*, char*, bool) const+0xdc7) [0x55701b745267]

 8: (rocksdb::BlockFetcher::ReadBlockContents()+0x4b5) [0x55701b69fa45]
 9: (rocksdb::Status 
rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache(rocksdb::FilePrefetchBuffer*, 
rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, 
rocksdb::UncompressionDict const&, 
rocksdb::CachableEntry*, 
rocksdb::BlockType, rocksdb::GetContext*, 
rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*) 
const+0x919) [0x55701b691ba9]
 10: (rocksdb::Status 
rocksdb::BlockBasedTable::RetrieveBlock(rocksdb::FilePrefetchBuffer*, 
rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, 
rocksdb::UncompressionDict const&, 
rocksdb::CachableEntry*, 
rocksdb::BlockType, rocksdb::GetContext*, 
rocksdb::BlockCacheLookupContext*, bool, bool) const+0x286) 
[0x55701b691f86]
 11: 
(rocksdb::FilterBlockReaderCommon::ReadFilterBlock(rocksdb::BlockBasedTable 
const*, rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, 
bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, 
rocksdb::CachableEntry*)+0xf1) 
[0x55701b760891]
 12: 
(rocksdb::FilterBlockReaderCommon::GetOrReadFilterBlock(bool, 
rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, 
rocksdb::CachableEntry*) const+0xfe) 
[0x55701b760b5e]
 13: (rocksdb::FullFilterBlockReader::MayMatch(rocksdb::Slice

[ceph-users] Re: OSDs crashing/flapping

2022-08-04 Thread Torkil Svensgaard



On 8/4/22 09:17, Torkil Svensgaard wrote:

Hi

We have a lot of OSDs flapping during recovery and eventually they don't 
come up again until kicked with "ceph orch daemon restart osd.x".


This is the end of the log for one OSD going down for good:

"
2022-08-04T09:57:31.752+ 7f3812cb2700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 
0x7f37f5a67700' had timed out after 15.00954s
2022-08-04T09:57:35.520+ 7f37f5a67700  1 heartbeat_map clear_timeout 
'OSD::osd_op_tp thread 0x7f37f5a67700' had timed out after 15.00954s
2022-08-04T09:57:35.520+ 7f37f5a67700  0 
bluestore(/var/lib/ceph/osd/ceph-102) log_latency slow operation 
observed for submit_transact, latency = 63.815132141s
2022-08-04T09:57:35.531+ 7f3806c9a700  0 
bluestore(/var/lib/ceph/osd/ceph-102) log_latency_fn slow operation 
observed for _txc_committed_kv, latency = 63.825824738s, txc = 
0x55f84d31d500
2022-08-04T09:57:35.554+ 7f3802c92700  0 log_channel(cluster) log 
[WRN] : Monitor daemon marked osd.102 down, but it is still running
2022-08-04T09:57:35.554+ 7f3802c92700  0 log_channel(cluster) log 
[DBG] : map e138671 wrongly marked me down at e138668
2022-08-04T09:57:35.554+ 7f3802c92700 -1 osd.102 138671 
_committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 
600.00 seconds, shutting down
2022-08-04T09:57:35.558+ 7f3802c92700  0 osd.102 138671 
_committed_osd_maps shutdown OSD via async signal
2022-08-04T09:57:35.558+ 7f38114af700 -1 received  signal: Interrupt 
from Kernel ( Could be generated by pthread_kill(), raise(), abort(), 
alarm() ) UID: 0
2022-08-04T09:57:35.558+ 7f38114af700 -1 osd.102 138671 *** Got 
signal Interrupt ***
2022-08-04T09:57:35.558+ 7f38114af700 -1 osd.102 138671 *** 
Immediate shutdown (osd_fast_shutdown=true) ***

"

Mvh.

Torkil


ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)


6 hosts connected by 2 x 10 GB. Most data in EC 2+2 rbd pool.

"
# ceph -s
   cluster:
     id: 3b7736c6-00e4-11ec-a3c5-3cecef467984
     health: HEALTH_WARN
     2 host(s) running different kernel versions
     noscrub,nodeep-scrub,nosnaptrim flag(s) set
     Degraded data redundancy: 95909/1023542135 objects degraded 
(0.009%), 10 pgs degraded, 6 pgs undersized

     484 pgs not deep-scrubbed in time
     725 pgs not scrubbed in time
     11 daemons have recently crashed
     4 slow ops, oldest one blocked for 178 sec, daemons 
[osd.13,osd.15,osd.19,osd.46,osd.50] have slow ops.


   services:
     mon:    5 daemons, quorum 
test-ceph-03,test-ceph-04,dcn-ceph-03,dcn-ceph-02,dcn-ceph-01 (age 51s)
     mgr:    dcn-ceph-01.dzercj(active, since 20h), standbys: 
dcn-ceph-03.lrhaxo

     mds:    1/1 daemons up, 2 standby
     osd:    118 osds: 118 up (since 3m), 118 in (since 86m); 137 
remapped pgs

     flags noscrub,nodeep-scrub,nosnaptrim
     rbd-mirror: 2 daemons active (2 hosts)

   data:
     volumes: 1/1 healthy
     pools:   9 pools, 2737 pgs
     objects: 257.30M objects, 334 TiB
     usage:   673 TiB used, 680 TiB / 1.3 PiB avail
     pgs: 95909/1023542135 objects degraded (0.009%)
  7498287/1023542135 objects misplaced (0.733%)
  2505 active+clean
  98   active+remapped+backfilling
  85   active+clean+snaptrim_wait
  32   active+remapped+backfill_wait
  6    active+clean+laggy
  5    active+undersized+degraded+remapped+backfilling
  4    active+recovering+degraded
  1    active+recovering+degraded+remapped
  1    active+undersized+remapped+backfilling

   io:
     client:   45 KiB/s rd, 1.3 MiB/s wr, 52 op/s rd, 91 op/s wr
     recovery: 1.2 GiB/s, 467 objects/s

   progress:
     Global Recovery Event (20h)
   [==..] (remaining: 65m)
"

Crash info for one OSD:

"
2022-08-03T10:34:50.179+ 7fdedd02f700 -1 *** Caught signal (Aborted) **
  in thread 7fdedd02f700 thread_name:tp_osd_tp

  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)

  1: /lib64/libpthread.so.0(+0x12c20) [0x7fdf00a78c20]
  2: pread64()
  3: (KernelDevice::read_random(unsigned long, unsigned long, char*, 
bool)+0x40d) [0x55701b4c0f0d]
  4: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned 
long, char*)+0x60d) [0x55701b05ee7d]
  5: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, 
rocksdb::Slice*, char*) const+0x24) [0x55701b08e6d4]
  6: (rocksdb::LegacyRandomAccessFileWrapper::Read(unsigned long, 
unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, 
rocksdb::IODebugContext*) const+0x26) [0x55701b529396]
  7: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned 
long, rocksdb::Slice*, char*, bool) const+0xdc7) [0x55701b745267]

  8: (rocksdb::BlockFetcher::ReadBlockContents()+0x4b5) [0x55701b69fa45]
  9: (rocksdb::Status 
rocksdb::BlockBasedTable::MaybeRead

[ceph-users] Re: Adding new drives to ceph with ssd DB+WAL

2022-08-04 Thread Sven Kieske
On Mo, 2022-08-01 at 09:11 +0200, Robert Sander wrote:
> You cannot use "cephadm ceph-volume lvm create" in a cephadm
> orchestrated cluster because it will not create the correct
> container systemd units.

is there an open bug report about this behaviour?

I would assume cephadm needs to be fixed to make this command work, and
that the command should be disabled until it does, so users will not
do things that might potentially harm their clusters?
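
For reference, the orchestrator-native way to add such drives is an OSD
service spec with separate data_devices and db_devices selectors, applied
through the orchestrator instead of ceph-volume directly. A sketch (the
service_id, the device filters and the file name are placeholders):

  spec:
    data_devices:
      rotational: 1
    db_devices:
      rotational: 0
  placement:
    host_pattern: '*'
  service_id: hdd_with_ssd_db
  service_type: osd

applied with "ceph orch apply -i osd_spec.yaml --dry-run" to preview what
would be created before committing.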

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske
Systementwickler / systems engineer
 
 
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp
 
Tel.: 05772 / 293-900
Fax: 05772 / 293-333
 
https://www.mittwald.de
 
Managing directors: Robert Meyer, Florian Jürgens
 
Tax no.: 331/5721/1033, VAT ID: DE814773217, HRA 6640, AG Bad Oeynhausen
General partner: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Information on data processing in the course of our business activities 
pursuant to Art. 13-14 GDPR is available at www.mittwald.de/ds.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSDs crashing/flapping

2022-08-04 Thread Torkil Svensgaard

Hi

We have a lot of OSDs flapping during recovery and eventually they don't 
come up again until kicked with "ceph orch daemon restart osd.x".


ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)


6 hosts connected by 2 x 10 GB. Most data in EC 2+2 rbd pool.

"
# ceph -s
  cluster:
id: 3b7736c6-00e4-11ec-a3c5-3cecef467984
health: HEALTH_WARN
2 host(s) running different kernel versions
noscrub,nodeep-scrub,nosnaptrim flag(s) set
Degraded data redundancy: 95909/1023542135 objects degraded 
(0.009%), 10 pgs degraded, 6 pgs undersized

484 pgs not deep-scrubbed in time
725 pgs not scrubbed in time
11 daemons have recently crashed
4 slow ops, oldest one blocked for 178 sec, daemons 
[osd.13,osd.15,osd.19,osd.46,osd.50] have slow ops.


  services:
mon:5 daemons, quorum 
test-ceph-03,test-ceph-04,dcn-ceph-03,dcn-ceph-02,dcn-ceph-01 (age 51s)
mgr:dcn-ceph-01.dzercj(active, since 20h), standbys: 
dcn-ceph-03.lrhaxo

mds:1/1 daemons up, 2 standby
osd:118 osds: 118 up (since 3m), 118 in (since 86m); 137 
remapped pgs

flags noscrub,nodeep-scrub,nosnaptrim
rbd-mirror: 2 daemons active (2 hosts)

  data:
volumes: 1/1 healthy
pools:   9 pools, 2737 pgs
objects: 257.30M objects, 334 TiB
usage:   673 TiB used, 680 TiB / 1.3 PiB avail
pgs: 95909/1023542135 objects degraded (0.009%)
 7498287/1023542135 objects misplaced (0.733%)
 2505 active+clean
 98   active+remapped+backfilling
 85   active+clean+snaptrim_wait
 32   active+remapped+backfill_wait
 6active+clean+laggy
 5active+undersized+degraded+remapped+backfilling
 4active+recovering+degraded
 1active+recovering+degraded+remapped
 1active+undersized+remapped+backfilling

  io:
client:   45 KiB/s rd, 1.3 MiB/s wr, 52 op/s rd, 91 op/s wr
recovery: 1.2 GiB/s, 467 objects/s

  progress:
Global Recovery Event (20h)
  [==..] (remaining: 65m)
"

Crash info for one OSD:

"
2022-08-03T10:34:50.179+ 7fdedd02f700 -1 *** Caught signal (Aborted) **
 in thread 7fdedd02f700 thread_name:tp_osd_tp

 ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)

 1: /lib64/libpthread.so.0(+0x12c20) [0x7fdf00a78c20]
 2: pread64()
 3: (KernelDevice::read_random(unsigned long, unsigned long, char*, 
bool)+0x40d) [0x55701b4c0f0d]
 4: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned 
long, char*)+0x60d) [0x55701b05ee7d]
 5: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, 
rocksdb::Slice*, char*) const+0x24) [0x55701b08e6d4]
 6: (rocksdb::LegacyRandomAccessFileWrapper::Read(unsigned long, 
unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, 
rocksdb::IODebugContext*) const+0x26) [0x55701b529396]
 7: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned 
long, rocksdb::Slice*, char*, bool) const+0xdc7) [0x55701b745267]

 8: (rocksdb::BlockFetcher::ReadBlockContents()+0x4b5) [0x55701b69fa45]
 9: (rocksdb::Status 
rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache(rocksdb::FilePrefetchBuffer*, 
rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, 
rocksdb::UncompressionDict const&, 
rocksdb::CachableEntry*, 
rocksdb::BlockType, rocksdb::GetContext*, 
rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*) const+0x919) 
[0x55701b691ba9]
 10: (rocksdb::Status 
rocksdb::BlockBasedTable::RetrieveBlock(rocksdb::FilePrefetchBuffer*, 
rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, 
rocksdb::UncompressionDict const&, 
rocksdb::CachableEntry*, 
rocksdb::BlockType, rocksdb::GetContext*, 
rocksdb::BlockCacheLookupContext*, bool, bool) const+0x286) [0x55701b691f86]
 11: 
(rocksdb::FilterBlockReaderCommon::ReadFilterBlock(rocksdb::BlockBasedTable 
const*, rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, bool, 
rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, 
rocksdb::CachableEntry*)+0xf1) 
[0x55701b760891]
 12: 
(rocksdb::FilterBlockReaderCommon::GetOrReadFilterBlock(bool, 
rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, 
rocksdb::CachableEntry*) const+0xfe) 
[0x55701b760b5e]
 13: (rocksdb::FullFilterBlockReader::MayMatch(rocksdb::Slice const&, 
bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*) 
const+0x43) [0x55701b69aae3]
 14: 
(rocksdb::BlockBasedTable::FullFilterKeyMayMatch(rocksdb::ReadOptions 
const&, rocksdb::FilterBlockReader*, rocksdb::Slice const&, bool, 
rocksdb::SliceTransform const*, rocksdb::GetContext*, 
rocksdb::BlockCacheLookupContext*) const+0x9e) [0x55701b68118e]
 15: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, 
rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform 
const*, bool)+0x180) [0x55701b6829a0]
 16: (rocksdb::TableCa