[ceph-users] Getting `InvalidInput` when trying to create a notification topic with Kafka endpoint

2021-04-21 Thread Szabo, Istvan (Agoda)
Hi Ceph Users,
Here is the latest request I tried, but it is still not working:

curl -v -H 'Date: Tue, 20 Apr 2021 16:05:47 +' -H 'Authorization: AWS 
:' -L -H 'content-type: application/x-www-form-urlencoded' 
-k -X POST https://servername -d 
Action=CreateTopic&Name=test-ceph-event-replication&Attributes.entry.8.key=push-endpoint&Attributes.entry.8.value=kafka://:@servername2:9093&Attributes.entry.5.key=use-ssl&Attributes.entry.5.value=true

And the response I get is still InvalidInput:
InvalidInputtx007993081-00607efbdd-1c7e96b-hkg1c7e96b-hkg-data
Can someone please help with this?
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph orch upgrade fails when pulling container image

2021-04-21 Thread Robert Sander
Hi,

# docker pull ceph/ceph:v16.2.1
Error response from daemon: toomanyrequests: You have reached your pull
rate limit. You may increase the limit by authenticating and upgrading:
https://www.docker.com/increase-rate-limit

How do I update a Ceph cluster in this situation?

Regards
-- 
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade fails when pulling container image

2021-04-21 Thread Robert Sander
Hi,

Am 21.04.21 um 10:14 schrieb Robert Sander:

> How do I update a Ceph cluster in this situation?

I learned that I need to create an account on the website hub.docker.com
to be able to download Ceph container images in the future.

With the credentials I need to run "docker login" on each node of the
Ceph cluster.
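
For the archives, roughly what that looks like (a sketch only -- the registry
URL and the cephadm registry-login step are assumptions, so check them against
your cephadm version):

  docker login -u <hub-user>   # run on every host; prompts for the password
  ceph cephadm registry-login registry.hub.docker.com <hub-user> <hub-password>
  ceph orch upgrade start --ceph-version 16.2.1

The registry-login variant stores the credentials cluster-wide so cephadm can
pull images on each host without a manual docker login.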

Shouldn't this be mentioned in the documentation?

Regards
-- 
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade fails when pulling container image

2021-04-21 Thread Julian Fölsch

Hello,

We have circumvented the need to create an account by using Sonatype 
Nexus to proxy Docker Hub.

This also allowed us to keep our Ceph hosts disconnected from the internet.
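
In case it helps, a sketch of how a proxy like that can be wired in (the Nexus
hostname/port are placeholders and the exact config option may differ by
release):

  ceph config set global container_image nexus.example.com:8443/ceph/ceph:v16.2.1
  ceph orch upgrade start --image nexus.example.com:8443/ceph/ceph:v16.2.1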

Kind regards,
Julian Fölsch

Am 21.04.21 um 10:35 schrieb Robert Sander:

Hi,

Am 21.04.21 um 10:14 schrieb Robert Sander:


How do I update a Ceph cluster in this situation?

I learned that I need to create an account on the website hub.docker.com
to be able to download Ceph container images in the future.

With the credentials I need to run "docker login" on each node of the
Ceph cluster.

Shouldn't this be mentioned in the documentation?

Regards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Julian Fölsch

Arbeitsgemeinschaft Dresdner Studentennetz (AG DSN)

Telefon: +49 351 271816 69
Mobil: +49 152 22915871
Fax: +49 351 46469685
Email: julian.foel...@agdsn.de
Web: https://agdsn.de

Studierendenrat der TU Dresden
Helmholtzstr. 10
01069 Dresden
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HBA vs caching Raid controller

2021-04-21 Thread Marc
> > This is what I have when I query prometheus, most hdd's are still sata
> 5400rpm, there are also some ssd's. I also did not optimize cpu
> frequency settings. (forget about the instance=c03, that is just because
> the data comes from mgr c03, these drives are on different hosts)
> >
> > ceph_osd_apply_latency_ms
> >
> > ceph_osd_apply_latency_ms{ceph_daemon="osd.12", instance="c03",
> job="ceph"}   42
> > ...
> > ceph_osd_apply_latency_ms{ceph_daemon="osd.19", instance="c03",
> job="ceph"}   1
> 
> I assume this looks somewhat normal, with a bit of variance due to
> access.
> 
> > avg (ceph_osd_apply_latency_ms)
> > 9.336
> 
> I see something similar, around 9ms average latency for HDD based osds,
> best case average around 3ms.
> 
> > So I guess it is possible for you to get lower values on the lsi hba
> 
> Can you let me know which exact model you have?

[~]# sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

 Adapter Selected is a LSI SAS: SAS2308_2(D1)

 Controller Number  : 0
 Controller : SAS2308_2(D1)
 PCI Address: 00:04:00:00
 SAS Address: 500605b-0-05a6-c49e
 NVDATA Version (Default)   : 14.01.00.06
 NVDATA Version (Persistent): 14.01.00.06
 Firmware Product ID: 0x2214 (IT)
 Firmware Version   : 20.00.07.00
 NVDATA Vendor  : LSI
 NVDATA Product ID  : SAS9207-8i
 BIOS Version   : 07.39.02.00
 UEFI BSD Version   : N/A
 FCODE Version  : N/A
 Board Name : SAS9207-8i
 Board Assembly : N/A
 Board Tracer Number: N/A

> 
> > Maybe you can tune read-ahead on the LSI with something like this.
> > echo 8192 > /sys/block/$line/queue/read_ahead_kb
> > echo 1024 > /sys/block/$line/queue/nr_requests
> 
> I tried both of them, even going up to 16MB read ahead cache, but
> besides a short burst when changing the values, the average stays +/-
> the same on that host.
> 
> I also checked cpu speed (same as the rest), io scheduler (using "none"
> really drives the disks crazy). What I observed is that the avq value in
> atop is lower than on the other servers, which are around 15. This
> server is more in the range 1-3.
> 
> > Also check for pci-e 3 those have higher bus speeds.
> 
> True, even though pci-e 2, x8 should be able to deliver 4 GB/s, if I am
> not mistaken.
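
For reference, a minimal sketch of applying the quoted read-ahead/nr_requests
tuning to every rotational disk (the device glob is an assumption -- restrict
it to the actual OSD data disks on your hosts):

  for dev in /sys/block/sd*; do
      if [ "$(cat "$dev/queue/rotational")" = "1" ]; then
          echo 8192 > "$dev/queue/read_ahead_kb"
          echo 1024 > "$dev/queue/nr_requests"
      fi
  done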
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Dan van der Ster
Do we have a tracker for this?

We should ideally be able to remove that final collection_list from
the optimized pg removal.
It can take a really long time and lead to osd flapping:

2021-04-21 15:23:37.003 7f51c273c700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
2021-04-21 15:23:41.595 7f51a3e81700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
observed for _collection_list, latency = 67.7234s, lat = 67s cid
=10.14aes4_head start GHMAX end GHMAX max 30
2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004
pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN
(bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c
41331/41331 les/c/f 41332/42758/0 41330/42759/33461)
[171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING
pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[4#10:7528head#]
2021-04-21 15:23:50.061 7f51a3e81700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation
observed for submit_transact, latency = 8.46584s
2021-04-21 15:23:50.062 7f51a3e81700  1 heartbeat_map reset_timeout
'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
2021-04-21 15:23:50.115 7f51b6ca1700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 8.51916s, txc =
0x5573928a7340
2021-04-21 15:23:50.473 7f51b2498700  0 log_channel(cluster) log [WRN]
: Monitor daemon marked osd.941 down, but it is still running

-- dan

On Thu, Apr 15, 2021 at 10:32 AM Dan van der Ster  wrote:
>
> Thanks Igor and Neha for the quick responses.
>
> I posted an osd log with debug_osd 10 and debug_bluestore 20:
> ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c
>
> Hope that helps,
>
> Dan
>
> On Thu, Apr 15, 2021 at 1:27 AM Neha Ojha  wrote:
> >
> > We saw this warning once in testing
> > (https://tracker.ceph.com/issues/49900#note-1), but there, the problem
> > was different, which also led to a crash. That issue has been fixed
> > but if you can provide osd logs with verbose logging, we might be able
> > to investigate further.
> >
> > Neha
> >
> > On Wed, Apr 14, 2021 at 4:14 PM Igor Fedotov  wrote:
> > >
> > > Hi Dan,
> > >
> > > Seen that once before and haven't thoroughly investigated yet but I
> > > think the new PG removal stuff just revealed this "issue". In fact it
> > > had been in the code before the patch.
> > >
> > > The warning means that new object(s) (given the object names these are
> > > apparently system objects, don't remember what's this exactly)  has been
> > > written to a PG after it was staged for removal.
> > >
> > > New PG removal properly handles that case - that was just a paranoid
> > > check for an unexpected situation which has actually triggered. Hence
> > > IMO no need to worry at this point but developers might want to validate
> > > why this is happening
> > >
> > >
> > > Thanks,
> > >
> > > Igor
> > >
> > > On 4/14/2021 10:26 PM, Dan van der Ster wrote:
> > > > Hi Igor,
> > > >
> > > > After updating to 14.2.19 and then moving some PGs around we have a
> > > > few warnings related to the new efficient PG removal code, e.g. [1].
> > > > Is that something to worry about?
> > > >
> > > > Best Regards,
> > > >
> > > > Dan
> > > >
> > > > [1]
> > > >
> > > > /var/log/ceph/ceph-osd.792.log:2021-04-14 20:34:34.353 7fb2439d4700  0
> > > > osd.792 pg_epoch: 40906 pg[10.14b2s0( v 40734'290069
> > > > (33782'287000,40734'290069] lb MIN (bitwise) local-lis/les=33990/33991
> > > > n=36272 ec=4951/4937 lis/c 33990/33716 les/c/f 33991/33747/0
> > > > 40813/40813/37166) [933,626,260,804,503,491]p933(0) r=-1 lpr=40813
> > > > DELETING pi=[33716,40813)/4 crt=40734'290069 unknown NOTIFY mbc={}]
> > > > _delete_some additional unexpected onode list (new onodes has appeared
> > > > since PG removal started[0#10:4d28head#]
> > > >
> > > > /var/log/ceph/ceph-osd.851.log:2021-04-14 18:40:13.312 7fd87bded700  0
> > > > osd.851 pg_epoch: 40671 pg[10.133fs5( v 40662'288967
> > > > (33782'285900,40662'288967] lb MIN (bitwise) local-lis/les=33786/33787
> > > > n=13 ec=4947/4937 lis/c 40498/33714 les/c/f 40499/33747/0
> > > > 40670/40670/33432) [859,199,913,329,439,79]p859(0) r=-1 lpr=40670
> > > > DELETING pi=[33714,40670)/4 crt=40662'288967 unknown NOTIFY mbc={}]
> > > > _delete_some additional unexpected onode list (new onodes has appeared
> > > > since PG removal started[5#10:fcc8head#]
> > > >
> > > > /var/log/ceph/ceph-osd.851.log:2021-04-14 20:58:14.393 7fd87adeb700  0
> > > > osd.851 pg_epoch: 40906 pg[10.2e8s3( v 40610'288991
> > > > (33782'285900,40610'288991] lb MIN (bitwise) local-lis/les=33786/33787
> > > > n=161220 ec=4937/4937 lis/c 39826/33716 les/c/f 39827/33747/0
> > > > 40617/40617/39225) [717,933,727,792,607,129]p717(0) r=-1 lpr=40617
> > > > DELETING pi=[33716,40617)/3 crt=40610'288991 unknown NOTIFY mbc=

[ceph-users] Metrics for object sizes

2021-04-21 Thread Szabo, Istvan (Agoda)
Hi,

Is there any clusterwise metric regarding object sizes?

I'd like to collect some information about what object sizes the users have
in their buckets.
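
One approximation I can think of (a sketch, not a cluster-wide metric: it
derives an average object size per bucket; the jq path and the "rgw.main"
usage category are assumptions that may differ by version, and it assumes jq
is installed):

  radosgw-admin bucket stats --bucket=<bucket-name> | \
      jq '.usage["rgw.main"] | (.size_actual / .num_objects)'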




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Konstantin Shalygin
Dan, are you referring to this issue [1]?

I have started backfilling new OSDs on 14.2.19: the pool has 2048 PGs with 7.14G
objects...
The average number of objects per PG is 3486328.



[1] https://tracker.ceph.com/issues/47044 


k


> On 21 Apr 2021, at 16:37, Dan van der Ster  wrote:
> 
> Do we have a tracker for this?
> 
> We should ideally be able to remove that final collection_list from
> the optimized pg removal.
> It can take a really long time and lead to osd flapping:
> 
> 2021-04-21 15:23:37.003 7f51c273c700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> 2021-04-21 15:23:41.595 7f51a3e81700  0
> bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> observed for _collection_list, latency = 67.7234s, lat = 67s cid
> =10.14aes4_head start GHMAX end GHMAX max 30
> 2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004
> pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN
> (bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c
> 41331/41331 les/c/f 41332/42758/0 41330/42759/33461)
> [171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING
> pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}]
> _delete_some additional unexpected onode list (new onodes has appeared
> since PG removal started[4#10:7528head#]
> 2021-04-21 15:23:50.061 7f51a3e81700  0
> bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation
> observed for submit_transact, latency = 8.46584s
> 2021-04-21 15:23:50.062 7f51a3e81700  1 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> 2021-04-21 15:23:50.115 7f51b6ca1700  0
> bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> observed for _txc_committed_kv, latency = 8.51916s, txc =
> 0x5573928a7340
> 2021-04-21 15:23:50.473 7f51b2498700  0 log_channel(cluster) log [WRN]
> : Monitor daemon marked osd.941 down, but it is still running
> 
> -- dan
> 
> On Thu, Apr 15, 2021 at 10:32 AM Dan van der Ster  wrote:
>> 
>> Thanks Igor and Neha for the quick responses.
>> 
>> I posted an osd log with debug_osd 10 and debug_bluestore 20:
>> ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c
>> 
>> Hope that helps,
>> 
>> Dan
>> 
>> On Thu, Apr 15, 2021 at 1:27 AM Neha Ojha  wrote:
>>> 
>>> We saw this warning once in testing
>>> (https://tracker.ceph.com/issues/49900#note-1), but there, the problem
>>> was different, which also led to a crash. That issue has been fixed
>>> but if you can provide osd logs with verbose logging, we might be able
>>> to investigate further.
>>> 
>>> Neha
>>> 
>>> On Wed, Apr 14, 2021 at 4:14 PM Igor Fedotov  wrote:
 
 Hi Dan,
 
 Seen that once before and haven't thoroughly investigated yet but I
 think the new PG removal stuff just revealed this "issue". In fact it
 had been in the code before the patch.
 
 The warning means that new object(s) (given the object names these are
 apparently system objects, don't remember what's this exactly)  has been
 written to a PG after it was staged for removal.
 
 New PG removal properly handles that case - that was just a paranoid
 check for an unexpected situation which has actually triggered. Hence
 IMO no need to worry at this point but developers might want to validate
 why this is happening
 
 
 Thanks,
 
 Igor
 
 On 4/14/2021 10:26 PM, Dan van der Ster wrote:
> Hi Igor,
> 
> After updating to 14.2.19 and then moving some PGs around we have a
> few warnings related to the new efficient PG removal code, e.g. [1].
> Is that something to worry about?
> 
> Best Regards,
> 
> Dan
> 
> [1]
> 
> /var/log/ceph/ceph-osd.792.log:2021-04-14 20:34:34.353 7fb2439d4700  0
> osd.792 pg_epoch: 40906 pg[10.14b2s0( v 40734'290069
> (33782'287000,40734'290069] lb MIN (bitwise) local-lis/les=33990/33991
> n=36272 ec=4951/4937 lis/c 33990/33716 les/c/f 33991/33747/0
> 40813/40813/37166) [933,626,260,804,503,491]p933(0) r=-1 lpr=40813
> DELETING pi=[33716,40813)/4 crt=40734'290069 unknown NOTIFY mbc={}]
> _delete_some additional unexpected onode list (new onodes has appeared
> since PG removal started[0#10:4d28head#]
> 
> /var/log/ceph/ceph-osd.851.log:2021-04-14 18:40:13.312 7fd87bded700  0
> osd.851 pg_epoch: 40671 pg[10.133fs5( v 40662'288967
> (33782'285900,40662'288967] lb MIN (bitwise) local-lis/les=33786/33787
> n=13 ec=4947/4937 lis/c 40498/33714 les/c/f 40499/33747/0
> 40670/40670/33432) [859,199,913,329,439,79]p859(0) r=-1 lpr=40670
> DELETING pi=[33714,40670)/4 crt=40662'288967 unknown NOTIFY mbc={}]
> _delete_some additional unexpected onode list (new onodes has appeared
> since PG removal started[5#10:fcc8head#]
> 
> /var/log/ceph/ceph-osd.851.log:2021-04-14 20:58:14.393 7fd87adeb700  0
> osd.851 pg_epoch: 40906 pg[1

[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Dan van der Ster
Yes, with the fixes in 14.2.19 PG removal is really much much much
better than before.

But on some clusters (in particular with rocksdb on the hdd) there is
still a rare osd flap at the end of the PG removal -- indicated by the
logs I shared earlier.
Our workaround to prevent that new flap is to increase
osd_heartbeat_grace (e.g. to 45).

With 3.5M objects in a PG, I suggest that you try moving one PG with
upmap and watch how it goes (especially at the end).
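
Roughly like this (a sketch -- substitute a real pgid and OSD ids from your
cluster; the grace bump is the workaround mentioned above):

  ceph config set osd osd_heartbeat_grace 45
  ceph osd pg-upmap-items <pgid> <from-osd-id> <to-osd-id>   # move a single PG, then watch the deletion on the source OSD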

-- dan

On Wed, Apr 21, 2021 at 4:01 PM Konstantin Shalygin  wrote:
>
> Dan, you about this issue [1] ?
>
> I was start to backfill new OSD's on 14.2.19: pool have 2048PG with 7.14G 
> objects...
> avg number of PG is 3486328 objects
>
>
>
> [1] https://tracker.ceph.com/issues/47044
>
> k
>
>
> On 21 Apr 2021, at 16:37, Dan van der Ster  wrote:
>
> Do we have a tracker for this?
>
> We should ideally be able to remove that final collection_list from
> the optimized pg removal.
> It can take a really long time and lead to osd flapping:
>
> 2021-04-21 15:23:37.003 7f51c273c700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> 2021-04-21 15:23:41.595 7f51a3e81700  0
> bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> observed for _collection_list, latency = 67.7234s, lat = 67s cid
> =10.14aes4_head start GHMAX end GHMAX max 30
> 2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004
> pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN
> (bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c
> 41331/41331 les/c/f 41332/42758/0 41330/42759/33461)
> [171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING
> pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}]
> _delete_some additional unexpected onode list (new onodes has appeared
> since PG removal started[4#10:7528head#]
> 2021-04-21 15:23:50.061 7f51a3e81700  0
> bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation
> observed for submit_transact, latency = 8.46584s
> 2021-04-21 15:23:50.062 7f51a3e81700  1 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> 2021-04-21 15:23:50.115 7f51b6ca1700  0
> bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> observed for _txc_committed_kv, latency = 8.51916s, txc =
> 0x5573928a7340
> 2021-04-21 15:23:50.473 7f51b2498700  0 log_channel(cluster) log [WRN]
> : Monitor daemon marked osd.941 down, but it is still running
>
> -- dan
>
> On Thu, Apr 15, 2021 at 10:32 AM Dan van der Ster  wrote:
>
>
> Thanks Igor and Neha for the quick responses.
>
> I posted an osd log with debug_osd 10 and debug_bluestore 20:
> ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c
>
> Hope that helps,
>
> Dan
>
> On Thu, Apr 15, 2021 at 1:27 AM Neha Ojha  wrote:
>
>
> We saw this warning once in testing
> (https://tracker.ceph.com/issues/49900#note-1), but there, the problem
> was different, which also led to a crash. That issue has been fixed
> but if you can provide osd logs with verbose logging, we might be able
> to investigate further.
>
> Neha
>
> On Wed, Apr 14, 2021 at 4:14 PM Igor Fedotov  wrote:
>
>
> Hi Dan,
>
> Seen that once before and haven't thoroughly investigated yet but I
> think the new PG removal stuff just revealed this "issue". In fact it
> had been in the code before the patch.
>
> The warning means that new object(s) (given the object names these are
> apparently system objects, don't remember what's this exactly)  has been
> written to a PG after it was staged for removal.
>
> New PG removal properly handles that case - that was just a paranoid
> check for an unexpected situation which has actually triggered. Hence
> IMO no need to worry at this point but developers might want to validate
> why this is happening
>
>
> Thanks,
>
> Igor
>
> On 4/14/2021 10:26 PM, Dan van der Ster wrote:
>
> Hi Igor,
>
> After updating to 14.2.19 and then moving some PGs around we have a
> few warnings related to the new efficient PG removal code, e.g. [1].
> Is that something to worry about?
>
> Best Regards,
>
> Dan
>
> [1]
>
> /var/log/ceph/ceph-osd.792.log:2021-04-14 20:34:34.353 7fb2439d4700  0
> osd.792 pg_epoch: 40906 pg[10.14b2s0( v 40734'290069
> (33782'287000,40734'290069] lb MIN (bitwise) local-lis/les=33990/33991
> n=36272 ec=4951/4937 lis/c 33990/33716 les/c/f 33991/33747/0
> 40813/40813/37166) [933,626,260,804,503,491]p933(0) r=-1 lpr=40813
> DELETING pi=[33716,40813)/4 crt=40734'290069 unknown NOTIFY mbc={}]
> _delete_some additional unexpected onode list (new onodes has appeared
> since PG removal started[0#10:4d28head#]
>
> /var/log/ceph/ceph-osd.851.log:2021-04-14 18:40:13.312 7fd87bded700  0
> osd.851 pg_epoch: 40671 pg[10.133fs5( v 40662'288967
> (33782'285900,40662'288967] lb MIN (bitwise) local-lis/les=33786/33787
> n=13 ec=4947/4937 lis/c 40498/33714 les/c/f 40499/33747/0
> 40670/40670/33432) [859,199,913,329,439,79]p859(0) r=-1 lpr=40670
> DELETING pi=[33714,40670)/4 c

[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Igor Fedotov

Hi Dan,

I recall no relevant tracker, feel free to create.

Curious if you had bluefs_buffered_io set to true when faced that?


Thanks,

Igor

On 4/21/2021 4:37 PM, Dan van der Ster wrote:

Do we have a tracker for this?

We should ideally be able to remove that final collection_list from
the optimized pg removal.
It can take a really long time and lead to osd flapping:

2021-04-21 15:23:37.003 7f51c273c700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
2021-04-21 15:23:41.595 7f51a3e81700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
observed for _collection_list, latency = 67.7234s, lat = 67s cid
=10.14aes4_head start GHMAX end GHMAX max 30
2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004
pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN
(bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c
41331/41331 les/c/f 41332/42758/0 41330/42759/33461)
[171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING
pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[4#10:7528head#]
2021-04-21 15:23:50.061 7f51a3e81700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation
observed for submit_transact, latency = 8.46584s
2021-04-21 15:23:50.062 7f51a3e81700  1 heartbeat_map reset_timeout
'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
2021-04-21 15:23:50.115 7f51b6ca1700  0
bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 8.51916s, txc =
0x5573928a7340
2021-04-21 15:23:50.473 7f51b2498700  0 log_channel(cluster) log [WRN]
: Monitor daemon marked osd.941 down, but it is still running

-- dan

On Thu, Apr 15, 2021 at 10:32 AM Dan van der Ster  wrote:

Thanks Igor and Neha for the quick responses.

I posted an osd log with debug_osd 10 and debug_bluestore 20:
ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c

Hope that helps,

Dan

On Thu, Apr 15, 2021 at 1:27 AM Neha Ojha  wrote:

We saw this warning once in testing
(https://tracker.ceph.com/issues/49900#note-1), but there, the problem
was different, which also led to a crash. That issue has been fixed
but if you can provide osd logs with verbose logging, we might be able
to investigate further.

Neha

On Wed, Apr 14, 2021 at 4:14 PM Igor Fedotov  wrote:

Hi Dan,

Seen that once before and haven't thoroughly investigated yet but I
think the new PG removal stuff just revealed this "issue". In fact it
had been in the code before the patch.

The warning means that new object(s) (given the object names these are
apparently system objects, don't remember what's this exactly)  has been
written to a PG after it was staged for removal.

New PG removal properly handles that case - that was just a paranoid
check for an unexpected situation which has actually triggered. Hence
IMO no need to worry at this point but developers might want to validate
why this is happening


Thanks,

Igor

On 4/14/2021 10:26 PM, Dan van der Ster wrote:

Hi Igor,

After updating to 14.2.19 and then moving some PGs around we have a
few warnings related to the new efficient PG removal code, e.g. [1].
Is that something to worry about?

Best Regards,

Dan

[1]

/var/log/ceph/ceph-osd.792.log:2021-04-14 20:34:34.353 7fb2439d4700  0
osd.792 pg_epoch: 40906 pg[10.14b2s0( v 40734'290069
(33782'287000,40734'290069] lb MIN (bitwise) local-lis/les=33990/33991
n=36272 ec=4951/4937 lis/c 33990/33716 les/c/f 33991/33747/0
40813/40813/37166) [933,626,260,804,503,491]p933(0) r=-1 lpr=40813
DELETING pi=[33716,40813)/4 crt=40734'290069 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[0#10:4d28head#]

/var/log/ceph/ceph-osd.851.log:2021-04-14 18:40:13.312 7fd87bded700  0
osd.851 pg_epoch: 40671 pg[10.133fs5( v 40662'288967
(33782'285900,40662'288967] lb MIN (bitwise) local-lis/les=33786/33787
n=13 ec=4947/4937 lis/c 40498/33714 les/c/f 40499/33747/0
40670/40670/33432) [859,199,913,329,439,79]p859(0) r=-1 lpr=40670
DELETING pi=[33714,40670)/4 crt=40662'288967 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[5#10:fcc8head#]

/var/log/ceph/ceph-osd.851.log:2021-04-14 20:58:14.393 7fd87adeb700  0
osd.851 pg_epoch: 40906 pg[10.2e8s3( v 40610'288991
(33782'285900,40610'288991] lb MIN (bitwise) local-lis/les=33786/33787
n=161220 ec=4937/4937 lis/c 39826/33716 les/c/f 39827/33747/0
40617/40617/39225) [717,933,727,792,607,129]p717(0) r=-1 lpr=40617
DELETING pi=[33716,40617)/3 crt=40610'288991 unknown NOTIFY mbc={}]
_delete_some additional unexpected onode list (new onodes has appeared
since PG removal started[3#10:1740head#]

/var/log/ceph/ceph-osd.883.log:2021-04-14 18:55:16.822 7f78c485d700  0
osd.883 pg_epoch: 40857 pg[7.d4( v 40804'9911289
(35835'9908201,40804

[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Konstantin Shalygin
Nope, upmap is currently impossible on these clusters 😬
due to the client library (the team is working on an update now).

ID   CLASS WEIGHT   REWEIGHT SIZE    RAW USE DATA    OMAP   META    AVAIL
%USE VAR  PGS STATUS TYPE NAME
-166   10.94385        -  11 TiB 382 GiB 317 GiB 64 KiB  66 GiB  11 TiB
3.42 1.00   -        host meta115
 768  nvme  0.91199  1.0 932 GiB  36 GiB  30 GiB  8 KiB 6.0 GiB 896 GiB 
3.85 1.13   1 up osd.768
 769  nvme  0.91199  1.0 932 GiB  22 GiB  18 GiB  4 KiB 4.0 GiB 909 GiB 
2.41 0.71   0 up osd.769
 770  nvme  0.91199  1.0 932 GiB  38 GiB  31 GiB  8 KiB 6.3 GiB 894 GiB 
4.04 1.18   2 up osd.770
 771  nvme  0.91199  1.0 932 GiB  22 GiB  18 GiB0 B 3.9 GiB 910 GiB 
2.33 0.68   0 up osd.771
 772  nvme  0.91199  1.0 932 GiB  37 GiB  30 GiB  4 KiB 6.1 GiB 895 GiB 
3.93 1.15   2 up osd.772
 773  nvme  0.91199  1.0 932 GiB  34 GiB  28 GiB  4 KiB 6.0 GiB 898 GiB 
3.65 1.07   1 up osd.773
 774  nvme  0.91199  1.0 932 GiB  32 GiB  26 GiB  8 KiB 5.4 GiB 900 GiB 
3.43 1.00   1 up osd.774
 775  nvme  0.91199  1.0 932 GiB  36 GiB  30 GiB  4 KiB 6.1 GiB 895 GiB 
3.91 1.14   2 up osd.775
 776  nvme  0.91199  1.0 932 GiB  36 GiB  30 GiB  4 KiB 6.4 GiB 895 GiB 
3.90 1.14   1 up osd.776
 777  nvme  0.91199  1.0 932 GiB  36 GiB  30 GiB  8 KiB 6.1 GiB 895 GiB 
3.89 1.14   2 up osd.777
 778  nvme  0.91199  1.0 932 GiB  32 GiB  27 GiB  8 KiB 5.5 GiB 899 GiB 
3.48 1.02   1 up osd.778
 779  nvme  0.91199  1.0 932 GiB  21 GiB  17 GiB  4 KiB 3.7 GiB 911 GiB 
2.23 0.65   0 up osd.779
   TOTAL  11 TiB 382 GiB 317 GiB 65 KiB  66 GiB  11 TiB 3.42
MIN/MAX VAR: 0.65/1.18  STDDEV: 0.66

The second PG landed... I don't see any huge spikes in the ceph_osd_op_latency metric.


k


> On 21 Apr 2021, at 17:12, Dan van der Ster  wrote:
> 
> Yes, with the fixes in 14.2.19 PG removal is really much much much
> better than before.
> 
> But on some clusters (in particular with rocksdb on the hdd) there is
> still a rare osd flap at the end of the PG removal -- indicated by the
> logs I shared earlier.
> Our workaround to prevent that new flap is to increase
> osd_heartbeat_grace (e.g. to 45).
> 
> With 3.5M objects in a PG, I suggest that you try moving one PG with
> upmap and watch how it goes (especially at the end).

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Konstantin Shalygin
Just for the record - I have this enabled for all OSDs on these clusters.
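
(Checked per daemon and set cluster-wide with something like the following --
a sketch, adjust the OSD id:

  ceph daemon osd.768 config get bluefs_buffered_io
  ceph config set osd bluefs_buffered_io true)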


k

> On 21 Apr 2021, at 17:22, Igor Fedotov  wrote:
> 
> Curious if you had bluefs_buffered_io set to true when faced that?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS replay takes forever and cephfs is down

2021-04-21 Thread Flemming Frandsen
I tried restarting an MDS server using: systemctl restart
ceph-mds@ardmore.service

This caused the standby server to enter replay state and the fs started
hanging for several minutes.

In a slight panic I restarted the other mds server, which was replaced by
the standby server and it almost immediately entered resolve state.

fs dump shows a seq number counting upwards very slowly for the replaying
MDS server; I have no idea how far it needs to count:

# ceph fs dump

dumped fsmap epoch 1030314
e1030314
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in sep
arate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   1030314
flags   12
created 2019-09-09 13:08:26.830927
modified2021-04-21 14:04:14.672440
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
min_compat_client   -1 (unspecified)
last_failure0
last_failure_osd_epoch  13610
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in sep
arate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2,10=snaprealm v2}
max_mds 2
in  0,1
up  {0=10398946,1=10404857}
failed
damaged
stopped
data_pools  [1]
metadata_pool   2
inline_data disabled
balancer
standby_count_wanted1
[mds.dalmore{0:10398946} state up:replay seq 215 addr [v2:
10.0.37.222:6800/2681188441,v1:10.0.37.222:6801/2681188441]]
[mds.cragganmore{1:10404857} state up:resolve seq 201 addr [v2:
10.0.37.221:6800/871249119,v1:10.0.37.221:6801/871249119]]


Standby daemons:

[mds.ardmore{-1:10408652} state up:standby seq 2 addr [v2:
10.0.37.223:6800/4096598841,v1:10.0.37.223:6801/4096598841]]


Earlier today we added a new OSD host with 12 new OSDs and backfilling is
proceeding as expected:

 cluster:
   id: e2007417-a346-4af7-8aa9-4ce8f0d73661
   health: HEALTH_WARN
   1 filesystem is degraded
   1 MDSs behind on trimming

 services:
   mon: 3 daemons, quorum cragganmore,dalmore,ardmore (age 5w)
   mgr: ardmore(active, since 2w), standbys: dalmore, cragganmore
   mds: cephfs:2/2 {0=dalmore=up:replay,1=cragganmore=up:resolve} 1
up:standby
   osd: 69 osds: 69 up (since 102m), 69 in (since 102m); 443 remapped pgs

   rgw: 9 daemons active (ardmore.rgw0, ardmore.rgw1, ardmore.rgw2,
cragganmore.rgw0, cragganmore.rgw1, cragganmore.rgw2, dalmore.rgw0,
dalmore.rgw1, dalmore.rgw2)

 task status:
   scrub status:
   mds.cragganmore: idle
   mds.dalmore: idle

 data:
   pools:   13 pools, 1440 pgs
   objects: 50.57M objects, 9.0 TiB
   usage:   34 TiB used, 37 TiB / 71 TiB avail
   pgs: 30195420/151707033 objects misplaced (19.904%)
997 active+clean
431 active+remapped+backfill_wait
12  active+remapped+backfilling

 io:
   client:   65 MiB/s rd, 206 KiB/s wr, 17 op/s rd, 8 op/s wr
   recovery: 5.5 MiB/s, 23 objects/s

 progress:
   Rebalancing after osd.62 marked in
 [==]
   Rebalancing after osd.67 marked in
 [===...]
   Rebalancing after osd.68 marked in
 [..]
   Rebalancing after osd.64 marked in
 [=.]
   Rebalancing after osd.60 marked in
 [..]
   Rebalancing after osd.66 marked in
 [=.]
   Rebalancing after osd.63 marked in
 [=.]
   Rebalancing after osd.61 marked in
 [==]
   Rebalancing after osd.59 marked in
 [==]
   Rebalancing after osd.58 marked in
 [..]
   Rebalancing after osd.57 marked in
 [===...]
   Rebalancing after osd.65 marked in
 [==]


It seems we're running a mix of versions:

ceph versions
{
   "mon": {
   "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9)
nautilus (stable)": 3
   },
   "mgr": {
   "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11)
nautilus (stable)": 3
   },
   "osd": {
   "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9)
nautilus (stable)": 57,
   "ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0)
nautilus (stable)": 12
   },
   "mds": {
   "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11)
nautilus (stable)": 3
   },
   "rgw": {
   "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9)
nautilus (stable)": 9
   },
   "overall": {
   "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9)
nautilus (stable)": 69,
   "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373

[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Dan van der Ster
Here's a tracker: https://tracker.ceph.com/issues/50466

bluefs_buffered_io is indeed enabled on this cluster, but I suspect it
doesn't help for this precise issue because the collection isn't
repeated fully listed any more.

-- dan

On Wed, Apr 21, 2021 at 4:22 PM Igor Fedotov  wrote:
>
> Hi Dan,
>
> I recall no relevant tracker, feel free to create.
>
> Curious if you had bluefs_buffered_io set to true when faced that?
>
>
> Thanks,
>
> Igor
>
> On 4/21/2021 4:37 PM, Dan van der Ster wrote:
> > Do we have a tracker for this?
> >
> > We should ideally be able to remove that final collection_list from
> > the optimized pg removal.
> > It can take a really long time and lead to osd flapping:
> >
> > 2021-04-21 15:23:37.003 7f51c273c700  1 heartbeat_map is_healthy
> > 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> > 2021-04-21 15:23:41.595 7f51a3e81700  0
> > bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> > observed for _collection_list, latency = 67.7234s, lat = 67s cid
> > =10.14aes4_head start GHMAX end GHMAX max 30
> > 2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004
> > pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN
> > (bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c
> > 41331/41331 les/c/f 41332/42758/0 41330/42759/33461)
> > [171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING
> > pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}]
> > _delete_some additional unexpected onode list (new onodes has appeared
> > since PG removal started[4#10:7528head#]
> > 2021-04-21 15:23:50.061 7f51a3e81700  0
> > bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation
> > observed for submit_transact, latency = 8.46584s
> > 2021-04-21 15:23:50.062 7f51a3e81700  1 heartbeat_map reset_timeout
> > 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> > 2021-04-21 15:23:50.115 7f51b6ca1700  0
> > bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> > observed for _txc_committed_kv, latency = 8.51916s, txc =
> > 0x5573928a7340
> > 2021-04-21 15:23:50.473 7f51b2498700  0 log_channel(cluster) log [WRN]
> > : Monitor daemon marked osd.941 down, but it is still running
> >
> > -- dan
> >
> > On Thu, Apr 15, 2021 at 10:32 AM Dan van der Ster  
> > wrote:
> >> Thanks Igor and Neha for the quick responses.
> >>
> >> I posted an osd log with debug_osd 10 and debug_bluestore 20:
> >> ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c
> >>
> >> Hope that helps,
> >>
> >> Dan
> >>
> >> On Thu, Apr 15, 2021 at 1:27 AM Neha Ojha  wrote:
> >>> We saw this warning once in testing
> >>> (https://tracker.ceph.com/issues/49900#note-1), but there, the problem
> >>> was different, which also led to a crash. That issue has been fixed
> >>> but if you can provide osd logs with verbose logging, we might be able
> >>> to investigate further.
> >>>
> >>> Neha
> >>>
> >>> On Wed, Apr 14, 2021 at 4:14 PM Igor Fedotov  wrote:
>  Hi Dan,
> 
>  Seen that once before and haven't thoroughly investigated yet but I
>  think the new PG removal stuff just revealed this "issue". In fact it
>  had been in the code before the patch.
> 
>  The warning means that new object(s) (given the object names these are
>  apparently system objects, don't remember what's this exactly)  has been
>  written to a PG after it was staged for removal.
> 
>  New PG removal properly handles that case - that was just a paranoid
>  check for an unexpected situation which has actually triggered. Hence
>  IMO no need to worry at this point but developers might want to validate
>  why this is happening
> 
> 
>  Thanks,
> 
>  Igor
> 
>  On 4/14/2021 10:26 PM, Dan van der Ster wrote:
> > Hi Igor,
> >
> > After updating to 14.2.19 and then moving some PGs around we have a
> > few warnings related to the new efficient PG removal code, e.g. [1].
> > Is that something to worry about?
> >
> > Best Regards,
> >
> > Dan
> >
> > [1]
> >
> > /var/log/ceph/ceph-osd.792.log:2021-04-14 20:34:34.353 7fb2439d4700  0
> > osd.792 pg_epoch: 40906 pg[10.14b2s0( v 40734'290069
> > (33782'287000,40734'290069] lb MIN (bitwise) local-lis/les=33990/33991
> > n=36272 ec=4951/4937 lis/c 33990/33716 les/c/f 33991/33747/0
> > 40813/40813/37166) [933,626,260,804,503,491]p933(0) r=-1 lpr=40813
> > DELETING pi=[33716,40813)/4 crt=40734'290069 unknown NOTIFY mbc={}]
> > _delete_some additional unexpected onode list (new onodes has appeared
> > since PG removal started[0#10:4d28head#]
> >
> > /var/log/ceph/ceph-osd.851.log:2021-04-14 18:40:13.312 7fd87bded700  0
> > osd.851 pg_epoch: 40671 pg[10.133fs5( v 40662'288967
> > (33782'285900,40662'288967] lb MIN (bitwise) local-lis/les=33786/33787
> > n=13 ec=4947/4937 lis/c 40498/33714 les/c/f 40499/33747/0
> > 40670/40670/33432) [859,1

[ceph-users] Re: MDS replay takes forever and cephfs is down

2021-04-21 Thread Patrick Donnelly
On Wed, Apr 21, 2021 at 7:39 AM Flemming Frandsen  wrote:
>
> I tried restarting an MDS server using: systemctl restart
> ceph-mds@ardmore.service
>
> This caused the standby server to enter replay state and the fs started
> hanging for several minutes.
>
> In a slight panic I restarted the other mds server, which was replaced by
> the standby server and it almost immediately entered resolve state.

While restarting a service/machine is a reasonable practice for a
laptop, please resist the urge to do this in a distributed system. You
may multiply your problems.

> fs dump shows a seq number counting upwards very slowly for the replay'ing
> MDS server, I have no idea how far it needs to count:

This is a normal heartbeat sequence number. Nothing to be concerned about.
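
If you want to watch actual replay progress, something like this may help (a
sketch; I'm not certain the replay position fields are reported in every
Nautilus build):

  ceph daemon mds.dalmore status   # on the host running the replaying MDS; look for journal read/expire positions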

> # ceph fs dump
>
> dumped fsmap epoch 1030314
> e1030314
> enable_multiple, ever_enabled_multiple: 0,0
> compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in sep
> arate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> legacy client fscid: 1
>
> Filesystem 'cephfs' (1)
> fs_name cephfs
> epoch   1030314
> flags   12
> created 2019-09-09 13:08:26.830927
> modified2021-04-21 14:04:14.672440
> tableserver 0
> root0
> session_timeout 60
> session_autoclose   300
> max_file_size   1099511627776
> min_compat_client   -1 (unspecified)
> last_failure0
> last_failure_osd_epoch  13610
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in sep
> arate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 2
> in  0,1
> up  {0=10398946,1=10404857}
> failed
> damaged
> stopped
> data_pools  [1]
> metadata_pool   2
> inline_data disabled
> balancer
> standby_count_wanted1
> [mds.dalmore{0:10398946} state up:replay seq 215 addr [v2:
> 10.0.37.222:6800/2681188441,v1:10.0.37.222:6801/2681188441]]
> [mds.cragganmore{1:10404857} state up:resolve seq 201 addr [v2:
> 10.0.37.221:6800/871249119,v1:10.0.37.221:6801/871249119]]
>
>
> Standby daemons:
>
> [mds.ardmore{-1:10408652} state up:standby seq 2 addr [v2:
> 10.0.37.223:6800/4096598841,v1:10.0.37.223:6801/4096598841]]
>
>
> Earlier today we added a new OSD host with 12 new OSDs and backfilling is
> proceeding as expected:
>
>  cluster:
>id: e2007417-a346-4af7-8aa9-4ce8f0d73661
>health: HEALTH_WARN
>1 filesystem is degraded
>1 MDSs behind on trimming
>
>  services:
>mon: 3 daemons, quorum cragganmore,dalmore,ardmore (age 5w)
>mgr: ardmore(active, since 2w), standbys: dalmore, cragganmore
>mds: cephfs:2/2 {0=dalmore=up:replay,1=cragganmore=up:resolve} 1
> up:standby
>osd: 69 osds: 69 up (since 102m), 69 in (since 102m); 443 remapped pgs
>
>rgw: 9 daemons active (ardmore.rgw0, ardmore.rgw1, ardmore.rgw2,
> cragganmore.rgw0, cragganmore.rgw1, cragganmore.rgw2, dalmore.rgw0,
> dalmore.rgw1, dalmore.rgw2)
>
>  task status:
>scrub status:
>mds.cragganmore: idle
>mds.dalmore: idle
>
>  data:
>pools:   13 pools, 1440 pgs
>objects: 50.57M objects, 9.0 TiB
>usage:   34 TiB used, 37 TiB / 71 TiB avail
>pgs: 30195420/151707033 objects misplaced (19.904%)
> 997 active+clean
> 431 active+remapped+backfill_wait
> 12  active+remapped+backfilling
>
>  io:
>client:   65 MiB/s rd, 206 KiB/s wr, 17 op/s rd, 8 op/s wr
>recovery: 5.5 MiB/s, 23 objects/s
>
>  progress:
>Rebalancing after osd.62 marked in
>  [==]
>Rebalancing after osd.67 marked in
>  [===...]
>Rebalancing after osd.68 marked in
>  [..]
>Rebalancing after osd.64 marked in
>  [=.]
>Rebalancing after osd.60 marked in
>  [..]
>Rebalancing after osd.66 marked in
>  [=.]
>Rebalancing after osd.63 marked in
>  [=.]
>Rebalancing after osd.61 marked in
>  [==]
>Rebalancing after osd.59 marked in
>  [==]
>Rebalancing after osd.58 marked in
>  [..]
>Rebalancing after osd.57 marked in
>  [===...]
>Rebalancing after osd.65 marked in
>  [==]
>
>
> It seems we're running a mix of versions:
>
> ceph versions
> {
>"mon": {
>"ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9)
> nautilus (stable)": 3
>},
>"mgr": {
>"ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11)
> nautilus (stable)": 3
>},
>"osd": {
>"ceph version 14.2.18 

[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Konstantin Shalygin
Are these HDD or hybrid OSDs? How many objects per PG on average?


k

Sent from my iPhone

> On 21 Apr 2021, at 17:44, Dan van der Ster  wrote:
> 
> Here's a tracker: https://tracker.ceph.com/issues/50466
> 
> bluefs_buffered_io is indeed enabled on this cluster, but I suspect it
> doesn't help for this precise issue because the collection isn't
> repeated fully listed any more.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: _delete_some new onodes has appeared since PG removal started

2021-04-21 Thread Dan van der Ster
HDD only, ~160k objects per PG.

The flapping is pretty rare -- we've moved hundreds of PGs today and
seen only one flap. (This is with osd_heartbeat_grace = 45; with the default
20s we had roughly one flap per hour.)

-- dan

On Wed, Apr 21, 2021 at 5:20 PM Konstantin Shalygin  wrote:
>
> This is hdd or hybrids OSD's? How much obj per PG avg?
>
>
> k
>
> Sent from my iPhone
>
> > On 21 Apr 2021, at 17:44, Dan van der Ster  wrote:
> >
> > Here's a tracker: https://tracker.ceph.com/issues/50466
> >
> > bluefs_buffered_io is indeed enabled on this cluster, but I suspect it
> > doesn't help for this precise issue because the collection isn't
> > repeated fully listed any more.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Swift Stat Timeout

2021-04-21 Thread Dylan Griff
Just to close the loop on this one in case someone reads it in the future: we
were encountering this bug:

https://tracker.ceph.com/issues/44671

And updating to 14.2.20 solved it.

Cheers,
Dylan

On Thu, 2021-04-15 at 18:48 +, Dylan Griff wrote:
> 
> Just some more info on this, it started happening after they added several 
> thousand objects to their buckets. While the client side times out, the 
> operation seems to proceed in ceph for a very long
> time happily working away getting the stat info for their objects. It doesn't 
> appear to be failing, just taking an extremely long time. This doesn't seem 
> right to me, but can someone confirm that they
> can run an account level stat with swift on a user with several thousand 
> buckets/objects?
> 
> Any info would be helpful!
> 
> Cheers,
> Dylan
> 
> On Tue, 2021-04-13 at 21:50 +, Dylan Griff wrote:
> > Hey folks!
> > 
> > We have a user with ~1900 buckets in our RGW service and running this stat 
> > command results in a timeout for them:
> > 
> > swift -A https://:443/auth/1.0 -U  -K  stat
> > 
> > Running the same command, but specifying one of their buckets, returns 
> > promptly. Running the command for a different user with minimal buckets 
> > returns promptly as well. Turning up debug logging to
> > 20
> > for rgw resulted in a great deal of logs showing:
> > 
> > 20 reading from default.rgw.meta:root:.bucket.meta.
> > 20 get_system_obj_state: rctx=0x559b32a6b570 
> > obj=default.rgw.meta:root:.bucket.meta. state=0x559b32c37e40 
> > s->prefetch_data=0
> > 10 cache get: name=default.rgw.meta+root+.bucket.meta. : hit 
> > (requested=0x16, cached=0x17)
> > 20 get_system_obj_state: s->obj_tag was set empty
> > 10 cache get: name=default.rgw.meta+root+.bucket.meta. : hit 
> > (requested=0x11, cached=0x17)
> > 
> > Which looks like to me it is iterating getting the state of all their 
> > stuff. My question: is ~1900 an unreasonable amount of buckets such that we 
> > should expect to see this full account 'stat'
> > command
> > timeout? Or should I be expecting it to return promptly still? Thanks!
> > 
> > Cheers,
> > Dylan
> > --
> > Dylan Griff
> > Senior System Administrator
> > CLE D063
> > RCS - Systems - University of Victoria
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Getting `InvalidInput` when trying to create a notification topic with Kafka endpoint

2021-04-21 Thread Yuval Lifshitz
Hi Istvan,
Can you please share the relevant part of the radosgw log, indicating
which input was invalid?
The only way I managed to reproduce that error is by sending the request to
a non-HTTPS radosgw (which does not seem to be your case). In such a case
it replies with "InvalidInput" because we are trying to send user/password
in cleartext.
I used curl, similar to what you did, against a vstart cluster based off
of master: https://paste.sh/SQ_8IrB5#BxBYbh1kTh15n7OKvjB5wEOM
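
Two more things worth checking on your side (sketches only, with placeholder
hostnames/credentials): make sure the -d payload is quoted so the shell does
not split it on '&', and if you have the AWS CLI configured against your RGW,
the SNS-compatible call can be easier to get right than a hand-signed curl:

  curl ... -X POST https://servername \
      -d 'Action=CreateTopic&Name=test-ceph-event-replication&Attributes.entry.1.key=push-endpoint&Attributes.entry.1.value=kafka://user:password@servername2:9093&Attributes.entry.2.key=use-ssl&Attributes.entry.2.value=true'

  aws --endpoint-url https://servername sns create-topic \
      --name=test-ceph-event-replication \
      --attributes='{"push-endpoint": "kafka://user:password@servername2:9093", "use-ssl": "true"}'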

Yuval

On Wed, Apr 21, 2021 at 11:23 AM Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:

> Hi Ceph Users,
> Here is the latest request I tried but still not working
>
> curl -v -H 'Date: Tue, 20 Apr 2021 16:05:47 +' -H 'Authorization: AWS
> :' -L -H 'content-type:
> application/x-www-form-urlencoded' -k -X POST https://servername -d
> Action=CreateTopic&Name=test-ceph-event-replication&Attributes.entry.8.key=push-endpoint&Attributes.entry.8.value=kafka://:@servername2:9093&Attributes.entry.5.key=use-ssl&Attributes.entry.5.value=true
>
> And the response I get is still Invalid Input
>  encoding="UTF-8"?>InvalidInputtx007993081-00607efbdd-1c7e96b-hkg1c7e96b-hkg-data
> Can someone please help with this?
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS replay takes forever and cephfs is down

2021-04-21 Thread Flemming Frandsen
On Wed, 21 Apr 2021 at 16:57, Patrick Donnelly  wrote:

> It's probably that you have a very large journal (behind on trimming).
>

Hmm, yes, that might be it. Is that related to the "MDSs behind on trimming"
warning?

According to the documentation that has to do with trimming the cache.

Is there any way I can check on the journal trimming?
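
Something like this might show it (a sketch; the counter names under mds_log
are an assumption and may differ by release):

  ceph daemon mds.dalmore perf dump mds_log   # segment/event counters show how far behind trimming is
  watch ceph health detail                    # the MDS_TRIM warning reports num_segments vs max_segments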



> Did you make any configuration changes to the MDS?


No.



> You simply need to wait for the up:replay daemon to finish.
>

Yes, that's what worries me.

It seems patience was all that was needed; the replay finished at seq
1507, after about two hours of downtime.

I'm worried that restarting an MDS server takes the fs down for so
long; it makes upgrading a bit hard.


-- 
Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Flemming Frandsen
I've just spent a couple of hours waiting for an MDS server to replay a
journal that it was behind on and it seems to be getting worse.

The system is not terribly busy, but there are 14 ops in flight that are
very old and do not seem to go away on their own.

Is there anything I can do to unwedge the mds server?


root@dalmore:~# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1
MDSs behind on trimming
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
   mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
blocked for 4107 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
   mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
MDS_TRIM 1 MDSs behind on trimming
   mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments: 128,
num_segments: 3443


root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
{
   "ops": [
   {
   "description": "client_request(client.9801215:348401522 readdir
#0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000,
caller_gid=1000{})",
   "initiated_at": "2021-04-21 15:46:30.822084",
   "age": 4020.3439449050002,
   "duration": 4020.343998116,
   "type_data": {
   "flag_point": "cleaned up request",
   "reqid": "client.9801215:348401522",
   "op_type": "client_request",
   "client_info": {
   "client": "client.9801215",
   "tid": 348401522
   },
   "events": [
   {
   "time": "2021-04-21 15:46:30.822084",
   "event": "initiated"
   },
   {
   "time": "2021-04-21 15:46:30.822084",
   "event": "header_read"
   },
   {
   "time": "2021-04-21 15:46:30.822090",
   "event": "throttled"
   },
   {
   "time": "2021-04-21 15:46:30.822108",
   "event": "all_read"
   },
   {
   "time": "2021-04-21 15:46:30.966336",
   "event": "dispatched"
   },
   {
   "time": "2021-04-21 15:46:30.966419",
   "event": "acquired locks"
   },
   {
   "time": "2021-04-21 16:01:40.084378",
   "event": "killing request"
   },
   {
   "time": "2021-04-21 16:01:40.084438",
   "event": "cleaned up request"
   }
   ]
   }
   },
   {
   "description": "client_request(client.9801215:348454506
setfilelock rule 1, type 4, owner 14336643358275911908, pid 32085, start 0,
length 0, wait 0 #0x1001e694623 20
21-04-21 15:55:50.829175 caller_uid=1000, caller_gid=1000{})",
   "initiated_at": "2021-04-21 15:55:50.832432",
   "age": 3460.333597209,
   "duration": 3460.333751787,
   "type_data": {
   "flag_point": "cleaned up request",
   "reqid": "client.9801215:348454506",
   "op_type": "client_request",
   "client_info": {
   "client": "client.9801215",
   "tid": 348454506
   },
   "events": [
   {
   "time": "2021-04-21 15:55:50.832432",
   "event": "initiated"
   },
   {
   "time": "2021-04-21 15:55:50.832432",
   "event": "header_read"
   },
   {
   "time": "2021-04-21 15:55:50.832439",
   "event": "throttled"
   },
   {
   "time": "2021-04-21 15:55:50.832465",
   "event": "all_read"
   },
   {
   "time": "2021-04-21 15:55:50.832608",
   "event": "dispatched"
   },
   {
   "time": "2021-04-21 16:01:40.084448",
   "event": "killing request"
   },
   {
   "time": "2021-04-21 16:01:40.084451",
   "event": "cleaned up request"
   }
   ]
   }
   },
   {
   "description": "client_request(client.9143865:2383945773 getattr
pAsLsXsFs #0x10004f6a16b 2021-04-21 16:35:30.075110 caller_uid=0,
caller_gid=0{})",
   "initiated_at": "2021-04-21 16:35:30.077995",
   "age": 1081.088033715,
   "duration": 1081.0882681850001,
   "type_data": {
   "flag_point": "failed to authpin, subtree is being
export

[ceph-users] Re: MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Flemming Frandsen
I've gone through the clients mentioned by the ops in flight and none of
them are connected any more.

The number of segments that the MDS is behind on is rising steadily and the
ops_in_flight remain; this feels a lot like a catastrophe brewing.

The documentation suggests trying to restart the MDS server, but the last
time I did, replay took two hours before any cephfs access worked again, so I'd
rather not risk that if I can help it.

Any hints are appreciated.
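
One thing I'm considering, since the stuck ops reference clients that are no
longer connected (a sketch only -- use with care, eviction can blocklist live
clients):

  ceph tell mds.dalmore client ls                 # list sessions and confirm which client ids are stale
  ceph tell mds.dalmore client evict id=9801215   # evict the stale session holding the requests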

# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1
MDSs behind on trimming
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
   mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
blocked for 6046 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
   mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
MDS_TRIM 1 MDSs behind on trimming
   mds.dalmore(mds.0): Behind on trimming (4515/128) max_segments: 128,
num_segments: 4515



On Wed, 21 Apr 2021 at 19:09, Flemming Frandsen  wrote:

> I've just spent a couple of hours waiting for an MDS server to replay a
> journal that it was behind on and it seems to be getting worse.
>
> The system is not terribly busy, but there are 14 ops in flight that are
> very old and do not seem to go away on their own.
>
> Is there anything I can do to unwedge the mds server?
>
>
> root@dalmore:~# ceph health detail
> HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
> 1 MDSs behind on trimming
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
> blocked for 4107 secs
> MDS_SLOW_REQUEST 1 MDSs report slow requests
>mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
> MDS_TRIM 1 MDSs behind on trimming
>mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments: 128,
> num_segments: 3443
>
>
> root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
> {
>"ops": [
>{
>"description": "client_request(client.9801215:348401522 readdir
> #0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000,
> caller_gid=1000{})",
>"initiated_at": "2021-04-21 15:46:30.822084",
>"age": 4020.3439449050002,
>"duration": 4020.343998116,
>"type_data": {
>"flag_point": "cleaned up request",
>"reqid": "client.9801215:348401522",
>"op_type": "client_request",
>"client_info": {
>"client": "client.9801215",
>"tid": 348401522
>},
>"events": [
>{
>"time": "2021-04-21 15:46:30.822084",
>"event": "initiated"
>},
>{
>"time": "2021-04-21 15:46:30.822084",
>"event": "header_read"
>},
>{
>"time": "2021-04-21 15:46:30.822090",
>"event": "throttled"
>},
>{
>"time": "2021-04-21 15:46:30.822108",
>"event": "all_read"
>},
>{
>"time": "2021-04-21 15:46:30.966336",
>"event": "dispatched"
>},
>{
>"time": "2021-04-21 15:46:30.966419",
>"event": "acquired locks"
>},
>{
>"time": "2021-04-21 16:01:40.084378",
>"event": "killing request"
>},
>{
>"time": "2021-04-21 16:01:40.084438",
>"event": "cleaned up request"
>}
>]
>}
>},
>{
>"description": "client_request(client.9801215:348454506
> setfilelock rule 1, type 4, owner 14336643358275911908, pid 32085, start 0,
> length 0, wait 0 #0x1001e694623 20
> 21-04-21 15:55:50.829175 caller_uid=1000, caller_gid=1000{})",
>"initiated_at": "2021-04-21 15:55:50.832432",
>"age": 3460.333597209,
>"duration": 3460.333751787,
>"type_data": {
>"flag_point": "cleaned up request",
>"reqid": "client.9801215:348454506",
>"op_type": "client_request",
>"client_info": {
>"client": "client.9801215",
>"tid": 348454506
>},
>"events": [
>{
>"time": "2021-04-21 15:55:50.832432",
>"event": "initiated"
>},
>{
>"time": "2021-04-21 15:55:50.832432",
>"event": "header_read"
>  

[ceph-users] Re: EC Backfill Observations

2021-04-21 Thread Josh Durgin

On 4/21/21 9:29 AM, Josh Baergen wrote:

Hey Josh,

Thanks for the info!


With respect to reservations, it seems like an oversight that
we don't reserve other shards for backfilling. We reserve all
shards for recovery [0].


Very interesting that there is a reservation difference between
backfill and recovery.


On the other hand, overload from recovery is handled better in
pacific and beyond with mclock-based QoS, which provides much
more effective control of recovery traffic [1][2].


Indeed, I was wondering if mclock was ultimately the answer here,
though I wonder how mclock acts in the case where a source OSD gets
overloaded in the way that I described. Will it throttle backfill too
aggressively, for example, compared to if the reservation was in
place, preventing overload in the first place?


I expect you'd see more backfills proceeding, each at a slower pace,
than if you had the reservations on all replicas. The total backfill
throughput would be about the same, but completing a particular backfill
would take longer.


One more question in this space: Has there ever been discussion about
a back-off mechanism when one of the remote reservations is blocked?
Another issue that we've commonly seen is that a backfill that should
be able to make progress can't because of a backfill_wait that holds
some of its reservations but is waiting for others. Example (with
simplified up/acting sets):

 PG   STATE                           UP     UP_PRI  ACTING  ACT_PRI
 1.1  active+remapped+backfilling     [0,2]  0       [0,1]   0
 1.2  active+remapped+backfill_wait   [3,2]  3       [3,1]   3
 1.3  active+remapped+backfill_wait   [3,5]  3       [3,4]   3

1.3's backfill could make progress independent of 1.1, but is blocked
behind 1.2 because the latter is holding the local reservation on
osd.3 and is waiting for the remote reservation on osd.2.


Yes, the reservation mechanism is rather complex and intertwined with
the recovery state machine. There was some discussion about this
(including the idea of backoffs) before:

https://marc.info/?t=15209545422&r=1&w=2

and summarized in this card:

https://trello.com/c/ppJKaJeT/331-osd-refactor-reserver

Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd nearfull is not detected

2021-04-21 Thread Dan van der Ster
Are you currently doing IO on the relevant pool? Maybe nearfull isn't
reported until some pgstats are reported.

Otherwise sorry I haven't seen this.


Dan
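
A rough sketch of what to check next, assuming the issue is stale or missing
PG stats for that OSD rather than the health check itself (the OSD id comes
from the post below; flush_pg_stats is hedged since availability varies by
release):

# confirm the ratios the cluster is actually enforcing
ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'
# what the monitor currently believes about osd.696
ceph osd df name osd.696
# nudge the OSD to report fresh PG stats
ceph tell osd.696 flush_pg_stats

If %USE stays above nearfull_ratio after a stats refresh and the warning
still does not appear, the problem is in the health check rather than in the
reporting path.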



On Wed, Apr 21, 2021, 8:05 PM Konstantin Shalygin  wrote:

> Hi,
>
> On this adopted cluster, Prometheus triggered an "osd full > 90%" alert,
> but Ceph itself did not. The OSD is actually being drained (see %USE below).
>
> root@host# ceph osd df name osd.696
> ID  CLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
>  %USE  VAR  PGS STATUS
> 696  nvme 0.91199  1.0 912 GiB 830 GiB 684 GiB   8 KiB 146 GiB 81 GiB
> 91.09 1.00  47 up
>  TOTAL 912 GiB 830 GiB 684 GiB 8.1 KiB 146 GiB 81 GiB
> 91.09
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0
> root@host# ceph osd df name osd.696
> ID  CLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
>  %USE  VAR  PGS STATUS
> 696  nvme 0.91199  1.0 912 GiB 830 GiB 684 GiB   8 KiB 146 GiB 81 GiB
> 91.08 1.00  47 up
>  TOTAL 912 GiB 830 GiB 684 GiB 8.1 KiB 146 GiB 81 GiB
> 91.08
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0
> root@host# ceph osd df name osd.696
> ID  CLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
>  %USE  VAR  PGS STATUS
> 696  nvme 0.91199  1.0 912 GiB 830 GiB 684 GiB   8 KiB 146 GiB 81 GiB
> 91.07 1.00  47 up
>  TOTAL 912 GiB 830 GiB 684 GiB 8.1 KiB 146 GiB 81 GiB
> 91.07
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0
>
> Pool 18 is another class pool, OSD's of this pool triggered as usual, but
> for pool 17 - don't.
>
> root@host# ceph health detail
> HEALTH_WARN noout flag(s) set; Some pool(s) have the nodeep-scrub flag(s)
> set; Low space hindering backfill (add storage if this doesn't resolve
> itself): 2 pgs backfill_toofull
> OSDMAP_FLAGS noout flag(s) set
> POOL_SCRUB_FLAGS Some pool(s) have the nodeep-scrub flag(s) set
> Pool meta_ru1b has nodeep-scrub flag
> Pool data_ru1b has nodeep-scrub flag
> PG_BACKFILL_FULL Low space hindering backfill (add storage if this doesn't
> resolve itself): 2 pgs backfill_toofull
> pg 18.1008 is active+remapped+backfill_wait+backfill_toofull, acting
> [336,462,580]
> pg 18.27e0 is active+remapped+backfill_wait+backfill_toofull, acting
> [401,627,210]
>
>
> In my experience, Ceph triggers the warnings as an OSD drains: first at
> backfillfull_ratio, then at nearfull_ratio, until usage drops below 85%.
> I don't think it is possible to configure a silence for this.
>
> Current usage:
>
> root@host# ceph df  detail
> RAW STORAGE:
> CLASS SIZEAVAILUSEDRAW USED %RAW USED
> hdd   4.3 PiB 1022 TiB 3.3 PiB  3.3 PiB 76.71
> nvme  161 TiB   61 TiB  82 TiB  100 TiB 62.30
> TOTAL 4.4 PiB  1.1 PiB 3.4 PiB  3.4 PiB 76.20
>
> POOLS:
> POOL  ID PGS   STORED  OBJECTS USED
>  %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED
> COMPR UNDER COMPR
> meta_ru1b 17  2048 3.1 TiB   7.15G  82 TiB
> 92.77   2.1 TiB N/A   N/A 7.15G
>  0 B 0 B
> data_ru1b 18 16384 1.1 PiB   3.07G 3.3 PiB
> 88.29   148 TiB N/A   N/A 3.07G
>  0 B 0 B
>
>
> Current OSD dump header:
>
> epoch 270540
> fsid ccf2c233-4adf-423c-b734-236220096d4e
> created 2019-02-14 15:30:56.642918
> modified 2021-04-21 20:33:54.481616
> flags noout,sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
> crush_version 7255
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client jewel
> min_compat_client jewel
> require_osd_release nautilus
> pool 17 'meta_ru1b' replicated size 3 min_size 2 crush_rule 1 object_hash
> rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn last_change 240836
> lfor 0/0/51990 flags hashpspool,nodeep-scrub stripe_width 0 application
> metadata
> pool 18 'data_ru1b' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 16384 pgp_num 16384 autoscale_mode warn last_change 270529
> lfor 0/0/52038 flags hashpspool,nodeep-scrub stripe_width 0 application data
> max_osd 780
>
>
> Current versions:
>
> {
> "mon": {
> "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11)
> nautilus (stable)": 3
> },
> "mgr": {
> "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11)
> nautilus (stable)": 3
> },
> "osd": {
> "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11)
> nautilus (stable)": 780
> },
> "mds": {},
> "overall": {
> "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11)
> nautilus (stable)": 786
> }
> }
>
>
>
> Dan, does this ring a bell for you? My guess is that
> some counter type overflowed.
>
>
>
> Thanks,
> k
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Dan van der Ster
Did this eventually clear?
We had something like this happen once when we changed an md export pin for
a very top level directory from mds.3 to mds.0. This triggered so much
subtree export work that it took something like 30 minutes to complete. In
our case the md segments kept growing into a few 10k, iirc. As soon as the
exports completed the md log trimmed quickly.

.. Dan



On Wed, Apr 21, 2021, 7:38 PM Flemming Frandsen  wrote:

> I've gone through the clients mentioned by the ops in flight and none of
> them are connected any more.
>
> The number of segments that the MDS is behind on is rising steadily and the
> ops_in_flight remain, this feels a lot like a catastrophe brewing.
>
> The documentation suggests trying to restart the MDS server, but the last
> time I did replay took two hours before any cephfs worked again, so I'd
> rather not risk that, if I can help it.
>
> Any hints are appreciated.
>
> # ceph health detail
> HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1
> MDSs behind on trimming
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
> blocked for 6046 secs
> MDS_SLOW_REQUEST 1 MDSs report slow requests
>mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
> MDS_TRIM 1 MDSs behind on trimming
>mds.dalmore(mds.0): Behind on trimming (4515/128) max_segments: 128,
> num_segments: 4515
>
>
>
> On Wed, 21 Apr 2021 at 19:09, Flemming Frandsen  wrote:
>
> > I've just spent a couple of hours waiting for an MDS server to replay a
> > journal that it was behind on and it seems to be getting worse.
> >
> > The system is not terribly busy, but there are 14 ops in flight that are
> > very old and do not seem to go away on their own.
> >
> > Is there anything I can do to unwedge the mds server?
> >
> >
> > root@dalmore:~# ceph health detail
> > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
> > 1 MDSs behind on trimming
> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
> >mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
> > blocked for 4107 secs
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
> > MDS_TRIM 1 MDSs behind on trimming
> >mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments: 128,
> > num_segments: 3443
> >
> >
> > root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
> > {
> >"ops": [
> >{
> >"description": "client_request(client.9801215:348401522
> readdir
> > #0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000,
> > caller_gid=1000{})",
> >"initiated_at": "2021-04-21 15:46:30.822084",
> >"age": 4020.3439449050002,
> >"duration": 4020.343998116,
> >"type_data": {
> >"flag_point": "cleaned up request",
> >"reqid": "client.9801215:348401522",
> >"op_type": "client_request",
> >"client_info": {
> >"client": "client.9801215",
> >"tid": 348401522
> >},
> >"events": [
> >{
> >"time": "2021-04-21 15:46:30.822084",
> >"event": "initiated"
> >},
> >{
> >"time": "2021-04-21 15:46:30.822084",
> >"event": "header_read"
> >},
> >{
> >"time": "2021-04-21 15:46:30.822090",
> >"event": "throttled"
> >},
> >{
> >"time": "2021-04-21 15:46:30.822108",
> >"event": "all_read"
> >},
> >{
> >"time": "2021-04-21 15:46:30.966336",
> >"event": "dispatched"
> >},
> >{
> >"time": "2021-04-21 15:46:30.966419",
> >"event": "acquired locks"
> >},
> >{
> >"time": "2021-04-21 16:01:40.084378",
> >"event": "killing request"
> >},
> >{
> >"time": "2021-04-21 16:01:40.084438",
> >"event": "cleaned up request"
> >}
> >]
> >}
> >},
> >{
> >"description": "client_request(client.9801215:348454506
> > setfilelock rule 1, type 4, owner 14336643358275911908, pid 32085, start
> 0,
> > length 0, wait 0 #0x1001e694623 20
> > 21-04-21 15:55:50.829175 caller_uid=1000, caller_gid=1000{})",
> >"initiated_at": "2021-04-21 15:55:50.832432",
> >"age": 3460.333597209,
> >"duration": 34

[ceph-users] New Ceph cluster- having issue with one monitor

2021-04-21 Thread Robert W. Eckert
Hi,
I have pieced together some PCs which I had been using to run a Windows DFS
cluster. The 3 servers all have 3 4TB hard drives and 1 2TB SSD, but they have
different CPUs.
All of them are running RHEL8 and have 2.5 Gbps NICs.

The install was with cephadm, and the ceph processes are all running under 
podman.

Two of the servers, which are on old hardware (one a Core i5 2500, the other a
J5005 embedded CPU), are very stable and have had no real issues.
The other server, an AMD 3600 on an X570 motherboard, runs great, but the monitor
goes down at least once a day. I have a script I run to copy it from one of the
other servers when it goes down, but I would prefer to just have it keep running.

I don't know if it is related or not, but I also see OSD_SCRUB_ERRORS and
PG_DAMAGED.

The load on the cluster is light: I have some backups from my PCs running to it,
a copy of all my photos, 3 private Minecraft servers running in podman
containers, and an Ark Survival server also running in a podman container with
a mount pointing to the Ceph file system. Last night I stopped the Minecraft
and Ark containers before I went to bed, and this morning everything was
clean. This afternoon, with no load on the filesystem, the monitor went down
and I now have 8 scrub errors.

I have not been able to find where the Ceph monitor logs are going; I thought
they would be under either /var/log/ceph or
/var/log/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867, but the log files there are
0 bytes.

Attached is the output from ceph-report.
What can I try to improve logging for troubleshooting or get this one monitor 
stable?

Thanks,
Rob
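
In case it helps with the logging question: with a cephadm/podman deployment
the daemons log to journald by default, so 0-byte files under /var/log/ceph
are expected. A sketch of where to look and how to turn on file logging (the
hostname is a placeholder; the fsid is the one quoted above):

# follow the monitor's log via cephadm (extra args are passed to journalctl)
cephadm logs --name mon.<hostname> -- -f
# or query the systemd unit directly
journalctl -u ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867@mon.<hostname>.service -f
# optionally have daemons also write files under /var/log/ceph/<fsid>
ceph config set global log_to_file true
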
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Flemming Frandsen
Not as of yet; it's steadily getting further behind.

We're now up to 6797 segments and there are still the same 14 long-running
operations that are all "cleaned up request".

Something is blocking trimming; normally I'd follow the advice of
restarting the mds:
https://docs.ceph.com/en/latest/cephfs/troubleshooting/#slow-requests-mds

... but I just did that and replaying 1200 segments took two hours, so if
the same happens again, then I'm looking at 12 hours of downtime, which
would be very unpopular.
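
A small sketch for keeping an eye on the backlog without restarting anything
(the daemon name is taken from the health output; the mds_log counter names
are an assumption and can differ slightly between releases):

# the health check itself, refreshed every minute
watch -n 60 'ceph health detail | grep -A1 MDS_TRIM'
# journal segment/event gauges straight from the MDS
ceph daemon mds.dalmore perf dump mds_log | jq '.mds_log | {seg, ev}'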



On Wed, 21 Apr 2021 at 20:26, Dan van der Ster  wrote:

> Did this eventually clear?
> We had something like this happen once when we changed an md export pin
> for a very top level directory from mds.3 to mds.0. This triggered so much
> subtree export work that it took something like 30 minutes to complete. In
> our case the md segments kept growing into a few 10k, iirc. As soon as the
> exports completed the md log trimmed quickly.
>
> .. Dan
>
>
>
> On Wed, Apr 21, 2021, 7:38 PM Flemming Frandsen  wrote:
>
>> I've gone through the clients mentioned by the ops in flight and none of
>> them are connected any more.
>>
>> The number of segments that the MDS is behind on is rising steadily and
>> the
>> ops_in_flight remain, this feels a lot like a catastrophe brewing.
>>
>> The documentation suggests trying to restart the MDS server, but the last
>> time I did replay took two hours before any cephfs worked again, so I'd
>> rather not risk that, if I can help it.
>>
>> Any hints are appreciated.
>>
>> # ceph health detail
>> HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
>> 1
>> MDSs behind on trimming
>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
>> blocked for 6046 secs
>> MDS_SLOW_REQUEST 1 MDSs report slow requests
>>mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
>> MDS_TRIM 1 MDSs behind on trimming
>>mds.dalmore(mds.0): Behind on trimming (4515/128) max_segments: 128,
>> num_segments: 4515
>>
>>
>>
>> On Wed, 21 Apr 2021 at 19:09, Flemming Frandsen 
>> wrote:
>>
>> > I've just spent a couple of hours waiting for an MDS server to replay a
>> > journal that it was behind on and it seems to be getting worse.
>> >
>> > The system is not terribly busy, but there are 14 ops in flight that are
>> > very old and do not seem to go away on their own.
>> >
>> > Is there anything I can do to unwedge the mds server?
>> >
>> >
>> > root@dalmore:~# ceph health detail
>> > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
>> requests;
>> > 1 MDSs behind on trimming
>> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>> >mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
>> > blocked for 4107 secs
>> > MDS_SLOW_REQUEST 1 MDSs report slow requests
>> >mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
>> > MDS_TRIM 1 MDSs behind on trimming
>> >mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments: 128,
>> > num_segments: 3443
>> >
>> >
>> > root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
>> > {
>> >"ops": [
>> >{
>> >"description": "client_request(client.9801215:348401522
>> readdir
>> > #0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000,
>> > caller_gid=1000{})",
>> >"initiated_at": "2021-04-21 15:46:30.822084",
>> >"age": 4020.3439449050002,
>> >"duration": 4020.343998116,
>> >"type_data": {
>> >"flag_point": "cleaned up request",
>> >"reqid": "client.9801215:348401522",
>> >"op_type": "client_request",
>> >"client_info": {
>> >"client": "client.9801215",
>> >"tid": 348401522
>> >},
>> >"events": [
>> >{
>> >"time": "2021-04-21 15:46:30.822084",
>> >"event": "initiated"
>> >},
>> >{
>> >"time": "2021-04-21 15:46:30.822084",
>> >"event": "header_read"
>> >},
>> >{
>> >"time": "2021-04-21 15:46:30.822090",
>> >"event": "throttled"
>> >},
>> >{
>> >"time": "2021-04-21 15:46:30.822108",
>> >"event": "all_read"
>> >},
>> >{
>> >"time": "2021-04-21 15:46:30.966336",
>> >"event": "dispatched"
>> >},
>> >{
>> >"time": "2021-04-21 15:46:30.966419",
>> >"event": "acquired locks"
>> >},
>> >{
>> >"time": "2021-04-21 16:01:40.084378",
>> >  

[ceph-users] Re: MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Dan van der Ster
You don't pin subtrees?
I would guess that something in the workload changed and it's triggering a
particularly bad behavior in the md balancer.
Increase debug_mds gradually on both mds's; hopefully that gives a hint as
to what it's doing.

.. dan
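
A minimal sketch of raising the debug level on the live daemon and putting it
back afterwards (daemon name and levels are examples; 10 is already quite
chatty and 20 is very verbose):

ceph daemon mds.dalmore config set debug_mds 7
# ... watch the MDS log, then raise further if needed
ceph daemon mds.dalmore config set debug_mds 10
# restore the default when done
ceph daemon mds.dalmore config set debug_mds 1/5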


On Wed, Apr 21, 2021, 8:48 PM Flemming Frandsen  wrote:

> Not as of yet, it's steadily getting further behind.
>
> We're now up to 6797 segments and there's still the same 14 long-running
> operations that are all "cleaned up request".
>
> Something is blocking trimming, normally I'd follow the advice of
> restarting the mds:
> https://docs.ceph.com/en/latest/cephfs/troubleshooting/#slow-requests-mds
>
> ... but I just did that and replaying 1200 segments took two hours, so if
> the same happens again, then I'm looking at 12 hours of downtime, which
> would be very unpopular.
>
>
>
> On Wed, 21 Apr 2021 at 20:26, Dan van der Ster  wrote:
>
>> Did this eventually clear?
>> We had something like this happen once when we changed an md export pin
>> for a very top level directory from mds.3 to mds.0. This triggered so much
>> subtree export work that it took something like 30 minutes to complete. In
>> our case the md segments kept growing into a few 10k, iirc. As soon as the
>> exports completed the md log trimmed quickly.
>>
>> .. Dan
>>
>>
>>
>> On Wed, Apr 21, 2021, 7:38 PM Flemming Frandsen 
>> wrote:
>>
>>> I've gone through the clients mentioned by the ops in flight and none of
>>> them are connected any more.
>>>
>>> The number of segments that the MDS is behind on is rising steadily and
>>> the
>>> ops_in_flight remain, this feels a lot like a catastrophe brewing.
>>>
>>> The documentation suggests trying to restart the MDS server, but the last
>>> time I did replay took two hours before any cephfs worked again, so I'd
>>> rather not risk that, if I can help it.
>>>
>>> Any hints are appreciated.
>>>
>>> # ceph health detail
>>> HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
>>> requests; 1
>>> MDSs behind on trimming
>>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>>mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
>>> blocked for 6046 secs
>>> MDS_SLOW_REQUEST 1 MDSs report slow requests
>>>mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
>>> MDS_TRIM 1 MDSs behind on trimming
>>>mds.dalmore(mds.0): Behind on trimming (4515/128) max_segments: 128,
>>> num_segments: 4515
>>>
>>>
>>>
>>> On Wed, 21 Apr 2021 at 19:09, Flemming Frandsen 
>>> wrote:
>>>
>>> > I've just spent a couple of hours waiting for an MDS server to replay a
>>> > journal that it was behind on and it seems to be getting worse.
>>> >
>>> > The system is not terribly busy, but there are 14 ops in flight that
>>> are
>>> > very old and do not seem to go away on their own.
>>> >
>>> > Is there anything I can do to unwedge the mds server?
>>> >
>>> >
>>> > root@dalmore:~# ceph health detail
>>> > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
>>> requests;
>>> > 1 MDSs behind on trimming
>>> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>> >mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs,
>>> oldest
>>> > blocked for 4107 secs
>>> > MDS_SLOW_REQUEST 1 MDSs report slow requests
>>> >mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
>>> > MDS_TRIM 1 MDSs behind on trimming
>>> >mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments: 128,
>>> > num_segments: 3443
>>> >
>>> >
>>> > root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
>>> > {
>>> >"ops": [
>>> >{
>>> >"description": "client_request(client.9801215:348401522
>>> readdir
>>> > #0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000,
>>> > caller_gid=1000{})",
>>> >"initiated_at": "2021-04-21 15:46:30.822084",
>>> >"age": 4020.3439449050002,
>>> >"duration": 4020.343998116,
>>> >"type_data": {
>>> >"flag_point": "cleaned up request",
>>> >"reqid": "client.9801215:348401522",
>>> >"op_type": "client_request",
>>> >"client_info": {
>>> >"client": "client.9801215",
>>> >"tid": 348401522
>>> >},
>>> >"events": [
>>> >{
>>> >"time": "2021-04-21 15:46:30.822084",
>>> >"event": "initiated"
>>> >},
>>> >{
>>> >"time": "2021-04-21 15:46:30.822084",
>>> >"event": "header_read"
>>> >},
>>> >{
>>> >"time": "2021-04-21 15:46:30.822090",
>>> >"event": "throttled"
>>> >},
>>> >{
>>> >"time": "2021-04-21 15:46:30.822108",
>>> >"event": "all_read"
>>> >},

[ceph-users] Re: MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Flemming Frandsen
No, I don't.

I guess I could pin a large part of the tree, if that's something that's
likely to help.
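
For reference, a minimal sketch of static pinning via the ceph.dir.pin
extended attribute, run on a mounted CephFS client (paths are placeholders;
the value is the MDS rank, and -1 removes the pin):

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/home
# undo a pin later
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/home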

On Wed, 21 Apr 2021 at 21:02, Dan van der Ster  wrote:

> You don't pin subtrees ?
> I would guess that something in the workload changed and it's triggering a
> particularly bad behavior in the md balancer.
> Increase debug_mds gradually on both mds's; hopefully that gives a hint as
> to what it's doing.
>
> .. dan
>
>
> On Wed, Apr 21, 2021, 8:48 PM Flemming Frandsen  wrote:
>
>> Not as of yet, it's steadily getting further behind.
>>
>> We're now up to 6797 segments and there's still the same 14 long-running
>> operations that are all "cleaned up request".
>>
>> Something is blocking trimming, normally I'd follow the advice of
>> restarting the mds:
>> https://docs.ceph.com/en/latest/cephfs/troubleshooting/#slow-requests-mds
>>
>> ... but I just did that and replaying 1200 segments took two hours, so if
>> the same happens again, then I'm looking at 12 hours of downtime, which
>> would be very unpopular.
>>
>>
>>
>> On Wed, 21 Apr 2021 at 20:26, Dan van der Ster 
>> wrote:
>>
>>> Did this eventually clear?
>>> We had something like this happen once when we changed an md export pin
>>> for a very top level directory from mds.3 to mds.0. This triggered so much
>>> subtree export work that it took something like 30 minutes to complete. In
>>> our case the md segments kept growing into a few 10k, iirc. As soon as the
>>> exports completed the md log trimmed quickly.
>>>
>>> .. Dan
>>>
>>>
>>>
>>> On Wed, Apr 21, 2021, 7:38 PM Flemming Frandsen 
>>> wrote:
>>>
 I've gone through the clients mentioned by the ops in flight and none of
 them are connected any more.

 The number of segments that the MDS is behind on is rising steadily and
 the
 ops_in_flight remain, this feels a lot like a catastrophe brewing.

 The documentation suggests trying to restart the MDS server, but the
 last
 time I did replay took two hours before any cephfs worked again, so I'd
 rather not risk that, if I can help it.

 Any hints are appreciated.

 # ceph health detail
 HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
 requests; 1
 MDSs behind on trimming
 MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest
 blocked for 6046 secs
 MDS_SLOW_REQUEST 1 MDSs report slow requests
mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
 MDS_TRIM 1 MDSs behind on trimming
mds.dalmore(mds.0): Behind on trimming (4515/128) max_segments: 128,
 num_segments: 4515



 On Wed, 21 Apr 2021 at 19:09, Flemming Frandsen 
 wrote:

 > I've just spent a couple of hours waiting for an MDS server to replay
 a
 > journal that it was behind on and it seems to be getting worse.
 >
 > The system is not terribly busy, but there are 14 ops in flight that
 are
 > very old and do not seem to go away on their own.
 >
 > Is there anything I can do to unwedge the mds server?
 >
 >
 > root@dalmore:~# ceph health detail
 > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
 requests;
 > 1 MDSs behind on trimming
 > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
 >mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs,
 oldest
 > blocked for 4107 secs
 > MDS_SLOW_REQUEST 1 MDSs report slow requests
 >mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
 > MDS_TRIM 1 MDSs behind on trimming
 >mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments:
 128,
 > num_segments: 3443
 >
 >
 > root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
 > {
 >"ops": [
 >{
 >"description": "client_request(client.9801215:348401522
 readdir
 > #0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000,
 > caller_gid=1000{})",
 >"initiated_at": "2021-04-21 15:46:30.822084",
 >"age": 4020.3439449050002,
 >"duration": 4020.343998116,
 >"type_data": {
 >"flag_point": "cleaned up request",
 >"reqid": "client.9801215:348401522",
 >"op_type": "client_request",
 >"client_info": {
 >"client": "client.9801215",
 >"tid": 348401522
 >},
 >"events": [
 >{
 >"time": "2021-04-21 15:46:30.822084",
 >"event": "initiated"
 >},
 >{
 >"time": "2021-04-21 15:46:30.822084",
 >"event": "header_read"
 >},
 >{
>>>

[ceph-users] Re: MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Dan van der Ster
No, pinning now won't help anything... I was asking to understand whether it's
likely there is balancing happening actively right now. If you don't pin, then
it's likely.

Try the debug logs. And check the exports using something like :

ceph daemon mds.b get subtrees | jq '.[] | [.dir.path, .auth_first,
.export_pin]'

Dan



On Wed, Apr 21, 2021, 9:05 PM Flemming Frandsen  wrote:

> No, I don't.
>
> I guess I could pin a large part of the tree, if that's something that's
> likely to help.
>
> On Wed, 21 Apr 2021 at 21:02, Dan van der Ster  wrote:
>
>> You don't pin subtrees ?
>> I would guess that something in the workload changed and it's triggering
>> a particularly bad behavior in the md balancer.
>> Increase debug_mds gradually on both mds's; hopefully that gives a hint
>> as to what it's doing.
>>
>> .. dan
>>
>>
>> On Wed, Apr 21, 2021, 8:48 PM Flemming Frandsen 
>> wrote:
>>
>>> Not as of yet, it's steadily getting further behind.
>>>
>>> We're now up to 6797 segments and there's still the same 14 long-running
>>> operations that are all "cleaned up request".
>>>
>>> Something is blocking trimming, normally I'd follow the advice of
>>> restarting the mds:
>>> https://docs.ceph.com/en/latest/cephfs/troubleshooting/#slow-requests-mds
>>>
>>> ... but I just did that and replaying 1200 segments took two hours, so
>>> if the same happens again, then I'm looking at 12 hours of downtime, which
>>> would be very unpopular.
>>>
>>>
>>>
>>> On Wed, 21 Apr 2021 at 20:26, Dan van der Ster 
>>> wrote:
>>>
 Did this eventually clear?
 We had something like this happen once when we changed an md export pin
 for a very top level directory from mds.3 to mds.0. This triggered so much
 subtree export work that it took something like 30 minutes to complete. In
 our case the md segments kept growing into a few 10k, iirc. As soon as the
 exports completed the md log trimmed quickly.

 .. Dan



 On Wed, Apr 21, 2021, 7:38 PM Flemming Frandsen 
 wrote:

> I've gone through the clients mentioned by the ops in flight and none
> of
> them are connected any more.
>
> The number of segments that the MDS is behind on is rising steadily
> and the
> ops_in_flight remain, this feels a lot like a catastrophe brewing.
>
> The documentation suggests trying to restart the MDS server, but the
> last
> time I did replay took two hours before any cephfs worked again, so I'd
> rather not risk that, if I can help it.
>
> Any hints are appreciated.
>
> # ceph health detail
> HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
> requests; 1
> MDSs behind on trimming
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs,
> oldest
> blocked for 6046 secs
> MDS_SLOW_REQUEST 1 MDSs report slow requests
>mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
> MDS_TRIM 1 MDSs behind on trimming
>mds.dalmore(mds.0): Behind on trimming (4515/128) max_segments: 128,
> num_segments: 4515
>
>
>
> On Wed, 21 Apr 2021 at 19:09, Flemming Frandsen 
> wrote:
>
> > I've just spent a couple of hours waiting for an MDS server to
> replay a
> > journal that it was behind on and it seems to be getting worse.
> >
> > The system is not terribly busy, but there are 14 ops in flight that
> are
> > very old and do not seem to go away on their own.
> >
> > Is there anything I can do to unwedge the mds server?
> >
> >
> > root@dalmore:~# ceph health detail
> > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
> requests;
> > 1 MDSs behind on trimming
> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
> >mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs,
> oldest
> > blocked for 4107 secs
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
> > MDS_TRIM 1 MDSs behind on trimming
> >mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments:
> 128,
> > num_segments: 3443
> >
> >
> > root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
> > {
> >"ops": [
> >{
> >"description": "client_request(client.9801215:348401522
> readdir
> > #0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000,
> > caller_gid=1000{})",
> >"initiated_at": "2021-04-21 15:46:30.822084",
> >"age": 4020.3439449050002,
> >"duration": 4020.343998116,
> >"type_data": {
> >"flag_point": "cleaned up request",
> >"reqid": "client.9801215:348401522",
> >"op_type": "client_request",
> >"client_info": {
> >  

[ceph-users] Re: MDS_TRIM 1 MDSs behind on trimming and

2021-04-21 Thread Flemming Frandsen
I'll be damned.

I restarted the wedged mds and after a reasonable amount of time the
standby mds finished replaying and became active.

The cluster is now healthy and it seems the apps I have running on top of
cephfs have sorted themselves out too; I guess all the MDS really needed
was a stern bullet in the head.

Thank you for helping.

It's getting late where I am, so I'll save the post-mortem for tomorrow.
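
For completeness, a quick sketch of how one might confirm the failover landed
cleanly after a restart like this (the grep pattern is just an example):

ceph fs status
ceph health detail | grep -E 'MDS|TRIM'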

On Wed, 21 Apr 2021 at 21:09, Dan van der Ster  wrote:

> No no pinning now won't help anything... I was asking to understand if
> it's likely there is balancing happening actively now. If you don't pin,
> then it's likely.
>
> Try the debug logs. And check the exports using something like :
>
> ceph daemon mds.b get subtrees | jq '.[] | [.dir.path, .auth_first,
> .export_pin]'
>
> Dan
>
>
>
> On Wed, Apr 21, 2021, 9:05 PM Flemming Frandsen  wrote:
>
>> No, I don't.
>>
>> I guess I could pin a large part of the tree, if that's something that's
>> likely to help.
>>
>> On Wed, 21 Apr 2021 at 21:02, Dan van der Ster 
>> wrote:
>>
>>> You don't pin subtrees ?
>>> I would guess that something in the workload changed and it's triggering
>>> a particularly bad behavior in the md balancer.
>>> Increase debug_mds gradually on both mds's; hopefully that gives a hint
>>> as to what it's doing.
>>>
>>> .. dan
>>>
>>>
>>> On Wed, Apr 21, 2021, 8:48 PM Flemming Frandsen 
>>> wrote:
>>>
 Not as of yet, it's steadily getting further behind.

 We're now up to 6797 segments and there's still the same 14
 long-running operations that are all "cleaned up request".

 Something is blocking trimming, normally I'd follow the advice of
 restarting the mds:

 https://docs.ceph.com/en/latest/cephfs/troubleshooting/#slow-requests-mds

 ... but I just did that and replaying 1200 segments took two hours, so
 if the same happens again, then I'm looking at 12 hours of downtime, which
 would be very unpopular.



 On Wed, 21 Apr 2021 at 20:26, Dan van der Ster 
 wrote:

> Did this eventually clear?
> We had something like this happen once when we changed an md export
> pin for a very top level directory from mds.3 to mds.0. This triggered so
> much subtree export work that it took something like 30 minutes to
> complete. In our case the md segments kept growing into a few 10k, iirc. 
> As
> soon as the exports completed the md log trimmed quickly.
>
> .. Dan
>
>
>
> On Wed, Apr 21, 2021, 7:38 PM Flemming Frandsen 
> wrote:
>
>> I've gone through the clients mentioned by the ops in flight and none
>> of
>> them are connected any more.
>>
>> The number of segments that the MDS is behind on is rising steadily
>> and the
>> ops_in_flight remain, this feels a lot like a catastrophe brewing.
>>
>> The documentation suggests trying to restart the MDS server, but the
>> last
>> time I did replay took two hours before any cephfs worked again, so
>> I'd
>> rather not risk that, if I can help it.
>>
>> Any hints are appreciated.
>>
>> # ceph health detail
>> HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
>> requests; 1
>> MDSs behind on trimming
>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs,
>> oldest
>> blocked for 6046 secs
>> MDS_SLOW_REQUEST 1 MDSs report slow requests
>>mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
>> MDS_TRIM 1 MDSs behind on trimming
>>mds.dalmore(mds.0): Behind on trimming (4515/128) max_segments:
>> 128,
>> num_segments: 4515
>>
>>
>>
>> On Wed, 21 Apr 2021 at 19:09, Flemming Frandsen 
>> wrote:
>>
>> > I've just spent a couple of hours waiting for an MDS server to
>> replay a
>> > journal that it was behind on and it seems to be getting worse.
>> >
>> > The system is not terribly busy, but there are 14 ops in flight
>> that are
>> > very old and do not seem to go away on their own.
>> >
>> > Is there anything I can do to unwedge the mds server?
>> >
>> >
>> > root@dalmore:~# ceph health detail
>> > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow
>> requests;
>> > 1 MDSs behind on trimming
>> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>> >mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs,
>> oldest
>> > blocked for 4107 secs
>> > MDS_SLOW_REQUEST 1 MDSs report slow requests
>> >mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs
>> > MDS_TRIM 1 MDSs behind on trimming
>> >mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments:
>> 128,
>> > num_segments: 3443
>> >
>> >
>> > root@dalmore:~#  ceph daemon mds.dalmore dump_ops_in_flight
>> > {
>> > 

[ceph-users] RGW objects has same marker and bucket id in different buckets.

2021-04-21 Thread by morphin
Hello.

I have an RGW S3 user and that user has 2 buckets.
I copied objects from old.bucket to new.bucket with rclone (on
the RGW client server).
Afterwards I checked the objects with "radosgw-admin --bucket=new.bucket
object stat $i" and saw the old.bucket ID, the old marker ID, and the old
bucket name in the object stats.

Is RGW doing this for deduplication, or is it a bug?
If it's not a bug, then what will happen to these objects if I delete the
old bucket?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: EC Backfill Observations

2021-04-21 Thread Josh Baergen
Hey Josh,

Thanks for the info!

> With respect to reservations, it seems like an oversight that
> we don't reserve other shards for backfilling. We reserve all
> shards for recovery [0].

Very interesting that there is a reservation difference between
backfill and recovery.

> On the other hand, overload from recovery is handled better in
> pacific and beyond with mclock-based QoS, which provides much
> more effective control of recovery traffic [1][2].

Indeed, I was wondering if mclock was ultimately the answer here,
though I wonder how mclock acts in the case where a source OSD gets
overloaded in the way that I described. Will it throttle backfill too
aggressively, for example, compared to if the reservation was in
place, preventing overload in the first place?
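
For context, a sketch of how those mclock controls are exposed in Pacific,
assuming the OSDs are switched to the mclock scheduler (profile names are the
built-in ones; changing osd_op_queue needs an OSD restart to take effect):

ceph config set osd osd_op_queue mclock_scheduler
ceph config set osd osd_mclock_profile high_recovery_ops
# or bias toward client traffic instead
ceph config set osd osd_mclock_profile high_client_ops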

One more question in this space: Has there ever been discussion about
a back-off mechanism when one of the remote reservations is blocked?
Another issue that we've commonly seen is that a backfill that should
be able to make progress can't because of a backfill_wait that holds
some of its reservations but is waiting for others. Example (with
simplified up/acting sets):

PG   STATE                           UP     UP_PRI  ACTING  ACT_PRI
1.1  active+remapped+backfilling     [0,2]  0       [0,1]   0
1.2  active+remapped+backfill_wait   [3,2]  3       [3,1]   3
1.3  active+remapped+backfill_wait   [3,5]  3       [3,4]   3

1.3's backfill could make progress independent of 1.1, but is blocked
behind 1.2 because the latter is holding the local reservation on
osd.3 and is waiting for the remote reservation on osd.2.
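
A rough way to see which reservation a waiting PG is actually stuck on, using
the pg ids from the simplified example above (jq is assumed):

ceph pg ls backfill_wait
ceph pg 1.2 query | jq '.recovery_state'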

Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: EC Backfill Observations

2021-04-21 Thread Josh Baergen
> Yes, the reservation mechanism is rather complex and intertwined with
> the recovery state machine. There was some discussion about this
> (including the idea of backoffs) before:

Thanks!

Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW objects has same marker and bucket id in different buckets.

2021-04-21 Thread Matt Benjamin
Hi Morphin,

Yes, this is by design.  When an RGW object has tail chunks and is
copied so as to duplicate an entire tail chunk, RGW causes the
coincident chunk(s) to be shared.  Tail chunks are refcounted to avoid
leaks.

Matt
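
As a quick illustration of the above, the shared tail is visible in the
object's manifest, which keeps pointing at the source bucket's marker (the
object name is a placeholder and the exact field layout varies a little
between releases):

radosgw-admin object stat --bucket=new.bucket --object=<key> | jq '.manifest'

Since the tail RADOS objects are refcounted, deleting old.bucket should leave
the copies in new.bucket intact; the tails are only reclaimed when the last
reference disappears.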

On Wed, Apr 21, 2021 at 4:21 PM by morphin  wrote:
>
> Hello.
>
> I have a rgw s3 user and the user have 2 bucket.
> I tried to copy objects from old.bucket to new.bucket with rclone. (in
> the rgw client server)
> After I checked the object with "radosgw-admin --bucket=new.bucket
> object stat $i" and I saw old.bucket id and marker id also old bucket
> name in the object stats.
>
> Is rgw doing this for deduplication or is it a bug?
> If it's not a bug then If I delete the old bucket what will happen to
> these objects???
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io