[ceph-users] precise/best way to check ssd usage

2023-07-28 Thread Marc


Currently I am checking usage on ssd drives with 

ceph osd df | egrep 'CLASS|ssd'

I have a use % between 48% and 57%, and assume that with a node failure 1/3 
(only using 3x repl.) of this 57% needs to be able to migrate and be added to a 
different node.

Is there a better way of checking this (on old Nautilus)?
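
The closest I got myself is something like the following (just a sketch: it 
assumes jq is installed, that the JSON field names match this Nautilus build, 
and that the class filter is already supported here):

ceph osd df tree class ssd

ceph osd df -f json | jq -r '[.nodes[] | select(.device_class=="ssd") | .utilization] | max'

The second command prints the highest utilization among the ssd OSDs, which is 
what I compare against the headroom needed for a node failure.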


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Not all Bucket Shards being used

2023-07-28 Thread J. Eric Ivancich
Thank you for the information, Christian. When you reshard, the bucket id is 
updated (with most recent versions of ceph, a generation number is 
incremented). The first bucket id matches the bucket marker, but after the 
first reshard they diverge.

The bucket id is in the names of the currently used bucket index shards. You’re 
searching for the marker, which means you’re finding older bucket index shards.

Change your commands to these:

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
   |sort -V

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
   |sort -V \
   |xargs -IOMAP sh -c \
   'rados -p raum.rgw.buckets.index listomapkeys OMAP | wc -l'

When you refer to the “second zone”, what do you mean? Is this cluster using 
multisite? If and only if your answer is “no”, then it’s safe to remove old 
bucket index shards. Depending on the version of ceph running when reshard was 
run, they were either intentionally left behind (earlier behavior) or removed 
automatically (later behavior).
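
If it does turn out to be single-site, the old-generation shards (the ones 
carrying the marker rather than the current bucket id in their names) could be 
listed, and after careful verification removed, with something along these 
lines (a sketch, not something to run blindly):

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9 \
   |sort -V

# rados -p raum.rgw.buckets.index ls \
   |grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9 \
   |xargs -IOBJ rados -p raum.rgw.buckets.index rm OBJ

The first command only lists; please make sure everything it prints really 
belongs to the old generation before running the second one.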

Eric
(he/him)

> On Jul 25, 2023, at 6:32 AM, Christian Kugler  wrote:
> 
> Hi Eric,
> 
>> 1. I recommend that you *not* issue another bucket reshard until you figure 
>> out what’s going on.
> 
> Thanks, noted!
> 
>> 2. Which version of Ceph are you using?
> 17.2.5
> I wanted to get the Cluster to Health OK before upgrading. I didn't
> see anything that led me to believe that an upgrade could fix the
> reshard issue.
> 
>> 3. Can you issue a `radosgw-admin metadata get bucket:` so we 
>> can verify what the current marker is?
> 
> # radosgw-admin metadata get bucket:sql20
> {
>     "key": "bucket:sql20",
>     "ver": {
>         "tag": "_hGhtgzjcWY9rO9JP7YlWzt8",
>         "ver": 3
>     },
>     "mtime": "2023-07-12T15:56:55.226784Z",
>     "data": {
>         "bucket": {
>             "name": "sql20",
>             "marker": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9",
>             "bucket_id": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1",
>             "tenant": "",
>             "explicit_placement": {
>                 "data_pool": "",
>                 "data_extra_pool": "",
>                 "index_pool": ""
>             }
>         },
>         "owner": "S3user",
>         "creation_time": "2023-04-26T09:22:01.681646Z",
>         "linked": "true",
>         "has_bucket_info": "false"
>     }
> }
> 
>> 4. After you resharded previously, did you get command-line output along the 
>> lines of:
>> 2023-07-24T13:33:50.867-0400 7f10359f2a80 1 execute INFO: reshard of bucket 
>> “" completed successfully
> 
> I think so, at least for the second reshard. But I wouldn't bet my
> life on it. I fear I might have missed an error on the first one since
> I have done a radosgw-admin bucket reshard so often and never seen it
> fail.
> 
> Christian
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: LARGE_OMAP_OBJECTS warning and bucket has lot of unknown objects and 1999 shards.

2023-07-28 Thread J. Eric Ivancich
There are a couple of potential explanations. 1) Do you have versioning turned 
on? 1a) And do you write the same file over and over, such as a heartbeat file? 
2) Do you have lots of incomplete multipart uploads?
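
If you have S3 credentials for that bucket handy, a quick way to check 1) and 
2) from the client side is something like the following (a sketch; the 
endpoint URL is just a placeholder for your RGW endpoint):

aws --endpoint-url http://your-rgw-endpoint s3api get-bucket-versioning --bucket epbucket
aws --endpoint-url http://your-rgw-endpoint s3api list-multipart-uploads --bucket epbucket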

If you wouldn’t mind, please run: `radosgw-admin bi list --bucket=epbucket 
--max-entries=50` and provide the output in a reply.

Thanks,

Eric
(he/him)

> On Jul 28, 2023, at 9:25 AM, Uday Bhaskar Jalagam  
> wrote:
> 
> Hello everyone,
> I am getting a [WRN] LARGE_OMAP_OBJECTS: 18 large omap objects warning
> in one of my clusters. I see that one of the buckets has a huge number of
> shards (1999) and "num_objects": 221185360 when I check the bucket stats
> using radosgw-admin bucket stats. However, I see only 8 files when I
> actually list the bucket using python boto3. I am not sure what those
> objects are; am I missing something somewhere?
> 
> # radosgw-admin bucket stats --bucket=epbucket
> {
>     "bucket": "epbucket",
>     "num_shards": 1999,
>     "tenant": "",
>     "zonegroup": "ac361c18-7420-4739-8e92-73109fc8ec80",
>     "placement_rule": "default-placement",
>     "explicit_placement": {
>         "data_pool": "",
>         "data_extra_pool": "",
>         "index_pool": ""
>     },
>     "id": "f3e47d62-b9ff-4f38-b863-d7bfc155d698.35744173.268063",
>     "marker": "f3e47d62-b9ff-4f38-b863-d7bfc155d698.35744173.1225",
>     "index_type": "Normal",
>     "owner": "user",
>     "ver":
> "0#3,1#1,2#1,3#9,4#1,5#3,6#1,7#3,8#1,9#8,10#7,11#1,12#3,13#1,14#1,15#1,16#1,17#1,18#5,19#1,20#1,21#3,22#3,23#1,24#3,25#1,26#1,27#1,28#3,29#3,30#2,31#3,32#2,33#1,34#3,35#1,36#3,37#1,38#3,39#1,40#3,41#1,42#1,43#1,44#1,45#1,46#3,47#1,48#3,49#1,50#3,51#1,52#1,53#1,54#1,55#1,56#3,57#1,58#1,59#5,60#1,61#1,62#1,63#3,64#1,65#1,66#3,67#1,68#5,69#1,70#1,71#3,72#1,73#3,74#2,75#3,76#1,77#3,78#1,79#1,80#3,81#3,82#3,83#3,84#1,85#1,86#3,87#3,88#8,89#1,90#1,91#1,92#5,93#7,94#1,95#1,96#3,97#1,98#5,99#1,100#1,101#3,102#5,103#3,104#1,105#1,106#1,107#1,108#1,109#239050,110#1,111#1,112#1,113#5,114#1,115#1,116#1,117#3,118#3,119#9,120#1,121#1,122#3,123#5,124#3,125#3,126#1,127#1,128#1,129#3,130#3,131#3,132#3,133#5,134#1,135#1,136#1,137#1,138#5,139#3,140#1,141#1,142#3,143#3,144#1,145#1,146#3,147#3,148#3,149#1,150#3,151#1,152#1,153#3,154#7,155#1,156#5,157#3,158#3,159#3,160#5,161#3,162#3,163#3,164#3,165#1,166#1,167#5,168#3,169#1,170#1,171#1,172#4,173#1,174#12,175#1,176#3,177#3,178#3,179#1,180#3,181#33,182#1,18
> 3#1,184#1,185#1,186#3,187#3,188#11,189#3,190#5,191#3,192#1,193#5,194#3,195#2,196#3,197#1,198#3,199#1,200#7,201#3,202#7,203#1,204#3,205#3,206#1,207#1,208#3,209#1,210#7,211#3,212#2,213#9,214#1,215#3,216#1,217#1,218#7,219#1,220#1,221#5,222#1,223#1,224#3,225#1,226#1,227#3,228#3,229#1,230#5,231#5,232#1,233#3,234#1,235#1,236#5,237#1,238#1,239#3,240#7,241#3,242#1,243#3,244#1,245#1,246#3,247#1,248#3,249#3,250#1,251#1,252#3,253#1,254#3,255#1,256#3,257#1,258#1,259#7,260#9,261#3,262#1,263#1,264#3,265#1,266#1,267#5,268#1,269#1,270#1,271#1,272#1,273#3,274#1,275#1,276#1,277#3,278#1,279#7,280#1,281#238803,282#3,283#1,284#1,285#1,286#1,287#3,288#1,289#1,290#1,291#16,292#3,293#3,294#1,295#1,296#5,297#3,298#3,299#7,300#5,301#5,302#1,303#1,304#1,305#3,306#3,307#5,308#1,309#1,310#3,311#3,312#1,313#1,314#5,315#3,316#7,317#1,318#1,319#3,320#3,321#3,322#1,323#9,324#3,325#1,326#5,327#1,328#3,329#9,330#3,331#1,332#3,333#3,334#3,335#3,336#3,337#9,338#1,339#1,340#1,341#3,342#3,343#3,344#2,345#3,346#3,347#1,34
> 8#3,349#3,350#5,351#1,352#5,353#8,354#5,355#1,356#3,357#1,358#3,359#1,360#1,361#1,362#1,363#5,364#1,365#1,366#1,367#1,368#1,369#1,370#3,371#3,372#3,373#3,374#1,375#3,376#5,377#8,378#1,379#1,380#1,381#1,382#3,383#3,384#1,385#3,386#1,387#1,388#1,389#2,390#1,391#1,392#1,393#1,394#5,395#5,396#1,397#5,398#3,399#1,400#1,401#7,402#1,403#5,404#1,405#1,406#3,407#3,408#3,409#1,410#1,411#1,412#1,413#1,414#1,415#1,416#1,417#1,418#3,419#3,420#1,421#5,422#5,423#5,424#1,425#1,426#1,427#1,428#9,429#1,430#3,431#1,432#3,433#3,434#1,435#1,436#3,437#5,438#3,439#3,440#1,441#1,442#1,443#1,444#1,445#5,446#3,447#9,448#1,449#1,450#1,451#1,452#1,453#3,454#1,455#1,456#3,457#5,458#5,459#3,460#9,461#1,462#1,463#1,464#1,465#1,466#1,467#3,468#1,469#3,470#8,471#1,472#1,473#1,474#7,475#3,476#1,477#11,478#10,479#1,480#1,481#1,482#3,483#1,484#7,485#1,486#1,487#3,488#8,489#1,490#1,491#8,492#3,493#1,494#3,495#5,496#1,497#1,498#1,499#3,500#1,501#5,502#1,503#1,504#3,505#1,506#1,507#1,508#3,509#3,510#3,511#1,512#7,513#3,5
> 

[ceph-users] Re: cephadm logs

2023-07-28 Thread Adam King
Not currently. Those logs aren't generated by any daemons; they come
directly from anything done by the cephadm binary on the host, which tends
to be quite a bit, since the cephadm mgr module runs most of its operations
on the host through a copy of the cephadm binary. It doesn't log to the
journal because it doesn't have a systemd unit or anything; it's just a
python script being run directly, and nothing has been implemented to make
it log to journald.
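
If you really want those lines in journald in the meantime, a stop-gap (just a
sketch, not an official mechanism) would be to pipe the file into the journal
yourself, e.g.

tail -F /var/log/ceph/cephadm.log | systemd-cat -t cephadm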

On Fri, Jul 28, 2023 at 9:43 AM Luis Domingues 
wrote:

> Hi,
>
> Quick question about cephadm and its logs. On my cluster, all logs go to
> journald. But on each machine, I still have /var/log/ceph/cephadm.log,
> which is still being written to.
>
> Is there a way to make cephadm log to journald instead of a file? If yes,
> did I miss it in the documentation? If not, is there any reason to log to
> a file while everything else logs to journald?
>
> Thanks
>
> Luis Domingues
> Proton AG
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Reef release candidate - v18.1.3

2023-07-28 Thread Yuri Weinstein
This is the third and possibly last release candidate for Reef.

The Reef release comes with a new RocksDB version (7.9.2) [0], which
incorporates several performance improvements and features. Our
internal testing doesn't show any side effects from the new version,
but we are very eager to hear community feedback on it. This is the
first release with the ability to tune RocksDB settings per column
family [1], which allows more granular tunings to be applied to
different kinds of data stored in RocksDB. A new set of settings is
used in Reef to optimize performance for most kinds of workloads, with
a slight penalty in some cases that is outweighed by large improvements
in use cases such as RGW, in terms of compactions and write
amplification. We would highly encourage community members to give
these a try against their performance benchmarks and use cases. The
detailed list of RocksDB and BlueStore changes can be found
at https://pad.ceph.com/p/reef-rc-relnotes.
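
If you want a quick look at how the new settings behave on your own workload,
the RocksDB and BlueStore perf counters on an OSD's admin socket are a
reasonable starting point (just a sketch; counter names can differ slightly
between releases):

ceph daemon osd.0 perf dump rocksdb
ceph daemon osd.0 perf dump bluestore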

If any of our community members would like to help us with performance
investigations or regression testing of the Reef release candidate,
please feel free to provide feedback via email or in
https://pad.ceph.com/p/reef_scale_testing. For more active
discussions, please use the #ceph-at-scale slack channel in
ceph-storage.slack.com.

This RC has gone through partial testing due to the issues we are
experiencing in the sepia lab.
Please try it out and report any issues you encounter. Happy testing!

Thanks,
YuriW

Get the release from

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-18.1.3.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: f594a0802c34733bb06e5993bc4bdb085c9a5f3f
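
For a quick container-based test, pulling the RC should be as simple as the
following (assuming the image is tagged with the version, as is the usual
convention):

podman pull quay.io/ceph/ceph:v18.1.3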
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm logs

2023-07-28 Thread Luis Domingues
Hi,

Quick question about cephadm and its logs. On my cluster, all logs go to 
journald. But on each machine, I still have /var/log/ceph/cephadm.log, which 
is still being written to.

Is there a way to make cephadm log to journald instead of a file? If yes, did 
I miss it in the documentation? If not, is there any reason to log to a file 
while everything else logs to journald?

Thanks

Luis Domingues
Proton AG
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] LARGE_OMAP_OBJECTS warning and bucket has lot of unknown objects and 1999 shards.

2023-07-28 Thread Uday Bhaskar Jalagam
Hello everyone,
I am getting a [WRN] LARGE_OMAP_OBJECTS: 18 large omap objects warning
in one of my clusters. I see that one of the buckets has a huge number of
shards (1999) and "num_objects": 221185360 when I check the bucket stats
using radosgw-admin bucket stats. However, I see only 8 files when I
actually list the bucket using python boto3. I am not sure what those
objects are; am I missing something somewhere?

# radosgw-admin bucket stats --bucket=epbucket
{
    "bucket": "epbucket",
    "num_shards": 1999,
    "tenant": "",
    "zonegroup": "ac361c18-7420-4739-8e92-73109fc8ec80",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "f3e47d62-b9ff-4f38-b863-d7bfc155d698.35744173.268063",
    "marker": "f3e47d62-b9ff-4f38-b863-d7bfc155d698.35744173.1225",
    "index_type": "Normal",
    "owner": "user",
    "ver":
"0#3,1#1,2#1,3#9,4#1,5#3,6#1,7#3,8#1,9#8,10#7,11#1,12#3,13#1,14#1,15#1,16#1,17#1,18#5,19#1,20#1,21#3,22#3,23#1,24#3,25#1,26#1,27#1,28#3,29#3,30#2,31#3,32#2,33#1,34#3,35#1,36#3,37#1,38#3,39#1,40#3,41#1,42#1,43#1,44#1,45#1,46#3,47#1,48#3,49#1,50#3,51#1,52#1,53#1,54#1,55#1,56#3,57#1,58#1,59#5,60#1,61#1,62#1,63#3,64#1,65#1,66#3,67#1,68#5,69#1,70#1,71#3,72#1,73#3,74#2,75#3,76#1,77#3,78#1,79#1,80#3,81#3,82#3,83#3,84#1,85#1,86#3,87#3,88#8,89#1,90#1,91#1,92#5,93#7,94#1,95#1,96#3,97#1,98#5,99#1,100#1,101#3,102#5,103#3,104#1,105#1,106#1,107#1,108#1,109#239050,110#1,111#1,112#1,113#5,114#1,115#1,116#1,117#3,118#3,119#9,120#1,121#1,122#3,123#5,124#3,125#3,126#1,127#1,128#1,129#3,130#3,131#3,132#3,133#5,134#1,135#1,136#1,137#1,138#5,139#3,140#1,141#1,142#3,143#3,144#1,145#1,146#3,147#3,148#3,149#1,150#3,151#1,152#1,153#3,154#7,155#1,156#5,157#3,158#3,159#3,160#5,161#3,162#3,163#3,164#3,165#1,166#1,167#5,168#3,169#1,170#1,171#1,172#4,173#1,174#12,175#1,176#3,177#3,178#3,179#1,180#3,181#33,182#1,18
 
3#1,184#1,185#1,186#3,187#3,188#11,189#3,190#5,191#3,192#1,193#5,194#3,195#2,196#3,197#1,198#3,199#1,200#7,201#3,202#7,203#1,204#3,205#3,206#1,207#1,208#3,209#1,210#7,211#3,212#2,213#9,214#1,215#3,216#1,217#1,218#7,219#1,220#1,221#5,222#1,223#1,224#3,225#1,226#1,227#3,228#3,229#1,230#5,231#5,232#1,233#3,234#1,235#1,236#5,237#1,238#1,239#3,240#7,241#3,242#1,243#3,244#1,245#1,246#3,247#1,248#3,249#3,250#1,251#1,252#3,253#1,254#3,255#1,256#3,257#1,258#1,259#7,260#9,261#3,262#1,263#1,264#3,265#1,266#1,267#5,268#1,269#1,270#1,271#1,272#1,273#3,274#1,275#1,276#1,277#3,278#1,279#7,280#1,281#238803,282#3,283#1,284#1,285#1,286#1,287#3,288#1,289#1,290#1,291#16,292#3,293#3,294#1,295#1,296#5,297#3,298#3,299#7,300#5,301#5,302#1,303#1,304#1,305#3,306#3,307#5,308#1,309#1,310#3,311#3,312#1,313#1,314#5,315#3,316#7,317#1,318#1,319#3,320#3,321#3,322#1,323#9,324#3,325#1,326#5,327#1,328#3,329#9,330#3,331#1,332#3,333#3,334#3,335#3,336#3,337#9,338#1,339#1,340#1,341#3,342#3,343#3,344#2,345#3,346#3,347#1,34
 
8#3,349#3,350#5,351#1,352#5,353#8,354#5,355#1,356#3,357#1,358#3,359#1,360#1,361#1,362#1,363#5,364#1,365#1,366#1,367#1,368#1,369#1,370#3,371#3,372#3,373#3,374#1,375#3,376#5,377#8,378#1,379#1,380#1,381#1,382#3,383#3,384#1,385#3,386#1,387#1,388#1,389#2,390#1,391#1,392#1,393#1,394#5,395#5,396#1,397#5,398#3,399#1,400#1,401#7,402#1,403#5,404#1,405#1,406#3,407#3,408#3,409#1,410#1,411#1,412#1,413#1,414#1,415#1,416#1,417#1,418#3,419#3,420#1,421#5,422#5,423#5,424#1,425#1,426#1,427#1,428#9,429#1,430#3,431#1,432#3,433#3,434#1,435#1,436#3,437#5,438#3,439#3,440#1,441#1,442#1,443#1,444#1,445#5,446#3,447#9,448#1,449#1,450#1,451#1,452#1,453#3,454#1,455#1,456#3,457#5,458#5,459#3,460#9,461#1,462#1,463#1,464#1,465#1,466#1,467#3,468#1,469#3,470#8,471#1,472#1,473#1,474#7,475#3,476#1,477#11,478#10,479#1,480#1,481#1,482#3,483#1,484#7,485#1,486#1,487#3,488#8,489#1,490#1,491#8,492#3,493#1,494#3,495#5,496#1,497#1,498#1,499#3,500#1,501#5,502#1,503#1,504#3,505#1,506#1,507#1,508#3,509#3,510#3,511#1,512#7,513#3,5
 
14#1,515#3,516#5,517#1,518#3,519#1,520#3,521#3,522#3,523#5,524#3,525#3,526#1,527#1,528#1,529#1,530#1,531#3,532#1,533#3,534#5,535#5,536#1,537#3,538#3,539#5,540#1,541#1,542#1,543#1,544#7,545#3,546#1,547#1,548#1,549#3,550#1,551#1,552#3,553#1,554#3,555#3,556#1,557#3,558#5,559#3,560#3,561#8,562#1,563#3,564#7,565#3,566#5,567#1,568#3,569#3,570#3,571#3,572#5,573#1,574#1,575#5,576#5,577#3,578#7,579#1,580#1,581#5,582#1,583#3,584#1,585#1,586#9,587#9,588#3,589#1,590#1,591#3,592#3,593#1,594#10,595#3,596#1,597#3,598#1,599#1,600#1,601#3,602#3,603#1,604#3,605#5,606#1,607#1,608#1,609#10,610#3,611#264711,612#3,613#5,614#3,615#3,616#1,617#1,618#7,619#3,620#3,621#1,622#3,623#1,624#3,625#1,626#3,627#1,628#1,629#3,630#1,631#2,632#233833,633#1,634#1,635#5,636#1,637#1,638#1,639#1,640#3,641#1,642#1,643#3,644#1,645#1,646#10,647#1,648#1,649#1,650#3,651#1,652#1,653#5,654#1,655#7,656#7,657#1,658#1,659#5,660#3,661#3,662#1,663#3,664#1,665#1,666#3,667#1,668#7,669#1,670#3,671#3,672#1,673#1,674#1,675#1,676#1,677#3,6
 

[ceph-users] Re: MON sync time depends on outage duration

2023-07-28 Thread Eugen Block

Hi,

I think we found an explanation for the behaviour; we still need to  
verify it, though. I just wanted to write it up for posterity.
We already knew that the large number of "purged_snap" keys in the mon  
store is responsible for the long synchronization. Removing them  
didn't seem to have a negative impact in my test cluster, but we don't  
want to try that in production. They also tried a couple of variations  
of mon_sync_payload_size, but it didn't have a significant impact (it  
affected a few other keys, but not the osd_snap keys). We seemed to  
hit the payload_keys limit (default 2000), so we'll suggest increasing  
it and hopefully find a suitable value. But that still didn't explain  
the variations in the sync duration.
So we looked deeper (also dived into the code) and finally got some  
debug logs we could analyse.
The paxos versions determine if a "full sync" is required or a "recent  
sync" is sufficient:


if (paxos->get_version() < m->paxos_first_version &&
m->paxos_first_version > 1) {
  dout(10) << " peer paxos first versions [" << m->paxos_first_version
   << "," << m->paxos_last_version << "]"
   << " vs my version " << paxos->get_version()
   << " (too far ahead)"
   << dendl;
...

So if the current version of the to-be-synced mon is lower than the  
first available version of the peer, a full sync is started; otherwise a  
recent sync is started. In one of the tests (simulating a mon reboot)  
the difference between the paxos versions was 628. I checked the available  
mon config options and found "paxos_min" (default 500). This will be  
the next suggestion: increase paxos_min to 1000 so the cluster doesn't  
require a full sync after a regular reboot and only does a full sync in  
case a mon is down for a longer period of time. I'm not sure what other  
impact it could have except for some more storage consumption, but we'll  
let them test it.
But this still doesn't explain the variations in the startup times. My  
current theory is that the duration depends on the timing of the  
reboot/daemon shutdown: the rbd-mirror is currently configured with a  
30-minute schedule. This means that every full and every half hour new  
snapshots are created and synced and older snapshots are deleted, which  
impacts the osdmap. So if a MON goes down during this time, it's very  
likely that its paxos version will be lower than the first available  
on the peer(s). So if a reboot is scheduled right after the snapshot  
schedule, the mon synchronisation time will probably decrease. This  
also needs some verification; we're still waiting for the results.


From my perspective, those two config options (mon_sync_payload_keys,  
paxos_min) and rebooting a MON server at the right time are the most  
promising approaches for now. Having the mon store on SSDs would help  
as well, of course, but unfortunately that's currently not an option.
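
For reference, applying those two options would look something like this (a  
sketch; the payload_keys value is just an example, we don't have a validated  
number yet):

ceph config set mon mon_sync_payload_keys 4000
ceph config set mon paxos_min 1000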


I'll update this thread when we have more results; maybe my theory is  
garbage, but I'm confident. :-) If you have comments or objections  
regarding those config options, I'd appreciate your feedback.


Thanks,
Eugen

Quoting Josh Baergen:


Out of curiosity, what is your require_osd_release set to? (ceph osd
dump | grep require_osd_release)

Josh

On Tue, Jul 11, 2023 at 5:11 AM Eugen Block  wrote:


I'm not so sure anymore if that could really help here. The dump-keys
output from the mon contains 42 million osd_snap prefix entries, 39
million of which are "purged_snap" keys. I also compared with other
clusters; those aren't tombstones but the expected "history" of
purged snapshots. So I don't think removing a couple of hundred trash
snapshots will actually reduce the number of osd_snap keys. At least
doubling the payload_size seems to have a positive impact. The
compaction during the sync has a negative impact, of course, as does
not having the mon store on SSDs.
I'm currently playing with a test cluster, removing all "purged_snap"
entries from the mon db (not finished yet) to see what that will do
with the mon and if it will even start correctly. But has anyone done
that, removing keys from the mon store? Not sure what to expect yet...

Quoting Dan van der Ster:

> Oh yes, sounds like purging the rbd trash will be the real fix here!
> Good luck!
>
> __
> Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com
>
>
>
>
> On Mon, Jul 10, 2023 at 6:10 AM Eugen Block  wrote:
>
>> Hi,
>> I got a customer response with payload size 4096; that made things
>> even worse. The mon startup time was now around 40 minutes. My doubts
>> wrt decreasing the payload size seem confirmed. Then I read Dan's
>> response again, which also mentions that the default payload size could
>> be too small. So I asked them to double the default (2M instead of 1M)
>> and am now waiting for a new result. I'm still wondering why this only
>> happens when the mon is down for more than 5 minutes. Does 

[ceph-users] Re: MDS stuck in rejoin

2023-07-28 Thread Xiubo Li


On 7/26/23 22:13, Frank Schilder wrote:

Hi Xiubo.


... I am more interested in the kclient side logs. Just want to
know why that oldest request got stuck so long.

I'm afraid I'm a bad admin in this case. I don't have logs from the host any 
more; I would have needed the output of dmesg, and that is gone. In case it 
happens again I will try to pull the info out.

The tracker https://tracker.ceph.com/issues/22885 sounds a lot more violent 
than our situation. We had no problems with the MDSes, the cache didn't grow, 
and the relevant one was also not put into read-only mode. It was just this 
warning showing all the time; health was OK otherwise. I think the warning had 
been there for at least 16h before I failed the MDS.

The MDS log contains nothing; this is the only line mentioning this client:

2023-07-20T00:22:05.518+0200 7fe13df59700  0 log_channel(cluster) log [WRN] : 
client.145678382 does not advance its oldest_client_tid (16121616), 10 
completed requests recorded in session


Okay, if so it's hard to say or dig out what happened in the client and why 
it didn't advance the tid.
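
If it shows up again, capturing something like the following on the client 
node at that moment would help (a sketch; it assumes the kernel client and 
that debugfs is mounted):

dmesg -T | tail -n 200
cat /sys/kernel/debug/ceph/*/mdsc
cat /sys/kernel/debug/ceph/*/caps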


Thanks

- Xiubo



Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io