[ceph-users] Re: Mon-map inconsistency?

2021-09-06 Thread Josh Baergen
Hi Melanie,

On Mon, Sep 6, 2021 at 10:06 AM Desaive, Melanie
 wrote:
> When I execute "ceph mon_status --format json-pretty" from our 
> ceph-management VM, the correct mon nodes are returned.
>
> But when I execute "ceph daemon osd.xx config show | grep mon_host" on the 
> respective storage node the old mon node IPs are returned.
>
> I am now unsure whether, if I change more mon nodes, the information known to 
> the OSDs could become invalid one after the other and we could run into serious 
> problems?

"config show" is showing you the mon IPs read from the OSDs'
ceph.conf, and is what is used to initially connect to the mons. After
that, my understanding is that those IPs don't matter as the OSDs will
use the IPs from the mons for further connections/communication.
However, I'm not certain what happens if, for example, all of your
mons were to go down for a period of time; do the OSDs use the last
monmap for reconnecting to the mons or do they revert to using the
configured mon IPs?

At the very least, you should be fine to replace all of your mons and
update each ceph.conf with the new info without needing to restart the
OSDs. After that it may be wise to restart the OSDs to both update the
configured mon IPs as well as test to make sure that they can
reconnect to the new mons without issue in case of a future outage.
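
For example, a minimal sketch of that check-and-restart cycle on one storage
node (assuming systemd-managed OSDs, the default /etc/ceph/ceph.conf path, and
osd.12 as a placeholder ID):

  # monmap as the cluster currently sees it
  ceph mon dump
  # mon IPs this OSD was started with (read from its ceph.conf)
  ceph daemon osd.12 config show | grep mon_host
  # after editing mon_host in /etc/ceph/ceph.conf, restart one OSD and re-check
  systemctl restart ceph-osd@12
  ceph daemon osd.12 config show | grep mon_host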

Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon-map inconsistency?

2021-09-06 Thread ceph
Hi,

What is the output of

ceph mon stat
or
ceph mon dump

and
ceph quorum_status

If you see your expected nodes there, then it should all be fine...

Hth
Mehmet

On 6 September 2021 18:23:46 CEST, Josh Baergen wrote:
>Hi Melanie,
>
>On Mon, Sep 6, 2021 at 10:06 AM Desaive, Melanie
> wrote:
>> When I execute "ceph mon_status --format json-pretty" from our 
>> ceph-management VM, the correct mon nodes are returned.
>>
>> But when I execute "ceph daemon osd.xx config show | grep mon_host" on the 
>> respective storage node the old mon node IPs are returned.
>>
>> I am now unsure whether, if I change more mon nodes, the information known to 
>> the OSDs could become invalid one after the other and we could run into 
>> serious problems?
>
>"config show" is showing you the mon IPs read from the OSDs'
>ceph.conf, and is what is used to initially connect to the mons. After
>that, my understanding is that those IPs don't matter as the OSDs will
>use the IPs from the mons for further connections/communication.
>However, I'm not certain what happens if, for example, all of your
>mons were to go down for a period of time; do the OSDs use the last
>monmap for reconnecting to the mons or do they revert to using the
>configured mon IPs?
>
>At the very least, you should be fine to replace all of your mons and
>update each ceph.conf with the new info without needing to restart the
>OSDs. After that it may be wise to restart the OSDs to both update the
>configured mon IPs as well as test to make sure that they can
>reconnect to the new mons without issue in case of a future outage.
>
>Josh
>___
>ceph-users mailing list -- ceph-users@ceph.io
>To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Mon-map inconsistency?

2021-09-06 Thread Desaive, Melanie
Hi all,



I am quite new to Ceph and could use some advice:

We are running a Ceph cluster with mon services on some of the storage nodes. 
Last week it became necessary to move the mon nodes to different hosts. We have 
already deployed one new mon and removed an old one. We would now like to 
continue with further changes, but I stumbled over something that looks like an 
inconsistency to me.

When I execute "ceph mon_status --format json-pretty" from our ceph-management 
VM, the correct mon nodes are returned.

But when I execute "ceph daemon osd.xx config show | grep mon_host" on the 
respective storage node the old mon node IPs are returned.

I am now unsure whether, if I change more mon nodes, the information known to the 
OSDs could become invalid one after the other and we could run into serious 
problems?

Any advice is appreciated!

Our version is Ceph 10.2.11



Kind regards,

Melanie


Melanie Desaive
Cloud DevOps Engineer
Cloud Management Platform


melanie.desa...@nttdata.com

NTT DATA Business Solutions Global Managed Services GmbH
Bismarckstrasse 105 - 10625 Berlin, Germany

www.nttdata-solutions.com



Stay up to date - with our info service! Register now for free!

Geschäftsführung: Mirko Kruse, André Walter
Sitz und Amtsgericht: Dresden / HRB 21356

This email and any attachments are sent in strictest confidence for the sole 
use of the addressee and may contain legally privileged, confidential, and 
proprietary data. If you are not the intended recipient, then you have received 
this email in error. If this is the case please advise the sender by replying 
promptly to this email and then permanently delete this email and any 
attachments without any use, dissemination, printing, copying or forwarding. 
Copyright NTT DATA Business Solutions. All rights reserved
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance optimization

2021-09-06 Thread Robert Sander

On 06.09.21 at 16:44, Simon Sutter wrote:


|node1|node2|node3|node4|node5|node6|node7|node8|
|1x1TB|1x1TB|1x1TB|1x1TB|1x1TB|1x1TB|1x1TB|1x1TB|
|4x2TB|4x2TB|4x2TB|4x2TB|4x2TB|4x2TB|4x2TB|4x2TB|
|1x6TB|1x6TB|1x6TB|1x6TB|1x6TB|1x6TB|1x6TB|1x6TB|


"ceph osd df tree" should show the data distribution among the OSDs.

Are all of these HDDs? Are these HDDs equipped with RocksDB on SSD?
An HDD-only setup will have abysmal performance.
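
One hedged way to check both points per OSD (osd.0 is a placeholder ID, and the
exact metadata field names can differ slightly between releases):

  ceph osd df tree
  ceph osd metadata 0 | grep -E 'hostname|devices|rotational|bluefs_db'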

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance optimization

2021-09-06 Thread Simon Sutter
Hello


> >

> > >> - The one 6TB disk, per node?
> > >
> > > You get bad distribution of data, why not move drives around between
> > these two clusters, so you have more of the same in each.
> > >
> >
> > I would assume that this behaves exactly the other way around. As long
> > as you have the same number of block devices with the same size
> > distribution in each node you will get an even data distribution.
> >
> > If you have a node with 4 3TB drives and one with 4 6TB drives Ceph
> > cannot use the 6TB drives efficiently.
> >
> He has 2 clusters, so e.g. 3TB -> cluster 1 and 6TB -> cluster 2.



Sorry for the confusing information.

I have two clusters, but my question was about just one of them.


Yes, Robert is right. Instead of this configuration:

| node1 | node2 | node3 | node4 | node5 | node6 | node7 | node8 |
| 1x1TB | 1x1TB | 1x1TB | 1x1TB | 1x1TB | 1x1TB | 1x1TB | 1x1TB |
| 4x2TB | 4x2TB | 4x2TB | 4x2TB | 4x2TB | 4x2TB | 4x2TB | 4x2TB |
| 1x6TB | 1x6TB | 1x6TB | 1x6TB | 1x6TB | 1x6TB | 1x6TB | 1x6TB |

This:
| node1 | node2 | node3 | node4 | node5 | node6 | node7 | node8 |
| 6x3TB | 6x3TB | 6x3TB | 6x3TB | 6x3TB | 6x3TB | 6x3TB | 6x3TB |


Would this even make a noticeable performance difference? Because if I'm not 
mistaken, Ceph will try to fill every disk on one node to the same percentage.


And about erasure coding: what would be the recommended configuration?
Because replicated pools use so much more storage, they weren't really an option 
until now.
We didn't have any problems with CPU utilization, and I can go to 32GB for every 
node, and 64GB for the MDS nodes.
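
As an illustration only (the profile and pool names below are made up, and k/m
have to fit your node count and failure domain), the EC profile behind a pool
can be inspected, or a new one created, like this:

  ceph osd pool ls detail                 # shows which EC profile each pool uses
  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get <profile-name>
  # hypothetical new profile and pool for an 8-node cluster:
  ceph osd erasure-code-profile set backup-k4m2 k=4 m=2 crush-failure-domain=host
  ceph osd pool create ec_backup 64 64 erasure backup-k4m2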


Thanks




From: Marc 
Sent: Monday, 6 September 2021 13:53:06
To: Robert Sander; ceph-users@ceph.io
Subject: [ceph-users] Re: Performance optimization

>
> >> - The one 6TB disk, per node?
> >
> > You get bad distribution of data, why not move drives around between
> these two clusters, so you have more of the same in each.
> >
>
> I would assume that this behaves exactly the other way around. As long
> as you have the same number of block devices with the same size
> distribution in each node you will get an even data distribution.
>
> If you have a node with 4 3TB drives and one with 4 6TB drives Ceph
> cannot use the 6TB drives efficiently.
>
He has 2 clusters, so e.g. 3TB -> cluster 1 and 6TB -> cluster 2.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem mounting cephfs Share

2021-09-06 Thread Hendrik Peyerl
Hi Eugen,

Thanks for the idea, but I didn't have anything mounted that I could unmount.


> On 6. Sep 2021, at 09:15, Eugen Block  wrote:
> 
> Hi,
> 
> I just got the same message in my lab environment (octopus) which I had 
> redeployed. The client's keyring had changed after redeployment and I think I 
> had a stale mount. After 'umount' and 'mount' with the proper keyring it 
> worked as expected.
> 
> 
> Quoting Hendrik Peyerl :
> 
>> Hello All,
>> 
>> i recently tried to reactivate a CEPH Cluster that I setup last year. I 
>> applied patches regularly and did some tests afterwards which always worked.
>> 
>> Now I did run my usual tests again before getting it ready to use in 
>> production but I am not able to mount my cephfs shares anymore, I always run 
>> into the following error:
>> 
>> mount error: no mds server is up or the cluster is laggy
>> 
>> The ceph health is OK, I can reach all MDS Servers and all other servers 
>> aswell, the S3 Gateways is also still working. I did not find any errors 
>> within the logs that would help me debug this further.
>> 
>> As i want to learn how to debug those issues in the future I’d rather try to 
>> repair the cluster instead of just recreating it since I dont have any 
>> pressure to get it running again quickly.
>> 
>> Could you guys give me any hints on where to look further?
>> 
>> CEPH Version: 14.2.22
>> OS: CentOS7
>> 
>> Thanks in Advance,
>> 
>> Hendrik
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem mounting cephfs Share

2021-09-06 Thread Hendrik Peyerl
Hi Marc,

thanks for getting back to me.

Fuse debug output is: 7f84c1c9e700 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [2]

Any idea what that tells me?

Thanks,
Hendrik


> On 3. Sep 2021, at 19:38, Marc  wrote:
> 
> Maybe try and mount with ceph-fuse and debugging on?
> 
>> i recently tried to reactivate a CEPH Cluster that I setup last year. I
>> applied patches regularly and did some tests afterwards which always
>> worked.
>> 
>> Now I did run my usual tests again before getting it ready to use in
>> production but I am not able to mount my cephfs shares anymore, I always
>> run into the following error:
>> 
>> mount error: no mds server is up or the cluster is laggy
>> 
>> The ceph health is OK, I can reach all MDS Servers and all other servers
>> aswell, the S3 Gateways is also still working. I did not find any errors
>> within the logs that would help me debug this further.
>> 
>> As i want to learn how to debug those issues in the future I’d rather
>> try to repair the cluster instead of just recreating it since I dont
>> have any pressure to get it running again quickly.
>> 
>> Could you guys give me any hints on where to look further?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-06 Thread Frank Schilder
Hi Dan,

unfortunately, setting these parameters crashed the MDS cluster and we now have 
severe performance issues. Particularly bad is mds_recall_max_decay_rate: even 
just setting it to the default value immediately makes all MDS daemons 
unresponsive, and they get failed by the MONs. I had already set the mds beacon 
time-out to 10 minutes to avoid MDS daemons getting marked down too early when 
they need to trim a large (oversized) cache. The formerly active, then failed, 
daemons never recover; I have to restart them manually to get them back as 
stand-bys.

We are running mimic-13.2.10. Does explicitly setting mds_recall_max_decay_rate 
enable a different code path in this version?

I tried to fix the situation by removing all modified config parameters again 
(ceph config rm ...) and doing a full restart of all daemons, first all 
stand-bys and then the active ones one by one. Unfortunately, this did not help. 
In addition, it looks like one of our fs data pools does not purge snapshots any 
more:

pool 12 'con-fs2-meta1' no removed_snaps list shown
pool 13 'con-fs2-meta2' removed_snaps 
[2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 14 'con-fs2-data' removed_snaps 
[2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 17 'con-fs2-data-ec-ssd' removed_snaps 
[2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 19 'con-fs2-data2' removed_snaps 
[2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]

con-fs2-meta2 is the primary data pool. It does not store actual file data; we 
have con-fs2-data2 set as the data pool on the fs root. It's the new recommended 
3-pool layout, with the meta-data pool and the primary data pool storing 
meta-data only.

The MDS daemons report 12 snapshots and if I interpret the removed_snaps info 
correctly, the pools con-fs2-meta2, con-fs2-data and con-fs2-data-ec-ssd store 
12 snapshots. However, pool con-fs2-data2 has at least 20. We use rolling 
snapshots and it looks like the snapshots are not purged any more since I tried 
setting the MDS trimming parameters. This, in turn, is potentially a reason for 
the performance degradation we experience at the moment.
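
For what it's worth, a couple of hedged checks for whether snapshot trimming is 
actually happening (this should work on mimic, though output details vary by 
release):

  # PGs currently trimming snapshots, or queued for it
  ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim
  # per-pool removed_snaps intervals, as in the listing above
  ceph osd pool ls detail | grep removed_snaps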

I would be most grateful if you could provide some pointers as to what to look 
for with regard to why snapshots don't disappear and/or what might have 
happened to our MDS daemons performance-wise.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 31 August 2021 16:23:15
To: Dan van der Ster
Cc: ceph-users
Subject: [ceph-users] Re: MDS daemons stuck in resolve, please help

Hi Dan,

I'm running mimic latest version.

Thanks for the link to the PR, this looks good.

Directory pinning does not work in mimic, I had another case on that. The 
required xattribs are not implemented although documented. The default load 
balancing seems to work quite well for us - I saw the warnings about possible 
performance impacts in the documentation. I think I scaled the MDS cluster up 
to the right size, the MDS daemons usually manage to trim their cache well 
below the reservation point and can take peak loads without moving clients 
around. All MDSes have about the same average request load. With the 
reorganised meta data pool the aggregated performance is significantly better 
than with a single MDS. I would say that most of the time it scales with the 
MDS count.

Of course, the find over the entire FS tree did lead to a lot of fun. 
Fortunately, users don't do that.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dan van der Ster 
Sent: 31 August 2021 15:26:17
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] Re: MDS daemons stuck in resolve, please help

Hi Frank,

It helps if you start threads reminding us which version you're running.

During nautilus the caps recall issue (which is AFAIK the main cause
of mds cache overruns) should be solved with this PR:
https://github.com/ceph/ceph/pull/39134/files
If you're not running >= 14.2.17 then you should probably just apply
these settings all together. (Don't worry which order they are set or
whatever -- just make the changes within a short window).

Also, to try to understand your MDS issues -- are you using pinning or
letting metadata move around between MDSs ?
find / might wreak havoc if you aren't pinning.

-- dan


On Tue, Aug 31, 2021 at 2:13 PM Frank Schilder  wrote:
>
> I seem to be hit by the problem discussed here: 
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AOYWQSONTFROPB4DXVYADWW7V25C3G6Z/
>
> In my case, what helped getting the cache size growth somewhat under control 
> was
>
> ceph config set mds mds_recall_max_caps 1
>
> I'm not sure about the options 

[ceph-users] Re: Performance optimization

2021-09-06 Thread Marc
> 
> >> - The one 6TB disk, per node?
> >
> > You get bad distribution of data, why not move drives around between
> these two clusters, so you have more of the same in each.
> >
> 
> I would assume that this behaves exactly the other way around. As long
> as you have the same number of block devices with the same size
> distribution in each node you will get an even data distribution.
> 
> If you have a node with 4 3TB drives and one with 4 6TB drives Ceph
> cannot use the 6TB drives efficiently.
> 
He has 2 clusters, so e.g. 3TB -> cluster 1 and 6TB -> cluster 2.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance optimization

2021-09-06 Thread Robert Sander

On 06.09.21 at 11:54, Marc wrote:


- The one 6TB disk, per node?


You get bad distribution of data, why not move drives around between these two 
clusters, so you have more of the same in each.



I would assume that this behaves exactly the other way around. As long 
as you have the same number of block devices with the same size 
distribution in each node you will get an even data distribution.


If you have a node with 4 3TB drives and one with 4 6TB drives Ceph 
cannot use the 6TB drives efficiently.


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance optimization

2021-09-06 Thread Simon Sutter


Hello

Thanks for this first input. I already found that at least one of those 6TB disks 
is a WD Blue WD60EZAZ, which according to WD is an SMR drive.
I will replace everything with SMR in it, but while I'm replacing hardware, 
should I switch all disks to, for example, all 3TB disks?
And what do you think about having the OS on one of the disks that is also used 
by Ceph?
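
For what it's worth, a quick way to list the drive models per node so they can 
be checked against the manufacturer's published SMR lists (drive-managed SMR is 
often not reported by the drive itself, so the model-number check is the 
reliable part):

  lsblk -d -o NAME,SIZE,ROTA,MODEL
  # or per device:
  smartctl -i /dev/sdb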

Thanks in advance,
Simon



From: Kai Börnert 
Sent: Monday, 6 September 2021 10:54:24
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Performance optimization

Hi,

are any of those old disks SMR ones? Because they will absolutely
destroy any kind of performance (Ceph does not use write caches due to
power-loss concerns, so they kind of have to do their whole magic for each
write request).

Greetings

On 9/6/21 10:47 AM, Simon Sutter wrote:
> Hello everyone!
>
> I have built two clusters with old hardware, which is lying around, the 
> possibility to upgrade is there.
> The clusters main usecase is hot backup. This means it's getting written 24/7 
> where 99% is writing and 1% is reading.
>
>
> It should be based on harddisks.
>
>
>
> At the moment, the nodes look like this:
> 8 Nodes
> Worst CPU: i7-3930K (up to i7-6850K)
>
> Worst amount of RAM: 24GB (up to 64GB)
> HDD Layout:
> 1x 1TB
> 4x 2TB
> 1x 6TB
> all sata, some just 5400rpm
>
> I had to put the OS on the 6TB HDDs, because there are no more sata 
> connections on the motherboard.
>
> The servers, which have to be backed up, have mounted the ceph with cephfs.
> 99% of the files, that have to be backed up, are harddisk images, so sizes 
> from 5GB to 1TB.
>
> All files are written to an erasure-coded pool with k=6 m=2, compression is 
> on passive snappy, default settings.
>
> I'm getting really bad performance with this setup.
> This is a bench, run with: "rados -p ec_test bench -b 524288 60 write" while 
> normal operations:
>
> Total time run: 63.4957
> Total writes made:  459
> Write size: 524288
> Object size:524288
> Bandwidth (MB/sec): 3.61442
> Stddev Bandwidth:   3.30073
> Max bandwidth (MB/sec): 16
> Min bandwidth (MB/sec): 0
> Average IOPS:   7
> Stddev IOPS:6.6061
> Max IOPS:   32
> Min IOPS:   0
> Average Latency(s): 2.151
> Stddev Latency(s):  2.3661
> Max latency(s): 14.0916
> Min latency(s): 0.0420954
> Cleaning up (deleting benchmark objects)
> Removed 459 objects
> Clean up completed and total clean up time :35.6908
>
> [root@testnode01 ~]# ceph osd perf
> osd  commit_latency(ms)  apply_latency(ms)
>6 655655
>9  13 13
>   11  15 15
>7  17 17
>   10  19 19
>8  12 12
>   24 153153
>   25  22 22
>   47  20 20
>   46  23 23
>   45  43 43
>   44   8  8
>   16  26 26
>   15  18 18
>   14  14 14
>   13  23 23
>   12  47 47
>   18 595595
>1  20 20
>   38  25 25
>   17  17 17
>0 317317
>   37  19 19
>   19  14 14
>2  16 16
>   39   9  9
>   20  16 16
>3  18 18
>   40  10 10
>   21  23 23
>4  17 17
>   41  29 29
>5  18 18
>   42  16 16
>   22  16 16
>   23  13 13
>   26  20 20
>   27  10 10
>   28  28 28
>   29  13 13
>   30  34 34
>   31  10 10
>   32  31 31
>   33  44 44
>   34  21 21
>   35  22 22
>   36 295295
>   43   9  9
>
>
>
> What do you think is the most obvious Problem?
>
> - The one 6TB disk, per node?
> - The OS on the 6TB disk?
>
> What would you suggest?
>
> What I hope to replace 

[ceph-users] Re: Performance optimization

2021-09-06 Thread Marc
> 
> 
> At the moment, the nodes look like this:
> 8 Nodes
> Worst CPU: i7-3930K (up to i7-6850K)

> Worst amount of RAM: 24GB (up to 64GB)
> HDD Layout:
> 1x 1TB
> 4x 2TB
> 1x 6TB
> all sata, some just 5400rpm
> 
> I had to put the OS on the 6TB HDDs, because there are no more sata
> connections on the motherboard.
Why not on the 1TB?

> The servers, which have to be backed up, have mounted the ceph with
> cephfs.
> 99% of the files, that have to be backed up, are harddisk images, so
> sizes from 5GB to 1TB.
> 
> All files are written to an erasure-coded pool with k=6 m=2, compression
> is on passive snappy, default settings.
> 
> I'm getting really bad performance with this setup.
> This is a bench, run with: "rados -p ec_test bench -b 524288 60 write"
> while normal operations:
> 
> Total time run: 63.4957
> Total writes made:  459
> Write size: 524288
> Object size:524288
> Bandwidth (MB/sec): 3.61442
> Stddev Bandwidth:   3.30073
> Max bandwidth (MB/sec): 16
> Min bandwidth (MB/sec): 0
> Average IOPS:   7
> Stddev IOPS:6.6061
> Max IOPS:   32
> Min IOPS:   0
> Average Latency(s): 2.151
> Stddev Latency(s):  2.3661
> Max latency(s): 14.0916
> Min latency(s): 0.0420954
> Cleaning up (deleting benchmark objects)
> Removed 459 objects
> Clean up completed and total clean up time :35.6908
> 
> [root@testnode01 ~]# ceph osd perf
> osd  commit_latency(ms)  apply_latency(ms)
>   6 655655
>   9  13 13
>  11  15 15
>   7  17 17
>  10  19 19
>   8  12 12
>  24 153153
>  25  22 22
>  47  20 20
>  46  23 23
>  45  43 43
>  44   8  8
>  16  26 26
>  15  18 18
>  14  14 14
>  13  23 23
>  12  47 47
>  18 595595
>   1  20 20
>  38  25 25
>  17  17 17
>   0 317317
>  37  19 19
>  19  14 14
>   2  16 16
>  39   9  9
>  20  16 16
>   3  18 18
>  40  10 10
>  21  23 23
>   4  17 17
>  41  29 29
>   5  18 18
>  42  16 16
>  22  16 16
>  23  13 13
>  26  20 20
>  27  10 10
>  28  28 28
>  29  13 13
>  30  34 34
>  31  10 10
>  32  31 31
>  33  44 44
>  34  21 21
>  35  22 22
>  36 295295
>  43   9  9
> 
> 
> 
> What do you think is the most obvious Problem?

erasure-coded

> - The one 6TB disk, per node?

You get bad distribution of data, why not move drives around between these two 
clusters, so you have more of the same in each. 

> - The OS on the 6TB disk?

You have the OS combined on a disk that also acts as a Ceph OSD? That is not 
really a pretty solution. Why not just use the 1TB as the OS disk?

> What would you suggest?

Forget about CephFS, that requires an MDS, and mine was already eating 12GB. You 
do not have that much memory. Use an RBD image, 3x replicated.
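
A minimal sketch of that alternative (pool name, image name and size are made 
up; this assumes a reasonably recent release):

  ceph osd pool create backup_rbd 128 128 replicated
  rbd pool init backup_rbd
  rbd create backup_rbd/backup01 --size 10T
  rbd map backup_rbd/backup01    # then mkfs and mount on the backup host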

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What's your biggest ceph cluster?

2021-09-06 Thread zhang listar
Thanks, I'll check it out.

Christian Wuerdig wrote on Friday, 3 September 2021 at 6:16 AM:

> This probably provides a reasonable overview -
> https://ceph.io/en/news/blog/2020/public-telemetry-dashboards/,
> specifically the grafana dashboard is here:
> https://telemetry-public.ceph.com
> Keep in mind not all clusters have telemetry enabled
>
> The largest recorded cluster seems to be in the 32-64PB bucket
>
> On Thu, 2 Sept 2021 at 19:22, zhang listar 
> wrote:
>
>> Hi all. I want to know how big a cluster Ceph can support.
>> Please give me some information about your Ceph cluster, including the cluster
>> type, e.g. object store or file system.
>> Thanks in advance.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS daemons stuck in resolve, please help

2021-09-06 Thread Dan van der Ster
Hi Frank,

That's unfortunate! Most of those options relax warnings and relax the point
at which a client is considered to have too many caps.
The option mds_recall_max_caps might be CPU intensive -- the MDS would
be busy recalling caps if indeed you have clients which are hammering
the MDSs with metadata workloads.
What is your current `ceph fs status` output? If you have very active
users, perhaps you can ask them to temporarily slow down and see the
impact on your cluster?
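
If it helps, a hedged way to see which clients currently hold the most caps
(run on the MDS host; <name> is the MDS daemon name, and the exact field names
vary a bit by release):

  ceph fs status
  ceph daemon mds.<name> session ls | grep -E '"id"|num_caps|hostname'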

I'm not aware of any relation between caps recall and snap trimming.
We don't use snapshots (until now some pacific tests) so I can't say
if that is relevant to this issue.

-- dan




On Mon, Sep 6, 2021 at 11:18 AM Frank Schilder  wrote:
>
> Hi Dan,
>
> unfortunately, setting these parameters crashed the MDS cluster and we now 
> have severe performance issues. Particularly bad is 
> mds_recall_max_decay_rate. Even just setting it to the default value 
> immediately makes all MDS daemons unresponsive and get failed by the MONs. I 
> already set the mds beacon time-out to 10 minutes to avoid MDS daemons 
> getting marked down too early when they need to trim a large (oversized) 
> cache. The formerly active then failed daemons never recover, I have to 
> restart them manually to get them back as stand-bys.
>
> We are running mimic-13.2.10. Does explicitly setting 
> mds_recall_max_decay_rate enable a different code path in this version?
>
> I tried to fix the situation by removing all modified config pars (ceph 
> config rm ...) again and doing a full restart of all daemons, first all 
> stand-bys and then the active ones one by one. Unfortunately, this did not 
> help. In addition, it looks like one of our fs data pools does not purge 
> snapshots any more:
>
> pool 12 'con-fs2-meta1' no removed_snaps list shown
> pool 13 'con-fs2-meta2' removed_snaps 
> [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
> pool 14 'con-fs2-data' removed_snaps 
> [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
> pool 17 'con-fs2-data-ec-ssd' removed_snaps 
> [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
> pool 19 'con-fs2-data2' removed_snaps 
> [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
>
> con-fs2-meta2 is the primary data pool. It does not store actual file data, 
> we have con-fs2-data2 set as data pool on the fs root. Its the new 
> recommended 3-pool layout with the meta-data- and the primary data pool 
> storing meta-data only.
>
> The MDS daemons report 12 snapshots and if I interpret the removed_snaps info 
> correctly, the pools con-fs2-meta2, con-fs2-data and con-fs2-data-ec-ssd 
> store 12 snapshots. However, pool con-fs2-data2 has at least 20. We use 
> rolling snapshots and it looks like the snapshots are not purged any more 
> since I tried setting the MDS trimming parameters. This, in turn, is 
> potentially a reason for the performance degradation we experience at the 
> moment.
>
> I would be most grateful if you could provide some pointers as to what to 
> look for with regards of why snapshots don't disappear and/or what might have 
> happened to our MDS daemons performance wise.
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Frank Schilder 
> Sent: 31 August 2021 16:23:15
> To: Dan van der Ster
> Cc: ceph-users
> Subject: [ceph-users] Re: MDS daemons stuck in resolve, please help
>
> Hi Dan,
>
> I'm running mimic latest version.
>
> Thanks for the link to the PR, this looks good.
>
> Directory pinning does not work in mimic, I had another case on that. The 
> required xattribs are not implemented although documented. The default load 
> balancing seems to work quite well for us - I saw the warnings about possible 
> performance impacts in the documentation. I think I scaled the MDS cluster up 
> to the right size, the MDS daemons usually manage to trim their cache well 
> below the reservation point and can take peak loads without moving clients 
> around. All MDSes have about the same average request load. With the 
> reorganised meta data pool the aggregated performance is significantly better 
> than with a single MDS. I would say that most of the time it scales with the 
> MDS count.
>
> Of course, the find over the entire FS tree did lead to a lot of fun. 
> Fortunately, users don't do that.
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Dan van der Ster 
> Sent: 31 August 2021 15:26:17
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] Re: MDS daemons stuck in resolve, please help
>
> Hi Frank,
>
> It helps if you start threads reminding us which version you're running.
>
> During nautilus the caps recall issue 

[ceph-users] Re: Performance optimization

2021-09-06 Thread Kai Börnert

Hi,

are any of those old disks SMR ones? Because they will absolutely 
destroy any kind of performance (Ceph does not use write caches due to 
power-loss concerns, so they kind of have to do their whole magic for each 
write request).


Greetings

On 9/6/21 10:47 AM, Simon Sutter wrote:

Hello everyone!

I have built two clusters with old hardware, which is lying around, the 
possibility to upgrade is there.
The clusters main usecase is hot backup. This means it's getting written 24/7 
where 99% is writing and 1% is reading.


It should be based on harddisks.



At the moment, the nodes look like this:
8 Nodes
Worst CPU: i7-3930K (up to i7-6850K)

Worst amount of RAM: 24GB (up to 64GB)
HDD Layout:
1x 1TB
4x 2TB
1x 6TB
all sata, some just 5400rpm

I had to put the OS on the 6TB HDDs, because there are no more sata connections 
on the motherboard.

The servers, which have to be backed up, have mounted the ceph with cephfs.
99% of the files, that have to be backed up, are harddisk images, so sizes from 
5GB to 1TB.

All files are written to an erasure-coded pool with k=6 m=2, compression is on 
passive snappy, default settings.

I'm getting really bad performance with this setup.
This is a bench, run with: "rados -p ec_test bench -b 524288 60 write" while 
normal operations:

Total time run: 63.4957
Total writes made:  459
Write size: 524288
Object size:524288
Bandwidth (MB/sec): 3.61442
Stddev Bandwidth:   3.30073
Max bandwidth (MB/sec): 16
Min bandwidth (MB/sec): 0
Average IOPS:   7
Stddev IOPS:6.6061
Max IOPS:   32
Min IOPS:   0
Average Latency(s): 2.151
Stddev Latency(s):  2.3661
Max latency(s): 14.0916
Min latency(s): 0.0420954
Cleaning up (deleting benchmark objects)
Removed 459 objects
Clean up completed and total clean up time :35.6908

[root@testnode01 ~]# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
   6 655655
   9  13 13
  11  15 15
   7  17 17
  10  19 19
   8  12 12
  24 153153
  25  22 22
  47  20 20
  46  23 23
  45  43 43
  44   8  8
  16  26 26
  15  18 18
  14  14 14
  13  23 23
  12  47 47
  18 595595
   1  20 20
  38  25 25
  17  17 17
   0 317317
  37  19 19
  19  14 14
   2  16 16
  39   9  9
  20  16 16
   3  18 18
  40  10 10
  21  23 23
   4  17 17
  41  29 29
   5  18 18
  42  16 16
  22  16 16
  23  13 13
  26  20 20
  27  10 10
  28  28 28
  29  13 13
  30  34 34
  31  10 10
  32  31 31
  33  44 44
  34  21 21
  35  22 22
  36 295295
  43   9  9



What do you think is the most obvious Problem?

- The one 6TB disk, per node?
- The OS on the 6TB disk?

What would you suggest?

What I hope to replace with this setup:
6 servers, each with 4x3TB disks, with lvm, no redundancy. (two times, that's 
why I have set up two clusters)

Thanks in advance

Simon

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Performance optimization

2021-09-06 Thread Simon Sutter
Hello everyone!

I have built two clusters with old hardware that was lying around; the 
possibility to upgrade is there.
The clusters' main use case is hot backup. This means they are being written to 
24/7, where 99% is writing and 1% is reading.


It should be based on harddisks.



At the moment, the nodes look like this:
8 Nodes
Worst CPU: i7-3930K (up to i7-6850K)

Worst amount of RAM: 24GB (up to 64GB)
HDD Layout:
1x 1TB
4x 2TB
1x 6TB
all sata, some just 5400rpm

I had to put the OS on the 6TB HDDs, because there are no free SATA connections 
left on the motherboard.

The servers that have to be backed up have mounted the Ceph cluster via CephFS.
99% of the files that have to be backed up are hard disk images, with sizes from 
5GB to 1TB.

All files are written to an erasure-coded pool with k=6 m=2, compression is on 
passive snappy, default settings.

I'm getting really bad performance with this setup.
This is a bench, run with: "rados -p ec_test bench -b 524288 60 write" while 
normal operations:

Total time run: 63.4957
Total writes made:  459
Write size: 524288
Object size:524288
Bandwidth (MB/sec): 3.61442
Stddev Bandwidth:   3.30073
Max bandwidth (MB/sec): 16
Min bandwidth (MB/sec): 0
Average IOPS:   7
Stddev IOPS:6.6061
Max IOPS:   32
Min IOPS:   0
Average Latency(s): 2.151
Stddev Latency(s):  2.3661
Max latency(s): 14.0916
Min latency(s): 0.0420954
Cleaning up (deleting benchmark objects)
Removed 459 objects
Clean up completed and total clean up time :35.6908

[root@testnode01 ~]# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
  6 655655
  9  13 13
 11  15 15
  7  17 17
 10  19 19
  8  12 12
 24 153153
 25  22 22
 47  20 20
 46  23 23
 45  43 43
 44   8  8
 16  26 26
 15  18 18
 14  14 14
 13  23 23
 12  47 47
 18 595595
  1  20 20
 38  25 25
 17  17 17
  0 317317
 37  19 19
 19  14 14
  2  16 16
 39   9  9
 20  16 16
  3  18 18
 40  10 10
 21  23 23
  4  17 17
 41  29 29
  5  18 18
 42  16 16
 22  16 16
 23  13 13
 26  20 20
 27  10 10
 28  28 28
 29  13 13
 30  34 34
 31  10 10
 32  31 31
 33  44 44
 34  21 21
 35  22 22
 36 295295
 43   9  9
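
The osd perf output above shows a handful of OSDs (osd.6, osd.18, osd.0, 
osd.36) with much higher latency than the rest; a hedged way to map such an ID 
back to its host and physical device is:

  ceph osd find 6
  ceph osd metadata 6 | grep -E 'hostname|devices|rotational'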



What do you think is the most obvious Problem?

- The one 6TB disk, per node?
- The OS on the 6TB disk?

What would you suggest?

What I hope to replace with this setup:
6 servers, each with 4x3TB disks, with LVM and no redundancy (two such setups, 
which is why I have set up two clusters).

Thanks in advance

Simon

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-06 Thread Lokendra Rathour
Thanks, Matthew, for the update.
The upgrade failed for some random weird reasons. Checking further,
Ceph's status shows that "Ceph health is OK"; at times it gives certain
warnings, but I think that is OK.

But what if we see a version mismatch between the daemons, i.e. a few
services have upgraded and the remaining ones could not be upgraded? In this
state, we could do one of two things (a quick way to check the per-daemon
versions is sketched after this list):

   - Retrying the upgrade activity (to Pacific) - it might work this time.
   - Going back to the older version (Octopus) - is this possible, and if
   yes, then how?
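
For reference, a hedged way to see exactly which daemons are still on the old
version before choosing between the two options:

  ceph versions    # daemon counts grouped by running version
  ceph -s          # overall cluster state during/after the upgrade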

*Other Query:*
What if the complete cluster goes down, i.e. the mons crash and other daemons
crash - can we try to restore the data on the OSDs, maybe by reusing the OSDs
in another or a new Ceph cluster, or do something else to save the data?

Please suggest !!

Best Regards,
Lokendra


On Fri, Sep 3, 2021 at 9:04 PM Matthew Vernon  wrote:

> On 02/09/2021 09:34, Lokendra Rathour wrote:
>
> > We have deployed the Ceph Octopus release using Ceph-Ansible.
> > During the upgrade from Octopus to Pacific release we saw the upgrade got
> > failed.
>
> I'm afraid you'll need to provide some more details (e.g. ceph -s
> output) on the state of your cluster; I'd expect a cluster mid-upgrade
> to still be operational, so you should still be able to access your OSDs.
>
> Regards,
>
> Matthew
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
~ Lokendra
www.inertiaspeaks.com
www.inertiagroups.com
skype: lokendrarathour
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem mounting cephfs Share

2021-09-06 Thread Eugen Block

Hi,

I just got the same message in my lab environment (octopus) which I  
had redeployed. The client's keyring had changed after redeployment  
and I think I had a stale mount. After 'umount' and 'mount' with the  
proper keyring it worked as expected.
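
In case it is useful, a minimal sketch of such a remount with an explicit
keyring secret (client name, mon address and paths are placeholders):

  umount /mnt/cephfs
  mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs \
    -o name=myfsclient,secretfile=/etc/ceph/myfsclient.secret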



Quoting Hendrik Peyerl :


Hello All,

I recently tried to reactivate a Ceph cluster that I set up last
year. I applied patches regularly and did some tests afterwards,
which always worked.


Now I ran my usual tests again before getting it ready to use in
production, but I am not able to mount my CephFS shares anymore; I
always run into the following error:


mount error: no mds server is up or the cluster is laggy

The Ceph health is OK, I can reach all MDS servers and all other
servers as well, and the S3 gateway is also still working. I did not
find any errors within the logs that would help me debug this further.


As I want to learn how to debug those issues in the future, I'd
rather try to repair the cluster instead of just recreating it, since
I don't have any pressure to get it running again quickly.


Could you guys give me any hints on where to look further?

CEPH Version: 14.2.22
OS: CentOS7

Thanks in Advance,

Hendrik
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io