[ceph-users] Re: active+recovery_unfound+degraded in Pacific

2021-04-29 Thread Lomayani S. Laizer
Hello,
thanks for your reply.

I have stopped this osd and the cluster managed to recover. All is well now
for the past 4hrs.

My understanding of the unfound object state was wrong. I thought it meant the
object can't be found in any of the replicas.

On Thu, Apr 29, 2021 at 9:47 AM Stefan Kooman  wrote:

> On 4/29/21 4:58 AM, Lomayani S. Laizer wrote:
> > Hello,
> >
> > Any advice on this. Am stuck because one VM is not working now. Looks
> there
> > is a read error in primary osd(15) for this pg. Should i mark osd 15 down
> > or out? Is there any risk of doing this?
> >
> > Apr 28 20:22:31 ceph-node3 kernel: [369172.974734] sd 0:2:4:0: [sde]
> > tag#358 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
> > Apr 28 20:22:31 ceph-node3 kernel: [369172.974739] blk_update_request:
> I/O
> > error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16
> prio
> > class 0
> > Apr 28 21:14:11 ceph-node3 kernel: [372273.275801] sd 0:2:4:0: [sde]
> tag#28
> > FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > Apr 28 21:14:11 ceph-node3 kernel: [372273.275809] sd 0:2:4:0: [sde]
> tag#28
> > CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
> > Apr 28 21:14:11 ceph-node3 kernel: [372273.275813] blk_update_request:
> I/O
> > error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16
> prio
> > class 0
>
> So this looks like a broken disk. I would take it out and let the
> cluster recover (ceph osd out 15).
>
> Gr. Stefan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to delete versioned bucket

2021-04-29 Thread Mark Schouten
On Sat, Apr 24, 2021 at 06:06:04PM +0200, Mark Schouten wrote:
> Using the following command:
> s3cmd setlifecycle lifecycle.xml s3://syslog_tuxis_net
> 
> That gave no error, and I see in s3browser that it's active.
> 
> The RGW does not seem to kick in yet, but I'll keep an eye on that.

Unfortunately, the delete markers are still there. Does anyone have a tip on how
to fix this?
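For reference, the lifecycle element that is supposed to clean these up is
ExpiredObjectDeleteMarker; a minimal sketch, assuming your RGW release already
supports it (the rule id and the 1-day value are arbitrary):

<LifecycleConfiguration>
  <Rule>
    <ID>purge-delete-markers</ID>
    <Filter><Prefix></Prefix></Filter>
    <Status>Enabled</Status>
    <Expiration>
      <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
    </Expiration>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>1</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>

radosgw-admin lc list       # the bucket should show up with a lifecycle status
radosgw-admin lc process    # force a lifecycle run instead of waiting for the schedule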

-- 
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | i...@tuxis.nl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: active+recovery_unfound+degraded in Pacific

2021-04-29 Thread Konstantin Shalygin
Not wrong: your drive has failed. The degradation is just a symptom of lower-level
issues.



k

> On 29 Apr 2021, at 11:21, Lomayani S. Laizer  wrote:
> 
> My understanding of the unfound object was wrong. I thought it means the
> object cant be found in all replicas

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Upgrade tips from Luminous to Nautilus?

2021-04-29 Thread Mark Schouten
Hi,

We've done our fair share of Ceph cluster upgrades since Hammer, and
have not seen many problems with them. I'm now at the point where I have
to upgrade a rather large cluster running Luminous, and I would like to
hear from other users about issues I can expect,
so that I can anticipate them beforehand.

As said, the cluster is running Luminous (12.2.13) and has the following
services active:
  services:
mon: 3 daemons, quorum osdnode01,osdnode02,osdnode04
mgr: osdnode01(active), standbys: osdnode02, osdnode03
mds: pmrb-3/3/3 up 
{0=osdnode06=up:active,1=osdnode08=up:active,2=osdnode07=up:active}, 1 
up:standby
osd: 116 osds: 116 up, 116 in;
rgw: 3 daemons active


Of the OSDs, 11 are SSDs and 105 are HDDs. The capacity of the cluster
is 1.01 PiB.

We have 2 active crush rules on 18 pools. All pools have a size of 3; there is a
total of 5760 PGs.
{
    "rule_id": 1,
    "rule_name": "hdd-data",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -10,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
},
{
    "rule_id": 2,
    "rule_name": "ssd-data",
    "ruleset": 2,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -21,
            "item_name": "default~ssd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

rbd -> crush_rule: hdd-data
.rgw.root -> crush_rule: hdd-data
default.rgw.control -> crush_rule: hdd-data
default.rgw.data.root -> crush_rule: ssd-data
default.rgw.gc -> crush_rule: ssd-data
default.rgw.log -> crush_rule: ssd-data
default.rgw.users.uid -> crush_rule: hdd-data
default.rgw.usage -> crush_rule: ssd-data
default.rgw.users.email -> crush_rule: hdd-data
default.rgw.users.keys -> crush_rule: hdd-data
default.rgw.meta -> crush_rule: hdd-data
default.rgw.buckets.index -> crush_rule: ssd-data
default.rgw.buckets.data -> crush_rule: hdd-data
default.rgw.users.swift -> crush_rule: hdd-data
default.rgw.buckets.non-ec -> crush_rule: ssd-data
DB0475 -> crush_rule: hdd-data
cephfs_pmrb_data -> crush_rule: hdd-data
cephfs_pmrb_metadata -> crush_rule: ssd-data


All but four clients are running Luminous; those four are running Jewel
and need upgrading before we proceed with this upgrade.

So, normally, I would 'just' upgrade all Ceph packages on the
monitor nodes and restart the mons and then the mgrs.

After that, I would upgrade all Ceph packages on the OSD nodes and
restart all the OSDs, and after that the MDSes and RGWs. Restarting
the OSDs will probably take a while.
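For completeness, the rough per-stage commands I have in mind (a sketch only; the
package upgrade steps depend on the distro, and I have not verified this as a runbook):

ceph osd set noout
# per mon node: upgrade packages, then:
systemctl restart ceph-mon.target
# once all mons run Nautilus, per mgr node:
systemctl restart ceph-mgr.target
# per OSD node: upgrade packages, restart, and wait for HEALTH_OK before the next node:
systemctl restart ceph-osd.target
# after all OSDs are on Nautilus:
ceph osd require-osd-release nautilus
ceph osd unset noout
# MDSes and RGWs last; optionally, at the very end:
ceph mon enable-msgr2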

If anyone has a hint on what I should expect to cause some extra load or
waiting time, that would be great.

Obviously, we have read
https://ceph.com/releases/v14-2-0-nautilus-released/ , but I'm looking
for real world experiences.

Thanks!


-- 
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | i...@tuxis.nl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-04-29 Thread Nico Schottelius


I believe it was nautilus that started requiring

ms_bind_ipv4 = false
ms_bind_ipv6 = true

if you run IPv6 only clusters. OSDs prior to nautilus worked without
these settings for us.

I'm not sure if the port change (v1->v2) was part of luminous->nautilus
as well, but you might want to check your firewalling (if any).
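A hedged example of what to check: msgr2 listens on 3300/tcp in addition to the
legacy 6789/tcp, so both ports need to be reachable between mons and clients.
The firewalld commands are just one example of how to open them:

ss -tlnp | grep ceph-mon
firewall-cmd --permanent --add-port=3300/tcp --add-port=6789/tcp
firewall-cmd --reload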

Overall I recall luminous->nautilus being a bit rockier than usual
(compared to the previous releases), but nothing too serious.

Cheers,

Nico

Mark Schouten  writes:

> Hi,
>
> We've done our fair share of Ceph cluster upgrades since Hammer, and
> have not seen much problems with them. I'm now at the point that I have
> to upgrade a rather large cluster running Luminous and I would like to
> hear from other users if they have experiences with issues I can expect
> so that I can anticipate on them beforehand.
>
> As said, the cluster is running Luminous (12.2.13) and has the following
> services active:
>   services:
> mon: 3 daemons, quorum osdnode01,osdnode02,osdnode04
> mgr: osdnode01(active), standbys: osdnode02, osdnode03
> mds: pmrb-3/3/3 up 
> {0=osdnode06=up:active,1=osdnode08=up:active,2=osdnode07=up:active}, 1 
> up:standby
> osd: 116 osds: 116 up, 116 in;
> rgw: 3 daemons active
>
>
> Of the OSD's, we have 11 SSD's and 105 HDD. The capacity of the cluster
> is 1.01PiB.
>
> We have 2 active crush-rules on 18 pools. All pools have a size of 3 there is 
> a total of 5760 pgs.
> {
> "rule_id": 1,
> "rule_name": "hdd-data",
> "ruleset": 1,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -10,
> "item_name": "default~hdd"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> },
> {
> "rule_id": 2,
> "rule_name": "ssd-data",
> "ruleset": 2,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -21,
> "item_name": "default~ssd"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> }
>
> rbd -> crush_rule: hdd-data
> .rgw.root -> crush_rule: hdd-data
> default.rgw.control -> crush_rule: hdd-data
> default.rgw.data.root -> crush_rule: ssd-data
> default.rgw.gc -> crush_rule: ssd-data
> default.rgw.log -> crush_rule: ssd-data
> default.rgw.users.uid -> crush_rule: hdd-data
> default.rgw.usage -> crush_rule: ssd-data
> default.rgw.users.email -> crush_rule: hdd-data
> default.rgw.users.keys -> crush_rule: hdd-data
> default.rgw.meta -> crush_rule: hdd-data
> default.rgw.buckets.index -> crush_rule: ssd-data
> default.rgw.buckets.data -> crush_rule: hdd-data
> default.rgw.users.swift -> crush_rule: hdd-data
> default.rgw.buckets.non-ec -> crush_rule: ssd-data
> DB0475 -> crush_rule: hdd-data
> cephfs_pmrb_data -> crush_rule: hdd-data
> cephfs_pmrb_metadata -> crush_rule: ssd-data
>
>
> All but four clients are running Luminous, the four are running Jewel
> (that needs upgrading before proceeding with this upgrade).
>
> So, normally, I would 'just' upgrade all Ceph packages on the
> monitor-nodes and restart mons and then mgrs.
>
> After that, I would upgrade all Ceph packages on the OSD nodes and
> restart all the OSD's. Then, after that, the MDSes and RGWs. Restarting
> the OSD's will probably take a while.
>
> If anyone has a hint on what I should expect to cause some extra load or
> waiting time, that would be great.
>
> Obviously, we have read
> https://ceph.com/releases/v14-2-0-nautilus-released/ , but I'm looking
> for real world experiences.
>
> Thanks!


--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] librbd::operation::FlattenRequest

2021-04-29 Thread Lázár Imre

Hello,

I cannot flatten an image; the operation keeps restarting with:

r...@sm-node1.in.illusion.hu:~# rbd flatten vm-hdd/vm-104-disk-1
Image flatten: 28% complete...2021-04-29 10:50:27.373 7ff7caffd700 -1 
librbd::operation::FlattenRequest: 0x7ff7c4009db0 should_complete: 
encountered error: (85) Interrupted system call should be restarted
Image flatten: 26% complete...2021-04-29 10:50:33.053 7ff7caffd700 -1 
librbd::operation::FlattenRequest: 0x7ff7c4008fc0 should_complete: 
encountered error: (85) Interrupted system call should be restarted
Image flatten: 0% complete...2021-04-29 10:50:34.829 7ff7caffd700 -1 
librbd::operation::FlattenRequest: 0x7ff7c445b470 should_complete: 
encountered error: (85) Interrupted system call should be restarted
Image flatten: 39% complete...2021-04-29 10:50:42.081 7ff7caffd700 -1 
librbd::operation::FlattenRequest: 0x7ff7c40324e0 should_complete: 
encountered error: (85) Interrupted system call should be restarted
Image flatten: 0% complete...2021-04-29 10:50:43.897 7ff7caffd700 -1 
librbd::operation::FlattenRequest: 0x7ff7c4018890 should_complete: 
encountered error: (85) Interrupted system call should be restarted
Image flatten: 42% complete...2021-04-29 10:51:07.813 7ff7caffd700 -1 
librbd::operation::FlattenRequest: 0x7ff7c402fe80 should_complete: 
encountered error: (85) Interrupted system call should be restarted
Image flatten: 42% complete...2021-04-29 10:51:29.372 7ff7caffd700 -1 
librbd::operation::FlattenRequest: 0x7ff7c40017c0 should_complete: 
encountered error: (85) Interrupted system call should be restarted



r...@sm-node1.in.illusion.hu:~# uname -a
Linux sm-node1.in.illusion.hu 5.4.106-1-pve #1 SMP PVE 5.4.106-1 (Fri, 
19 Mar 2021 11:08:47 +0100) x86_64 GNU/Linux


r...@sm-node1.in.illusion.hu:~# dpkg -l |grep ceph
ii  ceph 14.2.20-pve1    amd64    
distributed storage and file system
ii  ceph-base 14.2.20-pve1    amd64    
common ceph daemon libraries and management tools
ii  ceph-common 14.2.20-pve1    amd64    
common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse 14.2.20-pve1    amd64    
FUSE-based client for the Ceph distributed file system
ii  ceph-mds 14.2.20-pve1    amd64    
metadata server for the ceph distributed file system
ii  ceph-mgr 14.2.20-pve1    amd64    
manager for the ceph distributed storage system
ii  ceph-mgr-dashboard 14.2.20-pve1    
all  dashboard plugin for ceph-mgr
ii  ceph-mon 14.2.20-pve1    amd64    
monitor server for the ceph storage system
ii  ceph-osd 14.2.20-pve1    amd64    OSD 
server for the ceph storage system
ii  libcephfs2 14.2.20-pve1    amd64    Ceph 
distributed file system client library
ii  python-ceph-argparse 14.2.20-pve1    
all  Python 2 utility libraries for Ceph CLI
ii  python-cephfs 14.2.20-pve1    amd64    
Python 2 libraries for the Ceph libcephfs library



r...@sm-node1.in.illusion.hu:~# dpkg -l |grep rbd
ii  librbd1 14.2.20-pve1    amd64    RADOS 
block device client library
ii  python-rbd 14.2.20-pve1    amd64    
Python 2 libraries for the Ceph librbd library


r...@sm-node1.in.illusion.hu:~# modinfo rbd
filename: /lib/modules/5.4.106-1-pve/kernel/drivers/block/rbd.ko
license:    GPL
description:    RADOS Block Device (RBD) driver
author: Jeff Garzik 
author: Yehuda Sadeh 
author: Sage Weil 
author: Alex Elder 
srcversion: 7BA6FEE20249E416B2D09AB
depends:    libceph
retpoline:  Y
intree: Y
name:   rbd
vermagic:   5.4.106-1-pve SMP mod_unload modversions
parm:   single_major:Use a single major number for all rbd 
devices (default: true) (bool)


Please advise on how I can gather more information to track this down.
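One thing I could try (assuming the generic ceph debug options are accepted by the
rbd CLI; the levels are just a guess) is to rerun with verbose client logging and
capture stderr:

rbd flatten vm-hdd/vm-104-disk-1 --debug-rbd=20 --debug-ms=1 2> /tmp/rbd-flatten.log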

Thank you,

i.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: active+recovery_unfound+degraded in Pacific

2021-04-29 Thread Stefan Kooman

On 4/29/21 4:58 AM, Lomayani S. Laizer wrote:

Hello,

Any advice on this? I am stuck because one VM is not working now. It looks like
there is a read error on the primary OSD (15) for this PG. Should I mark osd 15 down
or out? Is there any risk in doing this?

Apr 28 20:22:31 ceph-node3 kernel: [369172.974734] sd 0:2:4:0: [sde]
tag#358 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 20:22:31 ceph-node3 kernel: [369172.974739] blk_update_request: I/O
error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio
class 0
Apr 28 21:14:11 ceph-node3 kernel: [372273.275801] sd 0:2:4:0: [sde] tag#28
FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Apr 28 21:14:11 ceph-node3 kernel: [372273.275809] sd 0:2:4:0: [sde] tag#28
CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 21:14:11 ceph-node3 kernel: [372273.275813] blk_update_request: I/O
error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio
class 0


So this looks like a broken disk. I would take it out and let the 
cluster recover (ceph osd out 15).
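If recovery then still stalls on the unfound objects, a rough sketch of the
commands to inspect them (the pg id is a placeholder):

ceph health detail                        # lists the PGs with unfound objects
ceph pg <pgid> list_unfound               # shows the objects and which OSDs might have them
ceph pg <pgid> mark_unfound_lost revert   # only as a last resort, if no copy can be recovered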


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph export not producing file?

2021-04-29 Thread Eugen Block

Hi,

I haven't had an issue with rbd export in the past. But just to rule
the obvious out: your first attempt writes to /root/foo.img, but later
you mention /mnt/cirros2.img. Did you look in both places?

Or has this issue already been resolved?


Zitat von Piotr Baranowski :


Hey all!

rbd export images/ec8a7ff8-6609-4b7d-8bdd-fadcf3b7973e /root/foo.img
DOES NOT produce the target file

no matter if I use --pool --image format or the one above the target
file is not there.

Progress bar shows up and prints percentage. It ends up with exit 0

[root@controller-0 mnt]# rbd export  --pool=images db8290c3-93fd-4a4e-
ad71-7c131070ad6f /mnt/cirros2.img
Exporting image: 100% complete...done.
[root@controller-0 mnt]# echo $?


Any idea what's going on here?
It's ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94)
luminous (stable)

Any hints will be much appreciated

best regards
Piotr

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: one of 3 monitors keeps going down

2021-04-29 Thread Eugen Block

Hi,

instead of copying MON data to this one, did you also try to redeploy
the MON container entirely so it gets a fresh start?



Zitat von "Robert W. Eckert" :


Hi,
On a daily basis, one of my monitors goes down

[root@cube ~]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s); 1/3 mons down, quorum  
rhel1.robeckert.us,story

[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon mon.cube on cube.robeckert.us is in error state
[WRN] MON_DOWN: 1/3 mons down, quorum rhel1.robeckert.us,story
mon.cube (rank 2) addr  
[v2:192.168.2.142:3300/0,v1:192.168.2.142:6789/0] is down (out of  
quorum)

[root@cube ~]# ceph --version
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb)  
octopus (stable)


I have a script that will copy the mon data from another server and  
it restarts and runs well for a while.


It is always the same monitor, and when I look at the logs the only  
thing I really see is the cephadm log showing it down


2021-04-28 10:07:26,173 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,217 DEBUG /usr/bin/podman: stdout podman version 2.2.1
2021-04-28 10:07:26,222 DEBUG Running command: /usr/bin/podman  
inspect --format  
{{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index  
.Config.Labels "io.ceph.version"}}  
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-osd.2
2021-04-28 10:07:26,326 DEBUG /usr/bin/podman: stdout  
fab17e5242eb4875e266df19ca89b596a2f2b1d470273a99ff71da2ae81eeb3c,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-26 17:13:15.54183375 -0400  
EDT,
2021-04-28 10:07:26,328 DEBUG Running command: systemctl is-enabled  
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08...@mon.cube

2021-04-28 10:07:26,334 DEBUG systemctl: stdout enabled
2021-04-28 10:07:26,335 DEBUG Running command: systemctl is-active  
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08...@mon.cube

2021-04-28 10:07:26,340 DEBUG systemctl: stdout failed
2021-04-28 10:07:26,340 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,395 DEBUG /usr/bin/podman: stdout podman version 2.2.1
2021-04-28 10:07:26,402 DEBUG Running command: /usr/bin/podman  
inspect --format  
{{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index  
.Config.Labels "io.ceph.version"}}  
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-mon.cube
2021-04-28 10:07:26,526 DEBUG /usr/bin/podman: stdout  
04e7c673cbacf5160427b0c3eb2f0948b2f15d02c58bd1d9dd14f975a84cfc6f,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-28 08:54:57.614847512 -0400  
EDT,


I don't know if it matters, but this  server is an AMD 3600XT while  
my other two servers which have had no issues are intel based.


The root file system was originally on a SSD, and I switched to  
NVME, so I eliminated controller or drive issues.  (I didn't see  
anything in dmesg anyway)


If someone could point me in the right direction on where to  
troubleshoot next, I would appreciate it.


Thanks,
Rob Eckert
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: one of 3 monitors keeps going down

2021-04-29 Thread Sebastian Wagner

Right, here are the docs for that workflow:

https://docs.ceph.com/en/latest/cephadm/mon/#mon-service
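A rough sketch of that workflow with the names from this thread (please
double-check against the docs before running it):

ceph orch daemon rm mon.cube --force
ceph orch daemon add mon cube:192.168.2.142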

On 29.04.21 at 13:13, Eugen Block wrote:

Hi,

instead of copying MON data to this one did you also try to redeploy the 
MON container entirely so it gets a fresh start?



Zitat von "Robert W. Eckert" :


Hi,
On a daily basis, one of my monitors goes down

[root@cube ~]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s); 1/3 mons down, quorum 
rhel1.robeckert.us,story

[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon mon.cube on cube.robeckert.us is in error state
[WRN] MON_DOWN: 1/3 mons down, quorum rhel1.robeckert.us,story
    mon.cube (rank 2) addr 
[v2:192.168.2.142:3300/0,v1:192.168.2.142:6789/0] is down (out of quorum)

[root@cube ~]# ceph --version
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) 
octopus (stable)


I have a script that will copy the mon data from another server and it 
restarts and runs well for a while.


It is always the same monitor, and when I look at the logs the only 
thing I really see is the cephadm log showing it down


2021-04-28 10:07:26,173 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,217 DEBUG /usr/bin/podman: stdout podman version 
2.2.1
2021-04-28 10:07:26,222 DEBUG Running command: /usr/bin/podman inspect 
--format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index 
.Config.Labels "io.ceph.version"}} 
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-osd.2
2021-04-28 10:07:26,326 DEBUG /usr/bin/podman: stdout 
fab17e5242eb4875e266df19ca89b596a2f2b1d470273a99ff71da2ae81eeb3c,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-26 
17:13:15.54183375 -0400 EDT,
2021-04-28 10:07:26,328 DEBUG Running command: systemctl is-enabled 
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08...@mon.cube 


2021-04-28 10:07:26,334 DEBUG systemctl: stdout enabled
2021-04-28 10:07:26,335 DEBUG Running command: systemctl is-active 
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08...@mon.cube 


2021-04-28 10:07:26,340 DEBUG systemctl: stdout failed
2021-04-28 10:07:26,340 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,395 DEBUG /usr/bin/podman: stdout podman version 
2.2.1
2021-04-28 10:07:26,402 DEBUG Running command: /usr/bin/podman inspect 
--format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index 
.Config.Labels "io.ceph.version"}} 
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-mon.cube
2021-04-28 10:07:26,526 DEBUG /usr/bin/podman: stdout 
04e7c673cbacf5160427b0c3eb2f0948b2f15d02c58bd1d9dd14f975a84cfc6f,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-28 
08:54:57.614847512 -0400 EDT,


I don't know if it matters, but this  server is an AMD 3600XT while my 
other two servers which have had no issues are intel based.


The root file system was originally on a SSD, and I switched to NVME, 
so I eliminated controller or drive issues.  (I didn't see anything in 
dmesg anyway)


If someone could point me in the right direction on where to 
troubleshoot next, I would appreciate it.


Thanks,
Rob Eckert
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to add osds with ceph-volume

2021-04-29 Thread and...@arhont.com
Thanks Eugene. I will try that.

Cheers

Get Outlook for Android


From: Eugen Block 
Sent: Wednesday, April 28, 2021 8:42:39 PM
To: Andrei Mikhailovsky 
Cc: ceph-users 
Subject: Re: [ceph-users] Unable to add osds with ceph-volume

Hi,

when specifying the db device you should use --block.db VG/LV not /dev/VG/LV
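For example (a sketch based on the LVs from your output):

ceph-volume lvm prepare --bluestore --data /dev/sds --block.db ssd3/db5 --block.wal ssd3/wal5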

Zitat von Andrei Mikhailovsky :

> Hello everyone,
>
> I am running ceph version 15.2.8 on Ubuntu servers. I am using
> bluestore osds with data on hdd and db and wal on ssd drives. Each
> ssd has been partitioned such that it holds 5 dbs and 5 wals. The
> ssd were were prepared a while back probably when I was running ceph
> 13.x. I have been gradually adding new osd drives as needed.
> Recently, I've tried to add more osds, which have failed to my
> surprise. Previously I've had no issues adding the drives. However,
> it seems that I can no longer do that with version 15.2.x
>
> Here is what I get:
>
>
> root@arh-ibstorage4-ib  /home/andrei  ceph-volume lvm prepare
> --bluestore --data /dev/sds --block.db /dev/ssd3/db5 --block.wal
> /dev/ssd3/wal5
> Running command: /usr/bin/ceph-authtool --gen-print-key
> Running command: /usr/bin/ceph --cluster ceph --name
> client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> 6aeef34b-0724-4d20-a10b-197cab23e24d
> Running command: /usr/sbin/vgcreate --force --yes
> ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f /dev/sds
> stderr: WARNING: PV /dev/sdp in VG
> ceph-bc7587b5-0112-4097-8c9f-4442e8ea5645 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdo in VG
> ceph-33eda27c-53ed-493e-87a8-39e1862da809 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdn in VG ssd2 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdm in VG ssd1 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdj in VG
> ceph-9d8da00c-f6b9-473f-b499-fa60d74b46c5 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdi in VG
> ceph-1603149e-1e50-4b86-a360-1372f4243603 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdh in VG
> ceph-a5f4416c-8e69-4a66-a884-1d1229785acb is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sde in VG
> ceph-aac71121-e308-4e25-ae95-ca51bca7aaff is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdd in VG
> ceph-1e216580-c01b-42c5-a10f-293674a55c4c is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdc in VG
> ceph-630f7716-3d05-41bb-92c9-25402e9bb264 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sdb in VG
> ceph-a549c28d-9b06-46d5-8ba3-3bd99ff54f57 is using an old PV header,
> modify the VG to update.
> stderr: WARNING: PV /dev/sda in VG
> ceph-70943bd0-de71-4651-a73d-c61bc624755f is using an old PV header,
> modify the VG to update.
> stdout: Physical volume "/dev/sds" successfully created.
> stdout: Volume group "ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f"
> successfully created
> Running command: /usr/sbin/lvcreate --yes -l 3814911 -n
> osd-block-6aeef34b-0724-4d20-a10b-197cab23e24d
> ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f
> stdout: Logical volume
> "osd-block-6aeef34b-0724-4d20-a10b-197cab23e24d" created.
> --> blkid could not detect a PARTUUID for device: /dev/ssd3/wal5
> --> Was unable to complete a new OSD, will rollback changes
> Running command: /usr/bin/ceph --cluster ceph --name
> client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.15
> --yes-i-really-mean-it
> stderr: 2021-04-28T20:05:52.290+0100 7f76bbfa9700 -1 auth: unable to
> find a keyring on
> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc
> /ceph/keyring.bin,: (2) No such file or directory
> 2021-04-28T20:05:52.290+0100 7f76bbfa9700 -1
> AuthRegistry(0x7f76b4058e60) no keyring found at
> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyrin
> g,/etc/ceph/keyring.bin,, disabling cephx
> stderr: purged osd.15
> --> RuntimeError: unable to use device
>
> I have tried to find a solution, but wasn't able to resolve the
> problem? I am sure that I've previously added new volumes using the
> above command.
>
> lvdisplay shows:
>
> --- Logical volume ---
> LV Path /dev/ssd3/wal5
> LV Name wal5
> VG Name ssd3
> LV UUID WPQJs9-olAj-ACbU-qnEM-6ytu-aLMv-hAABYy
> LV Write Access read/write
> LV Creation host, time arh-ibstorage4-ib, 2020-07-29 23:45:17 +0100
> LV Status available
> # open 0
> LV Size 1.00 GiB
> Current LE 256
> Segments 1
> Allocation inherit
> Read ahead sectors auto
> - currently set to 256
> Block device 253:6
>
>
> --- Logical volume ---
> LV Path /dev/ssd3/db5
> LV Name db5
> VG Name ssd3
> LV UUID FVT2Mm-a00P-eCoQ-FZAf-AulX-4q9r-PaDTC6
> LV Write Access read/write
> LV Creation hos

[ceph-users] Re: ceph export not producing file?

2021-04-29 Thread Piotr Baranowski
Hi,

Those are just two different invocations of the command. Yes, I did check
both locations.

I found a strange workaround that works:

rbd export pool/imageid - > filename.img
(i.e. exporting to stdout and redirecting into a file). That works and creates a proper export file.

Still this behaviour is really weird.

P.

On Thu, 29.04.2021 at 11:11 +, Eugen Block
wrote:
> Hi,
> 
> that out, your first attempt is to write to /root/foo.img but later  
> you mention /mnt/cirros2.img. Did you look in both places?
> Or has this issue already been resolved?
> 
> 
> Zitat von Piotr Baranowski :
> 
> > Hey all!
> > 
> > rbd export images/ec8a7ff8-6609-4b7d-8bdd-fadcf3b7973e
> > /root/foo.img
> > DOES NOT produce the target file
> > 
> > no matter if I use --pool --image format or the one above the
> > target
> > file is not there.
> > 
> > Progress bar shows up and prints percentage. It ends up with exit 0
> > 
> > [root@controller-0 mnt]# rbd export  --pool=images db8290c3-93fd-
> > 4a4e-
> > ad71-7c131070ad6f /mnt/cirros2.img
> > Exporting image: 100% complete...done.
> > [root@controller-0 mnt]# echo $?
> > 
> > 
> > Any idea what's going on here?
> > It's ceph version 12.2.10
> > (177915764b752804194937482a39e95e0ca3de94)
> > luminous (stable)
> > 
> > Any hints will be much appreciated
> > 
> > best regards
> > Piotr
> > 
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-04-29 Thread Eugen Block

Hi,

I haven't had any issues upgrading from Luminous to Nautilus in
multiple clusters (mostly RBD usage, but also CephFS), including a
couple of different setups in my lab (RGW, iGW).
Just recently I upgraded a customer cluster with around 220 OSDs; it
was pretty straightforward, without any hiccups.
The only thing that sticks out is your multi-active MDS setup, which I
haven't had yet. But I don't see any reason why that should be an issue.
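As far as I recall, the upgrade notes do ask to reduce the filesystem to a single
active rank while the MDS daemons are restarted; a rough sketch using the fs name
from your output (treat it as a reminder, not a verified procedure):

ceph fs set pmrb max_mds 1      # wait until only rank 0 remains active
# upgrade and restart the MDS daemons
ceph fs set pmrb max_mds 3      # restore the original rank count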


Regards,
Eugen


Zitat von Mark Schouten :


Hi,

We've done our fair share of Ceph cluster upgrades since Hammer, and
have not seen much problems with them. I'm now at the point that I have
to upgrade a rather large cluster running Luminous and I would like to
hear from other users if they have experiences with issues I can expect
so that I can anticipate on them beforehand.

As said, the cluster is running Luminous (12.2.13) and has the following
services active:
  services:
mon: 3 daemons, quorum osdnode01,osdnode02,osdnode04
mgr: osdnode01(active), standbys: osdnode02, osdnode03
mds: pmrb-3/3/3 up  
{0=osdnode06=up:active,1=osdnode08=up:active,2=osdnode07=up:active},  
1 up:standby

osd: 116 osds: 116 up, 116 in;
rgw: 3 daemons active


Of the OSD's, we have 11 SSD's and 105 HDD. The capacity of the cluster
is 1.01PiB.

We have 2 active crush-rules on 18 pools. All pools have a size of 3  
there is a total of 5760 pgs.

{
"rule_id": 1,
"rule_name": "hdd-data",
"ruleset": 1,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -10,
"item_name": "default~hdd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 2,
"rule_name": "ssd-data",
"ruleset": 2,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -21,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}

rbd -> crush_rule: hdd-data
.rgw.root -> crush_rule: hdd-data
default.rgw.control -> crush_rule: hdd-data
default.rgw.data.root -> crush_rule: ssd-data
default.rgw.gc -> crush_rule: ssd-data
default.rgw.log -> crush_rule: ssd-data
default.rgw.users.uid -> crush_rule: hdd-data
default.rgw.usage -> crush_rule: ssd-data
default.rgw.users.email -> crush_rule: hdd-data
default.rgw.users.keys -> crush_rule: hdd-data
default.rgw.meta -> crush_rule: hdd-data
default.rgw.buckets.index -> crush_rule: ssd-data
default.rgw.buckets.data -> crush_rule: hdd-data
default.rgw.users.swift -> crush_rule: hdd-data
default.rgw.buckets.non-ec -> crush_rule: ssd-data
DB0475 -> crush_rule: hdd-data
cephfs_pmrb_data -> crush_rule: hdd-data
cephfs_pmrb_metadata -> crush_rule: ssd-data


All but four clients are running Luminous, the four are running Jewel
(that needs upgrading before proceeding with this upgrade).

So, normally, I would 'just' upgrade all Ceph packages on the
monitor-nodes and restart mons and then mgrs.

After that, I would upgrade all Ceph packages on the OSD nodes and
restart all the OSD's. Then, after that, the MDSes and RGWs. Restarting
the OSD's will probably take a while.

If anyone has a hint on what I should expect to cause some extra load or
waiting time, that would be great.

Obviously, we have read
https://ceph.com/releases/v14-2-0-nautilus-released/ , but I'm looking
for real world experiences.

Thanks!


--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | i...@tuxis.nl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ CEPH ANSIBLE FAILOVER TESTING ] Ceph Native Driver issue

2021-04-29 Thread Lokendra Rathour
Hi Reed,
Thank you so much for the input and support. We have tried the setting you
suggested, "ceph fs set cephfs allow_standby_replay true", but could not see
any impact on the current system: it did not change the failover time.
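(For reference, the hedged checks to confirm the setting took effect, and the
timeout that seems to match the ~15 second wait; defaults assumed:)

ceph fs status                         # standby daemons should be shown as standby-replay
ceph config get mds mds_beacon_grace   # defaults to 15 s and governs the mon-side wait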

Furthermore, we have tested some more scenarios:
Scenario 1:

   - Here we looked at the logs on the new node that the MDS fails over to,
   i.e. in this case, if we reboot cephnode2, the new active MDS will be
   cephnode1. We checked the logs on cephnode1 in two cases:
   - 1. Normal reboot of cephnode2 while the I/O operation is in progress:
      - we see that the log on cephnode1 reacts immediately and then waits
   for some time (around 15 seconds, apparently a beacon timeout) plus an
   additional 6-7 seconds, during which the MDS on cephnode1 becomes active
   and I/O resumes. Refer to the logs:
  - 2021-04-29T15:49:42.480+0530 7fa747690700  1 mds.cephnode1 Updating
  MDS map to version 505 from mon.2
  2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505 handle_mds_map
  i am now mds.0.505
  2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505 handle_mds_map
  state change up:boot --> up:replay
  2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505 replay_start
  2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505  recovery set
  is
  2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505  waiting for
  osdmap 486 (which blacklists prior instance)
  2021-04-29T15:49:55.686+0530 7fa74568c700  1 mds.beacon.cephnode1 MDS
  connection to Monitors appears to be laggy; 15.9769s since last
acked beacon
  2021-04-29T15:49:55.686+0530 7fa74568c700  1 mds.0.505 skipping
  upkeep work because connection to Monitors appears laggy
  2021-04-29T15:49:57.533+0530 7fa749e95700  0 mds.beacon.cephnode1
   MDS is no longer laggy
  2021-04-29T15:49:59.599+0530 7fa740e83700  0 mds.0.cache creating
  system inode with ino:0x100
  2021-04-29T15:49:59.599+0530 7fa740e83700  0 mds.0.cache creating
  system inode with ino:0x1
  2021-04-29T15:50:00.456+0530 7fa73f680700  1 mds.0.505 Finished
  replaying journal
  2021-04-29T15:50:00.456+0530 7fa73f680700  1 mds.0.505 making mds
  journal writeable
  2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.cephnode1 Updating
  MDS map to version 506 from mon.2
  2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 handle_mds_map
  i am now mds.0.505
  2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 handle_mds_map
  state change up:replay --> up:reconnect
  2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 reconnect_start
  2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 reopen_log
  2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.server
  reconnect_clients -- 2 sessions
  2021-04-29T15:50:00.964+0530 7fa747690700  0 log_channel(cluster) log
  [DBG] : reconnect by client.6892 v1:10.0.4.96:0/1646469259 after
  0.0047
  2021-04-29T15:50:00.972+0530 7fa747690700  0 log_channel(cluster) log
  [DBG] : reconnect by client.6990 v1:10.0.4.115:0/2776266880 after
  0.012
  2021-04-29T15:50:00.972+0530 7fa747690700  1 mds.0.505 reconnect_done
  2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.cephnode1 Updating
  MDS map to version 507 from mon.2
  2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.0.505 handle_mds_map
  i am now mds.0.505
  2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.0.505 handle_mds_map
  state change up:reconnect --> up:rejoin
  2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.0.505 rejoin_start
  2021-04-29T15:50:02.008+0530 7fa747690700  1 mds.0.505
  rejoin_joint_start
  2021-04-29T15:50:02.040+0530 7fa740e83700  1 mds.0.505 rejoin_done
  2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.cephnode1 Updating
  MDS map to version 508 from mon.2
  2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505 handle_mds_map
  i am now mds.0.505
  2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505 handle_mds_map
  state change up:rejoin --> up:clientreplay
  2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505 recovery_done
  -- successful recovery!
  2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505
  clientreplay_start
  2021-04-29T15:50:03.094+0530 7fa740e83700  1 mds.0.505
  clientreplay_done
  2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.cephnode1 Updating
  MDS map to version 509 from mon.2
  2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.0.505 handle_mds_map
  i am now mds.0.505
  2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.0.505 handle_mds_map
  state change up:clientreplay --> up:active
  2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.0.505 active_start
  2021-04-29T15:50:04.085+0530 7fa747690700  1 mds.0.505 cluster
 

[ceph-users] ceph pool size 1 for (temporary and expendable data) still using 2X storage?

2021-04-29 Thread Joshua West

Hey Ceph Users!

With all the buzz around chia coin, I want to dedicate a few TB to
storage mining, really just to play with the chia CLI tool, and learn
how it all works.

As the whole concept is about dedicating disk space to large
calculation outputs, the data is meaningless.

For this reason, I am hoping to use a pool with size 1, min_size 1,
and did set up the same.

However, as a proxmox user, I noticed that this pool appears to still
use 2X storage space, or at a minimum, the pool's maximum size is
limited to 50% of total storage space (not that I plan on maxing out
my storage for this.)

I suspect there is a novice-user failsafe which ensures foolishly
configured size=1 is automatically treated as size=2...

Can anyone point me towards how best to leverage my ceph cluster to
store expendable data at size=1 without wasting x2 actual disk space?

My cluster is perfectly balanced, so I am reluctant to pull an osd
out, generally don't have any other disks on hand, and don't plan to
spend money on additional storage for this endeavour. I do want to
ensure I am not wasting more space than I am expecting though.
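For reference, the hedged checks I am looking at (the pool name "chia" is just a
placeholder; on Octopus and later, size 1 additionally has to be allowed explicitly):

ceph osd pool get chia size
ceph osd pool get chia min_size
ceph config set global mon_allow_pool_size_one true
ceph osd pool set chia size 1 --yes-i-really-mean-it
ceph df    # MAX AVAIL per pool is already divided by the replication factor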

(Small hobby cluster, if it matters)

Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Performance questions - 4 node (commodity) cluster - what to expect (and what not ;-)

2021-04-29 Thread Schmid, Michael
Hello folks,

I am new to ceph and at the moment I am doing some performance tests with a 4 
node ceph-cluster (pacific, 16.2.1).

Node hardware (4 identical nodes):

  *   DELL 3620 workstation
  *   Intel Quad-Core i7-6700@3.4 GHz
  *   8 GB RAM
  *   Debian Buster (base system, installed on a dedicated Patriot Burst 120 GB SATA SSD)
  *   HP 530SFP+ 10 GBit dual-port NIC (tested with iperf at 9.4 GBit/s from node to node)
  *   1 x Kingston KC2500 M2 NVMe PCIe SSD (500 GB, NO power loss protection !)
  *   3 x Seagate Barracuda SATA disk drives (7200 rpm, 500 GB)

After bootstrapping a containerized (docker) ceph-cluster, I did some 
performance tests on the NVMe storage by creating a storage pool called 
"ssdpool", consisting of 4 OSDs on the (single) NVMe device of each node. A first 
write-performance test yields

=
root@ceph1:~# rados bench -p ssdpool 10 write -b 4M -t 16 --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ceph1_78
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        30        14    55.997        56   0.0209977    0.493427
    2      16        53        37   73.9903        92   0.0264305    0.692179
    3      16        76        60   79.9871        92    0.559505    0.664204
    4      16        99        83   82.9879        92    0.609332    0.721016
    5      16       116       100   79.9889        68    0.686093    0.698084
    6      16       132       116   77.3224        64     1.19715    0.731808
    7      16       153       137   78.2741        84    0.622646    0.755812
    8      16       171       155    77.486        72     0.25409    0.764022
    9      16       192       176   78.2076        84    0.968321    0.775292
   10      16       214       198   79.1856        88    0.401339    0.766764
   11       1       214       213   77.4408        60    0.969693    0.784002
Total time run: 11.0698
Total writes made:  214
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 77.3272
Stddev Bandwidth:   13.7722
Max bandwidth (MB/sec): 92
Min bandwidth (MB/sec): 56
Average IOPS:   19
Stddev IOPS:3.44304
Max IOPS:   23
Min IOPS:   14
Average Latency(s): 0.785372
Stddev Latency(s):  0.49011
Max latency(s): 2.16532
Min latency(s): 0.0144995
=

... and I think that 80 MB/s throughput is a very poor result in conjunction 
with NVMe devices and 10 GBit NICs.

A raw write test (with fsync=0) of the NVMe drives yields a write 
throughput of roughly 800 MB/s per device ... the second test (with 
fsync=1) drops performance to 200 MB/s.
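To rule out the drive itself, a hedged follow-up test would be small synchronous
writes, which are usually what limits consumer NVMe without power loss protection
under Ceph (destructive, so only against a device that does not hold an OSD):

fio --name=sync-4k --rw=write --bs=4k --direct=1 --fsync=1 --iodepth=1 --numjobs=1 --runtime=30 --time_based --filename=/dev/nvme0n1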

=
root@ceph1:/home/mschmid# fio --rw=randwrite --name=IOPS-write --bs=1024k 
--direct=1 --filename=/dev/nvme0n1 --numjobs=4 --ioengine=libaio --iodepth=32 
--refill_buffers --group_reporting --runtime=30 --time_based --fsync=0
IOPS-write: (g=0): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, 
(T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32...
fio-3.12
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][w=723MiB/s][w=722 IOPS][eta 00m:00s]
IOPS-write: (groupid=0, jobs=4): err= 0: pid=31585: Thu Apr 29 15:15:03 2021
  write: IOPS=740, BW=740MiB/s (776MB/s)(21.8GiB/30206msec); 0 zone resets
slat (usec): min=16, max=810, avg=106.48, stdev=30.48
clat (msec): min=7, max=1110, avg=172.09, stdev=120.18
 lat (msec): min=7, max=1110, avg=172.19, stdev=120.18
clat percentiles (msec):
 |  1.00th=[   32],  5.00th=[   48], 10.00th=[   53], 20.00th=[   63],
 | 30.00th=[  115], 40.00th=[  161], 50.00th=[  169], 60.00th=[  178],
 | 70.00th=[  190], 80.00th=[  220], 90.00th=[  264], 95.00th=[  368],
 | 99.00th=[  667], 99.50th=[  751], 99.90th=[  894], 99.95th=[  986],
 | 99.99th=[ 1036]
   bw (  KiB/s): min=22528, max=639744, per=25.02%, avg=189649.94, 
stdev=113845.69, samples=240
   iops: min=   22, max=  624, avg=185.11, stdev=111.18, samples=240
  lat (msec)   : 10=0.01%, 20=0.19%, 50=6.43%, 100=20.29%, 250=61.52%
  lat (msec)   : 500=8.21%, 750=2.85%, 1000=0.47%
  cpu  : usr=11.87%, sys=2.05%, ctx=13141, majf=0, minf=45
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.3%, 32=99.4%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued rwts: total=0,22359,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=740MiB/s (776MB/s), 740MiB/s-740MiB/s (776MB/s-776MB/s), io=21.8GiB 
(23.4GB), run=30206-30206msec

Disk stats (read/write):
  nvme0n1: ios=0/89150, merge=0/0, ticks=0/15065724, in_queue

[ceph-users] Re: Host ceph version in dashboard incorrect after upgrade

2021-04-29 Thread Eugen Block

I would restart the active MGR, that should resolve it.

Zitat von mabi :


Hello,

I upgraded my Octopus test cluster which has 5 hosts because one of  
the node (a mon/mgr node) was still on version 15.2.10 but all the  
others on 15.2.11.


For the upgrade I used the following command:

ceph orch upgrade start --ceph-version 15.2.11

The upgrade worked correctly and I did not see any errors in the  
logs but the host version in the ceph dashboard (under the  
navigation Cluster -> Hosts) still snows 15.2.10 for that specific  
node.


The output of "ceph versions", shows that every component is on  
15.2.11 as you can see below:


{
"mon": {
"ceph version 15.2.11  
(e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 3

},
"mgr": {
"ceph version 15.2.11  
(e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2

},
"osd": {
"ceph version 15.2.11  
(e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2

},
"mds": {},
"overall": {
"ceph version 15.2.11  
(e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 7

}
}

So why is it still stuck on 15.2.10 in the dashboard?

Best regards,
Mabi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Host ceph version in dashboard incorrect after upgrade

2021-04-29 Thread Eugen Block

Try this:

ceph orch daemon stop mgr.

and then after another daemon took over its role start it again:

ceph orch daemon start mgr.
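(The part after "mgr." is the daemon name as listed by the orchestrator; a hedged
alternative is to fail the active mgr directly. The mgr name below is a placeholder:)

ceph orch ps | grep mgr            # shows the full mgr daemon names
ceph mgr fail <active-mgr-name>    # forces a standby to take over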


Zitat von mabi :

I also thought about restarting the MGR service but I am new to ceph  
and could not find the "cephadm orch" command in order to do that...  
What would be the command to restart the mgr service on a specific  
node?


‐‐‐ Original Message ‐‐‐
On Thursday, April 29, 2021 7:23 PM, Eugen Block  wrote:


I would restart the active MGR, that should resolve it.

Zitat von mabi m...@protonmail.ch:

> Hello,
> I upgraded my Octopus test cluster which has 5 hosts because one of
> the node (a mon/mgr node) was still on version 15.2.10 but all the
> others on 15.2.11.
> For the upgrade I used the following command:
> ceph orch upgrade start --ceph-version 15.2.11
> The upgrade worked correctly and I did not see any errors in the
> logs but the host version in the ceph dashboard (under the
> navigation Cluster -> Hosts) still snows 15.2.10 for that specific
> node.
> The output of "ceph versions", shows that every component is on
> 15.2.11 as you can see below:
> {
> "mon": {
> "ceph version 15.2.11
> (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 3
> },
> "mgr": {
> "ceph version 15.2.11
> (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2
> },
> "osd": {
> "ceph version 15.2.11
> (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2
> },
> "mds": {},
> "overall": {
> "ceph version 15.2.11
> (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 7
> }
> }
> So why is it still stuck on 15.2.10 in the dashboard?
> Best regards,
> Mabi
>
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-29 Thread Dan van der Ster
On Sat, Apr 10, 2021 at 2:10 AM Robert LeBlanc  wrote:
>
> On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster  wrote:
> >
> > Here's what you should look for, with debug_mon=10. It shows clearly
> > that it takes the mon 23 seconds to run through
> > get_removed_snaps_range.
> > So if this is happening every 30s, it explains at least part of why
> > this mon is busy.
> >
> > 2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader)
> > e45 handle_subscribe
> > mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
> > 2021-04-09 17:07:27.238 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 check_osdmap_sub
> > 0x55e2e2133de0 next 1170448 (onetime)
> > 2021-04-09 17:07:27.238 7f9fc83e4700  5
> > mon.sun-storemon01@0(leader).osd e1987355 send_incremental
> > [1170448..1987355] to client.131831153
> > 2021-04-09 17:07:28.590 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 0
> > [1~3]
> > 2021-04-09 17:07:29.898 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 5 []
> > 2021-04-09 17:07:31.258 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 6 []
> > 2021-04-09 17:07:32.562 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 20
> > []
> > 2021-04-09 17:07:33.866 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 21
> > []
> > 2021-04-09 17:07:35.162 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 22
> > []
> > 2021-04-09 17:07:36.470 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 23
> > []
> > 2021-04-09 17:07:37.778 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 24
> > []
> > 2021-04-09 17:07:39.090 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 25
> > []
> > 2021-04-09 17:07:40.398 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 26
> > []
> > 2021-04-09 17:07:41.706 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 27
> > []
> > 2021-04-09 17:07:43.006 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 28
> > []
> > 2021-04-09 17:07:44.322 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 29
> > []
> > 2021-04-09 17:07:45.630 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 30
> > []
> > 2021-04-09 17:07:46.938 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 31
> > []
> > 2021-04-09 17:07:48.246 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 32
> > []
> > 2021-04-09 17:07:49.562 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 34
> > []
> > 2021-04-09 17:07:50.862 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 get_removed_snaps_range 35
> > []
> > 2021-04-09 17:07:50.862 7f9fc83e4700 20
> > mon.sun-storemon01@0(leader).osd e1987355 send_incremental starting
> > with base full 1986745 664086 bytes
> > 2021-04-09 17:07:50.862 7f9fc83e4700 10
> > mon.sun-storemon01@0(leader).osd e1987355 build_incremental
> > [1986746..1986785] with features 107b84a842aca
> >
> > So have a look for that client again or other similar traces.
>
> So, even though I blacklisted the client and we remounted the file
> system on it, it wasn't enough for it to keep performing the same bad
> requests. We found another node that had two sessions to the same
> mount point. We rebooted both nodes and the CPU is now back at a
> reasonable 4-6% and the cluster is running at full performance again.
> I've added in back both MONs to have all 3 mons in the system and
> there are no more elections. Thank you for helping us track down the
> bad clients out of over 2,000 clients.
>
> > > Maybe if that code path isn't needed in Nautilus it can be removed in
> > > the next point release?
> >
> > I think there were other major changes in this area that might make
> > such a backport difficult. And we should expect nautilus to be nearing
> > its end...
>
> But ... we just got to Nautilus... :)

Ouch, we just suffered this or a similar issue on our big prod block
storage cluster running 14.2.19.
But in our case it wasn't related to an old client -- rather we had
100% mon cpu and election storms but also huge tcmallocs all following
the recreation of a couple OSDs.
We wrote the details here: https://tracker.ceph.com/issues/50587

-- Dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Host ceph version in dashboard incorrect after upgrade

2021-04-29 Thread mabi
Hello,

I upgraded my Octopus test cluster, which has 5 hosts, because one of the nodes (a 
mon/mgr node) was still on version 15.2.10 while all the others were on 15.2.11.

For the upgrade I used the following command:

ceph orch upgrade start --ceph-version 15.2.11

The upgrade worked correctly and I did not see any errors in the logs, but the 
host version in the ceph dashboard (under the navigation Cluster -> Hosts) 
still shows 15.2.10 for that specific node.

The output of "ceph versions", shows that every component is on 15.2.11 as you 
can see below:

{
"mon": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) 
octopus (stable)": 3
},
"mgr": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) 
octopus (stable)": 2
},
"osd": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) 
octopus (stable)": 2
},
"mds": {},
"overall": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) 
octopus (stable)": 7
}
}

So why is it still stuck on 15.2.10 in the dashboard?

Best regards,
Mabi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance questions - 4 node (commodity) cluster - what to expect (and what not ;-)

2021-04-29 Thread Marc
Are you sure your ssd pool only contains SSDs and not some HDDs as well? In past 
versions of Ceph you had to modify the crush rules to separate the ssd and hdd 
classes. It could be that this is no longer necessary in Pacific.
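A hedged way to verify (the create-replicated line is just an example and assumes
the auto-detected device class is "ssd"):

ceph osd crush class ls
ceph osd tree                 # shows the device class per OSD
ceph osd crush rule dump      # the rule used by ssdpool should take "default~ssd"
ceph osd crush rule create-replicated fast-ssd default host ssd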



> -Original Message-
> From: Schmid, Michael 
> Sent: 29 April 2021 15:52
> To: ceph-users@ceph.io
> Subject: [ceph-users] Performance questions - 4 node (commodity) cluster
> - what to expect (and what not ;-)
> 
> Hello folks,
> 
> I am new to ceph and at the moment I am doing some performance tests
> with a 4 node ceph-cluster (pacific, 16.2.1).
> 
> Node hardware (4 identical nodes):
> 
>   *   DELL 3620 workstation
>   *   Intel Quad-Core i7-6700@3.4 GHz
>   *   8 GB RAM
>   *   Debian Buster (base system, installed a dedicated on Patriot Burst
> 120 GB SATA-SSD)
>   *   HP 530SPF+ 10 GBit dual-port NIC (tested with iperf to 9.4 GBit/s
> from node to node)
>   *   1 x Kingston KC2500 M2 NVMe PCIe SSD (500 GB, NO power loss
> protection !)
>   *   3 x Seagate Barracuda SATA disk drives (7200 rpm, 500 GB)
> 
> After bootstrapping a containerized (docker) ceph-cluster, I did some
> performance tests on the NVMe storage by creating a storage pool called
> „ssdpool“, consisting of 4 OSDs per (one) NVMe device (per node). A
> first write-performance test yields
> 
> =
> root@ceph1:~# rados bench -p ssdpool 10 write -b 4M -t 16 --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> 4194304 for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_ceph1_78
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
> lat(s)
> 0   0 0 0 0 0   -
> 0
> 1  16301455.99756   0.0209977
> 0.493427
> 2  165337   73.990392   0.0264305
> 0.692179
> 3  167660   79.9871920.559505
> 0.664204
> 4  169983   82.9879920.609332
> 0.721016
> 5  16   116   100   79.9889680.686093
> 0.698084
> 6  16   132   116   77.322464 1.19715
> 0.731808
> 7  16   153   137   78.2741840.622646
> 0.755812
> 8  16   171   15577.48672 0.25409
> 0.764022
> 9  16   192   176   78.2076840.968321
> 0.775292
>10  16   214   198   79.1856880.401339
> 0.766764
>11   1   214   213   77.4408600.969693
> 0.784002
> Total time run: 11.0698
> Total writes made:  214
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 77.3272
> Stddev Bandwidth:   13.7722
> Max bandwidth (MB/sec): 92
> Min bandwidth (MB/sec): 56
> Average IOPS:   19
> Stddev IOPS:3.44304
> Max IOPS:   23
> Min IOPS:   14
> Average Latency(s): 0.785372
> Stddev Latency(s):  0.49011
> Max latency(s): 2.16532
> Min latency(s): 0.0144995
> =
> 
> ... and I think that 80 MB/s throughput is a very poor result in
> conjunction with NVMe devices and 10 GBit nics.
> 
> A bare write-test (with fsync=0 option) of the NVMe drives yields a
> write throughput of round about 800 MB/s per device ... the second test
> (with fsync=1) drops performance to 200 MB/s.
> 
> =
> root@ceph1:/home/mschmid# fio --rw=randwrite --name=IOPS-write --
> bs=1024k --direct=1 --filename=/dev/nvme0n1 --numjobs=4 --
> ioengine=libaio --iodepth=32 --refill_buffers --group_reporting --
> runtime=30 --time_based --fsync=0
> IOPS-write: (g=0): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-
> 1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32...
> fio-3.12
> Starting 4 processes
> Jobs: 4 (f=4): [w(4)][100.0%][w=723MiB/s][w=722 IOPS][eta 00m:00s]
> IOPS-write: (groupid=0, jobs=4): err= 0: pid=31585: Thu Apr 29 15:15:03
> 2021
>   write: IOPS=740, BW=740MiB/s (776MB/s)(21.8GiB/30206msec); 0 zone
> resets
> slat (usec): min=16, max=810, avg=106.48, stdev=30.48
> clat (msec): min=7, max=1110, avg=172.09, stdev=120.18
>  lat (msec): min=7, max=1110, avg=172.19, stdev=120.18
> clat percentiles (msec):
>  |  1.00th=[   32],  5.00th=[   48], 10.00th=[   53], 20.00th=[
> 63],
>  | 30.00th=[  115], 40.00th=[  161], 50.00th=[  169], 60.00th=[
> 178],
>  | 70.00th=[  190], 80.00th=[  220], 90.00th=[  264], 95.00th=[
> 368],
>  | 99.00th=[  667], 99.50th=[  751], 99.90th=[  894], 99.95th=[
> 986],
>  | 99.99th=[ 1036]
>bw (  KiB/s): min=22528, max=639744, per=25.02%, avg=189649.94,
> stdev=113845.69, samples=240
>iops: min=   22, max=  624, avg=185.11, stdev=111.18,
> samples=240
>   lat (msec)   : 10=0.01%, 20=0.19%, 50=6.43%, 100=20.29%, 250=61.52%
>   lat (msec)   : 500=8.21%, 750=2.85%, 1000=0.47%
>   cpu  : usr=11.87%, sys=2.05%, ctx=13141, majf=

[ceph-users] Re: Host ceph version in dashboard incorrect after upgrade

2021-04-29 Thread mabi
I also thought about restarting the MGR service, but I am new to Ceph and could 
not find the right "cephadm orch" command to do that... What would be the 
command to restart the mgr service on a specific node?
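Is it something along these lines? The daemon name below is just a placeholder,
I have not tried this yet:

   ceph orch ps | grep mgr                     # find the exact mgr daemon name
   ceph orch daemon restart mgr.<name from ps output>
   ceph orch restart mgr                       # or restart all mgr daemons at once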

‐‐‐ Original Message ‐‐‐
On Thursday, April 29, 2021 7:23 PM, Eugen Block  wrote:

> I would restart the active MGR, that should resolve it.
>
> Zitat von mabi m...@protonmail.ch:
>
> > Hello,
> > I upgraded my Octopus test cluster which has 5 hosts because one of
> > the node (a mon/mgr node) was still on version 15.2.10 but all the
> > others on 15.2.11.
> > For the upgrade I used the following command:
> > ceph orch upgrade start --ceph-version 15.2.11
> > The upgrade worked correctly and I did not see any errors in the
> > logs but the host version in the ceph dashboard (under the
> > navigation Cluster -> Hosts) still snows 15.2.10 for that specific
> > node.
> > The output of "ceph versions", shows that every component is on
> > 15.2.11 as you can see below:
> > {
> > "mon": {
> > "ceph version 15.2.11
> > (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 3
> > },
> > "mgr": {
> > "ceph version 15.2.11
> > (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2
> > },
> > "osd": {
> > "ceph version 15.2.11
> > (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2
> > },
> > "mds": {},
> > "overall": {
> > "ceph version 15.2.11
> > (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 7
> > }
> > }
> > So why is it still stuck on 15.2.10 in the dashboard?
> > Best regards,
> > Mabi
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-04-29 Thread Alex Gorbachev
Mark,

My main note was to make sure NOT to enable msgr2 until all OSDs are
upgraded to Nautilus. I made that mistake early on in the lab and had to
work hard to get the cluster back together. Otherwise it was a pretty smooth process.
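In practice that means: finish upgrading and restarting every OSD first, and
only then run the final steps, roughly (please double-check against the
release notes for your exact versions):

   ceph osd require-osd-release nautilus
   ceph mon enable-msgr2

Setting noout for the rolling OSD restarts (ceph osd set noout, and unset it
afterwards) also keeps the cluster from rebalancing while daemons bounce.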

--
Alex Gorbachev
ISS/Storcium



On Thu, Apr 29, 2021 at 4:58 AM Mark Schouten  wrote:

> Hi,
>
> We've done our fair share of Ceph cluster upgrades since Hammer, and
> have not seen much problems with them. I'm now at the point that I have
> to upgrade a rather large cluster running Luminous and I would like to
> hear from other users if they have experiences with issues I can expect
> so that I can anticipate on them beforehand.
>
> As said, the cluster is running Luminous (12.2.13) and has the following
> services active:
>   services:
> mon: 3 daemons, quorum osdnode01,osdnode02,osdnode04
> mgr: osdnode01(active), standbys: osdnode02, osdnode03
> mds: pmrb-3/3/3 up
> {0=osdnode06=up:active,1=osdnode08=up:active,2=osdnode07=up:active}, 1
> up:standby
> osd: 116 osds: 116 up, 116 in;
> rgw: 3 daemons active
>
>
> Of the OSD's, we have 11 SSD's and 105 HDD. The capacity of the cluster
> is 1.01PiB.
>
> We have 2 active crush-rules on 18 pools. All pools have a size of 3 there
> is a total of 5760 pgs.
> {
> "rule_id": 1,
> "rule_name": "hdd-data",
> "ruleset": 1,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -10,
> "item_name": "default~hdd"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> },
> {
> "rule_id": 2,
> "rule_name": "ssd-data",
> "ruleset": 2,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -21,
> "item_name": "default~ssd"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> }
>
> rbd -> crush_rule: hdd-data
> .rgw.root -> crush_rule: hdd-data
> default.rgw.control -> crush_rule: hdd-data
> default.rgw.data.root -> crush_rule: ssd-data
> default.rgw.gc -> crush_rule: ssd-data
> default.rgw.log -> crush_rule: ssd-data
> default.rgw.users.uid -> crush_rule: hdd-data
> default.rgw.usage -> crush_rule: ssd-data
> default.rgw.users.email -> crush_rule: hdd-data
> default.rgw.users.keys -> crush_rule: hdd-data
> default.rgw.meta -> crush_rule: hdd-data
> default.rgw.buckets.index -> crush_rule: ssd-data
> default.rgw.buckets.data -> crush_rule: hdd-data
> default.rgw.users.swift -> crush_rule: hdd-data
> default.rgw.buckets.non-ec -> crush_rule: ssd-data
> DB0475 -> crush_rule: hdd-data
> cephfs_pmrb_data -> crush_rule: hdd-data
> cephfs_pmrb_metadata -> crush_rule: ssd-data
>
>
> All but four clients are running Luminous, the four are running Jewel
> (that needs upgrading before proceeding with this upgrade).
>
> So, normally, I would 'just' upgrade all Ceph packages on the
> monitor-nodes and restart mons and then mgrs.
>
> After that, I would upgrade all Ceph packages on the OSD nodes and
> restart all the OSD's. Then, after that, the MDSes and RGWs. Restarting
> the OSD's will probably take a while.
>
> If anyone has a hint on what I should expect to cause some extra load or
> waiting time, that would be great.
>
> Obviously, we have read
> https://ceph.com/releases/v14-2-0-nautilus-released/ , but I'm looking
> for real world experiences.
>
> Thanks!
>
>
> --
> Mark Schouten | Tuxis B.V.
> KvK: 74698818 | http://www.tuxis.nl/
> T: +31 318 200208 | i...@tuxis.nl
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance questions - 4 node (commodity) cluster - what to expect (and what not ;-)

2021-04-29 Thread Schmid, Michael
Hello,

I did the necessary modifications to the CRUSH ruleset and checked the
OSD-to-pool mapping.
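For reference, the checks were along these lines (pool name as in my first
mail):

   ceph osd pool get ssdpool crush_rule       # which rule the pool uses now
   ceph pg ls-by-pool ssdpool                 # which OSDs serve the pool's PGs
   ceph osd df tree                           # per-OSD usage, grouped by class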

Best regards,
Michael

Kind regards,

Michael Schmid
__
Berufliche Oberschule Rosenheim
Westerndorfer Straße 45
D - 83024 Rosenheim
fon: +49 (0) 80 31 / 2843 - 422
fax: +49 (0) 80 31 / 2843 - 435
mail: m.sch...@fosbos-rosenheim.de
web: https://www.fosbos-rosenheim.de
__

P.S.: This message was sent from a mobile device using voice input. Please 
excuse any errors.

From: Marc 
Sent: Thursday, April 29, 2021 10:15:20 PM
To: Schmid, Michael ; ceph-users@ceph.io 

Subject: RE: Performance questions - 4 node (commodity) cluster - what to 
expect (and what not ;-)

Are you sure your ssd pool contains only SSDs and not maybe some HDDs? In 
past versions of Ceph you had to modify the CRUSH rules to separate the ssd 
and hdd device classes. It could be that this is no longer necessary in Pacific.



> -Original Message-
> From: Schmid, Michael 
> Sent: 29 April 2021 15:52
> To: ceph-users@ceph.io
> Subject: [ceph-users] Performance questions - 4 node (commodity) cluster
> - what to expect (and what not ;-)
>
> Hello folks,
>
> I am new to ceph and at the moment I am doing some performance tests
> with a 4 node ceph-cluster (pacific, 16.2.1).
>
> Node hardware (4 identical nodes):
>
>   *   DELL 3620 workstation
>   *   Intel Quad-Core i7-6700@3.4 GHz
>   *   8 GB RAM
>   *   Debian Buster (base system, installed on a dedicated Patriot Burst
> 120 GB SATA-SSD)
>   *   HP 530SPF+ 10 GBit dual-port NIC (tested with iperf to 9.4 GBit/s
> from node to node)
>   *   1 x Kingston KC2500 M2 NVMe PCIe SSD (500 GB, NO power loss
> protection !)
>   *   3 x Seagate Barracuda SATA disk drives (7200 rpm, 500 GB)
>
> After bootstrapping a containerized (docker) ceph-cluster, I did some
> performance tests on the NVMe storage by creating a storage pool called
> „ssdpool“, consisting of 4 OSDs per (one) NVMe device (per node). A
> first write-performance test yields
>
> =
> root@ceph1:~# rados bench -p ssdpool 10 write -b 4M -t 16 --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> 4194304 for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_ceph1_78
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0       0         0         0         0         0           -           0
>     1      16        30        14    55.997        56   0.0209977    0.493427
>     2      16        53        37   73.9903        92   0.0264305    0.692179
>     3      16        76        60   79.9871        92    0.559505    0.664204
>     4      16        99        83   82.9879        92    0.609332    0.721016
>     5      16       116       100   79.9889        68    0.686093    0.698084
>     6      16       132       116   77.3224        64     1.19715    0.731808
>     7      16       153       137   78.2741        84    0.622646    0.755812
>     8      16       171       155    77.486        72     0.25409    0.764022
>     9      16       192       176   78.2076        84    0.968321    0.775292
>    10      16       214       198   79.1856        88    0.401339    0.766764
>    11       1       214       213   77.4408        60    0.969693    0.784002
> Total time run: 11.0698
> Total writes made:  214
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 77.3272
> Stddev Bandwidth:   13.7722
> Max bandwidth (MB/sec): 92
> Min bandwidth (MB/sec): 56
> Average IOPS:   19
> Stddev IOPS:3.44304
> Max IOPS:   23
> Min IOPS:   14
> Average Latency(s): 0.785372
> Stddev Latency(s):  0.49011
> Max latency(s): 2.16532
> Min latency(s): 0.0144995
> =
>
> ... and I think that 80 MB/s throughput is a very poor result in
> conjunction with NVMe devices and 10 GBit nics.
>
> A bare write-test (with fsync=0 option) of the NVMe drives yields a
> write throughput of round about 800 MB/s per device ... the second test
> (with fsync=1) drops performance to 200 MB/s.
>
> =
> root@ceph1:/home/mschmid# fio --rw=randwrite --name=IOPS-write --
> bs=1024k --direct=1 --filename=/dev/nvme0n1 --numjobs=4 --
> ioengine=libaio --iodepth=32 --refill_buffers --group_reporting --
> runtime=30 --time_based --fsync=0
> IOPS-write: (g=0): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-
> 1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32...
> fio-3.12
> Starting 4 processes
> Jobs: 4 (f=4): [w(4)][100.0%][w=723MiB/s][w=722 IOPS][eta 00m:00s]
> IOPS-write: (groupid=0, jobs=4): err= 0: pid=31585: Thu Apr 29 15:15:03
> 2021
>   write: IOPS=740, BW=740MiB/s (776MB/s)(21.8GiB/30206msec); 0 zone
> resets
> slat (usec): min=16, max=810, avg=106.48, stdev=30.48
> clat (msec): min=7, max=1110, a