[ceph-users] Re: Ceph OIDC Integration
Hello,

The next Octopus release should be out in 3-4 weeks. In Octopus, shadow users aren't created (for federated OIDC users). But we later realised that shadow users are needed to maintain user stats, so that code is being added now and should be available in the Pacific release. We have also done away with the token introspection URL in the latest code and have switched over to offline token validation using the IdP's certs, since a token introspection URL wouldn't scale well for multiple clients. There is a related Ceph Tech Talk here that you can watch: https://www.youtube.com/watch?v=Lc32meILfNI&t=410s

Thanks,
Pritha

On Mon, Oct 19, 2020 at 8:30 PM wrote:
> Dear Pritha, thanks a lot for your feedback and apologies for missing your
> comment about the backporting. Would you have a rough estimate on the next
> Octopus release by any chance?
>
> On another note on the same subject, would you be able to give us some
> feedback on how the users will be created in Ceph? (for example, when we
> used LDAP, an LDAP user used to be created in Ceph for "mapping"; will it
> be the same in this case)
>
> If we have multiple tenants (unique usernames "emails" in Keycloak) how
> will the introspection URLs be defined for different tenants?
>
> Thanks in advance
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
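Offline validation, as described above, means checking the token against the IdP's published certs instead of calling an introspection endpoint per request. As a small, hedged illustration (this is NOT the RGW implementation, and it deliberately skips signature verification), the claims a JWT carries — the `sub`/`iss` values that federated-user mapping works from — can be inspected by base64url-decoding its middle segment:

```shell
# Hedged illustration only: decode a JWT's payload segment to see its claims.
# Real offline validation must ALSO verify the signature against the IdP's
# certs; this sketch does not.
decode_jwt_payload() {
  p="${1#*.}"                            # drop the header segment
  p="${p%%.*}"                           # drop the signature segment
  p=$(printf '%s' "$p" | tr '_-' '/+')   # base64url -> base64
  case $(( ${#p} % 4 )) in               # restore stripped padding
    2) p="$p==" ;;
    3) p="$p=" ;;
  esac
  printf '%s' "$p" | base64 -d
}
```

For example, `decode_jwt_payload "$TOKEN"` prints the JSON claim set of a token obtained from Keycloak.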
[ceph-users] Re: multiple OSD crash, unfound objects
Dear Michael,

> > Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an
> > OSD mapping?

I meant here with crush rule replicated_host_nvme. Sorry, forgot.

> Yes, the OSD was still out when the previous health report was created.

Hmm, this is odd. If this is correct, then it reported a slow op even though it was out of the cluster:

> from https://pastebin.com/3G3ij9ui:
> [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 8133 sec, daemons
> [osd.0,osd.41] have slow ops.

Not sure what to make of that. It looks almost like you have a ghost osd.41. I think (some of) the slow ops you are seeing are directed to the health_metrics pool and can be ignored. If it is too annoying, you could try to find out who runs the client with ID client.7524484 and disable it. It might be an MGR module.

Looking at the data you provided and also some older threads of yours (https://www.mail-archive.com/ceph-users@ceph.io/msg05842.html), I am starting to think that we are looking at the fall-out of a past admin operation. One possibility is that an upmap for PG 1.0 exists that conflicts with the crush rule replicated_host_nvme and hence prevents the assignment of OSDs to PG 1.0. For example, the upmap specifies HDDs, but the crush rule requires NVMes. The result is an empty set.

I couldn't really find a simple command to list upmaps. The only non-destructive way seems to be to extract the osdmap and create a clean-up command file. The cleanup file should contain a command for every PG with an upmap. To check this, you can execute (see also https://docs.ceph.com/en/latest/man/8/osdmaptool/):

# ceph osd getmap > osd.map
# osdmaptool osd.map --upmap-cleanup cleanup.cmd

If you do this, could you please post as usual the contents of cleanup.cmd? Also, with the OSD map of your cluster, you can simulate certain admin operations and check the resulting PG mappings for pools and other things without having to touch the cluster; see https://docs.ceph.com/en/latest/man/8/osdmaptool/.
To dig a little bit deeper, could you please post as usual the output of:

- ceph pg 1.0 query
- ceph pg 7.39d query

It would also be helpful if you could post the decoded crush map. You can get the map as a text file as follows:

# ceph osd getcrushmap -o crush-orig.bin
# crushtool -d crush-orig.bin -o crush.txt

and post the contents of the file crush.txt. Did the slow MDS request complete by now?

Best regards,
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

Contents of previous messages removed.
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
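As a possibly simpler first check (an assumption about the dump format, not something from the thread): `ceph osd dump` prints one `pg_upmap_items` line per upmap exception, so grepping the dump should reveal whether PG 1.0 has one at all. The filter below is demonstrated against canned input so it runs without a cluster:

```shell
# Hedged helper: list upmap exception lines from an OSD map dump.
# On a live cluster:  ceph osd dump | list_upmaps
list_upmaps() {
  awk '/^pg_upmap/ { print }'
}

# Demonstration against a fake dump excerpt (the PG and OSD IDs are made up):
printf '%s\n' \
  'pool 1 device_health_metrics replicated size 3' \
  'pg_upmap_items 7.39d [12,44]' | list_upmaps
```

An empty result here (as with the empty cleanup.cmd) would indicate no upmaps exist at all.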
[ceph-users] Huge RAM Usage on OSD recovery
Hi, today my infrastructure provider had a blackout. Ceph then tried to recover, but it is in an inconsistent state because many OSDs cannot recover: the kernel kills them by OOM. Even now, an OSD that was OK goes down, OOM-killed. Even on a server with 32 GB RAM, the OSD uses ALL of that and never recovers. I think this could be a memory leak. Ceph version: Octopus 15.2.3.

In https://pastebin.pl/view/59089adc you can see that buffer_anon gets to 32 GB, but why?? All of my cluster is down because of that.

___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
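One common mitigation for recovery-time OOM (a hedged suggestion, not a confirmed fix for the buffer_anon growth above) is to cap each OSD's memory with `osd_memory_target` so that all OSDs plus recovery overhead fit in RAM. A sketch of the sizing arithmetic — the 30% headroom factor is an assumption, not an official rule:

```shell
# Hedged sizing helper: suggest an osd_memory_target from host RAM and OSD
# count, reserving ~30% of RAM for the kernel, page cache, and recovery
# spikes. The 30% figure is an assumption, not an official recommendation.
suggest_osd_memory_target() {
  host_ram_bytes=$1
  num_osds=$2
  echo $(( host_ram_bytes * 7 / 10 / num_osds ))
}

# Example: a 32 GiB host running 4 OSDs; apply the result with (command sketch):
#   ceph config set osd osd_memory_target <value>
suggest_osd_memory_target $(( 32 * 1024 * 1024 * 1024 )) 4
```

Note that `osd_memory_target` is a best-effort cache target, not a hard limit, so recovery can still overshoot it; the headroom is what keeps the OOM killer away.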
[ceph-users] Re: multiple OSD crash, unfound objects
On 10/20/20 1:18 PM, Frank Schilder wrote: Dear Michael, Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD mapping? I meant here with crush rule replicated_host_nvme. Sorry, forgot. Seems to have worked fine: https://pastebin.com/PFgDE4J1 Yes, the OSD was still out when the previous health report was created. Hmm, this is odd. If this is correct, then it did report a slow op even though it was out of the cluster: from https://pastebin.com/3G3ij9ui: [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 8133 sec, daemons [osd.0,osd.41] have slow ops. Not sure what to make of that. It looks almost like you have a ghost osd.41. I think (some of) the slow ops you are seeing are directed to the health_metrics pool and can be ignored. If it is too annoying, you could try to find out who runs the client with ID client.7524484 and disable it. Might be an MGR module. I'm also pretty certain that the slow ops are related to the health metrics pool, which is why I've been ignoring them. What I'm not sure about is whether re-creating the device_health_metrics pool will cause any problems in the ceph cluster. Looking at the data you provided and also some older threads of yours (https://www.mail-archive.com/ceph-users@ceph.io/msg05842.html), I am starting to think that we are looking at the fall-out of a past admin operation. One possibility is that an upmap for PG 1.0 exists that conflicts with the crush rule replicated_host_nvme and hence prevents the assignment of OSDs to PG 1.0. For example, the upmap specifies HDDs, but the crush rule requires NVMes. The result is an empty set. So far I've been unable to locate the client with the ID 7524484. It's not showing up in the manager dashboard -> Filesystems page, nor in the output of 'ceph tell mds.ceph1 client ls'. I'm digging through the compressed logs for the past week to see if I can find the culprit. I couldn't really find a simple command to list upmaps. 
The only non-destructive way seems to be to extract the osdmap and create a clean-up command file. The cleanup file should contain a command for every PG with an upmap. To check this, you can execute (see also https://docs.ceph.com/en/latest/man/8/osdmaptool/) # ceph osd getmap > osd.map # osdmaptool osd.map --upmap-cleanup cleanup.cmd If you do this, could you please post as usual the contents of cleanup.cmd? It was empty: [root@ceph1 ~]# ceph osd getmap > osd.map got osdmap epoch 52833 [root@ceph1 ~]# osdmaptool osd.map --upmap-cleanup cleanup.cmd osdmaptool: osdmap file 'osd.map' writing upmap command output to: cleanup.cmd checking for upmap cleanups [root@ceph1 ~]# wc cleanup.cmd 0 0 0 cleanup.cmd Also, with the OSD map of your cluster, you can simulate certain admin operations and check resulting PG mappings for pools and other things without having to touch the cluster; see https://docs.ceph.com/en/latest/man/8/osdmaptool/. To dig a little bit deeper, could you please post as usual the output of: - ceph pg 1.0 query - ceph pg 7.39d query Oddly, it claims that it doesn't have pgid 1.0. https://pastebin.com/pHh33Dq7 It would also be helpful if you could post the decoded crush map. You can get the map as a txt-file as follows: # ceph osd getcrushmap -o crush-orig.bin # crushtool -d crush-orig.bin -o crush.txt and post the contents of file crush.txt. https://pastebin.com/EtEGpWy3 Did the slow MDS request complete by now? Nope. --Mike ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
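Since `ceph pg 1.0 query` claims the PG doesn't exist, a hedged next step (assuming the osdmaptool options behave as documented) is to ask the extracted map directly how it would place the pool's PGs, without touching the cluster:

```shell
# Hedged sketch: simulate PG -> OSD mappings for pool 1 from the extracted
# map. If PG 1.0 maps to an empty OSD set here, the problem is in the
# map/crush rule, not in the daemons.
ceph osd getmap > osd.map
osdmaptool osd.map --test-map-pgs-dump --pool 1

# Quick live check of the current up/acting set for the one PG:
ceph pg map 1.0
```

These are read-only operations, so they should be safe to run on the degraded cluster.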
[ceph-users] v14.2.12 Nautilus released
This is the 12th backport release in the Nautilus series. This release brings a number of bugfixes across all major components of Ceph. We recommend that all Nautilus users upgrade to this release. For detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/releases/v14-2-12-nautilus-released

Notable Changes
---

* The `ceph df` command now lists the number of PGs in each pool.

* Monitors now have a config option `mon_osd_warn_num_repaired`, 10 by default. If any OSD has repaired more than this many I/O errors in stored data, an `OSD_TOO_MANY_REPAIRS` health warning is generated. To allow clearing of the warning, a new command `ceph tell osd.# clear_shards_repaired [count]` has been added. By default it will set the repair count to 0. If you want to be warned again when additional repairs are performed, you can provide a value to the command matching `mon_osd_warn_num_repaired`. This command will be replaced in future releases by the health mute/unmute feature.

* It is now possible to specify the initial monitor to contact for Ceph tools and daemons using the `mon_host_override` config option or `--mon-host-override <ip>` command-line switch. This generally should only be used for debugging and only affects initial communication with Ceph's monitor cluster.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.12.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 2f3caa3b8b3d5c5f2719a1e9d8e7deea5ae1a5c6

___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
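To make the repair-warning behaviour above concrete, here is a hedged sketch of the comparison the monitor performs (the function is illustrative, not Ceph code; the commented commands are the knobs named in the release notes):

```shell
# Illustrative model of the OSD_TOO_MANY_REPAIRS check: warn when an OSD's
# repair count exceeds mon_osd_warn_num_repaired (default 10).
# The real knobs, per the release notes:
#   ceph config set mon mon_osd_warn_num_repaired 10   # warn threshold
#   ceph tell osd.0 clear_shards_repaired              # reset count to 0
repairs_health_check() {
  repaired=$1
  threshold=$2
  if [ "$repaired" -gt "$threshold" ]; then
    echo "HEALTH_WARN OSD_TOO_MANY_REPAIRS"
  else
    echo "HEALTH_OK"
  fi
}

repairs_health_check 12 10
```

Passing the threshold value to `clear_shards_repaired` re-arms the warning so the very next repair trips it again.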
[ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04
Hello Eugen,

Rebooting the other two MONs fixed the problem:

root@osswrkprbe001:~# ceph status
  cluster:
    id:     56820176-ae5b-4e58-84a2-442b2fc03e6d
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum osswrkprbe001,osswrkprbe002,osswrkprbe003 (age 3m)
    mgr: osswrkprbe002(active, since 6m), standbys: osswrkprbe001, osswrkprbe003
    osd: 3 osds: 3 up, 3 in
  data:
    pools:   6 pools, 162 pgs
    objects: 16.27k objects, 58 GiB
    usage:   173 GiB used, 2.0 TiB / 2.2 TiB avail
    pgs:     162 active+clean
  io:
    client: 341 B/s rd, 473 KiB/s wr, 0 op/s rd, 43 op/s wr

Saludos, EMANUEL CASTELLI Arquitecto de Información - Gerencia OSS C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecaste...@telecentro.net.ar Lavardén 157 1er piso. CABA (C1437FBC) - Original Message - From: "Emanuel Alejandro Castelli" To: "Eugen Block" Cc: "ceph-users" Sent: Tuesday, October 20, 2020 10:29:05 AM Subject: Re: [ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04 And the same for MON3 [5243018.443159] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243033.801504] libceph: mon2 192.168.14.152:6789 socket error on write [5243034.473450] libceph: mon2 192.168.14.152:6789 socket error on write [5243035.497397] libceph: mon2 192.168.14.152:6789 socket error on write [5243037.481225] libceph: mon2 192.168.14.152:6789 socket error on write [5243041.480864] libceph: mon2 192.168.14.152:6789 socket error on write [5243049.672236] libceph: mon2 192.168.14.152:6789 socket error on write [5243064.519492] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243065.479388] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243066.471478] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243068.455281] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243072.454806] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243080.646202] libceph: mon1 192.168.14.151:6789 socket closed 
(con state CONNECTING) [5243095.236682] libceph: mon2 192.168.14.152:6789 socket error on write [5243096.484606] libceph: mon2 192.168.14.152:6789 socket error on write [5243097.476518] libceph: mon2 192.168.14.152:6789 socket error on write [5243099.492380] libceph: mon2 192.168.14.152:6789 socket error on write [5243103.684014] libceph: mon2 192.168.14.152:6789 socket error on write [5243111.619439] libceph: mon2 192.168.14.152:6789 socket error on write Saludos, EMANUEL CASTELLI Arquitecto de Información - Gerencia OSS C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecaste...@telecentro.net.ar Lavardén 157 1er piso. CABA (C1437FBC) - Original Message - From: "Emanuel Alejandro Castelli" To: "Eugen Block" Cc: "ceph-users" Sent: Tuesday, October 20, 2020 10:27:15 AM Subject: Re: [ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04 From MON1, dmesg I get this: [3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348086.232103] libceph: mon1 
192.168.14.151:6789 socket closed (con state CONNECTING) [3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed (con state
[ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04
And the same for MON3 [5243018.443159] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243033.801504] libceph: mon2 192.168.14.152:6789 socket error on write [5243034.473450] libceph: mon2 192.168.14.152:6789 socket error on write [5243035.497397] libceph: mon2 192.168.14.152:6789 socket error on write [5243037.481225] libceph: mon2 192.168.14.152:6789 socket error on write [5243041.480864] libceph: mon2 192.168.14.152:6789 socket error on write [5243049.672236] libceph: mon2 192.168.14.152:6789 socket error on write [5243064.519492] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243065.479388] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243066.471478] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243068.455281] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243072.454806] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243080.646202] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [5243095.236682] libceph: mon2 192.168.14.152:6789 socket error on write [5243096.484606] libceph: mon2 192.168.14.152:6789 socket error on write [5243097.476518] libceph: mon2 192.168.14.152:6789 socket error on write [5243099.492380] libceph: mon2 192.168.14.152:6789 socket error on write [5243103.684014] libceph: mon2 192.168.14.152:6789 socket error on write [5243111.619439] libceph: mon2 192.168.14.152:6789 socket error on write Saludos, EMANUEL CASTELLI Arquitecto de Información - Gerencia OSS C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecaste...@telecentro.net.ar Lavardén 157 1er piso. 
CABA (C1437FBC) - Original Message - From: "Emanuel Alejandro Castelli" To: "Eugen Block" Cc: "ceph-users" Sent: Tuesday, October 20, 2020 10:27:15 AM Subject: Re: [ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04 From MON1, dmesg I get this: [3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348086.232103] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed (con 
state CONNECTING) [3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348172.213118] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348173.205087] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348175.188934] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348179.412719] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348187.348441] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348201.683707] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348202.195745] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348203.187654] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348205.175585] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348209.363409] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348217.299298] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) But from MON2 I get this: [5242753.074620] libceph: mon2
[ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04
From MON1, dmesg I get this: [3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348086.232103] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348172.213118] libceph: mon1 
192.168.14.151:6789 socket closed (con state CONNECTING) [3348173.205087] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348175.188934] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348179.412719] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348187.348441] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348201.683707] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348202.195745] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348203.187654] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348205.175585] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348209.363409] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348217.299298] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) But from MON2 I get this: [5242753.074620] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242761.266727] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242779.959468] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [5242806.834049] libceph: mon1 192.168.14.151:6789 socket error on write [5242808.049952] libceph: mon1 192.168.14.151:6789 socket error on write [5242809.041947] libceph: mon1 192.168.14.151:6789 socket error on write [5242811.057917] libceph: mon1 192.168.14.151:6789 socket error on write [5242815.285867] libceph: mon1 192.168.14.151:6789 socket error on write [5242824.241921] libceph: mon1 192.168.14.151:6789 socket error on write [5242837.554174] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242838.034339] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242839.026139] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242841.010177] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING) [5242845.234101] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242853.169905] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242870.102324] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [5242901.041812] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242902.033763] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242903.026350] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242905.009497] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242909.233740] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242917.169724] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242931.761103] libceph: mon1 192.168.14.151:6789 socket error on write [5242932.049095] libceph: mon1 192.168.14.151:6789 socket error on write
[ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04
I have 3 MONs, I don't know why it's showing only one. root@osswrkprbe001:~# ceph --connect-timeout 60 status Cluster connection interrupted or timed out

cephadm logs --name mon.osswrkprbe001 --> Is there any way to go to a specific date? Because it starts from Oct 4. I want to check from Oct 16 onward. I suspect that something happened that day. Also, I don't know how to troubleshoot this. I did the same (./cephadm logs --name mon.osswrkprbe002) on the second MON, but it starts the logs from Sep 30. I would need to check Oct 16 there also. I would appreciate it if you can help me with the troubleshooting. Thank you.

Saludos, EMANUEL CASTELLI Arquitecto de Información - Gerencia OSS C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecaste...@telecentro.net.ar Lavardén 157 1er piso. CABA (C1437FBC) - Original Message - From: "Eugen Block" To: "ceph-users" Sent: Tuesday, October 20, 2020 10:02:35 AM Subject: [ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04

Your mon container seems up and running, have you tried restarting it? You just have one mon, is that correct? Do you see anything in the logs? cephadm logs --name mon.osswrkprbe001 How long do you wait until you hit CTRL-C? There's a connection-timeout option for ceph commands, maybe try a higher timeout? ceph --connect-timeout 60 status Is the node hosting the mon showing any issues in dmesg, df -h, syslog, etc.?

Regards, Eugen

Zitat von Emanuel Alejandro Castelli :
> Hello
>
> I'm facing an issue with ceph. I cannot run any ceph command. It
> literally hangs. I need to hit CTRL-C to get this:
>
> ^CCluster connection interrupted or timed out
>
> This is on Ubuntu 16.04. Also, I use Grafana with Prometheus to get
> information from the cluster, but now there is no data to graph. Any
> clue? 
> cephadm version
> INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version
> 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
>
> cephadm ls
> [
>   {
>     "style": "cephadm:v1",
>     "name": "mon.osswrkprbe001",
>     "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
>     "systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001",
>     "enabled": true,
>     "state": "running",
>     "container_id": "afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",
>     "container_image_name": "docker.io/ceph/ceph:v15.2.1",
>     "container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
>     "version": "15.2.1",
>     "started": "2020-10-19T19:03:16.759730",
>     "created": "2020-09-04T23:30:30.250336",
>     "deployed": "2020-09-04T23:48:20.956277",
>     "configured": "2020-09-04T23:48:22.100283"
>   },
>   {
>     "style": "cephadm:v1",
>     "name": "mgr.osswrkprbe001",
>     "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
>     "systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mgr.osswrkprbe001",
>     "enabled": true,
>     "state": "running",
>     "container_id": "1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",
>     "container_image_name": "docker.io/ceph/ceph:v15.2.1",
>     "container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
>     "version": "15.2.1",
>     "started": "2020-10-19T20:43:38.329529",
>     "created": "2020-09-04T23:30:31.110341",
>     "deployed": "2020-09-04T23:47:41.604057",
>     "configured": "2020-09-05T00:00:21.064246"
>   }
> ]
>
> Thank you in advance.
>
> Saludos,
>
> EMANUEL CASTELLI
> Arquitecto de Información - Gerencia OSS
> C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecaste...@telecentro.net.ar
> Lavardén 157 1er piso. 
CABA (C1437FBC) > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
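On the question above about jumping to a specific date: `cephadm logs` wraps journalctl for the daemon's systemd unit, and — to my understanding of the tool, so please verify with `cephadm logs -h` on your version — arguments after `--` are passed through to journalctl, which means its time filters should apply. An untested command sketch, using the unit name from the `cephadm ls` output in this thread:

```shell
# Hedged sketch: restrict daemon logs to a date range by forwarding
# journalctl's --since/--until through cephadm (the passthrough after `--`
# is an assumption about this cephadm version).
cephadm logs --name mon.osswrkprbe001 -- --since "2020-10-16" --until "2020-10-20"

# Equivalent direct journalctl call against the mon's systemd unit:
journalctl -u ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001 \
  --since "2020-10-16"
```

If the journal only retains entries back to Oct 4 / Sep 30, older entries may simply have been rotated out; `journalctl --disk-usage` shows how much history is kept.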
[ceph-users] Problems with ceph command - Octopus - Ubuntu 16.04
Hello

I'm facing an issue with ceph. I cannot run any ceph command. It literally hangs. I need to hit CTRL-C to get this:

^CCluster connection interrupted or timed out

This is on Ubuntu 16.04. Also, I use Grafana with Prometheus to get information from the cluster, but now there is no data to graph. Any clue?

cephadm version
INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)

cephadm ls
[
  {
    "style": "cephadm:v1",
    "name": "mon.osswrkprbe001",
    "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
    "systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001",
    "enabled": true,
    "state": "running",
    "container_id": "afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",
    "container_image_name": "docker.io/ceph/ceph:v15.2.1",
    "container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
    "version": "15.2.1",
    "started": "2020-10-19T19:03:16.759730",
    "created": "2020-09-04T23:30:30.250336",
    "deployed": "2020-09-04T23:48:20.956277",
    "configured": "2020-09-04T23:48:22.100283"
  },
  {
    "style": "cephadm:v1",
    "name": "mgr.osswrkprbe001",
    "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
    "systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mgr.osswrkprbe001",
    "enabled": true,
    "state": "running",
    "container_id": "1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",
    "container_image_name": "docker.io/ceph/ceph:v15.2.1",
    "container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
    "version": "15.2.1",
    "started": "2020-10-19T20:43:38.329529",
    "created": "2020-09-04T23:30:31.110341",
    "deployed": "2020-09-04T23:47:41.604057",
    "configured": "2020-09-05T00:00:21.064246"
  }
]

Thank you in advance.

Saludos, EMANUEL CASTELLI Arquitecto de Información - Gerencia OSS C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecaste...@telecentro.net.ar Lavardén 157 1er piso. 
CABA (C1437FBC) ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: pool pgp_num not updated
OK, so for interventions, I've pushed these configs out (old value > new value):

ceph config set mon.* target_max_misplaced_ratio 0.05 > 0.20
ceph config set osd.* osd_max_backfills 1 > 4
ceph config set osd.* osd_recovery_max_active 1 > 4

And I also ran injectargs to push the changes to the OSDs hot. I'll monitor it for a bit to see how it reacts to the more aggressive settings. Thanks, Mac Wynkoop

On Tue, Oct 20, 2020 at 8:52 AM Eugen Block wrote:
> The default for max misplaced objects is this (5%):
>
> ceph-node1:~ # ceph config get mon target_max_misplaced_ratio
> 0.05
>
> You can increase this for the splitting process but I would recommend
> rolling back as soon as the splitting has finished.
>
> Zitat von Lindsay Mathieson :
>
> > On 20/10/2020 11:38 pm, Mac Wynkoop wrote:
> >> Autoscaler isn't on, what part of Ceph is handling the increase of pgp_num?
> >> Because I'd like to turn up the rate at which it splits the PGs, but if
> >> autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas?
> >
> > Normal recovery ops I imagine - Bump up the recovery settings, Max
> > Backfills and Recovery Max Active
> >
> > --
> > Lindsay
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: pool pgp_num not updated
The default for max misplaced objects is this (5%): ceph-node1:~ # ceph config get mon target_max_misplaced_ratio 0.05 You can increase this for the splitting process but I would recommend to rollback as soon as the splitting has finished. Zitat von Lindsay Mathieson : On 20/10/2020 11:38 pm, Mac Wynkoop wrote: Autoscaler isn't on, what part of Ceph is handling the increase of pgp_num? Because I'd like to turn up the rate at which it splits the PG's, but if autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas? Normal recovery ops I imagine - Bump up the recovery settings, Max Backfills and Recovery Max Active -- Lindsay ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: pool pgp_num not updated
On 20/10/2020 11:38 pm, Mac Wynkoop wrote: Autoscaler isn't on, what part of Ceph is handling the increase of pgp_num? Because I'd like to turn up the rate at which it splits the PG's, but if autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas? Normal recovery ops I imagine - Bump up the recovery settings, Max Backfills and Recovery Max Active -- Lindsay ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: pool pgp_num not updated
Alrighty, so we're all recovered and balanced at this point, but I'm now seeing this behavior: *pool 40 'hou-ec-1.rgw.buckets.data' erasure size 9 min_size 7 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1109 pgp_num_target 2048 last_change 8654141 lfor 0/0/8445757 flags hashpspool,ec_overwrites,nodelete stripe_width 24576 fast_read 1 application rgw* I don't have autoscaler enabled for the cluster, or this pool, but the pgp_num is slowly incrementing up to the pgp_num_target value. If autoscaler isn't on, what part of Ceph is handling the increase of pgp_num? Because I'd like to turn up the rate at which it splits the PGs, but if autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas? Thanks, Mac Wynkoop On Thu, Oct 8, 2020 at 8:16 AM Mac Wynkoop wrote: > OK, great. We'll keep tabs on it for now then and try again once we're > fully rebalanced. > Mac Wynkoop, Senior Datacenter Engineer > *NetDepot.com:* Cloud Servers; Delivered > Houston | Atlanta | NYC | Colorado Springs > > 1-844-25-CLOUD Ext 806 > > > > > On Thu, Oct 8, 2020 at 2:08 AM Eugen Block wrote: > >> Yes, after your cluster has recovered you'll be able to increase >> pgp_num. Or your change will be applied automatically since you >> already set it, I'm not sure but you'll see. >> >> >> Zitat von Mac Wynkoop : >> >> > Well, backfilling sure, but will it allow me to actually change the >> pgp_num >> > as more space frees up? Because the issue is that I cannot modify that >> > value. >> > >> > Thanks, >> > Mac Wynkoop, Senior Datacenter Engineer >> > *NetDepot.com:* Cloud Servers; Delivered >> > Houston | Atlanta | NYC | Colorado Springs >> > >> > 1-844-25-CLOUD Ext 806 >> > >> > >> > >> > >> > On Wed, Oct 7, 2020 at 1:50 PM Eugen Block wrote: >> > >> >> Yes, I think that’s exactly the reason. As soon as the cluster has >> >> more space the backfill will continue. 
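A sketch of how one might watch the mgr stepping pgp_num toward its target (the pool name is taken from the pool dump above; any pool with a pgp_num_target would do):

```shell
# pgp_num should creep toward pgp_num_target as the misplaced ratio
# drops below target_max_misplaced_ratio; check it periodically:
ceph osd pool get hou-ec-1.rgw.buckets.data pgp_num

# The full pool line, including pgp_num_target, is in the detailed listing:
ceph osd pool ls detail | grep hou-ec-1.rgw.buckets.data
```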
Zitat von Mac Wynkoop:

> The cluster is currently in a warn state, here's the scrubbed output of ceph -s:
>
>   cluster:
>     id:     redacted
>     health: HEALTH_WARN
>             noscrub,nodeep-scrub flag(s) set
>             22 nearfull osd(s)
>             2 pool(s) nearfull
>             Low space hindering backfill (add storage if this doesn't resolve itself): 277 pgs backfill_toofull
>             Degraded data redundancy: 32652738/3651947772 objects degraded (0.894%), 281 pgs degraded, 341 pgs undersized
>             1214 pgs not deep-scrubbed in time
>             2647 pgs not scrubbed in time
>             2 daemons have recently crashed
>
>   services:
>     mon:         5 daemons, redacted (age 44h)
>     mgr:         redacted
>     osd:         162 osds: 162 up (since 44h), 162 in (since 4d); 971 remapped pgs
>                  flags noscrub,nodeep-scrub
>     rgw:         3 daemons active redacted
>     tcmu-runner: 18 daemons active redacted
>
>   data:
>     pools:   10 pools, 2648 pgs
>     objects: 409.56M objects, 738 TiB
>     usage:   1.3 PiB used, 580 TiB / 1.8 PiB avail
>     pgs:     32652738/3651947772 objects degraded (0.894%)
>              517370913/3651947772 objects misplaced (14.167%)
>              1677 active+clean
>               477 active+remapped+backfill_wait
>               100 active+remapped+backfill_wait+backfill_toofull
>                80 active+undersized+degraded+remapped+backfill_wait
>                60 active+undersized+degraded+remapped+backfill_wait+backfill_toofull
>                42 active+undersized+degraded+remapped+backfill_toofull
>                33 active+undersized+degraded+remapped+backfilling
>                25 active+remapped+backfilling
>                25 active+remapped+backfill_toofull
>                24 active+undersized+remapped+backfilling
>                23 active+forced_recovery+undersized+degraded+remapped+backfill_wait
>                19 active+forced_recovery+undersized+degraded+remapped+backfill_wait+backfill_toofull
>                15 active+undersized+remapped+backfill_wait
>                14 active+undersized+remapped+backfill_wait+backfill_toofull
>                12 active+forced_recovery+undersized+degraded+remapped+backfill_toofull
>                12 active+forced_recovery+undersized+degraded+remapped+backfilling
>                 5 active+undersized+remapped+backfill_toofull
>                 3 active+remapped
>                 1 active+undersized+remapped
>                 1 active+forced_recovery+undersized+remapped+backfilling
>
>   io: (truncated in the original message)
[ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04
Your 'cephadm ls' output was only from one node, I assumed you just bootstrapped the first node. The 'cephadm logs' command should provide pager-output so you can scroll or search for a specific date. I'm not sure what caused this but "error on write" is bad. As I already wrote check the filesystems on your nodes, dmesg etc. It seems as if two of your MONs are down which would make your cluster unavailable (no quorum). Is mon3 up and running? Bringing back one of the other two MONs would bring the cluster back up. Zitat von Emanuel Alejandro Castelli : From MON1, dmesg I get this: [3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348086.232103] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) 
[3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348172.213118] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348173.205087] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348175.188934] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348179.412719] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348187.348441] libceph: mon1 192.168.14.151:6789 socket closed (con state CONNECTING) [3348201.683707] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348202.195745] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348203.187654] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348205.175585] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348209.363409] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [3348217.299298] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) But from MON2 I get this: [5242753.074620] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242761.266727] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242779.959468] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [5242806.834049] libceph: mon1 192.168.14.151:6789 socket error on write [5242808.049952] libceph: mon1 192.168.14.151:6789 socket error on write [5242809.041947] libceph: mon1 192.168.14.151:6789 socket error on write 
[5242811.057917] libceph: mon1 192.168.14.151:6789 socket error on write [5242815.285867] libceph: mon1 192.168.14.151:6789 socket error on write [5242824.241921] libceph: mon1 192.168.14.151:6789 socket error on write [5242837.554174] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242838.034339] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242839.026139] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242841.010177] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242845.234101] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242853.169905] libceph: mon2 192.168.14.152:6789 socket closed (con state CONNECTING) [5242870.102324] libceph: mon0 192.168.14.150:6789 socket closed (con state OPEN) [5242901.041812] libceph: mon2 192.168.14.152:6789 socket closed
[ceph-users] Re: Problems with ceph command - Octopus - Ubuntu 16.04
Your mon container seems up and running, have you tried restarting it? You just have one mon, is that correct? Do you see anything in the logs? cephadm logs --name mon.osswrkprbe001 How long do you wait until you hit CTRL-C? There's a connection-timeout option for ceph commands, maybe try a higher timeout? ceph --connect-timeout 60 status Is the node hosting the mon showing any issues in dmesg, df -h, syslog, etc.? Regards, Eugen Zitat von Emanuel Alejandro Castelli : Hello, I'm facing an issue with Ceph: I cannot run any ceph command, it literally hangs. I need to hit CTRL-C to get this: ^CCluster connection interrupted or timed out This is on Ubuntu 16.04. Also, I use Grafana with Prometheus to get information from the cluster, but now there is no data to graph. Any clue? cephadm version INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) cephadm ls [ { "style": "cephadm:v1", "name": "mon.osswrkprbe001", "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d", "systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001", "enabled": true, "state": "running", "container_id": "afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf", "container_image_name": "docker.io/ceph/ceph:v15.2.1", "container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2", "version": "15.2.1", "started": "2020-10-19T19:03:16.759730", "created": "2020-09-04T23:30:30.250336", "deployed": "2020-09-04T23:48:20.956277", "configured": "2020-09-04T23:48:22.100283" }, { "style": "cephadm:v1", "name": "mgr.osswrkprbe001", "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d", "systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mgr.osswrkprbe001", "enabled": true, "state": "running", "container_id": "1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8", "container_image_name": "docker.io/ceph/ceph:v15.2.1", "container_image_id": 
"bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2", "version": "15.2.1", "started": "2020-10-19T20:43:38.329529", "created": "2020-09-04T23:30:31.110341", "deployed": "2020-09-04T23:47:41.604057", "configured": "2020-09-05T00:00:21.064246" } ] Thank you in advance. Saludos, EMANUEL CASTELLI Arquitecto de Información - Gerencia OSS C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecaste...@telecentro.net.ar Lavardén 157 1er piso. CABA (C1437FBC) ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
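The checks Eugen suggests can be run in one pass on the affected node. This is a sketch; the daemon and systemd unit names are taken from the `cephadm ls` output quoted above:

```shell
# Per-daemon logs via cephadm
cephadm logs --name mon.osswrkprbe001

# Give the client more time before giving up
ceph --connect-timeout 60 status

# Host-level health: filesystems, kernel messages, container state
df -h
dmesg | tail -n 50
systemctl status ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001
```

If the mon container is running but the cluster still hangs, the next thing to check is whether the other mons are reachable, since a lost quorum makes every ceph command block.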
[ceph-users] ceph octopus centos7, containers, cephadm
I am running Nautilus on CentOS 7. Does Octopus run similarly to Nautilus, i.e.: - runs on el7/CentOS 7 - runs without containers by default - runs without cephadm by default ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph Octopus
I wonder if this would be impactful, even if `nodown` were set. When a given OSD latches onto the new replication network, I would expect it to want to use it for heartbeats — but when its heartbeat peers aren’t using the replication network yet, they won’t be reachable. I also expected at least some sort of impact, I just tested it in a virtual lab environment. But besides the temporary "down" OSDs during container restart the cluster was always responsive (although there's no client traffic). I didn't even set "nodown". But all OSDs now have a new backend address and the cluster seems to be happy. Regards, Eugen Zitat von Anthony D'Atri : I wonder if this would be impactful, even if `nodown` were set. When a given OSD latches onto the new replication network, I would expect it to want to use it for heartbeats — but when its heartbeat peers aren’t using the replication network yet, they won’t be reachable. Unless something has changed since I tried this with Luminous. On Oct 20, 2020, at 12:47 AM, Eugen Block wrote: Hi, a quick search [1] shows this: ---snip--- # set new config ceph config set global cluster_network 192.168.1.0/24 # let orchestrator reconfigure the daemons ceph orch daemon reconfig mon.host1 ceph orch daemon reconfig mon.host2 ceph orch daemon reconfig mon.host3 ceph orch daemon reconfig osd.1 ceph orch daemon reconfig osd.2 ceph orch daemon reconfig osd.3 ---snip--- I haven't tried it myself though. Regards, Eugen [1] https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm Zitat von Amudhan P : Hi, I have installed Ceph Octopus cluster using cephadm with a single network now I want to add a second network and configure it as a cluster address. How do I configure ceph to use second Network as cluster network?. 
Amudhan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
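After the reconfig, one way to confirm that OSDs actually moved their backend to the new subnet (a sketch: osd.1 is just an example ID, and `back_addr` is the field I'd expect in the OSD metadata on recent releases):

```shell
# Confirm the option is set cluster-wide
ceph config get global cluster_network

# Inspect one OSD's replication/heartbeat endpoints; they should now
# be on the new subnet after the daemon reconfig/restart
ceph osd metadata 1 | grep back_addr
```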
[ceph-users] RE Re: Recommended settings for PostgreSQL
I wanted to create a few stateful containers with mysql/postgres that did not depend on local persistent storage, so I can dynamically move them around. What about using: - a 1x replicated pool and use rbd mirror, - or having postgres use two 1x replicated pools - or upon task launch create an lvm mirror between a ceph rbd and a local drive and use the local drive as the primary access device (if that is even possible with lvm) -Original Message- Cc: ceph-users@ceph.io Subject: *SPAM* [ceph-users] Re: Recommended settings for PostgreSQL Another option is to let PostgreSQL do the replication with local storage. There are great reasons for Ceph, but databases optimize for this kind of thing extremely well. With replication in hand, run snapshots to RADOS buckets for long term storage. > > Hi, > > I have an existing few RBDs. I would like to create a new RBD Image > for PostgreSQL. Do you have any suggestions for such use cases? For > example; > > Currently defaults are: > > Object size (4MB) and Stripe Unit (None) > Features: Deep flatten + Layering + Exclusive Lock + Object Map + > FastDiff > > Should I use as is or should I use 16KB of object size and different sets of features for PostgreSQL? > > Thanks, > Gencer. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
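As a sketch of Gencer's question about object size and features (pool and image names here are hypothetical), an RBD image with a 16K object size and the feature set listed above could be created like this:

```shell
# 16K objects instead of the 4M default; feature names as the rbd CLI
# expects them (deep-flatten, layering, exclusive-lock, object-map, fast-diff)
rbd create rbd/pgdata --size 100G --object-size 16K \
  --image-feature layering,exclusive-lock,object-map,fast-diff,deep-flatten

# Verify what was actually created
rbd info rbd/pgdata
```

Note that object-map and fast-diff depend on exclusive-lock, so the features have to be enabled together as shown; whether a 16K object size actually helps a PostgreSQL workload is worth benchmarking rather than assuming.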
[ceph-users] Re: Ceph Octopus
I wonder if this would be impactful, even if `nodown` were set. When a given OSD latches onto the new replication network, I would expect it to want to use it for heartbeats — but when its heartbeat peers aren’t using the replication network yet, they won’t be reachable. Unless something has changed since I tried this with Luminous. > On Oct 20, 2020, at 12:47 AM, Eugen Block wrote: > > Hi, > > a quick search [1] shows this: > > ---snip--- > # set new config > ceph config set global cluster_network 192.168.1.0/24 > > # let orchestrator reconfigure the daemons > ceph orch daemon reconfig mon.host1 > ceph orch daemon reconfig mon.host2 > ceph orch daemon reconfig mon.host3 > ceph orch daemon reconfig osd.1 > ceph orch daemon reconfig osd.2 > ceph orch daemon reconfig osd.3 > ---snip--- > > I haven't tried it myself though. > > Regards, > Eugen > > [1] > https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm > > > Zitat von Amudhan P : > >> Hi, >> >> I have installed Ceph Octopus cluster using cephadm with a single network >> now I want to add a second network and configure it as a cluster address. >> >> How do I configure ceph to use second Network as cluster network?. >> >> Amudhan >> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph Octopus
Hi, a quick search [1] shows this: ---snip--- # set new config ceph config set global cluster_network 192.168.1.0/24 # let orchestrator reconfigure the daemons ceph orch daemon reconfig mon.host1 ceph orch daemon reconfig mon.host2 ceph orch daemon reconfig mon.host3 ceph orch daemon reconfig osd.1 ceph orch daemon reconfig osd.2 ceph orch daemon reconfig osd.3 ---snip--- I haven't tried it myself though. Regards, Eugen [1] https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm Zitat von Amudhan P : Hi, I have installed Ceph Octopus cluster using cephadm with a single network now I want to add a second network and configure it as a cluster address. How do I configure ceph to use second Network as cluster network?. Amudhan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Mon DB compaction MON_DISK_BIG
Okay, thank you very much. From: Anthony D'Atri Sent: Tuesday, October 20, 2020 9:32 AM To: Szabo, Istvan (Agoda) Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: Mon DB compaction MON_DISK_BIG Email received from outside the company. If in doubt don't click links nor open attachments! > > Hi, > > Yeah, sequentially and waited for finish, and it looks like it is still doing > something in the background because now it is 9.5GB even if it tells > compaction done. > I think the ceph tell compact initiated harder so not sure how far it will go > down, but looks promising. When I sent the email it was 13, now 9.5. Online compaction isn’t as fast as offline compaction. If you set mon_compact_on_start = true in ceph.conf the mons will compact more efficiently before joining the quorum. This means of course that they’ll take longer to start up and become active. Arguably this should > 1 osd is down long time and but that one I want to remove from the cluster > soon, all pgs are active clean. There’s an issue with at least some versions of Luminous where having down/out OSDs confounds comnpaction. If you don’t end up soon with the mon DB size you expect, try removing or replacing that OSD and I’ll bet you have better results. — aad > > mon stat same yes. > > now I fininshed the email it is 8.7Gb. > > I hope I didn't break anything and it will delete everything. > > Thank you > > From: Anthony D'Atri > Sent: Tuesday, October 20, 2020 9:13 AM > To: ceph-users@ceph.io > Cc: Szabo, Istvan (Agoda) > Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG > > Email received from outside the company. If in doubt don't click links nor > open attachments! > > > I hope you restarted those mons sequentially, waiting between each for the > quorum to return. > > Is there any recovery or pg autoscaling going on? > > Are all OSDs up/in, ie. are the three numbers returned by `ceph osd stat` the > same? 
> > — aad > >> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda) >> wrote: >> >> Hi, >> >> >> I've received a warning today morning: >> >> >> HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a >> lot of disk space >> MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a >> lot of disk space >> mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) >> mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) >> mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) >> >> It hits the 15GB so I've restarted all the 3 mons, it triggered compaction. >> >> I've also ran this command: >> >> ceph tell mon.`hostname -s` compact on the first node, but it wents down >> only to 13GB. >> >> >> du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ >> 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ >> 13G total >> >> >> Anything else I can do to reduce it? >> >> >> Luminous 12.2.8 is the version. >> >> >> Thank you in advance. >> >> >> >> This message is confidential and is for the sole use of the intended >> recipient(s). It may also be privileged or otherwise protected by copyright >> or other legal rules. If you have received it by mistake please let us know >> by reply email and delete it from your system. It is prohibited to copy this >> message or disclose its content to anyone. Any confidentiality or privilege >> is not waived or lost by any mistaken delivery or unauthorized disclosure of >> the message. All messages sent to and from Agoda may be monitored to ensure >> compliance with company policies, to protect the company's interests and to >> remove potential malware. Electronic messages may be intercepted, amended, >> lost or deleted, or contain viruses. 
>> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Mon DB compaction MON_DISK_BIG
Hi, Yeah, sequentially, and I waited for each to finish, and it looks like it is still doing something in the background, because now it is 9.5GB even though it says compaction is done. I think the ceph tell compact initiated it harder, so I'm not sure how far it will go down, but it looks promising. When I sent the email it was 13, now 9.5. One OSD has been down a long time, but that one I want to remove from the cluster soon; all PGs are active+clean. mon stat same yes. Now that I've finished the email it is 8.7GB. I hope I didn't break anything and it will delete everything. Thank you From: Anthony D'Atri Sent: Tuesday, October 20, 2020 9:13 AM To: ceph-users@ceph.io Cc: Szabo, Istvan (Agoda) Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG Email received from outside the company. If in doubt don't click links nor open attachments! I hope you restarted those mons sequentially, waiting between each for the quorum to return. Is there any recovery or pg autoscaling going on? Are all OSDs up/in, ie. are the three numbers returned by `ceph osd stat` the same? — aad > On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda) > wrote: > > Hi, > > > I've received a warning today morning: > > > HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot > of disk space > MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a > lot of disk space >mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) >mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) >mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) > > It hits the 15GB so I've restarted all the 3 mons, it triggered compaction. > > I've also ran this command: > > ceph tell mon.`hostname -s` compact on the first node, but it wents down only > to 13GB. > > > du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ > 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ > 13G total > > > Anything else I can do to reduce it? > > > Luminous 12.2.8 is the version. > > > Thank you in advance. 
> > > > This message is confidential and is for the sole use of the intended > recipient(s). It may also be privileged or otherwise protected by copyright > or other legal rules. If you have received it by mistake please let us know > by reply email and delete it from your system. It is prohibited to copy this > message or disclose its content to anyone. Any confidentiality or privilege > is not waived or lost by any mistaken delivery or unauthorized disclosure of > the message. All messages sent to and from Agoda may be monitored to ensure > compliance with company policies, to protect the company's interests and to > remove potential malware. Electronic messages may be intercepted, amended, > lost or deleted, or contain viruses. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Mon DB compaction MON_DISK_BIG
Hi, I've received a warning this morning: HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) It hit the 15GB threshold, so I restarted all 3 mons, which triggered compaction. I've also run this command on the first node: ceph tell mon.`hostname -s` compact, but it only went down to 13GB. du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ 13G total Anything else I can do to reduce it? Luminous 12.2.8 is the version. Thank you in advance. This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
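The compaction steps discussed in this thread, as a sketch to run on each mon in turn (waiting for quorum to return between nodes; `mon_compact_on_start` is the startup-time alternative mentioned earlier in the thread):

```shell
# Trigger an online compaction on the local mon
ceph tell mon.$(hostname -s) compact

# Check the store size afterwards
du -sch /var/lib/ceph/mon/ceph-$(hostname -s)/store.db/

# Alternative: compact during mon startup, which compacts more thoroughly
# but delays the mon rejoining quorum; add to ceph.conf before restarting:
#   [mon]
#   mon_compact_on_start = true
```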