[ceph-users] Re: Ceph OIDC Integration

2020-10-20 Thread Pritha Srivastava
Hello,

The next Octopus release should be out in 3-4 weeks.

In Octopus, shadow users aren't created for federated OIDC users. We later
realised that shadow users are needed to maintain user stats, so that code is
being added now and should be available in the Pacific release.

We have also done away with the token introspection URL in the latest code
and switched over to offline token validation using the IDP's certs, since a
token introspection URL wouldn't scale well for multiple clients.
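
As an illustration (not something from the original mail): with the role-based
flow, an S3/STS client exchanges the OIDC token for temporary credentials by
calling AssumeRoleWithWebIdentity against the RGW endpoint. Assuming a role
named S3Access already exists, and with the endpoint and token as placeholders,
that looks roughly like this with the AWS CLI:

  # aws sts assume-role-with-web-identity \
        --endpoint-url http://rgw.example.com:8000 \
        --role-arn "arn:aws:iam:::role/S3Access" \
        --role-session-name oidc-test \
        --web-identity-token "$OIDC_ACCESS_TOKEN"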

There is a related Ceph Tech Talk that you can watch here:
https://www.youtube.com/watch?v=Lc32meILfNI&t=410s

Thanks,
Pritha



On Mon, Oct 19, 2020 at 8:30 PM  wrote:

> Dear Pritha, thanks a lot for your feedback and apologies for missing your
> comment about the backporting. Would you have a rough estimate on the next
> Octopus release by any chance?
>
> On another note on the same subject, would you be able to give us some
> feedback on how the users will be created in Ceph? (for example when we
> used ldap, an ldap user used to be created in Ceph for "mapping", will it
> be the same in this case)
>
> If we have multiple tenants (unique usernames "emails" in KeyCloak), how
> will the introspect URLs be defined for different tenants?
>
> Thanks in advance
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-20 Thread Frank Schilder
Dear Michael,

> > Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an 
> > OSD mapping?

I meant here with crush rule replicated_host_nvme. Sorry, forgot.


> Yes, the OSD was still out when the previous health report was created.

Hmm, this is odd. If this is correct, then it did report a slow op even though 
it was out of the cluster:

> from https://pastebin.com/3G3ij9ui:
> [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 8133 sec, daemons 
> [osd.0,osd.41] have slow ops.

Not sure what to make of that. It looks almost like you have a ghost osd.41.


I think (some of) the slow ops you are seeing are directed to the 
health_metrics pool and can be ignored. If it is too annoying, you could try to 
find out who runs the client with ID client.7524484 and disable it. It might be 
an MGR module.


Looking at the data you provided and also some older threads of yours 
(https://www.mail-archive.com/ceph-users@ceph.io/msg05842.html), I'm starting to 
think that we are looking at the fallout of a past admin operation. A 
possibility is that an upmap for PG 1.0 exists that conflicts with the crush 
rule replicated_host_nvme and, hence, prevents the assignment of OSDs to PG 
1.0. For example, the upmap specifies HDDs, but the crush rule requires NVMes. 
The result is an empty set.

I couldn't really find a simple command to list up-maps. The only 
non-destructive way seems to be to extract the osdmap and create a clean-up 
command file. The cleanup file should contain a command for every PG with an 
upmap. To check this, you can execute (see also 
https://docs.ceph.com/en/latest/man/8/osdmaptool/)

  # ceph osd getmap > osd.map
  # osdmaptool osd.map --upmap-cleanup cleanup.cmd

If you do this, could you please post as usual the contents of cleanup.cmd?

Also, with the OSD map of your cluster, you can simulate certain admin 
operations and check resulting PG mappings for pools and other things without 
having to touch the cluster; see 
https://docs.ceph.com/en/latest/man/8/osdmaptool/.
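
For example (my own illustration; substitute the PG you care about), the map
can be queried for the OSDs it would assign to PG 1.0 without touching the
cluster:

  # ceph osd getmap > osd.map
  # osdmaptool osd.map --test-map-pg 1.0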


To dig a little bit deeper, could you please post as usual the output of:

- ceph pg 1.0 query
- ceph pg 7.39d query

It would also be helpful if you could post the decoded crush map. You can get 
the map as a txt-file as follows:

  # ceph osd getcrushmap -o crush-orig.bin
  # crushtool -d crush-orig.bin -o crush.txt

and post the contents of file crush.txt.


Did the slow MDS request complete by now?

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

Contents of previous messages removed.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Huge RAM Usage on OSD recovery

2020-10-20 Thread Ing . Luis Felipe Domínguez Vega
Hi, today my infra provider had a blackout, and then Ceph tried to 
recover but is stuck in an inconsistent state, because many OSDs cannot recover 
by themselves: the kernel kills them via OOM. Even now an OSD that was OK 
went down, OOM-killed.


Even on a server with 32GB RAM the OSD uses ALL of it and never recovers. I 
think this could be a memory leak; ceph version is Octopus 15.2.3.


In https://pastebin.pl/view/59089adc
you can see that buffer_anon gets to 32GB, but why? All of my cluster is down 
because of that.
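
One knob that is sometimes used to cap OSD memory (assuming BlueStore OSDs;
not something discussed in this thread, and the value here is only an example
of 2 GiB per OSD) is the per-OSD memory target:

  # ceph config set osd osd_memory_target 2147483648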

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-20 Thread Michael Thomas

On 10/20/20 1:18 PM, Frank Schilder wrote:

Dear Michael,


Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD 
mapping?


I meant here with crush rule replicated_host_nvme. Sorry, forgot.


Seems to have worked fine:

https://pastebin.com/PFgDE4J1


Yes, the OSD was still out when the previous health report was created.


Hmm, this is odd. If this is correct, then it did report a slow op even though 
it was out of the cluster:


from https://pastebin.com/3G3ij9ui:
[WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 8133 sec, daemons 
[osd.0,osd.41] have slow ops.


Not sure what to make of that. It looks almost like you have a ghost osd.41.


I think (some of) the slow ops you are seeing are directed to the 
health_metrics pool and can be ignored. If it is too annoying, you could try to 
find out who runs the client with IDs client.7524484 and disable it. Might be 
an MGR module.


I'm also pretty certain that the slow ops are related to the health 
metrics pool, which is why I've been ignoring them.


What I'm not sure about is whether re-creating the device_health_metrics 
pool will cause any problems in the ceph cluster.



Looking at the data you provided and also some older threads of yours 
(https://www.mail-archive.com/ceph-users@ceph.io/msg05842.html), I start 
considering that we are looking at the fall-out of a past admin operation. A 
possibility is, that an upmap for PG 1.0 exists that conflicts with the crush 
rule replicated_host_nvme and, hence, prevents the assignment of OSDs to PG 
1.0. For example, the upmap specifies HDDs, but the crush rule required NVMEs. 
This result is an empty set.


So far I've been unable to locate the client with the ID 7524484.  It's 
not showing up in the manager dashboard -> Filesystems page, nor in the 
output of 'ceph tell mds.ceph1 client ls'.


I'm digging through the compressed logs for the past week to see if I can 
find the culprit.
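
One more place worth checking (just a suggestion, assuming the mon runs on
ceph1 and its admin socket is reachable there) is the mon session list, which
shows the address each client connects from:

  # ceph daemon mon.ceph1 sessions | grep 7524484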



I couldn't really find a simple command to list up-maps. The only 
non-destructive way seems to be to extract the osdmap and create a clean-up 
command file. The cleanup file should contain a command for every PG with an 
upmap. To check this, you can execute (see also 
https://docs.ceph.com/en/latest/man/8/osdmaptool/)

   # ceph osd getmap > osd.map
   # osdmaptool osd.map --upmap-cleanup cleanup.cmd

If you do this, could you please post as usual the contents of cleanup.cmd?


It was empty:

[root@ceph1 ~]# ceph osd getmap > osd.map
got osdmap epoch 52833

[root@ceph1 ~]# osdmaptool osd.map --upmap-cleanup cleanup.cmd
osdmaptool: osdmap file 'osd.map'
writing upmap command output to: cleanup.cmd
checking for upmap cleanups

[root@ceph1 ~]# wc cleanup.cmd
0 0 0 cleanup.cmd


Also, with the OSD map of your cluster, you can simulate certain admin 
operations and check resulting PG mappings for pools and other things without 
having to touch the cluster; see 
https://docs.ceph.com/en/latest/man/8/osdmaptool/.


To dig a little bit deeper, could you please post as usual the output of:

- ceph pg 1.0 query
- ceph pg 7.39d query


Oddly, it claims that it doesn't have pgid 1.0.

https://pastebin.com/pHh33Dq7


It would also be helpful if you could post the decoded crush map. You can get 
the map as a txt-file as follows:

   # ceph osd getcrushmap -o crush-orig.bin
   # crushtool -d crush-orig.bin -o crush.txt

and post the contents of file crush.txt.


https://pastebin.com/EtEGpWy3


Did the slow MDS request complete by now?


Nope.

--Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v14.2.12 Nautilus released

2020-10-20 Thread David Galloway
This is the 12th backport release in the Nautilus series. This release
brings a number of bugfixes across all major components of Ceph. We
recommend that all Nautilus users upgrade to this release. For detailed
release notes with links & changelog please
refer to the official blog entry at
https://ceph.io/releases/v14-2-12-nautilus-released


Notable Changes
---
* The `ceph df` command now lists the number of pgs in each pool.

* Monitors now have a config option `mon_osd_warn_num_repaired`, 10 by
default. If any OSD has repaired more than this many I/O errors in
stored data, an `OSD_TOO_MANY_REPAIRS` health warning is generated. In
order to allow clearing of the warning, a new command `ceph tell osd.#
clear_shards_repaired [count]` has been added. By default it will set
the repair count to 0. If you wanted to be warned again if additional
repairs are performed you can provide a value to the command and specify
the value of `mon_osd_warn_num_repaired`. This command will be replaced
in future releases by the health mute/unmute feature.
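
For example (osd.5 and the numeric values are only illustrative), raising the
warning threshold, clearing the repair counter, or presetting it to a chosen
count would look like:

  # ceph config set mon mon_osd_warn_num_repaired 20
  # ceph tell osd.5 clear_shards_repaired
  # ceph tell osd.5 clear_shards_repaired 15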

* It is now possible to specify the initial monitor to contact for Ceph
tools and daemons using the `mon_host_override` config option or
`--mon-host-override ` command-line switch. This generally should
only be used for debugging and only affects initial communication with
Ceph’s monitor cluster.
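
An illustrative invocation (the address is a placeholder) would be:

  # ceph -s --mon-host-override 192.168.1.10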


Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.12.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 2f3caa3b8b3d5c5f2719a1e9d8e7deea5ae1a5c6
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04

2020-10-20 Thread Emanuel Alejandro Castelli
Hello Eugen,

Rebooting the other two MONs fixed the problem.

root@osswrkprbe001:~# ceph status
  cluster:
id: 56820176-ae5b-4e58-84a2-442b2fc03e6d
health: HEALTH_OK
 
  services:
mon: 3 daemons, quorum osswrkprbe001,osswrkprbe002,osswrkprbe003 (age 3m)
mgr: osswrkprbe002(active, since 6m), standbys: osswrkprbe001, osswrkprbe003
osd: 3 osds: 3 up, 3 in
 
  data:
pools:   6 pools, 162 pgs
objects: 16.27k objects, 58 GiB
usage:   173 GiB used, 2.0 TiB / 2.2 TiB avail
pgs: 162 active+clean
 
  io:
client:   341 B/s rd, 473 KiB/s wr, 0 op/s rd, 43 op/s wr


Saludos, 



EMANUEL CASTELLI 

Arquitecto de Información - Gerencia OSS 

C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | 
ecaste...@telecentro.net.ar 

Lavardén 157 1er piso. CABA (C1437FBC)

- Original Message -
From: "Emanuel Alejandro Castelli" 
To: "Eugen Block" 
Cc: "ceph-users" 
Sent: Tuesday, October 20, 2020 10:29:05 AM
Subject: Re: [ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 
16.04

And the same for MON3

[5243018.443159] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243033.801504] libceph: mon2 192.168.14.152:6789 socket error on write
[5243034.473450] libceph: mon2 192.168.14.152:6789 socket error on write
[5243035.497397] libceph: mon2 192.168.14.152:6789 socket error on write
[5243037.481225] libceph: mon2 192.168.14.152:6789 socket error on write
[5243041.480864] libceph: mon2 192.168.14.152:6789 socket error on write
[5243049.672236] libceph: mon2 192.168.14.152:6789 socket error on write
[5243064.519492] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243065.479388] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243066.471478] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243068.455281] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243072.454806] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243080.646202] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243095.236682] libceph: mon2 192.168.14.152:6789 socket error on write
[5243096.484606] libceph: mon2 192.168.14.152:6789 socket error on write
[5243097.476518] libceph: mon2 192.168.14.152:6789 socket error on write
[5243099.492380] libceph: mon2 192.168.14.152:6789 socket error on write
[5243103.684014] libceph: mon2 192.168.14.152:6789 socket error on write
[5243111.619439] libceph: mon2 192.168.14.152:6789 socket error on write


Saludos, 



EMANUEL CASTELLI 

Arquitecto de Información - Gerencia OSS 

C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | 
ecaste...@telecentro.net.ar 

Lavardén 157 1er piso. CABA (C1437FBC)

- Original Message -
From: "Emanuel Alejandro Castelli" 
To: "Eugen Block" 
Cc: "ceph-users" 
Sent: Tuesday, October 20, 2020 10:27:15 AM
Subject: Re: [ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 
16.04

From MON1, dmesg I get this:

[3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348086.232103] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed (con state 
OPEN)
[3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed (con state 

[ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04

2020-10-20 Thread Emanuel Alejandro Castelli
And the same for MON3

[5243018.443159] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243033.801504] libceph: mon2 192.168.14.152:6789 socket error on write
[5243034.473450] libceph: mon2 192.168.14.152:6789 socket error on write
[5243035.497397] libceph: mon2 192.168.14.152:6789 socket error on write
[5243037.481225] libceph: mon2 192.168.14.152:6789 socket error on write
[5243041.480864] libceph: mon2 192.168.14.152:6789 socket error on write
[5243049.672236] libceph: mon2 192.168.14.152:6789 socket error on write
[5243064.519492] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243065.479388] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243066.471478] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243068.455281] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243072.454806] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243080.646202] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[5243095.236682] libceph: mon2 192.168.14.152:6789 socket error on write
[5243096.484606] libceph: mon2 192.168.14.152:6789 socket error on write
[5243097.476518] libceph: mon2 192.168.14.152:6789 socket error on write
[5243099.492380] libceph: mon2 192.168.14.152:6789 socket error on write
[5243103.684014] libceph: mon2 192.168.14.152:6789 socket error on write
[5243111.619439] libceph: mon2 192.168.14.152:6789 socket error on write


Saludos, 



EMANUEL CASTELLI 

Arquitecto de Información - Gerencia OSS 

C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | 
ecaste...@telecentro.net.ar 

Lavardén 157 1er piso. CABA (C1437FBC)

- Original Message -
From: "Emanuel Alejandro Castelli" 
To: "Eugen Block" 
Cc: "ceph-users" 
Sent: Tuesday, October 20, 2020 10:27:15 AM
Subject: Re: [ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 
16.04

From MON1, dmesg I get this:

[3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348086.232103] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed (con state 
OPEN)
[3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348172.213118] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348173.205087] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348175.188934] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348179.412719] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348187.348441] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348201.683707] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348202.195745] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348203.187654] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348205.175585] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348209.363409] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348217.299298] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)

But from MON2 I get this:

[5242753.074620] libceph: mon2 

[ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04

2020-10-20 Thread Emanuel Alejandro Castelli
From MON1, dmesg I get this:

[3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348086.232103] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed (con state 
OPEN)
[3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348172.213118] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348173.205087] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348175.188934] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348179.412719] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348187.348441] libceph: mon1 192.168.14.151:6789 socket closed (con state 
CONNECTING)
[3348201.683707] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348202.195745] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348203.187654] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348205.175585] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348209.363409] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[3348217.299298] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)

But from MON2 I get this:

[5242753.074620] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242761.266727] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242779.959468] libceph: mon0 192.168.14.150:6789 socket closed (con state 
OPEN)
[5242806.834049] libceph: mon1 192.168.14.151:6789 socket error on write
[5242808.049952] libceph: mon1 192.168.14.151:6789 socket error on write
[5242809.041947] libceph: mon1 192.168.14.151:6789 socket error on write
[5242811.057917] libceph: mon1 192.168.14.151:6789 socket error on write
[5242815.285867] libceph: mon1 192.168.14.151:6789 socket error on write
[5242824.241921] libceph: mon1 192.168.14.151:6789 socket error on write
[5242837.554174] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242838.034339] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242839.026139] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242841.010177] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242845.234101] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242853.169905] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242870.102324] libceph: mon0 192.168.14.150:6789 socket closed (con state 
OPEN)
[5242901.041812] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242902.033763] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242903.026350] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242905.009497] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242909.233740] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242917.169724] libceph: mon2 192.168.14.152:6789 socket closed (con state 
CONNECTING)
[5242931.761103] libceph: mon1 192.168.14.151:6789 socket error on write
[5242932.049095] libceph: mon1 192.168.14.151:6789 socket error on write

[ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04

2020-10-20 Thread Emanuel Alejandro Castelli
I have 3 MONs; I don't know why it's showing only one. 

root@osswrkprbe001:~# ceph --connect-timeout 60 status
Cluster connection interrupted or timed out

cephadm logs --name mon.osswrkprbe001 --> Is there any way to go to a specific 
date? It starts from Oct 4, and I want to check from Oct 16 onward. I suspect 
that something happened that day.

Also, I don't know how to troubleshoot this. I ran the same (./cephadm logs 
--name mon.osswrkprbe002) on the second MON, but it starts the logs from Sep 30. 
I would need to check Oct 16 there as well.
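
(A side note, not from the original mail: since cephadm runs each daemon under
a systemd unit like the ones shown in the `cephadm ls` output, jumping to a
date should also be possible with journalctl directly, adjusting the unit name
per host/daemon, e.g.)

  # journalctl -u ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001 \
        --since "2020-10-16" --until "2020-10-18"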

I would appreciate if you can help me with the troubleshooting.

Thank you.

Saludos, 



EMANUEL CASTELLI 

Arquitecto de Información - Gerencia OSS 

C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | 
ecaste...@telecentro.net.ar 

Lavardén 157 1er piso. CABA (C1437FBC)

- Original Message -
From: "Eugen Block" 
To: "ceph-users" 
Sent: Tuesday, October 20, 2020 10:02:35 AM
Subject: [ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04

Your mon container seems up and running, have you tried restarting it?  
You just have one mon, is that correct? Do you see anything in the logs?

cephadm logs --name mon.osswrkprbe001

How long do you wait until you hit CTRL-C? There's a  
connection-timeout option for ceph commands, maybe try a higher timeout?

ceph --connect-timeout 60 status

Is the node hosting the mon showing any issues in dmesg, df -h, syslog, etc.?

Regards,
Eugen


Zitat von Emanuel Alejandro Castelli :

> Hello
>
>
> I'm facing an issue with ceph. I cannot run any ceph command. It  
> literally hangs. I need to hit CTRL-C to get this:
>
>
>
>
> ^CCluster connection interrupted or timed out
>
>
>
>
> This is on Ubuntu 16.04. Also, I use Graphana with Prometheus to get  
> information from the cluster, but now there is no data to graph. Any  
> clue?
>
>
> cephadm version
>
> INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version  
> 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
>
> cephadm ls
> [
> {
> "style": "cephadm:v1",
> "name": "mon.osswrkprbe001",
> "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
> "systemd_unit":  
> "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001",
> "enabled": true,
> "state": "running",
> "container_id":  
> "afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",
> "container_image_name": "docker.io/ceph/ceph:v15.2.1",
> "container_image_id":  
> "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
> "version": "15.2.1",
> "started": "2020-10-19T19:03:16.759730",
> "created": "2020-09-04T23:30:30.250336",
> "deployed": "2020-09-04T23:48:20.956277",
> "configured": "2020-09-04T23:48:22.100283"
> },
> {
> "style": "cephadm:v1",
> "name": "mgr.osswrkprbe001",
> "fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
> "systemd_unit":  
> "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mgr.osswrkprbe001",
> "enabled": true,
> "state": "running",
> "container_id":  
> "1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",
> "container_image_name": "docker.io/ceph/ceph:v15.2.1",
> "container_image_id":  
> "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
> "version": "15.2.1",
> "started": "2020-10-19T20:43:38.329529",
> "created": "2020-09-04T23:30:31.110341",
> "deployed": "2020-09-04T23:47:41.604057",
> "configured": "2020-09-05T00:00:21.064246"
> }
> ]
>
>
> Thank you in advance.
>
>
> Saludos,
>
>
>
> EMANUEL CASTELLI
>
> Arquitecto de Información - Gerencia OSS
>
> C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 |  
> ecaste...@telecentro.net.ar
>
> Lavardén 157 1er piso. CABA (C1437FBC)
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Problems with ceph command - Octupus - Ubuntu 16.04

2020-10-20 Thread Emanuel Alejandro Castelli



Hello 


I'm facing an issue with ceph. I cannot run any ceph command. It literally 
hangs. I need to hit CTRL-C to get this: 




^CCluster connection interrupted or timed out 




This is on Ubuntu 16.04. Also, I use Grafana with Prometheus to get 
information from the cluster, but now there is no data to graph. Any clue? 


cephadm version 

INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version 15.2.4 
(7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) 

cephadm ls
[
{
"style": "cephadm:v1",
"name": "mon.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": 
"ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": 
"afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": 
"bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T19:03:16.759730",
"created": "2020-09-04T23:30:30.250336",
"deployed": "2020-09-04T23:48:20.956277",
"configured": "2020-09-04T23:48:22.100283"
},
{
"style": "cephadm:v1",
"name": "mgr.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": 
"ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mgr.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": 
"1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": 
"bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T20:43:38.329529",
"created": "2020-09-04T23:30:31.110341",
"deployed": "2020-09-04T23:47:41.604057",
"configured": "2020-09-05T00:00:21.064246"
}
] 


Thank you in advance. 


Saludos, 



EMANUEL CASTELLI 

Arquitecto de Información - Gerencia OSS 

C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | 
ecaste...@telecentro.net.ar 

Lavardén 157 1er piso. CABA (C1437FBC) 


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pool pgp_num not updated

2020-10-20 Thread Mac Wynkoop
OK, so for interventions, I've pushed these configs out:

ceph config set mon.* target_max_misplaced_ratio   (0.05 -> 0.20)
ceph config set osd.* osd_max_backfills            (1 -> 4)
ceph config set osd.* osd_recovery_max_active      (1 -> 4)

And also ran injectargs to push the changes to the OSDs hot. I'll monitor
it for a bit to see how it reacts to the more aggressive settings.
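
For reference (not part of the original mail; the values only illustrate the
change described above), pushing the two OSD settings to the running daemons
without a restart can be done like this:

  # ceph tell osd.* injectargs '--osd_max_backfills 4 --osd_recovery_max_active 4'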

Thanks,

Mac Wynkoop





On Tue, Oct 20, 2020 at 8:52 AM Eugen Block  wrote:

> The default for max misplaced objects is this (5%):
>
> ceph-node1:~ # ceph config get mon target_max_misplaced_ratio
> 0.05
>
> You can increase this for the splitting process but I would recommend
> to rollback as soon as the splitting has finished.
>
>
> Zitat von Lindsay Mathieson :
>
> > On 20/10/2020 11:38 pm, Mac Wynkoop wrote:
> >> Autoscaler isn't on, what part of Ceph is handling the increase of
> pgp_num?
> >> Because I'd like to turn up the rate at which it splits the PG's, but if
> >> autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas?
> >
> > Normal recovery ops I imagine - Bump up the recovery settings, Max
> > Backfills and Recovery Max Active
> >
> > --
> > Lindsay
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pool pgp_num not updated

2020-10-20 Thread Eugen Block

The default for max misplaced objects is this (5%):

ceph-node1:~ # ceph config get mon target_max_misplaced_ratio
0.05

You can increase this for the splitting process, but I would recommend  
rolling it back as soon as the splitting has finished.
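
A possible sequence (the 0.2 is only an example) would be:

  # ceph config set mon target_max_misplaced_ratio 0.2
  ... wait for the pgp_num increase / splitting to finish ...
  # ceph config rm mon target_max_misplaced_ratio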



Zitat von Lindsay Mathieson :


On 20/10/2020 11:38 pm, Mac Wynkoop wrote:

Autoscaler isn't on, what part of Ceph is handling the increase of pgp_num?
Because I'd like to turn up the rate at which it splits the PG's, but if
autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas?


Normal recovery ops I imagine - Bump up the recovery settings, Max  
Backfills and Recovery Max Active


--
Lindsay
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pool pgp_num not updated

2020-10-20 Thread Lindsay Mathieson

On 20/10/2020 11:38 pm, Mac Wynkoop wrote:

Autoscaler isn't on, what part of Ceph is handling the increase of pgp_num?
Because I'd like to turn up the rate at which it splits the PG's, but if
autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas?


Normal recovery ops I imagine - Bump up the recovery settings, Max 
Backfills and Recovery Max Active


--
Lindsay
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pool pgp_num not updated

2020-10-20 Thread Mac Wynkoop
Alrighty, so we're all recovered and balanced at this point, but I'm not
seeing this behavior:


pool 40 'hou-ec-1.rgw.buckets.data' erasure size 9 min_size 7 crush_rule 2
object_hash rjenkins pg_num 2048 pgp_num 1109 pgp_num_target 2048
last_change 8654141 lfor 0/0/8445757 flags
hashpspool,ec_overwrites,nodelete stripe_width 24576 fast_read 1
application rgw
I don't have autoscaler enabled for the cluster, or this pool, but the
pgp_num is slowly incrementing up to the pgp_num_target value. If
Autoscaler isn't on, what part of Ceph is handling the increase of pgp_num?
Because I'd like to turn up the rate at which it splits the PG's, but if
autoscaler isn't doing it, I'd have no clue what to adjust. Any ideas?

Thanks,
Mac Wynkoop





On Thu, Oct 8, 2020 at 8:16 AM Mac Wynkoop  wrote:

> OK, great. We'll keep tabs on it for now then and try again once we're
> fully rebalanced.
> Mac Wynkoop, Senior Datacenter Engineer
> *NetDepot.com:* Cloud Servers; Delivered
> Houston | Atlanta | NYC | Colorado Springs
>
> 1-844-25-CLOUD Ext 806
>
>
>
>
> On Thu, Oct 8, 2020 at 2:08 AM Eugen Block  wrote:
>
>> Yes, after your cluster has recovered you'll be able to increase
>> pgp_num. Or your change will be applied automatically since you
>> already set it, I'm not sure but you'll see.
>>
>>
>> Zitat von Mac Wynkoop :
>>
>> > Well, backfilling sure, but will it allow me to actually change the
>> pgp_num
>> > as more space frees up? Because the issue is that I cannot modify that
>> > value.
>> >
>> > Thanks,
>> > Mac Wynkoop, Senior Datacenter Engineer
>> > *NetDepot.com:* Cloud Servers; Delivered
>> > Houston | Atlanta | NYC | Colorado Springs
>> >
>> > 1-844-25-CLOUD Ext 806
>> >
>> >
>> >
>> >
>> > On Wed, Oct 7, 2020 at 1:50 PM Eugen Block  wrote:
>> >
>> >> Yes, I think that’s exactly the reason. As soon as the cluster has
>> >> more space the backfill will continue.
>> >>
>> >>
>> >> Zitat von Mac Wynkoop :
>> >>
>> >> > The cluster is currently in a warn state, here's the scrubbed output
>> of
>> >> > ceph -s:
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >   cluster:
>> >> >     id:     *redacted*
>> >> >     health: HEALTH_WARN
>> >> >             noscrub,nodeep-scrub flag(s) set
>> >> >             22 nearfull osd(s)
>> >> >             2 pool(s) nearfull
>> >> >             Low space hindering backfill (add storage if this doesn't resolve itself): 277 pgs backfill_toofull
>> >> >             Degraded data redundancy: 32652738/3651947772 objects degraded (0.894%), 281 pgs degraded, 341 pgs undersized
>> >> >             1214 pgs not deep-scrubbed in time
>> >> >             2647 pgs not scrubbed in time
>> >> >             2 daemons have recently crashed
>> >> >
>> >> >   services:
>> >> >     mon:         5 daemons, *redacted* (age 44h)
>> >> >     mgr:         *redacted*
>> >> >     osd:         162 osds: 162 up (since 44h), 162 in (since 4d); 971 remapped pgs
>> >> >                  flags noscrub,nodeep-scrub
>> >> >     rgw:         3 daemons active *redacted*
>> >> >     tcmu-runner: 18 daemons active *redacted*
>> >> >
>> >> >   data:
>> >> >     pools:   10 pools, 2648 pgs
>> >> >     objects: 409.56M objects, 738 TiB
>> >> >     usage:   1.3 PiB used, 580 TiB / 1.8 PiB avail
>> >> >     pgs:     32652738/3651947772 objects degraded (0.894%)
>> >> >              517370913/3651947772 objects misplaced (14.167%)
>> >> >              1677 active+clean
>> >> >              477  active+remapped+backfill_wait
>> >> >              100  active+remapped+backfill_wait+backfill_toofull
>> >> >              80   active+undersized+degraded+remapped+backfill_wait
>> >> >              60   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
>> >> >              42   active+undersized+degraded+remapped+backfill_toofull
>> >> >              33   active+undersized+degraded+remapped+backfilling
>> >> >              25   active+remapped+backfilling
>> >> >              25   active+remapped+backfill_toofull
>> >> >              24   active+undersized+remapped+backfilling
>> >> >              23   active+forced_recovery+undersized+degraded+remapped+backfill_wait
>> >> >              19   active+forced_recovery+undersized+degraded+remapped+backfill_wait+backfill_toofull
>> >> >              15   active+undersized+remapped+backfill_wait
>> >> >              14   active+undersized+remapped+backfill_wait+backfill_toofull
>> >> >              12   active+forced_recovery+undersized+degraded+remapped+backfill_toofull
>> >> >              12   active+forced_recovery+undersized+degraded+remapped+backfilling
>> >> >              5    active+undersized+remapped+backfill_toofull
>> >> >              3    active+remapped
>> >> >              1    active+undersized+remapped
>> >> >              1    active+forced_recovery+undersized+remapped+backfilling
>> >> >
>> >> >   io:
>> >> 

[ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04

2020-10-20 Thread Eugen Block
Your 'cephadm ls' output was only from one node, I assumed you just  
bootstrapped the first node.


The 'cephadm logs' command should provide pager-output so you can  
scroll or search for a specific date.


I'm not sure what caused this, but "error on write" is bad. As I  
already wrote, check the filesystems on your nodes, dmesg, etc. It seems  
as if two of your MONs are down which would make your cluster  
unavailable (no quorum). Is mon3 up and running? Bringing back one of  
the other two MONs would bring the cluster back up.



Zitat von Emanuel Alejandro Castelli :


From MON1, dmesg I get this:

[3348025.306195] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348033.241973] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348048.089325] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348049.209243] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348050.201209] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348052.185167] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348056.280992] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348064.216703] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348078.808431] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348079.192418] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348080.220345] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348082.232299] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348086.232103] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348094.167722] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348110.411216] libceph: mon0 192.168.14.150:6789 socket closed  
(con state OPEN)
[3348140.245900] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348141.173884] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348142.229859] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348144.213777] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348148.437674] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348157.397327] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348170.965496] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348172.213118] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348173.205087] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348175.188934] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348179.412719] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348187.348441] libceph: mon1 192.168.14.151:6789 socket closed  
(con state CONNECTING)
[3348201.683707] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348202.195745] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348203.187654] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348205.175585] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348209.363409] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[3348217.299298] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)


But from MON2 I get this:

[5242753.074620] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242761.266727] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242779.959468] libceph: mon0 192.168.14.150:6789 socket closed  
(con state OPEN)

[5242806.834049] libceph: mon1 192.168.14.151:6789 socket error on write
[5242808.049952] libceph: mon1 192.168.14.151:6789 socket error on write
[5242809.041947] libceph: mon1 192.168.14.151:6789 socket error on write
[5242811.057917] libceph: mon1 192.168.14.151:6789 socket error on write
[5242815.285867] libceph: mon1 192.168.14.151:6789 socket error on write
[5242824.241921] libceph: mon1 192.168.14.151:6789 socket error on write
[5242837.554174] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242838.034339] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242839.026139] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242841.010177] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242845.234101] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242853.169905] libceph: mon2 192.168.14.152:6789 socket closed  
(con state CONNECTING)
[5242870.102324] libceph: mon0 192.168.14.150:6789 socket closed  
(con state OPEN)
[5242901.041812] libceph: mon2 192.168.14.152:6789 socket closed  

[ceph-users] Re: Problems with ceph command - Octupus - Ubuntu 16.04

2020-10-20 Thread Eugen Block
Your mon container seems up and running, have you tried restarting it?  
You just have one mon, is that correct? Do you see anything in the logs?


cephadm logs --name mon.osswrkprbe001

How long do you wait until you hit CTRL-C? There's a  
connection-timeout option for ceph commands, maybe try a higher timeout?


ceph --connect-timeout 60 status

Is the node hosting the mon showing any issues in dmesg, df -h, syslog, etc.?

Regards,
Eugen


Zitat von Emanuel Alejandro Castelli :


Hello


I'm facing an issue with ceph. I cannot run any ceph command. It  
literally hangs. I need to hit CTRL-C to get this:





^CCluster connection interrupted or timed out




This is on Ubuntu 16.04. Also, I use Graphana with Prometheus to get  
information from the cluster, but now there is no data to graph. Any  
clue?



cephadm version

INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version  
15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)

cephadm ls
[
{
"style": "cephadm:v1",
"name": "mon.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit":  
"ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001",

"enabled": true,
"state": "running",
"container_id":  
"afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",

"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id":  
"bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",

"version": "15.2.1",
"started": "2020-10-19T19:03:16.759730",
"created": "2020-09-04T23:30:30.250336",
"deployed": "2020-09-04T23:48:20.956277",
"configured": "2020-09-04T23:48:22.100283"
},
{
"style": "cephadm:v1",
"name": "mgr.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit":  
"ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mgr.osswrkprbe001",

"enabled": true,
"state": "running",
"container_id":  
"1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",

"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id":  
"bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",

"version": "15.2.1",
"started": "2020-10-19T20:43:38.329529",
"created": "2020-09-04T23:30:31.110341",
"deployed": "2020-09-04T23:47:41.604057",
"configured": "2020-09-05T00:00:21.064246"
}
]


Thank you in advance.


Saludos,



EMANUEL CASTELLI

Arquitecto de Información - Gerencia OSS

C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 |  
ecaste...@telecentro.net.ar


Lavardén 157 1er piso. CABA (C1437FBC)


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph octopus centos7, containers, cephadm

2020-10-20 Thread Marc Roos


I am running Nautilus on centos7. Does Octopus run similarly to Nautilus, 
i.e.:

- runs on el7/centos7
- runs without containers by default
- runs without cephadm by default



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus

2020-10-20 Thread Eugen Block
I wonder if this would be impactful, even if  `nodown` were set.   
When a given OSD latches onto
the new replication network, I would expect it to want to use it for  
heartbeats — but when
its heartbeat peers aren’t using the replication network yet, they  
won’t be reachable.


I also expected at least some sort of impact, I just tested it in a  
virtual lab environment. But besides the temporary "down" OSDs during  
container restart the cluster was always responsive (although there's  
no client traffic). I didn't even set "nodown". But all OSDs now have  
a new backend address and the cluster seems to be happy.


Regards,
Eugen


Zitat von Anthony D'Atri :

I wonder if this would be impactful, even if  `nodown` were set.   
When a given OSD latches onto
the new replication network, I would expect it to want to use it for  
heartbeats — but when
its heartbeat peers aren’t using the replication network yet, they  
won’t be reachable.


Unless something has changed since I tried this with Luminous.


On Oct 20, 2020, at 12:47 AM, Eugen Block  wrote:

Hi,

a quick search [1] shows this:

---snip---
# set new config
ceph config set global cluster_network 192.168.1.0/24

# let orchestrator reconfigure the daemons
ceph orch daemon reconfig mon.host1
ceph orch daemon reconfig mon.host2
ceph orch daemon reconfig mon.host3
ceph orch daemon reconfig osd.1
ceph orch daemon reconfig osd.2
ceph orch daemon reconfig osd.3
---snip---

I haven't tried it myself though.

Regards,
Eugen

[1]  
https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm



Zitat von Amudhan P :


Hi,

I have installed Ceph Octopus cluster using cephadm with a single network
now I want to add a second network and configure it as a cluster address.

How do I configure ceph to use second Network as cluster network?.

Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RE Re: Recommended settings for PostgreSQL

2020-10-20 Thread Marc Roos


I wanted to create a few stateful containers with mysql/postgres that 
do not depend on local persistent storage, so I can dynamically move 
them around. What about using:

- a 1x replicated pool and use rbd mirror, 
- or having postgres use 2 1x replicated pools
- or upon task launch create an lvm mirror between a ceph rbd and a 
local drive and use local drive as a primary access device (if that is 
even possible with lvm)



 

-Original Message-
Cc: ceph-users@ceph.io
Subject: *SPAM* [ceph-users] Re: Recommended settings for 
PostgreSQL

Another option is to let PostgreSQL do the replication with local 
storage. There are great reasons for Ceph, but databases optimize for 
this kind of thing extremely well. 

With replication in hand, run snapshots to RADOS buckets for long term 
storage.

> 
> Hi,
> 
> I have an existing few RBDs. I would like to create a new RBD Image 
> for PostgreSQL. Do you have any suggestions for such use cases? For 
> example;
> 
> Currently defaults are:
> 
> Object size (4MB) and Stripe Unit (None)
> Features: Deep flatten + Layering + Exclusive Lock + Object Map + 
> FastDiff
> 
> Should I use as is or should I use 16KB of object size and different 
sets of features for PostgreSQL?
> 
> Thanks,
> Gencer.
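
(Not an answer from the thread, just to make the question above concrete:
creating an image with a non-default object size and a reduced feature set
could look roughly like this, where the pool/image names and the size are
placeholders and 16K is the object size mentioned in the question.)

  # rbd create rbd/pgdata --size 100G --object-size 16K \
        --image-feature layering,exclusive-lock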
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus

2020-10-20 Thread Anthony D'Atri
I wonder if this would be impactful, even if  `nodown` were set.  When a given 
OSD latches onto
the new replication network, I would expect it to want to use it for heartbeats 
— but when
its heartbeat peers aren’t using the replication network yet, they won’t be 
reachable.

Unless something has changed since I tried this with Luminous.

> On Oct 20, 2020, at 12:47 AM, Eugen Block  wrote:
> 
> Hi,
> 
> a quick search [1] shows this:
> 
> ---snip---
> # set new config
> ceph config set global cluster_network 192.168.1.0/24
> 
> # let orchestrator reconfigure the daemons
> ceph orch daemon reconfig mon.host1
> ceph orch daemon reconfig mon.host2
> ceph orch daemon reconfig mon.host3
> ceph orch daemon reconfig osd.1
> ceph orch daemon reconfig osd.2
> ceph orch daemon reconfig osd.3
> ---snip---
> 
> I haven't tried it myself though.
> 
> Regards,
> Eugen
> 
> [1] 
> https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm
> 
> 
> Zitat von Amudhan P :
> 
>> Hi,
>> 
>> I have installed Ceph Octopus cluster using cephadm with a single network
>> now I want to add a second network and configure it as a cluster address.
>> 
>> How do I configure ceph to use second Network as cluster network?.
>> 
>> Amudhan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus

2020-10-20 Thread Eugen Block

Hi,

a quick search [1] shows this:

---snip---
# set new config
ceph config set global cluster_network 192.168.1.0/24

# let orchestrator reconfigure the daemons
ceph orch daemon reconfig mon.host1
ceph orch daemon reconfig mon.host2
ceph orch daemon reconfig mon.host3
ceph orch daemon reconfig osd.1
ceph orch daemon reconfig osd.2
ceph orch daemon reconfig osd.3
---snip---
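
To verify afterwards that the OSDs actually picked up the new network (osd.1
is just an example), something like this should work:

  # ceph config get osd cluster_network
  # ceph osd metadata 1 | grep back_addr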

I haven't tried it myself though.

Regards,
Eugen

[1]  
https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm



Zitat von Amudhan P :


Hi,

I have installed Ceph Octopus cluster using cephadm with a single network
now I want to add a second network and configure it as a cluster address.

How do I configure ceph to use second Network as cluster network?.

Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon DB compaction MON_DISK_BIG

2020-10-20 Thread Szabo, Istvan (Agoda)
Okay, thank you very much.

From: Anthony D'Atri 
Sent: Tuesday, October 20, 2020 9:32 AM
To: Szabo, Istvan (Agoda)
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Mon DB compaction MON_DISK_BIG



>
> Hi,
>
> Yeah, sequentially and waited for finish, and it looks like it is still doing 
> something in the background because now it is 9.5GB even if it tells 
> compaction done.
> I think the ceph tell compact initiated harder so not sure how far it will go 
> down, but looks promising. When I sent the email it was 13, now 9.5.

Online compaction isn’t as fast as offline compaction.  If you set 
mon_compact_on_start = true in ceph.conf the mons will compact more efficiently 
before joining the quorum.  This means of course that they’ll take longer to 
start up and become active.  Arguably this should
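
For reference, that would be an entry like this in each mon's ceph.conf before
restarting it:

  [mon]
      mon_compact_on_start = true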

> 1 osd is down long time and but that one I want to remove from the cluster 
> soon, all pgs are active clean.

There’s an issue with at least some versions of Luminous where having down/out 
OSDs confounds compaction.  If you don’t end up soon with the mon DB size you 
expect, try removing or replacing that OSD and I’ll bet you have better results.

— aad

>
> mon stat same yes.
>
> now I fininshed the email it is 8.7Gb.
>
> I hope I didn't break anything  and it will delete everything.
>
> Thank you
> 
> From: Anthony D'Atri 
> Sent: Tuesday, October 20, 2020 9:13 AM
> To: ceph-users@ceph.io
> Cc: Szabo, Istvan (Agoda)
> Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG
>
> 
>
> I hope you restarted those mons sequentially, waiting between each for the 
> quorum to return.
>
> Is there any recovery or pg autoscaling going on?
>
> Are all OSDs up/in, ie. are the three numbers returned by `ceph osd stat` the 
> same?
>
> — aad
>
>> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda)  
>> wrote:
>>
>> Hi,
>>
>>
>> I've received a warning today morning:
>>
>>
>> HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a 
>> lot of disk space
>> MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a 
>> lot of disk space
>>   mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB)
>>   mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB)
>>   mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB)
>>
>> It hits the 15GB so I've restarted all the 3 mons, it triggered compaction.
>>
>> I've also ran this command:
>>
>> ceph tell mon.`hostname -s` compact on the first node, but it wents down 
>> only to 13GB.
>>
>>
>> du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
>> 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
>> 13G total
>>
>>
>> Anything else I can do to reduce it?
>>
>>
>> Luminous 12.2.8 is the version.
>>
>>
>> Thank you in advance.
>>
>>
>> 
>> This message is confidential and is for the sole use of the intended 
>> recipient(s). It may also be privileged or otherwise protected by copyright 
>> or other legal rules. If you have received it by mistake please let us know 
>> by reply email and delete it from your system. It is prohibited to copy this 
>> message or disclose its content to anyone. Any confidentiality or privilege 
>> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
>> the message. All messages sent to and from Agoda may be monitored to ensure 
>> compliance with company policies, to protect the company's interests and to 
>> remove potential malware. Electronic messages may be intercepted, amended, 
>> lost or deleted, or contain viruses.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon DB compaction MON_DISK_BIG

2020-10-20 Thread Szabo, Istvan (Agoda)
Hi,

Yeah, sequentially, and I waited for each to finish. It looks like it is still 
doing something in the background, because it is now 9.5GB even though it 
reports the compaction as done.
I think the ceph tell compact initiated a harder compaction, so I'm not sure 
how far it will go down, but it looks promising. When I sent the email it was 
13GB, now 9.5GB.

One OSD has been down for a long time, but I want to remove that one from the 
cluster soon; all PGs are active+clean.

The stat numbers are the same, yes.

Now, as I finish this email, it is at 8.7GB.

I hope I didn't break anything and that it will delete everything.

Thank you

From: Anthony D'Atri 
Sent: Tuesday, October 20, 2020 9:13 AM
To: ceph-users@ceph.io
Cc: Szabo, Istvan (Agoda)
Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG



I hope you restarted those mons sequentially, waiting between each for the 
quorum to return.

Is there any recovery or pg autoscaling going on?

Are all OSDs up/in, ie. are the three numbers returned by `ceph osd stat` the 
same?

— aad

> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda)  
> wrote:
>
> Hi,
>
>
> I've received a warning today morning:
>
>
> HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot 
> of disk space
> MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a 
> lot of disk space
>mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB)
>mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB)
>mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB)
>
> It hits the 15GB so I've restarted all the 3 mons, it triggered compaction.
>
> I've also ran this command:
>
> ceph tell mon.`hostname -s` compact on the first node, but it wents down only 
> to 13GB.
>
>
> du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
> 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
> 13G total
>
>
> Anything else I can do to reduce it?
>
>
> Luminous 12.2.8 is the version.
>
>
> Thank you in advance.
>
>
> 
> This message is confidential and is for the sole use of the intended 
> recipient(s). It may also be privileged or otherwise protected by copyright 
> or other legal rules. If you have received it by mistake please let us know 
> by reply email and delete it from your system. It is prohibited to copy this 
> message or disclose its content to anyone. Any confidentiality or privilege 
> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
> the message. All messages sent to and from Agoda may be monitored to ensure 
> compliance with company policies, to protect the company's interests and to 
> remove potential malware. Electronic messages may be intercepted, amended, 
> lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Mon DB compaction MON_DISK_BIG

2020-10-20 Thread Szabo, Istvan (Agoda)
Hi,


I received a warning this morning:


HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot 
of disk space
MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot 
of disk space
mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB)
mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB)
mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB)

It hit the 15GB threshold, so I restarted all 3 mons, which triggered compaction.

I've also run this command:

ceph tell mon.`hostname -s` compact on the first node, but it only went down 
to 13GB.


du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
13G total


Anything else I can do to reduce it?


Luminous 12.2.8 is the version.


Thank you in advance.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io