[ceph-users] Ceph orch commands non-responsive after mgr/mon reboots 16.2.9

2022-07-22 Thread Tim Olow
Howdy,

I seem to be facing a problem on my 16.2.9 ceph cluster. After a staggered
reboot of my 3 infra nodes, all ceph orch commands hang, much like in this
previously reported issue [1].

I have paused the orchestrator and rebuilt a manager by hand as outlined here [2],
but the issue persists. I am unable to scale services up or down, restart
daemons, etc.
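
For context, the recovery sequence I went through was roughly along these lines
(reconstructed from memory, so the exact invocations may have differed slightly):

# pause the orchestrator so it stops scheduling new actions
ceph orch pause

# fail over to the manually redeployed standby mgr (per [2])
ceph mgr fail

# re-enable the orchestrator and check whether it responds again
ceph orch resume
ceph orch status

None of that unblocked things; the verbose run below shows what I get afterwards.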

ceph orch ls --verbose

[{'flags': 8,
  'help': 'List services known to orchestrator',
  'module': 'mgr',
  'perm': 'r',
  'sig': [argdesc(, req=True, name=prefix, n=1, numseen=0, prefix=orch),
  argdesc(, req=True, name=prefix, n=1, numseen=0, prefix=ls),
  argdesc(, req=False, name=service_type, n=1, numseen=0),
  argdesc(, req=False, name=service_name, n=1, numseen=0),
  argdesc(, req=False, name=export, n=1, numseen=0),
  argdesc(, req=False, name=format, n=1, numseen=0, strings=plain|json|json-pretty|yaml|xml-pretty|xml),
  argdesc(, req=False, name=refresh, n=1, numseen=0)]}]
Submitting command:  {'prefix': 'orch ls', 'target': ('mon-mgr', '')}
submit {"prefix": "orch ls", "target": ["mon-mgr", ""]} to mon-mgr




Debug output on the manager:

debug 2022-07-22T23:27:12.509+ 7fc180230700  0 log_channel(audit) log [DBG] : from='client.1084220 -' entity='client.admin' cmd=[{"prefix": "orch ls", "target": ["mon-mgr", ""]}]: dispatch

I have collected a startup log from the manager and uploaded it for review [3].
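
For anyone wanting to capture something similar, turning up the
cephadm/orchestrator logging is roughly the following (per the cephadm
troubleshooting docs; adjust to taste):

# raise the cephadm module log level and send it to the cluster log
ceph config set mgr mgr/cephadm/log_to_cluster_level debug

# watch the cephadm channel live while reproducing the hang
ceph -W cephadm --watch-debug

# drop the level again afterwards
ceph config set mgr mgr/cephadm/log_to_cluster_level info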


Many Thanks,

Tim


[1] https://www.spinics.net/lists/ceph-users/msg68398.html
[2] 
https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
[3] https://pastebin.com/Dvb8sEbz



[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-22 Thread Neha Ojha
On Thu, Jul 21, 2022 at 8:47 AM Ilya Dryomov  wrote:
>
> On Thu, Jul 21, 2022 at 4:24 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/56484
> > Release Notes - https://github.com/ceph/ceph/pull/47198
> >
> > Seeking approvals for:
> >
> > rados - Neha, Travis, Ernesto, Adam

rados approved!
known issue https://tracker.ceph.com/issues/55854

Thanks,
Neha

>
> > rgw - Casey
> > fs, kcephfs, multimds - Venky, Patrick
> > rbd - Ilya, Deepika
> > krbd - Ilya, Deepika
>
> rbd and krbd approved.
>
> Thanks,
>
> Ilya



[ceph-users] Re: dashboard on Ubuntu 22.04: python3-cheroot incompatibility

2022-07-22 Thread Matthias Ferdinand
On Fri, Jul 22, 2022 at 04:54:23PM +0100, James Page wrote:
> > If I remove the version check (see below), dashboard appears to be working.
> 
> 
> https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1967139
> 
> I just uploaded a fix for cheroot to resolve this issue - the stable
> release update team should pick that up next week.

thank you!

Matthias


[ceph-users] Re: dashboard on Ubuntu 22.04: python3-cheroot incompatibility

2022-07-22 Thread James Page
Hi Matthias


On Fri, Jul 22, 2022 at 4:50 PM Matthias Ferdinand 
wrote:

> Hi,
>
> trying to activate ceph dashboard on a 17.2.0 cluster (Ubuntu 22.04
> using standard ubuntu repos), the dashboard module crashes because it
> cannot understand the python3-cheroot version number '8.5.2+ds1':
>
> root@mceph00:~# ceph crash info
> 2022-07-22T14:44:03.226395Z_a6b006a7-10c3-443d-9ead-161e06a27bf3
> {
> "backtrace": [
> "  File \"/usr/share/ceph/mgr/dashboard/__init__.py\", line
> 52, in \nfrom .module import Module, StandbyModule  # noqa:
> F401",
> "  File \"/usr/share/ceph/mgr/dashboard/module.py\", line 49,
> in \npatch_cherrypy(cherrypy.__version__)",
> "  File
> \"/usr/share/ceph/mgr/dashboard/cherrypy_backports.py\", line 197, in
> patch_cherrypy\naccept_socket_error_0(v)",
> "  File
> \"/usr/share/ceph/mgr/dashboard/cherrypy_backports.py\", line 124, in
> accept_socket_error_0\nif v < StrictVersion(\"9.0.0\") or
> cheroot_version < StrictVersion(\"6.5.5\"):",
> "  File \"/lib/python3.10/distutils/version.py\", line 64, in
> __gt__\nc = self._cmp(other)",
> "  File \"/lib/python3.10/distutils/version.py\", line 168, in
> _cmp\nother = StrictVersion(other)",
> "  File \"/lib/python3.10/distutils/version.py\", line 40, in
> __init__\nself.parse(vstring)",
> "  File \"/lib/python3.10/distutils/version.py\", line 137, in
> parse\nraise ValueError(\"invalid version number '%s'\" % vstring)",
> =>  "ValueError: invalid version number '8.5.2+ds1'"
> ],
> "ceph_version": "17.2.0",
> "crash_id":
> "2022-07-22T14:44:03.226395Z_a6b006a7-10c3-443d-9ead-161e06a27bf3",
> "entity_name": "mgr.mceph05",
> "mgr_module": "dashboard",
> "mgr_module_caller": "PyModule::load_subclass_of",
> "mgr_python_exception": "ValueError",
> "os_id": "22.04",
> "os_name": "Ubuntu 22.04 LTS",
> "os_version": "22.04 LTS (Jammy Jellyfish)",
> "os_version_id": "22.04",
> "process_name": "ceph-mgr",
> "stack_sig":
> "3f893983e716f2a7e368895904cf3485ac7064d3294a45ea14066a1576c818e3",
> "timestamp": "2022-07-22T14:44:03.226395Z",
> "utsname_hostname": "mceph05",
> "utsname_machine": "x86_64",
> "utsname_release": "5.15.0-41-generic",
> "utsname_sysname": "Linux",
> "utsname_version": "#44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022"
> }
>
> If I remove the version check (see below), dashboard appears to be working.


https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1967139

I just uploaded a fix for cheroot to resolve this issue - the stable
release update team should pick that up next week.
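
Once it is published, picking the fix up should just be the normal SRU flow,
something along these lines (exact version pending; the package name is
python3-cheroot):

# check whether the fixed cheroot has reached jammy-updates yet
apt update
apt policy python3-cheroot

# upgrade just that package and restart the mgr so the dashboard reloads
apt install --only-upgrade python3-cheroot
systemctl restart ceph-mgr@<id>.service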

Cheers

James


[ceph-users] dashboard on Ubuntu 22.04: python3-cheroot incompatibility

2022-07-22 Thread Matthias Ferdinand
Hi,

Trying to activate the ceph dashboard on a 17.2.0 cluster (Ubuntu 22.04, using
the standard Ubuntu repos), the dashboard module crashes because it cannot
parse the python3-cheroot version number '8.5.2+ds1':

root@mceph00:~# ceph crash info 
2022-07-22T14:44:03.226395Z_a6b006a7-10c3-443d-9ead-161e06a27bf3
{
"backtrace": [
"  File \"/usr/share/ceph/mgr/dashboard/__init__.py\", line 52, in 
\nfrom .module import Module, StandbyModule  # noqa: F401",
"  File \"/usr/share/ceph/mgr/dashboard/module.py\", line 49, in 
\npatch_cherrypy(cherrypy.__version__)",
"  File \"/usr/share/ceph/mgr/dashboard/cherrypy_backports.py\", 
line 197, in patch_cherrypy\naccept_socket_error_0(v)",
"  File \"/usr/share/ceph/mgr/dashboard/cherrypy_backports.py\", 
line 124, in accept_socket_error_0\nif v < StrictVersion(\"9.0.0\") or 
cheroot_version < StrictVersion(\"6.5.5\"):",
"  File \"/lib/python3.10/distutils/version.py\", line 64, in 
__gt__\nc = self._cmp(other)",
"  File \"/lib/python3.10/distutils/version.py\", line 168, in 
_cmp\nother = StrictVersion(other)",
"  File \"/lib/python3.10/distutils/version.py\", line 40, in 
__init__\nself.parse(vstring)",
"  File \"/lib/python3.10/distutils/version.py\", line 137, in 
parse\nraise ValueError(\"invalid version number '%s'\" % vstring)",
=>  "ValueError: invalid version number '8.5.2+ds1'"
],
"ceph_version": "17.2.0",
"crash_id": 
"2022-07-22T14:44:03.226395Z_a6b006a7-10c3-443d-9ead-161e06a27bf3",
"entity_name": "mgr.mceph05",
"mgr_module": "dashboard",
"mgr_module_caller": "PyModule::load_subclass_of",
"mgr_python_exception": "ValueError",
"os_id": "22.04",
"os_name": "Ubuntu 22.04 LTS",
"os_version": "22.04 LTS (Jammy Jellyfish)",
"os_version_id": "22.04",
"process_name": "ceph-mgr",
"stack_sig": 
"3f893983e716f2a7e368895904cf3485ac7064d3294a45ea14066a1576c818e3",
"timestamp": "2022-07-22T14:44:03.226395Z",
"utsname_hostname": "mceph05",
"utsname_machine": "x86_64",
"utsname_release": "5.15.0-41-generic",
"utsname_sysname": "Linux",
"utsname_version": "#44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022"
}

If I remove the version check (see below), the dashboard appears to be working.

Regards
Matthias

---

root@mceph00:~# diff -rbup 
/usr/share/ceph/mgr/dashboard/cherrypy_backports.py{.orig,}
--- /usr/share/ceph/mgr/dashboard/cherrypy_backports.py.orig2022-04-19 
00:08:27.0 +0200
+++ /usr/share/ceph/mgr/dashboard/cherrypy_backports.py 2022-07-22 
16:46:12.850768963 +0200
@@ -121,7 +121,8 @@ def accept_socket_error_0(v):
 except ImportError:
 pass

-if v < StrictVersion("9.0.0") or cheroot_version < StrictVersion("6.5.5"):
+#if v < StrictVersion("9.0.0") or cheroot_version < StrictVersion("6.5.5"):
+if v < StrictVersion("9.0.0"):
 generic_socket_error = OSError

 def accept_socket_error_0(func):
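
For completeness, the parse failure is easy to reproduce outside ceph-mgr, and
the newer packaging module copes with the Debian-style '+ds1' suffix, so a check
along these lines might be more robust (just a sketch, assuming python3-packaging
is available on the mgr host):

# reproduce the crash: distutils' StrictVersion rejects the Debian revision suffix
python3 -c "from distutils.version import StrictVersion; StrictVersion('8.5.2+ds1')"
# -> ValueError: invalid version number '8.5.2+ds1'

# packaging.version parses it fine and compares as expected
python3 -c "from packaging.version import Version; print(Version('8.5.2+ds1') < Version('6.5.5'))"
# -> False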



[ceph-users] Re: Can't setup Basic Ceph Client

2022-07-22 Thread Jean-Marc FONTANA

Hello Iban,

We finally did it! With your example, we set up a client which does what we need.
We only regret that the documentation of ceph auth is not a little more explicit;
that could have led us to the solution more quickly.
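
For the record, the kind of command we ended up with was along these lines
(client id, pool name and caps adapted to our setup, so take it as an example
rather than a recipe):

# create a dedicated rbd client instead of using client.admin
ceph auth get-or-create client.ourapp \
    mon 'profile rbd' \
    osd 'profile rbd pool=ourpool' \
    mgr 'allow r' \
    -o /etc/ceph/ceph.client.ourapp.keyring

# copy the keyring to the client node, then:
rbd --id ourapp --keyring /etc/ceph/ceph.client.ourapp.keyring ls -p ourpool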

Many thanks Iban, and Kai Stian Olstad too

Best regards

JM

On 19/07/2022 at 14:12, Jean-Marc FONTANA wrote:


Hello Iban,

Thanks for your answer! We finally managed to connect with the admin keyring,
but we think that is not the best practice. We shall try your conf
and let you know the result.


Best regards

JM

On 19/07/2022 at 11:08, Iban Cabrillo wrote:

Hi Jean,

   If you do not want to use the admin user (which is the sensible choice), you
must create a client with rbd access to the pool on which you are going to
perform the I/O actions.
For example in our case it is the user cinder:
client.cinder
key: 

caps: [mgr] allow r
caps: [mon] profile rbd
caps: [osd] profile rbd pool=vol1, profile rbd pool=vol2 . profile 
rbd pool=volx

   And then install the client keyring on the client node:

cephclient:~ # ls -la /etc/ceph/
total 28
drwxr-xr-x   2 root root  4096 Jul 18 11:37 .
drwxr-xr-x 132 root root 12288 Jul 18 11:37 ..
-rw-r--r--   1 root root    64 Oct 19  2017 ceph.client.cinder.keyring
-rw-r--r--   1 root root  2018 Jul 18 11:37 ceph.conf

In our case we have added

cat /etc/profile.d/ceph-cinder.sh
export CEPH_ARGS="--keyring /etc/ceph/ceph.client.cinder.keyring --id cinder"

so that it picks it up automatically.

cephclient:~ # rbd ls -p volumes
image01_to_remove
volume-01bbf2ee-198c-446d-80bf-f68292130f5c
volume-036865ad-6f9b-4966-b2ea-ce10bf09b6a9
volume-04445a86-a032-4731-8bff-203dfc5d02e1
..

I hope this helps you.

Cheers, I





[ceph-users] Re: librbd leaks memory on crushmap updates

2022-07-22 Thread Peter Lieven
On 21.07.22 at 17:50, Ilya Dryomov wrote:
> On Thu, Jul 21, 2022 at 11:42 AM Peter Lieven  wrote:
>> On 19.07.22 at 17:57, Ilya Dryomov wrote:
>>> On Tue, Jul 19, 2022 at 5:10 PM Peter Lieven  wrote:
 On 24.06.22 at 16:13, Peter Lieven wrote:
> On 23.06.22 at 12:59, Ilya Dryomov wrote:
>> On Thu, Jun 23, 2022 at 11:32 AM Peter Lieven  wrote:
>>> On 22.06.22 at 15:46, Josh Baergen wrote:
 Hey Peter,

> I found relatively large allocations in the qemu smaps and checked 
> the contents. It contained several hundred repetitions of osd and 
> pool names. We use the default builds on Ubuntu 20.04. Is there a 
> special memory allocator in place that might not clean up properly?
 I'm sure you would have noticed this and mentioned it if it was so -
 any chance the contents of these regions look like log messages of
 some kind? I recently tracked down a high client memory usage that
 looked like a leak that turned out to be a broken config option
 resulting in higher in-memory log retention:
 https://tracker.ceph.com/issues/56093. AFAICT it affects Nautilus+.
>>> Hi Josh, hi Ilya,
>>>
>>>
>>> it seems we were in fact facing 2 leaks with 14.x. Our long running VMs 
>>> with librbd 14.x have several million items in the osdmap mempool.
>>>
>>> In our testing environment with 15.x I see no unlimited increase in the 
>>> osdmap mempool (compared this to a second dev host with 14.x client 
>>> where I see the increase wiht my tests),
>>>
>>> but I still see leaking memory when I generate a lot of osdmap changes, 
>>> but this in fact seem to be log messages - thanks Josh.
>>>
>>>
>>> So I would appreciate if #56093 would be backported to Octopus before 
>>> its final release.
>> I picked up Josh's PR that was sitting there unnoticed but I'm not sure
>> it is the issue you are hitting.  I think Josh's change just resurrects
>> the behavior where clients stored only up to 500 log entries instead of
>> up to 1 (the default for daemons).  There is no memory leak there,
>> just a difference in how much memory is legitimately consumed.  The
>> usage is bounded either way.
>>
>> However in your case, the usage is slowly but constantly growing.
>> In the original post you said that it was observed both on 14.2.22 and
>> 15.2.16.  Are you saying that you are no longer seeing it in 15.x?
> After I understood whats the background of Josh issue I can confirm that 
> I still see increasing memory which is not caused
>
> by osdmap items and also not by log entries. There must be something else 
> going on.
 I still see increased memory (heap) usage. Might it be that it is just 
 heap fragmentation?
>>> Hi Peter,
>>>
>>> It could be but you never quantified the issue.  What is the actual
>>> heap usage you are seeing, how fast is it growing?  Is it specific to
>>> some particular VMs or does it affect the entire fleet?
>>
>> Hi Ilya,
>>
>>
>> I see the issue across the fleet. The memory increases about 200KB/day per 
>> attached drive.
>>
>> Same hypervisor with attached iSCSI storage - no issue.
>>
>>
>> However, the memory that is increasing is not listed as heap under 
>> /proc/{pid}/smaps.
>>
>> Does librbd use its own memory allocator?
> Hi Peter,
>
> By default, librbd uses tcmalloc.


That's a good pointer. From what I read, tcmalloc does not aggressively return
memory back to the OS after free.
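
For reference, the way I am watching this on a running VM is roughly the
following (it assumes the client admin socket is enabled via an admin_socket
setting in the [client] section of ceph.conf; the socket path below is just an
example):

# mempool usage of the librbd client embedded in the qemu process
ceph --admin-daemon /var/run/ceph/ceph-client.qemu.12345.asok dump_mempools

# perf counters, including the objecter / osdmap related ones
ceph --admin-daemon /var/run/ceph/ceph-client.qemu.12345.asok perf dump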


>
>>
>> I am still testing with 15.x as I mainly have long running VMs in our 
>> production environment.
>>
>> With 14.x we had an additional issue with the osdmaps not being freed. That
>> is gone with 15.x.
>>
>>
>> I will try with a patched qemu that allocated the write buffers inside qemu 
>> and set disable_zero_copy_write = true.
>>
>> to see if this makes any difference.
> We are unlikely to be able to do anything about 15.x at this point so
> I'd encourage you to try 17.x.  That said, any new information would be
> helpful.


I certainly will, but at the moment it looks like tcmalloc and heap
fragmentation.


I am currently testing with a modified qemu that sets 
rbd_disable_zero_copy_writes to false and implements the bounce buffer 
internally.


It additionally has the benefit that we can avoid a buffer allocation for very
small writes (e.g. up to 4k) and take the memory from the coroutine stack.


(And it would allow for the implementation of FUA support with the existing
librbd API, but that's for the future.)


Best


Peter





[ceph-users] Re: replacing OSD nodes

2022-07-22 Thread Jesper Lykkegaard Karlsen
It seems like low-hanging fruit to fix?
There must be a reason why the developers have not implemented a prioritized
order for backfilling PGs. Or maybe the prioritization is based on something
other than available space?

The question remains unanswered, as does whether my suggested approach/script
would work or not (a rough sketch of the idea is shown below).
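
To be concrete, the rough idea looks something like this (illustrative only,
not the exact script from my earlier mail):

# list PGs that are waiting for backfill
ceph pg ls backfill_wait -f json > /tmp/backfill_wait.json

# current OSD utilisation, to find the fullest OSDs involved
ceph osd df -f json > /tmp/osd_df.json

# pick the waiting PGs that touch the most-utilised OSDs (jq or a small
# python script), then bump their priority so they move first
ceph pg force-backfill <pgid> [<pgid> ...]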

Summer vacation?

Best,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203


From: Janne Johansson
Sent: 20 July 2022 19:39
To: Jesper Lykkegaard Karlsen
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] replacing OSD nodes

On Wed, 20 July 2022 at 11:22, Jesper Lykkegaard Karlsen wrote:
> Thanks for your answer Janne.
> Yes, I am also running "ceph osd reweight" on the "nearfull" osds, once they
> get too close for comfort.
>
> But I just thought that a continuous prioritization of rebalancing PGs could
> make this process smoother, with less or no need for manual intervention.

You are absolutely right there. I just wanted to chip in with my
experience of "it nags at me, but it will still work out", so that other
people finding these mails later on can feel a bit relieved knowing
that a few toofull warnings aren't a major disaster and that they
sometimes happen, because ceph looks at all possible moves, even
those that will run late in the rebalancing.

--
May the most significant bit of your life be positive.