[ceph-users] Re: cephadm basic questions: image config, OS reimages

2024-05-16 Thread Adam King
At least for the current up-to-date reef branch (not sure what reef version you're on) when --image is not provided to the shell, it should try to infer the image in this order 1. from the CEPHADM_IMAGE env. variable 2. if you pass --name with a daemon name to the shell command, it will
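A minimal sketch of pinning the shell to a specific image instead of relying on that inference (the image tag here is only an example):

    # option 1: the environment variable checked first
    CEPHADM_IMAGE=quay.io/ceph/ceph:v18.2.2 cephadm shell
    # option 2: pass the image explicitly as a global option
    cephadm --image quay.io/ceph/ceph:v18.2.2 shell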

[ceph-users] CLT meeting notes May 6th 2024

2024-05-06 Thread Adam King
- DigitalOcean credits - things to ask - what would promotional material require - how much are credits worth - Neha to ask - 19.1.0 centos9 container status - close to being ready - will be building centos 8 and 9 containers simultaneously - should test

[ceph-users] Re: ceph recipe for nfs exports

2024-04-24 Thread Adam King
> > - Although I can mount the export I can't write on it > > What error are you getting when trying to do the write? The way you set things up doesn't look too different from one of our integration tests for ingress over nfs (

[ceph-users] Re: which grafana version to use with 17.2.x ceph version

2024-04-23 Thread Adam King
FWIW, cephadm uses `quay.io/ceph/ceph-grafana:9.4.7` as the default grafana image in the quincy branch On Tue, Apr 23, 2024 at 11:59 AM Osama Elswah wrote: > Hi, > > > in quay.io I can find a lot of grafana versions for ceph ( > https://quay.io/repository/ceph/grafana?tab=tags) how can I find
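If you need something other than that default, the cephadm module has a per-service image setting; roughly (the tag is illustrative):

    ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:9.4.7
    ceph orch redeploy grafana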

[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-16 Thread Adam King
ph/ceph/pull/56714> On Tue, Apr 16, 2024 at 1:39 PM Laura Flores wrote: > On behalf of @Radoslaw Zarzynski , rados approved. > > Below is the summary of the rados suite failures, divided by component. @Adam > King @Venky Shankar PTAL at the > orch and cephfs failures to se

[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-14 Thread Adam King
es, still trying, Laura PTL > > rados - Radek, Laura approved? Travis? Nizamudeen? > > rgw - Casey approved? > fs - Venky approved? > orch - Adam King approved? > > krbd - Ilya approved > powercycle - seems fs related, Venky, Brad PTL > > ceph-volume - will

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
;> >> Let me just finish tucking in a devlish tyke here and i’ll get to it >> first thing >> >> tirs. 9. apr. 2024 kl. 18.09 skrev Adam King : >> >>> I did end up writing a unit test to see what we calculated here, as well >>> as adding a bunch

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
ry_total_kb": 32827840, > > On Thu, Apr 4, 2024 at 10:14 PM Adam King wrote: > >> Sorry to keep asking for more info, but can I also get what `cephadm >> gather-facts` on that host returns for "memory_total_kb". Might end up >> creating a unit test out o

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-04 Thread Adam King
running (3w) > 7m ago 11M2698M4096M 17.2.6 > osd.9my-ceph01 running (3w) > 7m ago 11M3364M4096M 17.2.6 > prometheus.my-ceph01 my-ceph01 *:9095 running (3w) 7m > ago 13M 164M- 2.42

[ceph-users] Re: CEPHADM_HOST_CHECK_FAILED

2024-04-04 Thread Adam King
First, I guess I would make sure that peon7 and peon12 actually could pass the host check (you can run "cephadm check-host" on the host directly if you have a copy of the cephadm binary there). Then I'd try a mgr failover (ceph mgr fail) to clear out any in-memory host values cephadm might have and
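Roughly, the sequence being suggested (host names are the ones from this thread):

    # on peon7 / peon12, with a copy of the cephadm binary present
    cephadm check-host
    # then, from a node with the admin keyring
    ceph mgr fail
    ceph health detail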

[ceph-users] Re: Pacific Bug?

2024-04-02 Thread Adam King
https://tracker.ceph.com/issues/64428 should be it. Backports are done for quincy, reef, and squid and the patch will be present in the next release for each of those versions. There isn't a pacific backport as, afaik, there are no more pacific releases planned. On Fri, Mar 29, 2024 at 6:03 PM

[ceph-users] Re: cephadm shell version not consistent across monitors

2024-04-02 Thread Adam King
From what I can see with the most recent cephadm binary on pacific, unless you have the CEPHADM_IMAGE env variable set, it does a `podman images --filter label=ceph=True --filter dangling=false` (or docker) and takes the first image in the list. It seems to be getting sorted by creation time by

[ceph-users] Re: Failed adding back a node

2024-03-28 Thread Adam King
No, you can't use the image id for the upgrade command, it has to be the image name. So it should start, based on what you have, with registry.redhat.io/rhceph/. As for the full name, it depends which image you want to go with. As for trying this on an OSD first, there is `ceph orch daemon redeploy
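A hedged sketch of trying the image on a single OSD first (the daemon name and image are placeholders):

    ceph orch daemon redeploy osd.12 --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest
    ceph orch ps --daemon-type osd   # confirm the daemon came back up on the new image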

[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Adam King
From the ceph versions output I can see "osd": { "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160 }, It seems like all the OSD daemons on this cluster are using that 16.2.10-160 image, and I'm guessing most of them are running, so it

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Adam King
orch > ps. Then again, they are nowhere near the values stated in min_size_by_type > that you list. > Obviously yes, I could disable the auto tuning, but that would leave me > none the wiser as to why this exact host is trying to do this. > > > > On Tue, Mar 26, 2024 at 10:20 PM

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-26 Thread Adam King
For context, the value the autotune goes with is taken from `cephadm gather-facts` on the host (the "memory_total_kb" field), and it then subtracts from that per daemon on the host according to min_size_by_type = { 'mds': 4096 * 1048576, 'mgr': 4096 * 1048576,

[ceph-users] Re: Upgrading from Reef v18.2.1 to v18.2.2

2024-03-21 Thread Adam King
> > Hi, > > On 3/21/24 14:50, Michael Worsham wrote: > > > > Now that Reef v18.2.2 has come out, is there a set of instructions on > how to upgrade to the latest version via using Cephadm? > > Yes, there is: https://docs.ceph.com/en/reef/cephadm/upgrade/ > Just a note on that docs section, it

[ceph-users] Re: ceph-volume fails when adding spearate DATA and DATA.DB volumes

2024-03-06 Thread Adam King
If you want to be directly setting up the OSDs using ceph-volume commands (I'll pretty much always recommend following https://docs.ceph.com/en/latest/cephadm/services/osd/#dedicated-wal-db over manual ceph-volume stuff in cephadm deployments unless what you're doing can't be done with the spec

[ceph-users] Re: Ceph reef mon is not starting after host reboot

2024-03-06 Thread Adam King
When you ran this, was it directly on the host, or did you run `cephadm shell` first? The two things you tend to need to connect to the cluster (that "RADOS timed out" error is generally what you get when connecting to the cluster fails. A bunch of different causes all end with that error) are a

[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Adam King
There was a bug with this that was fixed by https://github.com/ceph/ceph/pull/52122 (which also specifically added an integration test for this case). It looks like it's missing a reef and quincy backport though unfortunately. I'll try to open one for both. On Tue, Mar 5, 2024 at 8:26 AM Eugen

[ceph-users] Re: Ceph orch doesn't execute commands and doesn't report correct status of daemons

2024-03-03 Thread Adam King
Okay, it seems like from what you're saying the RGW image itself isn't special compared to the other ceph daemons, it's just that you want to use the image on your local registry. In that case, I would still recommend just using `ceph orch upgrade start --image ` with the image from your local
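A rough example of that approach, assuming a local registry at registry.local:5000 (hostname and tag are placeholders):

    ceph orch upgrade start --image registry.local:5000/ceph/ceph:v17.2.7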

[ceph-users] Re: [Quincy] NFS ingress mode haproxy-protocol not recognized

2024-03-03 Thread Adam King
According to https://tracker.ceph.com/issues/58933, that was only backported as far as reef. If I remember correctly, the reason for that was that the ganesha version we were including in our quincy containers wasn't new enough to support the feature on that end, so backporting the

[ceph-users] Re: Ceph orch doesn't execute commands and doesn't report correct status of daemons

2024-03-01 Thread Adam King
There have been bugs in the past where things have gotten "stuck". Usually I'd say check the REFRESHED column in the output of `ceph orch ps`. It should refresh the daemons on each host roughly every 10 minutes, so if you see some value much larger than that, things are probably actually stuck. If

[ceph-users] Re: Migration from ceph-ansible to Cephadm

2024-02-29 Thread Adam King
> > - I still have the ceph-crash container, what should I do with it? > If it's the old one, I think you can remove it. Cephadm can deploy its own crash service (`ceph orch apply crash` if it hasn't). You can check if `crash` is listed under `ceph orch ls` and if it is there you can do `ceph
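A quick way to check for and, if needed, deploy the cephadm-managed crash service (a sketch of the commands mentioned above):

    ceph orch ls crash
    # if nothing is listed, let cephadm manage it
    ceph orch apply crash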

[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread Adam King
In regards to > > From the reading you gave me I have understood the following : > 1 - Set osd_memory_target_autotune to true then set > autotune_memory_target_ratio to 0.2 > 2 - Or do the math. For my setup I have 384GB per node, each node has 4 > nvme disks of 7.6TB, 0.2 of memory is 19.5G. So

[ceph-users] Re: Some questions about cephadm

2024-02-21 Thread Adam King
Cephadm does not have some variable that explicitly says it's an HCI deployment. However, the HCI variable in ceph ansible I believe only controlled the osd_memory_target attribute, which would automatically set it to 20% or 70% respectively of the memory on the node divided by the number of OSDs
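As a worked example with illustrative numbers (not from this thread): a converged node with 256 GiB of RAM and 12 OSDs would get roughly 0.2 * 256 GiB / 12 ≈ 4.3 GiB per OSD at the 20% HCI ratio, versus roughly 0.7 * 256 GiB / 12 ≈ 14.9 GiB per OSD at the 70% non-HCI ratio.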

[ceph-users] Re: first_virtual_router_id not allowed in ingress manifest

2024-02-21 Thread Adam King
It seems the quincy backport for that feature ( https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According to the quincy part of https://docs.ceph.com/en/latest/releases/#release-timeline it looks like that would mean it would only be present in 17.2.7, but not 17.2.6. On Wed,

[ceph-users] Re: Pacific Bug?

2024-02-14 Thread Adam King
Does seem like a bug, actually in more than just this command. The `ceph orch host ls` with the --label and/or --host-pattern flag just piggybacks off of the existing filtering done for placements in service specs. I've just taken a look and you actually can create the same behavior with the

[ceph-users] Re: Pacific: Drain hosts does not remove mgr daemon

2024-01-31 Thread Adam King
If you just manually run `ceph orch daemon rm ` does it get removed? I know there's some logic in host drain that does some ok-to-stop checks that can cause things to be delayed or stuck if it doesn't think it's safe to remove the daemon for some reason. I wonder if it's being overly cautious
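For reference, the manual removal looks roughly like this (the daemon name is a placeholder taken from `ceph orch ps` output):

    ceph orch daemon rm mgr.host1.abcdef
    # if cephadm refuses because of the ok-to-stop checks, --force skips them
    ceph orch daemon rm mgr.host1.abcdef --force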

[ceph-users] CLT meeting notes January 24th 2024

2024-01-24 Thread Adam King
- Build/package PRs- who to best review these? - Example: https://github.com/ceph/ceph/pull/55218 - Idea: create a GitHub team specifically for these types of PRs https://github.com/orgs/ceph/teams - Laura will try to organize people for the group - Pacific 16.2.15 status

[ceph-users] Re: nfs export over RGW issue in Pacific

2023-12-07 Thread Adam King
Handling of nfs exports over rgw, including the `ceph nfs export create rgw` command, wasn't added to the nfs module in pacific until 16.2.7. On Thu, Dec 7, 2023 at 1:35 PM Adiga, Anantha wrote: > Hi, > > > oot@a001s016:~# cephadm version > > Using recent ceph image
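On releases that do have it, the export creation looks roughly like this (cluster id, pseudo path and bucket are placeholders, and the exact argument syntax varies a bit between releases):

    ceph nfs export create rgw --cluster-id mynfs --pseudo-path /mybucket --bucket mybucket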

[ceph-users] Re: error deploying ceph

2023-11-30 Thread Adam King
System, Insufficient space (<10 > extents) on vgs, LVM detected > node3-ceph /dev/xvdb ssd 100G N/A >N/A No 27m agoHas a FileSystem, Insufficient space (<10 > extents) on vgs, LVM detected > root@node1-ceph:~# > &g

[ceph-users] Re: error deploying ceph

2023-11-29 Thread Adam King
ls: 0 pools, 0 pgs > objects: 0 objects, 0 B > usage: 0 B used, 0 B / 0 B avail > pgs: > > root@node1-ceph:~# > > Regards > > > > On Wed, Nov 29, 2023 at 5:45 PM Adam King wrote: > >> I think I remember a bug that happened when there was

[ceph-users] Re: error deploying ceph

2023-11-29 Thread Adam King
I think I remember a bug that happened when there was a small mismatch between the cephadm version being used for bootstrapping and the container. In this case, the cephadm binary used for bootstrap knows about the ceph-exporter service and the container image being used does not. The
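One way to keep the two in sync is to pin the bootstrap to the image that matches the cephadm binary; a sketch (image tag and IP are placeholders):

    cephadm --image quay.io/ceph/ceph:v18.2.1 bootstrap --mon-ip 10.0.0.1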

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-16 Thread Adam King
ing. > > Travis, Adam King - any need to rerun any suites? > > On Thu, Nov 16, 2023 at 7:14 AM Guillaume Abrioux > wrote: > > > > Hi Yuri, > > > > > > > > Backport PR [2] for reef has been merged. > > > > > > >

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-14 Thread Adam King
t; ran the tests below and asking for approvals: > > smoke - Laura > rados/mgr - PASSED > rados/dashboard - Nizamudeen > orch - Adam King > > See Build 4 runs - https://tracker.ceph.com/issues/63443#note-1 > > On Tue, Nov 14, 2023 at 12:21 AM Redouane Kachach > wrote:

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Adam King
> > https://tracker.ceph.com/issues/63151 - Adam King do we need anything for > this? > Yes, but not an actual code change in the main ceph repo. I'm looking into a ceph-container change to alter the ganesha version in the container as a solution. On Wed, Nov 8, 2023 at 11:10 AM Yu

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Adam King
https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd > > Still seeing approvals. > smoke - Laura, Radek, Prashant, Venky in progress > rados - Neha, Radek, Travis, Ernesto, Adam King > rgw - Casey in progress > fs - Venky > orch - Adam King > rbd - Ilya a

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Adam King
> Should it be fixed for this release? > > Seeking approvals/reviews for: > > smoke - Laura > rados - Laura, Radek, Travis, Ernesto, Adam King > > rgw - Casey > fs - Venky > orch - Adam King > > rbd - Ilya > krbd - Ilya > > upgrade/quincy-p2p - Known is

[ceph-users] CLT weekly notes October 11th 2023

2023-10-11 Thread Adam King
d dropping that as a build target - Last Pacific? - Yes, 17.2.7, then 18.2.1, then 16.2.15 (final) - PTLs will need to go through and find what backports still need to get into pacific - A lot of open pacific backports right now Thanks, -

[ceph-users] Re: cephadm, cannot use ECDSA key with quincy

2023-10-10 Thread Adam King
The CA signed keys working in pacific was sort of accidental. We found out that it was a working use case in pacific but not in quincy earlier this year, which resulted in this tracker https://tracker.ceph.com/issues/62009. That has since been implemented in main, and backported to the reef branch

[ceph-users] Re: ceph orch osd data_allocate_fraction does not work

2023-09-21 Thread Adam King
Looks like the orchestration side support for this got brought into pacific with the rest of the drive group stuff, but the actual underlying feature in ceph-volume (from https://github.com/ceph/ceph/pull/40659) never got a pacific backport. I've opened the backport now

[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-28 Thread Adam King
up in the Jenkins api check, where these kinds of >> conditions are expected. In that case, I would call #1 more of a test >> issue, and say that the fix is to whitelist the warning for that test. >> Would be good to have someone from CephFS weigh in though-- @Patrick >

[ceph-users] Re: cephadm to setup wal/db on nvme

2023-08-23 Thread Adam King
This should be possible by specifying "data_devices" and "db_devices" fields in the OSD spec file, each with different filters. There are some examples in the docs https://docs.ceph.com/en/latest/cephadm/services/osd/#the-simple-case that show roughly how that's done, and some other sections (
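A minimal sketch of such a spec (the rotational filters are illustrative and should be adapted to what `ceph orch device ls` reports), applied with `ceph orch apply -i osd_spec.yml`:

    service_type: osd
    service_id: osd_with_nvme_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0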

[ceph-users] Re: osdspec_affinity error in the Cephadm module

2023-08-16 Thread Adam King
it looks like you've hit https://tracker.ceph.com/issues/58946 which has a candidate fix open, but nothing merged. The description on the PR with the candidate fix says "When osdspec_affinity is not set, the drive selection code will fail. This can happen when a device has multiple LVs where some

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Adam King
I've seen this before where the ceph-volume process hanging causes the whole serve loop to get stuck (we have a patch to get it to timeout properly in reef and are backporting to quincy but nothing for pacific unfortunately). That's why I was asking about the REFRESHED column in the orch ps/ orch

[ceph-users] Re: Cephadm adoption - service reconfiguration changes container image

2023-08-15 Thread Adam King
you could maybe try running "ceph config set global container_image quay.io/ceph/ceph:v16.2.9" before running the adoption. It seems it still thinks it should be deploying mons with the default image ( docker.io/ceph/daemon-base:latest-pacific-devel ) for some reason and maybe that config option is why.

[ceph-users] Re: ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Adam King
with the log to cluster level already on debug, if you do a "ceph mgr fail" what does cephadm log to the cluster before it reports sleeping? It should at least be doing something if it's responsive at all. Also, in "ceph orch ps" and "ceph orch device ls" are the REFRESHED columns reporting that
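The debug-logging loop being described, roughly:

    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph mgr fail
    # wait a minute or two, then:
    ceph log last 100 debug cephadm
    ceph orch ps          # check the REFRESHED column
    ceph orch device ls   # same here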

[ceph-users] Re: ref v18.2.0 QE Validation status

2023-07-31 Thread Adam King
ein wrote: > Details of this release are summarized here: > > https://tracker.ceph.com/issues/62231#note-1 > > Seeking approvals/reviews for: > > smoke - Laura, Radek > rados - Neha, Radek, Travis, Ernesto, Adam King > rgw - Casey > fs - Venky > orch - Adam King > rbd

[ceph-users] Re: cephadm logs

2023-07-28 Thread Adam King
Not currently. Those logs aren't generated by any daemons; they come directly from anything done by the cephadm binary on the host, which tends to be quite a bit, since the cephadm mgr module runs most of its operations on the host through a copy of the cephadm binary. It doesn't log to journal

[ceph-users] Re: Failing to restart mon and mgr daemons on Pacific

2023-07-25 Thread Adam King
.OrchestratorError: cephadm exited with an error > code: 1, stderr:Deploy daemon node-exporter.darkside1 ... > Verifying port 9100 ... > Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address already in use > ERROR: TCP Port(s) '9100' required for node-exporter already in use > >

[ceph-users] Re: Failing to restart mon and mgr daemons on Pacific

2023-07-24 Thread Adam King
The logs you probably really want to look at here are the journal logs from the mgr and mon. If you have a copy of the cephadm tool on the host, you can do a "cephadm ls --no-detail | grep systemd" to list out the systemd unit names for the ceph daemons on the host, or just find the systemd
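A rough example, with placeholder fsid and daemon name:

    cephadm ls --no-detail | grep systemd
    # then follow the unit it prints, e.g.
    journalctl -u ceph-<fsid>@mgr.<host>.<id>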

[ceph-users] Re: cephadm does not redeploy OSD

2023-07-19 Thread Adam King
b_uuid": "CUMgp7-Uscn-ASLo-bh14-7Sxe-80GE-EcywDb", > > "name": "osd-block-db-5cb8edda-30f9-539f-b4c5-dbe420927911", > > "osd_fsid": "089894cf-1782-4a3a-8ac0-9dd043f80c71", > > "osd_id": "7", >

[ceph-users] Re: cephadm does not redeploy OSD

2023-07-18 Thread Adam King
in the "ceph orch device ls --format json-pretty" output, in the blob for that specific device, is the "ceph_device" field set? There was a bug where it wouldn't be set at all (https://tracker.ceph.com/issues/57100) and it would make it so you couldn't use a device serving as a db device for any

[ceph-users] Re: CEPHADM_FAILED_SET_OPTION

2023-07-18 Thread Adam King
Someone hit what I think is this same issue the other day. Do you have a "config" section in your rgw spec that sets the "rgw_keystone_implicit_tenants" option to "True" or "true"? For them, changing the value to be 1 (which should be equivalent to "true" here) instead of "true" fixed it. Likely

[ceph-users] Re: CEPHADM_FAILED_SET_OPTION

2023-07-13 Thread Adam King
ck interval: 30 > rgw usage max shards: 32 > rgw usage max user shards: 1 > spec: > rgw_frontend_port: 8100 > > I deleted the 'rgw keystone implicit tenants’ settings now, and the > warning disappeared. Seems like it has been deprecated? The warning message > is very m

[ceph-users] Re: CEPHADM_FAILED_SET_OPTION

2023-07-13 Thread Adam King
Do you have a `config` section in your RGW spec? That health warning is from cephadm trying to set options from a spec section like that. There's a short bit about it at the top of https://docs.ceph.com/en/latest/cephadm/services/#service-specification. On Thu, Jul 13, 2023 at 3:39 AM wrote: >
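For reference, that section sits at the top level of the spec and looks roughly like this (service id, placement and values are illustrative; a follow-up in this thread found the integer form 1 worked where the string "true" did not):

    service_type: rgw
    service_id: myrealm.myzone
    placement:
      count: 2
    config:
      rgw_keystone_implicit_tenants: 1
    spec:
      rgw_frontend_port: 8100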

[ceph-users] CLT Meeting Notes June 28th, 2023

2023-06-28 Thread Adam King
Reef RC linking failure on Alpine Linux. Do we worry about that? 1. https://tracker.ceph.com/issues/61718 2. Nice to fix, but not a requirement 3. If there are patches available, we should accept them, but probably don't put too much work into it currently debian bullseye build

[ceph-users] Re: OSDs cannot join cluster anymore

2023-06-24 Thread Adam King
Reminds me of https://tracker.ceph.com/issues/57007 which wasn't fixed in pacific until 16.2.11, so this is probably just the result of a cephadm bug unfortunately. On Fri, Jun 23, 2023 at 5:16 PM Malte Stroem wrote: > Hello Eugen, > > thanks. > > We found the cause. > > Somehow all > >

[ceph-users] Re: Error while adding host : Error EINVAL: Traceback (most recent call last): File /usr/share/ceph/mgr/mgr_module.py, line 1756, in _handle_command

2023-06-20 Thread Adam King
There was a cephadm bug that wasn't fixed by the time 17.2.6 came out (I'm assuming that's the version being used here, although it may have been present in some slightly earlier quincy versions) that caused this misleading error to be printed out when adding a host failed. There's a tracker for

[ceph-users] Re: stray daemons not managed by cephadm

2023-06-12 Thread Adam King
if you do a mgr failover ("ceph mgr fail") and wait a few minutes, do the issues clear out? I know there's a bug where removed mons get marked as stray daemons when downsizing by multiple mons at once (cephadm might be removing them too quickly, not totally sure of the cause), but doing a mgr

[ceph-users] Re: change user root to non-root after deploy cluster by cephadm

2023-06-07 Thread Adam King
When you try to change the user using "ceph cephadm set-user" (or any of the other commands that change ssh settings) it will attempt a connection to a random host with the new settings, and run the "cephadm check-host" command on that host. If that fails, it will change the setting back and
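For completeness, the ssh-related settings live behind a small set of commands; roughly (user, paths, and host are placeholders):

    ceph cephadm set-user deployer
    ceph cephadm set-priv-key -i ~/.ssh/id_rsa
    ceph cephadm set-pub-key -i ~/.ssh/id_rsa.pub
    ceph cephadm check-host host1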

[ceph-users] Re: reef v18.1.0 QE Validation status

2023-05-31 Thread Adam King
> > Seeking approvals/reviews for: > > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to > merge https://github.com/ceph/ceph/pull/51788 for > the core) > rgw - Casey > fs - Venky > orch - Adam King > rbd - Ilya > krbd - Ilya > upgrade/octopus-x - dep

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Adam King
(and since I found the shell processes, I can verify I didn't > have a typo ;-) ) > > Regarding broken record: I'm extremly thankful for your support. And I > should have checked that earlier. We all know that sometimes it's the > least probable things that go sideways. So

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Adam King
> > I don't see any information about the orchestrator module having > crashed. It's running as always. > > From the the prior problem I had some issues in my cephfs pools. So, > maybe there's something broken in the .mgr pool? Could that be a reason > for this behaviour? I

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-05-15 Thread Adam King
As you already seem to have figured out, "ceph orch device ls" is populated with the results from "ceph-volume inventory". My best guess to try and debug this would be to manually run "cephadm ceph-volume -- inventory" (the same as "cephadm ceph-volume inventory", I just like to separate the

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Adam King
This is sort of similar to what I said in a previous email, but the only way I've seen this happen in other setups is through hanging cephadm commands. The debug process has been, do a mgr failover, wait a few minutes, see in "ceph orch ps" and "ceph orch device ls" which hosts have and have not

[ceph-users] Re: cephadm does not honor container_image default value

2023-05-15 Thread Adam King
I think with the `config set` commands there is logic to notify the relevant mgr modules and update their values. That might not exist with `config rm`, so it's still using the last set value. Looks like a real bug. Curious what happens if the mgr restarts after the `config rm`. Whether it goes

[ceph-users] Re: docker restarting lost all managers accidentally

2023-05-10 Thread Adam King
in /var/lib/ceph// on the host with that mgr reporting the error, there should be a unit.run file that shows what is being done to start the mgr as well as a few files that get mounted into the mgr on startup, notably the "config" and "keyring" files. That config file should include the mon host
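A quick way to inspect those files (fsid and daemon name are placeholders):

    cat /var/lib/ceph/<fsid>/mgr.<host>.<id>/unit.run
    grep mon_host /var/lib/ceph/<fsid>/mgr.<host>.<id>/config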

[ceph-users] Re: non root deploy ceph 17.2.5 failed

2023-05-09 Thread Adam King
which I think was merged too late* (as in the patch wouldn't be in 17.2.6) On Tue, May 9, 2023 at 5:52 PM Adam King wrote: > What's the umask for the "deployer" user? We saw an instance of someone > hitting something like this, but for them it seemed to only happen when &g

[ceph-users] Re: non root deploy ceph 17.2.5 failed

2023-05-09 Thread Adam King
What's the umask for the "deployer" user? We saw an instance of someone hitting something like this, but for them it seemed to only happen when they had changed the umask to 027. We had patched in https://github.com/ceph/ceph/pull/50736 to address it, which I don't think was merged too late for

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-04 Thread Adam King
(BTW, my cephadmin user can run "sudo which python3" without prompting > password on other hosts now, but nothing has been solved) > > Best regards, > Reza > > On Tue, 2 May 2023 at 19:00, Adam King wrote: > >> The number of mgr daemons thing is expected.

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Adam King
what does specifically `ceph log last 200 debug cephadm` spit out? The log lines you've posted so far I don't think are generated by the orchestrator so curious what the last actions it took was (and how long ago). On Thu, May 4, 2023 at 10:35 AM Thomas Widhalm wrote: > To completely rule out

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Adam King
First thing I always check when it seems like orchestrator commands aren't doing anything is "ceph orch ps" and "ceph orch device ls" and check the REFRESHED column. If it's well above 10 minutes for orch ps or 30 minutes for orch device ls, then it means the orchestrator is most likely hanging on

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-02 Thread Adam King
etch mode on this cluster. > > I don't understand why Quincy MGRs cannot ssh into Pacific nodes, if you > have any more hints I would be really glad to hear. > > Best regards, > Reza > > > > On Wed, 12 Apr 2023 at 17:18, Adam King wrote: > >> Ah, okay. Someone else had

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-01 Thread Adam King
approved for the rados/cephadm stuff On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein wrote: > Details of this release are summarized here: > > https://tracker.ceph.com/issues/59542#note-1 > Release Notes - TBD > > Seeking approvals for: > > smoke - Radek, Laura > rados - Radek, Laura > rook -

[ceph-users] Re: pacific v16.2.1 (hot-fix) QE Validation status

2023-04-12 Thread Adam King
of time. On Wed, Apr 12, 2023 at 11:28 AM Yuri Weinstein wrote: > Details of this release are summarized here: > > https://tracker.ceph.com/issues/59426#note-3 > Release Notes - TBD > > Seeking approvals/reviews for: > > smoke - Josh approved? >

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-04-12 Thread Adam King
st1 at addr (x.x.x.x) > > As I can see here, it turns out sudo is added to the code to be able to > continue: > > > https://github.com/ceph/ceph/blob/v17.2.5/src/pybind/mgr/cephadm/ssh.py#L143 > > I cannot privilege the cephadmin user to run sudo commands for some

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
5.pqxmvt ceph05 error 32m ago > 9M-- > mds.mds01.ceph06.rrxmks ceph06 error 32m ago > 10w-- > mds.mds01.ceph07.omdisd ceph07 error 32m ago > 2M--

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
Will also note that the normal upgrade process scales down the mds service to have only 1 mds per fs before upgrading it, so maybe something you'd want to do as well if the upgrade didn't do it already. It does so by setting the max_mds to 1 for the fs. On Mon, Apr 10, 2023 at 3:51 PM Adam King

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
You could try pausing the upgrade and manually "upgrading" the mds daemons by redeploying them on the new image. Something like "ceph orch daemon redeploy --image <17.2.6 image>" (daemon names should match those in "ceph orch ps" output). If you do that for all of them and then get them into an
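Sketching that out with a daemon name taken from the output quoted above (image tag per the target release):

    ceph orch upgrade pause
    ceph orch ps --daemon-type mds   # get the daemon names
    ceph orch daemon redeploy mds.mds01.ceph05.pqxmvt --image quay.io/ceph/ceph:v17.2.6
    # repeat for the remaining mds daemons, then resume
    ceph orch upgrade resume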

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-04-06 Thread Adam King
Does "ceph health detail" give any insight into what the unexpected exception was? If not, I'm pretty confident some traceback would end up being logged. Could maybe still grab it with "ceph log last 200 info cephadm" if not a lot else has happened. Also, probably need to find out if the

[ceph-users] Re: ceph orch ps mon, mgr, osd shows for version, image and container id

2023-03-31 Thread Adam King
her items missing are PORTS, STATUS (time), MEM USE, > > NAME > HOST > PORTSSTATUS REFRESHED AGE MEM USE MEM LIM > VERSION IMAGE ID CONTAINER ID > > rgw.default.default.zp31

[ceph-users] Re: Cephadm - Error ENOENT: Module not found

2023-03-30 Thread Adam King
for the specific issue with that traceback, you can probably resolve that by removing the stored upgrade state. We put it at `mgr/cephadm/upgrade_state` I believe (can check "ceph config-key ls" and look for something related to upgrade state if that doesn't work) so running "ceph config-key rm
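Roughly, the cleanup being described (look at what is stored before removing anything):

    ceph config-key ls | grep upgrade
    ceph config-key rm mgr/cephadm/upgrade_state
    ceph mgr fail   # optionally restart the active mgr so the module drops any cached state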

[ceph-users] Re: ceph orch ps mon, mgr, osd shows for version, image and container id

2023-03-30 Thread Adam King
if you put a copy of the cephadm binary onto one of these hosts (e.g. a002s002) and run "cephadm ls" what does it give for the OSDs? That's where the orch ps information comes from. On Thu, Mar 30, 2023 at 10:48 AM wrote: > Hi , > > Why is ceph orch ps showing ,unknown version, image and

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-22 Thread Adam King
y it happened 3 times in the initial run but never in the reruns, but the failure came from that, and the upgrade itself seems to still work fine. - Adam King ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Upgrade 16.2.11 -> 17.2.0 failed

2023-03-14 Thread Adam King
That's very odd, I haven't seen this before. What container image is the upgraded mgr running on (to know for sure, can check the podman/docker run command at the end of the /var/lib/ceph//mgr./unit.run file on the mgr's host)? Also, could maybe try "ceph mgr module enable cephadm" to see if it

[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR

2023-03-10 Thread Adam King
The things in "ceph orch ps" output are gathered by checking the contents of the /var/lib/ceph// directory on the host. Those "cephadm." files get deployed normally though, and aren't usually reported in "ceph orch ps" as it should only report things that are directories rather than files. You

[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-07 Thread Adam King
> > Current cluster status says healthy but I cannot deploy new daemons, the >> mgr information isnt refreshing (5 days old info) under hosts and services >> but the main dashboard is accurate like ceph -s >> Ceph -s will show accurate information but things like ceph orch ps >> --daemon-type mgr

[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR

2023-03-07 Thread Adam King
that looks like it was expecting a json structure somewhere and got a blank string. Is there anything in the logs (ceph log last 100 info cephadm)? If not, might be worth trying a couple mgr failovers (I'm assuming only one got upgraded, so first failover would go back to the 15.2.17 one and then

[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-06 Thread Adam King
Can I see the output of `ceph orch upgrade status` and `ceph config dump | grep image`? The "Pulling container image stop" implies somehow (as Eugen pointed out) that cephadm thinks the image to pull is named "stop" which means it is likely set as either the image to upgrade to or as one of the
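The two checks being asked for, plus one way out if a stray value does show up (hedged, since the root cause isn't confirmed yet in this thread):

    ceph orch upgrade status
    ceph config dump | grep image
    # if the upgrade target is literally "stop", clear it:
    ceph orch upgrade stop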

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Adam King
INFO > cephadm.services.osd] Found osd claims for drivegroup None -> > {'nautilus2': ['7']} > Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log > [INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']} > > But I see no attempt to actually

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Adam King
ll correctly apply the identical drivegroup.yml > and when not. Anyway, the conclusion is to not interfere with cephadm > (nothing new here), but since the drivegroup was not applied correctly > I assumed I had to "help out" a bit by manually deploying an OSD. > > Thanks, > Eug

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Adam King
Going off of ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json you could try passing "--keyring -- lvm create'. I'm guessing it's trying to run the osd tree command within a container and I know cephadm mounts keyrings passed to

[ceph-users] Re: Any issues with podman 4.2 and Quincy?

2023-02-13 Thread Adam King
That table is definitely a bit out of date. We've been doing some testing with more recent podman versions and the only issues I'm aware of specific to the podman version are https://tracker.ceph.com/issues/58532 and https://tracker.ceph.com/issues/57018 (which are really the same issue affecting

[ceph-users] Re: CEPHADM_STRAY_DAEMON does not exist, how do I remove knowledge of it from ceph?

2023-02-01 Thread Adam King
I know there's a bug where when downsizing by multiple mons at once through cephadm this ghost stray mon daemon thing can end up happening (I think something about cephadm removing them too quickly in succession, not totally sure). In those cases, just doing a mgr failover ("ceph mgr fail") always

[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-20 Thread Adam King
cephadm approved. Known failures. On Fri, Jan 20, 2023 at 11:39 AM Yuri Weinstein wrote: > The overall progress on this release is looking much better and if we > can approve it we can plan to publish it early next week. > > Still seeking approvals > > rados - Neha, Laura > rook - Sébastien Han

[ceph-users] Re: 16.2.11 pacific QE validation status

2022-12-19 Thread Adam King
cephadm approved. rados/cephadm failures are mostly caused by https://github.com/ceph/ceph/pull/49285 not being merged (which just touches tests and docs so wouldn't block a release). Thanks - Adam King On Thu, Dec 15, 2022 at 12:15 PM Yuri Weinstein wrote: > Details of this rele

[ceph-users] Re: How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Adam King
reaker? > -- > *From:* Adam King > *Sent:* Friday, December 2, 2022 2:48:19 PM > *To:* Sake Paulusma > *Cc:* ceph-users@ceph.io > *Subject:* Re: [ceph-users] How to replace or add a monitor in stretch > cluster? > > This can't be done in a

[ceph-users] Re: How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Adam King
This can't be done in a very nice way currently. There's actually an open PR against main to allow setting the crush location for mons in the service spec specifically because others found that this was annoying as well. What I think should work as a workaround is, go to the host where the mon
