At least for the current up-to-date reef branch (not sure what reef version
you're on) when --image is not provided to the shell, it should try to
infer the image in this order
1. from the CEPHADM_IMAGE env. variable
2. if you pass --name with a daemon name to the shell command, it will
- DigitalOcean credits
- things to ask
- what would promotional material require
- how much are credits worth
- Neha to ask
- 19.1.0 centos9 container status
- close to being ready
- will be building centos 8 and 9 containers simultaneously
- should test
>
> - Although I can mount the export I can't write on it
>
> What error are you getting trying to do the write? The way you set things
up doesn't look too different from one of our integration tests for ingress
over nfs (
FWIW, cephadm uses `quay.io/ceph/ceph-grafana:9.4.7` as the default grafana
image in the quincy branch
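If you want cephadm to deploy a different grafana image, you can override the one it uses and redeploy the service; a minimal sketch (the tag here is just the default mentioned above, substitute whichever you want):

    ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:9.4.7
    ceph orch redeploy grafana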
On Tue, Apr 23, 2024 at 11:59 AM Osama Elswah
wrote:
> Hi,
>
>
> in quay.io I can find a lot of grafana versions for ceph (
> https://quay.io/repository/ceph/grafana?tab=tags) how can I find
ph/ceph/pull/56714>
On Tue, Apr 16, 2024 at 1:39 PM Laura Flores wrote:
> On behalf of @Radoslaw Zarzynski , rados approved.
>
> Below is the summary of the rados suite failures, divided by component. @Adam
> King @Venky Shankar PTAL at the
> orch and cephfs failures to se
es, still trying, Laura PTL
>
> rados - Radek, Laura approved? Travis? Nizamudeen?
>
> rgw - Casey approved?
> fs - Venky approved?
> orch - Adam King approved?
>
> krbd - Ilya approved
> powercycle - seems fs related, Venky, Brad PTL
>
> ceph-volume - will
;>
>> Let me just finish tucking in a devilish tyke here and I'll get to it
>> first thing
>>
>> On Tue, Apr 9, 2024 at 18:09, Adam King wrote:
>>
>>> I did end up writing a unit test to see what we calculated here, as well
>>> as adding a bunch
ry_total_kb": 32827840,
>
> On Thu, Apr 4, 2024 at 10:14 PM Adam King wrote:
>
>> Sorry to keep asking for more info, but can I also get what `cephadm
>> gather-facts` on that host returns for "memory_total_kb". Might end up
>> creating a unit test out o
running (3w)
> 7m ago   11M   2698M   4096M   17.2.6
> osd.9   my-ceph01   running (3w)
> 7m ago   11M   3364M   4096M   17.2.6
> prometheus.my-ceph01 my-ceph01 *:9095 running (3w) 7m
> ago   13M   164M   -   2.42
First, I guess I would make sure that peon7 and peon12 actually could pass
the host check (you can run "cephadm check-host" on the host directly if
you have a copy of the cephadm binary there) Then I'd try a mgr failover
(ceph mgr fail) to clear out any in memory host values cephadm might have
and
https://tracker.ceph.com/issues/64428 should be it. Backports are done for
quincy, reef, and squid and the patch will be present in the next release
for each of those versions. There isn't a pacific backport as, afaik, there
are no more pacific releases planned.
On Fri, Mar 29, 2024 at 6:03 PM
From what I can see with the most recent cephadm binary on pacific, unless
you have the CEPHADM_IMAGE env variable set, it does a `podman images
--filter label=ceph=True --filter dangling=false` (or docker) and takes the
first image in the list. It seems to be getting sorted by creation time by
No, you can't use the image id for the upgrade command, it has to be the
image name. So it should start, based on what you have,
registry.redhat.io/rhceph/. As for the full name, it depends which image
you want to go with. As for trying this on an OSD first, there is `ceph
orch daemon redeploy
From the ceph versions output I can see
"osd": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
},
It seems like all the OSD daemons on this cluster are using that
16.2.10-160 image, and I'm guessing most of them are running, so it
orch
> ps. Then again, they are nowhere near the values stated in min_size_by_type
> that you list.
> Obviously yes, I could disable the auto tuning, but that would leave me
> none the wiser as to why this exact host is trying to do this.
>
>
>
> On Tue, Mar 26, 2024 at 10:20 PM
For context, the value the autotune goes with takes the value from `cephadm
gather-facts` on the host (the "memory_total_kb" field) and then subtracts
from that per daemon on the host according to
min_size_by_type = {
'mds': 4096 * 1048576,
'mgr': 4096 * 1048576,
>
> Hi,
>
> On 3/21/24 14:50, Michael Worsham wrote:
> >
> > Now that Reef v18.2.2 has come out, is there a set of instructions on
> how to upgrade to the latest version via using Cephadm?
>
> Yes, there is: https://docs.ceph.com/en/reef/cephadm/upgrade/
>
Just a note on that docs section, it
If you want to be directly setting up the OSDs using ceph-volume commands
(I'll pretty much always recommend following
https://docs.ceph.com/en/latest/cephadm/services/osd/#dedicated-wal-db over
manual ceph-volume stuff in cephadm deployments unless what you're doing
can't be done with the spec
When you ran this, was it directly on the host, or did you run `cephadm
shell` first? The two things you tend to need to connect to the cluster
(that "RADOS timed out" error is generally what you get when connecting to
the cluster fails. A bunch of different causes all end with that error) are
a
There was a bug with this that was fixed by
https://github.com/ceph/ceph/pull/52122 (which also specifically added an
integration test for this case). It looks like it's missing a reef and
quincy backport though unfortunately. I'll try to open one for both.
On Tue, Mar 5, 2024 at 8:26 AM Eugen
Okay, it seems like from what you're saying the RGW image itself isn't
special compared to the other ceph daemons, it's just that you want to use
the image on your local registry. In that case, I would still recommend
just using `ceph orch upgrade start --image <image>` with the image
from your local
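As a rough sketch of that (the registry host, repository, and tag are placeholders for whatever you pushed to your local registry):

    ceph orch upgrade start --image registry.example.local:5000/rhceph/rhceph:latest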
According to https://tracker.ceph.com/issues/58933, that was only
backported as far as reef. If I remember correctly, the reason for that was
the ganesha version itself we were including in our quincy containers
wasn't new enough to support the feature on that end, so backporting the
There have been bugs in the past where things have gotten "stuck". Usually
I'd say check the REFRESHED column in the output of `ceph orch ps`. It
should refresh the daemons on each host roughly every 10 minutes, so if you
see some value much larger than that, things are probably actually stuck.
If
>
> - I still have the ceph-crash container, what should I do with it?
>
If it's the old one, I think you can remove it. Cephadm can deploy its own
crash service (`ceph orch apply crash` if it hasn't). You can check if
`crash` is listed under `ceph orch ls` and if it is there you can do `ceph
In regards to
>
> From the reading you gave me I have understood the following :
> 1 - Set osd_memory_target_autotune to true then set
> autotune_memory_target_ratio to 0.2
> 2 - Or do the math. For my setup I have 384GB per node, each node has 4
> nvme disks of 7.6TB, 0.2 of memory is 19.5G. So
Cephadm does not have some variable that explicitly says it's an HCI
deployment. However, the HCI variable in ceph ansible I believe only
controlled the osd_memory_target attribute, which would automatically set
it to 20% (HCI) or 70% (non-HCI) of the memory on the node divided by the
number of OSDs
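As a rough worked example (illustrative numbers only): a node with 384GB of RAM and 4 OSDs would get an osd_memory_target of roughly 384 * 0.2 / 4 ≈ 19.2GB per OSD with the 20% (HCI) ratio, or 384 * 0.7 / 4 ≈ 67.2GB with the 70% ratio.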
It seems the quincy backport for that feature (
https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According
to the quincy part of
https://docs.ceph.com/en/latest/releases/#release-timeline it looks like
that would mean it would only be present in 17.2.7, but not 17.2.6.
On Wed,
Does seem like a bug, actually in more than just this command. The `ceph
orch host ls` with the --label and/or --host-pattern flag just piggybacks
off of the existing filtering done for placements in service specs. I've
just taken a look and you actually can create the same behavior with the
If you just manually run `ceph orch daemon rm
<daemon-name>` does it get removed? I know there's
some logic in host drain that does some ok-to-stop checks that can cause
things to be delayed or stuck if it doesn't think it's safe to remove the
daemon for some reason. I wonder if it's being overly cautious
- Build/package PRs- who to best review these?
- Example: https://github.com/ceph/ceph/pull/55218
- Idea: create a GitHub team specifically for these types of PRs
https://github.com/orgs/ceph/teams
- Laura will try to organize people for the group
- Pacific 16.2.15 status
The first handling of nfs exports over rgw in the nfs module, including the
`ceph nfs export create rgw` command, wasn't added to the nfs module in
pacific until 16.2.7.
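For reference, on versions that do have it, a minimal invocation looks roughly like this (the cluster id, pseudo path, and bucket name are placeholders):

    ceph nfs export create rgw --cluster-id mynfs --pseudo-path /mybucket --bucket mybucket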
On Thu, Dec 7, 2023 at 1:35 PM Adiga, Anantha
wrote:
> Hi,
>
>
> root@a001s016:~# cephadm version
>
> Using recent ceph image
System, Insufficient space (<10
> extents) on vgs, LVM detected
> node3-ceph /dev/xvdb ssd 100G N/A
> N/A   No   27m ago   Has a FileSystem, Insufficient space (<10
> extents) on vgs, LVM detected
> root@node1-ceph:~#
>
>
ls: 0 pools, 0 pgs
> objects: 0 objects, 0 B
> usage: 0 B used, 0 B / 0 B avail
> pgs:
>
> root@node1-ceph:~#
>
> Regards
>
>
>
> On Wed, Nov 29, 2023 at 5:45 PM Adam King wrote:
>
>> I think I remember a bug that happened when there was
I think I remember a bug that happened when there was a small mismatch
between the cephadm version being used for bootstrapping and the container.
In this case, the cephadm binary used for bootstrap knows about the
ceph-exporter service and the container image being used does not. The
ing.
>
> Travis, Adam King - any need to rerun any suites?
>
> On Thu, Nov 16, 2023 at 7:14 AM Guillaume Abrioux
> wrote:
> >
> > Hi Yuri,
> >
> >
> >
> > Backport PR [2] for reef has been merged.
> >
> >
> >
>
> ran the tests below and asking for approvals:
>
> smoke - Laura
> rados/mgr - PASSED
> rados/dashboard - Nizamudeen
> orch - Adam King
>
> See Build 4 runs - https://tracker.ceph.com/issues/63443#note-1
>
> On Tue, Nov 14, 2023 at 12:21 AM Redouane Kachach
> wrote:
>
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for
> this?
>
Yes, but not an actual code change in the main ceph repo. I'm looking into
a ceph-container change to alter the ganesha version in the container as a
solution.
On Wed, Nov 8, 2023 at 11:10 AM Yu
https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
>
> Still seeking approvals.
> smoke - Laura, Radek, Prashant, Venky in progress
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey in progress
> fs - Venky
> orch - Adam King
> rbd - Ilya a
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known is
d dropping
that as a build target
- Last Pacific?
- Yes, 17.2.7, then 18.2.1, then 16.2.15 (final)
- PTLs will need to go through and find what backports still need to get
into pacific
- A lot of open pacific backports right now
Thanks,
-
The CA signed keys working in pacific was sort of accidental. We found out
that it was a working use case in pacific but not in quincy earlier this
year, which resulted in this tracker https://tracker.ceph.com/issues/62009.
That has since been implemented in main, and backported to the reef branch
Looks like the orchestration side support for this got brought into pacific
with the rest of the drive group stuff, but the actual underlying feature
in ceph-volume (from https://github.com/ceph/ceph/pull/40659) never got a
pacific backport. I've opened the backport now
up in the Jenkins api check, where these kinds of
>> conditions are expected. In that case, I would call #1 more of a test
>> issue, and say that the fix is to whitelist the warning for that test.
>> Would be good to have someone from CephFS weigh in though-- @Patrick
>
this should be possible by specifying "data_devices" and "db_devices"
fields in the OSD spec file each with different filters. There's some
examples in the docs
https://docs.ceph.com/en/latest/cephadm/services/osd/#the-simple-case that
show roughly how that's done, and some other sections (
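A minimal spec of that shape might look like the following (a sketch only; the service_id and the rotational filters are placeholders you would adapt to your hardware):

    service_type: osd
    service_id: osd_spec_with_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0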
it looks like you've hit https://tracker.ceph.com/issues/58946 which has a
candidate fix open, but nothing merged. The description on the PR with the
candidate fix says "When osdspec_affinity is not set, the drive selection
code will fail. This can happen when a device has multiple LVs where some
I've seen this before where the ceph-volume process hanging causes the
whole serve loop to get stuck (we have a patch to get it to timeout
properly in reef and are backporting to quincy but nothing for pacific
unfortunately). That's why I was asking about the REFRESHED column in the
orch ps/ orch
you could maybe try running "ceph config set global container_image
quay.io/ceph/ceph:v16.2.9" before running the adoption. It seems it still
thinks it should be deploying mons with the default image (
docker.io/ceph/daemon-base:latest-pacific-devel ) for some reason and maybe
that config option is why.
with the log to cluster level already on debug, if you do a "ceph mgr fail"
what does cephadm log to the cluster before it reports sleeping? It should
at least be doing something if it's responsive at all. Also, in "ceph orch
ps" and "ceph orch device ls" are the REFRESHED columns reporting that
ein wrote:
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/62231#note-1
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey
> fs - Venky
> orch - Adam King
> rbd
Not currently. Those logs aren't generated by any daemons, they come
directly from anything done by the cephadm binary on the host, which tends
to be quite a bit since the cephadm mgr module runs most of its operations
on the host through a copy of the cephadm binary. It doesn't log to journal
.OrchestratorError: cephadm exited with an error
> code: 1, stderr:Deploy daemon node-exporter.darkside1 ...
> Verifying port 9100 ...
> Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address already in use
> ERROR: TCP Port(s) '9100' required for node-exporter already in use
>
>
The logs you probably really want to look at here are the journal logs from
the mgr and mon. If you have a copy of the cephadm tool on the host, you
can do a "cephadm ls --no-detail | grep systemd" to list out the systemd
unit names for the ceph daemons on the host, or just look for the systemd
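Once you have a unit name, pulling its recent journal looks like this (the fsid and daemon name are placeholders):

    journalctl -u ceph-<fsid>@<daemon-name>.service -n 200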
b_uuid": "CUMgp7-Uscn-ASLo-bh14-7Sxe-80GE-EcywDb",
> > "name": "osd-block-db-5cb8edda-30f9-539f-b4c5-dbe420927911",
> > "osd_fsid": "089894cf-1782-4a3a-8ac0-9dd043f80c71",
> > "osd_id": "7",
>
in the "ceph orch device ls --format json-pretty" output, in the blob for
that specific device, is the "ceph_device" field set? There was a bug where
it wouldn't be set at all (https://tracker.ceph.com/issues/57100) and it
would make it so you couldn't use a device serving as a db device for any
Someone hit what I think is this same issue the other day. Do you have a
"config" section in your rgw spec that sets the
"rgw_keystone_implicit_tenants" option to "True" or "true"? For them,
changing the value to be 1 (which should be equivalent to "true" here)
instead of "true" fixed it. Likely
ck interval: 30
> rgw usage max shards: 32
> rgw usage max user shards: 1
> spec:
> rgw_frontend_port: 8100
>
> I deleted the 'rgw keystone implicit tenants’ settings now, and the
> warning disappeared. Seems like it has been deprecated? The warning message
> is very m
Do you have a `config` section in your RGW spec? That health warning is
from cephadm trying to set options from a spec section like that. There's a
short bit about it at the top of
https://docs.ceph.com/en/latest/cephadm/services/#service-specification.
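For illustration, a spec carrying such a section looks roughly like this (service_id, placement, port, and the option value are placeholders; note the point above about using 1 rather than "true" for rgw_keystone_implicit_tenants):

    service_type: rgw
    service_id: default
    placement:
      hosts:
        - myhost
    spec:
      rgw_frontend_port: 8100
    config:
      rgw_keystone_implicit_tenants: "1"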
On Thu, Jul 13, 2023 at 3:39 AM wrote:
>
Reef RC linking failure on Alpine Linux. Do we worry about that?
1. https://tracker.ceph.com/issues/61718
2. Nice to fix, but not a requirement
3. If there are patches available, we should accept them, but probably
don't put too much work into it currently
debian bullseye build
Reminds me of https://tracker.ceph.com/issues/57007 which wasn't fixed in
pacific until 16.2.11, so this is probably just the result of a cephadm bug
unfortunately.
On Fri, Jun 23, 2023 at 5:16 PM Malte Stroem wrote:
> Hello Eugen,
>
> thanks.
>
> We found the cause.
>
> Somehow all
>
>
There was a cephadm bug that wasn't fixed by the time 17.2.6 came out (I'm
assuming that's the version being used here, although it may have been
present in some slightly earlier quincy versions) that caused this
misleading error to be printed out when adding a host failed. There's a
tracker for
if you do a mgr failover ("ceph mgr fail") and wait a few minutes do the
issues clear out? I know there's a bug where removed mons get marked as
stray daemons while downsizing by multiple mons at once (cephadm might be
removing them too quickly, not totally sure of the cause) but doing a mgr
When you try to change the user using "ceph cephadm set-user" (or any of
the other commands that change ssh settings) it will attempt a connection
to a random host with the new settings, and run the "cephadm check-host"
command on that host. If that fails, it will change the setting back and
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to
> merge https://github.com/ceph/ceph/pull/51788 for
> the core)
> rgw - Casey
> fs - Venky
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/octopus-x - dep
(and since I found the shell processes, I can verify I didn't
> have a typo ;-) )
>
> Regarding broken record: I'm extremely thankful for your support. And I
> should have checked that earlier. We all know that sometimes it's the
> least probable things that go sideways. So
>
> I don't see any information about the orchestrator module having
> crashed. It's running as always.
>
> From the prior problem I had some issues in my cephfs pools. So,
> maybe there's something broken in the .mgr pool? Could that be a reason
> for this behaviour? I
As you already seem to have figured out, "ceph orch device ls" is
populated with the results from "ceph-volume inventory". My best guess to
try and debug this would be to manually run "cephadm ceph-volume --
inventory" (the same as "cephadm ceph-volume inventory", I just like to
separate the
This is sort of similar to what I said in a previous email, but the only
way I've seen this happen in other setups is through hanging cephadm
commands. The debug process has been, do a mgr failover, wait a few
minutes, see in "ceph orch ps" and "ceph orch device ls" which hosts have
and have not
I think with the `config set` commands there is logic to notify the
relevant mgr modules and update their values. That might not exist with
`config rm`, so it's still using the last set value. Looks like a real bug.
Curious what happens if the mgr restarts after the `config rm`. Whether it
goes
in /var/lib/ceph/<fsid>/mgr.<mgr-name>/ on the host with that mgr
reporting the error, there should be a unit.run file that shows what is
being done to start the mgr as well as a few files that get mounted into
the mgr on startup, notably the "config" and "keyring" files. That config
file should include the mon host
which I think was merged too late* (as in the patch wouldn't be in 17.2.6)
On Tue, May 9, 2023 at 5:52 PM Adam King wrote:
> What's the umask for the "deployer" user? We saw an instance of someone
> hitting something like this, but for them it seemed to only happen when
>
What's the umask for the "deployer" user? We saw an instance of someone
hitting something like this, but for them it seemed to only happen when
they had changed the umask to 027. We had patched in
https://github.com/ceph/ceph/pull/50736 to address it, which I don't think
was merged too late for
(BTW, my cephadmin user can run "sudo which python3" without prompting
> password on other hosts now, but nothing has been solved)
>
> Best regards,
> Reza
>
> On Tue, 2 May 2023 at 19:00, Adam King wrote:
>
>> The number of mgr daemons thing is expected.
what specifically does `ceph log last 200 debug cephadm` spit out? The log
lines you've posted so far I don't think are generated by the orchestrator
so curious what the last actions it took was (and how long ago).
On Thu, May 4, 2023 at 10:35 AM Thomas Widhalm
wrote:
> To completely rule out
First thing I always check when it seems like orchestrator commands aren't
doing anything is "ceph orch ps" and "ceph orch device ls" and check the
REFRESHED column. If it's well above 10 minutes for orch ps or 30 minutes
for orch device ls, then it means the orchestrator is most likely hanging
on
etch mode on this cluster.
>
> I don't understand why Quincy MGRs cannot ssh into Pacific nodes, if you
> have any more hints I would be really glad to hear.
>
> Best regards,
> Reza
>
>
>
> On Wed, 12 Apr 2023 at 17:18, Adam King wrote:
>
>> Ah, okay. Someone else had
approved for the rados/cephadm stuff
On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein wrote:
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59542#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Radek, Laura
> rados - Radek, Laura
> rook -
of time.
On Wed, Apr 12, 2023 at 11:28 AM Yuri Weinstein wrote:
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59426#note-3
> Release Notes - TBD
>
> Seeking approvals/reviews for:
>
> smoke - Josh approved?
>
st1 at addr (x.x.x.x)
>
> As I can see here, it turns out sudo is added to the code to be able to
> continue:
>
>
> https://github.com/ceph/ceph/blob/v17.2.5/src/pybind/mgr/cephadm/ssh.py#L143
>
> I cannot privilege the cephadmin user to run sudo commands for some
5.pqxmvt ceph05 error 32m ago
> 9M   -   -
> mds.mds01.ceph06.rrxmks ceph06 error 32m ago
> 10w   -   -
> mds.mds01.ceph07.omdisd ceph07 error 32m ago
> 2M   -   -
Will also note that the normal upgrade process scales down the mds service
to have only 1 mds per fs before upgrading it, so maybe something you'd
want to do as well if the upgrade didn't do it already. It does so by
setting the max_mds to 1 for the fs.
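If you ever need to do that by hand, it is just (the fs name is a placeholder):

    ceph fs set <fs-name> max_mds 1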
On Mon, Apr 10, 2023 at 3:51 PM Adam King
You could try pausing the upgrade and manually "upgrading" the mds daemons
by redeploying them on the new image. Something like "ceph orch daemon
redeploy <daemon-name> --image <17.2.6 image>" (daemon names should
match those in "ceph orch ps" output). If you do that for all of them and
then get them into an
Does "ceph health detail" give any insight into what the unexpected
exception was? If not, I'm pretty confident some traceback would end up
being logged. Could maybe still grab it with "ceph log last 200 info
cephadm" if not a lot else has happened. Also, probably need to find out if
the
her items missing are PORTS, STATUS (time), MEM USE,
>
> NAME   HOST   PORTS   STATUS   REFRESHED   AGE   MEM USE   MEM LIM   VERSION   IMAGE ID   CONTAINER ID
>
> rgw.default.default.zp31
for the specific issue with that traceback, you can probably resolve that
by removing the stored upgrade state. We put it at
`mgr/cephadm/upgrade_state` I believe (can check "ceph config-key ls" and
look for something related to upgrade state if that doesn't work) so
running "ceph config-key rm
if you put a copy of the cephadm binary onto one of these hosts (e.g.
a002s002) and run "cephadm ls" what does it give for the OSDs? That's where
the orch ps information comes from.
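If jq happens to be installed on that host, something like this pulls out just the OSD entries from that JSON (a sketch; assumes the default JSON output of cephadm ls):

    cephadm ls --no-detail | jq '.[] | select(.name | startswith("osd."))'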
On Thu, Mar 30, 2023 at 10:48 AM wrote:
> Hi ,
>
> Why is ceph orch ps showing ,unknown version, image and
y it happened 3 times in the initial run but never in
the reruns, but the failure came from that, and the upgrade itself seems to
still work fine.
- Adam King
That's very odd, I haven't seen this before. What container image is the
upgraded mgr running on (to know for sure, can check the podman/docker run
command at the end of the /var/lib/ceph/<fsid>/mgr.<mgr-name>/unit.run file
on the mgr's host)? Also, could maybe try "ceph mgr module enable cephadm"
to see if it
The things in "ceph orch ps" output are gathered by checking the contents
of the /var/lib/ceph/<fsid>/ directory on the host. Those
"cephadm." files get deployed normally though, and aren't usually
reported in "ceph orch ps" as it should only report things that are
directories rather than files. You
>
> Current cluster status says healthy but I cannot deploy new daemons, the
>> mgr information isn't refreshing (5 days old info) under hosts and services
>> but the main dashboard is accurate like ceph -s
>> Ceph -s will show accurate information but things like ceph orch ps
>> --daemon-type mgr
that looks like it was expecting a json structure somewhere and got a blank
string. Is there anything in the logs (ceph log last 100 info cephadm)? If
not, might be worth trying a couple mgr failovers (I'm assuming only one
got upgraded, so first failover would go back to the 15.2.17 one and then
Can I see the output of `ceph orch upgrade status` and `ceph config dump |
grep image`? The "Pulling container image stop" implies somehow (as Eugen
pointed out) that cephadm thinks the image to pull is named "stop" which
means it is likely set as either the image to upgrade to or as one of the
INFO
> cephadm.services.osd] Found osd claims for drivegroup None ->
> {'nautilus2': ['7']}
> Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log
> [INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']}
>
> But I see no attempt to actually
ll correctly apply the identical drivegroup.yml
> and when not. Anyway, the conclusion is to not interfere with cephadm
> (nothing new here), but since the drivegroup was not applied correctly
> I assumed I had to "help out" a bit by manually deploying an OSD.
>
> Thanks,
> Eug
Going off of
ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
you could try passing "--keyring <keyring path>" to "cephadm ceph-volume -- lvm create". I'm guessing it's trying to run the
osd tree command within a container and I know cephadm mounts keyrings
passed to
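A sketch of that form, assuming the bootstrap-osd keyring and a placeholder device path:

    cephadm ceph-volume --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -- lvm create --data /dev/sdX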
That table is definitely a bit out of date. We've been doing some testing
with more recent podman versions and the only issues I'm aware of specific
to the podman version are https://tracker.ceph.com/issues/58532 and
https://tracker.ceph.com/issues/57018 (which are really the same issue
affecting
I know there's a bug where when downsizing by multiple mons at once through
cephadm this ghost stray mon daemon thing can end up happening (I think
something about cephadm removing them too quickly in succession, not
totally sure). In those cases, just doing a mgr failover ("ceph mgr fail")
always
cephadm approved. Known failures.
On Fri, Jan 20, 2023 at 11:39 AM Yuri Weinstein wrote:
> The overall progress on this release is looking much better and if we
> can approve it we can plan to publish it early next week.
>
> Still seeking approvals
>
> rados - Neha, Laura
> rook - Sébastien Han
cephadm approved. rados/cephadm failures are mostly caused by
https://github.com/ceph/ceph/pull/49285 not being merged (which just
touches tests and docs so wouldn't block a release).
Thanks
- Adam King
On Thu, Dec 15, 2022 at 12:15 PM Yuri Weinstein wrote:
> Details of this rele
reaker?
> --
> *From:* Adam King
> *Sent:* Friday, December 2, 2022 2:48:19 PM
> *To:* Sake Paulusma
> *Cc:* ceph-users@ceph.io
> *Subject:* Re: [ceph-users] How to replace or add a monitor in stretch
> cluster?
>
> This can't be done in a
This can't be done in a very nice way currently. There's actually an open
PR against main to allow setting the crush location for mons in the service
spec specifically because others found that this was annoying as well. What
I think should work as a workaround is, go to the host where the mon