[ceph-users] Re: cephadm logs

2023-07-30 Thread Luis Domingues
Hi,

We are interested in having cephadm log to journald, so I created the ticket: 
https://tracker.ceph.com/issues/62233

Thanks

Luis Domingues
Proton AG


--- Original Message ---
On Saturday, July 29th, 2023 at 20:55, John Mulligan 
 wrote:


> On Friday, July 28, 2023 11:51:06 AM EDT Adam King wrote:
> 
> > Not currently. Those logs aren't generated by any daemons, they come
> > directly from anything done by the cephadm binary on the host, which tends
> > to be quite a bit since the cephadm mgr module runs most of its operations
> > on the host through a copy of the cephadm binary. It doesn't log to journal
> > because it doesn't have a systemd unit or anything, it's just a python
> > script being run directly and nothing has been implemented to make it
> > possible for that to log to journald.
> 
> 
> 
> For what it's worth, there's no requirement that a process be executed
> directly by a specific systemd unit to have it log to the journal. These days
> I'm pretty sure that anything that tries to use the local syslog goes to the
> journal. Here's a quick example:
> 
> I create foo.py with the following:
> ```
> import logging
> import logging.handlers
> import sys
> 
> handler = logging.handlers.SysLogHandler('/dev/log')
> handler.ident = 'notcephadm: '
> h2 = logging.StreamHandler(stream=sys.stderr)
> logging.basicConfig(
>     level=logging.DEBUG,
>     handlers=[handler, h2],
>     format="(%(levelname)s): %(message)s",
> )
> log = logging.getLogger(__name__)
> log.debug("debug me")
> log.error("oops, an error was here")
> log.info("some helpful information goes here")
> ```
> I ran the above and now I can run:
> ```
> $ journalctl --no-pager -t notcephadm
> Jul 29 14:35:31 edfu notcephadm[105868]: (DEBUG): debug me
> Jul 29 14:35:31 edfu notcephadm[105868]: (ERROR): oops, an error was here
> Jul 29 14:35:31 edfu notcephadm[105868]: (INFO): some helpful information goes here
> ```
> 
> Just getting logs into the journal does not even require one of the libraries
> specific to the systemd journal. Personally, I find centralized logging with the
> syslog/journal more appealing than logging to a file. But they both have their
> advantages and disadvantages.
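
A quick way to see the same behavior from the shell, assuming systemd-cat(1) and 
logger(1) are available on the host (the notcephadm tag is just an example):

```
# write two test messages without any journald-specific library,
# then read them back by syslog identifier
echo "hello from a plain script" | systemd-cat -t notcephadm -p info
logger -t notcephadm "hello via the syslog socket"
journalctl --no-pager -t notcephadm | tail -n 2
```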
> 
> Luis, I'd suggest that you should file a ceph tracker issue [1] if having
> cephadm log this way is a use case you would be interested in. We could also
> discuss the topic further in a ceph orchestration weekly meeting.
> 
> 
> [1]: https://tracker.ceph.com/projects/orchestrator/issues/new
> 
> > On Fri, Jul 28, 2023 at 9:43 AM Luis Domingues luis.doming...@proton.ch
> > wrote:
> > 
> > > Hi,
> > > 
> > > Quick question about cephadm and its logs. On my cluster I have every
> > > logs
> > > that goes to journald. But on each machine, I still have
> > > /var/log/ceph/cephadm.log that is alive.
> > > 
> > > Is there a way to make cephadm log to journald instead of a file? If yes
> > > did I miss it on the documentation? Of if not is there any reason to log
> > > into a file while everything else logs to journald?
> > > 
> > > Thanks
> > > 
> > > Luis Domingues
> > > Proton AG
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD stuck on booting state after upgrade (v15.2.17 -> v17.2.6)

2023-07-30 Thread Sultan Sm
Hello! Thanks for the reply.
We restarted the services in the order mon -> mgr -> osd.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD stuck on booting state after upgrade (v15.2.17 -> v17.2.6)

2023-07-30 Thread Sultan Sm
Hello! Thanks for the reply! Sure, here is the output:
# ceph versions
{
    "mon": {
        "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)": 1
    },
    "osd": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 13
    },
    "mds": {},
    "rgw": {
        "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)": 1
    },
    "overall": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 13,
        "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)": 5
    }
}

Most likely the problem is with "require-osd-release = nautilus". We will try 
to rebuild the mon service.
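
For reference, a rough sketch of how one would check and raise that flag; this is 
only an illustration, so double-check it against the upgrade notes for this path first:

```
# show what the cluster currently requires
ceph osd dump | grep require_osd_release
# once every OSD runs at least octopus, raise the floor accordingly
ceph osd require-osd-release octopus
```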
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD stuck on booting state after upgrade (v15.2.17 -> v17.2.6)

2023-07-30 Thread Sultan Sm
Hello! Thanks for the reply =)
We will try to rebuild the mon service.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in rejoin

2023-07-30 Thread Xiubo Li

Hi Frank,

On 7/30/23 16:52, Frank Schilder wrote:

Hi Xiubo,

it happened again. This time, we might be able to pull logs from the client 
node. Please take a look at my intermediate action below - thanks!

I am in a bit of a calamity, I'm on holidays with terrible network connection 
and can't do much. My first priority is securing the cluster to avoid damage 
caused by this issue. I did an MDS evict by client ID on the MDS reporting the 
warning with the client ID reported in the warning. For some reason the client 
got blocked on 2 MDSes after this command, one of these is an ordinary stand-by 
daemon. Not sure if this is expected.

Main question: is this sufficient to prevent any damaging IO on the cluster? 
I'm thinking here about the MDS eating through all its RAM until it crashes 
hard in an irrecoverable state (that was described as a consequence in an old 
post about this warning). If this is a safe state, I can keep it in this state 
until I return from holidays.


Yeah, I think so.

BTW, are you using the kclients or user-space clients? I checked both the 
kclient and libcephfs; it seems buggy in libcephfs, which could cause 
this issue. But the kclient looks okay so far.


Thanks

- Xiubo



Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Friday, July 28, 2023 11:37 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: MDS stuck in rejoin


On 7/26/23 22:13, Frank Schilder wrote:

Hi Xiubo.


... I am more interested in the kclient side logs. Just want to
know why that oldest request got stuck so long.

I'm afraid I'm a bad admin in this case. I don't have logs from the host any 
more, I would have needed the output of dmesg and this is gone. In case it 
happens again I will try to pull the info out.

The tracker https://tracker.ceph.com/issues/22885 sounds a lot more violent 
than our situation. We had no problems with the MDSes, the cache didn't grow 
and the relevant one was also not put into read-only mode. It was just this 
warning showing all the time, health was OK otherwise. I think the warning was 
there for at least 16h before I failed the MDS.

The MDS log contains nothing, this is the only line mentioning this client:

2023-07-20T00:22:05.518+0200 7fe13df59700  0 log_channel(cluster) log [WRN] : 
client.145678382 does not advance its oldest_client_tid (16121616), 10 
completed requests recorded in session

Okay, if so it's hard to say and to dig out what happened in the client and
why it didn't advance the tid.

Thanks

- Xiubo



Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 1 Large omap object found

2023-07-30 Thread Mark Johnson
I've been going round and round in circles trying to work this one out but I'm 
getting nowhere.  We're running a 4 node quincy cluster (17.2.6) which recently 
reported the following:

ceph.log-20230729.gz:2023-07-28T08:31:42.390003+ osd.26 (osd.26) 13834 : 
cluster [WRN] Large omap object found. Object: 
5:6c65dd84:users.uid::callrecordings$callrecordings_rw.buckets:head PG: 
5.21bba636 (5.16) Key count: 378454 Size (bytes): 75565579

This happened a week or so ago (only the key count was just over the 
200,000 threshold on that occasion) and after much searching around, I found an 
article that suggested a deep scrub on the pg would likely resolve the issue, 
so I forced a deep scrub and shortly after, the warning cleared.  Came into the 
office today to discover the above.  It's on the same PG as before which is in 
the default.rgw.meta pool.  This time, after forcing a deep-scrub on that PG, 
nothing changed.  I did it a second time just to be sure but got the same 
result.

I keep finding a suse article that simply suggests increasing the threshold to 
the previous default of 2,000,000, but other articles I read say it was lowered 
for a reason and that by the time it hits that figure, it's too late so I don't 
want to just mask it.  Problem is that I don't really understand it.   I found 
a thread here from a bit over two years ago but their issue was in the 
default.rgw.buckets.index pool.  A step in the solution was to list out the 
problematic object id and check the objects per shard however, if I issue the 
command "rados -p default.rgw.meta ls" it returns nothing.  I get a big list 
from "rados -p default.rgw.buckets.index ls" just nothing from the first pool.  
I think it may be because the meta pool isn't indexed based on something I 
read, but I really don't know what I'm talking about tbh.
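
For what it's worth, default.rgw.meta keeps its objects in RADOS namespaces 
(users.uid, users.keys, root, ...), and a plain "rados ls" only lists the default 
namespace, which would explain the empty listing. A hedged sketch for inspecting the 
object from the warning (object and namespace names taken from the warning above):

```
# list across all namespaces instead of only the default one
rados -p default.rgw.meta --all ls | head
# count the omap keys on the object named in the large-omap warning
rados -p default.rgw.meta -N users.uid listomapkeys 'callrecordings$callrecordings_rw.buckets' | wc -l
```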

I don't know if this is helpful, but if I list out all the PGs for that pool, 
there are 32 PGs and 5.16 shows 80186950 bytes and 401505 keys.  PG 5.c has 
75298 bytes and 384 keys.  The remaining 30 PGs show zero bytes and zero keys.  I'm 
really not sure how to troubleshoot and resolve from here.  For the record, 
dynamic resharding is enabled in that no options have been set in the config 
and that is the default setting.

Based on the suse article I mentioned which also references the 
default.rgw.meta pool, I'm gathering our issue is because we have so many 
buckets that are all owned by the one user and the solution is either:

* delete unused buckets
* create multiple users and spread buckets evenly across all users (not 
something we can do)
* increase the threshold to stop the warning (a sketch of the relevant commands is below)
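
If raising the threshold ends up being the route taken, a minimal sketch (the option 
name is the standard OSD one; the value is only an example):

```
# the current default is 200000 keys; check it, then raise it for all OSDs
ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 500000
```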

Problem is that I'm having trouble verifying this is the issue.  I've tried 
dumping out bucket stats to a file (radosgw-admin bucket stats > 
bucket_stats.txt) but after three hours this is still running with no output.

Thanks for your time,
Mark
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Luminous Bluestore issues and RGW Multi-site Recovery

2023-07-30 Thread Gregory O'Neill
Hello,


I have two main questions here.


1. What can I do when `ceph-bluestore-tool` outputs a stack trace for
`fsck`?

2. How does one recover from lost PGs / data corruption in an RGW
Multi-site setup?


---


I have a Luminous 12.2.12 cluster built on
ceph/daemon:v3.2.10-stable-3.2-luminous-centos-7-x86_64 for all daemons, no
ceph packages are installed on the systems. The OSD nodes have 128GB RAM, 6
SATA SSDs (Micron 5200, 2TB), and 1 NVMe SSD split into 4 OSDs, so 10 OSDs per
node. osd_memory_target is set to 10GB, which should put me at roughly
100/128GB used.


There are 3 PGs down, 3 of the OSDs that had those PGs won't stay online,
and they crash fairly quickly after starting. These are running on SATA
SSDs which are being replaced with NVMe SSDs. Crush re-weighting the SATA
drives down causes some SATA OSDs to crash and some NVMe drives have slow
or blocked ops (related to the down PGs).


I installed the ceph-osd package on one OSD host. When I ran
`ceph-bluestore-tool`, I got a bunch of tcmalloc and unexpected aio errors.
Exact output below. I also tried `ceph-objectstore-tool` but received
similar results. I cloned the other OSD that has the affected PGs to have a
copy I can work on, but I got the exact same results as before.


---


From what I can see, this is likely due to bad drives and automation trying
to restart down OSDs several times. With 3 down PGs, I am assuming my next
step would be to mark those PGs lost. From there, I am unsure what the
recovery procedure is to sync "clean" data from other zones into the
cluster that was impacted. Is RGW able to handle this? Do I need to use
`rclone`?
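
For context, a very rough sketch of the usual "give up on the PG" sequence, with 
placeholders, and definitely to be sanity-checked before running anything destructive:

```
# understand why the PG is down and what it is waiting for
ceph pg <pgid> query
# tell the cluster a dead OSD is not coming back
ceph osd lost <osd-id> --yes-i-really-mean-it
# last resort: declare the unfound objects lost (revert or delete)
ceph pg <pgid> mark_unfound_lost delete
```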



---



$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-11 fsck



tcmalloc: large alloc 1283989504 bytes == 0x557fdbe46000 @  0x7fc87e4126d0
0x7fc873354ae9 0x7fc873356073 0x557f89d3d680 0x557f89d2ebcd 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)

tcmalloc: large alloc 2567970816 bytes == 0x5580286c8000 @  0x7fc87e4126d0
0x7fc873354ae9 0x7fc873356073 0x557f89d3d680 0x557f89d2ebcd 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)

tcmalloc: large alloc 5135933440 bytes == 0x5580c17ca000 @  0x7fc87e4126d0
0x7fc873354ae9 0x7fc873356073 0x557f89d3d680 0x557f89d2ebcd 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)

tcmalloc: large alloc 3025510400 bytes == 0x557f8f6e6000 @  0x7fc87e4126d0
0x7fc873354ae9 0x7fc87335582b 0x557f89d75d19 0x557f89d2edda 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)

tcmalloc: large alloc 2269913088 bytes == 0x55832469e000 @  0x7fc87e3f2e50
0x7fc87e4121b9 0x7fc8756ca4f7 0x7fc8756cd304 0x557f89cc4661 0x557f89ad0858
0x557f89ad2224 0x557f89cb7b1d 0x557f89de584c 0x557f89de6a7e 0x557f89e05e7b
0x557f89d2cf48 0x557f89d2efd2 0x557f89d30524 0x557f89d318ef 0x557f89d33147
0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a 0x557f89a2c5e1 0x7fc87299d2e1
0x557f89ab03fa (nil)

2023-07-30 08:27:27.531919 7fc86f689700 -1 bdev(0x557f8add4240
/var/lib/ceph/osd/ceph-11/block) aio to 929504952320~2269908992 but
returned: 2147479552
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: In function 'void
KernelDevice::_aio_thread()' thread 7fc86f689700 time 2023-07-30 08:27:27.532004

/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: 397: FAILED assert(0
== "unexpected aio error")



ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fc8757242c2]

2: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]

3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]

4: (()+0x74a4) [0x7fc8740104a4]

5: (clone()+0x3f) [0x7fc872a65d0f]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

2023-07-30 08:27:27.544215 7fc86f689700 -1
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: In function 'void
KernelDevice::_aio_thread()' thread 7fc86f689700 time 2023-07-30
08:27:27.532004

/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: 397: FAILED assert(0
== "unexpected aio error")



ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fc8757242c2]

2: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]

3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]

4: (()+0x74a4) [0x7fc8740104a4]

5: (clone()+0x3f) [0x7fc872a65d0f]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.



-1> 2023-07-30 08:27:27.531919 7fc86f689700 -1 bdev(0x557f8add4240
/var/lib/ceph/osd/ceph-11/block) aio to 929504952320~2269908992 but

[ceph-users] ref v18.2.0 QE Validation status

2023-07-30 Thread Yuri Weinstein
Details of this release are summarized here:

https://tracker.ceph.com/issues/62231#note-1

Seeking approvals/reviews for:

smoke - Laura, Radek
rados - Neha, Radek, Travis, Ernesto, Adam King
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade-clients:client-upgrade* - in progress
powercycle - Brad

Please reply to this email with approval and/or trackers of known
issues/PRs to address them.

bookworm distro support is an outstanding issue.

TIA
YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in rejoin

2023-07-30 Thread Frank Schilder
Hi Xiubo,

it happened again. This time, we might be able to pull logs from the client 
node. Please take a look at my intermediate action below - thanks!

I am in a bit of a calamity, I'm on holidays with terrible network connection 
and can't do much. My first priority is securing the cluster to avoid damage 
caused by this issue. I did an MDS evict by client ID on the MDS reporting the 
warning with the client ID reported in the warning. For some reason the client 
got blocked on 2 MDSes after this command, one of these is an ordinary stand-by 
daemon. Not sure if this is expected.
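
For reference, a rough sketch of the eviction and blocklist commands involved (the 
MDS name is a placeholder; the client id is the one from the warning):

```
# find the session and evict it on the MDS reporting the warning
ceph tell mds.<name> client ls
ceph tell mds.<name> client evict id=145678382
# eviction normally adds the client address to the cluster-wide blocklist,
# which every MDS observes ("ceph osd blacklist ls" on older releases)
ceph osd blocklist ls
```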

Main question: is this sufficient to prevent any damaging IO on the cluster? 
I'm thinking here about the MDS eating through all its RAM until it crashes 
hard in an irrecoverable state (that was described as a consequence in an old 
post about this warning). If this is a safe state, I can keep it in this state 
until I return from holidays.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Friday, July 28, 2023 11:37 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: MDS stuck in rejoin


On 7/26/23 22:13, Frank Schilder wrote:
> Hi Xiubo.
>
>> ... I am more interested in the kclient side logs. Just want to
>> know why that oldest request got stuck so long.
> I'm afraid I'm a bad admin in this case. I don't have logs from the host any 
> more, I would have needed the output of dmesg and this is gone. In case it 
> happens again I will try to pull the info out.
>
> The tracker https://tracker.ceph.com/issues/22885 sounds a lot more violent 
> than our situation. We had no problems with the MDSes, the cache didn't grow 
> and the relevant one was also not put into read-only mode. It was just this 
> warning showing all the time, health was OK otherwise. I think the warning 
> was there for at least 16h before I failed the MDS.
>
> The MDS log contains nothing, this is the only line mentioning this client:
>
> 2023-07-20T00:22:05.518+0200 7fe13df59700  0 log_channel(cluster) log [WRN] : 
> client.145678382 does not advance its oldest_client_tid (16121616), 10 
> completed requests recorded in session

Okay, if so it's hard to say and to dig out what happened in the client and
why it didn't advance the tid.

Thanks

- Xiubo


> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: configure rgw

2023-07-30 Thread Tony Liu
When deploying the rgw container, cephadm adds rgw_frontends into the config db at 
the daemon level. I was adding the setting at the node level; that's why I didn't see 
my setting take effect.
I need to put rgw_frontends at the daemon level after deployment.
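
Concretely, a small sketch of setting it at the daemon level (the daemon name is the 
one from the "ceph orch ps" output below; the port is only an example):

```
# the per-daemon section overrides the broader client.rgw section
ceph config set client.rgw.qa.ceph-1.hzfrwq rgw_frontends "beast port=8086"
ceph orch daemon restart rgw.qa.ceph-1.hzfrwq
```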

Thanks!
Tony

From: Tony Liu 
Sent: July 29, 2023 11:44 PM
To: ceph-users@ceph.io; d...@ceph.io
Subject: [ceph-users] Re: configure rgw

A few updates.
1. "radosgw-admin --show-config -n client.rgw.qa.ceph-1.hzfrwq" doesn't show 
actual running config.

2. "ceph --admin-daemon 
/var/run/ceph/69f94d08-2811-11ee-9ab1-089204adfafa/ceph-client.rgw.qa.ceph-1.hzfrwq.7.94254493095368.asok
 config show" shows the actual running config.

3. All settings in client.rgw are applied to rgw running config, except for 
rgw_frontends.
```
# ceph config get client.rgw rgw_frontends
beast port=8086
# ceph --admin-daemon 
/var/run/ceph/69f94d08-2811-11ee-9ab1-089204adfafa/ceph-client.rgw.qa.ceph-1.hzfrwq.7.94254493095368.asok
 config get rgw_frontends
{
"rgw_frontends": "beast endpoint=10.250.80.100:80"
}
```
The only place I see "10.250.80.100" and "80" is unit.meta. How is that applied?

Found a workaround: remove rgw_frontends from the config and restart rgw, and 
rgw_frontends goes back to the default "port=7480". Add it back to the config and 
restart rgw again; now rgw_frontends is what I expect. The logic doesn't make much 
sense to me. I'd assume that unit.meta has something to do with this; hopefully 
someone could shed light here.


Thanks!
Tony


From: Tony Liu 
Sent: July 29, 2023 10:40 PM
To: ceph-users@ceph.io; d...@ceph.io
Subject: [ceph-users] configure rgw

Hi,

I'm using Pacific v16.2.10 container image, deployed by cephadm.
I used to manually build config file for rgw, deploy rgw, put config file in 
place
and restart rgw. It works fine.

Now, I'd like to put rgw config into config db. I tried with client.rgw, but 
the config
is not taken by rgw. Also "config show" doesn't work. It always says "no config 
state".

```
# ceph orch ps | grep rgw
rgw.qa.ceph-1.hzfrwq  ceph-1  10.250.80.100:80  running (10m)  10m ago  53m  51.4M  -  16.2.10  32214388de9d  13169a213bc5
# ceph config get client.rgw | grep frontends
client.rgw    basic    rgw_frontends    beast port=8086    *
# ceph config show rgw.qa.ceph-1.hzfrwq
Error ENOENT: no config state for daemon rgw.qa.ceph-1.hzfrwq
# ceph config show client.rgw.qa.ceph-1.hzfrwq
Error ENOENT: no config state for daemon client.rgw.qa.ceph-1.hzfrwq
# radosgw-admin --show-config -n client.rgw.qa.ceph-1.hzfrwq | grep frontends
rgw_frontends = beast port=7480
```

Any clues what I am missing here?


Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: configure rgw

2023-07-30 Thread Tony Liu
A few updates.
1. "radosgw-admin --show-config -n client.rgw.qa.ceph-1.hzfrwq" doesn't show 
actual running config.

2. "ceph --admin-daemon 
/var/run/ceph/69f94d08-2811-11ee-9ab1-089204adfafa/ceph-client.rgw.qa.ceph-1.hzfrwq.7.94254493095368.asok
 config show" shows the actual running config.

3. All settings in client.rgw are applied to rgw running config, except for 
rgw_frontends.
```
# ceph config get client.rgw rgw_frontends
beast port=8086
# ceph --admin-daemon 
/var/run/ceph/69f94d08-2811-11ee-9ab1-089204adfafa/ceph-client.rgw.qa.ceph-1.hzfrwq.7.94254493095368.asok
 config get rgw_frontends 
{
"rgw_frontends": "beast endpoint=10.250.80.100:80"
}
```
The only place I see "10.250.80.100" and "80" is unit.meta. How is that applied?

Found a workaround: remove rgw_frontends from the config and restart rgw, and 
rgw_frontends goes back to the default "port=7480". Add it back to the config and 
restart rgw again; now rgw_frontends is what I expect. The logic doesn't make much 
sense to me. I'd assume that unit.meta has something to do with this; hopefully 
someone could shed light here.
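
If anyone wants to look, unit.meta for this daemon should be on the host under the 
cluster fsid; the path below is a guess based on the standard cephadm layout:

```
# cephadm keeps per-daemon metadata next to the daemon's data dir
cat /var/lib/ceph/69f94d08-2811-11ee-9ab1-089204adfafa/rgw.qa.ceph-1.hzfrwq/unit.meta
```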


Thanks!
Tony


From: Tony Liu 
Sent: July 29, 2023 10:40 PM
To: ceph-users@ceph.io; d...@ceph.io
Subject: [ceph-users] configure rgw

Hi,

I'm using Pacific v16.2.10 container image, deployed by cephadm.
I used to manually build config file for rgw, deploy rgw, put config file in 
place
and restart rgw. It works fine.

Now, I'd like to put rgw config into config db. I tried with client.rgw, but 
the config
is not taken by rgw. Also "config show" doesn't work. It always says "no config 
state".

```
# ceph orch ps | grep rgw
rgw.qa.ceph-1.hzfrwq  ceph-1  10.250.80.100:80  running (10m)  10m ago  53m  51.4M  -  16.2.10  32214388de9d  13169a213bc5
# ceph config get client.rgw | grep frontends
client.rgw    basic    rgw_frontends    beast port=8086    *
# ceph config show rgw.qa.ceph-1.hzfrwq
Error ENOENT: no config state for daemon rgw.qa.ceph-1.hzfrwq
# ceph config show client.rgw.qa.ceph-1.hzfrwq
Error ENOENT: no config state for daemon client.rgw.qa.ceph-1.hzfrwq
# radosgw-admin --show-config -n client.rgw.qa.ceph-1.hzfrwq | grep frontends
rgw_frontends = beast port=7480
```

Any clues what I am missing here?


Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io