[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Nizamudeen A
dashboard approved!

On Tue, Oct 17, 2023 at 12:22 AM Yuri Weinstein  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef
> release.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Venky Shankar
On Tue, Oct 17, 2023 at 12:23 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky

fs approved.

> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-17 Thread Zakhar Kirpichenko
Many thanks for this, Eugen! I very much appreciate yours and Mykola's
efforts and insight!

Another thing I noticed was a reduction of the RocksDB store size after we
reduced the total PG count by 30%, from 590-600 MB:
65M 3675511.sst
65M 3675512.sst
65M 3675513.sst
65M 3675514.sst
65M 3675515.sst
65M 3675516.sst
65M 3675517.sst
65M 3675518.sst
62M 3675519.sst

to about half of the original size:

-rw-r--r-- 1 167 167  7218886 Oct 13 16:16 3056869.log
-rw-r--r-- 1 167 167 67250650 Oct 13 16:15 3056871.sst
-rw-r--r-- 1 167 167 67367527 Oct 13 16:15 3056872.sst
-rw-r--r-- 1 167 167 63268486 Oct 13 16:15 3056873.sst

Then, when I restarted the monitors one by one before adding compression, the
RocksDB store shrank even further. I am not sure why, or what exactly was
automatically removed from the store:

-rw-r--r-- 1 167 167   841960 Oct 18 03:31 018779.log
-rw-r--r-- 1 167 167 67290532 Oct 18 03:31 018781.sst
-rw-r--r-- 1 167 167 53287626 Oct 18 03:31 018782.sst

Then I have enabled LZ4 and LZ4HC compression in our small production
cluster (6 nodes, 96 OSDs) on 3 out of 5
monitors: compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression.
I specifically went for LZ4 and LZ4HC because of the balance between
compression/decompression speed and impact on CPU usage. The compression
doesn't seem to affect the cluster in any negative way; the 3 monitors with
compression are operating normally. The effect of the compression on the
RocksDB store size and disk writes is quite noticeable:

Compression disabled, 155 MB store.db, ~125 MB RocksDB sst, and ~530 MB
writes over 5 minutes:

-rw-r--r-- 1 167 167  4227337 Oct 18 03:58 3080868.log
-rw-r--r-- 1 167 167 67253592 Oct 18 03:57 3080870.sst
-rw-r--r-- 1 167 167 57783180 Oct 18 03:57 3080871.sst

# du -hs
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/;
iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
155M
 /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
2471602 be/4 167     6.05 M  473.24 M  0.00 %  0.16 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
2471633 be/4 167 188.00 K 40.91 M  0.00 %  0.02 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [ms_dispatch]
2471603 be/4 167  16.00 K 24.16 M  0.00 %  0.01 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:high0]

Compression enabled, 60 MB store.db, ~23 MB RocksDB sst, and ~130 MB of
writes over 5 minutes:

-rw-r--r-- 1 167 167  5766659 Oct 18 03:56 3723355.log
-rw-r--r-- 1 167 167 22240390 Oct 18 03:56 3723357.sst

# du -hs
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph03/store.db/;
iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
60M
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph03/store.db/
2052031 be/4 167  1040.00 K   83.48 M  0.00 %  0.01 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
2052062 be/4 167   0.00 B 40.79 M  0.00 %  0.01 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [ms_dispatch]
2052032 be/4 167  16.00 K  4.68 M  0.00 %  0.00 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:high0]
2052052 be/4 167  44.00 K  0.00 B  0.00 %  0.00 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [msgr-worker-0]
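
For anyone who wants to confirm the options actually reached RocksDB, a rough
check is to grep the OPTIONS file that RocksDB writes into the store directory
(paths follow the cephadm layout used above; the fsid and mon name are
placeholders):

# should list kLZ4Compression / kLZ4HCCompression once the options are live
grep -h -i compression /var/lib/ceph/<fsid>/mon.<host>/store.db/OPTIONS-*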

I haven't noticed a major CPU impact. Unfortunately I didn't specifically
measure CPU time for the monitors, but overall the CPU impact of monitor
store compression on our systems isn't noticeable. This may be different
for larger clusters with larger RocksDB datasets. Perhaps
compression=kLZ4Compression could then be enabled by default and
bottommost_compression=kLZ4HCCompression left optional; in theory this
should result

[ceph-users] Re: Nautilus - Octopus upgrade - more questions

2023-10-17 Thread Tyler Stachecki
On Tue, Oct 17, 2023, 8:19 PM Dave Hall  wrote:

> Hello,
>
> I have a Nautilus cluster built using Ceph packages from Debian 10
> Backports, deployed with Ceph-Ansible.
>
> I see that Debian does not offer Ceph 15/Octopus packages.  However,
> download.ceph.com does offer such packages.
>
> Question:  Is it a safe upgrade to install the download.ceph.com packages
> over top of the buster-backports packages?


It "should" be: Debian packages are upstream builds with minimal changes
(e.g. security patches and various changes to conform with Debian
standards). The latter are straight upstream builds.

Nothing in life is certain, though!

> If so, the next question is how to deploy this?  Should I pull down an
> appropriate version of Ceph-Ansible and use the rolling-upgrade playbook?
> Or just apt-get -f dist-upgrade the new Ceph packages into place?
>

Someone with ceph-ansible experience can probably provide a more resounding
"yes" on the former approach - but I imagine so. Maybe you need to apt-get
update first... not sure?

There are issues with running a straight-up apt-get dist-upgrade in certain
configurations. You are unlikely to lose data, but Ceph upgrades are more
nuanced than just apt-get dist-upgrading everything (e.g. IIRC the Debian and
maybe the upstream download.ceph.com builds have postinst hooks that restart
Ceph daemons immediately -- you'd want to be careful about this and keep it
in mind). You also probably want to at least set noout during upgrades to
prevent RADOS from trying to recover unnecessarily if something goes awry.
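
If it helps, a minimal sketch of that precaution (standard flags, nothing
Debian-specific about them):

ceph osd set noout
# ... upgrade packages and restart Ceph daemons, one host at a time ...
ceph osd unset noout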

The docs on the latter approach are about as good as it gets:
https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus


> BTW, in the long run I'll probably want to get to container-based Reef, but
> I need to keep a stable cluster throughout.
>
> Any advice or reassurance much appreciated.
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fixing BlueFS spillover (pacific 16.2.14)

2023-10-17 Thread Chris Dunlop

Hi Igor,

Thanks for the suggestions. You may have already seen my followup message 
where the solution was to use "ceph-bluestore-tool bluefs-bdev-migrate" to 
get the lingering 128KiB of data moved from the slow to the fast device. I 
wonder if your suggested "ceph-volume lvm migrate" would do the same.


Notably, DB compaction didn't help in my case.

Cheers,

Chris

On Mon, Oct 16, 2023 at 03:46:17PM +0300, Igor Fedotov wrote:

Hi Chris,

for the first question (osd.76) you might want to try ceph-volume's
"lvm migrate --from data --target " command. It looks like some
persistent DB remnants are still kept on the main device, causing the
alert.


W.r.t osd.86's question - the line "SLOW    0 B 3.0 
GiB 59 GiB" means that RocksDB higher levels  data (usually L3+) 
are spread over DB and main (aka slow) devices as 3 GB and 59 GB 
respectively.


In other words, the SLOW row refers to DB data which is originally supposed
to be on the SLOW device (due to RocksDB data mapping mechanics), but the
improved BlueFS logic (introduced by
https://github.com/ceph/ceph/pull/29687) permits extra DB disk usage
for a part of this data.


Resizing the DB volume followed by a DB compaction should do the trick and
move all the data to the DB device. Alternatively, ceph-volume's lvm
migrate command should do the same, but without resizing the DB volume
the result will be rather temporary.


Hope this helps.


Thanks,

Igor

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus - Octopus upgrade - more questions

2023-10-17 Thread Dave Hall
Hello,

I have a Nautilus cluster built using Ceph packages from Debian 10
Backports, deployed with Ceph-Ansible.

I see that Debian does not offer Ceph 15/Octopus packages.  However,
download.ceph.com does offer such packages.

Question:  Is it a safe upgrade to install the download.ceph.com packages
over top of the buster-backports packages?

If so, the next question is how to deploy this?  Should I pull down an
appropriate version of Ceph-Ansible and use the rolling-upgrade playbook?
Or just apg-get -f dist-upgrade the new Ceph packages into place?

BTW, in the long run I'll probably want to get to container-based Reef, but
I need to keep a stable cluster throughout.

Any advice or reassurance much appreciated.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to trigger scrubbing in Ceph on-demand ?

2023-10-17 Thread Jayjeet Chakraborty
Hi all,

I am trying to trigger deep scrubbing in Ceph reef (18.2.0) on demand on a
set of files that I randomly write to CephFS. I have tried both invoking
deep-scrub on CephFS using ceph tell and just deep scrubbing a
particular PG. Unfortunately, none of that seems to be working for me. I am
monitoring the ceph status output, and it never shows any scrubbing
information. Can anyone please help me out with this? In a nutshell, I need
Ceph to scrub for me anytime I want. I am using Ceph with the default configs
for scrubbing. Thanks all.
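
For reference, the commands I have been trying look roughly like this
(placeholders for the filesystem, PG and OSD ids; treat it as a sketch):

# CephFS (MDS) forward scrub of a path:
ceph tell mds.<fsname>:0 scrub start / recursive

# RADOS deep scrub of a specific PG or OSD:
ceph pg deep-scrub <pgid>
ceph osd deep-scrub <osd-id>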

Best Regards,
*Jayjeet Chakraborty*
Ph.D. Student
Department of Computer Science and Engineering
University of California, Santa Cruz
*Email: jayje...@ucsc.edu *
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] NFS - HA and Ingress completion note?

2023-10-17 Thread andreas
NFS - HA and Ingress:  [ https://docs.ceph.com/en/latest/mgr/nfs/#ingress ] 

Referring to Note#2, is NFS high-availability functionality considered complete 
(and stable)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Radoslaw Zarzynski
+1.

On Tue, Oct 17, 2023 at 1:18 AM Laura Flores  wrote:

> On behalf of @Radoslaw Zarzynski , rados approved.
>
> Summary of known failures here:
> https://tracker.ceph.com/projects/rados/wiki/QUINCY#Quincy-v1727-validation
>
> On Mon, Oct 16, 2023 at 3:17 PM Ilya Dryomov  wrote:
>
>> On Mon, Oct 16, 2023 at 8:52 PM Yuri Weinstein 
>> wrote:
>> >
>> > Details of this release are summarized here:
>> >
>> > https://tracker.ceph.com/issues/63219#note-2
>> > Release Notes - TBD
>> >
>> > Issue https://tracker.ceph.com/issues/63192 appears to be failing
>> several runs.
>> > Should it be fixed for this release?
>> >
>> > Seeking approvals/reviews for:
>> >
>> > smoke - Laura
>> > rados - Laura, Radek, Travis, Ernesto, Adam King
>> >
>> > rgw - Casey
>> > fs - Venky
>> > orch - Adam King
>> >
>> > rbd - Ilya
>> > krbd - Ilya
>>
>> rbd and krbd approved.
>>
>> Thanks,
>>
>> Ilya
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-17 Thread Eugen Block

Hi Zakhar,

I took a closer look into what the MONs really do (again with Mykola's  
help) and why manual compaction is triggered so frequently. With  
debug_paxos=20 I noticed that paxosservice and paxos triggered manual  
compactions. So I played with these values:


paxos_service_trim_max = 1000 (default 500)
paxos_service_trim_min = 500 (default 250)
paxos_trim_max = 1000 (default 500)
paxos_trim_min = 500 (default 250)

This reduced the amount of writes by a factor of 3 or 4; the iotop
values are fluctuating a bit, of course. As Mykola suggested I created
a tracker issue [1] to increase the default values since they don't
seem suitable for a production environment. Although I haven't tested
that in production yet, I'll ask one of our customers to try it in
their secondary cluster (for rbd mirroring), where they also suffer
from large mon stores and heavy writes to the mon store. Your findings
on the compaction were quite helpful too; we'll test that as well.
Igor mentioned that the default bluestore_rocksdb config for OSDs will  
enable compression because of positive test results. If we can confirm  
that compression works well for MONs too, compression could be enabled  
by default as well.
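
For reference, a sketch of how these overrides can be applied via the config
database (a restart of the mons may still be needed for them to take effect):

ceph config set mon paxos_service_trim_max 1000
ceph config set mon paxos_service_trim_min 500
ceph config set mon paxos_trim_max 1000
ceph config set mon paxos_trim_min 500
ceph config set mon debug_paxos 20   # only while investigating, very verbose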


Regards,
Eugen

[1] https://tracker.ceph.com/issues/63229

Zitat von Zakhar Kirpichenko :


With the help of community members, I managed to enable RocksDB compression
for a test monitor, and it seems to be working well.

Monitor w/o compression writes about 750 MB to disk in 5 minutes:

   4854 be/4 167     4.97 M  755.02 M  0.00 %  0.24 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]

Monitor with LZ4 compression writes about 1/4 of that over the same time
period:

2034728 be/4 167   172.00 K  199.27 M  0.00 %  0.06 % ceph-mon -n
mon.ceph05 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]

This is caused by the apparent difference in store.db sizes.

Mon store.db w/o compression:

# ls -al
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db
total 257196
drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
drwx------ 3 167 167 4096 Aug 31 05:22 ..
-rw-r--r-- 1 167 167  1517623 Oct 16 14:00 3073035.log
-rw-r--r-- 1 167 167 67285944 Oct 16 14:00 3073037.sst
-rw-r--r-- 1 167 167 67402325 Oct 16 14:00 3073038.sst
-rw-r--r-- 1 167 167 62364991 Oct 16 14:00 3073039.sst

Mon store.db with compression:

# ls -al
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/store.db
total 91188
drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
drwx------ 3 167 167 4096 Oct 16 13:35 ..
-rw-r--r-- 1 167 167  1760114 Oct 16 14:00 012693.log
-rw-r--r-- 1 167 167 52236087 Oct 16 14:00 012695.sst

There are no apparent downsides thus far. If everything works well, I will
try adding compression to other monitors.

/Z

On Mon, 16 Oct 2023 at 14:57, Zakhar Kirpichenko  wrote:


The issue persists, although to a lesser extent. Any comments from the
Ceph team please?

/Z

On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko  wrote:


> Some of it is transferable to RocksDB on mons nonetheless.

Please point me to relevant Ceph documentation, i.e. a description of how
various Ceph monitor and RocksDB tunables affect the operations of
monitors, I'll gladly look into it.

> Please point me to such recommendations, if they're on docs.ceph.com I'll
get them updated.

These are the recommendations we used when we built our Pacific cluster:
https://docs.ceph.com/en/pacific/start/hardware-recommendations/

Our drives are 4x times larger than recommended by this guide. The drives
are rated for < 0.5 DWPD, which is more than sufficient for boot drives and
storage of rarely modified files. It is not documented or suggested
anywhere that monitor processes write several hundred gigabytes of data per
day, exceeding the amount of data written by OSDs. Which is why I am not
convinced that what we're observing is expected behavior, but it's not easy
to get a definitive answer from the Ceph community.

/Z

On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri 
wrote:


Some of it is transferable to RocksDB on mons nonetheless.

but their specs exceed Ceph hardware recommendations by a good margin


Please point me to such recommendations, if they're on docs.ceph.com I'll
get them updated.

On Oct 13, 2023, at 13:34, Zakhar Kirpichenko  wrote:

Thank you, Anthony. As I explained to you earlier, the article you had
sent is about RocksDB tuning for Bluestore OSDs, while the issue  
at hand is

not with OSDs but rather monitors and their RocksDB store. Indeed, the
drives are not enterprise-grade, but their specs exceed Ceph hardware
recommendations by 

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Prashant Dhange
Hi Yuri,

> Issue https://tracker.ceph.com/issues/63192 appears to be failing several
> runs.
> Should it be fixed for this release?
These failures are related to Quincy PR#53042. I am reviewing the logs now.
The smoke tests need fixing as we are yet to fix
https://tracker.ceph.com/issues/63192.

I am stuck in one of the escalations. Let me reach out to you if I cannot
fix smoke tests today.

Regards,
Prashant

On Tue, Oct 17, 2023 at 7:30 AM Adam King  wrote:

> orch approved
>
> On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein 
> wrote:
>
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/63219#note-2
> > Release Notes - TBD
> >
> > Issue https://tracker.ceph.com/issues/63192 appears to be failing
> several
> > runs.
> > Should it be fixed for this release?
> >
> > Seeking approvals/reviews for:
> >
> > smoke - Laura
> > rados - Laura, Radek, Travis, Ernesto, Adam King
> >
> > rgw - Casey
> > fs - Venky
> > orch - Adam King
> >
> > rbd - Ilya
> > krbd - Ilya
> >
> > upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
> >
> > client-upgrade-quincy-reef - Laura
> >
> > powercycle - Brad pls confirm
> >
> > ceph-volume - Guillaume pls take a look
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef
> > release.
> >
> > Thx
> > YuriW
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Tim Holloway
SOLVED!

OK, there was some last-minute flailing around so I can't quite report
a cookbook recipe, but it goes something like this:


1. ceph config set client.mousetech rgw_admin_entry admin

Note: the standard example is for client.rgw, but I named my RGW
"mousetech" to make it distinguishable from possible internal magic
names. Tried both "client.mousetech" and "client.rgw" just to be sure.

Peeve: the "ceph config" command is silent. You cannot tell if it
successfully updated (or added?) a value and when it simply threw away
a bad config request.

2. Re-issued "ceph dashboard set-rgw-credentials" just to jam in
anything that might not have been set right previously. 

3. Restarted the RGW container on the RGW host.

I got to the point where I got the "503" error, but still no dashboard
even after logging in and out. Noted that the RGW logs were reporting
my manual requests, but the dashboard requests weren't showing up. Huh?

Head scratching. Realized that I can't get a "404" error unless there's
a host on the other end, so SOMETHING was talking and it didn't make
any sense that the RGW only logged some requests and not others.

Finally got a suspicion and did a "ceph orch ps". Yup, I have TWO RGW
servers running now (it was intentional, partly to flush out some of
the older failures). So I dialed into the alternate RGW, restarted it
and Dashboard Joy!
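
For anyone hitting the same thing, a quick way to make sure every gateway gets
the restart (assuming a cephadm-managed cluster; the service name is whatever
"ceph orch ls" shows for your rgw service):

ceph orch ps --daemon-type rgw      # list every running rgw daemon
ceph orch restart rgw.<service-id>  # restart all instances of that service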

So basically, everything I wanted Ceph to do is working and working
clean, from the Ceph filesystem to NFS to VM backing stores to RGW and
I'm delirious with joy.

  Thanks, guys!

On Tue, 2023-10-17 at 10:23 -0400, Casey Bodley wrote:
> you're right that many docs still mention ceph.conf, after the mimic
> release added a centralized config database to ceph-mon. you can read
> about the mon-based 'ceph config' commands in
> https://docs.ceph.com/en/reef/rados/configuration/ceph-conf/#commands
> 
> to modify rgw_admin_entry for all radosgw instances, you'd use a
> command like:
> 
> $ ceph config set client.rgw rgw_admin_entry admin
> 
> then restart radosgws because they only read that value on startup
> 
> On Tue, Oct 17, 2023 at 9:54 AM Tim Holloway 
> wrote:
> > 
> > Thanks, Casey!
> > 
> > I'm not really certain where to set this option. While Ceph is very
> > well-behaved once you know what to do, the nature of Internet-based
> > documentation (and occasionally incompletely-updated manuals) is
> > that
> > stale information is often given equal weight to the obsolete
> > information. It's a problem I had as support for JavaServer Faces,
> > in
> > fact. I spent literally years correcting people who'd got their
> > examples from obsoleted sources.
> > 
> > If I was to concoct a "Really, Really Newbies Intro to Ceph" I
> > think
> > that the two most fundamental items explained would be "Ceph as
> > traditional services" versus "Ceph as Containerized services" (As
> > far
> > as I can tell, both are still viable but containerization - at
> > least
> > for me - is a preferable approach). And the ceph.conf file versus
> > storing operational parameters within Ceph entities (e.g. buckets
> > or
> > pseudo-buckets like RGW is doing). While lots of stuff still
> > reference
> > ceph.conf for configuration, I'm feeling like it's actually no
> > longer
> > authoritative for some options, may be an alternative source for
> > others
> > (with which source has priority being unclear) and stuff that Ceph
> > no
> > longer even looks at because it has moved on.
> > 
> > Such is my plight.
> > 
> > I have no problem with making the administrative interface look
> > "bucket-like". Or for that matter, having the RGW report it as a
> > (missing) bucket if it isn't configured. But knowing where to
> > inject
> > the magic that activates that interface eludes me and whether to do
> > it
> > directly on the RGW container host (and how) or on my master host is
> > totally unclear to me. It doesn't help that this is an item that
> > has
> > multiple values, not just on/off or that by default the docs seem
> > to
> > imply it should be already preset to standard values out of the
> > box.
> > 
> >    Thanks,
> >   Tim
> > 
> > On Tue, 2023-10-17 at 09:11 -0400, Casey Bodley wrote:
> > > hey Tim,
> > > 
> > > your changes to rgw_admin_entry probably aren't taking effect on
> > > the
> > > running radosgws. you'd need to restart them in order to set up
> > > the
> > > new route
> > > 
> > > there also seems to be some confusion about the need for a bucket
> > > named 'default'. radosgw just routes requests with paths starting
> > > with
> > > '/{rgw_admin_entry}' to a separate set of admin-related rest
> > > apis.
> > > otherwise they fall back to the s3 api, which treats '/foo' as a
> > > request for bucket foo - that's why you see NoSuchBucket errors
> > > when
> > > it's misconfigured
> > > 
> > > also note that, because of how these apis are nested,
> > > rgw_admin_entry='default' would prevent users from creating and
> > > operating on a bucket named 'default'
> > 

[ceph-users] Re: How do you handle large Ceph object storage cluster?

2023-10-17 Thread Wesley Dillingham
Well you are probably in the top 1% of cluster size. I would guess that
trying to cut your existing cluster in half while not encountering any
downtime as you shuffle existing buckets between old cluster and new
cluster would be harder than redirecting all new buckets (or users) to a
second cluster. Obviously you will need to account for each cluster having
a single bucket namespace when attempting to redirect requests to a cluster
of clusters. Lots of ways to skin this cat and it would be a large and
complicated architectural undertaking.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Oct 16, 2023 at 10:53 AM  wrote:

> Hi Everyone,
>
> My company is dealing with a quite large Ceph cluster (>10k OSDs, >60 PB of
> data). It is entirely dedicated to object storage with an S3 interface.
> Maintenance and extension are getting more and more problematic and
> time consuming. We are considering splitting it into two or more completely
> separate clusters (without replication of data among them) and creating an
> S3 layer of abstraction with some additional metadata that will allow us to
> use these 2+ physically independent instances as one logical cluster.
> Additionally, the newest data is the most in demand, so we have to spread it
> equally among clusters to avoid skews in cluster load.
>
> Do you have any similar experience? How did you handle it? Maybe you have
> some advice? I'm not a Ceph expert, just a Ceph user and software
> developer who does not like to duplicate someone else's work.
>
> Best,
> Paweł
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Adam King
orch approved

On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef
> release.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Casey Bodley
you're right that many docs still mention ceph.conf, after the mimic
release added a centralized config database to ceph-mon. you can read
about the mon-based 'ceph config' commands in
https://docs.ceph.com/en/reef/rados/configuration/ceph-conf/#commands

to modify rgw_admin_entry for all radosgw instances, you'd use a command like:

$ ceph config set client.rgw rgw_admin_entry admin

then restart radosgws because they only read that value on startup
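
a quick way to verify after the restart (assuming rgw_admin_entry is 'admin'):
a GET on the admin path should return a 405 rather than a NoSuchBucket error,
for example:

curl -i http://<rgw-host>:<port>/admin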

On Tue, Oct 17, 2023 at 9:54 AM Tim Holloway  wrote:
>
> Thanks, Casey!
>
> I'm not really certain where to set this option. While Ceph is very
> well-behaved once you know what to do, the nature of Internet-based
> documentation (and occasionally incompletely-updated manuals) is that
> stale information is often given equal weight to the obsolete
> information. It's a problem I had as support for JavaServer Faces, in
> fact. I spent literally years correcting people who'd got their
> examples from obsoleted sources.
>
> If I was to concoct a "Really, Really Newbies Intro to Ceph" I think
> that the two most fundamental items explained would be "Ceph as
> traditional services" versus "Ceph as Containerized services" (As far
> as I can tell, both are still viable but containerization - at least
> for me - is a preferable approach). And the ceph.conf file versus
> storing operational parameters within Ceph entities (e.g. buckets or
> pseudo-buckets like RGW is doing). While lots of stuff still reference
> ceph.conf for configuration, I'm feeling like it's actually no longer
> authoritative for some options, may be an alternative source for others
> (with which source has priority being unclear) and stuff that Ceph no
> longer even looks at because it has moved on.
>
> Such is my plight.
>
> I have no problem with making the administrative interface look
> "bucket-like". Or for that matter, having the RGW report it as a
> (missing) bucket if it isn't configured. But knowing where to inject
> the magic that activates that interface eludes me and whether to do it
> directly on the RGW container host (and how) or on my master host is
> totally unclear to me. It doesn't help that this is an item that has
> multiple values, not just on/off or that by default the docs seem to
> imply it should be already preset to standard values out of the box.
>
>Thanks,
>   Tim
>
> On Tue, 2023-10-17 at 09:11 -0400, Casey Bodley wrote:
> > hey Tim,
> >
> > your changes to rgw_admin_entry probably aren't taking effect on the
> > running radosgws. you'd need to restart them in order to set up the
> > new route
> >
> > there also seems to be some confusion about the need for a bucket
> > named 'default'. radosgw just routes requests with paths starting
> > with
> > '/{rgw_admin_entry}' to a separate set of admin-related rest apis.
> > otherwise they fall back to the s3 api, which treats '/foo' as a
> > request for bucket foo - that's why you see NoSuchBucket errors when
> > it's misconfigured
> >
> > also note that, because of how these apis are nested,
> > rgw_admin_entry='default' would prevent users from creating and
> > operating on a bucket named 'default'
> >
> > On Tue, Oct 17, 2023 at 7:03 AM Tim Holloway 
> > wrote:
> > >
> > > Thank you, Ondřej!
> > >
> > > Yes, I set the admin entry set to "default". It's just the latest
> > > result of failed attempts ("admin" didn't work for me either). I
> > > did
> > > say there were some horrors in there!
> > >
> > > If I got your sample URL pattern right, the results of a GET on
> > > "http://x.y.z/default"; return 404, NoSuchBucket. If that means that
> > > I
> > > didn't properly set rgw_enable_apis, then I probably don't know how
> > > to
> > > set it right.
> > >
> > >Best Regards,
> > >   Tim
> > >
> > > On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> > > > Hello Tim,
> > > >
> > > > I was also struggling with this when I was configuring the object
> > > > gateway for the first time.
> > > >
> > > > There is a few things that you should check to make sure the
> > > > dashboard would work.
> > > >
> > > > 1. You need to have the admin api enabled on all rgws with the
> > > > rgw_enable_apis option. (As far as I know you are not able to
> > > > force
> > > > the dashboard to use one rgw instance)
> > > > 2. It seems that you have the rgw_admin_entry set to a non
> > > > default
> > > > value - the default is admin but it seems that you have “default"
> > > > (by
> > > > the name of the bucket) make sure that you have this also set on
> > > > all
> > > > rgws.
> > > >
> > > > You can confirm that both of these settings are set properly by
> > > > sending GET request to ${rgw-ip}:${port}/${rgw_admin_entry}
> > > > “default" in your case -> it should return 405 Method Not
> > > > Supported
> > > >
> > > > Btw there is actually no bucket that you would be able to see in
> > > > the
> > > > administration. It’s just abstraction on the rgw.
> > > >
> > > > Reagards,
> > > >
> > > > Ondrej
> > > >
> > > > > On 16. 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-17 Thread Johan

Which OS are you running?
What is the outcome of these two tests?
  cephadm --image quay.io/ceph/ceph:v16.2.10-20220920 ceph-volume inventory
  cephadm --image quay.io/ceph/ceph:v16.2.11-20230125 ceph-volume inventory

/Johan

On 2023-10-16 at 08:25, 544463...@qq.com wrote:

I encountered a similar problem on Ceph 17.2.5. Could you find out which
commit caused it?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Tim Holloway
Thanks, Casey!

I'm not really certain where to set this option. While Ceph is very
well-behaved once you know what to do, the nature of Internet-based
documentation (and occasionally incompletely-updated manuals) is that
stale information is often given equal weight to the obsolete
information. It's a problem I had as support for JavaServer Faces, in
fact. I spent literally years correcting people who'd got their
examples from obsoleted sources.

If I was to concoct a "Really, Really Newbies Intro to Ceph" I think
that the two most fundamental items explained would be "Ceph as
traditional services" versus "Ceph as Containerized services" (As far
as I can tell, both are still viable but containerization - at least
for me - is a preferable approach). And the ceph.conf file versus
storing operational parameters within Ceph entities (e.g. buckets or
pseudo-buckets like RGW is doing). While lots of stuff still reference
ceph.conf for configuration, I'm feeling like it's actually no longer
authoritative for some options, may be an alternative source for others
(with which source has priority being unclear) and stuff that Ceph no
longer even looks at because it has moved on.

Such is my plight.
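
(For what it's worth, the closest I've come to untangling which source wins is
poking at it like this - the daemon name is a placeholder and I'm not certain
this covers every possible source:)

ceph config dump | grep rgw_admin_entry      # what's stored centrally
ceph config get client.rgw rgw_admin_entry   # what the config db would hand out
ceph config show rgw.<daemon-name>           # what a running daemon actually uses,
                                             # if it reports its config to the mgr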

I have no problem with making the administrative interface look
"bucket-like". Or for that matter, having the RGW report it as a
(missing) bucket if it isn't configured. But knowing where to inject
the magic that activates that interface eludes me and whether to do it
directly on the RGW container host (and how) or on my master host is
totally unclear to me. It doesn't help that this is an item that has
multiple values, not just on/off or that by default the docs seem to
imply it should be already preset to standard values out of the box.

   Thanks,
  Tim

On Tue, 2023-10-17 at 09:11 -0400, Casey Bodley wrote:
> hey Tim,
> 
> your changes to rgw_admin_entry probably aren't taking effect on the
> running radosgws. you'd need to restart them in order to set up the
> new route
> 
> there also seems to be some confusion about the need for a bucket
> named 'default'. radosgw just routes requests with paths starting
> with
> '/{rgw_admin_entry}' to a separate set of admin-related rest apis.
> otherwise they fall back to the s3 api, which treats '/foo' as a
> request for bucket foo - that's why you see NoSuchBucket errors when
> it's misconfigured
> 
> also note that, because of how these apis are nested,
> rgw_admin_entry='default' would prevent users from creating and
> operating on a bucket named 'default'
> 
> On Tue, Oct 17, 2023 at 7:03 AM Tim Holloway 
> wrote:
> > 
> > Thank you, Ondřej!
> > 
> > Yes, I set the admin entry set to "default". It's just the latest
> > result of failed attempts ("admin" didn't work for me either). I
> > did
> > say there were some horrors in there!
> > 
> > If I got your sample URL pattern right, the results of a GET on
> > "http://x.y.z/default"; return 404, NoSuchBucket. If that means that
> > I
> > didn't properly set rgw_enable_apis, then I probably don't know how
> > to
> > set it right.
> > 
> >    Best Regards,
> >   Tim
> > 
> > On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> > > Hello Tim,
> > > 
> > > I was also struggling with this when I was configuring the object
> > > gateway for the first time.
> > > 
> > > There is a few things that you should check to make sure the
> > > dashboard would work.
> > > 
> > > 1. You need to have the admin api enabled on all rgws with the
> > > rgw_enable_apis option. (As far as I know you are not able to
> > > force
> > > the dashboard to use one rgw instance)
> > > 2. It seems that you have the rgw_admin_entry set to a non
> > > default
> > > value - the default is admin but it seems that you have “default"
> > > (by
> > > the name of the bucket) make sure that you have this also set on
> > > all
> > > rgws.
> > > 
> > > You can confirm that both of these settings are set properly by
> > > sending GET request to ${rgw-ip}:${port}/${rgw_admin_entry}
> > > “default" in your case -> it should return 405 Method Not
> > > Supported
> > > 
> > > Btw there is actually no bucket that you would be able to see in
> > > the
> > > administration. It’s just abstraction on the rgw.
> > > 
> > > Reagards,
> > > 
> > > Ondrej
> > > 
> > > > On 16. 10. 2023, at 22:00, Tim Holloway 
> > > > wrote:
> > > > 
> > > > First, an abject apology for the horrors I'm about to unveil. I
> > > > made a
> > > > cold migration from GlusterFS to Ceph a few months back, so it
> > > > was
> > > > a
> > > > learn-/screwup/-as-you-go affair.
> > > > 
> > > > For reasons of presumed compatibility with some of my older
> > > > servers, I
> > > > started with Ceph Octopus. Unfortunately, Octopus seems to have
> > > > been a
> > > > nexus of transitions from older Ceph organization and
> > > > management to
> > > > a
> > > > newer (cephadm) system combined with a relocation of many ceph
> > > > resources and compounded by stale bits of documentation
> > > > (notably

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Casey Bodley
On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey

rgw approved, thanks!

> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to delete rbd images

2023-10-17 Thread Eugen Block

Hi,

I would check the trash to see if the image has been moved there. If  
it is, try to restore it to check its watchers. If you're able to  
restore it, try blacklisting the specific client session, so something  
like this:


# check trash
rbd -p iscsi-images trash ls --all

# try restoring it
rbd -p iscsi-images trash ls --all
 tegile-500tb

rbd trash restore -p iscsi-images 

# check existing locks
rbd lock list iscsi-images/tegile-500tb

# could look like this:
tegile-500tb -p iscsi-images
There is 1 exclusive lock on this image.
Locker          ID                     Address
client.1211875  auto 139643345791728  192.168.3.12:0/2259335316

# try blacklisting the client's address (taken from the lock list output)
ceph osd blacklist add 192.168.3.12:0/2259335316

# check the locks again

# try deleting the image again

If the image is not in the trash, does a lock list return anything? Do  
you know the block_name_prefix of that image to check if there are  
remainders of that imge in the pool? This would look like this:


rados -p iscsi-images ls | grep 

If there are rados objects belonging to the image but no header, you  
could try restoring the rbd_header object but let's not jump too far  
ahead. Check the other things first.
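
A couple of extra checks that can help (the image id placeholder is whatever
the trash listing or the block_name_prefix reports):

# list watchers directly on the header object, if it still exists
rados -p iscsi-images listwatchers rbd_header.<image-id>

# after blacklisting, confirm the entry is actually present
ceph osd blacklist ls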


Regards,
Eugen

Zitat von Mohammad Alam :


Hello All,
Greetings. We have a Ceph cluster with the version
*ceph version 14.2.16-402-g7d47dbaf4d
(7d47dbaf4d0960a2e910628360ae36def84ed913) nautilus (stable)
=

Issue: unable to delete rbd images

We deleted the target from the dashboard and are now trying to delete the
rbd images from the CLI, but we are not able to delete them.


when we ran "rbd rm -f tegile-500tb -p iscsi-images" its returning
2023-10-16 15:22:16.719 7f90bb332700 -1  
librbd::image::PreRemoveRequest: 0x7f90a80041a0  
check_image_watchers: image has watchers - not removing

Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed.  
Try again after closing/unmapping it or waiting 30s for the crashed  
client to timeout.



It also cannot be deleted from the dashboard.


We even tried to list the watchers, but it does not return anything --
just "no such file or directory":




"rbd info iscsi-images/tegile-500tb"
rbd: error opening image tegile-500tb: (2) No such file or directory



It is not showing on "rbd showmapped" output as well for that  
particular images, hence we can not unmap it.


We cannot restart the iSCSI gateway because it is running and we cannot
interrupt it.


===

Please suggest how to fix this issue.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Casey Bodley
hey Tim,

your changes to rgw_admin_entry probably aren't taking effect on the
running radosgws. you'd need to restart them in order to set up the
new route

there also seems to be some confusion about the need for a bucket
named 'default'. radosgw just routes requests with paths starting with
'/{rgw_admin_entry}' to a separate set of admin-related rest apis.
otherwise they fall back to the s3 api, which treats '/foo' as a
request for bucket foo - that's why you see NoSuchBucket errors when
it's misconfigured

also note that, because of how these apis are nested,
rgw_admin_entry='default' would prevent users from creating and
operating on a bucket named 'default'

On Tue, Oct 17, 2023 at 7:03 AM Tim Holloway  wrote:
>
> Thank you, Ondřej!
>
> Yes, I set the admin entry set to "default". It's just the latest
> result of failed attempts ("admin" didn't work for me either). I did
> say there were some horrors in there!
>
> If I got your sample URL pattern right, the results of a GET on
> "http://x.y.z/default"; return 404, NoSuchBucket. If that means that I
> didn't properly set rgw_enable_apis, then I probably don't know how to
> set it right.
>
>Best Regards,
>   Tim
>
> On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> > Hello Tim,
> >
> > I was also struggling with this when I was configuring the object
> > gateway for the first time.
> >
> > There is a few things that you should check to make sure the
> > dashboard would work.
> >
> > 1. You need to have the admin api enabled on all rgws with the
> > rgw_enable_apis option. (As far as I know you are not able to force
> > the dashboard to use one rgw instance)
> > 2. It seems that you have the rgw_admin_entry set to a non default
> > value - the default is admin but it seems that you have “default" (by
> > the name of the bucket) make sure that you have this also set on all
> > rgws.
> >
> > You can confirm that both of these settings are set properly by
> > sending GET request to ${rgw-ip}:${port}/${rgw_admin_entry}
> > “default" in your case -> it should return 405 Method Not Supported
> >
> > Btw there is actually no bucket that you would be able to see in the
> > administration. It’s just abstraction on the rgw.
> >
> > Reagards,
> >
> > Ondrej
> >
> > > On 16. 10. 2023, at 22:00, Tim Holloway  wrote:
> > >
> > > First, an abject apology for the horrors I'm about to unveil. I
> > > made a
> > > cold migration from GlusterFS to Ceph a few months back, so it was
> > > a
> > > learn-/screwup/-as-you-go affair.
> > >
> > > For reasons of presumed compatibility with some of my older
> > > servers, I
> > > started with Ceph Octopus. Unfortunately, Octopus seems to have
> > > been a
> > > nexus of transitions from older Ceph organization and management to
> > > a
> > > newer (cephadm) system combined with a relocation of many ceph
> > > resources and compounded by stale bits of documentation (notably
> > > some
> > > references to SysV procedures and an obsolete installer that
> > > doesn't
> > > even come with Octopus).
> > >
> > > A far bigger problem was a known issue where actions would be
> > > scheduled
> > > but never executed if the system was even slightly dirty. And of
> > > course, since my system was hopelessly dirty, that was a major
> > > issue.
> > > Finally I took a risk and bumped up to Pacific, where that issue no
> > > longer exists. I won't say that I'm 100% clean even now, but at
> > > least
> > > the remaining crud is in areas where it cannot do any harm.
> > > Presumably.
> > >
> > > Given that, the only bar now remaining to total joy has been my
> > > inability to connect via the Ceph Dashboard to the Object Gateway.
> > >
> > > This seems to be an oft-reported problem, but generally referenced
> > > relative to higher-level administrative interfaces like Kubernetes
> > > and
> > > rook. I'm interfacing more directly, however. Regardless, the error
> > > reported is notably familiar:
> > >
> > > [quote]
> > > The Object Gateway Service is not configured
> > > Error connecting to Object Gateway: RGW REST API failed request
> > > with
> > > status code 404
> > > (b'{"Code":"NoSuchBucket","Message":"","BucketName":"default","Requ
> > > estI
> > > d":"tx00' b'000dd0c65b8bda685b4-00652d8e0f-5e3a9b-
> > > default","HostId":"5e3a9b-default-defa' b'ult"}')
> > > Please consult the documentation on how to configure and enable the
> > > Object Gateway management functionality.
> > > [/quote]
> > >
> > > In point of fact, what this REALLY means in my case is that the
> > > bucket
> > > that is supposed to contain the necessary information for the
> > > dashboard
> > > and rgw to communicate has not been created. Presumably that
> > > SHOULDhave
> > > been done by the "ceph dashboard set-rgw-credentials" command, but
> > > apparently isn't, because the default zone has no buckets at all,
> > > much
> > > less one named "default".
> > >
> > > By way of reference, the dashboard is definitely trying to interact
> > > with the rgw cont

[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-17 Thread Zakhar Kirpichenko
Thanks for this, Eugen. I think I'll stick to adding the option to the
config file; it seems like a safer way to do it.
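
For the archive, the stanza I'm adding looks roughly like this (trimmed to the
options discussed in this thread; it goes into the copy of ceph.conf that is
bind-mounted into the mon container, as shown further down):

[mon]
    mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression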

/Z

On Tue, 17 Oct 2023, 15:21 Eugen Block,  wrote:

> Hi,
>
> I managed to get the compression setting into the MONs by using the
> extra-entrypoint-arguments [1]:
>
> ceph01:~ # cat mon-specs.yaml
> service_type: mon
> placement:
>hosts:
>- ceph01
>- ceph02
>- ceph03
> extra_entrypoint_args:
>   - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
>
> Just note that if you make a mistake and run 'ceph orch apply -i
> mon-specs.yaml' with incorrect options your MON containers will all
> fail. So test that in a non-critical environment first. In case the
> daemons fail to start you can remove those options from the unit.run
> file and get them up again.
> But for me that worked and the daemons have the compression setting
> enabled now. What remains unclear is which config options can be
> changed as usual with the config database and which require this
> extra-entrypoint-argument.
>
> Thanks again, Mykola!
> Eugen
>
> [1]
>
> https://docs.ceph.com/en/quincy/cephadm/services/#extra-entrypoint-arguments
>
> Zitat von Zakhar Kirpichenko :
>
> > Thanks for the suggestion, Josh!
> >
> >  That part is relatively simple: the container gets ceph.conf from the
> > host's filesystem, for example:
> >
> > "HostConfig": {
> > "Binds": [
> > "/dev:/dev",
> > "/run/udev:/run/udev",
> >
> > "/var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z",
> >
> > "/var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z",
> >
> >
> "/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z",
> >
> >
> "/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05:/var/lib/ceph/mon/ceph-ceph05:z",
> >
> >
> "/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/config:/etc/ceph/ceph.conf:z"
> > ],
> >
> > When I stop the monitor, edit the file directly and restart the monitor,
> > mon_rocksdb_options seem to be applied correctly!
> >
> > Unfortunately, when I specify global mon_rocksdb_options and redeploy the
> > monitor, the new ceph.conf doesn't have mon_rocksdb_options at all. I am
> > not sure that this is a reliable way to enable compression, but it works
> -
> > so it's better than other ways which don't work :-)
> >
> > /Z
> >
> > On Mon, 16 Oct 2023 at 16:16, Josh Baergen 
> > wrote:
> >
> >> > the resulting ceph.conf inside the monitor container doesn't have
> >> mon_rocksdb_options
> >>
> >> I don't know where this particular ceph.conf copy comes from, but I
> >> still suspect that this is where this particular option needs to be
> >> set. The reason I think this is that rocksdb mount options are needed
> >> _before_ the mon is able to access any of the centralized conf data,
> >> which I believe is itself stored in rocksdb.
> >>
> >> Josh
> >>
> >> On Sun, Oct 15, 2023 at 10:29 PM Zakhar Kirpichenko 
> >> wrote:
> >> >
> >> > Out of curiosity, I tried setting mon_rocksdb_options via ceph.conf.
> >> This didn't work either: ceph.conf gets overridden at monitor start, the
> >> resulting ceph.conf inside the monitor container doesn't have
> >> mon_rocksdb_options, the monitor starts with no RocksDB compression.
> >> >
> >> > I would appreciate it if someone from the Ceph team could please chip
> in
> >> and suggest a working way to enable RocksDB compression in Ceph
> monitors.
> >> >
> >> > /Z
> >> >
> >> > On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko 
> >> wrote:
> >> >>
> >> >> Thanks for your response, Josh. Our ceph.conf doesn't have anything
> but
> >> the mon addresses, modern Ceph versions store their configuration in the
> >> monitor configuration database.
> >> >>
> >> >> This works rather well for various Ceph components, including the
> >> monitors. RocksDB options are also applied to monitors correctly, but
> for
> >> some reason are being ignored.
> >> >>
> >> >> /Z
> >> >>
> >> >> On Sat, 14 Oct 2023, 17:40 Josh Baergen, 
> >> wrote:
> >> >>>
> >> >>> Apologies if you tried this already and I missed it - have you tried
> >> >>> configuring that setting in /etc/ceph/ceph.conf (or wherever your
> conf
> >> >>> file is) instead of via 'ceph config'? I wonder if mon settings like
> >> >>> this one won't actually apply the way you want because they're
> needed
> >> >>> before the mon has the ability to obtain configuration from,
> >> >>> effectively, itself.
> >> >>>
> >> >>> Josh
> >> >>>
> >> >>> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko <
> zak...@gmail.com>
> >> wrote:
> >> >>> >
> >> >>> > I also tried setting RocksDB compression options and deploying a
> new
> >> >>> > monitor. The monitor started with no RocksDB compression again.
> >> >>> >
> >> >>> > Ceph monitors seem to ignore mon_rocksdb

[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-17 Thread Eugen Block

Hi,

I managed to get the compression setting into the MONs by using the  
extra-entrypoint-arguments [1]:


ceph01:~ # cat mon-specs.yaml
service_type: mon
placement:
  hosts:
  - ceph01
  - ceph02
  - ceph03
extra_entrypoint_args:
  - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'


Just note that if you make a mistake and run 'ceph orch apply -i  
mon-specs.yaml' with incorrect options your MON containers will all  
fail. So test that in a non-critical environment first. In case the  
daemons fail to start you can remove those options from the unit.run  
file and get them up again.
But for me that worked and the daemons have the compression setting  
enabled now. What remains unclear is which config options can be  
changed as usual with the config database and which require this  
extra-entrypoint-argument.
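
A quick way to double-check what actually got applied (recent cephadm
releases) is to export the stored spec:

ceph orch ls mon --export   # should show the extra_entrypoint_args as applied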


Thanks again, Mykola!
Eugen

[1]  
https://docs.ceph.com/en/quincy/cephadm/services/#extra-entrypoint-arguments


Zitat von Zakhar Kirpichenko :


Thanks for the suggestion, Josh!

 That part is relatively simple: the container gets ceph.conf from the
host's filesystem, for example:

"HostConfig": {
"Binds": [
"/dev:/dev",
"/run/udev:/run/udev",

"/var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z",

"/var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05:/var/lib/ceph/mon/ceph-ceph05:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/config:/etc/ceph/ceph.conf:z"
],

When I stop the monitor, edit the file directly and restart the monitor,
mon_rocksdb_options seem to be applied correctly!

Unfortunately, when I specify global mon_rocksdb_options and redeploy the
monitor, the new ceph.conf doesn't have mon_rocksdb_options at all. I am
not sure that this is a reliable way to enable compression, but it works -
so it's better than other ways which don't work :-)
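
In case it's useful, the edit itself is roughly this (fsid and hostname
taken from the bind mounts above; the option string is only an example,
and a cephadm redeploy will overwrite the file again):

FSID=3f50555a-ae2a-11eb-a2fc-ffde44714d86
HOST=ceph05

systemctl stop ceph-$FSID@mon.$HOST.service

# append the option to the mon's private copy of ceph.conf
cat >> /var/lib/ceph/$FSID/mon.$HOST/config <<'EOF'
[mon]
mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression
EOF

systemctl start ceph-$FSID@mon.$HOST.service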

/Z

On Mon, 16 Oct 2023 at 16:16, Josh Baergen 
wrote:


> the resulting ceph.conf inside the monitor container doesn't have
mon_rocksdb_options

I don't know where this particular ceph.conf copy comes from, but I
still suspect that this is where this particular option needs to be
set. The reason I think this is that rocksdb mount options are needed
_before_ the mon is able to access any of the centralized conf data,
which I believe is itself stored in rocksdb.

Josh

On Sun, Oct 15, 2023 at 10:29 PM Zakhar Kirpichenko 
wrote:
>
> Out of curiosity, I tried setting mon_rocksdb_options via ceph.conf.
This didn't work either: ceph.conf gets overridden at monitor start, the
resulting ceph.conf inside the monitor container doesn't have
mon_rocksdb_options, the monitor starts with no RocksDB compression.
>
> I would appreciate it if someone from the Ceph team could please chip in
and suggest a working way to enable RocksDB compression in Ceph monitors.
>
> /Z
>
> On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko 
wrote:
>>
>> Thanks for your response, Josh. Our ceph.conf doesn't have anything but
the mon addresses, modern Ceph versions store their configuration in the
monitor configuration database.
>>
>> This works rather well for various Ceph components, including the
monitors. RocksDB options are also applied to monitors correctly, but for
some reason are being ignored.
>>
>> /Z
>>
>> On Sat, 14 Oct 2023, 17:40 Josh Baergen, 
wrote:
>>>
>>> Apologies if you tried this already and I missed it - have you tried
>>> configuring that setting in /etc/ceph/ceph.conf (or wherever your conf
>>> file is) instead of via 'ceph config'? I wonder if mon settings like
>>> this one won't actually apply the way you want because they're needed
>>> before the mon has the ability to obtain configuration from,
>>> effectively, itself.
>>>
>>> Josh
>>>
>>> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko 
wrote:
>>> >
>>> > I also tried setting RocksDB compression options and deploying a new
>>> > monitor. The monitor started with no RocksDB compression again.
>>> >
>>> > Ceph monitors seem to ignore mon_rocksdb_options set at runtime, at
mon
>>> > start and at mon deploy. How can I enable RocksDB compression in Ceph
>>> > monitors?
>>> >
>>> > Any input from anyone, please?
>>> >
>>> > /Z
>>> >
>>> > On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko 
wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > I'm still trying to fight large Ceph monitor writes. One option I
>>> > > considered is enabling RocksDB compression, as our nodes have more
than
>>> > > sufficient RAM and CPU. Unfortunately, monitors seem to completely
ignore
>>> > > the compression setting:
>>> > >
>>> > > I tried:
>>> > >
>>> > > - setting ceph config set mon.ceph05 mon_rocksdb_options
>>> > >
"write_bu

[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Tim Holloway
Thank you, Ondřej!

Yes, I have the admin entry set to "default". It's just the latest
result of failed attempts ("admin" didn't work for me either). I did
say there were some horrors in there!

If I got your sample URL pattern right, a GET on
"http://x.y.z/default" returns 404, NoSuchBucket. If that means that I
didn't properly set rgw_enable_apis, then I probably don't know how to
set it right.
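
For reference, my current best guess at how those two options would be set
and checked - the option names come from your mail, but the config target,
the rgw service name and the API list are guesses on my part:

# make sure the admin API is enabled and the admin entry is consistent
# on all rgws, then restart them ('ceph orch ls' shows the service name)
ceph config set client.rgw rgw_enable_apis "s3, s3website, swift, swift_auth, admin"
ceph config set client.rgw rgw_admin_entry default
# (or revert it to the default "admin" everywhere instead)
ceph orch restart rgw.<service-name>

# per your note, this should now answer 405 rather than 404
curl -i http://x.y.z/default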

   Best Regards,
  Tim

On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> Hello Tim,
> 
> I was also struggling with this when I was configuring the object
> gateway for the first time.
> 
> There are a few things you should check to make sure the
> dashboard will work.
> 
> 1. You need to have the admin api enabled on all rgws with the
> rgw_enable_apis option. (As far as I know you are not able to force
> the dashboard to use one rgw instance)
> 2. It seems that you have rgw_admin_entry set to a non-default
> value - the default is "admin", but judging by the bucket name you
> have "default". Make sure you have this set on all rgws as well.
> 
> You can confirm that both of these settings are set properly by
> sending a GET request to ${rgw-ip}:${port}/${rgw_admin_entry} -
> "default" in your case -> it should return 405 Method Not Allowed
> 
> Btw there is actually no bucket that you would be able to see in the
> administration. It's just an abstraction on the rgw.
> 
> Regards,
> 
> Ondrej
> 
> > On 16. 10. 2023, at 22:00, Tim Holloway  wrote:
> > 
> > First, an abject apology for the horrors I'm about to unveil. I
> > made a
> > cold migration from GlusterFS to Ceph a few months back, so it was
> > a
> > learn-/screwup/-as-you-go affair.
> > 
> > For reasons of presumed compatibility with some of my older
> > servers, I
> > started with Ceph Octopus. Unfortunately, Octopus seems to have
> > been a
> > nexus of transitions from older Ceph organization and management to
> > a
> > newer (cephadm) system combined with a relocation of many ceph
> > resources and compounded by stale bits of documentation (notably
> > some
> > references to SysV procedures and an obsolete installer that
> > doesn't
> > even come with Octopus).
> > 
> > A far bigger problem was a known issue where actions would be
> > scheduled
> > but never executed if the system was even slightly dirty. And of
> > course, since my system was hopelessly dirty, that was a major
> > issue.
> > Finally I took a risk and bumped up to Pacific, where that issue no
> > longer exists. I won't say that I'm 100% clean even now, but at
> > least
> > the remaining crud is in areas where it cannot do any harm.
> > Presumably.
> > 
> > Given that, the only bar now remaining to total joy has been my
> > inability to connect via the Ceph Dashboard to the Object Gateway.
> > 
> > This seems to be an oft-reported problem, but generally referenced
> > relative to higher-level administrative interfaces like Kubernetes
> > and
> > rook. I'm interfacing more directly, however. Regardless, the error
> > reported is notably familiar:
> > 
> > [quote]
> > The Object Gateway Service is not configured
> > Error connecting to Object Gateway: RGW REST API failed request
> > with
> > status code 404
> > (b'{"Code":"NoSuchBucket","Message":"","BucketName":"default","Requ
> > estI
> > d":"tx00' b'000dd0c65b8bda685b4-00652d8e0f-5e3a9b-
> > default","HostId":"5e3a9b-default-defa' b'ult"}')
> > Please consult the documentation on how to configure and enable the
> > Object Gateway management functionality. 
> > [/quote]
> > 
> > In point of fact, what this REALLY means in my case is that the
> > bucket
> > that is supposed to contain the necessary information for the
> > dashboard
> > and rgw to communicate has not been created. Presumably that
> > SHOULD have
> > been done by the "ceph dashboard set-rgw-credentials" command, but
> > apparently isn't, because the default zone has no buckets at all,
> > much
> > less one named "default".
> > 
> > By way of reference, the dashboard is definitely trying to interact
> > with the rgw container, because trying object gateway options on
> > the
> > dashboard results in the container logging the following.
> > 
> > beast: 0x7efd29621620: 10.0.1.16 - dashboard
> > [16/Oct/2023:19:25:03.678
> > +] "GET /default/metadata/user?myself HTTP/1.1" 404
> > 
> > To make everything happy, I'd be glad to accept instructions on how
> > to
> > manually brute-force construct this bucket.
> > 
> > Of course, as a cleaner long-term solution, it would be nice if the
> > failure to create could be detected and logged.
> > 
> > And of course, the ultimate solution: something that would assist
> > in
> > making whatever processes are unhappy be happy.
> > 
> >    Thanks,
> >  Tim
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] RGW: How to trigger to recalculate the bucket stats?

2023-10-17 Thread Huy Nguyen
Hi,
For some reason, I need to recalculate the bucket stats. Is this possible?

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-17 Thread Frank Schilder
Hi Stefan,

Probably. It's 2 compute nodes and there are jobs running. Our epilogue script 
will drop the caches, at which point I indeed expect the warning to disappear. 
We have no time limit on these nodes though, so this can be a while. I was 
hoping there was an alternative to that, say, a user-level command that I could 
execute on the client without possibly affecting other users' jobs.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: Tuesday, October 17, 2023 11:13 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] stuck MDS warning: Client HOST failing to respond to 
cache pressure

On 17-10-2023 09:22, Frank Schilder wrote:
> Hi all,
>
> I'm affected by a stuck MDS warning for 2 clients: "failing to respond to 
> cache pressure". This is a false alarm as no MDS is under any cache pressure. 
> The warning is stuck already for a couple of days. I found some old threads 
> about cases where the MDS does not update flags/triggers for this warning in 
> certain situations. Dating back to luminous and I'm probably hitting one of 
> these.
>
> In these threads I could find a lot except for instructions for how to clear 
> this out in a nice way. Is there something I can do on the clients to clear 
> this warning? I don't want to evict/reboot just because of that.

echo 2 > /proc/sys/vm/drop_caches on the clients - does that help?

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-17 Thread Johan

The problem appears in v16.2.11-20230125.
I have no insight into the different commits.

/Johan

On 2023-10-16 at 08:25, 544463...@qq.com wrote:

I encountered a similar problem on ceph 17.2.5; could you find which commit 
caused it?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-17 Thread Stefan Kooman

On 17-10-2023 09:22, Frank Schilder wrote:

Hi all,

I'm affected by a stuck MDS warning for 2 clients: "failing to respond to cache 
pressure". This is a false alarm as no MDS is under any cache pressure. The warning 
is stuck already for a couple of days. I found some old threads about cases where the MDS 
does not update flags/triggers for this warning in certain situations. Dating back to 
luminous and I'm probably hitting one of these.

In these threads I could find a lot except for instructions for how to clear 
this out in a nice way. Is there something I can do on the clients to clear 
this warning? I don't want to evict/reboot just because of that.


echo 2 > /proc/sys/vm/drop_caches on the clients - does that help?

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-17 Thread Frank Schilder
Hi all,

I'm affected by a stuck MDS warning for 2 clients: "failing to respond to cache 
pressure". This is a false alarm as no MDS is under any cache pressure. The 
warning is stuck already for a couple of days. I found some old threads about 
cases where the MDS does not update flags/triggers for this warning in certain 
situations. Dating back to luminous and I'm probably hitting one of these.

In these threads I could find a lot except for instructions for how to clear 
this out in a nice way. Is there something I can do on the clients to clear 
this warning? I don't want to evict/reboot just because of that.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io