[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Louis Koo
The pgp_num reduces quickly, but the pg_num still decreases slowly.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Louis Koo
Thanks, I will take a look.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Louis Koo
Thanks. Another question: how can I tell where this option is set, on the mon or the mgr?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata pool grows by two orders of magnitude while trimming (?) snapshots

2023-06-08 Thread Patrick Donnelly
On Mon, Jun 5, 2023 at 11:48 AM Janek Bevendorff
 wrote:
>
> Hi Patrick, hi Dan!
>
> I got the MDS back and I think the issue is connected to the "newly
> corrupt dentry" bug [1]. Even though I couldn't see any particular
> reason for the SIGABRT at first, I then noticed one of these awfully
> familiar stack traces.
>
> I rescheduled the two broken MDS ranks on two machines with 1.5TB RAM
> each (just to make sure it's not that) and then let them do their thing.
> The routine goes as follows: both replay the journal, then rank 4 goes
> into the "resolve" state, but as soon as rank 3 also starts resolving,
> they both crash.
>
> Then I set
>
> ceph config set mds mds_abort_on_newly_corrupt_dentry false
> ceph config set mds mds_go_bad_corrupt_dentry false
>
> and this time I was able to recover the ranks, even though "resolve" and
> "clientreplay" took forever. I uploaded a compressed log of rank 3 using
> ceph-post-file [2]. It's a log of several crash cycles, including the
> final successful attempt after changing the settings. The log
> decompresses to 815MB. I didn't censor any paths and they are not
> super-secret, but please don't share.

Probably only

ceph config set mds mds_go_bad_corrupt_dentry false

was necessary for recovery. You don't have any logs showing it hit
those asserts?

I'm afraid your ceph-post-file logs were lost to the nether. AFAICT,
our ceph-post-file storage has been non-functional since the beginning
of the lab outage last year. We're looking into it.

> While writing this, the metadata pool size has reduced from 6TiB back to
> 440GiB. I am starting to think that the fill-ups may also be connected
> to the corruption issue.

Extremely unlikely.

> I also noticed that the ranks 3 and 4 always
> have huge journals. An inspection using ceph-journal-tool takes forever
> and consumes 50GB of memory in the process. Listing the events in the
> journal is impossible without running out of RAM. Ranks 0, 1, and 2
> don't have this problem and this wasn't a problem for ranks 3 and 4
> either before the fill-ups started happening.

So clearly (a) an incredible number of journal events are being logged
and (b) trimming is slow or unable to make progress. I'm looking into
why but you can help by running the attached script when the problem
is occurring so I can investigate. I'll need a tarball of the outputs.
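
(The script mentioned above is not included in this digest. As a rough, hedged
sketch of the kind of state worth capturing while the journal is growing, not
the actual script, one could dump the MDS journal counters and in-flight ops
per rank:)

ceph fs status
ceph tell mds.<name> perf dump mds_log      # journal event/segment counters
ceph tell mds.<name> ops                    # in-flight operations that may block trimming
ceph tell mds.<name> dump_blocked_ops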

Also, in the off-chance this is related to the MDS balancer, please
disable it since you're using ephemeral pinning:

ceph config set mds mds_bal_interval 0
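
(For reference, a quick hedged way to confirm that the balancer is off and that
ephemeral pinning is actually set on the relevant directories; the mount path
is a placeholder:)

ceph config get mds mds_bal_interval
getfattr -n ceph.dir.pin.distributed /mnt/cephfs/some/dir
getfattr -n ceph.dir.pin.random /mnt/cephfs/some/dir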

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Eugen Block

Sure: https://docs.ceph.com/en/latest/rados/operations/balancer/#throttling

Quoting Louis Koo :


ok, I will try it. Could you show me the archive doc?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd ls failed with operation not permitted

2023-06-08 Thread Konstantin Shalygin
Hi,

> On 7 Jun 2023, at 14:39, zyz  wrote:
> 
> When set the user's auth and then ls namespace, it is ok.
> 
> 
> But when I set the user's auth with namespace, ls namespace returns with 
> error, but why?

Because the namespace information lives in the "without namespace" space of the pool, which a client restricted to a single namespace cannot read.
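
A hedged sketch of the two cap variants being compared (pool, namespace and
client names are examples):

# full-pool caps: `rbd namespace ls rbd` works
ceph auth get-or-create client.foo mon 'profile rbd' osd 'profile rbd pool=rbd'

# caps restricted to one namespace: images inside ns1 are usable,
# but listing the pool's namespaces is denied
ceph auth get-or-create client.bar mon 'profile rbd' osd 'profile rbd pool=rbd namespace=ns1'
rbd --id bar ls rbd/ns1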


k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Konstantin Shalygin
Hi,

> On 7 Jun 2023, at 10:02, Louis Koo  wrote:
> 
> I had set it from 0.05 to 1 with "ceph config set mon 
> target_max_misplaced_ratio 1.0", it's still invalid.


Because it is a setting for the mgr, not the mon; try `ceph config set mgr target_max_misplaced_ratio 1`
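
To the earlier question of where an option is set: the WHO column of the
config database shows it, e.g.

ceph config dump | grep target_max_misplaced_ratio
ceph config get mgr target_max_misplaced_ratio
ceph config rm mon target_max_misplaced_ratio   # drop the value set on the wrong section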

Cheers,
k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [RGW] what is log_meta and log_data config in a multisite config?

2023-06-08 Thread Gilles Mocellin
Hi Richard,

Thank you, that's what I thought; I've also seen that doc.
So I imagine that log_meta is false on secondary zones because metadata
requests are forwarded to the master zone, so there is no need to sync them.
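
For reference, the edit itself is usually done roughly like this (a sketch;
see the Red Hat document quoted below for the authoritative steps):

radosgw-admin zonegroup get > zonegroup.json
# edit log_meta / log_data in zonegroup.json
radosgw-admin zonegroup set --infile zonegroup.json
radosgw-admin period update --commit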

Regards,
--
Gilles

On Thursday, 8 June 2023 at 03:15:56 CEST, Richard Bade wrote:
> Hi Gilles,
> I'm not 100% sure but I believe this is relating to the logs kept for
> doing incremental sync. When these are false then changes are not
> tracked and sync doesn't happen.
> My reference is this Red Hat documentation on configuring zones
> without replication:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/object_gateway_guide/advanced-configuration#configuring-multiple-zones-without-replication_rgw
> "Open the file for editing, and set the log_meta, log_data, and
> sync_from_all fields to false"
> 
> I hope that helps.
> Rich
> 
> On Mon, 5 Jun 2023 at 20:42, Gilles Mocellin
> 
>  wrote:
> > Hi Cephers,
> > 
> > In a multisite config, with one zonegroup and 2 zones, when I look at
> > `radosgw-admin zonegroup get`,
> > 
> > I see by defaut these two parameters :
> >  "log_meta": "false",
> >  "log_data": "true",
> > 
> > Where can I find documentation on these, I can't find.
> > 
> > I set log_meta to true, because, why not ?
> > Is it a bad thing ?
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd ls failed with operation not permitted

2023-06-08 Thread zyz
When I set the user's auth without a namespace and then list namespaces, it works.


But when I set the user's auth restricted to a namespace, listing namespaces
returns an error. Why?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Louis Koo
I had set it from 0.05 to 1 with "ceph config set mon 
target_max_misplaced_ratio 1.0", but it still had no effect.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Louis Koo
ok, I will try it. Could you show me the archive doc?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-08 Thread Louis Koo
ceph df detail:
[root@k8s-1 ~]# ceph df detail
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    600 GiB  600 GiB  157 MiB  157 MiB        0.03
TOTAL  600 GiB  600 GiB  157 MiB  157 MiB        0.03

--- POOLS ---
POOL                                    ID  PGS  STORED   (DATA)   (OMAP)  OBJECTS  USED     (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
device_health_metrics                    1    1  0 B      0 B      0 B           0  0 B      0 B      0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
os-gwbtwlltuklwxrpl.rgw.buckets.index    2  230  0 B      0 B      0 B          11  0 B      0 B      0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
os-gwbtwlltuklwxrpl.rgw.control          3    8  0 B      0 B      0 B           8  0 B      0 B      0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
os-gwbtwlltuklwxrpl.rgw.log              4    8  3.7 KiB  3.7 KiB  0 B         180  420 KiB  420 KiB  0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
os-gwbtwlltuklwxrpl.rgw.meta             5    8  1.9 KiB  1.9 KiB  0 B           7  72 KiB   72 KiB   0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
.rgw.root                                6    8  4.9 KiB  4.9 KiB  0 B          16  180 KiB  180 KiB  0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
os-gwbtwlltuklwxrpl.rgw.buckets.non-ec   7    8  0 B      0 B      0 B           0  0 B      0 B      0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
os-gwbtwlltuklwxrpl.rgw.otp              8    8  0 B      0 B      0 B           0  0 B      0 B      0 B         0    158 GiB  N/A            N/A          N/A    0 B         0 B
os-gwbtwlltuklwxrpl.rgw.buckets.data     9   32  0 B      0 B      0 B           0  0 B      0 B      0 B         0    317 GiB  N/A            N/A          N/A    0 B         0 B

ceph osd pool ls detail:
[root@k8s-1 ~]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 25 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr_devicehealth
pool 2 'os-gwbtwlltuklwxrpl.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 5 object_hash rjenkins pg_num 217 pgp_num 209 pg_num_target 32 pgp_num_target 32 autoscale_mode off last_change 322 lfor 0/322/320 flags hashpspool stripe_width 0 compression_mode none pg_num_min 8 target_size_ratio 0.5 application rook-ceph-rgw
pool 3 'os-gwbtwlltuklwxrpl.rgw.control' replicated size 3 min_size 2 crush_rule 7 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 36 flags hashpspool stripe_width 0 compression_mode none pg_num_min 8 target_size_ratio 0.5 application rook-ceph-rgw
pool 4 'os-gwbtwlltuklwxrpl.rgw.log' replicated size 3 min_size 2 crush_rule 6 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 35 flags hashpspool stripe_width 0 compression_mode none pg_num_min 8 target_size_ratio 0.5 application rook-ceph-rgw
pool 5 'os-gwbtwlltuklwxrpl.rgw.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 35 flags hashpspool stripe_width 0 compression_mode none pg_num_min 8 target_size_ratio 0.5 application rook-ceph-rgw
pool 6 '.rgw.root' replicated size 3 min_size 2 crush_rule 4 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 35 flags hashpspool stripe_width 0 compression_mode none pg_num_min 8 target_size_ratio 0.5 application rook-ceph-rgw
pool 7 'os-gwbtwlltuklwxrpl.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 8 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 35 flags hashpspool stripe_width 0 compression_mode none pg_num_min 8 target_size_ratio 0.5 application rook-ceph-rgw
pool 8 'os-gwbtwlltuklwxrpl.rgw.otp' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 36 flags hashpspool stripe_width 0 compression_mode none pg_num_min 8 target_size_ratio 0.5 application rook-ceph-rgw
pool 9 'os-gwbtwlltuklwxrpl.rgw.buckets.data' erasure profile os-gwbtwlltuklwxrpl.rgw.buckets.data_ecprofile size 3 min_size 2 crush_rule 9 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 70 flags hashpspool,ec_overwrites stripe_width 8192 compression_mode none target_size_ratio 0.5 application rook-ceph-rgw

I also tested on a smaller Ceph cluster with 6 OSDs: I set the pg_num of the
index pool to 256 and then reduced it to 32, and it was still slow.
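
Putting the advice from this thread together, a hedged sketch of the mgr-side
knob and how to watch the merge progress (pool name taken from the output
above):

ceph config set mgr target_max_misplaced_ratio 1   # on the mgr, not the mon
ceph osd pool get os-gwbtwlltuklwxrpl.rgw.buckets.index pg_num
ceph osd pool ls detail | grep buckets.index       # pg_num should step down towards pg_num_target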
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] S3 and Omap

2023-06-08 Thread xadhoom76
Hi, we have a Ceph 17.2.6 cluster with radosgw and a couple of buckets in it.
We use it for backups with object lock, written directly from Veeam.
After a few backups we got:

HEALTH_WARN 2 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 2 large omap objects
    2 large objects found in pool 'backup.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.

What is causing this? Could we raise the omap size threshold and safely ignore
this warning?
What's the issue?
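
A hedged sketch of how one might find the bucket behind the warning and
compare it against the thresholds (the bucket name is a placeholder):

grep 'Large omap object found' /var/log/ceph/ceph.log
radosgw-admin bucket limit check                 # objects per shard vs. the reshard threshold
radosgw-admin bucket stats --bucket=<bucket>
ceph config get osd osd_deep_scrub_large_omap_object_key_threshold   # keys per object that trigger the warning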
Best regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about xattr and subvolumes

2023-06-08 Thread Dario Graña
Thank you for the answer, that's what I was looking for!

On Wed, Jun 7, 2023 at 7:59 AM Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

>
>
> On Tue, Jun 6, 2023 at 4:30 PM Dario Graña  wrote:
>
>> Hi,
>>
>> I'm installing a new instance (my first) of Ceph. Our cluster runs
>> AlmaLinux9 + Quincy. Now I'm dealing with CephFS and quotas. I read
>> documentation about setting up quotas with virtual attributes (xattr) and
>> creating volumes and subvolumes with a prefixed size. I cannot distinguish
>> which is the best option for us.
>>
>
> Creating a volume would create a fs and subvolumes are essentially
> directories inside the fs which are managed through
> mgr subvolume APIs. The subvolumes are introduced for openstack and
> openshift use case which expect these subvolumes
> to be programmatically managed via APIs.
>
> Answering the quota question, in cephfs, quota is set using the virtual
> xattr. The subvolume creation with size essentially
> uses the same virtual xattr interface to set the quota size.
>
>
>> Currently we create a directory with a project name and some
>> subdirectories
>> inside.
>>
>
> You can explore subvolumegroup and subvolume mgr APIs if it fits your use
> case. Please note that it's mainly designed for
> openstack/openshift kind of use cases where each subvolume is per PVC and
> the data distinction is maintained e.g., there won't
> be hardlinks created across the subvolumes.
>
>
>> I would like to understand the difference between both options.
>>
>> Thanks in advance.
>>
>> --
>> Dario Graña
>> PIC (Port d'Informació Científica)
>> Campus UAB, Edificio D
>> E-08193 Bellaterra, Barcelona
>> http://www.pic.es
>> Avis - Aviso - Legal Notice: http://legal.ifae.es
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
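
For reference, a minimal sketch of the two approaches discussed above
(filesystem name, path and size are made up):

# plain directory quota via the virtual xattr
setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/projects/myproject
getfattr -n ceph.quota.max_bytes /mnt/cephfs/projects/myproject

# the same limit expressed through the mgr subvolume API
ceph fs subvolume create cephfs myproject --size 107374182400
ceph fs subvolume getpath cephfs myproject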
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] keep rbd command history ever executed

2023-06-08 Thread huxia...@horebdata.cn
Dear Ceph folks,

In a Ceph cluster there can be multiple points (e.g. librbd clients) able to
execute rbd commands. My question is: is there a method to reliably record or
keep a full history of every rbd command that has been executed? This would be
helpful for auditors as well as for system operators.

any ideas?
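
One hedged, partial option is to audit the rbd CLI binary with auditd on the
client hosts; this only records CLI invocations on hosts you control, not
operations issued through the librbd API (e.g. by qemu):

auditctl -w /usr/bin/rbd -p x -k rbd-cli
ausearch -k rbd-cli -i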


Samuel



huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Operations: cannot update immutable features

2023-06-08 Thread Adam Boyhan
I have a small cluster on Pacific with roughly 600 RBD images.   Out of those 
600 images I have 2 which are in a somewhat odd state.

root@cephmon:~# rbd info Cloud-Ceph1/vm-134-disk-0
rbd image 'vm-134-disk-0':
size 1000 GiB in 256000 objects
order 22 (4 MiB objects)
snapshot_count: 11
id: 7c326b8b4567
block_name_prefix: rbd_data.7c326b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, operations
op_features: snap-trash
flags:
create_timestamp: Fri Aug 14 07:11:44 2020
access_timestamp: Thu Jun  8 06:31:06 2023
modify_timestamp: Thu Jun  8 06:31:11 2023

Specifically, the "operations" feature. I never set this feature, and I don't
see it listed in the documentation.

This feature prevents my backup software from backing up the RBD image.
Otherwise, everything is working fine.

I did attempt to remove the feature.

root@cephmon:~# rbd feature disable Cloud-Ceph1/vm-464-disk-0 operations
rbd: failed to update image features: 2023-06-08T07:50:21.899-0400 7fdea52ae340 
-1 librbd::Operations: cannot update immutable features
(22) Invalid argument

Any help or input is greatly appreciated.
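
A possible lead, hedged: the snap-trash op_feature (and with it the
"operations" bit) is reportedly set by the cluster itself while snapshots sit
in the trash namespace, e.g. after removing a snapshot that still has clones,
and clears once they are gone. Checking for trashed snapshots might look like:

rbd snap ls --all Cloud-Ceph1/vm-134-disk-0   # look for entries in the "trash" namespace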
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RadosGW S3 API Multi-Tenancy

2023-06-08 Thread Brad House

Curious if anyone had any guidance on this question...

On 4/29/23 7:47 AM, Brad House wrote:
I'm in the process of exploring if it is worthwhile to add RadosGW to 
our existing ceph cluster.  We've had a few internal requests for 
exposing the S3 API for some of our business units, right now we just 
use the ceph cluster for VM disk image storage via RBD.


Everything looks pretty straightforward until we hit multi-tenancy.
The page on multi-tenancy doesn't cover permission delegation:

https://docs.ceph.com/en/quincy/radosgw/multitenancy/

The end goal I want is to be able to create a single user per tenant 
(Business Unit) which will act as their 'administrator', where they 
can then do basically whatever they want under their tenant sandbox 
(though I don't think we need more advanced cases like creations of 
roles or policies, just create/delete their own users, buckets, 
objects).  I was hopeful this would just work, and I asked on the ceph 
IRC channel on OFTC and was told once I grant a user caps="users=*", 
they would then be allowed to create users *outside* of their own 
tenant using the Rados Admin API and that I should explore IAM roles.


I think it would make sense to add a feature, such as a flag that can 
be set on a user, to ensure they stay in their "sandbox". I'd assume 
this is probably a common use-case.


Anyhow, if its possible to do today using iam roles/policies, then 
great, unfortunately this is my first time looking at this stuff and 
there are some things not immediately obvious.


I saw this online about AWS itself and creating a permissions 
boundary, but that's for allowing creation of roles within a boundary:
https://www.qloudx.com/delegate-aws-iam-user-and-role-creation-without-giving-away-admin-access/ 



I'm not sure what "Action" is associated with the Rados Admin API 
create user for applying a boundary that the user can only create 
users with the same tenant name.

https://docs.ceph.com/en/quincy/radosgw/adminops/#create-user
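
For context, a hedged sketch of the per-tenant administrator described above
(tenant and user names are made up; as noted above, users=* caps apparently do
not by themselves confine the user to its own tenant):

radosgw-admin user create --tenant acme --uid admin --display-name "Acme admin" --gen-access-key --gen-secret
radosgw-admin caps add --uid='acme$admin' --caps="users=*;buckets=*"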

Any guidance on this would be extremely helpful.

Thanks!
-Brad

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13: ERROR:ceph-crash:directory /var/lib/ceph/crash/posted does not exist; please create

2023-06-08 Thread Eugen Block

Hi,

I wonder if a redeploy of the crash service would fix that, did you try that?

Quoting Zakhar Kirpichenko :


I've opened a bug report https://tracker.ceph.com/issues/61589, which
unfortunately received no attention.

I fixed the issue by manually setting the directory ownership
of /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash
and /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash/posted to
167:167, which on my system is the user ID the crash process uses inside the
crash container.
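
In other words, something along these lines (167:167 being the UID/GID used
inside the crash container, as described above):

chown -R 167:167 /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash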

/Z

On Mon, 5 Jun 2023 at 11:24, Zakhar Kirpichenko  wrote:


Any other thoughts on this, please? Should I file a bug report?

/Z

On Fri, 2 Jun 2023 at 06:11, Zakhar Kirpichenko  wrote:


Thanks, Josh. The cluster is managed by cephadm.

On Thu, 1 Jun 2023, 23:07 Josh Baergen, 
wrote:


Hi Zakhar,

I'm going to guess that it's a permissions issue arising from
https://github.com/ceph/ceph/pull/48804, which was included in 16.2.13.
You may need to change the directory permissions, assuming that you manage
the directories yourself. If this is managed by cephadm or something like
that, then that seems like some sort of missing migration in the upgrade.

Josh

On Thu, Jun 1, 2023 at 12:34 PM Zakhar Kirpichenko 
wrote:


Hi,

I'm having an issue with crash daemons on Pacific 16.2.13 hosts.
ceph-crash
throws the following error on all hosts:

ERROR:ceph-crash:directory /var/lib/ceph/crash/posted does not exist;
please create
ERROR:ceph-crash:directory /var/lib/ceph/crash/posted does not exist;
please create
ERROR:ceph-crash:directory /var/lib/ceph/crash/posted does not exist;
please create

ceph-crash runs in docker, the container has the directory mounted: -v

/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z

The mount works correctly:

18:26 [root@ceph02
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86]# ls
-al crash/posted/
total 8
drwx-- 2 nobody nogroup 4096 May  6  2021 .
drwx-- 3 nobody nogroup 4096 May  6  2021 ..

18:26 [root@ceph02 /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86]#
touch crash/posted/a

18:26 [root@ceph02 /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86]#
docker exec -it c0cd2b8022d8 bash

[root@ceph02 /]# ls -al /var/lib/ceph/crash/posted/
total 8
drwx-- 2 nobody nobody 4096 Jun  1 18:26 .
drwx-- 3 nobody nobody 4096 May  6  2021 ..
-rw-r--r-- 1 root   root  0 Jun  1 18:26 a

I.e. the directory actually exists and is correctly mounted in the crash
container, yet ceph-crash says it doesn't exist. How can I convince it
that the directory is there?

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-06-08 Thread Eugen Block

Hi,

sorry for not responding earlier.


Pardon my ignorance, I'm not quite sure I know what you mean by subtree
pinning. I quickly googled it and saw it was a new feature in Luminous. We
are running Pacific. I would assume this feature was not out yet.


Luminous is older than Pacific, so the feature would be available for  
your cluster.



I can definitely try. However, I tried to lower the max number of mds.
Unfortunately, one of the MDSs seem to be stuck in "stopping" state for
more than 12 hours now.


It sounds indeed like reducing the max_mds is causing issues; other
users with a high client load reported similar issues during ceph
upgrades, where max_mds has to be reduced to 1 as well. Can you share
more details about the MDS utilization (are those standalone servers
or colocated services, for example with OSDs?), how many cephfs clients
there are (ceph fs status), what kind of workload they produce, and the
general ceph load (ceph -s)? Just to get a better impression of what's
going on there. To check if and what pinning you use, you could check
the docs [1] and see if any (upper-level) directory returns something
for the getfattr commands. Or maybe someone documented using setfattr
for your cephfs, maybe in the command history?


[1]  
https://docs.ceph.com/en/quincy/cephfs/multimds/#setting-subtree-partitioning-policies


Quoting Emmanuel Jaep :


Hi Eugen,

Also, do you know why you use a multi-active MDS setup?
To be completely candid, I don't really know why this choice was made. I
assume the goal was to provide fault-tolerance and load-balancing.

Was that a requirement for subtree pinning (otherwise multiple active
daemons would balance the hell out of each other) or maybe just an
experiment?
Pardon my ignorance, I'm not quite sure I know what you mean by subtree
pinning. I quickly googled it and saw it was a new feature in Luminous. We
are running Pacific. I would assume this feature was not out yet.

Depending on the workload pinning might have been necessary, maybe you
would impact performance if you removed 3 MDS daemons?
I can definitely try. However, I tried to lower the max number of MDSs.
Unfortunately, one of the MDSs seems to be stuck in the "stopping" state for
more than 12 hours now.

Best,

Emmanuel

On Wed, May 24, 2023 at 4:34 PM Eugen Block  wrote:


Hi,

using standby-replay daemons is something to test, as it can have a
negative impact; it really depends on the actual workload. We stopped
using standby-replay in all clusters we (help) maintain; in one
specific case with many active MDSs and a high load, the failover time
decreased and the failover was "cleaner" for the client application.
Also, do you know why you use a multi-active MDS setup? Was that a
requirement for subtree pinning (otherwise multiple active daemons
would balance the hell out of each other) or maybe just an experiment?
Depending on the workload, pinning might have been necessary; maybe you
would impact performance if you removed 3 MDS daemons. As an
alternative, you can also deploy multiple MDS daemons per host
(count_per_host), which can utilize the server better. I'm not sure
which Pacific version introduced that; I just tried it successfully on
16.2.13. That way you could still maintain the required number of MDS
daemons (if it's still 7) and also have enough standby daemons. But
that of course means that if one MDS host goes down, all of its daemons
will also be unavailable. Still, we used this feature in an older version
(customized Nautilus) quite successfully in a customer cluster.
There are many things to consider here, just wanted to share a couple
of thoughts.

Regards,
Eugen

> Quoting Hector Martin :

> Hi,
>
> On 24/05/2023 22.02, Emmanuel Jaep wrote:
>> Hi Hector,
>>
>> thank you very much for the detailed explanation and link to the
>> documentation.
>>
>> Given our current situation (7 active MDSs and 1 standby MDS):
>> RANK  STATE  MDS ACTIVITY DNSINOS   DIRS   CAPS
>>  0active  icadmin012  Reqs:   82 /s  2345k  2288k  97.2k   307k
>>  1active  icadmin008  Reqs:  194 /s  3789k  3789k  17.1k   641k
>>  2active  icadmin007  Reqs:   94 /s  5823k  5369k   150k   257k
>>  3active  icadmin014  Reqs:  103 /s   813k   796k  47.4k   163k
>>  4active  icadmin013  Reqs:   81 /s  3815k  3798k  12.9k   186k
>>  5active  icadmin011  Reqs:   84 /s   493k   489k  9145176k
>>  6active  icadmin015  Reqs:  374 /s  1741k  1669k  28.1k   246k
>>   POOL TYPE USED  AVAIL
>> cephfs_metadata  metadata  8547G  25.2T
>>   cephfs_data  data 223T  25.2T
>> STANDBY MDS
>>  icadmin006
>>
>> I would probably be better off having:
>>
>>1. having only 3 active MDSs (rank 0 to 2)
>>2. configure 3 standby-replay to mirror the ranks 0 to 2
>>3. have 2 'regular' standby MDSs
>>
>> Of course, this raises the question of storage and performance.
>>
>> Since I would be moving from 7 active MDSs to 3:
>>
>>1. each new active MDS will have to store more than 

[ceph-users] Bucket resharding in multisite without data replication

2023-06-08 Thread Danny Webb
Hi Ceph users,

We have 3 clusters running Pacific 16.2.9 all setup in a multisite 
configuration with no data replication (we wanted to use per bucket policies 
but never got them working to our satisfaction).  All of the resharding 
documentation I've found regarding multisite is centred around multisite with 
data replication and having to reshard from the primary region and 
re-replicating data.  But our data is spread amongst the regions and may not be 
in the primary region.

Testing has shown that resharding from the primary region, in the case of a 
bucket with data only in a remote region, results in the remote bucket losing 
its ability to list contents (seemingly breaking the index in the remote 
region).

Is there a way (besides waiting for reef and dynamic bucket resharding for 
multisite) to reshard buckets in this setup?

Cheers,

Danny

Danny Webb
Principal OpenStack Engineer
danny.w...@thehutgroup.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Updating the Grafana SSL certificate in Quincy

2023-06-08 Thread Eugen Block

Hi,

can you paste the following output?

# ceph config-key list | grep grafana

Do you have a mgr/cephadm/grafana_key set? I would check the contents  
of crt and key and see if they match. A workaround to test the  
certificate and key pair would be to use a per-host config [1]. Maybe  
it's not even a workaround but the desired procedure according to this  
PR [2].


ceph config-key set mgr/cephadm/{hostname}/grafana_key -i $PWD/key.pem
ceph config-key set mgr/cephadm/{hostname}/grafana_crt -i $PWD/certificate.pem
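
A hedged way to check that the stored certificate and key actually match
(standard openssl checks, assuming an RSA key; nothing cephadm-specific):

ceph config-key get mgr/cephadm/grafana_crt | openssl x509 -noout -modulus | md5sum
ceph config-key get mgr/cephadm/grafana_key | openssl rsa -noout -modulus | md5sum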

Hope this helps.
Eugen

[1]  
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#setting-up-grafana

[2] https://github.com/ceph/ceph/pull/47098

Quoting Thorne Lawler :


Hi everyone!

I have a containerised (cephadm built) 17.2.6 cluster where I have  
installed a custom commercial SSL certificate under dashboard.


Before I upgraded from 17.2 to 17.2.6, I successfully installed the
custom SSL cert everywhere, including grafana, but since the upgrade
I am finding that I can't update the certificate for grafana. I have
tried many commands like the following:


ceph config-key set mgr/cephadm/grafana_crt -i  
/etc/pki/tls/certs/_.quick.net.au_2024.pem

ceph orch reconfig grafana
ceph dashboard set-grafana-frontend-api-url https://san.quick.net.au:3000
restorecon /etc/pki/tls/certs/_.quick.net.au_2024.pem
ceph orch reconfig grafana
ceph dashboard set-grafana-frontend-api-url https://san.quick.net.au:3000
ceph dashboard set-grafana-frontend-url https://san.quick.net.au:3000
ceph dashboard grafana
ceph dashboard grafana dashboards update
ceph orch reconfig grafana
ceph config-key set mgr/cephadm/grafana_crt -i  
/etc/pki/tls/certs/_.quick.net.au_2024.pem

ceph orch redeploy grafana
ceph config set mgr mgr/dashboard/GRAFANA_API_URL  
https://san.quick.net.au:3000


...but to no avail. The grafana frames within dashboard continue to  
use the self-signed key.


Have the commands for updating this changed between 17.2.0 and 17.2.6?

Thank you.

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to secure erasing a rbd image without encryption?

2023-06-08 Thread Janne Johansson
On Thu, 8 June 2023 at 09:43, Marc  wrote:
> > I bumped into an very interesting challenge, how to secure erase a rbd
> > image data without any encryption?

As Darren replied while I was typing this, you can't have dangerous
data written all over a cluster which automatically moves data around,
and after-the-fact prove that no bits ever can be found anywhere. That
said, you need to figure out exactly which kind of "opponent" you are
protecting from.

If the opponent is an auditor with lots of fantasies and less
technical knowledge, they will be able to imagine "even if you
overwrite 1-2-3-10 times, there are side channels on the disk tracks
on which $imaginary_enemy can read by moving the head slightly to the
side and find old data". If you pay insane amounts to disk rescue
companies, they will still not succeed with this, but auditors seldom
care about that. Or you overwrite 5 times and they will say "it must
be 10 times", because failing you means more money for them, more
consultant hours and all that.

Now, if you mount the image after stopping the previous RBD image
user, and overwrite it one or ten times over with zeroes or patterns
or /dev/urandom contents, you will end up in a situation where me, you
and probably everyone else on this list could not get consistent data
back EVEN if someone threatened our families and loved ones with pain
and suffering. Especially not if we must make the attempt via the APIs
and qemu/openstack layers and try to get someone else's old data back
this way instead of stealing drives from data centers.
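
A sketch of that overwrite-from-the-client step (pool/image names are
placeholders; as discussed in this thread it only overwrites the blocks
currently mapped to the image, not stale copies elsewhere in the cluster):

rbd map pool/image            # returns e.g. /dev/rbd0
shred -n 1 -z /dev/rbd0       # or: dd if=/dev/zero of=/dev/rbd0 bs=4M oflag=direct status=progress
rbd unmap /dev/rbd0
rbd rm pool/image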

For many situations, this probably is as good as it gets unless you
destroy all OSD drives to molten metal. For everything in between "I
could not save my kids" and "this is what the auditor is finally ok
with", there is just a huge amount of "work" and probably zero actual
gains to be had. Auditors love for you to document the amount of work
you did, all the time and money spent on the effort, but the actual
end effect you get from all of it is just lots of paperwork and almost
no extra digital security.

If we get customers who want to make sure no one can read their data
when they destroy a box, we always suggest they make an
image/OS-install with local drive encryption, then before deleting the
instance, re-set the key to something random that neither they nor we
know, and then remove the instance and have the system delete the
encrypted data whenever it suits.

This means someone needs to type in a password or passphrase at each
boot to unlock, but that is the price one has to pay in order to
protect against someone dumpster-diving old OSD drives, or evil ceph
admins that don't really delete the image but moves it away or
something along those lines.
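
A minimal sketch of that approach inside the guest (device names are examples;
cryptsetup erase wipes the LUKS key slots, after which the data is
unrecoverable unless a header backup exists):

cryptsetup luksFormat /dev/vdb
cryptsetup open /dev/vdb data
mkfs.xfs /dev/mapper/data
# at decommissioning time, before deleting the instance:
cryptsetup erase /dev/vdb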

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to secure erasing a rbd image without encryption?

2023-06-08 Thread darren
Unfortunately this is impossible to achieve.

Unless you can guarantee that the same physical pieces of disk are always
mapped to the same parts of the RBD device, you will leave data lying around
on the array. How easy it is to recover is largely a question of how valuable
the data is to someone.

Ceph moves data around in the backend, which means there could be old blocks
left on OSDs that contain the user data. There is no guarantee as to how long
those pieces of data will be around for.

If your RBD device is on SSD/NVMe then you cannot get to all the blocks that
contain your data unless you use the manufacturer-supplied utilities to erase
the device completely. This problem is overcome with encrypted OSDs, but it
doesn't help your end-user RBD device that needs to be deleted. If the RBD
device had snapshots, then there are even more copies of the data within the
array which you cannot directly access.

With any array that moves data around without the client knowing about it, and
without the client being able to map all the blocks used, there will be old
parts of the image that were presented to the client that still hold the
original data and can be recovered.

Things like a rebalance or an OSD server failure mean that some of the
original data is on blocks that are no longer available.

The only way to guarantee that your data is secure and no one can read it is to
control the actual code that does the encryption and to keep control of the
encryption keys, i.e. you do something on the client before you send it to the
array.

This is not a unique to Ceph problem but an issue for all arrays.




Darren Soothill

Looking for help with your Ceph cluster? Contact us at https://croit.io/
 
croit GmbH, Freseniusstr. 31h, 81247 Munich 
CEO: Martin Verges - VAT-ID: DE310638492 
Com. register: Amtsgericht Munich HRB 231263 
Web: https://croit.io/ | YouTube: https://goo.gl/PGE1Bx

> On 8 Jun 2023, at 06:14, huxia...@horebdata.cn wrote:
> 
> Dear ceph folks,
> 
> I bumped into an very interesting challenge, how to secure erase a rbd image 
> data without any encryption? 
> 
> The motivation is to ensure that there is no information leak on OSDs after 
> deleting a user specified rbd image, without the extra burden of using rbd 
> encryption.
> 
> any ideas, suggestions are highly appreciated,
> 
> 
> Samuel  
> 
> 
> 
> 
> 
> huxia...@horebdata.cn
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issues in installing old dumpling version to add a new monitor

2023-06-08 Thread Janne Johansson
> I have a very old Ceph cluster running the old dumpling version 0.67.1. One
> of the three monitors suffered a hardware failure and I am setting up a new
> server to replace the third monitor running Ubuntu 22.04 LTS (all the other
> monitors are using the old Ubuntu 12.04 LTS).


> - Try to install the same old OS (Ubuntu 12.04 LTS) on the new server (not
> too sure if I still have the ISO) and see if it works?

Perhaps you could DL the 12.04 ISO from some ubuntu archive-site and
run such an instance in a VM on the third machine?

http://old-releases.ubuntu.com/releases/12.04/

and

https://download.ceph.com/archive/debian-dumpling/dists/precise/

I guess.


As soon as you have 3 working monitors, you can figure out if you want
to upgrade the whole cluster or migrate data some other way into a
more recent ceph cluster or whatever.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to secure erasing a rbd image without encryption?

2023-06-08 Thread Marc
> 
> I bumped into an very interesting challenge, how to secure erase a rbd
> image data without any encryption?
> 
> The motivation is to ensure that there is no information leak on OSDs
> after deleting a user specified rbd image, without the extra burden of
> using rbd encryption.
> 
> any ideas, suggestions are highly appreciated,
> 

What about something like this? Maybe even via qemu-agent-command?

find /home -type f -exec shred -n 1 {} \;

Why are you not encrypting the OSDs? I thought that was the most efficient way
of using encryption.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issues in installing old dumpling version to add a new monitor

2023-06-08 Thread Nico Schottelius


Hey,

in case building from source does not work out for you, here is a
strategy we used to recover older systems before:

- Create a .tar from /, pipe it out via ssh to another host
  - basically take everything with the exception of unwanted mountpoints
- Untar it, modify networking, hostname, ceph config, etc.
  - When this is done you have a directory usable as a root filesystem
for your new monitor
- Prepare a new disk using (s)fdisk and friends, filesystem, etc.
- Copy/tar over the new filesystem
- Add the bootloader to the disk (likely grub or lilo in our case)
- Add disk into the new monitor, boot from it

This is by far not the cleanest method and it is easy to forget to
change old configurations from the source host, but in case you cannot
get the binaries from anywhere else, this is a known way for recovery.

We have various scripts that we use for bootstrapping servers, checkout
[0] for instance which contains most steps from above.

I wish you good success for recovering - we are still running some
Nautilus clusters, which are also due for upgrade, so I can feel your
situation.

Best regards,

Nico

[0] 
https://code.ungleich.ch/ungleich-public/ungleich-tools/src/branch/master/alpine-install-on-disk.sh


Cloud List  writes:

> Hi,
>
> I have a very old Ceph cluster running the old dumpling version 0.67.1. One
> of the three monitors suffered a hardware failure and I am setting up a new
> server to replace the third monitor running Ubuntu 22.04 LTS (all the other
> monitors are using the old Ubuntu 12.04 LTS).
>
> I used ceph-deploy to deploy the cluster initially, and I can't use it
> since it's a very old version of ceph-deploy -- having issues with apt-key
> being deprecated and since ceph-deploy is no longer maintained, I can't
> upgrade it. And even if I can, I am not too sure if it works since the
> dumpling version is no longer in Ceph's official repository.
>
> So I tried to install it manually by cloning it from git:
>
> git clone -b dumpling https://github.com/ceph/ceph.git
>
> But when I tried to run "git submodule update --init" or "./autogen.sh" as
> per the README file, I am encountering this error:
>
> 
> root@ceph-mon-04:~/ceph-dumpling/ceph# git submodule update --init
> Submodule 'ceph-object-corpus' (git://ceph.com/git/ceph-object-corpus.git)
> registered for path 'ceph-object-corpus'
> Submodule 'src/libs3' (git://github.com/ceph/libs3.git) registered for path
> 'src/libs3'
> Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
> fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
> fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
> path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
> Failed to clone 'ceph-object-corpus'. Retry scheduled
> Cloning into '/root/ceph-dumpling/ceph/src/libs3'...
> Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
> fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
> fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
> path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
> Failed to clone 'ceph-object-corpus' a second time, aborting
> root@ceph-mon-04:~/ceph-dumpling/ceph# git submodule update --init
> --recursive
> Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
> fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
> fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
> path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
> Failed to clone 'ceph-object-corpus'. Retry scheduled
> Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
> fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
> fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
> path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
> Failed to clone 'ceph-object-corpus' a second time, aborting
> root@ceph-mon-04:~/ceph-dumpling/ceph#
> 
>
> It seems that the repositories required for the submodules are no longer
> there. Anyone can advise me on the correct direction on how can I install
> the dumpling version of Ceph for me to add a new monitor? At the moment
> only 2 monitors out of 3 are up and I am worried that the cluster will be
> down if I lose another monitor.
>
> $ ceph status
>   cluster 1660b11f-1074-4f5d-aa7c-64b479397a2f
>health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>
> What approach I should take:
> - Continue trying the manual installation/compiling route?
> - Continue trying the ceph-deploy route (by fixing the apt-key deprecation
> issue)?
> - Try to install the same old OS (Ubuntu 12.04 LTS) on the new server (not
> too sure if I still have the ISO) and see if it works?
> - Try to upgrade the current cluster and then add the monitor later after
> upgrade? (is it risky to upgrade with HEALTH_WARN status)?
>
> Any advice is greatly appreciated.
>
> Best regards,
> -ip-
>