[ceph-users] Re: rgw: disallowing bucket creation for specific users?

2023-10-06 Thread Matthias Ferdinand
On Fri, Oct 06, 2023 at 08:55:42AM +0200, Ondřej Kukla wrote:
> Hello Matthias,
> 
> In our setup we have a set of users that are only used to read from certain 
> buckets (they have s3:GetObject set in the bucket policy).
> 
> When we create those read users using the Admin Ops API we add the 
> max-buckets=-1 parameter which disables bucket creation.
> 
> https://docs.ceph.com/en/quincy/radosgw/adminops/#create-user
> 
> Isn’t this what you are looking for?

yes, it is, thank you. Seems to be essentially the same as providing
"--max-buckets=-1" on the CLI with "radosgw-admin quota set". But it is interesting
to see this done in a single step when creating the user.

Matthias

> 
> Regards,
> 
> Ondrej
> 
> > On 6. 10. 2023, at 8:44, Matthias Ferdinand  wrote:
> > 
> > On Thu, Oct 05, 2023 at 09:22:29AM +0200, Robert Hish wrote:
> >> Unless I'm misunderstanding your situation, you could also tag your
> >> placement targets. You then tag users with the corresponding tag, enabling
> >> them to create new buckets at that placement target. If a user is not tagged
> >> with the corresponding tag, they cannot create new buckets at that placement
> >> target. The tags do not prevent users from using buckets that are already
> >> owned/created by them.
> >> 
> >> It adds a bit of management overhead, but should give you the control you're
> >> looking for.
> >> 
> >> https://docs.ceph.com/en/latest/radosgw/placement/#user-placement
> > 
> > 
> > thanks. Seems like a bit of work initially configuring placement
> > targets, but perhaps easier to handle when only a few users get to
> > create buckets vs. a large number of users without bucket creation
> > rights.
> > 
> > Matthias
> > 
> > 
> >> 
> >> -Robert
> >> 
> >> On 10/4/23 19:32, Matthias Ferdinand wrote:
>  Tried a negative number ("--max-buckets=-1"), but that had no effect at
>  all (not even an error message).
> >>> 
> >>> must have mistyped the command; trying again with "--max-buckets=-1", it
> >>> shows the wanted effect: the user cannot create any bucket.
> >>> 
> >>> So, an effective and elegant method indeed :-)
> >>> 
> >>> Matthias
> >>> 
> >>> PS: answering my own question below "how can I keep users from deleting
> >>> their own bucket?": bucket policy.
> >>> 
> >>> $ cat denypolicy.txt
> >>> {
> >>>   "Version": "2012-10-17",
> >>>   "Statement": [{
> >>>     "Effect": "Deny",
> >>>     "Principal": {"AWS": ["*"]},
> >>>     "Action": [ "s3:DeleteBucket", "s3:DeleteBucketPolicy", "s3:PutBucketPolicy" ],
> >>>     "Resource": [
> >>>       "arn:aws:s3:::*"
> >>>     ]
> >>>   }]
> >>> }
> >>> 
> >>> Bucket cannot be deleted any more even by the bucket owner, and also the
> >>> bucket policy can't be modified/deleted anymore.
> >>> 
> >>> This closes the loopholes I could come up with so far; there might still
> >>> be some left I am currently not aware of :-)
> >>> 
> >>> 
> >>> On Wed, Oct 04, 2023 at 06:20:09PM +0200, Matthias Ferdinand wrote:
>  On Tue, Oct 03, 2023 at 06:10:17PM +0200, Matthias Ferdinand wrote:
> > On Sun, Oct 01, 2023 at 12:00:58PM +0200, Peter Goron wrote:
> >> Hi Matthias,
> >> 
> >> One possible way to achieve your need is to set a quota on number of
> >> buckets  at user level (see
> >> https://docs.ceph.com/en/reef/radosgw/admin/#quota-management). Quotas 
> >> are
> >> under admin control.
> > 
> > thanks a lot, rather an elegant solution.
>  
>  sadly, bucket quotas are not really as effective and elegant as I first
>  thought, since "--max-buckets=0" means "unlimited", not "no buckets".
>  
>  Setting and enabling per-user bucket-scoped quota:
>  
>  # radosgw-admin quota set --uid=rgw_user_03 --quota-scope=bucket 
>  --max-objects=1 --max-size=1 --max-buckets=1
>  
>  # radosgw-admin quota enable --quota-scope=bucket --uid=rgw_user_03
>  
>  # radosgw-admin user info --uid=rgw_user_03 | jq 
>  '.max_buckets,.bucket_quota'
>  1
>  {
>    "enabled": true,
>    "check_on_raw": false,
>    "max_size": 1024,
>    "max_size_kb": 1,
>    "max_objects": 1
>  }
>  
>  
>  "--max-buckets=0": number of buckets seems to be effectively unlimited
>  
>  "--max-buckets=1": the user can create exactly 1 bucket, further bucket
>  creation attempts get a "TooManyBuckets" HTTP 400
> response.
>  
>  Tried a negative number ("--max-buckets=-1"), but that had no effect at
>  all (not even an error message).
>  
>  I might pre-create a bucket for each user, e.g. a bucket named
>  "dead-end-bucket-for-rgw_user_03", so they are already at their maximum
>  bucket number when they first get their account credentials.
>  But can I also keep the user from simply deleting this pre-created
>  bucket and creating a new one with a name we intended for some other
>  use?
>  

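For illustration, a minimal sketch of how the deny policy quoted above could be 
attached to such a pre-created bucket, assuming the generic aws CLI pointed at 
the RGW endpoint (profile and endpoint below are placeholders):

    # attach the deny policy to the pre-created bucket
    $ aws --profile rgw-admin --endpoint-url https://rgw.example.com \
          s3api put-bucket-policy --bucket dead-end-bucket-for-rgw_user_03 \
          --policy file://denypolicy.txt

    # read it back; further PutBucketPolicy/DeleteBucket calls are now denied
    $ aws --profile rgw-admin --endpoint-url https://rgw.example.com \
          s3api get-bucket-policy --bucket dead-end-bucket-for-rgw_user_03
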
[ceph-users] Re: rgw: disallowing bucket creation for specific users?

2023-10-06 Thread Ondřej Kukla
If you want to do it using the CLI in one command, then try this: radosgw-admin user 
create --uid=test --display-name="Test User" --max-buckets=-1
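
To double-check the result afterwards, a quick sketch (uid "test" as in the 
example above):

    $ radosgw-admin user info --uid=test | jq .max_buckets    # expect -1

Bucket creation attempts by that user should then be rejected, as described 
earlier in the thread.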

Ondrej

> On 6. 10. 2023, at 9:07, Matthias Ferdinand  wrote:
> 
> On Fri, Oct 06, 2023 at 08:55:42AM +0200, Ondřej Kukla wrote:
>> Hello Matthias,
>> 
>> In our setup we have a set of users that are only used to read from certain 
>> buckets (they have s3:GetObject set in the bucket policy).
>> 
>> When we create those read users using the Admin Ops API we add the 
>> max-buckets=-1 parameter which disables bucket creation.
>> 
>> https://docs.ceph.com/en/quincy/radosgw/adminops/#create-user
>> 
>> Isn’t this what you are looking for?
> 
> yes, it is, thank you. Seems to be essentially the same as providing
> "--max-buckets=-1" on the CLI with "radosgw-admin quota set". But it is interesting
> to see this done in a single step when creating the user.
> 
> Matthias
> 
>> 
>> Regards,
>> 
>> Ondrej
>> 
>>> On 6. 10. 2023, at 8:44, Matthias Ferdinand  wrote:
>>> 
>>> On Thu, Oct 05, 2023 at 09:22:29AM +0200, Robert Hish wrote:
Unless I'm misunderstanding your situation, you could also tag your
placement targets. You then tag users with the corresponding tag, enabling
them to create new buckets at that placement target. If a user is not tagged
with the corresponding tag, they cannot create new buckets at that placement
target. The tags do not prevent users from using buckets that are already
owned/created by them.

It adds a bit of management overhead, but should give you the control you're
looking for.
 
 https://docs.ceph.com/en/latest/radosgw/placement/#user-placement
>>> 
>>> 
>>> thanks. Seems like a bit of work initially configuring placement
>>> targets, but perhaps easier to handle when only a few users get to
>>> create buckets vs. a large number of users without bucket creation
>>> rights.
>>> 
>>> Matthias
>>> 
>>> 
 
 -Robert
 
 On 10/4/23 19:32, Matthias Ferdinand wrote:
>> Tried a negative number ("--max-buckets=-1"), but that had no effect at
>> all (not even an error message).
> 
> must have mistyped the command; trying again with "--max-buckets=-1", it
> shows the wanted effect: the user cannot create any bucket.
> 
> So, an effective and elegant method indeed :-)
> 
> Matthias
> 
> PS: answering my own question below "how can I keep users from deleting
> their own bucket?": bucket policy.
> 
> $ cat denypolicy.txt
> {
>   "Version": "2012-10-17",
>   "Statement": [{
>     "Effect": "Deny",
>     "Principal": {"AWS": ["*"]},
>     "Action": [ "s3:DeleteBucket", "s3:DeleteBucketPolicy", "s3:PutBucketPolicy" ],
>     "Resource": [
>       "arn:aws:s3:::*"
>     ]
>   }]
> }
> 
> Bucket cannot be deleted any more even by the bucket owner, and also the
> bucket policy can't be modified/deleted anymore.
> 
> This closes the loopholes I could come up with so far; there might still
> be some left I am currently not aware of :-)
> 
> 
> On Wed, Oct 04, 2023 at 06:20:09PM +0200, Matthias Ferdinand wrote:
>> On Tue, Oct 03, 2023 at 06:10:17PM +0200, Matthias Ferdinand wrote:
>>> On Sun, Oct 01, 2023 at 12:00:58PM +0200, Peter Goron wrote:
 Hi Matthias,
 
 One possible way to achieve your need is to set a quota on number of
 buckets  at user level (see
 https://docs.ceph.com/en/reef/radosgw/admin/#quota-management). Quotas 
 are
 under admin control.
>>> 
>>> thanks a lot, rather an elegant solution.
>> 
>> sadly, bucket quotas are not really as effective and elegant as I first
>> thought, since "--max-buckets=0" means "unlimited", not "no buckets".
>> 
>> Setting and enabling per-user bucket-scoped quota:
>> 
>># radosgw-admin quota set --uid=rgw_user_03 --quota-scope=bucket 
>> --max-objects=1 --max-size=1 --max-buckets=1
>> 
>># radosgw-admin quota enable --quota-scope=bucket --uid=rgw_user_03
>> 
>># radosgw-admin user info --uid=rgw_user_03 | jq 
>> '.max_buckets,.bucket_quota'
>>1
>>{
>>  "enabled": true,
>>  "check_on_raw": false,
>>  "max_size": 1024,
>>  "max_size_kb": 1,
>>  "max_objects": 1
>>}
>> 
>> 
>> "--max-buckets=0": number of buckets seems to be effectively unlimited
>> 
>> "--max-buckets=1": the user can create exactly 1 bucket, further bucket
>> creation attempts get a "TooManyBuckets" HTTP 400
>>   response.
>> 
>> Tried a negative number ("--max-buckets=-1"), but that had no effect at
>> all (not even an error message).
>> 
>> I might pre-create a bucket for each user, e.g. a bucket named
>> "dead-end-bucket-for-rgw_user_03", so they are already at their maximum
>> bucket num

[ceph-users] Received signal: Hangup from killall

2023-10-06 Thread Rok Jaklič
Hi,

yesterday we changed RGW from civetweb to beast and at 04:02 RGW stopped
working; we had to restart it in the morning.

In one rgw log, for the previous day, we can see:
2023-10-06T04:02:01.105+0200 7fb71d45d700 -1 received  signal: Hangup from
killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw
rbd-mirror cephfs-mirror  (PID: 3202663) UID: 0
and in the next day's log we can see:
2023-10-06T04:02:01.133+0200 7fb71d45d700 -1 received  signal: Hangup from
 (PID: 3202664) UID: 0

and after that no requests came. We had to restart rgw.

In ceph.conf we have something like

[client.radosgw.ctplmon2]
host = ctplmon2
log_file = /var/log/ceph/client.radosgw.ctplmon2.log
rgw_dns_name = ctplmon2
rgw_frontends = "beast ssl_endpoint=0.0.0.0:4443 ssl_certificate=..."
rgw_max_put_param_size = 15728640

We assume it has something to do with logrotate.

/etc/logrotate.d/ceph:
/var/log/ceph/*.log {
rotate 90
daily
compress
sharedscripts
postrotate
killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw
rbd-mirror cephfs-mirror || pkill -1 -x
"ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw|rbd-mirror|cephfs-mirror"
|| true
endscript
missingok
notifempty
su root ceph
}
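
A couple of quick checks that might help narrow it down the next time logrotate 
fires (port and log name taken from the config above; assumes a single radosgw 
process on the host):

    # does the beast frontend still answer right after the HUP?
    $ curl -sk -o /dev/null -w '%{http_code}\n' https://localhost:4443/

    # did radosgw reopen its log file after rotation?
    $ ls -l /var/log/ceph/client.radosgw.ctplmon2.log*
    $ lsof -p "$(pgrep -xo radosgw)" | grep client.radosgw.ctplmon2.log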

ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific
(stable)

Any ideas why this happened?

Kind regards,
Rok
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Random issues with Reef

2023-10-06 Thread Eugen Block

Hi,

Either the cephadm version installed on the host should be updated as well, so
it matches the cluster version, or you can use the one that the orchestrator
itself uses, which stores its different versions in the path below (@Mykola
thanks again for pointing that out); the latest one matches the current ceph
version:


/var/lib/ceph/${fsid}/cephadm.*

If you set the executable bit you can use it as usual:

# pacific package version
$ rpm -qf /usr/sbin/cephadm
cephadm-16.2.11.65+g8b7e6fc0182-lp154.3872.1.noarch

$ chmod +x  
/var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/cephadm.7dcbd4aab60af3e83970c60d4a8a2cc6ea7b997ecc2f4de0a47eeacbb88dde46


$ python3  
/var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/cephadm.7dcbd4aab60af3e83970c60d4a8a2cc6ea7b997ecc2f4de0a47eeacbb88dde46  
ls

[
{
"style": "cephadm:v1",
...
}
]



Also the command:

ceph orch upgrade start -ceph_version v18.2.0


That looks like a bug to me; it's reproducible:

$ ceph orch upgrade check --ceph-version 18.2.0
Error EINVAL: host ceph01 `cephadm pull` failed: cephadm exited with  
an error code: 1, stderr: Pulling container image  
quay.io/ceph/ceph:v18:v18.2.0...
Non-zero exit code 125 from /usr/bin/podman pull  
quay.io/ceph/ceph:v18:v18.2.0 --authfile=/etc/ceph/podman-auth.json

/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull  
quay.io/ceph/ceph:v18:v18.2.0 --authfile=/etc/ceph/podman-auth.json


It works correctly with 17.2.6:

# ceph orch upgrade check --ceph-version 18.2.0
{
"needs_update": {
"crash.soc9-ceph": {
"current_id":  
"2d45278716053f92517e447bc1a7b64945cc4ecbaff4fe57aa0f21632a0b9930",
"current_name":  
"quay.io/ceph/ceph@sha256:1e442b0018e6dc7445c3afa7c307bc61a06189ebd90580a1bb8b3d0866c0d8ae",

"current_version": "17.2.6"
...

I haven't checked for existing tracker issues yet. I'd recommend checking and
creating a bug report:


https://tracker.ceph.com/
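
Until that is fixed, a workaround sketch is to skip the version-to-image 
translation and pass the image explicitly (matching what Martin already found 
to work):

    $ ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0
    $ ceph orch upgrade status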

Regards,
Eugen

Zitat von Martin Conway :


Hi

I have been using Ceph for many years now, and recently upgraded to Reef.

Seems I made the jump too quickly, as I have been hitting a few  
issues. I can't find any mention of them in the bug reports. I  
thought I would share them here in case it is something to do with  
my setup.


On V18.2.0

cephadm version

Fails with the following output:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
  File "/usr/sbin/cephadm/__main__.py", line 10096, in 
  File "/usr/sbin/cephadm/__main__.py", line 10084, in main
  File "/usr/sbin/cephadm/__main__.py", line 2240, in _infer_image
  File "/usr/sbin/cephadm/__main__.py", line 2338, in infer_local_ceph_image
  File "/usr/sbin/cephadm/__main__.py", line 2301, in get_container_info
  File "/usr/sbin/cephadm/__main__.py", line 2301, in 
  File "/usr/sbin/cephadm/__main__.py", line 222, in __getattr__
AttributeError: 'CephadmContext' object has no attribute 'fsid'

I don't know if it is related, but

cephadm adopt --style legacy --name osd.X

Tries to use a v15 image, which then fails to start after being
imported. The OSD in question has an SSD device for block.db, if
that is relevant.


Using the latest head version of cephadm from github let me work  
around this issue, but the adopted OSDs were running  
18.0.0-6603-g6c4ed58a and needed to be upgraded to 18.2.0.
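
Possibly pinning the image that cephadm uses via its global --image flag would 
also have avoided the v15 default; untested sketch only:

    $ cephadm --image quay.io/ceph/ceph:v18.2.0 adopt --style legacy --name osd.X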


Also the command:

ceph orch upgrade start -ceph_version v18.2.0

Does not work, it fails to find the right image. From memory I think  
it tried to pull quay.io/ceph/ceph:v18:v18.2.0


ceph orch upgrade start quay.io/ceph/ceph:v18.2.0

Does work as expected.

Let me know if there is any other information that would be helpful,  
but I have since worked around these issues and have my ceph back in  
a happy state.


Regards,
Martin Conway
IT and Digital Media Manager
Research School of Physics
Australian National University
Canberra ACT 2601

+61 2 6125 1599
https://physics.anu.edu.au

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: is the rbd mirror journal replayed on primary after a crash?

2023-10-06 Thread Scheurer François
Dear all,


replying to my own question ;-)


this document explains the rbd mirroring / journaling process in more detail:
https://pad.ceph.com/p/I-rbd_mirroring


especially this part:

  *   on startup, replay journal from flush position
  *   store journal metadata in journal header, to be more general
      *   flush position
      *   per-zone flush positions
  *   pointers to positions in the journal (object, offset)
      - one for each reader so we can tell how far we can trim
      - store trim pos in primary and secondary zones, so despite loss of the
        primary dc we can tell who's most up to date

=> so apparently there is one pointer to a position in the journal for each
secondary image (journal reader) and, importantly, also one for the primary
image (normally the journal writer, but also a reader during open / crash
recovery). This apparently confirms that clients on the primary are not only
writing to the journal (to support replication on the secondary) but also
actively reading from it after a crash, to replay the latest IOs that were
missing on the primary image.


also useful info: https://tracker.ceph.com/projects/ceph/wiki/RBD_-_Mirroring

  *   on open, replay recent journal operations
  *   periodically update a journal position pointer in the rbd image header
      (to limit replays on open)

and this: https://docs.ceph.com/en/pacific/rbd/rbd-mirroring/#force-image-resync
If a split-brain event is detected by the rbd-mirror daemon, it will not 
attempt to mirror the affected image until corrected.
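
For anyone who wants to look at those journal positions on a live image, a 
small sketch (pool/image names are placeholders; the image needs the journaling 
feature enabled):

    # journal layout plus the registered clients and their commit positions
    $ rbd journal info --pool rbd --image test-img
    $ rbd journal status --pool rbd --image test-img

    # replication state as seen from the rbd-mirror side
    $ rbd mirror image status rbd/test-img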

cheers
Francois Scheurer




--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch

From: Scheurer François 
Sent: Tuesday, October 3, 2023 4:38:07 PM
To: dilla...@redhat.com; ceph-users@ceph.io
Subject: [ceph-users] is the rbd mirror journal replayed on primary after a 
crash?


Hello



Short question regarding journal-based rbd mirroring.


▪IO path with journaling w/o cache:

a. Create an event to describe the update
b. Asynchronously append event to journal object
c. Asynchronously update image once event is safe
d. Complete IO to client once update is safe


[cf. 
https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring_0.pdf]


If a client crashes between b. and c., is there a mechanism to replay the IO 
from the journal on the primary image?

If not, then the primary and secondary images would get out-of-sync (because of 
the extra write(s) on secondary) and subsequent writes to the primary would 
corrupt the secondary. Is that correct?



Cheers

Francois Scheurer




--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch


smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cannot repair a handful of damaged pg's

2023-10-06 Thread Simon Oosthoek

Hi

we're still in HEALTH_ERR state with our cluster, this is the top of the 
output of `ceph health detail`


HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors; 
Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent; 
Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg 
degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657 pgs not 
scrubbed in time

[WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
pg 26.323 has 1 unfound objects
[ERR] OSD_SCRUB_ERRORS: 248 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs 
inconsistent
pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound
pg 26.337 is active+clean+inconsistent, acting 
[139,137,48,126,165,89,237,199,189]
pg 26.3e2 is active+clean+inconsistent, acting 
[12,27,24,234,195,173,98,32,35]
[WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects 
degraded (0.000%), 1 pg degraded, 1 pg undersized
pg 13.3a5 is stuck undersized for 4m, current state 
active+undersized+remapped+backfilling, last acting 
[2,45,32,62,2147483647,55,116,25,225,202,240]
pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound



For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc., 
however it fails to get resolved.


The osd.116 is already marked out and is beginning to get empty. I've 
tried restarting the osd processes of the first osd listed for each PG, 
but that doesn't get it resolved either.


I guess we should have enough redundancy to get the correct data back, 
but how can I tell ceph to fix it in order to get back to a healthy state?


Cheers

/Simon

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Simon Oosthoek

On 06/10/2023 16:09, Simon Oosthoek wrote:

Hi

we're still in HEALTH_ERR state with our cluster, this is the top of the 
output of `ceph health detail`


HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors; 
Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent; 
Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg 
degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657 pgs not 
scrubbed in time

[WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
     pg 26.323 has 1 unfound objects
[ERR] OSD_SCRUB_ERRORS: 248 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs 
inconsistent
     pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound
     pg 26.337 is active+clean+inconsistent, acting 
[139,137,48,126,165,89,237,199,189]
     pg 26.3e2 is active+clean+inconsistent, acting 
[12,27,24,234,195,173,98,32,35]
[WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects 
degraded (0.000%), 1 pg degraded, 1 pg undersized
     pg 13.3a5 is stuck undersized for 4m, current state 
active+undersized+remapped+backfilling, last acting 
[2,45,32,62,2147483647,55,116,25,225,202,240]
     pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound



For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc., 
however it fails to get resolved.


The osd.116 is already marked out and is beginning to get empty. I've 
tried restarting the osd processes of the first osd listed for each PG, 
but that doesn't get it resolved either.


I guess we should have enough redundancy to get the correct data back, 
but how can I tell ceph to fix it in order to get back to a healthy state?


I guess this could be related to the number of scrubs going on, I read 
somewhere that this may interfere with the repair request. I would 
expect the repair would have priority over scrubs...


BTW, we're running pacific for now, we want to update when the cluster 
is healthy again.


Cheers

/Simon

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Wesley Dillingham
A repair is just a type of scrub and it is also limited by osd_max_scrubs,
which in Pacific is 1.

If another scrub is occurring on any OSD in the PG, it won't start.

Do "ceph osd set noscrub" and "ceph osd set nodeep-scrub", then wait for all
scrubs to stop (a few seconds, probably).

Then issue the pg repair command again. It may start.

You also have pgs in the backfilling state. Note that by default, OSDs in
backfill or backfill_wait also won't perform scrubs.

You can modify this behavior with `ceph config set osd
osd_scrub_during_recovery true`.

I would suggest only setting that after the noscrub flags are set and the
only scrub you want to get processed is your manual repair.

Then rm the scrub_during_recovery config item before unsetting the noscrub
flags.
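
Putting it together, roughly (a sketch of the sequence above, using pg 26.323 
from this thread as the example):

    $ ceph osd set noscrub
    $ ceph osd set nodeep-scrub
    # wait for the running scrubs to finish, then allow scrubbing during backfill
    $ ceph config set osd osd_scrub_during_recovery true
    $ ceph pg repair 26.323
    # once the repair has completed, revert everything
    $ ceph config rm osd osd_scrub_during_recovery
    $ ceph osd unset noscrub
    $ ceph osd unset nodeep-scrub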



Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, Oct 6, 2023 at 11:02 AM Simon Oosthoek 
wrote:

> On 06/10/2023 16:09, Simon Oosthoek wrote:
> > Hi
> >
> > we're still in HEALTH_ERR state with our cluster, this is the top of the
> > output of `ceph health detail`
> >
> > HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors;
> > Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent;
> > Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg
> > degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657 pgs not
> > scrubbed in time
> > [WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
> >  pg 26.323 has 1 unfound objects
> > [ERR] OSD_SCRUB_ERRORS: 248 scrub errors
> > [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs
> > inconsistent
> >  pg 26.323 is active+recovery_unfound+degraded+remapped, acting
> > [92,109,116,70,158,128,243,189,256], 1 unfound
> >  pg 26.337 is active+clean+inconsistent, acting
> > [139,137,48,126,165,89,237,199,189]
> >  pg 26.3e2 is active+clean+inconsistent, acting
> > [12,27,24,234,195,173,98,32,35]
> > [WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects
> > degraded (0.000%), 1 pg degraded, 1 pg undersized
> >  pg 13.3a5 is stuck undersized for 4m, current state
> > active+undersized+remapped+backfilling, last acting
> > [2,45,32,62,2147483647,55,116,25,225,202,240]
> >  pg 26.323 is active+recovery_unfound+degraded+remapped, acting
> > [92,109,116,70,158,128,243,189,256], 1 unfound
> >
> >
> > For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc.,
> > however it fails to get resolved.
> >
> > The osd.116 is already marked out and is beginning to get empty. I've
> > tried restarting the osd processes of the first osd listed for each PG,
> > but that doesn't get it resolved either.
> >
> > I guess we should have enough redundancy to get the correct data back,
> > but how can I tell ceph to fix it in order to get back to a healthy
> state?
>
> I guess this could be related to the number of scrubs going on, I read
> somewhere that this may interfere with the repair request. I would
> expect the repair would have priority over scrubs...
>
> BTW, we're running pacific for now, we want to update when the cluster
> is healthy again.
>
> Cheers
>
> /Simon
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Kai Stian Olstad

On 06.10.2023 17:48, Wesley Dillingham wrote:
A repair is just a type of scrub and it is also limited by osd_max_scrubs,
which in Pacific is 1.

If another scrub is occurring on any OSD in the PG, it won't start.

Do "ceph osd set noscrub" and "ceph osd set nodeep-scrub", then wait for all
scrubs to stop (a few seconds, probably).

Then issue the pg repair command again. It may start.

You also have pgs in the backfilling state. Note that by default, OSDs in
backfill or backfill_wait also won't perform scrubs.

You can modify this behavior with `ceph config set osd
osd_scrub_during_recovery true`.

I would suggest only setting that after the noscrub flags are set and the
only scrub you want to get processed is your manual repair.

Then rm the scrub_during_recovery config item before unsetting the noscrub
flags.


Hi Simon

Just to add to Wes's answer, CERN has made a nice script that does the steps
Wes explained above, which you might want to take a look at:

https://github.com/cernceph/ceph-scripts/blob/master/tools/scrubbing/autorepair.sh

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Simon Oosthoek

Hi Wesley,

On 06/10/2023 17:48, Wesley Dillingham wrote:
> A repair is just a type of scrub and it is also limited by osd_max_scrubs,
> which in Pacific is 1.

We've increased that to 4 (and temporarily to 8) since we have so many 
OSDs and are running behind on scrubbing.

> If another scrub is occurring on any OSD in the PG, it won't start.

that explains a lot.

> Do "ceph osd set noscrub" and "ceph osd set nodeep-scrub", then wait for all
> scrubs to stop (a few seconds, probably).
>
> Then issue the pg repair command again. It may start.

The script Kai linked seems like a good idea to fix this when needed.

> You also have pgs in the backfilling state. Note that by default, OSDs in
> backfill or backfill_wait also won't perform scrubs.
>
> You can modify this behavior with `ceph config set osd
> osd_scrub_during_recovery true`.

We've set this already.

> I would suggest only setting that after the noscrub flags are set and the
> only scrub you want to get processed is your manual repair.
>
> Then rm the scrub_during_recovery config item before unsetting the
> noscrub flags.


Thanks for the suggestion!

Cheers

/Simon





Respectfully,

*Wes Dillingham*
w...@wesdillingham.com 
LinkedIn 


On Fri, Oct 6, 2023 at 11:02 AM Simon Oosthoek > wrote:


On 06/10/2023 16:09, Simon Oosthoek wrote:
 > Hi
 >
 > we're still in HEALTH_ERR state with our cluster, this is the top
of the
 > output of `ceph health detail`
 >
 > HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors;
 > Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent;
 > Degraded data redundancy: 6/7118781559 objects degraded (0.000%),
1 pg
 > degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657
pgs not
 > scrubbed in time
 > [WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
 >      pg 26.323 has 1 unfound objects
 > [ERR] OSD_SCRUB_ERRORS: 248 scrub errors
 > [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs
 > inconsistent
 >      pg 26.323 is active+recovery_unfound+degraded+remapped, acting
 > [92,109,116,70,158,128,243,189,256], 1 unfound
 >      pg 26.337 is active+clean+inconsistent, acting
 > [139,137,48,126,165,89,237,199,189]
 >      pg 26.3e2 is active+clean+inconsistent, acting
 > [12,27,24,234,195,173,98,32,35]
 > [WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects
 > degraded (0.000%), 1 pg degraded, 1 pg undersized
 >      pg 13.3a5 is stuck undersized for 4m, current state
 > active+undersized+remapped+backfilling, last acting
 > [2,45,32,62,2147483647,55,116,25,225,202,240]
 >      pg 26.323 is active+recovery_unfound+degraded+remapped, acting
 > [92,109,116,70,158,128,243,189,256], 1 unfound
 >
 >
 > For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc.,
 > however it fails to get resolved.
 >
 > The osd.116 is already marked out and is beginning to get empty.
I've
 > tried restarting the osd processes of the first osd listed for
each PG,
 > but that doesn't get it resolved either.
 >
 > I guess we should have enough redundancy to get the correct data
back,
 > but how can I tell ceph to fix it in order to get back to a
healthy state?

I guess this could be related to the number of scrubs going on, I read
somewhere that this may interfere with the repair request. I would
expect the repair would have priority over scrubs...

BTW, we're running pacific for now, we want to update when the cluster
is healthy again.

Cheers

/Simon

___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Hardware recommendations for a Ceph cluster

2023-10-06 Thread Gustavo Fahnle
Hi,

Currently, I have an OpenStack installation with a Ceph cluster consisting of 4 
servers for OSD, each with 16TB SATA HDDs. My intention is to add a second, 
independent Ceph cluster to provide faster disks for OpenStack VMs.
The idea for this second cluster is to exclusively provide RBD services to 
OpenStack. I plan to start with a cluster composed of 3 mon/mgr nodes, similar 
to what we currently have (3 virtualized servers with VMware), each with 4 
cores, 8GB of memory, an 80GB disk, and a 10Gb network.
In the current cluster, these nodes have low resource consumption, less than 
10% CPU usage, 40% memory usage, and less than 100Mb/s of network usage.

For the OSDs, I'm thinking of starting with 3 or 4 servers, specifically 
Supermicro AS-1114S-WN10RT, each with:

1 AMD EPYC 7713P Gen 3 processor (64 Core, 128 Threads, 2.0GHz)
256GB of RAM
2 x NVME 1TB for the operating system
10 x NVME Kingston DC1500M U.2 7.68TB for the OSDs
Two Intel NIC E810-XXVDA2 25GbE Dual Port (2 x SFP28) PCIe 4.0 x8 cards
Connected to 2 MikroTik CRS518-16XS-2XQ-RM switches at 100GbE per server
Connection to OpenStack would be via 4 x 10GB to our core switch.

I would like to hear opinions about this configuration, recommendations, 
criticisms, etc.

If any of you have references or experience with any of the components in this 
initial configuration, they would be very welcome.

Thank you very much in advance.

Gustavo Fahnle

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Hardware recommendations for a Ceph cluster

2023-10-06 Thread Anthony D'Atri


> Currently, I have an OpenStack installation with a Ceph cluster consisting of 
> 4 servers for OSD, each with 16TB SATA HDDs. My intention is to add a second, 
> independent Ceph cluster to provide faster disks for OpenStack VMs.

Indeed, I know from experience that LFF spinners don't cut it for boot drives.  
Even with strawberries.

> The idea for this second cluster is to exclusively provide RBD services to 
> OpenStack

Do you strictly need a second cluster?  Or could you just constrain your pools 
on the existing cluster based on device class?
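
A rough sketch of the device class route (rule and pool names are placeholders):

    # replicated CRUSH rule restricted to OSDs of device class nvme
    $ ceph osd crush rule create-replicated fast-nvme default host nvme
    # point the fast RBD pool at that rule
    $ ceph osd pool set volumes-nvme crush_rule fast-nvme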

> For the OSDs, I'm thinking of starting with 3 or 4 servers, specifically 
> Supermicro AS-1114S-WN10RT,

SMCI offers chassis that are NVMe-only I think.  The above I think comes with 
an HBA you don't need or want.

> each with:
> 
> 1 AMD EPYC 7713P Gen 3 processor (64 Core, 128 Threads, 2.0GHz)
> 256GB of RAM
> 2 x NVME 1TB for the operating system
> 10 x NVME Kingston DC1500M U.2 7.68TB for the OSDs

The Kingstons are cost-effective, but last I looked up the specs they were 
kinda meh.  Beats spinners though.
This is more CPU and more RAM than you need for 10xNVMe unless you're also 
going to run RGW or other compute on them.

> Two Intel NIC E810-XXVDA2 25GbE Dual Port (2 x SFP28) PCIe 4.0 x8 cards

Why two?

> Connected to 2 MikroTik CRS518-16XS-2XQ-RM switches at 100GbE per server
> Connection to OpenStack would be via 4 x 10GB to our core switch.

Might 25GE be an alternative?


> 
> I would like to hear opinions about this configuration, recommendations, 
> criticisms, etc.
> 
> If any of you have references or experience with any of the components in 
> this initial configuration, they would be very welcome.
> 
> Thank you very much in advance.
> 
> Gustavo Fahnle
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Introduce: Storage stability testing and DATA consistency verifying tools and system

2023-10-06 Thread 张友加
Dear All,




I hope you are all well. I would like to introduce new tools I have developed, 
named "LBA tools", which include hd_write_verify & hd_write_verify_dump.

github: https://github.com/zhangyoujia/hd_write_verify

pdf:  https://github.com/zhangyoujia/hd_write_verify/DISK&MEMORY stability 
testing and DATA consistency verifying tools and system.pdf

ppt:  https://github.com/zhangyoujia/hd_write_verify/存储稳定性测试与数据一致性校验工具和系统.pptx

bin:  https://github.com/zhangyoujia/hd_write_verify/bin

iso:  https://github.com/zhangyoujia/hd_write_verify/iso

Data is a vital asset for many businesses, making storage stability and data 
consistency the most fundamental requirements in storage technology scenarios.

The purpose of storage stability testing is to ensure that storage devices or 
systems can operate normally and remain stable over time, while also handling 
various abnormal situations such as sudden power outages and network failures. 
This testing typically includes stress testing, load testing, fault tolerance 
testing, and other evaluations to assess the performance and reliability of the 
storage system.

Data consistency checking is designed to ensure that the data stored in the 
system is accurate and consistent. This means that whenever data changes occur, 
all replicas should be updated simultaneously to avoid data inconsistency. Data 
consistency checking typically involves aspects such as data integrity, 
accuracy, consistency, and reliability.

LBA tools are very useful for testing storage stability and verifying data 
consistency; they are much better than FIO & vdbench's verify functions.

I believe that LBA tools will have a positive impact on the community and help 
users handle storage data more effectively. Your feedback and suggestions are 
greatly appreciated, and I hope you can try using LBA tools and share your 
experiences and recommendations.

Best regards


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io