[ceph-users] Re: Recovering from total mon loss and backing up lockbox secrets

2024-08-06 Thread Christian Rohmann

On 06.08.24 1:19 PM, Boris wrote:

I am in the process of creating disaster recovery documentation and I have
two topics where I am not sure how to do it or even if it is possible.

Is it possible to recover from a 100% mon data loss? Like all mons fail and
the actual mon data is not recoverable.

In my head I would think that I can just create new mons with the same
cluster ID and then start everything. The OSDs still have their PGs and
data and after some period of time everything will be ok again.

But then I thought that we use dmcrypt in Ceph and I would need to somehow
back up all the keys to some offsite location.

So here are my questions:
- How do I back up the lockbox secrets?
- Do I need to back up the whole mon data, and if so, how can I do it?


You are indeed correct - the keys need to be backed up outside of Ceph!

See:

 * Issue: https://tracker.ceph.com/issues/63801
 * PR by poelzl to add automatic backups: 
https://github.com/ceph/ceph/pull/56772
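
Until something like that is merged, a crude way to pull the dmcrypt secrets
out of the mon config-key store is a loop along these lines (untested sketch;
it assumes ceph-volume-created OSDs, whose LUKS keys live under
dm-crypt/osd/<osd-fsid>/luks, and that jq is available):

# dump all dm-crypt related keys into local files, one per key, for off-site backup
mkdir -p lockbox-backup
for key in $(ceph config-key ls | jq -r '.[] | select(startswith("dm-crypt/"))'); do
    # replace "/" so the key name can be used as a file name
    ceph config-key get "$key" > "lockbox-backup/${key//\//_}"
done

Those files then of course need to be stored somewhere safe (and encrypted),
since they unlock the OSD data.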




Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Release 18.2.4

2024-07-24 Thread Christian Rohmann

Hello Alfredo, all,

On 24.07.24 1:05 PM, Alfredo Rezinovsky wrote:

Ceph dashboard offers me to upgrade to v18.2.4.

I can't find any information on 18.2.4.
There is no 18.2.4 in https://docs.ceph.com/en/latest/releases/
nor a tag in https://github.com/ceph/ceph

I don't understand why there's an 18.2.4 image and what's in it.


I brought this to people's attention multiple times already (for the 16.x and 17.x releases);
see my ML thread: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QZOJKAL7UHVNSRHPZPTHEPOCZJP35TMK/#WJW23WQATBE6ZJULL3RKZ4YVWUCPMAP2


Mike Perez was mentioned by k0ste as someone who might know more about this
and could make changes to "the flow".



Regards

Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Small issue with perms

2024-07-18 Thread Christian Rohmann

On 18.07.24 9:56 AM, Albert Shih wrote:

   Error scraping /var/lib/ceph/crash: [Errno 13] Permission denied: 
'/var/lib/ceph/crash'

There is / was a bug with the permissions for ceph-crash, see

* 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VACLBNVXTYNSXJSNXJSRAQNZHCHABDF4/

* https://tracker.ceph.com/issues/64548
* Reef backport (NOT merged yet): https://github.com/ceph/ceph/pull/58458


Maybe your issue is somewhat related?


Regards


Christian





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multisite: metadata behind on shards

2024-05-13 Thread Christian Rohmann

On 13.05.24 5:26 AM, Szabo, Istvan (Agoda) wrote:

I wonder what the mechanism behind the sync is, because I need to
restart all the gateways on the remote sites every 2 days to keep them in
sync. (Octopus 15.2.7)

We have also seen lots of those issues with stuck RGWs on earlier
versions, but there have been many fixes in this area since, e.g. 
https://tracker.ceph.com/issues/39657



Is upgrading Ceph to a more recent version an option for you?
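
Independent of an upgrade: before resorting to gateway restarts it might be
worth checking which part of the sync is actually stuck, e.g. with something
like the following (rough sketch, the source zone name is a placeholder):

radosgw-admin sync status
radosgw-admin metadata sync status
radosgw-admin data sync status --source-zone=sitea
radosgw-admin sync error list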



Regards


Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-18 Thread Christian Rohmann

On 18.04.24 8:13 PM, Laura Flores wrote:
Thanks for bringing this to our attention. The leads have decided that 
since this PR hasn't been merged to main yet and isn't approved, it 
will not go in v18.2.3, but it will be prioritized for v18.2.4.
I've already added the PR to the v18.2.4 milestone so it's sure to be 
picked up.


Thanks a bunch. If you miss the train, you miss the train - fair enough.
Nice to know there is another one going soon and that the bug is going to be
on it!



Regards

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-18 Thread Christian Rohmann

Hey Laura,


On 17.04.24 4:58 PM, Laura Flores wrote:

There are two PRs that were added later to the 18.2.3 milestone concerning
debian packaging:
https://github.com/ceph/ceph/pulls?q=is%3Apr+is%3Aopen+milestone%3Av18.2.3
The user is asking if these can be included.


I know everybody always wants their most anticipated PR in the next
point release,
but please let me kindly point you to the issue of ceph-crash not
working due to a small glitch in its directory permissions:


 * ML post: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VACLBNVXTYNSXJSNXJSRAQNZHCHABDF4/

 * Bug report: https://tracker.ceph.com/issues/64548
 * Non-backport PR fixing this: https://github.com/ceph/ceph/pull/55917


This is potentially a one-liner fix that allows ceph-crash
reports to be sent again.
When I noticed this, I had 47 unreported crashes queued up in one of my
clusters.




Regards


Christian




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw s3 bucket policies limitations (on users)

2024-04-02 Thread Christian Rohmann

Hey Garcetto,

On 29.03.24 4:13 PM, garcetto wrote:

   i am trying to set bucket policies to allow different users to access the
same bucket with different permissions, BUT it seems that is not yet
supported, am i wrong?

https://docs.ceph.com/en/reef/radosgw/bucketpolicy/#limitations

"We do not yet support setting policies on users, groups, or roles."


Maybe check out my previous, somewhat similar question: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/S2TV7GVFJTWPYA6NVRXDL2JXYUIQGMIN/

And PR https://github.com/ceph/ceph/pull/44434 could also be of interest.

I would love for RGW to support more detailed bucket policies, 
especially with external / Keystone authentication.
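
That said, for the specific case of granting different users different
permissions on the *same* bucket, a plain bucket policy that names the users
as principals should already work - the quoted limitation is about attaching
policies *to* users, groups or roles. A rough, untested sketch (bucket and
user names made up):

# grant a second user read-only access to an existing bucket via a bucket policy
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/readonly-user"]},
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
  }]
}
EOF
s3cmd setpolicy policy.json s3://mybucket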




Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Journal size recommendations

2024-03-08 Thread Christian Rohmann

On 01.03.22 19:57, Eugen Block wrote:
can you be more specific what exactly you are looking for? Are you 
talking about the rocksDB size? And what is the unit for 5012? It’s 
really not clear to me what you’re asking. And since the 
recommendations vary between different use cases you might want to 
share more details about your use case.



FWIW, I suppose OP was asking about this setting: 
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_journal_size
And 
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#journal-settings 
states:


"This section applies only to the older Filestore OSD back end. Since 
Luminous BlueStore has been default and preferred."



So the osd_journal_size setting is completely obsolete with BlueStore.
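
If in doubt whether there are any Filestore OSDs left that would still care
about this setting, a quick check (sketch) is:

# counts OSDs by object store backend; ideally this only reports "bluestore"
ceph osd count-metadata osd_objectstore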



Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw dynamic bucket sharding will hang io

2024-03-08 Thread Christian Rohmann

On 08.03.24 14:25, Christian Rohmann wrote:
What do you mean by blocking IO? No bucket actions (read / write) or 
high IO utilization?


According to https://docs.ceph.com/en/latest/radosgw/dynamicresharding/

"Writes to the target bucket are blocked (but reads are not) briefly 
during resharding process."


Are you observing this taking longer than "briefly", then?
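
To see what is (still) being resharded and how full the shards are, something
along these lines should help (the bucket name is a placeholder):

# buckets currently queued for / undergoing dynamic resharding
radosgw-admin reshard list

# progress of a specific resharding operation
radosgw-admin reshard status --bucket=mybucket

# objects-per-shard fill level, to judge whether resharding is warranted at all
radosgw-admin bucket limit check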



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw dynamic bucket sharding will hang io

2024-03-08 Thread Christian Rohmann

On 08.03.24 07:22, nuabo tan wrote:

When resharding occurs, IO will be blocked; why has this serious problem not been 
solved?


Would you care to elaborate on this a bit more?

Which Ceph release are you using?
Are you using multisite replication or are you talking about a single 
RGW site?


What do you mean by blocking IO? No bucket actions (read / write) or 
high IO utilization?




Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debian-reef_OLD?

2024-03-05 Thread Christian Rohmann

On 04.03.24 22:24, Daniel Brown wrote:

debian-reef/

Now appears to be:

debian-reef_OLD/


Could this have been some sort of "release script" just messing up the 
renaming / symlinking to the most recent stable release?




Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-crash NOT reporting crashes due to wrong permissions on /var/lib/ceph/crash/posted (Debian / Ubuntu packages)

2024-02-29 Thread Christian Rohmann




On 23.02.24 16:18, Christian Rohmann wrote:
I just noticed issues with ceph-crash using the Debian / Ubuntu 
packages (package: ceph-base):


While the /var/lib/ceph/crash/posted folder is created by the package 
install,

it's not properly chowned to ceph:ceph by the postinst script.

[...]

You might want to check if you might be affected as well.
Failing to post crashes to the local cluster results in them not being 
reported back via telemetry.


Sorry to bluntly bump this again, but did nobody else notice this on 
your clusters?
Call me selfish, but the more clusters that return crash reports, the more 
stable my Ceph likely becomes ;-)



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-crash NOT reporting crashes due to wrong permissions on /var/lib/ceph/crash/posted (Debian / Ubuntu packages)

2024-02-23 Thread Christian Rohmann

Hey ceph-users,

I just noticed issues with ceph-crash using the Debian / Ubuntu packages 
(package: ceph-base):


While the /var/lib/ceph/crash/posted folder is created by the package 
install,

it's not properly chowned to ceph:ceph by the postinst script.
This might also affect RPM based installs somehow, but I did not look 
into that.


I opened a bug report with all the details and two ideas to fix this: 
https://tracker.ceph.com/issues/64548



The wrong ownership causes ceph-crash to NOT work at all. I myself 
missed quite a few crash reports. All of them were just sitting around 
on the machines, but were reported right after I did


 chown ceph:ceph /var/lib/ceph/crash/posted
 systemctl restart ceph-crash.service

You might want to check whether you are affected as well.
Failing to post crashes to the local cluster results in them not being 
reported back via telemetry.
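
A quick way to check whether a host is affected (sketch, assuming a
package-based install where ceph-crash is supposed to run as ceph:ceph):

# ownership of the crash directories - should be ceph:ceph, not root:root
stat -c '%U:%G %n' /var/lib/ceph/crash /var/lib/ceph/crash/posted

# crash dumps still sitting on the host, i.e. not yet moved to posted/
ls /var/lib/ceph/crash/

# is ceph-crash logging permission errors?
journalctl -u ceph-crash --since "7 days ago" | grep -i -e error -e denied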



Regards

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Throughput metrics missing iwhen updating Ceph Quincy to Reef

2024-02-05 Thread Christian Rohmann

On 01.02.24 10:10, Christian Rohmann wrote:

[...]
I am wondering if ceph-exporter [2] is also built and packaged via 
the ceph packages [3] for installations that use them?




[2] https://github.com/ceph/ceph/tree/main/src/exporter
[3] https://docs.ceph.com/en/latest/install/get-packages/


I could not find ceph-exporter in any of the packages or as a single 
binary, so I opened an issue:


https://tracker.ceph.com/issues/64317



Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how can install latest dev release?

2024-02-01 Thread Christian Rohmann

On 31.01.24 11:33, garcetto wrote:
thank you, but that seems related to quincy, there is nothing on the latest 
versions in the doc... maybe the doc is not updated?



I don't understand what you are missing. I just used a documentation 
link pointing to the Quincy version of this page, yes.
The "latest" documentation is at 
https://docs.ceph.com/en/latest/install/get-packages/#ceph-development-packages.
But it seems nothing has changed. There are dev packages available at 
the URLs mentioned there.



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Throughput metrics missing iwhen updating Ceph Quincy to Reef

2024-02-01 Thread Christian Rohmann
This change is documented at 
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics,
also mentioning the deployment of ceph-exporter which is now used to 
collect per-host metrics from the local daemons.


While this deployment is done by cephadm (if used), I am wondering if 
ceph-exporter [2] is also built and packaged via the ceph packages [3] 
for installations that use them?




Regards


Christian





[1] 
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics

[2] https://github.com/ceph/ceph/tree/main/src/exporter
[3] https://docs.ceph.com/en/latest/install/get-packages/




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how can install latest dev release?

2024-01-31 Thread Christian Rohmann

On 31.01.24 09:38, garcetto wrote:

  how can i install latest dev release using cephadm?
I suppose you found 
https://docs.ceph.com/en/quincy/install/get-packages/#ceph-development-packages, 
but yes, that only seems to target a package installation.
It would be nice if there were also dev containers being built somewhere to 
use with cephadm.




Regards

Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW rate-limiting or anti-hammering for (external) auth requests // Anti-DoS measures

2024-01-12 Thread Christian Rohmann

Hey Istvan,

On 10.01.24 03:27, Szabo, Istvan (Agoda) wrote:
I'm using this in the frontend https config on haproxy; it works 
well so far:


stick-table type ip size 1m expire 10s store http_req_rate(10s)

tcp-request inspect-delay 10s
tcp-request content track-sc0 src
http-request deny deny_status 429 if { sc_http_req_rate(0) gt 1 }



But this serves as a basic rate limit for all requests coming from a 
single IP address, right?



My question was rather about limiting clients in regards to 
authentication requests / unauthorized requests,

which end up hammering the auth system (Keystone in my case) at full rate.



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW rate-limiting or anti-hammering for (external) auth requests // Anti-DoS measures

2024-01-09 Thread Christian Rohmann

Happy New Year Ceph-Users!

With the holidays and people likely being away, I take the liberty to 
bluntly BUMP this question about protecting RGW from DoS below:



On 22.12.23 10:24, Christian Rohmann wrote:

Hey Ceph-Users,


RGW does have options [1] to rate limit ops or bandwidth per bucket or 
user.

But those only come into play when the request is authenticated.

I'd like to also protect the authentication subsystem from malicious 
or invalid requests.
So in case e.g. some EC2 credentials are not valid (anymore) and 
clients start hammering the RGW with those requests, I'd like to make 
it cheap to deal with those requests. Especially in case some external 
authentication like OpenStack Keystone [2] is used, valid access 
tokens are cached within the RGW. But requests with invalid 
credentials end up being sent at full rate to the external API [3] as 
there is no negative caching. And even if there was, that would only 
limit the external auth requests for the same set of invalid 
credentials, but it would surely reduce the load in that case:


Since the HTTP request is blocking  



[...]
2023-12-18T15:25:55.861+ 7fec91dbb640 20 sending request to 
https://keystone.example.com/v3/s3tokens
2023-12-18T15:25:55.861+ 7fec91dbb640 20 register_request 
mgr=0x561a407ae0c0 req_data->id=778, curl_handle=0x7fedaccb36e0
2023-12-18T15:25:55.861+ 7fec91dbb640 20 WARNING: blocking http 
request
2023-12-18T15:25:55.861+ 7fede37fe640 20 link_request 
req_data=0x561a40a418b0 req_data->id=778, curl_handle=0x7fedaccb36e0

[...]



this does not only stress the external authentication API (keystone in 
this case), but also blocks RGW threads for the duration of the 
external call.


I am currently looking into using the capabilities of HAProxy to rate 
limit requests based on their resulting http-response [4]. So in 
essence to rate-limit or tarpit clients that "produce" a high number 
of 403 "InvalidAccessKeyId" responses. To have less collateral it 
might make sense to limit based on the presented credentials 
themselves. But this would require extracting and tracking HTTP headers 
or URL parameters (presigned URLs) [5] and putting them into tables.



* What are your thoughts on the matter?
* What kind of measures did you put in place?
* Does it make sense to extend RGW's capabilities to deal with those 
cases itself?

** adding negative caching
** rate limits on concurrent external authentication requests (or is 
there a pool of connections for those requests?)




Regards


Christian



[1] https://docs.ceph.com/en/latest/radosgw/admin/#rate-limit-management
[2] 
https://docs.ceph.com/en/latest/radosgw/keystone/#integrating-with-openstack-keystone
[3] 
https://github.com/ceph/ceph/blob/86bb77eb9633bfd002e73b5e58b863bc2d0df594/src/rgw/rgw_auth_keystone.cc#L475
[4] 
https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#4.2-http-response%20track-sc0
[5] 
https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html#auth-methods-intro



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW rate-limiting or anti-hammering for (external) auth requests // Anti-DoS measures

2023-12-22 Thread Christian Rohmann

Hey Ceph-Users,


RGW does have options [1] to rate limit ops or bandwidth per bucket or user.
But those only come into play when the request is authenticated.

I'd like to also protect the authentication subsystem from malicious or 
invalid requests.
So in case e.g. some EC2 credentials are not valid (anymore) and clients 
start hammering the RGW with those requests, I'd like to make it cheap 
to deal with those requests. Especially in case some external 
authentication like OpenStack Keystone [2] is used, valid access tokens 
are cached within the RGW. But requests with invalid credentials end up 
being sent at full rate to the external API [3] as there is no negative 
caching. And even if there was, that would only limit the external auth 
requests for the same set of invalid credentials, but it would surely 
reduce the load in that case:


Since the HTTP request is blocking  



[...]
2023-12-18T15:25:55.861+ 7fec91dbb640 20 sending request to 
https://keystone.example.com/v3/s3tokens
2023-12-18T15:25:55.861+ 7fec91dbb640 20 register_request 
mgr=0x561a407ae0c0 req_data->id=778, curl_handle=0x7fedaccb36e0
2023-12-18T15:25:55.861+ 7fec91dbb640 20 WARNING: blocking http 
request
2023-12-18T15:25:55.861+ 7fede37fe640 20 link_request 
req_data=0x561a40a418b0 req_data->id=778, curl_handle=0x7fedaccb36e0

[...]



this does not only stress the external authentication API (keystone in 
this case), but also blocks RGW threads for the duration of the external 
call.


I am currently looking into using the capabilities of HAProxy to rate 
limit requests based on their resulting http-response [4]. So in essence 
to rate-limit or tarpit clients that "produce" a high number of 403 
"InvalidAccessKeyId" responses. To have less collateral it might make 
sense to limit based on the presented credentials themselves. But this 
would require extracting and tracking HTTP headers or URL parameters 
(presigned URLs) [5] and putting them into tables.



* What are your thoughts on the matter?
* What kind of measures did you put in place?
* Does it make sense to extend RGW's capabilities to deal with those 
cases itself?

** adding negative caching
** rate limits on concurrent external authentication requests (or is 
there a pool of connections for those requests?)




Regards


Christian



[1] https://docs.ceph.com/en/latest/radosgw/admin/#rate-limit-management
[2] 
https://docs.ceph.com/en/latest/radosgw/keystone/#integrating-with-openstack-keystone
[3] 
https://github.com/ceph/ceph/blob/86bb77eb9633bfd002e73b5e58b863bc2d0df594/src/rgw/rgw_auth_keystone.cc#L475
[4] 
https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#4.2-http-response%20track-sc0
[5] 
https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html#auth-methods-intro

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Automatic triggering of the Ubuntu SRU process, e.g. for the recent 17.2.7 Quincy point release?

2023-11-12 Thread Christian Rohmann

Hey Yuri, hey ceph-users,

first of all, thanks for all your work on developing and maintaining Ceph.

I was just wondering if there is any sort of process or trigger towards the 
Ubuntu release team following a point release, so that they also create 
updated packages.
If you look at https://packages.ubuntu.com/jammy-updates/ceph, there 
still only is 17.2.6 as the current update available.
There was an [SRU] bug raised for 17.2.6 
(https://bugs.launchpad.net/cloud-archive/+bug/2018929); I now opened a 
similar one (https://bugs.launchpad.net/cloud-archive/+bug/2043336), 
hoping this is the right way to trigger the packaging of this point release.


Even though the Ceph team does not build Quincy packages for Ubuntu 
22.04 LTS (Jammy) themselves, it would be nice to still treat it 
somewhat as a release channel and to automatically trigger these kinds 
of processes.




Regards


Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Packages for 17.2.7 released without release notes / announcement (Re: Re: Status of Quincy 17.2.5 ?)

2023-10-30 Thread Christian Rohmann

Sorry to dig up this old thread ...

On 25.01.23 10:26, Christian Rohmann wrote:

On 20/10/2022 10:12, Christian Rohmann wrote:

1) May I bring up again my remarks about the timing:

On 19/10/2022 11:46, Christian Rohmann wrote:

I believe the upload of a new release to the repo prior to the 
announcement happens quite regularly - it might just be due to the 
technical process of releasing.
But I agree it would be nice to have a more "bit flip" approach to 
new releases in the repo and not have the packages appear as updates 
prior to the announcement and final release and update notes.
By my observations sometimes there are packages available on the 
download servers via the "last stable" folders such as 
https://download.ceph.com/debian-quincy/ quite some time before the 
announcement of a release is out.
I know it's hard to time this right with mirrors requiring some time 
to sync files, but would be nice to not see the packages or have 
people install them before there are the release notes and potential 
pointers to changes out. 


Today's 16.2.11 release shows the exact issue I described above 

1) 16.2.11 packages are already available via e.g. 
https://download.ceph.com/debian-pacific
2) release notes not yet merged: 
(https://github.com/ceph/ceph/pull/49839), thus 
https://ceph.io/en/news/blog/2022/v16-2-11-pacific-released/ shows a 
404 :-)
3) No announcement like 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/QOCU563UD3D3ZTB5C5BJT5WRSJL5CVSD/ 
to the ML yet.




I really appreciate the work (implementation and also testing) that goes 
into each release.
But the release of 17.2.7 showed the issue of "packages available before 
the news is out":


1) packages are available on e.g. download.ceph.com
2) There are NO release notes at 
https://docs.ceph.com/en/latest/releases/ yet

3) And there is no announcement on the ML yet


It would be awesome if you could consider bit-flip releases, with 
packages only becoming available together with the communication / release notes.




Regards


Christian






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CVE-2023-43040 - Improperly verified POST keys in Ceph RGW?

2023-09-27 Thread Christian Rohmann

Hey Ceph-users,

I just noticed there is a post to oss-security 
(https://www.openwall.com/lists/oss-security/2023/09/26/10) about a 
security issue with Ceph RGW.

Signed by IBM / Red Hat and including a patch by DO.


I also raised an issue on the tracker 
(https://tracker.ceph.com/issues/63004) about this, as I could not find 
one yet.
It seems a weird way of disclosing such a thing, and I am wondering if 
anybody knows any more about this?




Regards


Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] What is causing *.rgw.log pool to fill up / not be expired (Re: RGW multisite logs (data, md, bilog) not being trimmed automatically?)

2023-09-14 Thread Christian Rohmann
I am unfortunately still observing this issue of the RADOS pool 
"*.rgw.log" filling up with more and more objects:


On 26.06.23 18:18, Christian Rohmann wrote:

On the primary cluster I am observing an ever growing (objects and 
bytes) "sitea.rgw.log" pool, not so on the remote "siteb.rgw.log" 
which is only 300MiB and around 15k objects with no growth.
Metrics show that the growth of pool on primary is linear for at least 
6 months, so not sudden spikes or anything. Also sync status appears 
to be totally happy.

There are also no warnings in regards to large OMAPs or anything similar.


Could anybody kindly point me in the right direction to search for the 
cause of this?

What kinds of logs and data are stored in this pool?



Thanks and with kind regards,


Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can ceph-volume manage the LVs optionally used for DB / WAL at all?

2023-08-26 Thread Christian Rohmann

On 25.08.23 09:09, Eugen Block wrote:

I'm still not sure if we're on the same page.


Maybe not, I'll respond inline to clarify.




By looking at 
https://docs.ceph.com/en/latest/man/8/ceph-volume/#cmdoption-ceph-volume-lvm-prepare-block.db 
it seems that ceph-volume wants an LV or partition. So it's 
apparently not just taking a VG itself? Also if there were multiple 
VGs / devices , I likely would need to at least pick those.


ceph-volume creates all required VGs/LVs automatically, and the OSD 
creation happens in batch mode, for example when run by cephadm:

ceph-volume lvm batch --yes /dev/sdb /dev/sdc /dev/sdd

In a non-cephadm deployment you can fiddle with ceph-volume manually, 
where you also can deploy single OSDs, with or without providing your 
own pre-built VGs/LVs. In a cephadm deployment manually creating OSDs 
will result in "stray daemons not managed by cephadm" warnings.


1) I am mostly asking about a non-cephadm environment and would just 
like to know if ceph-volume can also manage the VG of a DB/WAL device 
that is used for multiple OSDs and create the individual LVs which are 
used for DB or WAL devices when creating a single OSD. Below you give an 
example "before we upgraded to Pacific" in which you run lvcreate 
manually. Is that not required anymore with >= Quincy?
2) Even with cephadm there is "db_devices" as part of the 
drivegroups. But the question remains whether cephadm can use a single 
db_device for multiple OSDs.



Before we upgraded to Pacific we did manage our block.db devices 
manually with pre-built LVs, e.g.:


$ lvcreate -L 30G -n bluefsdb-30 ceph-journals
$ ceph-volume lvm create --data /dev/sdh --block.db 
ceph-journals/bluefsdb-30


As asked and explained in the paragraph above, this is what I am 
currently doing (lvcreate + ceph-volume lvm create). My question 
therefore is whether ceph-volume (!) could somehow create this LV for the DB 
automagically if I just gave it a device (or an existing VG).
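
To illustrate with a (made-up) example of what I mean: in batch mode,
ceph-volume apparently does carve up a shared DB device by itself, e.g.

# three OSDs with their block.db LVs created automatically on the NVMe device
ceph-volume lvm batch --yes /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1

What I am asking is whether the same automatic VG/LV handling exists on the
single-OSD path (ceph-volume lvm create / prepare with --block.db).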



Thank you very much for your patience in clarifying and responding to my 
questions.

Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can ceph-volume manage the LVs optionally used for DB / WAL at all?

2023-08-24 Thread Christian Rohmann

On 11.08.23 16:06, Eugen Block wrote:
if you deploy OSDs from scratch you don't have to create LVs manually, 
that is handled entirely by ceph-volume (for example on cephadm based 
clusters you only provide a drivegroup definition). 


By looking at 
https://docs.ceph.com/en/latest/man/8/ceph-volume/#cmdoption-ceph-volume-lvm-prepare-block.db 
it seems that ceph-volume wants an LV or partition. So it's apparently 
not just taking a VG itself? Also if there were multiple VGs / devices , 
I likely would need to at least pick those.


But I suppose this orchestration would then require cephadm 
(https://docs.ceph.com/en/latest/cephadm/services/osd/#drivegroups) and 
cannot be done via ceph-volume which merely takes care of ONE OSD at a time.



I'm not sure if automating db/wal migration has been considered, it 
might be (too) difficult. But moving the db/wal devices to 
new/different devices doesn't seem to be a recurring issue (corner 
case?), so maybe having control over that process for each OSD 
individually is the safe(r) option in case something goes wrong. 


Sorry for the confusion. I was not talking about any migrations, just 
the initial creation of spinning rust OSDs with DB or WAL on fast storage.



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] When to use the auth profiles simple-rados-client and profile simple-rados-client-with-blocklist?

2023-08-22 Thread Christian Rohmann

Hey ceph-users,

1) When configuring Gnocchi to use Ceph storage (see 
https://gnocchi.osci.io/install.html#ceph-requirements)

I was wondering if one could use any of the auth profiles like
 * simple-rados-client
 * simple-rados-client-with-blocklist ?

Or are those for different use cases?

2) I was also wondering why the documentation mentions "(Monitor only)" 
but then it says

"Gives a user read-only permissions for monitor, OSD, and PG data."?

3) And are those profiles really for "read-only" users? Why don't they 
have "read-only" in their name, like the "rbd" profile and its corresponding 
"rbd-read-only" profile?

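For context, what I have in mind for Gnocchi is something along these lines
(client id, pool name and caps are made up and untested):

# a client key for Gnocchi using the mon cap profile instead of hand-written mon caps
ceph auth get-or-create client.gnocchi \
    mon 'profile simple-rados-client-with-blocklist' \
    osd 'allow rwx pool=gnocchi'
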


Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Can ceph-volume manage the LVs optionally used for DB / WAL at all?

2023-08-11 Thread Christian Rohmann

Hey ceph-users,

I was wondering if ceph-volume does anything in regard to the management 
(creation, setting metadata, ...) of the LVs which are used for the 
DB / WAL of an OSD?

Reading the documentation at 
https://docs.ceph.com/en/latest/man/8/ceph-volume/#new-db it seems to 
indicate that the LV to be used as e.g. DB needs to be created manually 
(without ceph-volume) and exist prior to using ceph-volume to move the 
DB to that LV? I suppose the same is true for "ceph-volume lvm create" 
or "ceph-volume lvm prepare" and "--block.db"


It's not that creating a few LVs is hard... it's just that ceph-volume 
does apply some structure to the naming of LVM VGs and LVs on the OSD 
device and also adds metadata. That would then be up to the user, right?




Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-volume lvm new-db fails

2023-08-11 Thread Christian Rohmann

On 10/08/2023 13:30, Christian Rohmann wrote:

It's already fixed in master, but the backports are all still pending ...


There are PRs for the backports now:

* https://tracker.ceph.com/issues/62060
* https://tracker.ceph.com/issues/62061
* https://tracker.ceph.com/issues/62062



Regards

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-volume lvm new-db fails

2023-08-10 Thread Christian Rohmann



On 11/05/2022 23:21, Joost Nieuwenhuijse wrote:
After a reboot the OSD turned out to be corrupt. Not sure if 
ceph-volume lvm new-db caused the problem, or failed because of 
another problem.



I just ran into the same issue trying to add a db to an existing OSD.
Apparently this is a known bug: https://tracker.ceph.com/issues/55260

It's already fixed in master, but the backports are all still pending ...



Regards

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW accessing real source IP address of a client (e.g. in S3 bucket policies)

2023-07-06 Thread Christian Rohmann

Hey Casey, all,

On 16/06/2023 17:00, Casey Bodley wrote:



But when applying a bucket policy with aws:SourceIp it seems to only work if I 
set the internal IP of the HAProxy instance, not the public IP of the client.
So the actual remote address is NOT used in my case.


Did I miss any config setting anywhere?


your 'rgw remote addr param' config looks right. with that same
config, i was able to set a bucket policy that denied access based on


I found the issue. Embarrassingly, it was simply NAT hairpinning applied 
to the traffic from the server I was testing with.
In short: even though I targeted the public IP of the HAProxy instance, 
the internal IP address of my test server was kept as the source, since 
both machines are on the same network segment.
That is why I first thought the LB IP was applied to the policy, but not 
the actual public source IP of the client. In reality it was simply the 
private (RFC 1918) IP of the test machine that came in as the source.




Sorry for the noise and thanks for your help.

Christian


P.S. With IPv6, this would not have happened.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW multisite logs (data, md, bilog) not being trimmed automatically?

2023-06-29 Thread Christian Rohmann
There was a similar issue reported at 
https://tracker.ceph.com/issues/48103 and yet another ML post at

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/5LGXQINAJBIGFUZP5WEINVHNPBJEV5X7

May I second the question whether it's safe to run radosgw-admin autotrim on 
those logs?
If so, why is that required, and why does there seem to be no periodic trimming 
happening?




Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bluestore compression - Which algo to choose? Zstd really still that bad?

2023-06-27 Thread Christian Rohmann

Hey Igor,

On 27/06/2023 12:06, Igor Fedotov wrote:
I can't say anything about your primary question on zstd 
benefits/drawbacks but I'd like to emphasize that compression ratio at 
BlueStore is (to a major degree) determined by the input data flow 
characteristics (primarily write block size), object store allocation 
unit size (bluestore_min_alloc_size) and some parameters (e.g. maximum 
blob size) that determine how input data chunks are logically split 
when landing on disk.
E.g. if one has min_alloc_size set to 4K and write block size is in 
(4K-8K] then resulting compressed block would never be less than 4K. 
Hence compression ratio is never more than 2.
Similarly if min_alloc_size is 64K there would be no benefit in 
compression at all for the above input since target allocation units 
are always larger than input blocks.
The rationale of the above behavior is that compression is applied 
exclusively on input blocks - there is no additional processing to 
merge input and existing data and compress them all together.



Thanks for the emphasis on input data and its block size. Yes, that is 
certainly the most important factor for the compression efficiency and 
the choice of a suitable algorithm for a certain use-case.
In my case the pool is RBD only, so (by default) the blocks are 4M if I 
am not mistaken. I also understand that even though larger blocks 
generally compress better, there is no relation between 
different blocks in regard to compression dictionaries (going along the 
lines of de-duplication). In the end, in my use-case it boils down to the 
type of data stored on the RBD images and how compressible that might be.
But since those blocks are only written once, I am ready to invest 
more CPU cycles to reduce the size on disk.


I am simply looking for data others might have collected on similar 
use-cases.
Also I am still wondering if there really is nobody who has worked/played 
more with zstd since it has become so popular in recent months...



Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW multisite logs (data, md, bilog) not being trimmed automatically?

2023-06-26 Thread Christian Rohmann

Hey ceph-users,

I am running two (now) Quincy clusters doing RGW multi-site replication 
with only one actually being written to by clients.

The other site is intended simply as a remote copy.

On the primary cluster I am observing an ever growing (objects and 
bytes) "sitea.rgw.log" pool, not so on the remote "siteb.rgw.log" which 
is only 300MiB and around 15k objects with no growth.
Metrics show that the growth of pool on primary is linear for at least 6 
months, so not sudden spikes or anything. Also sync status appears to be 
totally happy.

There are also no warnings in regards to large OMAPs or anything similar.

I was under the impression that RGW will trim its three logs (md, bi, 
data) automatically and only keep data that has not yet been replicated 
by the other zonegroup members?
The config option "ceph config get mgr rgw_sync_log_trim_interval" is 
set to 1200, so 20 minutes.


So I am wondering if there might be some inconsistency or how I can best 
analyze what the cause for the accumulation of log data is?
There are older questions on the ML, such as [1], but there was not 
really a solution or root cause identified.


I know there is manual trimming, but I'd rather analyze the 
current situation and figure out what the cause for the lack of 
auto-trimming is.



  * Do I need to go through all buckets and count logs and look at 
their timestamps? Which queries make sense here (e.g. along the lines of 
the sketch below)?
  * Is there usually any logging of the log trimming activity that I 
should expect? Or that might indicate why trimming does not happen?
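
To make the first question a bit more concrete, this is the kind of (untested)
sketch I have in mind - object name prefixes such as data_log, meta.log or
sync.error-log are what I would expect to find in that pool:

# which object-name prefixes dominate the log pool (pool name from my setup)
rados -p sitea.rgw.log ls | cut -d. -f1 | sort | uniq -c | sort -rn

# status of the individual logs and the overall sync state
radosgw-admin sync status
radosgw-admin datalog status
radosgw-admin mdlog status
radosgw-admin sync error list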



Regards

Christian


[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/WZCFOAMLWV3XCGJ3TVLHGMJFVYNZNKLD/




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Radogw ignoring HTTP_X_FORWARDED_FOR header

2023-06-26 Thread Christian Rohmann

Hello Yosr,

On 26/06/2023 11:41, Yosr Kchaou wrote:

We are facing an issue with getting the right value for the header
HTTP_X_FORWARDED_FOR when getting client requests. We need this value to do
the source ip check validation.

[...]

Currently, RGW sees that all requests come from 127.0.0.1. So it is still
considering the nginx ip address and not the client who made the request.
May I point you to my recent post to this ML about this very question: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/IKGLAROSVWHSRZQSYTLLHVRWFPOLBEGL/


I am still planning to reproduce this issue with simple examples and 
headers set manually via e.g. curl, to rule out anything stupid I might 
have misconfigured in my case. I just have not found the time yet.


But did you sniff any traffic to the backend or verify what the headers 
look like in your case? Any debug logging ("debug rgw = 20") where you can 
see what RGW thinks of the incoming request?
Did you test with S3 bucket policies, or how did you come to the 
conclusion that RGW is not using the X_FORWARDED_FOR header? Or what is 
your indication that things are not working as expected?


From what I can see, the rgw client log does NOT print the external IP 
from the header, but the source IP of the incoming TCP connection:


    2023-06-26T11:14:37.070+ 7f0389e0b700  1 beast: 0x7f051c776660: 
192.168.1.1 - someid [26/Jun/2023:11:14:36.990 +] "PUT 
/bucket/object HTTP/1.1" 200 43248 - "aws-sdk-go/1.27.0 (go1.16.15; 
linux; amd64) S3Manager" - latency=0.07469s



while the rgw ops log does indeed print the remote address in remote_addr:

{"bucket":"bucket","time":"2023-06-26T11:16:08.721465Z","time_local":"2023-06-26T11:16:08.721465+","remote_addr":"xxx.xxx.xxx.xxx","user":"someuser","operation":"put_obj","uri":"PUT 
/bucket/object 
HTTP/1.1","http_status":"200","error_code":"","bytes_sent":0,"bytes_received":64413,"object_size":64413,"total_time":155,"user_agent":"aws-sdk-go/1.27.0 
(go1.16.15; linux; amd64) 
S3Manager","referrer":"","trans_id":"REDACTED","authentication_type":"Keystone","access_key_id":"REDACTED","temp_url":false}



So in my case it's not that RGW does not receive and log this info, but 
more that it is not applying it in a bucket policy (as far as my 
analysis of the issue goes).




Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Bluestore compression - Which algo to choose? Zstd really still that bad?

2023-06-26 Thread Christian Rohmann

Hey ceph-users,

we've been using the default "snappy" to have Ceph compress data on 
certain pools - namely backups / copies of volumes of a VM environment.

So it's write once, and no random access.
I am now wondering if switching to another algo (there is snappy, zlib, 
lz4, or zstd) would improve the compression ratio (significantly)?


* Does anybody have any real world data on snappy vs. $anyother?

Using zstd is tempting as it's used in various other applications 
(btrfs, MongoDB, ...) for inline compression with great success.
For Ceph, though, there is still a warning [1] in the docs about it not being 
recommended. But I am wondering if this still stands with e.g. [2] 
merged.
And there was [3] trying to improve the performance, but this reads as if 
it only led to a dead end and no code changes?



In any case, does anybody have any numbers to help with the decision on 
the compression algo?
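
In case anybody wants to compare on their own data: switching the algorithm
per pool is cheap (it only affects newly written data), and the achieved
ratio shows up in ceph df detail. A rough sketch with a made-up pool name:

# switch a pool to zstd; only data written from now on is affected
ceph osd pool set backups compression_algorithm zstd
ceph osd pool set backups compression_mode aggressive

# the USED COMPR / UNDER COMPR columns show how much data was compressed and by how much
ceph df detail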




Regards


Christian


[1] 
https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#confval-bluestore_compression_algorithm

[2] https://github.com/ceph/ceph/pull/33790
[3] https://github.com/facebook/zstd/issues/910
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW accessing real source IP address of a client (e.g. in S3 bucket policies)

2023-06-15 Thread Christian Rohmann

On 15/06/2023 15:46, Casey Bodley wrote:

   * In case of HTTP via headers like "X-Forwarded-For". This is
apparently supported only for logging the source in the "rgw ops log" ([1])?
Or is this info used also when evaluating the source IP condition within
a bucket policy?

yes, the aws:SourceIp condition key does use the value from
X-Forwarded-For when present


I have an HAProxy in front of the RGWs which has

"option forwardfor" set  to add the "X-Forwarded-For" header.

Then the RGWs have  "rgw remote addr param = http_x_forwarded_for" set,
according to 
https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_remote_addr_param


and I also see remote_addr properly logged within the rgw ops log.



But when applying a bucket policy with aws:SourceIp it seems to only 
work if I set the internal IP of the HAProxy instance, not the public IP 
of the client.

So the actual remote address is NOT used in my case.


Did I miss any config setting anywhere?
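
For reference, this is roughly the kind of policy I am testing with
(bucket name and CIDR are made up):

cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": {"AWS": ["*"]},
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"],
    "Condition": {"NotIpAddress": {"aws:SourceIp": "203.0.113.0/24"}}
  }]
}
EOF
s3cmd setpolicy policy.json s3://mybucket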




Regards and thanks for your help


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW accessing real source IP address of a client (e.g. in S3 bucket policies)

2023-06-15 Thread Christian Rohmann

Hello Ceph-Users,

The context / motivation of my question is S3 bucket policies and other 
cases using the source IP address as a condition.

I was wondering if and how RadosGW is able to access the source IP 
address of clients when receiving their connections via a loadbalancer / 
reverse proxy like HAProxy.
Naturally, that is then where the connection originates from, 
rendering a policy based on client IP addresses useless.


Depending on whether the connection is balanced as HTTP or TCP, there are 
two ways to carry information about the actual source:


 * In case of HTTP via headers like "X-Forwarded-For". This is 
apparently supported only for logging the source in the "rgw ops log" ([1])?
Or is this info used also when evaluating the source IP condition within 
a bucket policy?


 * In case of TCP loadbalancing, there is the PROXY protocol v2. This 
unfortunately does not even seem to be supported by the Beast library which RGW uses.

    I opened feature requests ...

     ** https://tracker.ceph.com/issues/59422
     ** https://github.com/chriskohlhoff/asio/issues/1091
     ** https://github.com/boostorg/beast/issues/2484

   but there is no outcome yet.


Regards


Christian


[1] 
https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_remote_addr_param

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg_autoscaler using uncompressed bytes as pool current total_bytes triggering false POOL_TARGET_SIZE_BYTES_OVERCOMMITTED warnings?

2023-04-21 Thread Christian Rohmann

Hey ceph-users,

may I ask (nag) again about this issue?  I am wondering if anybody can 
confirm my observations?
I raised a bug, https://tracker.ceph.com/issues/54136, but apart from the 
assignment to a dev a while ago there has been no response yet.

Maybe I am just holding it wrong - please, someone enlighten me.
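
For anybody who wants to check their own cluster, the relevant numbers are
easy to pull side by side (sketch); comparing the autoscaler's SIZE / TARGET
SIZE columns with the USED and USED COMPR columns of ceph df should show
whether compressed pools are accounted with their uncompressed size:

# what the autoscaler sees per pool (SIZE, TARGET SIZE, RATE, RAW CAPACITY, ...)
ceph osd pool autoscale-status

# actual per-pool usage, including the compression columns
ceph df detail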


Thank you and with kind regards

Christian




On 02/02/2022 20:10, Christian Rohmann wrote:


Hey ceph-users,


I am debugging a mgr pg_autoscaler WARN which states a 
target_size_bytes on a pool would overcommit the available storage.
There is only one pool with a value for target_size_bytes (=5T) defined 
and that apparently would consume more than the available storage:


--- cut ---
# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes
[WRN] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED: 1 subtrees have 
overcommitted pool target_size_bytes
    Pools ['backups', 'images', 'device_health_metrics', '.rgw.root', 
'redacted.rgw.control', 'redacted.rgw.meta', 'redacted.rgw.log', 
'redacted.rgw.otp', 'redacted.rgw.buckets.index', 
'redacted.rgw.buckets.data', 'redacted.rgw.buckets.non-ec'] overcommit 
available storage by 1.011x due to target_size_bytes 15.0T on pools 
['redacted.rgw.buckets.data'].

--- cut ---


But then looking at the actual usage it seems strange that 15T (5T * 3 
replicas) should not fit onto the remaining 122 TiB AVAIL:



--- cut ---
# ceph df detail
--- RAW STORAGE ---
CLASS  SIZE AVAIL    USED RAW USED  %RAW USED
hdd    293 TiB  122 TiB  171 TiB   171 TiB  58.44
TOTAL  293 TiB  122 TiB  171 TiB   171 TiB  58.44

--- POOLS ---
POOL  ID  PGS  STORED  (DATA)  (OMAP)  OBJECTS  USED  (DATA)  (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
backups  1  1024  92 TiB  92 TiB  3.8 MiB  28.11M  156 TiB  156 TiB  11 MiB  64.77  28 TiB  N/A  N/A  N/A  39 TiB  123 TiB
images  2  64  1.7 TiB  1.7 TiB  249 KiB  471.72k  5.2 TiB  5.2 TiB  748 KiB  5.81  28 TiB  N/A  N/A  N/A  0 B  0 B
device_health_metrics  19  1  82 MiB  0 B  82 MiB  43  245 MiB  0 B  245 MiB  0  28 TiB  N/A  N/A  N/A  0 B  0 B
.rgw.root  21  32  23 KiB  23 KiB  0 B  25  4.1 MiB  4.1 MiB  0 B  0  28 TiB  N/A  N/A  N/A  0 B  0 B
redacted.rgw.control  22  32  0 B  0 B  0 B  8  0 B  0 B  0 B  0  28 TiB  N/A  N/A  N/A  0 B  0 B
redacted.rgw.meta  23  32  1.7 MiB  394 KiB  1.3 MiB  1.38k  237 MiB  233 MiB  3.9 MiB  0  28 TiB  N/A  N/A  N/A  0 B  0 B
redacted.rgw.log  24  32  53 MiB  500 KiB  53 MiB  7.60k  204 MiB  47 MiB  158 MiB  0  28 TiB  N/A  N/A  N/A  0 B  0 B
redacted.rgw.otp  25  32  5.2 KiB  0 B  5.2 KiB  0  16 KiB  0 B  16 KiB  0  28 TiB  N/A  N/A  N/A  0 B  0 B
redacted.rgw.buckets.index  26  32  1.2 GiB  0 B  1.2 GiB  7.46k  3.5 GiB  0 B  3.5 GiB  0  28 TiB  N/A  N/A  N/A  0 B  0 B
redacted.rgw.buckets.data  27  128  3.1 TiB  3.1 TiB  0 B  3.53M  9.5 TiB  9.5 TiB  0 B  10.11  28 TiB  N/A  N/A  N/A  0 B  0 B
redacted.rgw.buckets.non-ec  28  32  0 B  0 B  0 B  0  0 B  0 B  0 B  0  28 TiB  N/A  N/A  N/A  0 B  0 B

--- cut ---


I then looked at how those values are determined at 
https://github.com/ceph/ceph/blob/9f723519257eca039126a20aa6a2a7d2dbfb5dba/src/pybind/mgr/pg_autoscaler/module.py#L509.
Apparently "total_bytes" are compared with the capacity of the 
root_map. I added a debug line and found that the total in my cluster 
was already at:


  total=325511007759696

so in excess of 300 TiB. Looking at "ceph df" again, this usage seems 
strange.




Looking at how this total is calculated at 
https://github.com/ceph/ceph/blob/9f723519257eca039126a20aa6a2a7d2dbfb5dba/src/pybind/mgr/pg_autoscaler/module.py#L441,
you see that the larger value (max) of "actual_raw_used" vs. 
"target_bytes*raw_used_rate" is determined and then summed up.



I dumped the values for all pools in my cluster with yet another line of 
debug code:


---cut ---
pool_id 1 - actual_raw_used=303160109187420.0, target_bytes=0 raw_used_rate=3.0
pool_id 2 - actual_raw_used=5714098884702.0, target_bytes=0 raw_used_rate=3.0

pool_id 19 - actua

[ceph-users] External Auth (AssumeRoleWithWebIdentity) , STS by default, generic policies and isolation by ownership

2023-03-15 Thread Christian Rohmann

Hello ceph-users,

Unhappy with the capabilities regarding bucket access policies when 
using the Keystone authentication module,
I posted to this ML a while back - 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/S2TV7GVFJTWPYA6NVRXDL2JXYUIQGMIN/


In general I'd still like to hear how others are making use of external 
authentication and STS, and what your 
experiences are in replacing e.g. Keystone authentication.



In the meantime we looked into OIDC authentication (via Keycloak) and 
the potential there.
While this works in general - AssumeRoleWithWebIdentity comes back with 
an STS token that can be used to access S3 buckets -
I am wondering about a few things:


1) How to enable STS for everyone (without user-individual policy to 
AssumeRole)


In the documentation on STS 
(https://docs.ceph.com/en/quincy/radosgw/STS/#sts-in-ceph) and also 
STS-Lite (https://docs.ceph.com/en/quincy/radosgw/STSLite/#sts-lite)
it's implied that one has to attach a dedicated policy to allow for STS 
to each user individually. This does not scale well with thousands of 
users. Also, when using federated / external authentication, there is no
explicit user creation: "A shadow user is created corresponding to every 
federated user. The user id is derived from the ‘sub’ field of the 
incoming web token."


Is there a way to automatically have a role corresponding to each user 
that can be assumed via an OIDC token?
So an implicit role that would allow an externally authenticated 
user to have full access to S3 and all buckets they own?
Looking at the STS Lite documentation, it seems all the more natural to be 
able to allow Keystone users to make use of STS.


Is there any way to apply such an AssumeRole policy "globally" or for a 
whole set of users at the same time?
I just found PR https://github.com/ceph/ceph/pull/44434 aiming to add 
policy variables such as ${aws:username}  to allow for generic policies.
But this is more about restricting bucket names or granting access to 
certain patterns of names.




2) Isolation in S3 Multi-Tenancy with external IdP 
(AssumeRoleWithWebIdentity), how does bucket ownership come into play?


Following the question about generic policies for STS, I am wondering 
about the role (no pun intended) that bucket ownership or the tenant 
plays here?

If one creates a role policy of e.g.

{"Version":"2012-10-17","Statement":{"Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::*"}}

Would this allow someone assuming this role access to all, "*", buckets, 
or just those owned by the user that created this role policy?



In case of Keystone auth the owner of a bucket is the project, not the 
individual (human) user. So this creates somewhat of a tenant which I'd 
want to isolate.




3) Allowing users to create their own roles and policies by default

Is there a way to allow users to create their own roles and policies to 
use them by default?
All the examples talk about the requirement for admin caps and 
individual setting of '--caps="user-policy=*'.


If there were a default role + policy (question #1) that could be applied 
to externally authenticated users, I'd like for them to be able to
create new roles and policies to grant access to their buckets to other 
users.






Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Status of Quincy 17.2.5 ?

2023-01-25 Thread Christian Rohmann

Hey everyone,


On 20/10/2022 10:12, Christian Rohmann wrote:

1) May I bring up again my remarks about the timing:

On 19/10/2022 11:46, Christian Rohmann wrote:

I believe the upload of a new release to the repo prior to the 
announcement happens quite regularly - it might just be due to the 
technical process of releasing.
But I agree it would be nice to have a more "bit flip" approach to 
new releases in the repo and not have the packages appear as updates 
prior to the announcement and final release and update notes.
By my observations sometimes there are packages available on the 
download servers via the "last stable" folders such as 
https://download.ceph.com/debian-quincy/ quite some time before the 
announcement of a release is out.
I know it's hard to time this right with mirrors requiring some time 
to sync files, but would be nice to not see the packages or have 
people install them before there are the release notes and potential 
pointers to changes out. 


Today's 16.2.11 release shows the exact issue I described above 

1) 16.2.11 packages are already available via e.g. 
https://download.ceph.com/debian-pacific
2) release notes not yet merged: 
(https://github.com/ceph/ceph/pull/49839), thus 
https://ceph.io/en/news/blog/2022/v16-2-11-pacific-released/ shows a 404 :-)
3) No announcement like 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/QOCU563UD3D3ZTB5C5BJT5WRSJL5CVSD/ 
to the ML yet.



Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD slow ops warning not clearing after OSD down

2023-01-16 Thread Christian Rohmann

Hello,

On 04/05/2021 09:49, Frank Schilder wrote:

I created a ticket: https://tracker.ceph.com/issues/50637


We just observed this very issue on Pacific (16.2.10), which I also 
commented on in the ticket.
I wonder if this case is really that seldom: first having some issues causing 
slow ops and then a total failure of an OSD?



It would be nice to fix this though, so as not to "block" the warning status with 
something that's not actually a warning.




Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.11 branch

2022-12-15 Thread Christian Rohmann

On 15/12/2022 10:31, Christian Rohmann wrote:


May I kindly ask for an update on how things are progressing? Mostly I 
am interested in the (persisting) implications for testing new point 
releases (e.g. 16.2.11) with more and more bugfixes in them.


I guess I just have not looked at the right ML; it's being worked on 
already ... 
https://lists.ceph.io/hyperkitty/list/d...@ceph.io/thread/CQPQJXD6OVTZUH43I4U3GGOP2PKYOREJ/




Sorry for the nagging,


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.11 branch

2022-12-15 Thread Christian Rohmann

Hey Laura, Greg, all,

On 31/10/2022 17:15, Gregory Farnum wrote:

If you don't mind me asking Laura, have those issues regarding the testing lab been resolved yet?


There are currently a lot of folks working to fix the testing lab issues.
Essentially, disk corruption affected our ability to reach quay.ceph.io.
We've made progress this morning, but we are still working to understand
the root cause of the corruption. We expect to re-deploy affected services
soon so we can resume testing for v16.2.11.

We got a note about this today, so I wanted to clarify:

For Reasons, the sepia lab we run teuthology in currently uses a Red
Hat Enterprise Virtualization stack — meaning, mostly KVM with a lot
of fancy orchestration all packaged up, backed by Gluster. (Yes,
really — a full Ceph integration was never built and at one point this
was deemed the most straightforward solution compared to running
all-up OpenStack backed by Ceph, which would have been the available
alternative.) The disk images stored in Gluster started reporting
corruption last week (though Gluster was claiming to be healthy), and
with David's departure and his backup on vacation it took a while for
the remaining team members to figure out what was going on and
identify strategies to resolve or work around it.

The relevant people have figured out a lot more of what was going on,
and Adam (David's backup) is back now so we're expecting things to
resolve more quickly at this point. And indeed the team's looking at
other options for providing this infrastructure going forward. 😄
-Greg



May I kindly ask for an update on how things are progressing? Mostly I 
am interested in the (persisting) implications for testing new point 
releases (e.g. 16.2.11) with more and more bugfixes in them.



Thanks a bunch!


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW Forcing buckets to be encrypted (SSE-S3) by default (via a global bucket encryption policy)?

2022-11-23 Thread Christian Rohmann

On 23/11/2022 13:36, Christian Rohmann wrote:


I am wondering if there are other options to ensure data is encrypted 
at rest and also only replicated as encrypted data ...


I should have referenced thread 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/TNA3MK2C744BN5OJQ4FMLWDK7WJBFH77/#J2VYTUSWZQBQMLN2GQ7L7ZLDYNHVEZZQ 
which muses about enforcing encryption at rest as well.


But as discussed there, using the "automatic encryption" 
(https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only) 
with a static key stored in the config is likely not a good base for 
this endeavor.



Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW Forcing buckets to be encrypted (SSE-S3) by default (via a global bucket encryption policy)?

2022-11-23 Thread Christian Rohmann

Hey ceph-users,

loosely related to my question about client-side encryption in the Cloud 
Sync module 
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/I366AIAGWGXG3YQZXP6GDQT4ZX2Y6BXM/)


I am wondering if there are other options to ensure data is encrypted at 
rest and also only replicated as encrypted data ...



My thoughts / findings so far:

AWS S3 supports setting a bucket encryption policy 
(https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-bucket-encryption.html) 
to "ApplyServerSideEncryptionByDefault" - so automatically apply SSE to 
all objects without the clients to explicitly request this per object.


Ceph RGW has received support for such policy via the bucket encryption 
API with 
https://github.com/ceph/ceph/commit/95acefb2f5e5b1a930b263bbc7d18857d476653c.


I am now just wondering if there is any way to not only allow bucket 
creators to apply such a policy themselves, but to apply this as a 
global default in RGW, forcing all buckets to have SSE enabled - 
transparently.
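For clarity, what bucket owners can already do themselves is roughly this (bucket name and endpoint are placeholders), e.g. via awscli:

    aws --endpoint-url https://rgw.example.com s3api put-bucket-encryption \
        --bucket mybucket \
        --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'

What I am after is RGW behaving as if every bucket had such a configuration applied, without relying on each bucket creator to set it.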


If there is no way to achieve this just yet, what are your thoughts 
about adding such an option to RGW?



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cloud sync to minio fails after creating the bucket

2022-11-21 Thread Christian Rohmann

On 21/11/2022 12:50, ma...@roterruler.de wrote:

Could this "just" be the bug https://tracker.ceph.com/issues/55310 (duplicate
https://tracker.ceph.com/issues/57807) about Cloud Sync being broken since 
Pacific?

Wow - yes, the issue seems to be exactly the same that I'm facing -.-


But there is a fix committed, pending backports to Quincy / Pacific: 
https://tracker.ceph.com/issues/57306




Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cloud sync to minio fails after creating the bucket

2022-11-21 Thread Christian Rohmann

On 21/11/2022 11:04, ma...@roterruler.de wrote:

Hi list,

I'm currently implementing a sync between ceph and a minio cluster to 
continously sync the buckets and objects to an offsite location. I followed the 
guide on https://croit.io/blog/setting-up-ceph-cloud-sync-module

After the sync starts it successfully creates the first bucket, but somehow 
tries over and over again to create the bucket instead of adding the objects 
itself. This is from the minio logs:



Could this "just" be the bug https://tracker.ceph.com/issues/55310 
(duplicate https://tracker.ceph.com/issues/57807) about Cloud Sync being 
broken since Pacific?




Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW replication and multiple endpoints

2022-11-14 Thread Christian Rohmann

Hey Kamil

On 14/11/2022 13:54, Kamil Madac wrote:

Hello,

I'm trying to create a RGW Zonegroup with two zones, and to have data
replicated between the zones. Each zone is separate Ceph cluster. There is
a possibility to use list of endpoints in zone definitions (not just single
endpoint) which will be then used for the replication between zones. so I
tried to use it instead of using LB in front of clusters for the
replication .

[...]

When node is back again, replication continue to work.

What is the reason to have possibility to have multiple endpoints in the
zone configuration when outage of one of them makes replication not
working?


We are running a similar setup and ran into similar issues before when 
doing rolling restarts of the RGWs.


1) Mostly it's a single metadata shard never syncing up and requiring a 
complete "metadata init". But this issue will likely be addressed via 
https://tracker.ceph.com/issues/39657


2) But we also observed issues with one RGW being unavailable or just 
slow and as a result influencing the whole sync process. I suppose the 
HTTP client used within the RGW syncer does not do a good job of tracking 
which remote RGW is healthy, or a slow-reading RGW could just be locking 
all the shards ...


3) But as far as "cooperating" goes there are improvements being worked 
on, see https://tracker.ceph.com/issues/41230 or 
https://github.com/ceph/ceph/pull/45958 which then makes better use of 
having multiple distinct RGW in both zones.
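For context, the endpoint lists on our zones are set up roughly like this (hostnames are placeholders) - so the syncer does know about multiple remote RGWs, it just does not seem to handle an unhealthy or slow one gracefully yet:

    radosgw-admin zone modify --rgw-zone=the-remote-zone \
        --endpoints=http://rgw1.example.com:8080,http://rgw2.example.com:8080
    radosgw-admin period update --commit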



Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.11 branch

2022-10-28 Thread Christian Rohmann

On 28/10/2022 00:25, Laura Flores wrote:

Hi Oleksiy,

The Pacific RC has not been declared yet since there have been problems in
our upstream testing lab. There is no ETA yet for v16.2.11 for that reason,
but the full diff of all the patches that were included will be published
to ceph.io when v16.2.11 is released. There will also be a diff published
in the documentation on this page:
https://docs.ceph.com/en/latest/releases/pacific/

In the meantime, here is a link to the diff in commits between v16.2.10 and
the Pacific branch: https://github.com/ceph/ceph/compare/v16.2.10...pacific


There also is https://tracker.ceph.com/versions/656 which seems to be tracking the open issues tagged for this particular point release.


If you don't mind me asking Laura, have those issues regarding the 
testing lab been resolved yet?


There are quite a few bugfixes in the pending release 16.2.11 which we 
are waiting for. TBH I was about
to ask if it would not be sensible to do an intermediate release and not 
let it grow bigger and

bigger (with even more changes / fixes)  going out at once.



Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Status of Quincy 17.2.5 ?

2022-10-20 Thread Christian Rohmann



On 19/10/2022 16:30, Laura Flores wrote:
Dan is correct that 17.2.5 is a hotfix release. There was a flaw in 
the release process for 17.2.4 in which five commits were not included 
in the release. The users mailing list will hear an official 
announcement about this hotfix release later this week.


Thanks for the info.


1) May I bring up again my remarks about the timing:

On 19/10/2022 11:46, Christian Rohmann wrote:

I believe the upload of a new release to the repo prior to the 
announcement happens quite regularly - it might just be due to the 
technical process of releasing.
But I agree it would be nice to have a more "bit flip" approach to new 
releases in the repo and not have the packages appear as updates prior 
to the announcement and final release and update notes.
By my observations sometimes there are packages available on the 
download servers via the "last stable" folders such as 
https://download.ceph.com/debian-quincy/ quite some time before the 
announcement of a release is out.
I know it's hard to time this right with mirrors requiring some time to 
sync files, but would be nice to not see the packages or have people 
install them before there are the release notes and potential pointers 
to changes out.



2) Also, in cases such as the 17.2.4 release containing a regression, it 
would be great to have both the N and the N-1 release there, to allow users to 
downgrade to the previous point release quickly in case they run into issues.
Otherwise one needs to configure the N-1 repo manually to still have 
access to the N-1 release.


And with these just being links in the filesystem, this should not even 
take up much space on the download servers or their mirrors.
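To illustrate, this is roughly what one has to do manually today on Debian/Ubuntu to stay on or go back to a particular point release - assuming the versioned directories (17.2.3 is just an example here) remain available on the download servers:

    # /etc/apt/sources.list.d/ceph.list
    deb https://download.ceph.com/debian-17.2.3/ bullseye main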




Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Mirror de.ceph.com broken?

2022-10-20 Thread Christian Rohmann

Hey ceph-users,

it seems that the German ceph mirror http://de.ceph.com/ listed at 
https://docs.ceph.com/en/latest/install/mirrors/#locations 
does not hold any data.

The index page shows some plesk default page and also deeper links like 
http://de.ceph.com/debian-17.2.4/ return 404.



Regards

Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Status of Quincy 17.2.5 ?

2022-10-19 Thread Christian Rohmann

On 19/10/2022 11:26, Chris Palmer wrote:
I've noticed that packages for Quincy 17.2.5 appeared in the debian 11 
repo a few days ago. However I haven't seen any mention of it 
anywhere, can't find any release notes, and the documentation still 
shows 17.2.4 as the latest version.


Is 17.2.5 documented and ready for use yet? It's a bit risky having it 
sitting undocumented in the repo for any length of time when it might 
inadvertently be applied when doing routine patching... (I spotted it, 
but one day someone might not).


I believe the upload of a new release to the repo prior to the 
announcement happens quite regularly - it might just be due to the 
technical process of releasing.
But I agree it would be nice to have a more "bit flip" approach to new 
releases in the repo and not have the packages appear as updates prior 
to the announcement and final release and update notes.



Regards

Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-13 Thread Christian Rohmann

Hey Boris,

On 07/10/2022 11:30, Boris Behrens wrote:

I just wanted to reshard a bucket but mistyped the amount of shards. In a
reflex I hit ctrl-c and waited. It looked like the resharding did not
finish so I canceled it, and now the bucket is in this state.
How can I fix it. It does not show up in the stale-instace list. It's also
a multisite environment (we only sync metadata).
I believe resharding is not supported with rgw multisite 
(https://docs.ceph.com/en/latest/radosgw/dynamicresharding/#multisite)
but is being worked on / implemented for the Quincy release, see 
https://tracker.ceph.com/projects/rgw/issues?query_id=247
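To inspect what the cluster still thinks is going on reshard-wise, these commands might help (bucket name is a placeholder):

    radosgw-admin reshard list
    radosgw-admin reshard status --bucket=mybucket
    radosgw-admin reshard cancel --bucket=mybucket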


But you are not syncing the data in your deployment? Maybe that's a 
different case then?




Regards

Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW multisite Cloud Sync module with support for client side encryption?

2022-09-12 Thread Christian Rohmann

Hello Ceph-Users,

I have a question regarding support for any client side encryption in 
the Cloud Sync Module for RGW 
(https://docs.ceph.com/en/latest/radosgw/cloud-sync-module/).


While a "regular" multi-site setup 
(https://docs.ceph.com/en/latest/radosgw/multisite/) is usually syncing 
data between Ceph clusters, RGWs and other supporting
infrastructure in the same administrative domain this might be different 
when looking at cloud sync.
One could setup a sync to e.g. AWS S3 or any other compatible S3 
implementation that is provided as a service and by another provider.


1) I was wondering if there is any transparent way to apply client side 
encryption to those objects that are sent to the remote service?
Even something like a single static key (see 
https://github.com/ceph/ceph/blob/1c9e84a447bb628f2235134f8d54928f7d6b7796/doc/radosgw/encryption.rst#automatic-encryption-for-testing-only) 
would protect against the remote provider being able to look at the data.



2) What happens to objects that are encrypted on the source RGW and via 
SSE-S3? (https://docs.ceph.com/en/quincy/radosgw/encryption/#sse-s3)
I suppose those remain encrypted? But this does require users to 
actively make use of SSE-S3, right?
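To be explicit about what I mean by SSE-S3 here: uploads where the client merely asks for server-side encryption and RGW handles the keys, e.g. (bucket and key are placeholders):

    aws s3api put-object --bucket mybucket --key secret.txt \
        --body secret.txt --server-side-encryption AES256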




Thanks again with kind regards,


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to clear "Too many repaired reads on 1 OSDs" on pacific

2022-03-01 Thread Christian Rohmann

On 28/02/2022 20:54, Sascha Vogt wrote:
Is there a way to clear the error counter on pacific? If so, how? 


No, no anymore. See https://tracker.ceph.com/issues/54182


Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2022-02-10 Thread Christian Rohmann

Hey Stefan,

thanks for getting back to me!


On 10/02/2022 10:05, Stefan Schueffler wrote:

since my last mail in Dezember, we changed our ceph-setuo like this:

we added one SSD osd on each ceph host (which were pure HDD before). Then, we moved 
the problematic pool "de-dus5.rgw.buckets.index“ to those dedicated SSDs (by 
adding a corresponding crush map).

Since then, no further PG corruptions occurred.

This now has a two sided result:

on the one side, we now do not observe the problematic behavior anymore,

on the other side, this means, by using just spinning HDDs something is buggy 
with ceph. If the HDD can not fulfill the data IO requirements, then it 
probably should not lead to data/PG corruption…
And, just a blind guess, we only have a few IO requests in our RGW gateway per 
second - even with spinning HDDs there should not be a problem to store / 
update the index pool.

I would guess that it correlates with our setup having 7001 shards in the 
problematic bucket, and the implementation of „multisite“ feature, which will 
do 7001 „status“ requests per second to check and synchronize between the 
different rgw sites. And _this_ amount of random IO can not be satisfied by 
utilizing HDDs…
Anyway it should not lead to corrupted PGs.



We also have a multi-site setup, with one HDD-only cluster and one 
cluster (the primary) with NVMe SSDs for the OSD journaling.
There are more inconsistencies on the HDD-only cluster, but we do 
observe those on the other cluster as well.


If you follow the issue at https://tracker.ceph.com/issues/53663 there 
is even another user (Dieter Roels) observing this issue now.
He is talking about RADOSGW crashes potentially causing the 
inconsistencies. We already guessed it could be rolling restarts. But we 
cannot put our finger on it yet.


And yes, no amount of IO contention should ever cause data corruption.
In this case I believe there might be a correlation to the multisite 
feature hitting OMAP and stored metadata much harder than with regular 
RADOSGW usage.
And if there is a race condition or missing lock /semaphore or something 
along this line, this certainly is affected by the latency on the 
underlying storage.




Could you maybe trigger a manual deep-scrub on all your OSDs, just to 
see if that does anything?
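Something along these lines should do as a sketch - it simply instructs every OSD to deep-scrub the PGs it is primary for:

    for osd in $(ceph osd ls); do ceph osd deep-scrub "${osd}"; done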





Thanks again for keeping in touch!
Regards


Christian






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2022-02-08 Thread Christian Rohmann

Hey there again,

there now was a question from Neha Ojha in 
https://tracker.ceph.com/issues/53663
about providing OSD debug logs for a manual deep-scrub on (inconsistent) 
PGs.


I did provide the logs of two of those deep-scrubs via ceph-post-file 
already.


But since data inconsistencies are the worst kind of bug, and their occurrence 
is somewhat unpredictable, we likely need more evidence to have a chance to 
narrow this down. And since you seem to observe something similar, could you 
maybe gather and post debug info about them to the ticket as well?


Regards

Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2022-02-07 Thread Christian Rohmann

Hello Ceph-Users!

On 22/12/2021 00:38, Stefan Schueffler wrote:

The other Problem, regarding the OSD scrub errors, we have this:

ceph health detail shows „PG_DAMAGED: Possible data damage: x pgs 
inconsistent.“
Every now and then new pgs get inconsistent. All inconsistent pgs 
belong to the buckets-index-pool de-dus5.rgw.buckets.index


ceph health detail
pg 136.1 is active+clean+inconsistent, acting [8,3,0]

rados -p de-dus5.rgw.buckets.index list-inconsistent-obj 136.1
No scrub information available for pg 136.1
error 2: (2) No such file or directory

rados list-inconsistent-obj 136.1
No scrub information available for pg 136.1
error 2: (2) No such file or directory

ceph pg deep-scrub 136.1
instructing pg 136.1 on osd.8 to deep-scrub

… until now nothing changed, the list-inconsistent-obj does not show 
any information (did i miss some cli arguments?)


Ususally, we simply do a
ceph pg repair 136.1
which most of the time silently does whatever it is supposed to do, 
and the error disappears. Shortly after, it reappears at random, with 
some other (or the same) pg out of the rgw.buckets.index - pool…


Strange you don't see any actual inconsistent objects ...



1) For me the first step is usually looking at which pool actually has 
inconsistencies, e.g. via:


$ for pool in $(rados lspools); do echo "${pool} $(rados list-inconsistent-pg ${pool})"; done


 device_health_metrics []
 .rgw.root []
 zone.rgw.control []
 zone.rgw.meta []
 zone.rgw.log ["5.3","5.5","5.a","5.b","5.10","5.11","5.19","5.1a","5.1d","5.1e"]
 zone.rgw.otp []
 zone.rgw.buckets.index ["7.4","7.5","7.6","7.9","7.b","7.11","7.13","7.14","7.18","7.1e"]
 zone.rgw.buckets.data []
 zone.rgw.buckets.non-ec []

(This is from now) and you can see how only metadata pools are actually 
affected.



2) I then simply looped over those pgs with "rados list-inconsistent-obj 
$pg" - this is the object name, errors and last_reqid for each inconsistent object:



 "data_log.14","omap_digest_mismatch","client.4349063.0:12045734"
 "data_log.59","omap_digest_mismatch","client.4364800.0:11773451"
 "data_log.30","omap_digest_mismatch","client.4349063.0:10935030"
 "data_log.42","omap_digest_mismatch","client.4348139.0:112695680"
 "data_log.63","omap_digest_mismatch","client.4348139.0:116876563"
 "data_log.44","omap_digest_mismatch","client.4349063.0:11358410"
 "data_log.11","omap_digest_mismatch","client.4349063.0:10259566"
 "data_log.61","omap_digest_mismatch","client.4349063.0:10259594"
 "data_log.28","omap_digest_mismatch","client.4349063.0:11358396"
 "data_log.39","omap_digest_mismatch","client.4349063.0:11364174"
 "data_log.55","omap_digest_mismatch","client.4349063.0:11358415"
 "data_log.15","omap_digest_mismatch","client.4364800.0:9518143"
 "data_log.27","omap_digest_mismatch","client.4349063.0:11473205"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.6","omap_digest_mismatch","client.4349063.0:11274164"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.1","omap_digest_mismatch","client.4349063.0:12168097"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.10","omap_digest_mismatch","client.4348139.0:112993744"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2202949.678.0","omap_digest_mismatch","client.4349063.0:10289913"
 
".dir.9cba42a3-dd1c-46d4-bdd2-ef634d12c0a5.56337947.1562","omap_digest_mismatch","client.4364800.0:10934595"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.9","omap_digest_mismatch","client.4349063.0:10431941"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.0","omap_digest_mismatch","client.4349063.0:10431932"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2202949.678.10","omap_digest_mismatch","client.4349063.0:10460106"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.8","omap_digest_mismatch","client.4349063.0:11696943"
 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.0","omap_digest_mismatch","client.4349063.0:9845513"
 
".dir.9cba42a3-dd1c-46d4-bdd2-ef634d12c0a5.61963196.333.1","omap_digest_mismatch","client.4364800.0:9593089"


As you can see, it's always some omap data that suffers from 
inconsistencies.
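(In case anyone wants to reproduce that listing: it is essentially a loop like the following over the inconsistent pgs from above - the jq filter is just a sketch of how I pulled out those three fields.)

    for pg in 5.3 5.5 7.4 7.5; do
        rados list-inconsistent-obj "${pg}" --format=json | \
            jq -r '.inconsistents[] | [.object.name, (.errors | join(",")), .selected_object_info.last_reqid] | @csv'
    done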





Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] pg_autoscaler using uncompressed bytes as pool current total_bytes triggering false POOL_TARGET_SIZE_BYTES_OVERCOMMITTED warnings?

2022-02-02 Thread Christian Rohmann

Hey ceph-users,


I am debugging a mgr pg_autoscaler WARN which states a target_size_bytes 
on a pool would overcommit the available storage.
There is only one pool with a value for target_size_bytes (=5T) defined, 
and that apparently would consume more than the available storage:


--- cut ---
# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes
[WRN] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED: 1 subtrees have 
overcommitted pool target_size_bytes
    Pools ['backups', 'images', 'device_health_metrics', '.rgw.root', 
'redacted.rgw.control', 'redacted.rgw.meta', 'redacted.rgw.log', 
'redacted.rgw.otp', 'redacted.rgw.buckets.index', 
'redacted.rgw.buckets.data', 'redacted.rgw.buckets.non-ec'] overcommit 
available storage by 1.011x due to target_size_bytes 15.0T on pools 
['redacted.rgw.buckets.data'].

--- cut ---


But then looking at the actual usage it seems strange that 15T (5T * 3 
replicas) should not fit onto the remaining 122 TiB AVAIL:



--- cut ---
# ceph df detail
--- RAW STORAGE ---
CLASS  SIZE AVAIL    USED RAW USED  %RAW USED
hdd    293 TiB  122 TiB  171 TiB   171 TiB  58.44
TOTAL  293 TiB  122 TiB  171 TiB   171 TiB  58.44

--- POOLS ---
POOL                         ID  PGS   STORED   (DATA)   (OMAP)   OBJECTS  USED     (DATA)   (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
backups                       1  1024  92 TiB   92 TiB   3.8 MiB  28.11M   156 TiB  156 TiB  11 MiB   64.77  28 TiB     N/A            N/A          N/A    39 TiB      123 TiB
images                        2    64  1.7 TiB  1.7 TiB  249 KiB  471.72k  5.2 TiB  5.2 TiB  748 KiB   5.81  28 TiB     N/A            N/A          N/A    0 B         0 B
device_health_metrics        19     1  82 MiB   0 B      82 MiB   43       245 MiB  0 B      245 MiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
.rgw.root                    21    32  23 KiB   23 KiB   0 B      25       4.1 MiB  4.1 MiB  0 B          0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.control         22    32  0 B      0 B      0 B      8        0 B      0 B      0 B          0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.meta            23    32  1.7 MiB  394 KiB  1.3 MiB  1.38k    237 MiB  233 MiB  3.9 MiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.log             24    32  53 MiB   500 KiB  53 MiB   7.60k    204 MiB  47 MiB   158 MiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.otp             25    32  5.2 KiB  0 B      5.2 KiB  0        16 KiB   0 B      16 KiB       0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.buckets.index   26    32  1.2 GiB  0 B      1.2 GiB  7.46k    3.5 GiB  0 B      3.5 GiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.buckets.data    27   128  3.1 TiB  3.1 TiB  0 B      3.53M    9.5 TiB  9.5 TiB  0 B      10.11  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.buckets.non-ec  28    32  0 B      0 B      0 B      0        0 B      0 B      0 B          0  28 TiB     N/A            N/A          N/A    0 B         0 B

--- cut ---


I then looked at how those values are determined at 
https://github.com/ceph/ceph/blob/9f723519257eca039126a20aa6a2a7d2dbfb5dba/src/pybind/mgr/pg_autoscaler/module.py#L509.
Apparently "total_bytes" are compared with the capacity of the root_map. 
I added a debug line and found that the total in my cluster was already at:


  total=325511007759696

so in excess of 300 TiB. Looking at "ceph df" again, this usage seems 
strange.
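(For anyone following along: the numbers the autoscaler works with can also be cross-checked with the command below - just a pointer, the output columns differ a bit between releases.)

    ceph osd pool autoscale-status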




Looking at how this total is calculated at 
https://github.com/ceph/ceph/blob/9f723519257eca039126a20aa6a2a7d2dbfb5dba/src/pybind/mgr/pg_autoscaler/module.py#L441,
you see that the larger value (max) of "actual_raw_used" vs. 
"target_bytes*raw_used_rate" is determined and then summed up.



I dumped the values for all pools in my cluster with yet another line of 
debug code:


---cut ---
pool_id 1 - actual_raw_used=303160109187420.0, target_bytes=0 raw_used_rate=3.0
pool_id 2 - actual_raw_used=5714098884702.0, target_bytes=0 raw_used_rate=3.0
pool_id 19 - actual_raw_used=256550760.0, target_bytes=0 raw_used_rate=3.0
pool_id 21 - actual_raw_used=71433.0, target_bytes=0 raw_used_rate=3.0
pool_id 22 - actual_raw_used=0.0, target_bytes=0 raw_used_rate=3.0
pool_id 23 - actual_raw_used=5262798.0, target_bytes=0 raw_used_rate=3.0
pool_id 24 - actual_raw_used=162299940.0, target_bytes=0 raw_used_rate=3.0
pool_id 25 - actual_raw_used=16083.0, target_bytes=0 raw_used_rate=3.0
pool_id 26 - actual_raw_used=3728679936.0, target_bytes=0 raw_used_rate=3.0
pool_id 27 - actual_raw_used=10035209699328.0, target_bytes=54975581388

[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-21 Thread Christian Rohmann

Thanks for your response Stefan,

On 21/12/2021 10:07, Stefan Schueffler wrote:

Even without adding a lot of rgw objects (only a few PUTs per minute), we have 
thousands and thousands of rgw bucket.sync log entries in the rgw log pool 
(this seems to be a separate problem), and as such we accumulate „large omap 
objects“ over time.


Since you are doing RADOSGW as well, those OMAP objects are usually 
bucket index files 
(https://docs.ceph.com/en/latest/rados/operations/health-checks/#large-omap-objects 
). 
Since there is no dynamic resharding 
(https://docs.ceph.com/en/latest/radosgw/dynamicresharding/#rgw-dynamic-bucket-index-resharding) 
until Quincy 
(https://tracker.ceph.com/projects/rgw/issues?utf8=%E2%9C%93&set_filter=1&f%5B%5D=cf_3&op%5Bcf_3%5D=%3D&v%5Bcf_3%5D%5B%5D=multisite-reshard&f%5B%5D=&c%5B%5D=project&c%5B%5D=tracker&c%5B%5D=status&c%5B%5D=priority&c%5B%5D=subject&c%5B%5D=assigned_to&c%5B%5D=updated_on&c%5B%5D=category&c%5B%5D=fixed_version&c%5B%5D=cf_3&group_by=&t%5B%5D=) 
you need to have enough shards created for each bucket by default.


At about 200k objects (~ keys) per shard you should otherwise receive this 
warning (it used to be 2 million, see 
https://github.com/ceph/ceph/pull/29175/files).




we also face the same or at least a very similar  problem. We are running 
pacific (16.2.6 and 16.2.7, upgraded from 16.2.x to y to z) on both sides of 
the rgw multisite. In our case, the scrub errors occur on the secondary side 
only

Regarding your scrub errors: do you still have those coming up at random?
Could you check with "list-inconsistent-obj" whether yours are within the 
OMAP data and in the metadata pools only?





Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-21 Thread Christian Rohmann

Hello Eugen,

On 20/12/2021 22:02, Eugen Block wrote:
you wrote that this cluster was initially installed with Octopus, so 
no upgrade ceph wise? Are all RGW daemons on the exact same ceph 
(minor) versions?
I remember one of our customers reporting inconsistent objects on a 
regular basis although no hardware issues were detectable. They 
replicate between two sites, too. A couple of months ago both sites 
were updated to the same exact ceph minor version (also Octopus), they 
haven't faced inconsistencies since then. I don't have details about 
the ceph version(s) though, only that both sites were initially 
installed with Octopus. Maybe it's worth checking your versions? 



Yes, everything has the same version:


{
    [...]
    "overall": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 34
    }
}

I just observed another 3 scrub errors. Strangely they never seem to occur 
on the same pgs again.
I shall run another deep scrub on those OSDs to narrow this down.




But I am somewhat suspecting this to be a potential issue with the OMAP 
validation part of the scrubbing.
For RADOSGW there are large OMAP structures with lots of movement. And 
the issues are only with the metadata pools.





Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-20 Thread Christian Rohmann

Hello Ceph-Users,

for about 3 weeks now I see batches of scrub errors on a 4 node Octopus 
cluster:


# ceph health detail
HEALTH_ERR 7 scrub errors; Possible data damage: 6 pgs inconsistent
[ERR] OSD_SCRUB_ERRORS: 7 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 6 pgs inconsistent
    pg 5.3 is active+clean+inconsistent, acting [9,12,6]
    pg 5.4 is active+clean+inconsistent, acting [15,17,18]
    pg 7.2 is active+clean+inconsistent, acting [13,15,10]
    pg 7.9 is active+clean+inconsistent, acting [5,19,4]
    pg 7.e is active+clean+inconsistent, acting [1,15,20]
    pg 7.18 is active+clean+inconsistent, acting [5,10,0]


this cluster only serves RADOSGW and it's a multisite master.

I already found another thread 
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LXMQSRNSCPS5YJMFXIS3K5NMROHZKDJU/), 
but with no recent comments about such an issue.


In my case I am still seeing more scrub errors every few days. All those 
inconsistencies are "omap_digest_mismatch" in the "zone.rgw.log" or 
"zone.rgw.buckets.index" pool and are spread all across nodes and OSDs.


I already raised a bug ticket (https://tracker.ceph.com/issues/53663), 
but am wondering if any of you have ever observed something similar?
Traffic to and from the object storage seems totally fine and I can even 
run a manual deep-scrub with no errors and then receive 3-4 errors the 
next day.



Is there anything I could look into / collect when the next 
inconsistency occurs?

Could there be any misconfiguration causing this?
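What I could offer to collect is something along these lines (a sketch - pg 7.9 and its primary osd.5 just taken from the list above):

    ceph config set osd.5 debug_osd 20
    ceph pg deep-scrub 7.9
    rados list-inconsistent-obj 7.9 --format=json-pretty
    ceph config rm osd.5 debug_osd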


Thanks and with kind regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: inconsistent pg after upgrade nautilus to octopus

2021-12-19 Thread Christian Rohmann

Hello Tomasz,


I observe a strange accumulation of inconsistencies for an RGW-only 
(+multisite) setup, with errors just like those you reported.
I collected some info and raised a bug ticket:  
https://tracker.ceph.com/issues/53663
Two more inconsistencies have just shown up hours after repairing the 
other, adding to the theory of something really odd going on.




Did you upgrade to Octopus in the end then? Any more issues with such 
inconsistencies on your side Tomasz?




Regards

Christian



On 20/10/2021 10:33, Tomasz Płaza wrote:
As the upgrade process states, rgw are the last one to be upgraded, so 
they are still on nautilus (centos7). Those logs showed up after 
upgrade of the first osd host. It is a multisite setup so I am a 
little afraid of upgrading rgw now.


Etienne:

Sorry for answering in this thread, but somehow I do not get messages 
directed only to ceph-users list. I did "rados list-inconsistent-pg" 
and got many entries like:


{
  "object": {
    "name": ".dir.99a07ed8-2112-429b-9f94-81383220a95b.7104621.23.7",
    "nspace": "",
    "locator": "",
    "snap": "head",
    "version": 82561410
  },
  "errors": [
    "omap_digest_mismatch"
  ],
  "union_shard_errors": [],
  "selected_object_info": {
    "oid": {
  "oid": ".dir.99a07ed8-2112-429b-9f94-81383220a95b.7104621.23.7",
  "key": "",
  "snapid": -2,
  "hash": 3316145293,
  "max": 0,
  "pool": 230,
  "namespace": ""
    },
    "version": "107760'82561410",
    "prior_version": "106468'82554595",
    "last_reqid": "client.392341383.0:2027385771",
    "user_version": 82561410,
    "size": 0,
    "mtime": "2021-10-19T16:32:25.699134+0200",
    "local_mtime": "2021-10-19T16:32:25.699073+0200",
    "lost": 0,
    "flags": [
  "dirty",
  "omap",
  "data_digest"
    ],
    "truncate_seq": 0,
    "truncate_size": 0,
    "data_digest": "0x",
    "omap_digest": "0x",
    "expected_object_size": 0,
    "expected_write_size": 0,
    "alloc_hint_flags": 0,
    "manifest": {
  "type": 0
    },
    "watchers": {}
  },
  "shards": [
    {
  "osd": 56,
  "primary": true,
  "errors": [],
  "size": 0,
  "omap_digest": "0xf4cf0e1c",
  "data_digest": "0x"
    },
    {
  "osd": 58,
  "primary": false,
  "errors": [],
  "size": 0,
  "omap_digest": "0xf4cf0e1c",
  "data_digest": "0x"
    },
    {
  "osd": 62,
  "primary": false,
  "errors": [],
  "size": 0,
  "omap_digest": "0x4bd5703a",
  "data_digest": "0x"
    }
  ]
}


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Metrics for object sizes

2021-10-14 Thread Christian Rohmann

On 23/04/2021 03:53, Szabo, Istvan (Agoda) wrote:

Objects inside RGW buckets like in couch base software they have their own 
metrics and has this information.


Not as detailed as you would like, but how about using the bucket stats 
on bucket size and number of objects?

 $ radosgw-admin bucket stats --bucket mybucket


Doing bucket_size / number_of_objects gives you an average object size 
per bucket, and that certainly is an indication of 
buckets with rather small objects.
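A rough sketch of how to compute that for a single bucket (the field names under "usage" -> "rgw.main" are from memory, so treat them as an assumption):

    radosgw-admin bucket stats --bucket=mybucket | \
        jq '.usage["rgw.main"] | if .num_objects > 0 then (.size_actual / .num_objects) else 0 end'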



Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multisite reshard stale instances

2021-10-04 Thread Christian Rohmann

On 04/10/2021 12:22, Christian Rohmann wrote:
So there is no reason those instances are still kept? How and when are 
those instances cleared up?
Also just like for the other reporters of this issue, in my case most 
buckets are deleted buckets, but not all of them.



I just hope somebody with a little more insight on the mechanisms at 
play here
joins this conversation. 


Apparently this is a known issue (https://tracker.ceph.com/issues/20802), 
but it does not cause any problems.

If I may bluntly quote Casey Bodley from our conversation on IRC:

17:20 < cbodley>  no crohmann, nothing cleans them up. 
https://tracker.ceph.com/issues/20802
17:27 < crohmann> A thanks cbodley for the pointer. Are there any 
side-effects of this to expect? Storage wise it's only a few kB for 
the bucket.instance I
  suppose. But what happens if a user creates another 
bucket with the same name at some point in the future?
17:28 < cbodley> new buckets with the same name don't conflict, they 
generate a different bucket instance





Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multisite reshard stale instances

2021-10-04 Thread Christian Rohmann

Hey there again,

On 01/10/2021 17:35, Szabo, Istvan (Agoda) wrote:

In my setup I've disabled the sharding and preshard each bucket which needs 
more then 1.1 millions of objects.


I also use 11 shards as default, see my ML post 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/UFXPAINBV3DQXABSPY5XLMYFA3UGF5LF/#OK7XMNRFHTF3EQU6SAWPLKEVGVNV4XET




I don't think it's possible to cleanup, even if you run the command with the 
really-really mean it, it will not do anything, I've tried already.


Searching a little more through older ML posts it appears we are not the 
only ones, and also that those "stale" instances are to be expected when 
deleting buckets in a multisite setup:


 * 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033575.html
 * 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7CQZY6D2HLPLZAWKQPT4D74WLQ6GE3U5/#ZLAFDLS4MKOUPAIWRY73IBYJCVFYMECB


But even after running "data sync init" again I still see stale 
instances, even though both metadata and data are "caught up" on both sites.
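For completeness, the commands in question, so others can compare (zone name is a placeholder):

    radosgw-admin data sync init --source-zone=the-other-zone
    radosgw-admin reshard stale-instances list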


So there is no reason those instances are still kept? How and when are 
those instances cleared up?
Also just like for the other reporters of this issue, in my case most 
buckets are deleted buckets, but not all of them.



I just hope somebody with a little more insight on the mechanisms at 
play here

joins this conversation.


Regards


Christian





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multisite reshard stale instances

2021-10-01 Thread Christian Rohmann

On 01/10/2021 17:00, Szabo, Istvan (Agoda) wrote:

I just left it and I stopped using synchronous multisite replication. I'm only 
using directional for a while which is working properly.


So you did set up a sync policy to only sync in one direction?

In my setup the secondary site does not receive any writes anyway, so 
it's likely not about changes that happened to the same bucket in both sites.
Somehow I ended up with a few stale instances in both sites though - some 
even for buckets which don't exist anymore - and the lists are not the same. 
The lists don't appear to be growing, but still, I'd like to clean those up.


I did not explicitly disable dynamic resharding in ceph.conf until 
recently - but the question is whether this was even necessary, since RGW does 
recognize when it's running with multisite sync.



Regards

Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multisite reshard stale instances

2021-10-01 Thread Christian Rohmann

Hey Istvan,

On 05/02/2021 03:00, Szabo, Istvan (Agoda) wrote:

I found 6-700 stale instances with the reshard stale instances list command.
Is there a way to clean it up (or actually should I clean it up)?
The stale instance rm doesn't work in multisite.


I observe a similar issue, with some stale instances on the master as well as 
the secondary site, after migrating from a single site to multisite.



Did you ever find out what to do about those stale instances then?


Regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bucket_index_max_shards vs. no resharding in multisite? How to brace RADOS for huge buckets

2021-09-30 Thread Christian Rohmann

On 30/09/2021 17:02, Christian Rohmann wrote:


Looking at my zones I can see that the master zone (converted from the 
previously single-site setup) has



 bucket_index_max_shards=0


while the other, secondary zone has

 bucket_index_max_shards=11


Should I align this and use "11" as the default static number of 
shards for all new buckets then?
Maybe an even higher (prime) number just to be safe?



Reading 
https://docs.ceph.com/en/octopus/install/ceph-deploy/install-ceph-gateway/#configure-bucket-sharding 
again,
it seems there are some instructions on editing the zonegroup JSON to 
set bucket_index_max_shards to something sensible.


Unfortunately there is no word about this in the multisite conversion 
section 
(https://docs.ceph.com/en/octopus/radosgw/multisite/#migrating-a-single-site-system-to-multi-site) 
- maybe adding it there would be sensible, to ensure folks converting to a 
multisite setup don't end up with huge unsharded bucket indices which then 
can neither be resharded manually nor automatically.
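For anyone else stumbling over this after a conversion, the gist of those instructions is roughly (assuming the default zonegroup):

    radosgw-admin zonegroup get > zonegroup.json
    # edit zonegroup.json and set "bucket_index_max_shards" to e.g. 11
    radosgw-admin zonegroup set --infile zonegroup.json
    radosgw-admin period update --commit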



Regards

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] bucket_index_max_shards vs. no resharding in multisite? How to brace RADOS for huge buckets

2021-09-30 Thread Christian Rohmann

Hello Ceph-Users,

I just switched from a single to a multi-site setup with all sorts of 
bucket sizes and large differences in the number of stored objects.


Usually resharding is handled by RADOSGW automagically whenever a 
certain object count per shard is reached, 100k by default.

The functionality is nicely documented at:

  https://docs.ceph.com/en/octopus/radosgw/dynamicresharding/

Also mentioned there is that dynamic resharding is NOT possible in 
multisite environments.
Apparently there are efforts to implement (?) multisite-resharding with 
Ceph 17 (Quincy) ...


https://tracker.ceph.com/projects/rgw/issues?utf8=%E2%9C%93&set_filter=1&f%5B%5D=cf_3&op%5Bcf_3%5D=%3D&v%5Bcf_3%5D%5B%5D=multisite-reshard&f%5B%5D=&c%5B%5D=project&c%5B%5D=tracker&c%5B%5D=status&c%5B%5D=priority&c%5B%5D=subject&c%5B%5D=assigned_to&c%5B%5D=updated_on&c%5B%5D=category&c%5B%5D=fixed_version&c%5B%5D=cf_3&group_by=&t%5B%5D=


But how should I - or how do you - handle ever-growing buckets in the meantime?

While a larger number of index shards for all (new) buckets might come 
to mind and would avoid the described issue of too many objects per shard for 
a while, this also has long-known issues and downsides:


 * http://cephnotes.ksperis.com/blog/2015/05/12/radosgw-big-index

Looking at my zones I can see that the master zone (converted from the 
previously single-site setup) has



 bucket_index_max_shards=0


while the other, secondary zone has

 bucket_index_max_shards=11


Should I align this and use "11" as the default static number of shards 
for all new buckets then?

Maybe an even higher (prime) number just to be safe?



Regards

Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Suspicious newsletter] Re: create a Multi-zone-group sync setup

2021-08-18 Thread Christian Rohmann

Hey Boris,

On 18/08/2021 08:49, Boris Behrens wrote:

I've set up realm,first zonegroup with the zone and a sync user in the
master setup, and commited.
Then I've pulled the periode on the 2nd setup and added a 2nd zonegroup
with a zone and commited.

Now I can create users in the master setup, but not in the 2nd (as it
doesn't sync back). But I am not able to create a bucket or so with the
credentials of the users I created.


Did you define one of the zonegroups as the master then? Not just the zone 
in there, but the actual zonegroup?
The metadata is always written to the master zone of the master 
zonegroup (as far as I understood the concept), and since you are 
running two zonegroups, you need to set one of them as master.
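In case it helps, a sketch of what I mean (zonegroup name is a placeholder):

    radosgw-admin zonegroup modify --rgw-zonegroup=the-master-zonegroup --master --default
    radosgw-admin period update --commit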

I am looking into a multi-site setup myself - but with synced data (two 
zones in one zonegroup) and am wondering about using a
common hostname for users to access the s3 API which I can then move to 
point to the master
(the thread on the ML is at 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/UQSASRZAA4NONIZWFRFVFQFYSJBCOCNK/).


I suppose in your case you don't mind having dedicated hostnames for 
each site and for users to select one of them?



Regards


Christian





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multiple DNS names for RGW?

2021-08-17 Thread Christian Rohmann

On 17/08/2021 13:37, Janne Johansson wrote:

Don't forget that v4 auth bakes in the clients idea of what the
hostname of the endpoint was, so its not only about changing headers.
If you are not using v2 auth, you will not be able to rewrite the
hostname on the fly.


Thanks for the heads up in this regard.


How would one achieve the idea of having two distinct sites, i.e.

* s3-az1.example.com
* s3-az2.example.com

each having their own rgw_dns_name set and doing multi-site sync, but 
also having a generic hostname, s3.example.com,

that I can simply reconfigure to point to the master?

From what you said I read that I cannot:

a) use an additional rgw_dns_name, as only one can be configured (right?)
b) simply rewrite the hostname from the frontend-proxy / LB to the 
backends, as this will invalidate the sigv4 signatures the clients compute?





Regards


Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multiple DNS names for RGW?

2021-08-17 Thread Christian Rohmann

Hey Burkhard, Chris, all,

On 16/08/2021 10:48, Chris Palmer wrote:
It's straightforward to add multiple DNS names to an endpoint. We do 
this for the sort of reasons you suggest. You then don't need separate 
rgw instances (not for this reason anyway).


Assuming default:

 * radosgw-admin zonegroup get > zg-default
 * Edit zg-default, changing "hostnames" to e.g.  ["host1",
   "host1.domain", "host2", "host2.domain"]
 * radosgw-admin zonegroup set --infile zg-default
 * Restart all rgw instances

Please excuse my confusion, but how does this relate to the endpoints of 
the zonegroup and zones then?

What does setting endpoints (or even hostnames) on those actually do?


If I may split my confusion up into some questions:


1) From what I understand, a zone has endpoints to identify how 
it can be reached by other RGWs, to enable communication for multisite sync.

So having

  * s3-az1.example.com (zone "az1")
  * s3-az2.example.com (zone "az2")

as endpoints in each zone of my two zones allows the two zones to talk 
to each other.


But does this have to match the "rgw dns name" setting on the RGWs in 
each zone then?
Or could I potentially just add all the individual hosts (if they were 
reachable) of my RGW farm to avoid hitting the loadbalancer in front?
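To make question 1) concrete, this is roughly how I set the endpoints on each zone at the moment (a sketch, using the hostnames from above):

    radosgw-admin zone modify --rgw-zone=az1 --endpoints=https://s3-az1.example.com
    radosgw-admin zone modify --rgw-zone=az2 --endpoints=https://s3-az2.example.com
    radosgw-admin period update --commit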



2) How do the endpoints of the whole zone-group relate then? Do I simply 
add all endpoints of all zones?

What are those used for then?


3) How would one go about having a global DNS name that always points 
to the master zone?
Would I just add another "global" or "generic" hostname, let's say 
s3.example.com, to the zonegroup as an endpoint and have the DNS point to 
the LB of the current master zone? The intention would be to avoid 
clients having to update their endpoint in case of a failover.




Thanks and with kind regards


Christian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pgcalc tool removed (or moved?) from ceph.com ?

2021-07-08 Thread Christian Rohmann

On 08/07/2021 09:39, Dominik Csapak wrote:
It's available at https://ceph.com/pgcalc/ just now (with cert not 
matching), but there apparently are people working on migrating the 
whole website 


* ceph.com redirects to https://old.ceph.com/ with matching Let's 
Encrypt certificate
* but https://ceph.com/pgcalc/ is not rewritten to the old.ceph.com 
domain, and thus the certificate error, because the cert is only valid 
for old.ceph.com



Regards

Christian





thanks for the answer :)

i still get a 404 on ceph.com/pgcalc
(and no redirect to old.ceph.com
also no cert mismatch or anything)

but i can see it on https://old.ceph.com/pgcalc
(also no cert error?)

thanks



It's even more complicated as it's different for IPv6 and IPv4 ...


IPv6:


curl -k -v https://ceph.com
*   Trying 2607:5300:201:2000::3:5897:443...
* Connected to ceph.com (2607:5300:201:2000::3:5897) port 443 (#0)

< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Thu, 08 Jul 2021 08:29:16 GMT
< Content-Type: text/html; charset=UTF-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Last-Modified: Tue, 22 Jun 2021 20:30:33 GMT
< Expires: Thu, 08 Jul 2021 09:29:16 GMT
< Cache-Control: max-age=3600
< X-Redirect-By: WordPress
< Location: https://old.ceph.com/




IPv4:


curl -4 -k -v https://ceph.com
*   Trying 8.43.84.140:443...
* Connected to ceph.com (8.43.84.140) port 443 (#0)
< HTTP/1.1 500 Internal Server Error
< Server: nginx
< Date: Thu, 08 Jul 2021 08:30:25 GMT
< Content-Type: text/html
< Content-Length: 170
< Connection: close
<

500 Internal Server Error





Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pgcalc tool removed (or moved?) from ceph.com ?

2021-07-08 Thread Christian Rohmann

Hey Dominik,

On 05/07/2021 09:55, Dominik Csapak wrote:

Hi,

just wanted to ask if it is intentional that

http://ceph.com/pgcalc/

results in a 404 error?

is there any alternative url?
it is still linked from the offical docs.


It's available at https://ceph.com/pgcalc/ just now (with cert not 
matching), but there apparently are people working on migrating the 
whole website 


* ceph.com redirects to https://old.ceph.com/ with matching Let's 
Encrypt certificate
* but https://ceph.com/pgcalc/ is not rewritten to the old.ceph.com 
domain, and thus the certificate error, because the cert is only valid for 
old.ceph.com



Regards

Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RocksDB degradation / manual compaction vs. snaptrim operations choking Ceph to a halt

2021-07-08 Thread Christian Rohmann

Hey Igor,

On 07/07/2021 14:59, Igor Fedotov wrote:
after an upgrade from Ceph Nautilus to Octopus we ran into extreme 
performance issues leading to an unusable cluster
when doing a larger snapshot delete and the cluster doing snaptrims, 
see i.e. https://tracker.ceph.com/issues/50511#note-13.
Since this was not an issue prior to the upgrade, maybe the 
conversion of the OSD to OMAP caused this degradation of the RocksDB 
data structures, maybe not. (We were running bluefs_buffered_io=true, 
so that was NOT the issue here).


It's hard to say what exactly caused the issue this time. Indeed OMAP 
conversion could have some impact since it had performed bulk removal 
along the upgrade process - so DB could gain critical mass to start 
lagging.


But I presume this is a one-time effect - it should vaporize after DB 
compaction. Which doesn't mean that snaptrims or any other bulk 
removals are absolutely safe since then though. 


Thank you very much for your quick and extensive reply!

If OMAP conversion could have this effect, maybe it's sensible to either 
trigger an immediate online compaction at the end of the 
conversion or at least add this to the upgrade notes. I suppose with the 
EoL of Nautilus more and more clusters will now make the jump to the 
Octopus release and convert their OSDs to OMAP in the process. Even if 
not all clusters' RocksDBs would go over the edge, running a 
compaction should not hurt in any case, right?




Thanks again,


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw multisite sync not syncing data, error: RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log shards

2021-07-07 Thread Christian Rohmann

We found the issue causing data not being synced 

On 25/06/2021 18:24, Christian Rohmann wrote:

What is apparently not working is the sync of actual data.

Upon startup the radosgw on the second site shows:


2021-06-25T16:15:06.445+ 7fe71eff5700  1 RGW-SYNC:meta: start
2021-06-25T16:15:06.445+ 7fe71eff5700  1 RGW-SYNC:meta: realm 
epoch=2 period id=f4553d7c-5cc5-4759-9253-9a22b051e736
2021-06-25T16:15:11.525+ 7fe71dff3700  0 
RGW-SYNC:data:sync:init_data_sync_status: ERROR: failed to read 
remote data log shards




also when issuing

# radosgw-admin data sync init --source-zone obst-rgn

it throws

2021-06-25T16:20:29.167+ 7f87c2aec080 0 
RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote 
data log shards



Apparently using HTTPS endpoints does not work for data sync - just 
changing this to plain HTTP got things working.
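Concretely, "changing this" meant pointing the zone endpoints at plain HTTP and committing a new period, roughly like this (hostnames are placeholders):

    radosgw-admin zone modify --rgw-zone=obst-rgn --endpoints=http://rgw.rgn.example.com:8080
    radosgw-admin zone modify --rgw-zone=obst-az1 --endpoints=http://rgw.az1.example.com:8080
    radosgw-admin period update --commit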

See my bug report at

  https://tracker.ceph.com/issues/51538



Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RocksDB degradation / manual compaction vs. snaptrim operations choking Ceph to a halt

2021-07-07 Thread Christian Rohmann

Hello ceph-users,

after an upgrade from Ceph Nautilus to Octopus we ran into extreme 
performance issues leading to an unusable cluster
when doing a larger snapshot delete and the cluster doing snaptrims, see 
i.e. https://tracker.ceph.com/issues/50511#note-13.
Since this was not an issue prior to the upgrade, maybe the conversion 
of the OSD to OMAP caused this degradation of the RocksDB data 
structures, maybe not. (We were running bluefs_buffered_io=true, so that 
was NOT the issue here).


But I've noticed there are a few reports of such issues which boil down 
to RocksDB being in a somewhat degraded state, where running a simple 
compaction fixed those issues, see:


 * https://tracker.ceph.com/issues/50511
 * 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/XSEBOIT43TGIBVIGKC5WAHMB7NSD7D2B/
 * 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/BTWAQIEXBBEGTSTSJ4SK25PEWDEHIAUR/
 * 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Z4ADQFTGC5HMMTCJZW3WHOTNLMU5Q4JR/
 * Maybe also: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/N74C4U2POOSHZGE6ZVKFLVKN3LZ2XEEC/



I know improvements in this regard are actively being worked on for PG 
removal, e.g.


 * https://tracker.ceph.com/issues/47174
 ** https://github.com/ceph/ceph/pull/37314
 ** https://github.com/ceph/ceph/pull/37496

but am wondering if this will help with snaptrims as well?



In any case I was wondering if any of you have also experienced this 
condition with RocksDB, and what you do to monitor for it or to 
actively mitigate it before ending up with flapping OSDs and queued-up 
(snaptrim) operations.
With Ceph Pacific it's possible to enable offline compaction on every 
start of an OSD (osd_compact_on_start), but is this really sufficient then?
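For reference, the kind of compaction I am referring to (a sketch - the offline variant requires the OSD to be stopped first, <id> being the OSD id):

    # online, for all OSDs:
    ceph tell osd.* compact
    # offline, with the OSD stopped:
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact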




Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw multisite sync not syncing data, error: RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log shards

2021-06-27 Thread Christian Rohmann

Hey Dominic,

thanks for your quick response!

On 25/06/2021 19:45, dhils...@performair.com wrote:

Christian;

Do the second site's RGW instance(s) have access to the first site's OSDs?  Is 
the reverse true?

It's been a while since I set up the multi-site sync between our clusters, but I seem to 
remember that, while metadata is exchanged RGW1<-->RGW2, data is exchanged 
OSD1<-->RGW2.

Anyone else on the list, PLEASE correct me if I'm wrong.


I am certain that they cannot communicate - but I am also quite certain 
communication only happens between the RGWs.

Requiring communication between two sites' OSDs is usually not something 
I would expect a multisite setup to need anyway.

I also see that the admin API is queried for the logs of the various 
shards and that it responds with HTTP 200s. But there is no indication of 
why the data does not start to be replicated, and there is still the said 
error.


Maybe I am lacking some RADOS pool or missed preparing some data 
structure? But there is just no clear error telling me what RADOSGW is 
unhappy about.
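
To hopefully get more detail out of the sync machinery I am going to 
raise the debug levels on the second site's radosgw, roughly via

# ceph config set client.rgw.<name> debug_rgw 20
# ceph config set client.rgw.<name> debug_ms 1

(or the corresponding settings in ceph.conf) and then go through the 
sync-related output once more.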



Regards

Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rgw multisite sync not syncing data, error: RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log shards

2021-06-25 Thread Christian Rohmann

Hey ceph-users,


I set up a multisite sync between two freshly installed Octopus clusters.
In the first cluster I created a bucket with some data just to test the 
replication of actual data later.


I then followed the instructions on 
https://docs.ceph.com/en/octopus/radosgw/multisite/#migrating-a-single-site-system-to-multi-site 
to add a second zone.
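
In short, on the second cluster this boiled down to roughly the following 
(endpoint URLs are placeholders and $ACCESS_KEY / $SECRET are the keys of 
the multisite system user):

# radosgw-admin realm pull --url=https://rgw-master.example.org --access-key=$ACCESS_KEY --secret=$SECRET
# radosgw-admin period pull --url=https://rgw-master.example.org --access-key=$ACCESS_KEY --secret=$SECRET
# radosgw-admin zone create --rgw-zonegroup=obst-fra --rgw-zone=obst-az1 --endpoints=https://rgw-second.example.org --access-key=$ACCESS_KEY --secret=$SECRET
# radosgw-admin period update --commit

plus a restart of the radosgw on the second site.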


Things went well; both zones now happily reach each other and the API 
endpoints are talking.
The metadata is also in sync already - both sides are happy, I can see 
the bucket listings and the users are "in sync":




# radosgw-admin sync status
  realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
  zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
   zone 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
  metadata sync no sync (zone is master)
  data sync source: c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
    init
    full sync: 128/128 shards
    full sync: 0 buckets to sync
    incremental sync: 0/128 shards
    data is behind on 128 shards
    behind shards: [0...127]



and on the other side ...


# radosgw-admin sync status
  realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
  zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
   zone c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
  metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is caught up with master
  data sync source: 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
    init
    full sync: 128/128 shards
    full sync: 0 buckets to sync
    incremental sync: 0/128 shards
    data is behind on 128 shards
    behind shards: [0...127]




Also the newly created buckets (read: their metadata) are synced.



What is apparently not working is the sync of the actual data.

Upon startup the radosgw on the second site shows:


2021-06-25T16:15:06.445+ 7fe71eff5700  1 RGW-SYNC:meta: start
2021-06-25T16:15:06.445+ 7fe71eff5700  1 RGW-SYNC:meta: realm 
epoch=2 period id=f4553d7c-5cc5-4759-9253-9a22b051e736
2021-06-25T16:15:11.525+ 7fe71dff3700  0 
RGW-SYNC:data:sync:init_data_sync_status: ERROR: failed to read remote 
data log shards




also when issuing

# radosgw-admin data sync init --source-zone obst-rgn

it throws

2021-06-25T16:20:29.167+ 7f87c2aec080 0 
RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data 
log shards






Does anybody have any hints on where to look for what could be broken here?
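
Would, for example, the output of the following (run on the second site, 
with the source zone as above) help to narrow this down?

# radosgw-admin sync error list
# radosgw-admin data sync status --source-zone=obst-rgn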

Thanks a bunch,
Regards


Christian





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RADOSGW Keystone integration - S3 bucket policies targeting not just other tenants / projects ?

2021-06-16 Thread Christian Rohmann

Hallo Ceph-Users,

I've been wondering about the state of OpenStack Keystone Auth in RADOSGW.


1) Even though the general documentation on RADOSGW S3 bucket policies at 
https://docs.ceph.com/en/latest/radosgw/bucketpolicy/#creation-and-removal 
is a little "misleading" in showing individual users being referenced as 
Principal, the documentation about the Keystone integration at 
https://docs.ceph.com/en/latest/radosgw/keystone/#integrating-with-openstack-keystone 
clearly states that "A Ceph Object Gateway user is mapped into a Keystone 
tenant".


The Keystone authentication code strictly only takes the project of the 
authenticating user into account:


 * 
https://github.com/ceph/ceph/blob/6ce6874bae8fbac8921f0bdfc3931371fc61d4ff/src/rgw/rgw_auth_keystone.cc#L127
 * 
https://github.com/ceph/ceph/blob/6ce6874bae8fbac8921f0bdfc3931371fc61d4ff/src/rgw/rgw_auth_keystone.cc#L515



This is rather unfortunate, as it reduces the usually powerful S3 bucket 
policies to rather basic ones: access can only be granted to all users 
(with a certain role) of a project or - more importantly - to all users 
of another project / tenant, as in using

  arn:aws:iam::$OS_REMOTE_PROJECT_ID:root

as principal.
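
Just to illustrate the point, a minimal sketch of such a policy (bucket 
name and actions are only examples, $OS_REMOTE_PROJECT_ID is the other 
project's UUID), applied e.g. via s3cmd:

# cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam::$OS_REMOTE_PROJECT_ID:root"]},
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
  }]
}
EOF
# s3cmd setpolicy policy.json s3://mybucket

This works, but only ever at project granularity - there is no way to 
narrow it down to a single Keystone user.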


Or am I just misreading something here, or is this really all that can be 
done when using native Keystone auth?




2) There is a PR open implementing generic external authentication 
https://github.com/ceph/ceph/pull/34093


Apparently this would also address the lack of subuser support for 
Keystone - if I understand it correctly, I could then grant access to 
individual users, as in


  arn:aws:iam::$OS_REMOTE_PROJECT_ID:$user


Are there any plans on the roadmap to extend the functionality with 
regard to Keystone as an authentication backend?





I know a similar question has been asked before 
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/GY7VUKCQ5QUMDYSFUJE233FKBRADXRZK/#GY7VUKCQ5QUMDYSFUJE233FKBRADXRZK)

but unfortunately it did not receive any discussion / responses back then.



Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io