[ceph-users] How to replace a disk with minimal impact on performance

2023-12-07 Thread Michal Strnad

Hi guys!

Based on our observation of the impact of the balancer on the 
performance of the entire cluster, we have drawn conclusions that we 
would like to discuss with you.


- A newly created pool should be balanced before being handed over 
to the user. This, I believe, is quite evident.


- When replacing a disk, it is advisable to exchange it directly 
for a new one. As soon as the OSD replacement occurs, the balancer 
should be invoked to realign any PGs that were misplaced during the disk 
outage and recovery.
Perhaps an even better method is to pause recovery and backfilling 
before removing the disk, remove the disk itself, promptly add a new 
one, and then resume recovery and backfilling. It's essential to 
perform all of this as quickly as possible (using a script).
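
A minimal sketch of such a pause/swap/resume script, assuming a hypothetical failed OSD id 12 and replacement device /dev/sdX (adapt to your own deployment and test it before relying on it):

ceph osd set norebalance
ceph osd set nobackfill
ceph osd set norecover
ceph osd out 12
systemctl stop ceph-osd@12        # or: ceph orch daemon stop osd.12
ceph osd destroy 12 --yes-i-really-mean-it
# physically swap the disk, then recreate the OSD reusing the same id
ceph-volume lvm create --osd-id 12 --data /dev/sdX
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset norebalance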

Note: We are using the community balancer developed by Jonas Jelten, because 
the built-in one does not meet our requirements.


What are your thoughts on this?

Michal




[ceph-users] Re: nfs export over RGW issue in Pacific

2023-12-07 Thread Adiga, Anantha
Thank you Adam!!

Anantha

From: Adam King 
Sent: Thursday, December 7, 2023 10:46 AM
To: Adiga, Anantha 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] nfs export over RGW issue in Pacific

The first handling of NFS exports over RGW, including the 
`ceph nfs export create rgw` command, wasn't added to the nfs module in 
Pacific until 16.2.7.



[ceph-users] Re: nfs export over RGW issue in Pacific

2023-12-07 Thread Adam King
The first handling of NFS exports over RGW, including the
`ceph nfs export create rgw` command, wasn't added to the nfs module in
Pacific until 16.2.7.
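
For reference, once the cluster is on 16.2.7 or later, the RGW export can also be created from the CLI. A hedged sketch using the cluster id, bucket and pseudo path from this thread (exact flags can differ per release, so check `ceph nfs export create rgw -h` first):

ceph orch upgrade start --ceph-version 16.2.7
ceph nfs export create rgw --cluster-id nfs-1 --pseudo-path /rgwnfs_cluster_inventory --bucket buc-cluster-inventory
ceph nfs export ls nfs-1 --detailed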



[ceph-users] nfs export over RGW issue in Pacific

2023-12-07 Thread Adiga, Anantha
Hi,


root@a001s016:~# cephadm version

Using recent ceph image 
ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586

ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)

root@a001s016:~#



root@a001s016:~# cephadm shell

Inferring fsid 604d56db-2fab-45db-a9ea-c418f9a8cca8

Inferring config 
/var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s016/config

Using recent ceph image 
ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586



root@a001s016:~# ceph version

ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)

-
But cephadm does not show "nfs export create rgw":


nfs export create cephfs <fsname> <clusterid> <binding> [--readonly] [<path>]

nfs export rm <clusterid> <binding>

nfs export delete <clusterid> <binding>

nfs export ls <clusterid> [--detailed]

nfs export get <clusterid> <binding>

nfs export update

-

However, the Ceph Dashboard allows creating the export; see below:

Access Type RW
Cluster nfs-1
Daemons nfs-1.0.0.zp3110b001a0101.uckows, nfs-1.1.0.zp3110b001a0102.hhpebb, 
nfs-1.2.0.zp3110b001a0103.bbkpcb, nfs-1.3.0.zp3110b001a0104.zujkso
NFS Protocol NFSv4
Object Gateway User admin
Path buc-cluster-inventory
Pseudo /rgwnfs_cluster_inventory
Squash no_root_squash
Storage Backend Object Gateway
Transport TCP
-

While the NFS export with pseudo path "/rgwnfs_cluster_inventory" is created, 
cephadm does not list it:

# ceph nfs export ls nfs-1
[
  "/cluster_inventory",
  "/oob_crashdump"
]
#
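
A quick, non-destructive way to check whether the mgr nfs module sees the dashboard-created export at all, using only subcommands from the help output above (hedged; the result depends on how the dashboard stored the export):

ceph nfs export get nfs-1 /rgwnfs_cluster_inventory
ceph nfs export ls nfs-1 --detailed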

Anantha


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-12-07 Thread Yuri Weinstein
The issue https://github.com/ceph/ceph/pull/54772 was resolved and we
continue with the 18.2.1 release

On Fri, Dec 1, 2023 at 11:12 AM Igor Fedotov  wrote:
>
> Hi Yuri,
>
> Looks like it's not as critical and complicated as originally
> thought. A user has to change bluefs_shared_alloc_size to be exposed to
> the issue. So hopefully I'll submit a patch on Monday to close this gap
> and we'll be able to proceed.
>
>
> Thanks,
>
> Igor
>
> On 01/12/2023 18:16, Yuri Weinstein wrote:
> > Venky, pls review the test results for smoke and fs after the PRs were 
> > merged.
> >
> > Radek, Igor, Adam - any updates on https://tracker.ceph.com/issues/63618?
> >
> > Thx
> >
> > On Thu, Nov 30, 2023 at 8:08 AM Yuri Weinstein  wrote:
> >> The fs PRs:
> >> https://github.com/ceph/ceph/pull/54407
> >> https://github.com/ceph/ceph/pull/54677
> >> were approved/tested and ready for merge.
> >>
> >> What is the status/plan for https://tracker.ceph.com/issues/63618?
> >>
> >> On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov  
> >> wrote:
> >>> https://tracker.ceph.com/issues/63618 to be considered as a blocker for
> >>> the next Reef release.
> >>>
> >>> On 07/11/2023 00:30, Yuri Weinstein wrote:
>  Details of this release are summarized here:
> 
>  https://tracker.ceph.com/issues/63443#note-1
> 
>  Seeking approvals/reviews for:
> 
>  smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
>  rados - Neha, Radek, Travis, Ernesto, Adam King
>  rgw - Casey
>  fs - Venky
>  orch - Adam King
>  rbd - Ilya
>  krbd - Ilya
>  upgrade/quincy-x (reef) - Laura PTL
>  powercycle - Brad
>  perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> 
>  Please reply to this email with approval and/or trackers of known
>  issues/PRs to address them.
> 
>  TIA
>  YuriW


[ceph-users] Re: Difficulty adding / using a non-default RGW placement target & storage class

2023-12-07 Thread Anthony D'Atri
Following up on my own post from last month, for posterity.

The trick was updating the period.  I'm not using multisite, but Rook seems to 
deploy so that one can.

-- aad
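
For anyone hitting the same InvalidLocationConstraint: a hedged sketch of the kind of sequence that fits the zonegroup shown below, where the placement target is added to both the zonegroup and the zone and the period is then committed (the index and data_extra pool names here are illustrative; substitute your own):

radosgw-admin zonegroup placement add --rgw-zonegroup ceph-objectstore --placement-id HDD-EC --storage-class GLACIER
radosgw-admin zone placement add --rgw-zone ceph-objectstore --placement-id HDD-EC --storage-class GLACIER --data-pool ceph-objectstore.rgw.buckets.data.hdd --index-pool ceph-objectstore.rgw.buckets.index --data-extra-pool ceph-objectstore.rgw.buckets.non-ec
radosgw-admin period update --commit
# restart or wait for the RGW daemons to pick up the new period before retrying the bucket creation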

> On Nov 6, 2023, at 16:52, Anthony D'Atri  wrote:
> 
> I'm having difficulty adding and using a non-default placement target & 
> storage class and would appreciate insights.  Am I going about this 
> incorrectly?  Rook does not yet have the ability to do this, so I'm adding it 
> by hand.
> 
> Following instructions on the net I added a second bucket pool, placement 
> target, and storage class, and created a user defaulting to the new pt/sc, 
> but I get an error when trying to create a bucket:
> 
> [rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$ s5cmd --endpoint-url 
> http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc mb s3://foofoobars
> ERROR "mb s3://foofoobars": InvalidLocationConstraint:  status code: 400, 
> request id: tx057b71002881d48ca-0065495d54-1abe555-ceph-objectstore, host 
> id:
> 
> 
> I found an article suggesting that the placement target and/or storage class 
> should have the api_name prepended, so I tried setting either or both to 
> "ceph-objectstore:HDD-EC" / "ceph-objectstore:GLACIER" with no success.  I 
> suspect that I'm missing something subtle -- or that Rook has provisioned 
> these bits in an atypical fashion.
> 
> Log entry:
> 
> /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log-2023-11-06T21:40:36.543+
>  7f6573a9f700  1 == starting new request req=0x7f64818ba730 =
> /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log-2023-11-06T21:40:36.546+
>  7f6570a99700  0 req 6320538205097380042 0.00309s s3:create_bucket could 
> not find user default placement id HDD-EC/GLACIER within zonegroup
> /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log-2023-11-06T21:40:36.546+
>  7f6570a99700  1 == req done req=0x7f64818ba730 op status=-2208 
> http_status=400 latency=0.00309s ==
> /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log:2023-11-06T21:40:36.546+
>  7f6570a99700  1 beast: 0x7f64818ba730: 10.233.90.156 - aad 
> [06/Nov/2023:21:40:36.543 +] "PUT /foofoobars HTTP/1.1" 400 266 - 
> "aws-sdk-go/1.40.25 (go1.18.3; linux; amd64)" - latency=0.00309s
> 
> 
> 
> [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ ceph -v
> ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
> 
> Here's the second buckets pool, constrained to HDDs.  AFAICT it can share the 
> index and data_extra_pool created for the default / STANDARD pt/sc by Rook.  
> I initially omitted ec_overwrites but enabled it after creation.
> 
> pool 19 'ceph-objectstore.rgw.buckets.data' erasure profile 
> ceph-objectstore.rgw.buckets.data_ecprofile size 6 min_size 5 crush_rule 10 
> object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode off last_change 
> 165350 lfor 0/156300/165341 flags hashpspool,ec_overwrites stripe_width 16384 
> application rook-ceph-rgw
> pool 21 'ceph-objectstore.rgw.buckets.data.hdd' erasure profile 
> ceph-objectstore.rgw.buckets.data_ecprofile_hdd size 6 min_size 5 crush_rule 
> 11 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode off 
> last_change 167193 lfor 0/0/164453 flags hashpspool,ec_overwrites 
> stripe_width 16384 application rook-ceph-rgw
> [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$
> 
> 
> [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ radosgw-admin zonegroup get
> {
>"id": "d994155c-2a9c-4e37-ae30-64fd2934ff99",
>"name": "ceph-objectstore",
>"api_name": "ceph-objectstore",
>"is_master": "true",
>"endpoints": [
>"http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80"
>],
>"hostnames": [],
>"hostnames_s3website": [],
>"master_zone": "72035401-a6d9-426b-8c89-9a17e268825f",
>"zones": [
>{
>"id": "72035401-a6d9-426b-8c89-9a17e268825f",
>"name": "ceph-objectstore",
>"endpoints": [
>"http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80"
>],
>"log_meta": "false",
>"log_data": "false",
>"bucket_index_max_shards": 11,
>"read_only": "false",
>"tier_type": "",
>"sync_from_all": "true",
>"sync_from": [],
>"redirect_zone": ""
>}
>],
>"placement_targets": [
>{
>"name": "HDD-EC",
>"tags": [],
>"storage_classes": [
>"GLACIER"
>]
>},
>{
>"name": "default-placement",
>"tags": [],
>"storage_classes": [
>"STANDARD"
>]
>}
>],
>"default_placement": "default-placement",
>"realm_id": "51fb8875-31ac-40ef-ab21-0ffd4e229f15",
>"sync_policy": {
>"groups": []
>}
> }
> 
> 
> 
> [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ radosgw-admin zone get
> {
>"id": "72035401-a6d9-426b-8c89-9a17e268825f",
>"name": "ceph-objectstore",
>

[ceph-users] MDS recovery with existing pools

2023-12-07 Thread Eugen Block

Hi,

following up on the previous thread (After hardware failure tried to  
recover ceph and followed instructions for recovery using OSDS), we  
were able to get Ceph back into a healthy state (including the unfound  
object). Now the CephFS needs to be recovered, and I'm having trouble  
fully understanding the docs [1] as to what the next steps would be. We  
ran the following, which according to [1] sets the state to "existing  
but failed":


ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover

But how to continue from here? Should we expect an active MDS at this  
point or not? Because the "ceph fs status" output still shows rank 0  
as failed. We then tried:


ceph fs set <fs_name> joinable true

But apparently it was already joinable; nothing changed. Before doing  
anything (destructive) from the advanced options [2] I wanted to ask  
the community how to proceed from here. I pasted the MDS logs at the  
bottom; I'm not really sure if the current state is expected or not.  
Apparently, the journal recovers but the purge_queue does not:


mds.0.41 Booting: 2: waiting for purge queue recovered
mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512  
(header had 14789452521). recovered.

mds.0.purge_queue operator(): open complete
mds.0.purge_queue operator(): recovering write_pos
monclient: get_auth_request con 0x55c280bc5c00 auth_method 0
monclient: get_auth_request con 0x55c280ee0c00 auth_method 0
mds.0.journaler.pq(ro) _finish_read got error -2
mds.0.purge_queue _recover: Error -2 recovering write_pos
mds.0.purge_queue _go_readonly: going readonly because internal IO  
failed: No such file or directory

mds.0.journaler.pq(ro) set_readonly
mds.0.41 unhandled write error (2) No such file or directory, force  
readonly...

mds.0.cache force file system read-only
force file system read-only

Is this expected because the "--recover" flag prevents an active MDS,  
or not? Before running "ceph mds rmfailed ..." and/or "ceph fs reset  
<fs_name>" with the --yes-i-really-mean-it flag I'd like to ask for  
your input. In which case should we run those commands? The docs are  
not really clear to me. Any input is highly appreciated!
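
For reference, the purge queue journal that fails to recover above can be inspected non-destructively before attempting anything from [2]; a hedged sketch, assuming rank 0 of a file system named <fs_name>:

cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue journal inspect
cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue header get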


Thanks!
Eugen

[1] https://docs.ceph.com/en/latest/cephfs/recover-fs-after-mon-store-loss/
[2]  
https://docs.ceph.com/en/latest/cephfs/administration/#advanced-cephfs-admin-settings


---snip---
Dec 07 15:35:48 node02 bash[692598]: debug-90>  
2023-12-07T13:35:47.730+ 7f4cd855f700  1 mds.storage.node02.hemalk  
Updating MDS map to version 41 from mon.0
Dec 07 15:35:48 node02 bash[692598]: debug-89>  
2023-12-07T13:35:47.730+ 7f4cd855f700  4 mds.0.purge_queue  
operator():  data pool 3 not found in OSDMap
Dec 07 15:35:48 node02 bash[692598]: debug-88>  
2023-12-07T13:35:47.730+ 7f4cd855f700  5 asok(0x55c27fe86000)  
register_command objecter_requests hook 0x55c27fe16310
Dec 07 15:35:48 node02 bash[692598]: debug-87>  
2023-12-07T13:35:47.730+ 7f4cd855f700 10 monclient: _renew_subs
Dec 07 15:35:48 node02 bash[692598]: debug-86>  
2023-12-07T13:35:47.730+ 7f4cd855f700 10 monclient:  
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug-85>  
2023-12-07T13:35:47.730+ 7f4cd855f700 10 log_channel(cluster)  
update_config to_monitors: true to_syslog: false syslog_facility:   
prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port:  
12201)
Dec 07 15:35:48 node02 bash[692598]: debug-84>  
2023-12-07T13:35:47.730+ 7f4cd855f700  4 mds.0.purge_queue  
operator():  data pool 3 not found in OSDMap
Dec 07 15:35:48 node02 bash[692598]: debug-83>  
2023-12-07T13:35:47.730+ 7f4cd855f700  4 mds.0.0 apply_blocklist:  
killed 0, blocklisted sessions (0 blocklist entries, 0)
Dec 07 15:35:48 node02 bash[692598]: debug-82>  
2023-12-07T13:35:47.730+ 7f4cd855f700  1 mds.0.41 handle_mds_map i  
am now mds.0.41
Dec 07 15:35:48 node02 bash[692598]: debug-81>  
2023-12-07T13:35:47.734+ 7f4cd855f700  1 mds.0.41 handle_mds_map  
state change up:standby --> up:replay
Dec 07 15:35:48 node02 bash[692598]: debug-80>  
2023-12-07T13:35:47.734+ 7f4cd855f700  5  
mds.beacon.storage.node02.hemalk set_want_state: up:standby -> up:replay
Dec 07 15:35:48 node02 bash[692598]: debug-79>  
2023-12-07T13:35:47.734+ 7f4cd855f700  1 mds.0.41 replay_start
Dec 07 15:35:48 node02 bash[692598]: debug-78>  
2023-12-07T13:35:47.734+ 7f4cd855f700  2 mds.0.41 Booting: 0:  
opening inotable
Dec 07 15:35:48 node02 bash[692598]: debug-77>  
2023-12-07T13:35:47.734+ 7f4cd855f700 10 monclient:  
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug-76>  
2023-12-07T13:35:47.734+ 7f4cd855f700  2 mds.0.41 Booting: 0:  
opening sessionmap
Dec 07 15:35:48 node02 bash[692598]: debug-75>  
2023-12-07T13:35:47.734+ 7f4cd855f700 10 monclient:  
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug

[ceph-users] Moving from ceph-ansible to cephadm and upgrading from pacific to octopus

2023-12-07 Thread wodel youchi
Hi,

I have an Openstack platform deployed with Yoga and ceph-ansible pacific on
Rocky 8.

Now I need to do an upgrade to Openstack zed with octopus on Rocky 9.

This is the upgrade path I have traced:
- upgrade my nodes to Rocky 9 keeping Openstack yoga with ceph-ansible
pacific.
- convert ceph pacific from ceph-ansible to cephadm (a rough sketch of this step is below).
- stop Openstack platform yoga
- upgrade ceph pacific to octopus
- upgrade Openstack yoga to zed.
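
A hedged sketch of the ceph-ansible-to-cephadm conversion step, following the upstream adoption procedure (hostnames and OSD ids are placeholders; verify against the cephadm adoption docs for your exact release):

# on each monitor and manager host
cephadm adopt --style legacy --name mon.<hostname>
cephadm adopt --style legacy --name mgr.<hostname>
# enable the orchestrator once a manager runs under cephadm
ceph mgr module enable cephadm
ceph orch set backend cephadm
# make every host known to cephadm (requires cephadm's SSH key on each host)
ceph orch host add <hostname>
# adopt the OSDs, host by host
cephadm adopt --style legacy --name osd.<id>
# MDS and RGW daemons are usually redeployed with 'ceph orch apply' rather than adopted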

Any thoughts or guidelines to keep in mind and follow regarding the Ceph
conversion and upgrade?

PS: On my Ceph cluster I have RBD, RGW and CephFS pools.

Regards.