[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-26 Thread duluxoz

Hi All,

OK, an update for everyone, a note about some (what I believe to be) 
missing information in the Ceph Doco, a success story, and an admission 
on my part that I may have left out some important information.


So to start with, I finally got everything working - I now have my 4T 
RBD Image mapped, mounted, and tested on my host.  YA!


The missing Ceph Doco Info:

What I found in the latest Red Hat documentation 
(https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html/block_device_guide/the-rbd-kernel-module) 
that is not in the Ceph documentation (perhaps because it is 
EL-specific? - but a note should be placed anyway, even if it is 
EL-specific) is that the RBD Image needs to have a partition entry 
created for it - that might be "obvious" to some, but my ongoing belief 
is that most "obvious" things aren't, so it's better to be explicit about 
such things. Just my $0.02 worth.  :-)


The relevant commands, which are performed after an `rbd map 
my_pool.meta/my_image --id my_image_user`, are:


[codeblock]

parted /dev/rbd0 mklabel gpt

parted /dev/rbd0 mkpart primary xfs 0% 100%

[/codeblock]

From there the RBD Image needs a file system: `mkfs.xfs /dev/rbd0p1`

And a mount: `mount /dev/rbd0p1 /mnt/my_image`

Now, the omission on my part:

The host I was attempting all this on was an oVirt-managed VM. 
Apparently, an oVirt-Managed VM doesn't like/allow (speculation on my 
part) running the `parted` or `mkfs.xfs` commands on an RBD Image. What 
I had to do to test this and get it working was to run the `rbd map`, 
`parted`, and `mkfs.xfs` commands on a physical host (which I did), THEN 
unmount/unmap the image from the physical host and map / mount it on the VM.
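
For completeness, the hand-over looks roughly like this (same image, user and 
mount point names as above):

[codeblock]

# on the physical host, once parted/mkfs.xfs are done:
umount /mnt/my_image        # only if it was mounted there for testing
rbd unmap /dev/rbd0

# on the VM:
rbd map my_pool.meta/my_image --id my_image_user
mount /dev/rbd0p1 /mnt/my_image

[/codeblock]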


So my apologies for not providing all the info - I didn't consider it 
to be relevant - my bad!


So all good in the end. I hope the above helps others if they have 
similar issues.


Thank you all who helped / pitched in with ideas - I really, *really* 
appreciate it.


Thanks too to Wesley Dillingham - although the suggestion wasn't 
relevant to this issue, it did cause me to look at the firewall settings 
on the Ceph Cluster where I found (and corrected) an unrelated issue 
that hadn't reared its ugly head yet. Thanks Wes.


Cheers (until next time)  :-P

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Best practice in 2024 for simple RGW failover

2024-03-26 Thread E Taka
Hi,

The requirements are actually not high: 1. there should be a generally
known address for access. 2. it should be possible to reboot or shut down a
server without the RGW connections being down the entire time. A downtime
of a few seconds is OK.

Constant load balancing would be nice, but is not necessary. I have found
various approaches on the Internet - what is currently recommended for a
current Ceph installation?


Thanks,
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practice in 2024 for simple RGW failover

2024-03-26 Thread Marc
> 
> The requirements are actually not high: 1. there should be a generally
> known address for access. 2. it should be possible to reboot or shut down a
> server without the RGW connections being down the entire time. A downtime
> of a few seconds is OK.
> 
> Constant load balancing would be nice, but is not necessary. I have found
> various approaches on the Internet - what is currently recommended for a
> current Ceph installation?
> 

I am using haproxy; I think that is what I have seen most here. When I scale RGW in 
the orchestrator it automatically load balances across the daemons. An advantage is also that 
you can do a little more with security, URL filtering and URL rewriting.
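
As a rough illustration only (addresses, ports and backend names are 
placeholders, and the RGW daemons are assumed to listen on 8080), a minimal 
haproxy config for two RGWs could look like:

frontend rgw_frontend
    bind *:80
    mode http
    default_backend rgw_backend

backend rgw_backend
    mode http
    balance roundrobin
    server rgw1 10.0.0.11:8080 check
    server rgw2 10.0.0.12:8080 check

Pairing this with keepalived (or letting cephadm deploy its ingress service, 
which is haproxy plus keepalived) gives the single well-known address asked 
for above.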

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-26 Thread Marc
is that the RBD Image needs to have a partition entry
> created for it - that might be "obvious" to some, but my ongoing belief
> is that most "obvious" things aren't, so its better to be explicit about
> such things. 
> >

Are you absolutely sure about this? I think you are missing something 
somewhere. For years I have been adding physical disks, and now RBD devices, to 
Linux without partitioning them. I mostly do this with disks that contain data 
that is expected to grow; that way it is easier to resize them while the VMs 
stay active/up. (This comes from the days when the new partition table was not 
re-read and you had to reboot.)
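
For illustration, the partition-less approach described here looks roughly like 
this (pool, image and mount point names are placeholders):

rbd map my_pool/my_image --id my_user
mkfs.xfs /dev/rbd0              # filesystem directly on the device, no GPT
mount /dev/rbd0 /mnt/my_image

# growing later, online, without touching a partition table:
rbd resize my_pool/my_image --size 8T
xfs_growfs /mnt/my_image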




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-26 Thread duluxoz
I don't know, Marc - I only know what I had to do to get the thing 
working  :-)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Lots of log messages from one server

2024-03-26 Thread Albert Shih
Hi, 

On my active mgr (the only one) in my cluster I repeatedly get something like 

Mar 26 08:50:04 cthulhu2 conmon[2737]: 2024-03-26T07:50:04.778+ 
7f704bce8700 -1 client.0 error registering admin socket command: (17) File 
exists
Mar 26 08:50:04 cthulhu2 conmon[2737]: 2024-03-26T07:50:04.778+ 
7f704bce8700 -1 client.0 error registering admin socket command: (17) File 
exists
Mar 26 08:50:04 cthulhu2 conmon[2737]: 2024-03-26T07:50:04.778+ 
7f704bce8700 -1 client.0 error registering admin socket command: (17) File 
exists
Mar 26 08:50:04 cthulhu2 conmon[2737]: 2024-03-26T07:50:04.778+ 
7f704bce8700 -1 client.0 error registering admin socket command: (17) File 
exists
Mar 26 08:50:04 cthulhu2 conmon[2737]: 2024-03-26T07:50:04.778+ 
7f704bce8700 -1 client.0 error registering admin socket command: (17) File 
exists
Mar 26 08:55:04 cthulhu2 ceph-mgr[2843]: client.0 error registering admin 
socket command: (17) File exists
Mar 26 08:55:04 cthulhu2 ceph-mgr[2843]: client.0 error registering admin 
socket command: (17) File exists
Mar 26 08:55:04 cthulhu2 ceph-mgr[2843]: client.0 error registering admin 
socket command: (17) File exists
Mar 26 08:55:04 cthulhu2 ceph-mgr[2843]: client.0 error registering admin 
socket command: (17) File exists
Mar 26 08:55:04 cthulhu2 ceph-mgr[2843]: client.0 error registering admin 
socket command: (17) File exists

I checked the dashboard (the first Google hit for that message) and that seems OK; 
there is no configuration set for its listen address.

Those messages come roughly every hour.

I notice those messages appear each time just after 

Mar 26 08:50:04 cthulhu2 ceph-mgr[2843]: [volumes INFO volumes.module] Starting 
_cmd_fs_subvolume_ls(prefix:fs subvolume ls, target:['mon-mgr', ''], 
vol_name:cephfs) < ""

but I don't know if that's related.

Any clue?

Regards






-- 
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
mar. 26 mars 2024 10:52:53 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Can setting mds_session_blocklist_on_timeout to false minimize session evictions?

2024-03-26 Thread Yongseok Oh
Hi,

CephFS is provided as a shared file system service in the private cloud 
environment of our company, LINE. The number of sessions is more than 5,000, 
and session evictions occur several times a day. When a session eviction 
occurs, the message 'Cannot send after transport endpoint shutdown' or 
'Permission denied' is displayed and file system access is no longer possible. 
Our users are very uncomfortable with this issue. In particular, there are no 
obvious problems such as network connectivity or CPU usage; when I access the 
machine and take a close look, nothing stands out. After an eviction, users 
have the inconvenience of having to umount/mount and run the application 
again. In a Kubernetes environment, recovery is a bit more complicated, which 
causes a lot of frustration.

I found in testing that by setting the mds_session_blocklist_on_timeout and 
mds_session_blocklist_on_evict options to false and setting 
client_reconnect_stale to true on the client side, the file system can still be 
accessed even if an eviction occurs. There seemed to be no major problem 
accessing the file system, as the session stayed attached.
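
For reference, a rough sketch of how those settings can be applied (kernel 
clients also have a related recover_session=clean mount option):

ceph config set mds mds_session_blocklist_on_timeout false
ceph config set mds mds_session_blocklist_on_evict false
# for ceph-fuse / libcephfs clients:
ceph config set client client_reconnect_stale true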

What I'm curious about is: if I apply the above settings, will there be any 
other side effects? For example, should I take some action if the integrity of 
a file is broken or if there is an issue on the MDS side? I am asking 
because there are no details regarding this in the official CephFS 
documentation.

Thank you

Yongseok
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy/Dashboard: Object Gateway not accessible after applying self-signed cert to rgw service

2024-03-26 Thread stephan . budach
Although I had searched the list for prior postings regarding the issue I had, I 
only now came across a thread which addressed it. In short, updating Ceph to 
18.2.2 resolved the issue.

The first issue was - as I initially suspected - caused by the mgr performing 
verification of the SSL certs presented to it. Once I disabled that using

ceph dashboard set-rgw-api-ssl-verify false

I was presented with another error, which was then resolved by the Ceph upgrade:

Mär 25 10:47:25 bceph01 conmon[260644]: ValueError: invalid literal for int() 
with base 10: '9443 ssl_certificate=config://rgw/cert/rgw.rgw.JVM.GH79'
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm on mixed architecture hosts

2024-03-26 Thread Iain Stott
Hi,

We are trying to deploy Ceph Reef 18.2.1 using cephadm on mixed architecture 
hosts using x86_64 for the mons and aarch64 for the OSDs.

During deployment we use the following config for the bootstrap process, where 
$REPOSITORY is our docker repo.

[global]
container_image = $REPOSITORY/ceph/ceph:v18.2.1
[mgr]
mgr/cephadm/container_image_base = $REPOSITORY/ceph/ceph:v18.2.1
mgr/cephadm/container_image_prometheus = $REPOSITORY/ceph/prometheus:v2.33.4
mgr/cephadm/container_image_node_exporter = 
$REPOSITORY/ceph/node-exporter:v1.3.1
mgr/cephadm/container_image_grafana = $REPOSITORY/ceph/ceph-grafana:8.3.5
mgr/cephadm/container_image_alertmanager = $REPOSITORY/ceph/alertmanager:v0.23.0
[osd]
container_image = $REPOSITORY/ceph/ceph:v18.2.1
Once the bootstrap process is complete, if we do a ceph config dump, the 
container image for global and osd has changed from the version tag to a SHA digest 
reference, meaning that when deploying the containers on the OSD hosts they try to 
use the amd64 container image and not the aarch64 image, and the deployment fails.

Is there a config setting we are missing or a workaround for this?

Thanks
Iain

Iain Stott
OpenStack Engineer
iain.st...@thg.com
www.thg.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] stretch mode item not defined

2024-03-26 Thread ronny.lippold

hi there, need some help please.

we are planning to replace our rbd-mirror setup and go to stretch mode.
the goal is to have the cluster in two fire-compartment server rooms.

start was a default proxmox/ceph setup.
now, I followed the howto from: 
https://docs.ceph.com/en/latest/rados/operations/stretch-mode/


with the crushmap I ended up with an error:
in rule 'stretch_rule' item 'dc1' not defined (dc1/2/3 are like site1/2/3 
in the docs).


my osd tree looks like
pve-test02-01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME   STATUS  REWEIGHT  PRI-AFF
 -1 3.80951  root default
-15 0.06349  host pve-test01-01
  0ssd  0.06349  osd.0   up   1.0  1.0
-19 0.06349  host pve-test01-02
  1ssd  0.06349  osd.1   up   1.0  1.0
-17 0.06349  host pve-test01-03
  2ssd  0.06349  osd.2   up   1.0  1.0
...

I think there is something missing.
Do I need to add the buckets manually?
...

for disaster failover, any ideas are welcome.


many thanks for the help and kind regards,
ronny
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I set osd fast shutdown = true

2024-03-26 Thread Manuel Lausch
I would suggest this way

ceph config set global osd_fast_shutdown true

Regards
Manuel

On Tue, 26 Mar 2024 12:12:22 +0530
Suyash Dongre  wrote:

> Hello,
> 
> I want to set osd fast shutdown = true, how should I achieve this?
> 
> Regards,
> Suyash
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs client not released caps when running rsync

2024-03-26 Thread Nikita Borisenkov
We are transferring data (300 million small files) using rsync between CephFS 
clusters, from version 12.2.13 to 18.2.1. After roughly the same amount of time 
each run (about 7 hours in this case), copying stops for a minute


```
health:HEALTH_WARN
 1 clients failing to advance oldest client/flush tid
 1 MDSs report slow metadata IOs
 1 MDSs behind on trimming
```

The only interesting messages in the logs are:
```
ceph-mds[640738]: mds.beacon.cephfs.X missed beacon ack from the monitors
```

I watched debugging on the client (destination) via watch -n1
/sys/kernel/debug/ceph/451eea44-d7a0-11ee-9117-b496914b4c02.client32497
```
item total
--
opened files / total inodes 1 / 866757
pinned i_caps / total inodes 866757 / 866757
opened inodes / total inodes 1 / 866757

item total avg_lat(us) min_lat(us) max_lat(us) stdev(us)
-- 
-

read 0 0 0 0 0
write 5129252 194689 9080 54693415 1023
metadata 29045143 670 161 87794124 369

item total miss hit
-
d_lease 5361 3026 315488442
caps 866757 11 483177478
```
During the copy, "pinned i_caps / total inodes" are gradually increased 
until it reaches the value "mds_max_caps_per_client" (default: 1Mi). 
Then "pinned i_caps / total inodes" begins to decrease to almost 0, at 
which time HEALTH_WARN appears and transfer stops. "op/s wr" increases 
from 200 to 1.5k. Then total inodes begin to increase again along with 
the resumption of copying and the cluster goes into the HEALTHY state.


Mount options:
/mnt/cephfs-old 10.77.12.90:6789,10.77.12.91:6789,10.77.12.92:6789:/ 
ceph rw,noatime,nodiratime,name=admin,secret=,acl
/mnt/cephfs-new 10.77.12.139:6789,10.77.12.140:6789,10.77.12.141:6789:/ 
ceph rw,noatime,nodiratime,name=admin,secret=,acl,caps_max=1


Client properties on the MDS server (removed unnecessary):
ceph daemon mds.cephfs.X client ls 32497
[
 {
 ...
 "id": 32497,
 "state": "open",
 "num_leases": 0,
 "num_caps": 980679,
 "request_load_avg": 7913,
 "requests_in_flight": 466,
 "num_completed_flushes": 464,
 "recall_caps": {
 "value": 0,
 "halflife": 60
 },
 "release_caps": {
 "value": 1732.2552002208533,
 "halflife": 60
 },
 "recall_caps_throttle": {
 "value": 0,
 "halflife": 1.35001
 },
 "recall_caps_throttle2o": {
 "value": 0,
 "halflife": 0.5
 },
 "session_cache_liveness": {
 "value": 42186.620275326415,
 "halflife": 300
 },
 "cap_acquisition": {
 "value": 0,
 "halflife": 30
 },
 ...
 }
]

ceph daemonperf mds.cephfs.X
```
---mds-- 
- --mds_cache--- --mds_log 
 -mds_mem- ---mds_server--- mds_ -objecter-- 
purg
req rlat slr fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd 
hccr prcr|stry recy recd|subm evts segs repl|ino dn |hcr hcs hsr cre cat 
|sess|actv rd wr rdwr|purg|
114 0 0 0 1.9M 438k 0 0 0 0 253 0 0 0 0 0 0 59 | 0 0 0 |128 123k 129 0 
|1.3M 1.9M|114 0 0 0 0 | 3 | 0 0 440 0 | 0
101 0 0 0 1.9M 438k 0 0 0 0 0 0 0 0 0 0 0 53 | 0 0 0 |106 123k 129 0 
|1.3M 1.9M|101 0 0 0 0 | 3 | 0 0 0 0 | 0

...
``` - from this output it is clear that the client does not send cap 
releases at all (column hccr; the column meanings can be decoded with "ceph 
daemon mds.X perf schema")


Then I was able to find the right query to google similar problems:
https://www.spinics.net/lists/ceph-users/msg50573.html
https://ceph-users.ceph.narkive.com/mcyPtEyz/rsync-kernel-client-cepfs-mkstemp-no-space-left-on-device
https://www.spinics.net/lists/ceph-users/msg50158.html
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/

It turns out that the problem is in rsync, in the way it works?

The only "solution" is to do it on the client according to a schedule 
(or upon reaching a certain number of open caps) “echo 2 > 
/proc/sys/vm/drop_caches”. After this command, the cephfs client 
releases the cached caps. And if there were a lot of them, then MDS 
becomes slow again.
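
For example, a crude scheduled version of that workaround could look like this 
(an /etc/cron.d entry on the rsync client; purely illustrative):

0 * * * *  root  sh -c 'echo 2 > /proc/sys/vm/drop_caches'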


We also tried to mount cephfs with the option "caps_max=1" so that 
the client would do a forced release when the specified value is 
reached, but this did not help.


We can limit mds_max_caps_per_client (not tested), but this also affects 
all clients at once.


The command "ceph daemon mds.cephfs.X cache drop" (with or without an 
additional parameter) does not help


Tested on Linux kernels (client side): 5.10 and 6.1

Did I understand everything correctly? Is this the expected behavior 
when running rsync?



And one more problem (I don’t know if it’s related or not), when rsync 
finish

[ceph-users] Re: stretch mode item not defined

2024-03-26 Thread Anthony D'Atri
Yes, you will need to create datacenter buckets and move your host buckets 
under them.
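
Roughly along these lines (bucket names are examples, hosts taken from your tree):

ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move pve-test01-01 datacenter=dc1
ceph osd crush move pve-test01-02 datacenter=dc1
ceph osd crush move pve-test01-03 datacenter=dc2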


> On Mar 26, 2024, at 09:18, ronny.lippold  wrote:
> 
> hi there, need some help please.
> 
> we are planning to replace our rbd-mirror setup and go to stretch mode.
> the goal is, to have the cluster in 2 fire compartment server rooms.
> 
> start was a default proxmox/ceph setup.
> now, i followed the howto from: 
> https://docs.ceph.com/en/latest/rados/operations/stretch-mode/
> 
> with the crushmap i ended in an error:
> in rule 'stretch_rule' item 'dc1' not defined (dc1,2,3 is like site1,2,3 in 
> the docu).
> 
> my osd tree looks like
> pve-test02-01:~# ceph osd tree
> ID   CLASS  WEIGHT   TYPE NAME   STATUS  REWEIGHT  PRI-AFF
> -1 3.80951  root default
> -15 0.06349  host pve-test01-01
>  0ssd  0.06349  osd.0   up   1.0  1.0
> -19 0.06349  host pve-test01-02
>  1ssd  0.06349  osd.1   up   1.0  1.0
> -17 0.06349  host pve-test01-03
>  2ssd  0.06349  osd.2   up   1.0  1.0
> ...
> 
> i think there is something missing.
> should i need adding the buckets manually?
> ...
> 
> for disaster failover, any ideas are welcome.
> 
> 
> man thanks for help and kind regards,
> ronny
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm on mixed architecture hosts

2024-03-26 Thread Daniel Brown

Iain - 


I’ve seen this same behavior. I’ve not found a work-around, though would agree 
that it would be a “nice to have” feature.






> On Mar 26, 2024, at 7:22 AM, Iain Stott  wrote:
> 
> Hi,
> 
> We are trying to deploy Ceph Reef 18.2.1 using cephadm on mixed architecture 
> hosts using x86_64 for the mons and aarch64 for the OSDs.
> 
> During deployment we use the following config for the bootstrap process, 
> where $REPOSITORY is our docker repo.
> 
> [global]
> container_image = $REPOSITORY/ceph/ceph:v18.2.1
> [mgr]
> mgr/cephadm/container_image_base = $REPOSITORY/ceph/ceph:v18.2.1
> mgr/cephadm/container_image_prometheus = $REPOSITORY/ceph/prometheus:v2.33.4
> mgr/cephadm/container_image_node_exporter = 
> $REPOSITORY/ceph/node-exporter:v1.3.1
> mgr/cephadm/container_image_grafana = $REPOSITORY/ceph/ceph-grafana:8.3.5
> mgr/cephadm/container_image_alertmanager = 
> $REPOSITORY/ceph/alertmanager:v0.23.0
> [osd]
> container_image = $REPOSITORY/ceph/ceph:v18.2.1
> Once  the bootstrap process is complete, if we do a ceph config dump, the 
> container image for global and osd changes from version tag to sha reference, 
> meaning that when deploying the containers on the OSDs they try using the 
> amd64 container image and not the aarch64 image and fail deployment.
> 
> Is there a config setting we are missing or a workaround for this?
> 
> Thanks
> Iain
> 
> Iain Stott
> OpenStack Engineer
> iain.st...@thg.com
> [THG Ingenuity Logo]
> www.thg.com
> [LinkedIn] 
> [Instagram]   [X] 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Clients failing to advance oldest client?

2024-03-26 Thread Erich Weiler
Thank you!  The OSD/mon/mgr/MDS servers are on 18.2.1, and the clients 
are mostly 17.2.6.
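
For reference, a rough way to see which client sessions are behind that warning 
(the MDS name is a placeholder):

ceph health detail                        # lists the offending client ids
ceph tell mds.<name> session ls           # or "ceph daemon mds.<name> session ls" on the MDS host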


-erich

On 3/25/24 11:57 PM, Dhairya Parmar wrote:
I think this bug has already been worked on in 
https://tracker.ceph.com/issues/63364, can you tell which version 
you're on?


--
*Dhairya Parmar*

Associate Software Engineer, CephFS

IBM, Inc.



On Tue, Mar 26, 2024 at 2:32 AM Erich Weiler wrote:


Hi Y'all,

I'm seeing this warning via 'ceph -s' (this is on Reef):

# ceph -s
    cluster:
      id:     58bde08a-d7ed-11ee-9098-506b4b4da440
      health: HEALTH_WARN
              3 clients failing to advance oldest client/flush tid
              1 MDSs report slow requests
              1 MDSs behind on trimming

    services:
      mon: 5 daemons, quorum
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 3d)
      mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
      mds: 1/1 daemons up, 1 standby
      osd: 46 osds: 46 up (since 3d), 46 in (since 2w)

    data:
      volumes: 1/1 healthy
      pools:   4 pools, 1313 pgs
      objects: 258.13M objects, 454 TiB
      usage:   688 TiB used, 441 TiB / 1.1 PiB avail
      pgs:     1303 active+clean
               8    active+clean+scrubbing
               2    active+clean+scrubbing+deep

    io:
      client:   131 MiB/s rd, 111 MiB/s wr, 41 op/s rd, 613 op/s wr

I googled around and looked at the docs and it seems like this isn't a
critical problem, but I couldn't find a clear path to resolution.  Does
anyone have any advice on what I can do to resolve the health issues
up top?

My CephFS filesystem is incredibly busy so I have a feeling that has
some impact here, but not 100% sure...

Thanks as always for the help!

cheers,
erich
___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm on mixed architecture hosts

2024-03-26 Thread John Mulligan
On Tuesday, March 26, 2024 7:22:18 AM EDT Iain Stott wrote:
> Hi,
> 
> We are trying to deploy Ceph Reef 18.2.1 using cephadm on mixed architecture
> hosts using x86_64 for the mons and aarch64 for the OSDs.
> 
> During deployment we use the following config for the bootstrap process,
> where $REPOSITORY is our docker repo.
> 
> [global]
> container_image = $REPOSITORY/ceph/ceph:v18.2.1
> [mgr]
> mgr/cephadm/container_image_base = $REPOSITORY/ceph/ceph:v18.2.1
> mgr/cephadm/container_image_prometheus = $REPOSITORY/ceph/prometheus:v2.33.4
> mgr/cephadm/container_image_node_exporter =
> $REPOSITORY/ceph/node-exporter:v1.3.1 mgr/cephadm/container_image_grafana =
> $REPOSITORY/ceph/ceph-grafana:8.3.5
> mgr/cephadm/container_image_alertmanager =
> $REPOSITORY/ceph/alertmanager:v0.23.0 [osd]
> container_image = $REPOSITORY/ceph/ceph:v18.2.1
> Once  the bootstrap process is complete, if we do a ceph config dump, the
> container image for global and osd changes from version tag to sha
> reference, meaning that when deploying the containers on the OSDs they try
> using the amd64 container image and not the aarch64 image and fail
> deployment.
> 
> Is there a config setting we are missing or a workaround for this?
> 

Try:
`ceph config set mgr mgr/cephadm/use_repo_digest false`

This comes up often enough that we should document it. I don't think that 
option is documented  right now because I searched for the option with google 
and all that came up were other tracker issues and older mailing-list posts. 
In the longer term we may want to make cephadm arch-aware (volunteers welcome 
:-) ).




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] mclock and massive reads

2024-03-26 Thread Luis Domingues
Hello,

We have a question about mClock scheduling reads on pacific (16.2.14 currently).

When we do massive reads (for example from machines we want to drain that contain a 
lot of data on EC pools), we quite frequently observe slow ops on the source 
OSDs. Those slow ops affect the client services, which talk directly to rados. If we 
kill the OSD that causes the slow ops, the recovery stays more or less at the same 
speed, but there are no more slow ops.

And when we tweak mClock, if we limit on the OSDs that are the source, nothing 
that we can observe happens. However, if we limit on the target OSDs, the 
global speed slows down, and the slow ops disappear.

So our question is: does mClock take reads into account as well as writes? 
Or are reads calculated to be less expensive than writes?
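
For reference, the kind of mClock options available to tweak in Pacific look 
like this (osd.0 is just an example, and not necessarily the exact settings we changed):

ceph config get osd.0 osd_mclock_profile
ceph config set osd osd_mclock_profile high_client_ops      # favour client I/O over background work
ceph config get osd.0 osd_mclock_max_capacity_iops_hdd      # the IOPS budget mClock schedules against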

Thanks,

Luis Domingues
Proton AG
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs client not releasing caps when running rsync

2024-03-26 Thread Alexander E. Patrakov
Hello Nikita,

A valid workaround is to export both instances of CephFS via
NFS-Ganesha and run rsync on NFS, not on CephFS directly.
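
A rough sketch of that setup (the export-create syntax varies a little between 
releases, and the names here are only illustrative; the 12.2.13 side would need 
a hand-configured nfs-ganesha with the CephFS FSAL, since it predates "ceph nfs"):

# on the 18.2.1 cluster, with a cephadm NFS cluster called "nfs1":
ceph nfs export create cephfs nfs1 /new cephfs

# on the copying host:
mount -t nfs ganesha-old:/old /mnt/nfs-old
mount -t nfs ganesha-new:/new /mnt/nfs-new
rsync -a /mnt/nfs-old/ /mnt/nfs-new/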

On Tue, Mar 26, 2024 at 10:15 PM Nikita Borisenkov
 wrote:
>
> We transfer data (300 million small files) using rsync between cephfs
> from version 12.2.13 to 18.2.1. After about the same time (about 7 hours
> in this case), copying stops for a minute
>
> ```
> health:HEALTH_WARN
>   1 clients failing to advance oldest client/flush tid
>   1 MDSs report slow metadata IOs
>   1 MDSs behind on trimming
> ```
>
> The only interesting messages in the logs are:
> ```
> ceph-mds[640738]: mds.beacon.cephfs.X missed beacon ack from the monitors
> ```
>
> I watched debugging on the client (destination) via watch -n1
> /sys/kernel/debug/ceph/451eea44-d7a0-11ee-9117-b496914b4c02.client32497
> ```
> item total
> --
> opened files / total inodes 1 / 866757
> pinned i_caps / total inodes 866757 / 866757
> opened inodes / total inodes 1 / 866757
>
> item total avg_lat(us) min_lat(us) max_lat(us) stdev(us)
> --
> -
> read 0 0 0 0 0
> write 5129252 194689 9080 54693415 1023
> metadata 29045143 670 161 87794124 369
>
> item total miss hit
> -
> d_lease 5361 3026 315488442
> caps 866757 11 483177478
> ```
> During the copy, "pinned i_caps / total inodes" are gradually increased
> until it reaches the value "mds_max_caps_per_client" (default: 1Mi).
> Then "pinned i_caps / total inodes" begins to decrease to almost 0, at
> which time HEALTH_WARN appears and transfer stops. "op/s wr" increases
> from 200 to 1.5k. Then total inodes begin to increase again along with
> the resumption of copying and the cluster goes into the HEALTHY state.
>
> Mount options:
> /mnt/cephfs-old 10.77.12.90:6789,10.77.12.91:6789,10.77.12.92:6789:/
> ceph rw,noatime,nodiratime,name=admin,secret=,acl
> /mnt/cephfs-new 10.77.12.139:6789,10.77.12.140:6789,10.77.12.141:6789:/
> ceph rw,noatime,nodiratime,name=admin,secret=,acl,caps_max=1
>
> Client properties on the MDS server (removed unnecessary):
> ceph daemon mds.cephfs.X client ls 32497
> [
>   {
>   ...
>   "id": 32497,
>   "state": "open",
>   "num_leases": 0,
>   "num_caps": 980679,
>   "request_load_avg": 7913,
>   "requests_in_flight": 466,
>   "num_completed_flushes": 464,
>   "recall_caps": {
>   "value": 0,
>   "halflife": 60
>   },
>   "release_caps": {
>   "value": 1732.2552002208533,
>   "halflife": 60
>   },
>   "recall_caps_throttle": {
>   "value": 0,
>   "halflife": 1.35001
>   },
>   "recall_caps_throttle2o": {
>   "value": 0,
>   "halflife": 0.5
>   },
>   "session_cache_liveness": {
>   "value": 42186.620275326415,
>   "halflife": 300
>   },
>   "cap_acquisition": {
>   "value": 0,
>   "halflife": 30
>   },
>   ...
>   }
> ]
>
> ceph daemonperf mds.cephfs.X
> ```
> ---mds--
> - --mds_cache--- --mds_log
>  -mds_mem- ---mds_server--- mds_ -objecter--
> purg
> req rlat slr fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd
> hccr prcr|stry recy recd|subm evts segs repl|ino dn |hcr hcs hsr cre cat
> |sess|actv rd wr rdwr|purg|
> 114 0 0 0 1.9M 438k 0 0 0 0 253 0 0 0 0 0 0 59 | 0 0 0 |128 123k 129 0
> |1.3M 1.9M|114 0 0 0 0 | 3 | 0 0 440 0 | 0
> 101 0 0 0 1.9M 438k 0 0 0 0 0 0 0 0 0 0 0 53 | 0 0 0 |106 123k 129 0
> |1.3M 1.9M|101 0 0 0 0 | 3 | 0 0 0 0 | 0
> ...
> ``` - from this output it is clear that the client does not send cap
> release at all (column hccr, decoding of the columns "ceph daemon mds.X
> perf schema")
>
> Then I was able to find the right query to google similar problems:
> https://www.spinics.net/lists/ceph-users/msg50573.html
> https://ceph-users.ceph.narkive.com/mcyPtEyz/rsync-kernel-client-cepfs-mkstemp-no-space-left-on-device
> https://www.spinics.net/lists/ceph-users/msg50158.html
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/
>
> It turns out that the problem is in rsync, in the way it works?
>
> The only "solution" is to do it on the client according to a schedule
> (or upon reaching a certain number of open caps) “echo 2 >
> /proc/sys/vm/drop_caches”. After this command, the cephfs client
> releases the cached caps. And if there were a lot of them, then MDS
> becomes slow again.
>
> We also tried to mount cephfs with the option "caps_max=1" so that
> the client would do a forced release when the specified value is
> reached, but this did no

[ceph-users] 1x port from bond down causes all osd down in a single machine

2024-03-26 Thread Szabo, Istvan (Agoda)
Hi,

Wondering what we are missing from the netplan configuration on Ubuntu so that 
Ceph tolerates a single port failure properly.
We are using this bond configuration on Ubuntu 20.04 with Octopus Ceph:


bond1:
  macaddress: x.x.x.x.x.50
  dhcp4: no
  dhcp6: no
  addresses:
- 192.168.199.7/24
  interfaces:
- ens2f0np0
- ens2f1np1
  mtu: 9000
  parameters:
mii-monitor-interval: 100
mode: 802.3ad
lacp-rate: fast
transmit-hash-policy: layer3+4



ens2f1np1 failed and caused slow ops, all osd down ... = disaster

Any idea what is wrong with this bond config?
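
For what it's worth, the bond state can be inspected with (interface names as 
in the snippet above):

cat /proc/net/bonding/bond1      # per-slave MII status and LACP aggregator info
ip -s link show ens2f1np1        # error/drop counters on the failed port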

Thank you


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS filesystem mount tanks on some nodes?

2024-03-26 Thread Erich Weiler

Hi All,

We have a CephFS filesystem where we are running Reef on the servers 
(OSD/MDS/MGR/MON) and Quincy on the clients.


Every once in a while, one of the clients will stop allowing access to 
my CephFS filesystem, the error being "permission denied" while trying to 
access the filesystem on that node.  The fix is to force unmount the 
filesystem and remount it, then it's fine again.  Any idea how I can 
prevent this?
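
For reference, the force unmount / remount looks roughly like this (mount point 
and client name are placeholders); the kernel client also has a 
recover_session=clean mount option that is meant to re-establish an 
evicted/blocklisted session automatically:

umount -f /mnt/cephfs
mount -t ceph 10.50.1.75:6789:/ /mnt/cephfs -o name=myclient,recover_session=clean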


I see this in the client node logs:

Mar 25 11:34:46 phoenix-07 kernel: [50508.354036]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:34:46 phoenix-07 kernel: [50508.359650]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:34:46 phoenix-07 kernel: [50508.367657]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.189000]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.192579]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.196103]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.024268]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.031520]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.038594]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:40:48 phoenix-07 kernel: [50870.853281]  ? 
__touch_cap+0x24/0xd0 [ceph]
Mar 25 22:55:38 phoenix-07 kernel: [91360.583032] libceph: mds0 
(1)10.50.1.75:6801 socket closed (con state OPEN)
Mar 25 22:55:38 phoenix-07 kernel: [91360.667914] libceph: mds0 
(1)10.50.1.75:6801 session reset
Mar 25 22:55:38 phoenix-07 kernel: [91360.667923] ceph: mds0 closed our 
session

Mar 25 22:55:38 phoenix-07 kernel: [91360.667925] ceph: mds0 reconnect start
Mar 25 22:55:52 phoenix-07 kernel: [91374.541614] ceph: mds0 reconnect 
denied
Mar 25 22:55:52 phoenix-07 kernel: [91374.541726] ceph:  dropping 
dirty+flushing Fw state for ea96c18f 1099683115069
Mar 25 22:55:52 phoenix-07 kernel: [91374.541732] ceph:  dropping 
dirty+flushing Fw state for ce495f00 1099687100635
Mar 25 22:55:52 phoenix-07 kernel: [91374.541737] ceph:  dropping 
dirty+flushing Fw state for 73ebb190 1099687100636
Mar 25 22:55:52 phoenix-07 kernel: [91374.541744] ceph:  dropping 
dirty+flushing Fw state for 91337e6a 1099687100637
Mar 25 22:55:52 phoenix-07 kernel: [91374.541746] ceph:  dropping 
dirty+flushing Fw state for 9075ecd8 1099687100634
Mar 25 22:55:52 phoenix-07 kernel: [91374.541751] ceph:  dropping 
dirty+flushing Fw state for d1d4c51f 1099687100633
Mar 25 22:55:52 phoenix-07 kernel: [91374.541781] ceph:  dropping 
dirty+flushing Fw state for 63dec1e4 1099687100632
Mar 25 22:55:52 phoenix-07 kernel: [91374.541793] ceph:  dropping 
dirty+flushing Fw state for 8b3124db 1099687100638
Mar 25 22:55:52 phoenix-07 kernel: [91374.541796] ceph:  dropping 
dirty+flushing Fw state for d9e76d8b 1099687100471
Mar 25 22:55:52 phoenix-07 kernel: [91374.541798] ceph:  dropping 
dirty+flushing Fw state for b57da610 1099685041085
Mar 25 22:55:52 phoenix-07 kernel: [91374.542235] libceph: mds0 
(1)10.50.1.75:6801 socket closed (con state V1_CONNECT_MSG)
Mar 25 22:55:52 phoenix-07 kernel: [91374.791652] ceph: mds0 rejected 
session
Mar 25 23:01:51 phoenix-07 kernel: [91733.308806] ceph: get_quota_realm: 
ino (1.fffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.182127] ceph: 
check_quota_exceeded: ino (1000a1cb4a8.fffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.188225] ceph: 
check_quota_exceeded: ino (1000a1cb4a8.fffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.233658] ceph: 
check_quota_exceeded: ino (1000a1cb4aa.fffe) null i_snap_realm
Mar 25 23:25:52 phoenix-07 kernel: [93174.787630] libceph: mds0 
(1)10.50.1.75:6801 socket closed (con state OPEN)
Mar 25 23:39:45 phoenix-07 kernel: [94007.751879] ceph: get_quota_realm: 
ino (1.fffe) null i_snap_realm
Mar 26 00:03:28 phoenix-07 kernel: [95430.158646] ceph: get_quota_realm: 
ino (1.fffe) null i_snap_realm
Mar 26 00:39:45 phoenix-07 kernel: [97607.685421] ceph: get_quota_realm: 
ino (1.fffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.681145] ceph: 
check_quota_exceeded: ino (1000a306503.fffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.686797] ceph: 
check_quota_exceeded: ino (1000a306503.fffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.729046] ceph: 
check_quota_exceeded: ino (1000a306505.fffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.302564] ceph: 
check_quota_exceeded: ino (1000a75677d.fffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.305676] ceph: 
check_quota_exceeded: ino (1000a75677d.fffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.347267] ceph: 
check_quota_exceeded: ino (1000a755fe3.fffe) null i_snap_realm
Mar 26 01:04:49 pho

[ceph-users] Re: mark direct Zabbix support deprecated? Re: Ceph versus Zabbix: failure: no data sent

2024-03-26 Thread Zac Dover
I have created https://tracker.ceph.com/issues/65161 in order to track the 
process of updating the Zabbix documentation.

Zac Dover
Upstream Docs
Ceph Foundation




On Tuesday, March 26th, 2024 at 5:49 AM, John Jasen  wrote:

> 
> 
> Well, at least on my RHEL Ceph cluster, turns out zabbix-sender,
> zabbix-agent, etc aren't in the container image. Doesn't explain why it
> didn't work with the Debian/proxmox version, but shrug.
> 
> It appears there is no interest in adding them back in, per:
> https://github.com/ceph/ceph-container/issues/1651
> 
> As such, may I recommend marking the Ceph documentation to this effect?
> Possibly referring to Zabbix instructions with Agent 2?
> 
> 
> 
> 
> On Fri, Mar 22, 2024 at 7:04 PM John Jasen jja...@gmail.com wrote:
> 
> > If the documentation is to be believed, it's just install the zabbix
> > sender, then;
> > 
> > ceph mgr module enable zabbix
> > 
> > ceph zabbix config-set zabbix_host my-zabbix-server
> > 
> > (Optional) Set the identifier to the fsid.
> > 
> > And poof. I should now have a discovered entity on my zabbix server to add
> > templates to.
> > 
> > However, this has not worked yet on either of my ceph clusters (one RHEL,
> > one proxmox).
> > 
> > Reference: https://docs.ceph.com/en/latest/mgr/zabbix/
> > 
> > On Reddit advice, I installed the Ceph templates for Zabbix.
> > https://raw.githubusercontent.com/ceph/ceph/master/src/pybind/mgr/zabbix/zabbix_template.xml
> > 
> > Still no dice. No traffic at all seems to be generated, that I've seen
> > from packet traces,
> > 
> > ... OK.
> > 
> > I su'ed to the ceph user on both clusters, and ran zabbix_send:
> > 
> > zabbix_sender -v -z 10.0.0.1 -s "$my_fsid" -k ceph.osd_avg_pgs -o 1
> > 
> > Response from "10.0.0.1:10051": "processed: 1; failed: 0; total: 1;
> > seconds spent: 0.42"
> > 
> > sent: 1; skipped: 0; total: 1
> > 
> > As the ceph user, ceph zabbix send/discovery still fail.
> > 
> > I am officially stumped.
> > 
> > Any ideas as to which tree I should be barking up?
> > 
> > Thanks in advance!
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm stacktrace on copying ceph.conf

2024-03-26 Thread Jesper Agerbo Krogh [JSKR]
Hi. 

We're currently getting these errors - and I seem to be missing a clear 
overview of the cause and how to debug it. 

3/26/24 9:38:09 PM[ERR]executing _write_files((['dkcphhpcadmin01', 
'dkcphhpcmgt028', 'dkcphhpcmgt029', 'dkcphhpcmgt031', 'dkcphhpcosd033', 
'dkcphhpcosd034', 'dkcphhpcosd035', 'dkcphhpcosd036', 'dkcphhpcosd037', 
'dkcphhpcosd038', 'dkcphhpcosd039', 'dkcphhpcosd040', 'dkcphhpcosd041', 
'dkcphhpcosd042', 'dkcphhpcosd043', 'dkcphhpcosd044'],)) failed. Traceback 
(most recent call last): File "/usr/share/ceph/mgr/cephadm/ssh.py", line 240, 
in _write_remote_file await asyncssh.scp(f.name, (conn, tmp_path)) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp await 
source.run(srcpath) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 
458, in run self.handle_error(exc) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error raise 
exc from None File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in 
run await self._send_files(path, b'') File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files 
self.handle_error(exc) File "/lib/python3.6/site-packages/asyncssh/scp.py", 
line 307, in handle_error raise exc from None File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files await 
self._send_file(srcpath, dstpath, attrs) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file await 
self._make_cd_request(b'C', attrs, size, srcpath) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request 
self._fs.basename(path)) File "/lib/python3.6/site-packages/asyncssh/scp.py", 
line 224, in make_request raise exc asyncssh.sftp.SFTPFailure: scp: 
/tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: 
Permission denied During handling of the above exception, another exception 
occurred: Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/utils.py", line 79, in do_work return f(*arg) File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1088, in _write_files 
self._write_client_files(client_files, host) File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1107, in _write_client_files 
self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid) File 
"/usr/share/ceph/mgr/cephadm/ssh.py", line 261, in write_remote_file host, 
path, content, mode, uid, gid, addr)) File 
"/usr/share/ceph/mgr/cephadm/module.py", line 615, in wait_async return 
self.event_loop.get_result(coro) File "/usr/share/ceph/mgr/cephadm/ssh.py", 
line 56, in get_result return asyncio.run_coroutine_threadsafe(coro, 
self._loop).result() File "/lib64/python3.6/concurrent/futures/_base.py", line 
432, in result return self.__get_result() File 
"/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result raise 
self._exception File "/usr/share/ceph/mgr/cephadm/ssh.py", line 249, in 
_write_remote_file raise OrchestratorError(msg) 
orchestrator._interface.OrchestratorError: Unable to write 
dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf:
 scp: 
/tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: 
Permission denied
3/26/24 9:38:09 PM[ERR]Unable to write 
dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf:
 scp: 
/tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: 
Permission denied Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/ssh.py", line 240, in _write_remote_file await 
asyncssh.scp(f.name, (conn, tmp_path)) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp await 
source.run(srcpath) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 
458, in run self.handle_error(exc) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error raise 
exc from None File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in 
run await self._send_files(path, b'') File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files 
self.handle_error(exc) File "/lib/python3.6/site-packages/asyncssh/scp.py", 
line 307, in handle_error raise exc from None File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files await 
self._send_file(srcpath, dstpath, attrs) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file await 
self._make_cd_request(b'C', attrs, size, srcpath) File 
"/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request 
self._fs.basename(path)) File "/lib/python3.6/site-packages/asyncssh/scp.py", 
line 224, in make_request raise exc asyncssh.sftp.SFTPFailure: scp: 
/tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: 
Permission denied
3/26/24 9:38:09 PM[INF]Updating 
dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf

It seems to be related to the permissions that the manager writes the files with 
and the process copying 

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-26 Thread Adam King
For context, the value the autotune goes with takes the value from `cephadm
gather-facts` on the host (the "memory_total_kb" field) and then subtracts
from that per daemon on the host according to

min_size_by_type = {
'mds': 4096 * 1048576,
'mgr': 4096 * 1048576,
'mon': 1024 * 1048576,
'crash': 128 * 1048576,
'keepalived': 128 * 1048576,
'haproxy': 128 * 1048576,
'nvmeof': 4096 * 1048576,
}
default_size = 1024 * 1048576

what's left is then divided by the number of OSDs on the host to arrive at
the value. I'll also add, since it seems to be an issue on this particular
host,  if you add the "_no_autotune_memory" label to the host, it will stop
trying to do this on that host.
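
For example (host name as in the report below):

ceph orch host label add my-ceph01 _no_autotune_memory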

On Mon, Mar 25, 2024 at 6:32 PM  wrote:

> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts in
> it, each with 4 OSD's attached. The first 2 servers hosting mgr's have 32GB
> of RAM each, and the remaining have 24gb
> For some reason i am unable to identify, the first host in the cluster
> appears to constantly be trying to set the osd_memory_target variable to
> roughly half of what the calculated minimum is for the cluster, i see the
> following spamming the logs constantly
> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
> value: Value '480485376' is below minimum 939524096
> Default is set to 4294967296.
> I did double check and osd_memory_base (805306368) + osd_memory_cache_min
> (134217728) adds up to minimum exactly
> osd_memory_target_autotune is currently enabled. But i cannot for the life
> of me figure out how it is arriving at 480485376 as a value for that
> particular host that even has the most RAM. Neither the cluster or the host
> is even approaching max utilization on memory, so it's not like there are
> processes competing for resources.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm on mixed architecture hosts

2024-03-26 Thread Iain Stott
Oh thanks John, will give it a try and report back

From: John Mulligan 
Sent: 26 March 2024 14:24
To: ceph-users@ceph.io 
Cc: Iain Stott 
Subject: Re: [ceph-users] Cephadm on mixed architecture hosts

On Tuesday, March 26, 2024 7:22:18 AM EDT Iain Stott wrote:
> Hi,
>
> We are trying to deploy Ceph Reef 18.2.1 using cephadm on mixed architecture
> hosts using x86_64 for the mons and aarch64 for the OSDs.
>
> During deployment we use the following config for the bootstrap process,
> where $REPOSITORY is our docker repo.
>
> [global]
> container_image = $REPOSITORY/ceph/ceph:v18.2.1
> [mgr]
> mgr/cephadm/container_image_base = $REPOSITORY/ceph/ceph:v18.2.1
> mgr/cephadm/container_image_prometheus = $REPOSITORY/ceph/prometheus:v2.33.4
> mgr/cephadm/container_image_node_exporter =
> $REPOSITORY/ceph/node-exporter:v1.3.1 mgr/cephadm/container_image_grafana =
> $REPOSITORY/ceph/ceph-grafana:8.3.5
> mgr/cephadm/container_image_alertmanager =
> $REPOSITORY/ceph/alertmanager:v0.23.0 [osd]
> container_image = $REPOSITORY/ceph/ceph:v18.2.1
> Once  the bootstrap process is complete, if we do a ceph config dump, the
> container image for global and osd changes from version tag to sha
> reference, meaning that when deploying the containers on the OSDs they try
> using the amd64 container image and not the aarch64 image and fail
> deployment.
>
> Is there a config setting we are missing or a workaround for this?
>

Try:
`ceph config set mgr mgr/cephadm/use_repo_digest false`

This comes up often enough that we should document it. I don't think that
option is documented  right now because I searched for the option with google
and all that came up were other tracker issues and older mailing-list posts.
In the longer term we may want to make cephadm arch-aware (volunteers welcome
:-) ).





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: stretch mode item not defined

2024-03-26 Thread ronny.lippold

hi anthony ... many thanks for that.

I did not understand why the docs missed that part ... anyway

I checked the login and the mail address c...@spark5.de should be right 
:/



one last question ... we have two server rooms. Do you think stretch 
mode is the right way?

do you have any other ideas about disaster recovery?

many thanks ... ronny

On 2024-03-26 15:14, Anthony D'Atri wrote:

Yes, you will need to create datacenter buckets and move your host
buckets under them.



On Mar 26, 2024, at 09:18, ronny.lippold  wrote:

hi there, need some help please.

we are planning to replace our rbd-mirror setup and go to stretch 
mode.

the goal is, to have the cluster in 2 fire compartment server rooms.

start was a default proxmox/ceph setup.
now, i followed the howto from: 
https://docs.ceph.com/en/latest/rados/operations/stretch-mode/


with the crushmap i ended in an error:
in rule 'stretch_rule' item 'dc1' not defined (dc1,2,3 is like 
site1,2,3 in the docu).


my osd tree looks like
pve-test02-01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME   STATUS  REWEIGHT  PRI-AFF
-1 3.80951  root default
-15 0.06349  host pve-test01-01
 0ssd  0.06349  osd.0   up   1.0  1.0
-19 0.06349  host pve-test01-02
 1ssd  0.06349  osd.1   up   1.0  1.0
-17 0.06349  host pve-test01-03
 2ssd  0.06349  osd.2   up   1.0  1.0
...

i think there is something missing.
should i need adding the buckets manually?
...

for disaster failover, any ideas are welcome.


man thanks for help and kind regards,
ronny
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Call for Interest: Managed SMB Protocol Support

2024-03-26 Thread David Yang
This is great, we are currently using the SMB protocol heavily to
export kernel-mounted CephFS.
But I have encountered a problem. When many SMB clients
enumerate or list the same directory, the SMB server
experiences high load, and the smbd processes end up in the D
(uninterruptible sleep) state.
This problem has been going on for some time and no suitable solution
has been found yet.

John Mulligan wrote on Tue, Mar 26, 2024 at 03:43:
>
> On Monday, March 25, 2024 3:22:26 PM EDT Alexander E. Patrakov wrote:
> > On Mon, Mar 25, 2024 at 11:01 PM John Mulligan
> >
> >  wrote:
> > > On Friday, March 22, 2024 2:56:22 PM EDT Alexander E. Patrakov wrote:
> > > > Hi John,
> > > >
> > > > > A few major features we have planned include:
> > > > > * Standalone servers (internally defined users/groups)
> > > >
> > > > No concerns here
> > > >
> > > > > * Active Directory Domain Member Servers
> > > >
> > > > In the second case, what is the plan regarding UID mapping? Is NFS
> > > > coexistence planned, or a concurrent mount of the same directory using
> > > > CephFS directly?
> > >
> > > In the immediate future the plan is to have a very simple, fairly
> > > "opinionated" idmapping scheme based on the autorid backend.
> >
> > OK, the docs for clustered SAMBA do mention the autorid backend in
> > examples. It's a shame that the manual page does not explicitly list
> > it as compatible with clustered setups.
> >
> > However, please consider that the majority of Linux distributions
> > (tested: CentOS, Fedora, Alt Linux, Ubuntu, OpenSUSE) use "realmd" to
> > join AD domains by default (where "default" means a pointy-clicky way
> > in a workstation setup), which uses SSSD, and therefore, by this
> > opinionated choice of the autorid backend, you create mappings that
> > disagree with the supposed majority and the default. This will create
> > problems in the future when you do consider NFS coexistence.
> >
>
> Thanks, I'll keep that in mind.
>
> > Well, it's a different topic that most organizations that I have seen
> > seem to ignore this default. Maybe those that don't have any problems
> > don't have any reason to talk to me? I think that more research is
> > needed here on whether RedHat's and GNOME's push of SSSD is something
> > not-ready or indeed the de-facto standard setup.
> >
>
> I think it's a bit of a mix, but am not sure either.
>
>
> > Even if you don't want to use SSSD, providing an option to provision a
> > few domains with idmap rid backend with statically configured ranges
> > (as an override to autorid) would be a good step forward, as this can
> > be made compatible with the default RedHat setup.
>
> That's reasonable. Thanks for the suggestion.
>
>
> >
> > > Sharing the same directories over both NFS and SMB at the same time, also
> > > known as "multi-protocol", is not planned for now, however we're all aware
> > > that there's often a demand for this feature and we're aware of the
> > > complexity it brings. I expect we'll work on that at some point but not
> > > initially. Similarly, sharing the same directories over a SMB share and
> > > directly on a cephfs mount won't be blocked but we won't recommend it.
> >
> > OK. Feature request: in the case if there are several CephFS
> > filesystems, support configuration of which one to serve.
> >
>
> Putting it on the list.
>
> > > > In fact, I am quite skeptical, because, at least in my experience,
> > > > every customer's SAMBA configuration as a domain member is a unique
> > > > snowflake, and cephadm would need an ability to specify arbitrary UID
> > > > mapping configuration to match what the customer uses elsewhere - and
> > > > the match must be precise.
> > >
> > > I agree - our initial use case is something along the lines:
> > > Users of a Ceph Cluster that have Windows systems, Mac systems, or
> > > appliances that are joined to an existing AD
> > > but are not currently interoperating with the Ceph cluster.
> > >
> > > I expect to add some idpapping configuration and agility down the line,
> > > especially supporting some form of rfc2307 idmapping (where unix IDs are
> > > stored in AD).
> >
> > Yes, for whatever reason, people do this, even though it is cumbersome
> > to manage.
> >
> > > But those who already have idmapping schemes and samba accessing ceph will
> > > probably need to just continue using the existing setups as we don't have
> > > an immediate plan for migrating those users.
> > >
> > > > Here is what I have seen or was told about:
> > > >
> > > > 1. We don't care about interoperability with NFS or CephFS, so we just
> > > > let SAMBA invent whatever UIDs and GIDs it needs using the "tdb2"
> > > > idmap backend. It's completely OK that workstations get different UIDs
> > > > and GIDs, as only SIDs traverse the wire.
> > >
> > > This is pretty close to our initial plan but I'm not clear why you'd think
> > > that "workstations get different UIDs and GIDs". For all systems acessing
> > > the (same) ceph cluster the id mapping should be consistent.
> > >

[ceph-users] Re: CephFS space usage

2024-03-26 Thread Thorne Lawler

Hi everyone!

Just thought I would let everyone know: The issue appears to have been 
the Ceph NFS service associated with the filesystem.


I removed all the files, waited a while, disconnected all the clients, 
waited a while, then deleted the NFS shares - the disk space and objects 
abruptly began freeing up.


I'm sorry that I can't contribute any more useful diagnostic 
information, but maybe this is the extra bit of data that crystallizes 
someone's theory about the issue.


On 21/03/2024 10:33 am, Anthony D'Atri wrote:

Grep through the ls output for ‘rados bench’ leftovers, it’s easy to leave them 
behind.


On Mar 20, 2024, at 5:28 PM, Igor Fedotov wrote:

Hi Thorne,

unfortunately I'm unaware of any tools high-level enough to easily map files to rados 
objects without a deep understanding of how this works. You might want to try the 
"rados ls" command to get the list of all the objects in the cephfs data pool, and 
then learn how that mapping is performed and parse your listing.
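
For a single file the mapping looks roughly like this (paths, pool name and the 
inode number are only examples): data objects are named "<inode in hex>.<block number>", so

ls -i /mnt/cephfs/big_test_file      # say it prints inode 1099511627778
printf '%x\n' 1099511627778          # -> 10000000002
rados -p cephfs_data ls | grep '^10000000002\.'

lists the objects backing that file.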


Thanks,

Igor


On 3/20/2024 1:30 AM, Thorne Lawler wrote:

Igor,

Those files are VM disk images, and they're under constant heavy use, so yes - 
there /is/ constant severe write load against this disk.

Apart from writing more test files into the filesystems, there must be Ceph 
diagnostic tools to describe what those objects are being used for, surely?

We're talking about an extra 10TB of space. How hard can it be to determine 
which file those objects are associated with?


On 19/03/2024 8:39 pm, Igor Fedotov wrote:

Hi Thorn,

given the amount of files in the CephFS volume I presume you don't have severe 
write load against it. Is that correct?

If so we can assume that the numbers you're sharing mostly refer to your 
experiment. At peak I can see a bytes_used increase of 629,461,893,120 bytes 
(45978612027392 - 45349150134272). With a replica factor of 3 this roughly 
matches your written data (200GB I presume?).


More interesting is that after the file's removal we can see a delta of 
419,450,880 bytes (45349569585152 - 45349150134272). I could see two options 
(apart from someone else having written additional data to CephFS during the 
experiment) to explain this:

1. File removal wasn't completed at the last probe, half an hour after the file's 
removal. Did you see a stale object counter when making that probe?

2. Some space is leaking. If that's the case, this could be a reason for your 
issue if huge(?) files on CephFS are created/removed periodically. So if we're 
certain that the leak really occurred (and option 1 above isn't the case), it 
makes sense to run more experiments writing/removing a bunch of huge files 
to the volume to confirm the space leakage.

On 3/18/2024 3:12 AM, Thorne Lawler wrote:

Thanks Igor,

I have tried that, and the number of objects and bytes_used took a long time to 
drop, but they seem to have dropped back to almost the original level:

  * Before creating the file: 3885835 objects, 45349150134272 bytes_used
  * After creating the file: 3931663 objects, 45924147249152 bytes_used
  * Immediately after deleting the file: 3935995 objects, 45978612027392 bytes_used
  * Half an hour after deleting the file: 3886013 objects, 45349569585152 bytes_used

Unfortunately, this is all production infrastructure, so there is always other 
activity taking place.

What tools are there to visually inspect the object map and see how it relates 
to the filesystem?


Not sure if there is anything like that at CephFS level but you can use rados 
tool to view objects in cephfs data pool and try to build some mapping between 
them and CephFS file list. Could be a bit tricky though.

On 15/03/2024 7:18 pm, Igor Fedotov wrote:

ceph df detail --format json-pretty

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170



--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170
