[ceph-users] CephFS mount error after successful attach of volume

Martin Reid Fri, 17 Oct 2025 16:42:34 -0700

Hey everyone,

I've got a difficult problem with my CephFS that I haven't been able to make any headway with. Maybe you guys can help?



 The problem

I’m trying to provision a volume on a CephFS, using a Ceph cluster installed on Kubernetes (K3s) using Rook, but I’m running into the following error (from the Events in |kubectl describe|):

|Events:
  Type     Reason                  Age    From                     Message
  ----     ------                  ----   ----                     -------
  Normal   Scheduled               4m24s  default-scheduler        Successfully assigned archie/ceph-loader-7989b64fb5-m8ph6 to archie
  Normal   SuccessfulAttachVolume  4m24s  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-95b6ca46-cf51-4e58-9bb5-114f00aa4267"
  Warning  FailedMount             3m18s  kubelet                  MountVolume.MountDevice failed for volume "pvc-95b6ca46-cf51-4e58-9bb5-114f00aa4267" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph csi-cephfs-nod...@039a3dba-d55c-476f-90f0-8783a18338aa.main-ceph-fs=/volumes/csi/csi-vol-25d616f5-918f-4e15-bfd6-55b866f9aa9f/4bda56a4-5088-451c-90c8-baa83317d5a5 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/3e10b46e93bcc2c4d3d1b343af01ee628c736ffee7e562e99d478bc397dab10d/globalmount -o mon_addr=10.43.233.111:3300/10.43.237.205:3300/10.43.39.81:3300,secretfile=/tmp/csi/keys/keyfile-2996214224,_netdev] stderr: mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized|
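For anyone following along: the |-o| options in the mount arguments above carry the monitor endpoints in |mon_addr|, separated by |/|. The sketch below is not part of the original troubleshooting. It only illustrates how those endpoints could be split out and probed over TCP from the node where the mount failed. The function names |mon_endpoints| and |probe_mons| are hypothetical, the probe relies on bash's |/dev/tcp| and the coreutils |timeout| command, and it assumes IPv4 |host:port| endpoints like the ones in the error.

```shell
#!/usr/bin/env bash
# Print one monitor endpoint per line from a mount option string such as
# "mon_addr=a:3300/b:3300,secretfile=/path,_netdev".
mon_endpoints() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^mon_addr=//p' | tr '/' '\n'
}

# Attempt a TCP connection to each endpoint and report the result.
# Assumes IPv4 host:port pairs; uses bash's /dev/tcp pseudo-device.
probe_mons() {
  mon_endpoints "$1" | while IFS=: read -r host port; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done
}
```

Usage would look like |probe_mons 'mon_addr=10.43.233.111:3300/10.43.237.205:3300/10.43.39.81:3300,secretfile=...,_netdev'|, run on the node named in the Scheduled event. A successful TCP connection only shows that the port is open; it says nothing about authentication or MDS state.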

I’m kind of new to K8s, and /very/ new to Ceph, so I would love some advice on how to go about debugging this mess.



 General context

*Kubernetes distribution*: K3s

*Kubernetes version(s)*: v1.33.4+k3s1 (master), v1.32.7+k3s1 (workers)

*Ceph*: installed via Rook

*Nodes*: 3

*OS*: Linux (Arch on master, NixOS on workers)


 What I’ve checked/tried

*Note*: Since this is a Rook deployment of Ceph (on Kubernetes), all these checks are performed in the Rook Toolbox <https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/> container.
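For readers reproducing these checks, here is a minimal sketch of a wrapper that runs a command inside the toolbox. The namespace |rook-ceph| and the deployment name |rook-ceph-tools| are assumptions taken from the defaults in the Rook documentation, not from this post, and |in_toolbox| is a hypothetical helper name. The |KUBECTL| variable is there so the binary can be swapped (for example to |k3s kubectl|).

```shell
#!/usr/bin/env bash
# Assumed Rook defaults; override via the environment if your deployment differs.
ROOK_NS="${ROOK_NS:-rook-ceph}"
TOOLBOX="${TOOLBOX:-deploy/rook-ceph-tools}"

# Run the given command inside the toolbox pod via "kubectl exec".
# KUBECTL is intentionally unquoted so a value like "k3s kubectl" splits into words.
in_toolbox() {
  ${KUBECTL:-kubectl} exec -n "$ROOK_NS" "$TOOLBOX" -- "$@"
}
```

With this in place, the checks below could be run from outside the pod as |in_toolbox ceph status| or |in_toolbox ceph fs status|.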



   MDS status / Ceph cluster health

Even I know this is the first go-to when your Ceph cluster is giving you issues. I have the Rook toolbox <https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/> running on my K8s cluster, so I went into the toolbox pod and ran:

|$ ceph status
  cluster:
    id:     039a3dba-d55c-476f-90f0-8783a18338aa
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,c,b (age 7d)
    mgr: b(active, since 7d), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 7d), 3 in (since 2w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 81 pgs
    objects: 47 objects, 3.2 MiB
    usage:   139 MiB used, 502 GiB / 502 GiB avail
    pgs:     81 active+clean

  io:
    client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr|

Since the error we started out with was |mount error: no mds (Metadata Server) is up|, I checked the |ceph status| output above for the status of the metadata server. As you can see, the cluster reports 1/1 MDS daemons up plus a hot standby.



   CephFS Status

|$ ceph fs status
main-ceph-fs - 0 clients
============
RANK      STATE            MDS          ACTIVITY    DNS  INOS  DIRS  CAPS
 0        active      main-ceph-fs-a  Reqs: 0 /s    143    38    37     0
0-s   standby-replay  main-ceph-fs-b  Evts: 0 /s    159    30    29     0
                POOL                     TYPE     USED  AVAIL
       main-ceph-fs-metadata           metadata  4176k   158G
      main-ceph-fs-replicated            data       0    158G
main-ceph-fs-main-ceph-fs-replicated     data       0    158G
  STANDBY MDS
main-ceph-fs-d
main-ceph-fs-c
MDS version: ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable)|



   Ceph authorizations for MDS

Since the other part of the error indicated that I might not be authorized, I wanted to check what the authorizations were:

|$ ceph auth ls
mds.main-ceph-fs-a  # main MDS for my CephFS instance
        key: <base64 key>
        caps: [mds] allow
        caps: [mon] allow profile mds
        caps: [osd] allow *
mds.main-ceph-fs-b  # standby MDS for my CephFS instance
        key: <different base64 key>
        caps: [mds] allow
        caps: [mon] allow profile mds
        caps: [osd] allow *
...
client.csi-cephfs-node.1  # the client mentioned in the error message
        key: <another base64 key>
        caps: [mds] allow rw
        caps: [mgr] allow rw
        caps: [mon] allow r
        caps: [osd] allow rwx tag cephfs metadata=*, allow rw tag cephfs data=*
...  # more after this|


Note: |main-ceph-fs| is the name I gave my CephFS file system.

It looks like this should be okay, but I’m not sure. Definitely open to some more insight here.



   PersistentVolumeClaim binding

I checked to make sure the PersistentVolume was provisioned successfully from the PersistentVolumeClaim, and that it bound appropriately:

|$ kubectl get pvc -n archie jellyfin-ceph-pvc
NAME                STATUS   VOLUME                                     CAPACITY
jellyfin-ceph-pvc   Bound    pvc-95b6ca46-cf51-4e58-9bb5-114f00aa4267   180Gi|



   Changing the PVC size to something smaller

I tried changing the PVC’s size from 180GB to 1GB, since I thought it might be a free space issue, but the error persisted.



   Turning off firewalls

I turned off all firewalls to see if they were the cause, and still no luck.


 I’m not quite sure where to go from here.

What am I missing? What context should I add? What should I try? What should I check?



Thank you so much in advance,

Martin
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
