[ceph-users] Clients failing to respond to capability release

2023-09-19 Thread Tim Bishop
Hi,

I've seen this issue mentioned in the past, but with older releases. So
I'm wondering if anybody has any pointers.

The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
clients are working fine, with the exception of our backup server. This
is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
(so I suspect a newer Ceph version?).

The backup server has multiple (12) CephFS mount points. One of them,
the busiest, regularly causes this error on the cluster:

HEALTH_WARN 1 clients failing to respond to capability release
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
    mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing to respond to capability release client_id: 521306112
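
(Side note in case it helps anyone reproduce: a rough way to cross-check the
client id from that warning is the MDS session list on the cluster side and
the kernel client's debugfs caps file on the client side. Something like the
following - MDS name and client id are obviously ours, and the jq filter is
just shorthand:

  # on an admin node
  ceph tell mds.mds-server session ls | jq '.[] | select(.id == 521306112)'

  # on the backup server itself (needs debugfs mounted)
  cat /sys/kernel/debug/ceph/*/caps | wc -l

The second gives a crude idea of how many caps the client is sitting on.)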

And occasionally, which may be unrelated, but occurs at the same time:

[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs

The second one clears itself, but the first sticks until I can unmount
the filesystem on the client after the backup completes.

It appears that whilst it's in this stuck state there may be one or more
directory trees that are inaccessible to all clients. The backup server
is walking the whole tree but never gets stuck itself, so either the
inaccessible directory entry is caused after it has gone past, or it's
not affected. Maybe the backup server is holding a directory when it
shouldn't?

It may be that an upgrade to Quincy resolves this, since it's more
likely to be in line with the kernel client version-wise, but I don't
want to knee-jerk upgrade just to try and fix this problem.

Thanks for any advice.

Tim.

[1] The reason for the newer kernel is that the backup performance from
CephFS was terrible with older kernels. This newer kernel does at least
resolve that issue.


[ceph-users] Re: Clients failing to respond to capability release

2023-09-20 Thread Tim Bishop
Hi Stefan,

On Wed, Sep 20, 2023 at 11:00:12AM +0200, Stefan Kooman wrote:
> On 19-09-2023 13:35, Tim Bishop wrote:
> > The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
> > clients are working fine, with the exception of our backup server. This
> > is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
> > (so I suspect a newer Ceph version?).
> > 
> > The backup server has multiple (12) CephFS mount points. One of them,
> > the busiest, regularly causes this error on the cluster:
> > 
> > HEALTH_WARN 1 clients failing to respond to capability release
> > [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability 
> > release
> >  mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing 
> > to respond to capability release client_id: 521306112
> > 
> > And occasionally, which may be unrelated, but occurs at the same time:
> > 
> > [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >  mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
> > 
> > The second one clears itself, but the first sticks until I can unmount
> > the filesystem on the client after the backup completes.
> 
> You are not alone. We also have a backup server running 22.04 and 6.2 and
> occasionally hit this issue. We hit this with mainly 5.12.19 clients and a
> 6.2 backup server. We're on 16.2.11.
> 
> 
> Sidenote:
> 
> For those of you who are wondering: why would you want to use the latest
> (greatest?) Linux kernel for CephFS ... this is why. To try to get rid of 1)
> slow requests because of some deadlock / locking issue and 2) clients failing
> to respond to capability release, and 3) to pick up bug fixes / improvements
> (thx devs!).
> 
> 
> Questions:
> 
> Do you have the filesystem mounted read-only, and have you given the backup
> server's CephFS client read-only caps on the MDS?

Yes, mounted read-only and the caps for the client are read-only for the
MDS.
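
(For reference, that's the plain read-only style of caps you'd get from
something like:

  ceph fs authorize <fsname> client.cephfs-backupserver / r

with the actual filesystem name elided - so nothing writable is granted on
the MDS side.)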

I do have multiple mounts from the same CephFS filesystem though, and
I've been wondering if that could be causing more parallel requests from
the backup server. I'd been thinking about doing it through a single
mount, but then all the paths change which doesn't make the backups
overly happy.
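
(The least-bad idea I've had so far is a single real mount plus bind mounts to
keep the old paths stable - roughly, with made-up paths:

  mount -t ceph mon1,mon2,mon3:/ /cephfs -o name=cephfs-backupserver,secretfile=/etc/ceph/backup.secret,ro
  mount --bind /cephfs/teaching /srv/backup/teaching
  mount --bind /cephfs/research /srv/backup/research

though whether one session instead of twelve actually changes the caps
behaviour, I honestly don't know.)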

> Are you running a multiple active MDS setup?

No. We tried it for a while but after seeing some issues like this we
backtracked to a single active MDS to rule out multiple active being the
issue.

> > It appears that whilst it's in this stuck state there may be one or more
> > directory trees that are inaccessible to all clients. The backup server
> > is walking the whole tree but never gets stuck itself, so either the
> > inaccessible directory entry is caused after it has gone past, or it's
> > not affected. Maybe the backup server is holding a directory when it
> > shouldn't?
> 
> We have seen both cases, yet most of the time the backup server would not be
> able to make progress and be stuck on a file.

Interesting. Backups have never got stuck for us, whilst we regularly (pretty
much daily) see the above-mentioned error.

But because nothing we're directly running gets stuck, I only find out that a
directory somewhere is inaccessible when a user reports it to us from one of
our other client machines, usually an HPC node.

> > It may be that an upgrade to Quincy resolves this, since it's more
> > likely to be in line with the kernel client version-wise, but I don't
> > want to knee-jerk upgrade just to try and fix this problem.
> 
> We are testing with 6.5 kernel clients (see other recent threads about
> this). We have not seen this issue there (but time will tell; it does not
> happen *that* often, and we have hit other issues).
> 
> The MDS server itself is indeed older than the newer kernel clients. It
> might certainly be a factor. And that raises the question of what kind of
> interoperability / compatibility tests (if any) are done between CephFS
> (kernel) clients and MDS server versions. This might be a good "focus topic"
> for a Ceph User + Dev meeting ...
> 
> > Thanks for any advice.
> 
> You might want to try a 6.5.x kernel on the clients. But you might run into
> other issues. Not sure about that; they might only be relevant for one of our
> workloads, only one way to find out ...

I've been sticking with what's available in Ubuntu - the 6.2 kernel is part of
their HWE (hardware enablement) stack, which is handy. It won't be long until
23.10 is out with the 6.5 kernel though. I'll definitely give it a try then.

> Enable debug logging on the MDS to gather logs that might shine some light
> on what is happening wit

[ceph-users] Re: Clients failing to respond to capability release

2023-10-12 Thread Tim Bishop
his automatically and
>now we have no incidents of this.
> 
> Yesterday for example, we had an incident with multiple MDS warning messages:
> MDS_SLOW_REQUEST, MDS_TRIM, MDS_CLIENT_RECALL, and MDS_CLIENT_LATE_RELEASE.
> This was caused by a non-responsive hard drive leading to a build-up of the
> MDS cache and trims being unable to complete. We managed to narrow it down to
> the hard drive holding the inode which the blocked client was waiting for an
> rdlock on, and restarted the OSD for that drive. Notably this hard drive had
> no errors with smartctl or elsewhere, and only had the following slow ops
> message in the OSD's systemctl status / log:
> 
>     osd.281 123892 get_health_metrics reporting 4 slow ops, oldest is
> osd_op(client.49576654.0:9101 3.d9bs0 3.ee6f2d9b (undecoded)
> ondisk+read+known_if_redirected e123889)
> 
> Whilst restarting the hard drive that the client with the oldest blocked op
> was waiting for did "clear" the /sys/kernel/debug/ceph/*/osdc queue, the
> oldest blocked MDS op then became an "internal op fragmentdir:mds.0:1" one,
> which restarting the active MDS cleared. Alas, this resulted in another
> blocked getattr op at the flag point "failed to authpin, dir is being
> fragmented", which was similarly tackled by restarting the MDS that had just
> taken over. This finally left only two clients failing to respond to caps
> releases on inodes they were holding (despite rebooting at the time);
> performing a "ceph tell mds.N session kill CLIENT_ID" removed them from the
> session map and allowed the MDS's cache to become manageable again, thereby
> clearing all of these warning messages.
> 
> We've had this problem since the beginning of this year, and upgrading from
> Octopus to Quincy has unfortunately not solved it. We've only really been
> able to keep it at bay through an aggressive campaign of replacing hard
> drives which were reaching the end of their lives. This has substantially
> reduced the number of incidents we've had in relation to this.
> 
> We would be very interested to hear about the rest of the community's
> experience in relation to this, and I would recommend looking at your
> underlying OSDs, Tim, to see whether there are any timeout or uncorrectable
> errors. We would also be very eager to hear if these approaches are
> sub-optimal and whether anyone else has any insight into our problems. Sorry
> as well for resurrecting an old thread, but we thought our experiences may be
> helpful for others!
> 
> Kindest regards,
> 
> Ivan Clayson
> 
> On 19/09/2023 12:35, Tim Bishop wrote:
> > Hi,
> > 
> > I've seen this issue mentioned in the past, but with older releases. So
> > I'm wondering if anybody has any pointers.
> > 
> > The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
> > clients are working fine, with the exception of our backup server. This
> > is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
> > (so I suspect a newer Ceph version?).
> > 
> > The backup server has multiple (12) CephFS mount points. One of them,
> > the busiest, regularly causes this error on the cluster:
> > 
> > HEALTH_WARN 1 clients failing to respond to capability release
> > [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability 
> > release
> >  mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing 
> > to respond to capability release client_id: 521306112
> > 
> > And occasionally, which may be unrelated, but occurs at the same time:
> > 
> > [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >  mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
> > 
> > The second one clears itself, but the first sticks until I can unmount
> > the filesystem on the client after the backup completes.
> > 
> > It appears that whilst it's in this stuck state there may be one or more
> > directory trees that are inaccessible to all clients. The backup server
> > is walking the whole tree but never gets stuck itself, so either the
> > inaccessible directory entry is caused after it has gone past, or it's
> > not affected. Maybe the backup server is holding a directory when it
> > shouldn't?
> > 
> > It may be that an upgrade to Quincy resolves this, since it's more
> > likely to be in line with the kernel client version-wise, but I don't
> > want to knee-jerk upgrade just to try and fix this problem.
> > 
> > Thanks for any advice.
> > 
> > Tim.
> > 

[ceph-users] Advice on balancing data across OSDs

2022-10-24 Thread Tim Bishop
Hi all,

ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)

We're having an issue with the spread of data across our OSDs. We have
108 OSDs in our cluster, all identical disk size, same number in each
server, and the same number of servers in each rack. So I'd hoped we'd
end up with a pretty balanced distribution of data across the disks.
However, the fullest is at 85% full and the most empty is at 40% full.

I've included the osd df output below, along with pool and crush rules.

I've also looked at the reweight-by-utilization command which would
apparently help:

# ceph osd test-reweight-by-utilization
moved 16 / 5715 (0.279965%)
avg 52.9167
stddev 7.20998 -> 7.15325 (expected baseline 7.24063)
min osd.45 with 31 -> 31 pgs (0.585827 -> 0.585827 * mean)
max osd.22 with 70 -> 68 pgs (1.32283 -> 1.28504 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.6229
overload_utilization 0.7474
osd.22 weight 1. -> 0.9500
osd.23 weight 1. -> 0.9500
osd.53 weight 1. -> 0.9500
osd.78 weight 1. -> 0.9500
no change

But I'd like to make sure I understand why the problem is occurring
first so I can rule out a configuration issue, since it feels like the
cluster shouldn't be getting into this state in the first place.

I have some suspicions that the number of PGs may be a bit low on some
pools, but autoscale-status is set to "on" or "warn" for every pool, so
it's happy with the current numbers. Does it play nice with CephFS?
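
(To be concrete, "happy" here means no suggested changes from:

  ceph osd pool autoscale-status

and no POOL_TOO_FEW_PGS warning showing up in "ceph health detail".)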

Thanks for any advice.
Tim.

ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL     %USE   VAR   PGS  STATUS
 22  hdd    3.63199  1.0       3.6 TiB  3.1 TiB  3.1 TiB  450 MiB  7.6 GiB   557 GiB  85.04  1.37   70  up
 23  hdd    3.63199  1.0       3.6 TiB  2.9 TiB  2.9 TiB  459 MiB  7.5 GiB   759 GiB  79.64  1.28   64  up
 53  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  703 MiB  8.0 GiB   823 GiB  77.91  1.25   66  up
 78  hdd    3.63799  1.0       3.6 TiB  2.8 TiB  2.8 TiB  187 MiB  5.9 GiB   851 GiB  77.15  1.24   61  up
 26  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  432 MiB  7.7 GiB   854 GiB  77.07  1.24   61  up
 39  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  503 MiB  7.2 GiB   874 GiB  76.55  1.23   65  up
 42  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.7 TiB  439 MiB  6.6 GiB   909 GiB  75.59  1.21   60  up
101  hdd    3.63820  1.0       3.6 TiB  2.7 TiB  2.7 TiB  306 MiB  7.1 GiB   913 GiB  75.50  1.21   61  up
 87  hdd    3.63820  1.0       3.6 TiB  2.7 TiB  2.7 TiB  539 MiB  7.5 GiB   921 GiB  75.27  1.21   61  up
 59  hdd    3.63799  1.0       3.6 TiB  2.7 TiB  2.7 TiB  721 MiB  7.9 GiB   957 GiB  74.30  1.19   64  up
 79  hdd    3.63799  1.0       3.6 TiB  2.7 TiB  2.7 TiB  950 MiB  9.0 GiB   970 GiB  73.95  1.19   58  up
 34  hdd    3.63199  1.0       3.6 TiB  2.7 TiB  2.7 TiB  202 MiB  6.0 GiB   974 GiB  73.85  1.19   57  up
 60  hdd    3.63799  1.0       3.6 TiB  2.7 TiB  2.6 TiB  668 MiB  7.2 GiB  1009 GiB  72.91  1.17   59  up
 18  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.6 TiB  453 MiB  6.5 GiB  1021 GiB  72.59  1.17   60  up
 74  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  693 MiB  7.5 GiB   1.0 TiB  72.12  1.16   62  up
 19  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.6 TiB  655 MiB  7.9 GiB   1.0 TiB  71.71  1.15   63  up
 69  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  445 MiB  6.2 GiB   1.0 TiB  71.70  1.15   65  up
 43  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.6 TiB  170 MiB  4.7 GiB   1.0 TiB  71.62  1.15   63  up
 97  hdd    3.63820  1.0       3.6 TiB  2.6 TiB  2.6 TiB  276 MiB  5.7 GiB   1.0 TiB  71.33  1.15   66  up
 67  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  430 MiB  6.3 GiB   1.0 TiB  71.22  1.14   54  up
 68  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  419 MiB  6.6 GiB   1.1 TiB  70.68  1.13   58  up
 31  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.5 TiB  419 MiB  5.2 GiB   1.1 TiB  70.16  1.13   63  up
 48  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.5 TiB  211 MiB  5.0 GiB   1.1 TiB  70.13  1.13   56  up
 73  hdd    3.63799  1.0       3.6 TiB  2.5 TiB  2.5 TiB  765 MiB  7.1 GiB   1.1 TiB  69.52  1.12   57  up
 98  hdd    3.63820  1.0       3.6 TiB  2.5 TiB  2.5 TiB  552 MiB  7.1 GiB   1.1 TiB  68.72  1.10   60  up
 58  hdd    3.63799  1.0       3.6 TiB  2.5 TiB  2.5 TiB  427 MiB  6.3 GiB   1.2 TiB  68.39  1.10   53  up
 14  hdd    3.63199  1.0       3.6 TiB  2.5 TiB  2.5 TiB  409 MiB  6.0 GiB   1.2 TiB  68.06  1.09   65  up
 47  hdd    3.63199  1.0       3.6 TiB  2.5 TiB  2.5 TiB  166 MiB  5.5 GiB   1.2 TiB  67.84  1.09   55  up
  9  hdd    3.63199  1.0       3.6 TiB  2.5 TiB  2.5 TiB  419 MiB  5.9 GiB   1.2 TiB  67.78  1.09   58  up
 90  hdd    3.63820  1.0       3.6 TiB  2.5 TiB  2.5 TiB  277 MiB  6.3 GiB   1.2 TiB  67.56  1.08   57

[ceph-users] Re: Advice on balancing data across OSDs

2022-10-24 Thread Tim Bishop
Hi Josh,

On Mon, Oct 24, 2022 at 07:20:46AM -0600, Josh Baergen wrote:
> > I've included the osd df output below, along with pool and crush rules.
> 
> Looking at these, the balancer module should be taking care of this
> imbalance automatically. What does "ceph balancer status" say?

# ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.038795",
"last_optimize_started": "Mon Oct 24 15:35:43 2022",
"mode": "upmap",
"optimize_result": "Optimization plan created successfully",
"plans": []
}

Looks healthy?

This cluster is on Pacific but has been upgraded through numerous
previous releases, so it is possible some settings have been inherited
and are not the same as the defaults on a new cluster.
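
One thing on my list to try is tightening the balancer's deviation target -
by default upmap only balances to within 5 PGs per OSD
(mgr/balancer/upmap_max_deviation), which is a fair chunk when the average is
only ~53 PGs per OSD:

  ceph config set mgr mgr/balancer/upmap_max_deviation 1

I haven't tried it here yet though, so treat that as a thought rather than a
fix.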

Tim.



[ceph-users] Re: Advice on balancing data across OSDs

2022-10-24 Thread Tim Bishop
Hi Joseph,

Here are some of the larger pools. Notably, the largest (pool 51, 32 TiB of
CephFS data) doesn't have the highest number of PGs.

POOL   ID  PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
pool28 28  256  9.9 TiB2.61M   30 TiB  43.28 13 TiB
pool29 29  256  9.5 TiB2.48M   28 TiB  42.13 13 TiB
pool36 36  128  6.0 TiB1.58M   18 TiB  31.67 13 TiB
pool39 39  256   20 TiB5.20M   60 TiB  60.37 13 TiB
pool43 43   32  1.9 TiB  503.92k  5.7 TiB  12.79 13 TiB
pool46 46   32  1.3 TiB  236.34k  3.9 TiB   9.14 13 TiB
pool47 47  128  4.0 TiB1.04M   12 TiB  23.35 13 TiB
pool51 51  128   32 TiB   32.47M   55 TiB  58.30 26 TiB
pool53 53  128  3.3 TiB  864.88k  9.9 TiB  20.21 13 TiB
pool57 57  128   14 TiB3.55M   21 TiB  34.80 26 TiB

It does sound like I need to increase that, but I had assumed the
autoscaler would have produced a warning if that were the case... it
certainly has for some pools in the past, and I've adjusted as per its
recommendations.
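
If I do end up bumping one, my understanding is that on Pacific it's just a
pg_num change on the pool and pgp_num follows along on its own, e.g. (pool
name from the table above, target picked purely for illustration):

  ceph osd pool set pool51 pg_num 256

but I'd want to double-check the GB-per-PG numbers before touching anything.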

Tim.

On Mon, Oct 24, 2022 at 09:24:58AM -0400, Joseph Mundackal wrote:
> Hi Tim,
> You might want to check your pool utilization and see if there are
> enough PGs in that pool. A higher GB-per-PG ratio can result in this scenario.
> 
> I am also assuming that you have the balancer module turned on ("ceph balancer
> status" should tell you that as well).
> 
> If you have enough PGs in the bigger pools and the balancer module is on,
> you shouldn't have to manually reweight OSDs.
> 
> -Joseph
> 
> On Mon, Oct 24, 2022 at 9:13 AM Tim Bishop  wrote:
> 
> > Hi all,
> >
> > ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
> > (stable)
> >
> > We're having an issue with the spread of data across our OSDs. We have
> > 108 OSDs in our cluster, all identical disk size, same number in each
> > server, and the same number of servers in each rack. So I'd hoped we'd
> > end up with a pretty balanced distribution of data across the disks.
> > However, the fullest is at 85% full and the most empty is at 40% full.
> >
> > I've included the osd df output below, along with pool and crush rules.
> >
> > I've also looked at the reweight-by-utilization command which would
> > apparently help:
> >
> > # ceph osd test-reweight-by-utilization
> > moved 16 / 5715 (0.279965%)
> > avg 52.9167
> > stddev 7.20998 -> 7.15325 (expected baseline 7.24063)
> > min osd.45 with 31 -> 31 pgs (0.585827 -> 0.585827 * mean)
> > max osd.22 with 70 -> 68 pgs (1.32283 -> 1.28504 * mean)
> >
> > oload 120
> > max_change 0.05
> > max_change_osds 4
> > average_utilization 0.6229
> > overload_utilization 0.7474
> > osd.22 weight 1. -> 0.9500
> > osd.23 weight 1. -> 0.9500
> > osd.53 weight 1. -> 0.9500
> > osd.78 weight 1. -> 0.9500
> > no change
> >
> > But I'd like to make sure I understand why the problem is occurring
> > first so I can rule out a configuration issue, since it feels like the
> > cluster shouldn't be getting into this state in the first place.
> >
> > I have some suspicions that the number of PGs may be a bit low on some
> > pools, but autoscale-status is set to "on" or "warn" for every pool, so
> > it's happy with the current numbers. Does it play nice with CephFS?
> >
> > Thanks for any advice.
> > Tim.
> >
> > ID   CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META
> >  AVAIL %USE   VAR   PGS  STATUS
> >  22hdd  3.63199   1.0  3.6 TiB  3.1 TiB  3.1 TiB  450 MiB  7.6
> > GiB   557 GiB  85.04  1.37   70  up
> >  23hdd  3.63199   1.0  3.6 TiB  2.9 TiB  2.9 TiB  459 MiB  7.5
> > GiB   759 GiB  79.64  1.28   64  up
> >  53hdd  3.63199   1.0  3.6 TiB  2.8 TiB  2.8 TiB  703 MiB  8.0
> > GiB   823 GiB  77.91  1.25   66  up
> >  78hdd  3.63799   1.0  3.6 TiB  2.8 TiB  2.8 TiB  187 MiB  5.9
> > GiB   851 GiB  77.15  1.24   61  up
> >  26hdd  3.63199   1.0  3.6 TiB  2.8 TiB  2.8 TiB  432 MiB  7.7
> > GiB   854 GiB  77.07  1.24   61  up
> >  39hdd  3.63199   1.0  3.6 TiB  2.8 TiB  2.8 TiB  503 MiB  7.2
> > GiB   874 GiB  76.55  1.23   65  up
> >  42hdd  3.63199   1.0  3.6 TiB  2.8 TiB  2.7 TiB  439 MiB  6.6
> > GiB   909 GiB  75.59  1.21   60  up
> > 101hdd  3.63820   1.0  3.6 TiB  2.7 TiB  2.7 TiB  306 MiB  7.1

[ceph-users] Re: telemetry.ceph.com certificate expired

2020-04-15 Thread Tim Bishop
Yup, us too. I filed a bug report earlier:

https://tracker.ceph.com/issues/45096

Hopefully it can be fixed soon!
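
In the meantime, turning the module off should silence the health error until
the cert is renewed:

  ceph telemetry off
  # and later, once it's fixed:
  ceph telemetry on

(a mgr failover via "ceph mgr fail" may be needed if the "module has failed"
message lingers - untested here, so take that with a pinch of salt).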

Tim.

On Wed, Apr 15, 2020 at 02:27:51PM +0200, Eneko Lacunza wrote:
> Hi all,
> 
> We're receiving a certificate error for the telemetry module:
> Module 'telemetry' has failed:
> HTTPSConnectionPool(host='telemetry.ceph.com', port=443): Max retries
> exceeded with url: /report (Caused by SSLError(SSLError("bad handshake:
> Error([('SSL routines', 'tls_process_server_certificate', 'certificate
> verify failed')],)",),));
> 
> Seems the certificate expired yesterday (14th April).
> 
> Cheers
> Eneko

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55