[ceph-users] Re: cephadm and remoto package

2023-06-26 Thread Florian Haas

Hi Shashi,

I just ran into this myself, and I thought I'd share the 
solution/workaround that I applied.


On 15/05/2023 22:08, Shashi Dahal wrote:

Hi,
I followed this documentation:

https://docs.ceph.com/en/pacific/cephadm/adoption/

This is the error I get when trying to enable cephadm.

ceph mgr module enable cephadm

Error ENOENT: module 'cephadm' reports that it cannot run on the active
manager daemon: loading remoto library:No module named 'remoto' (pass
--force to force enablement)

When I import remoto, it imports just fine.


OS is ubuntu 20.04 focal



As far as I can see, this issue applies to non-containerized Ceph 
Pacific deployments — such as ones orchestrated with ceph-ansible — 
running on Debian or Ubuntu. There is no python3-remoto package on those 
platforms, so you can't install remoto by "regular" installation means 
(that is, apt/apt-get).


It looks to me like this issue was introduced in Pacific, and then went 
away in Quincy because that release dropped remoto and replaced it with 
asyncssh (for which a Debian/Ubuntu package does exist). If you start 
out on Octopus with ceph-ansible and do the Cephadm migration *then*, 
you're apparently fine too, and you can subsequently use Cephadm to 
upgrade to Pacific and Quincy. I think Cephadm adoption breaks only in 
this particular combination: (a) running on Debian/Ubuntu, (b) deploying 
non-containerized, *and* (c) starting your deployment on Pacific.


The problem has apparently been known for a while (see 
https://tracker.ceph.com/issues/43415), but the recommendation appears 
to have been "just run mgr on a different OS then", which is frequently 
not a viable option.


I tried (like you did, I assume) to just pip-install remoto. If I 
opened a Python console and typed "import remoto", it imported just 
fine, but the cephadm mgr module apparently still didn't like that.


I've now traced this down to the following line that shows up in the 
ceph-mgr log if you bump "debug mgr" to 10/10:


2023-06-26T10:01:34.799+ 7fb0979ba500 10 mgr[py] Computed sys.path 
'/usr/share/ceph/mgr:/local/lib/python3.8/dist-packages:/lib/python3/dist-packages:/lib/python3.8/dist-packages:lib/python38.zip:/lib/python3.8:/lib/python3.8/lib-dynload'


Note the /local/lib/python3.8/dist-packages path, which does not exist 
on Ubuntu Focal. The correct path is /usr/local/lib/python3.8/dist-packages, 
which is where "pip install", run as root outside a virtualenv, installs 
packages.
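
A quick way to double-check where pip actually put the module, just as
a sanity check:

$ python3 -c 'import remoto, sys; print(remoto.__file__); print(sys.path)'

If the printed module path starts with /usr/local/lib/python3.8/dist-packages
while sys.path only lists /local/lib/python3.8/dist-packages, you're
hitting exactly the mismatch described above.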


I think the incorrect sys.path may actually be a build or packaging bug 
in the community packages built for Debian/Ubuntu, but I'm not 100% certain.


At any rate, the combined workaround for this issue, for me, is:

(1) pip install remoto (this installs remoto into 
/usr/local/lib/python3.8/dist-packages)
(2) ln -s /usr/local/lib/python3.8/dist-packages 
/local/lib/python3.8/dist-packages (this makes pip-installed packages 
available to ceph-mgr)

(3) restart all ceph-mgr instances
(4) ceph mgr module enable cephadm
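
For reference, on Ubuntu Focal the whole thing boils down to something
like this (a sketch, assuming Python 3.8, a root shell and
non-containerized daemons; the systemd target name is an assumption
that may not match your deployment):

# pip install remoto
# ln -s /usr/local/lib/python3.8/dist-packages /local/lib/python3.8/dist-packages
# systemctl restart ceph-mgr.target
# ceph mgr module enable cephadm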

Cheers,
Florian


[ceph-users] Re: A change in Ceph leadership...

2021-10-18 Thread Florian Haas

On 15/10/2021 17:13, Josh Durgin wrote:

Thanks so much Sage, it's difficult to put into words how much you've
done over the years. You're always a beacon of the best aspects of open
source - kindness, wisdom, transparency, and authenticity. So many folks
have learned so much from you, and that's reflected in the vibrant Ceph
community around the world.

All the best in whatever you do in the future!
Josh


I wanted to write something very similar but Josh put it perfectly. 
Seconded. Thank you Sage!


Cheers,
Florian


[ceph-users] Re: Bogus Entries in RGW Usage Log / Large omap object in rgw.log pool

2019-10-29 Thread Florian Haas
Hi David,

On 28/10/2019 20:44, David Monschein wrote:
> Hi All,
> 
> Running an object storage cluster, originally deployed with Nautilus
> 14.2.1 and now running 14.2.4.
> 
> Last week I was alerted to a new warning from my object storage cluster:
> 
> [root@ceph1 ~]# ceph health detail
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
>     1 large objects found in pool 'default.rgw.log'
>     Search the cluster log for 'Large omap object found' for more details.
> 
> I looked into this and found the object and pool in question
> (default.rgw.log):
> 
> [root@ceph1 /var/log/ceph]# grep -R -i 'Large omap object found' .
> ./ceph.log:2019-10-24 12:21:26.984802 osd.194 (osd.194) 715 : cluster
> [WRN] Large omap object found. Object: 5:0fbdcb32:usage::usage.17:head
> Key count: 702330 Size (bytes): 92881228
> 
> [root@ceph1 ~]# ceph --format=json pg ls-by-pool default.rgw.log | jq '.[]' | 
> egrep '(pgid|num_large_omap_objects)' | grep -v '"num_large_omap_objects": 
> 0,' | grep -B1 num_large_omap_objects
> "pgid": "5.70",
>   "num_large_omap_objects": 1,
> While I was investigating, I noticed an enormous amount of entries in
> the RGW usage log:
> 
> [root@ceph ~]# radosgw-admin usage show | grep -c bucket
> 223326
> [...]

I recently ran into a similar issue:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AQNGVY7VJ3K6ZGRSTX3E5XIY7DBNPDHW/

You have 702,330 keys on that omap object, so you would have been bitten
by the default for osd_deep_scrub_large_omap_object_key_threshold having
been revised down from 2,000,000 to 200,000 in 14.2.3:

https://github.com/ceph/ceph/commit/d8180c57ac9083f414a23fd393497b2784377735
https://tracker.ceph.com/issues/40583

That's why you didn't see this warning before your recent upgrade.

> There are entries for over 223k buckets! This was pretty scary to see,
> considering we only have maybe 500 legitimate buckets in this fairly new
> cluster. Almost all of the entries in the usage log are bogus entries
> from anonymous users. It looks like someone/something was scanning,
> looking for vulnerabilities, etc. Here are a few example entries, notice
> none of the operations were successful:

Caveat: whether or not you really *want* to trim the usage log is up to
you to decide. If you are suspecting you are dealing with a security
breach, you should definitely export and preserve the usage log before
you trim it, or else delay trimming until you have properly investigated
your problem.

*If* you decide you no longer need those usage log entries, you can use
"radosgw-admin usage trim" with appropriate --start-date, --end-date,
and/or --uid options, to clean them up:

https://docs.ceph.com/docs/nautilus/radosgw/admin/#trim-usage
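
For example, trimming everything up to a given date for a single user
might look roughly like this (a hypothetical invocation; the uid and
date below are placeholders):

radosgw-admin usage trim --uid=someuser --end-date=2019-10-01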

Please let me know if that information is helpful. Thank you!

Cheers,
Florian


[ceph-users] Re: Static website hosting with RGW

2019-10-25 Thread Florian Haas
On 25/10/2019 02:38, Oliver Freyermuth wrote:
> Also, if there's an expert on this: Exposing a bucket under a tenant as 
> static website is not possible since the colon (:) can't be encoded in DNS, 
> right?

There are certainly much better-qualified radosgw experts than I am, but
as I understand it multi-tenanted radosgw is incompatible with bucket
hostnames in general (whether static websites are involved or not), for
the very reason you mention. It's documented here:

https://docs.ceph.com/docs/nautilus/radosgw/multitenancy/#accessing-buckets-with-explicit-tenants
(look for "Note that it’s not possible to supply an explicit tenant
using a hostname").

Cheers,
Florian


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-15 Thread Florian Haas
On 14/10/2019 22:57, Reed Dier wrote:
> I had something slightly similar to you.
> 
> However, my issue was specific/limited to the device_health_metrics pool
> that is auto-created with 1 PG when you turn that mgr feature on.
> 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56315.html

Thank you — yes that does look superficially similar, though in my case
it's an RGW pool. (Also, my sympathy on the OSD crashes; that must have
been quite the jolt.)

However, the similarities unfortunately end where the pg repair fixes
things for you. For me, the scrub error keeps coming back. It's quite odd.

Cheers,
Florian




[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 17:21, Dan van der Ster wrote:
>> I'd appreciate a link to more information if you have one, but a PG
>> autoscaling problem wouldn't really match with the issue already
>> appearing in pre-Nautilus releases. :)
> 
> https://github.com/ceph/ceph/pull/30479

Thanks! But no, this doesn't look like a likely culprit, for the reason
that we also saw this in Luminous and hence, *definitely* without splits
or merges in play.

Has anyone else seen these scrub false positives — if that's what they are?

Cheers,
Florian


[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 13:29, Dan van der Ster wrote:
>> Hi Dan,
>>
>> what's in the log is (as far as I can see) consistent with the pg query
>> output:
>>
>> 2019-10-14 08:33:57.345 7f1808fb3700  0 log_channel(cluster) log [DBG] :
>> 10.10d scrub starts
>> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
>> 10.10d scrub : stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty,
>> 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/11 bytes,
>> 0/0 manifest objects, 0/0 hit_set_archive bytes.
>> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
>> 10.10d scrub 1 errors
>>
>> Have you seen this before?
> 
> Yes occasionally we see stat mismatches -- repair always fixes
> definitively though.

Not here, sadly. That error keeps coming back, always in the same PG,
and only in that PG.

> Are you using PG autoscaling? There's a known issue there which
> generates stat mismatches.

I'd appreciate a link to more information if you have one, but a PG
autoscaling problem wouldn't really match with the issue already
appearing in pre-Nautilus releases. :)

Cheers,
Florian



[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 13:20, Dan van der Ster wrote:
> Hey Florian,
> 
> What does the ceph.log ERR or ceph-osd log show for this inconsistency?
> 
> -- Dan

Hi Dan,

what's in the log is (as far as I can see) consistent with the pg query
output:

2019-10-14 08:33:57.345 7f1808fb3700  0 log_channel(cluster) log [DBG] :
10.10d scrub starts
2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
10.10d scrub : stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty,
0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/11 bytes,
0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-10-14 08:33:57.345 7f1808fb3700 -1 log_channel(cluster) log [ERR] :
10.10d scrub 1 errors

Have you seen this before?

Cheers,
Florian


[ceph-users] Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
Hello,

I am running into an "interesting" issue with a PG that is being flagged
as inconsistent during scrub (causing the cluster to go to HEALTH_ERR),
but doesn't actually appear to contain any inconsistent objects.

$ ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 10.10d is active+clean+inconsistent, acting [15,13]

$ rados list-inconsistent-obj 10.10d
{"epoch":12138,"inconsistents":[]}

"ceph pg query" (see below) on that PG does report num_scrub_errors=1,
num_shallow_scrub_errors=1, and num_objects_dirty=1. "osd scrub auto
repair = true" is set on all OSDs, but the PG never auto-repairs. (This
is a test cluster, the pool size is 2 — this may preclude auto repair
from ever kicking in; I'm not sure on that one.)

"ceph pg repair" does repair, but the issue reappears on the next
scheduled scrub.

This issue was first discovered while the cluster was on
Jewel/Filestore. In an event like this I would normally suspect either a
problem with an individual OSD, or a bug in the FileStore code. But the
cluster has had *all* of it's OSDs replaced since, as part of a full
Jewel→Luminous→Nautilus upgrade and a FileStore→BlueStore conversion.
The issue still persists.

A full "ceph pg 10.10d query" result is below. If anyone has ideas on
how to permanently fix this issue, I'd be most grateful.

Thanks!

Cheers,
Florian




{
"state": "active+clean+inconsistent",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 12143,
"up": [
15,
13
],
"acting": [
15,
13
],
"acting_recovery_backfill": [
"13",
"15"
],
"info": {
"pgid": "10.10d",
"last_update": "100'11",
"last_complete": "100'11",
"log_tail": "0'0",
"last_user_version": 11,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 45,
"epoch_pool_created": 45,
"last_epoch_started": 12139,
"last_interval_started": 12138,
"last_epoch_clean": 12139,
"last_interval_clean": 12138,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 12138,
"same_interval_since": 12138,
"same_primary_since": 12114,
"last_scrub": "100'11",
"last_scrub_stamp": "2019-10-14 08:33:57.347097",
"last_deep_scrub": "100'11",
"last_deep_scrub_stamp": "2019-10-11 14:09:29.016946",
"last_clean_scrub_stamp": "2019-10-11 14:09:29.016946"
},
"stats": {
"version": "100'11",
"reported_seq": "4927",
"reported_epoch": "12143",
"state": "active+clean+inconsistent",
"last_fresh": "2019-10-14 08:33:57.347147",
"last_change": "2019-10-14 08:33:57.347147",
"last_active": "2019-10-14 08:33:57.347147",
"last_peered": "2019-10-14 08:33:57.347147",
"last_clean": "2019-10-14 08:33:57.347147",
"last_became_active": "2019-10-11 14:44:09.312226",
"last_became_peered": "2019-10-11 14:44:09.312226",
"last_unstale": "2019-10-14 08:33:57.347147",
"last_undegraded": "2019-10-14 08:33:57.347147",
"last_fullsized": "2019-10-14 08:33:57.347147",
"mapping_epoch": 12138,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 45,
"last_epoch_clean": 12139,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "100'11",
"last_scrub_stamp": "2019-10-14 08:33:57.347097",
"last_deep_scrub": "100'11",
"last_deep_scrub_stamp": "2019-10-11 14:09:29.016946",
"last_clean_scrub_stamp": "2019-10-11 14:09:29.016946",
"log_size": 11,
"ondisk_log_size": 11,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": true,
"pin_stats_invalid": true,
"manifest_stats_invalid": true,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 11,
"num_objects": 1,
"num_object_clones": 0,
"num_object_copies": 2,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 1,
"num_whiteouts": 0,
"num_read": 33,
"num_read_kb": 22,
"num_write": 11,

[ceph-users] Re: Large omap objects in radosgw .usage pool: is there a way to reshard the rgw usage log?

2019-10-09 Thread Florian Haas
On 09/10/2019 09:07, Florian Haas wrote:
> Also, is anyone aware of any adverse side effects of increasing these
> thresholds, and/or changing the usage log sharding settings, that I
> should keep in mind here?

Sorry, I should have checked the latest in the list archives; Paul
Emmerich has just recently commented here on the threshold setting:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037087.html

So that one looks OK to bump, but the question about resharding the
usage log still stands. (The untrimmed usage log, in my case, would have
blasted the old 2M keys threshold, too.)

Cheers,
Florian


[ceph-users] Large omap objects in radosgw .usage pool: is there a way to reshard the rgw usage log?

2019-10-09 Thread Florian Haas
Hi,

I am currently dealing with a cluster that's been in use for 5 years and
during that time, has never had its radosgw usage log trimmed. Now that
the cluster has been upgraded to Nautilus (and has completed a full
deep-scrub), it is in a permanent state of HEALTH_WARN because of one
large omap object:

$ ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.usage'


As far as I can tell, there are two thresholds that can trigger that
warning:

* The default omap object size warning threshold,
osd_deep_scrub_large_omap_object_value_sum_threshold, is 1G.

* The default omap object key count warning threshold,
osd_deep_scrub_large_omap_object_key_threshold, is 200,000.
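
As an aside, you can check what a given OSD is actually using for these
via its admin socket (a sketch; adjust the OSD id to one whose socket
you can reach locally):

# ceph daemon osd.6 config get osd_deep_scrub_large_omap_object_key_threshold
# ceph daemon osd.6 config get osd_deep_scrub_large_omap_object_value_sum_threshold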


In this case, this was the original situation:

osd.6 [WRN] : Large omap object found. Object:
15:169282cd:::usage.20:head Key count: 5834118 Size (bytes): 917351868

So that's 5.8M keys (way above threshold) and 875 MiB total object size
(below threshold, but not by much).


The usage log in this case was no longer needed that far back, so I
trimmed it to keep only the entries from this year (radosgw-admin usage
trim --end-date 2018-12-31), a process that took upward of an hour.

After the trim (and a deep-scrub of the PG in question¹), my situation
looks like this:

osd.6 [WRN] Large omap object found. Object: 15:169282cd:::usage.20:head
Key count: 1185694 Size (bytes): 187061564

So both the key count and the total object size have diminished by about
80%, which is about what you expect when you trim 5 years of usage log
down to 1 year of usage log. However, my key count is still almost 6
times the threshold.


I am aware that I can silence the warning by increasing
osd_deep_scrub_large_omap_object_key_threshold by a factor of 10, but
that's not my question. My question is what I can do to prevent the
usage log from creating such large omap objects in the first place.

Now, there's something else that you should know about this radosgw,
which is that it is configured with the defaults for usage log sharding:

rgw_usage_max_shards = 32
rgw_usage_max_user_shards = 1

... and this cluster's radosgw is pretty much being used by a single
application user. So the fact that it's happy to shard the usage log 32
ways is irrelevant as long as it puts the usage log for one user all
into one shard.


So, I am assuming that if I bump rgw_usage_max_user_shards up to, say,
16 or 32, all *new* usage log entries will be sharded. But I am not
aware of any way to reshard the *existing* usage log. Is there such a
thing?
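
For concreteness, the change I have in mind would be something like the
following in ceph.conf, followed by a radosgw restart (the client
section name is just a placeholder for whatever your rgw instance is
called):

[client.rgw.gateway1]
rgw_usage_max_shards = 32
rgw_usage_max_user_shards = 16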

Otherwise, it seems like the only option in this situation would be to
clear the usage log altogether, and tweak the sharding knobs, which
should at least make the problem not reappear. Or, else, bump
osd_deep_scrub_large_omap_object_key_threshold and just live with the
large object.


Also, is anyone aware of any adverse side effects of increasing these
thresholds, and/or changing the usage log sharding settings, that I
should keep in mind here?

Thanks in advance for your thoughts.

Cheers,
Florian


¹For anyone reading this in the archives because they've run into the
same problem, and wondering how you find out which PGs in a pool have
too-large objects, here's a jq one-liner:

ceph --format=json pg ls-by-pool <pool> \
  | jq '.pg_stats[]|select(.stat_sum.num_large_omap_objects>0)'


[ceph-users] Re: Heavily-linked lists.ceph.com pipermail archive now appears to lead to 404s

2019-09-05 Thread Florian Haas
On 03/09/2019 18:42, Ilya Dryomov wrote:
> On Tue, Sep 3, 2019 at 6:29 PM Florian Haas  wrote:
>>
>> Hi,
>>
>> replying to my own message here in a shameless attempt to re-up this. I
>> really hope that the list archive can be resurrected in one way or
>> another...
> 
> Adding David, who managed the transition.
> 
> Thanks,
> 
> Ilya

It looks like the archives are available again at the original location.
Thank you, this will help a lot of people!

Cheers,
Florian


[ceph-users] Re: Heavily-linked lists.ceph.com pipermail archive now appears to lead to 404s

2019-09-03 Thread Florian Haas
Hi,

replying to my own message here in a shameless attempt to re-up this. I
really hope that the list archive can be resurrected in one way or
another...

Cheers,
Florian


On 29/08/2019 15:00, Florian Haas wrote:
> Hi,
> 
> is there any chance the list admins could copy the pipermail archive
> from lists.ceph.com over to lists.ceph.io? It seems to contain an awful
> lot of messages referred elsewhere by their archive URL, many (all?) of
> which appear to now lead to 404s.
> 
> Example: google "Set existing pools to use hdd device class only". The
> top hit is a link to
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029078.html:
> 
> $ curl -IL
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029078.html
> HTTP/1.1 301 Moved Permanently
> Server: nginx/1.10.3 (Ubuntu)
> Date: Thu, 29 Aug 2019 12:48:13 GMT
> Content-Type: text/html
> Content-Length: 194
> Connection: keep-alive
> Location:
> https://lists.ceph.io/pipermail/ceph-users-ceph.com/2018-August/029078.html
> Strict-Transport-Security: max-age=31536000
> 
> HTTP/1.1 404 Not Found
> Server: nginx
> Date: Thu, 29 Aug 2019 12:48:14 GMT
> Content-Type: text/html; charset=utf-8
> Content-Length: 3774
> Connection: keep-alive
> X-Frame-Options: SAMEORIGIN
> Vary: Accept-Language, Cookie
> Content-Language: en
> 
> Or maybe this is just a redirect rule that needs to be cleverer or more
> specific, rather than the apparent catch-all .com/.io redirect?
> 
> Cheers,
> Florian



[ceph-users] Heavily-linked lists.ceph.com pipermail archive now appears to lead to 404s

2019-08-29 Thread Florian Haas
Hi,

is there any chance the list admins could copy the pipermail archive
from lists.ceph.com over to lists.ceph.io? It seems to contain an awful
lot of messages referred elsewhere by their archive URL, many (all?) of
which appear to now lead to 404s.

Example: google "Set existing pools to use hdd device class only". The
top hit is a link to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029078.html:

$ curl -IL
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029078.html
HTTP/1.1 301 Moved Permanently
Server: nginx/1.10.3 (Ubuntu)
Date: Thu, 29 Aug 2019 12:48:13 GMT
Content-Type: text/html
Content-Length: 194
Connection: keep-alive
Location:
https://lists.ceph.io/pipermail/ceph-users-ceph.com/2018-August/029078.html
Strict-Transport-Security: max-age=31536000

HTTP/1.1 404 Not Found
Server: nginx
Date: Thu, 29 Aug 2019 12:48:14 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 3774
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
Vary: Accept-Language, Cookie
Content-Language: en

Or maybe this is just a redirect rule that needs to be cleverer or more
specific, rather than the apparent catch-all .com/.io redirect?

Cheers,
Florian



[ceph-users] Re: Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

2019-08-26 Thread Florian Haas
On 23/08/2019 22:14, Paul Emmerich wrote:
> On Fri, Aug 23, 2019 at 3:54 PM Florian Haas  wrote:
>>
>> On 23/08/2019 13:34, Paul Emmerich wrote:
>>> Is this reproducible with crushtool?
>>
>> Not for me.
>>
>>> ceph osd getcrushmap -o crushmap
>>> crushtool -i crushmap --update-item XX 1.0 osd.XX --loc host
>>> hostname-that-doesnt-exist-yet -o crushmap.modified
>>> Replacing XX with the osd ID you tried to add.
>>
>> Just checking whether this was intentional. As the issue pops up when
>> adding a new OSD *on* a new host, not moving an existing OSD *to* a new
>> host, I would have used --add-item here. Is there a specific reason why
>> you're suggesting to test with --update-item?
> 
> yes, update should map to create or move which it should use internally
> 
>>
>> At any rate, I tried with multiple different combinations (this is on a
>> 12.2.12 test cluster; I can't test this in production):
> 
> which also ran into this bug? The idea of using crushtool is to not
> crash your production cluster but just the local tool.

Ah, gotcha. I thought you wanted me to be able to at least do "ceph osd
setcrushmap" with the resulting crushmap, which would require a running
cluster.

So yes, doing this completely offline shows that you're definitely on to
something. I am able to crash crushtool with the original crushmap, and
what it appears to be falling over on is a choose_args map in there.

I've updated the bug report with this comment:
https://tracker.ceph.com/issues/40029#note-11

It would seem that there are two workarounds at this stage for
pre-Nautilus users with a choose_args map in their crushmap, and who for
some reason are unable to upgrade to Nautilus yet:

1. Add host buckets manually before adding new OSDs.
2. Drop any choose_args map from their crushmap.

As it happens I am not aware of any way to do #2 other than

- using getcrushmap,
- decompiling the crushmap,
- dropping the choose_args map from the textual representation of the
crushmap,
- recompiling, and then
- using setcrushmap.

Are you, by any chance?
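
In case it helps anyone following along, the round-trip I mean would
look roughly like this (a sketch; the file names are arbitrary):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
  (edit crushmap.txt and delete the choose_args section)
# crushtool -c crushmap.txt -o crushmap-new.bin
# ceph osd setcrushmap -i crushmap-new.bin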

Thanks again for your help!

Cheers,
Florian


[ceph-users] Re: Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

2019-08-23 Thread Florian Haas
On 23/08/2019 13:34, Paul Emmerich wrote:
> Is this reproducible with crushtool?

Not for me.

> ceph osd getcrushmap -o crushmap
> crushtool -i crushmap --update-item XX 1.0 osd.XX --loc host
> hostname-that-doesnt-exist-yet -o crushmap.modified
> Replacing XX with the osd ID you tried to add.

Just checking whether this was intentional. As the issue pops up when
adding a new OSD *on* a new host, not moving an existing OSD *to* a new
host, I would have used --add-item here. Is there a specific reason why
you're suggesting to test with --update-item?

At any rate, I tried with multiple different combinations (this is on a
12.2.12 test cluster; I can't test this in production):


0. Get the current reference crushmap:

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
-1   0.05846 root default
-5   0.01949 host daisy
 0   hdd 0.01949 osd.0  up  1.0 1.0
-7   0.01949 host eric
 1   hdd 0.01949 osd.1  up  1.0 1.0
-3   0.01949 host frank
 2   hdd 0.01949 osd.2  up  1.0 1.0
# ceph osd getcrushmap -o crushmap
11


1. "Update" a nonexistent OSD belonging to a nonexistent host (your
suggestion):

# crushtool -i crushmap --update-item 59 0.01949 osd.59 --loc host
nonexistent -o crushmap-update-nonexistent-to-nonexistent
# ceph osd setcrushmap -i crushmap-update-nonexistent-to-nonexistent
12
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
-9   0.01949 host nonexistent
59   0.01949 osd.59  DNE  0
-1   0.05846 root default
-5   0.01949 host daisy
 0   hdd 0.01949 osd.0  up  1.0 1.0
-7   0.01949 host eric
 1   hdd 0.01949 osd.1  up  1.0 1.0
-3   0.01949 host frank
 2   hdd 0.01949 osd.2  up  1.0 1.0
# ceph osd setcrushmap -i crushmap
13


2. Add a nonexistent OSD belonging to a nonexistent host (I think this
is functionally identical):

# crushtool -i crushmap --add-item 59 0.01949 osd.59 --loc host
nonexistent -o crushmap-add-nonexistent-to-nonexistent
# ceph osd setcrushmap -i crushmap-add-nonexistent-to-nonexistent
14
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
-9   0.01949 host nonexistent
59   0.01949 osd.59  DNE  0
-1   0.05846 root default
-5   0.01949 host daisy
 0   hdd 0.01949 osd.0  up  1.0 1.0
-7   0.01949 host eric
 1   hdd 0.01949 osd.1  up  1.0 1.0
-3   0.01949 host frank
 2   hdd 0.01949 osd.2  up  1.0 1.0
# ceph osd setcrushmap -i crushmap
15


3. Move an existing OSD to a nonexistent host:

# crushtool -i crushmap --update-item 0 0.01949 osd.0 --loc host
nonexistent -o crushmap-update-existing-to-nonexistent
# ceph osd setcrushmap -i crushmap-update-existing-to-nonexistent
16
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
-9   0.01949 host nonexistent
 0   hdd 0.01949 osd.0  up  1.0 1.0
-1   0.03897 root default
-5       0 host daisy
-7   0.01949 host eric
 1   hdd 0.01949 osd.1  up  1.0 1.0
-3   0.01949 host frank
 2   hdd 0.01949 osd.2  up  1.0 1.0
# ceph osd setcrushmap -i crushmap
17


None of these crashed any mon.

However, there's this line in the bug report:

   -19> 2019-08-22 10:08:11.897364 7f93797ab700  0
mon.cc-ceph-osd11-fra1@0(leader).osd e302401 create-or-move crush item
name 'osd.59' initial_weight 1.6374 at location
{host=cc-ceph-osd26-fra1,root=default}

So it's not trying to move the item to just a nonexistent host, but to a
nonexistent host *in the default root*.

So I retried the above commands with "--loc host nonexistent --loc root
default".  No change other than everything showing up under default; no
mon crash.

And then I tried one more which was to *first* add just a new OSD under
the default root, and *then* moving that OSD to a new, nonexistent host,
also under the default root. Again, no mon crash.

So I'm afraid I am unable to reproduce this with crushtool and setcrushmap.


And I can't get my mons to crash with "ceph osd crush move", either:

ceph osd crush move osd.59 host=nonexistent root=default
moved item id 59 name 'osd.59' to location
{host=nonexistent,root=default} in crush map


Cheers,
Florian


[ceph-users] Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

2019-08-23 Thread Florian Haas
Hi everyone,

there are a couple of bug reports about this in Redmine but only one
(unanswered) mailing list message[1] that I could find. So I figured I'd
raise the issue here again and copy the original reporters of the bugs
(they are BCC'd, because in case they are no longer subscribed it
wouldn't be appropriate to share their email addresses with the list).

This is about https://tracker.ceph.com/issues/40029, and
https://tracker.ceph.com/issues/39978 (the latter of which was recently
closed as a duplicate of the former).

In short, it appears that at least in luminous and mimic (I haven't
tried nautilus yet), it's possible to crash a mon when attempting to add
a new OSD as it's trying to inject itself into the crush map under its
host bucket, when that host bucket does not exist yet.

What's worse is that when the OSD's "ceph osd new" process has thus
crashed the leader mon, a new leader is elected and in case the "ceph
osd new" process is still running on the OSD node, it will promptly
connect to that mon, and kill it too. This then continues until
sufficiently many mons have died for quorum to be lost.

The recovery steps appear to involve

- killing the "ceph osd new" process,
- restarting mons until you regain quorum,
- and then running "ceph osd purge" to drop the problematic OSD entry
from the crushmap and osdmap.
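
In shell terms, the recovery would look roughly like this (a sketch;
the systemd unit and the OSD id are placeholders for whatever applies
to your setup):

# pkill -f 'ceph osd new'              (on the OSD node)
# systemctl restart ceph-mon.target    (on each crashed mon, until quorum returns)
# ceph osd purge 59 --yes-i-really-mean-it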

The issue can apparently be worked around by adding the host buckets to
the crushmap manually before adding the new OSDs, but surely this isn't
intended to be a prerequisite, at least not to the point of mons
crashing otherwise?

Also, I am guessing that this is some weird corner case rooted in an
unusual combination of contributing factors, because otherwise more
people would presumably have been bitten by this problem.

Anyone able to share their thoughts on this one? Have more people run
into this?

Cheers,
Florian



[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034880.html
— interestingly I could find this message in the pipermail archive but
none in the one that my MUA keeps for me. So perhaps that message wasn't
delivered to all subscribers, which might be why it has gone unanswered.


[ceph-users] Re: RBD, OpenStack Nova, libvirt, qemu-guest-agent, and FIFREEZE: is this working as intended?

2019-08-23 Thread Florian Haas
Just following up here to report back and close the loop:

On 21/08/2019 16:51, Jason Dillaman wrote:
> It just looks like this was an oversight from the OpenStack developers
> when Nova RBD "direct" ephemeral image snapshot support was added [1].
> I would open a bug ticket against Nova for the issue.

Done: https://bugs.launchpad.net/nova/+bug/1841160

Thanks again for your help!

Cheers,
Florian


[ceph-users] Re: RBD, OpenStack Nova, libvirt, qemu-guest-agent, and FIFREEZE: is this working as intended?

2019-08-21 Thread Florian Haas
On 21/08/2019 18:05, dhils...@performair.com wrote:
> Florian;
> 
> Forgive my lack of knowledge of OpenStack, and your environment / use case.
> 
> Why would you need / want to snapshot an ephemeral disk?  Isn't the point of 
> ephemeral storage to not be persistent?

Fair point, but please consider that if you use an ephemeral VM as a
template for other VMs (a common motivation for snapshotting), you might
not care about the consistency of the VMs themselves, but you probably
do care about the consistency of the template. But, for that use-case
you could argue that you should just shut down the VM and take a clean
snapshot then.

However, in OpenStack Nova you may also use boot-from-volume, meaning
you're running a VM that is expected to be *wholly* persistent, rather
than ephemeral, and in that case the consistency of a snapshot taken
while the instance is running is rather important.

So just to be sure I took your cue and retested to see whether the same
issue also applied to an instance using boot-from-volume. And lo and
behold, the problem does not apply — if I configure an instance to boot
from a volume, I get fsfreeze just as intended. (I have yet to dig up
the code path for this.)

So, evidently the situation can be summarized as:

- Ephemeral boot, *without* RBD, with or without attached volumes:
freeze/thaw if hw_qemu_guest_agent=yes, resulting in consistent snapshots.

- Ephemeral boot *from* RBD, also with or without attached volumes: no
freeze/thaw, resulting in potentially inconsistent snapshots even with
hw_qemu_guest_agent=yes.

- Boot-from-volume from RBD: freeze/thaw if hw_qemu_guest_agent=yes,
resulting in consistent snapshots.

Bit odd, that. :) But at least there's another available workaround: if
you need to ensure snapshot consistency, use boot-from-volume.

Thanks for the nudge in that direction, Dominic!

Cheers,
Florian


[ceph-users] RBD, OpenStack Nova, libvirt, qemu-guest-agent, and FIFREEZE: is this working as intended?

2019-08-21 Thread Florian Haas
Hi everyone,

apologies in advance; this will be long. It's also been through a bunch
of edits and rewrites, so I don't know how well I'm expressing myself at
this stage — please holler if anything is unclear and I'll be happy to
try to clarify.

I am currently in the process of investigating the behavior of OpenStack
Nova instances when being snapshotted and suspended, in conjunction with
qemu-guest-agent (qemu-ga). I realize that RBD-backed Nova/libvirt
instances are expected to behave differently from file-backed ones, but
I think I might have reason to believe that the RBD-backed ones are
indeed behaving incorrectly, and I'd like to verify that.

So first up, for comparison, let's recap how a Nova/libvirt/KVM instance
behaves when it is *not* backed by RBD (such as, it's using a qcow2 file
that is on a Nova compute node in /var/lib/nova/instances), is booted
from an image with the hw_qemu_guest_agent=yes meta property set, and
runs qemu-guest-agent within the guest:

- User issues "nova suspend" or "openstack server suspend".

- If nova-compute on the compute node decides that the instance has
qemu-guest-agent running (which is the case if it's qemu or kvm, and its
image has hw_qemu_guest_agent=yes), it sends a guest-sync command over
the guest agent VirtIO serial port. This command registers in the
qemu-ga log file in the guest.

- nova-compute on the compute node sends a libvirt managed-save command.

- Nova reports the instance as suspended.

- User issues "nova resume" or "openstack server resume".

- nova-compute on the compute node sends a libvirt start command.

- Again, if nova-compute on the compute node knows that the instance has
qemu-guest-agent running, it sends another command over the serial port,
namely guest-set-time. This, too, registers in the guest's qemu-ga log.

- Nova reports the instance as active (running normally) again.
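
Incidentally, if you want to poke at that agent channel by hand,
libvirt lets you send guest agent commands directly (a sketch, with a
made-up domain name):

virsh qemu-agent-command instance-00000042 '{"execute": "guest-ping"}'
virsh qemu-agent-command instance-00000042 '{"execute": "guest-fsfreeze-status"}'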


Now, when I instead use a Nova environment that is fully RBD-backed, I
see exactly the same behavior as described above. So I know that in
principle, nova-compute/qemu-ga communication works in both an
RBD-backed and a non-RBD-backed environment.


However, things appear to get very different when it comes to snapshots.


Again, starting with a file-backed environment:

- User issues "nova image-create" or "openstack server image create".

- If nova-compute on the compute node decides that the instance can be
quiesced (which is the case if it's qemu or kvm, and its image has
hw_qemu_guest_agent=yes), then it sends a "guest-fsfreeze-freeze"
command over the guest agent VirtIO serial port.

- The guest agent inside the guest loops over all mounted filesystems,
and issues the FIFREEZE ioctl (which maps to the kernel freeze_super()
function). This can be seen in the qemu-ga log file in the guest, and it
is also verifiable by using ftrace on the qemu-ga PID and checking for
the freeze_super() function call.

- nova-compute then takes a live snapshot of the instance.

- Once complete, the guest gets a "guest-fsfreeze-thaw" command, and
again I can see this in the qemu-ga log, and with ftrace.
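
The ftrace check mentioned above is nothing fancy, by the way; roughly
this, run as root inside the guest (a sketch using the standard tracefs
interface):

# cd /sys/kernel/debug/tracing
# echo "$(pidof qemu-ga)" > set_ftrace_pid
# echo freeze_super > set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_on
# cat trace_pipe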


And now with RBD:

- User issues "nova image-create" or "openstack server image create".

- The guest-fsfreeze-freeze agent command never happens.

Now I can see the info message from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2048
in my nova-compute log, which confirms that we're attempting a live
snapshot.

I also do *not* see the warning from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2068,
so it looks like the direct_snapshot() call from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2058
succeeds. This is defined in
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/imagebackend.py#L1055
and it uses RBD functionality only. Importantly, it never interacts with
qemu-ga, so it appears to not worry at all about freezing the filesystem.

(Which does seem to contradict
https://docs.ceph.com/docs/master/rbd/rbd-openstack/?highlight=uuid#image-properties,
by the way, so that may be a documentation bug.)

Now here's another interesting part. Were the direct snapshot to fail,
if I read
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2081
and
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2144
correctly, the fallback behavior would be as follows: The domain would
next be "suspended" (note, again this is Nova suspend, which maps to
libvirt managed-save per
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/guest.py#L504),
then snapshotted using a libvirt call and resumed again post-snapshot.
In which case there would be a 

[ceph-users] Re: BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-14 Thread Florian Haas
On 12/08/2019 21:07, Alexandre Marangone wrote:
>> rados -p volumes stat 'obj-vS6RN9\uQwvXU9DP'
>>  error stat-ing volumes/obj-vS6RN9\uQwvXU9DP: (2) No such file or directory
> I believe you need to substitute \u with _

Yes indeed, thank you!
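
For the archives: that means the earlier stat command works once the
escaped underscore is restored, i.e. (assuming that was the only
escaped character in the name):

rados -p volumes stat 'obj-vS6RN9_QwvXU9DP'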

Cheers,
Florian


[ceph-users] Re: BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-14 Thread Florian Haas
Hi Tom,

responding back on this briefly so that people are in the loop; I'll
have more details in a blog post that I hope to get around to writing.

On 12/08/2019 11:34, Thomas Byrne - UKRI STFC wrote:
>> And bluestore should refuse to start if the configured limit is > 4GB.  Or 
>> something along those lines...
> 
> Just on this point - Bluestore OSDs will fail to start with an 
> osd_max_object_size >=4GB with a helpful error message about the Bluestore 
> hard limit. I was mildly amused when I discovered that luminous OSDs can 
> start with osd_max_object_size = 4GB - 1 byte, but mimic OSDs require it to 
> be <= 4GB - 2 bytes to start without an error. I haven't checked to see if 
> nautilus OSDs require <= 4GB - 3 bytes yet.

Yes but that doesn't help users much for clusters where very large
objects already exist. Even in Luminous, osd_max_object_size defaults to
128M, but if an OSD already has objects larger than that, it will still
happily start up and serve data with FileStore — and crash any newly
added BlueStore OSDs unfortunate enough to be mapped to a PG with one or
more objects that are 4GiB or larger.

The pending PR to make this a scrub error even on FileStore OSDs
mitigates this issue (https://github.com/ceph/ceph/pull/29579), but
it'll still cause a somewhat unexpected surprise for people who have
just updated to a version including that fix and suddenly see tons of
scrub errors — they would be easily forgiven for assuming they've run
into a regression that involves false positives on scrub. "Hey, none of
these errors were here before the upgrade, surely there's a problem with
the software rather than my data!"

We've progressed further in the interim and it appears like I can give
all-clears on a couple of concerns that we had:

1. It looks like these objects were not created by an RBD going haywire,
but by something actually using librados to create them, presumably long
before the cluster ever went into production.

2. I am not changing the subject line so I don't mess up people's list
archives if their MUA doesn't correctly thread based on In-Reply-To or
References, but it's now evident that this is *not* related to bug
#38724 but instead really just due to objects being too large for
BlueStore, like Sage said in his first reply.

Thanks for the answer — by the way I have been imploring all my
colleagues to watch your Cephalocon talk,[1] which was excellent.

Cheers,
Florian

[1] https://youtu.be/niFNZN5EKvE


[ceph-users] Re: BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-09 Thread Florian Haas
Hi Sage!

Whoa that was quick. :)

On 09/08/2019 16:27, Sage Weil wrote:
>> https://tracker.ceph.com/issues/38724#note-26
> 
> {
> "op_num": 2,
> "op_name": "truncate",
> "collection": "2.293_head",
> "oid": 
> "#-4:c96337db:::temp_recovering_2.293_11123'6472830_288833_head:head#",
> "offset": 4457615932
> },
> 
> That offsize (size) is > 4 GB.  BlueStore has a hard limit of 2^32-1 for 
> object sizes (because it uses a uint32_t).  This cluster appears to have 
> some ginormous rados objects.  Until those are removed, you 
> can't/shouldn't use bluestore.

OK, this is interesting.

This is an OpenStack Cinder volumes pool, so all the objects in there
belong to RBDs. I couldn't think of any situation in which RBD would
create a huge object like that.

But, as it happens that PG is currently mapped to a primary OSD that is
still on FileStore, so I can do a "find -size +1G" on that mount point,
and here's what I get:

-rw-r--r-- 1 ceph ceph 4457615932 Mar 29  2018
DIR_3/DIR_9/DIR_6/DIR_C/obj-vS6RN9\uQwvXU9DP__head_DBECC693__2

So, bingo. That's a 4.2GB size file whose size matches that offset exactly.

But I'm not familiar with that object name format. How did that object
get here? And how do I remove it, considering I seem to be unable to
access it?

rados -p volumes stat 'obj-vS6RN9\uQwvXU9DP'
 error stat-ing volumes/obj-vS6RN9\uQwvXU9DP: (2) No such file or directory

Or is that file just an artifact that doesn't even map to an object?

This is turning out to be a learning experience. :)

Thanks again for your help!

Cheers,
Florian


[ceph-users] BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-09 Thread Florian Haas
Hi everyone,

it seems there have been several reports in the past related to
BlueStore OSDs crashing from unhandled errors in _txc_add_transaction:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/03.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032172.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-December/031960.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-December/031964.html

Bug #38724 tracks this, has been fixed in master with
https://github.com/ceph/ceph/pull/27929, and is pending backports (and,
I dare say, is *probably* misclassified as being only minor, as this
does cause potential data loss as soon as it affects enough OSDs
simultaneously):

https://tracker.ceph.com/issues/38724

We just ran into a similar issue with a couple of BlueStore OSDs that we
recently added to a Luminous (12.2.12) cluster that was upgraded from
Jewel, and hence, still largely runs on FileStore. I say similar because
evidently other people reporting this problem have been running into
ENOENT (No such file or directory) or ENOTEMPTY (Directory not empty);
for us it's interestingly E2BIG (Argument list too long):

https://tracker.ceph.com/issues/38724#note-26

So I'm wondering if someone could shed light on these questions:

* Is this the same issue as that which
https://github.com/ceph/ceph/pull/27929 fixes?

* Thus, since https://github.com/ceph/ceph/pull/29115 (the Nautilus
backport for that fix) has been merged, but is not yet included in a
release, do *Nautilus* users get a fix in the upcoming 14.2.3 release,
and once they update, would this bug go away with no further
intervention required?

* For users on *Luminous*, since https://tracker.ceph.com/issues/39694
(the Luminous version of 38724) says "non-trivial backport", is it fair
to say that a fix might still take a while for that release?

* Finally, are Luminous users safe from this bug if they keep using, or
revert to, FileStore?

Thanks in advance for your thoughts! Please keep Erik CC'd on your reply.

Cheers,
Florian