[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-25 Thread Matthew Vernon

On 24/06/2024 21:18, Matthew Vernon wrote:

2024-06-24T17:33:26.880065+00:00 moss-be2001 ceph-mgr[129346]: [rgw 
ERROR root] Non-zero return from ['radosgw-admin', '-k', 
'/var/lib/ceph/mgr/ceph-moss-be2001.qvwcaq/keyring', '-n', 
'mgr.moss-be2001.qvwcaq', 'realm', 'pull', '--url', 
'https://apus.svc.eqiad.wmnet:443', '--access-key', 'REDACTED', 
'--secret', 'REDACTED', '--rgw-realm', 'apus']: request failed: (5) 
Input/output error


EIO is an odd sort of error [doesn't sound very network-y], and I don't 
think I see any corresponding request in the radosgw logs in the primary 
zone. From the CLI outside the container I can do e.g. curl 
https://apus.svc.eqiad.wmnet/ just fine, are there other things worth 
checking here? Could it matter that the mgr node isn't an rgw?


...the answer turned out to be "container image lacked the relevant CA 
details to validate the TLS of the other end".
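
For anyone else who hits this: the fix is to get your internal CA into the 
container image (or otherwise into the container's trust store). A minimal 
sketch, assuming a CentOS-based upstream image - registry, tag and file 
names are illustrative:

cat > Containerfile <<'EOF'
FROM docker-registry.wikimedia.org/ceph:latest
COPY our-internal-ca.crt /etc/pki/ca-trust/source/anchors/
RUN update-ca-trust extract
EOF
podman build -t docker-registry.wikimedia.org/ceph:with-ca .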


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon

On 24/06/2024 20:49, Matthew Vernon wrote:

On 19/06/2024 19:45, Adam King wrote:
I think this is at least partially a code bug in the rgw module. Where 


...the code path seems to have a bunch of places it might raise an 
exception; are those likely to result in some entry in a log-file? I've 
not found anything, which is making working out what the problem is 
quite challenging...


Ah, I do now find:

2024-06-24T17:33:26.880065+00:00 moss-be2001 ceph-mgr[129346]: [rgw 
ERROR root] Non-zero return from ['radosgw-admin', '-k', 
'/var/lib/ceph/mgr/ceph-moss-be2001.qvwcaq/keyring', '-n', 
'mgr.moss-be2001.qvwcaq', 'realm', 'pull', '--url', 
'https://apus.svc.eqiad.wmnet:443', '--access-key', 'REDACTED', 
'--secret', 'REDACTED', '--rgw-realm', 'apus']: request failed: (5) 
Input/output error


EIO is an odd sort of error [doesn't sound very network-y], and I don't 
think I see any corresponding request in the radosgw logs in the primary 
zone. From the CLI outside the container I can do e.g. curl 
https://apus.svc.eqiad.wmnet/ just fine, are there other things worth 
checking here? Could it matter that the mgr node isn't an rgw?


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon

On 19/06/2024 19:45, Adam King wrote:
I think this is at least partially a code bug in the rgw module. Where 


...the code path seems to have a bunch of places it might raise an 
exception; are those likely to result in some entry in a log-file? I've 
not found anything, which is making working out what the problem is 
quite challenging...


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph rgw zone create fails EINVAL

2024-06-19 Thread Matthew Vernon

Hi,

I'm running cephadm/reef 18.2.2. I'm trying to set up multisite.

I created realm/zonegroup/master zone OK (I think!), edited the 
zonegroup json to include hostnames. I have this spec file for the 
secondary zone:


rgw_zone: codfw
rgw_realm_token: "SECRET"
placement:
  label: "rgw"

[I get "SECRET" by doing ceph rgw realm tokens on the master, and C 
the field labelled "token"]


If I then try and apply this with:
ceph rgw zone create -i /root/rgw_secondary.yaml

It doesn't work, and I get an unhelpful backtrace:
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1811, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 474, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/rgw/module.py", line 96, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/share/ceph/mgr/rgw/module.py", line 304, in _cmd_rgw_zone_create
    return HandleCommandResult(retval=0, stdout=f"Zones {', '.join(created_zones)} created successfully")

TypeError: sequence item 0: expected str instance, int found
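
[That error is what Python produces when ', '.join() is handed a list 
containing an int rather than strings, e.g.

python3 -c "print(', '.join([0]))"
# TypeError: sequence item 0: expected str instance, int found

so presumably created_zones ends up holding something that isn't a zone 
name.]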

I assume I've messed up the spec file, but it looks like the one in the 
docs[0]. Can anyone point me in the right direction, please?


[if the underlying command emits anything useful, I can't find it in the 
logs]


Thanks,

Matthew

[0] https://docs.ceph.com/en/reef/mgr/rgw/#realm-credentials-token
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Setting hostnames for zonegroups via cephadm / rgw mgr module?

2024-06-04 Thread Matthew Vernon

Hi,

I'm using reef (18.2.2); the docs talk about setting up a multi-site 
setup with a spec file e.g.


rgw_realm: apus
rgw_zonegroup: apus_zg
rgw_zone: eqiad
placement:
  label: "rgw"

but I don't think it's possible to configure the "hostnames" parameter 
of the zonegroup (and thus control what hostname(s) the rgws are 
expecting to serve)? Have I missed something, or do I need to set up the 
realm/zonegroup/zone, extract the zonegroup json and edit hostnames by hand?


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rgw mgr module not shipped? (in reef at least)

2024-05-31 Thread Matthew Vernon

Hi,

As far as I can tell, the rgw mgr module is not shipped in the published 
reef Debian packages (nor, I suspect, the ubuntu ones, but I've not 
actually checked).


Is there a reason why it couldn't just be added to ceph-mgr-modules-core 
? That contains quite a large number of modules already, and the rgw one 
is effectively one small python file, I think...
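
[For anyone wanting to check their own install - the mgr modules live 
under /usr/share/ceph/mgr/, so something like this shows whether the rgw 
one got shipped, and by which package:

ls /usr/share/ceph/mgr/ | grep -x rgw
dpkg -S /usr/share/ceph/mgr/rgw 2>/dev/null || echo "not shipped by any installed package"
]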


I'm using 18.2.2.

Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon

On 28/05/2024 17:07, Wesley Dillingham wrote:

What is the state of your PGs? could you post "ceph -s"


PGs all good:

root@moss-be1001:/# ceph -s
  cluster:
id: d7849d66-183c-11ef-b973-bc97e1bb7c18
health: HEALTH_WARN
1 stray daemon(s) not managed by cephadm

  services:
mon: 3 daemons, quorum moss-be1001,moss-be1003,moss-be1002 (age 6d)
mgr: moss-be1001.yibskr(active, since 6d), standbys: moss-be1003.rwdjgw
osd: 48 osds: 47 up (since 2d), 47 in (since 2d)

  data:
pools:   1 pools, 1 pgs
objects: 6 objects, 19 MiB
usage:   4.2 TiB used, 258 TiB / 263 TiB avail
pgs: 1 active+clean

The OSD is marked as "destroyed" in the osd tree:

root@moss-be1001:/# ceph osd tree | grep -E '^35'
35   hdd   3.75999   osd.35   destroyed   0   1.0

root@moss-be1001:/# ceph osd safe-to-destroy osd.35 ; echo $?
OSD(s) 35 are safe to destroy without reducing data durability.
0

I should have said - this is a reef 18.2.2 cluster, cephadm deployed.

Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon

Hi,

I want to prepare a failed disk for replacement. I did:
ceph orch osd rm 35 --zap --replace

and it's now in the state "Done, waiting for purge", with 0 pgs, and 
REPLACE and ZAP set to true. It's been like this for some hours, and now 
my cluster is unhappy:


[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon osd.35 on host moss-be1002 not managed by cephadm

(the OSD is down & out)

...and also neither the disk nor the relevant NVME LV has been zapped.

I have my OSDs deployed via a spec:
service_type: osd
service_id: rrd_single_NVMe
placement:
  label: "NVMe"
spec:
  data_devices:
rotational: 1
  db_devices:
model: "NVMe"

And before issuing the ceph orch osd rm I set that to be unmanaged (ceph 
orch set-unmanaged osd.rrd_single_NVMe), as obviously I don't want ceph 
to just try and re-make a new OSD on the sad disk.


I'd expected from the docs[0] that what I did would leave me with a 
system ready for the failed disk to be swapped (and that I could then 
mark osd.rrd_single_NVMe as managed again, and a new OSD built), 
including removing/wiping the NVME lv so it can be removed.


What did I do wrong? I don't much care about the OSD id (but obviously 
it's neater to not just incrementally increase OSD numbers every time a 
disk dies), but I thought that telling ceph orch not to make new OSDs 
then using ceph orch osd rm to zap the disk and NVME lv would have been 
the way to go...
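
For reference, the full sequence was as follows (with the status check 
being, I believe, the right way to watch the removal queue; that's where 
the "Done, waiting for purge" state above is reported):

ceph orch set-unmanaged osd.rrd_single_NVMe
ceph orch osd rm 35 --zap --replace
ceph orch osd rm status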


Thanks,

Matthew

[0] https://docs.ceph.com/en/reef/cephadm/services/osd/#replacing-an-osd
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-22 Thread Matthew Vernon

Hi,

On 22/05/2024 12:44, Eugen Block wrote:


you can specify the entire tree in the location statement, if you need to:


[snip]

Brilliant, that's just the ticket, thank you :)


This should be made a bit clearer in the docs [0], I added Zac.


I've opened a MR to update the docs, I hope it's at least useful as a 
starter-for-ten:

https://github.com/ceph/ceph/pull/57633
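
For the archive, the host spec ends up looking something like this (the 
root: entry being the bit I was missing; treat it as a sketch rather than 
gospel):

service_type: host
hostname: moss-be1003
addr: 10.64.136.22
location:
  root: default
  rack: F3
labels:
  - _admin
  - NVMe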

Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-21 Thread Matthew Vernon

Hi,

Returning to this, it looks like the issue wasn't to do with how 
osd_crush_chooseleaf_type ; I destroyed and re-created my cluster as 
before, and I have the same problem again:


pg 1.0 is stuck inactive for 10m, current state unknown, last acting []

as before, ceph osd tree:

root@moss-be1001:/# ceph osd tree
ID  CLASS  WEIGHT     TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-7         176.11194  rack F3
-6         176.11194      host moss-be1003
13  hdd      7.33800          osd.13        up      1.0       1.0
15  hdd      7.33800          osd.15        up      1.0       1.0

And checking the crushmap, the default bucket is again empty:

root default {
id -1   # do not change unnecessarily
id -14 class hdd# do not change unnecessarily
# weight 0.0
alg straw2
hash 0  # rjenkins1
}

[by way of confirming that I didn't accidentally leave the old config 
fragment lying around, the replication rule has:

step chooseleaf firstn 0 type host
]

So it looks like setting location: in my spec is breaking the cluster 
bootstrap - the hosts aren't put into default, but neither are the 
declared racks. As a reminder, that spec has host entries like:


service_type: host
hostname: moss-be1003
addr: 10.64.136.22
location:
  rack: F3
labels:
  - _admin
  - NVMe

Is this expected behaviour? Presumably I can fix the cluster by using 
"ceph osd crush move F3 root=default" and similar for the others, but is 
there a way to have what I want done by cephadm bootstrap?
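
[For the record, the manual fix I have in mind is along these lines - 
rack names from my cluster, repeated for each rack:

ceph osd crush move B4 root=default
ceph osd crush move F3 root=default
]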


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon

Hi,

Thanks for your help!

On 20/05/2024 18:13, Anthony D'Atri wrote:


You do that with the CRUSH rule, not with osd_crush_chooseleaf_type.  Set that 
back to the default value of `1`.  This option is marked `dev` for a reason ;)


OK [though not obviously at 
https://docs.ceph.com/en/reef/rados/configuration/pool-pg-config-ref/#confval-osd_crush_chooseleaf_type 
]



but I think you’d also need to revert `osd_crush_chooseleaf_type` too.  Might 
be better to wipe and redeploy so you know that down the road when you add / 
replace hardware this behavior doesn’t resurface.


Yep, I'm still at the destroy-and-recreate point here, trying to make 
sure I can do this repeatably.



Once the cluster was up I used an osd spec file that looked like:
service_type: osd
service_id: rrd_single_NVMe
placement:
  label: "NVMe"
spec:
  data_devices:
rotational: 1
  db_devices:
model: "NVMe"

Is it your intent to use spinners for payload data and SSD for metadata?


Yes.


You might want to set `db_slots` accordingly, by default I think it’ll be 1:1 
which probably isn’t what you intend.


Is there an easy way to check this? The docs suggested it would work, 
and vgdisplay on the vg that pvs tells me the nvme device is in shows 24 
LVs...
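
[What I've been looking at, in case it's the wrong thing - I assume this 
is the way to see the carve-up:

cephadm ceph-volume lvm list          # shows the data and DB device for each OSD
lvs -o lv_name,lv_size,vg_name        # LV sizes in the NVMe VG
]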


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon

Hi,

On 20/05/2024 17:29, Anthony D'Atri wrote:


On May 20, 2024, at 12:21 PM, Matthew Vernon  wrote:



This has left me with a single sad pg:
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
pg 1.0 is stuck inactive for 33m, current state unknown, last acting []



.mgr pool perhaps.


I think so


ceph osd tree shows that CRUSH picked up my racks OK, eg.
-3         45.11993  rack B4
-2         45.11993      host moss-be1001
 1  hdd     3.75999          osd.1         up      1.0       1.0



Please send the entire first 10 lines or so of `ceph osd tree`


root@moss-be1001:/# ceph osd tree
ID  CLASS  WEIGHT     TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-7         176.11194  rack F3
-6         176.11194      host moss-be1003
 2  hdd      7.33800          osd.2         up      1.0       1.0
 3  hdd      7.33800          osd.3         up      1.0       1.0
 6  hdd      7.33800          osd.6         up      1.0       1.0
 9  hdd      7.33800          osd.9         up      1.0       1.0
12  hdd      7.33800          osd.12        up      1.0       1.0
13  hdd      7.33800          osd.13        up      1.0       1.0
16  hdd      7.33800          osd.16        up      1.0       1.0
19  hdd      7.33800          osd.19        up      1.0       1.0



I passed this config to bootstrap with --config:

[global]
  osd_crush_chooseleaf_type = 3


Why did you set that?  3 is an unusual value.  AIUI most of the time the only 
reason to change this option is if one is setting up a single-node sandbox - 
and perhaps localpools create a rule using it.  I suspect this is at least part 
of your problem.


I wanted to have rack as failure domain rather than host i.e. to ensure 
that each replica goes in a different rack (academic at the moment as I 
have 3 hosts, one in each rack, but for future expansion important).
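
If I understand you right, then, the supported way to get that is a CRUSH 
rule with rack as the failure domain, rather than the tunable - something 
like:

ceph osd crush rule create-replicated replicated_rack default rack
ceph osd pool set <pool> crush_rule replicated_rack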



Once the cluster was up I used an osd spec file that looked like:
service_type: osd
service_id: rrd_single_NVMe
placement:
  label: "NVMe"
spec:
  data_devices:
rotational: 1
  db_devices:
model: "NVMe"


Is it your intent to use spinners for payload data and SSD for metadata?


Yes.

Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon

Hi,

I'm probably Doing It Wrong here, but. My hosts are in racks, and I 
wanted ceph to use that information from the get-go, so I tried to 
achieve this during bootstrap.


This has left me with a single sad pg:
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
pg 1.0 is stuck inactive for 33m, current state unknown, last acting []

ceph osd tree shows that CRUSH picked up my racks OK, eg.
-3         45.11993  rack B4
-2         45.11993      host moss-be1001
 1  hdd     3.75999          osd.1         up      1.0       1.0

But root seems empty:
-1 0  root default

and if I decompile the crush map, indeed:
# buckets
root default {
id -1   # do not change unnecessarily
id -14 class hdd# do not change unnecessarily
# weight 0.0
alg straw2
hash 0  # rjenkins1
}

which does indeed look empty, whereas I have rack entries that contain 
the relevant hosts.


And the replication rule:
rule replicated_rule {
id 0
type replicated
step take default
step chooseleaf firstn 0 type rack
step emit
}

I passed this config to bootstrap with --config:

[global]
  osd_crush_chooseleaf_type = 3

and an initial spec file with host entries like this:

service_type: host
hostname: moss-be1001
addr: 10.64.16.40
location:
  rack: B4
labels:
  - _admin
  - NVMe

Once the cluster was up I used an osd spec file that looked like:
service_type: osd
service_id: rrd_single_NVMe
placement:
  label: "NVMe"
spec:
  data_devices:
rotational: 1
  db_devices:
model: "NVMe"

I could presumably fix this up by editing the crushmap (to put the racks 
into the default bucket), but what did I do wrong? Was this not a 
reasonable thing to want to do with cephadm?


I'm running
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)

Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm basic questions: image config, OS reimages

2024-05-16 Thread Matthew Vernon

Hi,

I've some experience with Ceph, but haven't used cephadm much before, 
and am trying to configure a pair of reef clusters with cephadm. A 
couple of newbie questions, if I may:


* cephadm shell image

I'm in an isolated environment, so pulling from a local repository. I 
bootstrapped OK with

cephadm --image docker-registry.wikimedia.org/ceph bootstrap ...

And that worked nicely, but if I want to run cephadm shell (to do any 
sort of admin), then I have to specify

cephadm --image docker-registry.wikimedia.org/ceph shell

(otherwise it just hangs failing to talk to quay.io).

I found the docs, which refer to setting lots of other images, but not 
the one that cephadm uses:

https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment

I found an old tracker in this area: https://tracker.ceph.com/issues/47274

...but is there a good way to arrange for cephadm to use the 
already-downloaded image without having to remember to specify --image 
each time?
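
Would something like the following be the supported approach? (The 
environment variable is what I've seen suggested for this, but I may be 
misreading the docs.)

export CEPHADM_IMAGE=docker-registry.wikimedia.org/ceph
cephadm shell
# and pin the image the orchestrator deploys:
ceph config set global container_image docker-registry.wikimedia.org/ceph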


* OS reimages

We do OS upgrades by reimaging the server (which doesn't touch the 
storage disks); on an old-style deployment you could then use 
ceph-volume to re-start the OSDs and away you went; how does one do this 
in a cephadm cluster?
[I presume involves telling cephadm to download a new image for podman 
to use and suchlike]


Would the process be smoother if we arranged to leave /var/lib/ceph 
intact between reimages?
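
Is the intended flow, once the host is reimaged and re-added, something 
like the following? (That's my reading of the docs, so please correct me.)

# reinstall cephadm + podman, re-add the host to the cluster, then:
ceph cephadm osd activate moss-be1001      # hostname illustrative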


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-05-02 Thread Matthew Vernon

On 24/04/2024 13:43, Bailey Allison wrote:


A simple ceph-volume lvm activate should get all of the OSDs back up and
running once you install the proper packages/restore the ceph config
file/etc.,


What's the equivalent procedure in a cephadm-managed cluster?

Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Matthew Vernon

Hi,

On 06/03/2024 16:49, Gregory Farnum wrote:

Has the link on the website broken? https://ceph.com/en/community/connect/
We've had trouble keeping it alive in the past (getting a non-expiring
invite), but I thought that was finally sorted out.


Ah, yes, that works. Sorry, I'd gone to
https://docs.ceph.com/en/latest/start/get-involved/

which lacks the registration link.

Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph-storage slack access

2024-03-06 Thread Matthew Vernon

Hi,

How does one get an invite to the ceph-storage slack, please?

Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-21 Thread Matthew Vernon

[mgr modules failing because pyO3 can't be imported more than once]

On 29/01/2024 12:27, Chris Palmer wrote:

I have logged this as https://tracker.ceph.com/issues/64213


I've noted there that it's related to 
https://tracker.ceph.com/issues/63529 (an earlier report relating to the 
dashboard); there is a MR to fix just the dashboard issue which got 
merged into main. I've opened a MR to backport that change to Reef:

https://github.com/ceph/ceph/pull/55689

I don't know what the devs' plans are for dealing with the broader pyO3 
issue, but I'll ask on the dev list...


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v18.2.1 Reef released

2023-12-19 Thread Matthew Vernon

On 19/12/2023 06:37, Eugen Block wrote:

Hi,

I thought the fix for that would have made it into 18.2.1. It was marked 
as resolved two months ago (https://tracker.ceph.com/issues/63150, 
https://github.com/ceph/ceph/pull/53922).


Presumably that will only take effect once ceph orch is version 18.2.1 
(whereas the reporter is still on 18.2.0)? i.e. one has to upgrade to 
18.2.1 before this bug will be fixed and so the upgrade _to_ 18.2.1 is 
still affected.


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian 12 support

2023-11-13 Thread Matthew Vernon

Hi,

On 13/11/2023 10:42, Chris Palmer wrote:
And another big +1 for debian12 reef from us. We're unable to upgrade to 
either debian12 or reef.
I've been keeping an eye on the debian12 bug, and it looks as though it 
might be fixed if you start from the latest repo release.


My expectation is that the next point release of Reef (due soon!) will 
have Debian packages built as part of it.


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian/bullseye build for reef

2023-09-07 Thread Matthew Vernon

Hi,

On 21/08/2023 17:16, Josh Durgin wrote:
We weren't targeting bullseye once we discovered the compiler version 
problem, the focus shifted to bookworm. If anyone would like to help 
maintaining debian builds, or looking into these issues, it would be 
welcome:


https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030129 
https://tracker.ceph.com/issues/61845 


I've made some progress on building on bookworm now, and have updated 
the ticket; the failure now seems to be the tree missing 
src/pybind/mgr/dashboard/frontend/dist rather than anything relating to 
C++ issues...


HTH,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian/bullseye build for reef

2023-09-04 Thread Matthew Vernon

Hi,

On 21/08/2023 17:16, Josh Durgin wrote:
We weren't targeting bullseye once we discovered the compiler version 
problem, the focus shifted to bookworm. If anyone would like to help 
maintaining debian builds, or looking into these issues, it would be 
welcome:


https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030129 


I think this bug is similar (identical?) to Debian 1039472, which is now 
fixed in bookworm by a backport; so it might be worth trying again with 
a fully-updated bookworm system?


[this is going to be relevant to my interests at some point, but I can't 
yet offer much time]


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Building Ceph containers

2023-01-16 Thread Matthew Vernon

Hi,

Is it possible/supported to build Ceph containers on Debian? The build 
instructions[0] talk about building packages (incl. .debs), but not 
building containers.


Cephadm only supports containerised deployments, but our local policy is 
that we should only deploy containers we've built ourselves. Is it still 
the case that only building centos-based images is supported?


Perhaps with a follow-up question of why container building isn't 
supported on the range of platforms that package building is supported 
on if containerised deployments are where people are meant to be going...


Thanks,

Matthew

[0] https://docs.ceph.com/en/quincy/install/build-ceph/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OS suggestion for further ceph installations (centos stream, rocky, ubuntu)?

2022-02-04 Thread Matthew Vernon

On 01/02/2022 12:40, Boris Behrens wrote:


Personally I like ubuntu a lot, but most of the ceph developers seem to
come from redhat (or at least a RH flavored background) to I could imagine
that this might be a slightly more optimal way.


If you want to run with Ubuntu, you might find the Ubuntu Cloud Archive 
helpful if you want a more recent Ceph than the version your release 
shipped with; it can also help you decouple Ceph upgrades from OS upgrades.
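
For example (release name illustrative - pick the cloud-archive pocket 
that carries the Ceph version you want):

sudo add-apt-repository cloud-archive:yoga
sudo apt update && sudo apt install ceph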


HTH,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [RGW] bi_list(): (5) Input/output error blocking resharding

2022-01-10 Thread Matthew Vernon

Hi,

On 07/01/2022 18:39, Gilles Mocellin wrote:


Anyone who had that problem find a workaround ?


Are you trying to reshard a bucket in a multisite setup? That isn't 
expected to work (and, IIRC, the changes to support doing so aren't 
going to make it into quincy).


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: switching ceph-ansible from /dev/sd to /dev/disk/by-path

2022-01-07 Thread Matthew Vernon

Hi,

On 06/01/2022 17:42, Dave Holland wrote:


The right solution appears to be to configure ceph-ansible to use
/dev/disk/by-path device names, allowing for the expander IDs being
embedded in the device name -- so those would have to be set per-host
with host vars. Has anyone done that change from /dev/sd and


I think I considered this, and concluded it was only a partial fix - as 
you note, the expander ID changes between hosts (and, I think, after 
some sorts of hardware repair/replacement), and I think when drives are 
hot-swapped they didn't necessarily come back in the same path, because 
the replacement drive gets a different LUN sometimes.



/dev/disk/by-path and have any advice, please? Is it a safe change, or
do I have to stick with /dev/sd names and modify the device list as a
host var, if/when the naming changes after a reboot? (Which would be
grotty!)


IIRC, ceph-ansible looks at what ceph-volume lvm list says when working 
out whether it needs to build new OSDs; I would hope it would correctly 
follow symlinks back to the correct point when working this out. I'd try 
it on a handy test cluster and see :)


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-17 Thread Matthew Vernon

On 17/11/2021 15:19, Marc wrote:

The CLT is discussing a more feasible alternative to LTS, namely to
publish an RC for each point release and involve the user community to
help test it.


How many users even have the availability of a 'test cluster'?


The Sanger has one (3 hosts), which was a real boon.

Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stretch cluster experiences in production?

2021-10-19 Thread Matthew Vernon

Hi,

On 18/10/2021 23:34, Gregory Farnum wrote:

On Fri, Oct 15, 2021 at 8:22 AM Matthew Vernon  wrote:



Also, if I'm using RGWs, will they do the right thing location-wise?
i.e. DC A RGWs will talk to DC A OSDs wherever possible?


Stretch clusters are entirely a feature of the RADOS layer at this
point; setting up RGW/RBD/CephFS to use them efficiently is left as an
exercise to the user. Sorry. :/

That said, I don't think it's too complicated — you want your CRUSH
rule to specify a single site as the primary and to run your active
RGWs on that side, or else to configure read-from-replica and local
reads if your workloads support them. But so far the expectation is
definitely that anybody deploying this will have their own
orchestration systems around it (you can't really do HA from just the
storage layer), whether it's home-brewed or Rook in Kubernetes, so we
haven't discussed pushing it out more within Ceph itself.


We do have existing HA infrastructure which can e.g. make sure our S3 
clients in DC A talk to our RGWs in DC A.


But I think I understand you to be saying that in a stretch cluster 
(other than in stretch degraded mode) each pg will still have 1 primary 
which will serve all reads - so ~50% of our RGWs in DC B will end up 
reading from DC A (and vice versa). And that there's no way round this. 
Is that correct?


Relatedly, I infer this means that the inter-DC link will continue to be 
a bottleneck for write latency as if I were just running a "normal" 
cluster that happens to be in two DCs? [because the primary OSD will 
only ACK the write once all four replicas are complete]


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Stretch cluster experiences in production?

2021-10-15 Thread Matthew Vernon

Hi,

Stretch clusters[0] are new in Pacific; does anyone have experience of 
using one in production?


I ask because I'm thinking about new RGW cluster (split across two main 
DCs), which I would naturally be doing using RGW multi-site between two 
clusters.


But it strikes me that a stretch cluster might be simpler (multi-site 
RGW isn't entirely straightforward e.g. round resharding), and 2 copies 
per site is quite a bit less storage than 3 per site. But I'm not sure 
if this new feature is considered production-deployment-ready


Also, if I'm using RGWs, will they do the right thing location-wise? 
i.e. DC A RGWs will talk to DC A OSDs wherever possible?


Thanks,

Matthew

[0] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD Service Advanced Specification db_slots

2021-09-10 Thread Matthew Vernon

On 10/09/2021 15:20, Edward R Huyer wrote:


Question 2:  If db_slots still *doesn't* work, is there a coherent
way to divide up a solid state DB drive for use by a bunch of OSDs
when the OSDs may not all be created in one go?  At first I thought
it was related to limit, but re-reading the advanced specification
for a 4th time, I don't think that's the case.  Of course this
question is moot if db_slots actually works.


I've previously done this outside Ceph - i.e. have our existing
automation chop the NVMEs up into partitions, and then just tell Ceph to
use an NVME partition per OSD.

[not attempted this with cephadm, this was ceph-ansible]
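
Roughly, per host - names and sizes invented for illustration; you then 
point each OSD at ceph-db/db-N as its dedicated DB volume:

pvcreate /dev/nvme0n1
vgcreate ceph-db /dev/nvme0n1
for i in $(seq 0 11); do
    lvcreate -L 60G -n db-$i ceph-db      # one DB LV per planned OSD
done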

Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-08 Thread Matthew Vernon

Hi,

On 06/09/2021 08:37, Lokendra Rathour wrote:

Thanks, Mathew for the Update.
The upgrade got failed for some random wired reasons, Checking further 
Ceph's status shows that "Ceph health is OK" and times it gives certain 
warnings but I think that is ok.


OK...

but what if we see the Version mismatch between the daemons, i.e few 
services have upgraded and the remaining could not be upgraded. So in 
this state, we do two things:


  * Retrying the upgrade activity (to Pacific) - it might work this time.
  * Going back to the older Version (Octopus) - is this possible and if
yes then how?


In general downgrades are not supported, so I think continuing with the 
upgrade is the best answer.



*Other Query:*
What if the complete cluster goes down, i.e mon crashes other daemon 
crashes, can we try to restore the data in OSDs, maybe by reusing the 
OSD's in another or new Ceph Cluster or something to save the data.


You will generally have more than 1 mon (typically 3, some people have 
5), and as long as a quorum remains, you will still have a working 
cluster. If you somehow manage to break all your mons, there is an 
emergency procedure for recreating the mon map from an OSD -


https://docs.ceph.com/en/pacific/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

...but you don't want to end up in that situation!

RADOS typically splits objects across multiple placement groups (and 
thus across multiple OSDs); while there are tools to extract data from 
OSDs (e.g. https://docs.ceph.com/en/latest/man/8/ceph-objectstore-tool/ 
), you won't get complete objects this way. Instead, the advice would be 
to try and get enough mons back up to get your cluster at least to a 
read-only state and then attempt recovery that way.


HTH,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-03 Thread Matthew Vernon

On 02/09/2021 09:34, Lokendra Rathour wrote:


We have deployed the Ceph Octopus release using Ceph-Ansible.
During the upgrade from Octopus to Pacific release we saw the upgrade got
failed.


I'm afraid you'll need to provide some more details (e.g. ceph -s 
output) on the state of your cluster; I'd expect a cluster mid-upgrade 
to still be operational, so you should still be able to access your OSDs.


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Howto upgrade AND change distro

2021-08-27 Thread Matthew Vernon

Hi,

On 27/08/2021 16:16, Francois Legrand wrote:

We are running a ceph nautilus cluster under centos 7. To upgrade to 
pacific we need to change to a more recent distro (probably debian or 
ubuntu because of the recent announcement about centos 8, but the distro 
doesn't matter very much).


However, I could'nt find a clear procedure to upgrade ceph AND the 
distro !  As we have more than 100 osds and ~600TB of data, we would 
like to avoid as far as possible to wipe the disks and 
rebuild/rebalance. It seems to be possible to reinstall a server and 
reuse the osds, but the exact procedure remains quite unclear to me.


It's going to be least pain to do the operations separately, which means 
you may need to build a set of packages for one or other "end" of the 
operation, if you see what I mean?


The Debian and Ubuntu installers both have an "expert mode" which gives 
you quite a lot of control which should enable you to upgrade the OS 
without touching the OSD disks - but make sure you have backups of all 
your Ceph config!


If you're confident (and have enough redundancy), you can set noout 
while you upgrade a machine, which will reduce the amount of rebalancing 
you have to do when it rejoins the cluster post upgrade.
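
i.e. something like:

ceph osd set noout
# ...upgrade/reimage and reboot the machine...
ceph osd unset noout      # once its OSDs are back up and in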


Regards,

Matthew

[one good thing about Ubuntu's cloud archive is that e.g. you can get 
the same version that's default in 20.04 available as packages for 18.04 
via UCA meaning you can upgrade Ceph first, and then do the distro 
upgrade, and it's pretty painless]


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW Swift & multi-site

2021-08-16 Thread Matthew Vernon

Hi,

Are there any issues to be aware of when using RGW's newer multi-site 
features with the Swift front-end? I've, perhaps unfairly, gathered the 
impression that the Swift support in RGW gets less love than S3...


Thanks,

Matthew

ps: new email address, as I've moved employer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I check my rgw quota ? [EXT]

2021-06-23 Thread Matthew Vernon

On 22/06/2021 12:58, Massimo Sgaravatto wrote:

Sorry for the very naive question:

I know how to set/check the rgw quota for a user (using  radosgw-admin)

But how can a  radosgw user check what is the quota assigned to his/her
account , using the S3 and/or the swift interface  ?


I think you can't via S3; we collect these data and publish them 
out-of-band (via a CSV file and some trend graphs). The Ceph dashboard 
can also show you this, I think, if you don't mind all your users being 
able to see each others' quotas.
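
A sketch of how one might pull those numbers for a CSV - the jq field 
names here are from memory, so double-check them against your own 
radosgw-admin output:

for u in $(radosgw-admin metadata list user | jq -r '.[]'); do
    radosgw-admin user info --uid="$u" |
        jq -r '[.user_id, .user_quota.max_size, .user_quota.max_objects] | @csv'
done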


Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Matthew Vernon

Hi,

In the discussion after the Ceph Month talks yesterday, there was a bit 
of chat about cephadm / containers / packages. IIRC, Sage observed that 
a common reason in the recent user survey for not using cephadm was that 
it only worked on containerised deployments. I think he then went on to 
say that he hadn't heard any compelling reasons why not to use 
containers, and suggested that resistance was essentially a user 
education question[0].


I'd like to suggest, briefly, that:

* containerised deployments are more complex to manage, and this is not 
simply a matter of familiarity

* reducing the complexity of systems makes admins' lives easier
* the trade-off of the pros and cons of containers vs packages is not 
obvious, and will depend on deployment needs
* Ceph users will benefit from both approaches being supported into the 
future


We make extensive use of containers at Sanger, particularly for 
scientific workflows, and also for bundling some web apps (e.g. 
Grafana). We've also looked at a number of container runtimes (Docker, 
singularity, charliecloud). They do have advantages - it's easy to 
distribute a complex userland in a way that will run on (almost) any 
target distribution; rapid "cloud" deployment; some separation (via 
namespaces) of network/users/processes.


For what I think of as a 'boring' Ceph deploy (i.e. install on a set of 
dedicated hardware and then run for a long time), I'm not sure any of 
these benefits are particularly relevant and/or compelling - Ceph 
upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud 
Archive) provide .debs of a couple of different Ceph releases per Ubuntu 
LTS - meaning we can easily separate out OS upgrade from Ceph upgrade. 
And upgrading the Ceph packages _doesn't_ restart the daemons[1], 
meaning that we maintain control over restart order during an upgrade. 
And while we might briefly install packages from a PPA or similar to 
test a bugfix, we roll those (test-)cluster-wide, rather than trying to 
run a mixed set of versions on a single cluster - and I understand this 
single-version approach is best practice.
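
Concretely, an upgrade there looks roughly like this - a sketch of the 
shape of it, not our actual tooling; the usual mon -> mgr -> osd -> rgw 
order applies:

apt-get install -y ceph ceph-osd radosgw      # binaries updated, daemons keep running
systemctl restart ceph-mon.target             # on each mon host in turn
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target             # host by host, waiting for health to settle
systemctl restart ceph-radosgw.target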


Deployment via containers does bring complexity; some examples we've 
found at Sanger (not all Ceph-related, which we run from packages):


* you now have 2 process supervision points - dockerd and systemd
* docker updates (via distribution unattended-upgrades) have an 
unfortunate habit of rudely restarting everything
* docker squats on a chunk of RFC 1918 space (and telling it not to can 
be a bore), which coincides with our internal network...
* there is more friction if you need to look inside containers 
(particularly if you have a lot running on a host and are trying to find 
out what's going on)

* you typically need to be root to build docker containers (unlike packages)
* we already have package deployment infrastructure (which we'll need 
regardless of deployment choice)


We also currently use systemd overrides to tweak some of the Ceph units 
(e.g. to do some network sanity checks before bringing up an OSD), and 
have some tools to pair OSD / journal / LVM / disk device up; I think 
these would be more fiddly in a containerised deployment. I'd accept 
that fixing these might just be a SMOP[2] on our part.
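
The sort of thing I mean is a drop-in along these lines (the script name 
is made up):

# /etc/systemd/system/ceph-osd@.service.d/override.conf
[Service]
ExecStartPre=/usr/local/sbin/ceph-osd-netcheck.sh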


Now none of this is show-stopping, and I am most definitely not saying 
"don't ship containers". But I think there is added complexity to your 
deployment from going the containers route, and that is not simply a 
"learn how to use containers" learning curve. I do think it is 
reasonable for an admin to want to reduce the complexity of what they're 
dealing with - after all, much of my job is trying to automate or 
simplify the management of complex systems!


I can see from a software maintainer's point of view that just building 
one container and shipping it everywhere is easier than building 
packages for a number of different distributions (one of my other hats 
is a Debian developer, and I have a bunch of machinery for doing this 
sort of thing). But it would be a bit unfortunate if the general thrust 
of "let's make Ceph easier to set up and manage" was somewhat derailed 
with "you must use containers, even if they make your life harder".


I'm not going to criticise anyone who decides to use a container-based 
deployment (and I'm sure there are plenty of setups where it's an 
obvious win), but if I were advising someone who wanted to set up and 
use a 'boring' Ceph cluster for the medium term, I'd still advise on 
using packages. I don't think this makes me a luddite :)


Regards, and apologies for the wall of text,

Matthew

[0] I think that's a fair summary!
[1] This hasn't always been true...
[2] Simple (sic.) Matter of Programming



[ceph-users] Re: time duration of radosgw-admin [EXT]

2021-06-02 Thread Matthew Vernon

Hi,

On 01/06/2021 21:29, Rok Jaklič wrote:


is it normal that radosgw-admin user info --uid=user ... takes around 3s or
more?


Seems to take about 1s on our production cluster (Octopus), which isn't 
exactly speedy, but good enough...


Regards,

Matthew



___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus, Ceph-Ansible, existing OSDs, and ceph.conf updates [EXT]

2021-04-12 Thread Matthew Vernon

On 10/04/2021 13:03, Dave Hall wrote:

Hello,

A while back I asked about the troubles I was having with Ceph-Ansible when
I kept existing OSDs in my inventory file when managing my Nautilus cluster.

At the time it was suggested that once the OSDs have been configured they
should be excluded from the inventory file.

However, when processing certain configuration changes Ceph-Ansible updates
ceph.conf on all cluster nodes and clients in the inventory file.

Is there an alternative way to keep OSD nodes in the inventory file without
listing them as OSD nodes, so they get other updates, but also so
Ceph-Ansible doesn't try to do any of the ceph-volume stuff that seems to
be failing after the OSDs are configured?


Are you using LVM or LVM-batch? If the former, you might find
--skip-tags prepare_osd
does what you want. I use that because otherwise ceph-ansible gets sad 
if your device names aren't exactly what it's expecting.
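
i.e. something like (playbook and inventory names illustrative):

ansible-playbook -i inventory site.yml --limit osds --skip-tags prepare_osd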


Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-ansible in Pacific and beyond? [EXT]

2021-03-18 Thread Matthew Vernon

Hi,

On 18/03/2021 15:03, Guillaume Abrioux wrote:


ceph-ansible@stable-6.0 supports pacific and the current content in the
branch 'master' (future stable-7.0) is intended to support Ceph Quincy.

I can't speak on behalf of Dimitri but I'm personally willing to keep
maintaining ceph-ansible if there are interests, but people must be aware
that:


This is good to know, thank you :)

I hadn't realised my question would spawn such a monster thread!

Regards,

Matthew



___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Email alerts from Ceph [EXT]

2021-03-18 Thread Matthew Vernon

Hi,

On 17/03/2021 22:26, Andrew Walker-Brown wrote:


How have folks implemented getting email or snmp alerts out of Ceph?
Getting things like osd/pool nearly full or osd/daemon failures etc.
I'm afraid we used our existing Nagios infrastructure for checking 
HEALTH status, and have a script that runs daily to report on failed OSDs.
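
A minimal sketch of that sort of check (details and addresses 
illustrative, not our actual script):

#!/bin/sh
# mail a report if the cluster isn't healthy / any OSDs are down
if ! ceph health | grep -q HEALTH_OK; then
    { ceph health detail; ceph osd tree down; } | mail -s "ceph: cluster not healthy" storage-admins@example.org
fi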


Our existing metrics infrastructure is collectd/graphite/grafana so we 
have dashboards and so on, but as far as I'm aware the Octopus dashboard 
only supports prometheus, so we're a bit stuck there :-(


Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Telemetry ident use?

2021-03-17 Thread Matthew Vernon

Hi,

What use is made of the ident data in the telemetry module? It's 
disabled by default, and the docs don't seem to say what it's used for...


Thanks,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-ansible in Pacific and beyond?

2021-03-17 Thread Matthew Vernon

Hi,

I caught up with Sage's talk on what to expect in Pacific ( 
https://www.youtube.com/watch?v=PVtn53MbxTc ) and there was no mention 
of ceph-ansible at all.


Is it going to continue to be supported? We use it (and uncontainerised 
packages) for all our clusters, so I'd be a bit alarmed if it was going 
to go away...


Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: lvm fix for reseated reseated device [EXT]

2021-03-15 Thread Matthew Vernon

On 15/03/2021 11:29, Matthew Vernon wrote:

On 15/03/2021 11:09, Dan van der Ster wrote:


Occasionally we see a bus glitch which causes a device to disappear
then reappear with a new /dev/sd name. This crashes the osd (giving IO
errors) but after a reboot the OSD will be perfectly fine.

We're looking for a way to reeactivate osd like this without rebooting.


Systemd's udev plumbing is _meant_ to cope with this OK (infuriatingly 
the only place it seems to do so reliably is our test cluster!), but it 
doesn't seem very good at it.


Sorry, I realise showing what that looks like when it works might be 
helpful.


Pulling a drive (/dev/sdan):

Oct  1 15:55:49 sto-t1-3 systemd[1]: Stopping LVM2 PV scan on device 66:112...
Oct  1 15:55:49 sto-t1-3 lvm[932541]:   Device 66:112 not found. Cleared from lvmetad cache.
Oct  1 15:55:49 sto-t1-3 systemd[1]: Stopped LVM2 PV scan on device 66:112.

then after the drive comes back (as /dev/sdbk):

Oct  1 15:57:04 sto-t1-3 systemd[1]: Starting LVM2 PV scan on device 67:224...
Oct  1 15:57:04 sto-t1-3 lvm[932557]:   1 logical volume(s) in volume group "ceph-5077d6e1-460b-43ca-8845-5cbae468c1a8" now active
Oct  1 15:57:04 sto-t1-3 systemd[1]: Started LVM2 PV scan on device 67:224.

Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: lvm fix for reseated reseated device [EXT]

2021-03-15 Thread Matthew Vernon

On 15/03/2021 11:09, Dan van der Ster wrote:


Occasionally we see a bus glitch which causes a device to disappear
then reappear with a new /dev/sd name. This crashes the osd (giving IO
errors) but after a reboot the OSD will be perfectly fine.

We're looking for a way to reeactivate osd like this without rebooting.


Systemd's udev plumbing is _meant_ to cope with this OK (infuriatingly 
the only place it seems to do so reliably is our test cluster!), but it 
doesn't seem very good at it.


You might be able to reshuffle the device back to its original location 
thus:

echo 1 > /sys/block/sdNEW/device/delete
rescan-scsi-bus.sh -a -r
?

I've been trying this when replacing drives (ceph-ansible gets confused 
if the drives on a host change too much), so I don't know if udev will DTRT.


Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions RE: Ceph/CentOS/IBM [EXT]

2021-03-03 Thread Matthew Vernon

Hi,

You can get support for running Ceph on a number of distributions - RH 
support both RHEL and Ubuntu, Canonical support Ubuntu, the smaller 
consultancies seem happy to support anything plausible (e.g. Debian), 
this mailing list will opine regardless of what distro you're running ;-)


Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Octopus auto-scale causing HEALTH_WARN re object numbers [EXT]

2021-03-03 Thread Matthew Vernon

On 02/03/2021 16:38, Matthew Vernon wrote:


root@sto-t1-1:~# ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average; 9 pgs 
not deep-scrubbed in time
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than 
average
     pool default.rgw.buckets.data objects per pg (313153) is more than 
23.4063 times cluster average (13379)


...which seems like the wrong thing for the auto-scaler to be doing. Is 
this a known problem?


The autoscaler has finished, and I still have the health warning:

root@sto-t1-1:~# ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than 
average
pool default.rgw.buckets.data objects per pg (313153) is more than 
23.0871 times cluster average (13564)


Am I right that the auto-scaler only considers size and never object count?

If so, am I right that this is a bug?

I mean, I think I can bodge around it with pg_num_min, but I thought one 
of the merits of Octopus was that the admin had to spend less time 
worrying about pool sizes...
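
[i.e. something like the following, with the value picked to taste:

ceph osd pool set default.rgw.buckets.data pg_num_min 128
]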


Regards,

Matthew


___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Octopus auto-scale causing HEALTH_WARN re object numbers

2021-03-02 Thread Matthew Vernon

Hi,

I've upgraded our test cluster to Octopus, and enabled the auto-scaler. 
It's nearly finished:


PG autoscaler decreasing pool 11 PGs from 1024 to 32 (4d)
  [==..] (remaining: 3h)

But I notice it looks to be making pool 11 smaller when HEALTH_WARN 
thinks it should be larger:


root@sto-t1-1:~# ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average; 9 pgs 
not deep-scrubbed in time
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than 
average
pool default.rgw.buckets.data objects per pg (313153) is more than 
23.4063 times cluster average (13379)


...which seems like the wrong thing for the auto-scaler to be doing. Is 
this a known problem?


Regards,

Matthew

More details:

ceph df:
root@sto-t1-1:~# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    993 TiB  782 TiB  210 TiB  211 TiB       21.22
TOTAL  993 TiB  782 TiB  210 TiB  211 TiB       21.22

--- POOLS ---
POOL                        ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
.rgw.root                    2   69 KiB        4  1.4 MiB      0    220 TiB
default.rgw.control          3  1.1 MiB        8  3.3 MiB      0    220 TiB
default.rgw.data.root        4  115 KiB       14  3.6 MiB      0    220 TiB
default.rgw.gc               5  5.3 MiB       32   23 MiB      0    220 TiB
default.rgw.log              6   31 MiB      184   96 MiB      0    220 TiB
default.rgw.users.uid        7  249 KiB        8  1.8 MiB      0    220 TiB
default.rgw.buckets.data    11   23 GiB   10.02M  2.0 TiB   0.30    220 TiB
rgwtls                      13   54 KiB        3  843 KiB      0    220 TiB
pilot-metrics               14  285 MiB    2.60M  476 GiB   0.07    220 TiB
pilot-images                15   40 GiB    4.97k  122 GiB   0.02    220 TiB
pilot-volumes               16  192 GiB   48.90k  577 GiB   0.09    220 TiB
pilot-vms                   17  125 GiB   33.79k  376 GiB   0.06    220 TiB
default.rgw.users.keys      18  111 KiB        5  1.5 MiB      0    220 TiB
default.rgw.buckets.index   19  4.0 GiB      246   12 GiB      0    220 TiB
rbd                         20   39 TiB   10.09M  116 TiB  14.88    220 TiB
default.rgw.buckets.non-ec  21  344 KiB        1  1.0 MiB      0    220 TiB
rgw-ec                      22  7.0 TiB    1.93M   11 TiB   1.57    441 TiB
rbd-ec                      23   45 TiB   11.73M   67 TiB   9.22    441 TiB
default.rgw.users.email     24   23 MiB        1   69 MiB      0    220 TiB
pilot-backups               25   73 MiB        3  219 MiB      0    220 TiB
device_health_metrics       26   51 MiB      186  153 MiB      0    220 TiB

root@sto-t1-1:~# ceph osd pool autoscale-status
POOL                        SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
.rgw.root                   70843                3.0   992.7T        0.                                     1.0       32              on
default.rgw.control         1116k                3.0   992.7T        0.                                     1.0       32              on
default.rgw.data.root       115.1k               3.0   992.7T        0.                                     1.0       32              on
default.rgw.gc              5379k                3.0   992.7T        0.                                     1.0       32              on
default.rgw.log             32036k               3.0   992.7T        0.                                     1.0       32              on
default.rgw.users.uid       248.7k               3.0   992.7T        0.                                     1.0       32              on
default.rgw.buckets.data    23894M               3.0   992.7T        0.0001                                 1.0       32              on
rgwtls                      55760                3.0   992.7T        0.                                     1.0       32              on
pilot-metrics               285.3M               3.0   992.7T        0.                                     1.0       32              on
pilot-images                41471M               3.0   992.7T        0.0001                                 1.0       32              on
pilot-volumes               192.3G               3.0   992.7T        0.0006                                 1.0       32              on
pilot-vms                   124.6G               3.0   992.7T        0.0004                                 1.0       32              on
default.rgw.users.keys      111.1k               3.0   992.7T        0.                                     1.0       32              on
default.rgw.buckets.index   4090M                3.0   992.7T        0.                                     1.0       32              on
rbd                         39430G               3.0   992.7T        0.1164                                 1.0     1024              on
default.rgw.buckets.non-ec  344.3k               3.0   992.7T        0.

[ceph-users] "optimal" tunables on release upgrade

2021-02-26 Thread Matthew Vernon

Hi,

Having been slightly caught out by tunables on my Octopus upgrade[0], 
can I just check that if I do

ceph osd crush tunables optimal

That will update the tunables on the cluster to the current "optimal" 
values (and move a lot of data around), but that this doesn't mean 
they'll change next time I upgrade the cluster or anything like that?


It's not quite clear from the documentation whether the next time 
"optimal" tunables change that'll be applied to a cluster where I've set 
tunables thus, or if tunables are only ever changed by a fresh 
invocation of ceph osd crush tunables...


[I assume the same answer applies to "default"?]
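
[For comparing before/after, I assume

ceph osd crush show-tunables

is the thing to look at.]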

Regards,

Matthew

[0] I foolishly thought a cluster initially installed as Jewel would 
have jewel tunables



___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Consequences of setting bluestore_fsck_quick_fix_on_mount to false?

2021-02-16 Thread Matthew Vernon

Hi,

On 16/02/2021 08:06, Dan van der Ster wrote:


Which version are you upgrading from? If recent nautilus, you may have
already completed this conversion.


Mimic (well, really Luminous with a pit-stop at Mimic).


When we did this fsck (not with octopus, but to a nautilus point
release that had this conversion backported), we first upgraded one
single osd just to see the typical downtime for our data.
On our S3 cluster, the conversion completed within just a couple of
minutes per OSD, so we decided to leave
bluestore_fsck_quick_fix_on_mount at its default true, and did all the
fsck's as we updated the osds.


Thanks; I'll see what it looks like on the test cluster.
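
My rough plan for that test, sketched out (osd.0 picked arbitrarily; the 
conversion runs during OSD startup, so its log shows how long it takes):

systemctl restart ceph-osd@0           # quick-fix left at its default (true)
tail -f /var/log/ceph/ceph-osd.0.log   # watch the fsck/conversion lines to time it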

Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Consequences of setting bluestore_fsck_quick_fix_on_mount to false?

2021-02-15 Thread Matthew Vernon

Hi,

Looking at the Octopus upgrade instructions, I see "the first time each 
OSD starts, it will do a format conversion to improve the accounting for 
“omap” data. This may take a few minutes to as much as a few hours (for 
an HDD with lots of omap data)." and that I can disable this by setting 
bluestore_fsck_quick_fix_on_mount to false.
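
(For reference, I take it disabling it ahead of the restarts would be 
something like the following, assuming the mon config database rather 
than ceph.conf:)

ceph config set osd bluestore_fsck_quick_fix_on_mount false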


A couple of questions about this:

i) what are the consequences of turning off this "quick fix"? Is it 
possible to have it run in the background or similar?


ii) is there any way to narrow down the time estimate? Our production 
cluster has 3060 OSDs on hdd (with block.db on NVME), and obviously 3000 
lots of "a few hours" is an awful lot of time...


I'll be doing some testing on our test cluster (by putting 10M objects 
into an S3 bucket before trying the upgrade), but it'd be useful to have 
some idea of how this is likely to work at scale...


Thanks,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: share haproxy config for radosgw [EXT]

2021-02-15 Thread Matthew Vernon

On 14/02/2021 21:31, Graham Allan wrote:

On Tue, Feb 9, 2021 at 11:00 AM Matthew Vernon <m...@sanger.ac.uk> wrote:


On 07/02/2021 22:19, Marc wrote:
 >
 > I was wondering if someone could post a config for haproxy. Is
there something specific to configure? Like binding clients to a
specific backend server, client timeouts, security specific to rgw etc.

Ours is templated out by ceph-ansible; to try and condense out just the
interesting bits:

(snipped the config...)

The aim is to use all available CPU on the RGWs at peak load, but to
also try and prevent one user overwhelming the service for everyone
else
- hence the dropping of idle connections and soft (and then hard)
limits
on per-IP connections.


Can I ask a followup question to this: how many haproxy instances do you 
then run - one on each of your gateways, with keepalived to manage which 
is active?


One on each gateway, yes. We use RIP - each RGW listens on each of the 6 
service IPs (and knows about all 6 RGWs, so haproxy can hand off traffic 
if overloaded). The switches do some work to make sure traffic from our 
OpenStack goes to its "nearest" RGW where possible.


Like the setup you describe, RIP has no way of knowing if the radosgw 
has gone down but the host is otherwise up; but haproxy can tell that, 
which I think is an advantage.


We needed to tune the haproxy and radosgw setup to get as much out of 
the gateway hardware as possible (we used cosbench); redoing the 
benchmarking bypassing haproxy showed that haproxy had very little 
impact on performance.


Regards,

Matthew




--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Backups of monitor [EXT]

2021-02-15 Thread Matthew Vernon

On 12/02/2021 15:47, Freddy Andersen wrote:


I would say everyone recommends at least 3 monitors and since they
need to be 1,3,5 or 7 I always read that as 5 is the best number (if
you have 5 servers in your cluster).

We have 3 on all our clusters, and at the risk of tempting fate, haven't 
had any issues as a result...


[it's slightly fiddly to add more, since we give them a bunch of extra 
storage than our other nodes since the Mon store can get pretty big in a 
large cluster if you have to do a big rebalance]


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: share haproxy config for radosgw [EXT]

2021-02-09 Thread Matthew Vernon

On 07/02/2021 22:19, Marc wrote:


I was wondering if someone could post a config for haproxy. Is there something 
specific to configure? Like binding clients to a specific backend server, 
client timeouts, security specific to rgw etc.


Ours is templated out by ceph-ansible; to try and condense out just the 
interesting bits:


global
nbthread 24
#this plus 8 rados handles and 600 civetweb threads lets us use all the 
#CPU on our RGW systems


defaults
timeout connect 60s
timeout client  2m
timeout server  2m
# give clients chance to benefit from keepalive; but don't
# let idle connections linger
timeout http-keep-alive 1s

frontend listen_https
mode http
option forwardfor
bind :443 ssl crt /etc/ceph/rgwtls.pem
stick-table type ip size 1m expire 1h store conn_cur
tcp-request content track-sc0 src

# tcp-request is processed before http-request
# these soft and hard limits templated
tcp-request content reject if { sc_conn_cur(0) gt 170 }
http-request set-nice 1000 if { sc_conn_cur(0) gt 113 }

default_backend rgw_servers

backend rgw_servers
balance roundrobin
#Use our server if it's got connections spare
use-server sto-rgw-1 if { srv_conn(sto-rgw-1) le 341 }
server sto-rgw-1 172.27.50.8:8443 check ssl verifyhost cog.sanger.ac.uk ca-file ca-certificates.crt fall 5 inter 2000 rise 2 maxconn 341 weight 0

#Otherwise, prefer the two network-local servers
server sto-rgw-2 172.27.50.9:8443 check ssl verifyhost cog.sanger.ac.uk ca-file ca-certificates.crt fall 5 inter 2000 rise 2 maxconn 341 weight 100
server sto-rgw-3 172.27.50.10:8443 check ssl verifyhost cog.sanger.ac.uk ca-file ca-certificates.crt fall 5 inter 2000 rise 2 maxconn 341 weight 100

#Finally, the more remote options
server sto-rgw-4 172.27.50.136:8443 check ssl verifyhost cog.sanger.ac.uk ca-file ca-certificates.crt fall 5 inter 2000 rise 2 maxconn 341 weight 5
server sto-rgw-5 172.27.50.137:8443 check ssl verifyhost cog.sanger.ac.uk ca-file ca-certificates.crt fall 5 inter 2000 rise 2 maxconn 341 weight 5
server sto-rgw-6 172.27.50.138:8443 check ssl verifyhost cog.sanger.ac.uk ca-file ca-certificates.crt fall 5 inter 2000 rise 2 maxconn 341 weight 5


The aim is to use all available CPU on the RGWs at peak load, but to 
also try and prevent one user overwhelming the service for everyone else 
- hence the dropping of idle connections and soft (and then hard) limits 
on per-IP connections.
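
For context, the matching radosgw tuning sits in ceph.conf; roughly this 
(a sketch - civetweb option names as in Nautilus, with the thread/handle 
counts from the comment in the global section above):

[client.rgw.sto-rgw-1]
rgw frontends = civetweb port=8443s ssl_certificate=/etc/ceph/rgwtls.pem num_threads=600
rgw num rados handles = 8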


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Matthew Vernon

Hi,

On 04/02/2021 07:41, Loïc Dachary wrote:


On 04/02/2021 05:51, Federico Lucifredi wrote:

Hi Loïc,
    I am intrigued, but am missing something: why not using RGW, and store the 
source code files as objects? RGW has native compression and can take care of 
that behind the scenes.

Excellent question!


    Is the desire to use RBD only due to minimum allocation sizes?

I *assume* that since RGW does not have specific strategies to take advantage 
of the fact that objects are immutable and will never be removed:

* It will be slower to add artifacts in RGW than in an RBD image + index
* The metadata in RGW will be larger than an RBD image + index


RGW addition is pretty quick up to fairly large buckets; and if you're 
not expecting to want to list the bucket contents often, then RGW might 
well be a good option for your object store with small files.
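
If you want a quick feel for ingest rate before committing, even something 
as crude as this gives a lower bound (serial puts via s3cmd; bucket name 
hypothetical, and parallel clients will do considerably better):

dd if=/dev/urandom of=blob.bin bs=4k count=1
s3cmd mb s3://artifact-test
time for i in $(seq 1 1000); do s3cmd put blob.bin s3://artifact-test/obj-$i; done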


Or at least, using some of the RGW code (I think there's a librgw) to 
re-use a bunch of its code for your use case; this feels more natural to 
me than using RBD for this.


Regards,

Matthew
[pleased software heritage are still looking at Ceph :) ]


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Sequence replacing a failed OSD disk? [EXT]

2021-01-04 Thread Matthew Vernon

On 31/12/2020 09:10, Rainer Krienke wrote:


Yesterday my ceph nautilus 14.2.15 cluster had a disk with unreadable
sectors, after several tries the OSD was marked down and rebalancing
started and has also finished successfully. ceph osd stat shows the osd
now as "autoout,exists".

Usually the steps to replace a failed disk are:
1. Destroy the failed OSD: ceph osd destroy {id}
2. run ceph-volume lvm create --bluestore --osd-id {id} --data /dev/sdX
... with a new disk in place to recreate a OSD with the same id without
the need to change the crushmap or auth info etc.

Now I still wait for a new disk and I am a unsure if I should run the
destroy-command already now to keep ceph from trying to reactivate the
broken osd?  Then I would wait until the disk has arrived in a day or so
and then use ceph volume to create a new osd?


If the rebalance is complete, then I would destroy the old OSD now - as 
you say, if the system reboots or somesuch you don't want the OSD to try 
and restart on a fail{ed,ing} disk.
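
That is, run step 1 now and step 2 when the disk arrives - something like 
(123 standing in for the failed OSD's id):

ceph osd destroy 123 --yes-i-really-mean-it   # marks it destroyed; the id can be reused
# later, with the replacement disk in place:
ceph-volume lvm create --bluestore --osd-id 123 --data /dev/sdX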


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Matthew Vernon

Hi,

On 15/12/2020 20:44, Suresh Rama wrote:

TL;DR: use a real NTP client, not systemd-timesyncd


1) We audited the network (inspecting TOR, iperf, MTR) and nothing was
indicating any issue but OSD logs were keep complaining about
BADAUTHORIZER


...this is quite possibly due to clock skew on your OSD nodes.


2) Made sure no clock skew and we use timesyncd.   After taking out a


systemd-timesyncd is fine for simple installations, but I don't think 
it's really recommended for applications (like Ceph) that care about 
good time sync. I'd think seriously about replacing it with a real NTP 
client (the 'ntp' package in Ubuntu, for example).
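
Something along these lines on each node, sketched for Ubuntu (chrony 
would do equally well):

timedatectl status      # confirm systemd-timesyncd is what is currently syncing the clock
apt install ntp         # a real NTP daemon; it takes over from timesyncd
ceph time-sync-status   # afterwards, check the skew the monitors see between themselves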



3) When looking through ceph.log on the mon with tailf, I was getting a lot
of different time stamp reported in the ceph logs in MON1 which is master.
Confused on why the live log report various timestamps?


...this would continue to be consistent with time sync issues.

Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Matthew Vernon

On 26/10/2020 14:13, Ing. Luis Felipe Domínguez Vega wrote:

How can i free the store of ceph monitor?:


root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle# du -h -d1
542G    ./store.db
542G    .



Is your cluster not in HEALTH_OK, all OSDs in+up? The mons have to store 
all the osdmaps since the cluster was last happy, so it can grow pretty 
big if you've had a big rebalance and your cluster isn't yet back to 
normal. It sorts itself out thereafter.
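
To check, and to nudge things along once the cluster is healthy again 
(a sketch; compaction is safe but does churn disk I/O):

ceph -s   # unclean PGs / an unfinished rebalance keep old osdmaps pinned in the mon store
ceph tell mon.fond-beagle compact   # once HEALTH_OK, ask the mon to compact its store.db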


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Hardware needs for MDS for HPC/OpenStack workloads?

2020-10-22 Thread Matthew Vernon

Hi,

We're considering the merits of enabling CephFS for our main Ceph 
cluster (which provides object storage for OpenStack), and one of the 
obvious questions is what sort of hardware we would need for the MDSs 
(and how many!).


These would be for our users' scientific workloads, so they would need to 
provide reasonably high performance. For reference, we have 3060 6TB 
OSDs across 51 OSD hosts, and 6 dedicated RGW nodes.


The minimum specs are very modest (2-3GB RAM, a tiny amount of disk, 
similar networking to the OSD nodes), but I'm not sure how much going 
beyond that is likely to be useful in production.


I've also seen it suggested that an SSD-only pool is sensible for the 
CephFS metadata pool; how big is that likely to get?
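
(For the metadata pool I'd assume a device-class CRUSH rule, along these 
lines - names and PG counts purely illustrative:)

ceph osd crush rule create-replicated replicated-ssd default host ssd   # rule restricted to the ssd class
ceph osd pool create cephfs_metadata 64 64 replicated replicated-ssd
ceph osd pool create cephfs_data 2048 2048
ceph fs new cephfs cephfs_metadata cephfs_data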


I'd be grateful for any pointers :)

Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RGW Performance [EXT]

2020-09-28 Thread Matthew Vernon

Hi,

On 25/09/2020 20:39, Dylan Griff wrote:


We have 10Gb network to our two RGW nodes behind a single ip on
haproxy, and some iperf testing shows I can push that much; latencies
look okay. However, when using a small cosbench cluster I am unable to
get more than ~250Mb of read speed total.


A few thoughts:

i) have you benchmarked your ceph itself (i.e. with rados bench - you'll 
want to parameter-sweep with how many clients you run (and/or 
threads/client))? That gives you a more useful baseline


ii) what does your RGW node load look like? On our RGW nodes we can 
eventually use up all the available CPU, but that required tuning both 
cosbench (which you look to have tried) and civetweb - rgw num rados 
handles and num_threads in rgw frontends


iii) is haproxy your rate-limiter? We had to significantly increase 
nbthread on our haproxy
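
On (i), a typical sweep looks something like the below, run from one and 
then several client hosts at once (pool name hypothetical; seq/rand 
re-read what the write phase leaves behind):

rados bench -p bench-test 60 write -t 16 --no-cleanup
rados bench -p bench-test 60 seq -t 16
rados bench -p bench-test 60 rand -t 64
rados -p bench-test cleanup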


HTH,

Matthew



--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw beast access logs [EXT]

2020-08-19 Thread Matthew Vernon

On 19/08/2020 14:01, Casey Bodley wrote:


Yes, this was implemented by Mark Kogan in 
https://github.com/ceph/ceph/pull/33083 . It looks like it was
backported to Octopus for 15.2.5 in https://tracker.ceph.com/issues/45951. Is 
there interest in a nautilus
backport too?


I don't think we'd be able to use beast in Nautilus in production 
without it...


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph not warning about clock skew on an OSD-only host?

2020-08-11 Thread Matthew Vernon

Hi,

Our production cluster runs Luminous.

Yesterday, one of our OSD-only hosts came up with its clock about 8 
hours wrong(!) having been out of the cluster for a week or so. 
Initially, ceph seemed entirely happy, and then after an hour or so it 
all went South (OSDs start logging about bad authenticators, I/O pauses, 
general sadness).


I know clock sync is important to Ceph, so "one system is 8 hours out, 
Ceph becomes sad" is not a surprise. It is perhaps a surprise that the 
OSDs were allowed in at all...


What _is_ a surprise, though, is that at no point in all this did Ceph 
raise a peep about clock skew. Normally it's pretty sensitive to this - 
our test cluster has had clock skew complaints when a mon is only 
slightly out, and here we had a node 8 hours wrong.


Is there some oddity like Ceph not warning on clock skew for OSD-only 
hosts? Or an upper bound on how high a discrepancy it will WARN about?
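
(For reference, what I'd been looking at - and as far as I can tell the 
skew health check only compares the monitors' clocks with each other, 
which would explain the silence about an OSD-only host:)

ceph time-sync-status   # skew between the mons themselves
ceph daemon mon.$(hostname -s) config get mon_clock_drift_allowed   # the warning threshold (mons only)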


Regards,

Matthew

example output from mid-outage:

root@sto-3-1:~#  ceph -s
  cluster:
    id: 049fc780-8998-45a8-be12-d3b8b6f30e69
    health: HEALTH_ERR
            40755436/2702185683 objects misplaced (1.508%)
            Reduced data availability: 20 pgs inactive, 20 pgs peering
            Degraded data redundancy: 367431/2702185683 objects degraded (0.014%), 4549 pgs degraded
            481 slow requests are blocked > 32 sec. Implicated osds 188,284,795,1278,1981,2061,2648,2697
            644 stuck requests are blocked > 4096 sec. Implicated osds 22,31,33,35,101,116,120,130,132,140,150,159,201,211,228,263,327,541,561,566,585,589,636,643,649,654,743,785,790,806,865,1037,1040,1090,1100,1104,1115,1134,1135,1166,1193,1275,1277,1292,1494,1523,1598,1638,1746,2055,2069,2191,2210,2358,2399,2486,2487,2562,2589,2613,2627,2656,2713,2720,2837,2839,2863,2888,2908,2920,2928,2929,2947,2948,2963,2969,2972


[...]


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph SSH orchestrator? [EXT]

2020-07-06 Thread Matthew Vernon

Hi,

On 03/07/2020 19:44, Oliver Freyermuth wrote:

Am 03.07.20 um 20:29 schrieb Dimitri Savineau:

You can try to use ceph-ansible which supports baremetal and
containerized deployment.

https://github.com/ceph/ceph-ansible


Thanks for the pointer! I know about ceph-ansible. The problem is
that our full infrastructure is Puppet-based, so mixing in a
different configuration management will increase complexity (while
ceph-deploy is really good at filling the gaps, ceph-ansible seems
overkill for us).

Additionally, all existing users of ceph-ansible I have talked to and
asked about their experiences have responded with a heavy sigh and
painful face, mentioning there were regular issues during usage, so I
am reluctant to try it out with zero ansible experience myself.


We're using ceph-ansible at the Sanger Institute, and are still pretty 
happy with it (by which I mean, we're not looking to change at least for 
our move to Nautilus); I gave a talk about our setup at Barcelona 
Cephalocon which is on YT somewhere...


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Re-run ansible to add monitor and RGWs

2020-06-15 Thread Matthew Vernon

On 14/06/2020 17:07, Khodayar Doustar wrote:


Now I want to add the other two nodes as monitor and rgw.

Can I just modify the ansible host file and re-run the site.yml?


Yes.


I've done some modification in Storage classes, I've added some OSD and
uploaded a lot of data up to now. Is it safe to re-run ansible site.yml
playbook?


It's worth checking what OSDs you configured in inventory, but IME 
ceph-ansible won't remove existing OSDs.
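
A dry look before the real run doesn't hurt (a sketch; inventory path and 
group names depend on your setup):

ansible-playbook -i hosts site.yml --list-hosts       # which hosts a full re-run would touch
ansible-playbook -i hosts site.yml --limit mons:rgws  # or constrain the run to the new daemons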


HTH,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Using Ceph-ansible for a luminous -> nautilus upgrade?

2020-06-01 Thread Matthew Vernon

Hi,

For previous Ceph version upgrades, we've used the rolling_upgrade 
playbook from Ceph-ansible - for example, the stable-3.0 branch supports 
both Jewel and Luminous, so we used it to migrate our clusters from 
Jewel to Luminous.


As I understand it, upgrading direct from Luminous to Nautilus is a 
supported operation. But there is no Ceph-ansible release that supports 
both versions. Indeed, stable-4.0 supports Nautilus but no other releases.


Is the expected process to use stable-4.0 for the upgrade, or do we have 
to do the upgrade by hand and only then update our version of ceph-ansible?
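
(For clarity, the stable-4.0 route would be roughly the following, 
inventory path elided:)

git clone -b stable-4.0 https://github.com/ceph/ceph-ansible.git
cd ceph-ansible
ansible-playbook -i <inventory> infrastructure-playbooks/rolling_update.yml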


Thanks,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Servicing multiple OpenStack clusters from the same Ceph cluster [EXT]

2020-01-29 Thread Matthew Vernon
Hi,

On 29/01/2020 16:40, Paul Browne wrote:

> Recently we deployed a brand new Stein cluster however, and I'm curious
> whether the idea of pointing the new OpenStack cluster at the same RBD
> pools for Cinder/Glance/Nova as the Luminous cluster would be considered
> bad practice, or even potentially dangerous.

I think that would be pretty risky - here we have a Ceph cluster that
provides backing for our OpenStacks, and each OpenStack has its own set
of pools -metrics,-images,-volumes,-vms (and its own credential).
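
In practice that means, per OpenStack, something like the following (names 
hypothetical, PG counts need proper sizing):

for p in images volumes vms metrics; do ceph osd pool create stein-$p 128; done
ceph auth get-or-create client.stein \
    mon 'profile rbd' \
    osd 'profile rbd pool=stein-images, profile rbd pool=stein-volumes, profile rbd pool=stein-vms'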

Regards,

Matthew



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph User Survey 2019 [EXT]

2019-11-28 Thread Matthew Vernon

Hi,

On 27/11/2019 18:28, Mike Perez wrote:

To better understand how our current users utilize Ceph, we conducted a 
public community survey. This information is a guide to the community of 
how we spend our contribution efforts for future development. The survey 
results will remain anonymous and aggregated in future Ceph Foundation 
publications to the community.


I'm pleased to announce after much discussion on the Ceph dev mailing 
list [0] that the community has formed the Ceph Survey for 2019.


The RGW questions include:

"The largest object stored in gigabytes"

Is there a tool that would answer this question for me? I can tell you 
how many GB in total we have, but short of iterating through all the 
objects in our RGWs (which would take ages), I don't know how to answer 
this one...
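
(The brute-force version - which is exactly what I'd rather avoid - would 
be something like this, assuming s3cmd is configured with a suitably 
privileged key and that the third column of s3cmd la output is the object 
size in bytes:)

s3cmd la --recursive | awk '$3+0 > max { max = $3 } END { print max/1024/1024/1024, "GB" }'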


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io