[ceph-users] Re: Issue Replacing OSD with cephadm: Partition Path Not Accepted

2024-09-03 Thread Fox, Kevin M
I thought BlueStore stored that stuff in non-LVM mode?


From: Robert Sander 
Sent: Monday, September 2, 2024 11:35 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Issue Replacing OSD with cephadm: Partition Path Not 
Accepted


Hi,

On 9/2/24 20:24, Herbert Faleiros wrote:

> /usr/bin/docker: stderr ceph-volume lvm batch: error: /dev/sdb1 is a
> partition, please pass LVs or raw block devices

A Ceph OSD nowadays needs a logical volume because it stores crucial
metadata in the LV tags. This helps to activate the OSD.
IMHO you will have to redeploy the OSD to use LVM on the disk. It does
not need to be the whole disk if there is other data on it. It should be
sufficient to make /dev/sdb1 a PV of a new VG for the LV of the OSD.
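
A minimal sketch of that approach (device, VG/LV names are placeholders; wipe any
old data on the partition first):

pvcreate /dev/sdb1
vgcreate ceph-osd-sdb1 /dev/sdb1
lvcreate -l 100%FREE -n osd-block ceph-osd-sdb1
# hand the LV to ceph-volume, or reference it in your cephadm OSD service spec
ceph-volume lvm prepare --bluestore --data ceph-osd-sdb1/osd-block
ceph-volume lvm activate --all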

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de/

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What's the best way to add numerous OSDs?

2024-08-06 Thread Fox, Kevin M
Just FYI: some kernels (EL7?) lie about being Jewel until after they are blocked from 
connecting at Jewel; then they report newer.


From: Anthony D'Atri 
Sent: Tuesday, August 6, 2024 5:08 PM
To: Fabien Sirjean
Cc: ceph-users
Subject: [ceph-users] Re: What's the best way to add numerous OSDs?


Since they’re 20TB, I’m going to assume that these are HDDs.

There are a number of approaches.  One common theme is to avoid rebalancing 
until after all have been added to the cluster and are up / in, otherwise you 
can end up with a storm of map updates and superfluous rebalancing.


One strategy is to set osd_crush_initial_weight = 0 temporarily, so that the 
OSDs when added won’t take any data yet.  Then when you’re ready you can set 
their CRUSH weights up to where they otherwise would be, and unset 
osd_crush_initial_weight so you don’t wonder what the heck is going on six 
months down the road.
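
A rough sketch of that sequence (OSD IDs and the target weight are placeholders;
a 20 TB HDD usually ends up with a CRUSH weight around 18.2):

ceph config set osd osd_crush_initial_weight 0
# ... create all the new OSDs; they come up with CRUSH weight 0 and take no data ...
ceph osd crush reweight osd.540 18.2     # repeat (or loop) for each new OSD
ceph config rm osd osd_crush_initial_weight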

Another is to add a staging CRUSH root.  If the new OSDs are all on new hosts, 
you can create CRUSH host buckets for them in advance so that when you create 
the OSDs they go there and again won’t immediately take data.  Then you can 
move the host buckets into the production root in quick succession.
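
A sketch of the staging-root variant, with placeholder bucket and host names:

ceph osd crush add-bucket staging root
ceph osd crush add-bucket newhost01 host
ceph osd crush move newhost01 root=staging
# ... deploy the OSDs on newhost01; they land under the staging root and hold no data ...
ceph osd crush move newhost01 root=default    # repeat per host when ready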

Either way if you do want to add them to the cluster all at once, with HDDs 
you’ll want to limit the rate of backfill so you don’t DoS your clients.  One 
strategy is to leverage pg-upmap with a tool like 
https://gitlab.cern.ch/ceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py

Note that to use pg-upmap safely, you will need to ensure that your clients are 
all at Luminous or later; in the case of CephFS I *think* that means kernel 
4.13 or later.  `ceph features` will, I think, give you that information.
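
For example (a sketch; the upmap-remapped.py usage shown is the commonly cited
one, so check the script's README before running it):

ceph features                                    # look for any pre-Luminous clients
ceph osd set-require-min-compat-client luminous
# after adding the OSDs / raising weights, pin the misplaced PGs back in place:
./upmap-remapped.py | sh                         # review the generated commands first
# then let the balancer in upmap mode remove those upmaps gradually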

An older method of spreading out the backfill thundering herd was to use a for 
loop to weight up the OSDs in increments of, say, 0.1 at a time, let the 
cluster settle, then repeat.  This strategy results in at least some data 
moving twice, so it’s less efficient.  Similarly you might add, say, one OSD 
per host at a time and let the cluster settle between iterations, which would 
also be less than ideally efficient.
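
A sketch of that older incremental approach (IDs and step size are placeholders;
note that `ceph osd crush reweight` sets an absolute weight, not an increment):

for id in $(seq 540 719); do
    ceph osd crush reweight osd.${id} 0.1
done
# wait for backfill to finish, then repeat with 0.2, 0.3, ... up to the target weight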

— aad

> On Aug 6, 2024, at 11:08 AM, Fabien Sirjean  wrote:
>
> Hello everyone,
>
> We need to add 180 20TB OSDs to our Ceph cluster, which currently consists of 
> 540 OSDs of identical size (replicated size 3).
>
> I'm not sure, though: is it a good idea to add all the OSDs at once? Or is it 
> better to add them gradually?
>
> The idea is to minimize the impact of rebalancing on the performance of 
> CephFS, which is used in production.
>
> Thanks in advance for your opinions and feedback 🙂
>
> Wishing you a great summer,
>
> Fabien
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bug Found in Reef Releases - Action Required for pg-upmap-primary Interface Users

2024-05-29 Thread Fox, Kevin M
How do you know if it's safe to set `require-min-compat-client=reef` if you have 
kernel clients?

Thanks,
Kevin


From: Laura Flores 
Sent: Wednesday, May 29, 2024 8:12 AM
To: ceph-users; dev; clt
Cc: Radoslaw Zarzynski; Yuri Weinstein
Subject: [ceph-users] Bug Found in Reef Releases - Action Required for 
pg-upmap-primary Interface Users


Dear Ceph Users,

We have discovered a bug with the pg-upmap-primary interface (related to
the offline read balancer [1]) that affects all Reef releases.

In all Reef versions, users are required to set
`require-min-compat-client=reef` in order to use the pg-upmap-primary
interface to prevent pre-reef clients from connecting and not understanding
the new interface. We found this setting is simply not enforced [2], which
leads to miscommunication between older and newer peers or, depending on
version, to an assert in the mons and/or osds [3]. However, the fundamental
precondition is making use of the new `pg-upmap-primary` feature.

If you have not yet upgraded to v18.2.2, we recommend that you refrain from
upgrading to v18.2.2 until a later version is out with a fix. We also
recommend removing any existing pg-upmap-primary mappings to prevent
hitting the assert [3], as well as to prevent any miscommunication between
older and newer peers about pg primaries [2].
Remove mappings by:
$ `ceph osd dump`
For each pg_upmap_primary entry in the above output:
$ `ceph osd rm-pg-upmap-primary <pgid>`
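
A scripted version of that removal (a sketch, assuming the entries appear in
`ceph osd dump` output as lines of the form `pg_upmap_primary <pgid> <osd>`):

ceph osd dump | awk '/^pg_upmap_primary/ {print $2}' | while read pg; do
    ceph osd rm-pg-upmap-primary "$pg"
done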

If you have already upgraded to v18.2.2, your cluster is more likely to hit
the osd/mon assert [3] when you set a `pg-upmap-primary` mapping (this
would involve explicitly setting a mapping via the osdmaptool or the CLI
command). As long as you refrain from setting any pg-upmap-primary
mappings, your cluster will NOT be affected by [3].

Follow the trackers below for further updates.

1. pg-upmap-primary documentation:
https://docs.ceph.com/en/reef/rados/operations/read-balancer/
2. mon, osd, *: require-min-compat-client is not really honored -
https://tracker.ceph.com/issues/66260
3. Failed assert "pg_upmap_primaries.empty()" in the read balancer
 - https://tracker.ceph.com/issues/61948

Thanks,
Laura Flores

--

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacific bluefs enospc bug with newly created OSDs

2023-06-21 Thread Fox, Kevin M
Does Quincy automatically switch existing things to 4k, or do you need to deploy a 
new OSD to get the 4k size?

Thanks,
Kevin


From: Igor Fedotov 
Sent: Wednesday, June 21, 2023 5:56 AM
To: Carsten Grommel; ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph Pacific bluefs enospc bug with newly created OSDs


Hi Carsten,

please also note a workaround to bring the osds back for e.g. data
recovery - set bluefs_shared_alloc_size to 32768.

This will hopefully allow OSD to startup and pull data out of it. But I
wouldn't discourage you from using such OSDs long term as fragmentation
might evolve and this workaround will become ineffective as well.

Please do not apply this change to healthy OSDs as it's irreversible.
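
For reference, a sketch of applying that workaround to one affected OSD (the OSD
ID is a placeholder; per the caveat above, only for OSDs hitting the enospc
assert, not healthy ones):

ceph config set osd.36 bluefs_shared_alloc_size 32768
# restart the OSD so it picks the value up, e.g. under cephadm:
ceph orch daemon restart osd.36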


BTW, having two namespace at NVMe drive is a good alternative to Logical
Volumes if for some reasons one needs two "physical" disks for OSD setup...

Thanks,

Igor

On 21/06/2023 11:41, Carsten Grommel wrote:
>
> Hi Igor,
>
> thank you for your answer!
>
> >first of all Quincy does have a fix for the issue, see
> >https://tracker.ceph.com/issues/53466 (and its Quincy counterpart
> >https://tracker.ceph.com/issues/58588)
>
> Thank you I somehow missed that release, good to know!
>
> >SSD or HDD? Standalone or shared DB volume? I presume the latter... What
> >is disk size and current utilization?
> >
> >Please share ceph-bluestore-tool's bluefs-bdev-sizes command output if
> >possible
>
> We use 4 TB NVMe SSDs, shared db yes and mainly Micron with some Dell
> and Samsung in this cluster:
>
> Micron_7400_MTFDKCB3T8TDZ_214733D291B1 cloud5-1561:nvme5n1  osd.5
>
> All disks are at ~88% utilization. I noticed that at around 92% our
> disks tend to run into this bug.
>
> Here are some bluefs-bdev-sizes from different OSDs on different hosts
> in this cluster:
>
> ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-36/
>
> inferring bluefs devices from bluestore path
>
> 1 : device size 0x37e3ec0 : using 0x2e1b390(2.9 TiB)
>
> ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-24/
>
> inferring bluefs devices from bluestore path
>
> 1 : device size 0x37e3ec0 : using 0x2d4e318d000(2.8 TiB)
>
> ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-5/
>
> inferring bluefs devices from bluestore path
>
> 1 : device size 0x37e3ec0 : using 0x2f2da93d000(2.9 TiB)
>
> >Generally, given my assumption that DB volume is currently collocated
> >and you still want to stay on Pacific, you might want to consider
> >redeploying OSDs with a standalone DB volume setup.
> >
> >Just create large enough (2x of the current DB size seems to be pretty
> >conservative estimation for that volume's size) additional LV on top of
> >the same physical disk. And put DB there...
> >
> >Separating DB from main disk would result in much less fragmentation at
> >DB volume and hence work around the problem. The cost would be having
> >some extra spare space at DB volume unavailable for user data .
>
> I guess that makes sense, so the suggestion would be to deploy the OSD and
> DB on the same NVMe
>
> but with different logical volumes, or to update to Quincy.
>
> Thank you!
>
> Carsten
>
> *Von: *Igor Fedotov 
> *Datum: *Dienstag, 20. Juni 2023 um 12:48
> *An: *Carsten Grommel , ceph-users@ceph.io
> 
> *Betreff: *Re: [ceph-users] Ceph Pacific bluefs enospc bug with newly
> created OSDs
>
> Hi Carsten,
>
> first of all Quincy does have a fix for the issue, see
> https://tracker.ceph.com/issues/53466 (and its Quincy counterpart
> https://tracker.ceph.com/issues/58588)
>
> Could you please share a bit more info on OSD disk layout?
>
> SSD or HDD? Standalone or shared DB volume? I presume the latter... What
> is disk size and current utilization?
>
> Please share ceph-bluestore-tool's bluefs-bdev-sizes command output if
> possible
>
>
> Generally, given my assumption that DB volume is currently collocated
> and you still want to stay on Pacific, you might want to consider
> redeploying OSDs with a standalone DB volume setup.
>
> Just create large enough (2x of the current DB size seems to be pretty
> conservative estimation for that volume's size) additional LV on top of
> the same physical disk. And put DB there...
>
> Separating DB from main disk would result in much less fragmentation at
> DB volume and hence work around the problem. The cost would be having
> some extra spare space at DB volume unavailable for user data .
>
>
> Hope this helps,
>
> Igor
>
>
> On 20/06/2023 10:29, Carsten Grommel wrote:
> > Hi all,
> >
> > we are experiencing the “bluefs enospc bug” again after redeploying
> all OSDs of our Pacific Cluster.
> > I know that our cluster is a bit too utilized at the moment with
> 87.26 % raw usage but still this should not happen afaik.
> > We never had this problem with previous Ceph versions, and right now
> I am kind of out of ideas on how to tackle these crashes.
> >
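
A sketch of the standalone-DB redeploy Igor suggests above (names and sizes are
placeholders, and this recreates the OSD, so drain/destroy it first):

# two LVs on the same NVMe: one for the DB, the rest for data
lvcreate -L 300G -n osd5-db ceph-nvme5
lvcreate -l 100%FREE -n osd5-block ceph-nvme5
ceph-volume lvm prepare --bluestore --data ceph-nvme5/osd5-block --block.db ceph-nvme5/osd5-db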

[ceph-users] Re: BlueStore fragmentation woes

2023-05-30 Thread Fox, Kevin M
)  probe -10: 0,  0, 0
May 28 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-28T18:35:22.790+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -18: 0,  0, 0
May 29 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-29T18:35:22.815+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  allocation stats probe 3: cnt: 17509 
frags: 17509 size: 31015436288
May 29 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-29T18:35:22.815+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -1: 17987,  17987, 32446676992
May 29 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-29T18:35:22.815+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -3: 21986,  21986, 47407562752
May 29 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-29T18:35:22.815+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -7: 0,  0, 0
May 29 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-29T18:35:22.815+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -11: 0,  0, 0
May 29 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-29T18:35:22.815+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -19: 0,  0, 0
May 30 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-30T18:35:22.826+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  allocation stats probe 4: cnt: 21016 
frags: 21016 size: 45432438784
May 30 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-30T18:35:22.826+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -1: 17509,  17509, 31015436288
May 30 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-30T18:35:22.826+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -2: 17987,  17987, 32446676992
May 30 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-30T18:35:22.826+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -4: 21986,  21986, 47407562752
May 30 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-30T18:35:22.826+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -12: 0,  0, 0
May 30 11:35:22 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2490690]: 
debug 2023-05-30T18:35:22.826+ 7fe190013700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -20: 0,  0, 0

Thanks,
Kevin


From: Fox, Kevin M 
Sent: Thursday, May 25, 2023 9:36 AM
To: Igor Fedotov; Hector Martin; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: BlueStore fragmentation woes

Ok, I'm gathering the "allocation stats probe" stuff. Not sure I follow what 
you mean by the historic probes. just:
| egrep "allocation stats probe|probe"   ?

That gets something like:
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  allocation stats probe 110: cnt: 27637 
frags: 27637 size: 63777406976
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -1: 24503,  24503, 58141900800
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -2: 24594,  24594, 56951898112
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -6: 19737,  19737, 37299027968
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -14: 20373,  20373, 35302801408
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -30: 19072,  19072, 33645854720

if that is the right query, then I'll gather the metrics, restart and gather 
some more after and let you know.

Thanks,
Kevin


From: Igor Fedotov 
Sent: Thursday, May 25, 2023 9:29 AM
To: Fox, Kevin M; Hector Martin; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: BlueStore fragmentation woes

Just run through available logs for a specific OSD (which you suspect
suffer from high fragmentation) and collect all allocation stats probes
you can find ("allocation stats probe" string is a perfect grep pattern,

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Fox, Kevin M
Ok, I'm gathering the "allocation stats probe" stuff. Not sure I follow what 
you mean by the historic probes. just:
| egrep "allocation stats probe|probe"   ?

That gets something like:
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  allocation stats probe 110: cnt: 27637 
frags: 27637 size: 63777406976
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -1: 24503,  24503, 58141900800
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -2: 24594,  24594, 56951898112
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -6: 19737,  19737, 37299027968
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -14: 20373,  20373, 35302801408
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: 
debug 2023-05-24T18:24:34.105+ 7f53603fc700  0 
bluestore(/var/lib/ceph/osd/ceph-183)  probe -30: 19072,  19072, 33645854720

if that is the right query, then I'll gather the metrics, restart and gather 
some more after and let you know.

Thanks,
Kevin


From: Igor Fedotov 
Sent: Thursday, May 25, 2023 9:29 AM
To: Fox, Kevin M; Hector Martin; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: BlueStore fragmentation woes

Just run through available logs for a specific OSD (which you suspect
suffer from high fragmentation) and collect all allocation stats probes
you can find ("allocation stats probe" string is a perfect grep pattern,
please append lines with historic probes following day-0 line as well.
Given this is printed once per day there wouldn't be too many).

Then do OSD restart and wait a couple more days. Would allocation stats
show much better disparity between cnt and frags columns?

Is a similar pattern (eventual degradation in stats prior to restart
and severe improvement afterwards) observed for other OSDs?
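
A sketch of pulling those probes out of the journal for a cephadm-managed OSD
(unit name and time range are examples; adjust the fsid and OSD ID):

journalctl -u ceph-4e4184f5-7733-453b-b72c-2b43422fd027@osd.183 --since '-30 days' \
    | egrep 'allocation stats probe|probe -'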


On 25/05/2023 19:20, Fox, Kevin M wrote:
> If you can give me instructions on what you want me to gather before the 
> restart and after restart I can do it. I have some running away right now.
>
> Thanks,
> Kevin
>
> 
> From: Igor Fedotov 
> Sent: Thursday, May 25, 2023 9:17 AM
> To: Fox, Kevin M; Hector Martin; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: BlueStore fragmentation woes
>
> Perhaps...
>
> I don't like the idea to use fragmentation score as a real index. IMO
> it's mostly like a very imprecise first turn marker to alert that
> something might be wrong. But not a real quantitative high-quality estimate.
>
> So in fact I'd like to see a series of allocation probes showing
> eventual degradation without OSD restart and immediate severe
> improvement after the restart.
>
> Can you try to collect something like that? Would the same behavior
> persist with an alternative allocator?
>
>
> Thanks,
>
> Igor
>
>
> On 25/05/2023 18:41, Fox, Kevin M wrote:
>> Is this related to https://tracker.ceph.com/issues/58022 ?
>>
>> We still see run away osds at times, somewhat randomly, that causes runaway 
>> fragmentation issues.
>>
>> Thanks,
>> Kevin
>>
>> 
>> From: Igor Fedotov 
>> Sent: Thursday, May 25, 2023 8:29 AM
>> To: Hector Martin; ceph-users@ceph.io
>> Subject: [ceph-users] Re: BlueStore fragmentation woes
>>
>>
>> Hi Hector,
>>
>> I can advise two tools for further fragmentation analysis:
>>
>> 1) One might want to use ceph-bluestore-tool's free-dump command to get
>> a list of free chunks for an OSD and try to analyze whether it's really
>> highly fragmented and lacks long enough extents. free-dump just returns
>> a list of extents in json format, I can take a look to the output if
>> shared...
>>
>> 2) You might want to look for allocation probs in OSD logs and see how
>> fragmentation in allocated chunks has evolved.
>>
>> E.g.
>>
>> allocation stats probe 33: cnt: 8148921 frags: 10958186 size: 1704348508>
>> probe -1: 35168547,  46401246, 1199516209152
>> probe -3: 27275

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Fox, Kevin M
If you can give me instructions on what you want me to gather before the 
restart and after restart I can do it. I have some running away right now.

Thanks,
Kevin


From: Igor Fedotov 
Sent: Thursday, May 25, 2023 9:17 AM
To: Fox, Kevin M; Hector Martin; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: BlueStore fragmentation woes

Perhaps...

I don't like the idea to use fragmentation score as a real index. IMO
it's mostly like a very imprecise first turn marker to alert that
something might be wrong. But not a real quantitative high-quality estimate.

So in fact I'd like to see a series of allocation probes showing
eventual degradation without OSD restart and immediate severe
improvement after the restart.

Can you try to collect something like that? Would the same behavior
persist with an alternative allocator?


Thanks,

Igor


On 25/05/2023 18:41, Fox, Kevin M wrote:
> Is this related to https://tracker.ceph.com/issues/58022 ?
>
> We still see run away osds at times, somewhat randomly, that causes runaway 
> fragmentation issues.
>
> Thanks,
> Kevin
>
> 
> From: Igor Fedotov 
> Sent: Thursday, May 25, 2023 8:29 AM
> To: Hector Martin; ceph-users@ceph.io
> Subject: [ceph-users] Re: BlueStore fragmentation woes
>
>
> Hi Hector,
>
> I can advise two tools for further fragmentation analysis:
>
> 1) One might want to use ceph-bluestore-tool's free-dump command to get
> a list of free chunks for an OSD and try to analyze whether it's really
> highly fragmented and lacks long enough extents. free-dump just returns
> a list of extents in json format, I can take a look to the output if
> shared...
>
> 2) You might want to look for allocation probs in OSD logs and see how
> fragmentation in allocated chunks has evolved.
>
> E.g.
>
> allocation stats probe 33: cnt: 8148921 frags: 10958186 size: 1704348508>
> probe -1: 35168547,  46401246, 1199516209152
> probe -3: 27275094,  35681802, 200121712640
> probe -5: 34847167,  52539758, 271272230912
> probe -9: 44291522,  60025613, 523997483008
> probe -17: 10646313,  10646313, 155178434560
>
> The first probe refers to the last day while others match days (or
> rather probes) -1, -3, -5, -9, -17
>
> 'cnt' column represents the amount of allocations performed in the
> previous 24 hours and 'frags' one shows amount of fragments in the
> resulted allocations. So significant mismatch between frags and cnt
> might indicate some issues with high fragmentation indeed.
>
> Apart from retrospective analysis you might also want how OSD behavior
> changes after reboot - e.g. wouldn't rebooted OSD produce less
> fragmentation... Which in turn might indicate some issues with BlueStore
> allocator..
>
> Just FYI: allocation probe printing interval is controlled by
> bluestore_alloc_stats_dump_interval parameter.
>
>
> Thanks,
>
> Igor
>
>
>
> On 24/05/2023 17:18, Hector Martin wrote:
>> On 24/05/2023 22.07, Mark Nelson wrote:
>>> Yep, bluestore fragmentation is an issue.  It's sort of a natural result
>>> of using copy-on-write and never implementing any kind of
>>> defragmentation scheme.  Adam and I have been talking about doing it
>>> now, probably piggybacking on scrub or other operations that already
>>> area reading all of the extents for an object anyway.
>>>
>>>
>>> I wrote a very simply prototype for clone to speed up the rbd mirror use
>>> case here:
>>>
>>> https://github.com/markhpc/ceph/commit/29fc1bfd4c90dd618eb9e0d4ae6474d8cfa5dfdf
>>>
>>>
>>> Adam ended up going the extra mile and completely changed how shared
>>> blobs works which probably eliminates the need to do defrag on clone
>>> anymore from an rbd-mirror perspective, but I think we still need to
>>> identify any times we are doing full object reads of fragmented objects
>>> and consider defragmenting at that time.  It might be clone, or scrub,
>>> or other things, but the point is that if we are already doing most of
>>> the work (seeks on HDD especially!) the extra cost of a large write to
>>> clean it up isn't that bad, especially if we are doing it over the
>>> course of months or years and can help keep freespace less fragmented.
>> Note that my particular issue seemed to specifically be free space
>> fragmentation. I don't use RBD mirror and I would not *expect* most of
>> my cephfs use cases to lead to any weird cow/fragmentation issues with
>> objects other than t

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Fox, Kevin M
Is this related to https://tracker.ceph.com/issues/58022 ?

We still see run away osds at times, somewhat randomly, that causes runaway 
fragmentation issues.

Thanks,
Kevin


From: Igor Fedotov 
Sent: Thursday, May 25, 2023 8:29 AM
To: Hector Martin; ceph-users@ceph.io
Subject: [ceph-users] Re: BlueStore fragmentation woes


Hi Hector,

I can advise two tools for further fragmentation analysis:

1) One might want to use ceph-bluestore-tool's free-dump command to get
a list of free chunks for an OSD and try to analyze whether it's really
highly fragmented and lacks long enough extents. free-dump just returns
a list of extents in json format, I can take a look to the output if
shared...

2) You might want to look for allocation probs in OSD logs and see how
fragmentation in allocated chunks has evolved.

E.g.

allocation stats probe 33: cnt: 8148921 frags: 10958186 size: 1704348508>
probe -1: 35168547,  46401246, 1199516209152
probe -3: 27275094,  35681802, 200121712640
probe -5: 34847167,  52539758, 271272230912
probe -9: 44291522,  60025613, 523997483008
probe -17: 10646313,  10646313, 155178434560

The first probe refers to the last day while others match days (or
rather probes) -1, -3, -5, -9, -17

'cnt' column represents the amount of allocations performed in the
previous 24 hours and 'frags' one shows amount of fragments in the
resulted allocations. So significant mismatch between frags and cnt
might indicate some issues with high fragmentation indeed.

Apart from retrospective analysis you might also want to check how OSD behavior
changes after reboot - e.g. whether a rebooted OSD produces less
fragmentation... which in turn might indicate some issues with the BlueStore
allocator.

Just FYI: allocation probe printing interval is controlled by
bluestore_alloc_stats_dump_interval parameter.
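
For reference, a sketch of using the two tools mentioned above (OSD path/ID are
placeholders; free-dump needs the OSD to be stopped):

# 1) dump free extents to JSON for offline analysis
ceph-bluestore-tool free-dump --path /var/lib/ceph/osd/ceph-13 > osd13-free.json
# 2) have the allocation stats probe print more often than once per day (value in seconds)
ceph config set osd bluestore_alloc_stats_dump_interval 3600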


Thanks,

Igor



On 24/05/2023 17:18, Hector Martin wrote:
> On 24/05/2023 22.07, Mark Nelson wrote:
>> Yep, bluestore fragmentation is an issue.  It's sort of a natural result
>> of using copy-on-write and never implementing any kind of
>> defragmentation scheme.  Adam and I have been talking about doing it
>> now, probably piggybacking on scrub or other operations that already
>> area reading all of the extents for an object anyway.
>>
>>
>> I wrote a very simply prototype for clone to speed up the rbd mirror use
>> case here:
>>
>> https://github.com/markhpc/ceph/commit/29fc1bfd4c90dd618eb9e0d4ae6474d8cfa5dfdf
>>
>>
>> Adam ended up going the extra mile and completely changed how shared
>> blobs works which probably eliminates the need to do defrag on clone
>> anymore from an rbd-mirror perspective, but I think we still need to
>> identify any times we are doing full object reads of fragmented objects
>> and consider defragmenting at that time.  It might be clone, or scrub,
>> or other things, but the point is that if we are already doing most of
>> the work (seeks on HDD especially!) the extra cost of a large write to
>> clean it up isn't that bad, especially if we are doing it over the
>> course of months or years and can help keep freespace less fragmented.
> Note that my particular issue seemed to specifically be free space
> fragmentation. I don't use RBD mirror and I would not *expect* most of
> my cephfs use cases to lead to any weird cow/fragmentation issues with
> objects other than those forced by the free space becoming fragmented
> (unless there is some weird pathological use case I'm hitting). Most of
> my write workloads are just copying files in bulk and incrementally
> writing out files.
>
> Would simply defragging objects during scrub/etc help with free space
> fragmentation itself? Those seem like two somewhat unrelated issues...
> note that if free space is already fragmented, you wouldn't even have a
> place to put down a defragmented object.
>
> Are there any stats I can look at to figure out how bad object and free
> space fragmentation is? It would be nice to have some clearer data
> beyond my hunch/deduction after seeing the I/O patterns and the sole
> fragmentation number :). Also would be interesting to get some kind of
> trace of the bluestore ops the OSD is doing, so I can find out whether
> it's doing something pathological that causes more fragmentation for
> some reason.
>
>> Mark
>>
>>
>> On 5/24/23 07:17, Hector Martin wrote:
>>> Hi,
>>>
>>> I've been seeing relatively large fragmentation numbers on all my OSDs:
>>>
>>> ceph daemon osd.13 bluestore allocator score block
>>> {
>>>   "fragmentation_rating": 0.77251526920454427
>>> }
>>>
>>> These aren't that old, as I recreated them all around July last year.
>>> They mostly hold CephFS data with erasure coding, with a mix of large
>>> and small files. The OSDs are at around 80%-85% utilization right now.
>>> Most of the data was written sequentially when the OSDs were created (I
>>> rsynced everything from a remote backup). Since

[ceph-users] Re: Ceph Failure and OSD Node Stuck Incident

2023-03-30 Thread Fox, Kevin M
I've seen this twice in production, on two separate occasions, as well: one OSD 
gets stuck and a bunch of PGs go into a laggy state.

ceph pg dump | grep laggy

shows that all the laggy PGs share the same OSD.

Restarting the affected OSD restored full service.
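
A sketch of the checks/restart involved (OSD ID is a placeholder; the restart
command depends on how the daemon is deployed):

ceph pg dump pgs | grep laggy           # note the OSD common to the acting sets
ceph orch daemon restart osd.17         # cephadm-managed
# or: systemctl restart ceph-osd@17     # package-based deployments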


From: Ramin Najjarbashi 
Sent: Thursday, March 30, 2023 7:47 AM
To: peter...@raksmart.com
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph Failure and OSD Node Stuck Incident


On Thu, Mar 30, 2023 at 6:08 PM  wrote:

> We encountered a Ceph failure where the system became unresponsive with no
> IOPS or throughput after encountering a failed node. Upon investigation, it
> appears that the OSD process on one of the Ceph storage nodes is stuck, but
> ping is still responsive. However, during the failure, Ceph was unable to
> recognize the problematic node, which resulted in all other OSDs in the
> cluster experiencing slow operations and no IOPS in the cluster at all.
>
> Here's the timeline of the incident:
>
> - At 10:40, an alert is triggered, indicating a problem with the OSD.
> - After the alert, Ceph becomes unresponsive with no IOPS or throughput.
> - At 11:26, an engineer discovers that there is a gradual OSD failure,
> with 6 out of 12 OSDs on the node being down.
> - At 11:46, the Ceph engineer is unable to SSH into the faulty node and
> attempts a soft restart, but the "smartmontools" process is stuck while
> shutting down the server. Ping works during this time.
> - After waiting for about one or two minutes, a hard restart is attempted
> for the server.
> - At 11:57, after the Ceph node starts normally, service resumes as usual,
> indicating that the issue has been resolved.
>
> Here is some basic information about our services:
>
> - `Mon: 5 daemons, quorum host001, host002, host003, host004, host005 (age
> 4w)`
> - `Mgr: host005 (active, since 4w), standbys: host001, host002, host003,
> host004`
> - `Osd: 218 osds: 218 up (since 22h), 218 in (since 22h)`
>
> We have a cluster with 19 nodes, including 15 SSD nodes and 4 HDD nodes.
> In total, there are 218 OSDs. The SSD nodes have 11 OSDs with Samsung EVO
> 870 SSD and each drive DB/WAL by 1.6T NVME drive. We are using Ceph version
> 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable).
>
> Here is the health check detail:
> [root@node21 ~]#  ceph health detail
> HEALTH_WARN 1 osds down; Reduced data availability: 12 pgs inactive, 12
> pgs peering; Degraded data redundancy: 272273/43967625 objects degraded
> (0.619%), 88 pgs degraded, 5 pgs undersized; 18192 slow ops, oldest one
> blocked for 3730 sec, daemons
> [osd.0,osd.1,osd.101,osd.103,osd.107,osd.108,osd.109,osd.11,osd.111,osd.112]...
> have slow ops.
> [WRN] OSD_DOWN: 1 osds down
> osd.174 (root=default,host=hkhost031) is down
> [WRN] PG_AVAILABILITY: Reduced data availability: 12 pgs inactive, 12 pgs
> peering
> pg 2.dc is stuck peering for 49m, current state peering, last
> acting [87,95,172]
> pg 2.e2 is stuck peering for 15m, current state peering, last
> acting [51,177,97]
>
> ..
>   pg 2.f7e is active+undersized+degraded, acting [10,214]
> pg 2.f84 is active+undersized+degraded, acting [91,52]
> [WRN] SLOW_OPS: 18192 slow ops, oldest one blocked for 3730 sec, daemons
> [osd.0,osd.1,osd.101,osd.103,osd.107,osd.108,osd.109,osd.11,osd.111,osd.112]...
> have slow ops.
>
> I have the following questions:
>
> 1. Why couldn't Ceph detect the faulty node and automatically abandon its
> resources? Can anyone provide more troubleshooting guidance for this case?
>

Ceph is designed to detect and respond to node failures in the cluster. One
possible explanation is that the OSD process on the node was stuck and not
responding to the Ceph monitor, preventing the monitor from recognizing the
node as down. To troubleshoot this issue, you can start by checking the
Ceph logs on the failed node to see if there are any error messages related
to the OSD process or any other relevant issues.


> 2. What is Ceph's detection mechanism and where can I find related
> information? All of our production cloud machines were affected and
> suspended. If RBD is unstable, we cannot continue to use Ceph technology
> for our RBD source.
>

Ceph uses a monitoring system called Ceph Monitor to detect node failures
and ensure data consistency across the cluster. The Ceph Monitor
periodically sends health checks to the OSD processes and other Ceph
daemons in the cluster to ensure that they are running correctly. If a node
fails to respond to the health check, the Ceph Monitor marks the node as
down and redistributes its resources to other nodes in the cluster.
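
The knobs behind that detection can be inspected like this (a sketch; the
defaults noted in the comments are from memory, so verify them on your version):

ceph config get osd osd_heartbeat_grace          # how long peers wait before reporting an OSD down (~20 s)
ceph config get mon mon_osd_min_down_reporters   # how many reporters are needed before the mon marks it down
ceph config get mon mon_osd_down_out_interval    # how long before a down OSD is also marked out (~600 s)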



> 3. Did we miss any patches or bug fixes?
>


> 4. Is there anyone who can suggest improvements and how we can quickly
> detect and avoid similar issues in the future?



> ___
> ceph-users mailing li

[ceph-users] Re: s3 compatible interface

2023-03-21 Thread Fox, Kevin M
Will either the file store or the posix/gpfs filter support the underlying 
files changing underneath so you can access the files either through s3 or by 
other out of band means (smb, nfs, etc)?

Thanks,
Kevin


From: Matt Benjamin 
Sent: Monday, March 20, 2023 5:27 PM
To: Chris MacNaughton
Cc: ceph-users@ceph.io; Kyle Bader
Subject: [ceph-users] Re: s3 compatible interface


Hi Chris,

This looks useful.  Note for this thread:  this *looks like* it's using the
zipper dbstore backend?  Yes, that's coming in Reef.  We think of dbstore
as mostly the zipper reference driver, but it can be useful as a standalone
setup, potentially.

But there's now a prototype of a posix file filter that can be stacked on
dbstore (or rados, I guess)--not yet merged, and iiuc post-Reef.  That's
the project Daniel was describing.  The posix/gpfs filter is aiming for
being thin and fast and horizontally scalable.

The s3gw project that Clyso and folks were writing about is distinct from
both of these.  I *think* it's truthful to say that s3gw is its own
thing--a hybrid backing store with objects in files, but also metadata
atomicity from an embedded db--plus interesting orchestration.

Matt

On Mon, Mar 20, 2023 at 3:45 PM Chris MacNaughton <
chris.macnaugh...@canonical.com> wrote:

> On 3/20/23 12:02, Frank Schilder wrote:
>
> Hi Marc,
>
> I'm also interested in an S3 service that uses a file system as a back-end. I 
> looked at the documentation of 
> https://github.com/aquarist-labs/s3gw
>  and have to say that it doesn't make much sense to me. I don't see this kind 
> of gateway anywhere there. What I see is a build of a rados gateway that can 
> be pointed at a ceph cluster. That's not a gateway to an FS.
>
> Did I misunderstand your actual request or can you point me to the part of 
> the documentation where it says how to spin up an S3 interface using a file 
> system for user data?
>
> The only thing I found is 
> https://s3gw-docs.readthedocs.io/en/latest/helm-charts/#local-storage,
>  but it sounds to me that this is not where the user data will be going.
>
> Thanks for any hints and best regards,
>
>
> for testing you can try: 
> https://github.com/aquarist-labs/s3gw
>
> Yes indeed, that looks like it can be used with a simple fs backend.
>
> Hey,
>
> (Re-sending this email from a mailing-list subscribed email)
>
> I was playing around with RadosGW's file backend (coming in Reef, zipper)
> a few months back and ended up making this docker container that just works
> to setup things:
> https://github.com/ChrisMacNaughton/ceph-rgw-docker;
>  published (still,
> maybe for a while?) at 
> https://hub.docker.com/r/iceyec/ceph-rgw-zipper
>
> Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/te

[ceph-users] Re: s3 compatible interface

2023-03-06 Thread Fox, Kevin M
+1. If I know radosgw on top of cephfs is a thing, I may change some plans. Is 
that the planned route?

Thanks,
Kevin


From: Daniel Gryniewicz 
Sent: Monday, March 6, 2023 6:21 AM
To: Kai Stian Olstad
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: s3 compatible interface


On 3/3/23 13:53, Kai Stian Olstad wrote:
> On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
>> We're actually writing this for RGW right now.  It'll be a bit before
>> it's productized, but it's in the works.
>
> Just curious, what is the use cases for this feature?
> S3 against CephFS?
>

Local FS for development use, and distributed FS (initial target is
GPFS) for production.   There's no current plans to make it work against
CephFS, although I would imagine it will work fine.  But if you have a
Ceph cluster, you're much better off using standard RGW on RADOS.

Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-02-28 Thread Fox, Kevin M
MinIO no longer lets you read/write from the POSIX side, only through MinIO 
itself. :(

Haven't found a replacement yet. If you do, please let me know.

Thanks,
Kevin


From: Robert Sander 
Sent: Tuesday, February 28, 2023 9:37 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: s3 compatible interface


On 28.02.23 16:31, Marc wrote:
>
> Anyone know of a s3 compatible interface that I can just run, and 
> reads/writes files from a local file system and not from object storage?

Have a look at Minio:

https://min.io/product/overview#architecture

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de/

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Upgrade path

2023-02-01 Thread Fox, Kevin M
We successfully did ceph-deploy+octopus+centos7 -> (ceph-deploy 
unsupported)+octopus+centos8stream (using leap) -> (ceph-deploy 
unsupported)+pacific+centos8stream  -> cephadm+pacific+centos8stream

Everything was done in place. Leap was tested repeatedly until the procedure/side effects 
were very well known.

We also did s/centos8stream/rocky8/ successfully.

Thanks,
Kevin


From: Iztok Gregori 
Sent: Wednesday, February 1, 2023 3:51 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph Upgrade path


Hi to all!

We are running a Ceph cluster (Octopus) on (99%) CentOS 7 (deployed at
the time with ceph-deploy) and we would like to upgrade it. As far as I
know for Pacific (and later releases) there aren't packages for CentOS 7
distribution (at least not on download.ceph.com), so we need to upgrade
(change) not only Ceph but also the distribution.

What is the raccomended path to do so?

We could upgrade (reinstall) all the nodes to Rocky 8 and then upgrade
Ceph to Quincy, but we will "stuck" with "not the latest" distribution
and probably we will have to upgrade (reinstall) again in the near future.

Our second idea is to leverage cephadm (which we would like to
implement) and switch from rpms to containers, but I don't have a clear
vision of how to do it. I was thinking to:

1. install a new monitor/manager with Rocky 9.
2. prepare the node for cephadm.
3. start the manager/monitor containers on that node.
4. repeat for the other monitors.
5. repeat for the OSD servers.

I'm not sure how to execute the point 2 and 3. The documentation says
how to bootstrap a NEW cluster and how to ADOPT an existing one, but our
situation is a hybrid (or in my mind it is...).

I cannot also adopt my current cluster to cephadm because we have 30% of
our OSD still on filestore. My intention was to drain them, reinstall
them and then adopt them. But I would like to avoid (if not necessary)
multiple reinstallations. In my mind all the OSD servers will be drained
before been reinstalled, just to be sure to have a "fresh" start).

Have you any ideas and/or advice to give us?


Thanks a lot!
Iztok

P.S. I saw that the script cephadm doesn't support Rocky, I can modify
it to do so and it should work, but is there a plan to officially
support it?



--
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
Telephone: +39 040 3758948
http://www.elettra.eu/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: BlueFS spillover warning gone after upgrade to Quincy

2023-01-12 Thread Fox, Kevin M
If you have prometheus enabled, the metrics should be in there I think?
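
For example (a sketch; the metric and perf-counter names are from memory, so
double-check them on your version):

# per OSD via the admin socket: a non-zero slow_used_bytes means BlueFS spills onto the main device
ceph daemon osd.12 perf dump bluefs
# or in Prometheus, something like:
#   ceph_bluefs_slow_used_bytes > 0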

Thanks,
Kevin


From: Peter van Heusden 
Sent: Thursday, January 12, 2023 6:12 AM
To: ceph-users@ceph.io
Subject: [ceph-users] BlueFS spillover warning gone after upgrade to Quincy


Hello everyone

I have a Ceph installation where some of the OSDs were misconfigured to use
1GB SSD partitions for rocksdb. This caused a spillover ("BlueFS *spillover*
detected"). I recently upgraded to quincy using cephadm (17.2.5) the
spillover warning vanished. This is
despite bluestore_warn_on_bluefs_spillover still being set to true.

Is there a way to investigate the current state of the DB to see if
spillover is, indeed, still happening?

Thank you,
Peter
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 2 pgs backfill_toofull but plenty of space

2023-01-10 Thread Fox, Kevin M
What else is going on? (ceph -s). If there is a lot of data being shuffled 
around, it may just be because it's waiting for some other actions to complete 
first.
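
A sketch of the kind of checks meant here:

ceph -s                        # overall backfill/recovery activity
ceph osd dump | grep ratio     # full_ratio / backfillfull_ratio / nearfull_ratio
ceph osd df                    # per-OSD utilization, to compare against those ratios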

Thanks,
Kevin


From: Torkil Svensgaard 
Sent: Tuesday, January 10, 2023 2:36 AM
To: ceph-users@ceph.io
Cc: Ruben Vestergaard
Subject: [ceph-users] 2 pgs backfill_toofull but plenty of space


Hi

Ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
(stable)

Looking at this:

"
Low space hindering backfill (add storage if this doesn't resolve
itself): 2 pgs backfill_toofull
"

"
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if
this doesn't resolve itself): 2 pgs backfill_toofull
 pg 3.11f is active+remapped+backfill_wait+backfill_toofull, acting
[98,51,39,100]
 pg 3.74c is active+remapped+backfill_wait+backfill_toofull, acting
[96,120,58,48]
"

But the disks are noway near being full as far as I can determine, so
why backfill_toofull? The PGs in question are in the rbd_data pool.

"
# ceph df
--- RAW STORAGE ---
CLASS SIZEAVAIL USED  RAW USED  %RAW USED
hdd1.4 PiB  730 TiB  686 TiB   686 TiB  48.46
ssd1.3 TiB  1.2 TiB  162 GiB   162 GiB  12.11
TOTAL  1.4 PiB  731 TiB  686 TiB   686 TiB  48.42

--- POOLS ---
POOL ID   PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
.mgr  1 1  1.1 GiB  273  545 MiB   0.05549 GiB
rbd_data  3  4096  294 TiB   78.56M  450 TiB  45.72267 TiB
rbd   432  4.1 MiB   26  3.5 MiB  0549 GiB
rbd_internal  532   54 KiB   16  172 KiB  0549 GiB
cephfs_data   6  2048  127 TiB  148.64M  229 TiB  29.99267 TiB
cephfs_metadata   7   128   71 GiB2.84M  142 GiB  11.46549 GiB
libvirt   832   37 MiB  221   74 MiB  0549 GiB
nfs-ganesha   932  2.7 KiB7   52 KiB  0366 GiB
.nfs 1032   53 KiB   47  306 KiB  0366 GiB
"

The top utilized disk is at 57% and the PGs in that pool are ~50GB.

"
TOP BOTTOM
USE WEIGHT  PGS ID  |USEWEIGHT  PGS ID
+
57.71%  1.0 54  osd.68  |46.60% 1.0 286 osd.17
57.08%  1.0 53  osd.80  |46.55% 1.0 286 osd.99
54.95%  1.0 70  osd.86  |46.48% 1.0 284 osd.106
54.86%  1.0 52  osd.63  |45.88% 1.0 187 osd.27
54.06%  1.0 68  osd.88  |45.81% 1.0 279 osd.5
53.89%  1.0 51  osd.79  |44.95% 1.0 272 osd.13
53.65%  1.0 51  osd.67  |43.63% 1.0 269 osd.16
53.59%  1.0 52  osd.65  |43.30% 1.0 261 osd.12
53.58%  1.0 51  osd.82  |32.17% 1.0 172 osd.4
53.52%  1.0 50  osd.72  |0% 0   0   osd.49
+
"

Mvh.

Torkil

--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Remove radosgw entirely

2022-12-13 Thread Fox, Kevin M
Is there any problem removing the radosgw and all backing pools from a cephadm 
managed cluster? Ceph won't become unhappy about it? We have one cluster with a 
really old, historical radosgw we think would be better to remove and someday 
later, recreate fresh.

Thanks,
Kevin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph upgrade advice - Luminous to Pacific with OS upgrade

2022-12-06 Thread Fox, Kevin M
We went on a couple clusters from ceph-deploy+centos7+nautilus to 
cephadm+rocky8+pacific using ELevate as one of the steps. Went through octopus 
as well. ELevate wasn't perfect for us either, but was able to get the job 
done. Had to test it carefully on the test clusters multiple times to get the 
procedure just right. Had some bumps even then, but was able to get things 
finished up.

Thanks,
Kevin


From: Wolfpaw - Dale Corse 
Sent: Tuesday, December 6, 2022 8:18 AM
To: 'David C'
Cc: 'ceph-users'
Subject: [ceph-users] Re: Ceph upgrade advice - Luminous to Pacific with OS 
upgrade


Hi David,

  > Good to hear you had success with the ELevate tool, I'd looked at that but 
seemed a bit risky. The tool supports Rocky so I may give it a look.

Elevate wasn't perfect - we had to manually upgrade some packages from outside 
repos (ceph, opennebula and salt if memory serves). That said, it was certainly 
manageable.

> This one is surprising since in theory Pacific still supports Filestore, 
> there is at least one thread on the list where someone upgraded to Pacific 
> and is still running some Filestore OSDs -
> on the other hand, there's also a recent thread where someone ran into 
> problems and was  forced to upgrade to Bluestore - did you experience issues 
> yourself or was this advice you
> picked up? I do ultimately want to get all my OSDs on Bluestore but was 
> hoping to do that after the Ceph version upgrade.

Sorry - I am mistaken about RocksDB/LevelDB and Filestore upgrades being required 
for Pacific. Apologies!
I do remember doing all of ours when we upgraded from Luminous -> Nautilus, but 
I can't remember why to be honest. Might have been advice at the time, or 
something I read when looking into the upgrade :)

Cheers,
D.

-Original Message-
From: David C [mailto:dcsysengin...@gmail.com]
Sent: Tuesday, December 6, 2022 8:56 AM
To: Wolfpaw - Dale Corse 
Cc: ceph-users 
Subject: [SPAM] [ceph-users] Re: [SPAM] Ceph upgrade advice - Luminous to 
Pacific with OS upgrade

Hi Wolfpaw, thanks for the response

- I'd upgrade to Nautilus on CentOS 7 before moving to EL8. We then used
> AlmaLinux ELevate to move from 7 to 8 without a reinstall. Rocky has a
> similar path I think.
>

Good to hear you had success with the ELevate tool, I'd looked at that but 
seemed a bit risky. The tool supports Rocky so I may give it a look.

>
> - you will need to move those filestore OSDs to Bluestore before
> hitting Pacific, might even be part of the Nautilus upgrade. This
> takes some time if I remember correctly.
>

This one is surprising since in theory Pacific still supports Filestore, there 
is at least one thread on the list where someone upgraded to Pacific and is 
still running some Filestore OSDs - on the other hand, there's also a recent 
thread where someone ran into problems and was forced to upgrade to Bluestore - 
did you experience issues yourself or was this advice you picked up? I do 
ultimately want to get all my OSDs on Bluestore but was hoping to do that after 
the Ceph version upgrade.


> - You may need to upgrade monitors to RocksDB too.


Thanks, I wasn't aware of this  - I suppose I'll do that when I'm on Nautilus


On Tue, Dec 6, 2022 at 3:22 PM Wolfpaw - Dale Corse 
wrote:

> We did this (over a longer timespan).. it worked ok.
>
> A couple things I’d add:
>
> - I'd upgrade to Nautilus on CentOS 7 before moving to EL8. We then
> used AlmaLinux ELevate to move from 7 to 8 without a reinstall. Rocky
> has a similar path I think.
>
> - you will need to move those filestore OSDs to Bluestore before
> hitting Pacific, might even be part of the Nautilus upgrade. This
> takes some time if I remember correctly.
>
> - You may need to upgrade monitors to RocksDB too.
>
> Sent from my iPhone
>
> > On Dec 6, 2022, at 7:59 AM, David C  wrote:
> >
> > Hi All
> >
> > I'm planning to upgrade a Luminous 12.2.10 cluster to Pacific
> > 16.2.10, cluster is primarily used for CephFS, mix of Filestore and
> > Bluestore OSDs, mons/osds collocated, running on CentOS 7 nodes
> >
> > My proposed upgrade path is: Upgrade to Nautilus 14.2.22 -> Upgrade
> > to
> > EL8 on the nodes (probably Rocky) -> Upgrade to Pacific
> >
> > I assume the cleanest way to update the node OS would be to drain
> > the node and remove from the cluster, install Rocky 8, add back to
> > cluster as effectively a new node
> >
> > I have a relatively short maintenance window and was hoping to speed
> > up OS upgrade with the following approach on each node:
> >
> > - back up ceph config/systemd files etc.
> > - set noout etc.
> > - deploy Rocky 8, being careful not to touch OSD block devices
> > - install Nautilus binaries (ensuring I use same version as pre OS
> upgrade)
> > - copy ceph config back over
> >
> > In theory I could then start up the daemons and they wouldn't care
> > that we're now running on a different OS
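
A minimal sketch of that per-node sequence (assuming LVM-based OSDs; releases,
paths and unit names below are placeholders, not taken from this thread):

```
# before taking the node down
ceph osd set noout
tar czf /root/ceph-node-backup.tgz /etc/ceph /var/lib/ceph/bootstrap-*   # plus any systemd overrides

# ... reinstall the OS, leaving the OSD block devices untouched ...

# install the same Nautilus point release, restore /etc/ceph, then bring the OSDs back
ceph-volume lvm activate --all      # Filestore/ceph-disk OSDs: ceph-volume simple scan && ceph-volume simple activate --all
ceph osd unset noout
```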

[ceph-users] Re: Tuning CephFS on NVME for HPC / IO500

2022-12-01 Thread Fox, Kevin M
If it's this one:
http://www.acmemicro.com/Product/17848/Kioxia-KCD6XLUL15T3---15-36TB-SSD-NVMe-2-5-inch-15mm-CD6-R-Series-SIE-PCIe-4-0-5500-MB-sec-Read-BiCS-FLASH-TLC-1-DWPD

it's listed as 1 DWPD with a 5-year warranty, so it should be OK.

Thanks,
Kevin


From: Robert Sander 
Sent: Wednesday, November 30, 2022 11:58 PM
To: ceph-users
Subject: [ceph-users] Re: Tuning CephFS on NVME for HPC / IO500


Hi,

On 2022-12-01 8:26, Manuel Holtgrewe wrote:

> The Ceph cluster nodes have 10x enterprise NVMEs each (all branded as
> "Dell
> enterprise disks"), 8 older nodes (last year) have "Dell Ent NVMe v2
> AGN RI
> U.2 15.36TB" which are Samsung disks, 2 newer nodes (just delivered)
> have
> "Dell Ent NVMe CM6 RI 15.36TB" which are Kioxia disks.

Does the "RI" stand for read-intensive?

I think you need mixed-use flash storage for a Ceph cluster as it has
many random write accesses.

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de/

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd set-require-min-compat-client

2022-11-30 Thread Fox, Kevin M
When we switched (we were using the compat balancer previously), I:
1. turned off the balancer
2. forced the client minimum (new CentOS 7 clients are OK being forced to 
luminous even though they report as jewel; there's an email thread elsewhere 
describing it)
3. slowly reweighted the crush compat weight set back to completely matching. I 
manually ran the upmap balancer in between reweights to try and keep it not too 
unbalanced.
4. removed the crush compat weight set. It didn't have any noticeable effect once 
all the weights were back to matching.
5. ran some more manual balances to get things really well balanced
6. turned on the balancer

We run the cluster near full and it's quite busy, so we did things much more 
slowly that way. The new balancing using upmap is a significant improvement over 
the compat balancer.
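
A minimal sketch of the commands behind those steps (OSD IDs, weights and plan
names are placeholders; this assumes all clients really are luminous or newer):

```
ceph balancer off
ceph osd set-require-min-compat-client luminous

# step 3: walk the compat weight-set back towards the real CRUSH weights, a bit at a time
ceph osd crush weight-set reweight-compat osd.0 3.63899    # repeat per OSD, in small steps

# run the upmap balancer by hand between reweights
ceph balancer mode upmap
ceph balancer optimize myplan && ceph balancer show myplan && ceph balancer execute myplan

# step 4: drop the compat weight-set once everything matches, then re-enable the balancer
ceph osd crush weight-set rm-compat
ceph balancer on
```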

Kevin


From: Stolte, Felix 
Sent: Wednesday, November 30, 2022 4:20 AM
To: Dan van der Ster
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: osd set-require-min-compat-client


Hi Dan,

thanks for your reply. I wasn’t worried about the setting itself, but about the 
balancer starting to use the pg-upmap feature (which currently fails, because 
of the jewel setting). I would assume though, that the balancer is using 
pg-upmap in a throttled way to avoid performance issues.

I will execute the command on the weekend, just to be safe.

Best regards
Felix


-
-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
-
-

Am 30.11.2022 um 12:48 schrieb Dan van der Ster :

Hi Felix,

This change won't trigger any rebalancing. It will prevent older clients from 
connecting, but since this isn't a crush tunable it won't directly affect data 
placement.
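
A quick sketch of how one might verify that before flipping the flag (the
override switch is only needed if clients that still report as pre-luminous,
e.g. the CentOS 7 kernel clients mentioned elsewhere on the list, are known to
be safe):

```
ceph features                                       # what the connected clients actually report
ceph osd get-require-min-compat-client
ceph osd set-require-min-compat-client luminous     # add --yes-i-really-mean-it only if needed
```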

Best,

Dan


On Wed, Nov 30, 2022, 12:33 Stolte, Felix 
mailto:f.sto...@fz-juelich.de>> wrote:
Hey guys,

our ceph cluster is on pacific, but started on jewel years ago. While I was 
going through the logs of the mgr daemon I stumbled upon the following entry:

=
[balancer ERROR root] execute error: r = -1, detail = min_compat_client jewel < 
luminous, which is required for pg-upmap. Try 'ceph osd 
set-require-min-compat-client luminous' before using the new interface
=

I could confirm with `ceph osd get-require-min-compat-client` that my value is 
still jewel. Reading the docs, it looks to me like we really want to set this to 
luminous to benefit from a better pg distribution. My question for you is the 
following:

Do I have to expect a major rebalancing after applying the 'ceph osd 
set-require-min-compat-client luminous‘  command, affecting my cluster IO?

All my daemons are on pacific and all clients at least on nautilus.

Thanks in advance and best regards
Felix

-
-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
-
-

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io



[ceph-users] Re: Configuring rgw connection timeouts

2022-11-17 Thread Fox, Kevin M
I think you can do it like:
```
service_type: rgw
service_id: main
service_name: rgw.main
placement:
  label: rgwmain
spec:
  config:
    rgw_keystone_admin_user: swift
```

?


From: Thilo-Alexander Ginkel 
Sent: Thursday, November 17, 2022 10:21 AM
To: Casey Bodley
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Configuring rgw connection timeouts


Hello Casey,

On Thu, Nov 17, 2022 at 6:52 PM Casey Bodley  wrote:

> it doesn't look like cephadm supports extra frontend options during
> deployment. but these are stored as part of the `rgw_frontends` config
> option, so you can use a command like 'ceph config set' after
> deployment to add request_timeout_ms


unfortunately that doesn't really seem to work as cephadm is setting the
config on a service instance level (e.g., client.rgw.rgw.ceph-5.yjgdea), so
we can't simply override this on a higher hierarchical level. In addition,
we deploy multiple rgw instances per node (to better utilize available
resources) which get assigned different HTTP(S) ports by cephadm so they
can coexist on the same host.
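
For reference, a rough sketch of what such a per-instance override ends up
looking like (the instance name below is just the example from above, and the
frontend string has to keep whatever port/SSL options cephadm already set):

```
ceph config get client.rgw.rgw.ceph-5.yjgdea rgw_frontends
# e.g. "beast port=8000" -> append the timeout and write it back
ceph config set client.rgw.rgw.ceph-5.yjgdea rgw_frontends "beast port=8000 request_timeout_ms=65000"
```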

Regards,
Thilo
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to monitor growing of db/wal partitions ?

2022-11-14 Thread Fox, Kevin M
There should be prom metrics for each.
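
For example (assuming the mgr prometheus module or exporter is being scraped;
metric names may vary slightly by release):

```
# PromQL: DB and WAL usage per OSD
#   ceph_bluefs_db_used_bytes  / ceph_bluefs_db_total_bytes
#   ceph_bluefs_wal_used_bytes / ceph_bluefs_wal_total_bytes

# the same counters, straight from an OSD's admin socket on the host:
ceph daemon osd.0 perf dump bluefs | egrep '(db|wal)_(used|total)_bytes'
```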

Thanks,
Kevin


From: Christophe BAILLON 
Sent: Monday, November 14, 2022 10:08 AM
To: ceph-users
Subject: [ceph-users] How to monitor growing of db/wal partitions ?


Hello,

How can we simply monitor the growth of the db/wal partitions?
We have 2 NVMe devices shared by 12 OSDs per host (1 NVMe for 6 OSDs), and we want to 
monitor their growth.
We use cephadm to manage our clusters.

Thanks for advance

--
Christophe BAILLON
Mobile :: +336 16 400 522
Work :: https://eyona.com/
Twitter :: https://twitter.com/ctof
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it a bug that OSD crashed when it's full?

2022-11-01 Thread Fox, Kevin M
If it's the same issue, I'd check the fragmentation score on the entire cluster 
asap. You may have other OSDs close to the limit, and it's harder to fix when all 
your OSDs cross the line at once. If you drain this one, it may push the other 
ones into the red zone if you're too close, making the problem much worse.
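
A sketch of how to check it per OSD (run against each OSD's admin socket, or
loop over the IDs):

```
ceph daemon osd.0 bluestore allocator score block    # ~0 = unfragmented, ~1 = heavily fragmented
```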

Our cluster has been stable after splitting all the db's to their own volumes.

Really looking forward to the 4k fix.  :) But the workaround seems solid.

Thanks,
Kevin



From: Igor Fedotov 
Sent: Tuesday, November 1, 2022 4:34 PM
To: Tony Liu; ceph-users@ceph.io; d...@ceph.io
Subject: [ceph-users] Re: Is it a bug that OSD crashed when it's full?


Hi Tony,

first of all let me share my understanding of the issue you're facing.
This reminds me of an upstream ticket, and I presume my root cause analysis
from there (https://tracker.ceph.com/issues/57672#note-9) is applicable
in your case as well.

So generally speaking your OSD isn't 100% full - from the log output one
can see that 0x57acbc000 of 0x6fc840 bytes are free. But there are
not enough contiguous 64K chunks for BlueFS to keep operating...

As a result the OSD managed to escape the *full* safeguards and reached the
state where it crashed - these safety measures just weren't designed to
take that additional free-space fragmentation factor into account...

Similarly the lack of available 64K chunks prevents OSD from starting up
- it needs to write out some more data to BlueFS during startup recovery.

I'm currently working on enabling BlueFS functioning with default main
device allocation unit (=4K) which will hopefully fix the above issue.


Meanwhile you might want to work around the current OSD's state by
setting bluefs_shared_alloc_size to 32K - this might have some
operational and performance effects, but the OSD should highly likely be able
to start up afterwards. Please do not use 4K for now - it's known for
causing more problems in some circumstances. And I'd highly recommend
redeploying the OSD ASAP once you have drained all the data off it - I presume
that's the reason why you want to bring it up instead of letting the
cluster recover using the regular means applied on OSD loss.

An alternative approach would be to add a standalone DB volume and migrate
BlueFS there - ceph-volume should be able to do that even in the current
OSD state. Expanding the main volume (if it is backed by LVM and extra spare space
is available) is apparently a valid option too.
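
A rough sketch of both workarounds (osd.12, the VG/LV names and the fsid are
placeholders, not taken from this thread):

```
# smaller BlueFS allocation unit for the stuck OSD, then try to start it again
ceph config set osd.12 bluefs_shared_alloc_size 32768
systemctl restart ceph-osd@12          # or restart the cephadm-managed unit/container

# or: attach a standalone DB volume for BlueFS to use
ceph-volume lvm new-db --osd-id 12 --osd-fsid <osd-fsid> --target <vg>/<new-db-lv>
```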


Thanks,

Igor


On 11/1/2022 8:09 PM, Tony Liu wrote:
> The actual question is that, is crash expected when OSD is full?
> My focus is more on how to prevent this from happening.
> My expectation is that OSD rejects write request when it's full, but not 
> crash.
> Otherwise, no point to have ratio threshold.
> Please let me know if this is the design or a bug.
>
> Thanks!
> Tony
> 
> From: Tony Liu 
> Sent: October 31, 2022 05:46 PM
> To: ceph-users@ceph.io; d...@ceph.io
> Subject: [ceph-users] Is it a bug that OSD crashed when it's full?
>
> Hi,
>
> Based on doc, Ceph prevents you from writing to a full OSD so that you don’t 
> lose data.
> In my case, with v16.2.10, OSD crashed when it's full. Is this expected or 
> some bug?
> I'd expect write failure instead of OSD crash. It keeps crashing when tried 
> to bring it up.
> Is there any way to bring it back?
>
>  -7> 2022-10-31T22:52:57.426+ 7fe37fd94200  4 rocksdb: EVENT_LOG_v1 
> {"time_micros": 1667256777427646, "job": 1, "event": "recovery_started", 
> "log_files": [23300]}
>  -6> 2022-10-31T22:52:57.426+ 7fe37fd94200  4 rocksdb: 
> [db_impl/db_impl_open.cc:760] Recovering log #23300 mode 2
>  -5> 2022-10-31T22:52:57.529+ 7fe37fd94200  3 rocksdb: 
> [le/block_based/filter_policy.cc:584] Using legacy Bloom filter with high 
> (20) bits/key. Dramatic filter space and/or accuracy improvement is available 
> with format_version>=5.
>  -4> 2022-10-31T22:52:57.592+ 7fe37fd94200  1 bluefs _allocate unable 
> to allocate 0x9 on bdev 1, allocator name block, allocator type hybrid, 
> capacity, block size 0x1000, free 0x57acbc000, fragmentation 0.359784, 
> allocated 0x0
>  -3> 2022-10-31T22:52:57.592+ 7fe37fd94200 -1 bluefs _allocate 
> allocation failed, needed 0x8064a
>  -2> 2022-10-31T22:52:57.592+ 7fe37fd94200 -1 bluefs _flush_range 
> allocated: 0x0 offset: 0x0 length: 0x8064a
>  -1> 2022-10-31T22:52:57.604+ 7fe37fd94200 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_6

[ceph-users] Re: cephadm trouble with OSD db- and wal-device placement (quincy)

2022-11-01 Thread Fox, Kevin M
I haven't done it, but had to read through the documentation a couple months 
ago and what I gathered was:
1. if you have a db device specified but no wal device, it will put the wal on 
the same volume as the db.
2. the recommendation seems to be to not have a separate volume for db and wal 
if on the same physical device?

So, that should allow you to have the failure mode you want I think?
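
If that is the case, a sketch of such a spec (the device selectors and
service_id are made up; `--dry-run` shows what cephadm would create before
anything is touched):

```
cat > osd-spec.yaml <<'EOF'
service_type: osd
service_id: hdd_with_db_and_wal_on_nvme
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0      # no wal_devices section -> WAL lands on the DB volume
EOF
ceph orch apply -i osd-spec.yaml --dry-run
```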

Can anyone else confirm this, or does anyone know that it is incorrect?

Thanks,
Kevin


From: Ulrich Pralle 
Sent: Tuesday, November 1, 2022 7:25 AM
To: ceph-users@ceph.io
Subject: [ceph-users] cephadm trouble with OSD db- and wal-device placement 
(quincy)


Hej,

we are using ceph version 17.2.0 on Ubuntu 22.04.1 LTS.

We've got several servers with the same setup and are facing a problem
with OSD deployment and db-/wal-device placement.

Each server consists of ten rotational disks (10TB each) and two NVME
devices (3TB each).

We would like to deploy each rotational disk with a db- and wal-device.

We want to place the db and wal devices of an osd together on the same
NVME, to cut the failure of the OSDs in half if one NVME fails.

We tried several osd service type specifications to achieve our
deployment goal.

Our best approach is:

service_type: osd
service_id: osd_spec_10x10tb-dsk_db_and_wal_on_2x3tb-nvme
service_name: osd.osd_spec_10x10tb-dsk_db_and_wal_on_2x3tb-nvme
placement:
  host_pattern: '*'
unmanaged: true
spec:
  data_devices:
    model: MG[redacted]
    rotational: 1
  db_devices:
    limit: 1
    model: MZ[redacted]
    rotational: 0
  filter_logic: OR
  objectstore: bluestore
  wal_devices:
    limit: 1
    model: MZ[redacted]
    rotational: 0

This service spec deploys ten OSDs with all db-devices on one NVME and
all wal-devices on the second NVME.

If we omit "limit: 1", cephadm deploys ten OSDs with db-devices equally
distributed on both NVMEs and no wal-devices at all --- although half of
the NVMEs capacity remains unused.

What's the best way to do it.

Does that even make sense?

Thank you very much and with kind regards
Uli
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitoring drives

2022-10-14 Thread Fox, Kevin M
Would it cause problems to mix the smartctl exporter along with ceph's built in 
monitoring stuff?

Thanks,
Kevin


From: Wyll Ingersoll 
Sent: Friday, October 14, 2022 10:48 AM
To: Konstantin Shalygin; John Petrini
Cc: Marc; Paul Mezzanini; ceph-users
Subject: [ceph-users] Re: monitoring drives


This looks very useful.  Has anyone created a grafana dashboard that will 
display the collected data ?



From: Konstantin Shalygin 
Sent: Friday, October 14, 2022 12:12 PM
To: John Petrini 
Cc: Marc ; Paul Mezzanini ; ceph-users 

Subject: [ceph-users] Re: monitoring drives

Hi,

You can get this metrics, even wear level, from official smartctl_exporter [1]

[1] 
https://github.com/prometheus-community/smartctl_exporter

k
Sent from my iPhone

> On 14 Oct 2022, at 17:12, John Petrini  wrote:
>
> We run a mix of Samsung and Intel SSD's, our solution was to write a
> script that parses the output of the Samsung SSD Toolkit and Intel
> ISDCT CLI tools respectively. In our case, we expose those metrics
> using node_exporter's textfile collector for ingestion by prometheus.
> It's mostly the same smart data but it helps identify some vendor
> specific smart metrics, namely SSD wear level, that we were unable to
> decipher from the raw smart data.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v15-2-14-octopus no docker images on docker hub ceph/ceph ?

2021-08-20 Thread Fox, Kevin M
We launch a local registry for cases like these and mirror the relevant 
containers there. This keeps copies of the images closer to the target cluster 
and reduces load on the public registries. It's not that much different from 
mirroring a yum/apt repo locally to speed up access. For large clusters, it's 
well worth the effort.
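
A minimal sketch of that approach (the registry hostname is a placeholder;
TLS/auth omitted for brevity):

```
# run a plain registry and mirror the release image into it
docker run -d --restart=always -p 5000:5000 --name registry registry:2
skopeo copy docker://quay.io/ceph/ceph:v15.2.14 docker://myregistry.example.net:5000/ceph/ceph:v15.2.14

# point the cluster at the mirror (cephadm-managed clusters)
ceph config set global container_image myregistry.example.net:5000/ceph/ceph:v15.2.14
```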


From: Nico Schottelius 
Sent: Friday, August 20, 2021 2:15 AM
To: Erik Lindahl
Cc: Nico Schottelius; Stefan Fleischmann; ceph-users@ceph.io
Subject: [ceph-users] Re: v15-2-14-octopus no docker images on docker hub 
ceph/ceph ?


Erik Lindahl  writes:

>> On 20 Aug 2021, at 10:39, Nico Schottelius  
>> wrote:
>>
>> I believe mid term everyone will need to provide their own image
>> registries, as the approach of "everything is at dockerhub|quay"
>> does not scale well.
>
> Yeah, this particular issue is not hard to fix technically (and I just
> checked and realized there are also _client_ pull limits that apply
> even to OSS repo).

Yes, that is the problem we are running into from time to time.

> However, I also think it's wise to take a step back and consider
> whether it's just a matter of a technical mishap (docker suddenly
> introducing limits) or two, or if these are signs the orchestration
> isn't as simple as we had hoped.

My personal opinion is still that focussing on something like rook for
containers and leaving the base native would make most sense. If you are
going down containers *anyway*, k8s is helpful for large scale
deployments. If you are not, then completely containerless might be
easier to document, teach and maintain.

> If we need to set up our own container registry, our Ceph
> orchestration is gradually getting more complicated than our entire
> salt setup for ~200 nodes, which to me is an indication of something
> not working as intended :-)

Sorry, that's *not* what I meant: I think that each Open Source project
(like Ceph) might need to setup their own registry like it happened with
package repositories many years ago.

Now, I am aware that the ceph team is working at redhat, redhat is
driving quay.io, so the logically choice would be quay.io.

But the problem with that is rate limiting on quay.io plus the lack of IPv6
in our case, which makes all IPv6 hosts look like they are coming from one IPv4
address, which gives horrible rate limits.

Even in the private IPv4 case that will be the same problem - dozens or
hundreds of nodes are pulling from quay.io and your cluster gets rate
limited.

In practice that means that downloading a single ceph image might take
hours; we have experienced this quite a few times already. Thus an
upgrade with cephadm or rook will potentially be delayed by hours, just
for pulling in the image.

Now you can argue that the users/consumers should carry some of the
weight of providing the images and we from ungleich would be happy to
sponsor a public image cache, if necessary.

In short: the container registry move is not the only problem; the
registry limits are a big problem for ceph clusters and require
additional local caching at the moment. I would certainly prefer this
being solved somewhere closer to upstream instead of everyone running
their own nexus/harbor/docker registry.

Greetings from containerland,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph with BGP?

2021-07-06 Thread Fox, Kevin M
I'm not aware of any directly, but I know rook-ceph is used on Kubernetes, and 
Kubernetes is sometimes deployed with BGP based SDN layers. So there may be a 
few deployments that do it that way.


From: Martin Verges 
Sent: Monday, July 5, 2021 11:23 PM
To: ceph-users
Subject: [ceph-users] Re: Ceph with BGP?


Hello,

> This is not easy to answer without all the details. But for sure there
are cluster running with BGP in the field just fine.

Out of curiosity, is there someone here that has his Ceph cluster running
with BGP in production?
As far as I remember, here at croit with multiple hundred supported
clusters, we never encountered a BGP deployment in the field. It's always
just the theoretical or testing where we hear from BGP.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io/
YouTube: https://goo.gl/PGE1Bx


On Tue, 6 Jul 2021 at 07:11, Stefan Kooman  wrote:

> On 7/5/21 6:26 PM, German Anders wrote:
> > Hi All,
> >
> > I have an already created and functional ceph cluster (latest
> luminous
> > release) with two networks one for the public (layer 2+3) and the other
> for
> > the cluster, the public one uses VLAN and its 10GbE and the other one
> uses
> > Infiniband with 56Gb/s, the cluster works ok. The public network uses
> > Juniper QFX5100 switches with VLAN in layer2+3 configuration but the
> > network team needs to move to a full layer3 and they want to use BGP, so
> > the question is, how can we move to that schema? What are the
> > considerations? Is it possible? Is there any step-by-step way to move to
> > that schema? Also is anything better than BGP or other alternatives?
>
> Ceph doesn't care at all. Just as long as the nodes can communicate to
> each other, it's fine. It depends on your failure domains how easy you
> can move to this L3 model. Do you have separate datacenters that you can
> do one by one, or separate racks?
>
> And you can do BGP on different levels: router, top of rack switches, or
> even on the Ceph host itselfs (FRR).
>
> We use BGP / VXLAN / EVPN for our Ceph cluster. But it all depends on
> why your networking teams wants to change to L3, and why.
>
> There are no step by step guides, as most deployments are unique.
>
> This might be a good time to reconsider a separate cluster network.
> Normally there is no need for that, and might make things simpler.
>
> Do you have separate storage switches? Whre are your clients connected
> to (separate switches or connected to storage switches as well).
>
> This is not easy to answer without all the details. But for sure there
> are cluster running with BGP in the field just fine.
>
> Gr. Stefan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph connect to openstack

2021-06-30 Thread Fox, Kevin M
https://docs.ceph.com/en/latest/rbd/rbd-openstack/


From: Szabo, Istvan (Agoda) 
Sent: Wednesday, June 30, 2021 9:50 AM
To: Ceph Users
Subject: [ceph-users] Ceph connect to openstack


Hi,

Is there any proper documentation how to connect ceph with openstack?




This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-25 Thread Fox, Kevin M
Orchestration is hard, especially with every permutation. The devs have 
implemented what they feel is the right solution for their own needs from the 
sound of it. The orchestration was made modular to support non containerized 
deployment. It just takes someone to step up and implement the permutations 
desired. And ultimately that's what opensource is geared towards. With 
opensource and some desired feature, you can:
1. Implement it
2. Pay someone else to implement it
3. Convince someone else to implement it in their spare time.

The thread seems to be currently focused around #3 but no developer seems to be 
interested in implementing it. So that leaves options 1 and 2?

To move this forward, is anyone interested in developing package support in the 
orchestration system or paying to have it implemented?


From: Oliver Freyermuth 
Sent: Wednesday, June 2, 2021 2:26 PM
To: Matthew Vernon; ceph-users@ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments


Hi,

that's also a +1 from me — we also use containers heavily for scientific 
workflows, and know their benefits well.
But they are not the "best", or rather, the most fitting tool in every 
situation.
You have provided a great summary and I agree with all points, and thank you a 
lot for this very competent and concise write-up.


Since in this lengthy thread, static linking and solving the issue of many 
inter-dependencies for production services with containers have been mentioned 
as solutions,
I'd like to add another point to your list of complexities:
* Keeping production systems secure may be a lot more of a hassle.

Even though the following article is long and many may regard it as 
controversial, I'd like to link to a concise write-up from a packager 
discussing this topic in a quite generic way:
  
https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/
While the article discusses the issues of static linking and package management 
performed in language-specific domains, it applies all the same to containers.

If I operate services in containers built by developers, of course this ensures 
the setup works, and dependencies are well tested, and even upgrades work well 
— but it also means that,
at the end of the day, if I run 50 services in 50 different containers from 50 
different upstreams, I'll have up to 50 different versions of OpenSSL floating 
around my production servers.
If a security issue is found in any of the packages used in all the container 
images, I now need to trust the security teams of all the 50 developer groups 
building these containers
(and most FOSS projects won't have the ressources, understandably...),
instead of the one security team of the disto I use. And then, I also have to 
re-pull all these containers, after finding out that a security fix has become 
available.
Or I need to build all these containers myself, and effectively take over the 
complete job, and have my own security team.

This may scale somewhat well, if you have a team of 50 people, and every person 
takes care of one service. Containers are often your friend in this case[1],
since it allows to isolate the different responsibilities along with the 
service.

But this is rarely the case outside of industry, and especially not in 
academics.
So the approach we chose for us is to have one common OS everywhere, and 
automate all of our deployment and configuration management with Puppet.
Of course, that puts is in one of the many corners out there, but it scales 
extremely well to all services we operate,
and I can still trust the distro maintainers to keep the base OS safe on all 
our servers, automate reboots etc.

For Ceph, we've actually seen questions about security issues already on the 
list[0] (never answered AFAICT).


To conclude, I strongly believe there's no one size fits all here.

That was why I was hopeful when I first heard about the Ceph orchestrator idea, 
when it looked to be planned out to be modular,
with the different tasks being implementable in several backends, so one could 
imagine them being implemented with containers, with classic SSH on bare-metal 
(i.e. ceph-deploy-like), ansible, rook or maybe others.
Sadly, it seems it ended up being "container-only".
Containers certainly have many uses, and we run thousands of them daily, but 
neither do they fit each and every existing requirement,
nor are they a magic bullet to solve all issues.

Cheers,
Oliver


[0] 
https:/

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Fox, Kevin M
I bumped into this recently:
https://samuel.karp.dev/blog/2021/05/running-freebsd-jails-with-containerd-1-5/

:)

Kevin


From: Sage Weil 
Sent: Thursday, June 24, 2021 2:06 PM
To: Stefan Kooman
Cc: Nico Schottelius; Kai Börnert; Marc; ceph-users
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments


On Tue, Jun 22, 2021 at 1:25 PM Stefan Kooman  wrote:
> On 6/21/21 6:19 PM, Nico Schottelius wrote:
> > And while we are at claiming "on a lot more platforms", you are at the
> > same time EXCLUDING a lot of platforms by saying "Linux based
> > container" (remember Ceph on FreeBSD? [0]).
>
> Indeed, and that is a more fundamental question: how easy it is to make
> Ceph a first-class citizen on non linux platforms. Was that ever a
> (design) goal? But then again, if you would be able to port docker
> natively to say OpenBSD, you should be able to run Ceph on it as well.

Thank you for bringing this up.  This is in fact a key reason why the
orchestration abstraction works the way it does--to allow other
runtime environments to be supported (FreeBSD!
sysvinit/Devuan/whatever for systemd haters!) while ALSO allowing an
integrated, user-friendly experience in which users workflow for
adding/removing hosts, replacing failed OSDs, managing services (MDSs,
RGWs, load balancers, etc) can be consistent across all platforms.
For 10+ years we basically said "out of scope" to these pesky
deployment details and left this job to Puppet, Chef, Ansible,
ceph-deploy, rook, etc., but the result of that strategy was pretty
clear: ceph was hard to use and the user experience dismal when
compared to an integrated product from any half-decent enterprise
storage company, or products like Martin's that capitalize on core
ceph's bad UX.

The question isn't whether we support other environments, but how.  As
I mentioned in one of my first messages, we can either (1) generalize
cephadm to work in other environments (break the current
systemd+container requirement), or (2) add another orchestrator
backend that supports a new environment.  I don't have any well-formed
opinion here.  There is a lot of pretty generic "orchestration" logic
in cephadm right now that isn't related to systemd or containers that
could either be pulled out of cephadm into the mgr/ochestrator layer
or a library.  Or an independent, fresh orch backend implementation
could opt for a very different approach or set of opinions.

Either way, my assumption has been that these other environments would
probably not be docker|podman-based.  In the case of FreeBSD we'd
probably want to use jails or whatever.  But anything is possible.

s
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Fox, Kevin M
I've actually had rook-ceph not proceed with something that I would have 
continued on with. Turns out I was wrong and it was right. Its checking was 
more through then mine. Thought that was pretty cool. It eventually cleared 
itself and finished up.

For a large ceph cluster, the orchestration is very nice.

Thanks,
Kevin


From: Sage Weil 
Sent: Thursday, June 24, 2021 1:46 PM
To: Marc
Cc: Anthony D'Atri; Nico Schottelius; Matthew Vernon; ceph-users@ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments


On Sun, Jun 20, 2021 at 9:51 AM Marc  wrote:
> Remarks about your cephadm approach/design:
>
> 1. I am not interested in learning podman, rook or kubernetes. I am using 
> mesos which is also on my osd nodes to use the extra available memory and 
> cores. Furthermore your cephadm OC is limited to only ceph nodes. While my 
> mesos OC is spread across a larger cluster and has rules when, and when not 
> to run tasks on the osd nodes. You incorrectly assume that rgw, grafana, 
> prometheus, haproxy are going to be ran on your ceph OC.

rgw, grafana, prom, haproxy, etc are all optional components.  The
monitoring stack is deployed by default but is trivially disabled via
a flag to the bootstrap command.  We are well aware that not everyone
wants these, but we cannot ignore the vast majority of users that
wants things to Just Work without figuring out how to properly deploy
and manage all of these extraneous integrated components.

> 2. Nico pointed out that you do not have alpine linux container images. I did 
> not even know you were using container images. So how big are these? Where 
> are these stored. And why are these not as small as they can be? Such an osd 
> container image should be 20MB or so at most. I would even expect statically 
> build binary container image, why even a tiny os?
> 4. Ok found the container images[2] (I think). Sorry but this has ‘nothing’ 
> to do with container thinking. I expected to find container images for osd, 
> msd, rgw separately and smaller. This looks more like an OS deployment.
Early on the team building the container images opted for a single
image that includes all of the daemons for simplicity.  We could build
stripped down images for each daemon type, but that's an investment in
developer time and complexity and we haven't heard any complaints
about the container size.  (Usually a few hundred MB on a large scale
storage server isn't a problem.)

> 3. Why is in this cephadm still being talked about systemd? Your orchestrator 
> should handle restarts,namespaces and failed tasks not? There should be no 
> need to have a systemd dependency, at least I have not seen any container 
> images relying on this.

Something needs to start the ceph daemon containers when the system
reboots.  We integrated with systemd since all major distros adopted
it.  Cephadm could be extended to support other init systems with
pretty minimal effort... we aren't doing anything fancy with systemd.

> 5. I have been writing this previously on the mailing list here. Is each rgw 
> still requiring its own dedicated client id? Is it still true, that if you 
> want to spawn 3 rgw instances, they need to authorize like client.rgw1, 
> client.rgw2 and client.rgw3?
> This does not allow for auto scaling. The idea of using an OC is that you 
> launch a task, and that you can scale this task automatically when necessary. 
> So you would get multiple instances of rgw1. If this is still and issue with 
> rgw, mds and mgr etc. Why even bother doing something with an OC and 
> containers?

The orchestrator automates the creation and cleanup of credentials for
each rgw instance.  (It also trivially scales them up/down, ala k8s.)
If you have an autoscaler, you just need to tell cephadm how many you
want and it will add/remove daemons.  If you are using cephadm's
ingress (haproxy) capability, the LB configuration will be adjusted
for you.  If you are using an external LB, you can query cephadm for a
description of the current daemons and their endpoints and feed that
info into your own ingress solution.
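
For instance, a sketch of what scaling and inspecting the service looks like
with the orchestrator (the service name and label are placeholders):

```
ceph orch apply rgw main --placement="3 label:rgwmain"   # scale to 3 daemons on labelled hosts
ceph orch ps --daemon_type rgw                           # list the daemons and their endpoints
ceph orch ls rgw --export                                # dump the current service spec
```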

> 6. As I wrote before I do not want my rgw or haproxy running in a OC that has 
> the ability to give tasks capability SYSADMIN. So that would mean I have to 
> run my osd daemons/containers separately.

Only the OSD containers get extra caps to deal with the storage hardware.

> 7. If you are not setting cpu and memory limits on your cephadm containers, 
> then again there is an argument why even use containers.

Memory limits are partially implemented; we haven't gotten to CPU
limits yet.  It's on the list!

> 8. I still see lots of comments on the mailing list about accessing logs. I 
> have all my containers log to a remote syslog server, if you still have your 
> ceph daemons that can not do this (correctly). What point is it even going to 

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Fox, Kevin M
Ultimately, that's what a container image is. From the outside, it's a 
statically linked binary. From the inside, it can be assembled using modular 
techniques. The best thing about it, is you can use container scanners and 
other techniques to gain a lot of the benefits of that modularity still. Plus, 
you don't need special compilers, whole dependency chains compiled with static 
options, etc. So an interesting data point in the statically linked binary vs 
dynamic debate. Its kind of both.


From: Rok Jaklič 
Sent: Wednesday, June 2, 2021 1:18 PM
To: Harry G. Coin
Cc: ceph-users
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments


In these times of giga- and terabytes, all this dependency hell can now be avoided
with some static linking. For example, we use statically linked mysql
binaries and it has saved us numerous times.
https://youtu.be/5PmHRSeA2c8?t=490

Rok

On Wed, Jun 2, 2021 at 9:57 PM Harry G. Coin  wrote:

>
> On 6/2/21 2:28 PM, Phil Regnauld wrote:
> > Dave Hall (kdhall) writes:
> >> But the developers aren't out in the field with their deployments
> >> when something weird impacts a cluster and the standard approaches don't
> >> resolve it.  And let's face it:  Ceph is a marvelously robust solution
> for
> >> large scale storage, but it is also an amazingly intricate matrix of
> >> layered interdependent processes, and you haven't got all of the bugs
> >> worked out yet.
> >   I think you hit a very important point here: the concern with
> >   containerized deployments is that they may be a barrier to
> >   efficient troubleshooting and bug reporting by traditional methods
> >   (strace et al) -- unless a well documented debugging and analysis
> >   toolset/methodolgy is provided.
> >
> >   Paradoxically, containerized deployments certainly sound like
> they'd
> >   free up lots of cycles from the developer side of things (no more
> >   building packages for N distributions as was pointed out, easier
> >   upgrade and regression testing), but it might make it more
> difficult
> >   initially for the community to contribute (well, at least for us
> >   dinosaurs that aren't born with docker brains).
> >
> >   Cheers,
> >   Phil
>
>
> I think there's great value in ceph devs doing QA and testing docker
> images, releasing them as a 'known good thing'.  Why? Doing that avoids
> dependency hell inducing fragility-- fragility which I've experienced in
> other multi-host / multi-master packages.  Wherein one distro's
> maintainer decides some new rev ought be pushed out as 'security update'
> while another distro's maintainer decides it's a feature change, another
> calls it a backport, etc.  There's no way to QA 'upgrades' across so
> many grains of shifting sand.
>
> While the devs and the rest of the bleeding-edge folks should enjoy the
> benefits that come with tolerating and managing dependency hell, having
> the orchestrator upgrade in a known good sequence from a known base to a
> known release reduces fragility.
>
> Thanks for ceph!
>
> Harry
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Fox, Kevin M
While there are many reasons containerization helps, I'll just touch on one 
real quick that is relevant to the conversation.

Orchestration.

Implementing orchestration of an entire clustered software with many different:
* package managers
* dependency chains
* init systems
* distro specific quirks
* configuration management systems

has over the years proven intractable. Each project can't implement all 
permutations of each of those with high quality. Which is especially important 
in a data storage system. Good orchestration of complex systems before 
containers was very rare. Finding two that you wanted to use, that supported 
the same base systems was even more rare.

Standardizing on containers allow orchestration to be performed without concern 
to those things, making it significantly easier to write good orchestration 
that is usable by a broad audience. Such as cephadm.

rook-ceph takes it to another level IMO and only requires managing just the 
orchestration bits that are specific to ceph over the base orchestration layer, 
k8s. So even more orchestration layer can be shared between the devs and the 
admins. This allows admins to generalize their skills even more. Yes, its 
another thing to learn. but its worth it in the long term IMO. Being able to 
quickly deploy ceph, elasticsearch, and greenplum clusters and deal with the 
day two issues with ease is something that hasn't been a thing before.

Yes, its a big shift from the traditional distro philosophy. But the advanced 
orchestration abilities possible once we step outside the traditional linux 
distro really is game changing.


From: Matthew Vernon 
Sent: Wednesday, June 2, 2021 2:36 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Why you might want packages not containers for Ceph 
deployments


Hi,

In the discussion after the Ceph Month talks yesterday, there was a bit
of chat about cephadm / containers / packages. IIRC, Sage observed that
a common reason in the recent user survey for not using cephadm was that
it only worked on containerised deployments. I think he then went on to
say that he hadn't heard any compelling reasons why not to use
containers, and suggested that resistance was essentially a user
education question[0].

I'd like to suggest, briefly, that:

* containerised deployments are more complex to manage, and this is not
simply a matter of familiarity
* reducing the complexity of systems makes admins' lives easier
* the trade-off of the pros and cons of containers vs packages is not
obvious, and will depend on deployment needs
* Ceph users will benefit from both approaches being supported into the
future

We make extensive use of containers at Sanger, particularly for
scientific workflows, and also for bundling some web apps (e.g.
Grafana). We've also looked at a number of container runtimes (Docker,
singularity, charliecloud). They do have advantages - it's easy to
distribute a complex userland in a way that will run on (almost) any
target distribution; rapid "cloud" deployment; some separation (via
namespaces) of network/users/processes.

For what I think of as a 'boring' Ceph deploy (i.e. install on a set of
dedicated hardware and then run for a long time), I'm not sure any of
these benefits are particularly relevant and/or compelling - Ceph
upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud
Archive) provide .debs of a couple of different Ceph releases per Ubuntu
LTS - meaning we can easily separate out OS upgrade from Ceph upgrade.
And upgrading the Ceph packages _doesn't_ restart the daemons[1],
meaning that we maintain control over restart order during an upgrade.
And while we might briefly install packages from a PPA or similar to
test a bugfix, we roll those (test-)cluster-wide, rather than trying to
run a mixed set of versions on a single cluster - and I understand this
single-version approach is best practice.

Deployment via containers does bring complexity; some examples we've
found at Sanger (not all Ceph-related, which we run from packages):

* you now have 2 process supervision points - dockerd and systemd
* docker updates (via distribution unattended-upgrades) have an
unfortunate habit of rudely restarting everything
* docker squats on a chunk of RFC 1918 space (and telling it not to can
be a bore), which coincides with our internal network...
* there is more friction if you need to look inside containers
(particularly if you have a lot running on a host and are trying to find
out what's going on)
* you typically need to be root to build docker containers (unlike packages)
* we already have package deployment infrastructure (which we'll need
regardless of deployment choice)

We also currently use systemd overrides to tweak some of the Ceph units
(e.g. to do some network sanity checks before bringing up an OSD), and
have some tools to pair OSD / journal / LVM / disk device up;

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Fox, Kevin M
Debating containers vs packages is like debating systemd vs initrd. There are 
lots of reasons why containers (and container orchestration) are good for 
deploying things, including ceph. Repeating them in each project every time it 
comes up is not really productive. I'd recommend looking at why containers are 
good in general. It applies to ceph too.


From: Sasha Litvak 
Sent: Wednesday, June 2, 2021 7:56 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments


Is there a link of the talk  I can use as a reference?  I would like to
look at the pro container points as this post is getting a little bit one
sided.  I understand that most people prefer things to be stable especially
with the underlying storage systems.  To me personally, use of
containers in general adds a great flexibility because it
detaches underlying OS from the running software.  All points are fair
about adding complexity to the complex system but one thing is missing.
Every time developers decide to introduce some new more efficient libraries
or frameworks we hit a distribution dependency hell.  Because of that, ceph
sometimes abandons entire OS versions before their actual lifetime is
over.  My resources are limited and  I don't want to debug / troubleshoot
/ upgrade OS in addition to ceph itself, hence the containers.  Yes  it
took me a while to warm up to the idea in general but now I don't even
think too much about it.  I went from Nautilus to Pacific (Centos 7 to
Centos 8) within a few hours without needing to upgrade my Ubuntu bare
metal nodes.

This said,  I am for giving people a choice to use packages + ansible /
manual install and also allowing manual install of containers.  Forcing
users' hands too much may make people avoid upgrading their ceph clusters.


On Wed, Jun 2, 2021 at 9:27 AM Dave Hall  wrote:

> I'd like to pick up on something that Matthew alluded to, although what I'm
> saying may not be popular.  I agree that containers are compelling for
> cloud deployment and application scaling, and we can all be glad that
> container technology has progressed from the days of docker privilege
> escalation.  I also agree that for Ceph users switching from native Ceph
> processes to containers carries a learning curve that could be as
> intimidating as learning Ceph itself.
>
> However, here is where I disagree with containerized Ceph:  I worked for 19
> years as a software developer for a major world-wide company.  In that
> time, I noticed that most of the usability issues experienced by customers
> were due to the natural and understandable tendency for software developers
> to program in a way that's easy for the programmer, and in the process to
> lose sight of what's easy for the user.
>
> In the case of Ceph, containers make it easier for the developers to
> produce and ship releases.  It reduces dependency complexities and testing
> time.  But the developers aren't out in the field with their deployments
> when something weird impacts a cluster and the standard approaches don't
> resolve it.  And let's face it:  Ceph is a marvelously robust solution for
> large scale storage, but it is also an amazingly intricate matrix of
> layered interdependent processes, and you haven't got all of the bugs
> worked out yet.
>
> Just note that the beauty of software (or really of anything that is
> 'designed') is that a few people (developers) can produce something that a
> large number of people (storage administrators, or 'users') will want to
> use.
>
> Please remember the ratio of users (cluster administrators) to developers
> and don't lose sight of the users in working to ease and simplify
> development.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
>
> On Wed, Jun 2, 2021 at 5:37 AM Matthew Vernon  wrote:
>
> > Hi,
> >
> > In the discussion after the Ceph Month talks yesterday, there was a bit
> > of chat about cephadm / containers / packages. IIRC, Sage observed that
> > a common reason in the recent user survey for not using cephadm was that
> > it only worked on containerised deployments. I think he then went on to
> > say that he hadn't heard any compelling reasons why not to use
> > containers, and suggested that resistance was essentially a user
> > education question[0].
> >
> > I'd like to suggest, briefly, that:
> >
> > * containerised deployments are more complex to manage, and this is not
> > simply a matter of familiarity
> > * reducing the complexity of systems makes admins' lives easier
> > * the trade-off of the pros and cons of containers vs packages is not
> > obvious, and will depend on deployment needs
> > * Ceph users will benefit from both approaches being supported into the
> > future
> >
> > We make extensive use of containers at Sanger, particularly for
> > scientific workflows, and also for bund

[ceph-users] Re: cephfs vs rbd vs rgw

2021-05-25 Thread Fox, Kevin M
The quick answer, is they are optimized for different use cases.

Things like relational databases (mysql, postgresql) benefit from the 
performance that a dedicated filesystem can provide (rbd). Shared filesystems 
are usually contraindicated with such software.

Shared filesystems like cephfs are nice but can't scale quite as well in number 
of filesystems as something like rbd. Latency in certain operations can be 
worse. Posix network filesystems have their drawbacks. Posix wasn't really 
designed around network fs's. But super useful when you need to share 
filesystems across nodes. A lot of existing software assumes shared 
filesystems. Can get pretty good scaling easily out of some software with it.

rgw is a very different protocol (webby). A lot of existing software doesn't 
work with it, so compatibility is not as good. But that's changing. It also has 
some assumptions around how data is read/written. It can be scaled quite large. 
HTTP clients are very easy to come by to speak to it though, so for new 
software, it's pretty nice.

So, it's not necessarily a question of "which one should I support". One of Ceph's great 
features is that you can support all 3 with the same storage and use them all as 
needed.
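
For a feel of the difference, a sketch of accessing the same cluster all three
ways (mon addresses, pool/image names and the rgw endpoint are placeholders):

```
# CephFS: mount the shared filesystem on a client
mount -t ceph mon1:6789:/ /mnt/cephfs -o name=myuser,secretfile=/etc/ceph/myuser.secret

# RBD: a dedicated block device, with a local filesystem on top
rbd create mypool/myimage --size 100G
rbd map mypool/myimage                 # -> /dev/rbd0
mkfs.xfs /dev/rbd0 && mount /dev/rbd0 /mnt/rbd

# RGW: S3 over HTTP with any S3 client
aws --endpoint-url http://rgw.example.net:8080 s3 ls
```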


From: Jorge Garcia 
Sent: Tuesday, May 25, 2021 4:43 PM
To: ceph-users@ceph.io
Subject: [ceph-users] cephfs vs rbd vs rgw


This may be too broad of a topic, or opening a can of worms, but we are
running a Ceph environment and I was wondering if there's any guidance
about this question:

Given that some group would like to store 50-100 TB of data on Ceph and
use it from a Linux environment, are there any advantages or
disadvantages in terms of performance/ease of use/learning curve to
using cephfs vs using a block device thru rbd vs using object storage
thru rgw? Here are my general thoughts:

cephfs - Until recently, you were not allowed to have multiple
filesystems. Not sure about performance.

rbd - Can only be mounted on one system at a time, but I guess that
filesystem could then be served using NFS.

rgw - A different usage model from the regular Linux file/directory
structure. Are there advantages to forcing people to use this interface?

I'm tempted to set up 3 separate areas and try them and compare the
results, but I'm wondering if somebody has done some similar experiment
in the past.

Thanks for any help you can provide!

Jorge
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-ansible in Pacific and beyond?

2021-03-17 Thread Fox, Kevin M
There are a lot of benefits to containerization that are hard to get without it:
Finer-grained ability to allocate resources to services ("this process gets 2 GB 
of RAM and 1 CPU").
Better security, since only minimal software is available within the 
container, so if a service is compromised it's harder to escape.
The ability to run exactly what was tested and released by upstream. Fewer issues 
with version mismatches. Especially useful across different distros.
Easier to implement orchestration on top, which enables some of the advanced 
features such as easily allocated iSCSI/NFS volumes. Ceph is finally doing so 
now that it is focusing on containers.
And much more.
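
To make the resource point concrete, here is a toy sketch using the Docker SDK 
for Python. The image tag, command and limits are placeholders, and this is not 
how you would deploy Ceph daemons; it just illustrates the kind of per-process 
limit a container runtime lets you set.

    # Toy sketch: image, command and limits are placeholders, not a deployment recipe.
    import docker

    client = docker.from_env()
    container = client.containers.run(
        'quay.io/ceph/ceph:v16',          # placeholder image tag
        'sleep infinity',                 # placeholder command
        mem_limit='2g',                   # "this process gets 2 GB of RAM..."
        nano_cpus=1_000_000_000,          # "...and 1 CPU" (1e9 nano-CPUs)
        detach=True,
    )
    print(container.id)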


From: Teoman Onay 
Sent: Wednesday, March 17, 2021 10:38 AM
To: Matthew H
Cc: Matthew Vernon; ceph-users
Subject: [ceph-users] Re: ceph-ansible in Pacific and beyond?


A containerized environment just makes troubleshooting more difficult:
getting access to and retrieving details on Ceph processes isn't as
straightforward as with a non-containerized infrastructure. I am still not
convinced that containerizing everything brings any benefits except the
co-location of services.

On Wed, Mar 17, 2021 at 6:27 PM Matthew H  wrote:

> There should not be any performance difference between an un-containerized
> version and a containerized one.
>
> The shift to containers makes sense, as this is the general direction that
> the industry as a whole is taking. I would suggest giving cephadm a try;
> it's relatively straightforward and significantly faster for deployments
> than ceph-ansible is.
>
> 
> From: Matthew Vernon 
> Sent: Wednesday, March 17, 2021 12:50 PM
> To: ceph-users 
> Subject: [ceph-users] ceph-ansible in Pacific and beyond?
>
> Hi,
>
> I caught up with Sage's talk on what to expect in Pacific (
> https://www.youtube.com/watch?v=PVtn53MbxTc ) and there was no mention
> of ceph-ansible at all.
>
> Is it going to continue to be supported? We use it (and uncontainerised
> packages) for all our clusters, so I'd be a bit alarmed if it was going
> to go away...
>
> Regards,
>
> Matthew
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd df results

2021-02-11 Thread Fox, Kevin M
+1


From: Marc 
Sent: Thursday, February 11, 2021 12:09 PM
To: ceph-users
Subject: [ceph-users] ceph osd df results


Shouldn't the ceph osd df output report these figures for every device class? I 
do not think that there are people mixing these classes in pools.

MIN/MAX VAR: 0.78/4.28  STDDEV: 6.15
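
For what it's worth, here is a rough sketch of computing those figures per 
device class yourself from the JSON output. It assumes the JSON form of 
ceph osd df exposes a device_class and a utilization field per OSD (it does on 
the releases I have looked at), and it only approximates the VAR/STDDEV columns.

    # Rough sketch: per-device-class utilization spread from `ceph osd df -f json`.
    # Assumes each OSD entry carries "device_class" and "utilization" fields.
    import json, statistics, subprocess

    out = subprocess.run(['ceph', 'osd', 'df', '-f', 'json'],
                         capture_output=True, check=True, text=True).stdout
    nodes = json.loads(out)['nodes']

    by_class = {}
    for osd in nodes:
        by_class.setdefault(osd['device_class'], []).append(osd['utilization'])

    for dev_class, utils in sorted(by_class.items()):
        mean = statistics.mean(utils)
        if mean == 0:
            continue  # skip a class whose OSDs hold no data yet
        print(f"{dev_class}: MIN/MAX VAR {min(utils)/mean:.2f}/{max(utils)/mean:.2f} "
              f"STDDEV {statistics.pstdev(utils):.2f}")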

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw bucket index issue

2021-02-02 Thread Fox, Kevin M
Ping


From: Fox, Kevin M 
Sent: Tuesday, December 29, 2020 3:17 PM
To: ceph-users@ceph.io
Subject: [ceph-users] radosgw bucket index issue

We have a fairly old cluster that has over time been upgraded to Nautilus. We 
were digging through some things and found 3 bucket indexes without a 
corresponding bucket. They should have been deleted but somehow were left 
behind. When we try to delete the bucket index, it is not allowed because the 
bucket is not found. The bucket index list command works fine without the 
bucket, though. Is there a way to delete the indexes? Maybe somehow relink the 
bucket so it can be deleted again?

Thanks,
Kevin
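
In case it is useful to anyone hitting the same thing, here is a read-only 
sketch of enumerating the index objects with the librados Python bindings so 
they can be cross-checked against the buckets that still exist. The pool name 
is an assumption (default.rgw.buckets.index on recent default deployments, 
.rgw.buckets.index on some older ones), and nothing here deletes anything; I 
would not remove an index object without double checking it against 
radosgw-admin bucket stats first.

    # Read-only sketch: list bucket index objects (named ".dir.<bucket marker>...")
    # in the index pool. Pool name is an assumption; adjust for your zone.
    import rados

    INDEX_POOL = 'default.rgw.buckets.index'   # older clusters may use '.rgw.buckets.index'

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(INDEX_POOL)
        for obj in ioctx.list_objects():
            if obj.key.startswith('.dir.'):
                # Compare the marker after ".dir." with the markers of existing buckets.
                print(obj.key)
        ioctx.close()
    finally:
        cluster.shutdown()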
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] radosgw bucket index issue

2020-12-29 Thread Fox, Kevin M
We have a fairly old cluster that has over time been upgraded to Nautilus. We 
were digging through some things and found 3 bucket indexes without a 
corresponding bucket. They should have been deleted but somehow were left 
behind. When we try to delete the bucket index, it is not allowed because the 
bucket is not found. The bucket index list command works fine without the 
bucket, though. Is there a way to delete the indexes? Maybe somehow relink the 
bucket so it can be deleted again?

Thanks,
Kevin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io