[ceph-users] Re: tuning for backup target cluster

2024-05-25 Thread Anthony D'Atri
> Hi Everyone, > > I'm putting together a HDD cluster with an ECC pool dedicated to the backup > environment. Traffic via s3. Version 18.2, 7 OSD nodes, 12 * 12TB HDD + > 1NVME each, QLC, man. QLC. That said, I hope you're going to use that single NVMe SSD for at least the index pool. Is
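A minimal sketch of pinning the RGW index pool to the NVMe device class, as suggested above (rule name, device class, and the default-zone pool name default.rgw.buckets.index are assumptions that depend on the deployment):

    # create a replicated rule restricted to the "nvme" (or "ssd") device class
    ceph osd crush rule create-replicated rgw-index-nvme default host nvme
    # point the index pool at it; data migrates once the rule is applied
    ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-nvme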

[ceph-users] Re: Ceph ARM providing storage for x86

2024-05-25 Thread Anthony D'Atri
Why not? The hardware architecture doesn't matter. > On May 25, 2024, at 07:35, filip Mutterer wrote: > > Is this known to be working: > > Setting up the Ceph Cluster with ARM and then use the storage with X86 > Machines for example LXC, Docker and KVM? > > Is this possible? > > Greetings > > filip >

[ceph-users] Re: Best practice regarding rgw scaling

2024-05-23 Thread Anthony D'Atri
I'm interested in these responses. Early this year a certain someone related having good results by deploying an RGW on every cluster node. This was when we were experiencing ballooning memory usage conflicting with K8s limits when running 3. So on the cluster in question we now run 25.
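For a cephadm-managed cluster, a hedged sketch of running one RGW per labelled host (the service id, label, and port are placeholders; Rook/K8s deployments use their own CRDs instead):

    # rgw.yaml -- apply with: ceph orch apply -i rgw.yaml
    service_type: rgw
    service_id: main
    placement:
      label: rgw          # every host carrying the "rgw" label gets a daemon
      count_per_host: 1   # raise to co-locate several radosgw daemons per host
    spec:
      rgw_frontend_port: 8080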

[ceph-users] Re: CephFS as Offline Storage

2024-05-21 Thread Anthony D'Atri
> I think it is his lab so maybe it is a test setup for production. Home production? > > I don't think it matters too much with scrubbing, it is not like it is related > to how long you were offline. It will scrub just as much being 1 month > offline as being 6 months offline. > >> >> If

[ceph-users] Re: CephFS as Offline Storage

2024-05-21 Thread Anthony D'Atri
If you have a single node arguably ZFS would be a better choice. > On May 21, 2024, at 14:53, adam.ther wrote: > > Hello, > > To save on power in my home lab can I have a single node CEPH cluster sit > idle and powered off for 3 months at a time then boot only to refresh > backups? Or will

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Anthony D'Atri
> > > I have additional questions, > We use 13 disk (3.2TB NVMe) per server and allocate one OSD to each disk. In > other words 1 Node has 13 osds. > Do you think this is inefficient? > Is it better to create more OSD by creating LV on the disk? Not with the most recent Ceph releases. I

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Anthony D'Atri
> On May 20, 2024, at 2:24 PM, Matthew Vernon wrote: > > Hi, > > Thanks for your help! > > On 20/05/2024 18:13, Anthony D'Atri wrote: > >> You do that with the CRUSH rule, not with osd_crush_chooseleaf_type. Set >> that back to the default value

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Anthony D'Atri
> >>> This has left me with a single sad pg: >>> [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive >>>pg 1.0 is stuck inactive for 33m, current state unknown, last acting [] >>> >> .mgr pool perhaps. > > I think so > >>> ceph osd tree shows that CRUSH picked up my racks OK,

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Anthony D'Atri
> On May 20, 2024, at 12:21 PM, Matthew Vernon wrote: > > Hi, > > I'm probably Doing It Wrong here, but. My hosts are in racks, and I wanted > ceph to use that information from the get-go, so I tried to achieve this > during bootstrap. > > This has left me with a single sad pg: > [WRN]

[ceph-users] Re: Please discuss about Slow Peering

2024-05-16 Thread Anthony D'Atri
If using jumbo frames, also ensure that they're consistently enabled on all OS instances and network devices. > On May 16, 2024, at 09:30, Frank Schilder wrote: > > This is a long shot: if you are using octopus, you might be hit by this > pglog-dup problem: >
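A quick way to verify end-to-end jumbo-frame consistency (interface name and peer address are placeholders):

    ip link show eth0 | grep -o 'mtu [0-9]*'    # confirm mtu 9000 on every node and switch port
    # 8972 = 9000 minus 28 bytes of IP+ICMP header; -M do forbids fragmentation
    ping -M do -s 8972 -c 3 10.0.0.2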

[ceph-users] Re: Upgrading Ceph Cluster OS

2024-05-14 Thread Anthony D'Atri
https://docs.ceph.com/en/latest/start/os-recommendations/#platforms You might want to go to 20.04, then to Reef, then to 22.04 > On May 13, 2024, at 12:22, Nima AbolhassanBeigi > wrote: > > The ceph version is 16.2.13 pacific. > It's deployed using ceph-ansible. (release branch stable-6.0)

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-12 Thread Anthony D'Atri
I halfway suspect that something akin to the speculation in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7MWAHAY7NCJK2DHEGO6MO4SWTLPTXQMD/ is going on. Below are reservations reported by a random OSD that serves (mostly) an EC RGW bucket pool. This is with the mclock

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-02 Thread Anthony D'Atri
>> For our customers we are still disabling mclock and using wpq. Might be >> worth trying. >> >> > Could you please elaborate a bit on the issue(s) preventing the > use of mClock. Is this specific to only the slow backfill rate and/or other > issue? > > This feedback would help prioritize

[ceph-users] Re: Recoveries without any misplaced objects?

2024-04-24 Thread Anthony D'Atri
Do you see *keys* aka omap traffic? Especially if you have RGW set up? > On Apr 24, 2024, at 15:37, David Orman wrote: > > Did you ever figure out what was happening here? > > David > > On Mon, May 29, 2023, at 07:16, Hector Martin wrote: >> On 29/05/2023

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-23 Thread Anthony D'Atri
> On Apr 23, 2024, at 12:24, Maged Mokhtar wrote: > > For nvme:HDD ratio, yes you can go for 1:10, or if you have extra slots you > can use 1:5 using smaller capacity/cheaper nvmes, this will reduce the impact > of nvme failures. On occasion I've seen a suggestion to mirror the fast

[ceph-users] Re: Status of IPv4 / IPv6 dual stack?

2024-04-23 Thread Anthony D'Atri
Sounds like an opportunity for you to submit an expansive code PR to implement it. > On Apr 23, 2024, at 04:28, Marc wrote: > >> I have removed dual-stack-mode-related information from the documentation >> on the assumption that dual-stack mode was planned but never fully >> implemented. >>

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-21 Thread Anthony D'Atri
me-series DB and watch both for drives nearing EOL and their burn rates. > > On Sun, Apr 21, 2024 at 11:06 PM Anthony D'Atri > wrote: >> >> A deep archive cluster benefits from NVMe too. You can use QLC up to 60TB >> in size, 32 of those in one RU makes for a cluste

[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-21 Thread Anthony D'Atri
Vendor lock-in only benefits vendors. You’ll pay outrageously for support / maint then your gear goes EOL and you’re trolling eBay for parts. With Ceph you use commodity servers, you can swap 100% of the hardware without taking downtime with servers and drives of your choice. And you get

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-21 Thread Anthony D'Atri
A deep archive cluster benefits from NVMe too. You can use QLC up to 60TB in size, 32 of those in one RU makes for a cluster that doesn’t take up the whole DC. > On Apr 21, 2024, at 5:42 AM, Darren Soothill wrote: > > Hi Niklaus, > > Lots of questions here but let me tray and get through

[ceph-users] Re: Upgrading Ceph 15 to 18

2024-04-21 Thread Anthony D'Atri
, > Malte > >> On 21.04.24 04:14, Anthony D'Atri wrote: >> The party line is to jump no more than 2 major releases at once. >> So that would be Octopus (15) to Quincy (17) to Reef (18). >> Squid (19) is due out soon, so you may want to pause at Quincy until Squid >>

[ceph-users] Re: Upgrading Ceph 15 to 18

2024-04-20 Thread Anthony D'Atri
The party line is to jump no more than 2 major releases at once. So that would be Octopus (15) to Quincy (17) to Reef (18). Squid (19) is due out soon, so you may want to pause at Quincy until Squid is released and has some runtime and maybe 19.2.1, then go straight to Squid from Quincy to

[ceph-users] Re: Mysterious Space-Eating Monster

2024-04-19 Thread Anthony D'Atri
Look for unlinked but open files, it may not be Ceph at fault. Suboptimal logrotate rules can cause this. lsof, fsck -n, etc. > On Apr 19, 2024, at 05:54, Sake Ceph wrote: > > Hi Matthew, > > Cephadm doesn't cleanup old container images, at least with Quincy. After a > upgrade we run the
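A couple of checks for the deleted-but-still-open case (paths are examples):

    # files whose link count is 0 but that some process still holds open
    lsof +L1
    # a large gap between these two usually points at exactly that
    df -h /var/log && du -sh /var/log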

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-19 Thread Anthony D'Atri
This is a ymmv thing, it depends on one's workload. > > However, we have some questions about this and are looking for some guidance > and advice. > > The first one is about the expected benefits. Before we undergo the efforts > involved in the transition, we are wondering if it is even

[ceph-users] Re: Performance of volume size, not a block size

2024-04-15 Thread Anthony D'Atri
If you're using SATA/SAS SSDs I would aim for 150-200 PGs per OSD as shown by `ceph osd df`. If NVMe, 200-300 unless you're starved for RAM. > On Apr 15, 2024, at 07:07, Mitsumasa KONDO wrote: > > Hi Menguy-san, > > Thank you for your reply. Users who use large IO with tiny volumes are a >
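Back-of-the-envelope sizing toward those targets (numbers are purely illustrative, and pools sharing the same OSDs split the budget between them):

    # target_pgs_per_osd * num_osds / replica_size, rounded to a power of two
    # e.g. 200 * 30 OSDs / 3 replicas = 2000  ->  pg_num 2048 for the pool holding most of the data
    ceph osd df                            # the PGS column shows the resulting per-OSD count
    ceph osd pool set rbd_pool pg_num 2048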

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Anthony D'Atri
If you're using an Icinga active check that just looks for SMART overall-health self-assessment test result: PASSED then it's not doing much for you. That bivalue status can be shown for a drive that is decidedly an ex-parrot. Gotta look at specific attributes, which is thorny since they
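A sketch of what to watch instead of the overall PASSED verdict (device paths are examples):

    smartctl -A /dev/sda     # SATA: alert on 5 Reallocated_Sector_Ct, 187 Reported_Uncorrect,
                             #       197 Current_Pending_Sector, 198 Offline_Uncorrectable
    smartctl -a /dev/nvme0   # NVMe: watch "Media and Data Integrity Errors" and "Percentage Used"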

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Anthony D'Atri
One can up the ratios temporarily but it's all too easy to forget to reduce them later, or think that it's okay to run all the time with reduced headroom. Until a host blows up and you don't have enough space to recover into. > On Apr 12, 2024, at 05:01, Frédéric Nass > wrote: > > > Oh, and
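The temporary bump and the revert look roughly like this (values are illustrative; the defaults are 0.85 / 0.90 / 0.95):

    ceph osd set-nearfull-ratio 0.87
    ceph osd set-backfillfull-ratio 0.92
    ceph osd set-full-ratio 0.96
    # ...and once space has been reclaimed, put the defaults back
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95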

[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread Anthony D'Atri
My understanding is that omap and EC are incompatible, though. > On Apr 8, 2024, at 09:46, David Orman wrote: > > I would suggest that you might consider EC vs. replication for index data, > and the latency implications. There's more than just the nvme vs. rotational > discussion to

[ceph-users] Re: Impact of Slow OPS?

2024-04-06 Thread Anthony D'Atri
ISTR that the Ceph slow op threshold defaults to 30 or 32 seconds. Naturally an op over the threshold often means there are more below the reporting threshold. 120s I think is the default Linux op timeout. > On Apr 6, 2024, at 10:53 AM, David C. wrote: > > Hi, > > Do slow ops impact

[ceph-users] Re: Bucket usage per storage classes

2024-04-04 Thread Anthony D'Atri
A bucket may contain objects spread across multiple storage classes, and AIUI the head object is always in the default storage class, so I'm not sure *exactly* what you're after here. > On Apr 4, 2024, at 17:09, Ondřej Kukla wrote: > > Hello, > > I’m playing around with Storage classes in

[ceph-users] Re: RBD image metric

2024-04-04 Thread Anthony D'Atri
> Istvan Szabo > Staff Infrastructure Engineer > Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com

[ceph-users] Re: question about rbd_read_from_replica_policy

2024-04-04 Thread Anthony D'Atri
Network RTT? > On Apr 4, 2024, at 03:44, Noah Elias Feldt wrote: > > Hello, > I have a question about a setting for RBD. > How exactly does "rbd_read_from_replica_policy" with the value "localize" > work? > According to the RBD documentation, read operations will be sent to the > closest OSD

[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Anthony D'Atri
t all, but IMO RGW > index/usage(/log/gc?) pools are always better off using asynchronous > recovery. > > Josh > > On Wed, Apr 3, 2024 at 1:48 PM Anthony D'Atri wrote: >> >> We currently have in src/common/options/global.yaml.in >> >> - name: osd_async_

[ceph-users] Re: RBD image metric

2024-04-03 Thread Anthony D'Atri
Depending on your Ceph release you might need to enable rbdstats. Are you after provisioned, allocated, or both sizes? Do you have object-map and fast-diff enabled? They speed up `rbd du` massively. > On Apr 3, 2024, at 00:26, Szabo, Istvan (Agoda) > wrote: > > Hi, > > Trying to pull out
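A hedged sketch of both pieces, assuming a pool named rbd_pool and an image named vm01:

    # per-image counters via the MGR prometheus module
    ceph config set mgr mgr/prometheus/rbd_stats_pools rbd_pool
    # object-map (which needs exclusive-lock) and fast-diff make `rbd du` fast
    rbd feature enable rbd_pool/vm01 exclusive-lock object-map fast-diff
    rbd object-map rebuild rbd_pool/vm01   # rebuild after enabling on an existing image
    rbd du rbd_pool/vm01                   # provisioned vs. used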

[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Anthony D'Atri
We currently have in src/common/options/global.yaml.in - name: osd_async_recovery_min_cost type: uint level: advanced desc: A mixture measure of number of current log entries difference and historical missing objects, above which we switch to use asynchronous recovery when
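To inspect or override that threshold at runtime (the value shown is purely illustrative; lower values make OSDs choose asynchronous recovery more readily):

    ceph config get osd osd_async_recovery_min_cost
    ceph config set osd osd_async_recovery_min_cost 10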

[ceph-users] Re: Questions about rbd flatten command

2024-04-02 Thread Anthony D'Atri
Do these RBD volumes have a full feature set? I would think that fast-diff and objectmap would speed this. > On Apr 2, 2024, at 00:36, Henry lol wrote: > > I'm not sure, but it seems that read and write operations are > performed for all objects in rbd. > If so, is there any method to apply

[ceph-users] Re: ceph status not showing correct monitor services

2024-04-01 Thread Anthony D'Atri
> a001s017.bpygfm(active, since 13M), standbys: a001s016.ctmoay Looks like you just had an mgr failover? Could be that the secondary mgr hasn't caught up with current events.

[ceph-users] Re: stretch mode item not defined

2024-03-26 Thread Anthony D'Atri
Yes, you will need to create datacenter buckets and move your host buckets under them. > On Mar 26, 2024, at 09:18, ronny.lippold wrote: > > hi there, need some help please. > > we are planning to replace our rbd-mirror setup and go to stretch mode. > the goal is, to have the cluster in 2
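A minimal sketch of the bucket moves (bucket, host, and mon names are placeholders):

    ceph osd crush add-bucket dc1 datacenter
    ceph osd crush add-bucket dc2 datacenter
    ceph osd crush move dc1 root=default
    ceph osd crush move dc2 root=default
    ceph osd crush move host-a datacenter=dc1
    ceph osd crush move host-b datacenter=dc2
    # stretch mode also wants the mons located, e.g.
    ceph mon set_location mon-a datacenter=dc1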

[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Anthony D'Atri
First try "ceph osd down 89" > On Mar 25, 2024, at 15:37, Alexander E. Patrakov wrote: > > On Mon, Mar 25, 2024 at 7:37 PM Torkil Svensgaard wrote: >> >> >> >> On 24/03/2024 01:14, Torkil Svensgaard wrote: >>> On 24-03-2024 00:31, Alexander E. Patrakov wrote: Hi Torkil, >>> >>> Hi

[ceph-users] Re: Are we logging IRC channels?

2024-03-23 Thread Anthony D'Atri
I fear this will raise controversy, but in 2024 what’s the value in perpetuating an interface from early 1980s BITnet batch operating systems? > On Mar 23, 2024, at 5:45 AM, Janne Johansson wrote: > >> Sure! I think Wido just did it all unofficially, but afaik we've lost >> all of those

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Anthony D'Atri
Perhaps emitting an extremely low value could have value for identifying a compromised drive? > On Mar 22, 2024, at 12:49, Michel Jouvin > wrote: > > Frédéric, > > We arrived at the same conclusions! I agree that an insane low value would be > a good addition: the idea would be that the

[ceph-users] Re: High OSD commit_latency after kernel upgrade

2024-03-22 Thread Anthony D'Atri
n this forum people recommend upgrading "M3CR046" > https://forums.unraid.net/topic/134954-warning-crucial-mx500-ssds-world-of-pain-stay-away-from-these/ > But actually in my ud cluster all the drives are "M3CR045" and have lower > latency. I'm really confused. >

[ceph-users] Re: High OSD commit_latency after kernel upgrade

2024-03-22 Thread Anthony D'Atri
https://askubuntu.com/questions/1454997/how-to-stop-sys-from-changing-usb-ssd-provisioning-mode-from-unmap-to-full-in-ub > On Mar 22, 2024, at 09:36, Özkan Göksu wrote: > > Hello! > >

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
37  ssd  18.19040  1.0   18 TiB   13 TiB   13 TiB  13 GiB  53 GiB  5.0 TiB  72.78  1.21  179  up
43  ssd  18.19040  1.0   18 TiB  8.9 TiB  8.8 TiB  17 GiB  23 GiB  9.3 TiB  48.71  0.81  178  up
                  TOTAL  873 TiB  527 TiB  525 Ti

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Anthony D'Atri
Grep through the ls output for ‘rados bench’ leftovers, it’s easy to leave them behind. > On Mar 20, 2024, at 5:28 PM, Igor Fedotov wrote: > > Hi Thorne, > > unfortunately I'm unaware of any tools high level enough to easily map files > to rados objects without deep undestanding how this
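One way to spot and clean them up (pool name is a placeholder):

    # rados bench leaves objects named benchmark_data_<host>_<pid>_object<N>
    rados -p cephfs_data ls | grep benchmark_data | head
    rados -p cephfs_data cleanup --prefix benchmark_data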

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
ult.rgw.jv-comm-pool.non-ec   64  32      0 B       0      0 B     0  61 TiB
default.rgw.jv-va-pool.data   65  32  4.8 TiB  22.17M   14 TiB  7.28  61 TiB
default.rgw.jv-va-pool.index  66  32   38 GiB     401  113 GiB  0.06  61 TiB
default.rg

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
> On Mar 20, 2024, at 14:42, Michael Worsham > wrote: > > Is there an easy way to poll a Ceph cluster to see how much space is available `ceph df` The exporter has percentages per pool as well. > and how much space is available per bucket? Are you using RGW quotas? > > Looking for a
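A hedged sketch of both views (bucket and user names are placeholders; size-suffix handling may vary by release):

    ceph df detail                                 # per-pool stored / max avail
    radosgw-admin bucket stats --bucket=backups    # per-bucket usage
    radosgw-admin quota set --quota-scope=bucket --uid=backupuser --max-size=1T
    radosgw-admin quota enable --quota-scope=bucket --uid=backupuser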

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Anthony D'Atri
Suggest issuing an explicit deep scrub against one of the subject PGs, see if it takes. > On Mar 20, 2024, at 8:20 AM, Michel Jouvin > wrote: > > Hi, > > We have a Reef cluster that started to complain a couple of weeks ago about > ~20 PGs (over 10K) not scrubbed/deep-scrubbed in time.
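For example (the PG id is a placeholder; take one from the health detail output):

    ceph health detail | grep 'not deep-scrubbed'
    ceph pg deep-scrub 6.1a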

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Anthony D'Atri
> Those files are VM disk images, and they're under constant heavy use, so yes- > there/is/ constant severe write load against this disk. Why are you using CephFS for an RBD application? ___ ceph-users mailing list -- ceph-users@ceph.io To

[ceph-users] Re: ceph osd different size to create a cluster for Openstack : asking for advice

2024-03-13 Thread Anthony D'Atri
NVMe (hope those are enterprise not client) drives aren't likely to suffer the same bottlenecks as HDDs or even SATA SSDs. And a 2:1 size ratio isn't the largest I've seen. So I would just use all 108 OSDs as a single device class and spread the pools across all of them. That way you won't

[ceph-users] Re: Remove cluster_network without routing

2024-03-07 Thread Anthony D'Atri
I think heartbeats will failover to the public network if the private doesn't work -- may not have always done that. >> Hi >> Cephadm Reef 18.2.0. >> We would like to remove our cluster_network without stopping the cluster and >> without having to route between the networks. >> global

[ceph-users] Re: bluestore_min_alloc_size and bluefs_shared_alloc_size

2024-03-06 Thread Anthony D'Atri
> On Feb 28, 2024, at 17:55, Joel Davidow wrote: > > Current situation > - > We have three Ceph clusters that were originally built via cephadm on octopus > and later upgraded to pacific. All osds are HDD (will be moving to wal+db on > SSD) and were resharded after the

[ceph-users] Re: Ceph is constantly scrubbing 1/4 of all PGs and still have pigs not scrubbed in time

2024-03-06 Thread Anthony D'Atri
I don't see these in the config dump. I think you might have to apply them to `global` for them to take effect, not just `osd`, FWIW. > I have tried various settings, like osd_deep_scrub_interval, osd_max_scrubs, > mds_max_scrub_ops_in_progress etc. > All those get ignored.
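A sketch of setting the intervals where the mons/mgrs also see them (the value, two weeks in seconds, is just an example):

    ceph config set global osd_deep_scrub_interval 1209600
    ceph config dump | grep -i scrub   # confirm nothing is still pinned at the osd level only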

[ceph-users] Re: Number of pgs

2024-03-05 Thread Anthony D'Atri
the right how many PG replicas are on each OSD. > On Mar 5, 2024, at 14:50, Nikolaos Dandoulakis wrote: > > Hi Anthony, > > I should have said, it’s replicated (3) > > Best, > Nick > > Sent from my phone, apologies for any typos! > From: Anthony D'Atri > Sen

[ceph-users] Re: Number of pgs

2024-03-05 Thread Anthony D'Atri
Replicated or EC? > On Mar 5, 2024, at 14:09, Nikolaos Dandoulakis wrote: > > Hi all, > > Pretty sure not the first time you see a thread like this. > > Our cluster consists of 12 nodes/153 OSDs/1.2 PiB used, 708 TiB /1.9 PiB avail > > The data pool is 2048 pgs big exactly the same number as

[ceph-users] Re: Help with deep scrub warnings

2024-03-05 Thread Anthony D'Atri
* Try applying the settings to global so that mons/mgrs get them. * Set your shallow scrub settings back to the default. Shallow scrubs take very few resources * Set your randomize_ratio back to the default, you’re just bunching them up * Set the load threshold back to the default, I can’t

[ceph-users] Re: OSDs not balanced

2024-03-04 Thread Anthony D'Atri
> I think the short answer is "because you have so wildly varying sizes > both for drives and hosts". Arguably OP's OSDs *are* balanced in that their PGs are roughly in line with their sizes, but indeed the size disparity is problematic in some ways. Notably, the 500GB OSD should just be

[ceph-users] Re: Question about erasure coding on cephfs

2024-03-02 Thread Anthony D'Atri
> On Mar 2, 2024, at 10:37 AM, Erich Weiler wrote: > > Hi Y'all, > > We have a new ceph cluster online that looks like this: > > md-01 : monitor, manager, mds > md-02 : monitor, manager, mds > md-03 : monitor, manager > store-01 : twenty 30TB NVMe OSDs > store-02 : twenty 30TB NVMe OSDs > >

[ceph-users] Re: has anyone enabled bdev_enable_discard?

2024-03-01 Thread Anthony D'Atri
I have a number of drives in my fleet with old firmware that seems to have discard / TRIM bugs, as in the drives get bricked. Much worse is that since they're on legacy RAID HBAs, many of them can't be updated. ymmv. > On Mar 1, 2024, at 13:15, Igor Fedotov wrote: > > I played with this

[ceph-users] Re: Seperate metadata pool in 3x MDS node

2024-02-24 Thread Anthony D'Atri
> > I'm designing a new Ceph storage from scratch and I want to increase CephFS > speed and decrease latency. > Usually I always build (WAL+DB on NVME with Sas-Sata SSD's) Just go with pure-NVMe servers. NVMe SSDs shouldn't cost much if anything more than the few remaining SATA or

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
> Low space hindering backfill (add storage if this doesn't resolve > itself): 21 pgs backfill_toofull ^^^ Ceph even told you what you need to do ;) If you have recovery taking place and the numbers of misplaced objects and *full PGs/pools keep decreasing, then yes, wait. As for

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
Your recovery is stuck because there are no OSDs that have enough space to accept data. Your second OSD host appears to only have 9 OSDs currently, so you should be able to add a 10TB OSD there without removing anything. That will enable data to move to all three of your 10TB OSDs. > On Feb

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
You aren’t going to be able to finish recovery without having somewhere to recover TO. > On Feb 24, 2024, at 10:33 AM, nguyenvand...@baoviet.com.vn wrote: > > Thank you, Sir. But i think i ll wait for PG BACKFILLFULL finish, my boss is > very angry now and will not allow me to add one more

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
You also might want to increase mon_max_pg_per_osd since you have a wide spread of OSD sizes. Default is 250. Set it to 1000. > On Feb 24, 2024, at 10:30 AM, Anthony D'Atri wrote: > > Add a 10tb HDD to the third node as I suggested, that will help your cluster. > > >

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
Add a 10TB HDD to the third node as I suggested; that will help your cluster. > On Feb 24, 2024, at 10:29 AM, nguyenvand...@baoviet.com.vn wrote: > > I will correct some small things: > > we have 6 nodes, 3 OSD nodes and 3 gateway nodes (which run RGW, MDS and NFS > service) > you are correct, 2/3

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
# ceph osd dump | grep ratio full_ratio 0.95 backfillfull_ratio 0.9 nearfull_ratio 0.85 Read the four sections here: https://docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-out-of-order-full > On Feb 24, 2024, at 10:12 AM, nguyenvand...@baoviet.com.vn wrote: > > Hi Mr Anthony,

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
There ya go. You have 4 hosts, one of which appears to be down and have a single OSD that is so small as to not be useful. Whatever cephgw03 is, it looks like a mistake. OSDs much smaller than, say, 1TB often aren’t very useful. Your pools appear to be replicated, size=3. So each of your

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
> > 2) It looks like you might have an interesting crush map. Allegedly you have > 41TiB of space but you can’t finish recovering because you have lots of PGs stuck as > their destination is too full. Are you running homogeneous hardware or do you > have different drive sizes? Are all the weights set

[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Anthony D'Atri
> you can sometimes find really good older drives like Intel P4510s on ebay > for reasonable prices. Just watch out for how much write wear they have on > them. Also be sure to update to the latest firmware before use, then issue a Secure Erase. >

[ceph-users] Re: Performance improvement suggestion

2024-02-20 Thread Anthony D'Atri
>> This situation will permit some rules to be relaxed (even if they are not >> ok at first). >> Likewise, there are already situations like lazyio that make some >> exceptions to standard procedures. >> >> Remembering: it's just a suggestion.

[ceph-users] Re: Performance improvement suggestion

2024-02-20 Thread Anthony D'Atri
It would be better if this feature could write the replica at a later time on a > selected pool. > Thanks. > Rafael. > > > > From: "Anthony D'Atri" > Sent: 2024/02/01 15:00:59 > To: quag...@bol.com.br > Cc: ceph-users@ceph.io > Subject: [ceph-users] Re:

[ceph-users] Re: PG stuck at recovery

2024-02-19 Thread Anthony D'Atri
>> After wrangling with this myself, both with 17.2.7 and to an extent with >> 17.2.5, I'd like to follow up here and ask: >> Those who have experienced this, were the affected PGs >> * Part of an EC pool? >> * Part of an HDD pool? >> * Both? > > Both in my case, EC is 4+2 jerasure blaum_roth

[ceph-users] Re: PG stuck at recovery

2024-02-19 Thread Anthony D'Atri
After wrangling with this myself, both with 17.2.7 and to an extent with 17.2.5, I'd like to follow up here and ask: Those who have experienced this, were the affected PGs * Part of an EC pool? * Part of an HDD pool? * Both? > > You don't say anything about the Ceph version you are running.

[ceph-users] Re: concept of ceph and 2 datacenters

2024-02-14 Thread Anthony D'Atri
Notably, the tiebreaker should be in a third location. > On Feb 14, 2024, at 05:16, Peter Sabaini wrote: > > On 14.02.24 06:59, Vladimir Sigunov wrote: >> Hi Ronny, >> This is a good starting point for your design. >> https://docs.ceph.com/en/latest/rados/operations/stretch-mode/ >> >> My

[ceph-users] Re: Unable to add OSD after removing completely

2024-02-13 Thread Anthony D'Atri
rives were automatically added as OSD, and the Cluster was returning to > normal state. Currently, the degraded PGs are recovering, > > Thank you > >> Anthony D'Atri wrote: >> You probably have the H330 HBA, rebadged LSI. You can

[ceph-users] Re: Unable to add OSD after removing completely

2024-02-12 Thread Anthony D'Atri
You probably have the H330 HBA, rebadged LSI. You can set the “mode” or “personality” using storcli / perccli. You might need to remove the VDs from them too. > On Feb 12, 2024, at 7:53 PM, sa...@dcl-online.com wrote: > > Hello, > > I have a Ceph cluster created by orchestrator Cephadm.

[ceph-users] Re: How to solve data fixity

2024-02-09 Thread Anthony D'Atri
ag for that as it does not contain md5 in case of multipart > upload. > > Michal > > > On 2/9/24 13:53, Anthony D'Atri wrote: >> You could use Lua scripting perhaps to do this at ingest, but I'm very >> curious about scrubs -- you have them turned off

[ceph-users] Re: How to solve data fixity

2024-02-09 Thread Anthony D'Atri
You could use Lua scripting perhaps to do this at ingest, but I'm very curious about scrubs -- you have them turned off completely? > On Feb 9, 2024, at 04:18, Michal Strnad wrote: > > Hi all! > > In the context of a repository-type project, we need to address a situation > where we cannot

[ceph-users] Re: Does it impact write performance when SSD applies into block.wal (not block.db)

2024-02-08 Thread Anthony D'Atri
> Hi everyone, > > I saw the bluestore can separate block.db, block.wal. > In my case, I'd like to apply hybrid device which uses SSD, HDD to improve > the small data write performance. > but I don't have enough SSD to cover block.db and block.wal. > so I think it can impact performance even

[ceph-users] Re: Performance issues with writing files to Ceph via S3 API

2024-02-08 Thread Anthony D'Atri
or pre-shard it in advance for the eventual size of the bucket. Recent releases have a feature that does this automatically, if it's enabled. My command of these dynamics is limited, so others on the list may be able to chime in with refinements. > > Thanks for the help already! > >
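A hedged sketch of checking and pre-sharding a large bucket (bucket name and shard count are placeholders; recent releases reshard automatically where rgw_dynamic_resharding is enabled):

    radosgw-admin bucket stats --bucket=big-bucket | grep num_shards
    radosgw-admin bucket reshard --bucket=big-bucket --num-shards=199
    radosgw-admin reshard status --bucket=big-bucket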

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Anthony D'Atri
> I have recently onboarded new OSDs into my Ceph Cluster. Previously, I had > 44 OSDs of 1.7TiB each and was using it for about a year. About 1 year ago, > we onboarded an additional 20 OSDs of 14TiB each. That's a big difference in size. I suggest increasing mon_max_pg_per_osd to 1000 --
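For instance (1000 is the value suggested above, not a universal recommendation; the default is 250):

    ceph config set global mon_max_pg_per_osd 1000
    ceph config get mon mon_max_pg_per_osd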

[ceph-users] Re: Improving CephFS performance by always putting "default" data pool on SSDs?

2024-02-04 Thread Anthony D'Atri
Anything on HDDs yields suboptimal performance. > On Feb 4, 2024, at 13:42, Niklas Hambüchen wrote: > > https://docs.ceph.com/en/reef/cephfs/createfs/ says: > >> The data pool used to create the file system is the “default” data pool and >> the location for storing all inode backtrace
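A minimal sketch of putting the "default" data pool (and metadata) on SSD while bulk data stays on HDD, assuming an ssd device class exists and using placeholder pool/filesystem names:

    ceph osd crush rule create-replicated ssd-rule default host ssd
    ceph osd pool create cephfs_meta 64
    ceph osd pool create cephfs_default_data 64   # small; holds the inode backtraces
    ceph osd pool set cephfs_meta crush_rule ssd-rule
    ceph osd pool set cephfs_default_data crush_rule ssd-rule
    ceph osd pool create cephfs_bulk_data 512     # HDD-backed pool for the actual file contents
    ceph fs new myfs cephfs_meta cephfs_default_data
    ceph fs add_data_pool myfs cephfs_bulk_data
    # then direct file data at the bulk pool with a layout, e.g.
    # setfattr -n ceph.dir.layout.pool -v cephfs_bulk_data /mnt/myfs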

[ceph-users] Re: How can I clone data from a faulty bluestore disk?

2024-02-03 Thread Anthony D'Atri
I’ve done the pg import dance a couple of times. It was very slow but did work ultimately. Depending on the situation, if there is one valid copy available, one can enable recovery by temporarily setting min_size on the pool to 1, reverting it once recovery completes. If you run with 1

[ceph-users] Re: Performance issues with writing files to Ceph via S3 API

2024-02-03 Thread Anthony D'Atri
The slashes don’t mean much if anything to Ceph. Buckets are not hierarchical filesystems. You speak of millions of files. How many millions? How big are they? Very small objects stress any object system. Very large objects may be multi part uploads that stage to slow media or otherwise

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Anthony D'Atri
You adjusted osd_memory_target? Higher than the default 4GB? > > > Another thing that we've found is that rocksdb can become quite slow if it > doesn't have enough memory for internal caches. As our cluster usage has > grown, we've needed to increase OSD memory in accordance with bucket
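For reference (the 8 GiB figure is just an example):

    # value is in bytes; the default is 4294967296 (4 GiB)
    ceph config set osd osd_memory_target 8589934592
    ceph config get osd.0 osd_memory_target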

[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread Anthony D'Atri
suggestion. > If this type of functionality is not interesting, that's ok. > > > > Rafael. > > > From: "Anthony D'Atri" > Sent: 2024/02/01 12:10:30 > To: quag...@bol.com.br > Cc: ceph-users@ceph.io > Subject: [ceph-users] Re: Performance improvement

[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread Anthony D'Atri
that's like 40 TIMES better density with SSDs. > However, I don't think it's interesting to lose the functionality of the > replicas. > I'm just suggesting another way to increase performance without losing > the functionality of replicas. > > > Rafael.

[ceph-users] Re: Performance improvement suggestion

2024-01-31 Thread Anthony D'Atri
I’ve heard conflicting asserts on whether the write returns with min_size shards have been persisted, or all of them. > On Jan 31, 2024, at 2:58 PM, Can Özyurt wrote: > > I never tried this myself but "min_size = 1" should do what you want to > achieve.

[ceph-users] Re: Performance improvement suggestion

2024-01-31 Thread Anthony D'Atri
Would you be willing to accept the risk of data loss? > On Jan 31, 2024, at 2:48 PM, quag...@bol.com.br wrote: > > Hello everybody, > I would like to make a suggestion for improving performance in Ceph > architecture. > I don't know if this group would be the best place or if my

[ceph-users] Re: crushmap rules :: host selection

2024-01-28 Thread Anthony D'Atri
> > so .. in a PG there are no "file data" but pieces of "file data"? Yes. Chapter 8 may help here, but be warned, it’s pretty dense and may confuse more than help. The foundation layer of Ceph is RADOS — services including block (RBD), file (CephFS), and object (RGW) storage are built on

[ceph-users] Re: crushmap rules :: host selection

2024-01-28 Thread Anthony D'Atri
> >>> so it depends on failure domain .. but with host failure domain, if there >>> is space on some other OSDs >>> will the missing OSDs be "healed" on the available space on some other OSDs? >> Yes, if you have enough hosts. When using 3x replication it is thus >> advantageous to have at

[ceph-users] Re: crushmap rules :: host selection

2024-01-28 Thread Anthony D'Atri
>> Oh! so the device class is more like an arbitrary label not a immutable >> defined property! >> looking at >> https://docs.ceph.com/en/reef/rados/operations/crush-map/#device-classes >> this is not specified … "By default, OSDs automatically set their class at startup to hdd, ssd, or
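A short sketch of relabelling an OSD's device class (OSD id and class are placeholders; the old class must be removed before a new one can be set):

    ceph osd crush rm-device-class osd.12
    ceph osd crush set-device-class nvme osd.12
    ceph osd crush class ls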

[ceph-users] Re: Ceph OSD reported Slow operations

2024-01-28 Thread Anthony D'Atri
> > Just a continuation of this mail, Could you help me out to understand the ceph > df output. PFA the screenshot with this mail. No idea what PFA means, but attachments usually don’t make it through on mailing lists. Paste text instead. > 1. Raw storage is 180 TB The sum of OSD total

[ceph-users] Re: crushmap rules :: host selection

2024-01-28 Thread Anthony D'Atri
> > First of all, thanks a lot for the info and taking time to help > a beginner :) Don't mention it. This is a community, it’s what we do. Next year you’ll help someone else. >>> > Oh! so the device class is more like an arbitrary label, not an immutable > defined property! > looking at

[ceph-users] Re: crushmap rules :: host selection

2024-01-27 Thread Anthony D'Atri

[ceph-users] Re: c-states and OSD performance

2024-01-26 Thread Anthony D'Atri
I’ve seen C-states impact mons by dropping a bunch of packets — on nodes that were lightly utilized so they transitioned a lot. Curiously both CPU and NIC generation seemed to be factors, as it only happened on one cluster out of a dozen or so. If by SSD you mean SAS/SATA SSDs, then the
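Ways people typically pin shallow C-states, offered as a hedged sketch rather than a recommendation:

    cat /sys/module/intel_idle/parameters/max_cstate   # what the kernel currently allows
    tuned-adm profile latency-performance               # RHEL-family shortcut
    # or on the kernel command line: intel_idle.max_cstate=1 processor.max_cstate=1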

[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Anthony D'Atri
> >>> forth), so this is why "ceph df" will tell you a pool has X free >>> space, where X is "smallest free space on the OSDs on which this pool >>> lies, times the number of OSDs". To be even more precise, this depends on the failure domain. With the typical "rack" failure domain, say you

[ceph-users] Re: Performance impact of Heterogeneous environment

2024-01-17 Thread Anthony D'Atri
Conventional wisdom is that with recent Ceph releases there is no longer a clear advantage to this. > On Jan 17, 2024, at 11:56, Peter Sabaini wrote: > > One thing that I've heard people do but haven't done personally with fast > NVMes (not familiar with the IronWolf so not sure if they

[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-17 Thread Anthony D'Atri
> Also in our favour is that the users of the cluster we are currently > intending for this have established a practice of storing large objects. That definitely is in your favor. > but it remains to be seen how 60x 22TB behaves in practice. Be sure you don't get SMR drives. > and it's

[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-16 Thread Anthony D'Atri
> > NVMe SSDs shouldn’t cost significantly more than SATA SSDs. Hint: certain > tier-one chassis manufacturers mark both the fsck up. You can get a better > warranty and pricing by buying drives from a VAR. > > We stopped buying “Vendor FW” drives a long time ago. Groovy.

[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-15 Thread Anthony D'Atri
by “RBD for cloud”, do you mean VM / container general-purposes volumes on which a filesystem is usually built? Or large archive / backup volumes that are read and written sequentially without much concern for latency or throughput? How many of those ultra-dense chassis in a cluster? Are all
