> Oh, I just read your message again, and I see that I didn't answer your
> question. :D I admit I don't know how MAX AVAIL is calculated, and whether
> it takes things like imbalance into account (it might).
It does. It’s calculated relative to the most-full OSD in the pool, and the full_ratio, yes.
> On Jul 5, 2021, at 11:23 PM, Martin Verges wrote:
>
> Hello,
>
>> This is not easy to answer without all the details. But for sure there
> are cluster running with BGP in the field just fine.
>
> Out of curiosity, is there someone here that has his Ceph cluster running
> with BGP in pro
> For similar reasons, CentOS 8 stream, as opposed to every other CentOS
> released before, is very experimental. I would never go in production with
> CentOS 8 stream.
Is it, though? Was the experience really any different before “Stream” was
appended to the name? We still saw dot releases
Also, only one Ethernet port. Worse yet they have *zero* HIPPI ports! Can you imagine!?
> Never used HIPPI
Almost nobody has ;)
> A 48-port gigabit managed switch is reasonably accessible to the homegamer, both in terms of availability and cost.
USD249 from Netgear, interesting. Second-hand 10GbE switches
GCC, the whole toolchain, myriad dependencies, the ways that Python has
patterned itself after Java. Add in the way that the major Linux distributions
are moving targets and building / running on just one of them is a huge task,
not to mention multiple versions of each. And the way that system
>> - Bluestore requires OSD hosts with 8GB+ of RAM
With Filestore I found that in production I needed to raise vm.min_free_kbytes,
though inheriting the terrible mistake of -n size=65536 didn’t help.
A handful of years back WD Labs did their “microserver” project, a cluster of
504 drives with
> 3. Why is in this cephadm still being talked about systemd? Your orchestrator
> should handle restarts, namespaces and failed tasks not? There should be no
> need to have a systemd dependency, at least I have not seen any container
> images relying on this.
Podman uses systemd to manage conta
Thanks, Sage. This is a terrific distillation of the challenges and benefits.
FWIW here are a few of my own perspectives, as someone experienced with Ceph
but with limited container experience. To be very clear, these are
*perceptions* not *assertions*; my goal is discussion not argument. Fo
> Hi,
>
> as far as I understand it,
>
> you get no real benefit with doing them one by one, as each osd add, can
> cause a lot of data to be moved to a different osd, even tho you just
> rebalanced it.
Less than with older releases, but yeah.
I’ve known someone who advised against doing
> On Jun 15, 2021, at 10:26 AM, Andrew Walker-Brown
> wrote:
>
> With an unstable link/port you could see the issues you describe. Ping
> doesn’t have the packet rate for you to necessarily have a packet in transit
> at exactly the same time as the port fails temporarily. Iperf on the othe
>> Can you suggest me what is a good cephfs design?
One that uses copious complements of my employer’s components, naturally ;)
>> I've never used it, only
>> rgw and rbd we have, but want to give a try. However in the mail list I saw
>> a huge amount of issues with cephfs
Something to remembe
>> I wonder that when a osd came back from power-lost, all the data
>> scrubbing and there are 2 other copies.
>> PLP is important on mostly Block Storage, Ceph should easily recover
>> from that situation.
>> That's why I don't understand why I should pay more for PLP and other
>> protections.
>
>> On 6/3/21 5:18 PM, Dave Hall wrote:
>>> Anthony,
>>>
>>> I had recently found a reference in the Ceph docs that indicated
>> something
>>> like 40GB per TB for WAL+DB space. For a 12TB HDD that comes out to
>>> 480GB. If this is n
Agreed. I think oh …. maybe 15-20 years ago there was often a wider difference
between SAS and SATA drives, but with modern queuing etc. my sense is that
there is less of an advantage. Seek and rotational latency I suspect dwarf
interface differences wrt performance. The HBA may be a bigger
The choice depends on scale, your choice of chassis / form factor, budget,
workload and needs.
The sizes you list seem awfully small. Tell us more about your use-case.
OpenStack? Proxmox? QEMU? VMware? Converged? Dedicated ?
—aad
> On May 29, 2021, at 2:10 PM, by morphin wrote:
>
> Hello,
There is also a longstanding belief that using cpio saves you context switches
and data through a pipe. ymmv.
> On May 28, 2021, at 7:26 AM, Reed Dier wrote:
>
> I had it on my list of things to possibly try, a tar in | tar out copy to see
> if it yielded different results.
>
> On its face,
> It's not a 100% clear to me, but is the pdcache the same as the disk
> internal (non battery backed up) cache?
Yes, AIUI.
> As we are located very nearby the hydropower plant, we actually connect
> each server individually to an UPS.
Lucky you. I’ve seen an entire DC go dark with a power outa
I don’t have the firmware versions handy, but at one point around the 2014-2015
timeframe I found that both LSI’s firmware and storcli claimed that the default
setting was DiskDefault, ie. leave whatever the drive has alone. It turned
out, though, that for the 9266 and 9271, at least, behind t
I think what it’s saying is that it wants for more than one mgr daemon to be
provisioned, so that it can failover when the primary is restarted. I suspect
you would then run into the same thing with the mon. All sorts of things tend
to crop up on a cluster this minimal.
> On Apr 1, 2021, at
Depending on your kernel version, MemFree can be misleading. Attend to the
value of MemAvailable instead.
Your OSDs all look to be well below the target, I wouldn’t think you have any
problems. In fact 256GB for just 10 OSDs is an embarrassment of riches. What
type of drives are you using, a
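The MemFree-vs-MemAvailable point can be sketched with a quick parse of /proc/meminfo-style text. The sample values below are invented for illustration; on a live host you would read `/proc/meminfo` itself.

```python
# Quick check of MemFree vs MemAvailable. SAMPLE mimics /proc/meminfo;
# the numbers are invented. On a live host, read "/proc/meminfo" instead.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    9216000 kB
"""

def meminfo_kb(text):
    """Map each /proc/meminfo field name to its value in kB."""
    fields = {}
    for line in text.splitlines():
        key, rest = line.split(":", 1)
        fields[key] = int(rest.split()[0])
    return fields

info = meminfo_kb(SAMPLE)
# MemFree excludes reclaimable page cache; MemAvailable estimates what a new
# workload could actually obtain without swapping, so it can be far larger.
print(info["MemAvailable"] - info["MemFree"])  # 8704000 (kB reclaimable here)
```

A box with tiny MemFree but large MemAvailable is healthy; most of the "used" memory is cache the kernel will give back on demand.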
> On Mar 26, 2021, at 6:31 AM, Stefan Kooman wrote:
>
> On 3/9/21 4:03 PM, Jesper Lykkegaard Karlsen wrote:
>> Dear Ceph’ers
>> I am about to upgrade MDS nodes for Cephfs in the Ceph cluster (erasure code
>> 8+3 ) I am administrating.
>> Since they will get plenty of memory and CPU cores, I wa
> After you have filled that up, if such a host crashes or needs
> maintenance, another 80-100TB will need recreating from the other huge
> drives.
A judicious setting of mon_osd_down_out_subtree_limit can help mitigate the
thundering herd FWIW.
> I don't think there are specific limitations on
As Nathan describes, this information is maintained in the database on mon /
monitor nodes.
One always runs multiple mons in production, at least 3 and commonly 5. Each
has a full copy of everything, so that the loss of a node does not lose data or
impact operation.
BTW, it’s Ceph not CEPH
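The "at least 3, commonly 5" guidance boils down to simple majority arithmetic. A hypothetical helper, not part of any Ceph API:

```python
# Majority-quorum arithmetic: a quorum needs a strict majority, so N mons
# tolerate floor((N - 1) / 2) simultaneous failures. Illustrative only.
def mon_failures_tolerated(n_mons: int) -> int:
    return (n_mons - 1) // 2

for n in (1, 3, 5, 7):
    print(f"{n} mons -> quorum survives {mon_failures_tolerated(n)} failure(s)")
```

This is why 3 is the floor for production, and why 5 buys you headroom for planned maintenance on one mon plus one surprise failure.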
> I assume the limits are those that linux imposes. iops are the limits. One
> 20TB has 100 iops and 4x5TB have 400 iops. 400 iops serves more clients than
> 100 iops. You decide what you need/want to have.
>> Any other aspects on the limits of bigger capacity hard disk drives?
>
> Recovery wil
Which Ceph release are you running?
You mention the balancer, which would imply a certain lower bound.
What does `ceph balancer status` show?
>
> Does anyone know how I can rebalance my cluster to balance out the OSD
> usage?
>
>
>
> I just added 12 more 14Tb HDDs to my cluster (cluster
I’m at something of a loss to understand all the panic here.
Unless I’ve misinterpreted, CentOS isn’t killed, it’s being updated more
frequently. Want something stable? Freeze a repository into a local copy, and
deploy off of that. Like we all should be doing anyway, vs. relying on
slurping
With older releases, Michael Kidd’s log parser scripts were invaluable, notably
map_reporters_to_buckets.sh
https://github.com/linuxkidd/ceph-log-parsers
With newer releases, at least, one can send `dump_blocked_ops` to the OSD admin
socket. I collect these via Prometheus / node_exporter, it’
>> So if you are doing maintenance on a mon host in a 5 mon cluster you will
>> still have 3 in the quorum.
>
> Exactly. I was in exactly this situation, doing maintenance on 1 and screwing
> up number 2. Service outage
Been there. I had a cluster that nominally had 5 mons. Two suffered har
, etc)
>>> - NVMe device in other node dies
>>> - You lose data
>>>
>>> Although you can bring back the other node which was down but not broken
>>> you are missing data. The data on the NVMe devices in there is outdated
>>> and thus the PGs
Weighting up slowly so as not to DoS users. Huge omaps and EC. So yes you’re
actually agreeing with me.
>
> Taking a month to weight up a drive suggests the cluster doesn't have
> enough spare IO capacity.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
>> Why would I when I can get a 18TB Seagate IronWolf for <$600, a 18TB Seagate
>> Exos for <$500, or a 18TB WD Gold for <$600?
>
> IOPS
Some installations don’t care so much about IOPS.
Less-tangible factors include:
* Time to repair and thus to restore redundancy. When an EC pool of spi
> I searched each to find the section where 2x was discussed. What I found was
> interesting. First, there are really only 2 positions here: Micron's and Red
> Hat's. Supermicro copies Micron's positon paragraph word for word. Not
> surprising considering that they are advertising a Superm
>
> Maybe the weakest thing in that configuration is having 2 OSDs per node; osd
> nearfull must be tuned accordingly so that no OSD goes beyond about 0.45, so
> that in case of failure of one disk, the other OSD in the node has enough
> space for healing replication.
>
A careful setting of
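The ~0.45 figure in the quoted message follows from simple arithmetic, under the assumption that healing replication stays within the node. This is an illustrative model, not anything Ceph computes for you:

```python
# Illustrative arithmetic only: with k OSDs per node and healing confined to
# the node, the surviving k-1 OSDs absorb the failed OSD's data, so each
# must satisfy u * k / (k - 1) <= full_ratio beforehand.
def max_safe_utilization(osds_per_node: int, full_ratio: float = 0.95) -> float:
    k = osds_per_node
    return full_ratio * (k - 1) / k

print(round(max_safe_utilization(2), 3))  # 0.475; the quoted ~0.45 adds margin
```

With more OSDs per node the failed OSD's data spreads thinner, which is part of why 2 OSDs per node is an uncomfortable configuration.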
I’d be nervous about a plan to utilize a single volume, growing indefinitely.
I would think that from a blast radius perspective that you’d want to strike a
balance between a single monolithic blockchain-style volume vs a zillion tiny
files. Perhaps a strategy to shard into, say, 10 TB volumes
The survey team spent some time discussing the pros and cons of formats for a
number of the questions in the new survey. I think when we initially sent out
the first draft of the survey, that specific question was simple checkboxes, as
I think it had been in the previous year’s edition. The fi
I have firsthand experience migrating multiple clusters from Ubuntu to RHEL,
preserving the OSDs along the way, with no loss or problems.
It’s not like you’re talking about OpenVMS ;)
> On Jan 25, 2021, at 9:14 PM, Szabo, Istvan (Agoda)
> wrote:
>
> Hi,
>
> Is there anybody running a cluste
When the below was first published my team tried to reproduce, and couldn’t. A
couple of factors likely contribute to differing behavior:
* _Micron 5100_ for example isn’t a model, the 5100 _Eco_, _Pro_, and _Max_ are
different beasts. Similarly, implementation and firmware details vary by dri
>
> Hi,
>
> We are replacing HDD with SSD, and we first (gradually) drain (reweight) the
> HDDs with 0.5 steps until 0 = empty.
>
> Works perfectly.
>
> Then (just for kicks) I tried reducing HDD weight from 3.6 to 0 in one large
> step. That seemed to have had more impact on the cluster, a
Perhaps setting the object-map feature on the image, and/or running rbd
object-map rebuild? Though I suspect that might perform an equivalent process
and take just as long?
> On Dec 15, 2020, at 11:49 PM, 胡 玮文 wrote:
>
> Hi Andre,
>
> I once faced the same problem. It turns out that ceph nee
and pool
> shrinks automatically again? Or still any additional actions are required?
> —
> Max
>
>> On 13. Dec 2020, at 15:53, Anthony D'Atri wrote:
>>
>> rbd status
>> rbd info
>>
>> If the ‘journaling’ flag is enabled, use ‘rbd feature’ to re
Any chance you might have orphaned `rados bench` objects ? This happens more
than one might think.
`rados ls > /tmp/out`
Inspect the result. You should see a few administrative objects, some header
and data objects for the RBD volume. If you see a zillion with names like
`bench*` there’s y
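Counting the leftovers from a saved listing is a one-liner, under the assumption that the objects keep the default `benchmark_data_<host>_<pid>_object<N>` naming; the object names below are invented:

```python
# Count leftover `rados bench` objects in saved `rados -p <pool> ls` output.
# Assumes the default "benchmark_data" name prefix; names here are made up.
listing = """\
rbd_header.abc123
rbd_data.abc123.0000000000000001
benchmark_data_host1_12345_object0
benchmark_data_host1_12345_object1
"""

bench_objects = [o for o in listing.splitlines() if o.startswith("benchmark_data")]
print(len(bench_objects))  # 2
```

If the count is large, `rados -p <pool> cleanup` should reclaim the space; verify the actual prefix on your own cluster before deleting anything.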
This is what I do as well.
> You can also just use a single command:
>
> ceph osd crush add-bucket host room=
>> If so why the client op priority is default 63 and recovery op is 3? This
>> means that by default recovery op is more prioritize than client op!
>
> Exactly the opposite. Client ops take priority over recovery ops. And
> various other ops have priorities as described in the document I poi
That’s why it is commonly suggested to set recovery_op_priority to 1 if you
need to slow down recovery as well as the other values I sent you.
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#operations
> Thanks.
>
> On Wed, Dec 2, 2020 at 10:25 PM Anthony D'Atri
> Give my above understanding, all-to-all is no difference from
> one-to-all. In either case, PGs of one disk are remapped to others.
>
> I must be missing something seriously:)
It’s a bit subtle, but I think part of what Frank is getting at is that when
OSDs are backfilled / recovered sequent
FWIW
https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/backfill_reservation.rst
has some discussion of op priorities, though client ops aren’t mentioned
explicitly. If you like, enter a documentation tracker and tag me and I’ll
look into adding that.
> On Dec 2, 2020, at 9:56 AM,
In certain cases (Luminous) it can actually be faster to destroy an OSD and
recreate it than to let it backfill huge maps, but I think that’s been improved
by Nautilus.
You might also try setting
osd_op_queue_cut_off = high
to reduce the impact of recovery on client operations. This became t
Christian wrote “post Octopus”. The referenced code seems likely to appear in
Pacific. We’ll see how it works out in practice.
I suspect that provisioned space will automagically be used when an OSD starts
under a future release, though the release notes may give us specific
instructions, li
>>
>
> Here is the context.
> https://docs.ceph.com/en/latest/mgr/orchestrator/#replace-an-osd
>
> When disk is broken,
> 1) orch osd rm --replace [--force]
> 2) Replace disk.
> 3) ceph orch apply osd -i
>
> Step #1 marks OSD "destroyed". I assume it has the same effect as
> "ceph osd destr
>> When replacing an osd, there will be no PG remapping, and backfill
>>> will restore the data on the new disk, right?
>>
>> That depends on how you decide to go through the replacement process.
>> Usually without your intervention (e.g. setting the appropriate OSD
>> flags) the remapping will
This was my first thought too. Is it just this one drive, all drives on this
host, or all drives in the cluster?
I’m curious if stupid HBA tricks are afoot, if this is a SAS / SATA drive.
Especially if it’s a RAID-capable HBA vs passthrough.
>>> It might be an issue with the driver then rep
Context: JSON output was added to smartmontools 7 explicitly for Ceph use.
>
> I had to roll an upstream version of the smartmon tools because everything
> with redhat 7/8 was too old to support the json option.
>
I had hoped to stay out of this, but here I go.
> 4) SATA controller and PCIe throughput
SoftIron claims “wire speed” with their custom hardware FWIW.
> Unfortunately these are the kinds of things that you can't easily generalize
> between ARM vs x86. Some ARM processors are going to do wildl
Those are QLC, with low durability. They may work okay for your use case if
you keep an eye on lifetime, esp if your writes tend to sequential. Random
writes will eat them more quickly, as will of course EC.
Remember that recovery and balancing contribute to writes, and ask Micron for
the
Same problem:
“Versions
latest octopus nautilus
“
This week I had to look up Jewel, Luminous, and Mimic docs and had to do so at
GitHub.
>
>> Hello,
>> maybe I missed the announcement but why is the documentation of the
>> older ceph version not accessible anymore on docs.ceph.com
>
> It's
>
> I'm probably going to get crucified for this
Naw. The <> in your From: header, though ….
;)
> Am 11.11.20 um 11:20 schrieb Hans van den Bogert:
>> Hoping to learn from this myself, why will the current setup never work?
That was a bit harsh to have said. Without seeing your EC profile and the
topology, it’s hard to say for sure, but I suspect that adding another node
with at least o
Quoting in your message looks kind of messy so forgive me if I’m propagating
that below.
Honestly I agree that the Optanes will give diminishing returns at best for all
but the most extreme workloads (which will probably want to use NVMoF natively
anyway).
>>>
>>> This does split up the NVM
> I'm not entirely sure if primary on SSD will actually make the read happen on
> SSD.
My understanding is that by default reads always happen from the lead OSD in
the acting set. Octopus seems to (finally) have an option to spread the reads
around, which IIRC defaults to false.
I’ve never
10B as in ten bytes?
By chance have you run `rados bench` ? Sometimes a run is interrupted or one
forgets to clean up and there are a bunch of orphaned RADOS objects taking up
space, though I’d think `ceph df` would reflect that. Is your buckets.data
pool replicated or EC?
> On Oct 22, 2020
> Yeah, didn't think about a RAID10 really, although there wouldn't be enough
> space for 8x300GB = 2400GB WAL/DBs.
300 is overkill for many applications anyway.
>
> Also, using a RAID10 for WAL/DBs will:
> - make OSDs less movable between hosts (they'd have to be moved all
> together -
> Also, any thoughts/recommendations on 12TB OSD drives? For price/capacity
> this is a good size for us
Last I checked HDD prices seemed linear from 10-16TB. Remember to include the
cost of the drive bay, ie. the cost of the chassis, the RU(s) it takes up,
power, switch ports etc.
I’ll gu
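The point about counting the bay, not just the drive, is easy to make concrete. Every number below is an assumption to be replaced with your own quotes:

```python
# Back-of-envelope cost per raw TB that charges each drive its share of the
# chassis, rack units, power, and switch port. All figures are assumptions.
def cost_per_tb(drive_price: float, drive_tb: float, bay_overhead: float) -> float:
    """bay_overhead: amortized per-bay cost (chassis + RU + power + port)."""
    return (drive_price + bay_overhead) / drive_tb

# With an assumed $150/bay overhead, a pricier 16 TB drive can match a
# 12 TB drive on delivered $/TB despite costing more per TB of bare drive:
print(cost_per_tb(300, 12, 150))  # 37.5
print(cost_per_tb(450, 16, 150))  # 37.5
```

The bare drives work out to $25/TB vs $28/TB, but once the bay is charged in, the larger drive is no worse; with real chassis numbers it often wins.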
I wonder if this would be impactful, even if `nodown` were set. When a given
OSD latches onto
the new replication network, I would expect it to want to use it for heartbeats
— but when
its heartbeat peers aren’t using the replication network yet, they won’t be
reachable.
Unless something has
th the mon DB size you
expect, try removing or replacing that OSD and I’ll bet you have better results.
— aad
>
> mon stat same yes.
>
> now I finished the email it is 8.7Gb.
>
> I hope I didn't break anything and it will delete everything.
>
> Thank you
> _
I hope you restarted those mons sequentially, waiting between each for the
quorum to return.
Is there any recovery or pg autoscaling going on?
Are all OSDs up/in, ie. are the three numbers returned by `ceph osd stat` the
same?
— aad
> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda)
> wro
>>
>> Very nice and useful document. One thing is not clear for me, the fio
>> parameters in appendix 5:
>> --numjobs=<1|4> --iodepths=<1|32>
>> it is not clear if/when the iodepth was set to 32, was it used with all
>> tests with numjobs=4 ? or was it:
>> --numjobs=<1|4> --iodepths=1
> We have
Poking through the source I *think* the doc should indeed refer to the “dup”
function, vs “copy”. That said, arguably we shouldn’t have a section in the
docs that says "there’s this thing you can do but we aren’t going to tell you
how”.
Looking at the history / blame info, which only seems to
* Monitors now have a config option ``mon_osd_warn_num_repaired``, 10 by
default.
If any OSD has repaired more than this many I/O errors in stored data a
``OSD_TOO_MANY_REPAIRS`` health warning is generated.
Look at `dmesg` and the underlying drive’s SMART counters. You almost
certainly hav
Thanks, Mark.
I’m interested as well, wanting to provide block service to baremetal hosts;
iSCSI seems to be the classic way to do that.
I know there’s some work on MS Windows RBD code, but I’m uncertain if it’s
production-worthy, and if RBD namespaces suffice for tenant isolation — and are
>> If you guys have any suggestions about used hardware that can be a good fit
>> considering mainly low noise, please let me know.
>
> So we didn’t get these requirements initially, there’s no way for us to help
> you when the requirements aren’t available for us to consider, even if we had
>
> thx for taking care. I read "works as designed, be sure to have disk
> space for the mon available”.
Well, yeah ;)
> It sounds a little odd that the growth
> from 50MB to ~15GB + compaction space happens within a couple of
> seconds, when two OSD rejoin the cluster.
I’m suspicious — even on
>> I think you found the answer!
>>
>> When adding 100 new OSDs to the cluster, I increased both pg and pgp
>> from 4096 to 16,384
>>
>
> Too much for your cluster, 4096 seems sufficient for a pool of size 10.
> You can still reduce it relatively cheaply while it hasn't been fully
> actuated y
>> With today’s networking, _maybe_ a super-dense NVMe box needs 100Gb/s where
>> a less-dense probably is fine with 25Gb/s. And of course PCI lanes.
>>
>> https://cephalocon2019.sched.com/event/M7uJ/affordable-nvme-performance-on-ceph-ceph-on-nvme-true-unbiased-story-to-fast-ceph-wido-den-holl
Apologies for not consolidating these replies. My MUA is not my friend today.
> With 10 NVMe drives per node, I'm guessing that a single EPYC 7451 is
> going to be CPU bound for small IO workloads (2.4c/4.8t per OSD), but
> will be network bound for large IO workloads unless you are sticking
> 2x1
> How they did it?
You can create partitions / LVs by hand and build OSDs on them, or you can use
ceph-volume lvm batch --osds-per-device
> I have an idea to create a new bucket type under host, and put two LV from
> each ceph osd VG into that new bucket. Rules are the same (different host),
> That's pretty much the advice I've been giving people since the Inktank days.
> It costs more and is lower density, but the design is simpler, you are less
> likely to under provision CPU, less likely to run into memory bandwidth
> bottlenecks, and you have less recovery to do when a node f
> we use heavily bonded interfaces (6x10G) and also needed to look at this
> balancing question. We use LACP bonding and, while the host OS probably tries
> to balance outgoing traffic over all NICs
> I tested something in the past[1] where I could notice that an osd
> staturated a bond link an
Depending what you’re looking to accomplish, setting up a cluster in VMs
(VirtualBox, Fusion, cloud provider, etc) may meet your needs without having to
buy anything.
>
> - Don't think having a few 1Gbit can replace a >10Gbit. Ceph doesn't use
> such bonds optimal. I already asked about this y
Is this a reply to Paul’s message from 11 months ago?
https://bit.ly/32oZGlR
The PM1725b is interesting in that it has explicitly configurable durability vs
capacity, which may be even more effective than user-level short-stroking /
underprovisioning.
>
> Hi. How do you say 883DCT is faster
If you have capacity to have both online at the same time, why not add the SSDs
to the existing pool, let the cluster converge, then remove the HDDs? Either
all at once or incrementally? With care you’d have zero service impact. If
you want to change the replication strategy at the same time,
Now that’s a *very* different question from numbers assigned during an install.
With recent releases instead of going down the full removal litany listed
below, you can instead down/out the OSD and `destroy` it. That preserves the
CRUSH bucket and OSD ID, then when you use ceph-disk, ceph-volu
FWIW a handful of years back there was a bug in at least some LSI firmware
where the setting “Disk Default” silently turned the volatile cache *on*
instead of the documented behavior, which was to leave alone.
> On Sep 3, 2020, at 8:13 AM, Reed Dier wrote:
>
> It looks like I ran into the same
>
> huxia...@horebdata.cn
>
> From: Anthony D'Atri
> Date: 2020-09-05 20:00
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] PG number per OSD
> One factor is RAM usage, that was IIRC the motivation for the lowering of the
> recommendat
One factor is RAM usage, that was IIRC the motivation for the lowering of the
recommendation of the ratio from 200 to 100. Memory needs also increase during
recovery and backfill.
When calculating, be sure to consider replicas.
ratio = (pgp_num x replication) / num_osds
As HDDs grow the inte
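The ratio above as runnable arithmetic, with illustrative numbers:

```python
# PGs per OSD from the formula above: (pgp_num x replication) / num_osds.
# 4096 PGs, 3x replication, and 100 OSDs are illustrative values.
def pg_per_osd(pgp_num: int, replication: int, num_osds: int) -> float:
    return pgp_num * replication / num_osds

print(pg_per_osd(4096, 3, 100))  # 122.88
```

A result near 100 sits in the recommended range; remember to sum across all pools sharing the same OSDs.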
>
> Looking for a bit of guidance / approach to upgrading from Nautilus to
> Octopus considering CentOS and Ceph-Ansible.
>
> We're presently running a Nautilus cluster (all nodes / daemons 14.2.11 as
> of this post).
> - There are 4 monitor-hosts with mon, mgr, and dashboard functions
> consol
Is your MUA wrapping lines, or is the list software?
As predicted. Look at the VAR column and the STDDEV of 37.27
> On Aug 27, 2020, at 9:02 AM, Dallas Jones wrote:
>
> 1 122.79410- 123 TiB 42 TiB 41 TiB 217 GiB 466 GiB 81
> TiB 33.86 1.00 -root default
> -3
Doubling the capacity in one shot was a big topology change, hence the 53%
misplaced.
OSD fullness will naturally reflect a bell curve; there will be a tail of
under-full and over-full OSDs. If you’d not said that your cluster was very
full before expansion I would have predicted it from the f
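A toy model of why doubling capacity yields roughly 50% misplaced: with uniform weights, the expected fraction of PG mappings that move is about the new capacity's share of the total weight. This is a simplification, not CRUSH itself:

```python
# Toy model, not CRUSH: with uniform weights, the expected fraction of PG
# mappings that land on newly added capacity is its share of total weight.
def expected_misplaced(old_weight: float, added_weight: float) -> float:
    return added_weight / (old_weight + added_weight)

print(expected_misplaced(100, 100))  # 0.5 -> doubling gives ~50% misplaced
print(expected_misplaced(100, 10))   # ~0.09 for a modest expansion
```

Adding capacity in smaller tranches keeps each rebalance proportionally smaller, at the cost of moving some data more than once.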
> I wanna limit the traffic of specific buckets. Can haproxy, nginx or any
> other proxy software deal with it ?
Yes. I’ve seen it done.
My understanding is that the existing mon_clock_drift_allowed value of 50 ms
(default) is so that PAXOS among the mon quorum can function. So OSDs (and
mgrs, and clients etc) are out of scope of that existing code.
Things like this are why I like to ensure that the OS does `ntpdate -b` or
equi
One way this can happen is if you have the default setting
osd_scrub_during_recovery=false
If you’ve been doing a lot of [re]balancing, drive replacements, topology
changes, expansions, etc. scrubs can be starved especially if you’re doing EC
on HDDs.
HDD or SSD OSDs? Replication or E
This is a natural condition of CRUSH. You don’t mention what release the
back-end or the clients are running so it’s difficult to give an exact answer.
Don’t mess with the CRUSH weights.
Either adjust the override / reweights with `ceph osd
test-reweight-by-utilization / reweight-by-utilizatio
>> In the past, the minimum recommendation was 1GB RAM per HDD blue store OSD.
There was a rule of thumb of 1GB RAM *per TB* of HDD Filestore OSD, perhaps you
were influenced by that?
Bear in mind that ARM and x86 are architectures, not CPU models. Both are
available in a vast variety of core counts, clocks, and implementations.
Eg., an 80 core Ampere Altra likely will smoke a Intel Atom D410 in every way.
That said, what does “performance” mean? For object storage, it mig
min_compat is a different thing entirely.
You need to set the tunables as a group. This will cause data to move, so you
may wish to throttle recovery, model the PG movement ahead of time, use the
upmap trick to control movement etc.
https://ceph.io/geen-categorie/set-tunables-optimal-on-ce
> That is an interesting point. We are using 12 on 1 nvme journal for our
> Filestore nodes (which seems to work ok). The workload for wal + db is
> different so that could be a factor. However when I've looked at the IO
> metrics for the nvme it seems to be only lightly loaded, so does not ap
> Thanks for the thinking. By 'traffic' I mean: when a user space rbd
> write has as a destination three replica osds in the same chassis
eek.
> does the whole write get shipped out to the mon and then back
Mons are control-plane only.
> All the 'usual suspects' like lossy ethernets and mis
What does “traffic” mean? Reads? Writes will have to hit the net regardless
of any machinations.
> On Jun 29, 2020, at 7:31 PM, Harry G. Coin wrote:
>
> I need exactly what ceph is for a whole lot of work, that work just
> doesn't represent a large fraction of the total local traffic.
M=1 is never a good choice. Just use replication instead.
> On Jun 26, 2020, at 3:05 AM, Zhenshi Zhou wrote:
>
> Hi Janne,
>
> I use the default profile(2+1) and set failure-domain=host, is my best
> practice?
>
> Janne Johansson 于2020年6月26日周五 下午4:59写道:
>
>> Den fre 26 juni 2020 kl 10:3
The benefit of disabling on-drive cache may be at least partly dependent on the
HBA; I’ve done testing of one specific drive model and found no difference,
where someone else reported a measurable difference for the same model.
> Good to know that we're not alone :) I also looked for a newer fir
>> I can remember reading this before. I was hoping you maybe had some
>> setup with systemd scripts or maybe udev.
>
> Yeah, doing this on boot up would be ideal. I was looking really hard into
> tuned and other services that claimed can do it, but required plugins or
> other stuff did/does n