Hey Peter,
the /var/lib/ceph directories mainly contain "metadata" that, depending
on the ceph version and OSD setup, may even reside on tmpfs by
default.
Even if the data were on disk, it is easy to recreate:
Good morning,
in the spirit of the previous thread, I am wondering if anyone ever
succeeded in merging two separate ceph clusters into one?
Background from my side: we are running multiple ceph clusters in
k8s/rook, but we still have some Nautilus/Devuan based clusters that are
Hello dear fellow ceph users,
it seems that for some months all current ceph releases (16.x, 17.x,
18.x) have shipped with a bug in ceph-volume that causes disk
activation to fail with the error "IndexError: list index out of range"
(details below, [0]).
It also seems there is already a fix for it
We also had a look at this a few years ago when we flashed almost all
servers to boot iPXE from the nic directly. The idea was as follows:
iPXE -> http -> get kernel + initramfs -> mount rootfs
Our idea was to use RBD as a disk for the server, but to the best of my
knowledge, there is no
Hey Drew,
Drew Weaver writes:
> #1 cephadm or ceph-ansible for management?
> #2 Since the whole... CentOS thing... what distro appears to be the most
> straightforward to use with Ceph? I was going to try and deploy it on Rocky
> 9.
I would strongly recommend k8s+rook for new clusters,
Hello,
a note: we are running IPv6 only clusters since 2017, in case anyone has
questions. In earlier releases no tunings were necessary, later releases
need the bind parameters.
BR,
Nico
Stefan Kooman writes:
> On 15-09-2023 09:25, Robert Sander wrote:
>> Hi,
>> as the documentation sends
Morning,
we are running some ceph clusters with rook on bare metal and can very
much recommend it. You should have proper k8s knowledge, knowing how to
change objects such as configmaps or deployments, in case things go
wrong.
In regards to stability, the rook operator is written rather
Hey,
in case building from source does not work out for you, here is a
strategy we used to recover older systems before:
- Create a .tar from /, pipe it out via ssh to another host
- basically take everything with the exception of unwanted mountpoints
- Untar it, modify networking, hostname,
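A minimal sketch of that tar-over-ssh copy; the hostname, target path
and exclude list here are placeholders, not from the original mail:

```shell
# On the source host: stream the root filesystem to the target,
# skipping virtual filesystems and other unwanted mountpoints.
tar -cpf - --one-file-system \
    --exclude=./proc --exclude=./sys --exclude=./dev --exclude=./tmp \
    -C / . | ssh root@newhost 'tar -xpf - -C /mnt/newroot'
# Then, inside /mnt/newroot on the target: adjust /etc/hostname,
# the network configuration and host keys before booting.
```

The `--one-file-system` flag keeps tar from descending into other
mounts, which covers most of the "unwanted mountpoints" case already.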
e. So my answer to "how do I start over" would be "go figure
> it out, its an important lesson".
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: N
Hello Redouane,
much appreciated kick-off for improving cephadm. I was wondering why
cephadm does not use a similar approach to rook, in the sense of "repeat
until it is fixed"?
For the background, rook uses a controller that checks the state of the
cluster, the state of monitors, whether there
Good morning ceph community,
for quite some time I was wondering if it would not make sense to add an
iftop-like interface to ceph that shows network traffic / iops on a
per-IP basis?
I am aware of "rbd perf image iotop", however I am much more interested
into a combined metric featuring 1)
Good morning ceph-users,
we currently have one OSD based on a SATA SSD (750GB raw) that consumes
around 42 GB of RAM. The cluster status is HEALTH_OK, no rebalancing or
pg change.
I can go ahead and just kill it, but I was wondering if there is an easy
way to figure out why it is consuming so
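To narrow down where such an OSD's memory sits, the admin socket can
help; a sketch, with osd.0 standing in for the affected OSD:

```shell
# Per-pool memory accounting as seen by the daemon itself
# (run on the host carrying the OSD; osd.0 is a placeholder id).
ceph daemon osd.0 dump_mempools
# Compare against the target the OSD is supposed to honour:
ceph config get osd.0 osd_memory_target
```

If the mempools sum to far less than the resident size, the excess is
usually allocator fragmentation rather than tracked caches.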
Hey Oğuz,
the typical recommendations for native ceph still hold in k8s;
additionally, some things you need to consider:
- Hyperconverged setup or dedicated nodes - what is your workload and
budget
- Similar to native ceph, think about where you want to place data, this
influences the
Good morning,
we are trying to migrate a Ceph/Nautilus cluster
into kubernetes/rook/pacific [0]. Due to limitations in kubernetes we
probably need to change the cluster network range, which is currently
set to 2a0a:e5c0::/64.
My question to the list: did anyone already go through this?
My
Hey Erwin,
I'd recommend checking out the individual OSD performance in the slower
cluster. We have seen such issues with SSDs that wore out - it might
just be a specific OSD / pg that you are hitting.
Best regards,
Nico
Erwin Ceph writes:
> Hi,
>
> We do run several Ceph clusters, but one
Erik Lindahl writes:
>> On 20 Aug 2021, at 10:39, Nico Schottelius
>> wrote:
>>
>> I believe mid term everyone will need to provide their own image
>> registries, as the approach of "everything is at dockerhub|quay"
>> does not scale well.
>
ike everyone has to pull from quay.io now...
>
> Best regards,
> Stefan
>
> On 20 August 2021 08:32:01 CEST, Nico Schottelius
> wrote:
>>
>>Nigel Williams writes:
>>> not showing via podman either:
>>
>>A note to make it maybe a bit easier:
>>
Nigel Williams writes:
> not showing via podman either:
A note to make it maybe a bit easier:
You can see on
https://hub.docker.com/r/ceph/ceph/tags?page=1&ordering=last_updated
which images / tags exist. 15.2.14 has not yet been tagged/pushed.
Best regards,
Nico
--
Sustainable and modern
Hello Marc,
Marc writes:
> #ceph:ungleich.ch is not accessible at this time.
this might be a temporary issue - from which homeserver / domain are you
trying to join?
> Try again later, or ask a room admin to check if you have access.
>
> M_UNKNOWN was returned while trying to access the
nything. I'm running mimic 13.2.10 though.
>>
>> Best regards,
>> =====
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>>
>> From: Nico Schottelius
>> Sent: 11 August 2021 1
Hey Frank,
Frank Schilder writes:
> The recovery_sleep options are the next choice to look at. Increase it and
> clients will get more I/O time slots. However, with your settings, I'm
> surprised clients are impacted at all. I usually leave the op-priority at its
> default and use
Good morning,
after removing 3 osds which had been dead for some time,
rebalancing started this morning and makes client I/O really slow (in
the 10~30 MB/s area!). Rebalancing started at 1.2 ~ 1.6 Gb/s
after issuing
ceph tell 'osd.*' injectargs --osd-max-backfills=1
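The same runtime throttling extends to the other recovery knobs; the
values below are examples, not recommendations:

```shell
# Limit concurrent backfills and recovery ops per OSD at runtime:
ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=1
# Make recovery yield more time slots to client I/O:
ceph tell 'osd.*' injectargs --osd-recovery-sleep=0.1
```

Note that injectargs changes are lost on daemon restart; persist them
via ceph.conf or `ceph config set` once a good value is found.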
Hey David,
that is a normal process due to rebalancing. After the rebalancing is
done, you will have more space.
Best regards,
Nico
David Yang writes:
> There is also a set of mon+mgr+mds running on one of the storage nodes.
> David Yang 于2021年8月11日周三 上午11:24写道:
>
>> hi
>> I have a cluster
Brad Hubbard writes:
> On Tue, Jul 27, 2021 at 5:53 AM Nico Schottelius
> wrote:
> Can you clarify which IRC channel specifically you are referring to?
#ceph on oftc, I attached below the log of the last hours for reference.
01:50 -!- Guest2360 [~dc...@91-165-30-84.subs.proxad
Good morning Brad,
Brad Hubbard writes:
> On Tue, Jul 27, 2021 at 5:53 AM Nico Schottelius
> wrote:
>>
>>
>> Good evening dear mailing list,
>>
>> while I do think we have a great mailing list (this is one of the most
>> helpful open source ma
Good evening dear mailing list,
while I do think we have a great mailing list (this is one of the most
helpful open source mailing lists I'm subscribed to), I do agree with
the ceph IRC channel not being so helpful. The join/leave messages on
most days significantly exceed the number of real
HC,
we have seen a very similar problem some months ago on Nautilus, where
our cluster had multiple hours of slow client I/O. The "solution" was to
re-re-re-start most components. As we often had several OSDs pointed out
as slow, restarting one slow OSD after another *seemed* to help, however
later
Hey Sage,
Sage Weil writes:
> Thank you for bringing this up. This is in fact a key reason why the
> orchestration abstraction works the way it does--to allow other
> runtime environments to be supported (FreeBSD!
> sysvinit/Devuan/whatever for systemd haters!)
I would like you to stop
Hey Sage,
thanks for the reply.
Sage Weil writes:
> Rook is based on kubernetes, and cephadm on podman or docker. These
> are well-defined runtimes. Yes, some have bugs, but our experience so
> far has been a big improvement over the complexity of managing package
> dependencies across even
I think 2 things need to be clarified here:
> [...]
> Again, clean orchestration, being able to upgrade each deamon without
> influencing running ones, this is just not possible with the native
> packages.
If a daemon is running on an operating system, it does not reload shared
libraries or
Good evening,
as an operator running Ceph clusters based on Debian and later Devuan
for years and recently testing ceph in rook, I would like to chime in to
some of the topics mentioned here with short review:
Devuan/OS package:
- Over all the years changing from Debian to Devuan, changing
I have to say I am reading quite some interesting strategies in this
thread and I'd like to briefly take the time to compare them:
1) one by one osd adding
- least amount of pg rebalance
- will potentially re-rebalance data that has just been distributed
when the next OSD is phased in
- limits the
Dear Sasha, and everyone else as well,
Sasha Litvak writes:
> Podman containers will not restart due to restart or failure of centralized
> podman daemon. Container is not synonymous to Docker. This thread reminds
> me of systemd haters threads more and more, but I guess it is fine.
calling
Markus Kienast writes:
> Hi Nico,
>
> we are already doing exactly that:
>
> Loading initrd via iPXE
> which contains the necessary modules and scripts to boot an RBD boot dev.
> Works just fine.
Interesting and very good to hear. How do you handle kernel differences
(loaded kernel vs.
Hey Markus, Ilya,
you don't know with how much interest I am following this thread,
because ...
>> Generally it would be great if you could include the proper initrd code for
>> RBD and CephFS root filesystems to the Ceph project. You can happily use my
>> code as a starting point.
>>
>>
Reed Dier writes:
> I don't have a solution to offer, but I've seen this for years with no
> solution.
> Any time a MGR bounces, be it for upgrades, or a new daemon coming online,
> etc, I'll see a scale spike like is reported below.
Interesting to read that we are not the only ones.
>
Matt Larson writes:
> Is anyone trying Ceph clusters containing larger (4-8TB) SSD drives?
>
> 8TB SSDs are described here (
> https://www.anandtech.com/show/16136/qlc-8tb-ssd-review-samsung-870-qvo-sabrent-rocket-q
> ) and make use of QLC NAND flash memory to reach those costs and capacities.
>
Hello,
we have a recurring, funky problem with managers on Nautilus (and
probably also earlier versions): the manager displays incorrect
information.
This is a recurring pattern and it also breaks the prometheus graphs, as
the I/O is reported wildly incorrectly: "recovery: 43 TiB/s, 3.62k
I believe it was nautilus that started requiring
ms_bind_ipv4 = false
ms_bind_ipv6 = true
if you run IPv6 only clusters. OSDs prior to nautilus worked without
these settings for us.
I'm not sure if the port change (v1->v2) was part of luminous->nautilus
as well, but you might want to check
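Summarized as a ceph.conf fragment (the two bind options are the ones
mentioned above; placing them in [global] is an assumption):

```ini
# IPv6-only cluster, Nautilus and later
[global]
ms_bind_ipv4 = false
ms_bind_ipv6 = true
```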
Reed Dier writes:
> I don't have any performance bits to offer, but I do have one experiential
> bit to offer.
>
> My initial ceph deployment was on existing servers, that had LSI raid
> controllers (3108 specifically).
> We created R0 vd's for each disk, and had BBUs so were using write
Mark Lehrer writes:
>> One server has LSI SAS3008 [0] instead of the Perc H800,
>> which comes with 512MB RAM + BBU. On most servers latencies are around
>> 4-12ms (average 6ms), on the system with the LSI controller we see
>> 20-60ms (average 30ms) latency.
>
> Are these reads, writes, or a
Marc writes:
> This is what I have when I query prometheus, most hdd's are still sata
> 5400rpm, there are also some ssd's. I also did not optimize cpu frequency
> settings. (forget about the instance=c03, that is just because the data comes
> from mgr c03, these drives are on different
Marc writes:
>> For the background: we have many Perc H800+MD1200 [1] systems running
>> with
>> 10TB HDDs (raid0, read ahead, writeback cache).
>> One server has LSI SAS3008 [0] instead of the Perc H800,
>> which comes with 512MB RAM + BBU. On most servers latencies are around
>> 4-12ms
Good evening,
I have to tackle an old, probably recurring topic: HBAs vs. RAID
controllers. While generally speaking many people in the ceph field
recommend going with HBAs, it seems the only server in our
infrastructure that we phased in with an HBA instead of a RAID
controller is actually doing worse in
>
> The systemd part is only enabling and starting the service but the tmpfs
> part should work if you're not using systemd
>
> https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L212
>
> Dimitri
>
> On Monday, April 19, 2021, Nico Scho
temd
Linux distributions to activate the ceph-osds, let me know.
In any case, we'll publish our new style scripts on [0].
Best regards,
Nico
[0] https://code.ungleich.ch/ungleich-public/ungleich-tools
Nico Schottelius writes:
> Good morning,
>
> is there any documentation available regarding t
Good morning,
is there any documentation available regarding the meta data stored
within LVM that ceph-volume manages / creates?
My background is that ceph-volume activate does not work on non-systemd
Linux distributions, but if I know how to recreate the tmpfs, we can
easily start the osd
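For the archives, a sketch of roughly what activation does under the
hood, so it can be replayed without systemd (the OSD id and LV path are
placeholders; newer ceph-volume also offers `--no-systemd`):

```shell
ID=2                            # placeholder OSD id
LV=/dev/ceph-vg/osd-block-xyz   # placeholder LV, see "ceph-volume lvm list"
mkdir -p /var/lib/ceph/osd/ceph-$ID
mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-$ID
# Populate the tmpfs from the bluestore label stored on the LV:
ceph-bluestore-tool prime-osd-dir --dev $LV --path /var/lib/ceph/osd/ceph-$ID
ln -snf $LV /var/lib/ceph/osd/ceph-$ID/block
chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID
ceph-osd --id $ID               # or hook into your init system
```

The LVM tags that ceph-volume maintains carry the same osd id/fsid
information, which is what `prime-osd-dir` reads back from the device.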
and I was wondering if anyone can confirm or decline that?
Best regards,
Nico
Nico Schottelius writes:
> Update, posting information from other posts before:
>
> [08:09:40] server3.place6:~# ceph config-key dump | grep config/
> "config/global/auth_client_required": &
Good morning,
I've looked fairly intensively through the list and it seems we are
rather hard hit by this. It originally started yesterday on a mixed 14.2.9
and 14.2.16 cluster (osds, mons were all 14.2.16).
We started phasing in 7 new osds, 6 of them throttled by reweighting to
0.1.
Symptoms are
Good morning,
I was wondering if there are any timing indications as to how long a PG
should "usually" stay in a certain state?
For instance, how long should a pg stay in
- peering (seconds - minutes?)
- activating (seconds?)
- scrubbing (+deep)
The scrub process obviously depends on the
Frank Schilder writes:
> I think there are a couple of reasons for LVM OSDs:
>
> - bluestore cannot handle multi-path devices, you need LVM here
> - the OSD meta-data does not require a separate partition
However the meta-data is saved in a different LV, isn't it? I.e. isn't
practically the
Stefan Kooman writes:
> On 3/23/21 11:00 AM, Nico Schottelius wrote:
>> Stefan Kooman writes:
>>>> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
>>>> this?
>>>
>>> Do you have: osd_class_update_on_start enabled
Stefan Kooman writes:
>> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
>> this?
>
> Do you have: osd_class_update_on_start enabled?
So this one is a bit funky. It seems to be off, but the behaviour would
indicate it isn't. Checking the typical configurations:
Hello,
follow up from my mail from 2020 [0], it seems that OSDs sometimes have
"multiple classes" assigned:
[15:47:15] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush
rm-device-class osd.4
done removing class of osd(s): 4
[15:47:17] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd
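The usual way out, sketched here with osd.4 from the example above
("ssd" as the intended class is an assumption on my part):

```shell
# Drop whatever class the OSD currently carries, then set the right one:
ceph osd crush rm-device-class osd.4
ceph osd crush set-device-class ssd osd.4
# Keep OSDs from re-tagging themselves at startup:
ceph config set osd osd_class_update_on_start false
```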
Good evening,
I've seen the shift in ceph to focus more on LVM than on plain (direct)
access to disks. I was wondering what the motivation is for that.
From my point of view OSD disk layouts never change (they are re-added
if they do), so the dynamic approach of LVM is probably not the
On 2021-03-16 22:06, Stefan Kooman wrote:
On 3/16/21 6:37 PM, Stephen Smith6 wrote:
Hey folks - thought I'd check and see if anyone has ever tried to use
ephemeral (tmpfs / ramfs based) boot disks for Ceph nodes?
croit.io does that quite successfully I believe [1].
Same here at ungleich, all
Hello,
we have recently moved our ceph monitors from small, 4GB RAM servers to
big servers, because we saw memory pressure on the machines.
However even on our big machines (64GB ~ 1TB RAM) we are seeing ceph-mon
processes being killed at around 90-94GB of RAM.
Now, my understanding is
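A few places to look before the OOM killer strikes again; a sketch,
assuming the mon id equals the short hostname:

```shell
# What the monitor is configured to use vs. what it actually uses:
ceph config get mon mon_memory_target
ceph daemon mon.$(hostname -s) dump_mempools
# A bloated store.db is a common culprit; compaction sometimes helps:
ceph tell mon.$(hostname -s) compact
```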
Good evening,
since 2018 we have been using a custom script to create disks /
partitions, because at the time both ceph-disk and ceph-volume exhibited
bugs that made them unreliable for us.
We recently re-tested ceph-volume and while it seems generally speaking
[0] to work, using LVM seems to
Good evening Frank,
Frank Schilder writes:
> Hi Nico and Mark,
>
> your crush trees look indeed like they have been converted properly to
> using device classes already. Changing something within one device
> class should not influence placement in another. Maybe I'm overlooking
> something?
Frank Schilder writes:
>> To me it looks like the structure of both maps is pretty much the same -
>> or am I mistaken?
>
> Yes, but you are not Marc Roos. Do you work on the same cluster or do
> you observe the same problem?
No, but we recently also noticed that rebuilding one pool ("ssd")
Good morning,
you might have seen my previous mails and I wanted to discuss some
findings over the last day+night over what happened and why it happened
here.
As the system behaved inexplicably for us, we are now looking for
someone to analyse the root cause on consultancy basis - if you are
7c8055680
0x7f07c8053740 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0
l=1).handle_connect_reply_2 connect got BADAUTHORIZ
Nico Schottelius writes:
> So the same problem happens with pgs which are in "unknown" state,
>
> [19:31:08] black2.place6:~# ceph pg 2.5b2 query | tee qu
Hey Andreas,
thanks for the insights. Maybe a bit more background:
We are running a variety of pools, the majority of data is stored on the
"hdd" and "ssd" pools, which make use of the "ssd" and "hdd-big" (as in
3.5") classes.
Andreas John writes:
Hey Frank,
Frank Schilder writes:
>> > Is the crush map aware about that?
>>
>> Yes, it correctly shows the osds at server8 (previously server15).
>>
>> > I didn't ever try that, but don't you need to cursh move it?
>>
>> I originally imagined this, too. But as soon as the osd starts on a new
Hey Andreas,
Andreas John writes:
> Hey Nico,
>
> maybe you "pinned" the IP of the OSDs in question in ceph.conf to the IP
> of the old chassis?
That would be nice - unfortunately our ceph.conf is almost empty:
[22:11:59] server15.place6:/sys/class/block/sdg# cat /etc/ceph/ceph.conf
#
rsized+degraded
1 active+remapped+backfill_toofull
io:
client: 44 MiB/s rd, 4.2 MiB/s wr, 991 op/s rd, 389 op/s wr
recovery: 71 MiB/s, 18 objects/s
Nico Schottelius writes:
> Hello,
>
> after having moved 4 ssds to another host (+ the ceph tell hanging issue
Hey Andreas,
Andreas John writes:
> Hello,
>
> On 22.09.20 20:45, Nico Schottelius wrote:
>> Hello,
>>
>> after having moved 4 ssds to another host (+ the ceph tell hanging issue
>> - see previous mail), we ran into 241 unknown pgs:
>
> You mean, t
Hello,
after having moved 4 ssds to another host (+ the ceph tell hanging issue
- see previous mail), we ran into 241 unknown pgs:
cluster:
id: 1ccd84f6-e362-4c50-9ffe-59436745e445
health: HEALTH_WARN
noscrub flag(s) set
2 nearfull osd(s)
1
So the same problem happens with pgs which are in "unknown" state,
[19:31:08] black2.place6:~# ceph pg 2.5b2 query | tee query_2.5b2
hangs until the pg actually becomes active again. I assume that this
should not be the case, should it?
Nico Schottelius writes:
> Update
Update to the update: currently debugging why pgs are stuck in the
peering state:
[18:57:49] black2.place6:~# ceph pg dump all | grep 2.7d1
dumped all
2.7d1 1 00 0 0 69698617344
0 0 3002 3002
r/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
osd.64
osd.65
What's the best way to figure out why osd.63 does not react to the tell
command?
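A hedged sketch of one way to poke at it (osd.63 is from the mail above;
the commands are standard, the approach just one option):

```shell
# Bound the call so a stuck daemon does not block the whole loop:
timeout 10 ceph tell osd.63 injectargs --osd-max-backfills=4 \
    || echo "osd.63 did not answer within 10s"
# On the host running osd.63, the admin socket bypasses the messenger:
ceph daemon osd.63 status
ceph daemon osd.63 ops    # any ops stuck in flight?
```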
Best regards,
Nico
Nico Schottelius writes:
> Hello Stefan,
>
> St
Hello again,
following up on the previous mail, one cluster gets rather slow at the
moment and we have spotted something "funny":
When checking ceph pg dump we see some osds have HB peers with osds that
they should not have any pg in common with.
When restarting one of the affected osds, we
Hello Stefan,
Stefan Kooman writes:
> Hi,
>
>> However as soon as we issue either of the above tell commands, it just
>> hangs. Furthermore when ceph tell hangs, pg are also becoming stuck in
>> "Activating" and "Peering" states.
>>
>> It seems to be related, as soon as we stop ceph tell
Hello,
recently we wanted to re-adjust rebalancing speed in one cluster with
ceph tell osd.* injectargs '--osd-max-backfills 4'
ceph tell osd.* injectargs '--osd-recovery-max-active 4'
The first osds responded and after about 6-7 osds ceph tell stopped
progressing, just after it encountered a