[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Anthony D'Atri
> In this context, I find it quite disturbing that nobody is willing even to > discuss an increase of the release cycle from say 2 to 4 years. What is so > important about pumping out one version after the other that real issues > caused by this speed are ignored? One factor I think is that

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread 胡 玮文
Hi Patrick, One of the stuck clients has num_caps at around 269700, which is well above the number of files opened on the client (about 9k). See my reply to Dan for details. So I don't think this warning is simply caused by "mds_min_caps_working_set" being set too low. > -Original Message- > From:

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread 胡 玮文
Thanks Dan, I chose one of the stuck clients to investigate; as shown below, it currently holds ~269700 caps, which is pretty high for no obvious reason. I cannot understand most of the output, and failed to find any documentation about it. # ceph tell mds.cephfs.gpu018.ovxvoz client ls

[ceph-users] Re: [EXTERNAL] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Christian Wuerdig
I think Marc uses containers - but they've chosen Apache Mesos as orchestrator and cephadm doesn't work with that. Currently essentially two ceph container orchestrators exist - rook, which is a ceph orchestrator for kubernetes, and cephadm, which is an orchestrator expecting docker or podman. Admittedly I

[ceph-users] Re: [EXTERNAL] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Tony Liu
Instead of complaining, taking some time to learn more about containers would help. Tony From: Marc Sent: November 18, 2021 10:50 AM To: Pickett, Neale T; Hans van den Bogert; ceph-users@ceph.io Subject: [ceph-users] Re: [EXTERNAL] Re: Why you might want

[ceph-users] bluestore_quick_fix_on_mount

2021-11-18 Thread Lindsay Mathieson
How does one read/set that from the command line? Thanks, Lindsay
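For reference, a minimal sketch using the generic config interface; note that in recent releases the option is usually spelled bluestore_fsck_quick_fix_on_mount, so treat the exact name as something to confirm with "ceph config help":

ceph config help bluestore_fsck_quick_fix_on_mount          # confirm the option exists and see its description
ceph config get osd bluestore_fsck_quick_fix_on_mount       # read the current value for the osd daemons
ceph config set osd bluestore_fsck_quick_fix_on_mount false # set it cluster-wide for the osd daemons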

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread Patrick Donnelly
On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 wrote: > > Hi all, > > We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster, it > seems harmless, but we cannot get HEALTH_OK, which is annoying. > > The clients that are reported failing to respond to cache pressure are > constantly

[ceph-users] Re: This week: Ceph User + Dev Monthly Meetup

2021-11-18 Thread Neha Ojha
Hello, I don't think the meeting was recorded but there are detailed notes in https://pad.ceph.com/p/ceph-user-dev-monthly-minutes. The next meeting is scheduled for December 16, feel free to add your discussion topic to the agenda. Thanks, Neha On Thu, Nov 18, 2021 at 11:04 AM Szabo,

[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-18 Thread Kai Börnert
Hi, do you use more nodes than deployed mgrs and cephadm? If so, it might be that the node you are connecting to no longer has an instance of the mgr running, and you are only getting some leftovers from the browser cache. At least this was happening in my test cluster, but I was always able to
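A quick way to check which node is actually serving the dashboard (a minimal sketch; no cluster-specific names assumed):

ceph mgr stat      # shows the currently active mgr daemon
ceph mgr services  # prints the URL(s) the dashboard module is served from
ceph mgr fail      # optional: force a failover if the active mgr looks wedged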

[ceph-users] Dashboard's website hangs during loading, no errors

2021-11-18 Thread Zach Heise (SSCC)
Hello! Our test cluster is a few months old, was initially set up from scratch with Pacific, and has now had two separate small point releases applied to it: 16.2.5 and then, a couple of weeks ago, 16.2.6. The issue I'm describing has been present

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread Sage Weil
It looks like the bug has been there since the read leases were introduced, which I believe was octopus (15.2.z) s On Thu, Nov 18, 2021 at 3:55 PM huxia...@horebdata.cn wrote: > May i ask, which versions are affected by this bug? and which versions are > going to receive backports? > > best

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread huxia...@horebdata.cn
May I ask, which versions are affected by this bug? And which versions are going to receive backports? best regards, samuel huxia...@horebdata.cn From: Sage Weil Date: 2021-11-18 22:02 To: Manuel Lausch; ceph-users Subject: [ceph-users] Re: OSD spend too much time on "waiting for readable"

[ceph-users] November Ceph Science Virtual User Group

2021-11-18 Thread Kevin Hrpcek
Hey all, We will be having a Ceph science/research/big cluster call on Wednesday November 24th. If anyone wants to discuss something specific they can add it to the pad linked below. If you have questions or comments you can contact me. This is an informal open call of community members

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread Sage Weil
Okay, good news: on the osd start side, I identified the bug (and easily reproduced locally). The tracker and fix are: https://tracker.ceph.com/issues/53326 https://github.com/ceph/ceph/pull/44015 These will take a while to work through QA and get backported. Also, to reiterate what I said

[ceph-users] Re: erasure coded pool PG stuck inconsistent on ceph Pacific 15.2.13

2021-11-18 Thread Wesley Dillingham
That response is typically indicative of a pg whose OSD set has changed since it was last scrubbed (typically from a disk failing). Are you sure it's actually getting scrubbed when you issue the scrub? For example you can issue: "ceph pg query" and look for "last_deep_scrub_stamp", which will
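A minimal sketch of that check, with 3.1f as a placeholder pgid (substitute your own):

ceph pg 3.1f query | grep -E 'last_(deep_)?scrub_stamp'  # when the pg was last scrubbed / deep-scrubbed
ceph pg deep-scrub 3.1f                                   # re-issue the deep scrub and watch the stamp change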

[ceph-users] A middle ground between containers and 'lts distros'?

2021-11-18 Thread Harry G. Coin
I sense the concern about ceph distributions via containers generally has to do with what you might call a feeling of 'opaqueness'. The feeling is amplified as most folks who choose open source solutions prize being able to promptly address the particular concerns affecting them without

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> > If you're building a ceph cluster, the state of a single node shouldn't > matter. Docker crashing should not be a show stopper. > You remind me of this senior software engineer at redhat who told me it was not that big of a deal that ceph.conf got deleted and the root fs was mounted via a

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> > Please remember, free software still comes with a price. You can not > expect someone to work on your individual problem while being cheap on > your highly critical data. If your data has value, then you should > invest in ensuring data safety. There are companies out there, paying Ceph >

[ceph-users] Re: [EXTERNAL] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> We also use containers for ceph and love it. If for some reason we > couldn't run ceph this way any longer, we would probably migrate > everything to a different solution. We are absolutely committed to > containerization. I wonder if you are really using containers. Are you not just using

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> > docker itself is not the problem, I would even argue the opposite. If the docker daemon crashes it takes down all containers. Sorry, but these days this is really not necessary given the alternatives.
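For what it is worth, Docker can at least be told to keep containers running while the daemon is down; a minimal sketch, assuming a stock /etc/docker/daemon.json (this mitigates, but does not remove, the single-daemon dependency that podman avoids):

# /etc/docker/daemon.json (assumed empty before this change)
{ "live-restore": true }
# apply without stopping running containers
systemctl reload docker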

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> On 17.11.21 at 20:14 Marc wrote: > >> a good choice. It lacks RBD encryption and read leases. But for us > >> upgrading from N to O or P is currently not > >> > > what about just using osd encryption with N? > > > That would be Data at Rest encryption only. The keys for the OSDs are > stored

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Would it be worth setting the OSD I removed back to "in" (or whatever the opposite of "out" is) and seeing if things recovered? On Thu, Nov 18, 2021 at 3:44 PM David Tinker wrote: > Tx. # ceph version > ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus > (stable) > > > > On
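Marking the OSD back in is non-destructive and easy to reverse; a minimal sketch, assuming osd.7 as in the rest of this thread:

ceph osd in osd.7   # reverse of "ceph osd out osd.7"; data backfills back onto it
ceph osd out osd.7  # can simply be issued again later once the cluster is healthy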

[ceph-users] Re: cephadm / ceph orch : indefinite hang adding hosts to new cluster

2021-11-18 Thread Lincoln Bryant
Hi all, Just to close the loop on this one - we ultimately found that there was an MTU misconfiguration between the hosts that was causing Ceph and other things to fail in strange ways. After fixing the MTU, cephadm etc immediately started working. Cheers, Lincoln
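In case it helps others hitting the same symptom, a minimal sketch of checking for an MTU mismatch between two hosts (the interface name, peer hostname and 9000-byte MTU are placeholders):

ip link show eth0 | grep -o 'mtu [0-9]*'   # confirm the configured MTU on each host
ping -M do -s 8972 -c 3 other-ceph-host    # 8972 = 9000 minus 28 bytes of IP/ICMP headers; fails if a hop cannot pass it unfragmented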

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Daniel Tönnißen
The weighted category prioritization clearly identifies reliability as the top priority. Daniel > On 18.11.2021 at 15:32 Sasha Litvak wrote: > > Perhaps I missed something, but does the survey conclude that users don't > value reliability improvements at all? This would explain why

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Sasha Litvak
Perhaps I missed something, but does the survey conclude that users don't value reliability improvements at all? This would explain why the developer team wants to concentrate on performance and ease of management. On Thu, Nov 18, 2021, 07:23 Stefan Kooman wrote: > On 11/18/21 14:09, Maged

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Tx. # ceph version ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable) On Thu, Nov 18, 2021 at 3:28 PM Stefan Kooman wrote: > On 11/18/21 13:20, David Tinker wrote: > > I just grepped all the OSD pod logs for error and warn and nothing comes > up: > > > > # k logs

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
If I ignore the dire warnings about losing data and do: ceph osd purge 7 will I lose data? There are still 2 copies of everything, right? I need to remove the node with the OSD from the k8s cluster, reinstall it and have it re-join the cluster. This will bring in some new OSDs and maybe Ceph
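Before purging, Ceph can report whether the OSD still holds data the cluster cannot afford to lose; a minimal sketch, assuming osd.7 as in this thread:

ceph osd safe-to-destroy osd.7            # reports whether destroying it would risk data
ceph osd purge 7 --yes-i-really-mean-it   # only once safe-to-destroy says it is OK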

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Maged Mokhtar
Hello Cephers, I too am for LTS releases or for some kind of middle ground, like a longer release cycle and/or having even-numbered releases designated for production like before. We all use LTS releases for the base OS when running Ceph, yet in reality we depend much more on the Ceph code than

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread Dan van der Ster
Hi, We sometimes have similar stuck client recall warnings. To debug you can try: (1) ceph health detail, which will show you the client ids generating the warning (e.g. 1234); (2) ceph tell mds.* client ls id=1234, which will show lots of client statistics for the session.
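Putting those steps together as a minimal sketch (the client id 1234 is just the example from the message above):

ceph health detail                 # lists the client ids behind MDS_CLIENT_RECALL
ceph tell mds.* client ls id=1234  # per-session statistics, including num_caps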

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Peter Lieven
On 17.11.21 at 20:14 Marc wrote: >> a good choice. It lacks RBD encryption and read leases. But for us upgrading from N to O or P is currently not > what about just using osd encryption with N? That would be Data at Rest encryption only. The keys for the OSDs are stored on the mons. Data is
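For the data-at-rest side on Nautilus, OSDs can be created with dmcrypt; a minimal sketch assuming a ceph-volume based deployment and a placeholder device /dev/sdX:

ceph-volume lvm create --data /dev/sdX --dmcrypt  # LUKS-encrypted OSD; the keys live in the mon key store, as noted above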

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
I just grepped all the OSD pod logs for error and warn and nothing comes up: # k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk | grep -i warn etc I am assuming that would bring back something if any of them were unhappy. On Thu, Nov 18, 2021 at 1:26 PM Stefan Kooman wrote: > On

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Sure. Tx. # ceph pg 3.1f query { "snap_trimq": "[]", "snap_trimq_len": 0, "state": "active+undersized+degraded", "epoch": 2477, "up": [ 0, 2 ], "acting": [ 0, 2 ], "acting_recovery_backfill": [ "0", "2" ],

[ceph-users] One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Hi Guys I am busy removing an OSD from my rook-ceph cluster. I did 'ceph osd out osd.7' and the re-balancing process started. Now it has stalled with one pg on "active+undersized+degraded". I have done this before and it has worked fine. # ceph health detail HEALTH_WARN Degraded data
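A minimal sketch of the usual next steps, assuming the stuck pg is 3.1f as shown later in this thread:

ceph pg dump_stuck undersized  # list the pgs stuck undersized
ceph pg 3.1f query             # full peering and backfill state for the stuck pg
ceph osd df tree               # check whether CRUSH can still place a third replica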