n't see).
This mail is mostly to figure out if there are good guesses as to why the
pg_log size per OSD process exploded. Any technical (and moral) support is
appreciated. Also, we're currently not sure if 14.2.13 triggered this, so
this is also to put a data point out there for others.
  "ondisk_log_size": 3111,
  "pgid": "26.4",
  "ondisk_log_size": 3185,
  "pgid": "33.4",
  "ondisk_log_size": 3311,
  "pgid": "33.8",
  "ondisk_log_size": 3278,
I also have no idea what the average size of a pg log entry should be in
our cluster, where we have issues with memory.
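One rough way to estimate it on a live OSD is to divide the osd_pglog
mempool bytes by its item count (a sketch, assuming jq; osd.123 is a
placeholder id, and the item count includes bookkeeping structures besides
the entries themselves, so treat the result as an approximation):

  # bytes per osd_pglog mempool item, as a rough proxy for entry size
  ceph daemon osd.123 dump_mempools \
    | jq '.mempool.by_pool.osd_pglog | {items, bytes, avg_bytes: (.bytes / .items)}'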
Cheers,
Kalle
- Original Message -
> From: "Kalle Happonen"
> To: "Dan van der Ster"
> Cc: "ceph-users"
> Sent: Tuesday, 17 November, 2020 12:45:25
> Subject: [ceph-users] Re: osd_pglog memory hoarding - another cas
-3GB, but with the cluster in this state, it's hard to be specific.
Cheers,
Kalle
> -- dan
>
>
> On Tue, Nov 17, 2020 at 11:58 AM Kalle Happonen wrote:
>>
>> Another idea, though I don't know if it has any merit.
>>
>> If 8 MB is a realistic log size
hopefully not increase pg_log memory consumption.
Cheers,
Kalle
- Original Message -
> From: "Kalle Happonen"
> To: "Dan van der Ster"
> Cc: "ceph-users"
> Sent: Tuesday, 17 November, 2020 16:07:03
> Subject: [ceph-users] Re: osd_pglog memory
Hi Robert,
This sounds very much like a big problem we had 2 weeks back.
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EWPPEMPAJQT6GGYSHM7GIM3BZWS2PSUY/
Are you running EC? Which version are you running? It would fit our
narrative if you use EC and recently updated to 14.2.11+.
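In case it helps, both are quick to check (a sketch):

  # EC pools show up as 'erasure' here
  ceph osd pool ls detail | grep erasure
  # confirm which release the daemons are actually running
  ceph versions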
We've been restarting OSDs, and while it takes a while, it seems to help.
Of course this is not the greatest fix in production.
Has anybody gleaned any new information on this issue? Things to tweak?
Fixes on the horizon? Other mitigations?
Cheers,
Kalle
- Original Message -
> From: "Ka
Quick update: restarting OSDs is not enough for us to compact the db. So we:

  stop the osd
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$osd compact
  start the osd
It seems to fix the spillover. Until it grows again.
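Scripted, the procedure above might look roughly like this (a sketch; it
assumes systemd-managed OSDs with ceph-osd@<id> units and default data
paths, so adjust for your deployment):

  #!/bin/bash
  # offline-compact the RocksDB of a single OSD
  set -e
  osd="$1"
  systemctl stop "ceph-osd@${osd}"
  ceph-kvstore-tool bluestore-kv "/var/lib/ceph/osd/ceph-${osd}" compact
  systemctl start "ceph-osd@${osd}"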
Cheers,
Kalle
- Original Message -
> From: "Kalle
Hi Stefan,
we had been seeing OSDs OOMing on 14.2.13, but on a larger scale. In our
case we hit some bugs with pg_log memory growth and buffer_anon memory
growth. Can you check what's taking up the memory on the OSD with the
following command?
ceph daemon osd.123 dump_mempools
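To zoom straight in on the two pools that bit us (a sketch, assuming jq;
osd.123 is a placeholder id):

  ceph daemon osd.123 dump_mempools \
    | jq '.mempool.by_pool | {osd_pglog, buffer_anon}'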
Cheers,
Kalle
So we hope we found the (or a) trigger for the problem.
Hopefully this reveals another thread to pull for others debugging the
same issue (and for us when we hit it again).
Cheers,
Kalle
- Original Message -
> From: "Dan van der Ster"
> To: "Kalle Happonen"
https://github.com/ceph/ceph/pull/35584
Cheers,
Kalle
- Original Message -
> From: huxia...@horebdata.cn
> To: "Kalle Happonen" , "Stefan Wild"
>
> Cc: "ceph-users"
> Sent: Monday, 14 December, 2020 10:27:57
> Subject: Re: [ceph-users] Re: O
For anybody facing similar issues, we wrote a blog post about everything we
faced, and how we worked through it.
https://cloud.blog.csc.fi/2020/12/allas-november-2020-incident-details.html
Cheers,
Kalle
- Original Message -
> From: "Kalle Happonen"
> To: "Dan
Hi Istvan,
I'm not sure it helps, but here's at least some pitfalls we faced when
migrating radosgws between clusters.
https://cloud.blog.csc.fi/2019/12/ceph-object-storage-migraine-i-mean.html
Cheers,
Kalle
- Original Message -
> From: "Szabo, Istvan (Agoda)"
> To: "ceph-users"
> Sen
sure it could be done.
Cheers,
Kalle
- Original Message -
> From: "Istvan Szabo, Agoda"
> To: "Kalle Happonen"
> Cc: "ceph-users"
> Sent: Thursday, 24 December, 2020 04:43:44
> Subject: Re: [ceph-users] Re: Data migration between clusters