[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread mhnx
First of all, do not rush into bad decisions.
Production is down and you want to bring it back online, but you should
fix the problem and be sure of it first. If a second crash occurs while the
cluster is in a healing state, you will lose metadata.
You don't need to debug first!

You didn't mention your cluster status, so we don't know what you have.
We need some information:
1- ceph -s
2- ceph health detail
3- ceph df
4- tail -n 1000 /var/log/ceph/ceph-osd.{crashed osd id}.log
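If it's easier, the same information can be captured in one go, e.g. (the
output file names are arbitrary and the osd id is whichever OSD crashed):

ceph -s > cluster-status.txt
ceph health detail > health-detail.txt
ceph df > ceph-df.txt
tail -n 1000 /var/log/ceph/ceph-osd.{crashed osd id}.log > osd-log-tail.txt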



On Wed, Jan 5, 2022 at 11:14 PM Lee  wrote:
>
> Looking for some help as this is production-affecting..
>
> We run a 3-node cluster with a mix of 5x SSD, 15x SATA and 5x SAS in each node.
> Running 15.2.15. All have their DB/WAL on an NVMe SSD except the SSDs.
>
> Earlier today I increased the pg_num from 32 to 128 on one of our pools
> because the cluster status was complaining, which is pretty normal really.
> 2-3 minutes in I watched in horror as the SSD-based OSDs crashed on all 3
> nodes, refusing to restart.
>
> I've set debug_bluefs and debug_bluestore to 20; the daemon gets so far and
> then fails.
>
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.335+ 7f2794383700 20
> bluestore(/var/lib/ceph/osd/ceph-51) deferred_try_submit 0 osrs, 0 txcs
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.335+ 7f2794383700  5
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.387+ 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.467+ 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.979+ 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:24.167+ 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:24.271+ 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:24.327+ 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:32 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Main process exited, code=killed, status=9/KILL
> 2022-01-05 19:39:32 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Failed with result 'signal'.
> 2022-01-05 19:39:42 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Scheduled restart job, restart counter is at 1.
>
> I've run
> ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-51
> inferring bluefs devices from bluestore path
> 1 : device size 0x3a3880 : own 0x[1bf220~25430] = 0x25430 :
> using 0x3fd1(1021 MiB) : bluestore has 0x1d8340(118 GiB) available
>
> Also, fsck and repair both seem to be OK.
>
> The normal log looks like:
>
> 2022-01-05 19:39:42 bb-ceph-enc-rm63-osd03-31 init.scope Starting Ceph
> object storage daemon osd.51...
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.467+ 7fca32943e00  0 set uid:gid to 64045:64045
> (ceph:ceph)
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.467+ 7fca32943e00  0 ceph version 15.2.15
> (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable), process
> ceph-osd, pid 139577
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.467+ 7fca32943e00  0 pidfile_write: ignore empty
> --pid-file
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+ 7fca32943e00  1 bdev create path
> /var/lib/ceph/osd/ceph-51/block type kernel
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+ 7fca32943e00  1 bdev(0x55b4b234e000
> /var/lib/ceph/osd/ceph-51/block) open path /var/lib/ceph/osd/ceph-51/block
> 2022-01-05 19:39

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
I'm not rushing.

I have found the issue: I am getting OOM errors as the OSD boots. Basically
it starts to process the PGs, then the node runs out of memory and the
daemon is killed.

2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261 tick
2022-01-05 20:09:10 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:01.060+ 7fce3b441700 10 osd.51 24448261
tick_without_osd_lock
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:02.268+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
start
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:09.544+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
finish
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:10.260+ 7fce1e407700  5 osd.51 24448261 heartbeat
osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
0x3ff3688d), peers [] op hist [])
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.060+ 7fce3c6bc700 20 osd.51 24448261 tick
last_purged_snaps_scrub 2022-01-04T22:29:39.121925+ next
2022-01-05T22:29:39.121925+
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.104+ 7fce1e407700 20 osd.51 24448261
check_full_status cur ratio 0.410072, physical ratio 0.410072, new state
none
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.108+ 7fce34c34700 20
bluestore(/var/lib/ceph/osd/ceph-51) deferred_try_submit 0 osrs, 0 txcs
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.108+ 7fce34c34700  5
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.160+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.216+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.264+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.400+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.536+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.640+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.644+ 7fce1e407700  5 osd.51 24448261 heartbeat
osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
0x3ff3688d), peers [] op hist [])
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.712+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.688+ 7fce1e407700 20 osd.51 24448261
check_full_status cur ratio 0.410072, physical ratio 0.410072, new state
none
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.480+ 7fce3b441700 20
bluestore(/var/lib/ceph/osd/ceph-51) statfs
store_statfs(0x2258948000/0x4000/0x3a3880, data
0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
0x3ff3688d)
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.844+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:14 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:14.016+ 7fce34c34700 20
bluestore.

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread mhnx
It's nice to hear that. You can also decrease the OSD RAM usage from
4 GB to 2 GB. If you have enough spare RAM, go for it.
Good luck.
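
Presumably that means lowering osd_memory_target; with the centralized
config store that would be roughly the following (value in bytes, and
osd.51 is just an example id):

ceph config set osd osd_memory_target 2147483648      # all OSDs
ceph config set osd.51 osd_memory_target 2147483648   # a single OSD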

On Thu, Jan 6, 2022 at 12:46 AM Lee  wrote:
>
> I'm not rushing.
>
> I have found the issue: I am getting OOM errors as the OSD boots. Basically
> it starts to process the PGs, then the node runs out of memory and the
> daemon is killed.
>
> 2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261 tick
> 2022-01-05 20:09:10 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:01.060+ 7fce3b441700 10 osd.51 24448261 
> tick_without_osd_lock
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:02.268+ 7fce3c6bc700 10 osd.51 24448261 do_waiters -- 
> start
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:09.544+ 7fce3c6bc700 10 osd.51 24448261 do_waiters -- 
> finish
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:10.260+ 7fce1e407700  5 osd.51 24448261 heartbeat 
> osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data 
> 0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta 
> 0x3ff3688d), peers [] op hist [])
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.060+ 7fce3c6bc700 20 osd.51 24448261 tick 
> last_purged_snaps_scrub 2022-01-04T22:29:39.121925+ next 
> 2022-01-05T22:29:39.121925+
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.104+ 7fce1e407700 20 osd.51 24448261 
> check_full_status cur ratio 0.410072, physical ratio 0.410072, new state none
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.108+ 7fce34c34700 20 
> bluestore(/var/lib/ceph/osd/ceph-51) deferred_try_submit 0 osrs, 0 txcs
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.108+ 7fce34c34700  5 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.160+ 7fce34c34700 20 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.216+ 7fce34c34700 20 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.264+ 7fce34c34700 20 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.400+ 7fce34c34700 20 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.536+ 7fce34c34700 20 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.640+ 7fce34c34700 20 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.644+ 7fce1e407700  5 osd.51 24448261 heartbeat 
> osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data 
> 0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta 
> 0x3ff3688d), peers [] op hist [])
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.712+ 7fce34c34700 20 
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size: 134217728 
> kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864 meta_used: 75234 
> data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.688+ 7fce1e407700 20 osd.51 24448261 
> check_full_status cur ratio 0.410072, physical ratio 0.410072, new state none
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51 
> 2022-01-05T20:09:13.480+ 7fce3b441700 20 
> bluestore(/var/lib/ceph/osd/ceph-51) statfs 
> store_statfs(0x2258948000/0x4000/0x3a3880, data 
> 0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Mazzystr
And that is exactly why I run OSDs containerized with limited CPU and
memory, as well as "bluestore cache size", "osd memory target", and "mds
cache memory limit". OSD processes have become noisy neighbors in the last
few versions.
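
Roughly, those knobs map to something like the following (the values and the
container name are only illustrative, not the actual settings in use here):

ceph config set osd osd_memory_target 2147483648
ceph config set osd bluestore_cache_size 1073741824
ceph config set mds mds_cache_memory_limit 2147483648

# plus CPU/memory limits on the container itself, e.g. with plain docker:
docker run -d --name ceph-osd-51 --cpus 2 --memory 4g <osd image and args>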



On Wed, Jan 5, 2022 at 1:47 PM Lee  wrote:

> I'm not rushing.
>
> I have found the issue: I am getting OOM errors as the OSD boots.
> Basically it starts to process the PGs, then the node runs out of
> memory and the daemon is killed.
>
> 2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261 tick
> 2022-01-05 20:09:10 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:01.060+ 7fce3b441700 10 osd.51 24448261
> tick_without_osd_lock
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:02.268+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
> start
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:09.544+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
> finish
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:10.260+ 7fce1e407700  5 osd.51 24448261 heartbeat
> osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
> 0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
> 0x3ff3688d), peers [] op hist [])
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.060+ 7fce3c6bc700 20 osd.51 24448261 tick
> last_purged_snaps_scrub 2022-01-04T22:29:39.121925+ next
> 2022-01-05T22:29:39.121925+
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.104+ 7fce1e407700 20 osd.51 24448261
> check_full_status cur ratio 0.410072, physical ratio 0.410072, new state
> none
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.108+ 7fce34c34700 20
> bluestore(/var/lib/ceph/osd/ceph-51) deferred_try_submit 0 osrs, 0 txcs
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.108+ 7fce34c34700  5
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.160+ 7fce34c34700 20
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.216+ 7fce34c34700 20
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.264+ 7fce34c34700 20
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.400+ 7fce34c34700 20
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.536+ 7fce34c34700 20
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.640+ 7fce34c34700 20
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.644+ 7fce1e407700  5 osd.51 24448261 heartbeat
> osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
> 0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
> 0x3ff3688d), peers [] op hist [])
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.712+ 7fce34c34700 20
> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.688+ 7fce1e407700 20 osd.51 24448261
> check_full_status cur ratio 0.410072, physical ratio 0.410072, new state
> none
> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T20:09:13.480+ 7fce3b441700 20
> bluestore(/var/lib/ceph/osd/ceph-51) statfs
> store_statfs(0x2258948000/0x4000/0x3a3880, data
> 0x17919fd8c4/0x179feb

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
The first OSD took 156 GB of RAM to boot.. :(

Is there an easy way to stop the mempool pulling so much memory?

On Wed, 5 Jan 2022 at 22:12, Mazzystr  wrote:

> and that is exactly why I run osds containerized with limited cpu and
> memory as well as "bluestore cache size", "osd memory target", and "mds
> cache memory limit".  Osd processes have become noisy neighbors in the last
> few versions.
>
>
>
> On Wed, Jan 5, 2022 at 1:47 PM Lee  wrote:
>
>> I'm not rushing.
>>
>> I have found the issue: I am getting OOM errors as the OSD boots.
>> Basically it starts to process the PGs, then the node runs out of
>> memory and the daemon is killed.
>>
>> 2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261 tick
>> 2022-01-05 20:09:10 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:01.060+ 7fce3b441700 10 osd.51 24448261
>> tick_without_osd_lock
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:02.268+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
>> start
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:09.544+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
>> finish
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:10.260+ 7fce1e407700  5 osd.51 24448261 heartbeat
>> osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
>> 0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
>> 0x3ff3688d), peers [] op hist [])
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.060+ 7fce3c6bc700 20 osd.51 24448261 tick
>> last_purged_snaps_scrub 2022-01-04T22:29:39.121925+ next
>> 2022-01-05T22:29:39.121925+
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.104+ 7fce1e407700 20 osd.51 24448261
>> check_full_status cur ratio 0.410072, physical ratio 0.410072, new state
>> none
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.108+ 7fce34c34700 20
>> bluestore(/var/lib/ceph/osd/ceph-51) deferred_try_submit 0 osrs, 0 txcs
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.108+ 7fce34c34700  5
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.160+ 7fce34c34700 20
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.216+ 7fce34c34700 20
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.264+ 7fce34c34700 20
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.400+ 7fce34c34700 20
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.536+ 7fce34c34700 20
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.640+ 7fce34c34700 20
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.644+ 7fce1e407700  5 osd.51 24448261 heartbeat
>> osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
>> 0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
>> 0x3ff3688d), peers [] op hist [])
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.712+ 7fce34c34700 20
>> bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
>> 134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
>> meta_used: 75234 data_alloc: 67108864 data_used: 0
>> 2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
>> 2022-01-05T20:09:13.688+ 7fce1e407700 20 osd.51 24448261
>> check_full_status cur ratio 0.410072, physical ratio 0.41007

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
For example:

top - 22:53:47 up  1:29,  2 users,  load average: 2.23, 2.08, 1.92
Tasks: 255 total,   2 running, 253 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.2 us,  4.5 sy,  0.0 ni, 91.1 id,  0.1 wa,  0.0 hi,  0.1 si,
 0.0 st
MiB Mem : 161169.7 total,  23993.9 free, 132036.5 used,   5139.3 buff/cache
MiB Swap:  0.0 total,  0.0 free,  0.0 used.  24425.1 avail Mem

PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
COMMAND
  32014 ceph  20   0  126.1g 124.1g  38656 S 100.3  78.8   5:44.77
ceph-osd
  17651 ceph  20   0 1122932 345204  14148 S  10.3   0.2   4:49.79
ceph-osd
  17248 ceph  20   0 8014892   3.3g  15956 S   8.0   2.1   8:07.75
ceph-osd
   1069 root  20   0  336504 225300 48 S   6.6   0.1   4:10.84
systemd-journal
  17862 ceph  20   0 1228508 443328  16740 S   6.3   0.3   2:05.88
ceph-osd
   2708 63150 20   0  431968  43560  41972 R   3.0   0.0   1:26.26
systemd-journal
   2718 root  20   0  226564   5016   1824 S   2.7   0.0   1:50.74
rsyslogd
  32727 root  20   0   0  0  0 I   1.0   0.0   0:01.00
kworker/2:2-mm_percpu_wq
   1511 root  20   0 2264860  29052   8552 S   0.7   0.0   3:55.76
croitd
 13 root  20   0   0  0  0 I   0.3   0.0   0:03.65
rcu_sched
  1 root  20   0  167900   8804   5332 S   0.0   0.0   0:30.66
systemd
  2 root  20   0   0  0  0 S   0.0   0.0   0:00.16
kthreadd
  3 root   0 -20   0  0  0 I   0.0   0.0   0:00.00
rcu_gp
  4 root   0 -20   0  0  0 I   0.0   0.0   0:00.00
rcu_par_gp
  6 root   0 -20   0  0  0 I   0.0   0.0   0:00.00
kworker/0:0H-events_highpri
  9 root   0 -20   0  0  0 I   0.0   0.0   0:00.00
mm_percpu_wq
 10 root  20   0   0  0  0 S   0.0   0.0   0:00.00
rcu_tasks_rude_
 11 root  20   0   0  0  0 S   0.0   0.0   0:00.00
rcu_tasks_trace
 12 root  20   0   0  0  0 S   0.0   0.0   0:00.37
ksoftirqd/0
 14 root  rt   0   0  0  0 S   0.0   0.0   0:00.09
migration/0

 root@bb-ceph-enc-rm63-osd03-31 ~ $ ceph daemon osd.13 dump_mempools -h
{
"mempool": {
"by_pool": {
"bloom_filter": {
"items": 0,
"bytes": 0
},
"bluestore_alloc": {
"items": 4671335,
"bytes": 97772992
},
"bluestore_cache_data": {
"items": 293,
"bytes": 272261
},
"bluestore_cache_onode": {
"items": 281,
"bytes": 173096
},
"bluestore_cache_meta": {
"items": 10777,
"bytes": 63953
},
"bluestore_cache_other": {
"items": 638,
"bytes": 34200
},
"bluestore_Buffer": {
"items": 8,
"bytes": 768
},
"bluestore_Extent": {
"items": 8,
"bytes": 384
},
"bluestore_Blob": {
"items": 8,
"bytes": 832
},
"bluestore_SharedBlob": {
"items": 8,
"bytes": 896
},
"bluestore_inline_bl": {
"items": 0,
"bytes": 0
},
"bluestore_fsck": {
"items": 0,
"bytes": 0
},
"bluestore_txc": {
"items": 0,
"bytes": 0
},
"bluestore_writing_deferred": {
"items": 0,
"bytes": 0
},
"bluestore_writing": {
"items": 0,
"bytes": 0
},
"bluefs": {
"items": 440,
"bytes": 15760
},
"bluefs_file_reader": {
"items": 62,
"bytes": 5898112
},
"bluefs_file_writer": {
"items": 3,
"bytes": 672
},
"buffer_anon": {
"items": 30941954,
"bytes": 126064178281
},
"buffer_meta": {
"items": 2708,
"bytes": 238304
},
"osd": {
"items": 277,
"bytes": 3583272
},
"osd_mapbl": {
"items": 0,
"bytes": 0
},
"osd_pglog": {
"items": 45797772,
"bytes": 4854818176
},
"osdmap": {
"items": 3792,
"bytes": 140872
},
"osdmap_mapping": {
"items": 0,
"bytes": 0
},
"pgmap": {
"items": 0,
 

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Igor Fedotov

Hi Lee,

could you please raise debug-bluestore and debug-osd to 20 (via the "ceph
tell osd.N injectargs" command) when the OSD starts to eat up the RAM? Then
drop them back to defaults after a few seconds (10s is enough) to avoid a
huge log, and share the resulting OSD log.


Also, I'm curious whether you have any non-default settings for the OSDs;
please share them if so.
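
A rough sketch of that sequence (osd.51 is just an example id; 1/5 is the
usual default level, and "config diff" lists whatever differs from the
built-in defaults):

ceph tell osd.51 injectargs '--debug-bluestore 20 --debug-osd 20'
# ...wait ~10 seconds while the memory climbs...
ceph tell osd.51 injectargs '--debug-bluestore 1/5 --debug-osd 1/5'

ceph daemon osd.51 config diff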



Thanks,

Igor


On 1/6/2022 1:43 AM, Lee wrote:

The first OSD took 156 GB of RAM to boot.. :(

Is there an easy way to stop the mempool pulling so much memory?

On Wed, 5 Jan 2022 at 22:12, Mazzystr  wrote:


and that is exactly why I run osds containerized with limited cpu and
memory as well as "bluestore cache size", "osd memory target", and "mds
cache memory limit".  Osd processes have become noisy neighbors in the last
few versions.



On Wed, Jan 5, 2022 at 1:47 PM Lee  wrote:


I'm not rushing.

I have found the issue: I am getting OOM errors as the OSD boots.
Basically it starts to process the PGs, then the node runs out of
memory and the daemon is killed.

2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261 tick
2022-01-05 20:09:10 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:01.060+ 7fce3b441700 10 osd.51 24448261
tick_without_osd_lock
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:02.268+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
start
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:09.544+ 7fce3c6bc700 10 osd.51 24448261 do_waiters --
finish
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:10.260+ 7fce1e407700  5 osd.51 24448261 heartbeat
osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
0x3ff3688d), peers [] op hist [])
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.060+ 7fce3c6bc700 20 osd.51 24448261 tick
last_purged_snaps_scrub 2022-01-04T22:29:39.121925+ next
2022-01-05T22:29:39.121925+
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.104+ 7fce1e407700 20 osd.51 24448261
check_full_status cur ratio 0.410072, physical ratio 0.410072, new state
none
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.108+ 7fce34c34700 20
bluestore(/var/lib/ceph/osd/ceph-51) deferred_try_submit 0 osrs, 0 txcs
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.108+ 7fce34c34700  5
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.160+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.216+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.264+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.400+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.536+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.640+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.644+ 7fce1e407700  5 osd.51 24448261 heartbeat
osd_stat(store_statfs(0x2258948000/0x4000/0x3a3880, data
0x17919fd8c4/0x179feb4000, compress 0x0/0x0/0x0, omap 0xc9773, meta
0x3ff3688d), peers [] op hist [])
2022-01-05 20:09:13 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:13.712+ 7fce34c34700 20
bluestore.MempoolThread(0x55f42e762a98) _resize_shards cache_size:
134217728 kv_alloc: 67108864 kv_used: 67082912 meta_alloc: 67108864
meta_used: 75234 data_alloc: 67108864 data_used: 0
2022-01-05 20:09:

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Marc
Running your OSDs with resource limitations is not so straightforward. I would
guess that if you are running close to full resource utilization on your nodes,
it makes more sense to make sure everything stays within its specified limits.
(Aside from the question of whether you would even want to operate such an
environment, and whether you even want to force OSDs into OOM.)

However, if you are not walking such a thin line and have e.g. more memory
available, it is simply a waste not to use that memory. I do not really know
how advanced most orchestrators are nowadays, or whether you can dynamically
change resource limits on containers. But if not, you will just not use memory
as cache, and not using memory as cache means increased disk I/O and decreased
performance.

I think the Linux kernel is probably better at deciding how to share resources
among my OSDs than I am, and that is a reason why I do not put them in
containers. (But I am still on Nautilus, so I will keep an eye on this 'noisy
neighbor' issue when upgrading ;) )
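
(For what it's worth, plain Docker at least can adjust limits on a running
container, e.g. something like "docker update --cpus 2 --memory 4g <osd
container>", but whether an orchestrator exposes that is another matter.)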


> 
> and that is exactly why I run osds containerized with limited cpu and
> memory as well as "bluestore cache size", "osd memory target", and "mds
> cache memory limit".  Osd processes have become noisy neighbors in the
> last
> few versions.
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Marc



> I assume the huge memory consumption is temporary. Once the OSD is up and
> stable, it would release the memory.
> 
> So how about allocating a large swap temporarily, just to let the OSD come
> up? I remember that someone else on the list resolved a similar issue with
> swap.

But is this already a reported bug, or should I from now on take into account
that OSDs can consume >150 GB of memory?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Alexander E. Patrakov
On Thu, Jan 6, 2022 at 12:21 PM Lee  wrote:

> I've tried adding swap and that fails also.
>

How exactly did it fail? Did you put it on some disk, or in zram?

In the past I had to help a customer who hit memory over-use when upgrading
Ceph (due to shallow_fsck), and we were able to fix it by adding 64 GB
of zram-based swap on each server (with 128 GB of physical RAM in this type
of server).
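
A rough sketch of that kind of zram-backed swap, in case it helps (sizes and
device naming can vary by distro and kernel):

modprobe zram
echo 64G > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 100 /dev/zram0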

-- 
Alexander E. Patrakov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Alexander E. Patrakov
On Fri, Jan 7, 2022 at 12:50 AM Alexander E. Patrakov  wrote:

> On Thu, Jan 6, 2022 at 12:21 PM Lee  wrote:
>
>> I've tried adding swap and that fails also.
>>
>
> How exactly did it fail? Did you put it on some disk, or in zram?
>
> In the past I had to help a customer who hit memory over-use when
> upgrading Ceph (due to shallow_fsck), and we were able to fix it by adding
> 64 GB of zram-based swap on each server (with 128 GB of physical RAM in
> this type of server).
>
>
On the other hand, if you have some spare disks for temporary storage and
for new OSDs, and this failed OSD is not part of an erasure-coded pool,
another approach might be to export all PGs using ceph-objectstore-tool as
files onto the temporary storage (in the hope that it doesn't suffer from the
same memory explosion), and then import them all into a new temporary OSD.
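
A minimal sketch of that export/import cycle (the PG id 2.7f, the paths and
the OSD ids are only placeholders, and both OSDs must be stopped while the
tool runs):

systemctl stop ceph-osd@51
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 \
    --op export --pgid 2.7f --file /mnt/tmp/2.7f.export

systemctl stop ceph-osd@52
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
    --op import --file /mnt/tmp/2.7f.export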

-- 
Alexander E. Patrakov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Lee
I tried with disk-based swap on a SATA SSD.

I think that might be the last option. I have already exported all the down
PGs from the OSD that they are waiting for.

Kind Regards

Lee

On Thu, 6 Jan 2022 at 20:00, Alexander E. Patrakov 
wrote:

> On Fri, Jan 7, 2022 at 12:50 AM Alexander E. Patrakov  wrote:
>
>> On Thu, Jan 6, 2022 at 12:21 PM Lee  wrote:
>>
>>> I've tried adding swap and that fails also.
>>>
>>
>> How exactly did it fail? Did you put it on some disk, or in zram?
>>
>> In the past I had to help a customer who hit memory over-use when
>> upgrading Ceph (due to shallow_fsck), and we were able to fix it by adding
>> 64 GB of zram-based swap on each server (with 128 GB of physical RAM in
>> this type of server).
>>
>>
> On the other hand, if you have some spare disks for temporary storage and
> for new OSDs, and this failed OSD is not a part of an erasure-coded pool,
> another approach might be to export all PGs using ceph-objectstore-tool as
> files onto the temporary storage (in hope that it doesn't suffer from the
> same memory explosion), and then import them all into a new temporary OSD.
>
> --
> Alexander E. Patrakov
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io