[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

2023-02-28 Thread Jan Pekař - Imatic

Hi,

the same on my side - destroyed and replaced with BlueStore.

JP

On 28/02/2023 14.17, Mark Schouten wrote:

Hi,

I just destroyed the filestore osd and added it as a bluestore osd. Worked fine.
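(For reference, a rough sketch of that kind of replacement, assuming osd.0 and /dev/sdX as placeholders and that the cluster can tolerate the OSD being rebuilt:)

```
ceph osd set noout                         # avoid rebalancing while the OSD is rebuilt
systemctl stop ceph-osd@0                  # stop the old filestore OSD
ceph osd destroy 0 --yes-i-really-mean-it  # remove its auth/state but keep the OSD id
ceph-volume lvm zap /dev/sdX --destroy     # wipe the old filestore device
ceph-volume lvm create --bluestore --osd-id 0 --data /dev/sdX  # recreate it as bluestore
ceph osd unset noout                       # let backfill repopulate the new OSD
```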

—
Mark Schouten, CTO
Tuxis B.V.
m...@tuxis.nl / +31 318 200208


-- Original Message --
From "Jan Pekař - Imatic" 
To m...@tuxis.nl; ceph-users@ceph.io
Date 2/25/2023 4:14:54 PM
Subject Re: [ceph-users] OSD upgrade problem nautilus->octopus - snap_mapper 
upgrade stuck


Hi,

I tried upgrading to Pacific now. The same result - the OSD is not starting, stuck at 
1500 keys.

JP

On 23/02/2023 00.16, Jan Pekař - Imatic wrote:

Hi,

I enabled debug and it's the same - 1500 keys is where it ends. I also enabled 
debug_filestore and ...

2023-02-23T00:02:34.876+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7920 already registered
2023-02-23T00:02:34.876+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7920 seq 148188829 osr(meta) 
4859 bytes   (queue has 49 ops and 238167 bytes)

2023-02-23T00:02:34.876+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1470 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2345): (writeahead) 148188845 
[Transaction(0x55fb299870e0)]
2023-02-23T00:02:34.880+0100 7f8efc23ee00 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb29c52d20 #-1:c0371625:::snapmapper:0# 
(0x55fb29f9c9e0)

2023-02-23T00:02:34.880+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1500 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7aa0 seq 148188830 
osr(meta) [Transaction(0x55fb29986000)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7aa0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7aa0 seq 148188830 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7c20 seq 148188831 
osr(meta) [Transaction(0x55fb29986120)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7c20 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7c20 seq 148188831 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7d40 seq 148188832 
osr(meta) [Transaction(0x55fb29986240)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7d40 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7d40 seq 148188832 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7ec0 seq 148188833 
osr(meta) [Transaction(0x55fb29986360)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7ec0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7ec0 seq 148188833 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921db60 seq 148188834 
osr(meta) [Transaction(0x55fb29986480)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921db60 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921db60 seq 148188834 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921c8a0 seq 148188835 
osr(meta) [Transaction(0x55fb299865a0)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921c8a0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921c8a0 seq 148188835 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52000 seq 148188836 
osr(meta) [Transaction(0x55fb299866c0)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 

[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

2023-02-25 Thread Jan Pekař - Imatic

Hi,

I tried upgrading to Pacific now. The same result - the OSD is not starting, stuck at 
1500 keys.

JP

On 23/02/2023 00.16, Jan Pekař - Imatic wrote:

Hi,

I enabled debug and it's the same - 1500 keys is where it ends. I also enabled 
debug_filestore and ...

2023-02-23T00:02:34.876+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7920 already registered
2023-02-23T00:02:34.876+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7920 seq 148188829 osr(meta) 
4859 bytes   (queue has 49 ops and 238167 bytes)

2023-02-23T00:02:34.876+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1470 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2345): (writeahead) 148188845 
[Transaction(0x55fb299870e0)]
2023-02-23T00:02:34.880+0100 7f8efc23ee00 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb29c52d20 #-1:c0371625:::snapmapper:0# 
(0x55fb29f9c9e0)

2023-02-23T00:02:34.880+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1500 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7aa0 seq 148188830 
osr(meta) [Transaction(0x55fb29986000)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7aa0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7aa0 seq 148188830 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7c20 seq 148188831 
osr(meta) [Transaction(0x55fb29986120)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7c20 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7c20 seq 148188831 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7d40 seq 148188832 
osr(meta) [Transaction(0x55fb29986240)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7d40 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7d40 seq 148188832 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7ec0 seq 148188833 
osr(meta) [Transaction(0x55fb29986360)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7ec0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7ec0 seq 148188833 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921db60 seq 148188834 
osr(meta) [Transaction(0x55fb29986480)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921db60 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921db60 seq 148188834 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921c8a0 seq 148188835 
osr(meta) [Transaction(0x55fb299865a0)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921c8a0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921c8a0 seq 148188835 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52000 seq 148188836 
osr(meta) [Transaction(0x55fb299866c0)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb29c52000 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb29c52000 seq 148188836 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52180 seq 148188837 
osr(meta) [Transaction(0x55fb299867e0)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr

[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

2023-02-22 Thread Jan Pekař - Imatic
. And nothing happens.

I’m now stuck with a cluster that has 4 OSD’s on Octopus, 10 on Nautilus, and 
one down .. A hint on how to work around this is welcome :)
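(For anyone in a similar mixed state, a quick way to see the version and objectstore mix is something like the following sketch; the OSD id in the last command is just an example:)

```
ceph versions                             # running daemon versions per component
ceph osd count-metadata osd_objectstore   # how many OSDs are filestore vs bluestore
ceph osd metadata 0 | grep -E 'ceph_version|osd_objectstore'   # details for a single OSD
```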

—
Mark Schouten, CTO
Tuxis B.V.
m...@tuxis.nl / +31 318 200208


-- Original Message --
From "Jan Pekař - Imatic" 
To ceph-users@ceph.io
Date 1/12/2023 5:53:02 PM
Subject [ceph-users] OSD upgrade problem nautilus->octopus - snap_mapper 
upgrade stuck


Hi all,

I have a problem upgrading Nautilus to Octopus on my OSD.

Upgrading the mon and mgr was OK, but the first OSD is stuck on

2023-01-12T09:25:54.122+0100 7f49ff3eae00  1 osd.0 126556 init upgrade 
snap_mapper (first start as octopus)

and there was no activity after that for more than 48 hours. No disk activity.

I restarted the OSD many times and nothing changed.

It is an old filestore OSD based on an XFS filesystem. Is the upgrade to snap mapper 2 reliable? What is the OSD waiting for? Can I start the OSD 
without the upgrade and get the cluster healthy with the old snap structure? Or should I skip the Octopus upgrade and go to Pacific directly 
(is some bug backport missing?).


Thank you for your help, I'm sending some logs below.

The log shows

2023-01-09T19:12:49.471+0100 7f41f60f1e00  0 ceph version 15.2.17 (694d03a6f6c6e9f814446223549caf9a9f60dba0) octopus (stable), process 
ceph-osd, pid 2566563

2023-01-09T19:12:49.471+0100 7f41f60f1e00  0 pidfile_write: ignore empty 
--pid-file
2023-01-09T19:12:49.499+0100 7f41f60f1e00 -1 missing 'type' file, inferring 
filestore from current/ dir
2023-01-09T19:12:49.531+0100 7f41f60f1e00  0 starting osd.0 osd_data 
/var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2023-01-09T19:12:49.531+0100 7f41f60f1e00 -1 Falling back to public interface
2023-01-09T19:12:49.871+0100 7f41f60f1e00  0 load: jerasure load: lrc load: isa
2023-01-09T19:12:49.875+0100 7f41f60f1e00  0 
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:0.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:1.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:2.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)

2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.927+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled 
via 'filestore fiemap' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is 
disabled via 'filestore seek data hole' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 
'filestore splice' config option
2023-01-09T19:12:49.983+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully 
supported (by glibc and kernel)

2023-01-09T19:12:49.983+0100 7f41f60f1e00  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is 
disabled by conf
2023-01-09T19:12:50.015+0100 7f41f60f1e00  0 
filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2023-01-09T19:12:50.079+0100 7f41f60f1e00  1 leveldb: Recovering log #165531
2023-01-09T19:12:50.083+0100 7f41f60f1e00  1 leveldb: Level-0 table #165533: 
started
2023-01-09T19:12:50.235+0100 7f41f60f1e00  1 leveldb: Level-0 table #165533: 
1598 bytes OK
2023-01-09T19:12:50.583+0100 7f41f60f1e00  1 leveldb: Delete type=0 #165531

2023-01-09T19:12:50.615+0100 7f41f60f1e00  1 leveldb: Delete type=3 #165529

2023-01-09T19:12:51.339+0100 7f41f60f1e00  0 filestore(/var/lib/ceph/osd/ceph-0) mount(1861): enabling WRITEAHEAD journal mode: 
checkpoint is not enabled
2023-01-09T19:12:51.379+0100 7f41f60f1e00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 
bytes, directio = 1, aio = 1

2023-01-09T19:12:51.931+0100 7f41f60f1e00 -1 journal do_read_entry(243675136): 
bad header magic
2023-01-09T19:12:51.939+0100 7f41f60f1e00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 
bytes, directio = 1, aio = 1

2023-01-09T19:12:51.943+0100 7f41f60f1e00  1 
filestore(/var/lib/ceph/osd/ceph-0) upgrade(1466)
2023-01-09T19:12:52.015+0100 7f41f60f1e00  1 osd.0 126556 init upgrade 
snap_mapper (first start as octopus)

lsof shows

COMMAND PID USER   FD  TYPE DEVICE SI

[ceph-users] OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

2023-01-12 Thread Jan Pekař - Imatic

Hi all,

I have a problem upgrading Nautilus to Octopus on my OSD.

Upgrading the mon and mgr was OK, but the first OSD is stuck on

2023-01-12T09:25:54.122+0100 7f49ff3eae00  1 osd.0 126556 init upgrade 
snap_mapper (first start as octopus)

and there was no activity after that for more than 48 hours. No disk activity.

I restarted the OSD many times and nothing changed.

It is an old filestore OSD based on an XFS filesystem. Is the upgrade to snap mapper 2 reliable? What is the OSD waiting for? Can I start the OSD 
without the upgrade and get the cluster healthy with the old snap structure? Or should I skip the Octopus upgrade and go to Pacific directly 
(is some bug backport missing?).


Thank you for your help, I'm sending some logs below.

The log shows

2023-01-09T19:12:49.471+0100 7f41f60f1e00  0 ceph version 15.2.17 (694d03a6f6c6e9f814446223549caf9a9f60dba0) octopus (stable), process 
ceph-osd, pid 2566563

2023-01-09T19:12:49.471+0100 7f41f60f1e00  0 pidfile_write: ignore empty 
--pid-file
2023-01-09T19:12:49.499+0100 7f41f60f1e00 -1 missing 'type' file, inferring 
filestore from current/ dir
2023-01-09T19:12:49.531+0100 7f41f60f1e00  0 starting osd.0 osd_data 
/var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2023-01-09T19:12:49.531+0100 7f41f60f1e00 -1 Falling back to public interface
2023-01-09T19:12:49.871+0100 7f41f60f1e00  0 load: jerasure load: lrc load: isa
2023-01-09T19:12:49.875+0100 7f41f60f1e00  0 
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:0.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:1.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:2.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 osd.0:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, 
cutoff=196)

2023-01-09T19:12:49.883+0100 7f41f60f1e00  0 
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.927+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 
'filestore fiemap' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is 
disabled via 'filestore seek data hole' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 
'filestore splice' config option
2023-01-09T19:12:49.983+0100 7f41f60f1e00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully 
supported (by glibc and kernel)

2023-01-09T19:12:49.983+0100 7f41f60f1e00  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is 
disabled by conf
2023-01-09T19:12:50.015+0100 7f41f60f1e00  0 
filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2023-01-09T19:12:50.079+0100 7f41f60f1e00  1 leveldb: Recovering log #165531
2023-01-09T19:12:50.083+0100 7f41f60f1e00  1 leveldb: Level-0 table #165533: 
started
2023-01-09T19:12:50.235+0100 7f41f60f1e00  1 leveldb: Level-0 table #165533: 
1598 bytes OK
2023-01-09T19:12:50.583+0100 7f41f60f1e00  1 leveldb: Delete type=0 #165531

2023-01-09T19:12:50.615+0100 7f41f60f1e00  1 leveldb: Delete type=3 #165529

2023-01-09T19:12:51.339+0100 7f41f60f1e00  0 filestore(/var/lib/ceph/osd/ceph-0) mount(1861): enabling WRITEAHEAD journal mode: checkpoint 
is not enabled
2023-01-09T19:12:51.379+0100 7f41f60f1e00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 bytes, 
directio = 1, aio = 1

2023-01-09T19:12:51.931+0100 7f41f60f1e00 -1 journal do_read_entry(243675136): 
bad header magic
2023-01-09T19:12:51.939+0100 7f41f60f1e00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 bytes, 
directio = 1, aio = 1

2023-01-09T19:12:51.943+0100 7f41f60f1e00  1 
filestore(/var/lib/ceph/osd/ceph-0) upgrade(1466)
2023-01-09T19:12:52.015+0100 7f41f60f1e00  1 osd.0 126556 init upgrade 
snap_mapper (first start as octopus)

lsof shows

COMMAND PID USER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
ceph-osd 225860 ceph  cwd   DIR  9,127 4096  2 /
ceph-osd 225860 ceph  rtd   DIR  9,127 4096  2 /
ceph-osd 225860 ceph  txt   REG  9,127 31762544   5021 
/usr/bin/ceph-osd
ceph-osd 225860 ceph  mem   REG   8,70  2147237 68104224 
/var/lib/ceph/osd/ceph-0/current/omap/165546.ldb
ceph-osd 225860 ceph  mem   REG   8,70  2147792 68104190 
/var/lib/ceph/osd/ce

[ceph-users] Disable peering of some pool

2022-03-16 Thread Jan Pekař - Imatic

Hi all,

we have a problem on our production cluster running Nautilus (14.2.22).

The cluster is almost full, and a few months ago we noticed issues with slow peering - when we restart any OSD (or host) it takes hours to finish 
the peering process, instead of minutes.


We noticed that some pool contains 90k objects in 300GB per PG, so we decided to increase pg_num on that pool so each individual PG peers 
quickly. During that state we got stuck with PGs inactive for hours and peering not finished, and some OSDs went down with this error: 
https://tracker.ceph.com/issues/51168


We decided to restart all OSDs and wait, but the problem with slow peering 
persists.

Is there any way to get the cluster healthy? Or to disable peering of some pool, so other pools with RBD images get peered and come online, and 
after that try to peer that big pool?
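(As far as I know there is no per-pool switch to disable peering; the closest workaround I can think of is to keep recovery and backfill from competing with peering via cluster-wide flags - a sketch only, to be unset again afterwards:)

```
# Throttle data movement while the large pool peers (does not speed up peering itself)
ceph osd set nobackfill
ceph osd set norecover
ceph osd set norebalance
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# later: ceph osd unset nobackfill; ceph osd unset norecover; ceph osd unset norebalance
```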


Thank you for your help, it is an urgent situation.

With regards
Jan Pekar

--


Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
https://www.imatic.cz | +420326555326

--

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MonSession vs TCP connection

2021-05-11 Thread Jan Pekař - Imatic

Hi all,

I would like to "pair" a MonSession with a TCP connection to get the real process which is using that session. I need it to identify processes with 
old Ceph features.


MonSession looks like

MonSession(client.84324148 [..IP...]:0/3096235764 is open allow *, features 
0x27018fb86aa42ada (jewel))

What do client.NUMBER and 0/3096235764 mean?

How can I correlate client.NUMBER or that /NUMBER with a certain TCP session? I have many processes on that server (on that IP) with different 
features.
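(As far as I know, client.NUMBER is the global_id the monitors assigned to that client, and the number after the slash is the connection nonce in the client's address, so it does not map directly to a TCP port. A sketch of how one might correlate it, assuming access to the mon and client hosts:)

```
# On a monitor host: dump sessions with client addresses and feature bits
ceph daemon mon.$(hostname -s) sessions

# Summarize connected clients grouped by feature release
ceph features

# On the suspect client host: which local processes hold connections to the mons
ss -tnp | grep -E ':6789|:3300'
```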


Thank you

--


Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem unusable after deleting pool with billion objects

2020-09-11 Thread Jan Pekař - Imatic

Hi Igor,

thank you, I also think that it is the problem you described.

I recreated OSD's now and also noticed strange warnings -

HEALTH_WARN Degraded data redundancy: 106763/723 objects degraded (14766.667%)

Maybe there are some "phantom", zero-sized objects (OMAPs?) that the cluster is 
recovering, but I don't need them (they are not listed in ceph df).

You mentioned the DB vs. main devices ratio (1:11) - I'm not separating the DB from 
the device - each device has its own RocksDB on it.

With regards
Jan Pekar

On 11/09/2020 14.36, Igor Fedotov wrote:


Hi Jan,

most likely this is a known issue with slow and ineffective pool removal 
procedure in Ceph.

I gave a presentation on the topic at yesterday's weekly performance meeting; 
presumably a recording will be available in a couple of days.

An additional accompanying issue not covered during this meeting is RocksDB's misbehavior after (or during) such massive removals. At some 
point it starts to slow down the handling of read operations (e.g. collection listing), which results in OSD suicide timeouts - exactly what is 
observed in your case. There were multiple discussions on this issue in this mailing list too. In short, the current workaround is to 
perform manual DB compaction using ceph-kvstore-tool. Pool removal will most likely proceed, hence one might face similar assertions 
after a while, so there might be a need for multiple "compaction-restart" iterations until the pool is finally removed.
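(For the archives, the manual compaction mentioned above looks roughly like this for a stopped BlueStore OSD; osd.42 is just the example from the log:)

```
systemctl stop ceph-osd@42
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-42 compact
systemctl start ceph-osd@42
```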



And yet another potential issue (or at least an additional factor) with your setup is a pretty high DB vs. main devices ratio (1:11). 
Deletion procedures from multiple OSDs result in a pretty high load on the DB volume, which becomes overburdened...



Thanks,

Igor

On 9/11/2020 3:00 PM, Jan Pekař - Imatic wrote:

Hi all,

I have built a testing cluster with 4 hosts, 1 SSD and 11 HDDs on each host.
Running ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) 
nautilus (stable) on Ubuntu.

Because we want to store small objects, I set bluestore_min_alloc_size to 8192 
(this may be important in this case).

I filled it through the RADOS gateway with approximately a billion small objects. After the tests I changed min_alloc_size back and deleted the RADOS pools 
(to empty the whole cluster) and was waiting until the cluster deleted the data from the OSDs, but that destabilized the cluster. I never reached 
HEALTH_OK. OSDs were killed in random order. I can start them back up but they will again drop out of the cluster with..


```

   -18> 2020-09-05 22:11:19.430 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -17> 2020-09-05 22:11:19.430 7f7a3ee40700  5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 
1644167168 kv_used: 1644135504 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
   -16> 2020-09-05 22:11:20.434 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064941056 unmapped: 8126464 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -15> 2020-09-05 22:11:21.434 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -14> 2020-09-05 22:11:22.258 7f7a2b81f700  5 osd.42 103257 heartbeat osd_stat(store_statfs(0x1ce1829/0x2d08c/0x1d18000, 
data 0x23143355/0x974a, compress 0x0/0x0/0x0, omap 0x1f11e, meta 0x2d08a0ee2), peers 
[3,4,6,7,8,11,12,13,14,16,17,18,19,21,23,24,25,27,28,29,31,32,33,34,41,43] op hist [])
   -13> 2020-09-05 22:11:22.438 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -12> 2020-09-05 22:11:23.442 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -11> 2020-09-05 22:11:24.442 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064285696 unmapped: 8781824 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -10> 2020-09-05 22:11:24.442 7f7a3ee40700  5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 
1644167168 kv_used: 1644119840 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
    -9> 2020-09-05 22:11:24.442 7f7a2e024700  0 bluestore(/var/lib/ceph/osd/ceph-42) log_latency_fn slow operation observed for 
_collection_list, latency = 151.113s, lat = 2m cid =5.47_head start #5:e2000# end #MAX# max 2147483647

    -8> 2020-09-05 22:11:24.446 7f7a2e024700  1 heartbeat_map reset_timeout 
'OSD::osd_op_tp thread 0x7f7a2e024700' had timed out after 15
    -7> 2020-09-05 22:11:24.446 7f7a2e024700  1 heartbeat_map reset_timeout 'OSD::osd_op_t

[ceph-users] Problem unusable after deleting pool with billion objects

2020-09-11 Thread Jan Pekař - Imatic

Hi all,

I have built a testing cluster with 4 hosts, 1 SSD and 11 HDDs on each host.
Running ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) 
nautilus (stable) on Ubuntu.

Because we want to store small objects, I set bluestore_min_alloc_size to 8192 
(this may be important in this case).
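(For context, this value is only applied when an OSD is created (mkfs), so it has to be set before deploying the OSDs; a sketch of setting it centrally:)

```
# Affects newly created OSDs only; existing OSDs keep the value they were built with
ceph config set osd bluestore_min_alloc_size 8192
```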

I filled it through the RADOS gateway with approximately a billion small objects. After the tests I changed min_alloc_size back and deleted the RADOS pools 
(to empty the whole cluster) and was waiting until the cluster deleted the data from the OSDs, but that destabilized the cluster. I never reached 
HEALTH_OK. OSDs were killed in random order. I can start them back up but they will again drop out of the cluster with..


```

   -18> 2020-09-05 22:11:19.430 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -17> 2020-09-05 22:11:19.430 7f7a3ee40700  5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 
1644167168 kv_used: 1644135504 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
   -16> 2020-09-05 22:11:20.434 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064941056 unmapped: 8126464 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -15> 2020-09-05 22:11:21.434 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -14> 2020-09-05 22:11:22.258 7f7a2b81f700  5 osd.42 103257 heartbeat osd_stat(store_statfs(0x1ce1829/0x2d08c/0x1d18000, data 
0x23143355/0x974a, compress 0x0/0x0/0x0, omap 0x1f11e, meta 0x2d08a0ee2), peers 
[3,4,6,7,8,11,12,13,14,16,17,18,19,21,23,24,25,27,28,29,31,32,33,34,41,43] op hist [])
   -13> 2020-09-05 22:11:22.438 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -12> 2020-09-05 22:11:23.442 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -11> 2020-09-05 22:11:24.442 7f7a3ee40700  5 prioritycache tune_memory target: 3221225472 mapped: 2064285696 unmapped: 8781824 heap: 
2073067520 old mem: 1932735282 new mem: 1932735282
   -10> 2020-09-05 22:11:24.442 7f7a3ee40700  5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 
1644167168 kv_used: 1644119840 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
    -9> 2020-09-05 22:11:24.442 7f7a2e024700  0 bluestore(/var/lib/ceph/osd/ceph-42) log_latency_fn slow operation observed for 
_collection_list, latency = 151.113s, lat = 2m cid =5.47_head start #5:e2000# end #MAX# max 2147483647

    -8> 2020-09-05 22:11:24.446 7f7a2e024700  1 heartbeat_map reset_timeout 
'OSD::osd_op_tp thread 0x7f7a2e024700' had timed out after 15
    -7> 2020-09-05 22:11:24.446 7f7a2e024700  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f7a2e024700' had suicide timed out 
after 150

    -6> 2020-09-05 22:11:24.446 7f7a4c2a4700 10 monclient: get_auth_request con 
0x555b15d07680 auth_method 0
    -5> 2020-09-05 22:11:24.446 7f7a3c494700  2 osd.42 103257 ms_handle_reset 
con 0x555b15963600 session 0x555a9f9d6d00
    -4> 2020-09-05 22:11:24.446 7f7a3c494700  2 osd.42 103257 ms_handle_reset 
con 0x555b15961b00 session 0x555a9f9d7980
    -3> 2020-09-05 22:11:24.446 7f7a3c494700  2 osd.42 103257 ms_handle_reset 
con 0x555b15963a80 session 0x555a9f9d6a80
    -2> 2020-09-05 22:11:24.446 7f7a3c494700  2 osd.42 103257 ms_handle_reset 
con 0x555b15960480 session 0x555a9f9d6f80
    -1> 2020-09-05 22:11:24.446 7f7a3c494700  3 osd.42 103257 handle_osd_map 
epochs [103258,103259], i have 103257, src has [83902,103259]
 0> 2020-09-05 22:11:24.450 7f7a2e024700 -1 *** Caught signal (Aborted) **
```

I have approx 12 OSD's down with this error.

I decided to wipe the problematic OSDs, so I cannot debug it further, but I'm curious what I did wrong (deleting a pool with a lot of small data?) and what to 
do next time.


I did that before, but not with a billion objects and without the 
bluestore_min_alloc_size change, and it worked without problems.

With regards
Jan Pekar

--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem with OSD::osd_op_tp thread had timed out and other connected issues

2020-07-21 Thread Jan Pekař - Imatic

Hi Ben,

we are not using EC pool on that cluster.

The OSD out behavior almost stopped when we solved the memory issues (less memory 
allocated to the OSDs).
We are not working on that cluster anymore, so we have no other info about 
that problem.
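(One knob commonly used for this with BlueStore is osd_memory_target; a sketch with an example value, not necessarily what we used:)

```
# Example only: cap each OSD's memory target at ~3 GiB
ceph config set osd osd_memory_target 3221225472
```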

Jan

On 20/07/2020 07.59, Benoît Knecht wrote:

Hi Jan,

Jan Pekař wrote:

Also I'm concerned that this OSD restart caused data degradation and recovery 
- the cluster should be clean immediately after the OSD comes up, since no 
client was uploading/modifying data during my tests.

We're experiencing the same thing on our 14.2.10 cluster. After marking an OSD 
out, if it's briefly marked down (due to the missed heartbeats or because the 
daemon was manually restarted) the PGs that were still mapped on it disappear 
all at once, and we get degraded objects as a result.

In our case, those PGs belong to an EC pool, and we use the PG balancer in 
upmap mode, so we have a few upmapped PGs on that OSD. Is that the case for you 
too?

We're going to run some tests to try and better understand what's going on 
there, but we welcome any feedback in the meantime.

Cheers,

--
Ben


--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem with OSD::osd_op_tp thread had timed out and other connected issues

2020-03-21 Thread Jan Pekař - Imatic
I understand, so I expected slow requests (like "X slow requests are blocked > 32 sec"), but I was not expecting that heartbeats would be missed or 
that OSDs would be restarted.


Maybe this "hard recovery" was not tested enough.

Also I'm concerned that this OSD restart caused data degradation and recovery - the cluster should be clean immediately after the OSD comes up, since no 
client was uploading/modifying data during my tests.


With regards
Jan Pekar

On 21/03/2020 15.56, Anthony D'Atri wrote:

This is an expensive operation.  You want to slow it down, not burden the OSDs.
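(Concretely, "slowing it down" would mean roughly the opposite of the settings quoted below - keep the misplaced ratio small and backfill throttled; example values only:)

```
ceph config set mgr target_max_misplaced_ratio 0.05
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-sleep-hdd 0.1'
```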


On Mar 21, 2020, at 5:46 AM, Jan Pekař - Imatic  wrote:

Each node has 64GB RAM so it should be enough (12 OSD's = 48GB used).


On 21/03/2020 13.14, XuYun wrote:
Bluestore requires more than 4G memory per OSD, do you have enough memory?


On 21 March 2020 at 20:09, Jan Pekař - Imatic wrote:

Hello,

I have ceph cluster version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) 
nautilus (stable)

4 nodes - each node 11 HDD, 1 SSD, 10Gbit network

Cluster was empty, fresh install. We filled cluster with data (small blocks) 
using RGW.

Cluster is now used for testing so no client was using it during my admin 
operations mentioned below

After a while (7TB of data / 40M objects uploaded) we decided to increase 
pg_num from 128 to 256 to better spread the data, and to speed up this operation 
I've set

  ceph config set mgr target_max_misplaced_ratio 1

so that the whole cluster rebalances as quickly as it can.

I have 3 issues/questions below:

1)

I noticed that the manual increase from 128 to 256 caused approximately 6 OSDs to 
restart with the following logged:

heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide 
timed out after 150

After a while the OSDs were back, so I continued with my tests.

My question - was increasing the number of PGs with the maximum target_max_misplaced_ratio 
too much for those OSDs? Is it not recommended to do it this way? I had no 
problem with this increase before, but the configuration of the cluster was slightly 
different and it was the Luminous version.

2)

The rebuild was still slow, so I increased the number of backfills

  ceph tell osd.*  injectargs "--osd-max-backfills 10"

and reduced recovery sleep time

  ceph tell osd.*  injectargs "--osd-recovery-sleep-hdd 0.01"

and after a few hours I noticed that some of my OSDs were restarted during 
recovery; in the log I can see

...

|2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15 2020-03-21 
06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 
0x7fe1da154700' had timed out after 15 2020-03-21 06:41:36.780 7fe1da154700 1 
heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed 
out after 15 2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log 
[WRN] : Monitor daemon marked osd.7 down, but it is still running 2020-03-21 
06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [DBG] : map e3574 wrongly 
marked me down at e3573 2020-03-21 06:41:36.888 7fe1e7769700 1 osd.7 3574 
start_waiting_for_healthy |

I observed the network usage graphs and network utilization was low during recovery 
(10Gbit was not saturated).

So a lot of IOPS on an OSD also causes heartbeat operations to time out? I thought that 
the OSD is using threads and HDD timeouts would not influence heartbeats to other 
OSDs and the MON. It looks like that is not true.

3)

After the OSD was wrongly marked down I can see that the cluster has degraded objects. 
There were no degraded objects before that.

  Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs 
degraded, 8 pgs undersized

Does it mean that this OSD disconnection causes degraded data? How is that possible, 
when no OSD was lost? The data should be on that OSD, and after peering everything 
should be OK. With Luminous I had no problem - after the OSD came up, degraded objects 
were recovered/found within a few seconds and the cluster was healthy within seconds.

Thank you very much for any additional info. I can perform additional tests you 
recommend because the cluster is used for testing purposes now.

With regards
Jan Pekar

--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
===

[ceph-users] Re: Problem with OSD::osd_op_tp thread had timed out and other connected issues

2020-03-21 Thread Jan Pekař - Imatic

Each node has 64GB RAM so it should be enough (12 OSD's = 48GB used).

On 21/03/2020 13.14, XuYun wrote:

Bluestore requires more than 4G memory per OSD, do you have enough memory?


On 21 March 2020 at 20:09, Jan Pekař - Imatic wrote:

Hello,

I have ceph cluster version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) 
nautilus (stable)

4 nodes - each node 11 HDD, 1 SSD, 10Gbit network

Cluster was empty, fresh install. We filled cluster with data (small blocks) 
using RGW.

Cluster is now used for testing so no client was using it during my admin 
operations mentioned below

After a while (7TB of data / 40M objects uploaded) we decided to increase 
pg_num from 128 to 256 to better spread the data, and to speed up this operation 
I've set

  ceph config set mgr target_max_misplaced_ratio 1

so that the whole cluster rebalances as quickly as it can.

I have 3 issues/questions below:

1)

I noticed that the manual increase from 128 to 256 caused approximately 6 OSDs to 
restart with the following logged:

heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide 
timed out after 150

After a while the OSDs were back, so I continued with my tests.

My question - was increasing the number of PGs with the maximum target_max_misplaced_ratio 
too much for those OSDs? Is it not recommended to do it this way? I had no 
problem with this increase before, but the configuration of the cluster was slightly 
different and it was the Luminous version.

2)

The rebuild was still slow, so I increased the number of backfills

  ceph tell osd.*  injectargs "--osd-max-backfills 10"

and reduced recovery sleep time

  ceph tell osd.*  injectargs "--osd-recovery-sleep-hdd 0.01"

and after a few hours I noticed that some of my OSDs were restarted during 
recovery; in the log I can see

...

|2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15 2020-03-21 
06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 
0x7fe1da154700' had timed out after 15 2020-03-21 06:41:36.780 7fe1da154700 1 
heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed 
out after 15 2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log 
[WRN] : Monitor daemon marked osd.7 down, but it is still running 2020-03-21 
06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [DBG] : map e3574 wrongly 
marked me down at e3573 2020-03-21 06:41:36.888 7fe1e7769700 1 osd.7 3574 
start_waiting_for_healthy |

I observed the network usage graphs and network utilization was low during recovery 
(10Gbit was not saturated).

So a lot of IOPS on an OSD also causes heartbeat operations to time out? I thought that 
the OSD is using threads and HDD timeouts would not influence heartbeats to other 
OSDs and the MON. It looks like that is not true.

3)

After the OSD was wrongly marked down I can see that the cluster has degraded objects. 
There were no degraded objects before that.

  Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs 
degraded, 8 pgs undersized

Does it mean that this OSD disconnection causes degraded data? How is that possible, 
when no OSD was lost? The data should be on that OSD, and after peering everything 
should be OK. With Luminous I had no problem - after the OSD came up, degraded objects 
were recovered/found within a few seconds and the cluster was healthy within seconds.

Thank you very much for any additional info. I can perform additional tests you 
recommend because the cluster is used for testing purposes now.

With regards
Jan Pekar

--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Problem with OSD::osd_op_tp thread had timed out and other connected issues

2020-03-21 Thread Jan Pekař - Imatic

Hello,

I have ceph cluster version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) 
nautilus (stable)

4 nodes - each node 11 HDD, 1 SSD, 10Gbit network

Cluster was empty, fresh install. We filled cluster with data (small blocks) 
using RGW.

Cluster is now used for testing so no client was using it during my admin 
operations mentioned below

After a while (7TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to better spread the data, and to speed up 
this operation I've set


 ceph config set mgr target_max_misplaced_ratio 1

so that the whole cluster rebalances as quickly as it can.

I have 3 issues/questions below:

1)

I noticed that the manual increase from 128 to 256 caused approximately 6 OSDs to 
restart with the following logged:

heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide 
timed out after 150

After a while the OSDs were back, so I continued with my tests.

My question - was increasing the number of PGs with the maximum target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it 
this way? I had no problem with this increase before, but the configuration of the cluster was slightly different and it was the Luminous version.


2)

The rebuild was still slow, so I increased the number of backfills

 ceph tell osd.*  injectargs "--osd-max-backfills 10"

and reduced recovery sleep time

 ceph tell osd.*  injectargs "--osd-recovery-sleep-hdd 0.01"

and after a few hours I noticed that some of my OSDs were restarted during 
recovery; in the log I can see

...

|2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15 2020-03-21 
06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15 2020-03-21 06:41:36.780 
7fe1da154700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15 2020-03-21 06:41:36.888 
7fe1e7769700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running 2020-03-21 06:41:36.888 
7fe1e7769700 0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573 2020-03-21 06:41:36.888 7fe1e7769700 1 osd.7 3574 
start_waiting_for_healthy |


I observed the network usage graphs and network utilization was low during recovery 
(10Gbit was not saturated).

So a lot of IOPS on an OSD also causes heartbeat operations to time out? I thought that the OSD is using threads and HDD timeouts would not influence 
heartbeats to other OSDs and the MON. It looks like that is not true.


3)

After the OSD was wrongly marked down I can see that the cluster has degraded objects. 
There were no degraded objects before that.

 Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs 
degraded, 8 pgs undersized

Does it mean that this OSD disconnection causes degraded data? How is that possible, when no OSD was lost? The data should be on that OSD, and after 
peering everything should be OK. With Luminous I had no problem - after the OSD came up, degraded objects were recovered/found within a few seconds and 
the cluster was healthy within seconds.


Thank you very much for any additional info. I can perform additional tests you 
recommend because the cluster is used for testing purposes now.

With regards
Jan Pekar

--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem with OSD - stuck in CPU loop after rbd snapshot mount

2020-02-05 Thread Jan Pekař - Imatic
7;3048390 mlcod 60042'3048390 active+clean] do_osd_op 2:311c8802:::rbd_data.09df7e2ae8944a.:7b5 [copy-get max 8388608]
2020-02-05 22:46:53.787483 7ff1c3749700 10 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean] do_osd_op  copy-get max 8388608
2020-02-05 22:46:53.787515 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean]  got attrs
2020-02-05 22:46:53.807853 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean]  got data
2020-02-05 22:46:53.808034 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean]  got reqids
2020-02-05 22:46:53.808051 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean]  cursor.is_complete=1 0 attrs 4194304 bytes 0 omap header bytes 0 omap data bytes in 0 
keys 0 reqids
2020-02-05 22:46:53.808085 7ff1c3749700 10 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean]  dropping ondisk_read_lock
2020-02-05 22:46:53.808107 7ff1c3749700 15 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean] do_osd_op_effects osd.9 con 0x56135d9e5000
2020-02-05 22:46:53.808132 7ff1c3749700 15 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean] log_op_stats osd_op(osd.9.60026:836317 2.c 
2:311c8802:::rbd_data.09df7e2ae8944a.:7b5 [copy-get max 8388608] snapc 0=[] 
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e60042) v8 inb 0 outb 4194423 lat 19.627387
2020-02-05 22:46:53.808171 7ff1c3749700 15 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] 
local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 
lcod 60042'3048390 mlcod 60042'3048390 active+clean] publish_stats_to_osd 60042:3424975

2020-02-05 22:46:53.808292 7ff1c3749700 10 osd.1 60042 dequeue_op 
0x561362c80bc0 finish



On 03/02/2020 15.48, Jan Pekař - Imatic wrote:

Hi all,

I have a small cluster and yesterday I tried to mount an older RBD snapshot to recover data. (I have approx. 230 daily snapshots of one RBD 
image on my small Ceph cluster.)


After I did the mount and an ls operation, the cluster was stuck and I noticed that 2 of my OSDs were eating CPU and growing in memory usage (more than 4x 
more memory than usual). The cluster was unusable after that, and stopping the rbd mount operation, restarting the OSDs etc. didn't help.


The strange thing is that it affects only 2 of my OSDs - one is on NVMe and has the cache PGs mapped, the other one is a rotational HDD. IOPS are normal, 
only CPU and memory usage are unusual.


I noticed hundreds of thousands of slow requests (on an idle cluster), so I started to debug OSD operations. You can find the OSD 8 log below. OSD 9 
contains the NVMe cached PGs and there is nothing unusual in its standard log. In the middle of the log I raised the debug level to 10.


It looks to me like it is trying to do one operation with one snapshotted RBD object in PG 2.b 
(rbd_data.9e87fa74b0dc51.0001d1a9:7e3) in a loop.


Is th

[ceph-users] Problem with OSD - stuck in CPU loop after rbd snapshot mount

2020-02-03 Thread Jan Pekař - Imatic

Hi all,

I have a small cluster and yesterday I tried to mount an older RBD snapshot to recover data. (I have approx. 230 daily snapshots of one RBD image 
on my small Ceph cluster.)


After I did the mount and an ls operation, the cluster was stuck and I noticed that 2 of my OSDs were eating CPU and growing in memory usage (more than 4x more 
memory than usual). The cluster was unusable after that, and stopping the rbd mount operation, restarting the OSDs etc. didn't help.


The strange thing is that it affects only 2 of my OSDs - one is on NVMe and has the cache PGs mapped, the other one is a rotational HDD. IOPS are normal, 
only CPU and memory usage are unusual.


I noticed hundreds of thousands of slow requests (on an idle cluster), so I started to debug OSD operations. You can find the OSD 8 log below. OSD 9 
contains the NVMe cached PGs and there is nothing unusual in its standard log. In the middle of the log I raised the debug level to 10.
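(Raising the debug level on a live OSD can be done over the admin socket or via tell; osd.8 here matches the log below, a sketch:)

```
# On the OSD host, via the admin socket
ceph daemon osd.8 config set debug_osd 10/10
# or remotely
ceph tell osd.8 injectargs '--debug-osd 10/10'
```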


It looks to me like it is trying to do one operation with one snapshotted RBD object in PG 2.b 
(rbd_data.9e87fa74b0dc51.0001d1a9:7e3) in a loop.


Is there any way to stop that? What is causing the problem? Can I somehow flush the OSDs' "journals" or "queues" to stop the current operation 
and make the cluster usable again? Is there any way I can access the old snapshot data?
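(On the last question - reading an old snapshot without writing to it is usually done by mapping and mounting it read-only; a sketch with placeholder pool/image/snapshot names, where 'noload' applies to ext4 specifically. This does not address the stuck OSDs themselves:)

```
rbd map mypool/myimage@mysnap --read-only   # map the snapshot as a read-only block device
mount -o ro,noload /dev/rbd0 /mnt/snap      # mount read-only; 'noload' skips ext4 journal replay
```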


The problem started on Luminous 12.2.12 and upgrading to 12.2.13 didn't help.

Thank you
Jan Pekar

2020-02-03 15:09:22.470277 7f51f6af7700  0 log_channel(cluster) log [WRN] : 30966 slow requests, 5 included below; oldest blocked for > 
37.505482 secs
2020-02-03 15:09:22.470336 7f51f6af7700  0 log_channel(cluster) log [WRN] : slow request 37.503404 seconds old, received at 2020-02-03 
15:08:44.956422: osd_op(osd.9.59923:2670991 2.b 2.162d8f8b (undecoded) 
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470360 7f51f6af7700  0 log_channel(cluster) log [WRN] : slow request 37.500695 seconds old, received at 2020-02-03 
15:08:44.959132: osd_op(osd.9.59923:2671023 2.b 2.162d8f8b (undecoded) 
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470366 7f51f6af7700  0 log_channel(cluster) log [WRN] : slow request 37.498013 seconds old, received at 2020-02-03 
15:08:44.961814: osd_op(osd.9.59923:2671055 2.b 2.162d8f8b (undecoded) 
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470371 7f51f6af7700  0 log_channel(cluster) log [WRN] : slow request 37.494836 seconds old, received at 2020-02-03 
15:08:44.964990: osd_op(osd.9.59923:2671087 2.b 2.162d8f8b (undecoded) 
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470376 7f51f6af7700  0 log_channel(cluster) log [WRN] : slow request 37.491950 seconds old, received at 2020-02-03 
15:08:44.967877: osd_op(osd.9.59923:2671119 2.b 2.162d8f8b (undecoded) 
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg

...
2020-02-03 15:09:35.537259 7f51ddac5700  0 osd.8 59939 do_command r=0
2020-02-03 15:09:35.588753 7f51ddac5700  0 osd.8 59939 do_command r=0 Set 
debug_osd to 10/10
2020-02-03 15:09:35.588814 7f51ddac5700  0 log_channel(cluster) log [INF] : Set 
debug_osd to 10/10
2020-02-03 15:09:35.597667 7f51f0945700  2 osd.8 59939 ms_handle_reset con 
0x5647736ac800 session 0x564785519e00
2020-02-03 15:09:35.599056 7f51e0acb700 10 osd.8 pg_epoch: 59939 pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] 
local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 crt=59908'4218227 
lcod 0'0 mlcod 0'0 active+clean]  dropping ondisk_read_lock

2020-02-03 15:09:35.599209 7f51e0acb700 10 osd.8 59939 dequeue_op 
0x5647873d5800 finish
2020-02-03 15:09:35.599250 7f51e0acb700 10 osd.8 59939 dequeue_op 0x5647873d59c0 prio 63 cost 29 latency 22.224501 
osd_op(osd.9.59923:2779946 2.b 2.162d8f8b (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected 
e59939) v8 pg pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 
59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 crt=59908'4218227 lcod 0'0 mlcod 0'0 active+clean]
2020-02-03 15:09:35.599293 7f51e0acb700 10 osd.8 pg_epoch: 59939 pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] 
local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 crt=59908'4218227 
lcod 0'0 mlcod 0'0 active+clean] _handle_message: 0x5647873d59c0
2020-02-03 15:09:35.599347 7f51e0acb700 10 osd.8 pg_epoch: 59939 pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] 
local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 cr