[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck
Hi,

the same on my side - destroyed and replaced by bluestore.

JP

On 28/02/2023 14.17, Mark Schouten wrote:

Hi, I just destroyed the filestore osd and added it as a bluestore osd. Worked fine.

— Mark Schouten, CTO, Tuxis B.V.
m...@tuxis.nl / +31 318 200208

------ Original Message ------
From: "Jan Pekař - Imatic"
To: m...@tuxis.nl; ceph-users@ceph.io
Date: 2/25/2023 4:14:54 PM
Subject: Re: [ceph-users] OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

Hi,

I tried the upgrade to Pacific now. Same result - the OSD is not starting, stuck at 1500 keys.

JP

On 23/02/2023 00.16, Jan Pekař - Imatic wrote:

Hi,

I enabled debug logging and it is the same - 1500 keys is where it ends. I also enabled debug_filestore and got:

```
2023-02-23T00:02:34.876+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7920 already registered
2023-02-23T00:02:34.876+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7920 seq 148188829 osr(meta) 4859 bytes (queue has 49 ops and 238167 bytes)
2023-02-23T00:02:34.876+0100 7f8efc23ee00 10 snap_mapper.convert_legacy converted 1470 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00 5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 0x55fb27780540 osr(meta)
2023-02-23T00:02:34.880+0100 7f8efc23ee00 5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2345): (writeahead) 148188845 [Transaction(0x55fb299870e0)]
2023-02-23T00:02:34.880+0100 7f8efc23ee00 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb29c52d20 #-1:c0371625:::snapmapper:0# (0x55fb29f9c9e0)
2023-02-23T00:02:34.880+0100 7f8efc23ee00 10 snap_mapper.convert_legacy converted 1500 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00 5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 0x55fb27780540 osr(meta)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7aa0 seq 148188830 osr(meta) [Transaction(0x55fb29986000)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7aa0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7aa0 seq 148188830 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7c20 seq 148188831 osr(meta) [Transaction(0x55fb29986120)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7c20 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7c20 seq 148188831 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7d40 seq 148188832 osr(meta) [Transaction(0x55fb29986240)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7d40 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7d40 seq 148188832 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7ec0 seq 148188833 osr(meta) [Transaction(0x55fb29986360)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7ec0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7ec0 seq 148188833 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921db60 seq 148188834 osr(meta) [Transaction(0x55fb29986480)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb2921db60 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921db60 seq 148188834 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921c8a0 seq 148188835 osr(meta) [Transaction(0x55fb299865a0)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb2921c8a0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921c8a0 seq 148188835 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52000 seq 148188836 osr(meta) [Transaction(0x55fb299866c0)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20
```
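For anyone hitting the same wall, the destroy-and-redeploy route that worked here can be sketched as below. This is a dry run that only prints the plan; nothing is executed. OSD id 0 and /dev/sdX are placeholders, and it assumes a ceph-volume based deployment - the real commands wipe the disk, so double-check each one before running it.

```shell
#!/bin/sh
# Dry-run sketch of replacing a stuck filestore OSD with a bluestore one,
# keeping the same OSD id. The plan is only printed, never executed.
plan_redeploy() {
    osd_id=$1
    dev=$2
    echo "ceph osd set noout"                                  # avoid rebalancing while the OSD is gone
    echo "systemctl stop ceph-osd@${osd_id}"                   # stop the stuck daemon
    echo "ceph osd destroy ${osd_id} --yes-i-really-mean-it"   # keep the id, drop auth/data
    echo "ceph-volume lvm zap ${dev} --destroy"                # wipe the old filestore device
    echo "ceph-volume lvm create --bluestore --data ${dev} --osd-id ${osd_id}"
    echo "ceph osd unset noout"                                # let backfill repopulate it
}

plan_redeploy 0 /dev/sdX
```

Backfill then rebuilds the OSD's contents from the surviving replicas, so make sure the cluster can tolerate the temporary redundancy loss before starting.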
[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck
Hi,

I tried the upgrade to Pacific now. Same result - the OSD is not starting, stuck at 1500 keys.

JP

On 23/02/2023 00.16, Jan Pekař - Imatic wrote:

Hi,

I enabled debug logging and it is the same - 1500 keys is where it ends. I also enabled debug_filestore and got:

```
2023-02-23T00:02:34.876+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7920 already registered
2023-02-23T00:02:34.876+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7920 seq 148188829 osr(meta) 4859 bytes (queue has 49 ops and 238167 bytes)
2023-02-23T00:02:34.876+0100 7f8efc23ee00 10 snap_mapper.convert_legacy converted 1470 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00 5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 0x55fb27780540 osr(meta)
2023-02-23T00:02:34.880+0100 7f8efc23ee00 5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2345): (writeahead) 148188845 [Transaction(0x55fb299870e0)]
2023-02-23T00:02:34.880+0100 7f8efc23ee00 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb29c52d20 #-1:c0371625:::snapmapper:0# (0x55fb29f9c9e0)
2023-02-23T00:02:34.880+0100 7f8efc23ee00 10 snap_mapper.convert_legacy converted 1500 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00 5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 0x55fb27780540 osr(meta)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7aa0 seq 148188830 osr(meta) [Transaction(0x55fb29986000)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7aa0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7aa0 seq 148188830 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7c20 seq 148188831 osr(meta) [Transaction(0x55fb29986120)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7c20 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7c20 seq 148188831 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7d40 seq 148188832 osr(meta) [Transaction(0x55fb29986240)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7d40 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7d40 seq 148188832 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7ec0 seq 148188833 osr(meta) [Transaction(0x55fb29986360)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb297e7ec0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7ec0 seq 148188833 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921db60 seq 148188834 osr(meta) [Transaction(0x55fb29986480)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb2921db60 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921db60 seq 148188834 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921c8a0 seq 148188835 osr(meta) [Transaction(0x55fb299865a0)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb2921c8a0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921c8a0 seq 148188835 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52000 seq 148188836 osr(meta) [Transaction(0x55fb299866c0)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb29c52000 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb29c52000 seq 148188836 osr(meta) 4859 bytes (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700 5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52180 seq 148188837 osr(meta) [Transaction(0x55fb299867e0)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr
```
[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck
And nothing happens. I'm now stuck with a cluster that has 4 OSDs on Octopus, 10 on Nautilus, and one down. A hint on how to work around this is welcome :)

— Mark Schouten, CTO, Tuxis B.V.
m...@tuxis.nl / +31 318 200208

------ Original Message ------
From: "Jan Pekař - Imatic"
To: ceph-users@ceph.io
Date: 1/12/2023 5:53:02 PM
Subject: [ceph-users] OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

Hi all,

I have a problem upgrading an OSD from Nautilus to Octopus. Upgrading the mons and mgr went fine, but the first OSD got stuck at

```
2023-01-12T09:25:54.122+0100 7f49ff3eae00 1 osd.0 126556 init upgrade snap_mapper (first start as octopus)
```

and there was no activity after that for more than 48 hours - no disk activity at all. I restarted the OSD many times and nothing changed. It is an old filestore OSD on an XFS filesystem.

Is the upgrade to snap mapper 2 reliable? What is the OSD waiting for? Can I start the OSD without the upgrade and get the cluster healthy with the old snap structure? Or should I skip the Octopus upgrade and go directly to Pacific (is some bug backport missing?)

Thank you for any help; I'm sending some logs below.

The log shows:

```
2023-01-09T19:12:49.471+0100 7f41f60f1e00 0 ceph version 15.2.17 (694d03a6f6c6e9f814446223549caf9a9f60dba0) octopus (stable), process ceph-osd, pid 2566563
2023-01-09T19:12:49.471+0100 7f41f60f1e00 0 pidfile_write: ignore empty --pid-file
2023-01-09T19:12:49.499+0100 7f41f60f1e00 -1 missing 'type' file, inferring filestore from current/ dir
2023-01-09T19:12:49.531+0100 7f41f60f1e00 0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2023-01-09T19:12:49.531+0100 7f41f60f1e00 -1 Falling back to public interface
2023-01-09T19:12:49.871+0100 7f41f60f1e00 0 load: jerasure load: lrc load: isa
2023-01-09T19:12:49.875+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:0.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:1.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:2.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.927+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 'filestore splice' config option
2023-01-09T19:12:49.983+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2023-01-09T19:12:49.983+0100 7f41f60f1e00 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2023-01-09T19:12:50.015+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2023-01-09T19:12:50.079+0100 7f41f60f1e00 1 leveldb: Recovering log #165531
2023-01-09T19:12:50.083+0100 7f41f60f1e00 1 leveldb: Level-0 table #165533: started
2023-01-09T19:12:50.235+0100 7f41f60f1e00 1 leveldb: Level-0 table #165533: 1598 bytes OK
2023-01-09T19:12:50.583+0100 7f41f60f1e00 1 leveldb: Delete type=0 #165531
2023-01-09T19:12:50.615+0100 7f41f60f1e00 1 leveldb: Delete type=3 #165529
2023-01-09T19:12:51.339+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) mount(1861): enabling WRITEAHEAD journal mode: checkpoint is not enabled
2023-01-09T19:12:51.379+0100 7f41f60f1e00 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 bytes, directio = 1, aio = 1
2023-01-09T19:12:51.931+0100 7f41f60f1e00 -1 journal do_read_entry(243675136): bad header magic
2023-01-09T19:12:51.939+0100 7f41f60f1e00 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 bytes, directio = 1, aio = 1
2023-01-09T19:12:51.943+0100 7f41f60f1e00 1 filestore(/var/lib/ceph/osd/ceph-0) upgrade(1466)
2023-01-09T19:12:52.015+0100 7f41f60f1e00 1 osd.0 126556 init upgrade snap_mapper (first start as octopus)
```

lsof shows:

```
COMMAND PID USER FD TYPE DEVICE SI
```
[ceph-users] OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck
Hi all,

I have a problem upgrading an OSD from Nautilus to Octopus. Upgrading the mons and mgr went fine, but the first OSD got stuck at

```
2023-01-12T09:25:54.122+0100 7f49ff3eae00 1 osd.0 126556 init upgrade snap_mapper (first start as octopus)
```

and there was no activity after that for more than 48 hours - no disk activity at all. I restarted the OSD many times and nothing changed. It is an old filestore OSD on an XFS filesystem.

Is the upgrade to snap mapper 2 reliable? What is the OSD waiting for? Can I start the OSD without the upgrade and get the cluster healthy with the old snap structure? Or should I skip the Octopus upgrade and go directly to Pacific (is some bug backport missing?)

Thank you for any help; I'm sending some logs below.

The log shows:

```
2023-01-09T19:12:49.471+0100 7f41f60f1e00 0 ceph version 15.2.17 (694d03a6f6c6e9f814446223549caf9a9f60dba0) octopus (stable), process ceph-osd, pid 2566563
2023-01-09T19:12:49.471+0100 7f41f60f1e00 0 pidfile_write: ignore empty --pid-file
2023-01-09T19:12:49.499+0100 7f41f60f1e00 -1 missing 'type' file, inferring filestore from current/ dir
2023-01-09T19:12:49.531+0100 7f41f60f1e00 0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2023-01-09T19:12:49.531+0100 7f41f60f1e00 -1 Falling back to public interface
2023-01-09T19:12:49.871+0100 7f41f60f1e00 0 load: jerasure load: lrc load: isa
2023-01-09T19:12:49.875+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:0.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:1.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:2.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 osd.0:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2023-01-09T19:12:49.883+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2023-01-09T19:12:49.927+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2023-01-09T19:12:49.927+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 'filestore splice' config option
2023-01-09T19:12:49.983+0100 7f41f60f1e00 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2023-01-09T19:12:49.983+0100 7f41f60f1e00 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2023-01-09T19:12:50.015+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2023-01-09T19:12:50.079+0100 7f41f60f1e00 1 leveldb: Recovering log #165531
2023-01-09T19:12:50.083+0100 7f41f60f1e00 1 leveldb: Level-0 table #165533: started
2023-01-09T19:12:50.235+0100 7f41f60f1e00 1 leveldb: Level-0 table #165533: 1598 bytes OK
2023-01-09T19:12:50.583+0100 7f41f60f1e00 1 leveldb: Delete type=0 #165531
2023-01-09T19:12:50.615+0100 7f41f60f1e00 1 leveldb: Delete type=3 #165529
2023-01-09T19:12:51.339+0100 7f41f60f1e00 0 filestore(/var/lib/ceph/osd/ceph-0) mount(1861): enabling WRITEAHEAD journal mode: checkpoint is not enabled
2023-01-09T19:12:51.379+0100 7f41f60f1e00 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 bytes, directio = 1, aio = 1
2023-01-09T19:12:51.931+0100 7f41f60f1e00 -1 journal do_read_entry(243675136): bad header magic
2023-01-09T19:12:51.939+0100 7f41f60f1e00 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 35: 2998927360 bytes, block size 4096 bytes, directio = 1, aio = 1
2023-01-09T19:12:51.943+0100 7f41f60f1e00 1 filestore(/var/lib/ceph/osd/ceph-0) upgrade(1466)
2023-01-09T19:12:52.015+0100 7f41f60f1e00 1 osd.0 126556 init upgrade snap_mapper (first start as octopus)
```

lsof shows:

```
COMMAND  PID    USER FD  TYPE DEVICE SIZE/OFF NODE     NAME
ceph-osd 225860 ceph cwd DIR  9,127  4096     2        /
ceph-osd 225860 ceph rtd DIR  9,127  4096     2        /
ceph-osd 225860 ceph txt REG  9,127  31762544 5021     /usr/bin/ceph-osd
ceph-osd 225860 ceph mem REG  8,70   2147237  68104224 /var/lib/ceph/osd/ceph-0/current/omap/165546.ldb
ceph-osd 225860 ceph mem REG  8,70   2147792  68104190 /var/lib/ceph/osd/ce
```
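Since the conversion thread logs nothing after the snap_mapper line, one way to see whether it is looping, blocked on I/O, or waiting on a lock is to look at the process itself. The sketch below is a dry run that only prints the inspection commands (the PID is a placeholder; strace and gdb must be installed, and gdb needs Ceph debug symbols to give readable backtraces):

```shell
#!/bin/sh
# Dry-run sketch: print commands for inspecting a hung ceph-osd process.
inspect_osd() {
    pid=$1
    echo "cat /proc/${pid}/status"                                   # State: R (busy loop) vs D (stuck in disk I/O)
    echo "strace -f -p ${pid} -e trace=read,write,pread64,pwrite64"  # is it issuing syscalls at all?
    echo "gdb -p ${pid} --batch -ex 'thread apply all bt'"           # where each thread is parked
}

inspect_osd 225860
```

A thread sitting in the same backtrace across repeated gdb snapshots would point at where the snap_mapper conversion is actually stuck.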
[ceph-users] Disable peering of some pool
Hi all,

we have a problem on our production cluster running Nautilus (14.2.22). The cluster is almost full, and a few months ago we noticed issues with slow peering - when we restart any OSD (or host) it takes hours to finish the peering process instead of minutes. We noticed that one pool contains 90k objects in 300GB per PG, so we decided to increase pg_num on that pool so that each individual PG peers quickly. During that change we ended up with PGs stuck inactive for hours, peering never finishing, and some OSDs went down with this error: https://tracker.ceph.com/issues/51168

We decided to restart all OSDs and wait, but the problem with slow peering persists.

Is there any way to get the cluster healthy? Or to disable peering of one pool, so that the other pools with RBD images peer and come online first, and only after that try to peer the big pool?

Thank you for your help, it is an urgent situation.

With regards
Jan Pekar

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
https://www.imatic.cz | +420326555326

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
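As far as I know there is no supported per-pool switch for peering, but the blast radius of a pg_num increase can be reduced by pausing data movement while the freshly split PGs peer, and by stepping pg_num up in small increments rather than one big jump. A dry-run sketch that only prints the commands (pool name and target are placeholders):

```shell
#!/bin/sh
# Dry-run sketch: raise pg_num with backfill/rebalance paused, so each
# step only has to peer a small batch of new PGs. Nothing is executed.
plan_split() {
    pool=$1
    target=$2
    echo "ceph osd set nobackfill"                  # peering proceeds; data movement waits
    echo "ceph osd set norebalance"
    echo "ceph osd pool set ${pool} pg_num ${target}"
    echo "ceph osd unset norebalance"               # once PGs are active+clean-ish, let data move
    echo "ceph osd unset nobackfill"
}

plan_split bigpool 4096
```

Repeating this with modest pg_num steps keeps the number of simultaneously peering PGs bounded, which matters most on a nearly full cluster.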
[ceph-users] MonSession vs TCP connection
Hi all,

I would like to "pair" a MonSession with its TCP connection to find the real process that is using that session. I need it to identify processes with old Ceph features. A MonSession looks like:

```
MonSession(client.84324148 [..IP...]:0/3096235764 is open allow *, features 0x27018fb86aa42ada (jewel))
```

What do client.NUMBER and 0/3096235764 mean? How can I resolve client.NUMBER, or that /NUMBER, to a particular TCP session? I have many processes on that server (on that IP) with different features.

Thank you

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
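As far as I understand it, client.NUMBER is the global_id the monitor assigned to that authenticated session, and the address has the form IP:PORT/NONCE - clients connect with port 0 and a random per-process nonce, so the nonce distinguishes processes behind the same IP but is not a TCP port. A small sketch that pulls both fields out of a session line (the IP in the sample is a made-up placeholder, not from the original message):

```shell
#!/bin/sh
# Extract global_id and nonce from a MonSession line with sed.
sess='MonSession(client.84324148 v1:10.0.0.5:0/3096235764 is open allow *, features 0x27018fb86aa42ada (jewel))'

# client.<global_id> - the id the monitor assigned to this session
global_id=$(printf '%s\n' "$sess" | sed -n 's/.*client\.\([0-9][0-9]*\).*/\1/p')
# <ip>:<port>/<nonce> - clients use port 0 plus a random per-process nonce
nonce=$(printf '%s\n' "$sess" | sed -n 's|.*/\([0-9][0-9]*\) is open.*|\1|p')

echo "global_id=${global_id} nonce=${nonce}"
```

Because the nonce never appears in the TCP 4-tuple, matching a session to a process generally means going to the client host and listing which processes hold connections to the monitors, e.g. `ss -tnp | grep -e ':6789' -e ':3300'`, then narrowing down from there.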
[ceph-users] Re: Problem unusable after deleting pool with bilion objects
Hi Igor,

thank you, I also think it is the problem you described. I have recreated the OSDs now and also noticed a strange warning:

HEALTH_WARN Degraded data redundancy: 106763/723 objects degraded (14766.667%)

Maybe there are some "phantom", zero-sized objects (OMAPs?) that the cluster is recovering but that I don't need (they are not listed in ceph df).

You mentioned the DB vs. main devices ratio (1:11) - I'm not separating the DB from the device; each device has its own RocksDB on it.

With regards
Jan Pekar

On 11/09/2020 14.36, Igor Fedotov wrote:

Hi Jan,

most likely this is a known issue with the slow and ineffective pool removal procedure in Ceph. I did a presentation on the topic at yesterday's weekly performance meeting; presumably a recording will be available in a couple of days.

An additional accompanying issue not covered during this meeting is RocksDB's misbehavior after (or during) such massive removals. At some point it starts to slow down the handling of read operations (e.g. collection listing), which results in OSD suicide timeouts - exactly what is observed in your case. There have been multiple discussions of this issue in this mailing list too. In short, the current workaround is to perform manual DB compaction using ceph-kvstore-tool. Pool removal will most likely proceed afterwards, so one might face similar assertions again after a while; hence there might be a need for multiple "compaction-restart" iterations until the pool is finally removed.

And yet another potential issue (or at least an additional factor) with your setup is a pretty high DB vs. main devices ratio (1:11). Deletion from multiple OSDs results in a pretty high load on the DB volume, which becomes overburdened...

Thanks,
Igor

On 9/11/2020 3:00 PM, Jan Pekař - Imatic wrote:

Hi all,

I have built a testing cluster with 4 hosts, 1 SSD and 11 HDDs on each host, running ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable) on Ubuntu. Because we want to store small objects, I set bluestore_min_alloc_size 8192 (it is maybe important in this case). I filled it through rados gw with approximately a billion small objects. After the tests I changed min_alloc_size back and deleted the rados pools (to empty the whole cluster) and waited until the cluster deleted the data from the OSDs, but that destabilized the cluster. I never reached health OK. OSDs were killed in random order. I can start them back, but they will again drop out of the cluster with:

```
-18> 2020-09-05 22:11:19.430 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-17> 2020-09-05 22:11:19.430 7f7a3ee40700 5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 1644167168 kv_used: 1644135504 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
-16> 2020-09-05 22:11:20.434 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064941056 unmapped: 8126464 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-15> 2020-09-05 22:11:21.434 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-14> 2020-09-05 22:11:22.258 7f7a2b81f700 5 osd.42 103257 heartbeat osd_stat(store_statfs(0x1ce1829/0x2d08c/0x1d18000, data 0x23143355/0x974a, compress 0x0/0x0/0x0, omap 0x1f11e, meta 0x2d08a0ee2), peers [3,4,6,7,8,11,12,13,14,16,17,18,19,21,23,24,25,27,28,29,31,32,33,34,41,43] op hist [])
-13> 2020-09-05 22:11:22.438 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-12> 2020-09-05 22:11:23.442 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-11> 2020-09-05 22:11:24.442 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064285696 unmapped: 8781824 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-10> 2020-09-05 22:11:24.442 7f7a3ee40700 5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 1644167168 kv_used: 1644119840 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
-9> 2020-09-05 22:11:24.442 7f7a2e024700 0 bluestore(/var/lib/ceph/osd/ceph-42) log_latency_fn slow operation observed for _collection_list, latency = 151.113s, lat = 2m cid =5.47_head start #5:e2000# end #MAX# max 2147483647
-8> 2020-09-05 22:11:24.446 7f7a2e024700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f7a2e024700' had timed out after 15
-7> 2020-09-05 22:11:24.446 7f7a2e024700 1 heartbeat_map reset_timeout 'OSD::osd_op_t
```
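Igor's "compaction-restart" workaround can be sketched as a dry run like this; the OSD id is a placeholder, and the OSD daemon must be stopped while ceph-kvstore-tool has the store open:

```shell
#!/bin/sh
# Dry-run sketch of offline RocksDB compaction for an OSD whose DB became
# slow after a massive pool deletion. Commands are printed, not executed.
plan_compact() {
    osd_id=$1
    echo "systemctl stop ceph-osd@${osd_id}"
    echo "ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-${osd_id} compact"
    echo "systemctl start ceph-osd@${osd_id}"
}

plan_compact 42
```

As noted in the thread, pool deletion resumes after restart, so the same OSD may hit the suicide timeout again and need another compaction pass until the removal finishes.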
[ceph-users] Problem unusable after deleting pool with bilion objects
Hi all,

I have built a testing cluster with 4 hosts, 1 SSD and 11 HDDs on each host, running ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable) on Ubuntu.

Because we want to store small objects, I set bluestore_min_alloc_size 8192 (it is maybe important in this case). I filled it through rados gw with approximately a billion small objects. After the tests I changed min_alloc_size back and deleted the rados pools (to empty the whole cluster) and waited until the cluster deleted the data from the OSDs, but that destabilized the cluster. I never reached health OK. OSDs were killed in random order. I can start them back, but they will again drop out of the cluster with:

```
-18> 2020-09-05 22:11:19.430 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-17> 2020-09-05 22:11:19.430 7f7a3ee40700 5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 1644167168 kv_used: 1644135504 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
-16> 2020-09-05 22:11:20.434 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064941056 unmapped: 8126464 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-15> 2020-09-05 22:11:21.434 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-14> 2020-09-05 22:11:22.258 7f7a2b81f700 5 osd.42 103257 heartbeat osd_stat(store_statfs(0x1ce1829/0x2d08c/0x1d18000, data 0x23143355/0x974a, compress 0x0/0x0/0x0, omap 0x1f11e, meta 0x2d08a0ee2), peers [3,4,6,7,8,11,12,13,14,16,17,18,19,21,23,24,25,27,28,29,31,32,33,34,41,43] op hist [])
-13> 2020-09-05 22:11:22.438 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-12> 2020-09-05 22:11:23.442 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-11> 2020-09-05 22:11:24.442 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064285696 unmapped: 8781824 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-10> 2020-09-05 22:11:24.442 7f7a3ee40700 5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 1644167168 kv_used: 1644119840 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
-9> 2020-09-05 22:11:24.442 7f7a2e024700 0 bluestore(/var/lib/ceph/osd/ceph-42) log_latency_fn slow operation observed for _collection_list, latency = 151.113s, lat = 2m cid =5.47_head start #5:e2000# end #MAX# max 2147483647
-8> 2020-09-05 22:11:24.446 7f7a2e024700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f7a2e024700' had timed out after 15
-7> 2020-09-05 22:11:24.446 7f7a2e024700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f7a2e024700' had suicide timed out after 150
-6> 2020-09-05 22:11:24.446 7f7a4c2a4700 10 monclient: get_auth_request con 0x555b15d07680 auth_method 0
-5> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15963600 session 0x555a9f9d6d00
-4> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15961b00 session 0x555a9f9d7980
-3> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15963a80 session 0x555a9f9d6a80
-2> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15960480 session 0x555a9f9d6f80
-1> 2020-09-05 22:11:24.446 7f7a3c494700 3 osd.42 103257 handle_osd_map epochs [103258,103259], i have 103257, src has [83902,103259]
0> 2020-09-05 22:11:24.450 7f7a2e024700 -1 *** Caught signal (Aborted) **
```

I have approximately 12 OSDs down with this error.

I decided to wipe the problematic OSDs, so I cannot debug it further, but I'm curious what I did wrong (deleting a pool with a lot of small data?) and what to do next time. I did the same thing before, but not with a billion objects and without the bluestore_min_alloc_size change, and it worked without problems.

With regards
Jan Pekar

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
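For next time, Nautilus has per-OSD delete throttles that slow the post-removal cleanup down instead of letting it saturate RocksDB; the option names below are from memory, so verify them against `ceph daemon osd.N config show` on your build before relying on them. A dry-run sketch that only prints the commands:

```shell
#!/bin/sh
# Dry-run sketch: throttle background deletes so removing a huge pool
# does not overwhelm RocksDB and trigger suicide timeouts.
plan_throttle_deletes() {
    sleep_hdd=$1
    echo "ceph tell osd.* injectargs '--osd-delete-sleep-hdd ${sleep_hdd}'"
    echo "ceph tell osd.* injectargs '--osd-delete-sleep-ssd 0.1'"
}

plan_throttle_deletes 2
```

The trade-off is straightforward: the pool takes much longer to disappear, but the OSDs stay responsive while it does.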
[ceph-users] Re: Problem with OSD::osd_op_tp thread had timed out and other connected issues
Hi Ben,

we are not using an EC pool on that cluster. The OSD-out behavior almost stopped when we solved the memory issues (less memory allocated to the OSDs). We are not working on that cluster anymore, so we have no other info about the problem.

Jan

On 20/07/2020 07.59, Benoît Knecht wrote:

Hi Jan,

Jan Pekař wrote:

Also I'm concerned that this OSD restart caused data degradation and recovery - the cluster should be clean immediately after the OSD comes up when no client was uploading/modifying data during my tests.

We're experiencing the same thing on our 14.2.10 cluster. After marking an OSD out, if it's briefly marked down (due to missed heartbeats or because the daemon was manually restarted), the PGs that were still mapped on it disappear all at once, and we get degraded objects as a result. In our case, those PGs belong to an EC pool, and we use the PG balancer in upmap mode, so we have a few upmapped PGs on that OSD. Is that the case for you too? We're going to run some tests to try and better understand what's going on there, but we welcome any feedback in the meantime.

Cheers,

--
Ben

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
[ceph-users] Re: Problem with OSD::osd_op_tp thread had timed out and other connected issues
I understand, so I expected slow requests (like "X slow requests are blocked > 32 sec"), but I was not expecting missed heartbeats or restarted OSDs. Maybe this "hard recovery" path was not tested enough. I'm also concerned that this OSD restart caused data degradation and recovery - the cluster should be clean immediately after the OSD comes up when no client was uploading/modifying data during my tests.

With regards
Jan Pekar

On 21/03/2020 15.56, Anthony D'Atri wrote:
This is an expensive operation. You want to slow it down, not burden the OSDs.

On Mar 21, 2020, at 5:46 AM, Jan Pekař - Imatic wrote:
Each node has 64GB RAM, so it should be enough (12 OSDs = 48GB used).

On 21/03/2020 13.14, XuYun wrote:
Bluestore requires more than 4G of memory per OSD, do you have enough memory?

On 21/03/2020 at 20:09, Jan Pekař - Imatic wrote:
Hello,

I have a ceph cluster, version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable); 4 nodes, each with 11 HDDs, 1 SSD, and a 10Gbit network. The cluster was empty, a fresh install. We filled it with data (small blocks) using RGW. The cluster is now used for testing, so no client was using it during the admin operations mentioned below.

After a while (7TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to better spread the data, and to speed this operation up I set "ceph config set mgr target_max_misplaced_ratio 1" so that the whole cluster rebalances as quickly as it can. I have 3 issues/questions below:

1) I noticed that the manual increase from 128 to 256 caused approx. 6 OSDs to restart, logging

heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide timed out after 150

After a while the OSDs were back, so I continued with my tests. My question: was increasing the number of PGs with the maximal target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it this way? I had no problem with such an increase before, but the cluster configuration was slightly different and it was the luminous version.

2) The rebuild was still slow, so I increased the number of backfills (ceph tell osd.* injectargs "--osd-max-backfills 10") and reduced the recovery sleep time (ceph tell osd.* injectargs "--osd-recovery-sleep-hdd 0.01"), and after a few hours I noticed that some of my OSDs were restarted during recovery; in the log I can see

2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.780 7fe1da154700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running
2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573
2020-03-21 06:41:36.888 7fe1e7769700 1 osd.7 3574 start_waiting_for_healthy

I observed the network graphs, and network utilization was low during recovery (10Gbit was not saturated). So a lot of IOPS on an OSD causes heartbeat operations to time out as well? I thought the OSD uses separate threads and that HDD timeouts do not influence heartbeats to other OSDs and the MON. It looks like that is not true.

3) After the OSD was wrongly marked down, I can see that the cluster has degraded objects. There were no degraded objects before that.

Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs degraded, 8 pgs undersized

Does this mean the OSD disconnection caused data degradation? How is that possible, when no OSD was lost? The data should be on that OSD, and after peering everything should be OK. With luminous I had no such problem; after the OSD came up, degraded objects were recovered/found within a few seconds and the cluster was healthy within seconds.
Thank you very much for additional info. I can perform additional tests you recommend, because the cluster is used for testing purposes now.

With regards
Jan Pekar

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
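For reference, the values injected above push well past the defaults, and the conservative direction is to go back the other way. A sketch only; the values shown are what I believe the nautilus defaults to be, so verify them with `ceph config help <option>` on your version:

```shell
# Back off recovery pressure toward the (assumed) nautilus defaults.
ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-sleep-hdd 0.1'
# Let the balancer/pg-autoscaling move at most 5% of PGs at a time.
ceph config set mgr target_max_misplaced_ratio 0.05
```

With these settings the rebalance takes longer but leaves the op threads enough slack to answer heartbeats.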
[ceph-users] Re: Problem with OSD::osd_op_tp thread had timed out and other connected issues
Each node has 64GB RAM so it should be enough (12 OSDs = 48GB used).

On 21/03/2020 13.14, XuYun wrote:
Bluestore requires more than 4G of memory per OSD, do you have enough memory?

On 21/03/2020 at 20:09, Jan Pekař - Imatic wrote:
[...]

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
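The 48GB estimate follows directly from the default osd_memory_target of 4294967296 bytes (4 GiB) per OSD; a quick sanity check of the arithmetic:

```shell
# Default osd_memory_target is 4294967296 bytes; 12 OSDs on a 64 GiB node.
awk 'BEGIN { t = 4294967296; n = 12; ram = 64
             printf "%.0f GiB per OSD, %.0f GiB for %d OSDs, %.0f GiB headroom\n",
                    t/2^30, n*t/2^30, n, ram - n*t/2^30 }'
```

Note that osd_memory_target is a target, not a hard cap: per-OSD RSS can overshoot it under heap fragmentation or recovery load, which eats into that 16 GiB of headroom.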
[ceph-users] Problem with OSD::osd_op_tp thread had timed out and other connected issues
Hello,

I have a ceph cluster, version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable); 4 nodes, each with 11 HDDs, 1 SSD, and a 10Gbit network. The cluster was empty, a fresh install. We filled it with data (small blocks) using RGW. The cluster is now used for testing, so no client was using it during the admin operations mentioned below.

After a while (7TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to better spread the data, and to speed this operation up I set "ceph config set mgr target_max_misplaced_ratio 1" so that the whole cluster rebalances as quickly as it can. I have 3 issues/questions below:

1) I noticed that the manual increase from 128 to 256 caused approx. 6 OSDs to restart, logging

heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide timed out after 150

After a while the OSDs were back, so I continued with my tests. My question: was increasing the number of PGs with the maximal target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it this way? I had no problem with such an increase before, but the cluster configuration was slightly different and it was the luminous version.

2) The rebuild was still slow, so I increased the number of backfills (ceph tell osd.* injectargs "--osd-max-backfills 10") and reduced the recovery sleep time (ceph tell osd.* injectargs "--osd-recovery-sleep-hdd 0.01"), and after a few hours I noticed that some of my OSDs were restarted during recovery; in the log I can see

2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.780 7fe1da154700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running
2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573
2020-03-21 06:41:36.888 7fe1e7769700 1 osd.7 3574 start_waiting_for_healthy

I observed the network graphs, and network utilization was low during recovery (10Gbit was not saturated). So a lot of IOPS on an OSD causes heartbeat operations to time out as well? I thought the OSD uses separate threads and that HDD timeouts do not influence heartbeats to other OSDs and the MON. It looks like that is not true.

3) After the OSD was wrongly marked down, I can see that the cluster has degraded objects. There were no degraded objects before that.

Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs degraded, 8 pgs undersized

Does this mean the OSD disconnection caused data degradation? How is that possible, when no OSD was lost? The data should be on that OSD, and after peering everything should be OK. With luminous I had no such problem; after the OSD came up, degraded objects were recovered/found within a few seconds and the cluster was healthy within seconds.

Thank you very much for additional info. I can perform additional tests you recommend, because the cluster is used for testing purposes now.

With regards
Jan Pekar

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
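The reported 0.215% can be reproduced from the raw counts in the health line, which confirms the figure is just the flapped OSD's share of object replicas:

```shell
# 251754 degraded object instances out of 117225048 placed instances.
awk 'BEGIN { printf "%.3f%%\n", 251754 / 117225048 * 100 }'
```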
[ceph-users] Re: Problem with OSD - stuck in CPU loop after rbd snapshot mount
7;3048390 mlcod 60042'3048390 active+clean] do_osd_op 2:311c8802:::rbd_data.09df7e2ae8944a.:7b5 [copy-get max 8388608]
2020-02-05 22:46:53.787483 7ff1c3749700 10 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] do_osd_op copy-get max 8388608
2020-02-05 22:46:53.787515 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] got attrs
2020-02-05 22:46:53.807853 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] got data
2020-02-05 22:46:53.808034 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] got reqids
2020-02-05 22:46:53.808051 7ff1c3749700 20 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] cursor.is_complete=1 0 attrs 4194304 bytes 0 omap header bytes 0 omap data bytes in 0 keys 0 reqids
2020-02-05 22:46:53.808085 7ff1c3749700 10 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] dropping ondisk_read_lock
2020-02-05 22:46:53.808107 7ff1c3749700 15 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] do_osd_op_effects osd.9 con 0x56135d9e5000
2020-02-05 22:46:53.808132 7ff1c3749700 15 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] log_op_stats osd_op(osd.9.60026:836317 2.c 2:311c8802:::rbd_data.09df7e2ae8944a.:7b5 [copy-get max 8388608] snapc 0=[] ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e60042) v8 inb 0 outb 4194423 lat 19.627387
2020-02-05 22:46:53.808171 7ff1c3749700 15 osd.1 pg_epoch: 60042 pg[2.c( v 60042'3048392 (58743'3046874,60042'3048392] local-lis/les=60041/60042 n=5112 ec=1/1 lis/c 60041/60041 les/c/f 60042/60042/38237 60041/60041/60034) [1,8] r=0 lpr=60041 crt=60042'3048392 lcod 60042'3048390 mlcod 60042'3048390 active+clean] publish_stats_to_osd 60042:3424975
2020-02-05 22:46:53.808292 7ff1c3749700 10 osd.1 60042 dequeue_op 0x561362c80bc0 finish

On 03/02/2020 15.48, Jan Pekař - Imatic wrote:
Hi all,

I have a small cluster, and yesterday I tried to mount an older RBD snapshot to recover data. (I have approx. 230 daily snapshots of one RBD image on my small ceph.) After I did the mount and an ls operation, the cluster was stuck, and I noticed that 2 of my OSDs ate CPU and grew in memory usage (more than 4x the usual).
The cluster was unusable after that, and it didn't help to stop the rbd mount operation, restart the OSDs, etc. Strange is that it affects only 2 of my OSDs - one is on NVMe and has the cache PGs mapped, the other one is a rotational HDD. IOPS are normal; only CPU and memory usage are unusual. I noticed hundreds of thousands of slow requests (on an idle cluster), so I started to debug OSD operations. You can find the OSD 8 log below. OSD 9 holds the NVMe-cached PGs and there is nothing unusual in its standard log. In the middle of the log I raised the debug level to 10. It looks like it is trying to do one operation with one snapshotted RBD object in PG 2.b (rbd_data.9e87fa74b0dc51.0001d1a9:7e3) in a loop. Is th
[ceph-users] Problem with OSD - stuck in CPU loop after rbd snapshot mount
Hi all,

I have a small cluster, and yesterday I tried to mount an older RBD snapshot to recover data. (I have approx. 230 daily snapshots of one RBD image on my small ceph.) After I did the mount and an ls operation, the cluster was stuck, and I noticed that 2 of my OSDs ate CPU and grew in memory usage (more than 4x the usual). The cluster was unusable after that, and it didn't help to stop the rbd mount operation, restart the OSDs, etc.

Strange is that it affects only 2 of my OSDs - one is on NVMe and has the cache PGs mapped, the other one is a rotational HDD. IOPS are normal; only CPU and memory usage are unusual. I noticed hundreds of thousands of slow requests (on an idle cluster), so I started to debug OSD operations. You can find the OSD 8 log below. OSD 9 holds the NVMe-cached PGs and there is nothing unusual in its standard log. In the middle of the log I raised the debug level to 10.

It looks like it is trying to do one operation with one snapshotted RBD object in PG 2.b (rbd_data.9e87fa74b0dc51.0001d1a9:7e3) in a loop. Is there any way to stop that? What is causing the problem? Can I somehow flush the OSD "journals" or "queues" to stop the current operation and make the cluster usable again? Is there any way I can still access the old snapshot data? The problem started on Luminous 12.2.12, and an upgrade to 12.2.13 didn't help.
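Before restarting daemons again, the admin socket can show exactly which ops a busy OSD is chewing on. A sketch, assuming osd.8 as in the log below and a default admin-socket setup on the OSD's host:

```shell
# Dump currently in-flight ops and the recent slowest ones on osd.8.
ceph daemon osd.8 dump_ops_in_flight
ceph daemon osd.8 dump_historic_ops
```

If the same object name keeps appearing in the in-flight dump, that is independent confirmation of the single-object loop suspected above.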
Thank you
Jan Pekar

2020-02-03 15:09:22.470277 7f51f6af7700 0 log_channel(cluster) log [WRN] : 30966 slow requests, 5 included below; oldest blocked for > 37.505482 secs
2020-02-03 15:09:22.470336 7f51f6af7700 0 log_channel(cluster) log [WRN] : slow request 37.503404 seconds old, received at 2020-02-03 15:08:44.956422: osd_op(osd.9.59923:2670991 2.b 2.162d8f8b (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470360 7f51f6af7700 0 log_channel(cluster) log [WRN] : slow request 37.500695 seconds old, received at 2020-02-03 15:08:44.959132: osd_op(osd.9.59923:2671023 2.b 2.162d8f8b (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470366 7f51f6af7700 0 log_channel(cluster) log [WRN] : slow request 37.498013 seconds old, received at 2020-02-03 15:08:44.961814: osd_op(osd.9.59923:2671055 2.b 2.162d8f8b (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470371 7f51f6af7700 0 log_channel(cluster) log [WRN] : slow request 37.494836 seconds old, received at 2020-02-03 15:08:44.964990: osd_op(osd.9.59923:2671087 2.b 2.162d8f8b (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
2020-02-03 15:09:22.470376 7f51f6af7700 0 log_channel(cluster) log [WRN] : slow request 37.491950 seconds old, received at 2020-02-03 15:08:44.967877: osd_op(osd.9.59923:2671119 2.b 2.162d8f8b (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) currently queued_for_pg
...
2020-02-03 15:09:35.537259 7f51ddac5700 0 osd.8 59939 do_command r=0
2020-02-03 15:09:35.588753 7f51ddac5700 0 osd.8 59939 do_command r=0
Set debug_osd to 10/10
2020-02-03 15:09:35.588814 7f51ddac5700 0 log_channel(cluster) log [INF] : Set debug_osd to 10/10
2020-02-03 15:09:35.597667 7f51f0945700 2 osd.8 59939 ms_handle_reset con 0x5647736ac800 session 0x564785519e00
2020-02-03 15:09:35.599056 7f51e0acb700 10 osd.8 pg_epoch: 59939 pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 crt=59908'4218227 lcod 0'0 mlcod 0'0 active+clean] dropping ondisk_read_lock
2020-02-03 15:09:35.599209 7f51e0acb700 10 osd.8 59939 dequeue_op 0x5647873d5800 finish
2020-02-03 15:09:35.599250 7f51e0acb700 10 osd.8 59939 dequeue_op 0x5647873d59c0 prio 63 cost 29 latency 22.224501 osd_op(osd.9.59923:2779946 2.b 2.162d8f8b (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e59939) v8 pg pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 crt=59908'4218227 lcod 0'0 mlcod 0'0 active+clean]
2020-02-03 15:09:35.599293 7f51e0acb700 10 osd.8 pg_epoch: 59939 pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 crt=59908'4218227 lcod 0'0 mlcod 0'0 active+clean] _handle_message: 0x5647873d59c0
2020-02-03 15:09:35.599347 7f51e0acb700 10 osd.8 pg_epoch: 59939 pg[2.b( v 59908'4218227 (58731'4216723,59908'4218227] local-lis/les=59932/59934 n=4693 ec=1/1 lis/c 59932/59932 les/c/f 59934/59934/38237 59932/59932/59932) [8,0] r=0 lpr=59932 cr
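With tens of thousands of slow requests it helps to confirm they really all target the same object before digging deeper. A quick tally over the saved OSD log (the file name `osd.8.log` is just an example for wherever the log was captured):

```shell
# Count slow-request osd_ops per (pg, object-hash) pair; if one pair
# dominates, the OSD is looping on a single snapshot object as suspected.
grep -o "osd_op(osd\.[0-9.]*:[0-9]* [^ ]* [^ ]*" osd.8.log |
  awk '{ print $2, $3 }' | sort | uniq -c | sort -rn | head
```

Applied to the excerpt above, every hit lands on 2.b 2.162d8f8b, matching the single looping rbd_data object.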