Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems
On 6.3.2018 22:28, Gregory Farnum wrote:
> On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote:
>> Hi all,
>>
>> I have a few problems on my cluster that may be linked together and
>> have now caused an OSD to go down during pg repair.
>>
>> First, a few notes about my cluster:
>>
>> 4 nodes, 15 OSDs, installed on Luminous (no upgrade).
>> Replicated pools, with 1 pool (pool 6) cached by SSD disks.
>> I don't detect any hardware failures (disk IO errors, restarts,
>> corrupted data etc.).
>> I'm running RBDs using libvirt on Debian wheezy and jessie (stable
>> and oldstable).
>> I'm snapshotting RBDs using a Luminous client on Debian Jessie only.
>
> When you say "cached by", do you mean there's a cache pool? Or are you
> using bcache or something underneath?

I mean a cache pool.

>> Now the problems, from light to severe:
>>
>> 1)
>> Almost every day I notice some health problems after deep scrub:
>> 1-2 inconsistent PGs with "read_error" on some OSDs.
>> When I don't repair it, it disappears after a few days (another deep
>> scrub?). There are no read errors on the disks (disk check OK, no
>> errors logged in syslog).
>>
>> 2)
>> I noticed on my pool 6 (cache pool) that scrub reports some objects
>> that shouldn't be there:
>>
>> 2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
>> v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
>> ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
>> [1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
>> 8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
>> no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2 (have MIN)
>>
>> I think that means an orphaned snap object without its head replica.
>> Maybe snaptrim left it there? Why? Maybe an error during snaptrim? Or
>> fstrim/discard removed the "head" object (this is, I hope, nonsense)?
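[Editor's note: to see which replica actually reported the read_error, `rados list-inconsistent-obj <pgid> --format=json` prints per-shard error details on Luminous. The exact JSON layout below is an assumption for illustration; verify it against your cluster's output before relying on it.]

```python
import json

# Assumed shape of `rados list-inconsistent-obj 6.20 --format=json` output
# on Luminous; field names here are illustrative, check your own cluster.
sample = json.loads("""
{
  "epoch": 8712,
  "inconsistents": [
    {
      "object": {"name": "rbd_data.967992ae8944a.00061cb8",
                 "nspace": "", "locator": "", "snap": 194},
      "errors": [],
      "union_shard_errors": ["read_error"],
      "shards": [
        {"osd": 1,  "errors": []},
        {"osd": 10, "errors": ["read_error"]},
        {"osd": 14, "errors": []}
      ]
    }
  ]
}
""")

def shards_with_errors(report):
    """Yield (object name, osd id, errors) for every shard reporting a problem."""
    for item in report["inconsistents"]:
        for shard in item["shards"]:
            if shard["errors"]:
                yield item["object"]["name"], shard["osd"], shard["errors"]

for name, osd, errors in shards_with_errors(sample):
    print(f"{name}: osd.{osd} -> {', '.join(errors)}")
```

If the same OSD keeps showing up across days, that points at that disk or its path; if the read_errors rotate across OSDs, a common cause (controller, kernel, cache-tier behaviour) is more likely.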
>> 3)
>> I ended up with one object (probably a snap object) that has only 1
>> replica (out of size 3), and when I try to repair it, my OSD crashes
>> with
>>
>> /build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>> recovery_info.ss.clone_snaps.end())
>>
>> I guess that it detected the orphaned snap object I noticed in 2) and
>> doesn't repair it, just asserts and stops the OSD. Am I right?
>>
>> I noticed the comment "// hmm, should we warn?" in the Ceph source at
>> that assert. So should someone remove that assert?
>
> There's a ticket, https://tracker.ceph.com/issues/23030, which links to
> a much longer discussion on this mailing list between Sage and Stefan
> that covers this particular assert. I'm not entirely clear from the
> rest of your story (and the long history in that thread) whether there
> are other potential causes, or whether your story might help diagnose
> it. But I'd start there, since AFAIK it's still a mystery that looks
> serious but has only a very small number of incidences. :/
> -Greg

Thank you, I will go through it, but it doesn't look related to my issue.
Now I have added a new disk to the cluster and upgraded to 12.2.4 on some
nodes, and so far no scrub errors.

If there is no clear answer to my OSD crash, I will try to wipe the OSD
containing the problematic object (the one that causes my primary OSD to
fail) and rebuild it from the other copies. I hope that without that
object the crash will not appear.

If you think that osdmaptool is safe for fuse-mount and deleting the
object by hand, I can try it. But I'm looking for some tool to do that
"online" on all PG copies and with checks (that the snap object is not
referenced somewhere).

With regards
Jan Pekar

>> And my questions are:
>>
>> How can I fix the issue with the crashing OSD?
>> How can I safely remove those objects with a missing head? Is there
>> any tool, or a force-snaptrim on non-existent snapshots? It is a
>> production cluster, so I want to be careful. I have no problems now
>> with data availability.
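[Editor's note: before removing anything by hand, it helps to split the object id from the `_scan_snaps no head for ...` log line into its parts, since offline tools want the pool, name and snap id separately. The field order below (pool:hash:namespace:key:name:snapid) is my reading of how Luminous prints hobject_t; treat it as an assumption and cross-check against your own logs.]

```python
# Parse the object id printed after "_scan_snaps no head for".
# Assumed field order: pool:hash:namespace:key:name:snapid (hex snap id,
# or the literal "head"/"snapdir").
def parse_hobject(s):
    pool, hash_, namespace, key, name, snap = s.split(":")
    return {
        "pool": int(pool),
        "hash": hash_,                 # bit-reversed placement hash, hex
        "namespace": namespace or None,
        "key": key or None,
        "name": name,
        "snap": snap if snap in ("head", "snapdir") else int(snap, 16),
    }

obj = parse_hobject("6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2")
print(obj["pool"], obj["name"], obj["snap"])
```

So the leftover clone here belongs to pool 6, image object rbd_data.967992ae8944a.00061cb8, snap id 0xc2 (194 decimal); the usual offline removal path on a stopped OSD would be ceph-objectstore-tool, but given the assert it is worth asking on the tracker first.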
>> My last idea is to move the RBDs to another pool, but I don't have
>> enough space to do that (as far as I know, RBD can only copy, not
>> move), so I'm looking for another clean solution.
>>
>> And a last question - how can I find what is causing those read_errors
>> and snap object leftovers?
>>
>> Should I paste my whole log? It is bigger than the allowed post size.
>> Pasting the most important events:
>>
>> -23> 2018-02-27 23:43:07.903368 7f4b3820e700 2 osd.1 pg_epoch: 8712
>> pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
>> n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
>> 8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
>> mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
>> repair 1 missing, 0 inconsistent objects
>> -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster)
>> log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
>> -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster)
>> log [ERR] : 6.20 repair 3 errors, 2 fixed
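[Editor's note: as noted above, Luminous has no cross-pool move for RBD images; the workaround is a full copy followed by removal of the source once the copy is verified and clients are repointed. A minimal sketch of that sequence, with placeholder pool and image names:]

```python
# Build the copy-then-remove command sequence for "moving" an RBD image
# between pools on Luminous. Pool/image names below are placeholders.
def move_image_commands(image, src_pool, dst_pool):
    src = f"{src_pool}/{image}"
    dst = f"{dst_pool}/{image}"
    return [
        ["rbd", "cp", src, dst],  # full data copy; snapshots are NOT carried over
        ["rbd", "rm", src],       # only after verifying dst and repointing clients
    ]

for cmd in move_image_commands("vm-disk-1", "rbd-cached", "rbd-new"):
    print(" ".join(cmd))
```

Note that `rbd cp` copies only the image head, not its snapshot history, so snapshots would have to be recreated (or the image moved via export/import of each snapshot) if they must survive the move.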
Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems
On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote:
> Hi all,
>
> I have a few problems on my cluster that may be linked together and
> have now caused an OSD to go down during pg repair.
>
> First, a few notes about my cluster:
>
> 4 nodes, 15 OSDs, installed on Luminous (no upgrade).
> Replicated pools, with 1 pool (pool 6) cached by SSD disks.
> I don't detect any hardware failures (disk IO errors, restarts,
> corrupted data etc.).
> I'm running RBDs using libvirt on Debian wheezy and jessie (stable and
> oldstable).
> I'm snapshotting RBDs using a Luminous client on Debian Jessie only.

When you say "cached by", do you mean there's a cache pool? Or are you
using bcache or something underneath?

> Now the problems, from light to severe:
>
> 1)
> Almost every day I notice some health problems after deep scrub:
> 1-2 inconsistent PGs with "read_error" on some OSDs.
> When I don't repair it, it disappears after a few days (another deep
> scrub?). There are no read errors on the disks (disk check OK, no
> errors logged in syslog).
>
> 2)
> I noticed on my pool 6 (cache pool) that scrub reports some objects
> that shouldn't be there:
>
> 2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
> v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
> ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
> [1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
> 8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
> no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2
> (have MIN)
>
> I think that means an orphaned snap object without its head replica.
> Maybe snaptrim left it there? Why? Maybe an error during snaptrim? Or
> fstrim/discard removed the "head" object (this is, I hope, nonsense)?
> 3)
> I ended up with one object (probably a snap object) that has only 1
> replica (out of size 3), and when I try to repair it, my OSD crashes
> with
>
> /build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> recovery_info.ss.clone_snaps.end())
>
> I guess that it detected the orphaned snap object I noticed in 2) and
> doesn't repair it, just asserts and stops the OSD. Am I right?
>
> I noticed the comment "// hmm, should we warn?" in the Ceph source at
> that assert. So should someone remove that assert?

There's a ticket, https://tracker.ceph.com/issues/23030, which links to a
much longer discussion on this mailing list between Sage and Stefan that
covers this particular assert. I'm not entirely clear from the rest of
your story (and the long history in that thread) whether there are other
potential causes, or whether your story might help diagnose it. But I'd
start there, since AFAIK it's still a mystery that looks serious but has
only a very small number of incidences. :/
-Greg

> And my questions are:
>
> How can I fix the issue with the crashing OSD?
> How can I safely remove those objects with a missing head? Is there any
> tool, or a force-snaptrim on non-existent snapshots? It is a production
> cluster, so I want to be careful. I have no problems now with data
> availability.
> My last idea is to move the RBDs to another pool, but I don't have
> enough space to do that (as far as I know, RBD can only copy, not
> move), so I'm looking for another clean solution.
> And a last question - how can I find what is causing those read_errors
> and snap object leftovers?
>
> Should I paste my whole log? It is bigger than the allowed post size.
> Pasting the most important events:
>
> -23> 2018-02-27 23:43:07.903368 7f4b3820e700 2 osd.1 pg_epoch: 8712
> pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
> n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
> 8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
> mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
> repair 1 missing, 0 inconsistent objects
> -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster)
> log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
> -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster)
> log [ERR] : 6.20 repair 3 errors, 2 fixed
> -20> 2018-02-27 23:43:07.903480 7f4b3820e700 5
> write_log_and_missing with: dirty_to: 0'0, dirty_from:
> 4294967295'18446744073709551615, writeout_from:
> 4294967295'18446744073709551615, trimmed: , trimmed_dups: ,
> clear_divergent_priors: 0
> -19> 2018-02-27 23:43:07.903604 7f4b3820e700 1 --
> [2a01:430:22a::cef:c011]:6805/514544 -->
> [2a01:430:22a::cef:c021]:6803/3001666 -- MOSDScrubReserve(6.20 RELEASE
> e8712) v1 -- 0x55a4c5459c00 con 0
> -18> 2018-02-27 23:43:07.903651 7f4b3820e700 1 --
> [2a01:430:22a::cef:c011]:6805/514544 -->
> [2a01:430:22a::cef:c041]:6802/3012729 -- MOSDScrubReserve(6.20 RELEASE
> e8712) v1 -- 0x55a4cb6dee00 con 0
> -17> 2018-02-27 23:43:07.903679 7f4b3820e700 1 --
> [2a01:430:22a::cef:c011]:6805/514544 -->
> [2a01:430:22a::cef:c021]:6803/3001666 -- pg_info((query:8712 sent:8712
> 6.20( v 8712'771986
[ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems
Hi all,

I have a few problems on my cluster that may be linked together and have
now caused an OSD to go down during pg repair.

First, a few notes about my cluster:

4 nodes, 15 OSDs, installed on Luminous (no upgrade).
Replicated pools, with 1 pool (pool 6) cached by SSD disks.
I don't detect any hardware failures (disk IO errors, restarts, corrupted
data etc.).
I'm running RBDs using libvirt on Debian wheezy and jessie (stable and
oldstable).
I'm snapshotting RBDs using a Luminous client on Debian Jessie only.

Now the problems, from light to severe:

1)
Almost every day I notice some health problems after deep scrub:
1-2 inconsistent PGs with "read_error" on some OSDs.
When I don't repair it, it disappears after a few days (another deep
scrub?). There are no read errors on the disks (disk check OK, no errors
logged in syslog).

2)
I noticed on my pool 6 (cache pool) that scrub reports some objects that
shouldn't be there:

2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
[1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2 (have MIN)

I think that means an orphaned snap object without its head replica.
Maybe snaptrim left it there? Why? Maybe an error during snaptrim? Or
fstrim/discard removed the "head" object (this is, I hope, nonsense)?

3)
I ended up with one object (probably a snap object) that has only 1
replica (out of size 3), and when I try to repair it, my OSD crashes with

/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
recovery_info.ss.clone_snaps.end())

I guess that it detected the orphaned snap object I noticed in 2) and
doesn't repair it, just asserts and stops the OSD. Am I right?

I noticed the comment "// hmm, should we warn?" in the Ceph source at
that assert.
So should someone remove that assert?

And my questions are:

How can I fix the issue with the crashing OSD?
How can I safely remove those objects with a missing head? Is there any
tool, or a force-snaptrim on non-existent snapshots? It is a production
cluster, so I want to be careful. I have no problems now with data
availability.
My last idea is to move the RBDs to another pool, but I don't have enough
space to do that (as far as I know, RBD can only copy, not move), so I'm
looking for another clean solution.
And a last question - how can I find what is causing those read_errors
and snap object leftovers?

Should I paste my whole log? It is bigger than the allowed post size.
Pasting the most important events:

-23> 2018-02-27 23:43:07.903368 7f4b3820e700 2 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
repair 1 missing, 0 inconsistent objects
-22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster)
log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
-21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster)
log [ERR] : 6.20 repair 3 errors, 2 fixed
-20> 2018-02-27 23:43:07.903480 7f4b3820e700 5
write_log_and_missing with: dirty_to: 0'0, dirty_from:
4294967295'18446744073709551615, writeout_from:
4294967295'18446744073709551615, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
-19> 2018-02-27 23:43:07.903604 7f4b3820e700 1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c021]:6803/3001666 -- MOSDScrubReserve(6.20 RELEASE
e8712) v1 -- 0x55a4c5459c00 con 0
-18> 2018-02-27 23:43:07.903651 7f4b3820e700 1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c041]:6802/3012729 -- MOSDScrubReserve(6.20 RELEASE
e8712) v1 -- 0x55a4cb6dee00 con 0
-17> 2018-02-27 23:43:07.903679 7f4b3820e700 1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c021]:6803/3001666 -- pg_info((query:8712 sent:8712
6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cba734a0 con 0
-16> 2018-02-27 23:43:07.903704 7f4b3820e700 1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c041]:6802/3012729 -- pg_info((query:8712 sent:8712
6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cccb5680 con 0
-15> 2018-02-27 23:43:07.903746 7f4b39a11700 5 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod