Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-07 Thread Jan Pekař - Imatic

On 6.3.2018 22:28, Gregory Farnum wrote:
On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote:


Hi all,

I have a few problems on my cluster that may be linked together and
have now caused an OSD to go down during pg repair.

First, a few notes about my cluster:

4 nodes, 15 OSDs installed on Luminous (no upgrade).
Replicated pools, with 1 pool (pool 6) cached by SSD disks.
I haven't detected any hardware failures (disk I/O errors, restarts,
corrupted data, etc.).
I'm running RBDs using libvirt on Debian Wheezy and Jessie (stable and
oldstable).
I'm snapshotting RBDs using a Luminous client on Debian Jessie only.


When you say "cached by", do you mean there's a cache pool? Or are you 
using bcache or something underneath?


I mean cache pool.
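
For reference, a minimal sketch of how the cache-tier relationship can be
confirmed from the monitor's pool dump, assuming the ceph CLI is available on
the client. The field names are those of the Luminous-era "ceph osd dump"
JSON output; nothing here is specific to this cluster:

#!/usr/bin/env python
# Minimal sketch: print which pools act as cache tiers and of which base pool.
# Assumes the ceph CLI is installed and has monitor access.
import json
import subprocess

dump = json.loads(subprocess.check_output(["ceph", "osd", "dump", "-f", "json"]))
pools = dump["pools"]
names = {p["pool"]: p["pool_name"] for p in pools}

for p in pools:
    base = p.get("tier_of", -1)
    if base >= 0:  # -1 means the pool is not a cache tier of anything
        print("pool %s is a cache tier of %s (cache_mode=%s)"
              % (p["pool_name"], names.get(base, base), p.get("cache_mode")))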




Now the problems, from light to severe:

1)
Almost every day I notice some health problems after a deep scrub:
1-2 inconsistent PGs with "read_error" on some OSDs.
When I don't repair them, they disappear after a few days (another deep
scrub?). There are no read errors on the disks (disk checks OK, no errors
logged in syslog).
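
For context, a minimal sketch of how such inconsistencies are usually located
and inspected after a deep scrub, assuming the rados and ceph CLIs are
available; the pool name below is a placeholder:

#!/usr/bin/env python
# Minimal sketch: list inconsistent PGs in a pool and print what the last deep
# scrub recorded for each inconsistent object (e.g. "read_error").
import json
import subprocess

POOL = "pool6"  # placeholder name for the affected pool

pgids = json.loads(subprocess.check_output(["rados", "list-inconsistent-pg", POOL]))
for pgid in pgids:
    report = json.loads(subprocess.check_output(
        ["rados", "list-inconsistent-obj", pgid, "--format=json"]))
    for item in report.get("inconsistents", []):
        print("%s %s object errors: %s shard errors: %s"
              % (pgid, item["object"]["name"], item.get("errors"),
                 [s.get("errors") for s in item.get("shards", [])]))
    # Only after reviewing the shards and deciding the surviving copies are
    # good would one ask for a repair of that PG:
    # subprocess.check_call(["ceph", "pg", "repair", pgid])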


2)
I noticed on my pool 6 (the cached pool) that scrub reports some objects
that shouldn't be there:

2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
[1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2
(have MIN)

I think that means an orphaned snap object without its head replica. Maybe
snaptrim left it there? Why? Maybe an error during snaptrim? Or did
fstrim/discard remove the "head" object (which is, I hope, nonsense)?
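
For reference, a minimal sketch of how one might check where such an object
lives and which head/clones librados reports for it; the pool name and the
object name are placeholders (the full object name is truncated in the log
above):

#!/usr/bin/env python
# Minimal sketch: for a suspect rbd_data object, show which PG/OSDs it maps to
# and which head/clone entries rados reports for it.
import subprocess

POOL = "pool6"                                   # placeholder
OBJ = "rbd_data.967992ae8944a.XXXXXXXXXXXXXXXX"  # placeholder, truncated above

# PG id and acting set for the object.
print(subprocess.check_output(["ceph", "osd", "map", POOL, OBJ]).decode())

# Head and clone listing; a clone reported here without a matching head is
# consistent with the "_scan_snaps no head for ..." message.
print(subprocess.check_output(["rados", "-p", POOL, "listsnaps", OBJ]).decode())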

3)
I ended up with one object (probably a snap object) that has only 1 replica
(out of size 3), and when I try to repair it, my OSD crashes with:

/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
recovery_info.ss.clone_snaps.end())
I guess it detected the orphaned snap object I mentioned in 2) and doesn't
repair it, just asserts and stops the OSD. Am I right?

I noticed the comment "// hmm, should we warn?" in the Ceph source at that
assert. So should someone remove that assert?


There's a ticket, https://tracker.ceph.com/issues/23030, which links to a 
much longer discussion on this mailing list between Sage and Stefan 
that covers this particular assert. I'm not entirely clear from the 
rest of your story (and the long history in that thread) whether there are 
other potential causes, or whether your story might help diagnose it. But I'd 
start there since AFAIK it's still a mystery that looks serious but has 
only a very small number of incidents. :/

-Greg


Thank you, I will go through it, but it doesn't look related to my issue.
I have now added a new disk to the cluster and upgraded some nodes to 
12.2.4, and so far there are no scrub errors.
If there is no clear answer to my OSD crash, I will try to wipe the OSD 
containing the problematic object (the one that causes my primary OSD to 
fail) and rebuild it from the other copies. I hope that without that 
object the crash will not appear.
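
For reference, the usual Luminous-era drain-and-rebuild sequence for a single
OSD looks roughly like the sketch below; the OSD id and device are
placeholders, and on a production cluster each step would be run by hand with
recovery allowed to finish in between:

#!/usr/bin/env python
# Minimal sketch: drain one OSD, remove it, and recreate it so it backfills
# from the surviving replicas. OSD id and device are placeholders.
import subprocess

OSD_ID = "1"          # placeholder
DEVICE = "/dev/sdX"   # placeholder

def run(*cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

run("ceph", "osd", "out", OSD_ID)                   # let data drain off the OSD
# ... wait until backfill/recovery finishes and PGs are active+clean ...
run("systemctl", "stop", "ceph-osd@" + OSD_ID)      # stop the daemon
run("ceph", "osd", "purge", OSD_ID, "--yes-i-really-mean-it")  # remove it (Luminous+)
run("ceph-volume", "lvm", "zap", DEVICE)            # wipe the old data
run("ceph-volume", "lvm", "create", "--data", DEVICE)  # recreate; it backfills
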
If you think that ceph-objectstore-tool is safe for fuse-mounting the OSD and 
deleting the object by hand, I can try it. But I'm looking for some tool that 
can do this "online", on all PG copies, and with checks (that the snap object 
is not referenced somewhere).
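
For reference, a minimal sketch of the offline ceph-objectstore-tool approach:
with the OSD daemon stopped, list the PG's objects and, after careful review,
remove the one clone object by the JSON spec that the list operation prints.
This has to be repeated on every OSD in the PG's acting set; the OSD path, PG
id and object name below are placeholders:

#!/usr/bin/env python
# Minimal sketch: with the OSD daemon STOPPED, list objects in the affected PG
# and locate one specific clone by name; the (commented-out) remove is
# destructive and must be repeated on every OSD holding a copy.
import json
import subprocess

DATA_PATH = "/var/lib/ceph/osd/ceph-1"              # placeholder
PGID = "6.20"                                       # placeholder
NAME = "rbd_data.967992ae8944a.XXXXXXXXXXXXXXXX"    # placeholder, truncated above

listing = subprocess.check_output(
    ["ceph-objectstore-tool", "--data-path", DATA_PATH,
     "--pgid", PGID, "--op", "list"])

for line in listing.decode().splitlines():
    pgid, obj = json.loads(line)    # each output line is a JSON array: [pgid, object]
    # head objects carry snapid -2 in this output; clones carry their snap id
    if obj.get("oid") == NAME and obj.get("snapid") != -2:
        spec = json.dumps([pgid, obj])
        print("clone found, would remove: " + spec)
        # Destructive - uncomment only after double-checking the spec:
        # subprocess.check_call(["ceph-objectstore-tool", "--data-path", DATA_PATH,
        #                        "--pgid", PGID, spec, "remove"])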


With regards
Jan Pekar




And my questions are:

How can I fix the issue with the crashing OSD?
How can I safely remove the objects with a missing head? Is there any
tool, or a way to force-snaptrim non-existent snapshots? It is a production
cluster, so I want to be careful. I have no problems with data availability now.
My last idea is to move the RBDs to another pool, but I don't have enough space
to do that (as far as I know, RBD can only copy, not move), so I'm looking for
another clean solution (a copy sketch follows after these questions).
And the last question: how can I find out what is causing the read_errors and
the snap object leftovers?
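
On the copy-versus-move point above: in Luminous there is no in-place move of
an RBD image between pools, so the workable path is an rbd cp (or
export/import) followed by deleting the source once it has been verified. A
minimal sketch, with placeholder pool and image names; note that rbd cp copies
the image head only, snapshots are not carried over:

#!/usr/bin/env python
# Minimal sketch: copy an RBD image to another pool and, only after verifying
# the copy, delete the original. Pool/image names are placeholders.
import subprocess

SRC = "pool6/vm-disk-1"      # placeholder
DST = "newpool/vm-disk-1"    # placeholder

subprocess.check_call(["rbd", "cp", SRC, DST])
# ... verify the copy (rbd info, checksums from inside the guest, etc.) ...
# subprocess.check_call(["rbd", "rm", SRC])  # destructive; only after verification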

Should I paste my whole log? It is bigger than the allowed post size.
Pasting the most important events:

     -23> 2018-02-27 23:43:07.903368 7f4b3820e700  2 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
repair 1 missing, 0 inconsistent objects
     -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster)
log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
     -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster)
log [ERR] : 6.20 repair 3 errors, 2 fixed
   

Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-06 Thread Gregory Farnum
On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote:

> Hi all,
>
> I have a few problems on my cluster that may be linked together and
> have now caused an OSD to go down during pg repair.
>
> First, a few notes about my cluster:
>
> 4 nodes, 15 OSDs installed on Luminous (no upgrade).
> Replicated pools, with 1 pool (pool 6) cached by SSD disks.
> I haven't detected any hardware failures (disk I/O errors, restarts,
> corrupted data, etc.).
> I'm running RBDs using libvirt on Debian Wheezy and Jessie (stable and
> oldstable).
> I'm snapshotting RBDs using a Luminous client on Debian Jessie only.
>

When you say "cached by", do you mean there's a cache pool? Or are you
using bcache or something underneath?


>
> Now the problems, from light to severe:
>
> 1)
> Almost every day I notice some health problems after a deep scrub:
> 1-2 inconsistent PGs with "read_error" on some OSDs.
> When I don't repair them, they disappear after a few days (another deep
> scrub?). There are no read errors on the disks (disk checks OK, no errors
> logged in syslog).
>

> 2)
> I noticed on my pool 6 (the cached pool) that scrub reports some objects
> that shouldn't be there:
>
> 2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
> v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
> ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
> [1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
> 8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
> no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2
> (have MIN)
>
> I think that means an orphaned snap object without its head replica. Maybe
> snaptrim left it there? Why? Maybe an error during snaptrim? Or did
> fstrim/discard remove the "head" object (which is, I hope, nonsense)?
>
> 3)
> I ended up with one object (probably a snap object) that has only 1 replica
> (out of size 3), and when I try to repair it, my OSD crashes with:
>
> /build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
> recovery_info.ss.clone_snaps.end())
> I guess it detected the orphaned snap object I mentioned in 2) and doesn't
> repair it, just asserts and stops the OSD. Am I right?
>
> I noticed the comment "// hmm, should we warn?" in the Ceph source at that
> assert. So should someone remove that assert?
>

There's a ticket, https://tracker.ceph.com/issues/23030, which links to a
much longer discussion on this mailing list between Sage and Stefan that
covers this particular assert. I'm not entirely clear from the rest of
your story (and the long history in that thread) whether there are other
potential causes, or whether your story might help diagnose it. But I'd start
there since AFAIK it's still a mystery that looks serious but has only a
very small number of incidents. :/
-Greg


>
> And my questions are:
>
> How can I fix the issue with the crashing OSD?
> How can I safely remove the objects with a missing head? Is there any
> tool, or a way to force-snaptrim non-existent snapshots? It is a production
> cluster, so I want to be careful. I have no problems with data availability now.
> My last idea is to move the RBDs to another pool, but I don't have enough space
> to do that (as far as I know, RBD can only copy, not move), so I'm looking for
> another clean solution.
> And the last question: how can I find out what is causing the read_errors and
> the snap object leftovers?
>
> Should I paste my whole log? It is bigger than the allowed post size.
> Pasting the most important events:
>
> -23> 2018-02-27 23:43:07.903368 7f4b3820e700  2 osd.1 pg_epoch: 8712
> pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
> n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
> 8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
> mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
> repair 1 missing, 0 inconsistent objects
> -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster)
> log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
> -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster)
> log [ERR] : 6.20 repair 3 errors, 2 fixed
> -20> 2018-02-27 23:43:07.903480 7f4b3820e700  5
> write_log_and_missing with: dirty_to: 0'0, dirty_from:
> 4294967295'18446744073709551615, writeout_from:
> 4294967295'18446744073709551615, trimmed: , trimmed_dups: ,
> clear_divergent_priors: 0
> -19> 2018-02-27 23:43:07.903604 7f4b3820e700  1 --
> [2a01:430:22a::cef:c011]:6805/514544 -->
> [2a01:430:22a::cef:c021]:6803/3001666 -- MOSDScrubReserve(6.20 RELEASE
> e8712) v1 -- 0x55a4c5459c00 con 0
> -18> 2018-02-27 23:43:07.903651 7f4b3820e700  1 --
> [2a01:430:22a::cef:c011]:6805/514544 -->
> [2a01:430:22a::cef:c041]:6802/3012729 -- MOSDScrubReserve(6.20 RELEASE
> e8712) v1 -- 0x55a4cb6dee00 con 0
> -17> 2018-02-27 23:43:07.903679 7f4b3820e700  1 --
> [2a01:430:22a::cef:c011]:6805/514544 -->
> [2a01:430:22a::cef:c021]:6803/3001666 -- pg_info((query:8712 sent:8712
> 6.20( v 8712'771986 

[ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-03 Thread Jan Pekař - Imatic

Hi all,

I have a few problems on my cluster that may be linked together and 
have now caused an OSD to go down during pg repair.


First, a few notes about my cluster:

4 nodes, 15 OSDs installed on Luminous (no upgrade).
Replicated pools, with 1 pool (pool 6) cached by SSD disks.
I haven't detected any hardware failures (disk I/O errors, restarts, 
corrupted data, etc.).
I'm running RBDs using libvirt on Debian Wheezy and Jessie (stable and 
oldstable).

I'm snapshotting RBDs using a Luminous client on Debian Jessie only.

Now the problems, from light to severe:

1)
Almost every day I notice some health problems after a deep scrub:
1-2 inconsistent PGs with "read_error" on some OSDs.
When I don't repair them, they disappear after a few days (another deep 
scrub?). There are no read errors on the disks (disk checks OK, no errors 
logged in syslog).


2)
I noticed on my pool 6 (the cached pool) that scrub reports some objects 
that shouldn't be there:


2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20( 
v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299 
ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710) 
[1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod 
8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps 
no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2 
(have MIN)


I think that means an orphaned snap object without its head replica. Maybe 
snaptrim left it there? Why? Maybe an error during snaptrim? Or did 
fstrim/discard remove the "head" object (which is, I hope, nonsense)?


3)
I ended up with one object (probably a snap object) that has only 1 replica 
(out of size 3), and when I try to repair it, my OSD crashes with:


/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != 
recovery_info.ss.clone_snaps.end())
I guess it detected the orphaned snap object I mentioned in 2) and doesn't 
repair it, just asserts and stops the OSD. Am I right?


I noticed the comment "// hmm, should we warn?" in the Ceph source at that 
assert. So should someone remove that assert?


And my questions are:

How can I fix the issue with the crashing OSD?
How can I safely remove the objects with a missing head? Is there any 
tool, or a way to force-snaptrim non-existent snapshots? It is a production 
cluster, so I want to be careful. I have no problems with data availability now.
My last idea is to move the RBDs to another pool, but I don't have enough space 
to do that (as far as I know, RBD can only copy, not move), so I'm looking for 
another clean solution.
And the last question: how can I find out what is causing the read_errors and 
the snap object leftovers?


Should I paste my whole log? It is bigger than the allowed post size. 
Pasting the most important events:


   -23> 2018-02-27 23:43:07.903368 7f4b3820e700  2 osd.1 pg_epoch: 8712 
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711 
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985 
mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20 
repair 1 missing, 0 inconsistent objects
   -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster) 
log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
   -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster) 
log [ERR] : 6.20 repair 3 errors, 2 fixed
   -20> 2018-02-27 23:43:07.903480 7f4b3820e700  5 
write_log_and_missing with: dirty_to: 0'0, dirty_from: 
4294967295'18446744073709551615, writeout_from: 
4294967295'18446744073709551615, trimmed: , trimmed_dups: , 
clear_divergent_priors: 0
   -19> 2018-02-27 23:43:07.903604 7f4b3820e700  1 -- 
[2a01:430:22a::cef:c011]:6805/514544 --> 
[2a01:430:22a::cef:c021]:6803/3001666 -- MOSDScrubReserve(6.20 RELEASE 
e8712) v1 -- 0x55a4c5459c00 con 0
   -18> 2018-02-27 23:43:07.903651 7f4b3820e700  1 -- 
[2a01:430:22a::cef:c011]:6805/514544 --> 
[2a01:430:22a::cef:c041]:6802/3012729 -- MOSDScrubReserve(6.20 RELEASE 
e8712) v1 -- 0x55a4cb6dee00 con 0
   -17> 2018-02-27 23:43:07.903679 7f4b3820e700  1 -- 
[2a01:430:22a::cef:c011]:6805/514544 --> 
[2a01:430:22a::cef:c021]:6803/3001666 -- pg_info((query:8712 sent:8712 
6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711 
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 
8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cba734a0 con 0
   -16> 2018-02-27 23:43:07.903704 7f4b3820e700  1 -- 
[2a01:430:22a::cef:c011]:6805/514544 --> 
[2a01:430:22a::cef:c041]:6802/3012729 -- pg_info((query:8712 sent:8712 
6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711 
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 
8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cccb5680 con 0
   -15> 2018-02-27 23:43:07.903746 7f4b39a11700  5 osd.1 pg_epoch: 8712 
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711 
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod