RE: Help with leaf parent key incorrect

2018-02-26 Thread Paul Jones
> -Original Message-
> From: Anand Jain [mailto:anand.j...@oracle.com]
> Sent: Monday, 26 February 2018 7:27 PM
> To: Paul Jones <p...@pauljones.id.au>; linux-btrfs@vger.kernel.org
> Subject: Re: Help with leaf parent key incorrect
> 
> 
> 
>  > There is one io error in the log below,
> 
> Apparently, that's not a real EIO. We need to fix it.
> But can't be the root cause we are looking for here.
> 
> 
>  > Feb 24 22:41:59 home kernel: BTRFS: error (device dm-6) in
>  > btrfs_run_delayed_refs:3076: errno=-5 IO failure
>  > Feb 24 22:41:59 home kernel: BTRFS info (device dm-6): forced readonly
> 
> static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
>   struct btrfs_fs_info *fs_info,
>   struct btrfs_delayed_ref_head *head,
>   struct btrfs_delayed_extent_op *extent_op) {
> ::
> 
>  } else {
>  err = -EIO;
>  goto out;
>  }
> 
> 
>  > but other than that I have never had io errors before, or any other
> troubles.
> 
>   Hm. btrfs dev stat shows real disk IO errors.
>   As this FS isn't mountable, please try
>     btrfs dev stat  > file
>     search for 'device stats', there will be one for each disk.
>   Otherwise, real IO errors are reported in the syslog when they
>   happen, not necessarily during dedupe.

vm-server ~ # btrfs dev stat /media/storage/
[/dev/mapper/b-storage--b].write_io_errs    0
[/dev/mapper/b-storage--b].read_io_errs     0
[/dev/mapper/b-storage--b].flush_io_errs    0
[/dev/mapper/b-storage--b].corruption_errs  0
[/dev/mapper/b-storage--b].generation_errs  0
[/dev/mapper/a-storage--a].write_io_errs    0
[/dev/mapper/a-storage--a].read_io_errs     0
[/dev/mapper/a-storage--a].flush_io_errs    0
[/dev/mapper/a-storage--a].corruption_errs  0
[/dev/mapper/a-storage--a].generation_errs  0
vm-server ~ # btrfs dev stat /
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sda1].write_io_errs    0
[/dev/sda1].read_io_errs     0
[/dev/sda1].flush_io_errs    0
[/dev/sda1].corruption_errs  0
[/dev/sda1].generation_errs  0
vm-server ~ # btrfs dev stat /dev/mapper/a-backup--a
ERROR: '/dev/mapper/a-backup--a' is not a mounted btrfs device

I check syslog regularly and I haven't seen any errors on any drives for over a 
year.
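
For anyone who wants to check these counters from a script rather than by eye, here is a minimal Python sketch. It assumes the output has the "[device].counter  value" shape shown above; the function name nonzero_counters and the sample text are mine, not part of btrfs-progs.

```python
import re

# Matches lines such as "[/dev/sda1].write_io_errs    0".
# The pattern is an assumption based on the output quoted in this thread.
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(text):
    """Return {device: {counter: value}} for every counter above zero."""
    found = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m and int(m.group("value")) > 0:
            found.setdefault(m.group("dev"), {})[m.group("counter")] = int(m.group("value"))
    return found


# Hypothetical sample in the same shape as the output above.
SAMPLE = """\
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sda1].corruption_errs  0
"""

if __name__ == "__main__":
    # An empty dict means no device reported a non-zero counter.
    print(nonzero_counters(SAMPLE))
```

In practice the text would come from the saved output of "btrfs dev stat" on a mounted filesystem; an empty result matches what I see above.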

> 
>  > One of my other filesystems shares the same two discs and it is still
>  > fine, so I think the hardware is probably ok.
>
>   Right. I guess that too. A confirmation will be better.
>
>  > I've copied the beginning of the errors below.
> 
> 
>   At my end, the chance of finding the root cause of 'parent transid
>   verify failed' during/after dedupe is kind of fading, as the disks
>   seem to have had no issues, which is what I had in mind.
> 
>   Also, there wasn't an abrupt power cycle here, I presume?

No, although now that I think about it, it happened right after I upgraded
from 4.15.4 to 4.15.5. I didn't quit bees before rebooting; I let the system
do it. Not sure if it's relevant or not.
I also just noticed that the kernel has spawned hundreds of kworkers - the 
highest number I can see is 516.

> 
>   It's better to save the output to disk1-log and disk2-log as below
>   before any further recovery efforts, just in case something
>   pops up.
> 
>btrfs in dump-super -fa disk1 > disk1-log
>btrfs in dump-tree --degraded disk1 >> disk1-log [1]

I applied the patch and started dumping the tree, but I stopped it after about 
10 mins and 9GB.
Because I use zstd and free space tree the recovery tools wouldn't do anything 
in RW mode, so I've decided to just blow it away and restore from a backup.
I made a block level copy of both discs in case I need anything.

Thanks for your help anyway.

Regards,
Paul.


Re: Help with leaf parent key incorrect

2018-02-26 Thread Anand Jain



> There is one io error in the log below,

Apparently, that's not a real EIO. We need to fix it.
But can't be the root cause we are looking for here.


> Feb 24 22:41:59 home kernel: BTRFS: error (device dm-6) in
> btrfs_run_delayed_refs:3076: errno=-5 IO failure
> Feb 24 22:41:59 home kernel: BTRFS info (device dm-6): forced readonly

static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
                                 struct btrfs_fs_info *fs_info,
                                 struct btrfs_delayed_ref_head *head,
                                 struct btrfs_delayed_extent_op *extent_op)
{
::

        } else {
                err = -EIO;
                goto out;
        }


> but other than that I have never had io errors before, or any other
> troubles.


 Hm. btrfs dev stat shows real disk IO errors.
 As this FS isn't mountable, please try
   btrfs dev stat  > file
   search for 'device stats', there will be one for each disk.
 Otherwise, real IO errors are reported in the syslog when they
 happen, not necessarily during dedupe.

> One of my other filesystems shares the same two discs and it is still
> fine, so I think the hardware is probably ok.

 Right. I guess that too. A confirmation will be better.

> I've copied the beginning of the errors below.


 At my end, the chance of finding the root cause of 'parent transid
 verify failed' during/after dedupe is kind of fading, as the disks
 seem to have had no issues, which is what I had in mind.

 Also, there wasn't an abrupt power cycle here, I presume?

 It's better to save the output to disk1-log and disk2-log as below
 before any further recovery efforts, just in case something
 pops up.

  btrfs in dump-super -fa disk1 > disk1-log
  btrfs in dump-tree --degraded disk1 >> disk1-log [1]

  btrfs in dump-super -fa disk2 > disk2-log
  btrfs in dump-tree --degraded disk2 >> disk2-log [1]

 [1]
  The --degraded option is in a patch posted to the ML:
  [PATCH] btrfs-progs: dump-tree: add degraded option
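
 If it helps, the commands above can be generated per disk with a short Python sketch like the one below. It only builds the shell lines and does not run them. Assumptions: "btrfs in" above is the abbreviated form of "btrfs inspect-internal", the --degraded flag is only available with the patch in [1], and the label-to-device mapping (including the second device path) is hypothetical.

```python
import shlex


def dump_commands(disks):
    """Build the dump-super / dump-tree shell lines for each disk.

    `disks` maps a log label (e.g. "disk1") to a device path. Nothing is
    executed here; the caller decides whether to run the lines.
    """
    lines = []
    for label, device in disks.items():
        dev = shlex.quote(device)
        log = shlex.quote(f"{label}-log")
        # Superblock dump first (overwrites the log), tree dump appended after.
        lines.append(f"btrfs inspect-internal dump-super -fa {dev} > {log}")
        # --degraded comes from the patch in [1]; stock dump-tree may reject it.
        lines.append(f"btrfs inspect-internal dump-tree --degraded {dev} >> {log}")
    return lines


if __name__ == "__main__":
    # Hypothetical device paths, for illustration only.
    for line in dump_commands({
        "disk1": "/dev/mapper/a-backup--a",
        "disk2": "/dev/mapper/b-backup--b",
    }):
        print(line)
```

 Keep in mind the tree dump can get large, so point the logs at a filesystem with enough free space.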

Thanks, Anand


Re: Help with leaf parent key incorrect

2018-02-25 Thread Anand Jain



On 02/25/2018 06:16 PM, Paul Jones wrote:

Hi all,

I was running dedupe on my filesystem and something went wrong overnight; by 
the time I noticed, the fs was readonly.


 Thanks for the report. I have a few questions:
  What kind of raid profile is used here?
  Which dedupe tool was used?
  Was the fs full before dedupe?
  Were there any IO errors?

Thanks, Anand


When trying to check it this is what I get:
vm-server ~ # btrfs check /dev/mapper/a-backup--a
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
Ignoring transid failure
leaf parent key incorrect 2371034071040
ERROR: cannot open file system
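
In each of these messages, "wanted" is the generation the parent node expects for the block and "found" is the generation actually stored in the block on disk. To see how many distinct blocks are affected and how far behind they are, the messages can be summarized with a small Python sketch like this one. The regex is an assumption based on the lines quoted in this mail, and the function name summarize is mine.

```python
import re
from collections import Counter

# Pattern assumed from the messages quoted in this thread.
TRANSID = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def summarize(log_text):
    """Count failures per block and report the wanted/found generation gap."""
    # Mail clients may wrap a message across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter(
        (int(m["bytenr"]), int(m["wanted"]), int(m["found"]))
        for m in TRANSID.finditer(flat)
    )
    return [
        {"bytenr": b, "wanted": w, "found": f, "gap": w - f, "count": n}
        for (b, w, f), n in sorted(counts.items())
    ]


# The four lines printed by btrfs check above.
SAMPLE = """\
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
parent transid verify failed on 2371034071040 wanted 62977 found 62893
"""

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

For the btrfs check output above this gives a single block, 2371034071040, that is 84 generations behind. The same function accepts dmesg text, including lines that got wrapped in the mail.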

Is there a way to fix this? I'm using kernel 4.15.5

This is the last part of dmesg

[  +0.02] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[  +0.04] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[  +1.107963] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[  +0.05] BTRFS error (device dm-6): parent transid verify failed on 2374016368640 wanted 63210 found 63208
[  +1.473598] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[  +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[  +0.001927] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[  +0.03] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[  +0.60] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[  +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996298240 wanted 63210 found 63208
[  +2.676048] verify_parent_transid: 10362 callbacks suppressed
[  +0.02] BTRFS error (device dm-6): parent transid verify failed on 2373991677952 wanted 63210 found 63208
[  +0.03] BTRFS error (device dm-6): parent transid verify failed on 2373991677952 wanted 63210 found 63208
[  +0.078432] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[  +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[  +0.43] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[  +0.01] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[  +0.058638] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[  +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[  +0.139174] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[  +0.04] BTRFS error (device dm-6): parent transid verify failed on 2373996232704 wanted 63210 found 63208
[Feb25 20:48] BTRFS info (device dm-6): using free space tree
[  +0.02] BTRFS error (device dm-6): Remounting read-write after error is not allowed
[Feb25 20:49] BTRFS error (device dm-6): cleaner transaction attach returned -30
[  +0.238718] BTRFS warning (device dm-6): page private not zero on page 1596642967552
[  +0.03] BTRFS warning (device dm-6): page private not zero on page 1596642971648
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642975744
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596642979840
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643672064
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643676160
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643680256
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643684352
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643704832
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643708928
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643713024
[  +0.02] BTRFS warning (device dm-6): page private not zero on page 1596643717120
[  +0.28] BTRFS warning (device dm-6): page private not zero on page 2363051098112
[  +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051102208
[  +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051106304
[  +0.01] BTRFS warning (device dm-6): page private not zero on page 2363051110400
[  +0.01] BTRFS warning (device dm-6): page private not zero on page 2368056344576
[  +0.00] BTRFS