BTRFS errors, and won't mount

2019-10-04 Thread Patrick Dijkgraaf
Hi guys,

During the night, I started getting the following errors and data was
no longer accessible:

[Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks
suppressed
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 17686343003259060482 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 254095834002432 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 2574563607252646368 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 17873260189421384017 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 9965805624054187110 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 15108378087789580224 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 7914705769619568652 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 16752645757091223687 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 9617669583708276649 7808404996096
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
start 3384408928046898608 7808404996096
[Fri Oct  4 08:04:26 2019] btrfs_dev_stat_print_on_error: 159 callbacks
suppressed
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174280, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174281, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174282, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174283, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174284, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174285, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174286, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174287, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174288, flush 0, corrupt 0, gen 0
[Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
errs: wr 7, rd 174289, flush 0, corrupt 0, gen 0

Decided to reboot (for another reason) and tried to mount afterwards:

[Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space caching
is enabled
[Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny extents
[Fri Oct  4 08:29:44 2019] BTRFS error (device sde2): parent transid
verify failed on 5483020828672 wanted 470169 found 470108
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286352011705795888 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286318771218040112 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286363934109025584 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286229742125204784 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286353230849918256 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286246155688035632 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286321695890425136 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286384677254874416 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286386365024912688 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
start 2286284400752608560 5483020828672
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): failed to recover
balance: -5
[Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): open_ctree failed

The FS info is shown below. It is a RAID6.

Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
Total devices 16 FS bytes used 36.73TiB
devid1 size 7.28TiB used 2.66TiB path /dev/sde2
devid2 size 3.64TiB used 2.66TiB path /dev/sdf2
devid3 size 3.64TiB used 2.66TiB path /dev/sdg2
devid4 size 7.28TiB used 2.66TiB path /dev/sdh2
devid5 size 3.64TiB used 2.66TiB path /dev/sdi2
devid6 size 7.28TiB used 2.66TiB path /dev/sdj2
devid7 size 3.64TiB used 2.66TiB path /dev/sdk2
devid8 size 3.64TiB used 2.66TiB path /dev/sdl2
devid9 size 7.28TiB used 2.66TiB path /dev/sdm2
devid   10 size 3.64TiB used 2.66TiB path /dev/sdn2
devid   11 size 7.28TiB used 2.66TiB path /dev/sdo2
devid   12 size 3.64TiB used 2.66TiB path /

Re: BTRFS errors, and won't mount

2019-10-04 Thread Patrick Dijkgraaf
BTRFS restore shows the following output:

#btrfs restore -D /dev/sde2 /mnt/data
This is a dry-run, no files are going to be restored
parent transid verify failed on 4314638041088 wanted 470169 found
470107
parent transid verify failed on 4314638041088 wanted 470169 found
470107
checksum verify failed on 4314638041088 found 4D792F65 wanted A99A92D3
checksum verify failed on 4314638041088 found 8D966120 wanted 4C528768
checksum verify failed on 4314638041088 found 8D966120 wanted 4C528768
bad tree block 4314638041088, bytenr mismatch, want=4314638041088,
have=20210165085184
Error reading subvolume /mnt/data/00-live/nextcloud:
18446744073709551611
Error searching /mnt/data/00-live/nextcloud

-- 
Groet / Cheers,
Patrick Dijkgraaf



On Fri, 2019-10-04 at 08:59 +0200, Patrick Dijkgraaf wrote:
> Hi guys,
> 
> During the night, I started getting the following errors and data was
> no longer accessible:
> 
> [Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks
> suppressed
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 17686343003259060482 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 254095834002432 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 2574563607252646368 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 17873260189421384017 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 9965805624054187110 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 15108378087789580224 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 7914705769619568652 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 16752645757091223687 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 9617669583708276649 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 3384408928046898608 7808404996096
> [Fri Oct  4 08:04:26 2019] btrfs_dev_stat_print_on_error: 159
> callbacks
> suppressed
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174280, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174281, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174282, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174283, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174284, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174285, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174286, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174287, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174288, flush 0, corrupt 0, gen 0
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bdev /dev/sdw2
> errs: wr 7, rd 174289, flush 0, corrupt 0, gen 0
> 
> Decided to reboot (for another reason) and tried to mount afterwards:
> 
> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space
> caching
> is enabled
> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny
> extents
> [Fri Oct  4 08:29:44 2019] BTRFS error (device sde2): parent transid
> verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286352011705795888 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286318771218040112 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286363934109025584 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286229742125204784 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286353230849918256 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286246155688035632 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286321695890425136 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286384677254874416 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286386365024912688 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286284400752608560 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): fa

Re: BTRFS errors, and won't mount

2019-10-04 Thread Qu Wenruo


On 2019/10/4 下午2:59, Patrick Dijkgraaf wrote:
> Hi guys,
> 
> During the night, I started getting the following errors and data was
> no longer accessible:
> 
> [Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks
> suppressed
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 17686343003259060482 7808404996096

Tree block at address 7808404996096 is completely broken.

All the other messages with 7808404996096 shows btrfs is trying all
possible device combinations to rebuild that tree block, but obviously
all failed.

Not sure why the tree block is corrupted, but it's pretty possible that
RAID5/6 write hole ruined your possibility to recover.

> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 254095834002432 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 2574563607252646368 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 17873260189421384017 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 9965805624054187110 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 15108378087789580224 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 7914705769619568652 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 16752645757091223687 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 9617669583708276649 7808404996096
> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block
> start 3384408928046898608 7808404996096
[...]
> Decided to reboot (for another reason) and tried to mount afterwards:
> 
> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space caching
> is enabled
> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny extents
> [Fri Oct  4 08:29:44 2019] BTRFS error (device sde2): parent transid
> verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286352011705795888 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286318771218040112 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286363934109025584 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286229742125204784 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286353230849918256 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286246155688035632 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286321695890425136 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286384677254874416 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286386365024912688 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block
> start 2286284400752608560 5483020828672
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): failed to recover
> balance: -5
> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): open_ctree failed

You're lucky, as the problem is from balance recovery, thus you may have
a chance to mount the RO.
As your fs can progress to btrfs_recover_relocation(), most essential
trees should be OK, thus you have a chance to mount it RO.

> 
> The FS info is shown below. It is a RAID6.
> 
> Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
>   Total devices 16 FS bytes used 36.73TiB

You won't want to salvage data from a near 40T fs...

>   devid1 size 7.28TiB used 2.66TiB path /dev/sde2
>   devid2 size 3.64TiB used 2.66TiB path /dev/sdf2
>   devid3 size 3.64TiB used 2.66TiB path /dev/sdg2
>   devid4 size 7.28TiB used 2.66TiB path /dev/sdh2
>   devid5 size 3.64TiB used 2.66TiB path /dev/sdi2
>   devid6 size 7.28TiB used 2.66TiB path /dev/sdj2
>   devid7 size 3.64TiB used 2.66TiB path /dev/sdk2
>   devid8 size 3.64TiB used 2.66TiB path /dev/sdl2
>   devid9 size 7.28TiB used 2.66TiB path /dev/sdm2
>   devid   10 size 3.64TiB used 2.66TiB path /dev/sdn2
>   devid   11 size 7.28TiB used 2.66TiB path /dev/sdo2
>   devid   12 size 3.64TiB used 2.66TiB path /dev/sdp2
>   devid   13 size 7.28TiB used 2.66TiB path /dev/sdq2
>   devid   14 size 7.28TiB used 2.66TiB path /dev/sdr2
>   devid   15 size 3.64TiB used 2.66TiB path /dev/sds2
>   devid   16 size 3.64TiB used 2.66TiB path /dev/sdt2

And you won't want to use RAID6 if you're expecting RAID6 to tolerant 2
disks malfunction.

As btrfs RAID5/6 has write-hole problem, any unexpected power loss or
disk error could reduce the error tolerance step by

Re: BTRFS errors, and won't mount

2019-10-04 Thread Patrick Dijkgraaf
Hi Qu,

I know about RAID5/6 risks, so I won't blame anyone but myself. I'm
currenlty working on another solution, but I was not quite there yet...

mount -o ro /dev/sdh2 /mnt/data gives me:

[Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): disk space caching
is enabled
[Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): has skinny extents
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): parent transid
verify failed on 5483020828672 wanted 470169 found 470108
[Fri Oct  4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks
suppressed
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286352011705795888 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286318771218040112 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286363934109025584 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286229742125204784 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286353230849918256 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286246155688035632 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286321695890425136 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286384677254874416 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286386365024912688 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
start 2286284400752608560 5483020828672
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): failed to recover
balance: -5
[Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): open_ctree failed

Do you think there is any chance to recover?

Thanks,
Patrick.


On Fri, 2019-10-04 at 15:22 +0800, Qu Wenruo wrote:
> On 2019/10/4 下午2:59, Patrick Dijkgraaf wrote:
> > Hi guys,
> > 
> > During the night, I started getting the following errors and data
> > was
> > no longer accessible:
> > 
> > [Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522
> > callbacks
> > suppressed
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 17686343003259060482 7808404996096
> 
> Tree block at address 7808404996096 is completely broken.
> 
> All the other messages with 7808404996096 shows btrfs is trying all
> possible device combinations to rebuild that tree block, but
> obviously
> all failed.
> 
> Not sure why the tree block is corrupted, but it's pretty possible
> that
> RAID5/6 write hole ruined your possibility to recover.
> 
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 254095834002432 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2574563607252646368 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 17873260189421384017 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 9965805624054187110 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 15108378087789580224 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 7914705769619568652 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 16752645757091223687 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 9617669583708276649 7808404996096
> > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > block
> > start 3384408928046898608 7808404996096
> 
> [...]
> > Decided to reboot (for another reason) and tried to mount
> > afterwards:
> > 
> > [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space
> > caching
> > is enabled
> > [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny
> > extents
> > [Fri Oct  4 08:29:44 2019] BTRFS error (device sde2): parent
> > transid
> > verify failed on 5483020828672 wanted 470169 found 470108
> > [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2286352011705795888 5483020828672
> > [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2286318771218040112 5483020828672
> > [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2286363934109025584 5483020828672
> > [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2286229742125204784 5483020828672
> > [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2286353230849918256 5483020828672
> > [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2286246155688035632 5483020828672
> > [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree
> > block
> > start 2286321695890425136 

[PATCH 3/4] btrfs: include non-missing as a qualifier for the latest_bdev

2019-10-04 Thread Anand Jain
btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
to point to the bdev with greatest device::generation number.
For a typical-missing device the generation number is zero so
fs_devices::latest_bdev will never point to it.

But if the missing device is due to alienating [1], then
device::generation is not-zero and if it is >= to rest of
device::generation in the list, then fs_devices::latest_bdev
ends up pointing to the missing device and reports the error
like this [2]

[1]
mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
sleep 3 # avoid racing with udev's useless scans if needed
btrfs dev add -f /dev/sdb /btrfs

mount -o degraded /dev/sdc /btrfs1

[2]
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

kernel: BTRFS warning (device sdc): devid 1 uuid 
072a0192-675b-4d5a-8640-a5cf2b2c704d is missing
kernel: BTRFS error (device sdc): failed to read devices
kernel: BTRFS error (device sdc): open_ctree failed

Fix the root of the issue, by checking if the the device is not
missing before it can be a contender for the fs_devices::latest_bdev
title.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 05ade8c7342b..6c42048ec099 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1186,6 +1186,8 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices 
*fs_devices, int step)
&device->dev_state)) {
if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
 &device->dev_state) &&
+   !test_bit(BTRFS_DEV_STATE_MISSING,
+&device->dev_state) &&
 (!latest_dev ||
  device->generation > latest_dev->generation)) {
latest_dev = device;
-- 
1.8.3.1



[PATCH 4/4] btrfs: free alien device due to device add

2019-10-04 Thread Anand Jain
When the old device has new fsid through btrfs device add -f  our
fs_devices list has an alien device in one of the fs_devices.

By having an alien device in fs_devices, we have two issues so far

1. missing device is not shows as missing in the userland

Which is due to cracks in the function btrfs_open_one_device() and
hardened by patch
 btrfs: delete identified alien device in open_fs_devices

2. mount of a degraded fs_devices fails

Which is due to cracks in the function btrfs_free_extra_devids() and
hardened by patch
 btrfs: include non-missing as a qualifier for the latest_bdev

Now the reason for both of this issue is that there is an alien (does not
contain the intended fsid) device in the fs_devices.

We know a device can be scanned/added through
btrfs-control::BTRFS_IOC_SCAN_DEV|BTRFS_IOC_DEVICES_READY
or by
ioctl::BTRFS_IOC_ADD_DEV

And device coming through btrfs-control is checked against the all other
devices in btrfs kernel but not coming through BTRFS_IOC_ADD_DEV.

This patch checks if the device add is alienating any other scanned
device and deletes it.

In fact, this patch fixes both the issues 1 and 2 (above) by eliminating
the source of the issue, but still they have their own patch as well
because its the right way to harden the functions and fill the cracks.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6c42048ec099..f8b21e82df2a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2736,6 +2736,19 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
 
/* Update ctime/mtime for libblkid */
update_dev_time(device_path);
+
+   /*
+* Now that we have written a new sb into this device, check all other
+* fs_devices list if it alienates any scanned device.
+*/
+   mutex_lock(&uuid_mutex);
+   /*
+* Ignore the return as we are successfull in the core task - to added
+* the device
+*/
+   btrfs_free_stale_devices(device_path, NULL);
+   mutex_unlock(&uuid_mutex);
+
return ret;
 
 error_sysfs:
-- 
1.8.3.1



[PATCH 2/4] btrfs: delete identified alien device in open_fs_devices

2019-10-04 Thread Anand Jain
In open_fs_devices() we identify alien device but we don't reset its
the device::name. So progs device list does not show the device missing
as shown in the script below.

mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdb
sleep 3 # avoid racing with udev's useless scans if needed
btrfs dev add -f /dev/sdb /btrfs
mount -o degraded /dev/sdc /btrfs1

No missing device:
btrfs fi show -m /btrfs1
Label: none  uuid: 3eb7cd50-4594-458f-9d68-c243cc49954d
Total devices 2 FS bytes used 128.00KiB
devid1 size 12.00GiB used 1.26GiB path /dev/sdc
devid2 size 12.00GiB used 1.26GiB path /dev/sdb

Signed-off-by: Anand Jain 
---
PS: Fundamentally its wrong approach that btrfs-progs deduces the device
missing state in the userland instead of obtaining it from the kernel.
I objected on the patch, but still those patches got merged, this bug is
one of its side effects. Ironically I wrote patches to read device_state
from the kernel using ioctl, procfs and sysfs but didn't get the due
attention till a merger.

 fs/btrfs/volumes.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 06ec3577c6b4..05ade8c7342b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -803,10 +803,10 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
*fs_devices,
disk_super = (struct btrfs_super_block *)bh->b_data;
devid = btrfs_stack_device_id(&disk_super->dev_item);
if (devid != device->devid)
-   goto error_brelse;
+   goto free_alien;
 
if (memcmp(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE))
-   goto error_brelse;
+   goto free_alien;
 
device->generation = btrfs_super_generation(disk_super);
 
@@ -845,6 +845,11 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
*fs_devices,
 
return 0;
 
+free_alien:
+   fs_devices->num_devices--;
+   list_del(&device->dev_list);
+   btrfs_free_device(device);
+
 error_brelse:
brelse(bh);
blkdev_put(bdev, flags);
@@ -1329,11 +1334,13 @@ static int open_fs_devices(struct btrfs_fs_devices 
*fs_devices,
fmode_t flags, void *holder)
 {
struct btrfs_device *device;
+   struct btrfs_device *tmp_device;
struct btrfs_device *latest_dev = NULL;
 
flags |= FMODE_EXCL;
 
-   list_for_each_entry(device, &fs_devices->devices, dev_list) {
+   list_for_each_entry_safe(device, tmp_device, &fs_devices->devices,
+dev_list) {
/* Just open everything we can; ignore failures here */
if (btrfs_open_one_device(fs_devices, device, flags, holder))
continue;
-- 
1.8.3.1



[PATCH 1/4] btrfs: drop useless goto in open_fs_devices

2019-10-04 Thread Anand Jain
There is no need of goto out in open_fs_devices() as there is nothing
special done at %out:. So refactor it.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e176346afed1..06ec3577c6b4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1330,7 +1330,6 @@ static int open_fs_devices(struct btrfs_fs_devices 
*fs_devices,
 {
struct btrfs_device *device;
struct btrfs_device *latest_dev = NULL;
-   int ret = 0;
 
flags |= FMODE_EXCL;
 
@@ -1343,15 +1342,14 @@ static int open_fs_devices(struct btrfs_fs_devices 
*fs_devices,
device->generation > latest_dev->generation)
latest_dev = device;
}
-   if (fs_devices->open_devices == 0) {
-   ret = -EINVAL;
-   goto out;
-   }
+   if (fs_devices->open_devices == 0)
+   return -EINVAL;
+
fs_devices->opened = 1;
fs_devices->latest_bdev = latest_dev->bdev;
fs_devices->total_rw_bytes = 0;
-out:
-   return ret;
+
+   return 0;
 }
 
 static int devid_cmp(void *priv, struct list_head *a, struct list_head *b)
-- 
1.8.3.1



[PATCH 0/4] btrfs: fix issues due to alien device

2019-10-04 Thread Anand Jain
Alien device is a device in fs_devices list having a different fsid than
the expected fsid. This patch set fixes issues found due to the same.

Patch1: is a cleanup patch, not related.
Patch2: fixes the missing device not missing in the userland, by
hardening the function btrfs_open_one_device().
Patch3: fixes failing to mount a degraded RAID1 (but it can apply
to RAID5/6/10 as well), by hardening the function
btrfs_free_extra_devids().
Patch4: eliminates the source of the alien device in the fs_devices.

Anand Jain (4):
  btrfs: drop useless goto in open_fs_devices
  btrfs: delete identified alien device in open_fs_devices
  btrfs: include non-missing as a qualifier for the latest_bdev
  btrfs: free alien device due to device add

 fs/btrfs/volumes.c | 40 ++--
 1 file changed, 30 insertions(+), 10 deletions(-)

-- 
1.8.3.1



Re: BTRFS errors, and won't mount

2019-10-04 Thread Qu Wenruo


On 2019/10/4 下午3:41, Patrick Dijkgraaf wrote:
> Hi Qu,
> 
> I know about RAID5/6 risks, so I won't blame anyone but myself. I'm
> currenlty working on another solution, but I was not quite there yet...
> 
> mount -o ro /dev/sdh2 /mnt/data gives me:
> 
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): disk space caching
> is enabled
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): has skinny extents
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): parent transid
> verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks
> suppressed
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286352011705795888 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286318771218040112 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286363934109025584 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286229742125204784 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286353230849918256 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286246155688035632 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286321695890425136 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286384677254874416 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286386365024912688 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286284400752608560 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): failed to recover
> balance: -5
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): open_ctree failed
> 
> Do you think there is any chance to recover?

This means it's the tailing part of root tree get corrupted.

You can comment out the btrfs_recover_balance() call in open_ctree() of
fs/btrfs/disk-io.c, then try mount RO again.

This means some of your subvolumes can't be read out.


Another way to salvage is try using backup roots.

You can get all backup roots bytenr by "btrfs ins dump-super -f".
E.g:

$ btrfs ins dump-super -f /dev/nvme/btrfs | grep backup_tree_root
backup_tree_root:   5259264 gen: 5  level: 0
backup_tree_root:   24641536gen: 6  level: 0
backup_tree_root:   26378240gen: 7  level: 0
backup_tree_root:   5341184 gen: 8  level: 0

Then pass the bytenr into "btrfs check --tree-root " to see
which one could process further.

Thanks,
Qu
> 
> Thanks,
> Patrick.
> 
> 
> On Fri, 2019-10-04 at 15:22 +0800, Qu Wenruo wrote:
>> On 2019/10/4 下午2:59, Patrick Dijkgraaf wrote:
>>> Hi guys,
>>>
>>> During the night, I started getting the following errors and data
>>> was
>>> no longer accessible:
>>>
>>> [Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522
>>> callbacks
>>> suppressed
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 17686343003259060482 7808404996096
>>
>> Tree block at address 7808404996096 is completely broken.
>>
>> All the other messages with 7808404996096 shows btrfs is trying all
>> possible device combinations to rebuild that tree block, but
>> obviously
>> all failed.
>>
>> Not sure why the tree block is corrupted, but it's pretty possible
>> that
>> RAID5/6 write hole ruined your possibility to recover.
>>
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 254095834002432 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 2574563607252646368 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 17873260189421384017 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 9965805624054187110 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 15108378087789580224 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 7914705769619568652 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 16752645757091223687 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 9617669583708276649 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
>>> block
>>> start 3384408928046898608 7808404996096
>>
>> [...]
>>> Decided to reboot (for another reason) and tried to mount
>>> afterwards:
>>>
>>> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space
>>> caching
>>> is enabled
>>> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny
>>> extents
>>> [Fri Oct  4 08:

Re: BTRFS errors, and won't mount

2019-10-04 Thread Patrick Dijkgraaf
Decided to upgrade my system to the latest and give it a shot:

# btrfs check /dev/sde2
Opening filesystem to check...
parent transid verify failed on 4314780106752 wanted 470169 found
470107
checksum verify failed on 4314780106752 found 7077566E wanted 9494EBD8
checksum verify failed on 4314780106752 found 489FC179 wanted 73D057EA
checksum verify failed on 4314780106752 found 489FC179 wanted 73D057EA
bad tree block 4314780106752, bytenr mismatch, want=4314780106752,
have=20212047631104
ERROR: cannot open file system

# uname -r
5.3.1-arch1-1-ARCH

# btrfs --version
btrfs-progs v5.2.2

Does that help anything?

-- 
Groet / Cheers,
Patrick Dijkgraaf


On Fri, 2019-10-04 at 09:41 +0200, Patrick Dijkgraaf wrote:
> Hi Qu,
> 
> I know about RAID5/6 risks, so I won't blame anyone but myself. I'm
> currenlty working on another solution, but I was not quite there
> yet...
> 
> mount -o ro /dev/sdh2 /mnt/data gives me:
> 
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): disk space
> caching
> is enabled
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): has skinny
> extents
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): parent transid
> verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks
> suppressed
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286352011705795888 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286318771218040112 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286363934109025584 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286229742125204784 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286353230849918256 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286246155688035632 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286321695890425136 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286384677254874416 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286386365024912688 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block
> start 2286284400752608560 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): failed to
> recover
> balance: -5
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): open_ctree
> failed
> 
> Do you think there is any chance to recover?
> 
> Thanks,
> Patrick.
> 
> 
> On Fri, 2019-10-04 at 15:22 +0800, Qu Wenruo wrote:
> > On 2019/10/4 下午2:59, Patrick Dijkgraaf wrote:
> > > Hi guys,
> > > 
> > > During the night, I started getting the following errors and data
> > > was
> > > no longer accessible:
> > > 
> > > [Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522
> > > callbacks
> > > suppressed
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 17686343003259060482 7808404996096
> > 
> > Tree block at address 7808404996096 is completely broken.
> > 
> > All the other messages with 7808404996096 shows btrfs is trying all
> > possible device combinations to rebuild that tree block, but
> > obviously
> > all failed.
> > 
> > Not sure why the tree block is corrupted, but it's pretty possible
> > that
> > RAID5/6 write hole ruined your possibility to recover.
> > 
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 254095834002432 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 2574563607252646368 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 17873260189421384017 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 9965805624054187110 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 15108378087789580224 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 7914705769619568652 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 16752645757091223687 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 9617669583708276649 7808404996096
> > > [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree
> > > block
> > > start 3384408928046898608 7808404996096
> > 
> > [...]
> > > Decided to reboot (for another reason) and tried to mount
> > > afterwards:
> > > 
> > > [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space
> > > caching
> > > is enabled
> > > [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny
> > > extents
> > > [Fri Oct  4 08:29:44 20

Re: [PATCH 3/4] btrfs: include non-missing as a qualifier for the latest_bdev

2019-10-04 Thread Nikolay Borisov



On 4.10.19 г. 10:50 ч., Anand Jain wrote:
> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
> to point to the bdev with greatest device::generation number.
> For a typical-missing device the generation number is zero so
> fs_devices::latest_bdev will never point to it.
> 
> But if the missing device is due to alienating [1], then
> device::generation is not-zero and if it is >= to rest of
> device::generation in the list, then fs_devices::latest_bdev
> ends up pointing to the missing device and reports the error
> like this [2]
> 
> [1]
> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
> sleep 3 # avoid racing with udev's useless scans if needed
> btrfs dev add -f /dev/sdb /btrfs

Hm, here I think the correct way is to refuse adding /dev/sdb to an
existing fs if it's detected to be part of a different one. I.e it
should require wipefs to be done.

> 
> mount -o degraded /dev/sdc /btrfs1
> 
> [2]
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>missing codepage or helper program, or other error
> 
>In some cases useful info is found in syslog - try
>dmesg | tail or so.
> 
> kernel: BTRFS warning (device sdc): devid 1 uuid 
> 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing
> kernel: BTRFS error (device sdc): failed to read devices
> kernel: BTRFS error (device sdc): open_ctree failed
> 
> Fix the root of the issue, by checking if the the device is not
> missing before it can be a contender for the fs_devices::latest_bdev
> title.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 05ade8c7342b..6c42048ec099 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1186,6 +1186,8 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices 
> *fs_devices, int step)
>   &device->dev_state)) {
>   if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
>&device->dev_state) &&
> + !test_bit(BTRFS_DEV_STATE_MISSING,
> +  &device->dev_state) &&
>(!latest_dev ||
> device->generation > latest_dev->generation)) {
>   latest_dev = device;
> 


Re: [PATCH 1/4] btrfs: drop useless goto in open_fs_devices

2019-10-04 Thread Nikolay Borisov



On 4.10.19 г. 10:50 ч., Anand Jain wrote:
> There is no need of goto out in open_fs_devices() as there is nothing
> special done at %out:. So refactor it.
> 
> Signed-off-by: Anand Jain 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/volumes.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e176346afed1..06ec3577c6b4 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1330,7 +1330,6 @@ static int open_fs_devices(struct btrfs_fs_devices 
> *fs_devices,
>  {
>   struct btrfs_device *device;
>   struct btrfs_device *latest_dev = NULL;
> - int ret = 0;
>  
>   flags |= FMODE_EXCL;
>  
> @@ -1343,15 +1342,14 @@ static int open_fs_devices(struct btrfs_fs_devices 
> *fs_devices,
>   device->generation > latest_dev->generation)
>   latest_dev = device;
>   }
> - if (fs_devices->open_devices == 0) {
> - ret = -EINVAL;
> - goto out;
> - }
> + if (fs_devices->open_devices == 0)
> + return -EINVAL;
> +
>   fs_devices->opened = 1;
>   fs_devices->latest_bdev = latest_dev->bdev;
>   fs_devices->total_rw_bytes = 0;
> -out:
> - return ret;
> +
> + return 0;
>  }
>  
>  static int devid_cmp(void *priv, struct list_head *a, struct list_head *b)
> 


Re: [PATCH 2/4] btrfs: delete identified alien device in open_fs_devices

2019-10-04 Thread Nikolay Borisov



On 4.10.19 г. 10:50 ч., Anand Jain wrote:
> In open_fs_devices() we identify alien device but we don't reset its
> the device::name. So progs device list does not show the device missing
> as shown in the script below.
> 
> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdb
> sleep 3 # avoid racing with udev's useless scans if needed
> btrfs dev add -f /dev/sdb /btrfs
> mount -o degraded /dev/sdc /btrfs1
> 
> No missing device:
> btrfs fi show -m /btrfs1
> Label: none  uuid: 3eb7cd50-4594-458f-9d68-c243cc49954d
>   Total devices 2 FS bytes used 128.00KiB
>   devid1 size 12.00GiB used 1.26GiB path /dev/sdc
>   devid2 size 12.00GiB used 1.26GiB path /dev/sdb
> 
> Signed-off-by: Anand Jain 
> ---
> PS: Fundamentally its wrong approach that btrfs-progs deduces the device
> missing state in the userland instead of obtaining it from the kernel.
> I objected on the patch, but still those patches got merged, this bug is
> one of its side effects. Ironically I wrote patches to read device_state
> from the kernel using ioctl, procfs and sysfs but didn't get the due
> attention till a merger.
> 
>  fs/btrfs/volumes.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 06ec3577c6b4..05ade8c7342b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -803,10 +803,10 @@ static int btrfs_open_one_device(struct 
> btrfs_fs_devices *fs_devices,
>   disk_super = (struct btrfs_super_block *)bh->b_data;
>   devid = btrfs_stack_device_id(&disk_super->dev_item);
>   if (devid != device->devid)
> - goto error_brelse;
> + goto free_alien;
>  
>   if (memcmp(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE))
> - goto error_brelse;
> + goto free_alien;
>  

Imo a better approach is to return a particular error code and do the
deletion in open_fs_devices. Otherwise it's not apparent why you use
list_for_each_entry_safe in one function to delete something in a
different one (whose name by the way doesn't suggest a deletion is going
on). Looking at the error I think enodev/enxio is appropriate.

>   device->generation = btrfs_super_generation(disk_super);
>  
> @@ -845,6 +845,11 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
> *fs_devices,
>  
>   return 0;
>  
> +free_alien:
> + fs_devices->num_devices--;
> + list_del(&device->dev_list);
> + btrfs_free_device(device);
> +
>  error_brelse:
>   brelse(bh);
>   blkdev_put(bdev, flags);
> @@ -1329,11 +1334,13 @@ static int open_fs_devices(struct btrfs_fs_devices 
> *fs_devices,
>   fmode_t flags, void *holder)
>  {
>   struct btrfs_device *device;
> + struct btrfs_device *tmp_device;
>   struct btrfs_device *latest_dev = NULL;
>  
>   flags |= FMODE_EXCL;
>  
> - list_for_each_entry(device, &fs_devices->devices, dev_list) {
> + list_for_each_entry_safe(device, tmp_device, &fs_devices->devices,
> +  dev_list) {
>   /* Just open everything we can; ignore failures here */
>   if (btrfs_open_one_device(fs_devices, device, flags, holder))
>   continue;
> 


Re: [PATCH 0/4] btrfs: fix issues due to alien device

2019-10-04 Thread Nikolay Borisov



On 4.10.19 г. 10:49 ч., Anand Jain wrote:
> Alien device is a device in fs_devices list having a different fsid than
> the expected fsid. This patch set fixes issues found due to the same.

Are you going to submit an fstests patch for this bug ?

> 
> Patch1: is a cleanup patch, not related.
> Patch2: fixes the missing device not missing in the userland, by
> hardening the function btrfs_open_one_device().
> Patch3: fixes failing to mount a degraded RAID1 (but it can apply
> to RAID5/6/10 as well), by hardening the function
>   btrfs_free_extra_devids().
> Patch4: eliminates the source of the alien device in the fs_devices.
> 
> Anand Jain (4):
>   btrfs: drop useless goto in open_fs_devices
>   btrfs: delete identified alien device in open_fs_devices
>   btrfs: include non-missing as a qualifier for the latest_bdev
>   btrfs: free alien device due to device add
> 
>  fs/btrfs/volumes.c | 40 ++--
>  1 file changed, 30 insertions(+), 10 deletions(-)
> 


Re: [PATCH] btrfs: drop unused parameter is_new from btrfs_iget

2019-10-04 Thread Anand Jain

On 10/4/19 1:09 AM, David Sterba wrote:

The parameter is now always set to NULL and could be dropped. The last
user was get_default_root but that got reworked in 05dbe6837b60 ("Btrfs:
unify subvol= and subvolid= mounting") and the parameter became unused.

Signed-off-by: David Sterba 


Reviewed-by: Anand Jain 


Re: [Not TLS] Re: [PATCH 3/4] btrfs: include non-missing as a qualifier for the latest_bdev

2019-10-04 Thread Graham Cobb
On 04/10/2019 09:11, Nikolay Borisov wrote:
> 
> 
> On 4.10.19 г. 10:50 ч., Anand Jain wrote:
>> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
>> to point to the bdev with greatest device::generation number.
>> For a typical-missing device the generation number is zero so
>> fs_devices::latest_bdev will never point to it.
>>
>> But if the missing device is due to alienating [1], then
>> device::generation is not-zero and if it is >= to rest of
>> device::generation in the list, then fs_devices::latest_bdev
>> ends up pointing to the missing device and reports the error
>> like this [2]
>>
>> [1]
>> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
>> sleep 3 # avoid racing with udev's useless scans if needed
>> btrfs dev add -f /dev/sdb /btrfs
> 
> Hm, here I think the correct way is to refuse adding /dev/sdb to an
> existing fs if it's detected to be part of a different one. I.e it
> should require wipefs to be done.

I disagree. -f means "force overwrite of existing filesystem on the
given disk(s)". It shouldn't be any different whether the existing fs is
btrfs or something else.

Graham


Re: [Not TLS] Re: [PATCH 3/4] btrfs: include non-missing as a qualifier for the latest_bdev

2019-10-04 Thread Nikolay Borisov



On 4.10.19 г. 12:08 ч., Graham Cobb wrote:
> On 04/10/2019 09:11, Nikolay Borisov wrote:
>>
>>
>> On 4.10.19 г. 10:50 ч., Anand Jain wrote:
>>> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
>>> to point to the bdev with greatest device::generation number.
>>> For a typical-missing device the generation number is zero so
>>> fs_devices::latest_bdev will never point to it.
>>>
>>> But if the missing device is due to alienating [1], then
>>> device::generation is not-zero and if it is >= to rest of
>>> device::generation in the list, then fs_devices::latest_bdev
>>> ends up pointing to the missing device and reports the error
>>> like this [2]
>>>
>>> [1]
>>> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
>>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
>>> sleep 3 # avoid racing with udev's useless scans if needed
>>> btrfs dev add -f /dev/sdb /btrfs
>>
>> Hm, here I think the correct way is to refuse adding /dev/sdb to an
>> existing fs if it's detected to be part of a different one. I.e it
>> should require wipefs to be done.
> 
> I disagree. -f means "force overwrite of existing filesystem on the
> given disk(s)". It shouldn't be any different whether the existing fs is
> btrfs or something else.

You are right, looking at btrfs device add implementation though it
seems that wipefs is being done on the device to be added:

cmd_device_add
 btrfs_prepare_device
  btrfs_wipe_existing_sb

So if the device to be added is formatted and then it goes through
btrfs_init_new_device which commits the transaction which in turns
rewrites the sb then why is the error happening in the first place?


> 
> Graham
> 


[PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees

2019-10-04 Thread Qu Wenruo
There is a false alerts of tree-checker when running fstests/btrfs/063
in a loop.

The bug is caused by commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect
missing INODE_ITEM").
For the full error analyse, please check the first patch.

The first patch will give it a quick fix, so that it can be addressed in
v5.4 release cycle.

The 2nd patch is a more proper patch, with refactor to reduce duplicated
code and add the check to INODE_REF item.
But it's pretty large (+72, -41), not sure if it's suitbale for late
-rc.

Also current write-time tree checker error message is too silent, can't
be caught by fstests nor a quick glance of dmesg. And it doesn't contain
enough info to debug.

So to enhance the error message, and make it more noisy, the 3rd patch
will enhance the error message.

Qu Wenruo (3):
  btrfs: tree-checker: Fix false alerts on log trees
  btrfs: tree-checker: Refactor prev_key check for ino into a function
  btrfs: Enhance the error outputting for write time tree checker

 fs/btrfs/disk-io.c  |   2 +
 fs/btrfs/tree-checker.c | 111 ++--
 2 files changed, 74 insertions(+), 39 deletions(-)

-- 
2.23.0



[PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function

2019-10-04 Thread Qu Wenruo
Refactor the check for prev_key->objectid of the following key types
into one function, check_prev_ino():
- EXTENT_DATA
- INODE_REF
- DIR_INDEX
- DIR_ITEM
- XATTR_ITEM

Despite the refactor, also add the check of prev_key for INODE_REF.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/tree-checker.c | 113 +---
 1 file changed, 72 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 5e34cd5e3e2e..73678393340a 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -125,6 +125,74 @@ static u64 file_extent_end(struct extent_buffer *leaf,
return end;
 }
 
+/*
+ * Customized reported for dir_item, only important new info is key->objectid,
+ * which represents inode number
+ */
+__printf(3, 4)
+__cold
+static void dir_item_err(const struct extent_buffer *eb, int slot,
+const char *fmt, ...)
+{
+   const struct btrfs_fs_info *fs_info = eb->fs_info;
+   struct btrfs_key key;
+   struct va_format vaf;
+   va_list args;
+
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   va_start(args, fmt);
+
+   vaf.fmt = fmt;
+   vaf.va = &args;
+
+   btrfs_crit(fs_info,
+   "corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV",
+   btrfs_header_level(eb) == 0 ? "leaf" : "node",
+   btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
+   key.objectid, &vaf);
+   va_end(args);
+}
+
+/*
+ * This functions checks prev_key->objectid, to ensure current key and prev_key
+ * shares the same objectid as ino.
+ *
+ * This is to detect missing INODE_ITEM in subvolume trees.
+ *
+ * Return true if everything is OK or we don't need to check.
+ * Return false if anything is wrong.
+ */
+static bool check_prev_ino(struct extent_buffer *leaf,
+  struct btrfs_key *key, int slot,
+  struct btrfs_key *prev_key)
+{
+   /* No prev key, skip check */
+   if (slot == 0)
+   return true;
+
+   /* Only these key->types needs to be checked */
+   ASSERT(key->type == BTRFS_XATTR_ITEM_KEY ||
+  key->type == BTRFS_INODE_REF_KEY ||
+  key->type == BTRFS_DIR_INDEX_KEY ||
+  key->type == BTRFS_DIR_ITEM_KEY ||
+  key->type == BTRFS_EXTENT_DATA_KEY);
+
+   /*
+* Only subvolume trees along with their reloc trees needs this check.
+* Things like log tree doesn't follow this ino requirement.
+*/
+   if (!is_fstree(btrfs_header_owner(leaf)))
+   return true;
+
+   if (key->objectid == prev_key->objectid)
+   return true;
+
+   /* Error found */
+   dir_item_err(leaf, slot,
+   "invalid previous key objectid, have %llu expect %llu",
+   prev_key->objectid, key->objectid);
+   return false;
+}
 static int check_extent_data_item(struct extent_buffer *leaf,
  struct btrfs_key *key, int slot,
  struct btrfs_key *prev_key)
@@ -148,13 +216,8 @@ static int check_extent_data_item(struct extent_buffer 
*leaf,
 * But if objectids mismatch, it means we have a missing
 * INODE_ITEM.
 */
-   if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
-   prev_key->objectid != key->objectid) {
-   file_extent_err(leaf, slot,
-   "invalid previous key objectid, have %llu expect %llu",
-   prev_key->objectid, key->objectid);
+   if (!check_prev_ino(leaf, key, slot, prev_key))
return -EUCLEAN;
-   }
 
fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 
@@ -285,34 +348,6 @@ static int check_csum_item(struct extent_buffer *leaf, 
struct btrfs_key *key,
return 0;
 }
 
-/*
- * Customized reported for dir_item, only important new info is key->objectid,
- * which represents inode number
- */
-__printf(3, 4)
-__cold
-static void dir_item_err(const struct extent_buffer *eb, int slot,
-const char *fmt, ...)
-{
-   const struct btrfs_fs_info *fs_info = eb->fs_info;
-   struct btrfs_key key;
-   struct va_format vaf;
-   va_list args;
-
-   btrfs_item_key_to_cpu(eb, &key, slot);
-   va_start(args, fmt);
-
-   vaf.fmt = fmt;
-   vaf.va = &args;
-
-   btrfs_crit(fs_info,
-   "corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV",
-   btrfs_header_level(eb) == 0 ? "leaf" : "node",
-   btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
-   key.objectid, &vaf);
-   va_end(args);
-}
-
 static int check_dir_item(struct extent_buffer *leaf,
  struct btrfs_key *key, struct btrfs_key *prev_key,
  int slot)
@@ -322,14 +357,8 @@ static int check_dir_item(struct extent_buffer *leaf,
u32 item_size = btrfs_item_size_nr(leaf, slot);

[PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker

2019-10-04 Thread Qu Wenruo
Unlike read time tree checker error, write time error can't be inspected
by "btrfs ins dump-tree", so we need extra info to determine what's
going wrong.

The patch will add the following output for write time tree checker
error:
- The content of the offending tree block
  To help determining if it's a false alert.

- Kernel WARN_ON() for debug build
  This is helpful for us to detect unexpected write time tree checker
  error, especially fstests could catch the dmesg.
  Since the WARN_ON() is only triggered for write time tree checker,
  test cases utilizing dm-error won't trigger this WARN_ON(), thus no
  extra noise.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 16dc60b4966d..a0925b4e00af 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -545,9 +545,11 @@ static int csum_dirty_buffer(struct btrfs_fs_info 
*fs_info, struct page *page)
ret = btrfs_check_leaf_full(eb);
 
if (ret < 0) {
+   btrfs_print_tree(eb, 0);
btrfs_err(fs_info,
"block=%llu write time tree block corruption detected",
  eb->start);
+   WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
return ret;
}
write_extent_buffer(eb, result, 0, csum_size);
-- 
2.23.0



[PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees

2019-10-04 Thread Qu Wenruo
[BUG]
When running btrfs/063 in a loop, we got the following random write time
tree checker error:

  BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 
block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, 
have 305 expect 307
  BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 
owner 18446744073709551610
  BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) 
lock_owner 0 current 26176
  item 0 key (305 1 0) itemoff 16123 itemsize 160
  inode generation 0 size 0 mode 40777
  item 1 key (305 12 257) itemoff 16111 itemsize 12
  item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
  extent data disk bytenr 0 nr 0
  extent data offset 0 nr 614400 ram 671744
  item 3 key (307 108 614400) itemoff 16005 itemsize 53
  extent data disk bytenr 195342336 nr 57344
  extent data offset 0 nr 53248 ram 57344
  item 4 key (307 108 667648) itemoff 15952 itemsize 53
  extent data disk bytenr 194048000 nr 4096
  extent data offset 0 nr 4096 ram 4096
  [...]
  BTRFS error (device dm-4): block=33095680 write time tree block corruption 
detected
  BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO 
failure (Error while writing out transaction)
  BTRFS info (device dm-4): forced readonly
  BTRFS warning (device dm-4): Skipping commit of aborted transaction.
  BTRFS info (device dm-4): use zlib compression, level 3
  BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure

[CAUSE]
Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
should have previous key with the same objectid as ino.

But it's only true for fs trees. For log-tree, we can get above log tree
block where an EXTENT_DATA item has no previous key with the same ino.
As log tree only records modified items, it won't record unmodified
items like INODE_ITEM.

So this triggers write time tree check warning.

[FIX]
As a quick fix, check header owner to skip the previous key if it's not
fs tree (log tree doesn't count as fs tree).

This fix is only to be merged as a quick fix.
There will be a more comprehensive fix to refactor the common check into
one function.

Reported-by: David Sterba 
Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/tree-checker.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index b8f82d9be9f0..5e34cd5e3e2e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer 
*leaf,
 * But if objectids mismatch, it means we have a missing
 * INODE_ITEM.
 */
-   if (slot > 0 && prev_key->objectid != key->objectid) {
+   if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
+   prev_key->objectid != key->objectid) {
file_extent_err(leaf, slot,
"invalid previous key objectid, have %llu expect %llu",
prev_key->objectid, key->objectid);
@@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
u32 cur = 0;
 
/* Same check as in check_extent_data_item() */
-   if (slot > 0 && prev_key->objectid != key->objectid) {
+   if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
+   prev_key->objectid != key->objectid) {
dir_item_err(leaf, slot,
"invalid previous key objectid, have %llu expect %llu",
 prev_key->objectid, key->objectid);
-- 
2.23.0



Re: [bug] strange systemd-udevd scan for btrfs device

2019-10-04 Thread Anand Jain



use appropriate ml for systemd bugs
-systemd-b...@lists.freedesktop.org
+systemd-de...@lists.freedesktop.org

inline below..

On 10/2/19 9:33 PM, Andrei Borzenkov wrote:

On Wed, Oct 2, 2019 at 1:19 PM Anand Jain  wrote:




On 10/2/19 6:02 PM, Anand Jain wrote:



On 10/2/19 5:55 PM, Andrei Borzenkov wrote:

On Wed, Oct 2, 2019 at 12:27 PM Anand Jain  wrote:




I am looking for systemd part of the answers to understand what
is triggering a strange problem. Any help is appreciated.

After mkfs.btrfs creates btrfs filesystem it scans to register the
device in btrfs.ko.
And we have 'btrfs dev scan --forget' command to undo the process of
register.

But the problem is - immediately after 'btrfs dev scan --forget' the
systemd-udevd scans the device again, defeating the purpose of the
forget as show below (scanned-by).

mkfs.btrfs -fq /dev/sdc && btrfs dev scan --forget /dev/sdc

---
kernel: BTRFS: device fsid 8ea20bb2-888a-4b3b-9f4c-1db9117dc219 devid 1
transid 5 /dev/sdc scanned-by mkfs.btrfs
kernel: BTRFS: device fsid 8ea20bb2-888a-4b3b-9f4c-1db9117dc219 devid 1
transid 5 /dev/sdc scanned-by systemd-udevd
---

And the problem does _not_ happen if there is a sleep of 3 secs after
the mkfs.btrfs, as below.

mkfs.btrfs -fq /dev/sdc && sleep 3 && btrfs dev scan --forget /dev/sdc

--
kernel: BTRFS: device fsid 601bd01a-5e6b-488a-b020-0e7556c83087 devid 1
transid 5 /dev/sdc scanned-by mkfs.btrfs
--

Any idea what happening from the systemd point of view?



run

udevadm monitor -ku

in both cases and post results. My educated guess is that udev scan is
in response to mkfs and you have unfortunate race condition here.




Looks like what is happening is ..

   . Change in fsid (by mkfs.btrfs) notifies and triggers systemd
   . Systemd checks if the device is ready by using
 BTRFS_IOC_DEVICES_READY.
   . However BTRFS_IOC_DEVICES_READY from systemd races with forget
 command and the result depends on who wins the race.




I get this for the command mkfs.btrfs: (for /dev/sdc)

KERNEL[185263.634507] change
/devices/pci:5d/:5d:02.0/:65:00.0/host0/target0:2:2/0:2:2:0/block/sdc
(block)
UDEV  [185263.637870] change
/devices/pci:5d/:5d:02.0/:65:00.0/host0/target0:2:2/0:2:2:0/block/sdc
(block)
KERNEL[185263.640572] change
/devices/pci:5d/:5d:02.0/:65:00.0/host0/target0:2:2/0:2:2:0/block/sdc
(block)
KERNEL[185263.641552] change
/devices/pci:5d/:5d:02.0/:65:00.0/host0/target0:2:2/0:2:2:0/block/sdc
(block)
UDEV  [185263.644337] change
/devices/pci:5d/:5d:02.0/:65:00.0/host0/target0:2:2/0:2:2:0/block/sdc
(block)
UDEV  [185263.647656] change
/devices/pci:5d/:5d:02.0/:65:00.0/host0/target0:2:2/0:2:2:0/block/sdc
(block)

And no notification for mkfs.btrfs -fq /dev/sdb

Looks like there is some rules set. But I don't find any rules
in /etc/udev/rules.d specific to /dev/sdb can it be set somewhere
else?




Default rules are in /usr/lib/udev, but rules can only block udev
events (if at all), they have no impact on what kernel does. I guess
util-linux would be a better place to ask about mkfs behavior.





In the event of a mkfs.btrfs on a device the udev action to scan
the device into the btrfs kernel is redundant !. because mkfs.btrfs
is already doing it.

/usr/lib/udev has quite a lot of rules.
Now how do I find why udev has an rule for sdc and not for sdb.
There isn't any obvious things I found in /usr/lib/udev for sdc/sdb.

Thanks, Anand


Re: BTRFS Raid5 error during Scrub.

2019-10-04 Thread Robert Krig
Thank you all for your help so far.

I'm doing backups at the moment. My other Server is a ZFS system. I
think what I'm going to do, is to have this system, that's currently
BTRFS RAID5 migrated to using ZFS and once that's done, migrate my
backup system to BTRFS RAID5. That way I can still experiment with
BTRFS Raid5, but starting from scratch is not so problematic, since
it's a backup system and not being actively used.

I'm still going to try the steps you mentioned with btrfs check --
repair and see what that's going to spit out.


Am Donnerstag, den 03.10.2019, 14:18 -0600 schrieb Chris Murphy:
> On Thu, Oct 3, 2019 at 6:18 AM Robert Krig
>  wrote:
> > By the way, how serious is the error I've encountered?
> > I've run a second scrub in the meantime, it aborted when it came
> > close
> > to the end, just like the first time.
> > If the files that are corrupt have been deleted is this error going
> > to
> > go away?
> 
> Maybe.
> 
> 
> > > > > Opening filesystem to check...
> > > > > Checking filesystem on /dev/sda
> > > > > UUID: f7573191-664f-4540-a830-71ad654d9301
> > > > > [1/7] checking root items  (0:01:17
> > > > > elapsed,
> > > > > 5138533 items checked)
> > > > > parent transid verify failed on 48781340082176 wanted 109181
> > > > > found
> > > > > 109008items checked)
> > > > > parent transid verify failed on 48781340082176 wanted 109181
> > > > > found
> > > > > 109008
> > > > > parent transid verify failed on 48781340082176 wanted 109181
> > > > > found
> > > > > 109008
> 
> These look suspiciously like the 5.2 regression:
> https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdman...@kernel.org/T/#u
> 
> You should either revert to a 5.1 kernel, or use 5.2.15+.
> 
> As far as I'm aware it's not possible to fix this kind of corruption,
> so I suggest refreshing your backups while you can still mount this
> file system, and prepare to create it from scratch.
> 
> 
> > > > > Ignoring transid failure
> > > > > leaf parent key incorrect 48781340082176
> > > > > bad block 48781340082176
> > > > > [2/7] checking extents (0:03:22
> > > > > elapsed,
> > > > > 1143429 items checked)
> > > > > ERROR: errors found in extent allocation tree or chunk
> > > > > allocation
> 
> That's usually not a good sign.
> 
> 
> 
> > > > > [3/7] checking free space cache(0:05:10
> > > > > elapsed,
> > > > > 7236
> > > > > items checked)
> > > > > parent transid verify failed on 48781340082176 wanted 109181
> > > > > found
> > > > > 109008ems checked)
> > > > > Ignoring transid failure
> > > > > root 15197 inode 81781 errors 1000, some csum missing48
> > > > > elapsed,
> 
> That's inode 81781 in the subvolume with ID 15197. I'm not sure what
> error 1000 is, but btrfs check is a bit fussy when it enounters files
> that are marked +C (nocow) but have been compressed. This used to be
> possible with older kernels when nocow files were defragmented while
> the file system is mounted with compression enabled. If that sounds
> like your use case, that might be what's going on here, and it's
> actually a benign message. It's normal for nocow files to be missing
> csums. To confirm you can use 'find /pathtosubvol/ -inum 81781' to
> find the file, then lsattr it and see if +C is set.
> 
> You have a few options but the first thing is to refresh backups and
> prepare to lose this file system:
> 
> a. bail now, and just create a new Btrfs from scratch and restore
> from backup
> b. try 'btrfs check --repair' to see if the transid problems are
> fixed; if not
> c. try 'btrfs check --repair --init-extent-tree' there's a good
> chance
> this fails and makes things worse but probably faster to try than
> restoring from backup
> 



Re: [PATCH 1/3] btrfs: add __cold attribute to more functions

2019-10-04 Thread David Sterba
On Wed, Oct 02, 2019 at 01:52:16PM +0300, Nikolay Borisov wrote:
> On 1.10.19 г. 20:57 ч., David Sterba wrote:
> > The attribute can mark functions supposed to be called rarely if at all
> > and the text can be moved to sections far from the other code. The
> > attribute has been added to several functions already, this patch is
> > based on hints given by gcc -Wsuggest-attribute=cold.
> > 
> > The net effect of this patch is decrease of btrfs.ko by 1000-1300,
> > depending on the config options.
> > 
> > Signed-off-by: David Sterba 
> > ---
> >  fs/btrfs/disk-io.c | 4 ++--
> >  fs/btrfs/disk-io.h | 4 ++--
> >  fs/btrfs/super.c   | 2 +-
> >  fs/btrfs/volumes.c | 2 +-
> >  4 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index e335fa4c4d1d..04d86e7b 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -2583,7 +2583,7 @@ static int btrfs_validate_write_super(struct 
> > btrfs_fs_info *fs_info,
> > return ret;
> >  }
> >  
> > -int open_ctree(struct super_block *sb,
> > +int __cold open_ctree(struct super_block *sb,
> 
> According to the documentation
> (https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html) of gcc
> attributes are placed in the declaration of a function (3rd paragraph):
> 
> 
> "Function attributes are introduced by the __attribute__ keyword in the
> declaration of a function, ..."

I'd rather keep the attributes to declaration and definition so it's in
sync and looks consistent without further questions.

> > +void __cold btrfs_printk(const struct btrfs_fs_info *fs_info, const char 
> > *fmt, ...)
> >  {
> 
> Is printk really cold though? It's used in the various print helpers,
> even for info level print which might not be rare once the fs is
> mounted? What's a possible negative effect of this size optimisation -
> runtime cost?

Messages are printed rarely under normal circumstances, ie. a message
once in a few minutes maybe. The penalty of loading the code into caches
is IMO justified here, there's not much chance of reusing the cache hot
code. And that it also involves all other helpers is kind of
intentional.

A very frequent pattern:

x = find_some_structure();
if (!x) {
btrfs_err("there is a problem");
ret = -EUCLEAN;
goto out_error;
}

In this case the cold attribute is another hint to the compiler that the
whole code block following the 'if' is cold and can be rearranged out of
the way.

If you have counter examples for printk-related functions that are on a
hot path in btrfs, I'd like to hear about them. To move them out of the
that hot path :)


Re: [PATCH 2/3] btrfs: add const function attribute

2019-10-04 Thread David Sterba
On Wed, Oct 02, 2019 at 02:07:50PM +0300, Nikolay Borisov wrote:
> > +struct list_head * __attribute_const__ btrfs_get_fs_uuids(void)
> 
> I'm not entirely sure this function is cons. According to the manual:
> 
> Calls to functions whose return value is not affected by changes to the
> observable state of the program and that have no observable effects on
> such state other than to return a value may lend themselves to
> optimizations such as common subexpression elimination.
> 
> The const attribute prohibits a function from reading objects that
> affect its return value between successive invocations. However,
> functions declared with the attribute can safely read objects that do
> not change their return value, such as non-volatile constants.
> 
> 
> My doubt stems from the fact this function actually references outside
> memory, namely gets the ptr to fs_uuids. There is a specific remark not
> to use const when the function takes a ptr argument but it doesn't say
> anything when getting a ptr from a global var.

The fs_uuids are a non-volatile constant. It does not change accross the
lifetime of the module, so all code executed will always see the same
value.


Re: Intro to fstests environment?

2019-10-04 Thread Austin S. Hemmelgarn

On 2019-10-03 13:51, Graham Cobb wrote:

Hi,

I seem to have another case where scrub gets confused when it is
cancelled and restarted many times (or, maybe, it is my error or
something). I will look into it further but, instead of just hacking
away at my script to work out what is going on, I thought I might try to
create a regression test for it this time.

I have read the kdave/fstests READMEs and the wiki. Is there any other
documentation or advice I should read? Of course, I will look at
existing test scripts as well.

I don't suppose anyone has a convenient VM image or similar which I can
use as a starting point?


Theodore T'so has a project on GitHub called xfstests-bld [1] to 
automate setup of an environment for GCE or Android or a QEMU VM to run 
xfstests.  That's probably going to be your best bet as a starting point 
as it not only has a bunch of the infrastructure needed to build such 
things, but also has good documentation.



[1] https://github.com/tytso/xfstests-bld


Re: BTRFS errors, and won't mount

2019-10-04 Thread Patrick Dijkgraaf
Hi Qu,

Tried it with backup roots and it won't mount. I have a backup of a
week ago and I'll accept the dataloss.

I'll rebuild (this time without BTRFS RAID5/6) and restore.

Thanks for your help!

-- 
Groet / Cheers,
Patrick Dijkgraaf



On Fri, 2019-10-04 at 09:59 +0200, Patrick Dijkgraaf wrote:
> Decided to upgrade my system to the latest and give it a shot:
> 
> # btrfs check /dev/sde2
> Opening filesystem to check...
> parent transid verify failed on 4314780106752 wanted 470169 found
> 470107
> checksum verify failed on 4314780106752 found 7077566E wanted
> 9494EBD8
> checksum verify failed on 4314780106752 found 489FC179 wanted
> 73D057EA
> checksum verify failed on 4314780106752 found 489FC179 wanted
> 73D057EA
> bad tree block 4314780106752, bytenr mismatch, want=4314780106752,
> have=20212047631104
> ERROR: cannot open file system
> 
> # uname -r
> 5.3.1-arch1-1-ARCH
> 
> # btrfs --version
> btrfs-progs v5.2.2
> 
> Does that help anything?
> 
> 



Re: while (1) in btrfs_relocate_block_group didn't end

2019-10-04 Thread Cebtenzzre
On Sun, 2019-09-29 at 07:37 +0800, Qu Wenruo wrote:
> 
> On 2019/9/29 上午2:36, Cebtenzzre wrote:
> > On Mon, 2019-09-16 at 17:20 -0400, Cebtenzzre wrote:
> > > On Sat, 2019-09-14 at 17:36 -0400, Cebtenzzre wrote:
> > > > Hi,
> > > > 
> > > > I started a balance of one block group, and I saw this in dmesg:
> > > > 
> > > > BTRFS info (device sdi1): balance: start 
> > > > -dvrange=2236714319872..2236714319873
> > > > BTRFS info (device sdi1): relocating block group 2236714319872 flags 
> > > > data|raid0
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > 
> > > > [...]
> > > > 
> > > > I am using Arch Linux with kernel version 5.2.14-arch2, and I specified
> > > > "slub_debug=P,kmalloc-2k" in the kernel cmdline to detect and protect
> > > > against a use-after-free that I found when I had KASAN enabled. Would
> > > > that kernel parameter result in a silent retry if it hit the use-after-
> > > > free?
> > > 
> > > Please disregard the quoted message. This behavior does appear to be a
> > > result of using the slub_debug option instead of KASAN. It is not
> > > directly caused by BTRFS.
> > 
> > Actually, I just reproduced this behavior without slub_debug in the
> > cmdline, on Linux 5.3.0 with "[PATCH] btrfs: relocation: Fix KASAN
> > report about use-after-free due to dead reloc tree cleanup race" (
> > https://patchwork.kernel.org/patch/11153729/) applied.
> > 
> > So, this issue is still relevant and possible to trigger, though under
> > different conditions (different volume, kernel version, and cmdline).
> > 
> 
> That patch is not to solve the while loop problem, so we still need some
> extra info for this problem.
> 
> Is the problem always reproducible on that fs or still with some randomness?
> 
> And, can you still reproduce it with v5.1/v5.2?
> 
> Thanks,
> Qu
> 

I mentioned that patch because it was the only patch I had applied to my
kernel at the time. The "issue" I was referring to was the looping issue
that I reported in the first email.

I have only come across this behavior without slub_debug once or twice,
so I don't have enough of a sample size to say whether it can happen on
older kernels. It's caused by running a balance with *just* the right
amount of free space, such that the correct behavior is probably ENOSPC.

I might eventually dedicate a volume to reproducing this issue, and
bisect the kernel. But I need all of my disks to be usable right now.
-- 
Cebtenzzre 



Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees

2019-10-04 Thread Nikolay Borisov



On 4.10.19 г. 12:31 ч., Qu Wenruo wrote:
> [BUG]
> When running btrfs/063 in a loop, we got the following random write time
> tree checker error:
> 
>   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 
> block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, 
> have 305 expect 307
>   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 
> 12146 owner 18446744073709551610
>   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) 
> lock_owner 0 current 26176
>   item 0 key (305 1 0) itemoff 16123 itemsize 160
>   inode generation 0 size 0 mode 40777
>   item 1 key (305 12 257) itemoff 16111 itemsize 12
>   item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
>   extent data disk bytenr 0 nr 0
>   extent data offset 0 nr 614400 ram 671744
>   item 3 key (307 108 614400) itemoff 16005 itemsize 53
>   extent data disk bytenr 195342336 nr 57344
>   extent data offset 0 nr 53248 ram 57344
>   item 4 key (307 108 667648) itemoff 15952 itemsize 53
>   extent data disk bytenr 194048000 nr 4096
>   extent data offset 0 nr 4096 ram 4096
> [...]
>   BTRFS error (device dm-4): block=33095680 write time tree block corruption 
> detected
>   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO 
> failure (Error while writing out transaction)
>   BTRFS info (device dm-4): forced readonly
>   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
>   BTRFS info (device dm-4): use zlib compression, level 3
>   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure
> 
> [CAUSE]
> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
> should have previous key with the same objectid as ino.
> 
> But it's only true for fs trees. For log-tree, we can get above log tree
> block where an EXTENT_DATA item has no previous key with the same ino.
> As log tree only records modified items, it won't record unmodified
> items like INODE_ITEM.
> 
> So this triggers write time tree check warning.
> 
> [FIX]
> As a quick fix, check header owner to skip the previous key if it's not
> fs tree (log tree doesn't count as fs tree).
> 
> This fix is only to be merged as a quick fix.
> There will be a more comprehensive fix to refactor the common check into
> one function.
> 
> Reported-by: David Sterba 
> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> Signed-off-by: Qu Wenruo 


It's not entirely clear why this bug manifests. My tests show that when
we write extents we always update the inode's c/m time so it's always
dirtied hence it's logged. OTOH when punching a hole the same thing is
valid.

Filipe, under what conditions should it be possible to log an
EXTENT_DATA item without first logging the inode it belongs to? It seems
using the usual write paths (e.g. buffered write and punchole) that's
impossible?

> ---
>  fs/btrfs/tree-checker.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index b8f82d9be9f0..5e34cd5e3e2e 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer 
> *leaf,
>* But if objectids mismatch, it means we have a missing
>* INODE_ITEM.
>*/
> - if (slot > 0 && prev_key->objectid != key->objectid) {
> + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> + prev_key->objectid != key->objectid) {
>   file_extent_err(leaf, slot,
>   "invalid previous key objectid, have %llu expect %llu",
>   prev_key->objectid, key->objectid);
> @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
>   u32 cur = 0;
>  
>   /* Same check as in check_extent_data_item() */
> - if (slot > 0 && prev_key->objectid != key->objectid) {
> + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> + prev_key->objectid != key->objectid) {
>   dir_item_err(leaf, slot,
>   "invalid previous key objectid, have %llu expect %llu",
>prev_key->objectid, key->objectid);
> 


Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees

2019-10-04 Thread Filipe Manana
On Fri, Oct 4, 2019 at 2:54 PM Nikolay Borisov  wrote:
>
>
>
> On 4.10.19 г. 12:31 ч., Qu Wenruo wrote:
> > [BUG]
> > When running btrfs/063 in a loop, we got the following random write time
> > tree checker error:
> >
> >   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 
> > block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, 
> > have 305 expect 307
> >   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 
> > 12146 owner 18446744073709551610
> >   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) 
> > lock_owner 0 current 26176
> >   item 0 key (305 1 0) itemoff 16123 itemsize 160
> >   inode generation 0 size 0 mode 40777
> >   item 1 key (305 12 257) itemoff 16111 itemsize 12
> >   item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
> >   extent data disk bytenr 0 nr 0
> >   extent data offset 0 nr 614400 ram 671744
> >   item 3 key (307 108 614400) itemoff 16005 itemsize 53
> >   extent data disk bytenr 195342336 nr 57344
> >   extent data offset 0 nr 53248 ram 57344
> >   item 4 key (307 108 667648) itemoff 15952 itemsize 53
> >   extent data disk bytenr 194048000 nr 4096
> >   extent data offset 0 nr 4096 ram 4096
> > [...]
> >   BTRFS error (device dm-4): block=33095680 write time tree block 
> > corruption detected
> >   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO 
> > failure (Error while writing out transaction)
> >   BTRFS info (device dm-4): forced readonly
> >   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
> >   BTRFS info (device dm-4): use zlib compression, level 3
> >   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO 
> > failure
> >
> > [CAUSE]
> > Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing 
> > INODE_ITEM")
> > assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
> > should have previous key with the same objectid as ino.
> >
> > But it's only true for fs trees. For log-tree, we can get above log tree
> > block where an EXTENT_DATA item has no previous key with the same ino.
> > As log tree only records modified items, it won't record unmodified
> > items like INODE_ITEM.
> >
> > So this triggers write time tree check warning.
> >
> > [FIX]
> > As a quick fix, check header owner to skip the previous key if it's not
> > fs tree (log tree doesn't count as fs tree).
> >
> > This fix is only to be merged as a quick fix.
> > There will be a more comprehensive fix to refactor the common check into
> > one function.
> >
> > Reported-by: David Sterba 
> > Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing 
> > INODE_ITEM")
> > Signed-off-by: Qu Wenruo 
>
>
> It's not entirely clear why this bug manifests. My tests show that when
> we write extents we always update the inode's c/m time so it's always
> dirtied hence it's logged. OTOH when punching a hole the same thing is
> valid.
>
> Filipe, under what conditions should it be possible to log an
> EXTENT_DATA item without first logging the inode it belongs to? It seems
> using the usual write paths (e.g. buffered write and punchole) that's
> impossible?

The tests you did are pointless, none of those operations write to a
log tree, only fsync does that.

This change is perfectly fine. Logging (fsync) always logs the inode
item since commit [1] (2015),
however it might do so after logging extents and other items, and in
between that, if writeback for
the log tree leaf happens we get that error from the tree-checker.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4545de5b035c7debb73d260c78377dbb69cbfb5

>
> > ---
> >  fs/btrfs/tree-checker.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> > index b8f82d9be9f0..5e34cd5e3e2e 100644
> > --- a/fs/btrfs/tree-checker.c
> > +++ b/fs/btrfs/tree-checker.c
> > @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer 
> > *leaf,
> >* But if objectids mismatch, it means we have a missing
> >* INODE_ITEM.
> >*/
> > - if (slot > 0 && prev_key->objectid != key->objectid) {
> > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> > + prev_key->objectid != key->objectid) {
> >   file_extent_err(leaf, slot,
> >   "invalid previous key objectid, have %llu expect %llu",
> >   prev_key->objectid, key->objectid);
> > @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
> >   u32 cur = 0;
> >
> >   /* Same check as in check_extent_data_item() */
> > - if (slot > 0 && prev_key->objectid != key->objectid) {
> > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> > + p

Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees

2019-10-04 Thread Filipe Manana
On Fri, Oct 4, 2019 at 11:27 AM Qu Wenruo  wrote:
>
> [BUG]
> When running btrfs/063 in a loop, we got the following random write time
> tree checker error:
>
>   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 
> block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, 
> have 305 expect 307
>   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 
> 12146 owner 18446744073709551610
>   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) 
> lock_owner 0 current 26176
>   item 0 key (305 1 0) itemoff 16123 itemsize 160
>   inode generation 0 size 0 mode 40777
>   item 1 key (305 12 257) itemoff 16111 itemsize 12
>   item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
>   extent data disk bytenr 0 nr 0
>   extent data offset 0 nr 614400 ram 671744
>   item 3 key (307 108 614400) itemoff 16005 itemsize 53
>   extent data disk bytenr 195342336 nr 57344
>   extent data offset 0 nr 53248 ram 57344
>   item 4 key (307 108 667648) itemoff 15952 itemsize 53
>   extent data disk bytenr 194048000 nr 4096
>   extent data offset 0 nr 4096 ram 4096
>   [...]
>   BTRFS error (device dm-4): block=33095680 write time tree block corruption 
> detected
>   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO 
> failure (Error while writing out transaction)
>   BTRFS info (device dm-4): forced readonly
>   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
>   BTRFS info (device dm-4): use zlib compression, level 3
>   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure
>
> [CAUSE]
> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
> should have previous key with the same objectid as ino.
>
> But it's only true for fs trees. For log-tree, we can get above log tree
> block where an EXTENT_DATA item has no previous key with the same ino.
> As log tree only records modified items, it won't record unmodified
> items like INODE_ITEM.
>
> So this triggers write time tree check warning.
>
> [FIX]
> As a quick fix, check header owner to skip the previous key if it's not
> fs tree (log tree doesn't count as fs tree).
>
> This fix is only to be merged as a quick fix.
> There will be a more comprehensive fix to refactor the common check into
> one function.
>
> Reported-by: David Sterba 
> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")

So this is bogus, since that commit is not in Linus' tree, and once it
gets there its ID changes.
More likely, this will get squashed into that commit in misc-next
since we are still far from the 5.5 merge window.

> Signed-off-by: Qu Wenruo 

Anyway, the change looks fine to me.

Reviewed-by: Filipe Manana 

Thanks.

> ---
>  fs/btrfs/tree-checker.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index b8f82d9be9f0..5e34cd5e3e2e 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer 
> *leaf,
>  * But if objectids mismatch, it means we have a missing
>  * INODE_ITEM.
>  */
> -   if (slot > 0 && prev_key->objectid != key->objectid) {
> +   if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> +   prev_key->objectid != key->objectid) {
> file_extent_err(leaf, slot,
> "invalid previous key objectid, have %llu expect %llu",
> prev_key->objectid, key->objectid);
> @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
> u32 cur = 0;
>
> /* Same check as in check_extent_data_item() */
> -   if (slot > 0 && prev_key->objectid != key->objectid) {
> +   if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> +   prev_key->objectid != key->objectid) {
> dir_item_err(leaf, slot,
> "invalid previous key objectid, have %llu expect %llu",
>  prev_key->objectid, key->objectid);
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees

2019-10-04 Thread Nikolay Borisov



On 4.10.19 г. 17:13 ч., Filipe Manana wrote:
> On Fri, Oct 4, 2019 at 2:54 PM Nikolay Borisov  wrote:
>>
>>
>>
>> On 4.10.19 г. 12:31 ч., Qu Wenruo wrote:
>>> [BUG]
>>> When running btrfs/063 in a loop, we got the following random write time
>>> tree checker error:
>>>
>>>   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 
>>> block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, 
>>> have 305 expect 307
>>>   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 
>>> 12146 owner 18446744073709551610
>>>   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) 
>>> lock_owner 0 current 26176
>>>   item 0 key (305 1 0) itemoff 16123 itemsize 160
>>>   inode generation 0 size 0 mode 40777
>>>   item 1 key (305 12 257) itemoff 16111 itemsize 12
>>>   item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
>>>   extent data disk bytenr 0 nr 0
>>>   extent data offset 0 nr 614400 ram 671744
>>>   item 3 key (307 108 614400) itemoff 16005 itemsize 53
>>>   extent data disk bytenr 195342336 nr 57344
>>>   extent data offset 0 nr 53248 ram 57344
>>>   item 4 key (307 108 667648) itemoff 15952 itemsize 53
>>>   extent data disk bytenr 194048000 nr 4096
>>>   extent data offset 0 nr 4096 ram 4096
>>> [...]
>>>   BTRFS error (device dm-4): block=33095680 write time tree block 
>>> corruption detected
>>>   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO 
>>> failure (Error while writing out transaction)
>>>   BTRFS info (device dm-4): forced readonly
>>>   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
>>>   BTRFS info (device dm-4): use zlib compression, level 3
>>>   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO 
>>> failure
>>>
>>> [CAUSE]
>>> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing 
>>> INODE_ITEM")
>>> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
>>> should have previous key with the same objectid as ino.
>>>
>>> But it's only true for fs trees. For log-tree, we can get above log tree
>>> block where an EXTENT_DATA item has no previous key with the same ino.
>>> As log tree only records modified items, it won't record unmodified
>>> items like INODE_ITEM.
>>>
>>> So this triggers write time tree check warning.
>>>
>>> [FIX]
>>> As a quick fix, check header owner to skip the previous key if it's not
>>> fs tree (log tree doesn't count as fs tree).
>>>
>>> This fix is only to be merged as a quick fix.
>>> There will be a more comprehensive fix to refactor the common check into
>>> one function.
>>>
>>> Reported-by: David Sterba 
>>> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing 
>>> INODE_ITEM")
>>> Signed-off-by: Qu Wenruo 
>>
>>
>> It's not entirely clear why this bug manifests. My tests show that when
>> we write extents we always update the inode's c/m time so it's always
>> dirtied hence it's logged. OTOH when punching a hole the same thing is
>> valid.
>>
>> Filipe, under what conditions should it be possible to log an
>> EXTENT_DATA item without first logging the inode it belongs to? It seems
>> using the usual write paths (e.g. buffered write and punchole) that's
>> impossible?
> 
> The tests you did are pointless, none of those operations write to a
> log tree, only fsync does that.

You were quick to judge, I tried:
xfs_io -f -c "fpunch 1m 4k" -c "fsync" /media/foo (foo was a 4m, fully
sycned file)

Similar command with the just writing in the middle of the file i.e not
changing isize.

> 
> This change is perfectly fine. Logging (fsync) always logs the inode
> item since commit [1] (2015),
> however it might do so after logging extents and other items, and in
> between that, if writeback for
> the log tree leaf happens we get that error from the tree-checker.

Fair enough, however that clarification about the sequence of events
should be in the changelog.

> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4545de5b035c7debb73d260c78377dbb69cbfb5
> 
>>
>>> ---
>>>  fs/btrfs/tree-checker.c | 6 --
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>>> index b8f82d9be9f0..5e34cd5e3e2e 100644
>>> --- a/fs/btrfs/tree-checker.c
>>> +++ b/fs/btrfs/tree-checker.c
>>> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer 
>>> *leaf,
>>>* But if objectids mismatch, it means we have a missing
>>>* INODE_ITEM.
>>>*/
>>> - if (slot > 0 && prev_key->objectid != key->objectid) {
>>> + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
>>> + prev_key->objectid != key->objectid) {
>>>   file_extent_err(leaf, slot,
>>>   "invalid previous key objectid, ha

[PATCH 1/3] btrfs: remove unnecessary hash_init()

2019-10-04 Thread Chengguang Xu
hash_init() is not necessary in btrfs_props_init(),
so remove it.

Signed-off-by: Chengguang Xu 
---
 fs/btrfs/props.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 1e664e0b59b8..68508db3dc65 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -437,8 +437,6 @@ void __init btrfs_props_init(void)
 {
int i;
 
-   hash_init(prop_handlers_ht);
-
for (i = 0; i < ARRAY_SIZE(prop_handlers); i++) {
struct prop_handler *p = &prop_handlers[i];
u64 h = btrfs_name_hash(p->xattr_name, strlen(p->xattr_name));
-- 
2.21.0





[PATCH 3/3] btrfs: using enum to replace macro

2019-10-04 Thread Chengguang Xu
using enum to replace macro definition for extent
types.

Signed-off-by: Chengguang Xu 
---
 fs/btrfs/tree-checker.c |  4 ++--
 include/uapi/linux/btrfs_tree.h | 10 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 2d91c34bbf63..9b0c5fdbe04e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -156,11 +156,11 @@ static int check_extent_data_item(struct extent_buffer 
*leaf,
 
fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 
-   if (btrfs_file_extent_type(leaf, fi) > BTRFS_FILE_EXTENT_TYPES) {
+   if (btrfs_file_extent_type(leaf, fi) >= BTRFS_FILE_EXTENT_TYPES) {
file_extent_err(leaf, slot,
"invalid type for file extent, have %u expect range [0, %u]",
btrfs_file_extent_type(leaf, fi),
-   BTRFS_FILE_EXTENT_TYPES);
+   BTRFS_FILE_EXTENT_TYPES - 1);
return -EUCLEAN;
}
 
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index b65c7ee75bc7..34bd09ffc71d 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -737,10 +737,12 @@ struct btrfs_balance_item {
__le64 unused[4];
 } __attribute__ ((__packed__));
 
-#define BTRFS_FILE_EXTENT_INLINE 0
-#define BTRFS_FILE_EXTENT_REG 1
-#define BTRFS_FILE_EXTENT_PREALLOC 2
-#define BTRFS_FILE_EXTENT_TYPES2
+enum {
+   BTRFS_FILE_EXTENT_INLINE,
+   BTRFS_FILE_EXTENT_REG,
+   BTRFS_FILE_EXTENT_PREALLOC,
+   BTRFS_FILE_EXTENT_TYPES
+};
 
 struct btrfs_file_extent_item {
/*
-- 
2.21.0





[PATCH 2/3] btrfs: code cleanup for compression type

2019-10-04 Thread Chengguang Xu
Let BTRFS_COMPRESS_TYPES represents the total number
of cmpressoin types and fix related calling places.
It will be more safe when adding new compression type
in the future.

Signed-off-by: Chengguang Xu 
---
 fs/btrfs/compression.c  |  2 ++
 fs/btrfs/compression.h  | 12 ++--
 fs/btrfs/ioctl.c|  2 +-
 fs/btrfs/tree-checker.c |  4 ++--
 4 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d70c46407420..93deaf0cc2b8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -39,6 +39,8 @@ const char* btrfs_compress_type2str(enum 
btrfs_compression_type type)
case BTRFS_COMPRESS_ZSTD:
case BTRFS_COMPRESS_NONE:
return btrfs_compress_types[type];
+   default:
+   break;
}
 
return NULL;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index dd392278ab3f..091ff3f986e5 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -101,11 +101,11 @@ blk_status_t btrfs_submit_compressed_read(struct inode 
*inode, struct bio *bio,
 unsigned int btrfs_compress_str2level(unsigned int type, const char *str);
 
 enum btrfs_compression_type {
-   BTRFS_COMPRESS_NONE  = 0,
-   BTRFS_COMPRESS_ZLIB  = 1,
-   BTRFS_COMPRESS_LZO   = 2,
-   BTRFS_COMPRESS_ZSTD  = 3,
-   BTRFS_COMPRESS_TYPES = 3,
+   BTRFS_COMPRESS_NONE,
+   BTRFS_COMPRESS_ZLIB,
+   BTRFS_COMPRESS_LZO,
+   BTRFS_COMPRESS_ZSTD,
+   BTRFS_COMPRESS_TYPES
 };
 
 struct workspace_manager {
@@ -163,7 +163,7 @@ struct btrfs_compress_op {
 };
 
 /* The heuristic workspaces are managed via the 0th workspace manager */
-#define BTRFS_NR_WORKSPACE_MANAGERS(BTRFS_COMPRESS_TYPES + 1)
+#define BTRFS_NR_WORKSPACE_MANAGERSBTRFS_COMPRESS_TYPES
 
 extern const struct btrfs_compress_op btrfs_heuristic_compress;
 extern const struct btrfs_compress_op btrfs_zlib_compress;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index de730e56d3f5..8c7196ed7ae0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1411,7 +1411,7 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
return -EINVAL;
 
if (do_compress) {
-   if (range->compress_type > BTRFS_COMPRESS_TYPES)
+   if (range->compress_type >= BTRFS_COMPRESS_TYPES)
return -EINVAL;
if (range->compress_type)
compress_type = range->compress_type;
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index f28f9725cef1..2d91c34bbf63 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -168,11 +168,11 @@ static int check_extent_data_item(struct extent_buffer 
*leaf,
 * Support for new compression/encryption must introduce incompat flag,
 * and must be caught in open_ctree().
 */
-   if (btrfs_file_extent_compression(leaf, fi) > BTRFS_COMPRESS_TYPES) {
+   if (btrfs_file_extent_compression(leaf, fi) >= BTRFS_COMPRESS_TYPES) {
file_extent_err(leaf, slot,
"invalid compression for file extent, have %u expect range [0, %u]",
btrfs_file_extent_compression(leaf, fi),
-   BTRFS_COMPRESS_TYPES);
+   BTRFS_COMPRESS_TYPES - 1);
return -EUCLEAN;
}
if (btrfs_file_extent_encryption(leaf, fi)) {
-- 
2.21.0