Re: [PATCH 3/3] fstests: generic: Check the fs after each FUA writes

2018-03-21 Thread Eryu Guan
On Wed, Mar 21, 2018 at 02:22:29PM +0200, Amir Goldstein wrote:
> > +
> > +_log_writes_mount
> > +$FSSTRESS_PROG $fsstress_args > /dev/null 2>&1
> 
> You should run fsstress with run_check() so the output goes to $seqres.full.
> That way, if you are able to catch a bug, you can take the random seed
> from the fsstress output and repeat the same event sequence, which
> doesn't guarantee, but can increase, the chances of reproducing the bug.

I suggested dropping run_check, as I don't think we care about the
fsstress return value here (and I always try to avoid new run_check
usage), but I agree that it might be useful to save the fsstress output
to $seqres.full.
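
A minimal sketch of that middle ground, keeping the fsstress output without
run_check. The stand-in values below are assumptions so the snippet runs
outside fstests; in a real test, $seqres, $fsstress_args and $FSSTRESS_PROG
come from the harness, and fake_fsstress is purely hypothetical:

```shell
# Stand-ins so the snippet runs on its own (assumptions, not fstests code).
seqres=$(mktemp -d)/481
fsstress_args="-w -n 512 -p 4"

# Hypothetical replacement for the real fsstress binary; it only prints a
# seed line in the form fsstress is assumed to use.
fake_fsstress() { echo "seed = 1521620549"; }
FSSTRESS_PROG=fake_fsstress

# Append stdout and stderr to $seqres.full instead of /dev/null.
# The exit status is deliberately ignored (no run_check).
$FSSTRESS_PROG $fsstress_args >> $seqres.full 2>&1
```

With the output kept, the seed can be read back from $seqres.full and passed
to fsstress again with -s to retry the same sequence.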

Thanks,
Eryu


Re: [PATCH 3/3] fstests: generic: Check the fs after each FUA writes

2018-03-21 Thread Amir Goldstein
On Wed, Mar 21, 2018 at 10:01 AM, Qu Wenruo  wrote:
> Basic test case which triggers fsstress with dm-log-writes, and then
> checks the fs after each FUA write.
> Comes with the needed infrastructure and special handlers for
> journal-based filesystems.
>
> Signed-off-by: Qu Wenruo 
> ---
> Unfortunately, neither xfs nor ext4 survives this test even once,
> while btrfs survives.
> (Although that is not what I wanted; I'm just trying my luck
> to reproduce a serious btrfs corruption situation.)
>
> Although btrfs may be the fastest fs for the test, since it has a fixed
> small amount of writes in mkfs and almost nothing to replay, it still
> takes about 240s~300s to finish (the same level using snapshot).
>
> It would take longer for ext4 because of its large amount of writes during
> mkfs, if it could survive the test in the first place.
>
> As a comparison, btrfs takes about 5 seconds to replay the log, mount,
> unmount and run fsck at the end of the test.
> But for ext4, it already takes about 5 seconds to do the same thing
> before triggering the fsck error.
>
> Fsck fail for ext4:
> _check_generic_filesystem: filesystem on /dev/mapper/test-scratch1 is inconsistent
> *** fsck.ext4 output ***
> fsck from util-linux 2.31.1
> e2fsck 1.43.8 (1-Jan-2018)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 131076 extent tree (at level 1) could be shorter.  Fix? no
>
> Inode 131262, i_size is 0, should be 258048.  Fix? no
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
>
> For xfs:
> _check_xfs_filesystem: filesystem on /dev/mapper/test-scratch1 is inconsistent (r)
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> bad symlink header ino 8409190, file block 0, disk block 1051147
> problem with symbolic link in inode 8409190
> would have cleared inode 8409190
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 3
> - agno = 2
> entry "lb" in shortform directory 8409188 references free inode 8409190
> would have junked entry "lb" in directory inode 8409188
> bad symlink header ino 8409190, file block 0, disk block 1051147
> problem with symbolic link in inode 8409190
> would have cleared inode 8409190
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> entry "lb" in shortform directory inode 8409188 points to free inode 8409190
> would junk entry
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> Maximum metadata LSN (1:396) is ahead of log (1:63).
> Would format log to cycle 4.
> No modify flag set, skipping filesystem flush and exiting.
>
> And special note for XFS guys, I also hit an XFS internal metadata
> warning during journal replay:
> [ 7901.423659] XFS (dm-4): Starting recovery (logdev: internal)
> [ 7901.577511] XFS (dm-4): Metadata corruption detected at xfs_dinode_verify+0x467/0x570 [xfs], inode 0x805067 dinode
> [ 7901.580529] XFS (dm-4): Unmount and run xfs_repair
> [ 7901.581901] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
> [ 7901.583205] b8963f41: 49 4e a1 ff 03 02 00 00 00 00 00 00 00 00 00 00  IN..
> [ 7901.584659] f35a50e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 7901.586075] 386eea9e: 5a b2 0e 69 0a f3 43 27 5a b2 0e 69 0a f3 43 27  Z..i..C'Z..i..C'
> [ 7901.587561] ac636661: 5a b2 0e 69 0d 92 bc 00 00 00 00 00 00 00 00 00  Z..i
> [ 7901.588969] d75f9093: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 7901.590475] d2af5688: 00 00 00 02 00 00 00 00 00 00 00 00 84 d7 6a e9  ..j.
> [ 7901.591907] e8dd8211: ff ff ff ff 34 93 a9 3a 00 00 00 00 00 00 00 04  4..:
> [ 7901.593319] b7610e4e: 00 00 00 01 00 00 00 45 00 00 00 00 00 00 00 00  ...E
>
> ---
>  common/dmlogwrites|  56 +++
>  tests/generic/481 | 104 ++
>  tests/generic/481.out |   2 +
>  tests/generic/group   |   1 +
>  4 files changed, 163 insertions(+)
>  create mode 100755 tests/generic/481
>  create mode 100644 tests/generic/481.out
>
> diff --git a/common/dmlogwrites 

[PATCH 3/3] fstests: generic: Check the fs after each FUA writes

2018-03-21 Thread Qu Wenruo
Basic test case which triggers fsstress with dm-log-writes, and then
checks the fs after each FUA write.
Comes with the needed infrastructure and special handlers for
journal-based filesystems.

Signed-off-by: Qu Wenruo 
---
Unfortunately, neither xfs nor ext4 survives this test even once,
while btrfs survives.
(Although that is not what I wanted; I'm just trying my luck
to reproduce a serious btrfs corruption situation.)

Although btrfs may be the fastest fs for the test, since it has a fixed
small amount of writes in mkfs and almost nothing to replay, it still
takes about 240s~300s to finish (the same level using snapshot).

It would take longer for ext4 because of its large amount of writes during
mkfs, if it could survive the test in the first place.

As a comparison, btrfs takes about 5 seconds to replay the log, mount,
unmount and run fsck at the end of the test.
But for ext4, it already takes about 5 seconds to do the same thing
before triggering the fsck error.
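
For readers new to the idea, the per-FUA cycle mentioned above (replay log,
mount, unmount, fsck) can be sketched as a loop over a simulated log. This is
only an illustration, not the code in this patch: find_next_fua and
replay_and_check are hypothetical names, and the "log" is just a list of
flags rather than a dm-log-writes device.

```shell
# Simulated write log: one flag per entry, entry numbers start at 0.
log_entries="W W FUA W FLUSH FUA W W FUA"

# Print the number of the first FUA entry at or after $1; fail if none.
find_next_fua() {
	i=0
	for flag in $log_entries; do
		if [ "$i" -ge "$1" ] && [ "$flag" = "FUA" ]; then
			echo "$i"
			return 0
		fi
		i=$((i + 1))
	done
	return 1
}

# Stand-in for: replay the log up to entry $1, mount (journal replay),
# unmount, then fsck. Here it only records which entry was checked.
checked=""
replay_and_check() {
	checked="${checked:+$checked }$1"
}

# Walk the log from FUA to FUA until no further FUA entry is found.
cur=0
while next=$(find_next_fua "$cur"); do
	replay_and_check "$next"
	cur=$((next + 1))
done

echo "checked entries: $checked"
```

Each FUA write is treated as a point where the on-disk state should be
consistent, so the cost of the test grows with the number of FUA writes times
the replay/mount/fsck time quoted above.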

Fsck fail for ext4:
_check_generic_filesystem: filesystem on /dev/mapper/test-scratch1 is inconsistent
*** fsck.ext4 output ***
fsck from util-linux 2.31.1
e2fsck 1.43.8 (1-Jan-2018)
Pass 1: Checking inodes, blocks, and sizes
Inode 131076 extent tree (at level 1) could be shorter.  Fix? no

Inode 131262, i_size is 0, should be 258048.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

For xfs:
_check_xfs_filesystem: filesystem on /dev/mapper/test-scratch1 is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
bad symlink header ino 8409190, file block 0, disk block 1051147
problem with symbolic link in inode 8409190
would have cleared inode 8409190
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 3
- agno = 2
entry "lb" in shortform directory 8409188 references free inode 8409190
would have junked entry "lb" in directory inode 8409188
bad symlink header ino 8409190, file block 0, disk block 1051147
problem with symbolic link in inode 8409190
would have cleared inode 8409190
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
entry "lb" in shortform directory inode 8409188 points to free inode 8409190
would junk entry
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:396) is ahead of log (1:63).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.

And special note for XFS guys, I also hit an XFS internal metadata
warning during journal replay:
[ 7901.423659] XFS (dm-4): Starting recovery (logdev: internal)
[ 7901.577511] XFS (dm-4): Metadata corruption detected at xfs_dinode_verify+0x467/0x570 [xfs], inode 0x805067 dinode
[ 7901.580529] XFS (dm-4): Unmount and run xfs_repair
[ 7901.581901] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
[ 7901.583205] b8963f41: 49 4e a1 ff 03 02 00 00 00 00 00 00 00 00 00 00  IN..
[ 7901.584659] f35a50e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 7901.586075] 386eea9e: 5a b2 0e 69 0a f3 43 27 5a b2 0e 69 0a f3 43 27  Z..i..C'Z..i..C'
[ 7901.587561] ac636661: 5a b2 0e 69 0d 92 bc 00 00 00 00 00 00 00 00 00  Z..i
[ 7901.588969] d75f9093: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 7901.590475] d2af5688: 00 00 00 02 00 00 00 00 00 00 00 00 84 d7 6a e9  ..j.
[ 7901.591907] e8dd8211: ff ff ff ff 34 93 a9 3a 00 00 00 00 00 00 00 04  4..:
[ 7901.593319] b7610e4e: 00 00 00 01 00 00 00 45 00 00 00 00 00 00 00 00  ...E

---
 common/dmlogwrites|  56 +++
 tests/generic/481 | 104 ++
 tests/generic/481.out |   2 +
 tests/generic/group   |   1 +
 4 files changed, 163 insertions(+)
 create mode 100755 tests/generic/481
 create mode 100644 tests/generic/481.out

diff --git a/common/dmlogwrites b/common/dmlogwrites
index 467b872e..bf643a77 100644
--- a/common/dmlogwrites
+++ b/common/dmlogwrites
@@ -126,3 +126,59 @@ _log_writes_cleanup()
 	$UDEV_SETTLE_PROG >/dev/null 2>&1
 	_log_writes_remove
 }
+
+# Convert log writes mark to entry number
+# Result entry number is output to stdout,