At 02/07/2017 04:02 PM, Anand Jain wrote:
Hi Qu,
I don't think I have seen this before. I don't know why I wrote it
this way, maybe to test encryption, but in any case it was all
with the default options.
Forgot to mention: thanks for the test case, or we would never have
found this.
Thanks,
Qu
But now I could reproduce it, and it looks like balance fails to
start with an I/O error even though the mount is successful.
------------------
# tail -f ./results/btrfs/125.full
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the balanced data.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1ERROR: error during balancing '/scratch':
Input/output error
There may be more info in syslog - try dmesg | tail
Starting balance without any filters.
failed: '/root/bin/btrfs balance start /scratch'
--------------------
This must be fixed. For debugging: if I add a sync before the previous
unmount, the problem isn't reproduced. Just FYI. Strange.
-------
diff --git a/tests/btrfs/125 b/tests/btrfs/125
index 91aa8d8c3f4d..4d4316ca9f6e 100755
--- a/tests/btrfs/125
+++ b/tests/btrfs/125
@@ -133,6 +133,7 @@ echo "-----Mount normal-----" >> $seqres.full
 echo
 echo "Mount normal and balance"
+_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
 _scratch_unmount
 _run_btrfs_util_prog device scan
 _scratch_mount >> $seqres.full 2>&1
------
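In case it is useful, a rough manual check of the same workaround outside
the test harness, assuming the scratch fs is mounted at /scratch and the
device path below is only a placeholder:
-------
# force delalloc/ordered extents to disk before the unmount/remount cycle
btrfs filesystem sync /scratch
umount /scratch
btrfs device scan
mount /dev/sdX /scratch          # /dev/sdX is a placeholder device
btrfs balance start --full-balance /scratch
-------
With the extra sync the balance goes through here, which matches what the
one-line change to the test does.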
HTH.
Thanks, Anand
On 02/07/17 14:09, Qu Wenruo wrote:
Hi Anand,
I found that the btrfs/125 test case can only pass if space cache is enabled.
If using the nospace_cache or space_cache=v2 mount option, it gets
blocked forever with the following call trace (the only blocked process):
[11382.046978] btrfs D11128 6705 6057 0x00000000
[11382.047356] Call Trace:
[11382.047668] __schedule+0x2d4/0xae0
[11382.047956] schedule+0x3d/0x90
[11382.048283] btrfs_start_ordered_extent+0x160/0x200 [btrfs]
[11382.048630] ? wake_atomic_t_function+0x60/0x60
[11382.048958] btrfs_wait_ordered_range+0x113/0x210 [btrfs]
[11382.049360] btrfs_relocate_block_group+0x260/0x2b0 [btrfs]
[11382.049703] btrfs_relocate_chunk+0x51/0xf0 [btrfs]
[11382.050073] btrfs_balance+0xaa9/0x1610 [btrfs]
[11382.050404] ? btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs]
[11382.050739] btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs]
[11382.051109] btrfs_ioctl+0xbe7/0x27f0 [btrfs]
[11382.051430] ? trace_hardirqs_on+0xd/0x10
[11382.051747] ? free_object+0x74/0xa0
[11382.052084] ? debug_object_free+0xf2/0x130
[11382.052413] do_vfs_ioctl+0x94/0x710
[11382.052750] ? enqueue_hrtimer+0x160/0x160
[11382.053090] ? do_nanosleep+0x71/0x130
[11382.053431] SyS_ioctl+0x79/0x90
[11382.053735] entry_SYSCALL_64_fastpath+0x18/0xad
[11382.054570] RIP: 0033:0x7f397d7a6787
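For reference, this is roughly how I am reproducing it and grabbing the
stack of the blocked task (MOUNT_OPTIONS is the usual xfstests variable;
adjust paths to your setup):
-------
# run the test with the space cache disabled, or with the free space tree
cd xfstests
MOUNT_OPTIONS="-o nospace_cache" ./check btrfs/125
# or: MOUNT_OPTIONS="-o space_cache=v2" ./check btrfs/125

# once balance hangs, dump all blocked (D state) tasks
echo w > /proc/sysrq-trigger
dmesg | tail -n 80

# or look at the stuck balance ioctl directly
cat /proc/$(pgrep -x btrfs)/stack
-------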
I also found that in the test case we only have 3 contiguous data extents,
whose sizes are 1M, 68.5M and 31.5M respectively.
Original data block group:

0      1M                        64M 69.5M                101M     128M
|Ext A |       Extent B (68.5M)       |   Extent C (31.5M)   |
While relocation writes them out as 4 extents:
0~1M           : same as Extent A. (1st)
1M~68.3438M    : smaller than Extent B (2nd)
68.3438M~69.5M : tail part of Extent B (3rd)
69.5M~101M     : same as Extent C. (4th)
However, only the ordered extents of (3rd) and (4th) get finished,
while the ordered extents of (1st) and (2nd) never reach
finish_ordered_io().
So relocation waits forever for these two ordered extents to finish,
and gets blocked.
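If it helps to cross check the layout, the data extents and the relocated
result can be inspected with something like the following (assuming a
reasonably recent btrfs-progs; the file path is only a placeholder for
whatever the test wrote):
-------
# data extents as recorded in the extent tree
btrfs inspect-internal dump-tree -t extent $SCRATCH_DEV | grep -A2 EXTENT_ITEM

# on-disk layout of the test file after relocation
filefrag -v /scratch/<testfile>
-------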
Did you experience the same bug when submitting the test case?
Is there any known fix for it?
Thanks,
Qu