Re: Btrfs umount hang

2017-08-10 Thread Angel Shtilianov
Here is a fresh snapshot hang example:

Here kworker/u24:0 is blocked waiting on btrfs_tree_read_lock,
and again the underlying call is btrfs_search_slot after it has gone through
cache_block_group.
The other two have blocked at the same place in cache_block_group, so I
can confirm it is hanging the same way as with umount.
Here is the struct extent_buffer passed to btrfs_tree_read_lock for
kworker/u24:0:
struct extent_buffer {
...
  lock_owner = 23286,
  write_locks = {
counter = 1
  },
  read_locks = {
counter = 0
  },
  blocking_writers = {
counter = 1
  },
..
}
So again the ioctl has taken the lock needed for its underlying operation.
The btrfs task with pid 23286 is going to perform a write, so it can't skip locking,
and it needs not just read locking, but write locking.
The fact that it owns the write lock is fine. skip_locking in its
btrfs_path struct is zero, so up to here everything is properly locked. The
problem arises later, if the cache isn't available. Then we are about to try
cache_block_group. Even if it does not hang on the
wait at the beginning of the cache_block_group function (as in the current case):
   while (cache->cached == BTRFS_CACHE_FAST) {
           struct btrfs_caching_control *ctl;

           ctl = cache->caching_ctl;
           atomic_inc(&ctl->count);
           prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
           spin_unlock(&cache->lock);

>          schedule();

           finish_wait(&ctl->wait, &wait);
           put_caching_control(ctl);
           spin_lock(&cache->lock);
   }
it seems that there is a high chance of hanging later on
btrfs_tree_read_lock, even if there is absolutely no concurrent work
going on, via this call chain:
cache_block_group
  ->load_free_space_cache
    ->lookup_free_space_inode
      ->__lookup_free_space_inode
        ->btrfs_search_slot
          ->btrfs_read_lock_root_node
            ->btrfs_tree_read_lock
since we are already holding the write lock.
It looks like if we call cache_block_group from find_free_extent, itself
reached from within a btrfs_search_slot call, we are going to get deadlocked
too in case the data isn't cached.
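To spell the suspected dependency out, here is a rough sketch of the two call
paths involved (not verbatim kernel code, just the chains above annotated with
the lock state from the extent_buffer dump):

  kworker/u24:0 (loading the free space cache):
    cache_block_group()                  // owns cached == BTRFS_CACHE_FAST
      load_free_space_cache()
        ...
          btrfs_search_slot()
            btrfs_tree_read_lock()       // blocked: blocking_writers = 1,
                                         //          lock_owner = 23286

  pid 23286 (the writer):
    btrfs_search_slot()                  // holds the blocking write lock
      find_free_extent()
        cache_block_group()              // if the block group isn't cached,
                                         // sleeps waiting for a wakeup that
                                         // only the kworker above can deliver

If both conditions hold at once, neither task can make progress.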

It looks scary. I hope that I've missed something.

Best regards,
Angel

On Thu, Aug 10, 2017 at 1:30 AM, Angel Shtilianov  wrote:
> Hi Jeff,
> That's exactly what I am observing in the backtraces.
> umount reaches cache_block_group passing through btrfs_search_slot,
> which looks like it took the write lock (to be confirmed tomorrow when I'm in
> the office) and all other btrfs worker threads are fighting for the
> btrfs_tree_read_lock.
> Umount is hanging on the cache_block_group wait, all others are
> hanging on the read_lock. So caching seems to be the bottleneck,
> blocking all others. Good catch! I'll take a look at the commit
> tomorrow morning (with large cup of coffee of course).
>
> By the way, looking through the logs I can see one more pattern -
> umount after drop snapshot reaches a similar state. The common element is that
> drop snapshot will also call search slot, take the read lock, drop it
> and take the write lock. At the same time there is an umount causing an
> fs sync to run concurrently. I've seen at least three such hangs today
> and I've started the corresponding crashdumps to get copied to my
> machine. Tomorrow I'll take a look at them. The common point is
> that umount again reaches the same place - cache_block_group - and
> hangs. Tomorrow I'll take a look at the crash dump for the workers
> state.
>
> Thank you,
> Angel
>
> On Wed, Aug 9, 2017 at 6:42 PM, Jeff Mahoney  wrote:
>> On 8/8/17 7:30 AM, Angel Shtilianov wrote:
>>> crash> bt -f 31625
>>> PID: 31625  TASK: 88046a833400  CPU: 7   COMMAND: "btrfs-transacti"
>>> wants to acquire struct extent_buffer 88000460aca0 lock, whose
>>> lock_owner is 27574.
>>>
>>> here is pid 27574:
>>> PID: 27574  TASK: 88038b469a00  CPU: 4   COMMAND: "kworker/u32:9"
>>> which is also trying to acquire eb lock 8802598b6200, and here
>>> the owner is 31696.
>>>
>>> 31696 is
>>> PID: 31696  TASK: 88044b59ce00  CPU: 5   COMMAND: "umount"
>>>
>>> So this is definitely a kind of deadlock:
>>> umount holds the lock the workers need in order to complete, and waits
>>> for them to complete.
>>> Lockdep wouldn't complain about that.
>>> I am still about to investigate what has previously triggered/disabled 
>>> lockdep.
>>> I have to obtain the log from the machine, but I need some time to get it.
>>>
>>> Jeff, you were right.
>>> Could you help demystify how we ended up here?
>>
>> Hi Angel -
>>
>> It looks like a regression introduced by 291c7d2f5, but that's a very
>> old commit.  As that commit says, it's a rare occurrence to hit that
>> wait, and that's probably why we haven't seen this issue sooner.
>>
>> There's potential for this to happen whenever two threads are modifying
>> the tree at once and one needs to find a free extent.  I'll need to
>> think a bit on how to fix it.
>>
>> -Jeff
>>
>>> Best regards,
>>> Angel
>>>
>>> On Mon, Aug 7, 2017 at 9:10 PM, Jeff Mahoney 

Re: Btrfs umount hang

2017-08-09 Thread Angel Shtilianov
Hi Jeff,
That's exactly what I am observing in the backtraces.
umount reaches cache_block_group passing through btrfs_search_slot,
which looks like it took the write lock (to be confirmed tomorrow when I'm in
the office) and all other btrfs worker threads are fighting for the
btrfs_tree_read_lock.
Umount is hanging on the cache_block_group wait, all others are
hanging on the read_lock. So caching seems to be the bottleneck,
blocking all others. Good catch! I'll take a look at the commit
tomorrow morning (with large cup of coffee of course).

By the way, looking through the logs I can see one more pattern -
umount after drop snapshot reaches a similar state. The common element is that
drop snapshot will also call search slot, take the read lock, drop it
and take the write lock. At the same time there is an umount causing an
fs sync to run concurrently. I've seen at least three such hangs today
and I've started the corresponding crashdumps to get copied to my
machine. Tomorrow I'll take a look at them. The common point is
that umount again reaches the same place - cache_block_group - and
hangs. Tomorrow I'll take a look at the crash dump for the workers
state.

Thank you,
Angel

On Wed, Aug 9, 2017 at 6:42 PM, Jeff Mahoney  wrote:
> On 8/8/17 7:30 AM, Angel Shtilianov wrote:
>> crash> bt -f 31625
>> PID: 31625  TASK: 88046a833400  CPU: 7   COMMAND: "btrfs-transacti"
>> wants to acquire struct extent_buffer 88000460aca0 lock, whose
>> lock_owner is 27574.
>>
>> here is pid 27574:
>> PID: 27574  TASK: 88038b469a00  CPU: 4   COMMAND: "kworker/u32:9"
>> which is also is trying to acquire eb lock 8802598b6200, and here
>> the owner is 31696.
>>
>> 31696 is
>> PID: 31696  TASK: 88044b59ce00  CPU: 5   COMMAND: "umount"
>>
>> So this is definitely a kind of deadlock:
>> umount holds the lock the workers need in order to complete, and waits
>> for them to complete.
>> Lockdep wouldn't complain about that.
>> I am still about to investigate what has previously triggered/disabled 
>> lockdep.
>> I have to obtain the log from the machine, but I need some time to get it.
>>
>> Jeff, you were right.
>> Could you help demystify how we ended up here?
>
> Hi Angel -
>
> It looks like a regression introduced by 291c7d2f5, but that's a very
> old commit.  As that commit says, it's a rare occurrence to hit that
> wait, and that's probably why we haven't seen this issue sooner.
>
> There's potential for this to happen whenever two threads are modifying
> the tree at once and one needs to find a free extent.  I'll need to
> think a bit on how to fix it.
>
> -Jeff
>
>> Best regards,
>> Angel
>>
>> On Mon, Aug 7, 2017 at 9:10 PM, Jeff Mahoney  wrote:
>>> On 8/7/17 1:19 PM, Jeff Mahoney wrote:
 On 8/7/17 10:12 AM, Angel Shtilianov wrote:
> Hi there,
> I'm investigating sporadic hanging during btrfs umount. The FS is
> contained in a loop mounted file.
> I have no reproduction scenario and the issue may happen once a day or
> once a month. It is rare, but frustrating.
> I have a crashdump (the server has been manually crashed and collected
> a crashdump), so I could take a look through the data structures.
> What happens is that umount is getting into D state and the kernel
> complains about hung tasks. We are using kernel 4.4.y. The actual back
> trace is from 4.4.70, but this happens with all the 4.4 kernels I've
> used (4.4.30 through 4.4.70).
> Tasks like:
> INFO: task kworker/u32:9:27574 blocked for more than 120 seconds.
> INFO: task kworker/u32:12:27575 blocked for more than 120 seconds.
> INFO: task btrfs-transacti:31625 blocked for more than 120 seconds.
> are getting blocked waiting for btrfs_tree_read_lock, which is owned
> by task umount:31696 (which is also blocked for more than 120 seconds)
> according to the lock debug info.
>
> umount is hung in "cache_block_group", see the '>' mark:
>    while (cache->cached == BTRFS_CACHE_FAST) {
>            struct btrfs_caching_control *ctl;
>
>            ctl = cache->caching_ctl;
>            atomic_inc(&ctl->count);
>            prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
>            spin_unlock(&cache->lock);
>
> >          schedule();
>
>            finish_wait(&ctl->wait, &wait);
>            put_caching_control(ctl);
>            spin_lock(&cache->lock);
>    }
>
> The complete backtraces could be found in the attached log.
>
> Do you have any ideas ?

 Hi Angel -

 In your log, it says lockdep is disabled.  What tripped it earlier?
 Lockdep really should be catching locking deadlocks in situations like
 this, if that's really the underlying cause.
>>>
>>> Actually, I'm not sure if lockdep would catch this one.  Here's my
>>> hypothesis:
>>>
>>> kworker/u32:9 is waiting on a read lock while reading the free space
>>> cache, which means it owns the 

Re: Btrfs umount hang

2017-08-09 Thread Jeff Mahoney
On 8/8/17 7:30 AM, Angel Shtilianov wrote:
> crash> bt -f 31625
> PID: 31625  TASK: 88046a833400  CPU: 7   COMMAND: "btrfs-transacti"
> wants to acquire struct extent_buffer 88000460aca0 lock, whose
> lock_owner is 27574.
> 
> here is pid 27574:
> PID: 27574  TASK: 88038b469a00  CPU: 4   COMMAND: "kworker/u32:9"
> which is also trying to acquire eb lock 8802598b6200, and here
> the owner is 31696.
> 
> 31696 is
> PID: 31696  TASK: 88044b59ce00  CPU: 5   COMMAND: "umount"
> 
> So this is definitely a kind of deadlock:
> umount holds the lock the workers need in order to complete, and waits
> for them to complete.
> Lockdep wouldn't complain about that.
> I am still about to investigate what has previously triggered/disabled 
> lockdep.
> I have to obtain the log from the machine, but I need some time to get it.
> 
> Jeff, you were right.
> Could you help demystify how we ended up here?

Hi Angel -

It looks like a regression introduced by 291c7d2f5, but that's a very
old commit.  As that commit says, it's a rare occurrence to hit that
wait, and that's probably why we haven't seen this issue sooner.

There's potential for this to happen whenever two threads are modifying
the tree at once and one needs to find a free extent.  I'll need to
think a bit on how to fix it.

-Jeff

> Best regards,
> Angel
> 
> On Mon, Aug 7, 2017 at 9:10 PM, Jeff Mahoney  wrote:
>> On 8/7/17 1:19 PM, Jeff Mahoney wrote:
>>> On 8/7/17 10:12 AM, Angel Shtilianov wrote:
 Hi there,
 I'm investigating sporadic hanging during btrfs umount. The FS is
 contained in a loop mounted file.
 I have no reproduction scenario and the issue may happen once a day or
 once a month. It is rare, but frustrating.
 I have a crashdump (the server has been manually crashed and collected
 a crashdump), so I could take a look through the data structures.
 What happens is that umount is getting into D state and the kernel
 complains about hung tasks. We are using kernel 4.4.y. The actual back
 trace is from 4.4.70, but this happens with all the 4.4 kernels I've
 used (4.4.30 through 4.4.70).
 Tasks like:
 INFO: task kworker/u32:9:27574 blocked for more than 120 seconds.
 INFO: task kworker/u32:12:27575 blocked for more than 120 seconds.
 INFO: task btrfs-transacti:31625 blocked for more than 120 seconds.
 are getting blocked waiting for btrfs_tree_read_lock, which is owned
 by task umount:31696 (which is also blocked for more than 120 seconds)
 according to the lock debug info.

 umount is hung in "cache_block_group", see the '>' mark:
    while (cache->cached == BTRFS_CACHE_FAST) {
            struct btrfs_caching_control *ctl;

            ctl = cache->caching_ctl;
            atomic_inc(&ctl->count);
            prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
            spin_unlock(&cache->lock);

>           schedule();

            finish_wait(&ctl->wait, &wait);
            put_caching_control(ctl);
            spin_lock(&cache->lock);
    }

 The complete backtraces could be found in the attached log.

 Do you have any ideas ?
>>>
>>> Hi Angel -
>>>
>>> In your log, it says lockdep is disabled.  What tripped it earlier?
>>> Lockdep really should be catching locking deadlocks in situations like
>>> this, if that's really the underlying cause.
>>
>> Actually, I'm not sure if lockdep would catch this one.  Here's my
>> hypothesis:
>>
>> kworker/u32:9 is waiting on a read lock while reading the free space
>> cache, which means it owns the cache->cached value and will issue the
>> wakeup when it completes.
>>
>> umount is waiting for the wakeup from kworker/u32:9 but is holding
>> some tree locks in write mode.
>>
>> If kworker/u32:9 is waiting on the locks that umount holds, we have a
>> deadlock.
>>
>> Can you dump the extent buffer that kworker/u32:9 is waiting on?  Part
>> of that will contain the PID of the holder, and if it matches umount, we
>> found the cause.
>>
>> -Jeff
>>
>> --
>> Jeff Mahoney
>> SUSE Labs
>>


-- 
Jeff Mahoney
SUSE Labs





Re: Btrfs umount hang

2017-08-08 Thread Angel Shtilianov
crash> bt -f 31625
PID: 31625  TASK: 88046a833400  CPU: 7   COMMAND: "btrfs-transacti"
wants to acquire struct extent_buffer 88000460aca0 lock, whose
lock_owner is 27574.

here is pid 27574:
PID: 27574  TASK: 88038b469a00  CPU: 4   COMMAND: "kworker/u32:9"
which is also trying to acquire eb lock 8802598b6200, and here
the owner is 31696.

31696 is
PID: 31696  TASK: 88044b59ce00  CPU: 5   COMMAND: "umount"

So this is definitely a kind of deadlock:
umount holds the lock the workers need in order to complete, and waits
for them to complete.
Lockdep wouldn't complain about that.
I am still about to investigate what has previously triggered/disabled lockdep.
I have to obtain the log from the machine, but I need some time to get it.

Jeff, you were right.
Could you help demystify how we ended up here?

Best regards,
Angel

On Mon, Aug 7, 2017 at 9:10 PM, Jeff Mahoney  wrote:
> On 8/7/17 1:19 PM, Jeff Mahoney wrote:
>> On 8/7/17 10:12 AM, Angel Shtilianov wrote:
>>> Hi there,
>>> I'm investigating sporadic hanging during btrfs umount. The FS is
>>> contained in a loop mounted file.
>>> I have no reproduction scenario and the issue may happen once a day or
>>> once a month. It is rare, but frustrating.
>>> I have a crashdump (the server has been manually crashed and collected
>>> a crashdump), so I could take a look through the data structures.
>>> What happens is that umount is getting into D state and the kernel
>>> complains about hung tasks. We are using kernel 4.4.y. The actual back
>>> trace is from 4.4.70, but this happens with all the 4.4 kernels I've
>>> used (4.4.30 through 4.4.70).
>>> Tasks like:
>>> INFO: task kworker/u32:9:27574 blocked for more than 120 seconds.
>>> INFO: task kworker/u32:12:27575 blocked for more than 120 seconds.
>>> INFO: task btrfs-transacti:31625 blocked for more than 120 seconds.
>>> are getting blocked waiting for btrfs_tree_read_lock, which is owned
>>> by task umount:31696 (which is also blocked for more than 120 seconds)
>>> according to the lock debug info.
>>>
>>> umount is hung in "cache_block_group", see the '>' mark:
>>>    while (cache->cached == BTRFS_CACHE_FAST) {
>>>            struct btrfs_caching_control *ctl;
>>>
>>>            ctl = cache->caching_ctl;
>>>            atomic_inc(&ctl->count);
>>>            prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
>>>            spin_unlock(&cache->lock);
>>>
>>> >          schedule();
>>>
>>>            finish_wait(&ctl->wait, &wait);
>>>            put_caching_control(ctl);
>>>            spin_lock(&cache->lock);
>>>    }
>>>
>>> The complete backtraces could be found in the attached log.
>>>
>>> Do you have any ideas ?
>>
>> Hi Angel -
>>
>> In your log, it says lockdep is disabled.  What tripped it earlier?
>> Lockdep really should be catching locking deadlocks in situations like
>> this, if that's really the underlying cause.
>
> Actually, I'm not sure if lockdep would catch this one.  Here's my
> hypothesis:
>
> kworker/u32:9 is waiting on a read lock while reading the free space
> cache, which means it owns the cache->cached value and will issue the
> wakeup when it completes.
>
> umount is waiting for the wakeup from kworker/u32:9 but is holding
> some tree locks in write mode.
>
> If kworker/u32:9 is waiting on the locks that umount holds, we have a
> deadlock.
>
> Can you dump the extent buffer that kworker/u32:9 is waiting on?  Part
> of that will contain the PID of the holder, and if it matches umount, we
> found the cause.
>
> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs
>


Re: Btrfs umount hang

2017-08-07 Thread Jeff Mahoney
On 8/7/17 1:19 PM, Jeff Mahoney wrote:
> On 8/7/17 10:12 AM, Angel Shtilianov wrote:
>> Hi there,
>> I'm investigating sporadic hanging during btrfs umount. The FS is
>> contained in a loop mounted file.
>> I have no reproduction scenario and the issue may happen once a day or
>> once a month. It is rare, but frustrating.
>> I have a crashdump (the server has been manually crashed and collected
>> a crashdump), so I could take a look through the data structures.
>> What happens is that umount is getting into D state and the kernel
>> complains about hung tasks. We are using kernel 4.4.y. The actual back
>> trace is from 4.4.70, but this happens with all the 4.4 kernels I've
>> used (4.4.30 through 4.4.70).
>> Tasks like:
>> INFO: task kworker/u32:9:27574 blocked for more than 120 seconds.
>> INFO: task kworker/u32:12:27575 blocked for more than 120 seconds.
>> INFO: task btrfs-transacti:31625 blocked for more than 120 seconds.
>> are getting blocked waiting for btrfs_tree_read_lock, which is owned
>> by task umount:31696 (which is also blocked for more than 120 seconds)
>> according to the lock debug info.
>>
>> umount is hung in "cache_block_group", see the '>' mark:
>>    while (cache->cached == BTRFS_CACHE_FAST) {
>>            struct btrfs_caching_control *ctl;
>>
>>            ctl = cache->caching_ctl;
>>            atomic_inc(&ctl->count);
>>            prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
>>            spin_unlock(&cache->lock);
>>
>> >          schedule();
>>
>>            finish_wait(&ctl->wait, &wait);
>>            put_caching_control(ctl);
>>            spin_lock(&cache->lock);
>>    }
>>
>> The complete backtraces could be found in the attached log.
>>
>> Do you have any ideas ?
> 
> Hi Angel -
> 
> In your log, it says lockdep is disabled.  What tripped it earlier?
> Lockdep really should be catching locking deadlocks in situations like
> this, if that's really the underlying cause.

Actually, I'm not sure if lockdep would catch this one.  Here's my
hypothesis:

kworker/u32:9 is waiting on a read lock while reading the free space
cache, which means it owns the cache->cached value and will issue the
wakeup when it completes.

umount is waiting for the wakeup from kworker/u32:9 but is holding
some tree locks in write mode.

If kworker/u32:9 is waiting on the locks that umount holds, we have a
deadlock.

Can you dump the extent buffer that kworker/u32:9 is waiting on?  Part
of that will contain the PID of the holder, and if it matches umount, we
found the cause.
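For example, from the crash prompt, something along these lines should work
(the extent_buffer address comes from the btrfs_tree_read_lock frame in
kworker/u32:9's backtrace; <eb-address> below is just a placeholder):

crash> bt -f 27574
crash> struct extent_buffer <eb-address>

The lock_owner member in that dump is the PID of the current holder.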

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: Btrfs umount hang

2017-08-07 Thread Jeff Mahoney
On 8/7/17 10:12 AM, Angel Shtilianov wrote:
> Hi there,
> I'm investigating sporadic hanging during btrfs umount. The FS is
> contained in a loop mounted file.
> I have no reproduction scenario and the issue may happen once a day or
> once a month. It is rare, but frustrating.
> I have a crashdump (the server has been manually crashed and collected
> a crashdump), so I could take a look through the data structures.
> What happens is that umount is getting into D state and the kernel
> complains about hung tasks. We are using kernel 4.4.y. The actual back
> trace is from 4.4.70, but this happens with all the 4.4 kernels I've
> used (4.4.30 through 4.4.70).
> Tasks like:
> INFO: task kworker/u32:9:27574 blocked for more than 120 seconds.
> INFO: task kworker/u32:12:27575 blocked for more than 120 seconds.
> INFO: task btrfs-transacti:31625 blocked for more than 120 seconds.
> are getting blocked waiting for btrfs_tree_read_lock, which is owned
> by task umount:31696 (which is also blocked for more than 120 seconds)
> according to the lock debug info.
> 
> umount is hung in "cache_block_group", see the '>' mark:
>    while (cache->cached == BTRFS_CACHE_FAST) {
>            struct btrfs_caching_control *ctl;
>
>            ctl = cache->caching_ctl;
>            atomic_inc(&ctl->count);
>            prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
>            spin_unlock(&cache->lock);
>
> >          schedule();
>
>            finish_wait(&ctl->wait, &wait);
>            put_caching_control(ctl);
>            spin_lock(&cache->lock);
>    }
> 
> The complete backtraces could be found in the attached log.
> 
> Do you have any ideas ?

Hi Angel -

In your log, it says lockdep is disabled.  What tripped it earlier?
Lockdep really should be catching locking deadlocks in situations like
this, if that's really the underlying cause.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: Btrfs umount hang

2017-08-07 Thread Nikolay Borisov


On  7.08.2017 17:12, Angel Shtilianov wrote:
> Hi there,
> I'm investigating sporadic hanging during btrfs umount. The FS is
> contained in a loop mounted file.
> I have no reproduction scenario and the issue may happen once a day or
> once a month. It is rare, but frustrating.
> I have a crashdump (the server has been manually crashed and collected
> a crashdump), so I could take a look through the data structures.
> What happens is that umount is getting into D state and the kernel
> complains about hung tasks. We are using kernel 4.4.y. The actual back
> trace is from 4.4.70, but this happens with all the 4.4 kernels I've
> used (4.4.30 through 4.4.70).
> Tasks like:
> INFO: task kworker/u32:9:27574 blocked for more than 120 seconds.
> INFO: task kworker/u32:12:27575 blocked for more than 120 seconds.
> INFO: task btrfs-transacti:31625 blocked for more than 120 seconds.
> are getting blocked waiting for btrfs_tree_read_lock, which is owned
> by task umount:31696 (which is also blocked for more than 120 seconds)
> according to the lock debug info.
> 
> umount is hung in "cache_block_group", see the '>' mark:
>    while (cache->cached == BTRFS_CACHE_FAST) {
>            struct btrfs_caching_control *ctl;
>
>            ctl = cache->caching_ctl;
>            atomic_inc(&ctl->count);
>            prepare_to_wait(&ctl->wait, &wait, TASK_UNINTERRUPTIBLE);
>            spin_unlock(&cache->lock);
>
> >          schedule();
>
>            finish_wait(&ctl->wait, &wait);
>            put_caching_control(ctl);
>            spin_lock(&cache->lock);
>    }
> 
> The complete backtraces could be found in the attached log.
> 
> Do you have any ideas ?
> Any help will be greatly appreciated.

So by the looks of it, while writing dirty block groups and requiring a free
block for the CoW process, cache_block_group() kicks off a caching thread
which should just go and read in the respective block group. So the
newly spawned caching_thread should actually wake up cache_block_group,
either on success, if it manages to find 2 MiB of free space:
if (total_found > (1024 * 1024 * 2)) {

or, in case of failure, after the out label. But in both cases it will set
cache->cached to something different from BTRFS_CACHE_FAST, so the waiter
ought to exit the loop.
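Roughly, as I read it (paraphrased, not verbatim 4.4 code), the caching thread
has two wakeup points:

  caching_thread()
    while scanning the extent tree:
      if (total_found > (1024 * 1024 * 2))
              wake_up(&caching_ctl->wait);     // progress wakeup
    ...
  out:
    block_group->cached = BTRFS_CACHE_FINISHED  // or BTRFS_CACHE_ERROR on failure
    wake_up(&caching_ctl->wait);                // final wakeup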


But from your description of the issue I take it the process never comes
back from the schedule, meaning it missed the wakeup from caching_thread,
and at the moment I cannot see how this could happen. Can you print the
state of the 'cache' parameter of cache_block_group ?
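For example, something like this from the crash prompt (the address is a
placeholder; it would come from the cache_block_group frame in the umount
backtrace):

crash> bt -f 31696
crash> struct btrfs_block_group_cache <cache-address>

and in particular the cached and caching_ctl members.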
> 
> Best regards,
> Angel Shtilianov
> 