Hello, 

I've been investigating a severe performance regression in generic/027 on 
latest master, and I traced it back to the rm portion of the test: 
rm now takes 4 times longer to delete the same number of files. After 
bisecting, I identified the culprit as 957780eb2788 ("Btrfs: introduce 
ticketed enospc infrastructure"). With this commit applied, rm 
takes 65 seconds to delete around 10k files, whereas before 
it took 15 seconds. I suspect this is only triggered in the 
case where there isn't much available metadata space, since no one 
has complained so far. 

Further analysis showed that the extra time comes from the fact that there 
are a lot more transaction commits happening with the ticketed enospc code. 
Here are some stats obtained via systemtap instrumentation of flush_space: 
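For reference, the counts were collected with something along these lines. This is only a sketch of the systemtap script; the exact probe points and the use of flush_space's state argument are assumptions that may need adjusting for a particular kernel build:

```
global commits, states

probe kernel.function("btrfs_commit_transaction") {
        commits++
}

probe kernel.function("flush_space") {
        states[$state]++
}

probe end {
        printf("btrfs_trans_commit: %d\n", commits)
        foreach (s in states)
                printf("state %d => %d\n", s, states[s])
}
```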

Good: 
btrfs_trans_commit: 1088
FLUSH_DELAYED_ITEMS_NR => 12281                                                 
FLUSH_DELAYED_ITEMS => 6151                                                     
FLUSH_DELALLOC => 6124                                                          
FLUSH_DELALLOC_WAIT => 6124                                                     
ALLOC_CHUNK => 6124                                                             
COMMIT_TRANS => 1087                                                            

Bad: 
btrfs_trans_commit: 4711                              
FLUSH_DELAYED_ITEMS_NR => 9606                                                  
FLUSH_DELAYED_ITEMS => 4709                                                     
FLUSH_DELALLOC => 4709                                                          
FLUSH_DELALLOC_WAIT => 4709                                                     
ALLOC_CHUNK => 4709                                                             
COMMIT_TRANS => 4709

So we do 4 times more transaction commits. I've also traced the 
two flushing branches, in reserve_metadata_bytes (good case) and 
__reserve_metadata_bytes (bad case) respectively, and here are the numbers: 

Good: 
Failures: 9383 

Bad:
Failures: 1579

Those numbers count how many times we took the 

if (ret && flush != BTRFS_RESERVE_NO_FLUSH) {

branch in the good case, and how many times we executed 
the queue_work call in the bad case, i.e. the 
number of times we have to flush_space. Here are the stack 
traces of btrfs_commit_transaction: 

Good:
 0xffffffff81387ca0 : btrfs_commit_transaction+0x0/0xb40 [kernel]
 0xffffffff813726eb : flush_space+0x9b/0x4e0 [kernel]
 0xffffffff81372fe4 : reserve_metadata_bytes+0x2f4/0x710 [kernel]
 0xffffffff81373425 : btrfs_block_rsv_add+0x25/0x60 [kernel]
 0xffffffff81388bd3 : start_transaction+0x3f3/0x570 [kernel]
 0xffffffff813891fa : btrfs_start_transaction_fallback_global_rsv+0x2a/0x200 [kernel]
 0xffffffff81392785 : btrfs_unlink+0x35/0xb0 [kernel]
 0xffffffff811c0e0a : vfs_unlink+0xba/0x1a0 [kernel]
 0xffffffff811c5cdb : do_unlinkat+0x24b/0x2a0 [kernel]
 0xffffffff811c646b : SyS_unlinkat+0x1b/0x30 [kernel]
 0xffffffff816d0b40 : entry_SYSCALL_64_fastpath+0x23/0xc1 [kernel]

Naturally, since in the good case flushing, and hence the commit, happens 
synchronously from the task doing the reservation.

Bad: 

 0xffffffff81388b30 : btrfs_commit_transaction+0x0/0xb40 [kernel]
 0xffffffff81372d73 : flush_space+0x93/0x510 [kernel]
 0xffffffff813732f3 : btrfs_async_reclaim_metadata_space+0x103/0x460 [kernel]
 0xffffffff8106d86f : process_one_work+0x1cf/0x660 [kernel]
 0xffffffff8106dd4e : worker_thread+0x4e/0x490 [kernel]
 0xffffffff810745ae : kthread+0xfe/0x120 [kernel]
 0xffffffff816d1c0f : ret_from_fork+0x1f/0x40 [kernel]
 0xffffffff810744b0 : kthread+0x0/0x120 [kernel]

Also no surprises here, since flushing is always async now. Further breaking 
down which call site actually queued btrfs_async_reclaim_metadata_space 
reveals the following: 

traces[                                                                         
 0xffffffff81373774 : __reserve_metadata_bytes+0x124/0xa80 [kernel]             
 0xffffffff813740f4 : reserve_metadata_bytes+0x24/0x1c0 [kernel]                
 0xffffffff813742b5 : btrfs_block_rsv_add+0x25/0x60 [kernel]                    
 0xffffffff81389a63 : start_transaction+0x3f3/0x570 [kernel]                    
 0xffffffff8138a08a : btrfs_start_transaction_fallback_global_rsv+0x2a/0x200 [kernel]
 0xffffffff81393615 : btrfs_unlink+0x35/0xb0 [kernel]                           
 0xffffffff811c0e0a : vfs_unlink+0xba/0x1a0 [kernel]                            
 0xffffffff811c5cdb : do_unlinkat+0x24b/0x2a0 [kernel]                          
 0xffffffff811c646b : SyS_unlinkat+0x1b/0x30 [kernel]                           
 0xffffffff816d19c0 : entry_SYSCALL_64_fastpath+0x23/0xc1 [kernel]              
] = 3905   

__reserve_metadata_bytes+0x124/0xa80 is the queue_work call from the 
if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) branch, 
i.e. when we managed to do our reservation but would still like to 
do some preemptive space flushing.

traces[                                                                         
 0xffffffff81373eed : __reserve_metadata_bytes+0x89d/0xa80 [kernel]             
 0xffffffff813740f4 : reserve_metadata_bytes+0x24/0x1c0 [kernel]                
 0xffffffff813742b5 : btrfs_block_rsv_add+0x25/0x60 [kernel]                    
 0xffffffff81389a63 : start_transaction+0x3f3/0x570 [kernel]                    
 0xffffffff8138a08a : btrfs_start_transaction_fallback_global_rsv+0x2a/0x200 
[kernel]
 0xffffffff81393615 : btrfs_unlink+0x35/0xb0 [kernel]                           
 0xffffffff811c0e0a : vfs_unlink+0xba/0x1a0 [kernel]                            
 0xffffffff811c5cdb : do_unlinkat+0x24b/0x2a0 [kernel]                          
 0xffffffff811c646b : SyS_unlinkat+0x1b/0x30 [kernel]                           
 0xffffffff816d19c0 : entry_SYSCALL_64_fastpath+0x23/0xc1 [kernel]              
] = 1579

__reserve_metadata_bytes+0x89d/0xa80 is the queue_work call from the 
if (ret && flush != BTRFS_RESERVE_NO_FLUSH) { branch, i.e. when we 
failed to do our reservation. 

So with the ticketed ENOSPC code we actually have fewer reservation failures 
but, ironically, a lot more transaction commits. The end result is slower 
operations.

I'm also sharing two files with a lot more debugging output, printing 
the space_info state on reservation failure in both the good case 
(before the ticketed enospc work) and the bad case (after it): 

good: http://sprunge.us/QBSc
bad: http://sprunge.us/PaNI

I'm running out of ideas about where the slowdown is coming from and how to 
further debug it. Any pointers would be much appreciated. I've had an off-list 
discussion with Josef about this, and one thing he suggested is that we 
might be hitting the ->failfast check in use_block_rsv; however, in both 
cases that code was never triggered. All measurements were taken 
only during the rm portion of the test, not the whole "Create files and 
delete them" phase. 

