On 27.03.19 at 19:23, David Sterba wrote:
> On Tue, Mar 12, 2019 at 05:20:24PM +0200, Nikolay Borisov wrote:
>> @@ -1190,45 +1201,71 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>>                              unsigned int write_flags)
>>  {
>>      struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>> -    struct async_cow *async_cow;
>> +    struct async_cow *ctx;
>> +    struct async_chunk *async_chunk;
>>      unsigned long nr_pages;
>>      u64 cur_end;
>> +    u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
>> +    int i;
>> +    bool should_compress;
>>  
>>      clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,
>>                       1, 0, NULL);
>> -    while (start < end) {
>> -            async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
>> -            BUG_ON(!async_cow); /* -ENOMEM */
>> +
>> +    if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS &&
>> +        !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
>> +            num_chunks = 1;
>> +            should_compress = false;
>> +    } else {
>> +            should_compress = true;
>> +    }
>> +
>> +    ctx = kmalloc(struct_size(ctx, chunks, num_chunks), GFP_NOFS);
> 
> This leads to OOM due to high order allocation. And this is worse than
> the previous state, where there are many small allocations that could
> potentially fail (but most likely will not due to GFP_NOFS and size <
> PAGE_SIZE).
> 
> So this needs to be reworked to avoid the costly allocations or reverted
> to the previous state.

Right, makes sense. To keep the submission logic simple, should I rework
the allocation into a loop that allocates a single item for every chunk,
or alternatively switch to using kvmalloc? I assume the fact that
vmalloc'ed memory might not be physically contiguous is not critical for
the metadata structures in this case?
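
To put rough numbers on the difference, here is a minimal userspace
sketch (not kernel code) of the size arithmetic. The chunk count mirrors
the DIV_ROUND_UP(end - start, SZ_512K) from the patch; the header and
per-chunk sizes passed to ctx_bytes() are hypothetical values chosen for
illustration, not sizes measured from the real structures, and the helper
names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SZ_512K (512u * 1024u)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of 512K chunks for a range, computed the same way as the patch. */
static uint64_t num_chunks_for_range(uint64_t start, uint64_t end)
{
	return DIV_ROUND_UP(end - start, (uint64_t)SZ_512K);
}

/*
 * Bytes needed for a header followed by n array elements. This is the
 * plain arithmetic behind struct_size(), without its overflow checking.
 * Both sizes are hypothetical inputs here.
 */
static size_t ctx_bytes(size_t header, size_t per_chunk, uint64_t n)
{
	return header + per_chunk * (size_t)n;
}

/* Smallest order such that (page_size << order) >= bytes. */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With an assumed 200 bytes per chunk and 4K pages, a 128M range gives 256
chunks and a single allocation of roughly 50K, i.e. an order-4 request,
whereas one allocation per chunk stays at order 0. kvmalloc would avoid
the need for physically contiguous pages for the large case.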

> 
> btrfs/138               [19:44:05][ 4034.368157] run fstests btrfs/138 at 
> 2019-03-25 19:44:05
> [ 4034.559716] BTRFS: device fsid 9300f07a-78f4-4ac6-8376-1a902ef26830 devid 
> 1 transid 5 /dev/vdb
> [ 4034.573670] BTRFS info (device vdb): disk space caching is enabled
> [ 4034.575068] BTRFS info (device vdb): has skinny extents
> [ 4034.576258] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 4034.580226] BTRFS info (device vdb): checking UUID tree
> [ 4066.104734] BTRFS info (device vdb): disk space caching is enabled
> [ 4066.108558] BTRFS info (device vdb): has skinny extents
> [ 4066.186856] BTRFS info (device vdb): setting 8 feature flag
> [ 4074.017307] BTRFS info (device vdb): disk space caching is enabled
> [ 4074.019646] BTRFS info (device vdb): has skinny extents
> [ 4074.065117] BTRFS info (device vdb): setting 16 feature flag
> [ 4075.787401] kworker/u8:12: page allocation failure: order:4, 
> mode:0x604040(GFP_NOFS|__GFP_COMP), nodemask=(null)
> [ 4075.789581] CPU: 0 PID: 31258 Comm: kworker/u8:12 Not tainted 
> 5.0.0-rc8-default+ #524
> [ 4075.791235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
> [ 4075.793334] Workqueue: writeback wb_workfn (flush-btrfs-718)
> [ 4075.794455] Call Trace:
> [ 4075.795029]  dump_stack+0x67/0x90
> [ 4075.795756]  warn_alloc.cold.131+0x73/0xf3
> [ 4075.796601]  __alloc_pages_slowpath+0xa0e/0xb50
> [ 4075.797595]  ? __wake_up_common_lock+0x89/0xc0
> [ 4075.798558]  __alloc_pages_nodemask+0x2bd/0x310
> [ 4075.799537]  kmalloc_order+0x14/0x60
> [ 4075.800382]  kmalloc_order_trace+0x1d/0x120
> [ 4075.801341]  btrfs_run_delalloc_range+0x3e6/0x4b0 [btrfs]
> [ 4075.802344]  writepage_delalloc+0xf8/0x150 [btrfs]
> [ 4075.802991]  __extent_writepage+0x113/0x420 [btrfs]
> [ 4075.803640]  extent_write_cache_pages+0x2a6/0x400 [btrfs]
> [ 4075.804340]  extent_writepages+0x52/0xa0 [btrfs]
> [ 4075.804951]  do_writepages+0x3e/0xe0
> [ 4075.805480]  ? writeback_sb_inodes+0x133/0x550
> [ 4075.806406]  __writeback_single_inode+0x54/0x640
> [ 4075.807315]  writeback_sb_inodes+0x204/0x550
> [ 4075.808112]  __writeback_inodes_wb+0x5d/0xb0
> [ 4075.808692]  wb_writeback+0x337/0x4a0
> [ 4075.809207]  wb_workfn+0x3a7/0x590
> [ 4075.809849]  process_one_work+0x246/0x610
> [ 4075.810665]  worker_thread+0x3c/0x390
> [ 4075.811415]  ? rescuer_thread+0x360/0x360
> [ 4075.812293]  kthread+0x116/0x130
> [ 4075.812965]  ? kthread_create_on_node+0x60/0x60
> [ 4075.813870]  ret_from_fork+0x24/0x30
> [ 4075.814664] Mem-Info:
> [ 4075.815167] active_anon:2942 inactive_anon:15105 isolated_anon:0
> [ 4075.815167]  active_file:2749 inactive_file:454876 isolated_file:0
> [ 4075.815167]  unevictable:0 dirty:68316 writeback:0 unstable:0
> [ 4075.815167]  slab_reclaimable:5500 slab_unreclaimable:6458
> [ 4075.815167]  mapped:940 shmem:15483 pagetables:51 bounce:0
> [ 4075.815167]  free:7068 free_pcp:297 free_cma:0
> [ 4075.823236] Node 0 active_anon:11768kB inactive_anon:60420kB 
> active_file:10996kB inactive_file:1827676kB unevictable:0kB 
> isolated(anon):0kB isolated(file):0kB mapped:3760kB dirty:277360kB 
> writeback:0kB shmem:61932kB writeback_tmp:0kB unstable:0kB all_unreclaimable? 
> no
> [ 4075.828200] Node 0 DMA free:7860kB min:44kB low:56kB high:68kB 
> active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:8012kB 
> unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB 
> kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB 
> free_cma:0kB
> [ 4075.834484] lowmem_reserve[]: 0 1955 1955 1955
> [ 4075.835419] Node 0 DMA32 free:11292kB min:5632kB low:7632kB high:9632kB 
> active_anon:11768kB inactive_anon:60416kB active_file:10996kB 
> inactive_file:1820532kB unevictable:0kB writepending:281184kB 
> present:2080568kB managed:2009324kB mlocked:0kB kernel_stack:1984kB 
> pagetables:204kB bounce:0kB free_pcp:132kB local_pcp:0kB free_cma:0k 
> [ 4075.841848] lowmem_reserve[]: 0 0 0 0
> [ 4075.842677] Node 0 DMA: 1*4kB (U) 2*8kB (U) 4*16kB (UME) 5*32kB (UME) 
> 1*64kB (E) 3*128kB (UME) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ME) 
> 0*4096kB = 7860kB
> [ 4075.844961] Node 0 DMA32: 234*4kB (UME) 238*8kB (UME) 426*16kB (UM) 
> 43*32kB (UM) 28*64kB (UM) 11*128kB (UM) 0*256kB 0*512kB 0*1024kB 1*2048kB (H) 
> 0*4096kB = 16280kB
> [ 4075.847915] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
> hugepages_size=2048kB
> [ 4075.849266] 474599 total pagecache pages
> [ 4075.850058] 0 pages in swap cache
> [ 4075.850808] Swap cache stats: add 0, delete 0, find 0/0
> [ 4075.851990] Free swap  = 0kB
> [ 4075.852811] Total swap = 0kB
> [ 4075.853635] 524140 pages RAM
> [ 4075.854351] 0 pages HighMem/MovableOnly
> [ 4075.855048] 17832 pages reserved
> 
