On 27.03.19 19:23, David Sterba wrote:
> On Tue, Mar 12, 2019 at 05:20:24PM +0200, Nikolay Borisov wrote:
>> @@ -1190,45 +1201,71 @@ static int cow_file_range_async(struct inode *inode,
>> struct page *locked_page,
>> unsigned int write_flags)
>> {
>> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>> - struct async_cow *async_cow;
>> + struct async_cow *ctx;
>> + struct async_chunk *async_chunk;
>> unsigned long nr_pages;
>> u64 cur_end;
>> + u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
>> + int i;
>> + bool should_compress;
>>
>> clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,
>> 1, 0, NULL);
>> - while (start < end) {
>> - async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
>> - BUG_ON(!async_cow); /* -ENOMEM */
>> +
>> + if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS &&
>> + !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
>> + num_chunks = 1;
>> + should_compress = false;
>> + } else {
>> + should_compress = true;
>> + }
>> +
>> + ctx = kmalloc(struct_size(ctx, chunks, num_chunks), GFP_NOFS);
>
> This leads to OOM due to a high-order allocation. And this is worse than
> the previous state, where there were many small allocations that could
> potentially fail (but most likely would not, due to GFP_NOFS and size <
> PAGE_SIZE).
>
> So this needs to be reworked to avoid the costly allocations or reverted
> to the previous state.
Right, makes sense. To keep the submission logic simple, I see two
options: rework the allocation into a loop that allocates a single item
for every chunk, or switch to kvmalloc. I don't think it matters for the
metadata structures in this case that vmalloc'ed memory might not be
physically contiguous, or does it?
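
For reference, here is a rough userspace sketch of the sizing arithmetic
behind the single allocation. It is not kernel code: PAGE_SZ assumes 4 KiB
pages, the helper names are made up for illustration, and flex_size() only
mirrors what struct_size() computes, without its overflow checking. I have
not measured sizeof(struct async_chunk), so any element size fed into it is
a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_512K (512u * 1024u)
#define PAGE_SZ 4096u /* assumption: 4 KiB pages */

/* Same rounding as the kernel's DIV_ROUND_UP(), for unsigned values. */
static inline uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Chunk count as computed in the patch: DIV_ROUND_UP(end - start, SZ_512K). */
static inline uint64_t num_chunks(uint64_t start, uint64_t end)
{
	assert(end > start);
	return div_round_up(end - start, SZ_512K);
}

/*
 * Bytes for one header followed by n array elements. This mirrors the
 * result of struct_size(), but without the overflow checking.
 */
static inline size_t flex_size(size_t header, size_t elem, uint64_t n)
{
	return header + elem * (size_t)n;
}

/* Smallest order such that (PAGE_SZ << order) >= bytes. */
static inline unsigned int page_order(size_t bytes)
{
	unsigned int order = 0;

	while (((size_t)PAGE_SZ << order) < bytes)
		order++;
	return order;
}
```

As an example with made-up numbers: a 128M range gives
num_chunks(0, 128 << 20) == 256, and with a placeholder 200 bytes per chunk
plus a 16-byte header, flex_size(16, 200, 256) is 51216 bytes, which
page_order() rounds up to order 4. Allocating one item per chunk would keep
each request well below PAGE_SIZE under the same assumptions.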
>
> btrfs/138 [19:44:05][ 4034.368157] run fstests btrfs/138 at
> 2019-03-25 19:44:05
> [ 4034.559716] BTRFS: device fsid 9300f07a-78f4-4ac6-8376-1a902ef26830 devid
> 1 transid 5 /dev/vdb
> [ 4034.573670] BTRFS info (device vdb): disk space caching is enabled
> [ 4034.575068] BTRFS info (device vdb): has skinny extents
> [ 4034.576258] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 4034.580226] BTRFS info (device vdb): checking UUID tree
> [ 4066.104734] BTRFS info (device vdb): disk space caching is enabled
> [ 4066.108558] BTRFS info (device vdb): has skinny extents
> [ 4066.186856] BTRFS info (device vdb): setting 8 feature flag
> [ 4074.017307] BTRFS info (device vdb): disk space caching is enabled
> [ 4074.019646] BTRFS info (device vdb): has skinny extents
> [ 4074.065117] BTRFS info (device vdb): setting 16 feature flag
> [ 4075.787401] kworker/u8:12: page allocation failure: order:4,
> mode:0x604040(GFP_NOFS|__GFP_COMP), nodemask=(null)
> [ 4075.789581] CPU: 0 PID: 31258 Comm: kworker/u8:12 Not tainted
> 5.0.0-rc8-default+ #524
> [ 4075.791235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
> [ 4075.793334] Workqueue: writeback wb_workfn (flush-btrfs-718)
> [ 4075.794455] Call Trace:
> [ 4075.795029] dump_stack+0x67/0x90
> [ 4075.795756] warn_alloc.cold.131+0x73/0xf3
> [ 4075.796601] __alloc_pages_slowpath+0xa0e/0xb50
> [ 4075.797595] ? __wake_up_common_lock+0x89/0xc0
> [ 4075.798558] __alloc_pages_nodemask+0x2bd/0x310
> [ 4075.799537] kmalloc_order+0x14/0x60
> [ 4075.800382] kmalloc_order_trace+0x1d/0x120
> [ 4075.801341] btrfs_run_delalloc_range+0x3e6/0x4b0 [btrfs]
> [ 4075.802344] writepage_delalloc+0xf8/0x150 [btrfs]
> [ 4075.802991] __extent_writepage+0x113/0x420 [btrfs]
> [ 4075.803640] extent_write_cache_pages+0x2a6/0x400 [btrfs]
> [ 4075.804340] extent_writepages+0x52/0xa0 [btrfs]
> [ 4075.804951] do_writepages+0x3e/0xe0
> [ 4075.805480] ? writeback_sb_inodes+0x133/0x550
> [ 4075.806406] __writeback_single_inode+0x54/0x640
> [ 4075.807315] writeback_sb_inodes+0x204/0x550
> [ 4075.808112] __writeback_inodes_wb+0x5d/0xb0
> [ 4075.808692] wb_writeback+0x337/0x4a0
> [ 4075.809207] wb_workfn+0x3a7/0x590
> [ 4075.809849] process_one_work+0x246/0x610
> [ 4075.810665] worker_thread+0x3c/0x390
> [ 4075.811415] ? rescuer_thread+0x360/0x360
> [ 4075.812293] kthread+0x116/0x130
> [ 4075.812965] ? kthread_create_on_node+0x60/0x60
> [ 4075.813870] ret_from_fork+0x24/0x30
> [ 4075.814664] Mem-Info:
> [ 4075.815167] active_anon:2942 inactive_anon:15105 isolated_anon:0
> [ 4075.815167] active_file:2749 inactive_file:454876 isolated_file:0
> [ 4075.815167] unevictable:0 dirty:68316 writeback:0 unstable:0
> [ 4075.815167] slab_reclaimable:5500 slab_unreclaimable:6458
> [ 4075.815167] mapped:940 shmem:15483 pagetables:51 bounce:0
> [ 4075.815167] free:7068 free_pcp:297 free_cma:0
> [ 4075.823236] Node 0 active_anon:11768kB inactive_anon:60420kB
> active_file:10996kB inactive_file:1827676kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB mapped:3760kB dirty:277360kB
> writeback:0kB shmem:61932kB writeback_tmp:0kB unstable:0kB all_unreclaimable?
> no
> [ 4075.828200] Node 0 DMA free:7860kB min:44kB low:56kB high:68kB
> active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:8012kB
> unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB
> kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
> free_cma:0kB
> [ 4075.834484] lowmem_reserve[]: 0 1955 1955 1955
> [ 4075.835419] Node 0 DMA32 free:11292kB min:5632kB low:7632kB high:9632kB
> active_anon:11768kB inactive_anon:60416kB active_file:10996kB
> inactive_file:1820532kB unevictable:0kB writepending:281184kB
> present:2080568kB managed:2009324kB mlocked:0kB kernel_stack:1984kB
> pagetables:204kB bounce:0kB free_pcp:132kB local_pcp:0kB free_cma:0k
> [ 4075.841848] lowmem_reserve[]: 0 0 0 0
> [ 4075.842677] Node 0 DMA: 1*4kB (U) 2*8kB (U) 4*16kB (UME) 5*32kB (UME)
> 1*64kB (E) 3*128kB (UME) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ME)
> 0*4096kB = 7860kB
> [ 4075.844961] Node 0 DMA32: 234*4kB (UME) 238*8kB (UME) 426*16kB (UM)
> 43*32kB (UM) 28*64kB (UM) 11*128kB (UM) 0*256kB 0*512kB 0*1024kB 1*2048kB (H)
> 0*4096kB = 16280kB
> [ 4075.847915] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=2048kB
> [ 4075.849266] 474599 total pagecache pages
> [ 4075.850058] 0 pages in swap cache
> [ 4075.850808] Swap cache stats: add 0, delete 0, find 0/0
> [ 4075.851990] Free swap = 0kB
> [ 4075.852811] Total swap = 0kB
> [ 4075.853635] 524140 pages RAM
> [ 4075.854351] 0 pages HighMem/MovableOnly
> [ 4075.855048] 17832 pages reserved
>