Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes
On 2018年06月20日 13:25, Nikolay Borisov wrote: > > > On 15.06.2018 04:35, Qu Wenruo wrote: >> [BUG] >> Under certain KVM load and LTP tests, we are possible to hit the >> following calltrace if quota is enabled: >> -- >> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 >> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 >> [ cut here ] >> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 >> blk_status_to_errno+0x1a/0x30 >> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 >> (unreleased) >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.0.0-prebuilt.qemu-project.org 04/01/2014 >> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] >> task: 9f827b340bc0 task.stack: b4f8c0304000 >> RIP: 0010:blk_status_to_errno+0x1a/0x30 >> Call Trace: >> submit_extent_page+0x191/0x270 [btrfs] >> ? btrfs_create_repair_bio+0x130/0x130 [btrfs] >> __do_readpage+0x2d2/0x810 [btrfs] >> ? btrfs_create_repair_bio+0x130/0x130 [btrfs] >> ? run_one_async_done+0xc0/0xc0 [btrfs] >> __extent_read_full_page+0xe7/0x100 [btrfs] >> ? run_one_async_done+0xc0/0xc0 [btrfs] >> read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] >> ? run_one_async_done+0xc0/0xc0 [btrfs] >> btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] >> read_tree_block+0x31/0x60 [btrfs] >> read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] >> btrfs_search_slot+0x46b/0xa00 [btrfs] >> ? kmem_cache_alloc+0x1a8/0x510 >> ? btrfs_get_token_32+0x5b/0x120 [btrfs] >> find_parent_nodes+0x11d/0xeb0 [btrfs] >> ? leaf_space_used+0xb8/0xd0 [btrfs] >> ? btrfs_leaf_free_space+0x49/0x90 [btrfs] >> ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] >> btrfs_find_all_roots_safe+0x93/0x100 [btrfs] >> btrfs_find_all_roots+0x45/0x60 [btrfs] >> btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] >> btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] >> btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] >> insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] >> btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] >> ? pick_next_task_fair+0x2cd/0x530 >> ? __switch_to+0x92/0x4b0 >> btrfs_worker_helper+0x81/0x300 [btrfs] >> process_one_work+0x1da/0x3f0 >> worker_thread+0x2b/0x3f0 >> ? process_one_work+0x3f0/0x3f0 >> kthread+0x11a/0x130 >> ? kthread_create_on_node+0x40/0x40 >> ret_from_fork+0x35/0x40 >> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 >> 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff >> ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 >> ---[ end trace f079fb809e7a862b ]--- >> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 >> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO >> failure >> BTRFS info (device vda2): forced readonly >> BTRFS error (device vda2): pending csums is 2887680 >> -- >> >> [CAUSE] >> It's caused by race with block group auto removal like the following >> case: >> - There is a meta block group X, which has only one tree block >> The tree block belongs to fs tree 257. >> - In current transaction, some operation modified fs tree 257 >> The tree block get CoWed, so the block group X is empty, and marked as >> unused, queued to be deleted. >> - Some workload (like fsync) wakes up cleaner_kthread() >> Which will call btrfs_deleted_unused_bgs() to remove unused block >> groups. >> So block group X along its chunk map get removed. >> - Some delalloc work finished for fs tree 257 >> Quota needs to get the original reference of the extent, which will >> reads tree blocks of commit root of 257. >> Then since the chunk map get removed, above warning get triggered. >> >> [FIX] >> Just teach btrfs_delete_unused_bgs() to skip block group who still has >> pinned bytes. >> >> However there is a minor side effect, since currently we only queue >> empty blocks at update_block_group(), and such empty block group with >> pinned bytes won't go through update_block_group() again, such block >> group won't be removed, until it get new extent allocated and removed. >> >> But please note that, there are more problems related to extent >> allocator with block group auto removal. >> >> Even a block group is marked unused, extent allocator can still allocate >> new extents from unused block group. >> Thus delaying block group to next transaction won't work. >> (Extents get allocated in current transaction, and removed again in next >> transaction). > > Isn't this easily solvable by making the allocator check whether the > block group it has chosen is part of the unused_bgs_list ? That's also my initial idea, however bg_cache->bg_list can also be used in trans->new_bgs. Another factor is, even we check the block group of an extent in find_free_extent() and skip the allocation, we can lead to more frequent false ENOSPC. The correct way to do it is to allow find_free_extent() to skip block group which
Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes
On 15.06.2018 04:35, Qu Wenruo wrote: > [BUG] > Under certain KVM load and LTP tests, we are possible to hit the > following calltrace if quota is enabled: > -- > BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 > BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 > [ cut here ] > WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 > blk_status_to_errno+0x1a/0x30 > CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 > (unreleased) > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.0.0-prebuilt.qemu-project.org 04/01/2014 > Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] > task: 9f827b340bc0 task.stack: b4f8c0304000 > RIP: 0010:blk_status_to_errno+0x1a/0x30 > Call Trace: > submit_extent_page+0x191/0x270 [btrfs] > ? btrfs_create_repair_bio+0x130/0x130 [btrfs] > __do_readpage+0x2d2/0x810 [btrfs] > ? btrfs_create_repair_bio+0x130/0x130 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > __extent_read_full_page+0xe7/0x100 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] > read_tree_block+0x31/0x60 [btrfs] > read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] > btrfs_search_slot+0x46b/0xa00 [btrfs] > ? kmem_cache_alloc+0x1a8/0x510 > ? btrfs_get_token_32+0x5b/0x120 [btrfs] > find_parent_nodes+0x11d/0xeb0 [btrfs] > ? leaf_space_used+0xb8/0xd0 [btrfs] > ? btrfs_leaf_free_space+0x49/0x90 [btrfs] > ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] > btrfs_find_all_roots_safe+0x93/0x100 [btrfs] > btrfs_find_all_roots+0x45/0x60 [btrfs] > btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] > btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] > btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] > insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] > btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] > ? pick_next_task_fair+0x2cd/0x530 > ? __switch_to+0x92/0x4b0 > btrfs_worker_helper+0x81/0x300 [btrfs] > process_one_work+0x1da/0x3f0 > worker_thread+0x2b/0x3f0 > ? process_one_work+0x3f0/0x3f0 > kthread+0x11a/0x130 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x35/0x40 > Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 > ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff > ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 > ---[ end trace f079fb809e7a862b ]--- > BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 > BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO > failure > BTRFS info (device vda2): forced readonly > BTRFS error (device vda2): pending csums is 2887680 > -- > > [CAUSE] > It's caused by race with block group auto removal like the following > case: > - There is a meta block group X, which has only one tree block > The tree block belongs to fs tree 257. > - In current transaction, some operation modified fs tree 257 > The tree block get CoWed, so the block group X is empty, and marked as > unused, queued to be deleted. > - Some workload (like fsync) wakes up cleaner_kthread() > Which will call btrfs_deleted_unused_bgs() to remove unused block > groups. > So block group X along its chunk map get removed. > - Some delalloc work finished for fs tree 257 > Quota needs to get the original reference of the extent, which will > reads tree blocks of commit root of 257. > Then since the chunk map get removed, above warning get triggered. > > [FIX] > Just teach btrfs_delete_unused_bgs() to skip block group who still has > pinned bytes. > > However there is a minor side effect, since currently we only queue > empty blocks at update_block_group(), and such empty block group with > pinned bytes won't go through update_block_group() again, such block > group won't be removed, until it get new extent allocated and removed. > > But please note that, there are more problems related to extent > allocator with block group auto removal. > > Even a block group is marked unused, extent allocator can still allocate > new extents from unused block group. > Thus delaying block group to next transaction won't work. > (Extents get allocated in current transaction, and removed again in next > transaction). Isn't this easily solvable by making the allocator check whether the block group it has chosen is part of the unused_bgs_list ? > > So the root fix need to co-operate with extent allocator. > But current fix should be enough for this particular bug. > > Signed-off-by: Qu Wenruo LGTM, Reviewed-by: Nikolay Borisov > --- > changelog: > v2: > Commit message update, to better indicate how pinned byte is used in > btrfs and why it's related to quota. > v3: > Commit message update, further explaining the bug with an example. > And added the side effect of the fix, and possible further fix. >
Re: [PATCH 2/2] btrfs-progs: misc-tests: Fix 029 test cases for sudo test environment
On 20.06.2018 03:38, Qu Wenruo wrote: > Test misc/029 only works if the test case is executed as root, while for > sudo usage, it doesn't work as initial mkdir and final cleanup doesn't > use $SUDO_HELPER. > > Add "run_check $SUDO_HELPER" for such cases to allow it works under sudo > usage. > > Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov Although this seems unrelated to the first patch. I'd advise in this case to send them as separate patches or if you are going to batch up multiple fixes add a cover letter saying "just a bunch of random fixes to xxx" :) > --- > tests/misc-tests/029-send-p-different-mountpoints/test.sh | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/tests/misc-tests/029-send-p-different-mountpoints/test.sh > b/tests/misc-tests/029-send-p-different-mountpoints/test.sh > index 0b42b772e3d7..90465e1d08fd 100755 > --- a/tests/misc-tests/029-send-p-different-mountpoints/test.sh > +++ b/tests/misc-tests/029-send-p-different-mountpoints/test.sh > @@ -14,7 +14,7 @@ prepare_test_dev > SUBVOL_MNT="$TEST_MNT/subvol" > TOPLEVEL_MNT="$TEST_MNT/toplevel" > TEST_MNT="$TOPLEVEL_MNT" > -mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT" > +run_check $SUDO_HELPER mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT" > > run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$TEST_DEV" > run_check_mount_test_dev > @@ -47,5 +47,5 @@ run_mustfail_stdout "send -p on 2 mount points" \ > run_check_umount_test_dev "$SUBVOL_MNT" > run_check_umount_test_dev "$TOPLEVEL_MNT" > > -rmdir "$SUBVOL_MNT" > -rmdir "$TOPLEVEL_MNT" > +run_check $SUDO_HELPER rmdir "$SUBVOL_MNT" > +run_check $SUDO_HELPER rmdir "$TOPLEVEL_MNT" > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes
Gentle ping. Since it's causing LTP tests failure, I'd like to backport it asap. Thanks, Qu On 2018年06月15日 09:35, Qu Wenruo wrote: > [BUG] > Under certain KVM load and LTP tests, we are possible to hit the > following calltrace if quota is enabled: > -- > BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 > BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 > [ cut here ] > WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 > blk_status_to_errno+0x1a/0x30 > CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 > (unreleased) > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.0.0-prebuilt.qemu-project.org 04/01/2014 > Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] > task: 9f827b340bc0 task.stack: b4f8c0304000 > RIP: 0010:blk_status_to_errno+0x1a/0x30 > Call Trace: > submit_extent_page+0x191/0x270 [btrfs] > ? btrfs_create_repair_bio+0x130/0x130 [btrfs] > __do_readpage+0x2d2/0x810 [btrfs] > ? btrfs_create_repair_bio+0x130/0x130 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > __extent_read_full_page+0xe7/0x100 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] > read_tree_block+0x31/0x60 [btrfs] > read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] > btrfs_search_slot+0x46b/0xa00 [btrfs] > ? kmem_cache_alloc+0x1a8/0x510 > ? btrfs_get_token_32+0x5b/0x120 [btrfs] > find_parent_nodes+0x11d/0xeb0 [btrfs] > ? leaf_space_used+0xb8/0xd0 [btrfs] > ? btrfs_leaf_free_space+0x49/0x90 [btrfs] > ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] > btrfs_find_all_roots_safe+0x93/0x100 [btrfs] > btrfs_find_all_roots+0x45/0x60 [btrfs] > btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] > btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] > btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] > insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] > btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] > ? pick_next_task_fair+0x2cd/0x530 > ? __switch_to+0x92/0x4b0 > btrfs_worker_helper+0x81/0x300 [btrfs] > process_one_work+0x1da/0x3f0 > worker_thread+0x2b/0x3f0 > ? process_one_work+0x3f0/0x3f0 > kthread+0x11a/0x130 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x35/0x40 > Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 > ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff > ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 > ---[ end trace f079fb809e7a862b ]--- > BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 > BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO > failure > BTRFS info (device vda2): forced readonly > BTRFS error (device vda2): pending csums is 2887680 > -- > > [CAUSE] > It's caused by race with block group auto removal like the following > case: > - There is a meta block group X, which has only one tree block > The tree block belongs to fs tree 257. > - In current transaction, some operation modified fs tree 257 > The tree block get CoWed, so the block group X is empty, and marked as > unused, queued to be deleted. > - Some workload (like fsync) wakes up cleaner_kthread() > Which will call btrfs_deleted_unused_bgs() to remove unused block > groups. > So block group X along its chunk map get removed. > - Some delalloc work finished for fs tree 257 > Quota needs to get the original reference of the extent, which will > reads tree blocks of commit root of 257. > Then since the chunk map get removed, above warning get triggered. > > [FIX] > Just teach btrfs_delete_unused_bgs() to skip block group who still has > pinned bytes. > > However there is a minor side effect, since currently we only queue > empty blocks at update_block_group(), and such empty block group with > pinned bytes won't go through update_block_group() again, such block > group won't be removed, until it get new extent allocated and removed. > > But please note that, there are more problems related to extent > allocator with block group auto removal. > > Even a block group is marked unused, extent allocator can still allocate > new extents from unused block group. > Thus delaying block group to next transaction won't work. > (Extents get allocated in current transaction, and removed again in next > transaction). > > So the root fix need to co-operate with extent allocator. > But current fix should be enough for this particular bug. > > Signed-off-by: Qu Wenruo > --- > changelog: > v2: > Commit message update, to better indicate how pinned byte is used in > btrfs and why it's related to quota. > v3: > Commit message update, further explaining the bug with an example. > And added the side effect of the fix, and possible further fix. > --- > fs/btrfs/extent-tree.c | 2 +- > 1 file changed, 1 insertion(+), 1
Re: [LKP] [lkp-robot] [mm] 9092c71bb7: blogbench.write_score -12.3% regression
Ping * 3 "Huang, Ying" writes: > Ping again ... > > "Huang, Ying" writes: > >> Ping... >> >> "Huang, Ying" writes: >> >>> Hi, Josef, >>> >>> Do you have time to take a look at the regression? >>> >>> kernel test robot writes: >>> Greeting, FYI, we noticed a -12.3% regression of blogbench.write_score and a +9.6% improvement of blogbench.read_score due to commit: commit: 9092c71bb724dba2ecba849eae69e5c9d39bd3d2 ("mm: use sc->priority for slab shrink targets") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: blogbench on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory with following parameters: disk: 1SSD fs: btrfs cpufreq_governor: performance test-description: Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server. test-url: https://www.pureftpd.org/project/blogbench Details are as below: --> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml = compiler/cpufreq_governor/disk/fs/kconfig/rootfs/tbox_group/testcase: gcc-7/performance/1SSD/btrfs/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-bdw-de1/blogbench commit: fcb2b0c577 ("mm: show total hugetlb memory consumption in /proc/meminfo") 9092c71bb7 ("mm: use sc->priority for slab shrink targets") fcb2b0c577f145c7 9092c71bb724dba2ecba849eae -- %stddev %change %stddev \ |\ 3256 -12.3% 2854blogbench.write_score 1235237 2% +9.6%1354163blogbench.read_score 28050912 -10.1% 25212230 blogbench.time.file_system_outputs 6481995 3% +25.0%8105320 2% blogbench.time.involuntary_context_switches 906.00 +13.7% 1030 blogbench.time.percent_of_cpu_this_job_got 2552 +14.0% 2908blogbench.time.system_time 173.80+8.4% 188.32blogbench.time.user_time 19353936+3.6% 20045728 blogbench.time.voluntary_context_switches 8719514 +13.0%9850451softirqs.RCU 2.97 5% -0.72.30 3% mpstat.cpu.idle% 24.92-6.5 18.46mpstat.cpu.iowait% 0.65 2% +0.10.75mpstat.cpu.soft% 67.76+6.7 74.45mpstat.cpu.sys% 50206 -10.7% 44858vmstat.io.bo 49.25-9.1% 44.75 2% vmstat.procs.b 224125-1.8% 220135vmstat.system.cs 48903 +10.7% 54134vmstat.system.in 3460654 +10.8%3834883meminfo.Active 3380666 +11.0%3752872meminfo.Active(file) 1853849 -17.4%1530415meminfo.Inactive 1836507 -17.6%1513054meminfo.Inactive(file) 551311 -10.3% 494265meminfo.SReclaimable 196525 -12.6% 171775meminfo.SUnreclaim 747837 -10.9% 666040meminfo.Slab 8.904e+08 -24.9% 6.683e+08cpuidle.C1.time 22971020 -12.8% 20035820cpuidle.C1.usage 2.518e+08 3% -31.7% 1.72e+08cpuidle.C1E.time 821393 2% -33.3% 548003cpuidle.C1E.usage 75460078 2% -23.3% 57903768 2% cpuidle.C3.time 136506 3% -25.3% 101956 3% cpuidle.C3.usage 56892498 4% -23.3% 43608427 4% cpuidle.C6.time 85034 3% -33.9% 56184 3% cpuidle.C6.usage 24373567 -24.5% 18395538cpuidle.POLL.time 449033 2% -10.8% 400493cpuidle.POLL.usage 1832+9.3% 2002turbostat.Avg_MHz 22967645 -12.8% 20032521turbostat.C1 18.43-4.6 13.85turbostat.C1% 821328 2% -33.3% 547948turbostat.C1E 5.21 3% -1.63.56turbostat.C1E% 136377 3% -25.3% 101823 3% turbostat.C3 1.56 2% -0.41.20 3% turbostat.C3% 84404 3% -34.0%
Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents
fdman...@kernel.org 於 2018-06-19 19:31 寫到: From: Filipe Manana Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero") introduced a regression where we no longer report 0 as the physical offset for inline extents. This is because it always sets the variable used to report the physical offset ("disko") as em->block_start plus some offset, and em->block_start has the value 18446744073709551614 ((u64) -2) for inline extents. This made the btrfs test 004 (from fstests) often fail, for example, for a file with an inline extent we have the following items in the subvolume tree: item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160 generation 25 transid 38 size 1525 nbytes 1525 block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0 sequence 0 flags 0x2(none) atime 1529342058.461891730 (2018-06-18 18:14:18) ctime 1529342058.461891730 (2018-06-18 18:14:18) mtime 1529342058.461891730 (2018-06-18 18:14:18) otime 1529342055.869892885 (2018-06-18 18:14:15) item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13 index 25 namelen 3 name: fc7 item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546 generation 38 type 0 (inline) inline extent data size 1525 ram_bytes 1525 compression 0 (none) Then when test 004 invoked fiemap against the file it got a non-zero physical offset: $ filefrag -v /mnt/p0/d4/d7/fc7 Filesystem type is: 9123683e File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0..4095: 18446744073709551614.. 4093: 4096: last,not_aligned,inline,eof /mnt/p0/d4/d7/fc7: 1 extent found This resulted in the test failing like this: btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad) --- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad 2018-06-18 18:15:02.385872155 +0100 @@ -1,3 +1,10 @@ QA output created by 004 *** test backref walking -*** done +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression expected +ERROR: 7.55578637259143e+22 is not a valid numeric value. +unexpected output from + /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -s 65536 -P 7.55578637259143e+22 /home/fdmanana/btrfs-tests/scratch_1 ... (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad' to see the entire diff) Ran: btrfs/004 The large number in scientific notation reported as an invalid numeric value is the result from the filter passed to perl which multiplies the physical offset by the block size reported by fiemap. So fix this by ensuring the physical offset is always set to 0 when we are processing an inline extent. Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero") Signed-off-by: Filipe Manana --- fs/btrfs/extent_io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8e4a7cdbc9f5..978327d98fc5 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, end = 1; flags |= FIEMAP_EXTENT_LAST; } else if (em->block_start == EXTENT_MAP_INLINE) { + disko = 0; flags |= (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_NOT_ALIGNED); } else if (em->block_start == EXTENT_MAP_DELALLOC) { EXTENT_MAP_DELALLOC should have the same problem. em->block_start has some special values. The following values should not be considered disko #define EXTENT_MAP_LAST_BYTE((u64)-4) #define EXTENT_MAP_HOLE((u64)-3) #define EXTENT_MAP_INLINE((u64)-2) #define EXTENT_MAP_DELALLOC((u64)-1) Is the following change more suitable? if (em->block_start >= EXTENT_MAP_LAST_BYTE) disko = 0; else disko = em->block_start + offset_in_extent; Thanks. Robbie Ko -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check lowmem vs original
On 06/20/2018 10:19 AM, Chris Murphy wrote: I've retested btrfs-progs 4.17 lowmem mode and still get a bunch of 'referencer count mismatch' that I don't see with original mode. My bad. The bug is existed for long time since I changed lowmem mode to traverse all trees. Due to laziness then time passed, I have lost images you provided and links are invalid. It will be so nice of you if a image can be provided again. I will look for causes immediately. Thanks for the reports. e.g. ERROR: extent[1299275636736, 2654208] referencer count mismatch (root: 2192, owner: 317900, offset: 0) wanted: 5, have: 21 Complete output attached as a text file, since the MUA will mess up the line wrap. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check lowmem vs original
I've retested btrfs-progs 4.17 lowmem mode and still get a bunch of 'referencer count mismatch' that I don't see with original mode. e.g. ERROR: extent[1299275636736, 2654208] referencer count mismatch (root: 2192, owner: 317900, offset: 0) wanted: 5, have: 21 Complete output attached as a text file, since the MUA will mess up the line wrap. -- Chris Murphy [chris@f28s ~]$ sudo ~/Applications/btrfs-progs/btrfs --version btrfs-progs v4.17 [chris@f28s ~]$ sudo ~/Applications/btrfs-progs/btrfs check --mode=lowmem /dev/mapper/first Checking filesystem on /dev/mapper/first UUID: 0f43f49d-6e63-4b1b-bc8c-c54da409872d checking extents ERROR: extent[1258793181184, 81903616] referencer count mismatch (root: 2192, owner: 323075, offset: 0) wanted: 551, have: 625 ERROR: extent[1258899460096, 82878464] referencer count mismatch (root: 2192, owner: 323115, offset: 0) wanted: 521, have: 633 ERROR: extent[1259036663808, 64319488] referencer count mismatch (root: 2192, owner: 323125, offset: 0) wanted: 376, have: 491 ERROR: extent[1259102384128, 64552960] referencer count mismatch (root: 2192, owner: 323137, offset: 0) wanted: 439, have: 493 ERROR: extent[1259171381248, 113295360] referencer count mismatch (root: 2192, owner: 323147, offset: 0) wanted: 763, have: 865 ERROR: extent[1259307835392, 2576384] referencer count mismatch (root: 2192, owner: 323191, offset: 0) wanted: 9, have: 20 ERROR: extent[1259327455232, 30089216] referencer count mismatch (root: 2192, owner: 323203, offset: 0) wanted: 204, have: 230 ERROR: extent[1259372060672, 1974272] referencer count mismatch (root: 2192, owner: 323218, offset: 0) wanted: 13, have: 16 ERROR: extent[1259924090880, 22102016] referencer count mismatch (root: 2192, owner: 323199, offset: 427819008) wanted: 52, have: 169 ERROR: extent[1259960025088, 2211840] referencer count mismatch (root: 2192, owner: 323244, offset: 0) wanted: 11, have: 17 ERROR: extent[1260158959616, 53563392] referencer count mismatch (root: 2192, owner: 323328, offset: 0) wanted: 328, have: 409 ERROR: extent[1260258131968, 134217728] referencer count mismatch (root: 2192, owner: 323609, offset: 0) wanted: 959, have: 1024 ERROR: extent[1260410531840, 1716224] referencer count mismatch (root: 2192, owner: 323625, offset: 0) wanted: 4, have: 14 ERROR: extent[1261505388544, 1839104] referencer count mismatch (root: 2192, owner: 323654, offset: 0) wanted: 10, have: 15 ERROR: extent[1261846781952, 56676352] referencer count mismatch (root: 2192, owner: 323731, offset: 0) wanted: 354, have: 433 ERROR: extent[1262246526976, 2256896] referencer count mismatch (root: 2192, owner: 323750, offset: 0) wanted: 10, have: 18 ERROR: extent[1262765875200, 765952] referencer count mismatch (root: 2192, owner: 319045, offset: 0) wanted: 5, have: 6 ERROR: extent[1262984384512, 1642496] referencer count mismatch (root: 2192, owner: 322213, offset: 0) wanted: 6, have: 13 ERROR: extent[1263573057536, 1159168] referencer count mismatch (root: 2192, owner: 323009, offset: 0) wanted: 2, have: 9 ERROR: extent[1263615098880, 1220608] referencer count mismatch (root: 2192, owner: 323062, offset: 0) wanted: 1, have: 10 ERROR: extent[1263647588352, 2310144] referencer count mismatch (root: 2192, owner: 330026, offset: 0) wanted: 6, have: 18 ERROR: extent[1263665254400, 3538944] referencer count mismatch (root: 2192, owner: 330035, offset: 0) wanted: 16, have: 27 ERROR: extent[1263695650816, 1773568] referencer count mismatch (root: 2192, owner: 330054, offset: 0) wanted: 8, have: 14 ERROR: extent[1263894016000, 1536000] referencer count mismatch (root: 2192, owner: 319139, offset: 0) wanted: 9, have: 12 ERROR: extent[1263904493568, 1122304] referencer count mismatch (root: 2192, owner: 319151, offset: 0) wanted: 3, have: 9 ERROR: extent[1263915298816, 540672] referencer count mismatch (root: 2192, owner: 319176, offset: 0) wanted: 1, have: 5 ERROR: extent[1264166404096, 679936] referencer count mismatch (root: 2192, owner: 319672, offset: 0) wanted: 3, have: 6 ERROR: extent[1264205729792, 2166784] referencer count mismatch (root: 2192, owner: 319706, offset: 0) wanted: 13, have: 17 ERROR: extent[1264443080704, 327680] referencer count mismatch (root: 2192, owner: 319874, offset: 0) wanted: 2, have: 3 ERROR: extent[1264474132480, 1277952] referencer count mismatch (root: 2192, owner: 319926, offset: 0) wanted: 6, have: 10 ERROR: extent[1264494592000, 1622016] referencer count mismatch (root: 2192, owner: 319943, offset: 0) wanted: 12, have: 13 ERROR: extent[1264528855040, 2527232] referencer count mismatch (root: 2192, owner: 319965, offset: 0) wanted: 16, have: 20 ERROR: extent[1264556707840, 901120] referencer count mismatch (root: 2192, owner: 319974, offset: 0) wanted: 3, have: 7 ERROR: extent[1265866297344, 651264] referencer count mismatch (root: 2192, owner: 320062, offset: 0) wanted: 2, have: 5 ERROR: extent[1266110365696, 929792] referencer count mismatch (root: 2192, owner: 320074,
Re: [PATCH] btrfs-progs: Check factor out root parsing from check_chunks_and_extents
On 06/19/2018 01:34 PM, Nikolay Borisov wrote: On 19.06.2018 05:18, Su Yue wrote: On 06/18/2018 07:10 PM, Nikolay Borisov wrote: check_chunks_and_extents does quite a number of distinct things. The first of those is going through all root items in the root tree and classify every root depending on whether they have a dropping operation in progress or not. Lets factor out this code and move the variables specific to this in a separate function so clean up check_chunks_and_extents a bit. Accidentally, this patch fixes some reference leaks since in error conditions in the loop the code does "goto out" but at that label we don't really release the path. Having this code extracted in a separate function which always releases the path avoids this problem entirely. Signed-off-by: Nikolay Borisov Code looks okay. One nitpick belows. --- check/main.c | 141 +++ 1 file changed, 85 insertions(+), 56 deletions(-) diff --git a/check/main.c b/check/main.c index a4d6855dccbf..fb5c86df21c9 100644 --- a/check/main.c +++ b/check/main.c @@ -8069,6 +8069,89 @@ static int deal_root_from_list(struct list_head *list, return ret; } +/** + * parse_tree_roots - Go over all roots in the tree root and add each one to + * a list. + * + * @fs_info - pointer to fs_info struct of the file system. + * + * @normal_trees - list to contains all roots which don't have a drop + * operation in progress + * + * @dropping_trees - list containing all roots which have a drop operation + * pending + * + * Returns 0 on success or a negative value indicating an error. + * + */ +static int parse_tree_roots(struct btrfs_fs_info *fs_info, + struct list_head *normal_trees, + struct list_head *dropping_trees) +{ + struct btrfs_path path; + struct btrfs_key key; + struct btrfs_key found_key; + struct btrfs_root_item ri; + struct extent_buffer *leaf; + int slot; + int ret = 0; + + btrfs_init_path(); + key.offset = 0; + key.objectid = 0; + key.type = BTRFS_ROOT_ITEM_KEY; + ret = btrfs_search_slot(NULL, fs_info->tree_root, , , 0, 0); + if (ret < 0) + goto out; + while (1) { + leaf = path.nodes[0]; + slot = path.slots[0]; + if (slot >= btrfs_header_nritems(path.nodes[0])) { + ret = btrfs_next_leaf(fs_info->tree_root, ); + if (ret != 0) + break; + leaf = path.nodes[0]; + slot = path.slots[0]; + } Nit: I know the segment is just moved. The @slot is unused. It's used in the if (slot >= btrfs_header_nritems(path.nodes[0])) check. I guess the definition could be moved inside the while to reduce the scope but it's a minor thing. Since it's so small. (Annoying duplicate mails blame to my personal gmail.) So, Reviewed-by: Su Yue + btrfs_item_key_to_cpu(leaf, _key, path.slots[0]); + if (found_key.type == BTRFS_ROOT_ITEM_KEY) { + unsigned long offset; + u64 last_snapshot; + u8 level; + + offset = btrfs_item_ptr_offset(leaf, path.slots[0]); + read_extent_buffer(leaf, , offset, sizeof(ri)); + last_snapshot = btrfs_root_last_snapshot(); + level = btrfs_root_level(); + if (btrfs_disk_key_objectid(_progress) == 0) { + ret = add_root_item_to_list(normal_trees, + found_key.objectid, + btrfs_root_bytenr(), + last_snapshot, level, + 0, NULL); + if (ret < 0) + break; + } else { + u64 objectid = found_key.objectid; + btrfs_disk_key_to_cpu(_key, + _progress); + ret = add_root_item_to_list(dropping_trees, + objectid, + btrfs_root_bytenr(), + last_snapshot, level, + ri.drop_level, _key); + if (ret < 0) + break; + } + } + path.slots[0]++; + } + +out: + btrfs_release_path(); + return ret; +} + static int check_chunks_and_extents(struct btrfs_fs_info *fs_info) { struct rb_root dev_cache; @@ -8082,20 +8165,13 @@ static int check_chunks_and_extents(struct btrfs_fs_info *fs_info) struct cache_tree nodes; struct extent_io_tree excluded_extents; struct cache_tree corrupt_blocks; - struct btrfs_path path; - struct btrfs_key key; - struct btrfs_key found_key; int ret, err = 0; struct block_info *bits; int bits_nr; - struct extent_buffer *leaf; - int slot; - struct btrfs_root_item ri; struct list_head dropping_trees; struct list_head normal_trees; struct btrfs_root *root1; struct btrfs_root *root; - u64
[PATCH v2 1/2] btrfs-progs: Fix wrong optind re-initialization to allow mixed option and non-option
In function handle_global_options(), we reset @optind to 1. However according to man page of getopt(3) NOTES section, if we need to rescan options later, @optind should be reset to 0 to initialize the internal variables correctly. This explains the reason why in cmd_check(), getopt_long() doesn't handle the following command correctly: "btrfs check /dev/data/btrfs --check-data-csum" While mkfs.btrfs handles mixed non-option and option correctly: "mkfs.btrfs -f /dev/data/disk1 --data raid1 /dev/data/disk2" Cc: Paul Jones Cc: Hugo Mills Fixes: 010ceab56e06 ("btrfs-progs: rework option parser to use getopt for global options") Signed-off-by: Qu Wenruo --- changelog: v2: Instead of resetting @optind at handle_global_options(), reset @optind before each later getopt() call. Since there are cases uses @optind and expects it to be starting from 1. --- cmds-balance.c| 2 ++ cmds-device.c | 3 +++ cmds-fi-du.c | 1 + cmds-fi-usage.c | 1 + cmds-filesystem.c | 2 ++ cmds-inspect-dump-tree.c | 1 + cmds-inspect-tree-stats.c | 1 + cmds-inspect.c| 3 +++ cmds-qgroup.c | 3 +++ cmds-quota.c | 1 + cmds-receive.c| 1 + cmds-replace.c| 3 +++ cmds-rescue.c | 2 ++ cmds-restore.c| 1 + cmds-send.c | 1 + cmds-subvolume.c | 6 ++ 16 files changed, 32 insertions(+) diff --git a/cmds-balance.c b/cmds-balance.c index 0c91bdf13ada..6cc26c358f95 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -528,6 +528,7 @@ static int cmd_balance_start(int argc, char **argv) memset(, 0, sizeof(args)); + optind = 0; while (1) { enum { GETOPT_VAL_FULL_BALANCE = 256, GETOPT_VAL_BACKGROUND = 257 }; @@ -831,6 +832,7 @@ static int cmd_balance_status(int argc, char **argv) int verbose = 0; int ret; + optind = 0; while (1) { int opt; static const struct option longopts[] = { diff --git a/cmds-device.c b/cmds-device.c index 86459d1b9564..cd689b5017c8 100644 --- a/cmds-device.c +++ b/cmds-device.c @@ -57,6 +57,7 @@ static int cmd_device_add(int argc, char **argv) int force = 0; int last_dev; + optind = 0; while (1) { int c; static const struct option long_options[] = { @@ -267,6 +268,7 @@ static int cmd_device_scan(int argc, char **argv) int all = 0; int ret = 0; + optind = 0; while (1) { int c; static const struct option long_options[] = { @@ -403,6 +405,7 @@ static int cmd_device_stats(int argc, char **argv) __u64 flags = 0; DIR *dirstream = NULL; + optind = 0; while (1) { int c; static const struct option long_options[] = { diff --git a/cmds-fi-du.c b/cmds-fi-du.c index 7e6bb7f6a570..4e639f6dc231 100644 --- a/cmds-fi-du.c +++ b/cmds-fi-du.c @@ -565,6 +565,7 @@ int cmd_filesystem_du(int argc, char **argv) unit_mode = get_unit_mode_from_arg(, argv, 1); + optind = 0; while (1) { static const struct option long_options[] = { { "summarize", no_argument, NULL, 's'}, diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c index de7ad668ba2d..cbb3cd191b3d 100644 --- a/cmds-fi-usage.c +++ b/cmds-fi-usage.c @@ -975,6 +975,7 @@ int cmd_filesystem_usage(int argc, char **argv) unit_mode = get_unit_mode_from_arg(, argv, 1); + optind = 0; while (1) { int c; diff --git a/cmds-filesystem.c b/cmds-filesystem.c index 30a50bf55e38..06c8311bdfe3 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -685,6 +685,7 @@ static int cmd_filesystem_show(int argc, char **argv) unit_mode = get_unit_mode_from_arg(, argv, 0); + optind = 0; while (1) { int c; static const struct option long_options[] = { @@ -924,6 +925,7 @@ static int cmd_filesystem_defrag(int argc, char **argv) defrag_global_errors = 0; defrag_global_verbose = 0; defrag_global_errors = 0; + optind = 0; while(1) { int c = getopt(argc, argv, "vrc::fs:l:t:"); if (c < 0) diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c index 92a2a45b267e..c8acd55a0c3a 100644 --- a/cmds-inspect-dump-tree.c +++ b/cmds-inspect-dump-tree.c @@ -235,6 +235,7 @@ int cmd_inspect_dump_tree(int argc, char **argv) * tree blocks as possible. */ open_ctree_flags = OPEN_CTREE_PARTIAL | OPEN_CTREE_NO_BLOCK_GROUPS; + optind = 0; while (1) { int c; enum { GETOPT_VAL_FOLLOW = 256 }; diff --git a/cmds-inspect-tree-stats.c b/cmds-inspect-tree-stats.c index eced0db9f840..dfa34c52ff5f 100644 --- a/cmds-inspect-tree-stats.c +++
[PATCH 2/2] btrfs-progs: misc-tests: Fix 029 test cases for sudo test environment
Test misc/029 only works if the test case is executed as root, while for sudo usage, it doesn't work as initial mkdir and final cleanup doesn't use $SUDO_HELPER. Add "run_check $SUDO_HELPER" for such cases to allow it works under sudo usage. Signed-off-by: Qu Wenruo --- tests/misc-tests/029-send-p-different-mountpoints/test.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/misc-tests/029-send-p-different-mountpoints/test.sh b/tests/misc-tests/029-send-p-different-mountpoints/test.sh index 0b42b772e3d7..90465e1d08fd 100755 --- a/tests/misc-tests/029-send-p-different-mountpoints/test.sh +++ b/tests/misc-tests/029-send-p-different-mountpoints/test.sh @@ -14,7 +14,7 @@ prepare_test_dev SUBVOL_MNT="$TEST_MNT/subvol" TOPLEVEL_MNT="$TEST_MNT/toplevel" TEST_MNT="$TOPLEVEL_MNT" -mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT" +run_check $SUDO_HELPER mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT" run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$TEST_DEV" run_check_mount_test_dev @@ -47,5 +47,5 @@ run_mustfail_stdout "send -p on 2 mount points" \ run_check_umount_test_dev "$SUBVOL_MNT" run_check_umount_test_dev "$TOPLEVEL_MNT" -rmdir "$SUBVOL_MNT" -rmdir "$TOPLEVEL_MNT" +run_check $SUDO_HELPER rmdir "$SUBVOL_MNT" +run_check $SUDO_HELPER rmdir "$TOPLEVEL_MNT" -- 2.17.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About more loose parameter sequence requirement
On 06/18/2018 03:05 PM, Qu Wenruo wrote: > > > On 2018年06月18日 20:02, David Sterba wrote: >> On Mon, Jun 18, 2018 at 07:40:44PM +0800, Qu Wenruo wrote: >>> >>> >>> On 2018年06月18日 19:34, David Sterba wrote: On Thu, Jun 14, 2018 at 03:17:45PM +0800, Qu Wenruo wrote: > I understand that btrfs-progs introduced restrict parameter/option order > to distinguish global and sub-command parameter/option. > > However it's really annoying if one just want to append some new options > to previous command: > > E.g. > # btrfs check /dev/data/btrfs > # !! --check-data-csum > > The last command will fail as current btrfs-progs doesn't allow any > option after parameter. > > > Despite the requirement to distinguish global and subcommand > option/parameter, is there any other requirement for such restrict > option-first-parameter-last policy? I'd say that it's a common and recommended pattern. Getopt is able to reorder the parameters so mixed options and non-options are accepted, unless POSIXLY_CORRECT (see man getopt(3)) is not set. With the more strict requirement, 'btrfs' option parser works the same regardless of that. >>> >>> But currently it doesn't work. >>> Just as the example, btrfs check /dev/data/btrfs --check-data-sum won't >>> work. >>> It's different from a lot of other common programs. >> >> With POSIXLY_CORRECT set, it expectedly won't work. With POSIXLY_CORRECT >> unset, it would work in general, but not for 'btrfs'. >> >> As this is per-application decision I find it ok, besides that I also >> find the 'options anywhere' pattern bad. Does it really save you that >> much time while typing the commands with new arguments? > > Personally speaking, yes. > >> There are >> movement shortcuts to jump by words, you get the previous command by >> up-arrow. About the same number of keystrokes. > > Not really, normally just "!! ", where I don't even need to > move my fingers to arrow keys. > (I'm all ears about extra bash shortcuts on this) It's called `set -o vi` ;-] If you remap capslock to be escape, there's not much moving around of fingers going on. CAPS-k-b-b-a and type your new option. No reaching to shift and 1 necessary at all. :] >> >> New code needs to be tested, documented and maintained, that's the cost >> I find too high for something that's convenience for people who are used >> to some shell builtins. > > The biggest problem is, the behavior isn't even consistent across > btrfs-progs. > mkfs.btrfs accept such out-of-order parameters while btrfs not. > > And most common tools, like commands provided by coretutil, they don't > care about the order. > The only daily exception is 'scp', which I found it pretty unhandy. > > And just as Paul and Hugo, I think there are quite some users preferring > out-of-order parameter/options. > > > I also understand the maintenance burden, but at least let's try if > there are better solution for this. > > Thanks, > Qu > -- Hans van Kranenburg -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About more loose parameter sequence requirement
On 2018年06月20日 07:18, Qu Wenruo wrote: > > > On 2018年06月19日 22:47, David Sterba wrote: >> On Mon, Jun 18, 2018 at 09:05:59PM +0800, Qu Wenruo wrote: New code needs to be tested, documented and maintained, that's the cost I find too high for something that's convenience for people who are used to some shell builtins. >>> >>> The biggest problem is, the behavior isn't even consistent across >>> btrfs-progs. >>> mkfs.btrfs accept such out-of-order parameters while btrfs not. >>> >>> And most common tools, like commands provided by coretutil, they don't >>> care about the order. >>> The only daily exception is 'scp', which I found it pretty unhandy. >>> >>> And just as Paul and Hugo, I think there are quite some users preferring >>> out-of-order parameter/options. >> >> Because of the feedback and interest of the relaxed mixed >> option/argument syntax, I don't object anymore. > > And in fact, the behavior of btrfs is caused by @optind wrongly initialized. > The fix is just one line (along some comments). > > https://patchwork.kernel.org/patch/10473173/ > > So no or little pressure on maintenance. I'm totally wrong. Since there are cases we call check_argc_*() directly using @optind, reinitialize it to 0 would cause them to fail. I'll update the patch to manually reset optind to 0 only for cases we call getopt(). And indeed, it adds extra maintenance effort. Thanks, Qu > > Thanks, > Qu > signature.asc Description: OpenPGP digital signature
Re: RAID56
Gandalf Corvotempesta wrote: Another kernel release was made. Any improvements in RAID56? I didn't see any changes in that sector, is something still being worked on or it's stuck waiting for something ? Based on official BTRFS status page, RAID56 is the only "unstable" item marked in red. No interested from Suse in fixing that? I think it's the real missing part for a feature-complete filesystem. Nowadays parity raid is mandatory, we can't only rely on mirroring. First of all: I am not a BTRFS developer, but I follow the mailing list closely and I too have a particular interest in the "RAID"5/6 feature which realistically is probably about 3-4 years (if not more) in the future. From what I am able to understand the pesky write hole is one of the major obstacles for having BTRFS "RAID"5/6 work reliably. There was patches to fix this a while ago, but if these patches are to be classified as a workaround or actually as "the darn thing done right" is perhaps up for discussion. In general there seems to be a lot more momentum on the "RAID"5/6 feature now compared to earlier. There also seem to be a lot of focus on fixing bugs and running tests as well. This is why I am guessing that 3-4 years ahead is a absolute minimum until "RAID"5/6 might be somewhat reliable and usable. There are a few other basics missing that may be acceptable for you as long as you know about it. For example as far as I know BTRFS does still not use the "device-id" or "BTRFS internal number" for storage devices to keep track of the storage device. This means that if you have a multi storage device filesystem with for example /dev/sda /dev/sdb /dev/sdc etc... and /dev/sdc disappears and show up again as /dev/sdx then BTRFS would not recoginize this and happily try to continue to write on /dev/sdc even if it does not exist. ...and perhaps even worse - I can imagine that if you swap device ordering and a different device takes /dev/sdc's place then BTRFS *could* overwrite data on this device - possibly making a real mess of things. I am not sure if this holds true, but if it does it's for sure a real nugget of basic functionality missing right there. BTRFS also so far have no automatic "drop device" function e.g. it will not automatically kick out a storage device that is throwing lots of errors and causing delays etc. There may be benefits to keeping this design of course, but for some dropping the device might be desirable. And no hot-spare "or hot-(reserved-)space" (which would be more accurate in BTRFS terms) is implemented either, and that is one good reason to keep an eye on your storage pool. What you *might* consider is to have your metadata in "RAID"1 or "RAID"10 and your data in "RAID5" or even "RAID6" so that if you run into problems then you might in worst case loose some data, but since "RAID"1/10 is beginning to be rather mature then it is likely that your filesystem might survive a disk failure. So if you are prepared to perhaps loose a file or two, but want to feel confident that your filesystem is surviving and will give you a report about what file(s) are toast then this may be acceptable for you as you can always restore from backups (because you do have backups right? If not, read 'any' of Duncan's posts - he explains better than most people why you need and should have backups!) Now keep in mind that this is just a humble users analysis of the situation based on whatever I have picked up from the mailing list which may or may not be entirely accurate so take it for what it is! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About more loose parameter sequence requirement
On 2018年06月19日 22:47, David Sterba wrote: > On Mon, Jun 18, 2018 at 09:05:59PM +0800, Qu Wenruo wrote: >>> New code needs to be tested, documented and maintained, that's the cost >>> I find too high for something that's convenience for people who are used >>> to some shell builtins. >> >> The biggest problem is, the behavior isn't even consistent across >> btrfs-progs. >> mkfs.btrfs accept such out-of-order parameters while btrfs not. >> >> And most common tools, like commands provided by coretutil, they don't >> care about the order. >> The only daily exception is 'scp', which I found it pretty unhandy. >> >> And just as Paul and Hugo, I think there are quite some users preferring >> out-of-order parameter/options. > > Because of the feedback and interest of the relaxed mixed > option/argument syntax, I don't object anymore. And in fact, the behavior of btrfs is caused by @optind wrongly initialized. The fix is just one line (along some comments). https://patchwork.kernel.org/patch/10473173/ So no or little pressure on maintenance. Thanks, Qu signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/3] btrfs: Remove fs_info argument from __btrfs_inc_extent_ref
On 19.06.2018 22:31, Jeff Mahoney wrote: > I like the idea here. I wasn't sold at first, but I think if we can > standardize on taking only a trans handle when one is required and both > a trans and fs_info when it's optional, it'll make the code clearer. > This cleanup can percolate up the stack to cover pretty much all of > delayed refs. I have a 25-something patches which do exactly this. Still WIP, will likely send it tomorrow. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs: Remove fs_info argument from __btrfs_inc_extent_ref
On 6/18/18 7:59 AM, Nikolay Borisov wrote: > This function already takes a transaction which holds a reference to > the fs_info struct. Use that reference and remove the extra arg. No > functional changes. I like the idea here. I wasn't sold at first, but I think if we can standardize on taking only a trans handle when one is required and both a trans and fs_info when it's optional, it'll make the code clearer. This cleanup can percolate up the stack to cover pretty much all of delayed refs. Reviewed-by: Jeff Mahoney > Signed-off-by: Nikolay Borisov > --- > fs/btrfs/extent-tree.c | 15 ++- > 1 file changed, 6 insertions(+), 9 deletions(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 4850e538ab10..59645ced6fbc 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2208,12 +2208,12 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle > *trans, > } > > static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, > - struct btrfs_fs_info *fs_info, > struct btrfs_delayed_ref_node *node, > u64 parent, u64 root_objectid, > u64 owner, u64 offset, int refs_to_add, > struct btrfs_delayed_extent_op *extent_op) > { > + struct btrfs_fs_info *fs_info = trans->fs_info; > struct btrfs_path *path; > struct extent_buffer *leaf; > struct btrfs_extent_item *item; > @@ -2297,10 +2297,9 @@ static int run_delayed_data_ref(struct > btrfs_trans_handle *trans, >ref->objectid, ref->offset, >, node->ref_mod); > } else if (node->action == BTRFS_ADD_DELAYED_REF) { > - ret = __btrfs_inc_extent_ref(trans, fs_info, node, parent, > - ref_root, ref->objectid, > - ref->offset, node->ref_mod, > - extent_op); > + ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, > + ref->objectid, ref->offset, > + node->ref_mod, extent_op); > } else if (node->action == BTRFS_DROP_DELAYED_REF) { > ret = __btrfs_free_extent(trans, fs_info, node, parent, > ref_root, ref->objectid, > @@ -2450,10 +2449,8 @@ static int run_delayed_tree_ref(struct > btrfs_trans_handle *trans, > BUG_ON(!extent_op || !extent_op->update_flags); > ret = alloc_reserved_tree_block(trans, node, extent_op); > } else if (node->action == BTRFS_ADD_DELAYED_REF) { > - ret = __btrfs_inc_extent_ref(trans, fs_info, node, > - parent, ref_root, > - ref->level, 0, 1, > - extent_op); > + ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, > + ref->level, 0, 1, extent_op); > } else if (node->action == BTRFS_DROP_DELAYED_REF) { > ret = __btrfs_free_extent(trans, fs_info, node, > parent, ref_root, > -- Jeff Mahoney SUSE Labs signature.asc Description: OpenPGP digital signature
Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents
On Tue, Jun 19, 2018 at 12:31:42PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > So fix this by ensuring the physical offset is always set to 0 when we > are processing an inline extent. > > Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count > is zero") > Signed-off-by: Filipe Manana Added to 4.18 queue, thanks. This is a fix for a patch that was in for-next for a few weeks but the bug was discovered only after the patch got merged to master. I wonder if there's something to be improved in the patch flow. I think it should take less time to catch bugs while the patches are unmerged, either in the mailinglist or in the development branches. Post-merge fixes will happen of course, but in this particular case it looks like something that slipped too easily. I did the "fallback" review and checked that tests regarding fiemap pass, the occasional failure you mention has not happened on any of my testing setups. There's another patch for fiemap that seems to have bigger impact and for that reason I have postponed merging it to 4.18 unlike the first one, but now I think this requires a testcase. https://patchwork.kernel.org/patch/10383491/ The patch will be in for-next topic branch until then. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs balance did not progress after 12H
On 2018-06-19 12:30, james harvey wrote: On Tue, Jun 19, 2018 at 11:47 AM, Marc MERLIN wrote: On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote: So, I ran this: gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . & [1] 24450 Dumping filters: flags 0x1, state 0x0, force is off DATA (flags 0x2): balancing, usage=60 gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done 0 out of about 0 chunks balanced (0 considered), -nan% left This (0/0/0, -nan%) seems alarming. I had this output once when the system spontaneously rebooted during a balance. I didn't have any bad effects afterward. Balance on '.' is running 0 out of about 73 chunks balanced (2 considered), 100% left Balance on '.' is running After about 20mn, it changed to this: 1 out of about 73 chunks balanced (6724 considered), 99% left This seems alarming. I wouldn't think # considered should ever exceed # chunks. Although, it does say "about", so maybe it can a little bit, but I wouldn't expect it to exceed it by this much. Actually, output like this is not unusual. In the above line, the 1 is how many chunks have been actually processed, the 73 is how many the command expects to process (that is, the count of chunks that fit the filtering requirements, in this case, ones which are 60% or less full), and the 6724 is how many it has checked against the filtering requirements. So, if you've got a very large number of chunks, and are selecting a small number with filters, then the considered value is likely to be significantly higher than the first two. Balance on '.' is running Now, 12H later, it's still there, only 1 out of 73. gargamel:/mnt/btrfs_pool2# btrfs fi show . Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d Total devices 1 FS bytes used 12.72TiB devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2 gargamel:/mnt/btrfs_pool2# btrfs fi df . Data, single: total=13.57TiB, used=12.60TiB System, DUP: total=32.00MiB, used=1.55MiB Metadata, DUP: total=121.50GiB, used=116.53GiB GlobalReserve, single: total=512.00MiB, used=848.00KiB kernel: 4.16.8 Is that expected? Should I be ready to wait days possibly for this balance to finish? It's now beeen 2 days, and it's still stuck at 1% 1 out of about 73 chunks balanced (6724 considered), 99% left First, my disclaimer. I'm not a btrfs developer, and although I've ran balance many times, I haven't really studied its output beyond the % left. I don't know why it says "about", and I don't know if it should ever be that far off. In your situation, I would run "btrfs pause ", wait to hear from a btrfs developer, and not use the volume whatsoever in the meantime. I would say this is probably good advice. I don't really know what's going on here myself actually, though it looks like the balance got stuck (the output hasn't changed for over 36 hours, unless you've got an insanely slow storage array, that's extremely unusual (it should only be moving at most 3GB of data per chunk)). That said, I would question the value of repacking chunks that are already more than half full. Anything above a 50% usage filter generally takes a long time, and has limited value in most cases (higher values are less likely to reduce the total number of allocated chunks). With `-duszge=50` or less, you're guaranteed to reduce the number of chunk if at least two match, and it isn't very time consuming for the allocator, all because you can pack at least two matching chunks into one 'new' chunk (new in quotes because it may re-pack them into existing slack space on the FS). Additionally, `-dusage=50` is usually sufficient to mitigate the typical ENOSPC issues that regular balancing is supposed to help with. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs balance did not progress after 12H
On Tue, Jun 19, 2018 at 11:47 AM, Marc MERLIN wrote: > On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote: >> So, I ran this: >> gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . & >> [1] 24450 >> Dumping filters: flags 0x1, state 0x0, force is off >> DATA (flags 0x2): balancing, usage=60 >> gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done >> 0 out of about 0 chunks balanced (0 considered), -nan% left This (0/0/0, -nan%) seems alarming. I had this output once when the system spontaneously rebooted during a balance. I didn't have any bad effects afterward. >> Balance on '.' is running >> 0 out of about 73 chunks balanced (2 considered), 100% left >> Balance on '.' is running >> >> After about 20mn, it changed to this: >> 1 out of about 73 chunks balanced (6724 considered), 99% left This seems alarming. I wouldn't think # considered should ever exceed # chunks. Although, it does say "about", so maybe it can a little bit, but I wouldn't expect it to exceed it by this much. >> Balance on '.' is running >> >> Now, 12H later, it's still there, only 1 out of 73. >> >> gargamel:/mnt/btrfs_pool2# btrfs fi show . >> Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d >> Total devices 1 FS bytes used 12.72TiB >> devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2 >> >> gargamel:/mnt/btrfs_pool2# btrfs fi df . >> Data, single: total=13.57TiB, used=12.60TiB >> System, DUP: total=32.00MiB, used=1.55MiB >> Metadata, DUP: total=121.50GiB, used=116.53GiB >> GlobalReserve, single: total=512.00MiB, used=848.00KiB >> >> kernel: 4.16.8 >> >> Is that expected? Should I be ready to wait days possibly for this >> balance to finish? > > It's now beeen 2 days, and it's still stuck at 1% > 1 out of about 73 chunks balanced (6724 considered), 99% left First, my disclaimer. I'm not a btrfs developer, and although I've ran balance many times, I haven't really studied its output beyond the % left. I don't know why it says "about", and I don't know if it should ever be that far off. In your situation, I would run "btrfs pause ", wait to hear from a btrfs developer, and not use the volume whatsoever in the meantime. I can make some guesses where to go from here, but won't, as I don't want to screw things up for you. What version of btrfs-progs do you have? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs balance did not progress after 12H
On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote: > So, I ran this: > gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . & > [1] 24450 > Dumping filters: flags 0x1, state 0x0, force is off > DATA (flags 0x2): balancing, usage=60 > gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done > 0 out of about 0 chunks balanced (0 considered), -nan% left > Balance on '.' is running > 0 out of about 73 chunks balanced (2 considered), 100% left > Balance on '.' is running > > After about 20mn, it changed to this: > 1 out of about 73 chunks balanced (6724 considered), 99% left > Balance on '.' is running > > Now, 12H later, it's still there, only 1 out of 73. > > gargamel:/mnt/btrfs_pool2# btrfs fi show . > Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d > Total devices 1 FS bytes used 12.72TiB > devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2 > > gargamel:/mnt/btrfs_pool2# btrfs fi df . > Data, single: total=13.57TiB, used=12.60TiB > System, DUP: total=32.00MiB, used=1.55MiB > Metadata, DUP: total=121.50GiB, used=116.53GiB > GlobalReserve, single: total=512.00MiB, used=848.00KiB > > kernel: 4.16.8 > > Is that expected? Should I be ready to wait days possibly for this > balance to finish? It's now beeen 2 days, and it's still stuck at 1% 1 out of about 73 chunks balanced (6724 considered), 99% left Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RAID56
Another kernel release was made. Any improvements in RAID56? I didn't see any changes in that sector, is something still being worked on or it's stuck waiting for something ? Based on official BTRFS status page, RAID56 is the only "unstable" item marked in red. No interested from Suse in fixing that? I think it's the real missing part for a feature-complete filesystem. Nowadays parity raid is mandatory, we can't only rely on mirroring. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About more loose parameter sequence requirement
On Mon, Jun 18, 2018 at 09:05:59PM +0800, Qu Wenruo wrote: > > New code needs to be tested, documented and maintained, that's the cost > > I find too high for something that's convenience for people who are used > > to some shell builtins. > > The biggest problem is, the behavior isn't even consistent across > btrfs-progs. > mkfs.btrfs accept such out-of-order parameters while btrfs not. > > And most common tools, like commands provided by coretutil, they don't > care about the order. > The only daily exception is 'scp', which I found it pretty unhandy. > > And just as Paul and Hugo, I think there are quite some users preferring > out-of-order parameter/options. Because of the feedback and interest of the relaxed mixed option/argument syntax, I don't object anymore. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] btrfs: fix race between mkfs and mount
On Thu, Jun 07, 2018 at 06:39:32PM +0200, David Sterba wrote: > On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote: > > In an instrumented testing it is possible that the mount and > > a newer mkfs.btrfs thread on the same device can race and if the new > > mkfs.btrfs wins it will free the older fs_devices, then the mount thread > > will lead to oops. > > > > Thread1 Thread2 > > --- --- > > mkfs.btrfs -fq /dev/sdb > > mount /dev/sdb /btrfs > > |_btrfs_mount_root() > > |_btrfs_scan_one_device(... _devices) > > > > mkfs.btrfs -fq /dev/sdb > > |_btrfs_contol_ioctl() > > |_btrfs_scan_one_device(... > > _devices) > > |_:: > > > > |_btrfs_free_stale_devices() > > > > |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices. > > > > Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS. > > I'm not sure this is the right way to fix it, adding another bit to the > uuid_mutex and device_list_mutex combo. > > To fix the race between mount and mkfs we can add a bit of logic to the > device scanning so that mount calling scan will track the purpose and > mkfs scan will not be allowed to free that device. Last version of the proposed fix is to extend the uuid_mutex over the whole mount callback and use it around calls to btrfs_scan_one_device. That way we'll be sure the mount will always get to the device it scanned and that will not be freed by the parallel scan. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Btrfs: fix return value on rename exchange failure
On Mon, Jun 11, 2018 at 07:24:16PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > If we failed during a rename exchange operation after starting/joining a > transaction, we would end up replacing the return value, stored in the > local 'ret' variable, with the return value from btrfs_end_transaction(). > So this could end up returning 0 (success) to user space despite the > operation having failed and aborted the transaction, because if there are > multiple tasks having a reference on the transaction at the time > btrfs_end_transaction() is called by the rename exchange, that function > returns 0 (otherwise it returns -EIO and not the original error value). > So fix this by not overwriting the return value on error after getting > a transaction handle. > > Signed-off-by: Filipe Manana 1 and 2 queued for 4.18, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Cleanups around extent increment
On Mon, Jun 18, 2018 at 02:59:23PM +0300, Nikolay Borisov wrote: > This series improves the functions involved around extent reference > increment. > The first patch just removes a redundant argument, the second one documents > the > parameters of __btrfs_inc_extent_ref. It can be considered v2 of the > standalone > version which Jeff had some input to. The final patch fixes a comment in > lookup_inline_extent_backref which transpired while Jeff was revieweing the > documentation patch. > > Nikolay Borisov (3): > btrfs: Remove fs_info argument from __btrfs_inc_extent_ref > btrfs: Document __btrfs_inc_extent_ref > btrfs: Fix comment in lookup_inline_extent_backref 2 and 3 added to misc-next, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: fix invalid-free in btrfs_extent_same
On Tue, Jun 19, 2018 at 02:54:38PM +0800, Lu Fengqi wrote: > If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) != > (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM)) > is hit, we will go to free the uninitialized cmp.src_pages and > cmp.dst_pages. > > Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl") > Signed-off-by: Lu Fengqi > --- > fs/btrfs/ioctl.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index c2837a32d689..43ecbe620dea 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -3577,7 +3577,7 @@ static int btrfs_extent_same(struct inode *src, u64 > loff, u64 olen, > ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN, > dst, dst_loff, ); > if (ret) > - goto out_unlock; > + goto out_free; > > loff += BTRFS_MAX_DEDUPE_LEN; > dst_loff += BTRFS_MAX_DEDUPE_LEN; > @@ -3587,16 +3587,16 @@ static int btrfs_extent_same(struct inode *src, u64 > loff, u64 olen, > ret = btrfs_extent_same_range(src, loff, tail_len, dst, > dst_loff, ); The labels now switch order and there's one more 'goto out_free' that actually also wants to unlock the pages, after error of btrfs_extent_same_range in the for loop. So this needs to be update too. > > +out_free: > + kvfree(cmp.src_pages); > + kvfree(cmp.dst_pages); > + > out_unlock: > if (same_inode) > inode_unlock(src); > else > btrfs_double_inode_unlock(src, dst); > > -out_free: > - kvfree(cmp.src_pages); > - kvfree(cmp.dst_pages); > - > return ret; > } > > -- > 2.17.1 > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs: Remove fs_info argument from __btrfs_inc_extent_ref
On 18.06.2018 14:59, Nikolay Borisov wrote: > This function already takes a transaction which holds a reference to > the fs_info struct. Use that reference and remove the extra arg. No > functional changes. > > Signed-off-by: Nikolay Borisov > --- Please ignore this patch, since I'm going to re-send it as apart of a larger series dealing specifically with fs_info cleanup. The other 2 are good. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix physical offset reported by fiemap for inline extents
From: Filipe Manana Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero") introduced a regression where we no longer report 0 as the physical offset for inline extents. This is because it always sets the variable used to report the physical offset ("disko") as em->block_start plus some offset, and em->block_start has the value 18446744073709551614 ((u64) -2) for inline extents. This made the btrfs test 004 (from fstests) often fail, for example, for a file with an inline extent we have the following items in the subvolume tree: item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160 generation 25 transid 38 size 1525 nbytes 1525 block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0 sequence 0 flags 0x2(none) atime 1529342058.461891730 (2018-06-18 18:14:18) ctime 1529342058.461891730 (2018-06-18 18:14:18) mtime 1529342058.461891730 (2018-06-18 18:14:18) otime 1529342055.869892885 (2018-06-18 18:14:15) item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13 index 25 namelen 3 name: fc7 item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546 generation 38 type 0 (inline) inline extent data size 1525 ram_bytes 1525 compression 0 (none) Then when test 004 invoked fiemap against the file it got a non-zero physical offset: $ filefrag -v /mnt/p0/d4/d7/fc7 Filesystem type is: 9123683e File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0..4095: 18446744073709551614.. 4093: 4096: last,not_aligned,inline,eof /mnt/p0/d4/d7/fc7: 1 extent found This resulted in the test failing like this: btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad) --- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad 2018-06-18 18:15:02.385872155 +0100 @@ -1,3 +1,10 @@ QA output created by 004 *** test backref walking -*** done +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression expected +ERROR: 7.55578637259143e+22 is not a valid numeric value. +unexpected output from + /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -s 65536 -P 7.55578637259143e+22 /home/fdmanana/btrfs-tests/scratch_1 ... (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad' to see the entire diff) Ran: btrfs/004 The large number in scientific notation reported as an invalid numeric value is the result from the filter passed to perl which multiplies the physical offset by the block size reported by fiemap. So fix this by ensuring the physical offset is always set to 0 when we are processing an inline extent. Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero") Signed-off-by: Filipe Manana --- fs/btrfs/extent_io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8e4a7cdbc9f5..978327d98fc5 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, end = 1; flags |= FIEMAP_EXTENT_LAST; } else if (em->block_start == EXTENT_MAP_INLINE) { + disko = 0; flags |= (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_NOT_ALIGNED); } else if (em->block_start == EXTENT_MAP_DELALLOC) { -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix invalid-free in btrfs_extent_same
If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) != (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM)) is hit, we will go to free the uninitialized cmp.src_pages and cmp.dst_pages. Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl") Signed-off-by: Lu Fengqi --- fs/btrfs/ioctl.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index c2837a32d689..43ecbe620dea 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3577,7 +3577,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen, ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN, dst, dst_loff, ); if (ret) - goto out_unlock; + goto out_free; loff += BTRFS_MAX_DEDUPE_LEN; dst_loff += BTRFS_MAX_DEDUPE_LEN; @@ -3587,16 +3587,16 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen, ret = btrfs_extent_same_range(src, loff, tail_len, dst, dst_loff, ); +out_free: + kvfree(cmp.src_pages); + kvfree(cmp.dst_pages); + out_unlock: if (same_inode) inode_unlock(src); else btrfs_double_inode_unlock(src, dst); -out_free: - kvfree(cmp.src_pages); - kvfree(cmp.dst_pages); - return ret; } -- 2.17.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html