Re: [RFC] btrfs: Allow read-only mount with corrupted extent tree

2021-03-19 Thread Qu Wenruo




On 2021/3/19 11:34 PM, Dāvis Mosāns wrote:

On Thu, 18 March 2021 at 01:49, Qu Wenruo wrote:

On 2021/3/18 5:03 AM, Dāvis Mosāns wrote:

On Wed, 17 March 2021 at 12:28, Qu Wenruo wrote:

On 2021/3/17 9:29 AM, Dāvis Mosāns wrote:

On Wed, 17 March 2021 at 03:18, Dāvis Mosāns wrote:


Currently, if there is any corruption at all in the extent tree
(e.g. even a single bit), mounting fails with:
"failed to read block groups: -5" (-EIO)
This happens because we abort immediately on the first error when
searching the extent tree for block groups.

With this patch, if the `ignorebadroots` option is specified, we
handle this case and continue by removing the already created block
groups and creating dummy block groups.

Signed-off-by: Dāvis Mosāns 
---
fs/btrfs/block-group.c | 14 ++
fs/btrfs/disk-io.c |  4 ++--
fs/btrfs/disk-io.h |  2 ++
3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 48ebc106a606..827a977614b3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2048,6 +2048,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	ret = check_chunk_block_group_mappings(info);
 error:
 	btrfs_free_path(path);
+
+	if (ret == -EIO && btrfs_test_opt(info, IGNOREBADROOTS)) {
+		btrfs_put_block_group_cache(info);
+		btrfs_stop_all_workers(info);
+		btrfs_free_block_groups(info);
+		ret = btrfs_init_workqueues(info, NULL);
+		if (ret)
+			return ret;
+		ret = btrfs_init_space_info(info);
+		if (ret)
+			return ret;
+		return fill_dummy_bgs(info);


When we hit bad things in the extent tree, we should ensure we're
mounting the fs RO, or we can't continue.

We should also refuse to remount RW after hitting such a case, so
that we don't need anything complex: just ignore the whole extent tree
and create the dummy block groups.



That's what we're doing here: `ignorebadroots` implies an RO mount, and
without it the filesystem doesn't mount at all.



This isn't that nice, but I don't really know how to properly clean up
everything related to the already created block groups, so this was the
easiest way. It seems to work fine.
But it looks like we need to do something about log replay as well,
because if it's not disabled the mount fails with:

[ 1397.246869] BTRFS info (device sde): start tree-log replay
[ 1398.218685] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x22ab750a level 0
[ 1398.218803] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x7dd54bb9 level 0
[ 1398.218813] BTRFS: error (device sde) in __btrfs_free_extent:3054: errno=-5 IO failure
[ 1398.218828] BTRFS: error (device sde) in btrfs_run_delayed_refs:2124: errno=-5 IO failure
[ 1398.219002] BTRFS: error (device sde) in btrfs_replay_log:2254: errno=-5 IO failure (Failed to recover log tree)
[ 1398.229048] BTRFS error (device sde): open_ctree failed


This is because we shouldn't allow any write to the fs if anything is
wrong in the extent tree.



This is happening when mounting read-only. My assumption is that it
only tries to replay in memory without writing anything to disk.



We lack the check on the log tree.

Normally, for such a forced RO mount, log replay is not allowed.

We should output a warning prompting the user to use nologreplay, and
reject the mount.



I'm not familiar with log replay, but couldn't there be something
useful (ignoring ref counts) that would still be worth replaying in
memory?


Log replay means metadata writes.

Any write needs a valid extent tree to find free space for new
metadata/data.

So no, we can't do anything but completely ignore the log.

Thanks,
Qu
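
As a rough illustration of the rule discussed in this thread (a rescue
mount with a dummy extent tree must be read-only and must not replay the
log), here is a minimal userspace sketch in C. It is not the kernel
implementation: `struct mount_opts`, `check_rescue_mount()` and the choice
of -EINVAL are hypothetical, made up for this example. Only the option
names `ignorebadroots` and `nologreplay` come from the discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flags standing in for the mount options in the thread. */
struct mount_opts {
	bool read_only;        /* mounted RO */
	bool ignore_bad_roots; /* ignorebadroots */
	bool no_log_replay;    /* nologreplay */
};

/*
 * Decide whether a rescue mount may proceed.
 * has_log_tree: a log tree exists and would normally be replayed.
 * Returns 0 if the mount may continue, -EINVAL if it should be rejected.
 */
static int check_rescue_mount(const struct mount_opts *opts, bool has_log_tree)
{
	assert(opts != NULL);

	/* A normal mount has nothing extra to check here. */
	if (!opts->ignore_bad_roots)
		return 0;

	/* Dummy block groups cannot be used to allocate space for writes. */
	if (!opts->read_only) {
		fprintf(stderr, "ignorebadroots requires a read-only mount\n");
		return -EINVAL;
	}

	/* Log replay writes metadata, which needs a valid extent tree. */
	if (has_log_tree && !opts->no_log_replay) {
		fprintf(stderr, "log tree present, please use nologreplay\n");
		return -EINVAL;
	}

	return 0;
}
```

In this sketch the mount is rejected up front with a message, instead of
failing later inside log replay with -EIO as in the dmesg output quoted
in this thread.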


Re: [RFC] btrfs: Allow read-only mount with corrupted extent tree

2021-03-17 Thread Qu Wenruo




On 2021/3/18 5:03 AM, Dāvis Mosāns wrote:

On Wed, 17 March 2021 at 12:28, Qu Wenruo wrote:

On 2021/3/17 9:29 AM, Dāvis Mosāns wrote:

On Wed, 17 March 2021 at 03:18, Dāvis Mosāns wrote:


Currently, if there is any corruption at all in the extent tree
(e.g. even a single bit), mounting fails with:
"failed to read block groups: -5" (-EIO)
This happens because we abort immediately on the first error when
searching the extent tree for block groups.

With this patch, if the `ignorebadroots` option is specified, we
handle this case and continue by removing the already created block
groups and creating dummy block groups.

Signed-off-by: Dāvis Mosāns 
---
   fs/btrfs/block-group.c | 14 ++
   fs/btrfs/disk-io.c |  4 ++--
   fs/btrfs/disk-io.h |  2 ++
   3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 48ebc106a606..827a977614b3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2048,6 +2048,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	ret = check_chunk_block_group_mappings(info);
 error:
 	btrfs_free_path(path);
+
+	if (ret == -EIO && btrfs_test_opt(info, IGNOREBADROOTS)) {
+		btrfs_put_block_group_cache(info);
+		btrfs_stop_all_workers(info);
+		btrfs_free_block_groups(info);
+		ret = btrfs_init_workqueues(info, NULL);
+		if (ret)
+			return ret;
+		ret = btrfs_init_space_info(info);
+		if (ret)
+			return ret;
+		return fill_dummy_bgs(info);


When we hit bad things in the extent tree, we should ensure we're
mounting the fs RO, or we can't continue.

We should also refuse to remount RW after hitting such a case, so
that we don't need anything complex: just ignore the whole extent tree
and create the dummy block groups.



That's what we're doing here: `ignorebadroots` implies an RO mount, and
without it the filesystem doesn't mount at all.



This isn't that nice, but I don't really know how to properly clean up
everything related to the already created block groups, so this was the
easiest way. It seems to work fine.
But it looks like we need to do something about log replay as well,
because if it's not disabled the mount fails with:

[ 1397.246869] BTRFS info (device sde): start tree-log replay
[ 1398.218685] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x22ab750a level 0
[ 1398.218803] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x7dd54bb9 level 0
[ 1398.218813] BTRFS: error (device sde) in __btrfs_free_extent:3054: errno=-5 IO failure
[ 1398.218828] BTRFS: error (device sde) in btrfs_run_delayed_refs:2124: errno=-5 IO failure
[ 1398.219002] BTRFS: error (device sde) in btrfs_replay_log:2254: errno=-5 IO failure (Failed to recover log tree)
[ 1398.229048] BTRFS error (device sde): open_ctree failed


This is because we shouldn't allow any write to the fs if anything is
wrong in the extent tree.



This is happening when mounting read-only. My assumption is that it
only tries to replay in memory without writing anything to disk.



We lack the check on the log tree.

Normally, for such a forced RO mount, log replay is not allowed.

We should output a warning prompting the user to use nologreplay, and
reject the mount.

Thanks,
Qu


Re: [RFC] btrfs: Allow read-only mount with corrupted extent tree

2021-03-17 Thread Qu Wenruo




On 2021/3/17 9:29 AM, Dāvis Mosāns wrote:

On Wed, 17 March 2021 at 03:18, Dāvis Mosāns wrote:


Currently, if there is any corruption at all in the extent tree
(e.g. even a single bit), mounting fails with:
"failed to read block groups: -5" (-EIO)
This happens because we abort immediately on the first error when
searching the extent tree for block groups.

With this patch, if the `ignorebadroots` option is specified, we
handle this case and continue by removing the already created block
groups and creating dummy block groups.

Signed-off-by: Dāvis Mosāns 
---
  fs/btrfs/block-group.c | 14 ++
  fs/btrfs/disk-io.c |  4 ++--
  fs/btrfs/disk-io.h |  2 ++
  3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 48ebc106a606..827a977614b3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2048,6 +2048,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	ret = check_chunk_block_group_mappings(info);
 error:
 	btrfs_free_path(path);
+
+	if (ret == -EIO && btrfs_test_opt(info, IGNOREBADROOTS)) {
+		btrfs_put_block_group_cache(info);
+		btrfs_stop_all_workers(info);
+		btrfs_free_block_groups(info);
+		ret = btrfs_init_workqueues(info, NULL);
+		if (ret)
+			return ret;
+		ret = btrfs_init_space_info(info);
+		if (ret)
+			return ret;
+		return fill_dummy_bgs(info);


When we hit bad things in the extent tree, we should ensure we're
mounting the fs RO, or we can't continue.

We should also refuse to remount RW after hitting such a case, so
that we don't need anything complex: just ignore the whole extent tree
and create the dummy block groups.



This isn't that nice, but I don't really know how to properly clean up
everything related to the already created block groups, so this was the
easiest way. It seems to work fine.
But it looks like we need to do something about log replay as well,
because if it's not disabled the mount fails with:

[ 1397.246869] BTRFS info (device sde): start tree-log replay
[ 1398.218685] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x22ab750a level 0
[ 1398.218803] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x7dd54bb9 level 0
[ 1398.218813] BTRFS: error (device sde) in __btrfs_free_extent:3054: errno=-5 IO failure
[ 1398.218828] BTRFS: error (device sde) in btrfs_run_delayed_refs:2124: errno=-5 IO failure
[ 1398.219002] BTRFS: error (device sde) in btrfs_replay_log:2254: errno=-5 IO failure (Failed to recover log tree)
[ 1398.229048] BTRFS error (device sde): open_ctree failed


This is because we shouldn't allow any write to the fs if anything is
wrong in the extent tree.

Thanks,
Qu


Ideally it should replay everything except extent refs.

I also noticed that after unmount there is:

[11000.562504] BTRFS warning (device sde): page private not zero on page 21057098481664
[11000.562510] BTRFS warning (device sde): page private not zero on page 21057098485760

Not sure what it means.


Best regards,
Dāvis



Re: [btrfs] e86bb85b1f: stress-ng.utime.ops_per_sec -70.1% regression

2021-01-12 Thread Qu Wenruo




On 2021/1/12 11:24 PM, kernel test robot wrote:


Greeting,

FYI, we noticed a -70.1% regression of stress-ng.utime.ops_per_sec due to 
commit:


commit: e86bb85b1fec48bcb8dfb79ec9f104d1a38fda78 ("[PATCH] btrfs: make 
btrfs_dirty_inode() to always reserve metadata space")
url: 
https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-make-btrfs_dirty_inode-to-always-reserve-metadata-space/20210108-134133
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G 
memory
with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 30s
class: filesystem
cpufreq_governor: performance
ucode: 0x5003003
fs: btrfs




If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
   
filesystem/gcc-9/performance/1HDD/btrfs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/stress-ng/30s/0x5003003

commit:
   97847e0652 ("Merge branch 'for-next-next-v5.10-20201211' into 
for-next-20201211")
   e86bb85b1f ("btrfs: make btrfs_dirty_inode() to always reserve metadata 
space")

97847e06525b51ea e86bb85b1fec48bcb8dfb79ec9f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   1098218           -40.4%     654054        stress-ng.access.ops
     36607           -40.4%      21801        stress-ng.access.ops_per_sec


This is a little interesting.
Accessing an inode will update its atime, but don't we have the
lazytime mount option?


     92962 ±  2%     -44.1%      51992 ±  3%  stress-ng.chmod.ops
      3098 ±  2%     -44.1%       1733 ±  3%  stress-ng.chmod.ops_per_sec
    936128 ±  6%     -41.0%     552284        stress-ng.chown.ops
     31204 ±  6%     -41.0%      18409        stress-ng.chown.ops_per_sec
   1939514           -18.5%    1580533        stress-ng.fcntl.ops
     64650           -18.5%      52684        stress-ng.fcntl.ops_per_sec
   3705607 ±  2%     -70.1%    1109769        stress-ng.utime.ops
    123519 ±  2%     -70.1%      36992        stress-ng.utime.ops_per_sec


Another interesting part is that only stress-ng is reporting such a
regression on the commit?
No other reports on the commit with a different test env, e.g. NVME SSD?

The above operations are affected by the commit, but I'm a little
surprised that there is only one report here.

Is it just because flushing on HDD is more expensive? If the other test
suites are fine, I would prefer to accept the drop, as the commit really
streamlines the operations.

Thanks,
Qu
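
For reference, the %change column in the report can be read as the
relative difference between the two commits' values. A minimal sketch in
C, assuming the usual (new - base) / base definition; the helper names
`percent_change()` and `print_change()` are made up for this example and
are not part of the lkp tooling.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from the base commit to the patched commit, in percent. */
static double percent_change(double base, double patched)
{
	assert(base != 0.0); /* undefined for a zero baseline */
	return (patched - base) / base * 100.0;
}

/* Print one metric with its change rounded to one decimal place. */
static void print_change(const char *metric, double base, double patched)
{
	printf("%-32s %+.1f%%\n", metric, percent_change(base, patched));
}
```

Calling print_change("stress-ng.utime.ops", 3705607, 1109769) with the
values from the report gives a change of about -70.1%, matching the
headline figure.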


    381.20 ±  6%     +12.3%     428.27 ±  9%  sched_debug.cfs_rq:/.load_avg.avg
      6316 ± 57%     -79.8%       1278 ± 68%  softirqs.CPU77.BLOCK
     10488 ±101%     -89.5%       1100 ±124%  softirqs.CPU78.BLOCK
      5605 ± 92%     -82.3%     990.50 ± 32%  softirqs.CPU80.BLOCK
      6094 ±128%     -89.9%     614.50 ± 44%  softirqs.CPU92.BLOCK
      4921 ±  9%     +20.5%       5931 ±  5%  slabinfo.dmaengine-unmap-16.active_objs
      4922 ±  9%     +20.5%       5933 ±  5%  slabinfo.dmaengine-unmap-16.num_objs
      9818 ±  5%      -6.9%       9139 ±  3%  slabinfo.kmalloc-rcl-256.active_objs
     49223 ±  3%     -18.4%      40177 ±  3%  slabinfo.radix_tree_node.active_objs
    903.25 ±  3%     -18.0%     740.50 ±  3%  slabinfo.radix_tree_node.active_slabs
     50620 ±  3%     -18.0%      41505 ±  3%  slabinfo.radix_tree_node.num_objs
    903.25 ±  3%     -18.0%     740.50 ±  3%  slabinfo.radix_tree_node.num_slabs
      9927 ±  3%      +5.8%      10504        proc-vmstat.nr_active_anon
   6043459 ±  2%      -2.2%    5911900        proc-vmstat.nr_dirtied
      1125            -6.1%       1056 ±  4%  proc-vmstat.nr_dirty
     20361 ±  2%      +4.7%      21309        proc-vmstat.nr_shmem
     66221            -4.3%      63404 ±  2%  proc-vmstat.nr_slab_reclaimable
      9927 ±  3%      +5.8%      10504        proc-vmstat.nr_zone_active_anon
      1225            -5.8%       1154 ±  3%  proc-vmstat.nr_zone_write_pending
  11313111            -2.1%   11072335        proc-vmstat.pgfault
      0.00          +125.0%       0.00 ± 19%  perf-sched.sch_delay.avg.ms.preempt_schedule_common._cond_resched.kmem_cache_alloc.start_transaction.btrfs_dirty_inode
      0.01 ± 13%     -24.5%       0.01 ± 15%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sy

Re: [btrfs] ccb0edc68b: xfstests.btrfs.179.fail

2020-11-26 Thread Qu Wenruo


On 2020/11/26 4:56 PM, kernel test robot wrote:
> 
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: ccb0edc68b690d0a62e9377ab509eb2f7cb610d3 ("btrfs: stop running all 
> delayed refs during snapshot")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> 
> in testcase: xfstests
> version: xfstests-x86_64-d41dcbd-1_20201116
> with following parameters:
> 
>   disk: 6HDD
>   fs: btrfs
>   test: btrfs-group-03
>   ucode: 0x28
> 
> test-description: xfstests is a regression test suite for xfs and other
> filesystems.
> test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> 
> 
> on test machine: 8 threads Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 8G 
> memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 2020-11-25 22:50:45 export TEST_DIR=/fs/sdb1
> 2020-11-25 22:50:45 export TEST_DEV=/dev/sdb1
> 2020-11-25 22:50:45 export FSTYP=btrfs
> 2020-11-25 22:50:45 export SCRATCH_MNT=/fs/scratch
> 2020-11-25 22:50:45 mkdir /fs/scratch -p
> 2020-11-25 22:50:45 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 
> /dev/sdb5 /dev/sdb6"
> 2020-11-25 22:50:45 sed "s:^:btrfs/:" 
> //lkp/benchmarks/xfstests/tests/btrfs-group-03
> 2020-11-25 22:50:45 ./check btrfs/150 btrfs/151 btrfs/152 btrfs/153 btrfs/155 
> btrfs/156 btrfs/157 btrfs/158 btrfs/159 btrfs/160 btrfs/161 btrfs/162 
> btrfs/163 btrfs/164 btrfs/165 btrfs/166 btrfs/167 btrfs/168 btrfs/169 
> btrfs/170 btrfs/171 btrfs/172 btrfs/173 btrfs/174 btrfs/175 btrfs/176 
> btrfs/177 btrfs/178 btrfs/179 btrfs/180 btrfs/181 btrfs/182 btrfs/183 
> btrfs/184 btrfs/185 btrfs/186 btrfs/187 btrfs/188 btrfs/189 btrfs/190 
> btrfs/191 btrfs/192 btrfs/193 btrfs/194 btrfs/195 btrfs/196 btrfs/197 
> btrfs/198 btrfs/199
> FSTYP -- btrfs
> PLATFORM  -- Linux/x86_64 lkp-hsw-d01 5.10.0-rc5-00155-gccb0edc68b69 #1 
> SMP Thu Nov 26 04:34:38 CST 2020
> MKFS_OPTIONS  -- /dev/sdb2
> MOUNT_OPTIONS -- /dev/sdb2 /fs/scratch
> 
> btrfs/150  1s
> btrfs/151  2s
> btrfs/152  6s
> btrfs/153  3s
> btrfs/155  1s
> btrfs/156 [not run] FITRIM not supported on /fs/scratch
> btrfs/157  2s
> btrfs/158  2s
> btrfs/159  11s
> btrfs/160  2s
> btrfs/161  1s
> btrfs/162  3s
> btrfs/163 - output mismatch (see 
> /lkp/benchmarks/xfstests/results//btrfs/163.out.bad)
> --- tests/btrfs/163.out   2020-11-16 06:09:57.0 +
> +++ /lkp/benchmarks/xfstests/results//btrfs/163.out.bad   2020-11-25 
> 22:51:22.553853766 +
> @@ -1,8 +1,10 @@
>  QA output created by 163
> +./common/btrfs: line 405: _require_loadable_fs_module: command not found
>  -- golden --
>  000 abab abab abab abab abab abab abab abab
>  *
>  2000
> +./common/btrfs: line 412: _reload_fs_module: command not found
> ...
> (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/163.out 
> /lkp/benchmarks/xfstests/results//btrfs/163.out.bad'  to see the entire diff)
> btrfs/164 [not run] Require module btrfs to be unloadable
> btrfs/165  1s
> btrfs/166  1s
> btrfs/167  2s
> btrfs/168  1s
> btrfs/169  2s
> btrfs/170  0s
> btrfs/171  1s
> btrfs/172 [not run] This test requires a valid $LOGWRITES_DEV
> btrfs/173  1s
> btrfs/174  1s
> btrfs/175  15s
> btrfs/176  6s
> btrfs/177  8s
> btrfs/178  1s
> btrfs/179 _check_btrfs_filesystem: filesystem on /dev/sdb2 is inconsistent
> (see /lkp/benchmarks/xfstests/results//btrfs/179.full for details)

This is a known false alert.

When we have half-dropped snapshots/subvolumes, btrfs check will report
a false qgroup mismatch.
But once the kernel has fully dropped the subvolume/snapshot,
btrfs-progs will report the same accounting as the kernel.

I can work around it for now by adding a "btrfs subv sync".

The root fix is to make btrfs check do the same qgroup accounting for
half-dropped subvolumes.

Thanks,
Qu

> 
> btrfs/180  4s
> btrfs/181  3s
> btrfs/182  3s
> btrfs/183  1s
> btrfs/184  2s
> btrfs/185  1s
> btrfs/186  1s
> btrfs/187  192s
> btrfs/188  1s
> btrfs/189  2s
> btrfs/190 [not run] This test requires a valid $LOGWRITES_DEV
> btrfs/191  2s
> btrfs/192 [not run] This test requires a valid $LOGWRITES_DEV
> btrfs/193  2s
> btrfs/194  181s
> btrfs/195  489s
> btrfs/196 [not run] This test requires a valid $LOGWRITES_DEV
> btrfs/197  7s
> btrfs/198  3s
> btrfs/199  10s
> Ran: btrfs/150 btrfs/151 btrfs/152 btrfs/153 btrfs/155 btrfs/156 btrfs/157 
> btrfs/158 btrfs/159 btrfs/160 btrfs/161 btrfs/162 btrfs/163 btrfs/164 
> btrfs/165 btrfs/166 btrfs/167 btrfs/168 btrfs/169 btrfs/170 btrfs/171 
> btrfs/172 btrfs/173 btrfs/174 btrfs/175 btrfs/176 btrfs/177 btrfs/178 
> btrfs/179 btrfs/180 btrfs/181 

Re: About regression caused by commit aea6cb99703e ("regulator: resolve supply after creating regulator")

2020-11-23 Thread Qu Wenruo



On 2020/11/23 2:47 PM, Jan Kiszka wrote:
> On 22.11.20 17:35, Michał Mirosław wrote:
>> On Sun, Nov 22, 2020 at 03:43:33PM +0100, Jan Kiszka wrote:
>>> On 09.11.20 00:28, Qu Wenruo wrote:
>>>> On 2020/11/9 1:18 AM, Michał Mirosław wrote:
>>>>> On Sun, Nov 08, 2020 at 03:35:33PM +0800, Qu Wenruo wrote:
>> [...]
>>>>>> It turns out that, commit aea6cb99703e ("regulator: resolve supply after
>>>>>> creating regulator") seems to be the cause.
>> [...]
>>> We are still missing some magic fix for stable trees: On the STM32MP15x,
>>> things are broken since 5.4.73 now. And 5.9.y is not booting as well on
>>> that board. Reverting the original commit make it boot again.
>>>
>>> Linus master is fine, though, but I'm tired of bisecting. Any
>>> suggestions? Or is there something queued up already?
>>
>> You might want to look at `git log --grep=aea6cb99703e` if you can't
>> wait for a stable backport.
>>
> 
> Good. Is that flagged and tested for 5.9/5.4 (and whatever is also
> affected) already?

The offending commit was only introduced in v5.10, thus I don't believe
v5.9/v5.4 is affected unless the commit was backported.

Thanks,
Qu
> 
> Jan
> 



Re: [PATCH 4.19 29/71] btrfs: tree-checker: Verify inode item

2020-11-11 Thread Qu Wenruo



On 2020/11/11 9:38 PM, Pavel Machek wrote:
> Hi!
> 
>>>> From: Qu Wenruo 
>>>>
>>>> commit 496245cac57e26d8b738d85c7a29cf9a47610f3f upstream.
>>>>
>>>> There is a report in kernel bugzilla about mismatch file type in dir
>>>> item and inode item.
>>>>
>>>> This inspires us to check inode mode in inode item.
>>>>
>>>> This patch will check the following members:
>>>
>>>> +  /* Here we use super block generation + 1 to handle log tree */
>>>> +  if (btrfs_inode_generation(leaf, iitem) > super_gen + 1) {
>>>> +  inode_item_err(fs_info, leaf, slot,
>>>> +  "invalid inode generation: has %llu expect (0, %llu]",
>>>> + btrfs_inode_generation(leaf, iitem),
>>>> + super_gen + 1);
>>>> +  return -EUCLEAN;
>>>> +  }
>>>
>>> Printk suggests btrfs_inode_generation() may not be zero, but the
>>> condition does not actually check that. Should that be added?
>>
>> Sorry, btrfs_inode_generation() here is exactly what we're checking
>> here, so what's wrong?
> 
> Quoted message says "(0, ...]", while message below says "[0, ...]". I
> assume that means that btrfs_inode_generation() may not be zero in the
> first case, but may be zero in the second case. But the code does not
> test for zero here.

A zero inode generation is more or less in a grey zone.

For inodes which can be accessed by users, generation 0 may cause small
problems for send, but apart from that there is no obvious problem.

For btrfs internal inodes, the generation can be 0 and nothing goes wrong.

So here we don't check the inode_generation == 0 case at all, or we could
cause too many false alerts for older btrfs.

Thanks,
Qu
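
To make the interval notation in the two messages concrete: "(0, N]"
excludes zero, while "[0, N]" allows it. Below is a small userspace
sketch in C of both readings. The helper names are hypothetical and this
is not the tree-checker code; the first function mirrors the quoted
condition (upper bound only), the second is the stricter variant the
"(0, N]" wording would suggest, which the patch deliberately does not
enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the quoted check: only the upper bound is enforced, so a
 * generation of 0 passes. super_gen + 1 leaves room for the log tree.
 */
static bool generation_in_range(uint64_t gen, uint64_t super_gen)
{
	assert(super_gen < UINT64_MAX); /* avoid wrap-around in super_gen + 1 */
	return gen <= super_gen + 1;
}

/*
 * Stricter variant matching a literal "(0, N]" range: zero is rejected.
 * Shown for comparison only; it would flag older filesystems.
 */
static bool generation_in_range_strict(uint64_t gen, uint64_t super_gen)
{
	return gen != 0 && generation_in_range(gen, super_gen);
}
```

With super_gen = 10, both helpers accept 11 and reject 12; they differ
only for a generation of 0.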

> 
> Best regards,
>   Pavel
> 
>>>> +  /* Note for ROOT_TREE_DIR_ITEM, mkfs could set its transid 0 */
>>>> +  if (btrfs_inode_transid(leaf, iitem) > super_gen + 1) {
>>>> +  inode_item_err(fs_info, leaf, slot,
>>>> +  "invalid inode generation: has %llu expect [0, %llu]",
>>>> + btrfs_inode_transid(leaf, iitem), super_gen + 1);
>>>> +  return -EUCLEAN;
>>>> +  }
> 



Re: [PATCH 4.19 29/71] btrfs: tree-checker: Verify inode item

2020-11-11 Thread Qu Wenruo



On 2020/11/11 9:13 PM, Pavel Machek wrote:
> Hi!
> 
>> From: Qu Wenruo 
>>
>> commit 496245cac57e26d8b738d85c7a29cf9a47610f3f upstream.
>>
>> There is a report in kernel bugzilla about mismatch file type in dir
>> item and inode item.
>>
>> This inspires us to check inode mode in inode item.
>>
>> This patch will check the following members:
> 
>> +/* Here we use super block generation + 1 to handle log tree */
>> +if (btrfs_inode_generation(leaf, iitem) > super_gen + 1) {
>> +inode_item_err(fs_info, leaf, slot,
>> +"invalid inode generation: has %llu expect (0, %llu]",
>> +   btrfs_inode_generation(leaf, iitem),
>> +   super_gen + 1);
>> +return -EUCLEAN;
>> +}
> 
> Printk suggests btrfs_inode_generation() may not be zero, but the
> condition does not actually check that. Should that be added?

Sorry, btrfs_inode_generation() is exactly what we're checking
here, so what's wrong?

Or did you mean the next chunk, the btrfs_inode_transid() check?

That error message is wrong, and there is an upstream fix for it:
f96d6960abbc ("btrfs: tree-checker: fix the error message for transid
error")

Thanks,
Qu

> 
>> +/* Note for ROOT_TREE_DIR_ITEM, mkfs could set its transid 0 */
>> +if (btrfs_inode_transid(leaf, iitem) > super_gen + 1) {
>> +inode_item_err(fs_info, leaf, slot,
>> +"invalid inode generation: has %llu expect [0, %llu]",
>> +   btrfs_inode_transid(leaf, iitem), super_gen + 1);
>> +return -EUCLEAN;
>> +}
> 
> Best regards,
>   Pavel
> 



Re: About regression caused by commit aea6cb99703e ("regulator: resolve supply after creating regulator")

2020-11-08 Thread Qu Wenruo



On 2020/11/9 1:18 AM, Michał Mirosław wrote:
> On Sun, Nov 08, 2020 at 03:35:33PM +0800, Qu Wenruo wrote:
>> Hi Michał,
>>
>> Recently when testing v5.10-rc2, I found my RK3399 boards failed to boot
>> from NVME.
>>
>> It turns out that, commit aea6cb99703e ("regulator: resolve supply after
>> creating regulator") seems to be the cause.
>>
>> On the RK3399 board, vpcie1v8 and vpcie0v9 of the pcie controller are
>> provided by the RK808 regulator.
>> With that commit, the RK808 regulator now fails to register:
>>
>> [1.402500] rk808-regulator rk808-regulator: there is no dvs0 gpio
>> [1.403104] rk808-regulator rk808-regulator: there is no dvs1 gpio
>> [1.419856] rk808 0-001b: failed to register 12 regulator
>> [1.422801] rk808-regulator: probe of rk808-regulator failed with
>> error -22
> 
> Hi,
> 
> This looks like the problem fixed by commit cf1ad559a20d ("regulator: defer
> probe when trying to get voltage from unresolved supply") recently accepted
> to regulator tree [1]. Can you verify this?

Thanks, tested with that commit cherry picked to v5.10-rc2 and it solves
the problem.

Thanks,
Qu
> 
> [1] git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next 
>  
> Best Regards
> Michał Mirosław
> 



[PATCH] PCI: Rockchip: output proper error message for regulator error

2020-11-08 Thread Qu Wenruo
There is a regression caused by commit aea6cb99703e ("regulator: resolve
supply after creating regulator") which makes RK808 unable to register
its regulators.

As a result, vpcie1v8 and vpcie0v9 cannot be looked up, and the
rockchip pcie root controller fails to initialize.

At the same time, dmesg shows nothing about the problem, making
debugging much harder.

This patch introduces a macro, rockchip_get_regulator(), which gets a
mandatory or optional regulator in one line and prints a proper
error message when the lookup fails.

Signed-off-by: Qu Wenruo 
---
 drivers/pci/controller/pcie-rockchip-host.c | 58 ++---
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/controller/pcie-rockchip-host.c 
b/drivers/pci/controller/pcie-rockchip-host.c
index 9705059523a6..981ea882ba26 100644
--- a/drivers/pci/controller/pcie-rockchip-host.c
+++ b/drivers/pci/controller/pcie-rockchip-host.c
@@ -578,6 +578,33 @@ static int rockchip_pcie_setup_irq(struct rockchip_pcie 
*rockchip)
return 0;
 }
 
+#define rockchip_get_regulator(rockchip, name, optional) \
+({ \
+	struct device *dev = rockchip->dev; \
+	int ret = 0; \
+ \
+	if (optional) \
+		rockchip->name = devm_regulator_get_optional(dev, \
+							     #name); \
+	else \
+		rockchip->name = devm_regulator_get(dev, #name); \
+	if (IS_ERR(rockchip->name)) { \
+		ret = PTR_ERR(rockchip->name); \
+		if (ret != -ENODEV || !optional) { \
+			dev_err(dev, "failed to get %s regulator: %d\n", \
+				#name, ret); \
+		} else if (optional) { \
+			dev_info(dev, "no %s regulator found, skip\n", \
+				 #name); \
+			ret = 0; \
+		} \
+	} \
+	ret; \
+});
+
+#define OPTIONAL   true
+#define MANDATORY  false
+
 /**
  * rockchip_pcie_parse_host_dt - Parse Device Tree
  * @rockchip: PCIe port information
@@ -586,7 +613,6 @@ static int rockchip_pcie_setup_irq(struct rockchip_pcie 
*rockchip)
  */
 static int rockchip_pcie_parse_host_dt(struct rockchip_pcie *rockchip)
 {
-   struct device *dev = rockchip->dev;
int err;
 
err = rockchip_pcie_parse_dt(rockchip);
@@ -597,29 +623,21 @@ static int rockchip_pcie_parse_host_dt(struct 
rockchip_pcie *rockchip)
if (err)
return err;
 
-   rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v");
-   if (IS_ERR(rockchip->vpcie12v)) {
-   if (PTR_ERR(rockchip->vpcie12v) != -ENODEV)
-   return PTR_ERR(rockchip->vpcie12v);
-   dev_info(dev, "no vpcie12v regulator found\n");
-   }
+   err = rockchip_get_regulator(rockchip, vpcie12v, OPTIONAL);
+   if (err)
+   return err;
 
-   rockchip->vpcie3v3 = devm_regulator_get_optional(dev, "vpcie3v3");
-   if (IS_ERR(rockchip->vpcie3v3)) {
-   if (PTR_ERR(rockchip->vpcie3v3) != -ENODEV)
-   return PTR_ERR(rockchip->vpcie3v3);
-   dev_info(dev, "no vpcie3v3 regulator found\n");
-   }
+   err = rockchip_get_regulator(rockchip, vpcie3v3, OPTIONAL);
+   if (err)
+   return err;
 
-   rockchip->vpcie1v8 = devm_regulator_get(dev, "vpcie1v8");
-   if (IS_ERR(rockchip->vpcie1v8))
-   return PTR_ERR(rockchip->vpcie1v8);
+   err = rockchip_get_regulator(rockchip, vpcie1v8, MANDATORY);
+   if (err)
+   return err;
 
-   rockchip->vpcie0v9 = devm_regulator_get(dev, "vpcie0v9");
-   if (IS_ERR(rockchip->vpcie0v9))
-   return PTR_ERR(rockchip->vpcie0v9);
+   err = rockchip_get_regulator(rockchip, vpcie0v9, MANDATORY);
 
-   return 0;
+   return err;
 }
 
 static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip)
-- 
2.29.2
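
The decision the macro in this patch makes can be summarized as: a
missing optional regulator (-ENODEV) is reported and ignored, every
other failure is reported and propagated. A minimal userspace sketch of
that rule in C follows. `classify_regulator_result()` is a hypothetical
helper for illustration only; it takes the lookup result as a plain
errno value instead of calling devm_regulator_get(), and uses
printf/fprintf in place of dev_info/dev_err.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * lookup_err: 0 on success, or a negative errno from the lookup.
 * Returns the error to propagate, or 0 when probing may continue.
 */
static int classify_regulator_result(const char *name, int lookup_err,
				     bool optional)
{
	assert(name != NULL);

	if (lookup_err == 0)
		return 0;

	/* An optional supply that does not exist is not an error. */
	if (optional && lookup_err == -ENODEV) {
		printf("no %s regulator found, skip\n", name);
		return 0;
	}

	/* Anything else is reported, so the failure is visible in the log. */
	fprintf(stderr, "failed to get %s regulator: %d\n", name, lookup_err);
	return lookup_err;
}
```

For example, -ENODEV on the optional vpcie12v supply yields 0, while the
same error on the mandatory vpcie1v8 supply is printed and returned.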



Re: About regression caused by commit aea6cb99703e ("regulator: resolve supply after creating regulator")

2020-11-07 Thread Qu Wenruo
Also add Rockchip and device tree mail lists to the CC, just in case we
need to update the device tree for RK808.

On 2020/11/8 3:35 PM, Qu Wenruo wrote:
> Hi Michał,
> 
> Recently when testing v5.10-rc2, I found my RK3399 boards failed to boot
> from NVME.
> 
> It turns out that, commit aea6cb99703e ("regulator: resolve supply after
> creating regulator") seems to be the cause.
> 
> On the RK3399 board, vpcie1v8 and vpcie0v9 of the pcie controller are
> provided by the RK808 regulator.
> With that commit, the RK808 regulator now fails to register:
> 
> [1.402500] rk808-regulator rk808-regulator: there is no dvs0 gpio
> [1.403104] rk808-regulator rk808-regulator: there is no dvs1 gpio
> [1.419856] rk808 0-001b: failed to register 12 regulator
> [1.422801] rk808-regulator: probe of rk808-regulator failed with
> error -22
> 
> Since the voltages from rk808 are not properly registered, the
> rockchip PCIE controller cannot find its voltage provider:
> 
> [1.855276] rockchip_pcie_probe: parse_host_dt err=-517
> 
> 
> I tested with that commit reverted, and the RK808 works again.
> 
> Is this a known regression? Or is the RK808 device tree out of spec?
> 
> It would help a lot to fix the problem before the regression makes all
> RK3399 boards lose their ability to initialize the PCIE controller.
> 
> 
> BTW I didn't find that patch submitted to mail lists like
> linux-arm-kernel. I doubt if that commit really got enough testing from
> arm community, especially considering that currently ARM is the biggest
> user of device-tree and regulators.
> 
> Maybe it's a good idea to also submit such patches to arm related mail
> lists next time?
> 
> Thanks,
> Qu
> 



About regression caused by commit aea6cb99703e ("regulator: resolve supply after creating regulator")

2020-11-07 Thread Qu Wenruo
Hi Michał,

Recently when testing v5.10-rc2, I found my RK3399 boards failed to boot
from NVME.

It turns out that, commit aea6cb99703e ("regulator: resolve supply after
creating regulator") seems to be the cause.

On the RK3399 board, vpcie1v8 and vpcie0v9 of the pcie controller are
provided by the RK808 regulator.
With that commit, the RK808 regulator now fails to register:

[1.402500] rk808-regulator rk808-regulator: there is no dvs0 gpio
[1.403104] rk808-regulator rk808-regulator: there is no dvs1 gpio
[1.419856] rk808 0-001b: failed to register 12 regulator
[1.422801] rk808-regulator: probe of rk808-regulator failed with
error -22

Since the voltages from rk808 are not properly registered, the
rockchip PCIE controller cannot find its voltage provider:

[1.855276] rockchip_pcie_probe: parse_host_dt err=-517


I tested with that commit reverted, and the RK808 works again.

Is this a known regression? Or is the RK808 device tree out of spec?

It would help a lot to fix the problem before the regression makes all
RK3399 boards lose their ability to initialize the PCIE controller.


BTW, I didn't find that patch submitted to mailing lists like
linux-arm-kernel. I doubt that commit really got enough testing from
the ARM community, especially considering that ARM is currently the biggest
user of device tree and regulators.

Maybe it's a good idea to also submit such patches to ARM-related mailing
lists next time?

Thanks,
Qu



About the rockpi4 pcie controller failed to initialize problem in v5.10-rc2

2020-11-07 Thread Qu Wenruo
Hi guys,

I saw your awesome contributions supporting the Rock Pi 4B.

However, in a recent rc (v5.10-rc2), I found that even with `vpcie1v8` and
`vpcie0v9` added, `regulator_dev_lookup()` now just returns
-EPROBE_DEFER, preventing the rockchip PCIe controller from being initialized.

The full call chain is:

rockchip_pcie_parse_host_dt()
|- rockchip->vpcie1v8 = devm_regulator_get_optional(dev, "vpcie1v8");
   |- _regulator_get()
      |- regulator_dev_lookup()
         |- node = of_get_regulator()
         |- if (!node) {
         |- r = of_find_regulator(); /* No @r found */
         |- return ERR_PTR(-EPROBE_DEFER);

This means we can't use the PCIe controller at all.

But strangely, `vpcie12v` and `vpcie3v3` are both initialized without problems.

Any clue how this problem could happen? I guess some device tree
definition went wrong, but I'm not familiar with device tree at all.
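
To make the deferral behaviour above concrete, here is a minimal user-space sketch. It is only a toy model, not the kernel's regulator code: `struct supply` and `lookup_supply()` are hypothetical names, and the value 517 is the kernel-internal EPROBE_DEFER number. The assumption is simply that a supply which is described but whose provider has not registered yields a deferral, while a supply with a registered provider resolves.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Kernel-internal value; userspace <errno.h> does not define it. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for a supply described in the device tree. */
struct supply {
    const char *name; /* e.g. "vpcie1v8" */
    int registered;   /* non-zero once the providing regulator has registered */
};

/*
 * Toy model of a supply lookup: returns 0 when the supply is usable,
 * -ENODEV when it is not described at all, and -EPROBE_DEFER when it is
 * described but its provider has not registered (yet).
 */
static int lookup_supply(const struct supply *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) != 0)
            continue;
        return table[i].registered ? 0 : -EPROBE_DEFER;
    }
    return -ENODEV;
}
```

In this model a consumer that keeps getting -EPROBE_DEFER never finishes probing if the provider never registers, which would match the symptom described here.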

Thanks,
Qu





Re: Very slow realtek 8169 ethernet performance, but only one interface, on ThinkPad T14.

2020-11-05 Thread Qu Wenruo


On 2020/11/5 下午5:13, Heiner Kallweit wrote:
> On 05.11.2020 08:42, Qu Wenruo wrote:
>>
>>
>> On 2020/11/5 下午3:01, Heiner Kallweit wrote:
>>> On 05.11.2020 03:48, Qu Wenruo wrote:
>>>> Hi,
>>>>
>>>> Not sure if this is a regression or not, but just find out that after 
>>>> upgrading to v5.9 kernel, one of my ethernet port on my ThinkPad T14 
>>>> (ryzen version) becomes very slow.
>>>>
>>>> Only *2~3* Mbps.
>>>>
>>>> The laptop has two ethernet interfaces, one needs a passive adapter, the 
>>>> other one is a standard RJ45.
>>>>
>>>> The offending one is the one which needs the adapter (eth0).
>>>> While the RJ45 one is completely fine.
>>>>
>>>> 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0e)
>>>> 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
>>>>
>>>> The 02:00.0 one is the affected one.
>>>>
>>>> The related dmesgs are:
>>>> [   38.110293] r8169 :02:00.0: can't disable ASPM; OS doesn't have 
>>>> ASPM control
>>>> [   38.126069] libphy: r8169: probed
>>>> [   38.126250] r8169 :02:00.0 eth0: RTL8168ep/8111ep, 
>>>> 00:2b:67:b3:d9:20, XID 502, IRQ 105
>>>> [   38.126252] r8169 :02:00.0 eth0: jumbo features [frames: 9194 
>>>> bytes, tx checksumming: ko]
>>>> [   38.126294] r8169 :05:00.0: can't disable ASPM; OS doesn't have 
>>>> ASPM control
>>>> [   38.126300] r8169 :05:00.0: enabling device ( -> 0003)
>>>> [   38.139355] libphy: r8169: probed
>>>> [   38.139523] r8169 :05:00.0 eth1: RTL8168h/8111h, 00:2b:67:b3:d9:1f, 
>>>> XID 541, IRQ 107
>>>> [   38.139525] r8169 :05:00.0 eth1: jumbo features [frames: 9194 
>>>> bytes, tx checksumming: ko]
>>>> [   42.120935] Generic FE-GE Realtek PHY r8169-200:00: attached PHY driver 
>>>> [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>>>> [   42.247646] r8169 :02:00.0 eth0: Link is Down
>>>> [   42.280799] Generic FE-GE Realtek PHY r8169-500:00: attached PHY driver 
>>>> [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-500:00, irq=IGNORE)
>>>> [   42.477616] r8169 :05:00.0 eth1: Link is Down
>>>> [   76.479569] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>>>> control rx/tx
>>>> [   91.271894] r8169 :02:00.0 eth0: Link is Down
>>>> [   99.873390] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>>>> control rx/tx
>>>> [   99.878938] r8169 :02:00.0 eth0: Link is Down
>>>> [  102.579290] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>>>> control rx/tx
>>>> [  185.086002] r8169 :02:00.0 eth0: Link is Down
>>>> [  392.884584] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>>>> control rx/tx
>>>> [  392.891208] r8169 :02:00.0 eth0: Link is Down
>>>> [  395.889047] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>>>> control rx/tx
>>>> [  406.670738] r8169 :02:00.0 eth0: Link is Down
>>>>
>>>> Really nothing strange, even it negotiates to 1Gbps.
>>>>
>>>> But during iperf3, it only goes up to miserable 3Mbps.
>>>>
>>>> Is this some known bug or something special related to the passive adapter?
>>>>
>>>> Since the adapter is passive, and hasn't experience anything wrong for a 
>>>> long time, I really doubt that.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>
>>> Thanks for the report. From which kernel version did you upgrade?
>>
>> Tested back to v5.7, which still shows the miserable performance.
>>
>> So I guess it could be a faulty adapter?
>>
>>> Please test
>>> with the prior kernel version and report behavior (link stability and 
>>> speed).
>>> Under 5.9, does ethtool -S eth0 report packet errors?
>>>
>> Nope, no tx/rx_errors, no missed/aborted/underrun.
>>
>> Adding that the adapter is completely passive (no chip, just converting
>> RJ45 pins to the I shaped pins), I'm not sure that the adapter itself
>> can fail.
>>
> Each additional mechanical connection may cause reflections or other signal
> disturbance. You could try to restrict the speed to 100Mbps via ethtool,
> and see what the effective speed is then. 100Mbps uses two wire pairs only.

OK, you're right, now I can get around 60Mbps.

So there is definitely something wrong with the adapter.

I will use the RJ45 one and avoid using the ThinkPad proprietary interface.

Thanks,
Qu
> 
>> THanks,
>> Qu
>>
> 





Re: Very slow realtek 8169 ethernet performance, but only one interface, on ThinkPad T14.

2020-11-04 Thread Qu Wenruo


On 2020/11/5 下午3:01, Heiner Kallweit wrote:
> On 05.11.2020 03:48, Qu Wenruo wrote:
>> Hi,
>>
>> Not sure if this is a regression or not, but just find out that after 
>> upgrading to v5.9 kernel, one of my ethernet port on my ThinkPad T14 (ryzen 
>> version) becomes very slow.
>>
>> Only *2~3* Mbps.
>>
>> The laptop has two ethernet interfaces, one needs a passive adapter, the 
>> other one is a standard RJ45.
>>
>> The offending one is the one which needs the adapter (eth0).
>> While the RJ45 one is completely fine.
>>
>> 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0e)
>> 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
>>
>> The 02:00.0 one is the affected one.
>>
>> The related dmesgs are:
>> [   38.110293] r8169 :02:00.0: can't disable ASPM; OS doesn't have ASPM 
>> control
>> [   38.126069] libphy: r8169: probed
>> [   38.126250] r8169 :02:00.0 eth0: RTL8168ep/8111ep, 00:2b:67:b3:d9:20, 
>> XID 502, IRQ 105
>> [   38.126252] r8169 :02:00.0 eth0: jumbo features [frames: 9194 bytes, 
>> tx checksumming: ko]
>> [   38.126294] r8169 :05:00.0: can't disable ASPM; OS doesn't have ASPM 
>> control
>> [   38.126300] r8169 :05:00.0: enabling device ( -> 0003)
>> [   38.139355] libphy: r8169: probed
>> [   38.139523] r8169 :05:00.0 eth1: RTL8168h/8111h, 00:2b:67:b3:d9:1f, 
>> XID 541, IRQ 107
>> [   38.139525] r8169 :05:00.0 eth1: jumbo features [frames: 9194 bytes, 
>> tx checksumming: ko]
>> [   42.120935] Generic FE-GE Realtek PHY r8169-200:00: attached PHY driver 
>> [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>> [   42.247646] r8169 :02:00.0 eth0: Link is Down
>> [   42.280799] Generic FE-GE Realtek PHY r8169-500:00: attached PHY driver 
>> [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-500:00, irq=IGNORE)
>> [   42.477616] r8169 :05:00.0 eth1: Link is Down
>> [   76.479569] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>> control rx/tx
>> [   91.271894] r8169 :02:00.0 eth0: Link is Down
>> [   99.873390] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>> control rx/tx
>> [   99.878938] r8169 :02:00.0 eth0: Link is Down
>> [  102.579290] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>> control rx/tx
>> [  185.086002] r8169 :02:00.0 eth0: Link is Down
>> [  392.884584] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>> control rx/tx
>> [  392.891208] r8169 :02:00.0 eth0: Link is Down
>> [  395.889047] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow 
>> control rx/tx
>> [  406.670738] r8169 :02:00.0 eth0: Link is Down
>>
>> Really nothing strange, even it negotiates to 1Gbps.
>>
>> But during iperf3, it only goes up to miserable 3Mbps.
>>
>> Is this some known bug or something special related to the passive adapter?
>>
>> Since the adapter is passive, and hasn't experience anything wrong for a 
>> long time, I really doubt that.
>>
>> Thanks,
>> Qu
>>
>>
> Thanks for the report. From which kernel version did you upgrade?

Tested back to v5.7, which still shows the miserable performance.

So I guess it could be a faulty adapter?

> Please test
> with the prior kernel version and report behavior (link stability and speed).
> Under 5.9, does ethtool -S eth0 report packet errors?
> 
Nope, no tx/rx_errors, no missed/aborted/underrun.

Also, since the adapter is completely passive (no chip, just converting
RJ45 pins to the I-shaped pins), I'm not sure that the adapter itself
can fail.

Thanks,
Qu





Very slow realtek 8169 ethernet performance, but only one interface, on ThinkPad T14.

2020-11-04 Thread Qu Wenruo
Hi,

Not sure if this is a regression or not, but I just found out that after
upgrading to the v5.9 kernel, one of the ethernet ports on my ThinkPad T14
(Ryzen version) has become very slow.

Only *2~3* Mbps.

The laptop has two ethernet interfaces: one needs a passive adapter, the other
is a standard RJ45.

The offending one is the one which needs the adapter (eth0),
while the RJ45 one is completely fine.

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 
PCI Express Gigabit Ethernet Controller (rev 0e)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 
PCI Express Gigabit Ethernet Controller (rev 15)

The 02:00.0 one is the affected one.

The related dmesgs are:
[   38.110293] r8169 :02:00.0: can't disable ASPM; OS doesn't have ASPM 
control
[   38.126069] libphy: r8169: probed
[   38.126250] r8169 :02:00.0 eth0: RTL8168ep/8111ep, 00:2b:67:b3:d9:20, 
XID 502, IRQ 105
[   38.126252] r8169 :02:00.0 eth0: jumbo features [frames: 9194 bytes, tx 
checksumming: ko]
[   38.126294] r8169 :05:00.0: can't disable ASPM; OS doesn't have ASPM 
control
[   38.126300] r8169 :05:00.0: enabling device ( -> 0003)
[   38.139355] libphy: r8169: probed
[   38.139523] r8169 :05:00.0 eth1: RTL8168h/8111h, 00:2b:67:b3:d9:1f, XID 
541, IRQ 107
[   38.139525] r8169 :05:00.0 eth1: jumbo features [frames: 9194 bytes, tx 
checksumming: ko]
[   42.120935] Generic FE-GE Realtek PHY r8169-200:00: attached PHY driver 
[Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   42.247646] r8169 :02:00.0 eth0: Link is Down
[   42.280799] Generic FE-GE Realtek PHY r8169-500:00: attached PHY driver 
[Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-500:00, irq=IGNORE)
[   42.477616] r8169 :05:00.0 eth1: Link is Down
[   76.479569] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow control 
rx/tx
[   91.271894] r8169 :02:00.0 eth0: Link is Down
[   99.873390] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow control 
rx/tx
[   99.878938] r8169 :02:00.0 eth0: Link is Down
[  102.579290] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow control 
rx/tx
[  185.086002] r8169 :02:00.0 eth0: Link is Down
[  392.884584] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow control 
rx/tx
[  392.891208] r8169 :02:00.0 eth0: Link is Down
[  395.889047] r8169 :02:00.0 eth0: Link is Up - 1Gbps/Full - flow control 
rx/tx
[  406.670738] r8169 :02:00.0 eth0: Link is Down

Nothing really looks strange; it even negotiates to 1Gbps.

But during iperf3, it only reaches a miserable 3Mbps.

Is this a known bug, or something specific to the passive adapter?

Since the adapter is passive and hasn't shown any problem for a long
time, I really doubt that.

Thanks,
Qu






v5.10-rc2 kernel unable to initialize RK3399 pcie root complex

2020-11-04 Thread Qu Wenruo
Hi,

Recently I tried to run the v5.10-rc2 kernel on my RK3399 board (Rock Pi 4B,
4G RAM version). Most drivers work, but the PCIe RC of the board fails
to register, without any obvious dmesg error.


My previous v5.9 kernel runs fine on that board, and can boot
from a root on LVM on an NVMe device without any problem.

But with the v5.10-rc2 kernel, the root complex just refuses to initialize.
Although the rockchip PCIe controller seems to be detected, no PCIe bus is
added at all.

I also tried enabling CONFIG_PCI_DEBUG, but there is still no PCI-related
dmesg output except the rockchip PCIe controller trying to initialize.

Manjaro ARM's linux-rc kernel has the same problem, so it doesn't
seem to be a bug in my kernel config.

The dmesg extracted from initramfs:
https://gist.github.com/adam900710/0415c1f19c07f65f892eeb848fd8dfbe


Is there any known PCI-related bug that could cause such a problem?

Thanks,
Qu



Re: ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!

2020-11-03 Thread Qu Wenruo


On 2020/11/3 下午5:47, Geert Uytterhoeven wrote:
> On Tue, Nov 3, 2020 at 10:43 AM Naresh Kamboju
>  wrote:
>> Linux next 20201103 tag make modules failed for i386 and arm
>> architecture builds.
>>
>> Error log:
>>   LD [M]  fs/btrfs/btrfs.o
>>   MODPOST Module.symvers
>> ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!
>> scripts/Makefile.modpost:111: recipe for target 'Module.symvers' failed
>> make[2]: *** [Module.symvers] Error 1
>>
>> Full build log,
>> https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-next/DISTRO=lkft,MACHINE=intel-core2-32,label=docker-lkft/891/consoleText
>> https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-next/DISTRO=lkft,MACHINE=am57xx-evm,label=docker-lkft/891/consoleText
>>
>> --
>> Linaro LKFT
>> https://lkft.linaro.org
> 
> Yeah, I had a look earlier today, thanks to the kisskb builder, and
> the btrfs people are working on a fix.
> Interestingly, the issue was reported in September, and still entered
> linux-next, so we all had a great time to look into it ;-)

Yeah, we all know about that and how to fix it (just use a division helper
such as div_u64() for u64 / u32).
But at that time we were already working on a better solution: rather than
using a division helper, we use a sectorsize_bits shift to replace the
division. Unfortunately the bit shift fix didn't get merged until recently.

Considering that patch is only designed to be merged after the bit shift
fix patch, we were not that concerned (until other people started
complaining about the linux-next branch).
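
For anyone following along, here is a minimal user-space sketch of the idea, not the actual btrfs patch. The function names are made up for illustration; the only assumption is that the sector size is a power of two, so that sectorsize == 1 << sectorsize_bits. A plain 64-bit division like the first function is the kind of expression for which a 32-bit compiler may emit a call to __udivdi3, which the kernel does not provide.

```c
#include <stdint.h>

/*
 * Number of whole sectors in `bytes`, using a plain 64-bit division.
 * On 32-bit targets the compiler may turn this into a __udivdi3 call.
 */
static uint64_t sectors_by_division(uint64_t bytes, uint32_t sectorsize)
{
    return bytes / sectorsize;
}

/*
 * Same result using a shift, valid only when
 * sectorsize == (1u << sectorsize_bits).
 */
static uint64_t sectors_by_shift(uint64_t bytes, uint32_t sectorsize_bits)
{
    return bytes >> sectorsize_bits;
}
```

The shift needs no library helper on any architecture, which is why it avoids the undefined symbol altogether.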

Thanks,
Qu
> 
> https://lore.kernel.org/linux-btrfs/202009160107.dzzo6dfi%25...@intel.com/
> https://lore.kernel.org/linux-btrfs/20201102073114.66750-1-...@suse.com/
> 
> Gr{oetje,eeting}s,
> 
> Geert
> 





Re: [btrfs] 3b54a0a703: WARNING:at_fs/btrfs/inode.c:#btrfs_finish_ordered_io[btrfs]

2020-09-15 Thread Qu Wenruo



On 2020/9/16 上午11:32, Oliver Sang wrote:
> On Tue, Sep 15, 2020 at 04:00:40PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/9/15 下午3:40, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/9/15 下午1:54, Oliver Sang wrote:
>>>> On Wed, Sep 09, 2020 at 03:49:30PM +0800, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2020/9/9 下午3:08, kernel test robot wrote:
>>>>>> Greeting,
>>>>>>
>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>
>>>>>> commit: 3b54a0a703f17d2b1317d24beefcdcca587a7667 ("[PATCH v3 3/5] btrfs: 
>>>>>> Detect unbalanced tree with empty leaf before crashing btree operations")
>>>>>> url: 
>>>>>> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Enhanced-runtime-defence-against-fuzzed-images/20200809-201720
>>>>>> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git 
>>>>>> for-next
>>>>>>
>>>>>> in testcase: fio-basic
>>>>>> with following parameters:
>>>>>>
>>>>>>  runtime: 300s
>>>>>>  disk: 1SSD
>>>>>>  fs: btrfs
>>>>>>  nr_task: 100%
>>>>>>  test_size: 128G
>>>>>>  rw: write
>>>>>>  bs: 4k
>>>>>>  ioengine: sync
>>>>>>  cpufreq_governor: performance
>>>>>>  ucode: 0x42c
>>>>>>  fs2: nfsv4
>>>>>>
>>>>>> test-description: Fio is a tool that will spawn a number of threads or 
>>>>>> processes doing a particular type of I/O action as specified by the user.
>>>>>> test-url: https://github.com/axboe/fio
>>>>>>
>>>>>>
>>>>>> on test machine: 96 threads Intel(R) Xeon(R) Platinum 8260L CPU @ 
>>>>>> 2.40GHz with 128G memory
>>>>>>
>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire 
>>>>>> log/backtrace):
>>>>>>
>>>>>>
>>>>>> ++++
>>>>>> |
>>>>>> | 2703206ff5 | 3b54a0a703 |
>>>>>> ++++
>>>>>> | boot_successes 
>>>>>> | 9  | 0  |
>>>>>> | boot_failures  
>>>>>> | 4  ||
>>>>>> | 
>>>>>> Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#)
>>>>>>  | 4  ||
>>>>>> ++++
>>>>>>
>>>>>>
>>>>>> If you fix the issue, kindly add following tag
>>>>>> Reported-by: kernel test robot 
>>>>>>
>>>>>>
>>>>>
>>>>> According to the full dmesg, it's invalid nritems causing transaction 
>>>>> abort.
>>>>>
>>>>> I'm not sure if it's caused by corrupts fs or something else.
>>>>>
>>>>> If intel guys can reproduce it reliably, would you please add such debug
>>>>> diff to output extra info?
>>>>
>>>> Hi Qu, sorry for late. we double confirmed the issue can be reproduced 
>>>> reliably.
>>>> The error will only occur on fbc but not parent commit.
>>>>
>>>> below from applying your path for extra info
>>>> [   42.539443] [task_0]$
>>>> [   42.539445]~$
>>>> [   42.546125] rw=write$
>>>> [   42.546126]~$
>>>> [   42.551637] directory=/fs/nvme1n1p1$
>>>> [   42.551638]~$
>>>> [   42.559135] numjobs=96' | fio --output-format=json -$
>>>> [   42.559136]~$
>>>> [   42.574513] perf version 5.9.rc4.g34d4ddd359db$
>>>> [   42.574518]~$
>>>> [   56.152662] BTRFS error (device nvme1n1p1): invalid tree nritems, 
>>>> bytenr=13238272 owner=7 level=0 first_key=(18446744073709551606

Re: [btrfs] 3b54a0a703: WARNING:at_fs/btrfs/inode.c:#btrfs_finish_ordered_io[btrfs]

2020-09-15 Thread Qu Wenruo



On 2020/9/15 下午3:40, Qu Wenruo wrote:
> 
> 
> On 2020/9/15 下午1:54, Oliver Sang wrote:
>> On Wed, Sep 09, 2020 at 03:49:30PM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/9/9 下午3:08, kernel test robot wrote:
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>
>>>> commit: 3b54a0a703f17d2b1317d24beefcdcca587a7667 ("[PATCH v3 3/5] btrfs: 
>>>> Detect unbalanced tree with empty leaf before crashing btree operations")
>>>> url: 
>>>> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Enhanced-runtime-defence-against-fuzzed-images/20200809-201720
>>>> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
>>>>
>>>> in testcase: fio-basic
>>>> with following parameters:
>>>>
>>>>runtime: 300s
>>>>disk: 1SSD
>>>>fs: btrfs
>>>>nr_task: 100%
>>>>test_size: 128G
>>>>rw: write
>>>>bs: 4k
>>>>ioengine: sync
>>>>cpufreq_governor: performance
>>>>ucode: 0x42c
>>>>fs2: nfsv4
>>>>
>>>> test-description: Fio is a tool that will spawn a number of threads or 
>>>> processes doing a particular type of I/O action as specified by the user.
>>>> test-url: https://github.com/axboe/fio
>>>>
>>>>
>>>> on test machine: 96 threads Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz 
>>>> with 128G memory
>>>>
>>>> caused below changes (please refer to attached dmesg/kmsg for entire 
>>>> log/backtrace):
>>>>
>>>>
>>>> ++++
>>>> |  
>>>>   | 2703206ff5 | 3b54a0a703 |
>>>> ++++
>>>> | boot_successes   
>>>>   | 9  | 0  |
>>>> | boot_failures
>>>>   | 4  ||
>>>> | 
>>>> Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#) 
>>>> | 4  ||
>>>> ++++
>>>>
>>>>
>>>> If you fix the issue, kindly add following tag
>>>> Reported-by: kernel test robot 
>>>>
>>>>
>>>
>>> According to the full dmesg, it's invalid nritems causing transaction abort.
>>>
>>> I'm not sure if it's caused by corrupts fs or something else.
>>>
>>> If intel guys can reproduce it reliably, would you please add such debug
>>> diff to output extra info?
>>
>> Hi Qu, sorry for late. we double confirmed the issue can be reproduced 
>> reliably.
>> The error will only occur on fbc but not parent commit.
>>
>> below from applying your path for extra info
>> [   42.539443] [task_0]$
>> [   42.539445]~$
>> [   42.546125] rw=write$
>> [   42.546126]~$
>> [   42.551637] directory=/fs/nvme1n1p1$
>> [   42.551638]~$
>> [   42.559135] numjobs=96' | fio --output-format=json -$
>> [   42.559136]~$
>> [   42.574513] perf version 5.9.rc4.g34d4ddd359db$
>> [   42.574518]~$
>> [   56.152662] BTRFS error (device nvme1n1p1): invalid tree nritems, 
>> bytenr=13238272 owner=7 level=0 first_key=(18446744073709551606 128 
>> 96941895680) nritems=0
>>  expect >0$
> 
> Just as expected, this is indeed csum tree.
> And it looks like it's indeed still valid.
> 
> The csum root can still have its key from previous not emptry csum.

Wait a minute: if the csum root is empty, we shouldn't have first_key
passed in.

So something is still wrong here.

Would you please try this diff to provide more debug info?
(It's better to remove the existing diff first.)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 75bbe879ed18..6f29a3c38b56 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -400,10 +400,17 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
 
 	/* We have @first_key, so this @eb must have at least one item */
 	if (btrfs_header_nritems(eb) == 0) {
+   pr_info("%s: eb start=%llu gen=%llu last_

Re: [btrfs] 3b54a0a703: WARNING:at_fs/btrfs/inode.c:#btrfs_finish_ordered_io[btrfs]

2020-09-15 Thread Qu Wenruo



On 2020/9/15 下午1:54, Oliver Sang wrote:
> On Wed, Sep 09, 2020 at 03:49:30PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/9/9 下午3:08, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed the following commit (built with gcc-9):
>>>
>>> commit: 3b54a0a703f17d2b1317d24beefcdcca587a7667 ("[PATCH v3 3/5] btrfs: 
>>> Detect unbalanced tree with empty leaf before crashing btree operations")
>>> url: 
>>> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Enhanced-runtime-defence-against-fuzzed-images/20200809-201720
>>> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
>>>
>>> in testcase: fio-basic
>>> with following parameters:
>>>
>>> runtime: 300s
>>> disk: 1SSD
>>> fs: btrfs
>>> nr_task: 100%
>>> test_size: 128G
>>> rw: write
>>> bs: 4k
>>> ioengine: sync
>>> cpufreq_governor: performance
>>> ucode: 0x42c
>>> fs2: nfsv4
>>>
>>> test-description: Fio is a tool that will spawn a number of threads or 
>>> processes doing a particular type of I/O action as specified by the user.
>>> test-url: https://github.com/axboe/fio
>>>
>>>
>>> on test machine: 96 threads Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz 
>>> with 128G memory
>>>
>>> caused below changes (please refer to attached dmesg/kmsg for entire 
>>> log/backtrace):
>>>
>>>
>>> ++++
>>> |   
>>>  | 2703206ff5 | 3b54a0a703 |
>>> ++++
>>> | boot_successes
>>>  | 9  | 0  |
>>> | boot_failures 
>>>  | 4  ||
>>> | 
>>> Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#) 
>>> | 4  ||
>>> ++++
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot 
>>>
>>>
>>
>> According to the full dmesg, it's invalid nritems causing transaction abort.
>>
>> I'm not sure if it's caused by corrupts fs or something else.
>>
>> If intel guys can reproduce it reliably, would you please add such debug
>> diff to output extra info?
> 
> Hi Qu, sorry for late. we double confirmed the issue can be reproduced 
> reliably.
> The error will only occur on fbc but not parent commit.
> 
> below from applying your path for extra info
> [   42.539443] [task_0]$
> [   42.539445]~$
> [   42.546125] rw=write$
> [   42.546126]~$
> [   42.551637] directory=/fs/nvme1n1p1$
> [   42.551638]~$
> [   42.559135] numjobs=96' | fio --output-format=json -$
> [   42.559136]~$
> [   42.574513] perf version 5.9.rc4.g34d4ddd359db$
> [   42.574518]~$
> [   56.152662] BTRFS error (device nvme1n1p1): invalid tree nritems, 
> bytenr=13238272 owner=7 level=0 first_key=(18446744073709551606 128 
> 96941895680) nritems=0
>  expect >0$

Just as expected, this is indeed the csum tree.
And it looks like the leaf is still valid.

The csum root can still carry the key from a previous, non-empty csum tree.

In that case, the check is indeed too strict and causes a false alert.

I'll soon send out a fix with Intel's Reported-by.
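
To illustrate why the check trips here, a pared-down user-space sketch of its logic follows. It is not the btrfs code: `struct key`, `struct leaf` and `verify_first_key_strict()` are hypothetical stand-ins, and the fallback value 117 for EUCLEAN is the Linux number (matching errno=-117 in the log).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value, "Structure needs cleaning" */
#endif

/* Hypothetical, pared-down stand-ins for the btrfs structures. */
struct key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

struct leaf {
    uint64_t owner;   /* tree that owns this leaf, 7 in the log above */
    uint32_t nritems; /* number of items stored in the leaf */
};

/*
 * Strict form of the check: whenever a first key is passed in, an empty
 * leaf is rejected, even if the empty leaf itself is legitimate.
 */
static int verify_first_key_strict(const struct leaf *eb,
                                   const struct key *first_key)
{
    if (!first_key)
        return 0;        /* nothing to compare against */
    if (eb->nritems == 0)
        return -EUCLEAN; /* a key was expected, but the leaf is empty */
    return 0;
}
```

With the values from the report (owner=7, nritems=0 and a non-NULL first_key) this returns -EUCLEAN, which is the false alert in question.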

Thanks,
Qu

> [   56.152664] BTRFS error (device nvme1n1p1): invalid tree nritems, 
> bytenr=13238272 owner=7 level=0 first_key=(18446744073709551606 128 
> 96941895680) nritems=0
>  expect >0$
> [   56.152666] [ cut here ]$
> [   56.168263] BTRFS: error (device nvme1n1p1) in 
> btrfs_finish_ordered_io:2687: errno=-117 Filesystem corrupted$
> [   56.168264] BTRFS info (device nvme1n1p1): forced readonly$
> [   56.205009] BTRFS: Transaction aborted (error -117)$
> [   56.210368] WARNING: CPU: 71 PID: 537 at fs/btrfs/inode.c:2687 
> btrfs_finish_ordered_io+0x70a/0x820 [btrfs]$
> [   56.220466] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfsd 
> auth_rpcgss dm_mod dax_pmem_compat nd_pmem device_dax nd_btt dax_pmem_core 
> btrfs blak
> e2b_generic sr_mod xor cdrom sd_mod zstd_decompress sg zstd_compress raid6_pq 
> 

Re: [btrfs] 3b54a0a703: WARNING:at_fs/btrfs/inode.c:#btrfs_finish_ordered_io[btrfs]

2020-09-09 Thread Qu Wenruo



On 2020/9/9 下午3:08, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 3b54a0a703f17d2b1317d24beefcdcca587a7667 ("[PATCH v3 3/5] btrfs: 
> Detect unbalanced tree with empty leaf before crashing btree operations")
> url: 
> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Enhanced-runtime-defence-against-fuzzed-images/20200809-201720
> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
> 
> in testcase: fio-basic
> with following parameters:
> 
>   runtime: 300s
>   disk: 1SSD
>   fs: btrfs
>   nr_task: 100%
>   test_size: 128G
>   rw: write
>   bs: 4k
>   ioengine: sync
>   cpufreq_governor: performance
>   ucode: 0x42c
>   fs2: nfsv4
> 
> test-description: Fio is a tool that will spawn a number of threads or 
> processes doing a particular type of I/O action as specified by the user.
> test-url: https://github.com/axboe/fio
> 
> 
> on test machine: 96 threads Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz 
> with 128G memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> ++++
> |
> | 2703206ff5 | 3b54a0a703 |
> ++++
> | boot_successes 
> | 9  | 0  |
> | boot_failures  
> | 4  ||
> | Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#) 
> | 4  ||
> ++++
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 

According to the full dmesg, an invalid nritems is causing the transaction
abort.

I'm not sure if it's caused by a corrupted fs or something else.

If the Intel folks can reproduce it reliably, would you please apply this debug
diff to output extra info?

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b1a148058773..b050d6fcb90a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -406,8 +406,9 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
 	/* We have @first_key, so this @eb must have at least one item */
 	if (btrfs_header_nritems(eb) == 0) {
 		btrfs_err(fs_info,
-		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
-			  eb->start);
+		"invalid tree nritems, bytenr=%llu owner=%llu level=%d first_key=(%llu %u %llu) nritems=0 expect >0",
+			  eb->start, btrfs_header_owner(eb), btrfs_header_level(eb),
+			  first_key->objectid, first_key->type, first_key->offset);
 		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
 		return -EUCLEAN;
 	}

Thanks,
Qu

> [   50.226906] WARNING: CPU: 71 PID: 500 at fs/btrfs/inode.c:2687 
> btrfs_finish_ordered_io+0x70a/0x820 [btrfs]
> [   50.236913] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfsd 
> auth_rpcgss dm_mod dax_pmem_compat nd_pmem device_dax nd_btt dax_pmem_core 
> btrfs sr_mod blake2b_generic xor cdrom sd_mod zstd_decompress sg 
> zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common skx_edac 
> x86_pkg_temp_thermal ipmi_ssif intel_powerclamp coretemp kvm_intel kvm 
> irqbypass ast crct10dif_pclmul drm_vram_helper crc32_pclmul crc32c_intel 
> acpi_ipmi drm_ttm_helper ghash_clmulni_intel ttm rapl drm_kms_helper 
> intel_cstate syscopyarea sysfillrect nvme sysimgblt intel_uncore fb_sys_fops 
> nvme_core ahci libahci t10_pi drm mei_me ioatdma libata mei ipmi_si joydev 
> dca wmi ipmi_devintf ipmi_msghandler nfit libnvdimm ip_tables
> [   50.301669] CPU: 71 PID: 500 Comm: kworker/u193:5 Not tainted 
> 5.8.0-rc7-00165-g3b54a0a703f17 #1
> [   50.310904] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
> [   50.317626] RIP: 0010:btrfs_finish_ordered_io+0x70a/0x820 [btrfs]
> [   50.324255] Code: 48 0a 00 00 02 72 25 41 83 ff fb 0f 84 f2 00 00 00 41 83 
> ff e2 0f 84 e8 00 00 00 44 89 fe 48 c7 c7 70 1c 2b c1 e8 58 ae ed bf <0f> 0b 
> 44 89 f9 ba 7f 0a 00 00 48 c7 c6 50 47 2a c1 48 89 df e8 15
> [   50.344116] RSP: 0018:c90007a83d58 EFLAGS: 00010282
> [   50.349923] RAX:  RBX: 888a93ca5ea0 RCX: 
> 
> [   50.357656] RDX: 8890401e82a0 RSI: 8890401

[PATCH] module: Add more error message for failed kernel module loading

2020-09-02 Thread Qu Wenruo
When kernel module loading fails, user space only gets one of the
following error messages:

- ENOEXEC
  This is the most confusing one. Everything from a corrupted ELF header to
  the bad WRITE|EXEC flags check introduced in module_enforce_rwx_sections()
  returns this error number.

- EPERM
  This is for blacklisted modules. But nothing explains this error any
  further either.

- ENOMEM
  The only error which needs no explanation.

This means that if a user gets "Exec format error" from modprobe, there is
no meaningful way for the user to debug it, and extra time is spent
communicating to get more info.

So this patch adds extra error messages for -ENOEXEC and -EPERM
errors, allowing users to do better debugging and reporting.

Signed-off-by: Qu Wenruo 
Reviewed-by: Lucas De Marchi 
---
Changelog:
v2:
- Add extra section description for the error message of
  module_enforce_rwx_sections()
- Add Reviewed-by tags.
---
 kernel/module.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 1c5cff34d9f2..2c00059ac1c9 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2096,8 +2096,11 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
 	int i;
 
 	for (i = 0; i < hdr->e_shnum; i++) {
-		if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
+		if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
+			pr_err("%s: section %s (index %d) has invalid WRITE|EXEC flags\n",
+				mod->name, secstrings + sechdrs[i].sh_name, i);
 			return -ENOEXEC;
+		}
 	}
 
 	return 0;
@@ -3825,8 +3828,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	char *after_dashes;
 
 	err = elf_header_check(info);
-	if (err)
+	if (err) {
+		pr_err("Module has invalid ELF header\n");
 		goto free_copy;
+	}
 
 	err = setup_load_info(info, flags);
 	if (err)
@@ -3834,6 +3839,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 
 	if (blacklisted(info->name)) {
 		err = -EPERM;
+		pr_err("Module %s is blacklisted\n", info->name);
 		goto free_copy;
 	}
 
-- 
2.28.0
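
For illustration only, and not part of the patch: a minimal userspace
sketch of the WRITE|EXEC check that the new message reports on. It
assumes a simplified section header; the demo_* names and the struct
layout are hypothetical, and only the two flag values come from the ELF
specification.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* ELF section flag values as defined by the ELF specification. */
#define DEMO_SHF_WRITE     0x1u
#define DEMO_SHF_EXECINSTR 0x4u

/* Hypothetical, simplified section header: only what the check needs. */
struct demo_shdr {
	const char *name;   /* already-resolved section name */
	uint64_t sh_flags;  /* section flags */
};

/*
 * Return 0 if no section is both writable and executable. Otherwise
 * report the offending section on stderr and return -ENOEXEC.
 */
static int demo_enforce_rwx_sections(const char *modname,
				     const struct demo_shdr *sechdrs,
				     unsigned int shnum)
{
	const uint64_t shf_wx = DEMO_SHF_WRITE | DEMO_SHF_EXECINSTR;
	unsigned int i;

	for (i = 0; i < shnum; i++) {
		if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
			fprintf(stderr,
				"%s: section %s (index %u) has invalid WRITE|EXEC flags\n",
				modname, sechdrs[i].name, i);
			return -ENOEXEC;
		}
	}
	return 0;
}
```

Naming the section and its index is what lets a user tell this failure
apart from a corrupted ELF header, which returns the same error number.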



[PATCH] module: Add more error message for failed kernel module loading

2020-08-31 Thread Qu Wenruo
When kernel module loading fails, user space only gets one of the
following error messages:
- -ENOEXEC
  This is the most confusing one. Everything from a corrupted ELF header
  to the bad WRITE|EXEC flags check introduced in
  module_enforce_rwx_sections() returns this error number.

- -EPERM
  This is for blacklisted modules. But mod doesn't give any extra
  explanation for this error either.

- -ENOMEM
  The only error which needs no explanation.

This means that if a user gets "Exec format error" from modprobe, there
is no meaningful way for the user to debug it, and extra time is spent
communicating to get more info.

So this patch adds extra error messages for the -ENOEXEC and -EPERM
errors, allowing users to do better debugging and reporting.

Signed-off-by: Qu Wenruo 
---
 kernel/module.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 8fa2600bde6a..204bf29437b8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2068,8 +2068,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
int i;
 
for (i = 0; i < hdr->e_shnum; i++) {
-   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
+   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
+   pr_err(
+   "Module %s section %d has invalid WRITE|EXEC flags\n",
+   mod->name, i);
return -ENOEXEC;
+   }
}
 
return 0;
@@ -3797,8 +3801,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
char *after_dashes;
 
err = elf_header_check(info);
-   if (err)
+   if (err) {
+   pr_err("Module has invalid ELF header\n");
goto free_copy;
+   }
 
err = setup_load_info(info, flags);
if (err)
@@ -3806,6 +3812,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 
if (blacklisted(info->name)) {
err = -EPERM;
+   pr_err("Module %s is blacklisted\n", info->name);
goto free_copy;
}
 
-- 
2.28.0



[PATCH] module: Add more error message for failed kernel module loading

2020-08-29 Thread Qu Wenruo
When kernel module loading fails, user space only gets one of the
following error messages:
- -ENOEXEC
  This is the most confusing one. Everything from a corrupted ELF header
  to the bad WRITE|EXEC flags check introduced in
  module_enforce_rwx_sections() returns this error number.

- -EPERM
  This is for blacklisted modules. But mod doesn't give any extra
  explanation for this error either.

- -ENOMEM
  The only error which needs no explanation.

This means that if a user gets "Exec format error" from modprobe, there
is no meaningful way for the user to debug it, and extra time is spent
communicating to get more info.

So this patch adds extra error messages for the -ENOEXEC and -EPERM
errors, allowing users to do better debugging and reporting.

Signed-off-by: Qu Wenruo 
---
 kernel/module.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 1c5cff34d9f2..9f748c6eeb48 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
int i;
 
for (i = 0; i < hdr->e_shnum; i++) {
-   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
+   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
+   pr_err(
+   "Module %s section %d has invalid WRITE|EXEC flags\n",
+   mod->name, i);
return -ENOEXEC;
+   }
}
 
return 0;
@@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
char *after_dashes;
 
err = elf_header_check(info);
-   if (err)
+   if (err) {
+   pr_err("Module has invalid ELF header\n");
goto free_copy;
+   }
 
err = setup_load_info(info, flags);
if (err)
@@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 
if (blacklisted(info->name)) {
err = -EPERM;
+   pr_err("Module %s is blacklisted\n", info->name);
goto free_copy;
}
 
-- 
2.27.0



Re: [PATCH] btrfs: block-group: Fix free-space bitmap threshold

2020-08-20 Thread Qu Wenruo


On 2020/8/21 10:42 AM, Marcos Paulo de Souza wrote:
> From: Marcos Paulo de Souza 
> 
> [BUG]
> After commit 9afc66498a0b ("btrfs: block-group: refactor how we read one
> block group item"), cache->length is being assigned after calling
> btrfs_create_block_group_cache. This causes a problem since
> set_free_space_tree_thresholds calculates the free-space threshold to
> decide if the free-space tree should convert from extents to bitmaps.
> 
> The current code calls set_free_space_tree_thresholds with cache->length
> being 0, which then makes cache->bitmap_high_thresh being zero. This
> implies the system will always use bitmap instead of extents, which is
> not desired if the block group is not fragmented.
> 
> This behavior can be seen by a test that expects to repair systems
> with FREE_SPACE_EXTENT and FREE_SPACE_BITMAP, but the current code only
> created FREE_SPACE_BITMAP.
> 
> [FIX]
> Call set_free_space_tree_thresholds after setting cache->length.
> 
> Link: https://github.com/kdave/btrfs-progs/issues/251
> Fixes: 9afc66498a0b ("btrfs: block-group: refactor how we read one block group item")
> CC: sta...@vger.kernel.org # 5.8+
> Signed-off-by: Marcos Paulo de Souza 

Reviewed-by: Qu Wenruo 

It would be even nicer if you could add some warning or self-test on
cache->length to prevent such a problem from happening again.
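
As a rough illustration of such a check (a userspace sketch, not the
kernel code): the struct, the constants and the demo_* names below are
all hypothetical stand-ins, and the threshold formula is only a
simplified model. The point is merely that the threshold calculation
refuses to run on a zero length instead of silently producing a zero
threshold.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the block group cache. */
struct demo_block_group {
	uint64_t length;              /* size of the block group in bytes */
	uint32_t bitmap_high_thresh;  /* extent count above which bitmaps are used */
};

/* Illustrative constants, not the kernel's real values. */
#define DEMO_BITMAP_RANGE (4096ULL * 2048ULL) /* bytes covered by one bitmap */
#define DEMO_BITMAP_SIZE  281ULL              /* assumed cost of one bitmap item */
#define DEMO_ITEM_SIZE    25ULL               /* assumed cost of one extent item */

/* Return 0 on success, -1 if the length has not been set yet. */
static int demo_set_thresholds(struct demo_block_group *bg)
{
	uint64_t num_bitmaps;

	if (bg->length == 0) {
		/* The self-test: a zero length means we were called too early. */
		fprintf(stderr, "block group length is 0, refusing to set thresholds\n");
		return -1;
	}

	/* Round up to the number of bitmaps needed to cover the block group. */
	num_bitmaps = (bg->length + DEMO_BITMAP_RANGE - 1) / DEMO_BITMAP_RANGE;
	bg->bitmap_high_thresh =
		(uint32_t)(num_bitmaps * DEMO_BITMAP_SIZE / DEMO_ITEM_SIZE);
	return 0;
}
```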

Thanks,
Qu
> ---
>  fs/btrfs/block-group.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 44fdfa2eeb2e..01e8ba1da1d3 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1798,7 +1798,6 @@ static struct btrfs_block_group *btrfs_create_block_group_cache(
>  
>   cache->fs_info = fs_info;
>   cache->full_stripe_len = btrfs_full_stripe_len(fs_info, start);
> - set_free_space_tree_thresholds(cache);
>  
>   cache->discard_index = BTRFS_DISCARD_INDEX_UNUSED;
>  
> @@ -1908,6 +1907,8 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>  
>   read_block_group_item(cache, path, key);
>  
> + set_free_space_tree_thresholds(cache);
> +
>   if (need_clear) {
>   /*
>* When we mount with old space cache, we need to
> @@ -2128,6 +2129,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
>   return -ENOMEM;
>  
>   cache->length = size;
> + set_free_space_tree_thresholds(cache);
>   cache->used = bytes_used;
>   cache->flags = type;
>   cache->last_byte_to_unpin = (u64)-1;
> 





Re: [PATCH v2] kobject: Restore old behaviour of kobject_del(NULL)

2020-08-03 Thread Qu Wenruo



On 2020/8/3 4:27 PM, Andy Shevchenko wrote:
> The commit 079ad2fb4bf9 ("kobject: Avoid premature parent object freeing in
> kobject_cleanup()") inadvertently dropped a possibility to call kobject_del()
> with NULL pointer. Restore the old behaviour.
> 
> Fixes: 079ad2fb4bf9 ("kobject: Avoid premature parent object freeing in kobject_cleanup()")
> Reported-by: Qu Wenruo 

Sorry, I should use my suse mailbox for that.

> Cc: Heikki Krogerus 
> Signed-off-by: Andy Shevchenko 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
> v2: replaced ?: with plain conditional (Greg)
>  lib/kobject.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/kobject.c b/lib/kobject.c
> index 3afb939f2a1c..9dce68c378e6 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -637,8 +637,12 @@ static void __kobject_del(struct kobject *kobj)
>   */
>  void kobject_del(struct kobject *kobj)
>  {
> - struct kobject *parent = kobj->parent;
> + struct kobject *parent;
> +
> + if (!kobj)
> + return;
>  
> + parent = kobj->parent;
>   __kobject_del(kobj);
>   kobject_put(parent);
>  }
> 
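
The pattern in the patch above can be sketched in plain userspace C. This
is only a model under stated assumptions: struct demo_kobject and the
demo_* helpers are hypothetical, and the reference counting is reduced to
a bare integer. What it shows is the ordering, with the NULL check coming
before the first dereference.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct kobject. */
struct demo_kobject {
	struct demo_kobject *parent;
	int refcount;
	int in_tree;
};

/* NULL-aware put, mirroring how kobject_put() is described in the thread. */
static void demo_put(struct demo_kobject *obj)
{
	if (obj)
		obj->refcount--;
}

/* Unlink obj from its parent; calling this with NULL is a no-op. */
static void demo_del(struct demo_kobject *obj)
{
	struct demo_kobject *parent;

	if (!obj)
		return;

	parent = obj->parent;  /* only read after the NULL check */
	obj->in_tree = 0;
	obj->parent = NULL;
	demo_put(parent);      /* drop the parent reference last */
}
```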



Re: [PATCH] kobject: Avoid premature parent object freeing in kobject_cleanup()

2020-08-03 Thread Qu Wenruo


On 2020/8/3 3:27 PM, Andy Shevchenko wrote:
> On Mon, Aug 3, 2020 at 10:25 AM Andy Shevchenko
>  wrote:
>> On Mon, Aug 3, 2020 at 9:47 AM Qu Wenruo  wrote:
>>> On 2020/6/5 1:46 AM, Rafael J. Wysocki wrote:
> 
>>>> +void kobject_del(struct kobject *kobj)
>>>> +{
>>>> + struct kobject *parent = kobj->parent;
>>>> +
>>>> + __kobject_del(kobj);
>>>> + kobject_put(parent);
>>>
>>> Could you please add an extra check on kobj before accessing kobj->parent?
>>
>> I do not understand. Where do we access it?
>> kobject_put() is NULL-aware.
> 
> Ah, I see, now.
> 
> Should be something like
> struct kobject *parent = kobj ? kobj->parent : NULL;

Exactly.

Thanks,
Qu

> 
>>> This patch in fact removes the ability to call kobject_del() on a NULL
>>> pointer without causing anything wrong.
>>>
>>> I know this is not a big deal, but this behavior change has already
>>> caused some problems for the incoming btrfs code.
>>> (Now I feel guilty for just looking into the old
>>> kobject_del()/kobject_put() and utilizing that feature in btrfs)
>>>
>>> Since the old kobject_del() accepts a NULL pointer intentionally, it would
>>> be much better to keep that behavior.
> 
> 





Re: [PATCH] kobject: Avoid premature parent object freeing in kobject_cleanup()

2020-08-03 Thread Qu Wenruo


On 2020/6/5 1:46 AM, Rafael J. Wysocki wrote:
> From: Heikki Krogerus 
> 
> If kobject_del() is invoked by kobject_cleanup() to delete the
> target kobject, it may cause its parent kobject to be freed
> before invoking the target kobject's ->release() method, which
> effectively means freeing the parent before dealing with the
> child entirely.
> 
> That is confusing at best and it may also lead to functional
> issues if the callers of kobject_cleanup() are not careful enough
> about the order in which these calls are made, so avoid the
> problem by making kobject_cleanup() drop the last reference to
> the target kobject's parent at the end, after invoking the target
> kobject's ->release() method.
> 
> [ rjw: Rewrite the subject and changelog, make kobject_cleanup()
>   drop the parent reference only when __kobject_del() has been
>   called. ]
> 
> Reported-by: Naresh Kamboju 
> Reported-by: kernel test robot 
> Fixes: 7589238a8cf3 ("Revert "software node: Simplify software_node_release() function"")
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Heikki Krogerus 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> Hi Greg,
> 
> This is a replacement for commit 4ef12f719802 ("kobject: Make sure the parent
> does not get released before its children"), that you reverted, because it
> broke things and the reason why was that it was incorrect.
> 
> Namely, it called kobject_put() on the target kobject's parent in
> kobject_cleanup() unconditionally, but it should only call it after
> invoking __kobject_del() on the target kobject.
> 
> That problem is fixed in this patch and a functionally equivalent patch has
> been tested by Guenter without issues.
> 
> The underlying issue addressed by the reverted commit is still there and
> it may show up again even though the test that triggered it originally was
> fixed in the meantime.  IMO it is worth fixing even though it may not be
> readily visible in the current kernel, so please consider this one for
> applying.
> 
> Cheers!
> 
> ---
>  lib/kobject.c |   33 +++--
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> Index: linux-pm/lib/kobject.c
> ===
> --- linux-pm.orig/lib/kobject.c
> +++ linux-pm/lib/kobject.c
> @@ -599,14 +599,7 @@ out:
>  }
>  EXPORT_SYMBOL_GPL(kobject_move);
>  
> -/**
> - * kobject_del() - Unlink kobject from hierarchy.
> - * @kobj: object.
> - *
> - * This is the function that should be called to delete an object
> - * successfully added via kobject_add().
> - */
> -void kobject_del(struct kobject *kobj)
> +static void __kobject_del(struct kobject *kobj)
>  {
>   struct kernfs_node *sd;
>   const struct kobj_type *ktype;
> @@ -625,9 +618,23 @@ void kobject_del(struct kobject *kobj)
>  
>   kobj->state_in_sysfs = 0;
>   kobj_kset_leave(kobj);
> - kobject_put(kobj->parent);
>   kobj->parent = NULL;
>  }
> +
> +/**
> + * kobject_del() - Unlink kobject from hierarchy.
> + * @kobj: object.
> + *
> + * This is the function that should be called to delete an object
> + * successfully added via kobject_add().
> + */
> +void kobject_del(struct kobject *kobj)
> +{
> + struct kobject *parent = kobj->parent;
> +
> + __kobject_del(kobj);
> + kobject_put(parent);

Could you please add an extra check on kobj before accessing kobj->parent?

This patch in fact removes the ability to call kobject_del() on a NULL
pointer without causing anything wrong.

I know this is not a big deal, but this behavior change has already
caused some problems for the incoming btrfs code.
(Now I feel guilty for just looking into the old
kobject_del()/kobject_put() and utilizing that feature in btrfs)

Since the old kobject_del() accepts a NULL pointer intentionally, it would
be much better to keep that behavior.

Or at least mention we require a valid kobject pointer.

Thanks,
Qu

> +}
>  EXPORT_SYMBOL(kobject_del);
>  
>  /**
> @@ -663,6 +670,7 @@ EXPORT_SYMBOL(kobject_get_unless_zero);
>   */
>  static void kobject_cleanup(struct kobject *kobj)
>  {
> + struct kobject *parent = kobj->parent;
>   struct kobj_type *t = get_ktype(kobj);
>   const char *name = kobj->name;
>  
> @@ -684,7 +692,10 @@ static void kobject_cleanup(struct kobje
>   if (kobj->state_in_sysfs) {
>   pr_debug("kobject: '%s' (%p): auto cleanup kobject_del\n",
>kobject_name(kobj), kobj);
> - kobject_del(kobj);
> + __kobject_del(kobj);
> + } else {
> + /* avoid dropping the parent reference unnecessarily */
> + parent = NULL;
>   }
>  
>   if (t && t->release) {
> @@ -698,6 +709,8 @@ static void kobject_cleanup(struct kobje
>   pr_debug("kobject: '%s': free name\n", name);
>   kfree_const(name);
>   }
> +
> + kobject_put(parent);
>  }
>  
>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
> 
> 
> 
> 




Re: linux-next: manual merge of the btrfs tree with the btrfs-fixes tree

2020-04-30 Thread Qu Wenruo


On 2020/5/1 9:05 AM, Stephen Rothwell wrote:
> Hi all,
> 
> On Fri, 1 May 2020 10:24:53 +1000 Stephen Rothwell wrote:
>>
>> Today's linux-next merge of the btrfs tree got a conflict in:
>>
>>   fs/btrfs/transaction.c
>>
>> between commit:
>>
>>   fcc99734d1d4 ("btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info")
>>
>> from the btrfs-fixes tree and commit:
>>
>>   f12ca53a6fd6 ("btrfs: force chunk allocation if our global rsv is larger than metadata")
>>
>> from the btrfs tree.
>>
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>>
>> -- 
>> Cheers,
>> Stephen Rothwell
>>
>> diff --cc fs/btrfs/transaction.c
>> index 2d5498136e5e,e4dbd8e3c641..
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@@ -666,15 -674,17 +672,26 @@@ got_it
>>  current->journal_info = h;
>>   
>>  /*
>>  +* btrfs_record_root_in_trans() needs to alloc new extents, and may
>>  +* call btrfs_join_transaction() while we're also starting a
>>  +* transaction.
>>  +*
>>  +* Thus it need to be called after current->journal_info initialized,
>>  +* or we can deadlock.
>>  +*/
>>  +   btrfs_record_root_in_trans(h, root);
>>  +
>> + * If the space_info is marked ALLOC_FORCE then we'll get upgraded to
>> + * ALLOC_FORCE the first run through, and then we won't allocate for
>> + * anybody else who races in later.  We don't care about the return
>> + * value here.
>> + */
>> +if (do_chunk_alloc && num_bytes) {
>> +u64 flags = h->block_rsv->space_info->flags;
>> +btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags),
>> +  CHUNK_ALLOC_NO_FORCE);
>> +}
>> + 
>>  return h;

The proper fix has landed in David's misc-next branch, which puts the
btrfs_record_root_in_trans() call after the if () {} code block.

That way, btrfs_record_root_in_trans() has less chance of hitting ENOSPC.

Thanks,
Qu

>>   
>>   join_fail:
> 
> 
> I fixed the missing comment start in my resolution ...
> 





Re: [PATCH] btrfs: fix gcc-4.8 build warning

2020-04-29 Thread Qu Wenruo


On 2020/4/29 9:27 PM, Arnd Bergmann wrote:
> Some older compilers like gcc-4.8 warn about mismatched curly
> braces in an initializer:
> 
> fs/btrfs/backref.c: In function 'is_shared_data_backref':
> fs/btrfs/backref.c:394:9: error: missing braces around
> initializer [-Werror=missing-braces]
>   struct prelim_ref target = {0};
>  ^
> fs/btrfs/backref.c:394:9: error: (near initialization for
> 'target.rbnode') [-Werror=missing-braces]
> 
> Use the GNU empty initializer extension to avoid this.
> 
> Fixes: ed58f2e66e84 ("btrfs: backref, don't add refs from shared block when resolving normal backref")

OK, at least this fix mentions that it's older gcc causing the problem,
and the fix using the GNU extension is also clear.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> Signed-off-by: Arnd Bergmann 
> ---
>  fs/btrfs/backref.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 60a69f7c0b36..ac3c34f47b56 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -392,7 +392,7 @@ static int is_shared_data_backref(struct preftrees *preftrees, u64 bytenr)
>   struct rb_node **p = &preftrees->direct.root.rb_root.rb_node;
>   struct rb_node *parent = NULL;
>   struct prelim_ref *ref = NULL;
> - struct prelim_ref target = {0};
> + struct prelim_ref target = {};
>   int result;
>  
>   target.parent = bytenr;
> 
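
A small standalone sketch of the initializer in question, for readers who
have not met it. The struct names are hypothetical; the only thing taken
from the patch is the shape, a struct whose first member is itself an
aggregate, initialized with empty braces. Empty braces are a GNU
extension in older C dialects, so this assumes gcc or clang in their
default modes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nested aggregate, standing in for the rbnode member. */
struct demo_node {
	void *left;
	void *right;
};

/* Hypothetical struct whose first member is an aggregate. */
struct demo_ref {
	struct demo_node rbnode;
	uint64_t parent;
	int count;
};

static struct demo_ref demo_make_target(uint64_t bytenr)
{
	/*
	 * Empty braces zero-initialize every member, including the nested
	 * struct, without naming a value for the first (aggregate) member,
	 * which is what the missing-braces warning complains about for {0}.
	 */
	struct demo_ref target = {};

	target.parent = bytenr;
	return target;
}
```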





[PATCH v3] tools/lib/traceevent, perf tools: Handle %pU format correctly

2019-10-21 Thread Qu Wenruo
[BUG]
For btrfs related events, there is a field for fsid, but perf never
parses it correctly.

 # perf trace -e btrfs:qgroup_meta_convert xfs_io -f -c "pwrite 0 4k" \
   /mnt/btrfs/file1
 0.000 xfs_io/77915 btrfs:qgroup_meta_reserve:(nil)U: refroot=5(FS_TREE) type=0x0 diff=2
  ^^ Not a correct UUID
 ...

[CAUSE]
The pretty_print() function doesn't handle the %pU format correctly.
In fact it doesn't handle %pU as uuid at all.

[FIX]
Add a new function, print_uuid_arg(), to handle %pU correctly.

Now perf trace can at least print fsid correctly:
 0.000 xfs_io/79619 btrfs:qgroup_meta_reserve:23ad1511-dd83-47d4-a79c-e96625a15a6e refroot=5(FS_TREE) type=0x0 diff=2

Signed-off-by: Qu Wenruo 
---
Changelog:
v2:
- Add more comments explaining the fine-tunings we skipped for %pU*
- Extra check for the field before reading the data
- Use a more elegant way to output the uuid string
v3:
- Use an even more elegant way to output the uuid string
---
 tools/lib/traceevent/event-parse.c | 51 ++
 1 file changed, 51 insertions(+)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index d948475585ce..a71f4a86b6ca 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -4508,6 +4509,40 @@ get_bprint_format(void *data, int size __maybe_unused,
return format;
 }
 
+static void print_uuid_arg(struct trace_seq *s, void *data, int size,
+  struct tep_event *event, struct tep_print_arg *arg)
+{
+   unsigned char *buf;
+   int i;
+
+   if (arg->type != TEP_PRINT_FIELD) {
+   trace_seq_printf(s, "ARG TYPE NOT FIELD but %d", arg->type);
+   return;
+   }
+
+   if (!arg->field.field) {
+   arg->field.field = tep_find_any_field(event, arg->field.name);
+   if (!arg->field.field) {
+   do_warning("%s: field %s not found",
+  __func__, arg->field.name);
+   return;
+   }
+   }
+   if (arg->field.field->size < 16) {
+   trace_seq_printf(s, "INVALID UUID: size have %u expect 16",
+   arg->field.field->size);
+   return;
+   }
+   buf = data + arg->field.field->offset;
+
+   for (i = 0; i < 8; i++) {
+   trace_seq_printf(s, "%02x", buf[2 * i]);
+   trace_seq_printf(s, "%02x", buf[2 * i + 1]);
+   if (1 <= i && i <= 4)
+   trace_seq_putc(s, '-');
+   }
+}
+
 static void print_mac_arg(struct trace_seq *s, int mac, void *data, int size,
  struct tep_event *event, struct tep_print_arg *arg)
 {
@@ -5074,6 +5109,22 @@ static void pretty_print(struct trace_seq *s, void *data, int size, struct tep_e
arg = arg->next;
break;
}
+   } else if (*ptr == 'U') {
+   /*
+* %pU has several finetunings variants
+* like %pUb and %pUL.
+* Here we ignore them, default to
+* byte-order no endian, lower case
+* letters.
+*/
+   if (isalpha(ptr[1]))
+   ptr += 2;
+   else
+   ptr++;
+
+   print_uuid_arg(s, data, size, event, arg);
+   arg = arg->next;
+   break;
}
 
/* fall through */
-- 
2.23.0
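
For illustration, the formatting loop from print_uuid_arg() can be tried
outside of libtraceevent. The sketch below is a userspace adaptation, not
the patch itself: demo_format_uuid() is a hypothetical name, and it
writes into a caller-supplied buffer with snprintf() instead of a
trace_seq. The loop body and the dash placement follow the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format 16 raw bytes as a lower-case 8-4-4-4-12 UUID string.
 * out must have room for 36 characters plus the terminating NUL.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int demo_format_uuid(const unsigned char *buf, char *out, size_t outlen)
{
	size_t pos = 0;
	int i;

	if (outlen < 37)
		return -1;

	for (i = 0; i < 8; i++) {
		/* Two bytes (four hex digits) per iteration. */
		pos += (size_t)snprintf(out + pos, outlen - pos, "%02x%02x",
					buf[2 * i], buf[2 * i + 1]);
		/* Dashes go after byte pairs 1..4, giving 8-4-4-4-12. */
		if (1 <= i && i <= 4)
			out[pos++] = '-';
	}
	out[pos] = '\0';
	return 0;
}
```

Feeding it the 16 bytes of the fsid from the trace output above should
reproduce the same string.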



[PATCH] tools/lib/traceevent, perf tools: Handle %pU format correctly

2019-10-16 Thread Qu Wenruo
[BUG]
For btrfs related events, there is a field for fsid, but perf never
parses it correctly.

 # perf trace -e btrfs:qgroup_meta_convert xfs_io -f -c "pwrite 0 4k" \
   /mnt/btrfs/file1
 0.000 xfs_io/77915 btrfs:qgroup_meta_reserve:(nil)U: refroot=5(FS_TREE) type=0x0 diff=2
  ^^ Not a correct UUID
 ...

[CAUSE]
The pretty_print() function doesn't handle the %pU format correctly.
In fact it doesn't handle %pU as uuid at all.

[FIX]
Add a new function, print_uuid_arg(), to handle %pU correctly.

Now perf trace can at least print fsid correctly:
 0.000 xfs_io/79619 btrfs:qgroup_meta_reserve:23ad1511-dd83-47d4-a79c-e96625a15a6e refroot=5(FS_TREE) type=0x0 diff=2

Signed-off-by: Qu Wenruo 
---
Please note that in the above case, @type and @diff are not shown properly.
That's another problem, which will be addressed in later patches.
---
 tools/lib/traceevent/event-parse.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index d948475585ce..4f730ed527b0 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -4508,6 +4509,33 @@ get_bprint_format(void *data, int size __maybe_unused,
return format;
 }
 
+static void print_uuid_arg(struct trace_seq *s, void *data, int size,
+  struct tep_event *event, struct tep_print_arg *arg)
+{
+   const char *fmt;
+   unsigned char *buf;
+
+   fmt = "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x";
+   if (arg->type != TEP_PRINT_FIELD) {
+   trace_seq_printf(s, "ARG TYPE NOT FIELD but %d", arg->type);
+   return;
+   }
+
+   if (!arg->field.field) {
+   arg->field.field = tep_find_any_field(event, arg->field.name);
+   if (!arg->field.field) {
+   do_warning("%s: field %s not found",
+  __func__, arg->field.name);
+   return;
+   }
+   }
+   buf = data + arg->field.field->offset;
+
+   trace_seq_printf(s, fmt, buf[0], buf[1], buf[2], buf[3], buf[4], buf[5],
+buf[6], buf[7], buf[8], buf[9], buf[10], buf[11], buf[12],
+buf[13], buf[14], buf[15]);
+}
+
 static void print_mac_arg(struct trace_seq *s, int mac, void *data, int size,
  struct tep_event *event, struct tep_print_arg *arg)
 {
@@ -5074,6 +5102,16 @@ static void pretty_print(struct trace_seq *s, void *data, int size, struct tep_e
arg = arg->next;
break;
}
+   } else if (*ptr == 'U') {
+   /* Those finetunings are ignored for now */
+   if (isalpha(ptr[1]))
+   ptr += 2;
+   else
+   ptr++;
+
+   print_uuid_arg(s, data, size, event, arg);
+   arg = arg->next;
+   break;
}
 
/* fall through */
-- 
2.23.0



Re: [PATCH] erofs: move erofs out of staging

2019-08-19 Thread Qu Wenruo
[...]
>>> I have made a simple fuzzer to inject messy in inode metadata,
>>> dir data, compressed indexes and super block,
>>> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/commit/?h=experimental-fuzzer
>>>
>>> I am testing with some given dirs and the following script.
>>> Does it look reasonable?
>>>
>>> # !/bin/bash
>>>
>>> mkdir -p mntdir
>>>
>>> for ((i=0; i<1000; ++i)); do
>>> mkfs/mkfs.erofs -F$i testdir_fsl.fuzz.img testdir_fsl > /dev/null 2>&1
>>
>> mkfs fuzzes the image? Er
> 
> Thanks for your reply.
> 
> First, This is just the first step of erofs fuzzer I wrote yesterday night...
> 
>>
>> Over in XFS land we have an xfs debugging tool (xfs_db) that knows how
>> to dump (and write!) most every field of every metadata type.  This
>> makes it fairly easy to write systematic level 0 fuzzing tests that
>> check how well the filesystem reacts to garbage data (zeroing,
>> randomizing, oneing, adding and subtracting small integers) in a field.
>> (It also knows how to trash entire blocks.)

A similar tool exists for btrfs. Although it lacks the write ability, its
dump is more comprehensive and it is a great tool for learning the on-disk
format.


As for defending against fuzzing, just a few kernel releases ago there
was nothing for btrfs, and now we have a full static verification
layer that covers (almost) all on-disk data at read and write time.
(Along with enhanced runtime checks)

We cover everything from vague values inside tree blocks to
invalid/missing cross-refs found at runtime.

Currently the two-layer check works pretty well (well, sometimes too
well, since it also catches older, improperly behaved kernels).
- Tree blocks with vague data just get rejected by the verification layer
  All members must fit the on-disk format, from alignment to
  generation to inode mode.

  The error will trigger a good enough (TM) error message for developers
  to read, and if we have other copies, we retry them just as when
  we hit a bad copy.

- At runtime, we have much less to check
  Only cross-ref related things can be wrong now, since everything
  inside a single tree block has already been checked.

In fact, from my point of view, such a read-time check should be there
from the very beginning.
It acts as a kind of on-disk format spec. (In fact, implementing the
verification layer itself already exposes a lot of btrfs design
trade-offs)

Even for a fs as complex (buggy) as btrfs, it only takes us 1K lines to
implement the verification layer.
So I'd like to see every newly mainlined fs have such an ability.
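
To give a feel for what such a read-time check looks like, here is a toy
sketch. It is not the btrfs verification layer: the item layout, the
demo_* names and the two rules (keys strictly increasing, item data
inside the block) are hypothetical, picked only to show the idea of
validating a block purely from its own contents.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk item descriptor inside one metadata block. */
struct demo_item {
	uint64_t key;     /* sort key */
	uint32_t offset;  /* data offset inside the block */
	uint32_t size;    /* data size in bytes */
};

/*
 * Validate one block using only its own contents.
 * Returns 0 if it looks sane, or -(index + 1) of the first bad item.
 */
static int demo_check_leaf(const struct demo_item *items, size_t nr,
			   uint32_t block_size)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		/* Keys must be strictly increasing. */
		if (i > 0 && items[i].key <= items[i - 1].key)
			return -(int)(i + 1);

		/* Item data must stay inside the block, without wrapping. */
		if (items[i].offset > block_size ||
		    items[i].size > block_size - items[i].offset)
			return -(int)(i + 1);
	}
	return 0;
}
```

Anything that needs a second block to decide, such as a cross-ref, is
left to the runtime checks.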

> 
> Actually, compared with XFS, EROFS has rather simple on-disk format.
> What we inject one time is quite deterministic.
> 
> The first step just purposely writes some random fuzzed data to
> the base inode metadata, compressed indexes, or dir data field
> (one round one field) to make it validity and coverability.
> 
>>
>> You might want to write such a debugging tool for erofs so that you can
>> take apart crashed images to get a better idea of what went wrong, and
>> to write easy fuzzing tests.
> 
> Yes, we will write such a debugging tool of course. Actually Li Guifu is now
> developing an erofs-fuse to support old linux versions or other OSes for
> archiving-only use, and we will build on that code to develop a better
> fuzzer tool as well.

Personally speaking, a debugging tool is way more important than a running
kernel module/fuse.
It's humans who write the code, and most of the time is spent educating
code readers, so a debugging tool is way more important than dead cold code.

Thanks,
Qu
> 
> Thanks,
> Gao Xiang
> 
>>
>> --D
>>
>>> umount mntdir
>>> mount -t erofs -o loop testdir_fsl.fuzz.img mntdir
>>> for j in `find mntdir -type f`; do
>>> md5sum $j > /dev/null
>>> done
>>> done
>>>
>>> Thanks,
>>> Gao Xiang
>>>

 Thanks,
 Gao Xiang






Re: [PATCH] btrfs: fix out of bounds array access while reading extent buffer

2019-06-14 Thread Qu Wenruo


On 2019/6/14 7:51 PM, Young Xiao wrote:
> There is a corner case that slips through the checkers in functions
> reading extent buffer, ie.
> 
> if (start < eb->len) and (start + len > eb->len), then:
> the checkers in read_extent_buffer_to_user(), and memcmp_extent_buffer()
> WARN_ON(start > eb->len) and WARN_ON(start + len > eb->start + eb->len),
> both are OK in this corner case, but it'd actually try to access the eb->pages
> out of bounds because of (start + len > eb->len).
> 
> This is adding proper checks in order to avoid invalid memory access,
> ie. 'general protection fault', before it's too late.
> 
> See commit f716abd55d1e ("Btrfs: fix out of bounds array access while
> reading extent buffer") for details.
> 
> Signed-off-by: Young Xiao <92siuy...@gmail.com>
> ---
>  fs/btrfs/extent_io.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index db337e5..dcf3b2e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5476,8 +5476,12 @@ int read_extent_buffer_to_user(const struct extent_buffer *eb,
>   unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>   int ret = 0;
>  
> - WARN_ON(start > eb->len);
> - WARN_ON(start + len > eb->start + eb->len);
> + if (start + len > eb->len) {
The original (start + len > eb->start + eb->len) check has been wrong from
the very beginning; eb->start makes no sense in this context.
So your patch makes sense.

But it's not 100% fixed.

If @start + @len overflows u64, e.g. @start = (1 << 63) + 8K and
@len = (1 << 63) + 8K, it can still slip past the check.

So we still need to check @start against eb->len first, then @start + @len
against eb->len.

Also, shouldn't we include the equal case for @start? (although start +
len == eb->len should be OK)
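
One way to write the two-step check so that it cannot wrap is sketched
below. This is a standalone illustration rather than the kernel helper:
demo_range_ok() is a hypothetical name and everything is a plain
uint64_t. It also makes one choice on the open question above, accepting
start == eb_len only when len is 0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [start, start + len) lies inside a buffer of eb_len
 * bytes. The sum start + len is never computed, so it cannot overflow.
 */
static bool demo_range_ok(uint64_t eb_len, uint64_t start, uint64_t len)
{
	/* Step 1: check start on its own. */
	if (start > eb_len)
		return false;

	/* Step 2: compare len against the remaining room. */
	return len <= eb_len - start;
}
```

With the example values from above, the naive sum wraps around to 16K and
would pass a (start + len > eb_len) test on a 16K buffer, while this
version rejects the range at step 1.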

> + WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
> +  eb->start, eb->len, start, len);
> + memset(dst, 0, len);

I'd prefer not to do the memset. As @start and @len are already wrong, I
suspect @dst could be some completely wild pointer, and writing to it
could easily screw up the whole kernel.

Thanks,
Qu

> + return;
> + }
>  
>   offset = offset_in_page(start_offset + start);
>  
> @@ -5554,8 +5558,12 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
>   unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>   int ret = 0;
>  
> - WARN_ON(start > eb->len);
> - WARN_ON(start + len > eb->start + eb->len);
> + if (start + len > eb->len) {
> + WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
> +  eb->start, eb->len, start, len);
> + memset(ptr, 0, len);
> + return;
> + }
>  
>   offset = offset_in_page(start_offset + start);
>  
> 





[5.2-rc REGRESSION] Random gcc crash for 'make -j12' when low on memory

2019-05-31 Thread Qu Wenruo
Hi,

When compiling the kernel on v5.2-rc (both rc1 and rc2) with "make
-j12", gcc randomly crashes with a segfault, while on v5.1-rc7
everything is OK.

The crash only happens when the VM has just 1G of RAM; when given 4G of
RAM it no longer crashes.
However, according to dmesg, no OOM is triggered.

Thus this looks like a regression.

The environment is:
VM hypervisor: KVM
vCPU: 8
vRAM: 1G (crash) 4G (OK)
Distro: Archlinux
Tried kernel: Upstream v5.1-rc7 (good), v5.2-rc1 (fail), v5.2-rc2(fail)

Host CPU: Ryzen 1700 (no gcc crash on host)

Has something related to OOM changed?

Thanks,
Qu






Re: [btrfs] ddf30cf03f: xfstests.generic.102.fail

2019-05-09 Thread Qu Wenruo



On 2019/5/10 11:19 AM, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: ddf30cf03fb53b9a0ad0f355a69dbedf416edde9 ("btrfs: extent-tree: Use btrfs_ref to refactor add_pinned_bytes()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Thanks for the info.

This explains why I have been seeing some unexpected ENOSPC errors.
The cause is pretty awful.

The offending patch relies completely on btrfs_ref, but forgot that pinned
bytes can be negative, thus causing strange behavior.

I'll fix it soon.

Thanks for reporting again,
Qu
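
The failure mode described above can be sketched in userspace. This is only an illustration under stated assumptions: `ref_sketch` is a hypothetical, stripped-down descriptor with an unsigned length, and the two helpers are not the kernel functions. It shows how a delta taken from an unsigned field can only ever increase the counter unless the caller supplies the sign separately.

```c
#include <stdint.h>

/* Hypothetical, simplified ref descriptor: it carries an unsigned length only. */
struct ref_sketch {
	uint64_t len;
};

/* Buggy variant: the delta comes from the unsigned length, so it can only grow. */
static int64_t pinned_after_unsigned(int64_t pinned, const struct ref_sketch *ref)
{
	return pinned + (int64_t)ref->len;
}

/* Variant with an explicit sign (+1 to pin, -1 to unpin) supplied by the caller. */
static int64_t pinned_after_signed(int64_t pinned, const struct ref_sketch *ref,
				   int sign)
{
	return pinned + sign * (int64_t)ref->len;
}
```

With 8192 bytes pinned and a 4096-byte ref that should be unpinned, the first helper yields 12288 while the second yields 4096. A counter that drifts upward like this would be consistent with space being accounted as unavailable, which fits the unexpected ENOSPC mentioned above.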

> 
> in testcase: xfstests
> with following parameters:
> 
>   disk: 4HDD
>   fs: btrfs
>   test: generic-quick2
> 
> test-description: xfstests is a regression test suite for xfs and other
> filesystems.
> test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> 
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> FSTYP -- btrfs
> PLATFORM  -- Linux/x86_64 vm-snb-4G-727 5.1.0-rc7-00188-gddf30cf
> MKFS_OPTIONS  -- /dev/vdb
> MOUNT_OPTIONS -- /dev/vdb /fs/scratch
> 
> generic/088 2s
> generic/089 14s
> generic/090 1s
> generic/091 19s
> generic/092 3s
> generic/098 4s
> generic/100 18s
> generic/101 0s
> generic/102- output mismatch (see 
> /lkp/benchmarks/xfstests/results//generic/102.out.bad)
> --- tests/generic/102.out2019-05-09 15:46:08.0 +0800
> +++ /lkp/benchmarks/xfstests/results//generic/102.out.bad2019-05-10 
> 09:32:39.267250059 +0800
> @@ -1,21 +1,21 @@
>  QA output created by 102
>  wrote 838860800/838860800 bytes at offset 0
>  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -wrote 838860800/838860800 bytes at offset 0
> +wrote 109576192/838860800 bytes at offset 0
>  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  wrote 838860800/838860800 bytes at offset 0
> ...
> (Run 'diff -u /lkp/benchmarks/xfstests/tests/generic/102.out 
> /lkp/benchmarks/xfstests/results//generic/102.out.bad'  to see the entire 
> diff)
> generic/103 1s
> generic/104 1s
> generic/105 0s
> generic/106 1s
> generic/107 1s
> generic/109 1s
> generic/120 16s
> generic/123  1s
> generic/124  4s
> generic/126  1s
> generic/129  6s
> generic/130 13s
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> To reproduce:
> 
> # build kernel
>   cd linux
>   cp config-5.1.0-rc7-00188-gddf30cf .config
>   make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig
>   make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 prepare
>   make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 modules_prepare
>   make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 SHELL=/bin/bash
>   make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 bzImage
> 
> 
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
>   bin/lkp qemu -k  -m modules.cgz job-script # job-script is 
> attached in this email
> 
> 
> 
> 
> Thanks,
> Rong Chen
> 


Re: [LKP] [btrfs] 70d28b0e4f: BUG:kernel_reboot-without-warning_in_early-boot_stage, last_printk:Probing_EDD(edd=off_to_disable)...ok

2019-04-01 Thread Qu Wenruo



On 2019/4/2 11:14 AM, Rong Chen wrote:
> 
> On 4/1/19 11:40 PM, David Sterba wrote:
>> On Mon, Apr 01, 2019 at 11:02:37PM +0800,  Chen, Rong A  wrote:
>>> On 4/1/2019 10:29 PM, Qu Wenruo wrote:
>>>> On 2019/4/1 10:02 PM, Chen, Rong A wrote:
>>>>> On 4/1/2019 9:28 PM, Nikolay Borisov wrote:
>>>>>> On 1.04.19 at 16:24, kernel test robot wrote:
>>>>>>> FYI, we noticed the following commit (built with gcc-7):
>>>>>>>
>>>>>>> commit: 70d28b0e4f8ed2d38571e7b1f9bec7f321a53102 ("btrfs:
>>>>>>> tree-checker: Verify dev item")
>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>>> master
>>>>>>>
>>>>>>> in testcase: trinity
>>>>>>> with following parameters:
>>>>>>>
>>>>>>>   runtime: 300s
>>>>>>>
>>>>>>> test-description: Trinity is a linux system call fuzz tester.
>>>>>>> test-url: http://codemonkey.org.uk/projects/trinity/
>>>>>>>
>>>>>>>
>>>>>>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge
>>>>>>> -smp
>>>>>>> 2 -m 2G
>>>>>>>
>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>>>> log/backtrace):
>>>>>>>
>>>>>>>
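>>>>>>> +--------------------------------------------------------------------------------------------------------+------------+------------+
>>>>>>> |                                                                                                        | 36b9d2bc69 | 70d28b0e4f |
>>>>>>> +--------------------------------------------------------------------------------------------------------+------------+------------+
>>>>>>> | boot_successes                                                                                         | 14         | 0          |
>>>>>>> | boot_failures                                                                                          | 2          | 14         |
>>>>>>> | IP-Config:Auto-configuration_of_network_failed                                                         | 2          |            |
>>>>>>> | BUG:kernel_reboot-without-warning_in_early-boot_stage,last_printk:Probing_EDD(edd=off_to_disable)...ok | 0          | 14         |
>>>>>>> +--------------------------------------------------------------------------------------------------------+------------+------------+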
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> early console in setup code
>>>>>>> Probing EDD (edd=off to disable)... ok
>>>>>>> BUG: kernel reboot-without-warning in early-boot stage, last printk:
>>>>>>> Probing EDD (edd=off to disable)... ok
>>>>>>> Linux version 5.0.0-rc8-00196-g70d28b0 #1
>>>>>>> Command line: ip=vm-snb-quantal-x86_64-1415::dhcp root=/dev/ram0
>>>>>>> user=lkp
>>>>>>> job=/lkp/jobs/scheduled/vm-snb-quantal-x86_64-1415/trinity-300s-quantal-core-x86_64-2018-11-09.cgz-70d28b0-20190330-29362-1y6g0qb-2.yaml
>>>>>>>
>>>>>>> ARCH=x86_64 kconfig=x86_64-randconfig-s5-03231928
>>>>>>> branch=linux-devel/devel-hourly-2019032317
>>>>>>> commit=70d28b0e4f8ed2d38571e7b1f9bec7f321a53102
>>>>>>> BOOT_IMAGE=/pkg/linux/x86_64-randconfig-s5-03231928/gcc-7/70d28b0e4f8ed2d38571e7b1f9bec7f321a53102/vmlinuz-5.0.0-rc8-00196-g70d28b0
>>>>>>>
>>>>>>> max_uptime=1500
>>>>>>> RESULT_ROOT=/result/trinity/300s/vm-snb-quantal-x86_64/quantal-core-x86_64-2018-11-09.cgz/x86_64-randconfig-s5-03231928/gcc-7/70d28b0e4f8ed2d38571e7b1f9bec7f321a53102/8
>>>>>>>
>>>>>>> LKP_SERVER=inn debug apic=debug sysrq_always_enabled
>>>>>>> rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on
>>>>>>> panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic
>>>>>>> load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8
>>>>>>> systemd.log_level=err ignore_loglevel co

Re: [btrfs] 70d28b0e4f: BUG:kernel_reboot-without-warning_in_early-boot_stage,last_printk:Probing_EDD(edd=off_to_disable)...ok

2019-04-01 Thread Qu Wenruo



On 2019/4/1 10:02 PM, Chen, Rong A wrote:
> 
> On 4/1/2019 9:28 PM, Nikolay Borisov wrote:
>>
>> On 1.04.19 at 16:24, kernel test robot wrote:
>>> FYI, we noticed the following commit (built with gcc-7):
>>>
>>> commit: 70d28b0e4f8ed2d38571e7b1f9bec7f321a53102 ("btrfs:
>>> tree-checker: Verify dev item")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>> in testcase: trinity
>>> with following parameters:
>>>
>>> runtime: 300s
>>>
>>> test-description: Trinity is a linux system call fuzz tester.
>>> test-url: http://codemonkey.org.uk/projects/trinity/
>>>
>>>
>>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp
>>> 2 -m 2G
>>>
>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>> log/backtrace):
>>>
>>>
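>>> +--------------------------------------------------------------------------------------------------------+------------+------------+
>>> |                                                                                                        | 36b9d2bc69 | 70d28b0e4f |
>>> +--------------------------------------------------------------------------------------------------------+------------+------------+
>>> | boot_successes                                                                                         | 14         | 0          |
>>> | boot_failures                                                                                          | 2          | 14         |
>>> | IP-Config:Auto-configuration_of_network_failed                                                         | 2          |            |
>>> | BUG:kernel_reboot-without-warning_in_early-boot_stage,last_printk:Probing_EDD(edd=off_to_disable)...ok | 0          | 14         |
>>> +--------------------------------------------------------------------------------------------------------+------------+------------+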
>>>
>>>
>>>
>>>
>>> early console in setup code
>>> Probing EDD (edd=off to disable)... ok
>>> BUG: kernel reboot-without-warning in early-boot stage, last printk:
>>> Probing EDD (edd=off to disable)... ok
>>> Linux version 5.0.0-rc8-00196-g70d28b0 #1
>>> Command line: ip=vm-snb-quantal-x86_64-1415::dhcp root=/dev/ram0
>>> user=lkp
>>> job=/lkp/jobs/scheduled/vm-snb-quantal-x86_64-1415/trinity-300s-quantal-core-x86_64-2018-11-09.cgz-70d28b0-20190330-29362-1y6g0qb-2.yaml
>>> ARCH=x86_64 kconfig=x86_64-randconfig-s5-03231928
>>> branch=linux-devel/devel-hourly-2019032317
>>> commit=70d28b0e4f8ed2d38571e7b1f9bec7f321a53102
>>> BOOT_IMAGE=/pkg/linux/x86_64-randconfig-s5-03231928/gcc-7/70d28b0e4f8ed2d38571e7b1f9bec7f321a53102/vmlinuz-5.0.0-rc8-00196-g70d28b0
>>> max_uptime=1500
>>> RESULT_ROOT=/result/trinity/300s/vm-snb-quantal-x86_64/quantal-core-x86_64-2018-11-09.cgz/x86_64-randconfig-s5-03231928/gcc-7/70d28b0e4f8ed2d38571e7b1f9bec7f321a53102/8
>>> LKP_SERVER=inn debug apic=debug sysrq_always_enabled
>>> rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on
>>> panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic
>>> load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8
>>> systemd.log_level=err ignore_loglevel console=tty0
>>> earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw
>>> rcuperf.shutdown=0
>>>
>> Can this report be made useful by actually including output from serial
>> console? For example possible bug-ons or whatnot? dmesg.xz just contains
>> qemu's command line + some metadata about the test and :
>>
>> "BUG: kernel reboot-without-warning in early-boot stage, last printk:
>> Probing EDD (edd=off to disable)... ok"
>>
>> At least a stack trace would have been useful.
>>
>> 
> 
> 
> Hi,
> 
> We usually use the tool ("bin/lkp qemu -k  job-script") to
> reproduce it.  It seems no stack trace in the result:

So there is no regression at that commit, right?

Just a false alarm?

Thanks,
Qu

> 
> $ lkp qemu -k vmlinuz-5.0.0-rc8-00196-g70d28b0 job-script
> ...
> Formatting '/tmp/vdisk-nfs/disk-vm-snb-quantal-x86_64-1415-0', fmt=qcow2
> size=274877906944 cluster_size=65536 lazy_refcounts=off refcount_bits=16
> Formatting '/tmp/vdisk-nfs/disk-vm-snb-quantal-x86_64-1415-1', fmt=qcow2
> size=274877906944 cluster_size=65536 lazy_refcounts=off refcount_bits=16
> exec command: qemu-system-x86_64 -enable-kvm -fsdev
> local,id=test_dev,path=/home/nfs/.lkp//result/trinity/300s/vm-snb-quantal-x86_64/quantal-core-x86_64-2018-11-09.cgz/x86_64-randconfig-s5-03231928/gcc-7/70d28b0e4f8ed2d38571e7b1f9bec7f321a53102/2,security_model=none
> -device virtio-9p-pci,fsdev=test_dev,mount_tag=9p/virtfs_mount -kernel
> vmlinuz-5.0.0-rc8-00196-g70d28b0 -append root=/dev/ram0 user=lkp
> job=/lkp/jobs/scheduled/vm-snb-quantal-x86_64-1415/trinity-300s-quantal-core-x86_64-2018-11-09.cgz-70d28b0-20190330-29362-1y6g0qb-2.yaml
> ARCH=x86_64 kconfig=x86_64-randconfig-s5-03231928
> branch=linux-devel/devel-hourly-2019032317
> commit=70d28b0e4f8ed2d38571e7b1f9bec7f321a53102
> 

Re: [LKP] [btrfs] 44fe89de7d: aim7.jobs-per-min -15.1% regression

2019-03-12 Thread Qu Wenruo


On 2019/3/12 9:50 PM, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed a -15.1% regression of aim7.jobs-per-min due to commit:
> 
> 
> commit: 44fe89de7d5157a4f31f13d94802c7619e23f462 ("btrfs: Do mandatory tree 
> block check before submitting bio")

That commit adds an extra check before each tree block is written back to disk.

It is expected to cause a regression for metadata-heavy workloads.

I'm more interested in whether there is any new real-world performance
regression, e.g. for databases or other more dedicated workloads.

Thanks,
Qu
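
The email does not list what the mandatory check verifies. As a rough illustration of why such a check costs CPU time on every metadata write, here is a minimal sketch that assumes one part of it is a walk over all keys in a block to confirm they are strictly ascending. `key_sketch` and both helpers are hypothetical, simplified names, not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified key of the form (objectid, type, offset). */
struct key_sketch {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Three-way comparison, most significant field first. */
static int key_cmp(const struct key_sketch *a, const struct key_sketch *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Returns 1 if the keys are strictly ascending, 0 otherwise. */
static int keys_strictly_ascending(const struct key_sketch *keys, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (key_cmp(&keys[i - 1], &keys[i]) >= 0)
			return 0;
	return 1;
}
```

The walk is linear in the number of items per block, so a workload that writes many tree blocks pays that cost once per block, which is in line with a metadata-heavy benchmark slowing down.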

> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> in testcase: aim7
> on test machine: 72 threads Intel(R) Xeon(R) Gold 6139 CPU @ 2.30GHz with 
> 128G memory
> with following parameters:
> 
>   disk: 4BRD_12G
>   md: RAID0
>   fs: btrfs
>   test: sync_disk_rw
>   load: 20
>   cpufreq_governor: performance
> 
> test-description: AIM7 is a traditional UNIX system level benchmark suite 
> which is used to test and measure the performance of multiuser system.
> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> 
> In addition to that, the commit also has significant impact on the following 
> tests:
> 
> +------------------+-----------------------------------------------------------------------+
> | testcase: change | aim7: aim7.jobs-per-min -20.8% regression                             |
> | test machine     | 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory |
> | test parameters  | cpufreq_governor=performance                                          |
> |                  | disk=4BRD_12G                                                         |
> |                  | fs=btrfs                                                              |
> |                  | load=20                                                               |
> |                  | md=RAID0                                                              |
> |                  | test=sync_disk_rw                                                     |
> +------------------+-----------------------------------------------------------------------+
> 
> 
> Details are as below:
> -->
> 
> 
> To reproduce:
> 
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml  # job file is attached in this email
> bin/lkp run job.yaml
> 
> =
> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>   
> gcc-8/performance/4BRD_12G/btrfs/x86_64-rhel-7.6/20/RAID0/debian-x86_64-2018-04-03.cgz/lkp-skl-2sp7/sync_disk_rw/aim7
> 
> commit: 
>   a92e9c3a3c ("btrfs: extent_io: Handle error better in extent_writepages()")
>   44fe89de7d ("btrfs: Do mandatory tree block check before submitting bio")
> 
> a92e9c3a3cef4c0c 44fe89de7d5157a4f31f13d9480
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :4           50%           2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
>          %stddev     %change         %stddev
>              \          |                \
>       1204          -15.1%       1022         aim7.jobs-per-min
>      99.66          +18.3%     117.93         aim7.time.elapsed_time
>      99.66          +18.3%     117.93         aim7.time.elapsed_time.max
>       3014 ±  2%     -3.1%       2921         aim7.time.minor_page_faults
>  1.236e+09          +44.6%  1.787e+09 ± 27%   cpuidle.C1.time
>      90473          +15.8%     104758 ±  2%   meminfo.AnonHugePages
>     102742          -16.5%      85841 ±  2%   meminfo.max_used_kB
>      11.68           -1.8        9.85         mpstat.cpu.-1.sys%
>       0.07 ±  7%     -0.0        0.05 ± 17%   mpstat.cpu.-1.usr%
>     664.25 ±  4%    +18.1%     784.25 ±  6%   slabinfo.dmaengine-unmap-16.active_objs
>     664.25 ±  4%    +18.1%     784.25 ±  6%   slabinfo.dmaengine-unmap-16.num_objs
>      88.00           +2.3%      90.00         vmstat.cpu.id
>     346712          -15.5%     293086         vmstat.io.bo
>     558192          -15.4%     472096         vmstat.system.cs
>      88.42           +2.0%      90.19         iostat.cpu.idle
>      11.50          -15.3%       9.74         iostat.cpu.system
>      19892          -15.2%      16877         iostat.md0.w/s
>     451087          -15.4%     381436         iostat.md0.wkB/s
>     252.75 ±  2%    -19.8%     202.67 ± 16%   sched_debug.cfs_rq:/.removed.util_avg.max
>      57.62 ±  2%    +71.4%      98.79 ± 51%   sched_debug.cpu.cpu_load[1].max
>      47.75 ± 13%    +81.6%      86.71 ± 43%   sched_debug.cpu.cpu_load[2].max
>       7.35 ± 13%    -16.4%       6.14 ±  4%   sched_debug.cpu.cpu_load[4].avg
> 422.75  

Re: [PATCH v2 -next] btrfs: Remove unnecessary casts in btrfs_read_root_item

2019-02-20 Thread Qu Wenruo


On 2019/2/20 8:32 PM, YueHaibing wrote:
> There is a messy cast here:
>   min_t(int, len, (int)sizeof(*item)));
> 
> min_t() should normally cast to unsigned.  It's not possible for
> "len" to be negative, but if it were then we definitely
> wouldn't want to pass negatives to read_extent_buffer().  Also there
> is an extra cast.
> 
> This patch shouldn't affect runtime, it's just a clean up.
> 
> Suggested-by: Dan Carpenter 
> Signed-off-by: YueHaibing 

Reviewed-by: Qu Wenruo 

The commit message is much better.

Thanks,
Qu
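
The point about the cast can be shown in userspace. This is a minimal sketch, assuming a `MIN_T` macro that mimics the cast-then-compare behavior of the kernel's `min_t()`; the macro and both helpers are hypothetical stand-ins, and unlike the kernel macro this one evaluates its arguments twice.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for min_t(): cast both sides to type, then compare. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Old form: compares as int, so a negative len wins and becomes a huge size_t. */
static size_t copy_len_int(int len, size_t item_size)
{
	return (size_t)MIN_T(int, len, (int)item_size);
}

/* New form: compares as u32, so the result can never exceed item_size. */
static size_t copy_len_u32(uint32_t len, size_t item_size)
{
	return MIN_T(uint32_t, len, item_size);
}
```

For a hypothetical `item_size` of 439, `copy_len_int(-1, 439)` returns `SIZE_MAX`, while `copy_len_u32(UINT32_MAX, 439)` returns 439. As the commit message says, `len` cannot actually be negative here, so this only shows why the unsigned comparison is the safer form to pass on to a copy routine.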

> ---
> v2: modify commit message as Dan suggested 
> ---
>  fs/btrfs/root-tree.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
> index 02d1a57af78b..893d12fbfda0 100644
> --- a/fs/btrfs/root-tree.c
> +++ b/fs/btrfs/root-tree.c
> @@ -21,12 +21,12 @@ static void btrfs_read_root_item(struct extent_buffer 
> *eb, int slot,
>   struct btrfs_root_item *item)
>  {
>   uuid_le uuid;
> - int len;
> + u32 len;
>   int need_reset = 0;
>  
>   len = btrfs_item_size_nr(eb, slot);
>   read_extent_buffer(eb, item, btrfs_item_ptr_offset(eb, slot),
> - min_t(int, len, (int)sizeof(*item)));
> +min_t(u32, len, sizeof(*item)));
>   if (len < sizeof(*item))
>   need_reset = 1;
>   if (!need_reset && btrfs_root_generation(item)
> 
> 
> 





Re: [PATCH -next] btrfs: Fix type conversion in btrfs_read_root_item

2019-02-19 Thread Qu Wenruo


On 2019/2/20 11:08 AM, YueHaibing wrote:
> btrfs_item_size_nr returns u32; converting it to int may result
> in truncation. Also read_extent_buffer expects an unsigned parameter, so
> min_t should use type u32 for the comparison.

Btrfs has an upper limit on item size; it will never exceed 64K minus
various overhead.

Furthermore, btrfs has a metadata read-time check to exclude such
obviously corrupted tree blocks, thus a corrupted tree block will never
reach here.

Thanks,
Qu

> 
> Fixes: 8ea05e3a4262 ("Btrfs: introduce subvol uuids and times")
> Signed-off-by: YueHaibing 
> ---
>  fs/btrfs/root-tree.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
> index 02d1a57af78b..893d12fbfda0 100644
> --- a/fs/btrfs/root-tree.c
> +++ b/fs/btrfs/root-tree.c
> @@ -21,12 +21,12 @@ static void btrfs_read_root_item(struct extent_buffer 
> *eb, int slot,
>   struct btrfs_root_item *item)
>  {
>   uuid_le uuid;
> - int len;
> + u32 len;
>   int need_reset = 0;
>  
>   len = btrfs_item_size_nr(eb, slot);
>   read_extent_buffer(eb, item, btrfs_item_ptr_offset(eb, slot),
> - min_t(int, len, (int)sizeof(*item)));
> +min_t(u32, len, sizeof(*item)));
>   if (len < sizeof(*item))
>   need_reset = 1;
>   if (!need_reset && btrfs_root_generation(item)
> 
> 
> 





Re: [LKP] [btrfs] 05a37c4860: kmsg.BTRFS_error(device_vdd):failed_to_verify_dev_extents_against_chunks

2019-01-11 Thread Qu Wenruo


On 2019/1/11 10:03 PM, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 05a37c48604c19b50873fd9663f9140c150469d1 ("btrfs: volumes: Make sure 
> no dev extent is beyond device boundary")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: xfstests
> with following parameters:
> 
>   disk: 6HDD
>   fs: btrfs
>   test: btrfs-group1
> 
> test-description: xfstests is a regression test suite for xfs and other
> filesystems.
> test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> 
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):

For the LKP tests, would you please not bundle all fstests test cases
into one LKP test case?

That makes it pretty hard for us to locate the problem, not to mention
that there will be tons of generic tests, and new tests could easily
skew your existing results.

It would make more sense to generate test cases based on
fstests/tests/btrfs/group, and to save the result of each fstests test
case separately.
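
Generating one test case per group-file entry could start from something like the following. It is only a sketch, assuming each non-comment line of the group file begins with the test id followed by whitespace-separated group names; `group_line_to_test` is a hypothetical helper, not part of fstests or LKP.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of a group file, assumed to start with a test id
 * followed by group names. On success writes "<dir>/<id>" to out and
 * returns 1. Returns 0 for blank lines, '#' comments, or if out is
 * too small.
 */
static int group_line_to_test(const char *dir, const char *line,
			      char *out, size_t out_size)
{
	size_t id_len;
	int n;

	while (isspace((unsigned char)*line))
		line++;
	if (*line == '\0' || *line == '#')
		return 0;

	/* The test id is the first whitespace-delimited token. */
	id_len = strcspn(line, " \t\r\n");
	n = snprintf(out, out_size, "%s/%.*s", dir, (int)id_len, line);
	return n > 0 && (size_t)n < out_size;
}
```

Each name produced this way can be passed to `./check` on its own, so a failure is recorded against a single fstests case rather than against the whole batch.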

> 
> 
> 
> 
> 2019-01-09 07:40:26 export TEST_DIR=/fs/vda
> 2019-01-09 07:40:26 export TEST_DEV=/dev/vda
> 2019-01-09 07:40:26 export FSTYP=btrfs
> 2019-01-09 07:40:26 export SCRATCH_MNT=/fs/scratch
> 2019-01-09 07:40:26 mkdir /fs/scratch -p
> 2019-01-09 07:40:26 export SCRATCH_DEV_POOL="/dev/vdb /dev/vdc /dev/vdd 
> /dev/vde /dev/vdf"
> 2019-01-09 07:40:26 sed "s:^:btrfs/:" 
> /lkp/lkp/src/pack/xfstests-addon/tests/btrfs-group1 | grep -F -f 
> merged_ignored_files
> ignored by lkp: btrfs/145
> ignored by lkp: btrfs/147
> ignored by lkp: btrfs/149
> ignored by lkp: btrfs/153
> ignored by lkp: btrfs/155
> 2019-01-09 07:40:26 sed "s:^:btrfs/:" 
> /lkp/lkp/src/pack/xfstests-addon/tests/btrfs-group1 | grep -v -F -f 
> merged_ignored_files
> 2019-01-09 07:40:26 ./check btrfs/010 btrfs/026 btrfs/027 btrfs/028 btrfs/116 
> btrfs/117 btrfs/118 btrfs/119 btrfs/120 btrfs/121 btrfs/122 btrfs/123 
> btrfs/124 btrfs/125 btrfs/126 btrfs/127 btrfs/128 btrfs/129 btrfs/131 
> btrfs/132 btrfs/133 btrfs/134 btrfs/135 btrfs/136 btrfs/137 btrfs/138 
> btrfs/139 btrfs/140 btrfs/141 btrfs/142 btrfs/143 btrfs/144 btrfs/146 
> btrfs/148 btrfs/150 btrfs/151 btrfs/152 btrfs/154 btrfs/156 btrfs/157 
> btrfs/158 btrfs/159 btrfs/160 btrfs/161 btrfs/162 btrfs/163 btrfs/164 
> btrfs/165 btrfs/166 btrfs/167 btrfs/168 btrfs/169 btrfs/170 btrfs/171
> FSTYP -- btrfs
> PLATFORM  -- Linux/x86_64 vm-snb-4G-105 4.20.0-rc7-00010-g05a37c4
> MKFS_OPTIONS  -- /dev/vdb
> MOUNT_OPTIONS -- /dev/vdb /fs/scratch
> 
> btrfs/010  157s
> btrfs/026  4s
> btrfs/027  7s
> btrfs/028  31s
> btrfs/116 [not run] FITRIM not supported on /fs/scratch
> btrfs/117  6s
> btrfs/118  1s
> btrfs/119  1s
> btrfs/120  1s
> btrfs/121  1s
> btrfs/122  8s
> btrfs/123  2s
> btrfs/124  25s
> btrfs/125  15s
> btrfs/126  0s
> btrfs/127  1s
> btrfs/128  0s
> btrfs/129  1s
> btrfs/131  1s
> btrfs/132  32s
> btrfs/133  2s
> btrfs/134  1s
> btrfs/135  1s
> btrfs/136  91s
> btrfs/137  0s
> btrfs/138  80s
> btrfs/139 - output mismatch (see 
> /lkp/benchmarks/xfstests/results//btrfs/139.out.bad)
> --- tests/btrfs/139.out   2018-09-19 20:13:26.0 +
> +++ /lkp/benchmarks/xfstests/results//btrfs/139.out.bad   2019-01-09 
> 07:48:30.61900 +
> @@ -1,4 +1,616 @@
>  QA output created by 139
> +pwrite: Disk quota exceeded
> +/fs/scratch/subvol/file_26: Disk quota exceeded
> +/fs/scratch/subvol/file_27: Disk quota exceeded
> +/fs/scratch/subvol/file_28: Disk quota exceeded
> +/fs/scratch/subvol/file_29: Disk quota exceeded
> +/fs/scratch/subvol/file_30: Disk quota exceeded

That's a known regression; it's recommended to blacklist this test case.
We know the cause, but it is pretty tricky to fix.

> ...
> (Run 'diff -u tests/btrfs/139.out 
> /lkp/benchmarks/xfstests/results//btrfs/139.out.bad'  to see the entire diff)
> btrfs/140  6s
> btrfs/141  1s
> btrfs/142  0s
> btrfs/143  2s
> btrfs/144  1s
> btrfs/146  1s
> btrfs/148  1s
> btrfs/150  0s
> btrfs/151  3s
> btrfs/152  3s
> btrfs/154 [failed, exit status 1]- output mismatch (see 
> /lkp/benchmarks/xfstests/results//btrfs/154.out.bad)
> --- tests/btrfs/154.out   2018-09-19 20:13:26.0 +
> +++ /lkp/benchmarks/xfstests/results//btrfs/154.out.bad   2019-01-09 
> 07:48:51.92800 +
> @@ -6,5 +6,5 @@
>  scan missing dev and write
>  
>  run balance
> -
> -mount reconstructed dev only and check md5sum
> +failed: '/bin/btrfs balance start --full-balance -dconvert=raid1 
> -mconvert=raid1 /fs/scratch'
> +(see /lkp/benchmarks/xfstests/results//btrfs/154.full for details)
> ...
> (Run 'diff -u tests/btrfs/154.out 
> 

Re: [PATCH v2] btrfs: add a check for sysfs_create_group

2018-12-25 Thread Qu Wenruo


On 2018/12/26 1:37 PM, Kangjie Lu wrote:
> In case sysfs_create_group fails, let's check its return value and
> issue an error message.
> 
> Signed-off-by: Kangjie Lu 
> ---
>  fs/btrfs/sysfs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 3717c864ba23..24ef416e700b 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -889,6 +889,8 @@ void btrfs_sysfs_feature_update(struct btrfs_fs_info 
> *fs_info,
>*/
>   sysfs_remove_group(fsid_kobj, _feature_attr_group);
>   ret = sysfs_create_group(fsid_kobj, _feature_attr_group);
> + if (ret)
> + btrfs_err(fs_info, "failed to create 
> btrfs_feature_attr_group.\n");

I forgot to mention: the btrfs_* message helpers do not need the trailing '\n'.

Apart from that, looks good.

Reviewed-by: Qu Wenruo 

Thanks,
Qu
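
Why a per-filesystem helper needs no trailing newline, and how it can tell filesystems apart, can be sketched in userspace. This is a hypothetical analogue, not the kernel implementation: `format_fs_msg` is an invented name, and the prefix layout is modeled on log lines such as "BTRFS info (device ram0): forced readonly".

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical userspace analogue of a per-filesystem message helper.
 * It adds the "BTRFS <level> (device <dev>): " prefix and the trailing
 * newline itself, so callers pass a bare format string.
 * Returns 1 if the whole message fit into buf, 0 otherwise.
 */
static int format_fs_msg(char *buf, size_t size, const char *level,
			 const char *dev, const char *fmt, ...)
{
	va_list ap;
	int used, n;

	used = snprintf(buf, size, "BTRFS %s (device %s): ", level, dev);
	if (used < 0 || (size_t)used >= size)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf + used, size - used, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= size - used)
		return 0;
	used += n;

	/* Append the newline on behalf of the caller. */
	if ((size_t)used + 1 >= size)
		return 0;
	buf[used] = '\n';
	buf[used + 1] = '\0';
	return 1;
}
```

Because the wrapper owns both the prefix and the line ending, a caller that adds its own '\n' would end up with a blank line after the message, which is why the review asks for it to be dropped.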

>  }
>  
>  static int btrfs_init_debugfs(void)
> 





Re: [PATCH] btrfs: add a check for sysfs_create_group

2018-12-25 Thread Qu Wenruo


On 2018/12/26 11:46 AM, Kangjie Lu wrote:
> In case sysfs_create_group fails, let's check its return value and
> issue an error message.
> 
> Signed-off-by: Kangjie Lu 
> ---
>  fs/btrfs/sysfs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 3717c864ba23..62529153a51a 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -889,6 +889,8 @@ void btrfs_sysfs_feature_update(struct btrfs_fs_info 
> *fs_info,
>*/
>   sysfs_remove_group(fsid_kobj, _feature_attr_group);
>   ret = sysfs_create_group(fsid_kobj, _feature_attr_group);
> + if (ret)
> + pr_err("failed to create btrfs_feature_attr_group.\n");

Btrfs has better error message infrastructure (e.g. it can distinguish
different filesystems).

Please use btrfs_err() or btrfs_warn() instead.

Apart from that, I think the patch looks good.

Thanks,
Qu

>  }
>  
>  static int btrfs_init_debugfs(void)
> 





ridiculously slow VM memory performance on Ryzen CPU

2018-04-25 Thread Qu Wenruo
Hi,

When testing IO-heavy workloads on my VM backed by a Ryzen 1700 CPU, I turned
to the brd module, but surprisingly, the speed is even slower than some HDDs:

---
$ sudo modprobe brd rd_nr=1 rd_size=1048576
$ dd if=/dev/zero of=/dev/ram0 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.9928 s, 107 MB/s
---
107 MB/s is pretty lame...
Even some HDDs could be faster than this.

On the host, it's much better:
---
$ dd if=/dev/zero of=/dev/ram0 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.754641 s, 1.4 GB/s
---
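
The two dd results can be compared directly with a small calculation. This is a minimal sketch; `mb_per_sec` is a hypothetical helper, and it uses decimal megabytes (1 MB = 1,000,000 bytes), which matches the figures dd printed above.

```c
#include <stdint.h>

/* Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes). */
static double mb_per_sec(uint64_t bytes, double seconds)
{
	return (double)bytes / seconds / 1e6;
}
```

Plugging in the numbers above, 1073741824 bytes in 9.9928 s is about 107.45 MB/s in the VM, and the same amount in 0.754641 s is about 1422.85 MB/s on the host, so the guest is slower by a factor of roughly 13.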

For host hardware:
CPU: Ryzen 1700 All cores @ 3.8G
Mem: DDR4 2400 dual channel (8G x 2)

For host software:
Kernel:   4.16.3-1-ARCH
Qemu: 2.11.1-2
Distribution: Archlinux

VM setup is mostly default setup done by libvirt.

I'm not sure if this is related to this bug:
https://www.redhat.com/archives/vfio-users/2017-April/msg00019.html

Thanks,
Qu





Re: [lkp-robot] [btrfs] 7eafb77890: stderr.fs_mark:fsync_failed_Input/output_error

2018-04-22 Thread Qu Wenruo
The latest patch handles it by introducing a new @super_num parameter and
changing the timing of the check.

Thanks,
Qu

On 2018-04-23 09:20, kernel test robot wrote:
> 
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 7eafb77890ff459863e3bc772465cb641c14f754 ("btrfs: Do super block 
> verification before writing it to disk")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> in testcase: fsmark
> with following parameters:
> 
>   iterations: 1x
>   nr_threads: 1t
>   disk: 1BRD_32G
>   fs: btrfs
>   fs2: nfsv4
>   filesize: 4K
>   test_size: 4G
>   sync_method: fsyncBeforeClose
>   nr_files_per_directory: 1fpd
>   cpufreq_governor: performance
> 
> test-description: The fsmark is a file system benchmark to test synchronous 
> write workloads, for example, mail servers workload.
> test-url: https://sourceforge.net/projects/fsmark/
> 
> 
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 
> 64G memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> 
> [  128.496659] [ cut here ]
> [  128.502439] BTRFS: Transaction aborted (error -117)
> [  128.50[  128.521041] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver 
> nfsd auth_rpcgss dm_mod brd btrfs xor zstd_decompress zstd_compress xxhash 
> raid6_pq sd_mod sg intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp 
> coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel 
> mgag200 ghash_clmulni_intel snd_pcm ttm snd_timer pcbc drm_kms_helper ipmi_si 
> mxm_wmi snd syscopyarea aesni_intel ipmi_devintf sysfillrect soundcore 
> crypto_simd sysimgblt ahci fb_sys_fops glue_helper cryptd libahci drm 
> ipmi_msghandler pcspkr libata shpchp wmi acpi_power_meter acpi_pad ip_tables
> [  128.581706] CPU: 21 PID: 1581 Comm: nfsd Not tainted 
> 4.16.0-rc7-00174-g7eafb77 #1
> [  128.590723] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS 
> SE5C610.86B.01.01.0019.101220160604 10/12/2016
> [  128.602982] RIP: 0010:btrfs_sync_log+0x9c4/0xbe0 [btrfs]
> [  128.609603] RSP: 0018:c90008903ad0 EFLAGS: 00010282
> [  128.616053] RAX: 0027 RBX: 881037f0c800 RCX: 
> 
> [  128.624712] RDX: 88085f75ed00 RSI: 88085f756918 RDI: 
> 88085f756918
> [  128.633366] RBP: c90008903bb0 R08: 0807 R09: 
> 00aa
> [  128.641948] R10: c90008903a00 R11: 88085a4c2ec0 R12: 
> 881074934000
> [  128.650601] R13: ff8b R14: 88107a63 R15: 
> 881074933800
> [  128.659213] FS:  () GS:88085f74() 
> knlGS:
> [  128.668879] CS:  0010 DS:  ES:  CR0: 80050033
> [  128.675904] CR2: 7f4a6a747660 CR3: 00107f20a001 CR4: 
> 003606e0
> [  128.684570] DR0:  DR1:  DR2: 
> 
> [  128.693185] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  128.701795] Call Trace:
> [  128.705166]  ? btrfs_sync_file+0x2f3/0x3f0 [btrfs]
> [  128.711151]  btrfs_sync_file+0x2f3/0x3f0 [btrfs]
> [  128.716910]  btrfs_file_write_iter+0x440/0x550 [btrfs]
> [  128.723284]  do_iter_readv_writev+0x116/0x170
> [  128.728753]  do_iter_write+0x80/0x190
> [  128.733487]  nfsd_vfs_write+0xaf/0x370 [nfsd]
> [  128.738885]  nfsd4_write+0x179/0x1c0 [nfsd]
> [  128.744121]  nfsd4_proc_compound+0x3f1/0x640 [nfsd]
> [  128.750123]  nfsd_dispatch+0xf5/0x230 [nfsd]
> [  128.755499]  svc_process_common+0x496/0x680
> [  128.760732]  ? nfsd_destroy+0x60/0x60 [nfsd]
> [  128.765996]  svc_process+0xed/0x1b0
> [  128.770499]  nfsd+0xf1/0x160 [nfsd]
> [  128.774899]  kthread+0x11e/0x140
> [  128.778988]  ? kthread_associate_blkcg+0xb0/0xb0
> [  128.784679]  ret_from_fork+0x35/0x40
> [  128.789187] Code: 00 00 48 8b 42 50 f0 48 0f ba a8 e8 cd 00 00 02 72 1b 41 
> 83 fd fb 0f 84 6f 01 00 00 44 89 ee 48 c7 c7 40 ec 76 a0 e8 3c f1 95 e0 <0f> 
> 0b 48 8b bd 60 ff ff ff 44 89 e9 ba 1d 0c 00 00 48 c7 c6 20 
> [  128.811411] ---[ end trace 6c998d6c6547e8f7 ]---
> [  128.817132] BTRFS: error (device ram0) in btrfs_sync_log:3101: errno=-117 
> unknown
> [  128.826014] BTRFS info (device ram0): forced readonly
> [  128.835886] fs_mark: fsync failed Input/output error
> 
> 2018-04-19 17:23:24 fs_mark -d /nfs/ram0/1 -D 100 -N 1 -n 100 -L 1 -S 
> 1 -s 4096
> 
> #  fs_mark  -d  /nfs/ram0/1  -D  100  -N  1  -n  100  -L  1  -S  1  
> -s  4096 
> # Version 3.3, 1 thread(s) starting at Thu Apr 19 17:23:25 2018
> # Sync method: INBAND FSYNC: fsync() per file in write loop.
> # Directories:  Round Robin between directories across 100 
> subdirectories with 1 files per subdirectory.
> # File names: 40 bytes long, (16 initial bytes of time stamp with 24 
> random bytes at end of name)
> # Files info: size 4096 bytes, written with an IO size of 16384 bytes per 
> write
> # App 

Re: linux-next: build warning after merge of the btrfs-kdave tree

2017-12-21 Thread Qu Wenruo


On 2017-12-22 00:49, David Sterba wrote:
> On Wed, Dec 20, 2017 at 08:12:11AM +0800, Qu Wenruo wrote:
>> On 2017-12-20 06:20, Stephen Rothwell wrote:
>>> After merging the btrfs-kdave tree, today's linux-next build (powerpc
>>> ppc64_defconfig) produced this warning:
>>>
>>> fs/btrfs/qgroup.c: In function 'qgroup_reserve':
>>> fs/btrfs/qgroup.c:2432:1: warning: label 'retry' defined but not used 
>>> [-Wunused-label]
>>>  retry:
>>>  ^
>>>
>>> Introduced by commit
>>>
>>>   b283738ab0ad ("Revert "btrfs: qgroups: Retry after commit on getting 
>>> EDQUOT"")
>>>
>> Sorry, I forgot to clean it up.
>>
>> I'll update the patchset along with new patches to handle qgroup limit
>> better.
> 
> Meanwhile I've applied the fix from Arnd to silence the warning in
> linux-next builds.
> 

Some patches (not many, maybe 2 or 3) are going to be updated:

btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
Revert "btrfs: qgroups: Retry after commit on getting EDQUOT" (for the
label)

There will also be 2 new patches.

Should I resend the whole patchset, or send these as separate patches?

Thanks,
Qu





Re: linux-next: build warning after merge of the btrfs-kdave tree

2017-12-19 Thread Qu Wenruo


On 2017-12-20 06:20, Stephen Rothwell wrote:
> Hi David,
> 
> After merging the btrfs-kdave tree, today's linux-next build (powerpc
> ppc64_defconfig) produced this warning:
> 
> fs/btrfs/qgroup.c: In function 'qgroup_reserve':
> fs/btrfs/qgroup.c:2432:1: warning: label 'retry' defined but not used 
> [-Wunused-label]
>  retry:
>  ^
> 
> Introduced by commit
> 
>   b283738ab0ad ("Revert "btrfs: qgroups: Retry after commit on getting 
> EDQUOT"")
> 
Sorry, I forgot to clean it up.

I'll update the patchset along with new patches to handle qgroup limit
better.
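
For reference, a minimal sketch of how this kind of warning appears. The
function below is a hypothetical stand-in, not the real qgroup_reserve(), and
its parameters are made up for illustration. As long as the goto is present
the label is used; if a revert removes only the goto, the leftover label is
what gcc reports with -Wunused-label.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for a reservation helper; "limit", "used" and
 * "want" are illustrative parameters, not btrfs fields.
 */
int reserve_sketch(long limit, long used, long want)
{
	int retried = 0;

	assert(want >= 0);
retry:
	if (used + want > limit) {
		if (!retried) {
			/* Pretend a commit freed half of the usage, then try again. */
			used /= 2;
			retried = 1;
			goto retry;	/* deleting only this goto orphans the label */
		}
		return -EDQUOT;
	}
	return 0;
}
```

Removing the label together with the retry logic keeps the build quiet.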

Thanks,
Qu





Re: [PATCH] btrfs: tree-checker: use %zu format string for size_t

2017-12-06 Thread Qu Wenruo


On 2017-12-06 22:18, Arnd Bergmann wrote:
> The return value of sizeof() is of type size_t, so we must print it
> using the %z format modifier rather than %l to avoid this warning
> on some architectures:
> 
> fs/btrfs/tree-checker.c: In function 'check_dir_item':
> fs/btrfs/tree-checker.c:273:50: error: format '%lu' expects argument of type 
> 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} 
> [-Werror=format=]

Any idea which architecture triggers this warning?
On x86_64 I never see it.
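
For reference, a minimal userspace sketch of the mismatch. The struct is a
hypothetical stand-in for struct btrfs_dir_item and only its size matters.
size_t is typically unsigned int on 32-bit ABIs and unsigned long on x86_64,
so "%lu" only happens to match on 64-bit builds, while "%zu" matches
everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct btrfs_dir_item; only sizeof() is used. */
struct dir_item_sketch {
	unsigned char payload[30];
};

/*
 * Format the message the way the fixed code does: the first value has
 * type size_t (because of sizeof), so it is printed with %zu.
 */
int format_boundary_msg(char *buf, size_t buflen, unsigned int cur,
			unsigned int item_size)
{
	assert(buf != NULL);
	return snprintf(buf, buflen, "have %zu boundary %u",
			cur + sizeof(struct dir_item_sketch), item_size);
}
```

Building the same function with "%lu" for a 32-bit target and -Wformat should
reproduce the diagnostic quoted above.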

> 
> Fixes: 005887f2e3e0 ("btrfs: tree-checker: Add checker for dir item")

Reviewed-by: Qu Wenruo <w...@suse.com>

Thanks,
Qu

> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> ---
>  fs/btrfs/tree-checker.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 66dac0a4b01f..7c55e3ba5a6c 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -270,7 +270,7 @@ static int check_dir_item(struct btrfs_root *root,
>   /* header itself should not cross item boundary */
>   if (cur + sizeof(*di) > item_size) {
>   dir_item_err(root, leaf, slot,
> - "dir item header crosses item boundary, have %lu boundary %u",
> + "dir item header crosses item boundary, have %zu boundary %u",
>   cur + sizeof(*di), item_size);
>   return -EUCLEAN;
>   }
> 





Re: [GIT PULL] Btrfs changes for 4.15

2017-11-13 Thread Qu Wenruo
pression level for zlib
> 
> Goldwyn Rodrigues (1):
>   btrfs: cleanup extent locking sequence
> 
> Gu JinXiang (2):
>   btrfs: Use bd_dev to generate index when dev_state_hashtable add items.
>   btrfs: Fix bug for misused dev_t when lookup in dev state hash table.
> 
> Hans van Kranenburg (1):
>   btrfs: prefix sysfs attribute struct names
> 
> Josef Bacik (22):
>   btrfs: change how we decide to commit transactions during flushing
>   btrfs: fix send ioctl on 32bit with 64bit kernel
>   btrfs: add ref-verify mount option
>   btrfs: pass root to various extent ref mod functions
>   Btrfs: add a extent ref verify tool
>   Btrfs: only check delayed ref usage in should_end_transaction
>   btrfs: add a helper to return a head ref
>   btrfs: move extent_op cleanup to a helper
>   btrfs: breakout empty head cleanup to a helper
>   btrfs: move ref_mod modification into the if (ref) logic
>   btrfs: move all ref head cleanup to the helper function
>   btrfs: remove delayed_ref_node from ref_head
>   btrfs: remove type argument from comp_tree_refs
>   btrfs: add assertions for releasing trans handle reservations
>   Btrfs: rework outstanding_extents
>   btrfs: add tracepoints for outstanding extents mods
>   btrfs: make the delalloc block rsv per inode
>   btrfs: switch args for comp_*_refs
>   btrfs: add a comp_refs() helper
>   btrfs: track refs in a rb_tree instead of a list
>   btrfs: don't call btrfs_start_delalloc_roots in flushoncommit
>   btrfs: move btrfs_truncate_block out of trans handle
> 
> Kuanling Huang (1):
>   Btrfs: send, apply asynchronous page cache readahead to enhance page 
> read
> 
> Liu Bo (13):
>   Btrfs: remove batch plug in run_scheduled_IO
>   Btrfs: move finish_wait out of the loop
>   Btrfs: use wait_event instead of a single function
>   Btrfs: protect conditions within root->log_mutex while waiting
>   Btrfs: search parity device wisely
>   Btrfs: do not async submit for nodatasum inodes
>   Btrfs: make plug in writing meta blocks really work
>   Btrfs: remove bio_flags which indicates a meta block of log-tree
>   Btrfs: fix confusing worker helper info in stacktrace
>   Btrfs: fix memory leak in raid56
>   Btrfs: remove nr_async_bios
>   Btrfs: do not make defrag wait on async_delalloc_pages
>   Btrfs: remove nr_async_submits and async_submit_draining
> 
> Nikolay Borisov (11):
>   btrfs: Remove redundant forward declarations
>   btrfs: Remove unused variable
>   btrfs: Remove unused parameters from various functions
>   btrfs: Remove unused arguments from btrfs_changed_cb_t
>   btrfs: Remove unused parameter from check_direct_IO
>   btrfs: Rework error handling of add_extent_mapping in 
> __btrfs_alloc_chunk
>   btrfs: Remove redundant argument of __link_block_group
>   btrfs: Explicitly handle btrfs_update_root failure
>   btrfs: Refactor transaction handling in received subvolume ioctl
>   btrfs: Replace opencoded sizes with their symbolic constants
>   btrfs: send: remove unused code
> 
> Omar Sandoval (2):
>   Btrfs: make some volumes.c functions static
>   Btrfs: fix __user casting in ioctl.c
> 
> Qu Wenruo (9):
>   btrfs: Refactor check_leaf function for later expansion
>   btrfs: Check if item pointer overlaps with the item itself
>   btrfs: Add sanity check for EXTENT_DATA when reading out leaf
>   btrfs: Add checker for EXTENT_CSUM
>   btrfs: Move leaf and node validation checker to tree-checker.c
>   btrfs: tree-checker: Enhance btrfs_check_node output
>   btrfs: tree-checker: Enhance output for btrfs_check_leaf
>   btrfs: tree-checker: Enhance output for check_csum_item
>   btrfs: tree-checker: Enhance output for check_extent_data_item
> 
> Rakesh Pandit (1):
>   btrfs: use appropriate replacements for __sb_{start,end}_write calls
> 
> Satoru Takeuchi (1):
>   btrfs: convert all mount option checking code to use btrfs_test_opt
> 
> Thomas Meyer (1):
>   btrfs: Fix bool initialization/comparison
> 
> Timofey Titovets (9):
>   Btrfs: cleanup 'start' subtraction from try uncompressed inline extent
>   Btrfs: compress_file_range remove dead variable num_bytes
>   Btrfs: compression: separate heuristic/compression workspaces
>   Btrfs: heuristic: add bucket and sample counters and other defines
>   Btrfs: heuristic: implement sampling logic
>   Btrfs: heuristic: add detection of repeated data patterns
>   Btrfs: heuristic: add byte set calculation
>   Btrfs: heuristic: add byte co

Re: [PATCH] btrfs: tests: Fix a memory leak in error handling path in 'run_test()'

2017-09-10 Thread Qu Wenruo



On 2017-09-10 19:19, Christophe JAILLET wrote:

If 'btrfs_alloc_path()' fails, we must free the resources already
allocated, as done in the other error handling paths in this function.

Signed-off-by: Christophe JAILLET <christophe.jail...@wanadoo.fr>


Reviewed-by: Qu Wenruo <quwenruo.bt...@gmx.com>

BTW, I also checked all btrfs_alloc_path() callers in the self tests; no
other leak of this kind remains.


Thanks,
Qu

---
  fs/btrfs/tests/free-space-tree-tests.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tests/free-space-tree-tests.c b/fs/btrfs/tests/free-space-tree-tests.c
index 1458bb0ea124..8444a018cca2 100644
--- a/fs/btrfs/tests/free-space-tree-tests.c
+++ b/fs/btrfs/tests/free-space-tree-tests.c
@@ -500,7 +500,8 @@ static int run_test(test_func_t test_func, int bitmaps, u32 sectorsize,
 	path = btrfs_alloc_path();
 	if (!path) {
 		test_msg("Couldn't allocate path\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	ret = add_block_group_free_space(&trans, root->fs_info, cache);
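
For reference, the pattern the patch restores can be sketched in userspace as
below. This is a hypothetical miniature, not the real run_test(): the
allocation helpers and the live_allocs counter are made up so that the effect
of the cleanup path can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Counts live allocations so a leak shows up as a non-zero value. */
int live_allocs;

static void *tracked_alloc(size_t size, int fail)
{
	void *p = fail ? NULL : malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		assert(live_allocs > 0);
		live_allocs--;
		free(p);
	}
}

/*
 * Two resources are acquired in order; every failure after the first
 * acquisition leaves through "out" so that earlier resources are freed.
 */
int run_test_sketch(int fail_second)
{
	void *root, *path = NULL;
	int ret = 0;

	root = tracked_alloc(64, 0);
	if (!root)
		return -ENOMEM;	/* nothing allocated yet, plain return is fine */

	path = tracked_alloc(32, fail_second);
	if (!path) {
		ret = -ENOMEM;
		goto out;	/* a bare return here would leak root */
	}

	/* ... the test body would run here ... */
out:
	tracked_free(path);
	tracked_free(root);
	return ret;
}
```

With the bare return in place of the goto, live_allocs would stay at 1 after
a failed second allocation.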




Re: [PATCH 00/17] fs, btrfs refcount conversions

2017-03-07 Thread Qu Wenruo



At 03/07/2017 03:41 PM, Reshetova, Elena wrote:

At 03/06/2017 05:43 PM, Reshetova, Elena wrote:



At 03/03/2017 04:55 PM, Elena Reshetova wrote:

Now that the new refcount_t type and API are finally merged
(see include/linux/refcount.h), the following
patches convert various refcounters in the btrfs filesystem from atomic_t
to refcount_t. By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.
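
For reference, the difference from a plain atomic counter can be sketched in
userspace C11 as below. This is a hypothetical model with made-up names, and
the kernel's refcount_t differs in detail (for example, it warns where this
sketch only returns false). The point is that the counter refuses to go from
0 back to 1, refuses to drop below 0, and saturates instead of wrapping.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's refcount_t. */
struct ref_sketch {
	atomic_uint refs;
};

#define REF_SATURATED UINT_MAX

/*
 * Take a reference. Refuses to resurrect an object whose count is 0 and
 * pins the counter at REF_SATURATED rather than wrapping around to 0.
 */
bool ref_sketch_inc(struct ref_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* would be a use-after-free */
		if (old == REF_SATURATED)
			return true;	/* stay saturated: leak, do not wrap */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/*
 * Drop a reference. Returns true only for the caller that reaches zero;
 * an underflow attempt (count already 0) is rejected instead of wrapping.
 */
bool ref_sketch_dec_and_test(struct ref_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == REF_SATURATED)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

A plain atomic decrement on a counter that is already 0 would instead wrap to
a huge value and hide the extra put.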

The below patches are fully independent and can be cherry-picked separately.
Since we convert all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them for sending at least in some
fashion to be manageable. Please excuse the long cc list.

These patches have been tested with xfstests by running the btrfs-related tests.
btrfs debug was enabled, along with warnings on refcount errors. No output related to
refcount errors was produced. However, the following problems were seen during the run:
 * tests btrfs/078, btrfs/114, btrfs/115: no errors anywhere in dmesg, but the
 process hangs. They all seem to be around qgroup; sometimes an error such as
 "qgroup scan failed -4" is visible before it blocks, but not always.


How reproducible is the hang?


Always in my environment, but I would not spend much time investigating why it
happens if it works for you.
My test environment is far from ideal: I am testing in a VM with rather old
userspace and a couple of additional changes in,
so there are many things that can potentially go wrong. Anyway, the strace for
078 is in the attachment.

Thanks for the strace.

However, "-f" was not passed to strace, so it doesn't contain much useful info.



If the patches pass all tests on your side, could you please take them in and
propagate further?
I will continue with other kernel subsystems.


The patchset itself looks like a common cleanup, but I did hit
several cases (almost all scrub tests) that trigger kernel warnings due to
underflow.


Oh, could you please send me the warning outputs? I can hopefully analyze and 
fix them.


Attached. It was generated by running the btrfs/070 test case.
I canceled the test almost immediately, so there is not much output, but
it still contains enough info.


Both refcount_inc() and refcount_sub_and_test() are triggering warnings.

So now I'm not sure which is the cause: btrfs, or bad use of refcount?

Thanks,
Qu



Best Regards,
Elena.



So I'm afraid the patchset will not be merged until we fix all the
underflows.

But thanks for the patchset, it helps us expose a lot of problems.

Thanks,
Qu



Best Regards,
Elena.




I also see the -EINTR output, but that seems to be designed for
btrfs/11[45].

btrfs/078 is unrelated to qgroup, and all three of these tests pass in my
test environment, which is v4.11-rc1 with your patches applied.

I ran these 3 tests in a row with the default and space_cache=v2 mount
options, 5 times for each mount option, with no hang at all.

It would help a lot if more info could be provided, from the blocked process
backtrace to the test mount options to the base commit.

Thanks,
Qu


 * test btrfs/104 dmesg has additional error output:
 BTRFS warning (device vdc): qgroup 258 reserved space underflow, have: 0,
 to free: 4096
 I tried looking at the code on what causes the failure, but could not figure
 it out. It doesn't seem to be related to any refcount changes at least IMO.

The above test failures are hard for me to understand and interpret, but
they don't seem to relate to the refcount conversions.

Elena Reshetova (17):
  fs, btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_transaction.use_count from atomic_t to
refcount_t
  fs, btrfs: convert extent_map.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_ordered_extent.refs from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_caching_control.count from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_root.refs from atomic_t to refcount_t
  fs, btrfs: convert extent_state.refs from atomic_t to refcount_t
  fs, btrfs: convert compressed_bio.pending_bios from atomic_t to
refcount_t
  fs, btrfs: convert scrub_recover.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_page.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_block.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_parity.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

 fs/btrfs/backref.c   |  2 +-
 fs/btrfs/compression.c   | 18 -
 fs/btrfs/ctree.h |  5 +++--
 fs/btrfs/delayed-inode.c | 46 ++--
 fs/btrfs/delayed-inode.h |  5 +++--
 fs/btrfs/delayed-ref.c 

Re: [PATCH 00/17] fs, btrfs refcount conversions

2017-03-07 Thread Qu Wenruo



At 03/07/2017 03:41 PM, Reshetova, Elena wrote:

At 03/06/2017 05:43 PM, Reshetova, Elena wrote:



At 03/03/2017 04:55 PM, Elena Reshetova wrote:

Now when new refcount_t type and API are finally merged
(see include/linux/refcount.h), the following
patches convert various refcounters in the btrfs filesystem from atomic_t
to refcount_t. By doing this we prevent intentional or accidental
underflows or overflows that can led to use-after-free vulnerabilities.

The below patches are fully independent and can be cherry-picked separately.
Since we convert all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them for sending at least in some
fashion to be manageable. Please excuse the long cc list.

These patches have been tested with xfstests by running btrfs-related tests.
btrfs debug was enabled, warns on refcount errors, too. No output related to
refcount errors produced. However, the following errors were during the run:
 * tests btrfs/078, btrfs/114, btrfs/115, no errors anywhere in dmesg, but
 process hangs. They all seem to be around qgroup, sometimes error visible
 such as qgroup scan failed -4 before it blocks, but not always.


How reproducible of the hang?


Always in  my environment, but I would not much go into investigating why it

happens, if it works for you.

My test environment is far from ideal: I am testing in VM with rather old

userspace and couple of additional changes in,

so there are many things that can potentially go wrong. Anyway the strace for

078 is in the attachment.

Thanks for the strace.

However no "-f" is passed to strace, so it doesn't contain much useful info.



If the patches pass all tests on your side, could you please take them in and

propagate further?

I will continue with other kernel subsystems.


The patchset itself looks like a common cleanup, while I did encounter
several cases (almost all scrub tests) causing kernel warning due to
underflow.


Oh, could you please send me the warning outputs? I can hopefully analyze and 
fix them.


Attached. Which is the generated by running btrfs/070 test case.
And I canceled the case almost instantly, so output is not much, but 
still contains enough info.


Both refcount_inc() and refcount_sub_and_test() are causing warning.

So now I'm not sure which is the cause, btrfs or bad use of refcount?

Thanks,
Qu



Best Regards,
Elena.



So I'm afraid the patchset will not be merged until we fix all the
underflows.

But thanks for the patchset, it helps us to expose a lot of problem.

Thanks,
Qu



Best Regards,
Elena.




I also see the -EINTR output, but that seems to be designed for
btrfs/11[45].

btrfs/078 is unrelated to qgroup, and all three of these tests pass in my
test environment, which is v4.11-rc1 with your patches applied.

I ran these 3 tests in a row with default and space_cache=v2 mount
options, and 5 times for each mount option, no hang at all.

It would help much if more info can be provided, from blocked process
backtrace to test mount option to base commit.

Thanks,
Qu


 * test btrfs/104 dmesg has additional error output:
 BTRFS warning (device vdc): qgroup 258 reserved space underflow, have: 0,
 to free: 4096
 I tried looking at the code on what causes the failure, but could not figure
 it out. It doesn't seem to be related to any refcount changes at least IMO.

The above test failures are hard for me to understand and interpret, but
they don't seem to relate to refcount conversions.

Elena Reshetova (17):
  fs, btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
  fs, btrfs: convert extent_map.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_root.refs from atomic_t to refcount_t
  fs, btrfs: convert extent_state.refs from atomic_t to refcount_t
  fs, btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
  fs, btrfs: convert scrub_recover.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_page.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_block.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_parity.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

 fs/btrfs/backref.c   |  2 +-
 fs/btrfs/compression.c   | 18 -
 fs/btrfs/ctree.h |  5 +++--
 fs/btrfs/delayed-inode.c | 46 ++--
 fs/btrfs/delayed-inode.h |  5 +++--
 fs/btrfs/delayed-ref.c 

Re: [PATCH 00/17] fs, btrfs refcount conversions

2017-03-06 Thread Qu Wenruo



At 03/06/2017 05:43 PM, Reshetova, Elena wrote:



At 03/03/2017 04:55 PM, Elena Reshetova wrote:

Now when new refcount_t type and API are finally merged
(see include/linux/refcount.h), the following
patches convert various refcounters in the btrfs filesystem from atomic_t
to refcount_t. By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.
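
To illustrate the difference described above, here is a minimal single-threaded
userspace sketch of a saturating reference counter. It is not the kernel's
refcount_t implementation: the names sketch_ref, sketch_ref_set(),
sketch_ref_inc() and sketch_ref_sub_and_test() are made up for this example,
and the real API in include/linux/refcount.h is atomic. The sketch only shows
three behaviours: an increment from zero is reported instead of silently
reviving the object, a subtraction that would underflow is reported and
refused, and a saturated counter stays pinned rather than wrapping to zero.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcount_t-like counter (not the kernel type). */
struct sketch_ref {
	unsigned int count;
	unsigned int warnings;	/* number of misuse reports raised so far */
};

#define SKETCH_SATURATED UINT_MAX

static void sketch_warn(struct sketch_ref *r, const char *what)
{
	r->warnings++;
	fprintf(stderr, "sketch_ref warning: %s\n", what);
}

static void sketch_ref_set(struct sketch_ref *r, unsigned int n)
{
	assert(r != NULL);
	r->count = n;
	r->warnings = 0;
}

/* Take a reference. A count of 0 means the object may already be freed. */
static void sketch_ref_inc(struct sketch_ref *r)
{
	if (r->count == 0) {
		sketch_warn(r, "increment on 0; possible use-after-free");
		return;
	}
	if (r->count == SKETCH_SATURATED)
		return;		/* stay pinned, never wrap around to 0 */
	r->count++;
}

/* Drop n references; return true only when the count reaches 0. */
static bool sketch_ref_sub_and_test(struct sketch_ref *r, unsigned int n)
{
	if (r->count == SKETCH_SATURATED)
		return false;	/* a saturated counter is never released */
	if (n > r->count) {
		sketch_warn(r, "underflow; possible use-after-free");
		return false;	/* leave the count untouched */
	}
	r->count -= n;
	return r->count == 0;
}
```

With a plain atomic counter the same misuse would go unnoticed: the increment
from zero would succeed and the extra decrement would wrap around. Here both
cases only bump the warning count.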

The below patches are fully independent and can be cherry-picked separately.
Since we convert all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them for sending at least in some
fashion to be manageable. Please excuse the long cc list.

These patches have been tested with xfstests by running btrfs-related tests.
btrfs debug was enabled, warns on refcount errors, too. No output related to
refcount errors was produced. However, the following errors were seen during the run:
 * tests btrfs/078, btrfs/114, btrfs/115, no errors anywhere in dmesg, but
 process hangs. They all seem to be around qgroup, sometimes error visible
 such as qgroup scan failed -4 before it blocks, but not always.


How reproducible is the hang?


Always in  my environment, but I would not much go into investigating why it 
happens, if it works for you.
My test environment is far from ideal: I am testing in VM with rather old 
userspace and couple of additional changes in,
so there are many things that can potentially go wrong. Anyway the strace for 
078 is in the attachment.


Thanks for the strace.

However no "-f" is passed to strace, so it doesn't contain much useful info.



If the patches pass all tests on your side, could you please take them in and 
propagate further?
I will continue with other kernel subsystems.


The patchset itself looks like a common cleanup, while I did encounter 
several cases (almost all scrub tests) causing kernel warning due to 
underflow.


So I'm afraid the patchset will not be merged until we fix all the 
underflows.


But thanks for the patchset, it helps us expose a lot of problems.

Thanks,
Qu



Best Regards,
Elena.




I also see the -EINTR output, but that seems to be designed for
btrfs/11[45].

btrfs/078 is unrelated to qgroup, and all three of these tests pass in my
test environment, which is v4.11-rc1 with your patches applied.

I ran these 3 tests in a row with default and space_cache=v2 mount
options, and 5 times for each mount option, no hang at all.

It would help much if more info can be provided, from blocked process
backtrace to test mount option to base commit.

Thanks,
Qu


 * test btrfs/104 dmesg has additional error output:
 BTRFS warning (device vdc): qgroup 258 reserved space underflow, have: 0,
 to free: 4096
 I tried looking at the code on what causes the failure, but could not figure
 it out. It doesn't seem to be related to any refcount changes at least IMO.

The above test failures are hard for me to understand and interpret, but
they don't seem to relate to refcount conversions.

Elena Reshetova (17):
  fs, btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
  fs, btrfs: convert extent_map.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_root.refs from atomic_t to refcount_t
  fs, btrfs: convert extent_state.refs from atomic_t to refcount_t
  fs, btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
  fs, btrfs: convert scrub_recover.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_page.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_block.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_parity.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

 fs/btrfs/backref.c   |  2 +-
 fs/btrfs/compression.c   | 18 -
 fs/btrfs/ctree.h |  5 +++--
 fs/btrfs/delayed-inode.c | 46 ++--
 fs/btrfs/delayed-inode.h |  5 +++--
 fs/btrfs/delayed-ref.c   |  8 
 fs/btrfs/delayed-ref.h   |  8 +---
 fs/btrfs/disk-io.c   |  6 +++---
 fs/btrfs/disk-io.h   |  4 ++--
 fs/btrfs/extent-tree.c   | 20 +--
 fs/btrfs/extent_io.c | 18 -
 fs/btrfs/extent_io.h |  3 ++-
 fs/btrfs/extent_map.c| 10 +-
 fs/btrfs/extent_map.h|  3 ++-
 fs/btrfs/ordered-data.c  | 20 +--
 fs/btrfs/ordered-data.h  |  2 +-
 

Re: [PATCH 00/17] fs, btrfs refcount conversions

2017-03-05 Thread Qu Wenruo



At 03/03/2017 04:55 PM, Elena Reshetova wrote:

Now when new refcount_t type and API are finally merged
(see include/linux/refcount.h), the following
patches convert various refcounters in the btrfs filesystem from atomic_t
to refcount_t. By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.

The below patches are fully independent and can be cherry-picked separately.
Since we convert all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them for sending at least in some
fashion to be manageable. Please excuse the long cc list.

These patches have been tested with xfstests by running btrfs-related tests.
btrfs debug was enabled, warns on refcount errors, too. No output related to
refcount errors was produced. However, the following errors were seen during the run:
 * tests btrfs/078, btrfs/114, btrfs/115, no errors anywhere in dmesg, but
 process hangs. They all seem to be around qgroup, sometimes error visible
 such as qgroup scan failed -4 before it blocks, but not always.


How reproducible is the hang?

I also see the -EINTR output, but that seems to be designed for 
btrfs/11[45].


btrfs/078 is unrelated to qgroup, and all three of these tests pass in my
test environment, which is v4.11-rc1 with your patches applied.


I ran these 3 tests in a row with default and space_cache=v2 mount 
options, and 5 times for each mount option, no hang at all.


It would help much if more info can be provided, from blocked process 
backtrace to test mount option to base commit.


Thanks,
Qu


 * test btrfs/104 dmesg has additional error output:
 BTRFS warning (device vdc): qgroup 258 reserved space underflow, have: 0,
 to free: 4096
 I tried looking at the code on what causes the failure, but could not figure
 it out. It doesn't seem to be related to any refcount changes at least IMO.

The above test failures are hard for me to understand and interpret, but
they don't seem to relate to refcount conversions.

Elena Reshetova (17):
  fs, btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
  fs, btrfs: convert extent_map.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_root.refs from atomic_t to refcount_t
  fs, btrfs: convert extent_state.refs from atomic_t to refcount_t
  fs, btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
  fs, btrfs: convert scrub_recover.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_page.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_block.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_parity.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

 fs/btrfs/backref.c   |  2 +-
 fs/btrfs/compression.c   | 18 -
 fs/btrfs/ctree.h |  5 +++--
 fs/btrfs/delayed-inode.c | 46 ++--
 fs/btrfs/delayed-inode.h |  5 +++--
 fs/btrfs/delayed-ref.c   |  8 
 fs/btrfs/delayed-ref.h   |  8 +---
 fs/btrfs/disk-io.c   |  6 +++---
 fs/btrfs/disk-io.h   |  4 ++--
 fs/btrfs/extent-tree.c   | 20 +--
 fs/btrfs/extent_io.c | 18 -
 fs/btrfs/extent_io.h |  3 ++-
 fs/btrfs/extent_map.c| 10 +-
 fs/btrfs/extent_map.h|  3 ++-
 fs/btrfs/ordered-data.c  | 20 +--
 fs/btrfs/ordered-data.h  |  2 +-
 fs/btrfs/raid56.c| 19 +-
 fs/btrfs/scrub.c | 42 
 fs/btrfs/transaction.c   | 20 +--
 fs/btrfs/transaction.h   |  3 ++-
 fs/btrfs/tree-log.c  |  2 +-
 fs/btrfs/volumes.c   | 10 +-
 fs/btrfs/volumes.h   |  2 +-
 include/trace/events/btrfs.h |  4 ++--
 24 files changed, 143 insertions(+), 137 deletions(-)






Re: [PATCH 00/17] fs, btrfs refcount conversions

2017-03-05 Thread Qu Wenruo



At 03/03/2017 04:55 PM, Elena Reshetova wrote:

Now when new refcount_t type and API are finally merged
(see include/linux/refcount.h), the following
patches convert various refcounters in the btrfs filesystem from atomic_t
to refcount_t. By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.

The below patches are fully independent and can be cherry-picked separately.
Since we convert all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them for sending at least in some
fashion to be manageable. Please excuse the long cc list.

These patches have been tested with xfstests by running btrfs-related tests.
btrfs debug was enabled, warns on refcount errors, too. No output related to
refcount errors was produced. However, the following errors were seen during the run:
 * tests btrfs/078, btrfs/114, btrfs/115, no errors anywhere in dmesg, but
 process hangs. They all seem to be around qgroup, sometimes error visible
 such as qgroup scan failed -4 before it blocks, but not always.


-EINTR? That's strange.

Any blocked process backtrace?


 * test btrfs/104 dmesg has additional error output:
 BTRFS warning (device vdc): qgroup 258 reserved space underflow, have: 0,
 to free: 4096


Known issue; fixes have already been sent to the mailing list but are not merged yet:
https://patchwork.kernel.org/patch/9592765/

Thanks,
Qu


 I tried looking at the code on what causes the failure, but could not figure
 it out. It doesn't seem to be related to any refcount changes at least IMO.

The above test failures are hard for me to understand and interpret, but
they don't seem to relate to refcount conversions.

Elena Reshetova (17):
  fs, btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
  fs, btrfs: convert extent_map.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_root.refs from atomic_t to refcount_t
  fs, btrfs: convert extent_state.refs from atomic_t to refcount_t
  fs, btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
  fs, btrfs: convert scrub_recover.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_page.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_block.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_parity.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

 fs/btrfs/backref.c   |  2 +-
 fs/btrfs/compression.c   | 18 -
 fs/btrfs/ctree.h |  5 +++--
 fs/btrfs/delayed-inode.c | 46 ++--
 fs/btrfs/delayed-inode.h |  5 +++--
 fs/btrfs/delayed-ref.c   |  8 
 fs/btrfs/delayed-ref.h   |  8 +---
 fs/btrfs/disk-io.c   |  6 +++---
 fs/btrfs/disk-io.h   |  4 ++--
 fs/btrfs/extent-tree.c   | 20 +--
 fs/btrfs/extent_io.c | 18 -
 fs/btrfs/extent_io.h |  3 ++-
 fs/btrfs/extent_map.c| 10 +-
 fs/btrfs/extent_map.h|  3 ++-
 fs/btrfs/ordered-data.c  | 20 +--
 fs/btrfs/ordered-data.h  |  2 +-
 fs/btrfs/raid56.c| 19 +-
 fs/btrfs/scrub.c | 42 
 fs/btrfs/transaction.c   | 20 +--
 fs/btrfs/transaction.h   |  3 ++-
 fs/btrfs/tree-log.c  |  2 +-
 fs/btrfs/volumes.c   | 10 +-
 fs/btrfs/volumes.h   |  2 +-
 include/trace/events/btrfs.h |  4 ++--
 24 files changed, 143 insertions(+), 137 deletions(-)






Re: [PATCH 1/2] btrfs: drop trace_btrfs_all_work_done() from normal_work_helper()

2016-12-21 Thread Qu Wenruo



At 12/21/2016 04:28 PM, Sebastian Andrzej Siewior wrote:

On 2016-12-21 08:33:03 [+0800], Qu Wenruo wrote:

The trace point only uses the pointer, and this helps us to pair with
btrfs_work_queued/sched.


| /* For situiations that the work is freed */
| DECLARE_EVENT_CLASS(btrfs__work__done,
|
|     TP_PROTO(struct btrfs_work *work),
|
|     TP_ARGS(work),
|
|     TP_STRUCT__entry_btrfs(
|         __field(void *, work)
|     ),
|
|     TP_fast_assign_btrfs(btrfs_work_owner(work),
|         __entry->work   = work;
|     ),
|
|     TP_printk_btrfs("work->%p", __entry->work)
| );

and btrfs_work_owner expands to:

| struct btrfs_fs_info *
| btrfs_work_owner(struct btrfs_work *work)
| {
|     return work->wq->fs_info;
| }

voilà


Oh I got it, thanks very much.

The btrfs_work_owner() is newly introduced, no wonder I didn't know that.


I think we can fix it by extracting fs_info pointer before running the 
work, and using the extracted one in the trace point.
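
The idea above can be sketched in plain userspace C. This is only an
illustration under assumptions: the sketch_* types and functions are
hypothetical stand-ins, not the btrfs structures and not the actual patch.
The helper copies the owner pointer and the work item's address out of the
work item before running the callback, so the trace step never dereferences
memory the callback may have freed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_fs_info { const char *name; };
struct sketch_wq { struct sketch_fs_info *fs_info; };
struct sketch_work {
	struct sketch_wq *wq;
	void (*func)(struct sketch_work *work);
};

/* What the trace step saw, kept so the behaviour can be checked. */
struct sketch_trace_record {
	const struct sketch_fs_info *fs_info;
	uintptr_t work_id;
};

/* The trace step takes fs_info explicitly and treats the work item as an opaque id. */
static void sketch_trace_all_work_done(struct sketch_trace_record *rec,
				       const struct sketch_fs_info *fs_info,
				       uintptr_t work_id)
{
	rec->fs_info = fs_info;
	rec->work_id = work_id;
	printf("[%s] work 0x%" PRIxPTR " done\n", fs_info->name, work_id);
}

/* A callback that frees its own work item. */
static void sketch_free_self(struct sketch_work *work)
{
	free(work);
}

static void sketch_work_helper(struct sketch_work *work,
			       struct sketch_trace_record *rec)
{
	assert(work && work->wq && work->func);

	/* Capture what the trace step needs while 'work' is still valid. */
	const struct sketch_fs_info *fs_info = work->wq->fs_info;
	uintptr_t work_id = (uintptr_t)work;

	work->func(work);	/* may free 'work'; it must not be touched afterwards */

	sketch_trace_all_work_done(rec, fs_info, work_id);
}
```

The address is stored as an integer before the callback runs, so it still
pairs with earlier "queued" or "sched" records without reading the freed
object.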


Thanks,
Qu





But I still don't understand why the backtrace is triggered,
since we're just recording a pointer, not touching it.

Would you please explain in more detail how it triggers the
problem?


I enabled all events and played with the fs, which was just an upgrade and a git
tree sync + checkout, so nothing special.



So I think we should either remove the tracepoint completely or change
the arguments to take something else than a potentially freed 'work'.


I'm mostly OK with removing the tracepoint, but such an all_work_done() trace
should still help to determine whether a workqueue has stalled.

Thanks,
Qu


Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html







Re: [PATCH 1/2] btrfs: drop trace_btrfs_all_work_done() from normal_work_helper()

2016-12-20 Thread Qu Wenruo



At 12/21/2016 01:26 AM, David Sterba wrote:

Adding Qu to CC,

On Wed, Dec 14, 2016 at 03:05:29PM +0100, Sebastian Andrzej Siewior wrote:

For btrfs_scrubparity_helper() the ->func() is set to
scrub_parity_bio_endio_worker(). This function invokes
scrub_free_parity() which kfree()s the `work' object. All is good as
long as trace events are not enabled; once they are, we go boom with a
backtrace like this:
| Workqueue: btrfs-endio btrfs_endio_helper
| RIP: 0010:[]  [] trace_event_raw_event_btrfs__work__done+0x4e/0xa0
| Call Trace:
|  [] btrfs_scrubparity_helper+0x59d/0x780
|  [] btrfs_endio_helper+0x9/0x10
|  [] process_one_work+0x26e/0x7b0
|  [] worker_thread+0x46/0x560
|  [] kthread+0xee/0x110
|  [] ret_from_fork+0x2a/0x40

So in order to avoid this, I remove the trace point.

Signed-off-by: Sebastian Andrzej Siewior 
---
 fs/btrfs/async-thread.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index e0f071f6b5a7..d0dfc3d2e199 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -318,8 +318,6 @@ static void normal_work_helper(struct btrfs_work *work)
set_bit(WORK_DONE_BIT, &work->flags);
run_ordered_work(wq);
}
-   if (!need_order)
-   trace_btrfs_all_work_done(work);


The comment in the function says we can't touch 'work' after the
callbacks. I don't see any way to use it in a tracepoint here. The
"all_work_done" pairs with a preceding trace_btrfs_work_sched in the
same function or from within run_ordered_work, also called after the
free callback.


The trace point only uses the pointer, and this helps us to pair with 
btrfs_work_queued/sched.


But I still don't understand why the backtrace is triggered,
since we're just recording a pointer, not touching it.

Would you please explain in more detail how it triggers 
the problem?




So I think we should either remove the tracepoint completely or change
the arguments to take something else than a potentially freed 'work'.


I'm mostly OK with removing the tracepoint, but an all_work_done() trace 
would still help to determine whether a workqueue has stalled.


Thanks,
Qu



I'm a bit puzzled by the comment in trace/events/btrfs.h

http://lxr.free-electrons.com/source/include/trace/events/btrfs.h#L1165

/* For situiations that the work is freed */
DECLARE_EVENT_CLASS(btrfs__work__done,

so we're expecting a freed pointer anyway? That sounds wrong.

I'll queue the patch for 4.10 as it fixes a crash.







Re: [PATCH] f2fs: support multiple devices

2016-11-09 Thread Qu Wenruo



At 11/10/2016 06:57 AM, Andreas Dilger wrote:

On Nov 9, 2016, at 1:56 PM, Jaegeuk Kim  wrote:


This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.

Internal block management is very simple, but we will modify block
allocation and background GC policy to boost IO speed by exploiting them
according to each device's speed.


How will you integrate this into FIEMAP? If a file is split across
multiple devices, FIEMAP will now return ambiguous block numbers for
that file.  I've been meaning to merge the FIEMAP handling in
Lustre to support multiple devices in a single filesystem, so that this
can be detected in userspace.

struct ll_fiemap_extent {
	__u64 fe_logical;	/* logical offset in bytes for the start of
				 * the extent from the beginning of the file
				 */
	__u64 fe_physical;	/* physical offset in bytes for the start
				 * of the extent from the beginning of the disk
				 */
	__u64 fe_length;	/* length in bytes for this extent */
	__u64 fe_reserved64[2];
	__u32 fe_flags;		/* FIEMAP_EXTENT_* flags for this extent */
	__u32 fe_device;	/* device number for this extent */
	__u32 fe_reserved[2];
};


Btrfs introduces a new layer for multi-device support (used even for a single device).

So the fiemap returned by btrfs is never the real device bytenr, but a logical 
address in the btrfs logical address space.

Much like traditional soft RAID.



This adds the 32-bit "fe_device" field, which would optionally be filled
in by the filesystem (zero otherwise).  It would return the kernel device
number (i.e. st_dev), or for network filesystem (with FIEMAP_EXTENT_NET
set) this could just return an integer device number since the device
number is meaningless (and may conflict) on a remote system.

Since AFAIK Btrfs also has multiple device support there are an increasing
number of places where this would be useful.
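
To make the "fe_device" idea concrete, here is a rough userspace sketch of how a filesystem-wide block address could be resolved to a device number plus a device-relative block, following the same range lookup as f2fs_target_device() in the patch. The names struct dev_range, struct dev_location and resolve_block() are made up for illustration and are not part of f2fs or the FIEMAP interface:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device block range, loosely modelled on FDEV(i) in the patch. */
struct dev_range {
	uint32_t dev_id;	/* value a filesystem could report in fe_device */
	uint64_t start_blk;	/* first block of the device, inclusive */
	uint64_t end_blk;	/* last block of the device, inclusive */
};

/* Result of resolving a filesystem-wide block address. */
struct dev_location {
	uint32_t dev_id;
	uint64_t dev_blk;	/* block offset within that device */
};

/*
 * Map a filesystem-wide block address to a device and a device-relative
 * block. Returns 0 on success, -1 if no range covers the address.
 */
static int resolve_block(const struct dev_range *devs, int ndevs,
			 uint64_t blk, struct dev_location *out)
{
	assert(devs && out);

	for (int i = 0; i < ndevs; i++) {
		if (devs[i].start_blk <= blk && blk <= devs[i].end_blk) {
			out->dev_id = devs[i].dev_id;
			out->dev_blk = blk - devs[i].start_blk;
			return 0;
		}
	}
	return -1;
}
```

With a device number next to each extent, userspace can tell apart two extents that would otherwise report the same physical offset on different devices.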


AFAIK, btrfs has its own multi-device support because of scrub, which relies on the data/metadata csum.

Unlike device-mapper based multi-device, btrfs has csums, so it can detect 
which mirror is correct.

This makes btrfs scrub a little better than soft RAID.
For example, for RAID1, if two mirrors differ from each other, btrfs can 
find the correct one and rewrite it into the other mirror.
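
The RAID1 case can be sketched in a few lines of userspace C. This is only an illustration of the principle, not btrfs code: toy_csum() is an FNV-1a hash standing in for the real checksum, and scrub_mirrors() and BLOCK_SIZE are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Toy checksum (32-bit FNV-1a), standing in for the filesystem's real csum. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 2166136261u;

	for (size_t i = 0; i < len; i++)
		sum = (sum ^ data[i]) * 16777619u;
	return sum;
}

/*
 * Check both mirror copies against the stored checksum and repair the
 * bad one from the good one.
 * Returns 0 if both copies were good, 1 if one copy was repaired,
 * -1 if neither copy matches the checksum (unrecoverable).
 */
static int scrub_mirrors(uint8_t *m0, uint8_t *m1, uint32_t stored_csum)
{
	int ok0, ok1;

	assert(m0 && m1);
	ok0 = toy_csum(m0, BLOCK_SIZE) == stored_csum;
	ok1 = toy_csum(m1, BLOCK_SIZE) == stored_csum;

	if (ok0 && ok1)
		return 0;
	if (ok0) {
		memcpy(m1, m0, BLOCK_SIZE);	/* rewrite the bad mirror */
		return 1;
	}
	if (ok1) {
		memcpy(m0, m1, BLOCK_SIZE);
		return 1;
	}
	return -1;
}
```

A plain soft RAID1 without checksums can see that the two copies differ, but has no independent way to tell which one is right.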


Furthermore, btrfs supports snapshots, and they are faster than 
device-mapper based snapshots (LVM).
This makes it a little more worthwhile to implement multi-device support in 
btrfs.



But f2fs has no data csum and no snapshots.
I don't really see the point of adding so much code to implement it, 
especially when we can use mdadm or LVM instead.



Not to mention that btrfs multi-device support still has quite a lot of bugs, 
for example scrub can corrupt correct data stripes.


Personally speaking, I am not a fan of btrfs multi-device management, 
despite the above advantages.

The complexity is really not worth it.
(So I think XFS with LVM is much better than Btrfs when it comes to 
stability.)


Thanks,
Qu


Cheers, Andreas



Signed-off-by: Jaegeuk Kim 
---
fs/f2fs/data.c  |  55 ---
fs/f2fs/f2fs.h  |  29 --
fs/f2fs/segment.c   | 119 +
fs/f2fs/super.c | 138 ++--
include/linux/f2fs_fs.h |  10 +++-
5 files changed, 277 insertions(+), 74 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 47ded0c..e2be24e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -88,6 +88,46 @@ static void f2fs_write_end_io(struct bio *bio)
}

/*
+ * Return true, if pre_bio's bdev is same as its target device.
+ */
+struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi,
+   block_t blk_addr, struct bio *bio)
+{
+   struct block_device *bdev = sbi->sb->s_bdev;
+   int i;
+
+   for (i = 0; i < sbi->s_ndevs; i++) {
+   if (FDEV(i).start_blk <= blk_addr &&
+   FDEV(i).end_blk >= blk_addr) {
+   blk_addr -= FDEV(i).start_blk;
+   bdev = FDEV(i).bdev;
+   break;
+   }
+   }
+   if (bio) {
+   bio->bi_bdev = bdev;
+   bio->bi_iter.bi_sector = SECTOR_FROM_BLOCK(blk_addr);
+   }
+   return bdev;
+}
+
+int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr)
+{
+   int i;
+
+   for (i = 0; i < sbi->s_ndevs; i++)
+   if (FDEV(i).start_blk <= blkaddr && FDEV(i).end_blk >= blkaddr)
+   return i;
+   return 0;
+}
+
+static bool __same_bdev(struct f2fs_sb_info *sbi,
+   block_t blk_addr, struct bio *bio)
+{
+   return f2fs_target_device(sbi, blk_addr, NULL) == bio->bi_bdev;
+}
+
+/*
 * Low-level block read/write IO 

Re: [PATCH] Btrfs: remove unnecessary code of chunk_root assignment in btrfs_read_chunk_tree.

2016-09-08 Thread Qu Wenruo


At 09/05/2016 09:19 AM, Zhao Lei wrote:
> Hi, Sean Fu
>
>> From: Sean Fu [mailto:fxinr...@gmail.com]
>> Sent: Sunday, September 04, 2016 7:54 PM
>> To: dste...@suse.com
>> Cc: c...@fb.com; anand.j...@oracle.com; fdman...@suse.com;
>> zhao...@cn.fujitsu.com; linux-bt...@vger.kernel.org;
>> linux-kernel@vger.kernel.org; Sean Fu 
>> Subject: [PATCH] Btrfs: remove unnecessary code of chunk_root assignment in
>> btrfs_read_chunk_tree.
>>
>> The input argument root is already set with "fs_info->chunk_root".
>> "chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info)" in caller
>> "open_ctree".
>> “root->fs_info = fs_info” in "btrfs_alloc_root".
>>
> The root argument of this function means "any root",
> and the function is designed to get the chunk root from
> "any root" at the top of the function.
>
> Since there is only one caller of this function,
> and the caller always passes chunk_root as the root argument in
> the current code, we can remove the above conversion.
> I also suggest renaming root to chunk_root to make it clear,
> something like:
>
> - btrfs_read_chunk_tree(struct btrfs_root *root)
> + btrfs_read_chunk_tree(struct btrfs_root *chunk_root)

Since root is only used to get fs_info->chunk_root, why not use fs_info 
directly?

Thanks,
Qu

>
> Thanks
> Zhaolei
>
>> Signed-off-by: Sean Fu 
>> ---
>>  fs/btrfs/volumes.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 366b335..384a6d2 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6600,8 +6600,6 @@ int btrfs_read_chunk_tree(struct btrfs_root *root)
>>  int ret;
>>  int slot;
>>
>> -root = root->fs_info->chunk_root;
>> -
>>  path = btrfs_alloc_path();
>>  if (!path)
>>  return -ENOMEM;
>> --
>> 2.6.2
>>
>
>
>
>
>





Re: [PATCH] Btrfs: remove unnecessary code of chunk_root assignment in btrfs_read_chunk_tree.

2016-09-06 Thread Qu Wenruo



At 09/07/2016 09:38 AM, Sean Fu wrote:

On Mon, Sep 05, 2016 at 03:56:41PM +0800, Qu Wenruo wrote:



At 09/05/2016 09:19 AM, Zhao Lei wrote:

Hi, Sean Fu


From: Sean Fu [mailto:fxinr...@gmail.com]
Sent: Sunday, September 04, 2016 7:54 PM
To: dste...@suse.com
Cc: c...@fb.com; anand.j...@oracle.com; fdman...@suse.com;
zhao...@cn.fujitsu.com; linux-bt...@vger.kernel.org;
linux-kernel@vger.kernel.org; Sean Fu <fxinr...@gmail.com>
Subject: [PATCH] Btrfs: remove unnecessary code of chunk_root assignment in
btrfs_read_chunk_tree.

The input argument root is already set with "fs_info->chunk_root".
"chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info)" in caller
"open_ctree".
“root->fs_info = fs_info” in "btrfs_alloc_root".


The root argument of this function means "any root",
and the function is designed to get the chunk root from
"any root" at the top of the function.

Since there is only one caller of this function,
and the caller always passes chunk_root as the root argument in
the current code, we can remove the above conversion.
I also suggest renaming root to chunk_root to make it clear,
something like:

- btrfs_read_chunk_tree(struct btrfs_root *root)
+ btrfs_read_chunk_tree(struct btrfs_root *chunk_root)


Since root is only used to get fs_info->chunk_root, why not use fs_info
directly?

Sorry for the late reply.
chunk_root is processed in btrfs_read_chunk_tree.
Why should we pass fs_info directly to btrfs_read_chunk_tree?
Could you give me more details?

Many thanks


Normally we should only pass a btrfs_root as a parameter if it's a 
file/log/relocation tree, which can't be grabbed directly from fs_info.


For system-wide trees that are already in fs_info, like 
fs_info->extent_root/chunk_root/..., we should pass fs_info.


That is much safer than passing a btrfs_root:
a careless caller can pass the wrong tree and cause undefined behavior.

Passing fs_info also makes callers more aware of what they really want to do.
Cases that just grab sectorsize/nodesize shouldn't need a full 
btrfs_root.

(Jeff's patchset has already done such things quite well)
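
The difference between the two calling conventions can be sketched like this. The structures are heavily reduced, hypothetical stand-ins (struct root, struct fs_info, read_chunk_tree_by_root(), read_chunk_tree() and get_nodesize() are illustration names only), with tree ids 3 and 2 used for the chunk and extent trees:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the btrfs structures. */
struct fs_info;

struct root {
	struct fs_info *fs_info;
	int objectid;		/* which tree this root describes */
};

struct fs_info {
	struct root *chunk_root;
	struct root *extent_root;
	unsigned int nodesize;
};

enum { EXTENT_TREE_ID = 2, CHUNK_TREE_ID = 3 };

/*
 * Root-based style: the function works on whatever tree it is given, so
 * correctness depends on every caller passing the right root.
 * Returns the id of the tree it would read.
 */
static int read_chunk_tree_by_root(struct root *root)
{
	assert(root);
	return root->objectid;
}

/*
 * fs_info-based style: the function looks up the system-wide tree itself,
 * so a caller cannot hand it the wrong one.
 */
static int read_chunk_tree(struct fs_info *fs_info)
{
	assert(fs_info && fs_info->chunk_root);
	return fs_info->chunk_root->objectid;
}

/* Helpers that only need global parameters take fs_info as well. */
static unsigned int get_nodesize(const struct fs_info *fs_info)
{
	assert(fs_info);
	return fs_info->nodesize;
}
```

In the root-based variant nothing stops a caller from passing extent_root by mistake; in the fs_info-based variant that mistake cannot be expressed.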

Thanks,
Qu



Thanks,
Qu



Thanks
Zhaolei


Signed-off-by: Sean Fu <fxinr...@gmail.com>
---
fs/btrfs/volumes.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 366b335..384a6d2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6600,8 +6600,6 @@ int btrfs_read_chunk_tree(struct btrfs_root *root)
int ret;
int slot;

-   root = root->fs_info->chunk_root;
-
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
--
2.6.2













Re: [PATCH] Btrfs: remove unnecessary code of chunk_root assignment in btrfs_read_chunk_tree.

2016-09-05 Thread Qu Wenruo



At 09/05/2016 09:19 AM, Zhao Lei wrote:

Hi, Sean Fu


From: Sean Fu [mailto:fxinr...@gmail.com]
Sent: Sunday, September 04, 2016 7:54 PM
To: dste...@suse.com
Cc: c...@fb.com; anand.j...@oracle.com; fdman...@suse.com;
zhao...@cn.fujitsu.com; linux-bt...@vger.kernel.org;
linux-kernel@vger.kernel.org; Sean Fu 
Subject: [PATCH] Btrfs: remove unnecessary code of chunk_root assignment in
btrfs_read_chunk_tree.

The input argument root is already set with "fs_info->chunk_root".
"chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info)" in caller
"open_ctree".
“root->fs_info = fs_info” in "btrfs_alloc_root".


The root argument of this function means "any root",
and the function is designed to get the chunk root from
"any root" at the top of the function.

Since there is only one caller of this function,
and the caller always passes chunk_root as the root argument in
the current code, we can remove the above conversion.
I also suggest renaming root to chunk_root to make it clear,
something like:

- btrfs_read_chunk_tree(struct btrfs_root *root)
+ btrfs_read_chunk_tree(struct btrfs_root *chunk_root)


Since root is only used to get fs_info->chunk_root, why not use fs_info 
directly?


Thanks,
Qu



Thanks
Zhaolei


Signed-off-by: Sean Fu 
---
 fs/btrfs/volumes.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 366b335..384a6d2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6600,8 +6600,6 @@ int btrfs_read_chunk_tree(struct btrfs_root *root)
int ret;
int slot;

-   root = root->fs_info->chunk_root;
-
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
--
2.6.2












Re: use-after-free in perf_trace_btrfs__work

2016-01-21 Thread Qu Wenruo



Chris Mason wrote on 2016/01/21 12:06 -0500:

On Thu, Jan 14, 2016 at 10:07:31PM -0500, Dave Jones wrote:

I just hit a bunch of instances of this spew..
This is on Linus' tree from a few hours ago

==
BUG: KASAN: use-after-free in perf_trace_btrfs__work+0x1b1/0x2a0 [btrfs] at addr 8800b7ea2e60
Read of size 8 by task trinity-c14/6745
=
BUG kmalloc-256 (Not tainted): kasan: bad access detected
-

Disabling lock debugging due to kernel taint
INFO: Allocated in btrfs_wq_submit_bio+0xd1/0x300 [btrfs] age=63 cpu=1 pid=6745
___slab_alloc.constprop.70+0x4de/0x580
__slab_alloc.isra.67.constprop.69+0x48/0x80
kmem_cache_alloc_trace+0x24c/0x2e0
btrfs_wq_submit_bio+0xd1/0x300 [btrfs]
btrfs_submit_bio_hook+0x118/0x260 [btrfs]
neigh_sysctl_register+0x201/0x360
devinet_sysctl_register+0x73/0xe0
inetdev_init+0x119/0x1f0
inetdev_event+0x5b3/0x7e0
notifier_call_chain+0x4e/0xd0
raw_notifier_call_chain+0x16/0x20
call_netdevice_notifiers_info+0x3d/0x70
register_netdevice+0x62d/0x730
register_netdev+0x1a/0x30
loopback_net_init+0x5d/0xd0
ops_init+0x5b/0x1e0
INFO: Freed in run_one_async_free+0x12/0x20 [btrfs] age=177 cpu=1 pid=8018
__slab_free+0x19e/0x2d0
kfree+0x24e/0x270
run_one_async_free+0x12/0x20 [btrfs]
btrfs_scrubparity_helper+0x38d/0x740 [btrfs]
btrfs_worker_helper+0xe/0x10 [btrfs]
process_one_work+0x417/0xa40
worker_thread+0x8b/0x730
kthread+0x199/0x1c0
ret_from_fork+0x3f/0x70
INFO: Slab 0xea0002dfa800 objects=28 used=28 fp=0x  (null) flags=0x40004080
INFO: Object 0x8800b7ea2da0 @offset=11680 fp=0x8800b7ea2480


static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
				      struct btrfs_work *work)
{
	unsigned long flags;

	work->wq = wq;
	thresh_queue_hook(wq);
	if (work->ordered_func) {
		spin_lock_irqsave(&wq->list_lock, flags);
		list_add_tail(&work->ordered_list, &wq->ordered_list);
		spin_unlock_irqrestore(&wq->list_lock, flags);
	}
	queue_work(wq->normal_wq, &work->normal_work);
	trace_btrfs_work_queued(work);
}

Qu, 'work' can be freed before queue_work() returns.  I don't see any reason
to have the tracepoint after the queue_work() call, do you?

-chris



Right, trace_btrfs_work_queued() should be called at the very beginning.

I'll submit the fix soon.
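
A rough userspace sketch of that ordering, assuming hypothetical stand-ins (struct work, trace_work_queued(), queue_and_consume() and queue_work_traced() are not the kernel functions). The queue helper models the worst case, where the work item has already run and been freed by the time the call returns, so the trace must read it beforehand:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for btrfs_work. */
struct work { int id; };

/* State recorded by the sketch's trace point. */
static int last_traced_id = -1;
static uintptr_t last_traced_addr;

/* The trace point reads fields of 'work', so 'work' must still be alive. */
static void trace_work_queued(const struct work *work)
{
	assert(work);
	last_traced_id = work->id;
	last_traced_addr = (uintptr_t)work;
}

/*
 * Worst-case stand-in for queue_work(): the work item is executed and
 * freed before this function returns to the caller.
 */
static void queue_and_consume(struct work *work)
{
	free(work);
}

static void queue_work_traced(struct work *work)
{
	trace_work_queued(work);	/* we still own 'work' here */
	queue_and_consume(work);	/* ownership is handed off */
	/* 'work' must not be touched past this point. */
}
```

Swapping the two calls in queue_work_traced() would reproduce the pattern KASAN reported above.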

Thanks,
Qu




Re: use-after-free in perf_trace_btrfs__work

2016-01-21 Thread Qu Wenruo



Chris Mason wrote on 2016/01/21 12:06 -0500:

On Thu, Jan 14, 2016 at 10:07:31PM -0500, Dave Jones wrote:

I just hit a bunch of instances of this spew..
This is on Linus' tree from a few hours ago

==
BUG: KASAN: use-after-free in perf_trace_btrfs__work+0x1b1/0x2a0 [btrfs] at 
addr 8800b7ea2e60
Read of size 8 by task trinity-c14/6745
=
BUG kmalloc-256 (Not tainted): kasan: bad access detected
-

Disabling lock debugging due to kernel taint
INFO: Allocated in btrfs_wq_submit_bio+0xd1/0x300 [btrfs] age=63 cpu=1 pid=6745
___slab_alloc.constprop.70+0x4de/0x580
__slab_alloc.isra.67.constprop.69+0x48/0x80
kmem_cache_alloc_trace+0x24c/0x2e0
btrfs_wq_submit_bio+0xd1/0x300 [btrfs]
btrfs_submit_bio_hook+0x118/0x260 [btrfs]
neigh_sysctl_register+0x201/0x360
devinet_sysctl_register+0x73/0xe0
inetdev_init+0x119/0x1f0
inetdev_event+0x5b3/0x7e0
notifier_call_chain+0x4e/0xd0
raw_notifier_call_chain+0x16/0x20
call_netdevice_notifiers_info+0x3d/0x70
register_netdevice+0x62d/0x730
register_netdev+0x1a/0x30
loopback_net_init+0x5d/0xd0
ops_init+0x5b/0x1e0
INFO: Freed in run_one_async_free+0x12/0x20 [btrfs] age=177 cpu=1 pid=8018
__slab_free+0x19e/0x2d0
kfree+0x24e/0x270
run_one_async_free+0x12/0x20 [btrfs]
btrfs_scrubparity_helper+0x38d/0x740 [btrfs]
btrfs_worker_helper+0xe/0x10 [btrfs]
process_one_work+0x417/0xa40
worker_thread+0x8b/0x730
kthread+0x199/0x1c0
ret_from_fork+0x3f/0x70
INFO: Slab 0xea0002dfa800 objects=28 used=28 fp=0x  (null) 
flags=0x40004080
INFO: Object 0x8800b7ea2da0 @offset=11680 fp=0x8800b7ea2480


static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
				      struct btrfs_work *work)
{
	unsigned long flags;

	work->wq = wq;
	thresh_queue_hook(wq);
	if (work->ordered_func) {
		spin_lock_irqsave(&wq->list_lock, flags);
		list_add_tail(&work->ordered_list, &wq->ordered_list);
		spin_unlock_irqrestore(&wq->list_lock, flags);
	}
	queue_work(wq->normal_wq, &work->normal_work);
	trace_btrfs_work_queued(work);
}

Qu, 'work' can be freed before queue_work returns.  I don't see any reason
here to have it after the queue_work() call, do you?

-chris



Right, trace_btrfs_work_queued() should be called at the very beginning.

I'll submit the fix soon.

Thanks,
Qu
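
The ordering problem discussed above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: every demo_* name is
hypothetical, demo_queue_work() stands in for queue_work() by running the
handler immediately, and the handler frees the item the way
run_one_async_free() does in the report. The point it shows is that the
trace hook must read the work item before the item is handed over.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_work. */
struct demo_work {
	int id;
	void (*func)(struct demo_work *work);
};

static int last_traced_id = -1;

/* Stand-in for a tracepoint: it dereferences the work item. */
static void demo_trace_queued(const struct demo_work *work)
{
	last_traced_id = work->id;
}

/* Stand-in for queue_work(): the handler may run, and free, right away. */
static void demo_queue_work(struct demo_work *work)
{
	work->func(work);
}

/* Handler that frees its own work item. */
static void demo_free_handler(struct demo_work *work)
{
	free(work);
}

/* Trace first, then hand the item over; nothing touches it afterwards. */
static void demo_submit(struct demo_work *work)
{
	demo_trace_queued(work);
	demo_queue_work(work);
	/* 'work' must be treated as gone from here on. */
}

/* Allocate an item, submit it, and report which id the trace hook saw. */
static int demo_run(int id)
{
	struct demo_work *work = malloc(sizeof(*work));

	if (!work)
		return -1;
	work->id = id;
	work->func = demo_free_handler;
	demo_submit(work);
	return last_traced_id;
}
```

Swapping the two calls in demo_submit() would reproduce the reported
pattern: the trace hook would read memory that the handler has already
freed.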




Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Qu Wenruo



David Sterba wrote on 2015/12/30 17:17 +0100:

On Wed, Dec 30, 2015 at 10:10:44PM +0800, Qu Wenruo wrote:

Now I am on the same side as David.
That means a runtime interface to change them (along with a mkfs option).

If we provide a configurable feature, it should be tunable at both
run time and mkfs time.
Or, just don't touch it until there is really enough user demand.
(In the stripe_len case that is also a possible choice, as a configurable
stripe length doesn't really affect much except RAID5/6.)


I think that we need configurable stripe size regardless. The
performance drop is measurable if the stripe size used by filesystem
does not match the hardware.
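
To make the stripe size point concrete, here is a rough sketch of how a
stripe length decides which device an offset lands on in a RAID0-like
layout. It is not the btrfs mapping code: the demo_* names and the simple
round-robin layout are assumptions for illustration. The 64KiB value used in
the usage note matches the fixed stripe length btrfs uses today.

```c
#include <assert.h>
#include <stdint.h>

/* Where a byte offset inside a chunk lands (hypothetical layout). */
struct demo_stripe_pos {
	uint64_t stripe_nr; /* index of the stripe_len-sized unit */
	uint32_t dev_index; /* device holding that unit (round-robin) */
	uint64_t offset;    /* byte offset inside the unit */
};

static struct demo_stripe_pos demo_map(uint64_t chunk_offset,
				       uint64_t stripe_len,
				       uint32_t num_devs)
{
	struct demo_stripe_pos pos;

	pos.stripe_nr = chunk_offset / stripe_len;
	pos.dev_index = (uint32_t)(pos.stripe_nr % num_devs);
	pos.offset = chunk_offset % stripe_len;
	return pos;
}

/* How many devices a single I/O of 'len' bytes (len > 0) touches. */
static uint32_t demo_devs_touched(uint64_t start, uint64_t len,
				  uint64_t stripe_len, uint32_t num_devs)
{
	uint64_t first = start / stripe_len;
	uint64_t last = (start + len - 1) / stripe_len;
	uint64_t units = last - first + 1;

	return units < num_devs ? (uint32_t)units : num_devs;
}
```

With a 64KiB stripe length on four devices, a 128KiB write starting at
offset 0 is split across two devices; with a 128KiB stripe length the same
write stays on one. Which of the two is better depends on the hardware,
which is the mismatch the paragraph above refers to.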


Right, I just missed the benchmark from Christoph and forgot the case of 
RAID 5/6.





I totally understand that implementing this will cost you a lot more time,
not only in the kernel part but also in the user-tool part.

But this also means more patches.
Whatever your motivation for contributing to btrfs, more
patches (apart from the extra time spent) are always good.

More patches mean more reputation built in the community, and also
a better-split code structure that is easier to review.


Let me note that a good reputation is also built from patch reviews
(hint hint).


I must admit I'm a bad reviewer.
When I review something, I always have an urge to rewrite part or all of
the patch to follow my own idea, even when it's just a choice between
different designs.


Thanks,
Qu



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Qu Wenruo



On 12/30/2015 05:54 PM, Sanidhya Solanki wrote:

On Wed, 30 Dec 2015 19:59:16 +0800
Qu Wenruo  wrote:

Not really sure about the difference between 2 and 3.


I should have made it clear before, I was asking the exact use case in
mind when listing the choices. Option 2 would be for SysAdmins running
production software and configuring it as they desire.
Option 3 is what we have in the Kernel now, before my patch, where the
option exists, but it is fixed by the code. You can change it, but you
need to be someone fairly involved in the upstream work (like a
distribution Maintainer). This is what my patch implements (well, this
and option 3).
Option 1 leaves it as a compile time option.


When you mention runtime option, did you mean ioctl/mount/balance
convert option?


Yes, that is correct.


And what's the third one? Default mkfs time option?
If you can make it mkfs time option, it won't be really hard to make
it configurable.


This would be ideal for all use-cases, but make the implementation
much larger than it would be for the other options. Hence, I asked
what the exact use case was for the end-user being targeted.


I didn't think David meant something like that.
As far as I read it, he means a balance convert option along with a mkfs
option.


Hence, why I asked.


At least from what I have learned in recent btrfs development, either
we provide a good enough interface (normally, a balance convert ioctl
with a mkfs-time option) to configure some on-disk fields.


Just confirming before starting the implementation.

So a fixed kernel value is not really a good idea, and should at least
be replaced by a mkfs-time option.


Will do after confirmation.


Understood now.

Now I am on the same side as David.
That means a runtime interface to change them (along with a mkfs option).

If we provide a configurable feature, it should be tunable at both
run time and mkfs time.

Or, just don't touch it until there is really enough user demand.
(In the stripe_len case that is also a possible choice, as a configurable
stripe length doesn't really affect much except RAID5/6.)



I totally understand that implementing this will cost you a lot more time,
not only in the kernel part but also in the user-tool part.


But this also means more patches.
Whatever your motivation for contributing to btrfs, more
patches (apart from the extra time spent) are always good.


More patches mean more reputation built in the community, and also
a better-split code structure that is easier to review.

You will also need to do more debugging and testing, which polishes your
skills.

Thanks,
Qu



Thanks




Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Qu Wenruo



On 12/30/2015 02:39 PM, Sanidhya Solanki wrote:

On Tue, 29 Dec 2015 18:06:11 +0100
David Sterba  wrote:


So you want to make the stripe size configurable?...


As I see it there are 3 ways to do it:
-Make it a compile time option that only configures it for a single
system with any devices that are added to the RAID.
-Make it a runtime option that can change based on how the
administrator configures it.
-A non-user facing option that is configurable by someone like a
distribution maintainer for all systems using the Binary Distribution.


Not really sure about the difference between 2 and 3.

When you mention runtime option, did you mean ioctl/mount/balance 
convert option?


And what's the third one? A default mkfs-time option?

If you can make it a mkfs-time option, it won't be hard to make it
configurable.




As I see it, DS would like something like the third option, but CAM
(ostensibly a SysAdmin) wants the second option.


I didn't think David meant something like that.

As far as I read it, he means a balance convert option along with a mkfs
option.



On the other hand, I implemented the first option.


At least from what I have learned in recent btrfs development, either we
provide a good enough interface (normally, a balance convert ioctl with a
mkfs-time option) to configure some on-disk fields.


Or we just leave it at a fixed value (normally 0, just like the encryption
field of EXTENT_DATA, and that's the case for the current stripe_size).


So a fixed kernel value is not really a good idea, and should at least be
replaced by a mkfs-time option.




The first and third options can co-exist; the second is an orthogonal
target that needs to be set up separately.

Or we can make all options co-exist, but make it more complicated.


No need.
Just refer to how the btrfs kernel code handles chunk profiles.

A profile can be specified at mkfs time (with the -d and -m options), and
can also be converted later through the balance ioctl (the btrfs balance
convert filter).


The only tricky thing I am a little concerned about is how we keep
the default chunk stripe size for a fs.


Thanks,
Qu


Please let me know which implementation is preferable, and, if you just
want me to expand the description (as DS' mail asked for) or redo the
entire setup.

Thanks


Re: [PATCH] btrfs: Remove unneeded cast to s64 for qgroup rfer state

2015-08-31 Thread Qu Wenruo



Alexandru Moise wrote on 2015/08/31 09:32 +0300:

On Mon, Aug 31, 2015 at 09:44:49AM +0800, Qu Wenruo wrote:

From the perspective of users, qgroup's referenced or exclusive
is negative, but the user cannot continue to write data! A workaround
is to cast u64 to s64 when doing qgroup reservation.


I am unable to reproduce this problem without his modification.
I could be wrong in reverting this, so I'm gonna CC Wang as well so
he is aware of this patch.


The cast is a workaround for a quite old qgroup bug, which causes
excl/rfer to underflow to negative values.

Removing the rfer/excl cast is OK now, as qgroup keeps maturing;
especially after 4.2-rc1, rfer/excl stay sane in most cases
(the exceptions are qgroup reassign and subvolume deletion, but even
those will not cause negative values).


rfer/excl and reserved are all of type unsigned int, how exactly would
they overflow to minus?


Due to qgroup bugs, of course.
In the old implementation, btrfs_find_all_roots() did not always find the
correct roots.


This caused quota accounting to subtract too many bytes from existing
qgroups.

For example, say qg->rfer is 16K, btrfs_find_all_roots() thinks the qg
previously owned a 32K extent but no longer does, and qgroup accounting
decides to decrease qg->rfer by 32K. Now you get -16K, which is a huge
number when used as u64.
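
The 16K minus 32K example can be checked with a few lines of C. This is
just the arithmetic, not qgroup code; the demo_* helpers are hypothetical
names, and uint64_t/int64_t stand in for the kernel's u64/s64.

```c
#include <assert.h>
#include <stdint.h>

/* Apply a (possibly bogus) accounting delta to an unsigned counter. */
static uint64_t demo_sub(uint64_t rfer, uint64_t delta)
{
	return rfer - delta; /* unsigned subtraction wraps modulo 2^64 */
}

/*
 * Reinterpret the counter as signed. Converting an out-of-range value is
 * implementation-defined in C; two's complement targets give the
 * "negative" reading described above.
 */
static int64_t demo_as_signed(uint64_t v)
{
	return (int64_t)v;
}
```

Subtracting 32K from a counter holding 16K leaves 2^64 - 16384 in the
unsigned field, which reads as -16384 once viewed as a signed value.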






But I'm not a fan of removing it now.
qgroup still has a known huge bug in the qg->reserved part; we
are aware of it and actively working on it.


Can you tell me more about this known huge bug and how you can
reproduce it using the present implementation?



Check the fstest patch I submitted:
https://patchwork.kernel.org/patch/7023301/

Btrfs qgroup has a reserved space leak problem, and in some cases the
reserved value can also underflow to negative. (I don't have a reproducer
for the negative value, but it has already happened several times in my
test environment.)


That's what we are fixing now, trying to make it public before 4.3-rc1.

Thanks,
Qu


Re: [PATCH] btrfs: Remove unneeded cast to s64 for qgroup rfer state

2015-08-30 Thread Qu Wenruo



Alexandru Moise wrote on 2015/08/29 11:45 +:

This patch reverts commit: b4fcd6be6bbd702ae1a6545c9b413681850a9814
Wang Shilong added those casts as a workaround for a bug reproduced
using the following steps:

Steps to reproduce:

mkfs.btrfs <disk>
mount <disk> <mnt>
dd if=/dev/zero of=/<mnt>/data bs=1M count=10
sync
btrfs quota enable <mnt>
btrfs qgroup create 0/5 <mnt>
btrfs qgroup limit 5M 0/5 <mnt>
rm -f /<mnt>/data
sync
btrfs qgroup show <mnt>
dd if=/dev/zero of=data bs=1M count=1

From the perspective of users, qgroup's referenced or exclusive
is negative, but the user cannot continue to write data! A workaround
is to cast u64 to s64 when doing qgroup reservation.


I am unable to reproduce this problem without his modification.
I could be wrong in reverting this, so I'm gonna CC Wang as well so
he is aware of this patch.


The cast is a workaround for a quite old qgroup bug, which causes
excl/rfer to underflow to negative values.


Removing the rfer/excl cast is OK now, as qgroup keeps maturing;
especially after 4.2-rc1, rfer/excl stay sane in most cases
(the exceptions are qgroup reassign and subvolume deletion, but even
those will not cause negative values).


But I'm not a fan of removing it now.
qgroup still has a known huge bug in the qg->reserved part; we are
aware of it and actively working on it.


So for such a cleanup, I'd prefer to do it when we rework the accounting
part of qgroup.


Thanks,
Qu



Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>
---
  fs/btrfs/qgroup.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8a82029..9c75e86 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2077,14 +2077,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 		qg = u64_to_ptr(unode->aux);

 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
-		    qg->reserved + (s64)qg->rfer + num_bytes >
+		    qg->reserved + qg->rfer + num_bytes >
 		    qg->max_rfer) {
 			ret = -EDQUOT;
 			goto out;
 		}

 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
-		    qg->reserved + (s64)qg->excl + num_bytes >
+		    qg->reserved + qg->excl + num_bytes >
 		    qg->max_excl) {
 			ret = -EDQUOT;
 			goto out;
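
As a side note on the two changed lines, here is a small standalone sketch
of the limit check with and without the cast. It is not the kernel function:
the demo_* names are hypothetical and uint64_t/int64_t stand in for u64/s64.
Under C's usual arithmetic conversions, a 64-bit signed operand added to a
64-bit unsigned one is converted back to unsigned, so the two expressions
below should evaluate identically.

```c
#include <assert.h>
#include <stdint.h>

/* Limit check as written before the patch, with the (s64) cast. */
static int demo_over_limit_cast(uint64_t reserved, uint64_t rfer,
				uint64_t num_bytes, uint64_t max_rfer)
{
	return reserved + (int64_t)rfer + num_bytes > max_rfer;
}

/* Limit check as written after the patch, without the cast. */
static int demo_over_limit_plain(uint64_t reserved, uint64_t rfer,
				 uint64_t num_bytes, uint64_t max_rfer)
{
	return reserved + rfer + num_bytes > max_rfer;
}
```

With rfer wrapped to "-16K" and a 5M limit, a 4K reservation gives a sum
that is still a huge unsigned number, so both variants report the limit as
exceeded, which is the "user cannot continue to write data" symptom in the
commit message. A 1M reservation wraps the sum back into range and both
variants allow it.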




Re: [PATCH v2 0/6] Btrfs: show subvolume name and ID in /proc/mounts

2015-05-11 Thread Qu Wenruo



 Original Message  
Subject: Re: [PATCH v2 0/6] Btrfs: show subvolume name and ID in 
/proc/mounts

From: Omar Sandoval 
To: David Sterba , Qu Wenruo , 


Date: 2015-05-11 17:42


On Thu, Apr 09, 2015 at 02:34:50PM -0700, Omar Sandoval wrote:

Here's version 2 of providing the subvolume name and ID in /proc/mounts.

It turns out that getting the name of a subvolume reliably is a bit
trickier than it would seem because of how mounting subvolumes by ID is
implemented. In particular, in that case, the dentry we get for the root
of the mount is not necessarily attached to the dentry tree, which means
that the obvious solution of just dumping the dentry does not work. The
solution I put together makes the tradeoff of churning a bit more code
in order to avoid implementing this with weird hacks.

Changes from v1 (https://lkml.org/lkml/2015/4/8/16):

- Put subvol= last in show_options
- Change commit log to remove comment about userspace having no way to
   know which subvolume is mounted, as David pointed out you can use
   btrfs inspect-internal rootid <mountpoint>
- Split up patch 2
- Minor coding style fixes

This still applies to v4.0-rc7. Tested manually and with the script
below (updated from v1).

Thanks!

Omar Sandoval (6):
   Btrfs: lock superblock before remounting for rw subvol
   Btrfs: remove all subvol options before mounting top-level
   Btrfs: clean up error handling in mount_subvol()
   Btrfs: fail on mismatched subvol and subvolid mount options
   Btrfs: unify subvol= and subvolid= mounting
   Btrfs: show subvol= and subvolid= in /proc/mounts

  fs/btrfs/super.c | 376 ---
  fs/seq_file.c|   1 +
  2 files changed, 251 insertions(+), 126 deletions(-)



Hi, everyone,

Just wanted to revive this so we can hopefully come up with a solution
we agree on in time for 4.2.

Just to recap, my approach (and also Qu Wenruo's original approach) is
to convert subvolid= mounts to subvol= mounts at mount time, which makes
showing the subvolume in /proc/mounts easy. The benefit of this approach
is that looking at mount information, which is supposed to be a
lightweight operation, is simple and always works. Additionally, we'll
have the info in a convenient format in /proc/mounts in addition to
/proc/$PID/mountinfo. The only caveat is that a mount by subvolid can
fail if the mount races with a rename of the subvolume.

Qu Wenruo's second approach was to instead convert the subvolid to a
subvolume path when reading /proc/$PID/mountinfo. The benefit of this
approach is that mounts by subvolid will always succeed in the face of
concurrent renames. However, instead, getting the subvolume path in
mountinfo can now fail, and it makes what should probably be a
lightweight operation somewhat complex.

In terms of the code, I think the original approach is cleaner: the
heavy lifting is done when mounting instead of when reading a proc file.
Additionally, I don't think that the concurrent rename race will be much
of a problem in practice. I can't imagine that too many people are
actively renaming subvolumes at the same time as they are mounting them,
and even if they are, I don't think it's so surprising that it would
fail. On the other hand, reading mount info while renaming subvolumes
might be marginally more common, and personally, if that failed, I'd be
unpleasantly surprised.

Orthogonal to that decision is the precedence of subvolid= and subvol=.
Although it's true that mount options usually have last-one-wins
behavior, I think David's argument regarding the principle of least
surprise is solid. Namely, someone's going to be unhappy with a
seemingly arbitrary decision when they don't match.
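
The "fail on mismatch" behaviour argued for above could look roughly like
the following. This is a userspace sketch, not the code from the patch
series: the lookup table, the demo_* names and the error codes chosen are
all assumptions for illustration. The only btrfs fact relied on is that the
top-level subvolume has ID 5.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name -> ID table standing in for the real subvolume lookup. */
struct demo_subvol {
	const char *name;
	uint64_t id;
};

static const struct demo_subvol demo_subvols[] = {
	{ "home", 257 },
	{ "snapshots/daily", 258 },
};

/* Returns the ID for a name, or 0 if the name is unknown. */
static uint64_t demo_lookup_id(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_subvols) / sizeof(demo_subvols[0]); i++) {
		if (strcmp(demo_subvols[i].name, name) == 0)
			return demo_subvols[i].id;
	}
	return 0;
}

/*
 * name may be NULL and subvolid may be 0 when the option was not given.
 * On success, stores the ID to mount in *out and returns 0.
 */
static int demo_pick_subvol(const char *name, uint64_t subvolid,
			    uint64_t *out)
{
	uint64_t by_name;

	if (!name) {
		*out = subvolid ? subvolid : 5; /* 5: top-level subvolume */
		return 0;
	}
	by_name = demo_lookup_id(name);
	if (!by_name)
		return -ENOENT;
	if (subvolid && subvolid != by_name)
		return -EINVAL; /* options disagree: refuse, don't guess */
	*out = by_name;
	return 0;
}
```

Refusing the mount when the two options disagree avoids having to pick a
winner, which is the least-surprise argument made above.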

Sorry for the long-winded email! Thoughts, David, Qu?

Thanks,

I'm OK with your patchset. Just as you mentioned, mounting concurrently
with a rename is not such a common thing.

And I'm also happy with the cleaner, unified mount code.

Thanks,
Qu


Re: [PATCH v2 0/6] Btrfs: show subvolume name and ID in /proc/mounts

2015-05-11 Thread Qu Wenruo



 Original Message  
Subject: Re: [PATCH v2 0/6] Btrfs: show subvolume name and ID in 
/proc/mounts

From: Omar Sandoval osan...@osandov.com
To: David Sterba dste...@suse.cz, Qu Wenruo quwen...@cn.fujitsu.com, 
linux-bt...@vger.kernel.org

Date: 2015年05月11日 17:42


On Thu, Apr 09, 2015 at 02:34:50PM -0700, Omar Sandoval wrote:

Here's version 2 of providing the subvolume name and ID in /proc/mounts.

It turns out that getting the name of a subvolume reliably is a bit
trickier than it would seem because of how mounting subvolumes by ID is
implemented. In particular, in that case, the dentry we get for the root
of the mount is not necessarily attached to the dentry tree, which means
that the obvious solution of just dumping the dentry does not work. The
solution I put together makes the tradeoff of churning a bit more code
in order to avoid implementing this with weird hacks.

Changes from v1 (https://lkml.org/lkml/2015/4/8/16):

- Put subvol= last in show_options
- Change commit log to remove comment about userspace having no way to
   know which subvolume is mounted, as David pointed out you can use
   btrfs inspect-internal rootid mountpoint
- Split up patch 2
- Minor coding style fixes

This still applies to v4.0-rc7. Tested manually and with the script
below (updated from v1).

Thanks!

Omar Sandoval (6):
   Btrfs: lock superblock before remounting for rw subvol
   Btrfs: remove all subvol options before mounting top-level
   Btrfs: clean up error handling in mount_subvol()
   Btrfs: fail on mismatched subvol and subvolid mount options
   Btrfs: unify subvol= and subvolid= mounting
   Btrfs: show subvol= and subvolid= in /proc/mounts

  fs/btrfs/super.c | 376 ---
  fs/seq_file.c|   1 +
  2 files changed, 251 insertions(+), 126 deletions(-)



Hi, everyone,

Just wanted to revive this so we can hopefully come up with a solution
we agree on in time for 4.2.

Just to recap, my approach (and also Qu Wenruo's original approach) is
to convert subvolid= mounts to subvol= mounts at mount time, which makes
showing the subvolume in /proc/mounts easy. The benefit of this approach
is that looking at mount information, which is supposed to be a
lightweight operation, is simple and always works. Additionally, we'll
have the info in a convenient format in /proc/mounts in addition to
/proc/$PID/mountinfo. The only caveat is that a mount by subvolid can
fail if the mount races with a rename of the subvolume.
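
To make the subvolid= to subvol= conversion concrete, here is a minimal userspace sketch of the idea, not the kernel code. It assumes a hypothetical in-memory table (struct backref) standing in for the ROOT_BACKREF items, and a hypothetical helper subvol_path(); it also skips the INODE_REF walk that the real patch needs for subvolumes nested inside directories. The path is built backwards from the end of the buffer, one component per parent step, until the top-level subvolume (ID 5) is reached.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOP_LEVEL_ID 5ULL	/* value of BTRFS_FS_TREE_OBJECTID */

/* Hypothetical stand-in for a ROOT_BACKREF item: child -> (parent, name). */
struct backref {
	unsigned long long id;
	unsigned long long parent;
	const char *name;
};

/*
 * Build the path of subvolume `id` in the tail of `buf`, prepending one
 * "/name" component per step up the tree. Returns a pointer into `buf`
 * on success, or NULL with errno set (ENOENT, ENAMETOOLONG).
 */
static const char *subvol_path(const struct backref *tbl, size_t n,
			       unsigned long long id, char *buf, size_t buflen)
{
	char *ptr;
	size_t i, len;

	if (buflen < 2) {
		errno = ENAMETOOLONG;
		return NULL;
	}
	ptr = buf + buflen - 1;
	*ptr = '\0';

	while (id != TOP_LEVEL_ID) {
		/* Find the backref of the current subvolume. */
		for (i = 0; i < n && tbl[i].id != id; i++)
			;
		if (i == n) {
			errno = ENOENT;
			return NULL;
		}
		len = strlen(tbl[i].name);
		if ((size_t)(ptr - buf) < len + 1) {
			errno = ENAMETOOLONG;
			return NULL;
		}
		ptr -= len + 1;
		ptr[0] = '/';
		memcpy(ptr + 1, tbl[i].name, len);
		id = tbl[i].parent;
	}

	/* The top-level subvolume itself is reported as "/". */
	if (*ptr == '\0')
		*--ptr = '/';
	return ptr;
}
```

With a table containing {256, 5, "home"} and {257, 256, "snap"}, looking up ID 257 yields "/home/snap". In the kernel, each lookup is a separate tree search, which is where the window for a concurrent rename comes from.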

Qu Wenruo's second approach was to instead convert the subvolid to a
subvolume path when reading /proc/$PID/mountinfo. The benefit of this
approach is that mounts by subvolid will always succeed in the face of
concurrent renames. However, instead, getting the subvolume path in
mountinfo can now fail, and it makes what should probably be a
lightweight operation somewhat complex.

In terms of the code, I think the original approach is cleaner: the
heavy lifting is done when mounting instead of when reading a proc file.
Additionally, I don't think that the concurrent rename race will be much
of a problem in practice. I can't imagine that too many people are
actively renaming subvolumes at the same time as they are mounting them,
and even if they are, I don't think it's so surprising that it would
fail. On the other hand, reading mount info while renaming subvolumes
might be marginally more common, and personally, if that failed, I'd be
unpleasantly surprised.

Orthogonal to that decision is the precedence of subvolid= and subvol=.
Although it's true that mount options usually have last-one-wins
behavior, I think David's argument regarding the principle of least
surprise is solid. Namely, someone's going to be unhappy with a
seemingly arbitrary decision when they don't match.
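
As a rough illustration of the "fail on mismatch" behavior, the following userspace sketch parses a comma-separated option string and rejects the combination when both options are given but disagree. The names struct subvol_opts, parse_subvol_opts() and check_subvol_match() are hypothetical, and returning -EINVAL on a mismatch is an assumption made for this sketch, not something taken from the patches.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two subvolume-related mount options. */
struct subvol_opts {
	char name[256];
	unsigned long long id;
	int have_name;
	int have_id;
};

/* Pick subvol= and subvolid= out of a comma-separated option string. */
static int parse_subvol_opts(const char *options, struct subvol_opts *out)
{
	const char *p = options;

	memset(out, 0, sizeof(*out));
	while (*p) {
		size_t len = strcspn(p, ",");

		if (len > 7 && strncmp(p, "subvol=", 7) == 0) {
			size_t n = len - 7;

			if (n >= sizeof(out->name))
				return -ENAMETOOLONG;
			memcpy(out->name, p + 7, n);
			out->name[n] = '\0';
			out->have_name = 1;
		} else if (len > 9 && strncmp(p, "subvolid=", 9) == 0) {
			char num[32];
			char *end;
			size_t n = len - 9;

			if (n >= sizeof(num))
				return -EINVAL;
			memcpy(num, p + 9, n);
			num[n] = '\0';
			out->id = strtoull(num, &end, 10);
			if (*end != '\0')
				return -EINVAL;
			out->have_id = 1;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}

/*
 * `resolved_id` is the ID that the subvol= path actually resolves to
 * (looked up elsewhere). Refuse the mount if both options were given
 * and they do not name the same subvolume.
 */
static int check_subvol_match(const struct subvol_opts *o,
			      unsigned long long resolved_id)
{
	if (o->have_name && o->have_id && o->id != resolved_id)
		return -EINVAL;	/* assumed error code for this sketch */
	return 0;
}
```

The point of failing instead of picking a winner is that neither "first one wins" nor "last one wins" is obviously right to someone who passed both options.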

Sorry for the long-winded email! Thoughts, David, Qu?

Thanks,

I'm OK with your patchset. As you mentioned, mounting concurrently with
a rename is not such a common thing.

I'm also happy with the cleaner, unified mount code.

Thanks,
Qu
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] Btrfs: unify subvol= and subvolid= mounting

2015-04-08 Thread Qu Wenruo



 Original Message  
Subject: Re: [PATCH 2/3] Btrfs: unify subvol= and subvolid= mounting
From: Omar Sandoval 
To: Qu Wenruo 
Date: 2015-04-08 15:17


On Wed, Apr 08, 2015 at 02:06:14PM +0800, Qu Wenruo wrote:



 Original Message  
Subject: [PATCH 2/3] Btrfs: unify subvol= and subvolid= mounting
From: Omar Sandoval 
To: Chris Mason, Josef Bacik, David Sterba
Date: 2015-04-08 13:34


Currently, mounting a subvolume with subvolid= takes a different code
path than mounting with subvol=. This isn't really a big deal except for
the fact that mounts done with subvolid= or the default subvolume don't
have a dentry that's connected to the dentry tree like in the subvol=
case. To unify the code paths, when given subvolid= or using the default
subvolume ID, translate it into a subvolume name by walking
ROOT_BACKREFs in the root tree and INODE_REFs in the filesystem trees.


Hi, Qu,


Oh, this patch is what I tried to do a long time ago; I wanted to do the same
thing, to show the subvolume mount for btrfs.


Thanks for pointing that out, I didn't come across your post when I was
looking around. I figured that someone must have thought of it first :)


But it occurred to me that superblock->show_path() is a better way to do it.

You can implement btrfs_show_path() to let mountinfo get the subvolume
name from the subvolid, without changing the mount routine much.


Hm, I don't think that the changes to the mount code would be
unwarranted. Having one code path makes it more obvious what's going on.
Do you mind elaborating on why you preferred doing it in ->show_path()?

It's a long story.

At that time, I also tried the subvolid->path conversion, and it seemed
to work.

But another problem, IIRC the bug where btrfs loses its security label,
would be triggered more easily if everything went through the "subvol="
routine, as that routine uses vfs_mount twice. The second time, it would
definitely lose the security label.

Although that problem was later resolved by handling the security label
internally, it made me wary of touching the mount routine.

Another problem is that the "subvolid=" routine can also run when the fs
is already mounted, so operations such as deleting files and dirs may
interfere with your subvolid->path search code.
(In your while loop, there is a race window between your
release_path() and search_slot().)

That results in a mount failure even though nothing is actually wrong.

The ->show_path() method can't avoid the above race either, but the good
thing is that even if the race happens, it won't disturb the mount.

It's just an -EBUSY when showing /proc/self/mountinfo, not a mount failure.


Thanks,
Qu


Thanks!


Thanks,
Qu





Re: [PATCH 2/3] Btrfs: unify subvol= and subvolid= mounting

2015-04-08 Thread Qu Wenruo



 Original Message  
Subject: [PATCH 2/3] Btrfs: unify subvol= and subvolid= mounting
From: Omar Sandoval 
To: Chris Mason, Josef Bacik, David Sterba
Date: 2015-04-08 13:34


Currently, mounting a subvolume with subvolid= takes a different code
path than mounting with subvol=. This isn't really a big deal except for
the fact that mounts done with subvolid= or the default subvolume don't
have a dentry that's connected to the dentry tree like in the subvol=
case. To unify the code paths, when given subvolid= or using the default
subvolume ID, translate it into a subvolume name by walking
ROOT_BACKREFs in the root tree and INODE_REFs in the filesystem trees.

Oh, this patch is what I tried to do a long time ago; I wanted to do the
same thing, to show the subvolume mount for btrfs.


But it occurred to me that superblock->show_path() is a better way to do it.

You can implement btrfs_show_path() to let mountinfo get the
subvolume name from the subvolid, without changing the mount routine much.


Thanks,
Qu


Signed-off-by: Omar Sandoval 
---
  fs/btrfs/super.c | 347 ---
  1 file changed, 225 insertions(+), 122 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d38be09..5ab9801 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -841,33 +841,153 @@ out:
return error;
  }

-static struct dentry *get_default_root(struct super_block *sb,
-  u64 subvol_objectid)
+static char *get_subvol_name_from_objectid(struct btrfs_fs_info *fs_info,
+  u64 subvol_objectid)
  {
-   struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_root *root = fs_info->tree_root;
-   struct btrfs_root *new_root;
-   struct btrfs_dir_item *di;
-   struct btrfs_path *path;
-   struct btrfs_key location;
-   struct inode *inode;
-   u64 dir_id;
-   int new = 0;
+   struct btrfs_root *fs_root;
+   struct btrfs_root_ref *root_ref;
+   struct btrfs_inode_ref *inode_ref;
+   struct btrfs_key key;
+   struct btrfs_path *path = NULL;
+   char *name = NULL, *ptr;
+   u64 dirid;
+   int len;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto err;
+   }
+   path->leave_spinning = 1;
+
+   name = kmalloc(PATH_MAX, GFP_NOFS);
+   if (!name) {
+   ret = -ENOMEM;
+   goto err;
+   }
+   ptr = name + PATH_MAX - 1;
+   ptr[0] = '\0';

/*
-* We have a specific subvol we want to mount, just setup location and
-* go look up the root.
+* Walk up the subvolume trees in the tree of tree roots by root
+* backrefs until we hit the top-level subvolume.
 */
-   if (subvol_objectid) {
-   location.objectid = subvol_objectid;
-   location.type = BTRFS_ROOT_ITEM_KEY;
-   location.offset = (u64)-1;
-   goto find_root;
+   while (subvol_objectid != BTRFS_FS_TREE_OBJECTID) {
+   key.objectid = subvol_objectid;
+   key.type = BTRFS_ROOT_BACKREF_KEY;
+   key.offset = (u64)-1;
+
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   if (ret < 0) {
+   goto err;
+   } else if (ret > 0) {
+   ret = btrfs_previous_item(root, path, subvol_objectid,
+ BTRFS_ROOT_BACKREF_KEY);
+   if (ret < 0) {
+   goto err;
+   } else if (ret > 0) {
+   ret = -ENOENT;
+   goto err;
+   }
+   }
+
+   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+   subvol_objectid = key.offset;
+
+   root_ref = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_root_ref);
+   len = btrfs_root_ref_name_len(path->nodes[0], root_ref);
+   ptr -= len + 1;
+   if (ptr < name) {
+   ret = -ENAMETOOLONG;
+   goto err;
+   }
+   read_extent_buffer(path->nodes[0], ptr + 1,
+  (unsigned long)(root_ref + 1), len);
+   ptr[0] = '/';
+   dirid = btrfs_root_ref_dirid(path->nodes[0], root_ref);
+   btrfs_release_path(path);
+
+   key.objectid = subvol_objectid;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   fs_root = btrfs_read_fs_root_no_name(fs_info, &key);
+   if (IS_ERR(fs_root)) {
+   ret = PTR_ERR(fs_root);
+   goto err;
+   }
+
+   

Re: [PATCH 3/3] Btrfs: show subvol= and subvolid= in /proc/mounts

2015-04-08 Thread Qu Wenruo



 Original Message  
Subject: [PATCH 3/3] Btrfs: show subvol= and subvolid= in /proc/mounts
From: Omar Sandoval 
To: Chris Mason, Josef Bacik, David Sterba
Date: 2015-04-08 13:34


Currently, userspace has no way to know which subvolume is mounted. But,
now that we're guaranteed to have a meaningful root dentry, we can just
export and use seq_dentry() in btrfs_show_options(). The subvolume ID is
easy to get, so put that in there, too.
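
For readers unfamiliar with the escape-set argument passed to seq_dentry() (" \t\n\\" in this patch), the sketch below shows the general idea in userspace: any byte found in the escape set is written as a backslash followed by three octal digits, so a space becomes \040. escape_octal() is a hypothetical helper written for illustration; it is not the kernel's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `src` into `dst`, replacing every byte that appears in `esc`
 * with a backslash and its three-digit octal value. Returns the length
 * of the escaped string, or -1 if `dst` is too small.
 */
static int escape_octal(const char *src, const char *esc,
			char *dst, size_t dstlen)
{
	size_t used = 0;

	if (dstlen == 0)
		return -1;

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (strchr(esc, c)) {
			/* Four output bytes plus the final terminator. */
			if (dstlen - used < 5)
				return -1;
			dst[used++] = '\\';
			dst[used++] = (char)('0' + ((c >> 6) & 7));
			dst[used++] = (char)('0' + ((c >> 3) & 7));
			dst[used++] = (char)('0' + (c & 7));
		} else {
			/* One output byte plus the final terminator. */
			if (dstlen - used < 2)
				return -1;
			dst[used++] = (char)c;
		}
	}
	dst[used] = '\0';
	return (int)used;
}
```

Escaping matters here because /proc/mounts is whitespace-delimited: a subvolume named "my vol" would otherwise split into two fields for anything parsing the file.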

Oh, I sent a patch like this a long time ago, but it still hasn't been merged.

http://comments.gmane.org/gmane.comp.file-systems.btrfs/36997

My patch doesn't do it in the mount options, but adds it to /proc/self/mountinfo.

In fact, if you mount a subvolume with "-o subvol=", then
/proc/self/mountinfo should show a result like the one below:

73 33 0:35 / /mnt/test rw,relatime shared:57 - btrfs /dev/sdb rw,space_cache
75 33 0:35 /test /mnt/scratch rw,relatime shared:59 - btrfs /dev/sdb rw,space_cache


The only problem is that if you mount with "-o subvolid=" as the *FIRST*
mount of the fs, then mountinfo can't show it.
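
For reference, the subvolume path in the lines above is the fourth whitespace-separated field of each mountinfo line (the root of the mount within the filesystem), and the fifth field is the mount point. A small userspace sketch of pulling those two fields out of a line follows; struct mnt_entry and parse_mountinfo_line() are hypothetical names, and the sketch does not decode octal escapes such as \040.

```c
#include <stdio.h>

/* Hypothetical holder for the two fields of interest. */
struct mnt_entry {
	char root[256];		/* field 4: root of the mount within the fs */
	char mount_point[256];	/* field 5: where it is mounted */
};

/*
 * Parse one /proc/self/mountinfo line. The first three fields (mount ID,
 * parent ID, major:minor) are skipped. Returns 0 on success, -1 if the
 * line does not have the expected shape.
 */
static int parse_mountinfo_line(const char *line, struct mnt_entry *e)
{
	int n = sscanf(line, "%*d %*d %*s %255s %255s",
		       e->root, e->mount_point);

	return n == 2 ? 0 : -1;
}
```

Applied to the second line above, this gives root "/test" and mount point "/mnt/scratch"; for a mount of the top-level subvolume the root is simply "/".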


My patch fixes the above problem, but it hasn't been merged yet...

Thanks,
Qu


Signed-off-by: Omar Sandoval 
---
  fs/btrfs/super.c | 4 
  fs/seq_file.c| 1 +
  2 files changed, 5 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 5ab9801..5e14bb6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1193,6 +1193,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
seq_puts(seq, ",fatal_errors=panic");
if (info->commit_interval != BTRFS_DEFAULT_COMMIT_INTERVAL)
seq_printf(seq, ",commit=%d", info->commit_interval);
+   seq_puts(seq, ",subvol=");
+   seq_dentry(seq, dentry, " \t\n\\");
+   seq_printf(seq, ",subvolid=%llu",
+ BTRFS_I(d_inode(dentry))->root->root_key.objectid);
return 0;
  }

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 555f821..52b4927 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -538,6 +538,7 @@ int seq_dentry(struct seq_file *m, struct dentry *dentry, const char *esc)

return res;
  }
+EXPORT_SYMBOL(seq_dentry);

  static void *single_start(struct seq_file *p, loff_t *pos)
  {





