Re: [PATCH xfstests] generic/192: Move 'cd /' to the place where the program exits
On Wed, Oct 09, 2019 at 04:27:57PM +0800, Zhihao Cheng wrote:
> Running generic/192 with overlayfs (using ubifs as the base fs) yields the
> following output:
>
>   generic/192 - output mismatch
>   QA output created by 192
>   sleep for 5 seconds
>   test
>   +./common/rc: line 316: src/t_dir_type: No such file or directory
>   delta1 is in range
>   delta2 is in range
>   ...
>
> When the test fails, the call stack in generic/192 is:
>
>   local unknowns=$(src/t_dir_type $dir u | wc -l)    common/rc:316
>   _supports_filetype                                 common/rc:299
>   _overlay_mount                                     common/overlay:52
>   _overlay_test_mount                                common/overlay:93
>   _test_mount                                        common/rc:407
>   _test_cycle_mount                                  generic/192:50
>
> Before _test_cycle_mount() is invoked, generic/192 executes 'cd /' to
> change the working directory from 'xfstests-dev' to '/', so
> src/t_dir_type is not found.
>
> Signed-off-by: Zhihao Cheng

Thanks for the debug! But I think the right fix is to call t_dir_type via
"$here", i.e.

	local unknowns=$($here/src/t_dir_type $dir u | wc -l)

'here', which points to the top-level dir of the xfstests source code, is
defined in every test during test setup, and is guaranteed not to be empty.

Thanks,
Eryu

> ---
>  tests/generic/192 | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/tests/generic/192 b/tests/generic/192
> index 50b3d6fd..5550f39e 100755
> --- a/tests/generic/192
> +++ b/tests/generic/192
> @@ -15,7 +15,12 @@ echo "QA output created by $seq"
>  here=`pwd`
>  tmp=/tmp/$$
>  status=1	# failure is the default!
> -trap "exit \$status" 0 1 2 3 15
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +}
>
>  _access_time()
>  {
> @@ -46,7 +51,6 @@ sleep $delay # sleep to allow time to move on for access
>  cat $testfile
>  time2=`_access_time $testfile | tee -a $seqres.full`
>
> -cd /
>  _test_cycle_mount
>  time3=`_access_time $testfile | tee -a $seqres.full`
>
> --
> 2.13.6
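Eryu's suggested fix relies on "$here" surviving the 'cd /'. A minimal sketch of the failure mode and the fix, using a throwaway directory and a dummy helper in place of the real xfstests tree (the layout below is illustrative, not the actual xfstests source):

```shell
# Demonstrate why a relative helper path breaks after 'cd /', and why
# anchoring it at $here does not. The directory layout is a stand-in
# for the xfstests source tree.
here=$(mktemp -d)            # plays the role of the xfstests top-level dir
mkdir -p "$here/src"
printf '#!/bin/sh\necho d\n' > "$here/src/t_dir_type"
chmod +x "$here/src/t_dir_type"

cd "$here"
src/t_dir_type >/dev/null && echo "before cd /: relative path works"

cd /                         # what generic/192 did before _test_cycle_mount
src/t_dir_type >/dev/null 2>&1 || echo "after cd /: relative path broken"
"$here/src/t_dir_type" >/dev/null && echo "after cd /: \$here path still works"
```

This is also why xfstests captures `here=\`pwd\`` once at test setup: any helper referenced through it keeps working no matter where the test later chdirs to.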
Re: [PATCH xfstests v3] overlay: Enable character device to be the base fs partition
On Wed, Sep 25, 2019 at 10:18:39AM +0300, Amir Goldstein wrote:
> On Wed, Sep 25, 2019 at 9:29 AM Zhihao Cheng wrote:
> >
> > When running overlay tests with character devices as base fs partitions,
> > all overlay test results become 'notrun'. The function
> > '_overlay_config_override' (common/config) detects that the current base
> > fs partition is not a block device and sets FSTYP to the base fs. Each
> > overlay test checks the current FSTYP, and if it is neither 'overlay'
> > nor 'generic', it skips execution.
> >
> > For example, using UBIFS as the base fs skips all overlay tests:
> >
> >   FSTYP         -- ubifs        # FSTYP should be overridden as 'overlay'
> >   MKFS_OPTIONS  -- /dev/ubi0_1  # Character device
> >   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> >
> >   overlay/001 [not run] not suitable for this filesystem type: ubifs
> >   overlay/002 [not run] not suitable for this filesystem type: ubifs
> >   overlay/003 [not run] not suitable for this filesystem type: ubifs
> >
> > By checking that the base fs partition is a block or character device
> > before FSTYP is overridden to 'overlay', this patch allows the base fs
> > partition to be a character device, so overlay tests can run there too
> > (e.g. on ubifs).
> >
> > Signed-off-by: Zhihao Cheng
> > Signed-off-by: Amir Goldstein
>
> Looks fine.
> Eryu, you may change this to Reviewed-by

Sure, thanks for the review!

Eryu
Re: [PATCH xfstests] overlay: Enable character device to be the base fs partition
On Tue, Sep 24, 2019 at 10:19:38PM +0800, Zhihao Cheng wrote:
> As far as I know, _require_scratch_shutdown() is called after
> _overlay_config_override(); at that moment, FSTYP equals the base fs.
> According to the implementation of _require_scratch_shutdown():
>
> 3090 _require_scratch_shutdown()
> 3091 {
> 3092 	[ -x src/godown ] || _notrun "src/godown executable not found"
> 3093
> 3094 	_scratch_mkfs > /dev/null 2>&1 || _notrun "_scratch_mkfs failed on $SCRATCH_DEV"
> 3095 	_scratch_mount
> 3096
> 3097 	if [ $FSTYP = "overlay" ]; then
> 	# FSTYP = base fs
> 3098 		if [ -z $OVL_BASE_SCRATCH_DEV ]; then
> 3099 			# In lagacy overlay usage, it may specify directory as
> 3100 			# SCRATCH_DEV, in this case OVL_BASE_SCRATCH_DEV
> 3101 			# will be null, so check OVL_BASE_SCRATCH_DEV before
> 3102 			# running shutdown to avoid shutting down base fs accidently.
> 3103 			_notrun "$SCRATCH_DEV is not a block device"
> 3104 		else
> 3105 			src/godown -f $OVL_BASE_SCRATCH_MNT 2>&1 \
> 3106 				|| _notrun "Underlying filesystem does not support shutdown"
> 3107 		fi
> 3108 	else
> 3109 		src/godown -f $SCRATCH_MNT 2>&1 \
> 3110 			|| _notrun "$FSTYP does not support shutdown"
> 	# Executes this path
> 3111 	fi
> 3112
> 3113 	_scratch_unmount
> 3114 }
>
> So we can't get the output: _notrun "$SCRATCH_DEV is not a block device".
> Instead, the output should look like:
>
>   after _overlay_config_override FSTYP=ubifs	# Additional print message
>   FSTYP         -- ubifs
>   PLATFORM      -- Linux/x86_64
>   MKFS_OPTIONS  -- /dev/ubi0_1
>   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
>
>   generic/042 [not run] ubifs does not support shutdown
>
> But I'll consider describing the error more concisely in v2.
>
> On 2019/9/24 20:33, Amir Goldstein wrote:
> > On Tue, Sep 24, 2019 at 12:34 PM Zhihao Cheng wrote:
> > >
> > > When running overlay tests with character devices as base fs partitions,
> > > all overlay test results become 'notrun'. The function
> > > '_overlay_config_override' (common/config) detects that the current base
> > > fs partition is not a block device and sets FSTYP to the base fs. Each
> > > overlay test checks the current FSTYP, and if it is neither 'overlay'
> > > nor 'generic', it skips execution.
> > >
> > > For example, using UBIFS as the base fs skips all overlay tests:
> > >
> > >   FSTYP         -- ubifs        # FSTYP should be overridden as 'overlay'
> > >   MKFS_OPTIONS  -- /dev/ubi0_1  # Character device
> > >   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> > >
> > >   overlay/001 [not run] not suitable for this filesystem type: ubifs
> > >   overlay/002 [not run] not suitable for this filesystem type: ubifs
> > >   overlay/003 [not run] not suitable for this filesystem type: ubifs
> > >   ...
> > >
> > > By checking that the base fs partition is a block or character device
> > > before FSTYP is overridden to 'overlay', this patch allows the base fs
> > > partition to be a character device, so overlay tests can run there too
> > > (e.g. on ubifs).
> > >
> > > Signed-off-by: Zhihao Cheng
> > > ---
> > >  common/config | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/common/config b/common/config
> > > index 4c86a49..a22acdb 100644
> > > --- a/common/config
> > > +++ b/common/config
> > > @@ -550,7 +550,7 @@ _overlay_config_override()
> > >  	#the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
> > >  	#of the configured base fs and SCRATCH/TEST_DEV vars are set to the
> > >  	#overlayfs base and mount dirs inside base fs mount.
> > > -	[ -b "$TEST_DEV" ] || return 0
> > > +	[ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
> > >
> > >  	# Config file may specify base fs type, but we obay -overlay flag
> > >  	[ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> > > @@ -570,7 +570,7 @@ _overlay_config_override()
> > >  	export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
> > >  	export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
> > >
> > > -	[ -b "$SCRATCH_DEV" ] || return 0
> > > +	[ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
> > >
> > >  	# Store original base fs vars
> > >  	export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> > > --
> > > 2.7.4
> > >
> >
> > Looks fine.
> >
> > One nit: there is a message in _require_scratch_shutdown():
> >     _notrun "$SCRATCH_DEV is not a block device"
> > for when $OVL_BASE_SCRATCH_DEV is not defined.
> >
> > Could probably use a better describing error anyway.

I think what Amir suggested is that, as you add char device support to the
overlay base device, the message in _require_scratch_shutdown() should be
updated accordingly, not the commit log.

Thanks,
Eryu
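The whole change reduces to one shell idiom: accept a character device wherever a block device was accepted. A minimal sketch (the helper name is mine; common/config open-codes these tests inline):

```shell
# The [ -b ] || [ -c ] idiom from the patch: treat both block and
# character devices as valid overlay base devices, and reject anything
# else (directories, regular files, nonexistent paths).
_is_overlay_base_dev() {
	[ -b "$1" ] || [ -c "$1" ]
}

_is_overlay_base_dev /dev/null && echo "char device: accepted"
_is_overlay_base_dev /tmp || echo "plain directory: rejected"
```

`/dev/null` is a character device on Linux, so it takes the `[ -c ]` branch, much like `/dev/ubi0_1` would; a UBI volume is exposed as a character device, which is exactly why the original block-device-only test made overlay-on-ubifs fall through to `return 0`.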
Re: [PATCH] generic/381: enable on systems which allows usernames that begin with digits
On Fri, Dec 15, 2017 at 12:41:07PM -0800, Luis R. Rodriguez wrote:
> Some systems no longer allow usernames prefixed with a number; this test,
> however, relies on the assumption that you can end up with such usernames,
> given that the purpose of the test is to ensure that xfs_quota can
> differentiate between UIDs and names beginning with numbers.
>
> systemd >= 232 (circa 2017) no longer allows usernames starting with
> digits [0]; there is a systemd exploit (CVE-2017-182 [1]) explaining why
> that was done. Even upstream shadow's useradd has disallowed similar user
> names since shadow v4.0.1 (circa 2007) [2], but there is no easy way to
> check shadow useradd's version.
>
> You can still shoehorn in these types of users by manually editing files,
> but that's just shooting yourself in the foot given all the precautions
> now taken by userspace, so just check for the systemd version for now as
> a requirement for running this test.
>
> [0] https://github.com/systemd/systemd/issues/6237
> [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-182
> [2] https://github.com/shadow-maint/shadow/commit/9db6abfa42c946b4046f4b2fe67dc43ba862eb0e
>
> Signed-off-by: Luis R. Rodriguez
> ---
>  README            | 7 +--
>  common/config     | 1 +
>  common/rc         | 42 ++
>  tests/generic/381 | 1 +
>  4 files changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/README b/README
> index ed69332e774e..aff7bdae7cb4 100644
> --- a/README
> +++ b/README
> @@ -20,8 +20,11 @@ ___
>  - run make
>  - run make install
>  - create fsgqa test user ("sudo useradd fsgqa")
> -- create 123456-fsgqa test user ("sudo useradd 123456-fsgqa")
> -
> +- Only on systems which allow usernames that start with a digit (older
> +  than systemd 232 and/or with shadow older than v4.0.1), create the
> +  123456-fsgqa test user:
> +    sudo useradd 123456-fsgqa
> +

IMHO, this doc update is sufficient; generic/381 already does _notrun if
there's no 123456-fsgqa user present, because of

	_require_user 123456-fsgqa

And we don't rely on version checks in fstests; usually we check the actual
behavior, e.g. actually mkfs & mount the fs to see if the current kernel and
userspace support a given feature.

Thanks,
Eryu
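The _notrun behavior Eryu refers to can be approximated with a plain `id` lookup. A sketch (the real _require_user in common/rc does additional checking; `_notrun` below is a stand-in for the real helper, which marks the test as skipped rather than failed):

```shell
# Approximation of fstests' _require_user: skip (rather than fail) the
# test when the required user does not exist on this system.
_notrun() { echo "[not run] $*"; exit 0; }

_require_user() {
	id "$1" >/dev/null 2>&1 || _notrun "$1 user not defined"
}

_require_user root
echo "root exists, test continues"
# On systems where digit-prefixed usernames are disallowed, the next
# call would print "[not run] 123456-fsgqa user not defined" and exit:
# _require_user 123456-fsgqa
```

This is the behavior-based check Eryu prefers over a systemd version check: the test probes for the user it needs and skips itself cleanly when the environment can't provide it.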
Re: [PATCH 4/9] build: update AC_PACKAGE_WANT_GDBM() and src/dbtest.c to build
On Thu, Dec 14, 2017 at 06:55:03PM +0100, Luis R. Rodriguez wrote:
> On Thu, Dec 14, 2017 at 01:51:02PM +0800, Eryu Guan wrote:
> > On Tue, Dec 12, 2017 at 04:45:14PM -0800, Luis R. Rodriguez wrote:
> > > Modern gdbm-devel packages bundle together gdbm.h and ndbm.h.
> > > The old m4 macro had detection support for some old gdbm libraries
> > > but not for new ones.
> > >
> > > We fix compilation of src/dbtest.c by making the autoconf helper
> > > check for this new arrangement:
> > >
> > > If both gdbm.h and gdbm.h are found define set both gdbm_ndbm_=true,
> >                       ^^ ndbm.h?
> > > and have_db=true, and define HAVE_GDBM_H. The src/dbtest.c already
> > > had a HAVE_GDBM_H but there was never a respective autoconf setter for
> > > it. We can just re-use this and fix it for the new arrangement.
> > >
> > > Signed-off-by: Luis R. Rodriguez <mcg...@kernel.org>
> >
> > This looks fine to me.
> >
> > The only system I have by hand that has both and but not any is
> > openSUSE Tumbleweed.
>
> Indeed, openSUSE and SLE releases.
>
> > Without this patch, dbtest was not built on openSUSE, and was built
> > successfully with this patch applied.
>
> Yeap.
>
> > And dbtest is still built on RHEL6/7 and Fedora.
>
> Feel free to modify the commit log accordingly then. Curious, what
> packages do Fedora / RHEL6/7 use for the requirement here?
>
> We just have one:
>
> $ rpm -ql gdbm-devel-1.12-1.282.x86_64
> /usr/bin/gdbm_dump
> /usr/bin/gdbm_load
> /usr/bin/gdbmtool
> /usr/include/dbm.h
> /usr/include/gdbm.h
> /usr/include/ndbm.h
> /usr/lib64/libgdbm.a
> /usr/lib64/libgdbm.so
> /usr/lib64/libgdbm_compat.a
> /usr/lib64/libgdbm_compat.so
> /usr/lib64/libndbm.a
> /usr/lib64/libndbm.so
> /usr/share/info/gdbm.info.gz
> /usr/share/man/man1/gdbm_dump.1.gz
> /usr/share/man/man1/gdbm_load.1.gz
> /usr/share/man/man1/gdbmtool.1.gz
> /usr/share/man/man3/gdbm.3.gz

gdbm-devel too, but it has gdbm/[gn]dbm.h pointing to ../[gn]dbm.h, so
there's no such problem and dbtest is building normally.

# rpm -ql gdbm-devel
/usr/include/dbm.h
/usr/include/gdbm
/usr/include/gdbm.h
/usr/include/gdbm/dbm.h
/usr/include/gdbm/gdbm.h
/usr/include/gdbm/ndbm.h
/usr/include/ndbm.h
/usr/lib64/libgdbm.so
/usr/lib64/libgdbm_compat.so
/usr/share/info/gdbm.info.gz
/usr/share/man/man3/gdbm.3.gz

> > BTW, I'll queue patch 3 and this patch for the next fstests release,
> > while the other patches seem unnecessary;
>
> I think patch 2 is fine too.
>
> > I agreed with Dave that groups are not for excluding tests; the
> > required tools and environments should be detected by tests, which
> > should _notrun if they are not met.
>
> Yeah, makes sense now. I think we should also document when adding
> a group makes sense.
>
> > (The README change looks fine, but it doesn't apply due to the
> > "fsgqa-381" change, so I drop it too for now.)
>
> Feel free to modify it, it's not a big deal.

OK, I'll modify on commit, thanks!

Thanks,
Eryu
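The check the updated m4 macro performs amounts to a compile test. A rough shell equivalent of what the probe decides (this mirrors the idea, not the actual AC_PACKAGE_WANT_GDBM() m4 code; `cc` availability and header layout vary by system):

```shell
# Rough shell equivalent of the autoconf probe: does a trivial program
# that includes both <gdbm.h> and <ndbm.h> compile? On modern gdbm-devel
# both headers come from the same package, so one check covers both.
conftest=$(mktemp -d)
cat > "$conftest/conftest.c" <<'EOF'
#include <gdbm.h>
#include <ndbm.h>
int main(void) { return 0; }
EOF
if cc "$conftest/conftest.c" -o "$conftest/conftest" 2>/dev/null; then
	res=yes    # headers found: define HAVE_GDBM_H, dbtest gets built
else
	res=no     # headers missing (or no compiler): dbtest is skipped
fi
echo "have_gdbm_h=$res"
rm -rf "$conftest"
```

On a system like the openSUSE Tumbleweed box discussed above, this prints `have_gdbm_h=yes` once the bundled gdbm-devel headers are installed; either answer is a valid configure outcome.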
Re: [PATCH 4/9] build: update AC_PACKAGE_WANT_GDBM() and src/dbtest.c to build
On Tue, Dec 12, 2017 at 04:45:14PM -0800, Luis R. Rodriguez wrote:
> Modern gdbm-devel packages bundle together gdbm.h and ndbm.h.
> The old m4 macro had detection support for some old gdbm libraries
> but not for new ones.
>
> We fix compilation of src/dbtest.c by making the autoconf helper
> check for this new arrangement:
>
> If both gdbm.h and gdbm.h are found define set both gdbm_ndbm_=true,
                      ^^ ndbm.h?
> and have_db=true, and define HAVE_GDBM_H. The src/dbtest.c already
> had a HAVE_GDBM_H but there was never a respective autoconf setter for
> it. We can just re-use this and fix it for the new arrangement.
>
> Signed-off-by: Luis R. Rodriguez

This looks fine to me.

The only system I have by hand that has both and but not any is openSUSE
Tumbleweed. Without this patch, dbtest was not built on openSUSE, and was
built successfully with this patch applied. And dbtest is still built on
RHEL6/7 and Fedora.

BTW, I'll queue patch 3 and this patch for the next fstests release, while
the other patches seem unnecessary; I agreed with Dave that groups are not
for excluding tests, the required tools and environments should be detected
by tests and _notrun if not met. (The README change looks fine, but it
doesn't apply due to the "fsgqa-381" change, so I drop it too for now.)

Thanks,
Eryu
[PATCH] lib/string: avoid reading beyond src buffer in strscpy
strscpy() tries to copy sizeof(unsigned long) bytes at a time from src to
dest when possible, and stops the loop when 'max' is less than
sizeof(unsigned long). But it doesn't check whether (src + res) goes beyond
the src buffer, and does an out-of-bounds access to the underlying memory.

KASAN reported a global-out-of-bounds bug when reading the seccomp
actions_logged file in procfs:

	cat /proc/sys/kernel/seccomp/actions_logged

because seccomp_names_from_actions_logged() copies short strings (less than
sizeof(unsigned long)) to the buffer 'names', e.g.

	ret = strscpy(names, " ", size);

Fix this by capping the 'max' value according to the src buffer size, to
make sure we won't read beyond the src buffer.

Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Chris Metcalf <cmetc...@ezchip.com>
Cc: Kees Cook <keesc...@chromium.org>
Signed-off-by: Eryu Guan <eg...@redhat.com>
---
 lib/string.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 64a9e33f1daa..13a0147eea00 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -179,6 +179,7 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
 {
 	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 	size_t max = count;
+	size_t src_sz = strlen(src) + 1;
 	long res = 0;
 
 	if (count == 0)
@@ -200,6 +201,10 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
 		max = 0;
 #endif
 
+	/* avoid reading beyond src buffer */
+	if (max > src_sz)
+		max = src_sz;
+
 	while (max >= sizeof(unsigned long)) {
 		unsigned long c, data;
-- 
2.14.3
Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
On Tue, Oct 31, 2017 at 01:10:41AM +0100, Fengguang Wu wrote:
> Hi Eryu,
>
> On Mon, Oct 30, 2017 at 03:44:29PM +0800, Eryu Guan wrote:
> > Hi Fengguang,
> >
> > On Mon, Oct 30, 2017 at 08:20:21AM +0100, Fengguang Wu wrote:
> > > CC fsdevel.
> > >
> > > On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > > > Hi Linus,
> > > >
> > > > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> > > >
> > > > They hit the RC release mainly due to various imperfections in 0day's
> > > > auto bisection. So I manually list them here and CC the likely easy to
> > > > debug ones to the corresponding maintainers in the followup emails.
> > > >
> > > > boot_successes: 4700
> > > > boot_failures: 247
> > >
> > > [...]
> > >
> > > > WARNING:at_fs/direct-io.c:#dio_complete: 7
> > > > WARNING:at_fs/iomap.c:#iomap_dio_complete: 3
> > > > WARNING:at_fs/iomap.c:#iomap_dio_rw: 1
> > >
> > > The first warning happens on btrfs and is bisected to this commit.
> > > The other 2 warnings happen on xfs.
> >
> > I noticed that the warnings are triggered by generic/095 and
> > generic/208; they're known to generate such warnings, and I think these
> > warnings are a kind of 'known issue', please see the comments above
> > _filter_aiodio_dmesg() in fstests/common/filter.
> >
> > Please make sure your local fstests contains the following 3 commits:
> >
> > ca93e26865ab common: move _filter_xfs_dmesg() to common/filter
> > 5aa662733ab0 common: turn _filter_xfs_dmesg() into _filter_aiodio_dmesg()
> > 228aee780f13 generic/036,208: whitelist [iomap_]dio_complete() WARNs
> >
> > We filtered out such warnings in fstests on purpose so the affected
> > tests won't fail because of such dmesg warnings.
>
> We may also teach the 0day robot to ignore the warning when running the
> above 2 fstests.
>
> The more generic way of filtering may be to inject a dmesg line like
>
>     THIS TEST WILL TRIGGER KERNEL WARNING, PLEASE IGNORE.
>
> just before the specific test starts up. Then 3rd-party dmesg parsing
> scripts can handle such purpose-made warnings in a uniform way.

fstests doesn't know, prior to the test, whether the warnings the test is
going to trigger are intentional or real bugs; fstests records the dmesg
log, analyzes it after the test, and reports PASS if all the warnings are
intentional (based on the whitelist filter).

But I think it's possible to insert such a message into dmesg *after* the
test, if fstests finds that all the warnings are intentional. Does that
work for the 0day robot?

Thanks,
Eryu
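The whitelist approach can be sketched as a sed filter that rewrites known-intentional WARNING lines so a generic "did dmesg contain WARNING?" check no longer flags them (the patterns below are illustrative; see _filter_aiodio_dmesg in common/filter for the real list):

```shell
# Whitelist-style dmesg filtering in the spirit of _filter_aiodio_dmesg:
# known-intentional WARNING lines are rewritten to a benign marker,
# anything else passes through unchanged and would still fail the test.
filter_known_warnings() {
	sed -e 's#WARNING:.*fs/direct-io.c.*dio_complete.*#Intentional warnings in dio_complete#' \
	    -e 's#WARNING:.*fs/iomap.c.*iomap_dio_complete.*#Intentional warnings in iomap_dio_complete#'
}

printf '%s\n' \
	'WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220' \
	'BUG: unable to handle kernel NULL pointer dereference' \
	| filter_known_warnings
```

The whitelisted line is replaced while the unrelated BUG line survives, which is why fstests can report PASS only when every captured warning matched the whitelist.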
Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
Hi Fengguang,

On Mon, Oct 30, 2017 at 08:20:21AM +0100, Fengguang Wu wrote:
> CC fsdevel.
>
> On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > Hi Linus,
> >
> > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> >
> > They hit the RC release mainly due to various imperfections in 0day's
> > auto bisection. So I manually list them here and CC the likely easy to
> > debug ones to the corresponding maintainers in the followup emails.
> >
> > boot_successes: 4700
> > boot_failures: 247
>
> [...]
>
> > WARNING:at_fs/direct-io.c:#dio_complete: 7
> > WARNING:at_fs/iomap.c:#iomap_dio_complete: 3
> > WARNING:at_fs/iomap.c:#iomap_dio_rw: 1
>
> The first warning happens on btrfs and is bisected to this commit.
> The other 2 warnings happen on xfs.

I noticed that the warnings are triggered by generic/095 and generic/208,
they're known to generate such warnings and I think these warnings are
kind of 'known issue', please see comments above _filter_aiodio_dmesg()
in fstests/common/filter.

Please make sure your local fstests contains the following 3 commits:

ca93e26865ab common: move _filter_xfs_dmesg() to common/filter
5aa662733ab0 common: turn _filter_xfs_dmesg() into _filter_aiodio_dmesg()
228aee780f13 generic/036,208: whitelist [iomap_]dio_complete() WARNs

we filtered out such warnings in fstests on purpose so the affected tests
won't fail because of such dmesg warnings.

Thanks,
Eryu

> commit 332391a9935da939319e473b4680e173df75afcf
> Author:     Lukas Czerner
> AuthorDate: Thu Sep 21 08:16:29 2017 -0600
> Commit:     Jens Axboe
> CommitDate: Mon Sep 25 08:56:05 2017 -0600
>
>     fs: Fix page cache inconsistency when mixing buffered and AIO DIO
>
>     Currently when mixing buffered reads and asynchronous direct writes it
>     is possible to end up with the situation where we have stale data in the
>     page cache while the new data is already written to disk. This is
>     permanent until the affected pages are flushed away. Despite the fact
>     that mixing buffered and direct IO is ill-advised it does pose a thread
>     for a data integrity, is unexpected and should be fixed.
>
>     Fix this by deferring completion of asynchronous direct writes to a
>     process context in the case that there are mapped pages to be found in
>     the inode. Later before the completion in dio_complete() invalidate
>     the pages in question. This ensures that after the completion the pages
>     in the written area are either unmapped, or populated with up-to-date
>     data. Also do the same for the iomap case which uses
>     iomap_dio_complete() instead.
>
>     This has a side effect of deferring the completion to a process context
>     for every AIO DIO that happens on inode that has pages mapped. However
>     since the consensus is that this is ill-advised practice the performance
>     implication should not be a problem.
>
>     This was based on proposal from Jeff Moyer, thanks!
>
>     Reviewed-by: Jan Kara
>     Reviewed-by: Darrick J. Wong
>     Reviewed-by: Jeff Moyer
>     Signed-off-by: Lukas Czerner
>     Signed-off-by: Jens Axboe
> ---
>  fs/direct-io.c | 47 ++-
>  fs/iomap.c     | 29 -
>  mm/filemap.c   |  6 ++
>  3 files changed, 64 insertions(+), 18 deletions(-)
>
> The call traces are:
>
> [  334.461955] BTRFS info (device vdb): has skinny extents
> [  334.463231] BTRFS info (device vdb): flagging fs with big metadata feature
> [  334.469746] BTRFS info (device vdb): creating UUID tree
> [  336.190840] [ cut here ]
> [  336.193338] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
> [  336.196925] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod cdrom sg ata_generic pata_acpi ppdev snd_pcm snd_timer snd soundcore serio_raw parport_pc pcspkr parport floppy ata_piix libata i2c_piix4 ip_tables
> [  336.203746] CPU: 0 PID: 6379 Comm: kworker/0:0 Not tainted 4.14.0-rc6 #1
> [  336.204799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [  336.207480] Workqueue: dio/vdb dio_aio_complete_work
> [  336.208373] task: 880079094a80 task.stack: c995
> [  336.210495] RIP: 0010:dio_complete+0x1d4/0x220
> [  336.211288] RSP: 0018:c9953e20 EFLAGS: 00010286
> [  336.213145] RAX: fff0 RBX: 8800aae8c780 RCX: c9953c70
> [  336.214232] RDX: 8000 RSI: 02b4 RDI: 81cb9ccb
> [  336.216605] RBP: c9953e48 R08: R09: 880079c8b6c0
> [  336.217726] R10: 0074 R11: R12: 2000
> [  336.220141] R13: 2000 R14: 0003 R15:
Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
Hi Fengguang, On Mon, Oct 30, 2017 at 08:20:21AM +0100, Fengguang Wu wrote: > CC fsdevel. > > On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote: > > Hi Linus, > > > > Up to now we see the below boot error/warnings when testing v4.14-rc6. > > > > They hit the RC release mainly due to various imperfections in 0day's > > auto bisection. So I manually list them here and CC the likely easy to > > debug ones to the corresponding maintainers in the followup emails. > > > > boot_successes: 4700 > > boot_failures: 247 > > [...] > > > WARNING:at_fs/direct-io.c:#dio_complete: 7 > > WARNING:at_fs/iomap.c:#iomap_dio_complete: 3 > > WARNING:at_fs/iomap.c:#iomap_dio_rw: 1 > > The first warning happens on btrfs and is bisected to this commit. > The other 2 warnings happen on xfs. I noticed that the warnings are triggered by generic/095 and generic/208, they're known to generate such warnings and I think these warnings are kind of 'known issue', please see comments above _filter_aiodio_dmesg() in fstests/common/filter. Please make sure your local fstests contains the following 3 commits: ca93e26865ab common: move _filter_xfs_dmesg() to common/filter 5aa662733ab0 common: turn _filter_xfs_dmesg() into _filter_aiodio_dmesg() 228aee780f13 generic/036,208: whitelist [iomap_]dio_complete() WARNs we filtered out such warnings in fstests on purpose so the affected tests won't fail because of such dmesg warnings. Thanks, Eryu > > commit 332391a9935da939319e473b4680e173df75afcf > Author: Lukas Czerner > AuthorDate: Thu Sep 21 08:16:29 2017 -0600 > Commit: Jens Axboe > CommitDate: Mon Sep 25 08:56:05 2017 -0600 > >fs: Fix page cache inconsistency when mixing buffered and AIO DIO > >Currently when mixing buffered reads and asynchronous direct writes it >is possible to end up with the situation where we have stale data in the >page cache while the new data is already written to disk. This is >permanent until the affected pages are flushed away. 
Despite the fact >that mixing buffered and direct IO is ill-advised it does pose a thread >for a data integrity, is unexpected and should be fixed. > >Fix this by deferring completion of asynchronous direct writes to a >process context in the case that there are mapped pages to be found in >the inode. Later before the completion in dio_complete() invalidate >the pages in question. This ensures that after the completion the pages >in the written area are either unmapped, or populated with up-to-date >data. Also do the same for the iomap case which uses >iomap_dio_complete() instead. > >This has a side effect of deferring the completion to a process context >for every AIO DIO that happens on inode that has pages mapped. However >since the consensus is that this is ill-advised practice the performance >implication should not be a problem. > >This was based on proposal from Jeff Moyer, thanks! > >Reviewed-by: Jan Kara >Reviewed-by: Darrick J. Wong >Reviewed-by: Jeff Moyer >Signed-off-by: Lukas Czerner >Signed-off-by: Jens Axboe > --- > fs/direct-io.c | 47 ++- > fs/iomap.c | 29 - > mm/filemap.c | 6 ++ > 3 files changed, 64 insertions(+), 18 deletions(-) > > The call traces are: > > [ 334.461955] BTRFS info (device vdb): has skinny extents > [ 334.463231] BTRFS info (device vdb): flagging fs with big metadata feature > [ 334.469746] BTRFS info (device vdb): creating UUID tree > [ 336.190840] [ cut here ] > [ 336.193338] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 > dio_complete+0x1d4/0x220 > [ 336.196925] Modules linked in: btrfs xor zstd_decompress zstd_compress > xxhash raid6_pq dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod > cdrom sg ata_generic pata_acpi ppdev snd_pcm snd_timer snd soundcore > serio_raw parport_pc pcspkr parport floppy ata_piix libata i2c_piix4 ip_tables > [ 336.203746] CPU: 0 PID: 6379 Comm: kworker/0:0 Not tainted 4.14.0-rc6 #1 > [ 336.204799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.10.2-1 04/01/2014 > [ 
336.207480] Workqueue: dio/vdb dio_aio_complete_work > [ 336.208373] task: 880079094a80 task.stack: c995 > [ 336.210495] RIP: 0010:dio_complete+0x1d4/0x220 > [ 336.211288] RSP: 0018:c9953e20 EFLAGS: 00010286 > [ 336.213145] RAX: fff0 RBX: 8800aae8c780 RCX: > c9953c70 > [ 336.214232] RDX: 8000 RSI: 02b4 RDI: > 81cb9ccb > [ 336.216605] RBP: c9953e48 R08: R09: > 880079c8b6c0 > [ 336.217726] R10: 0074 R11: R12: > 2000 > [ 336.220141] R13: 2000 R14: 0003 R15: > 00074000 > [ 336.221232] FS: () GS:88013fc0() > knlGS: > [ 336.223915] CS:
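The whitelist approach Eryu points to — filtering known-harmless AIO/DIO warnings out of dmesg before the result check — can be sketched roughly as below. This is a hypothetical stand-in for fstests' _filter_aiodio_dmesg; the real helper lives in common/filter and its exact patterns differ.

```shell
# Rough sketch of the dmesg whitelist idea from the thread: strip the
# known-harmless [iomap_]dio_complete()/iomap_dio_rw() WARNs before the
# harness scans dmesg for failures. The patterns here are illustrative;
# the real filter is _filter_aiodio_dmesg in fstests' common/filter.
filter_aiodio_dmesg()
{
    grep -Ev 'WARNING:.*at fs/direct-io\.c:[0-9]+ dio_complete|WARNING:.*at fs/iomap\.c:[0-9]+ iomap_dio_(complete|rw)'
}

printf '%s\n' \
    'WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220' \
    'BUG: unable to handle kernel NULL pointer dereference' |
    filter_aiodio_dmesg
# only the BUG line survives the filter
```

With this kind of filter in place, a test only fails on unexpected dmesg output, not on the whitelisted warnings.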
Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework
On Mon, Oct 09, 2017 at 04:17:31PM +1100, Dave Chinner wrote: > On Sun, Oct 08, 2017 at 10:25:10PM -0400, Josef Bacik wrote: > > On Mon, Oct 09, 2017 at 11:51:37AM +1100, Dave Chinner wrote: > > > On Fri, Oct 06, 2017 at 05:09:57PM -0400, Josef Bacik wrote: > > > > Hello, > > > > > > > > One thing that comes up a lot every LSF is the fact that we have no > > > > general way > > > > that we do performance testing. Every fs developer has a set of > > > > scripts or > > > > things that they run with varying degrees of consistency, but nothing > > > > central > > > > that we all use. I for one am getting tired of finding regressions > > > > when we are > > > > deploying new kernels internally, so I wired this thing up to try and > > > > address > > > > this need. > > > > > > > > We all hate convoluted setups, the more brain power we have to put in > > > > to setting > > > > something up the less likely we are to use it, so I took the xfstests > > > > approach > > > > of making it relatively simple to get running and relatively easy to > > > > add new > > > > tests. For right now the only thing this framework does is run fio > > > > scripts. I > > > > chose fio because it already gathers loads of performance data about > > > > it's runs. > > > > We have everything we need there, latency, bandwidth, cpu time, and all > > > > broken > > > > down by reads, writes, and trims. I figure most of us are familiar > > > > enough with > > > > fio and how it works to make it relatively easy to add new tests to the > > > > framework. > > > > > > > > I've posted my code up on github, you can get it here > > > > > > > > https://github.com/josefbacik/fsperf > > > > > > > > All (well most) of the results from fio are stored in a local sqlite > > > > database. > > > > Right now the comparison stuff is very crude, it simply checks against > > > > the > > > > previous run and it only checks a few of the keys by default. 
You can > > > > check > > > > latency if you want, but while writing this stuff up it seemed that > > > > latency was > > > > too variable from run to run to be useful in a "did my thing regress or > > > > improve" > > > > sort of way. > > > > > > > > The configuration is brain dead simple, the README has examples. All > > > > you need > > > > to do is make your local.cfg, run ./setup and then run ./fsperf and you > > > > are good > > > > to go. > > > > > > Why re-invent the test infrastructure? Why not just make it a > > > tests/perf subdir in fstests? > > > > > > > Probably should have led with that shouldn't I have? There's nothing > > keeping me > > from doing it, but I didn't want to try and shoehorn in a python thing into > > fstests. I need python to do the sqlite and the json parsing to dump into > > the > > sqlite database. > > > > Now if you (and others) are not opposed to this being dropped into > > tests/perf > > then I'll work that up. But it's definitely going to need to be done in > > python. > > I know you yourself have said you aren't opposed to using python in the > > past, so > > if that's still the case then I can definitely wire it all up. > > I have no problems with people using python for stuff like this but, > OTOH, I'm not the fstests maintainer anymore :P I have no problem either if python is really needed, after all this is a very useful infrastructure improvement. But the python version problem brought up by Ted made me a bit nervous, we need to work that round carefully. OTOH, I'm just curious, what is the specific reason that python is a hard requirement? If we can use perl, that'll be much easier for fstests. BTW, opinions from key fs developers/fstests users, like you, are also very important and welcomed :) Thanks, Eryu > > > > > The plan is to add lots of workloads as we discover regressions and > > > > such. 
We > > > > don't want anything that takes too long to run otherwise people won't > > > > run this, > > > > so the existing tests don't take much longer than a few minutes each. > > > > I will be > > > > adding some more comparison options so you can compare against averages > > > > of all > > > > previous runs and such. > > > > > > Yup, that fits exactly into what fstests is for... :P > > > > > > Integrating into fstests means it will be immediately available to > > > all fs developers, it'll run on everything that everyone already has > > > setup for filesystem testing, and it will have familiar mkfs/mount > > > option setup behaviour so there's no new hoops for everyone to jump > > > through to run it... > > > > > > > TBF I specifically made it as easy as possible because I know we all hate > > trying > > to learn new shit. > > Yeah, it's also hard to get people to change their workflows to add > a whole new test harness into them. It's easy if it's just a new > command to an existing workflow :P > > > I figured this was different enough to warrant a separate > > project, especially since I'm going to add block
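The crude previous-run comparison described in the thread boils down to a threshold check. A toy shell version is sketched below; the 5% threshold and the plain numbers standing in for fio's bandwidth fields are invented for illustration, and fsperf's real comparison works off its sqlite-stored fio JSON.

```shell
# Toy version of the "did my thing regress" check discussed above:
# compare the current run's number against the previous run and flag a
# drop beyond a threshold. Threshold and field meaning are invented.
check_regression()
{
    local prev=$1 curr=$2 threshold_pct=${3:-5}
    # Regression if curr dropped more than threshold_pct below prev.
    if [ $((curr * 100)) -lt $((prev * (100 - threshold_pct))) ]; then
        echo "regression: $prev -> $curr"
        return 1
    fi
    echo "ok: $prev -> $curr"
}

check_regression 51200 50900            # prints "ok: 51200 -> 50900"
check_regression 51200 40000 || true    # prints "regression: 51200 -> 40000"
```

The nonzero return on regression lets a harness collect failing comparisons the same way it collects failing tests.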
Re: [PATCH] generic: Add nocheck shutdown stress test
On Thu, Sep 21, 2017 at 02:34:43PM -0700, Khazhismel Kumykov wrote: > Most shutdown tests only run on filesystems with metadata journaling, so > we lose coverage. Add a shutdown stress test that doesn't check for > consistency, so does not require journaling. > > Signed-off-by: Khazhismel Kumykov Test looks reasonable to me, some minor issues inline. > --- > tests/generic/999 | 84 > +++ > tests/generic/999.out | 2 ++ > tests/generic/group | 1 + > 3 files changed, 87 insertions(+) > create mode 100755 tests/generic/999 > create mode 100644 tests/generic/999.out > > diff --git a/tests/generic/999 b/tests/generic/999 > new file mode 100755 > index ..71b9aa4c > --- /dev/null > +++ b/tests/generic/999 > @@ -0,0 +1,84 @@ > +#!/bin/bash > +# FS QA Test No. 999 > +# > +# Shutdown stress test - exercise shutdown codepath with fsstress, > +# make sure we don't BUG/WARN. Coverage for all fs with shutdown. > +# > +#--- > +# Copyright (c) 2013 Red Hat, Inc. All Rights Reserved. > +# Copyright (c) 2017 Google, Inc. All Rights Reserved. Why Red Hat copyright? Is this based on a RH test? > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +# > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > + > +# get standard environment, filters and checks > +. 
./common/rc > +. ./common/filter > + > +_cleanup() > +{ > + cd / > + _scratch_unmount 2>/dev/null > + rm -f $tmp.* > +} > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +# real QA test starts here > +_supported_fs generic > +_supported_os Linux > + > +_require_scratch_nocheck > +_require_scratch_shutdown > +_require_command "$KILLALL_PROG" killall > + > +rm -f $seqres.full > + > +_scratch_mkfs > $seqres.full 2>&1 > +_scratch_mount > + > +SLEEP_TIME=$((10 * $TIME_FACTOR)) > +PROCS=$((4 * LOAD_FACTOR)) > + > +load_dir=$SCRATCH_MNT/test > + > +$XFS_FSR_PROG -v $load_dir >> $seqres.full 2>&1 $load_dir is not created yet, xfs_fsr always fails with ENOENT. But even with $load_dir created, it seems xfs_fsr only accepts a block device name or a file name or the XFS mount point. And this only works for XFS. Further more, I don't think we need to do defrag here, it's a newly created filesystem and xfs_fsr (or e4defrag) exits quickly, as there's nothing to defrag. > +$FSSTRESS_PROG $FSSTRESS_AVOID -n1000 -p $PROCS -d $load_dir >> > $seqres.full 2>&1 & > +sleep $SLEEP_TIME > +sync > + > +# now shutdown and unmount > +sleep 5 > +$here/src/godown $load_dir > +$KILLALL_PROG -q $FSSTRESS_PROG > +wait > + > +# for some reason fsstress processes manage to live on beyond the wait? > +sleep 5 > + > +_scratch_unmount > + > +echo "No output is good. Failures are loud." Better to be consistent with other tests here :) echo "Silence is golden" Thanks, Eryu > + > +status=0 > +exit > diff --git a/tests/generic/999.out b/tests/generic/999.out > new file mode 100644 > index ..68a51fe7 > --- /dev/null > +++ b/tests/generic/999.out > @@ -0,0 +1,2 @@ > +QA output created by 999 > +No output is good. Failures are loud. 
> diff --git a/tests/generic/group b/tests/generic/group > index f922b496..891386ac 100644 > --- a/tests/generic/group > +++ b/tests/generic/group > @@ -463,3 +463,4 @@ > 458 auto quick clone > 459 auto dangerous > 460 auto quick rw > +999 auto shutdown stress > -- > 2.14.1.821.g8fa685d3b7-goog >
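The "fsstress processes manage to live on beyond the wait" comment in the patch hints that the PIDs being waited on are not the stress processes themselves. A generic sketch of the start/kill/wait pattern is below, with plain `sleep` workers standing in for fsstress; waiting on the recorded PIDs means no extra post-wait sleep should be needed.

```shell
# Generic sketch of the stop-the-background-load pattern from the test:
# record each worker's PID, signal them, then wait on those exact PIDs.
# Plain sleep processes stand in for fsstress here.
pids=()
for i in 1 2 3 4; do
    sleep 60 &
    pids+=("$!")
done

kill "${pids[@]}" 2>/dev/null
# wait returns nonzero because the workers died by signal; that's expected.
wait "${pids[@]}" 2>/dev/null || true
echo "all workers stopped"
```

Once `wait` on the explicit PID list returns, every worker has really been reaped, so the unmount that follows cannot race with stragglers.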
Re: [PATCH 3/3] ext4: test for inline data + DAX corruption
On Mon, Sep 11, 2017 at 10:45:21PM -0600, Ross Zwisler wrote: > Add a regression test for the following kernel commit: > > ext4: prevent data corruption with inline data + DAX > > The test passes either if we don't encounter corruption, or if mounting > with DAX + inline data fails. The latter is the way that we prevent this > issue in the kernel. > > Signed-off-by: Ross Zwisler Besides the gitignore entry order issue and call the test program from $here/src/... issue, need another require rule: _require_ext4_mkfs_feature "inline_data" Otherwise this looks fine to me. Thanks, Eryu
Re: [PATCH 2/3] ext4: test for DAX + journaling corruption
On Mon, Sep 11, 2017 at 10:45:20PM -0600, Ross Zwisler wrote: > Add a regression test for the following kernel commit: > > ext4: prevent data corruption with journaling + DAX > > The test passes if either we successfully compare the data between the mmap > with journaling turned on and the one with journaling turned off, or if we > fail the chattr command to turn on or off journaling. The latter is how we > prevent this issue in the kernel. Yeah, I noticed that mounting ext4 with "-o dax,data=journal" is not allowed, enabling data journaling on a dax mount should be stopped too. > > Signed-off-by: Ross Zwisler > --- > .gitignore | 1 + > src/Makefile| 2 +- > src/t_ext4_dax_journal_corruption.c | 93 > + > tests/ext4/030 | 68 +++ > tests/ext4/030.out | 2 + > tests/ext4/group| 1 + > 6 files changed, 166 insertions(+), 1 deletion(-) > create mode 100644 src/t_ext4_dax_journal_corruption.c > create mode 100755 tests/ext4/030 > create mode 100644 tests/ext4/030.out > > diff --git a/.gitignore b/.gitignore > index 2accc37..4bdc5bf 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -154,6 +154,7 @@ > /src/t_mmap_stale_pmd > /src/t_mmap_cow_race > /src/t_mmap_fallocate > +/src/t_ext4_dax_journal_corruption Better to add new entry in alphabetical order, I know there're already some out-of-order entries there, but this one is not affected and better to stop adding new ones :) > > # dmapi/ binaries > /dmapi/src/common/cmd/read_invis > diff --git a/src/Makefile b/src/Makefile > index b8aff49..e6558e2 100644 > --- a/src/Makefile > +++ b/src/Makefile > @@ -13,7 +13,7 @@ TARGETS = dirstress fill fill2 getpagesize holes lstat64 \ > multi_open_unlink dmiperf unwritten_sync genhashnames t_holes \ > t_mmap_writev t_truncate_cmtime dirhash_collide t_rename_overwrite \ > holetest t_truncate_self t_mmap_dio af_unix t_mmap_stale_pmd \ > - t_mmap_cow_race t_mmap_fallocate fsync-err > + t_mmap_cow_race t_mmap_fallocate fsync-err t_ext4_dax_journal_corruption > > LINUX_TARGETS = xfsctl bstat 
t_mtab getdevicesize preallo_rw_pattern_reader \ > preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \ > diff --git a/src/t_ext4_dax_journal_corruption.c > b/src/t_ext4_dax_journal_corruption.c > new file mode 100644 > index 000..e0d63f8 > --- /dev/null > +++ b/src/t_ext4_dax_journal_corruption.c > @@ -0,0 +1,93 @@ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define PAGE(a) ((a)*0x1000) > +#define STRLEN 256 > + > +void err_exit(char *op) > +{ > + fprintf(stderr, "%s: %s\n", op, strerror(errno)); > + exit(1); > +} > + > +void chattr_cmd(char *chattr, char *cmd, char *file) > +{ > + int ret; > + char command[STRLEN]; > + > + ret = snprintf(command, STRLEN, "%s %s %s 2>/dev/null", chattr, cmd, > file); > + if (ret < 0) > + err_exit("snprintf"); > + > + ret = system(command); > + if (ret) /* Success - the kernel fix is to have this chattr fail */ > + exit(77); > +} > + > +int main(int argc, char *argv[]) > +{ > + int fd, err, len = PAGE(1); > + char *data, *dax_data, *chattr, *file; > + char string[STRLEN]; > + > + if (argc < 3) { > + printf("Usage: %s \n", > basename(argv[0])); > + exit(0); > + } > + > + chattr = argv[1]; > + file = argv[2]; > + > + srand(time(NULL)); > + snprintf(string, STRLEN, "random number %d\n", rand()); > + > + fd = open(file, O_RDWR|O_CREAT, S_IRUSR|S_IWUSR); > + if (fd < 0) > + err_exit("fd"); > + > + /* begin with journaling off and DAX on */ > + chattr_cmd(chattr, "-j", file); > + > + ftruncate(fd, 0); > + fallocate(fd, 0, 0, len); > + > + dax_data = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0); > + if (!dax_data) > + err_exit("mmap dax_data"); > + > + /* turns on journaling, and turns off DAX */ > + chattr_cmd(chattr, "+j", file); I'm a bit confused here, just from the test code, it's not obvious to me how DAX is turned off. 
I looked at the kernel code and there's a comment saying: "Update inode->i_flags after EXT4_INODE_JOURNAL_DATA was updated. E.g. S_DAX may get cleared / set." But isn't the per-inode dax flag proposal rejected? Anyway, some comments to explain how is DAX being turned off would be good here. > + > + data = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > + if (!data) > + err_exit("mmap data"); > + > + /* > + * Write the data using the non-DAX mapping, and try and read it back > + * using the DAX mapping. > + */ > + strcpy(data, string); > + if (strcmp(dax_data, string) != 0) > + printf("Data miscompare\n");
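The helper's exit-status convention (exit 77 when chattr is refused, since the kernel fix is precisely to refuse the chattr) means the wrapper test has to treat 77 as a pass. A hedged sketch of how a caller could decode that convention — the function name and messages are illustrative, not part of tests/ext4/030:

```shell
# Interpret the helper's exit codes: 77 = chattr was refused, which is the
# fixed-kernel behavior the test wants to see; 0 = the mmap data compare ran.
# (Illustrative wrapper only; not taken from the patch.)
run_journal_dax_check()
{
	"$@"
	local ret=$?
	case $ret in
	77) echo "chattr refused: kernel blocks journaling+DAX toggle" ;;
	0)  echo "data compare ran" ;;
	*)  echo "helper failed ($ret)" ;;
	esac
}
```

A caller would invoke it as `run_journal_dax_check src/t_ext4_dax_journal_corruption $CHATTR_PROG $testfile` and golden-output-match on the echoed line.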
Re: [LTP] [BUG] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c53df
On Tue, Jun 06, 2017 at 06:00:34PM +0800, Li Wang wrote: > Hi, > > ltp/access04 always panic the latest mainstream kernel-4.12-rc4 on > ppc64le. From the calltrace > I guess the reason is probably that the tests mount ext2 file system > using ext4 driver. > > A simple way to reproduce: > > # dd of=wangli if=/dev/zero count=1024 bs=1024 > # mkfs -t ext2 wangli > # mount -t ext4 wangli /mnt/ I can't reproduce this crash either by your reproducer nor by ltp access04 test on ppc64le host. > > > Are there any new changes in ext4 (on kernel-4.12-rc4) recently? I don't think it's an ext4 bug, I've seen similar crashes twice in 4.12-rc4 kernel testings, once testing XFS running fstests, and once running ltp on ext3. But it seems not related to filesystem code. [ 828.119270] run fstests generic/034 at 2017-06-06 19:16:10 [ 828.720341] XFS (sda5): Unmounting Filesystem [ 828.814003] device-mapper: uevent: version 1.0.3 [ 828.814096] Unable to handle kernel paging request for unaligned access at address 0xc001c52c5e7f [ 828.814103] Faulting instruction address: 0xc04d214c [ 828.814109] Oops: Kernel access of bad area, sig: 7 [#1] [ 828.814113] SMP NR_CPUS=2048 [ 828.814114] NUMA [ 828.814117] pSeries [ 828.814122] Modules linked in: dm_mod(+) sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp [ 828.814150] CPU: 10 PID: 137772 Comm: modprobe Not tainted 4.12.0-rc4 #1 [ 828.814155] task: c003fe13c800 task.stack: c0046ec68000 [ 828.814163] NIP: c04d214c LR: c011c884 CTR: c0130900 [ 828.814168] REGS: c0046ec6b3d0 TRAP: 0600 Not tainted (4.12.0-rc4) [ 828.814173] MSR: 80010280b033[ 828.814184] CR: 28228244 XER: 0005 [ 828.814191] CFAR: c011c880 DAR: c001c52c5e7f DSISR: SOFTE: 0 [ 828.814191] GPR00: c011c848 c0046ec6b650 c1049100 c003f3b77020 [ 828.814191] GPR04: c003f3b77020 c001c52c5e7f 0001 [ 828.814191] GPR08: 0008f92d89943c42 002448b7 0008 [ 828.814191] GPR12: 
c0130900 cfac6900 d7dd3908 d7dd3908 [ 828.814191] GPR16: c0046ec6bdec c0046ec6bda0 ff20 [ 828.814191] GPR20: 52f8 4000 c0cc5780 [ 828.814191] GPR24: 0001c45ffc5f 0001c45ffc5f c107dd00 [ 828.814191] GPR28: c003f3b77834 0004 0800 c003f3b77000 [ 828.814257] NIP [c04d214c] llist_add_batch+0xc/0x40 [ 828.814263] LR [c011c884] try_to_wake_up+0x4a4/0x5b0 [ 828.814268] Call Trace: [ 828.814273] [c0046ec6b650] [c011c848] try_to_wake_up+0x468/0x5b0 (unreliable) [ 828.814282] [c0046ec6b6d0] [c0102828] create_worker+0x148/0x250 [ 828.814290] [c0046ec6b770] [c01059dc] alloc_unbound_pwq+0x3bc/0x4c0 [ 828.814296] [c0046ec6b7d0] [c010601c] apply_wqattrs_prepare+0x2ac/0x320 [ 828.814304] [c0046ec6b840] [c01060cc] apply_workqueue_attrs_locked+0x3c/0xa0 [ 828.814313] [c0046ec6b870] [c010662c] apply_workqueue_attrs+0x4c/0x80 [ 828.814322] [c0046ec6b8b0] [c01081cc] __alloc_workqueue_key+0x16c/0x4e0 [ 828.814343] [c0046ec6b970] [d7e04748] local_init+0xdc/0x1a4 [dm_mod] [ 828.814362] [c0046ec6b9f0] [d7e04854] dm_init+0x44/0xc4 [dm_mod] [ 828.814375] [c0046ec6ba30] [c000ccf0] do_one_initcall+0x60/0x1c0 [ 828.814390] [c0046ec6baf0] [c091e748] do_init_module+0x8c/0x244 [ 828.814405] [c0046ec6bb80] [c0197e08] load_module+0x12f8/0x1600 [ 828.814414] [c0046ec6bd30] [c0198388] SyS_finit_module+0xa8/0x110 [ 828.814424] [c0046ec6be30] [c000af84] system_call+0x38/0xe0 [ 828.814429] Instruction dump: [ 828.814436] 6042 3860 4e800020 6000 6042 7c832378 4e800020 6000 [ 828.814448] 6000 e925 f924 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad [ 828.814466] ---[ end trace 87ec4ff1fa8e1a3d ]--- I suspect it's a regression introduced in 4.12-rc4 kernel, I didn't see such crashes when testing 4.12-rc3 kernel. I'll do bisect once I worked out a reliable reproducer (unless you can reliably reproduce it with your reproducer :). Thanks, Eryu
Re: [xfstests PATCH v3 1/5] generic: add a writeback error handling test
On Tue, Jun 06, 2017 at 06:15:57AM -0400, Jeff Layton wrote: > On Tue, 2017-06-06 at 16:58 +0800, Eryu Guan wrote: > > On Wed, May 31, 2017 at 09:08:16AM -0400, Jeff Layton wrote: > > > I'm working on a set of kernel patches to change how writeback errors > > > are handled and reported in the kernel. Instead of reporting a > > > writeback error to only the first fsync caller on the file, I aim > > > to make the kernel report them once on every file description. > > > > > > This patch adds a test for the new behavior. Basically, open many fds > > > to the same file, turn on dm_error, write to each of the fds, and then > > > fsync them all to ensure that they all get an error back. > > > > > > To do that, I'm adding a new tools/dmerror script that the C program > > > can use to load the error table. For now, that's all it can do, but > > > we can fill it out with other commands as necessary. > > > > > > Signed-off-by: Jeff Layton <jlay...@redhat.com> > > > > Thanks for the new tests! And sorry for the late review.. > > > > It's testing a new behavior on error reporting on writeback, I'm not > > sure if we can call it a new feature or it fixed a bug? But it's more > > like a behavior change, I'm not sure how to categorize it. > > > > Because if it's testing a new feature, we usually let test do proper > > detection of current test environment (based on actual behavior not > > kernel version) and _notrun on filesystems that don't have this feature > > yet, instead of failing the test; if it's testing a bug fix, we could > > leave the test fail on unfixed filesystems, this also serves as a > > reminder that there's bug to fix. > > > > Thanks for the review! I'm not sure how to categorize this either. Since > the plan is to convert all the filesystems piecemeal, maybe we should > just consider it a new feature. Then we need a new _require rule to properly detect for the 'feature' support. 
I'm not sure if this is doable, but something like _require_statx, _require_seek_data_hole would be good. > > > I pulled your test kernel tree, and test passed on EXT4 but failed on > > other local filesystems (XFS, btrfs). I assume that's expected. > > > > Besides this kinda high-level question, some minor comments inline. > > > > Yes, ext4 should pass on my latest kernel tree, but everything else > should fail. With the new _require rule, test should _notrun on XFS and btrfs then. > > > > --- > > > common/dmerror | 13 ++-- > > > doc/auxiliary-programs.txt | 8 +++ > > > src/Makefile | 2 +- > > > src/fsync-err.c| 161 > > > + > > > > New binary needs an entry in .gitignore file. > > > > OK, thanks. Will fix. > > > > tests/generic/999 | 76 + > > > tests/generic/999.out | 3 + > > > tests/generic/group| 1 + > > > tools/dmerror | 44 + > > > > This file is used by the test, then it should be in src/ directory and > > be installed along with other executable files on "make install". > > Because files under tools/ are not installed. Most people will run tests > > in the root dir of xfstests and this is not a problem, but there're > > still cases people do "make && make install" and run fstests from > > /var/lib/xfstests (default installation target). > > > > Ok, no problem. I'll move it. I wasn't sure here since dmerror is a > shell script, and most of the stuff in src/ is stuff that needs to be > built. 
We do have a few perl or shell scripts in src/ dir, we can see this from src/Makefile $(LTINSTALL) -m 755 fill2attr fill2fs fill2fs_check scaleread.sh $(PKG_LIB_DIR)/src > > > > 8 files changed, 302 insertions(+), 6 deletions(-) > > > create mode 100644 src/fsync-err.c > > > create mode 100755 tests/generic/999 > > > create mode 100644 tests/generic/999.out > > > create mode 100755 tools/dmerror > > > > > > diff --git a/common/dmerror b/common/dmerror > > > index d46c5d0b7266..238baa213b1f 100644 > > > --- a/common/dmerror > > > +++ b/common/dmerror > > > @@ -23,22 +23,25 @@ if [ $? -eq 0 ]; then > > > _notrun "Cannot run tests with DAX on dmerror devices" > > > fi > > > > > > -_dmerror_init() > > > +_dmerror_setup() > > > { > > > local dm_backing_dev=$SCRATCH_DEV &
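A hedged sketch of what such a rule could look like, following the `_notrun` convention of existing `_require` helpers. The helper name and the probe are assumptions (no such rule exists in fstests yet), and `_notrun` normally comes from common/rc — it is stubbed here only so the sketch runs standalone:

```shell
# Stub of the fstests _notrun helper (the real one lives in common/rc).
_notrun() { echo "notrun: $*"; }

# Hypothetical rule: detect the actual kernel behavior, not the kernel
# version, and skip the test on filesystems that don't yet report
# writeback errors on every file description.
_require_fsync_error_reporting()
{
	# A real probe would run a small helper (e.g. against a dm-error
	# device) and check whether every fd gets the error back on fsync;
	# hardcoded to "unsupported" here for illustration.
	local supported=false
	$supported || _notrun "fsync error reporting not supported on $FSTYP"
}

FSTYP=ext4
_require_fsync_error_reporting
```

With a rule like this in place, the test would `_notrun` on XFS and btrfs until they are converted, instead of failing.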
Re: [xfstests PATCH v3 5/5] btrfs: allow it to use $SCRATCH_LOGDEV
On Wed, May 31, 2017 at 09:08:20AM -0400, Jeff Layton wrote: > With btrfs, we can't really put the log on a separate device. What we > can do however is mirror the metadata across two devices and make the > data striped across all devices. When we turn on dmerror then the > metadata can fall back to using the other mirror while the data errors > out. > > Note that the current incarnation of btrfs has a fixed 64k stripe > width. If that ever changes or becomes settable, we may need to adjust > the amount of data that the test program writes. > > Signed-off-by: Jeff Layton> --- > common/rc | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/common/rc b/common/rc > index 83765aacfb06..078270451b53 100644 > --- a/common/rc > +++ b/common/rc > @@ -830,6 +830,8 @@ _scratch_mkfs() > ;; > btrfs) > mkfs_cmd="$MKFS_BTRFS_PROG" > + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ > + mkfs_cmd="$mkfs_cmd -d raid0 -m raid1 $SCRATCH_LOGDEV" I don't think this is the correct way to do it. If btrfs doesn't support external log device, then this test doesn't fit btrfs, or we need other ways to test btrfs. One of the problems of this hack is that raid1 requires all devices are in the same size, we have a _require_scratch_dev_pool_equal_size() rule to check on it, but this hack doesn't do the proper check and test fails if SCRATCH_LOGDEV is smaller or bigger in size. If btrfs "-d raid0 -m raid1" is capable to do this writeback error test, perhaps you can write a new btrfs test and mkfs with "-d raid0 -m raid1" explicitly. e.g. ... _require_scratch_dev_pool 2 _require_scratch_dev_pool_equal_size ... _scratch_mkfs "-d raid0 -m raid1" ... Thanks, Eryu > mkfs_filter="cat" > ;; > ext3) > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [xfstests PATCH v3 4/5] ext3: allow it to put journal on a separate device when doing scratch_mkfs
On Wed, May 31, 2017 at 09:08:19AM -0400, Jeff Layton wrote: > Signed-off-by: Jeff Layton> --- > common/rc | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/common/rc b/common/rc > index 391d36f373cd..83765aacfb06 100644 > --- a/common/rc > +++ b/common/rc > @@ -832,7 +832,16 @@ _scratch_mkfs() > mkfs_cmd="$MKFS_BTRFS_PROG" > mkfs_filter="cat" > ;; > - ext2|ext3) > + ext3) > + mkfs_cmd="$MKFS_PROG -t $FSTYP -- -F" > + mkfs_filter="grep -v -e ^Warning: -e \"^mke2fs \"" > + > + # put journal on separate device? > + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ > + $mkfs_cmd -O journal_dev $SCRATCH_LOGDEV && \ > + mkfs_cmd="$mkfs_cmd -J device=$SCRATCH_LOGDEV" Similar to that ext4 patch, need $MKFS_OPTIONS when creating journal device. Thanks, Eryu > + ;; > + ext2) > mkfs_cmd="$MKFS_PROG -t $FSTYP -- -F" > mkfs_filter="grep -v -e ^Warning: -e \"^mke2fs \"" > ;; > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [xfstests PATCH v3 3/5] generic: test writeback error handling on dmerror devices
On Wed, May 31, 2017 at 09:08:18AM -0400, Jeff Layton wrote: > Ensure that we get an error back on all fds when a block device is > open by multiple writers and writeback fails. > > Signed-off-by: Jeff Layton> --- > tests/generic/998 | 64 > +++ > tests/generic/998.out | 2 ++ > tests/generic/group | 1 + > 3 files changed, 67 insertions(+) > create mode 100755 tests/generic/998 > create mode 100644 tests/generic/998.out > > diff --git a/tests/generic/998 b/tests/generic/998 > new file mode 100755 > index ..fbadb47507c2 > --- /dev/null > +++ b/tests/generic/998 > @@ -0,0 +1,64 @@ > +#! /bin/bash > +# FS QA Test No. 998 > +# > +# Test writeback error handling when writing to block devices via pagecache. > +# See src/fsync-err.c for details of what test actually does. > +# > +#--- > +# Copyright (c) 2017, Jeff Layton > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1# failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > +cd / > +rm -rf $tmp.* $testdir > +_dmerror_cleanup > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > +. 
./common/dmerror > + > +# real QA test starts here > +_supported_os Linux > +_require_scratch _require_scratch_nocheck Then we don't have to re-create a filesystem on SCRATCH_DEV before exiting. > +_require_logdev > +_require_dm_target error > +_require_test_program fsync-err > + > +rm -f $seqres.full > + > +$XFS_IO_PROG -d -c "pwrite -S 0x7c -b 1048576 0 $((64 * 1048576))" > $SCRATCH_DEV >> $seqres.full I don't see why this is needed, add some comments? Or remove it if it's not needed? > +_dmerror_init > + > +$here/src/fsync-err $DMERROR_DEV > + > +# success, all done > +_dmerror_load_working_table > +_dmerror_cleanup > +_scratch_mkfs > $seqres.full 2>&1 > +status=0 > +exit > diff --git a/tests/generic/998.out b/tests/generic/998.out > new file mode 100644 > index ..658c438820e2 > --- /dev/null > +++ b/tests/generic/998.out > @@ -0,0 +1,2 @@ > +QA output created by 998 > +Test passed! > diff --git a/tests/generic/group b/tests/generic/group > index 39f7b14657f1..9fc384363ca7 100644 > --- a/tests/generic/group > +++ b/tests/generic/group > @@ -440,4 +440,5 @@ > 435 auto encrypt > 436 auto quick rw > 437 auto quick > +998 auto quick This is a test for block device, not filesystems, I'd remove it from auto and quick group, but add it to 'blockdev' group. So it won't be run if someone wants to test filesystems. Thanks, Eryu > 999 auto quick > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [xfstests PATCH v3 2/5] ext4: allow ext4 to use $SCRATCH_LOGDEV
On Wed, May 31, 2017 at 09:08:17AM -0400, Jeff Layton wrote: > The writeback error handling test requires that you put the journal on a > separate device. This allows us to use dmerror to simulate data > writeback failure, without affecting the journal. > > xfs already has infrastructure for this (a'la $SCRATCH_LOGDEV), so wire > up the ext4 code so that it can do the same thing when _scratch_mkfs is > called. > > Signed-off-by: Jeff Layton> Reviewed-by: Darrick J. Wong > --- > common/rc | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/common/rc b/common/rc > index 743df427c047..391d36f373cd 100644 > --- a/common/rc > +++ b/common/rc > @@ -676,6 +676,9 @@ _scratch_mkfs_ext4() > local tmp=`mktemp` > local mkfs_status > > + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ > + $mkfs_cmd -O journal_dev $SCRATCH_LOGDEV && \ > + mkfs_cmd="$mkfs_cmd -J device=$SCRATCH_LOGDEV" Need $MKFS_OPTIONS too when creating journal device, otherwise mkfs will fail when making non-default block size ext4, i.e. journal device has 4k block size, but ext4 has 1k block size if we have MKFS_OPTIONS="-b 1024" Thanks, Eryu > > _scratch_do_mkfs "$mkfs_cmd" "$mkfs_filter" $* 2>$tmp.mkfserr > 1>$tmp.mkfsstd > mkfs_status=$? > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
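The fix amounts to threading `$MKFS_OPTIONS` into the journal-device mkfs as well, so the journal device is created with the same block size as the main filesystem. A hedged sketch of the command construction, pulled out into a function purely for illustration (common/rc inlines this logic; the function name is not from fstests):

```shell
# Build the mkfs invocation for an external ext4 journal device.
# Passing $MKFS_OPTIONS (e.g. "-b 1024") keeps the journal device's block
# size in sync with the main filesystem, avoiding the mkfs failure Eryu
# describes when non-default block sizes are in use.
_journal_dev_mkfs_cmd()
{
	local mkfs_cmd="$1" logdev="$2"
	echo "$mkfs_cmd $MKFS_OPTIONS -O journal_dev $logdev"
}
```

So with `MKFS_OPTIONS="-b 1024"` the helper would produce `mkfs -t ext4 -F -b 1024 -O journal_dev $SCRATCH_LOGDEV`, run before `-J device=$SCRATCH_LOGDEV` is appended to the main mkfs command.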
Re: [xfstests PATCH v3 1/5] generic: add a writeback error handling test
On Wed, May 31, 2017 at 09:08:16AM -0400, Jeff Layton wrote: > I'm working on a set of kernel patches to change how writeback errors > are handled and reported in the kernel. Instead of reporting a > writeback error to only the first fsync caller on the file, I aim > to make the kernel report them once on every file description. > > This patch adds a test for the new behavior. Basically, open many fds > to the same file, turn on dm_error, write to each of the fds, and then > fsync them all to ensure that they all get an error back. > > To do that, I'm adding a new tools/dmerror script that the C program > can use to load the error table. For now, that's all it can do, but > we can fill it out with other commands as necessary. > > Signed-off-by: Jeff Layton Thanks for the new tests! And sorry for the late review.. It's testing a new behavior on error reporting on writeback, I'm not sure if we can call it a new feature or it fixed a bug? But it's more like a behavior change, I'm not sure how to categorize it. Because if it's testing a new feature, we usually let test do proper detection of current test environment (based on actual behavior not kernel version) and _notrun on filesystems that don't have this feature yet, instead of failing the test; if it's testing a bug fix, we could leave the test fail on unfixed filesystems, this also serves as a reminder that there's bug to fix. I pulled your test kernel tree, and test passed on EXT4 but failed on other local filesystems (XFS, btrfs). I assume that's expected. Besides this kinda high-level question, some minor comments inline. > --- > common/dmerror | 13 ++-- > doc/auxiliary-programs.txt | 8 +++ > src/Makefile | 2 +- > src/fsync-err.c| 161 > + New binary needs an entry in .gitignore file. 
> tests/generic/999 | 76 + > tests/generic/999.out | 3 + > tests/generic/group| 1 + > tools/dmerror | 44 + This file is used by the test, then it should be in src/ directory and be installed along with other executable files on "make install". Because files under tools/ are not installed. Most people will run tests in the root dir of xfstests and this is not a problem, but there're still cases people do "make && make install" and run fstests from /var/lib/xfstests (default installation target). > 8 files changed, 302 insertions(+), 6 deletions(-) > create mode 100644 src/fsync-err.c > create mode 100755 tests/generic/999 > create mode 100644 tests/generic/999.out > create mode 100755 tools/dmerror > > diff --git a/common/dmerror b/common/dmerror > index d46c5d0b7266..238baa213b1f 100644 > --- a/common/dmerror > +++ b/common/dmerror > @@ -23,22 +23,25 @@ if [ $? -eq 0 ]; then > _notrun "Cannot run tests with DAX on dmerror devices" > fi > > -_dmerror_init() > +_dmerror_setup() > { > local dm_backing_dev=$SCRATCH_DEV > > - $DMSETUP_PROG remove error-test > /dev/null 2>&1 > - > local blk_dev_size=`blockdev --getsz $dm_backing_dev` > > DMERROR_DEV='/dev/mapper/error-test' > > DMLINEAR_TABLE="0 $blk_dev_size linear $dm_backing_dev 0" > > + DMERROR_TABLE="0 $blk_dev_size error $dm_backing_dev 0" > +} > + > +_dmerror_init() > +{ > + _dmerror_setup > + $DMSETUP_PROG remove error-test > /dev/null 2>&1 > $DMSETUP_PROG create error-test --table "$DMLINEAR_TABLE" || \ > _fatal "failed to create dm linear device" > - > - DMERROR_TABLE="0 $blk_dev_size error $dm_backing_dev 0" > } > > _dmerror_mount() > diff --git a/doc/auxiliary-programs.txt b/doc/auxiliary-programs.txt > index 21ef118596b6..191ac0596511 100644 > --- a/doc/auxiliary-programs.txt > +++ b/doc/auxiliary-programs.txt > @@ -16,6 +16,7 @@ note the dependency with: > Contents: > > - af_unix -- Create an AF_UNIX socket > + - fsync-err -- tests fsync error reporting after failed writeback > - open_by_handle-- 
open_by_handle_at syscall exercise > - stat_test -- statx syscall exercise > - t_dir_type-- print directory entries and their file type > @@ -30,6 +31,13 @@ af_unix > > The af_unix program creates an AF_UNIX socket at the given location. > > +fsync-err > + Specialized program for testing how the kernel reports errors that > + occur during writeback. Works in conjunction with the dmerror script > + in tools/ to write data to a device, and then force it to fail > + writeback and test that errors are reported during fsync and cleared > + afterward. > + > open_by_handle > > The open_by_handle program exercises the open_by_handle_at() system > diff --git a/src/Makefile b/src/Makefile > index 4ec01975f8f7..b79c4d84d31b 100644 > --- a/src/Makefile > +++ b/src/Makefile > @@ -13,7 +13,7 @@ TARGETS = dirstress fill fill2 getpagesize holes lstat64 \ >
Re: [PATCH] clean_bdev_aliases: Prevent cleaning blocks that are not in block range
On Mon, Jan 02, 2017 at 04:58:03PM +0100, Jan Kara wrote: > On Sun 25-12-16 19:01:03, Chandan Rajendra wrote: > > The first block to be cleaned may start at a non-zero page offset. In > > such a scenario clean_bdev_aliases() will end up cleaning blocks that > > do not fall in the range of blocks to be cleaned. This commit fixes the > > issue by skipping blocks that do not fall in valid block range. > > > > Signed-off-by: Chandan Rajendra <chan...@linux.vnet.ibm.com> > > Ah, very good catch! How did you spot this? I failed to notice this patch, and I came up with the same patch today myself, and I'm still testing it. I found this by xfstests: many tests (tens of tests) failed fsck after test when testing extN if blocksize < pagesize. E.g. generic/013 could reproduce the fs corruption quite reliably for me. Reviewed-by: Eryu Guan <eg...@redhat.com> > > The patch looks good. You can add: > > Reviewed-by: Jan Kara <j...@suse.cz> > > Jens, please merge this fix quickly as we may end up discarding changes to > innocent metadata blocks due to this... Thanks! > > Honza > > --- > > fs/buffer.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/buffer.c > > b/fs/buffer.c > > index 1df2bd5..28484b3 100644 > > --- a/fs/buffer.c > > +++ b/fs/buffer.c > > @@ -1660,7 +1660,7 @@ void clean_bdev_aliases(struct block_device *bdev, > > sector_t block, sector_t len) > > head = page_buffers(page); > > bh = head; > > do { > > - if (!buffer_mapped(bh)) > > + if (!buffer_mapped(bh) || (bh->b_blocknr < > > block)) > > goto next; > > if (bh->b_blocknr >= block + len) > > break; > > -- > > 2.5.5 > > > -- > Jan Kara <j...@suse.com> > SUSE Labs, CR
[PATCH v2] mm/filemap: don't allow partially uptodate page for pipes
Starting from the 4.9-rc1 kernel, I started noticing test failures of sendfile(2) and splice(2) (sendfile0N and splice01 from LTP) when testing on sub-page block size filesystems (tested both XFS and ext4); these syscalls start to return EIO in the tests, e.g. sendfile021 TFAIL : sendfile02.c:133: sendfile(2) failed to return expected value, expected: 26, got: -1 sendfile022 TFAIL : sendfile02.c:133: sendfile(2) failed to return expected value, expected: 24, got: -1 sendfile023 TFAIL : sendfile02.c:133: sendfile(2) failed to return expected value, expected: 22, got: -1 sendfile024 TFAIL : sendfile02.c:133: sendfile(2) failed to return expected value, expected: 20, got: -1 This is because, in sub-page block size cases, we don't need the whole page to be uptodate; it is OK if only the part we care about is uptodate (if the fs has ->is_partially_uptodate defined). But page_cache_pipe_buf_confirm() doesn't have the ability to check the partially-uptodate case; it needs the whole page to be uptodate, so it returns EIO in this case. This is a regression introduced by commit 82c156f85384 ("switch generic_file_splice_read() to use of ->read_iter()"). Prior to that change, generic_file_splice_read() didn't allow partially-uptodate pages either, so it worked fine. Fix it by skipping the partially-uptodate check if we're working on a pipe in do_generic_file_read(), so we read the whole page from disk as long as the page is not uptodate. Signed-off-by: Eryu Guan <guane...@gmail.com> --- I think the other way to fix it is to add the ability to check & allow partially-uptodate pages in page_cache_pipe_buf_confirm(), but that is much harder to do and seems to gain little. 
v2: - Update summary a little bit - Update commit log - Add comment to the code - Add more people/list to cc v1: http://marc.info/?l=linux-mm=147756897431777=2 mm/filemap.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 849f459..670264d 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1734,6 +1734,9 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos, if (inode->i_blkbits == PAGE_SHIFT || !mapping->a_ops->is_partially_uptodate) goto page_not_up_to_date; + /* pipes can't handle partially uptodate pages */ + if (unlikely(iter->type & ITER_PIPE)) + goto page_not_up_to_date; if (!trylock_page(page)) goto page_not_up_to_date; /* Did it get truncated before we got the lock? */ -- 2.7.4
Re: [PATCH 4.4 133/146] ovl: Copy up underlying inodes ->i_mode to overlay inode
On Sun, Aug 14, 2016 at 04:52:16PM +0200, Greg Kroah-Hartman wrote: > > Did a fix for this ever get into Linus's tree? > > thanks, > > greg k-h Yes, please see commit c1892c37769c ("vfs: fix deadlock in file_remove_privs() on overlayfs"), which is tagged as "stable". Thanks, Eryu
[BUG] commit 444d13ff10f introduced boot failure on s390x
Hi, I hit boot failure on s390x host starting from 4.8-rc1 kernel, 4.7 kernel works fine. And I bisected to this commit 444d13ff10fb commit 444d13ff10fb13bc3e64859c3cf9ce43dcfeb075 Author: Jessica Yu Date: Wed Jul 27 12:06:21 2016 +0930 modules: add ro_after_init support Add ro_after_init support for modules by adding a new page-aligned section in the module layout (after rodata) for ro_after_init data and enabling RO protection for that section after module init runs. Signed-off-by: Jessica Yu Acked-by: Kees Cook Signed-off-by: Rusty Russell and I've only hit this panic on s390x hosts. Console log is appended at the end of email. Thanks, Eryu [2.050197] device-mapper: uevent: version 1.0.3 [2.050370] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: dm-d e...@redhat.com [2.057615] Unable to handle kernel pointer dereference in virtual kernel add ress space [2.057619] Failing address: 03ff8001d000 TEID: 03ff8001d407 [2.057620] Fault in home space mode while using kernel ASCE. 
[2.057622] AS:00a7c007 R3:7c974007 S:7cc24800 P: 0239b21d [2.057665] Oops: 0004 ilc:3 [#1] SMP [2.057667] Modules linked in: dm_mod [2.057670] CPU: 0 PID: 399 Comm: modprobe Not tainted 4.7.0+ #7 [2.057672] Hardware name: IBM 2827 H43 400 (z/VM) [2.057673] task: 7cccd100 ti: 02324000 task.ti: 0232 4000 [2.057675] Krnl PSW : 0704c0018000 0043a5c8 (__list_add_rcu+0x50 /0xa8) [2.057683]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI: 0 EA:3 Krnl GPRS: 0006 0098b278 03ff80117208 0098b278 [2.057685]03ff8001df08 001c913c 0002 000 08001e55c [2.057686]0560 02327e00 03ff80117218 000 003ff80117208 [2.057687]0098b278 03ff8001df08 02327cc0 000 002327c90 [2.057696] Krnl Code: 0043a5b6: e3d0b024stg %r13,0(% r11) 0043a5bc: e3c0b0080024 stg %r12,8(%r11) #0043a5c2: e3b0c024 stg %r11,0(%r12) >0043a5c8: e3b0d0080024 stg %r11,8(%r13) 0043a5ce: e340f0b80004 lg %r4,184(%r15) 0043a5d4: ebbff0a4 lmg %r11,%r15,160(%r15) 0043a5da: 07f4 bcr 15,%r4 0043a5dc: e34040080004 lg %r4,8(%r4) [2.057706] Call Trace: [2.057708] ([<02327cc0>] 0x2327cc0) [2.057714] ([<001c98d0>] load_module+0x8e0/0x1870) [2.057715] ([<001caa74>] SyS_finit_module+0xb4/0xf0) [2.057720] ([<006678b6>] system_call+0xd6/0x264) [2.057721] Last Breaking-Event-Address: [2.057722] [<001c98ca>] load_module+0x8da/0x1870 [2.057723] [2.057724] Kernel panic - not syncing: Fatal exception: panic_on_oops
Re: [lkp] [fs] 6141b4d642: xfstests.generic.079.fail
On Tue, Aug 09, 2016 at 11:14:38PM +0800, kernel test robot wrote: > > FYI, we noticed the following commit: > > https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus > commit 6141b4d64295ec08a1b48c7fcac8a566658cd64f ("fs: return EPERM on > immutable inode") > > in testcase: xfstests > with following parameters: > > disk: 4HDD > fs: btrfs > test: generic-quick1 > > > on test machine: 4 threads Sandy Bridge with 6G memory > > caused below changes: I think generic/079 needs update, because the expected errno was changed after this commit. I've sent a patch to fste...@vger.kernel.org. Thanks, Eryu
Re: [PATCH 4.4 133/146] ovl: Copy up underlying inodes ->i_mode to overlay inode
On Wed, Aug 03, 2016 at 09:45:06AM +0200, Greg Kroah-Hartman wrote: > On Wed, Aug 03, 2016 at 03:36:44PM +0800, Eryu Guan wrote: > > On Mon, Jul 25, 2016 at 01:56:28PM -0700, Greg Kroah-Hartman wrote: > > > 4.4-stable review patch. If anyone has any objections, please let me > > > know. > > > > > > -- > > > > > > From: Vivek Goyal <vgo...@redhat.com> > > > > > > commit 07a2daab49c549a37b5b744cbebb6e3f445f12bc upstream. > > > > > > Right now when a new overlay inode is created, we initialize overlay > > > inode's ->i_mode from underlying inode ->i_mode but we retain only > > > file type bits (S_IFMT) and discard permission bits. > > > > > > This patch changes it and retains permission bits too. This should allow > > > overlay to do permission checks on overlay inode itself in task context. > > > > > > [SzM] It also fixes clearing suid/sgid bits on write. > > > > This patch introduced a hang when writing to suid file, fstests > > generic/355 could reproduce the hang easily, it only failed the test > > without this patch and didn't hang the kernel. > > > > Should we skip it for now and wait for a further fix? > > Does Linus's tree have the same problem? Yes, 4.7 kernel hangs as well. > > > (The 4.6-stable tree faces the same question) > > Are we just missing a patch to be applied here? AFAICT, no patch is available to fix it yet. But I've reported the bug to Miklos (in RH bugzilla). > > And this is already in the released stable kernels... This patch is not in 4.7-rc7; it first appeared in the 4.7 kernel, so it seems the soak time was not long enough to let testing finish :) I finished my 4.7 kernel testing and identified the culprit this Monday. Thanks, Eryu
Re: [PATCH 4.4 133/146] ovl: Copy up underlying inodes ->i_mode to overlay inode
On Mon, Jul 25, 2016 at 01:56:28PM -0700, Greg Kroah-Hartman wrote: > 4.4-stable review patch. If anyone has any objections, please let me know. > > -- > > From: Vivek Goyal <vgo...@redhat.com> > > commit 07a2daab49c549a37b5b744cbebb6e3f445f12bc upstream. > > Right now when a new overlay inode is created, we initialize overlay > inode's ->i_mode from underlying inode ->i_mode but we retain only > file type bits (S_IFMT) and discard permission bits. > > This patch changes it and retains permission bits too. This should allow > overlay to do permission checks on overlay inode itself in task context. > > [SzM] It also fixes clearing suid/sgid bits on write. This patch introduced a hang when writing to suid file, fstests generic/355 could reproduce the hang easily, it only failed the test without this patch and didn't hang the kernel. Should we skip it for now and wait for a further fix? (The 4.6-stable tree faces the same question) Thanks, Eryu > > Signed-off-by: Vivek Goyal <vgo...@redhat.com> > Reported-by: Eryu Guan <eg...@redhat.com> > Signed-off-by: Miklos Szeredi <mszer...@redhat.com> > Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and > f_inode to the underlay") > Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> > > --- > fs/overlayfs/inode.c |3 +-- > fs/overlayfs/overlayfs.h |1 + > 2 files changed, 2 insertions(+), 2 deletions(-) > > --- a/fs/overlayfs/inode.c > +++ b/fs/overlayfs/inode.c > @@ -412,12 +412,11 @@ struct inode *ovl_new_inode(struct super > if (!inode) > return NULL; > > - mode &= S_IFMT; > - > inode->i_ino = get_next_ino(); > inode->i_mode = mode; > inode->i_flags |= S_NOATIME | S_NOCMTIME; > > + mode &= S_IFMT; > switch (mode) { > case S_IFDIR: > inode->i_private = oe; > --- a/fs/overlayfs/overlayfs.h > +++ b/fs/overlayfs/overlayfs.h > @@ -181,6 +181,7 @@ static inline void ovl_copyattr(struct i > { > to->i_uid = from->i_uid; > to->i_gid = from->i_gid; > + to->i_mode = from->i_mode; > } > > /* dir.c */ > 
>
[PATCH v2 RESEND] fs: return EPERM on immutable inode
In most cases, EPERM is returned on immutable inode, and there're only a few places returning EACCES. I noticed this when running LTP on overlayfs, setxattr03 failed due to unexpected EACCES on immutable inode. So converting all EACCES to EPERM on immutable inode. Acked-by: Dave Chinner <dchin...@redhat.com> Signed-off-by: Eryu Guan <guane...@gmail.com> --- v2: - update commit log to mention that it's found by running LTP fs/gfs2/inode.c| 2 +- fs/namei.c | 2 +- fs/utimes.c| 3 ++- fs/xfs/xfs_ioctl.c | 2 +- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index e0621ca..e4da0ec 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -1800,7 +1800,7 @@ int gfs2_permission(struct inode *inode, int mask) } if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode)) - error = -EACCES; + error = -EPERM; else error = generic_permission(inode, mask); if (gfs2_holder_initialized(_gh)) diff --git a/fs/namei.c b/fs/namei.c index c386a32..adb0414 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -410,7 +410,7 @@ int __inode_permission(struct inode *inode, int mask) * Nobody gets write access to an immutable file. */ if (IS_IMMUTABLE(inode)) - return -EACCES; + return -EPERM; /* * Updating mtime will likely cause i_uid and i_gid to be diff --git a/fs/utimes.c b/fs/utimes.c index 85c40f4..794f5f5 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -92,10 +92,11 @@ static int utimes_common(struct path *path, struct timespec *times) * then we need to check permissions, because * inode_change_ok() won't do it. 
*/ - error = -EACCES; + error = -EPERM; if (IS_IMMUTABLE(inode)) goto mnt_drop_write_and_out; + error = -EACCES; if (!inode_owner_or_capable(inode)) { error = inode_permission(inode, MAY_WRITE); if (error) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 9a7c878..3d6820f 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -232,7 +232,7 @@ xfs_open_by_handle( } if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) { - error = -EACCES; + error = -EPERM; goto out_dput; } -- 2.7.4
[PATCH v2] fs: return EPERM on immutable inode
In most cases, EPERM is returned on immutable inode, and there're only a few places returning EACCES. I noticed this when running LTP on overlayfs, setxattr03 failed due to unexpected EACCES on immutable inode. So converting all EACCES to EPERM on immutable inode. Acked-by: Dave Chinner <dchin...@redhat.com> Signed-off-by: Eryu Guan <guane...@gmail.com> --- v2: - update commit message to include the background on noticing this issue fs/gfs2/inode.c| 2 +- fs/namei.c | 2 +- fs/utimes.c| 3 ++- fs/xfs/xfs_ioctl.c | 2 +- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index bb30f9a..4c68d91 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -1757,7 +1757,7 @@ int gfs2_permission(struct inode *inode, int mask) } if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode)) - error = -EACCES; + error = -EPERM; else error = generic_permission(inode, mask); if (unlock) diff --git a/fs/namei.c b/fs/namei.c index 1d9ca2d..7f4a40a 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -408,7 +408,7 @@ int __inode_permission(struct inode *inode, int mask) * Nobody gets write access to an immutable file. */ if (IS_IMMUTABLE(inode)) - return -EACCES; + return -EPERM; } retval = do_inode_permission(inode, mask); diff --git a/fs/utimes.c b/fs/utimes.c index 85c40f4..794f5f5 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -92,10 +92,11 @@ static int utimes_common(struct path *path, struct timespec *times) * then we need to check permissions, because * inode_change_ok() won't do it. 
*/ - error = -EACCES; + error = -EPERM; if (IS_IMMUTABLE(inode)) goto mnt_drop_write_and_out; + error = -EACCES; if (!inode_owner_or_capable(inode)) { error = inode_permission(inode, MAY_WRITE); if (error) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index bcb6c19..4c4c58f 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -232,7 +232,7 @@ xfs_open_by_handle( } if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) { - error = -EACCES; + error = -EPERM; goto out_dput; } -- 2.5.5
[PATCH] fs: return EPERM on immutable inode
In most cases, EPERM is returned on immutable inode, and there're only a few places returning EACCES. And EPERM looks more reasonable to me. So converting all EACCES to EPERM on immutable inode. Signed-off-by: Eryu Guan <guane...@gmail.com> --- I noticed this when running LTP on overlayfs, setxattr03 failed due to unexpected EACCES on immutable inode. fs/gfs2/inode.c| 2 +- fs/namei.c | 2 +- fs/utimes.c| 3 ++- fs/xfs/xfs_ioctl.c | 2 +- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index bb30f9a..4c68d91 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -1757,7 +1757,7 @@ int gfs2_permission(struct inode *inode, int mask) } if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode)) - error = -EACCES; + error = -EPERM; else error = generic_permission(inode, mask); if (unlock) diff --git a/fs/namei.c b/fs/namei.c index 1d9ca2d..7f4a40a 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -408,7 +408,7 @@ int __inode_permission(struct inode *inode, int mask) * Nobody gets write access to an immutable file. */ if (IS_IMMUTABLE(inode)) - return -EACCES; + return -EPERM; } retval = do_inode_permission(inode, mask); diff --git a/fs/utimes.c b/fs/utimes.c index 85c40f4..794f5f5 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -92,10 +92,11 @@ static int utimes_common(struct path *path, struct timespec *times) * then we need to check permissions, because * inode_change_ok() won't do it. */ - error = -EACCES; + error = -EPERM; if (IS_IMMUTABLE(inode)) goto mnt_drop_write_and_out; + error = -EACCES; if (!inode_owner_or_capable(inode)) { error = inode_permission(inode, MAY_WRITE); if (error) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index bcb6c19..4c4c58f 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -232,7 +232,7 @@ xfs_open_by_handle( } if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) { - error = -EACCES; + error = -EPERM; goto out_dput; } -- 2.5.5
Re: [PATCH] list: kill list_force_poison()
On Tue, Mar 01, 2016 at 01:44:32PM -0800, Dan Williams wrote: > Given we have uninitialized list_heads being passed to list_add() it > will always be the case that those uninitialized values randomly trigger > the poison value. Especially since a list_add() operation will seed the > stack with the poison value for later stack allocations to trip over. > For example, see these two false positive reports: > > list_add attempted on force-poisoned entry > WARNING: at lib/list_debug.c:34 > [..] > NIP [c043c390] __list_add+0xb0/0x150 > LR [c043c38c] __list_add+0xac/0x150 > Call Trace: > [c00fb5fc3320] [c043c38c] __list_add+0xac/0x150 (unreliable) > [c00fb5fc33a0] [c081b454] __down+0x4c/0xf8 > [c00fb5fc3410] [c010b6f8] down+0x68/0x70 > [c00fb5fc3450] [d000201ebf4c] xfs_buf_lock+0x4c/0x150 [xfs] > > list_add attempted on force-poisoned entry(0500), > new->next == d59ecdb0, new->prev == 0500 > WARNING: at lib/list_debug.c:33 > [..] > NIP [c042db78] __list_add+0xa8/0x140 > LR [c042db74] __list_add+0xa4/0x140 > Call Trace: > [c004c749f620] [c042db74] __list_add+0xa4/0x140 (unreliable) > [c004c749f6b0] [c08010ec] rwsem_down_read_failed+0x6c/0x1a0 > [c004c749f760] [c0800828] down_read+0x58/0x60 > [c004c749f7e0] [d5a1a6bc] xfs_log_commit_cil+0x7c/0x600 [xfs] > > Reported-by: Eryu Guan <eg...@redhat.com> > Cc: Ross Zwisler <ross.zwis...@linux.intel.com> > Cc: <x...@oss.sgi.com> > Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() > for dax-gup") > Signed-off-by: Dan Williams <dan.j.willi...@intel.com> With this patch applied, I don't see the warning after 1000 iterations (of course, the warning has been removed :-)). Thanks! Eryu P.S. With the RFC patch posted earlier, warnings are still triggered. 
> --- > include/linux/list.h | 11 --- > kernel/memremap.c|9 +++-- > lib/list_debug.c |9 - > 3 files changed, 7 insertions(+), 22 deletions(-) > > diff --git a/include/linux/list.h b/include/linux/list.h > index 30cf4200ab40..5356f4d661a7 100644 > --- a/include/linux/list.h > +++ b/include/linux/list.h > @@ -113,17 +113,6 @@ extern void __list_del_entry(struct list_head *entry); > extern void list_del(struct list_head *entry); > #endif > > -#ifdef CONFIG_DEBUG_LIST > -/* > - * See devm_memremap_pages() which wants DEBUG_LIST=y to assert if one > - * of the pages it allocates is ever passed to list_add() > - */ > -extern void list_force_poison(struct list_head *entry); > -#else > -/* fallback to the less strict LIST_POISON* definitions */ > -#define list_force_poison list_del > -#endif > - > /** > * list_replace - replace old entry by new one > * @old : the element to be replaced > diff --git a/kernel/memremap.c b/kernel/memremap.c > index b981a7b023f0..778191e3e887 100644 > --- a/kernel/memremap.c > +++ b/kernel/memremap.c > @@ -351,8 +351,13 @@ void *devm_memremap_pages(struct device *dev, struct > resource *res, > for_each_device_pfn(pfn, page_map) { > struct page *page = pfn_to_page(pfn); > > - /* ZONE_DEVICE pages must never appear on a slab lru */ > - list_force_poison(&page->lru); > + /* > + * ZONE_DEVICE pages union ->lru with a ->pgmap back > + * pointer. It is a bug if a ZONE_DEVICE page is ever > + * freed or placed on a driver-private list. Seed the > + * storage with LIST_POISON* values. 
> + */ > + list_del(&page->lru); > page->pgmap = pgmap; > } > devres_add(dev, page_map); > diff --git a/lib/list_debug.c b/lib/list_debug.c > index 3345a089ef7b..3859bf63561c 100644 > --- a/lib/list_debug.c > +++ b/lib/list_debug.c > @@ -12,13 +12,6 @@ > #include > #include > > -static struct list_head force_poison; > -void list_force_poison(struct list_head *entry) > -{ > - entry->next = &force_poison; > - entry->prev = &force_poison; > -} > - > /* > * Insert a new entry between two known consecutive entries. > * > @@ -30,8 +23,6 @@ void __list_add(struct list_head *new, > struct list_head *prev, > struct list_head *next) > { > - WARN(new->next == &force_poison || new->prev == &force_poison, > - "list_add attempted on force-poisoned entry\n"); > WARN(next->prev != prev, > "list_add corruption. next->prev should be " > "prev (%p), but was %p. (next=%p).\n", >
Re: [PATCH 2/2] fsnotify: turn fsnotify reaper thread into a workqueue job
On Sun, Feb 14, 2016 at 07:15:23PM -0500, Jeff Layton wrote: > We don't require a dedicated thread for fsnotify cleanup. Switch it over > to a workqueue job instead that runs on the system_unbound_wq. > > In the interest of not thrashing the queued job too often when there are > a lot of marks being removed, we delay the reaper job slightly when > queueing it, to allow several to gather on the list. > > Cc: Jan Kara <j...@suse.com> > Cc: Eric Paris <epa...@parisplace.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Eryu Guan <guane...@gmail.com> > Signed-off-by: Jeff Layton <jeff.lay...@primarydata.com> With these two patches applied on top of 4.5-rc3, the same test passed a 2-hour stress run, and also survived a stress test that forks 5 processes running the same test program for 30 minutes. Thanks for looking into this! Eryu
Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
On Mon, Aug 24, 2015 at 04:24:25PM +1000, Dave Chinner wrote: > On Mon, Aug 24, 2015 at 11:18:16AM +0800, Eryu Guan wrote: > > On Mon, Aug 24, 2015 at 11:11:23AM +1000, Dave Chinner wrote: > > > > > > Eryu, can you change the way you run the event trace to be: > > > > > > $ sudo trace-cmd <options> -o <outfile location> ./check <test options> > > > > > > rather than running the trace as a background operation elsewhere? > > > Maybe that will give better results. [snip] > Anyway, Eryu, long and short of it is that you don't need to worry > about testing all the different combinations - we now know that the > completion events are occurring, so let's focus on whether the sync > code is not waiting for them correctly. Can you trace the following > events: > > xfs_log_force > xfs_setfilesize > writeback_queue > writeback_exec > writeback_start > writeback_queue_io > writeback_written > writeback_pages_written > > basically I'm trying to see if we've got all the BDI events as we'd > expect them to be queued and run for sync, and when the ->sync_fs > call occurs during the sync process before shutdown and unmount... I collected two versions of trace info with crc enabled. http://128.199.137.77/writeback-crc/ This version traced the same events as previous runs. http://128.199.137.77/writeback-crc-v2/ And this version only traced the events you listed above. And the results of other tests to check (all done with v4 xfs, with no tracepoints enabled): > Other things to check (separately): > - change godown to godown -f Passed 100 loops. > - add a "sleep 5" before running godown after sync Failed; if you need the trace info please let me know. > - add a "sleep 5; sync" before running godown Passed 100 loops. Thanks, Eryu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
On Mon, Aug 24, 2015 at 11:11:23AM +1000, Dave Chinner wrote: > > Eryu, can you change the way you run the event trace to be: > > $ sudo trace-cmd <options> -o <outfile location> ./check <test options> > > rather than running the trace as a background operation elsewhere? > Maybe that will give better results. The results are here http://128.199.137.77/writeback-v3/ > > Also, it would be informative to us if you can reproduce this with a > v5 filesystem (i.e. mkfs.xfs -m crc=1) because it has much better > on-disk information for sequence-of-event triage like this. If you > can reproduce it with a v5 filesystem, can you post the trace and > metadump? This seems to be harder to reproduce with tracepoints enabled, but I'll keep trying, and run the tests below. Thanks, Eryu > > Other things to check (separately): > > - change godown to godown -f > - add a "sleep 5" before running godown after sync > - add a "sleep 5; sync" before running godown > > i.e. I'm wondering if sync is not waiting for everything, and so we > aren't capturing the IO completions because the filesystem is > already shut down by the time they are delivered... > > Cheers, > > Dave. > -- > Dave Chinner > da...@fromorbit.com > > ___ > xfs mailing list > x...@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs
Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
On Sat, Aug 22, 2015 at 10:30:25AM +1000, Dave Chinner wrote: > On Fri, Aug 21, 2015 at 06:20:53PM +0800, Eryu Guan wrote: [snip] > > Eryu, can you try again, this time manually specifying the writeback > tracepoints so you exclude the really noisy ones? You can also drop > the xfs_file_buffered_write and xfs_file_fsync tracepoints as well, > as we can see that the incoming side of the code is doing the right > thing I excluded the writeback tracepoints you mentioned writeback_mark_inode_dirty writeback_dirty_inode_start writeback_dirty_inode writeback_dirty_page writeback_write_inode and left all other writeback tracepoints enabled, also dropped xfs_file_buffered_write and xfs_file_fsync. This time I can reproduce generic/048 quickly; please download the trace info from http://128.199.137.77/writeback-v2/ Thanks, Eryu
Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
On Wed, Aug 19, 2015 at 07:56:11AM +1000, Dave Chinner wrote: > On Tue, Aug 18, 2015 at 12:54:39PM -0700, Tejun Heo wrote: [snip] > > I'd suggest looking at some of the XFS tracepoints during the test: > > tracepoint trigger > xfs_file_buffered_write once per write syscall > xfs_file_fsync once per fsync per inode > xfs_vm_writepage every ->writepage call > xfs_setfilesize every IO completion that updates inode > size > > And it's probably best to also include all the writeback > tracepoints, too, for context. That will tell you what inodes and > what part of them are getting written back and when. I finally reproduced generic/048 with both xfs and writeback tracepoints enabled; please download the trace dat file and trace report file from http://128.199.137.77/writeback/ Thanks, Eryu
Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
On Sat, Aug 22, 2015 at 10:30:25AM +1000, Dave Chinner wrote: On Fri, Aug 21, 2015 at 06:20:53PM +0800, Eryu Guan wrote: [snip] Eryu, can you try again, this time manually specifying the writeback tracepoints so you exclude the really noisy ones? You can also drop the xfs_file_buffered_write and xfs_file_fsync tracepoints as well, as we can see that the incoming side of the code is doing the right thing I excluded the writeback tracepoints you mentioned writeback_mark_inode_dirty writeback_dirty_inode_start writeback_dirty_inode writeback_dirty_page writeback_write_inode and left all other writeback tracepoints enabled, also dropped xfs_file_buffered_write and xfs_file_fsync. This time I can reproduce generic/048 quickly and please download the trace info from below http://128.199.137.77/writeback-v2/ Thanks, Eryu -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
On Thu, Aug 20, 2015 at 02:12:24PM +0800, Eryu Guan wrote: > On Wed, Aug 19, 2015 at 07:56:11AM +1000, Dave Chinner wrote: > > On Tue, Aug 18, 2015 at 12:54:39PM -0700, Tejun Heo wrote: > > > Hello, > > > > > > On Tue, Aug 18, 2015 at 10:47:18AM -0700, Tejun Heo wrote: > > > > Hmm... the only possibility I can think of is tot_write_bandwidth > > > > being zero when it shouldn't be. I've been staring at the code for a > > > > while now but nothing rings a bell. Time for another debug patch, I > > > > guess. > > > > > > So, I can now reproduce the bug (it takes a lot of trials but lowering > > > the number of tested files helps quite a bit) and instrumented all the > > > early exit paths w/o the fix patch. bdi_has_dirty_io() and > > > wb_has_dirty_io() are never out of sync with the actual dirty / io > > > lists even when the test 048 fails, so the bug at least is not caused > > > by writeback skipping due to buggy bdi/wb_has_dirty_io() result. > > > Whenever it skips, all the lists are actually empty (verified while > > > holding list_lock). > > > > > > One suspicion I have is that this could be a subtle timing issue which > > > is being exposed by the new short-cut path. Anything which adds delay > > > seems to make the issue go away. Dave, does anything ring a bell? > > > > No, it doesn't. The data writeback mechanisms XFS uses are all > > generic. It marks inodes I_DIRTY_PAGES and lets the generic code > > take care of everything else. Yes, we do delayed allocation during > > writeback, and we log the inode size updates during IO completion, > > so if inode sizes are not getting updated, then Occam's Razor > > suggests that writeback is not happening. 
> >
> > I'd suggest looking at some of the XFS tracepoints during the test:
> >
> > tracepoint			trigger
> > xfs_file_buffered_write	once per write syscall
> > xfs_file_sync			once per fsync per inode
> > xfs_vm_writepage		every ->writepage call
> > xfs_setfilesize		every IO completion that updates inode size
>
> I gave the tracepoints a try, but my root fs is xfs so I got many
> noises. I'll try to install a new vm with ext4 as root fs. But I'm not
> sure if the new vm could reproduce the failure, will see.

I installed a new vm with ext4 as root fs and got some trace info.

On the new vm, only generic/048 is reproducible, generic/049 always
passes. And I can only reproduce generic/048 when xfs tracepoints are
enabled; if writeback tracepoints are enabled too, I can no longer
reproduce the failure. All tests are done on 4.2-rc7 kernel.

This is the trace-cmd I'm using:

cd /mnt/ext4
trace-cmd record -e xfs_file_buffered_write \
	-e xfs_file_fsync \
	-e xfs_writepage \
	-e xfs_setfilesize &
pushd /path/to/xfstests
./check generic/048
popd
kill -s 2 $!
trace-cmd report >trace_report.txt

I attached three files:

1) xfs-trace-generic-048.txt.bz2[1]
   trace report result
2) xfs-trace-generic-048.diff
   generic/048 failure diff output, shows which files have incorrect size
3) xfs-trace-generic-048.metadump.bz2
   metadump of SCRATCH_DEV, which contains the test files

If more info is needed please let me know.
Thanks,
Eryu

[1] attach this file in a following mail, to avoid xfs list 500k limit

--- tests/generic/048.out	2015-08-20 15:00:06.21000 +0800
+++ /root/xfstests/results//generic/048.out.bad	2015-08-20 20:52:58.84700 +0800
@@ -1 +1,9 @@
 QA output created by 048
+file /mnt/testarea/scratch/982 has incorrect size - sync failed
+file /mnt/testarea/scratch/983 has incorrect size - sync failed
+file /mnt/testarea/scratch/984 has incorrect size - sync failed
+file /mnt/testarea/scratch/985 has incorrect size - sync failed
+file /mnt/testarea/scratch/987 has incorrect size - sync failed
+file /mnt/testarea/scratch/989 has incorrect size - sync failed
+file /mnt/testarea/scratch/991 has incorrect size - sync failed
+file /mnt/testarea/scratch/993 has incorrect size - sync failed

xfs-trace-generic-048.metadump.bz2
Description: BZip2 compressed data
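[The "has incorrect size - sync failed" lines in the diff come from a size check after the sync/remount cycle. A sketch of that kind of check — check_size and the expected-size argument are illustrative, not the actual generic/048 code:]

```shell
#!/bin/sh
# Compare a file's on-disk size with the size that was written before
# the sync; emit the failure message if they differ. Illustrative
# sketch, not the real xfstests helper.
check_size() {
    file=$1
    expected=$2
    actual=$(stat -c %s "$file")    # GNU stat: size in bytes
    if [ "$actual" -ne "$expected" ]; then
        echo "file $file has incorrect size - sync failed"
        return 1
    fi
    return 0
}
```

[generic/048 writes file N with N KiB of data, so a caller would look roughly like `check_size $SCRATCH_MNT/982 $((982 * 1024))`.]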
Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
On Wed, Aug 19, 2015 at 07:56:11AM +1000, Dave Chinner wrote:
> On Tue, Aug 18, 2015 at 12:54:39PM -0700, Tejun Heo wrote:
> > Hello,
> >
> > On Tue, Aug 18, 2015 at 10:47:18AM -0700, Tejun Heo wrote:
> > > Hmm... the only possibility I can think of is tot_write_bandwidth
> > > being zero when it shouldn't be. I've been staring at the code for a
> > > while now but nothing rings a bell. Time for another debug patch, I
> > > guess.
> >
> > So, I can now reproduce the bug (it takes a lot of trials but lowering
> > the number of tested files helps quite a bit) and instrumented all the
> > early exit paths w/o the fix patch. bdi_has_dirty_io() and
> > wb_has_dirty_io() are never out of sync with the actual dirty / io
> > lists even when the test 048 fails, so the bug at least is not caused
> > by writeback skipping due to buggy bdi/wb_has_dirty_io() result.
> > Whenever it skips, all the lists are actually empty (verified while
> > holding list_lock).
> >
> > One suspicion I have is that this could be a subtle timing issue which
> > is being exposed by the new short-cut path. Anything which adds delay
> > seems to make the issue go away. Dave, does anything ring a bell?
>
> No, it doesn't. The data writeback mechanisms XFS uses are all
> generic. It marks inodes I_DIRTY_PAGES and lets the generic code
> take care of everything else. Yes, we do delayed allocation during
> writeback, and we log the inode size updates during IO completion,
> so if inode sizes are not getting updated, then Occam's Razor
> suggests that writeback is not happening.
>
> I'd suggest looking at some of the XFS tracepoints during the test:
>
> tracepoint			trigger
> xfs_file_buffered_write	once per write syscall
> xfs_file_sync			once per fsync per inode
> xfs_vm_writepage		every ->writepage call
> xfs_setfilesize		every IO completion that updates inode size

I gave the tracepoints a try, but my root fs is xfs so I got many
noises. I'll try to install a new vm with ext4 as root fs. But I'm not
sure if the new vm could reproduce the failure, will see.

BTW, I guess xfs_vm_writepage should be xfs_writepage, and
xfs_file_sync should be xfs_file_fsync?

Thanks,
Eryu