Re: [PATCH xfstests] generic/192: Move 'cd /' to the place where the program exits

2019-10-13 Thread Eryu Guan
On Wed, Oct 09, 2019 at 04:27:57PM +0800, Zhihao Cheng wrote:
> Running generic/192 with overlayfs (using ubifs as the base fs) yields the
> following output:
> 
>   generic/192 - output mismatch
>  QA output created by 192
>  sleep for 5 seconds
>  test
> +./common/rc: line 316: src/t_dir_type: No such file or directory
>  delta1 is in range
>  delta2 is in range
> ...
> 
> When the use case fails, the call stack in generic/192 is:
> 
>   local unknowns=$(src/t_dir_type $dir u | wc -l) common/rc:316
>   _supports_filetype  common/rc:299
>   _overlay_mount  common/overlay:52
>   _overlay_test_mount common/overlay:93
>   _test_mount common/rc:407
>   _test_cycle_mount   generic/192:50
> 
> Before _test_cycle_mount() is invoked, generic/192 executes 'cd /' to
> change the working directory from 'xfstests-dev' to '/', so src/t_dir_type
> is not found.
> 
> Signed-off-by: Zhihao Cheng 

Thanks for the debug! But I think the right fix is to call t_dir_type
via "$here", i.e.

local unknowns=$($here/src/t_dir_type $dir u | wc -l)

'here', which points to the top-level directory of the xfstests source code,
is defined in every test's setup and is guaranteed not to be empty.
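
For what it's worth, a quick way to see the difference (illustrative shell
only, not fstests code; run from an xfstests checkout):

  here=$(pwd)                # what each test's setup records
  ls src/t_dir_type          # resolves while cwd is the xfstests checkout
  cd /
  ls src/t_dir_type          # fails: No such file or directory
  ls $here/src/t_dir_type    # still resolves after the 'cd /'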

Thanks,
Eryu

> ---
>  tests/generic/192 | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/generic/192 b/tests/generic/192
> index 50b3d6fd..5550f39e 100755
> --- a/tests/generic/192
> +++ b/tests/generic/192
> @@ -15,7 +15,12 @@ echo "QA output created by $seq"
>  here=`pwd`
>  tmp=/tmp/$$
>  status=1 # failure is the default!
> -trap "exit \$status" 0 1 2 3 15
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> +}
>  
>  _access_time()
>  {
> @@ -46,7 +51,6 @@ sleep $delay # sleep to allow time to move on for access
>  cat $testfile
>  time2=`_access_time $testfile | tee -a $seqres.full`
>  
> -cd /
>  _test_cycle_mount
>  time3=`_access_time $testfile | tee -a $seqres.full`
>  
> -- 
> 2.13.6
> 


Re: [PATCH xfstests v3] overlay: Enable character device to be the base fs partition

2019-09-25 Thread Eryu Guan
On Wed, Sep 25, 2019 at 10:18:39AM +0300, Amir Goldstein wrote:
> On Wed, Sep 25, 2019 at 9:29 AM Zhihao Cheng  wrote:
> >
> > When running overlay tests using character devices as base fs partitions,
> > all overlay usecase results become 'notrun'. Function
> > '_overlay_config_override' (common/config) detects that the current base
> > fs partition is not a block device and will set FSTYP to base fs. The
> > overlay usecase will check the current FSTYP, and if it is not 'overlay'
> > or 'generic', it will skip the execution.
> >
> > For example, using UBIFS as base fs skips all overlay usecases:
> >
> >   FSTYP -- ubifs   # FSTYP should be overridden as 'overlay'
> >   MKFS_OPTIONS  -- /dev/ubi0_1 # Character device
> >   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> >
> >   overlay/001   [not run] not suitable for this filesystem type: ubifs
> >   overlay/002   [not run] not suitable for this filesystem type: ubifs
> >   overlay/003   [not run] not suitable for this filesystem type: ubifs
> >
> > When checking that the base fs partition is a block/character device,
> > FSTYP is overwritten as 'overlay'. This patch allows the base fs
> > partition to be a character device that can also execute overlay
> > usecases (such as ubifs).
> >
> > Signed-off-by: Zhihao Cheng 
> > Signed-off-by: Amir Goldstein 
> 
> Looks fine.
> Eryu, you may change this to Reviewed-by

Sure, thanks for the review!

Eryu


Re: [PATCH xfstests] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Eryu Guan
On Tue, Sep 24, 2019 at 10:19:38PM +0800, Zhihao Cheng wrote:
> As far as I know, _require_scratch_shutdown() is called after
> _overlay_config_override(); at this point, FSTYP equals the base fs type.
> According to the implementation of _require_scratch_shutdown:
> 3090 _require_scratch_shutdown()
> 3091 {
> 3092 [ -x src/godown ] || _notrun "src/godown executable not found"
> 3093
> 3094 _scratch_mkfs > /dev/null 2>&1 || _notrun "_scratch_mkfs failed on 
> $SCRATCH_DEV"
> 3095 _scratch_mount
> 3096
> 3097 if [ $FSTYP = "overlay" ]; then  
>   # FSTYP = base fs
> 3098 if [ -z $OVL_BASE_SCRATCH_DEV ]; then
> 3099 # In lagacy overlay usage, it may specify directory as
> 3100 # SCRATCH_DEV, in this case OVL_BASE_SCRATCH_DEV
> 3101 # will be null, so check OVL_BASE_SCRATCH_DEV before
> 3102 # running shutdown to avoid shutting down base fs accidently.
> 3103 _notrun "$SCRATCH_DEV is not a block device"
> 3104 else
> 3105 src/godown -f $OVL_BASE_SCRATCH_MNT 2>&1 \
> 3106 || _notrun "Underlying filesystem does not support shutdown"
> 3107 fi
> 3108 else
> 3109 src/godown -f $SCRATCH_MNT 2>&1 \
> 3110 || _notrun "$FSTYP does not support shutdown"
>   # Executes this path
> 3111 fi
> 3112
> 3113 _scratch_unmount
> 3114 }
> So we can't get the output: _notrun "$SCRATCH_DEV is not a block device".
> Instead, the verbose output should look like:
>   after _overlay_config_override FSTYP=ubifs# Additional print message
>   FSTYP -- ubifs
>   PLATFORM  -- Linux/x86_64
>   MKFS_OPTIONS  -- /dev/ubi0_1
>   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> 
>   generic/042 [not run] ubifs does not support shutdown
> 
> But I'll consider describing error more concisely in v2.
> 
> On 2019/9/24 20:33, Amir Goldstein wrote:
> > On Tue, Sep 24, 2019 at 12:34 PM Zhihao Cheng  
> > wrote:
> >>
> >> When running overlay tests using character devices as base fs partitions,
> >> all overlay usecase results become 'notrun'. Function
> >> '_overlay_config_override' (common/config) detects that the current base
> >> fs partition is not a block device and will set FSTYP to base fs. The
> >> overlay usecase will check the current FSTYP, and if it is not 'overlay'
> >> or 'generic', it will skip the execution.
> >>
> >> For example, using UBIFS as base fs skips all overlay usecases:
> >>
> >>   FSTYP -- ubifs   # FSTYP should be overridden as 'overlay'
> >>   MKFS_OPTIONS  -- /dev/ubi0_1 # Character device
> >>   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> >>
> >>   overlay/001   [not run] not suitable for this filesystem type: ubifs
> >>   overlay/002   [not run] not suitable for this filesystem type: ubifs
> >>   overlay/003   [not run] not suitable for this filesystem type: ubifs
> >>   ...
> >>
> >> When checking that the base fs partition is a block/character device,
> >> FSTYP is overwritten as 'overlay'. This patch allows the base fs
> >> partition to be a character device that can also execute overlay
> >> usecases (such as ubifs).
> >>
> >> Signed-off-by: Zhihao Cheng 
> >> ---
> >>  common/config | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/common/config b/common/config
> >> index 4c86a49..a22acdb 100644
> >> --- a/common/config
> >> +++ b/common/config
> >> @@ -550,7 +550,7 @@ _overlay_config_override()
> >> #the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the 
> >> values
> >> #of the configured base fs and SCRATCH/TEST_DEV vars are set 
> >> to the
> >> #overlayfs base and mount dirs inside base fs mount.
> >> -   [ -b "$TEST_DEV" ] || return 0
> >> +   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
> >>
> >> # Config file may specify base fs type, but we obay -overlay flag
> >> [ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> >> @@ -570,7 +570,7 @@ _overlay_config_override()
> >> export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
> >> export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
> >>
> >> -   [ -b "$SCRATCH_DEV" ] || return 0
> >> +   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
> >>
> >> # Store original base fs vars
> >> export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> >> --
> >> 2.7.4
> >>
> > 
> > Looks fine.
> > 
> > One nit: there is a message in _require_scratch_shutdown():
> > _notrun "$SCRATCH_DEV is not a block device"
> > for when $OVL_BASE_SCRATCH_DEV is not defined.
> > 
> > Could probably use a better describing error anyway.

I think what Amir suggested is that, as you add char device support to the
overlay base device, the message in _require_scratch_shutdown() should
be updated accordingly, not the commit log.
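
For example (only a sketch of the wording, not a tested patch), the branch
quoted above could read:

  if [ -z $OVL_BASE_SCRATCH_DEV ]; then
          # legacy overlay usage may pass a directory as SCRATCH_DEV
          _notrun "$SCRATCH_DEV is not a block or character device"
  fi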

Thanks,
Eryu


Re: [PATCH] generic/381: enable on systems which allows usernames that begin with digits

2017-12-21 Thread Eryu Guan
On Fri, Dec 15, 2017 at 12:41:07PM -0800, Luis R. Rodriguez wrote:
> Some systems are not allowing usernames prefixed with a number now, this
> test however relies on the assumption that you can end up with usernames
> of such type, given the purpose of the test is to ensure that xfs_quota
> can differentiate between UIDs and names beginning with numbers.
> 
> systemd >= 232 (circa 2017) no longer allows usernames starting with digits
> [0], there is a systemd exploit (CVE-2017-182 [1]) for why that was done,
> however even upstream shadow useradd also does not allow similar user types
> since shadow version v4.0.1 (circa 2007) [2], but there is no easy way to check
> shadow's useradd's version.
> 
> You can still shoehorn in these types of users by manually editing files,
> but that's just shooting yourself in the foot given all the precautions
> taken now by userspace, so just check for the systemd version for now as
> requirement for running this test.
> 
> [0] https://github.com/systemd/systemd/issues/6237
> [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-182
> [2] 
> https://github.com/shadow-maint/shadow/commit/9db6abfa42c946b4046f4b2fe67dc43ba862eb0e
> 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  README|  7 +--
>  common/config |  1 +
>  common/rc | 42 ++
>  tests/generic/381 |  1 +
>  4 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/README b/README
> index ed69332e774e..aff7bdae7cb4 100644
> --- a/README
> +++ b/README
> @@ -20,8 +20,11 @@ ___
>  - run make
>  - run make install
>  - create fsgqa test user ("sudo useradd fsgqa")
> -- create 123456-fsgqa test user ("sudo useradd 123456-fsgqa")
> - 
> +- Only on systems which allow usernames that start with a digit (older
> +  than  systemd 232 and/or has shadow older than v4.0.1), create the
> +  123456-fsgqa test user:
> +sudo useradd 123456-fsgqa
> +

IMHO, this doc update is sufficient; generic/381 is already _notrun if
there's no 123456-fsgqa user present, because of

_require_user 123456-fsgqa

And we don't rely on version checks in fstests; usually we check the actual
behavior, e.g. actually mkfs & mount the fs to see if the current kernel
and userspace support a given feature.
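
As a rough illustration of that behavior-based approach (stubbed here so the
snippet stands alone; the real _require_user in common/rc does more checks
than this):

  _notrun() { echo "[not run] $*"; exit 0; }

  _require_user()
  {
          # probe the actual system state instead of checking tool versions
          id "$1" >/dev/null 2>&1 || _notrun "$1 user not defined"
  }

  _require_user 123456-fsgqa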

Thanks,
Eryu


Re: [PATCH 4/9] build: update AC_PACKAGE_WANT_GDBM() and src/dbtest.c to build

2017-12-14 Thread Eryu Guan
On Thu, Dec 14, 2017 at 06:55:03PM +0100, Luis R. Rodriguez wrote:
> On Thu, Dec 14, 2017 at 01:51:02PM +0800, Eryu Guan wrote:
> > On Tue, Dec 12, 2017 at 04:45:14PM -0800, Luis R. Rodriguez wrote:
> > > Modern gdbm-devel packages bundle together gdbm.h and ndbm.h.
> > > The old m4 macro had detection support for some old gdbm libraries
> > > but not for new ones.
> > > 
> > > We fix compilation of src/dbtest.c by making the autoconf helper
> > > check for this new arrangement:
> > > 
> > > If both gdbm.h and gdbm.h are found define set both gdbm_ndbm_=true,
> >  ^^ ndbm.h?
> > > and have_db=true, and define HAVE_GDBM_H. The src/dbtest.c already
> > > had a HAVE_GDBM_H but there was never a respective autoconf settter for
> > > it. We can just re-use this and fix it for new arrangement.
> > > 
> > > Signed-off-by: Luis R. Rodriguez <mcg...@kernel.org>
> > 
> > This looks fine to me.
> > 
> > The only system I have at hand that has both gdbm.h and ndbm.h but
> > not any gdbm/ndbm.h is openSUSE Tumbleweed.
> 
> Indeed, openSUSE and SLE releases.
> 
> > Without this patch,
> > dbtest was not built on openSUSE, and was built successfully with this
> > patch applied.
> 
> Yeap.
> 
> > And dbtest is still built on RHEL6/7 and Fedora.
> 
> Feel free to modify the commit log accordingly then. Curious, what packages
> does Fedora/ RHEL6/7 use for the requirement here?
> 
> We just have one:
> 
> $ rpm -ql gdbm-devel-1.12-1.282.x86_64
> /usr/bin/gdbm_dump
> /usr/bin/gdbm_load
> /usr/bin/gdbmtool
> /usr/include/dbm.h
> /usr/include/gdbm.h
> /usr/include/ndbm.h
> /usr/lib64/libgdbm.a
> /usr/lib64/libgdbm.so
> /usr/lib64/libgdbm_compat.a
> /usr/lib64/libgdbm_compat.so
> /usr/lib64/libndbm.a
> /usr/lib64/libndbm.so
> /usr/share/info/gdbm.info.gz
> /usr/share/man/man1/gdbm_dump.1.gz
> /usr/share/man/man1/gdbm_load.1.gz
> /usr/share/man/man1/gdbmtool.1.gz
> /usr/share/man/man3/gdbm.3.gz

gdbm-devel too, but it has gdbm/[gn]dbm.h pointing to ../[gn]dbm.h, so
there's no such problem and dbtest builds normally.

# rpm -ql gdbm-devel
/usr/include/dbm.h
/usr/include/gdbm
/usr/include/gdbm.h
/usr/include/gdbm/dbm.h
/usr/include/gdbm/gdbm.h
/usr/include/gdbm/ndbm.h
/usr/include/ndbm.h
/usr/lib64/libgdbm.so
/usr/lib64/libgdbm_compat.so
/usr/share/info/gdbm.info.gz
/usr/share/man/man3/gdbm.3.gz
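
For reference, a quick shell probe of which header arrangement a given system
ships (only the paths that appear in the listings above):

  for h in /usr/include/ndbm.h /usr/include/gdbm.h /usr/include/gdbm/ndbm.h; do
          [ -e "$h" ] && echo "found: $h" || echo "missing: $h"
  done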

> 
> > BTW, I'll queue patch 3 and this patch for next fstests release, while
> > other patches seem not necessary,
> 
> I think patch 2 is fine too.
> 
> > I agreed with Dave that groups are not
> > for excluding tests, the required tools and environments should be
> > detected by tests and _notrun if not met.
> 
> Yeah makes sense now. I think we should also document when adding
> a group makes sense as well.
> 
> > (The README change looks fine,
> > but it doesn't apply due to the "fsgqa-381" change, so I drop it too for
> > now.)
> 
> Feel free to modify it, its not a big deal.

OK, I'll modify it on commit, thanks!

Thanks,
Eryu


Re: [PATCH 4/9] build: update AC_PACKAGE_WANT_GDBM() and src/dbtest.c to build

2017-12-13 Thread Eryu Guan
On Tue, Dec 12, 2017 at 04:45:14PM -0800, Luis R. Rodriguez wrote:
> Modern gdbm-devel packages bundle together gdbm.h and ndbm.h.
> The old m4 macro had detection support for some old gdbm libraries
> but not for new ones.
> 
> We fix compilation of src/dbtest.c by making the autoconf helper
> check for this new arrangement:
> 
> If both gdbm.h and gdbm.h are found define set both gdbm_ndbm_=true,
 ^^ ndbm.h?
> and have_db=true, and define HAVE_GDBM_H. The src/dbtest.c already
> had a HAVE_GDBM_H but there was never a respective autoconf settter for
> it. We can just re-use this and fix it for new arrangement.
> 
> Signed-off-by: Luis R. Rodriguez 

This looks fine to me.

The only system I have at hand that has both gdbm.h and ndbm.h but not any
gdbm/ndbm.h is openSUSE Tumbleweed. Without this patch, dbtest was not
built on openSUSE; with this patch applied, it was built successfully. And
dbtest is still built on RHEL6/7 and Fedora.
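
Roughly, the updated check boils down to a compile/link probe like this (a
hand-rolled approximation, not the actual m4 macro; the library names are
taken from the gdbm-devel listings discussed in this thread):

  printf '#include <gdbm.h>\n#include <ndbm.h>\nint main(void) { return 0; }\n' > /tmp/probe.c
  cc -o /tmp/probe /tmp/probe.c -lgdbm_compat -lgdbm \
          && echo "gdbm.h + ndbm.h arrangement is usable" \
          || echo "this arrangement is not available here"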

BTW, I'll queue patch 3 and this patch for the next fstests release, while
the other patches seem unnecessary; I agreed with Dave that groups are not
for excluding tests, the required tools and environments should be
detected by the tests themselves, which _notrun if they are not met. (The
README change looks fine, but it doesn't apply due to the "fsgqa-381"
change, so I'll drop it too for now.)

Thanks,
Eryu


[PATCH] lib/string: avoid reading beyond src buffer in strscpy

2017-12-07 Thread Eryu Guan
strscpy() tries to copy sizeof(unsigned long) bytes at a time from src
to dest when possible, and stops the loop when 'max' is less than
sizeof(unsigned long). But it doesn't check whether (src+res) goes beyond
the src buffer, so it can do an out-of-bounds access to the underlying
memory.

KASAN reported a global-out-of-bounds bug when reading the seccomp
actions_logged file in procfs:

  cat /proc/sys/kernel/seccomp/actions_logged

This is because seccomp_names_from_actions_logged() copies short strings
(shorter than sizeof(unsigned long)) into the buffer 'names', e.g.

  ret = strscpy(names, " ", size);

Fix this by capping the 'max' value according to the src buffer size,
to make sure we won't read beyond the src buffer.

Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Chris Metcalf <cmetc...@ezchip.com>
Cc: Kees Cook <keesc...@chromium.org>
Signed-off-by: Eryu Guan <eg...@redhat.com>
---
 lib/string.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 64a9e33f1daa..13a0147eea00 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -179,6 +179,7 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
 {
const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
size_t max = count;
+   size_t src_sz = strlen(src) + 1;
long res = 0;
 
if (count == 0)
@@ -200,6 +201,10 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
max = 0;
 #endif
 
+   /* avoid reading beyond src buffer */
+   if (max > src_sz)
+   max = src_sz;
+
while (max >= sizeof(unsigned long)) {
unsigned long c, data;
 
-- 
2.14.3



Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220

2017-10-31 Thread Eryu Guan
On Tue, Oct 31, 2017 at 01:10:41AM +0100, Fengguang Wu wrote:
> Hi Eryu,
> 
> On Mon, Oct 30, 2017 at 03:44:29PM +0800, Eryu Guan wrote:
> > Hi Fengguang,
> > 
> > On Mon, Oct 30, 2017 at 08:20:21AM +0100, Fengguang Wu wrote:
> > > CC fsdevel.
> > > 
> > > On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > > > Hi Linus,
> > > >
> > > > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> > > >
> > > > They hit the RC release mainly due to various imperfections in 0day's
> > > > auto bisection. So I manually list them here and CC the likely easy to
> > > > debug ones to the corresponding maintainers in the followup emails.
> > > >
> > > > boot_successes: 4700
> > > > boot_failures: 247
> > > 
> > > [...]
> > > 
> > > > WARNING:at_fs/direct-io.c:#dio_complete: 7
> > > > WARNING:at_fs/iomap.c:#iomap_dio_complete: 3
> > > > WARNING:at_fs/iomap.c:#iomap_dio_rw: 1
> > > 
> > > The first warning happens on btrfs and is bisected to this commit.
> > > The other 2 warnings happen on xfs.
> > 
> > I noticed that the warnings are triggered by generic/095 and
> > generic/208, they're known to generate such warnings and I think these
> > warnings are kind of 'known issue', please see comments above
> > _filter_aiodio_dmesg() in fstests/common/filter.
> > 
> > Please make sure your local fstests contains the following 3 commits:
> > 
> > ca93e26865ab common: move _filter_xfs_dmesg() to common/filter
> > 5aa662733ab0 common: turn _filter_xfs_dmesg() into _filter_aiodio_dmesg()
> > 228aee780f13 generic/036,208: whitelist [iomap_]dio_complete() WARNs
> 
> OK.
> 
> > we filtered out such warnings in fstests on purpose so the affected
> > tests won't fail because of such dmesg warnings.
> 
> We may also teach 0day robot to ignore the warning when running the
> above 2 fstests.
> 
> The more generic way of filtering may be to inject a dmesg line like
> 
>THIS TEST WILL TRIGGER KERNEL WARNING, PLEASE IGNORE.
> 
> just before the specific tests startup. Then 3rd party dmesg parsing
> scripts can handle such purpose-made warnings in a uniform way.

fstests doesn't know, prior to the test, whether the warnings the test is
going to trigger are intentional or real bugs; fstests records the dmesg
log, analyzes it after the test, and reports PASS if all the warnings
are intentional (based on the whitelist filter).

But I think it's possible to insert such a message into dmesg *after* the
test if fstests finds that all the warnings are intentional. Does that
work for the 0day robot?
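
Roughly, the post-test check works like the sketch below (simplified, not the
actual fstests code; the patterns are the WARNINGs listed in this thread, and
the /dev/kmsg line is the "insert a message after the test" idea, which would
need root):

  whitelist='WARNING:.*at.*(dio_complete|iomap_dio_complete|iomap_dio_rw)'

  if dmesg | grep 'WARNING:' | grep -Evq "$whitelist"; then
          echo "unexpected kernel WARNING, fail the test"
  else
          echo "only whitelisted WARNINGs, report PASS"
          echo "fstests: the WARNINGs above are expected, please ignore" > /dev/kmsg
  fi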

Thanks,
Eryu


Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220

2017-10-30 Thread Eryu Guan
Hi Fengguang,

On Mon, Oct 30, 2017 at 08:20:21AM +0100, Fengguang Wu wrote:
> CC fsdevel.
> 
> On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > Hi Linus,
> > 
> > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> > 
> > They hit the RC release mainly due to various imperfections in 0day's
> > auto bisection. So I manually list them here and CC the likely easy to
> > debug ones to the corresponding maintainers in the followup emails.
> > 
> > boot_successes: 4700
> > boot_failures: 247
> 
> [...]
> 
> > WARNING:at_fs/direct-io.c:#dio_complete: 7
> > WARNING:at_fs/iomap.c:#iomap_dio_complete: 3
> > WARNING:at_fs/iomap.c:#iomap_dio_rw: 1
> 
> The first warning happens on btrfs and is bisected to this commit.
> The other 2 warnings happen on xfs.

I noticed that the warnings are triggered by generic/095 and
generic/208; they're known to generate such warnings, and I think these
warnings are a kind of 'known issue'. Please see the comments above
_filter_aiodio_dmesg() in fstests/common/filter.

Please make sure your local fstests contains the following 3 commits:

ca93e26865ab common: move _filter_xfs_dmesg() to common/filter
5aa662733ab0 common: turn _filter_xfs_dmesg() into _filter_aiodio_dmesg()
228aee780f13 generic/036,208: whitelist [iomap_]dio_complete() WARNs

we filtered out such warnings in fstests on purpose so the affected
tests won't fail because of such dmesg warnings.

Thanks,
Eryu

> 
> commit 332391a9935da939319e473b4680e173df75afcf
> Author: Lukas Czerner 
> AuthorDate: Thu Sep 21 08:16:29 2017 -0600
> Commit: Jens Axboe 
> CommitDate: Mon Sep 25 08:56:05 2017 -0600
> 
>fs: Fix page cache inconsistency when mixing buffered and AIO DIO
> 
>Currently when mixing buffered reads and asynchronous direct writes it
>is possible to end up with the situation where we have stale data in the
>page cache while the new data is already written to disk. This is
>permanent until the affected pages are flushed away. Despite the fact
>that mixing buffered and direct IO is ill-advised it does pose a thread
>for a data integrity, is unexpected and should be fixed.
> 
>Fix this by deferring completion of asynchronous direct writes to a
>process context in the case that there are mapped pages to be found in
>the inode. Later before the completion in dio_complete() invalidate
>the pages in question. This ensures that after the completion the pages
>in the written area are either unmapped, or populated with up-to-date
>data. Also do the same for the iomap case which uses
>iomap_dio_complete() instead.
> 
>This has a side effect of deferring the completion to a process context
>for every AIO DIO that happens on inode that has pages mapped. However
>since the consensus is that this is ill-advised practice the performance
>implication should not be a problem.
> 
>This was based on proposal from Jeff Moyer, thanks!
> 
>Reviewed-by: Jan Kara 
>Reviewed-by: Darrick J. Wong 
>Reviewed-by: Jeff Moyer 
>Signed-off-by: Lukas Czerner 
>Signed-off-by: Jens Axboe 
> ---
> fs/direct-io.c | 47 ++-
> fs/iomap.c | 29 -
> mm/filemap.c   |  6 ++
> 3 files changed, 64 insertions(+), 18 deletions(-)
> 
> The call traces are:
> 
> [  334.461955] BTRFS info (device vdb): has skinny extents
> [  334.463231] BTRFS info (device vdb): flagging fs with big metadata feature
> [  334.469746] BTRFS info (device vdb): creating UUID tree
> [  336.190840] [ cut here ]
> [  336.193338] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 
> dio_complete+0x1d4/0x220
> [  336.196925] Modules linked in: btrfs xor zstd_decompress zstd_compress 
> xxhash raid6_pq dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod 
> cdrom sg ata_generic pata_acpi ppdev snd_pcm snd_timer snd soundcore 
> serio_raw parport_pc pcspkr parport floppy ata_piix libata i2c_piix4 ip_tables
> [  336.203746] CPU: 0 PID: 6379 Comm: kworker/0:0 Not tainted 4.14.0-rc6 #1
> [  336.204799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.10.2-1 04/01/2014
> [  336.207480] Workqueue: dio/vdb dio_aio_complete_work
> [  336.208373] task: 880079094a80 task.stack: c995
> [  336.210495] RIP: 0010:dio_complete+0x1d4/0x220
> [  336.211288] RSP: 0018:c9953e20 EFLAGS: 00010286
> [  336.213145] RAX: fff0 RBX: 8800aae8c780 RCX: 
> c9953c70
> [  336.214232] RDX: 8000 RSI: 02b4 RDI: 
> 81cb9ccb
> [  336.216605] RBP: c9953e48 R08:  R09: 
> 880079c8b6c0
> [  336.217726] R10: 0074 R11:  R12: 
> 2000
> [  336.220141] R13: 2000 R14: 0003 R15: 
> 00074000
> [  336.221232] FS:  () GS:88013fc0() 
> knlGS:
> [  336.223915] CS:  

Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Eryu Guan
On Mon, Oct 09, 2017 at 04:17:31PM +1100, Dave Chinner wrote:
> On Sun, Oct 08, 2017 at 10:25:10PM -0400, Josef Bacik wrote:
> > On Mon, Oct 09, 2017 at 11:51:37AM +1100, Dave Chinner wrote:
> > > On Fri, Oct 06, 2017 at 05:09:57PM -0400, Josef Bacik wrote:
> > > > Hello,
> > > > 
> > > > One thing that comes up a lot every LSF is the fact that we have no 
> > > > general way
> > > > that we do performance testing.  Every fs developer has a set of 
> > > > scripts or
> > > > things that they run with varying degrees of consistency, but nothing 
> > > > central
> > > > that we all use.  I for one am getting tired of finding regressions 
> > > > when we are
> > > > deploying new kernels internally, so I wired this thing up to try and 
> > > > address
> > > > this need.
> > > > 
> > > > We all hate convoluted setups, the more brain power we have to put in 
> > > > to setting
> > > > something up the less likely we are to use it, so I took the xfstests 
> > > > approach
> > > > of making it relatively simple to get running and relatively easy to 
> > > > add new
> > > > tests.  For right now the only thing this framework does is run fio 
> > > > scripts.  I
> > > > chose fio because it already gathers loads of performance data about 
> > > > it's runs.
> > > > We have everything we need there, latency, bandwidth, cpu time, and all 
> > > > broken
> > > > down by reads, writes, and trims.  I figure most of us are familiar 
> > > > enough with
> > > > fio and how it works to make it relatively easy to add new tests to the
> > > > framework.
> > > > 
> > > > I've posted my code up on github, you can get it here
> > > > 
> > > > https://github.com/josefbacik/fsperf
> > > > 
> > > > All (well most) of the results from fio are stored in a local sqlite 
> > > > database.
> > > > Right now the comparison stuff is very crude, it simply checks against 
> > > > the
> > > > previous run and it only checks a few of the keys by default.  You can 
> > > > check
> > > > latency if you want, but while writing this stuff up it seemed that 
> > > > latency was
> > > > too variable from run to run to be useful in a "did my thing regress or 
> > > > improve"
> > > > sort of way.
> > > > 
> > > > The configuration is brain dead simple, the README has examples.  All 
> > > > you need
> > > > to do is make your local.cfg, run ./setup and then run ./fsperf and you 
> > > > are good
> > > > to go.
> > > 
> > > Why re-invent the test infrastructure? Why not just make it a
> > > tests/perf subdir in fstests?
> > > 
> > 
> > Probably should have led with that shouldn't I have?  There's nothing 
> > keeping me
> > from doing it, but I didn't want to try and shoehorn in a python thing into
> > fstests.  I need python to do the sqlite and the json parsing to dump into 
> > the
> > sqlite database.
> > 
> > Now if you (and others) are not opposed to this being dropped into 
> > tests/perf
> > then I'll work that up.  But it's definitely going to need to be done in 
> > python.
> > I know you yourself have said you aren't opposed to using python in the 
> > past, so
> > if that's still the case then I can definitely wire it all up.
> 
> I have no problems with people using python for stuff like this but,
> OTOH, I'm not the fstests maintainer anymore :P

I have no problem either if python is really needed; after all, this is a
very useful infrastructure improvement. But the python version problem
brought up by Ted made me a bit nervous, so we need to work around that
carefully.

OTOH, I'm just curious, what is the specific reason that python is a
hard requirement? If we can use perl, that'll be much easier for
fstests.

BTW, opinions from key fs developers/fstests users, like you, are also
very important and welcome :)

Thanks,
Eryu

> 
> > > > The plan is to add lots of workloads as we discover regressions and 
> > > > such.  We
> > > > don't want anything that takes too long to run otherwise people won't 
> > > > run this,
> > > > so the existing tests don't take much longer than a few minutes each.  
> > > > I will be
> > > > adding some more comparison options so you can compare against averages 
> > > > of all
> > > > previous runs and such.
> > > 
> > > Yup, that fits exactly into what fstests is for... :P
> > > 
> > > Integrating into fstests means it will be immediately available to
> > > all fs developers, it'll run on everything that everyone already has
> > > setup for filesystem testing, and it will have familiar mkfs/mount
> > > option setup behaviour so there's no new hoops for everyone to jump
> > > through to run it...
> > > 
> > 
> > TBF I specifically made it as easy as possible because I know we all hate 
> > trying
> > to learn new shit.
> 
> Yeah, it's also hard to get people to change their workflows to add
> a whole new test harness into them. It's easy if it's just a new
> command to an existing workflow :P
> 
> > I figured this was different enough to warrant a separate
> > project, especially since I'm going to add block 

Re: [PATCH] generic: Add nocheck shutdown stress test

2017-09-22 Thread Eryu Guan
On Thu, Sep 21, 2017 at 02:34:43PM -0700, Khazhismel Kumykov wrote:
> Most shutdown tests only run on filesystems with metadata journaling, so
> we lose coverage. Add a shutdown stress test that doesn't check for
> consistency, so does not require journaling.
> 
> Signed-off-by: Khazhismel Kumykov 

Test looks reasonable to me, some minor issues inline.

> ---
>  tests/generic/999 | 84 
> +++
>  tests/generic/999.out |  2 ++
>  tests/generic/group   |  1 +
>  3 files changed, 87 insertions(+)
>  create mode 100755 tests/generic/999
>  create mode 100644 tests/generic/999.out
> 
> diff --git a/tests/generic/999 b/tests/generic/999
> new file mode 100755
> index ..71b9aa4c
> --- /dev/null
> +++ b/tests/generic/999
> @@ -0,0 +1,84 @@
> +#!/bin/bash
> +# FS QA Test No. 999
> +#
> +# Shutdown stress test - exercise shutdown codepath with fsstress,
> +# make sure we don't BUG/WARN. Coverage for all fs with shutdown.
> +#
> +#---
> +# Copyright (c) 2013 Red Hat, Inc.  All Rights Reserved.
> +# Copyright (c) 2017 Google, Inc.  All Rights Reserved.

Why Red Hat copyright? Is this based on a RH test?

> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +_cleanup()
> +{
> + cd /
> + _scratch_unmount 2>/dev/null
> + rm -f $tmp.*
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +
> +_require_scratch_nocheck
> +_require_scratch_shutdown
> +_require_command "$KILLALL_PROG" killall
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount
> +
> +SLEEP_TIME=$((10 * $TIME_FACTOR))
> +PROCS=$((4 * LOAD_FACTOR))
> +
> +load_dir=$SCRATCH_MNT/test
> +
> +$XFS_FSR_PROG -v $load_dir >> $seqres.full 2>&1

$load_dir is not created yet, so xfs_fsr always fails with ENOENT. But even
with $load_dir created, it seems xfs_fsr only accepts a block device
name, a file name, or the XFS mount point. And this only works for XFS.

Furthermore, I don't think we need to do a defrag here; it's a newly
created filesystem, and xfs_fsr (or e4defrag) exits quickly, as there's
nothing to defrag.
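
One possible shape for this part of the test, following the comments above
(just an illustration, not the committed version):

  load_dir=$SCRATCH_MNT/test
  mkdir -p $load_dir    # create the load directory; no defrag step needed
  $FSSTRESS_PROG $FSSTRESS_AVOID -n1000 -p $PROCS -d $load_dir >> $seqres.full 2>&1 &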

> +$FSSTRESS_PROG $FSSTRESS_AVOID -n1000 -p $PROCS -d $load_dir >> 
> $seqres.full 2>&1 &
> +sleep $SLEEP_TIME
> +sync
> +
> +# now shutdown and unmount
> +sleep 5
> +$here/src/godown $load_dir
> +$KILLALL_PROG -q $FSSTRESS_PROG
> +wait
> +
> +# for some reason fsstress processes manage to live on beyond the wait?
> +sleep 5
> +
> +_scratch_unmount
> +
> +echo "No output is good. Failures are loud."

Better to be consistent with other tests here :)

echo "Silence is golden"

Thanks,
Eryu

> +
> +status=0
> +exit
> diff --git a/tests/generic/999.out b/tests/generic/999.out
> new file mode 100644
> index ..68a51fe7
> --- /dev/null
> +++ b/tests/generic/999.out
> @@ -0,0 +1,2 @@
> +QA output created by 999
> +No output is good. Failures are loud.
> diff --git a/tests/generic/group b/tests/generic/group
> index f922b496..891386ac 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -463,3 +463,4 @@
>  458 auto quick clone
>  459 auto dangerous
>  460 auto quick rw
> +999 auto shutdown stress
> -- 
> 2.14.1.821.g8fa685d3b7-goog
> 




Re: [PATCH] generic: Add nocheck shutdown stress test

2017-09-22 Thread Eryu Guan
On Thu, Sep 21, 2017 at 02:34:43PM -0700, Khazhismel Kumykov wrote:
> Most shutdown tests only run on filesystems with metadata journaling, so
> we lose coverage. Add a shutdown stress test that doesn't check for
> consistency, so does not require journaling.
> 
> Signed-off-by: Khazhismel Kumykov 

Test looks reasonable to me, some minor issues inline.

> ---
>  tests/generic/999 | 84 
> +++
>  tests/generic/999.out |  2 ++
>  tests/generic/group   |  1 +
>  3 files changed, 87 insertions(+)
>  create mode 100755 tests/generic/999
>  create mode 100644 tests/generic/999.out
> 
> diff --git a/tests/generic/999 b/tests/generic/999
> new file mode 100755
> index ..71b9aa4c
> --- /dev/null
> +++ b/tests/generic/999
> @@ -0,0 +1,84 @@
> +#!/bin/bash
> +# FS QA Test No. 999
> +#
> +# Shutdown stress test - exercise shutdown codepath with fsstress,
> +# make sure we don't BUG/WARN. Coverage for all fs with shutdown.
> +#
> +#---
> +# Copyright (c) 2013 Red Hat, Inc.  All Rights Reserved.
> +# Copyright (c) 2017 Google, Inc.  All Rights Reserved.

Why Red Hat copyright? Is this based on a RH test?

> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +_cleanup()
> +{
> + cd /
> + _scratch_unmount 2>/dev/null
> + rm -f $tmp.*
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +
> +_require_scratch_nocheck
> +_require_scratch_shutdown
> +_require_command "$KILLALL_PROG" killall
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount
> +
> +SLEEP_TIME=$((10 * $TIME_FACTOR))
> +PROCS=$((4 * LOAD_FACTOR))
> +
> +load_dir=$SCRATCH_MNT/test
> +
> +$XFS_FSR_PROG -v $load_dir >> $seqres.full 2>&1

$load_dir is not created yet, so xfs_fsr always fails with ENOENT. But
even with $load_dir created, it seems xfs_fsr only accepts a block
device name, a file name or the XFS mount point. And it only works for
XFS anyway.

Furthermore, I don't think we need to do defrag here; it's a newly
created filesystem and xfs_fsr (or e4defrag) exits quickly, as there's
nothing to defrag.
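
I'd just drop the xfs_fsr call and make sure the load directory exists
before starting fsstress, e.g. something like:

    mkdir -p $load_dir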

> +$FSSTRESS_PROG $FSSTRESS_AVOID -n1000 -p $PROCS -d $load_dir >> 
> $seqres.full 2>&1 &
> +sleep $SLEEP_TIME
> +sync
> +
> +# now shutdown and unmount
> +sleep 5
> +$here/src/godown $load_dir
> +$KILLALL_PROG -q $FSSTRESS_PROG
> +wait
> +
> +# for some reason fsstress processes manage to live on beyond the wait?
> +sleep 5
> +
> +_scratch_unmount
> +
> +echo "No output is good. Failures are loud."

Better to be consistent with other tests here :)

echo "Silence is golden"

Thanks,
Eryu

> +
> +status=0
> +exit
> diff --git a/tests/generic/999.out b/tests/generic/999.out
> new file mode 100644
> index ..68a51fe7
> --- /dev/null
> +++ b/tests/generic/999.out
> @@ -0,0 +1,2 @@
> +QA output created by 999
> +No output is good. Failures are loud.
> diff --git a/tests/generic/group b/tests/generic/group
> index f922b496..891386ac 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -463,3 +463,4 @@
>  458 auto quick clone
>  459 auto dangerous
>  460 auto quick rw
> +999 auto shutdown stress
> -- 
> 2.14.1.821.g8fa685d3b7-goog
> 




Re: [PATCH 3/3] ext4: test for inline data + DAX corruption

2017-09-13 Thread Eryu Guan
On Mon, Sep 11, 2017 at 10:45:21PM -0600, Ross Zwisler wrote:
> Add a regression test for the following kernel commit:
> 
>   ext4: prevent data corruption with inline data + DAX
> 
> The test passes either if we don't encounter corruption, or if mounting
> with DAX + inline data fails.  The latter is the way that we prevent this
> issue in the kernel.
> 
> Signed-off-by: Ross Zwisler 

Besides the gitignore entry-order issue and the issue of calling the
test program from $here/src/..., we need another require rule:

_require_ext4_mkfs_feature "inline_data"

Otherwise this looks fine to me.

Thanks,
Eryu


Re: [PATCH 2/3] ext4: test for DAX + journaling corruption

2017-09-13 Thread Eryu Guan
On Mon, Sep 11, 2017 at 10:45:20PM -0600, Ross Zwisler wrote:
> Add a regression test for the following kernel commit:
> 
>   ext4: prevent data corruption with journaling + DAX
> 
> The test passes if either we successfully compare the data between the mmap
> with journaling turned on and the one with journaling turned off, or if we
> fail the chattr command to turn on or off journaling.  The latter is how we
> prevent this issue in the kernel.

Yeah, I noticed that mounting ext4 with "-o dax,data=journal" is not
allowed; enabling data journaling on a dax mount should be stopped too.

> 
> Signed-off-by: Ross Zwisler 
> ---
>  .gitignore  |  1 +
>  src/Makefile|  2 +-
>  src/t_ext4_dax_journal_corruption.c | 93 
> +
>  tests/ext4/030  | 68 +++
>  tests/ext4/030.out  |  2 +
>  tests/ext4/group|  1 +
>  6 files changed, 166 insertions(+), 1 deletion(-)
>  create mode 100644 src/t_ext4_dax_journal_corruption.c
>  create mode 100755 tests/ext4/030
>  create mode 100644 tests/ext4/030.out
> 
> diff --git a/.gitignore b/.gitignore
> index 2accc37..4bdc5bf 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -154,6 +154,7 @@
>  /src/t_mmap_stale_pmd
>  /src/t_mmap_cow_race
>  /src/t_mmap_fallocate
> +/src/t_ext4_dax_journal_corruption

Better to add new entry in alphabetical order, I know there're already
some out-of-order entries there, but this one is not affected and better
to stop adding new ones :)

>  
>  # dmapi/ binaries
>  /dmapi/src/common/cmd/read_invis
> diff --git a/src/Makefile b/src/Makefile
> index b8aff49..e6558e2 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -13,7 +13,7 @@ TARGETS = dirstress fill fill2 getpagesize holes lstat64 \
>   multi_open_unlink dmiperf unwritten_sync genhashnames t_holes \
>   t_mmap_writev t_truncate_cmtime dirhash_collide t_rename_overwrite \
>   holetest t_truncate_self t_mmap_dio af_unix t_mmap_stale_pmd \
> - t_mmap_cow_race t_mmap_fallocate fsync-err
> + t_mmap_cow_race t_mmap_fallocate fsync-err t_ext4_dax_journal_corruption
>  
>  LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
>   preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \
> diff --git a/src/t_ext4_dax_journal_corruption.c 
> b/src/t_ext4_dax_journal_corruption.c
> new file mode 100644
> index 000..e0d63f8
> --- /dev/null
> +++ b/src/t_ext4_dax_journal_corruption.c
> @@ -0,0 +1,93 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define PAGE(a) ((a)*0x1000)
> +#define STRLEN 256
> +
> +void err_exit(char *op)
> +{
> + fprintf(stderr, "%s: %s\n", op, strerror(errno));
> + exit(1);
> +}
> +
> +void chattr_cmd(char *chattr, char *cmd, char *file)
> +{
> + int ret;
> + char command[STRLEN];
> +
> + ret = snprintf(command, STRLEN, "%s %s %s 2>/dev/null", chattr, cmd, 
> file);
> + if (ret < 0)
> + err_exit("snprintf");
> +
> + ret = system(command);
> + if (ret) /* Success - the kernel fix is to have this chattr fail */
> + exit(77);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + int fd, err, len = PAGE(1);
> + char *data, *dax_data, *chattr, *file;
> + char string[STRLEN];
> +
> + if (argc < 3) {
> + printf("Usage: %s  \n", 
> basename(argv[0]));
> + exit(0);
> + }
> +
> + chattr = argv[1];
> + file = argv[2];
> +
> + srand(time(NULL));
> + snprintf(string, STRLEN, "random number %d\n", rand());
> +
> + fd = open(file, O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
> + if (fd < 0)
> + err_exit("fd");
> +
> + /* begin with journaling off and DAX on */
> + chattr_cmd(chattr, "-j", file);
> +
> + ftruncate(fd, 0);
> + fallocate(fd, 0, 0, len);
> +
> + dax_data = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
> + if (!dax_data)
> + err_exit("mmap dax_data");
> +
> + /* turns on journaling, and turns off DAX */
> + chattr_cmd(chattr, "+j", file);

I'm a bit confused here; just from the test code, it's not obvious to me
how DAX is turned off. I looked at the kernel code and there's a comment
saying: "Update inode->i_flags after EXT4_INODE_JOURNAL_DATA was
updated. E.g. S_DAX may get cleared / set." But wasn't the per-inode dax
flag proposal rejected?

Anyway, some comments explaining how DAX is being turned off would be
good here.

> +
> + data = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> + if (!data)
> + err_exit("mmap data");
> +
> + /*
> +  * Write the data using the non-DAX mapping, and try and read it back
> +  * using the DAX mapping.
> +  */
> + strcpy(data, string);
> + if (strcmp(dax_data, string) != 0)
> + printf("Data miscompare\n");

Re: [LTP] [BUG] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c53df

2017-06-06 Thread Eryu Guan
On Tue, Jun 06, 2017 at 06:00:34PM +0800, Li Wang wrote:
> Hi,
> 
> ltp/access04 always panic the latest mainstream kernel-4.12-rc4 on
> ppc64le. From the calltrace
> I guess the reason is probably that the tests mount ext2 file system
> using ext4 driver.
> 
> A simple way to reproduce:
> 
> # dd of=wangli if=/dev/zero count=1024 bs=1024
> # mkfs -t ext2 wangli
> # mount -t ext4 wangli /mnt/

I can't reproduce this crash, either with your reproducer or with the
ltp access04 test, on a ppc64le host.

> 
> 
> Are there any new changes in ext4 (on kernel-4.12-rc4) recently?

I don't think it's an ext4 bug; I've seen similar crashes twice in
4.12-rc4 kernel testing, once when testing XFS with fstests, and once
when running ltp on ext3. But it seems unrelated to filesystem code.

[  828.119270] run fstests generic/034 at 2017-06-06 19:16:10 
[  828.720341] XFS (sda5): Unmounting Filesystem 
[  828.814003] device-mapper: uevent: version 1.0.3 
[  828.814096] Unable to handle kernel paging request for unaligned access at 
address 0xc001c52c5e7f 
[  828.814103] Faulting instruction address: 0xc04d214c 
[  828.814109] Oops: Kernel access of bad area, sig: 7 [#1] 
[  828.814113] SMP NR_CPUS=2048  
[  828.814114] NUMA  
[  828.814117] pSeries 
[  828.814122] Modules linked in: dm_mod(+) sg pseries_rng ghash_generic 
gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables 
xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp 
[  828.814150] CPU: 10 PID: 137772 Comm: modprobe Not tainted 4.12.0-rc4 #1 
[  828.814155] task: c003fe13c800 task.stack: c0046ec68000 
[  828.814163] NIP: c04d214c LR: c011c884 CTR: c0130900 
[  828.814168] REGS: c0046ec6b3d0 TRAP: 0600   Not tainted  (4.12.0-rc4) 
[  828.814173] MSR: 80010280b033  
[  828.814184]   CR: 28228244  XER: 0005 
[  828.814191] CFAR: c011c880 DAR: c001c52c5e7f DSISR:  
SOFTE: 0  
[  828.814191] GPR00: c011c848 c0046ec6b650 c1049100 
c003f3b77020  
[  828.814191] GPR04: c003f3b77020 c001c52c5e7f  
0001  
[  828.814191] GPR08: 0008f92d89943c42 002448b7 0008 
  
[  828.814191] GPR12: c0130900 cfac6900 d7dd3908 
d7dd3908  
[  828.814191] GPR16: c0046ec6bdec c0046ec6bda0 ff20 
  
[  828.814191] GPR20: 52f8  4000 
c0cc5780  
[  828.814191] GPR24: 0001c45ffc5f  0001c45ffc5f 
c107dd00  
[  828.814191] GPR28: c003f3b77834 0004 0800 
c003f3b77000  
[  828.814257] NIP [c04d214c] llist_add_batch+0xc/0x40 
[  828.814263] LR [c011c884] try_to_wake_up+0x4a4/0x5b0 
[  828.814268] Call Trace: 
[  828.814273] [c0046ec6b650] [c011c848] try_to_wake_up+0x468/0x5b0 
(unreliable) 
[  828.814282] [c0046ec6b6d0] [c0102828] create_worker+0x148/0x250 
[  828.814290] [c0046ec6b770] [c01059dc] 
alloc_unbound_pwq+0x3bc/0x4c0 
[  828.814296] [c0046ec6b7d0] [c010601c] 
apply_wqattrs_prepare+0x2ac/0x320 
[  828.814304] [c0046ec6b840] [c01060cc] 
apply_workqueue_attrs_locked+0x3c/0xa0 
[  828.814313] [c0046ec6b870] [c010662c] 
apply_workqueue_attrs+0x4c/0x80 
[  828.814322] [c0046ec6b8b0] [c01081cc] 
__alloc_workqueue_key+0x16c/0x4e0 
[  828.814343] [c0046ec6b970] [d7e04748] local_init+0xdc/0x1a4 
[dm_mod] 
[  828.814362] [c0046ec6b9f0] [d7e04854] dm_init+0x44/0xc4 [dm_mod] 
[  828.814375] [c0046ec6ba30] [c000ccf0] do_one_initcall+0x60/0x1c0 
[  828.814390] [c0046ec6baf0] [c091e748] do_init_module+0x8c/0x244 
[  828.814405] [c0046ec6bb80] [c0197e08] load_module+0x12f8/0x1600 
[  828.814414] [c0046ec6bd30] [c0198388] 
SyS_finit_module+0xa8/0x110 
[  828.814424] [c0046ec6be30] [c000af84] system_call+0x38/0xe0 
[  828.814429] Instruction dump: 
[  828.814436] 6042 3860 4e800020 6000 6042 7c832378 4e800020 
6000  
[  828.814448] 6000 e925 f924 7c0004ac <7d4028a8> 7c2a4800 40c20010 
7c6029ad  
[  828.814466] ---[ end trace 87ec4ff1fa8e1a3d ]--- 

I suspect it's a regression introduced in the 4.12-rc4 kernel; I didn't
see such crashes when testing 4.12-rc3. I'll do a bisect once I've
worked out a reliable reproducer (unless you can reliably reproduce it
with your reproducer :).

Thanks,
Eryu


Re: [xfstests PATCH v3 1/5] generic: add a writeback error handling test

2017-06-06 Thread Eryu Guan
On Tue, Jun 06, 2017 at 06:15:57AM -0400, Jeff Layton wrote:
> On Tue, 2017-06-06 at 16:58 +0800, Eryu Guan wrote:
> > On Wed, May 31, 2017 at 09:08:16AM -0400, Jeff Layton wrote:
> > > I'm working on a set of kernel patches to change how writeback errors
> > > are handled and reported in the kernel. Instead of reporting a
> > > writeback error to only the first fsync caller on the file, I aim
> > > to make the kernel report them once on every file description.
> > > 
> > > This patch adds a test for the new behavior. Basically, open many fds
> > > to the same file, turn on dm_error, write to each of the fds, and then
> > > fsync them all to ensure that they all get an error back.
> > > 
> > > To do that, I'm adding a new tools/dmerror script that the C program
> > > can use to load the error table. For now, that's all it can do, but
> > > we can fill it out with other commands as necessary.
> > > 
> > > Signed-off-by: Jeff Layton 
> > 
> > Thanks for the new tests! And sorry for the late review..
> > 
> > It's testing a new behavior on error reporting on writeback, I'm not
> > sure if we can call it a new feature or it fixed a bug? But it's more
> > like a behavior change, I'm not sure how to categorize it.
> > 
> > Because if it's testing a new feature, we usually let test do proper
> > detection of current test environment (based on actual behavior not
> > kernel version) and _notrun on filesystems that don't have this feature
> > yet, instead of failing the test; if it's testing a bug fix, we could
> > leave the test fail on unfixed filesystems, this also serves as a
> > reminder that there's bug to fix.
> > 
> 
> Thanks for the review! I'm not sure how to categorize this either. Since
> the plan is to convert all the filesystems piecemeal, maybe we should
> just consider it a new feature.

Then we need a new _require rule to properly detect the 'feature'
support. I'm not sure if this is doable, but something like
_require_statx or _require_seek_data_hole would be good.

> 
> > I pulled your test kernel tree, and test passed on EXT4 but failed on
> > other local filesystems (XFS, btrfs). I assume that's expected.
> > 
> > Besides this kinda high-level question, some minor comments inline.
> > 
> 
> Yes, ext4 should pass on my latest kernel tree, but everything else
> should fail. 

With the new _require rule, test should _notrun on XFS and btrfs then.

> 
> > > ---
> > >  common/dmerror |  13 ++--
> > >  doc/auxiliary-programs.txt |   8 +++
> > >  src/Makefile   |   2 +-
> > >  src/fsync-err.c| 161 
> > > +
> > 
> > New binary needs an entry in .gitignore file.
> > 
> 
> OK, thanks. Will fix.
> 
> > >  tests/generic/999  |  76 +
> > >  tests/generic/999.out  |   3 +
> > >  tests/generic/group|   1 +
> > >  tools/dmerror  |  44 +
> > 
> > This file is used by the test, then it should be in src/ directory and
> > be installed along with other executable files on "make install".
> > Because files under tools/ are not installed. Most people will run tests
> > in the root dir of xfstests and this is not a problem, but there're
> > still cases people do "make && make install" and run fstests from
> > /var/lib/xfstests (default installation target).
> > 
> 
> Ok, no problem. I'll move it. I wasn't sure here since dmerror is a
> shell script, and most of the stuff in src/ is stuff that needs to be
> built.

We do have a few perl or shell scripts in the src/ dir; you can see this
in src/Makefile:

$(LTINSTALL) -m 755 fill2attr fill2fs fill2fs_check scaleread.sh 
$(PKG_LIB_DIR)/src

>  
> > >  8 files changed, 302 insertions(+), 6 deletions(-)
> > >  create mode 100644 src/fsync-err.c
> > >  create mode 100755 tests/generic/999
> > >  create mode 100644 tests/generic/999.out
> > >  create mode 100755 tools/dmerror
> > > 
> > > diff --git a/common/dmerror b/common/dmerror
> > > index d46c5d0b7266..238baa213b1f 100644
> > > --- a/common/dmerror
> > > +++ b/common/dmerror
> > > @@ -23,22 +23,25 @@ if [ $? -eq 0 ]; then
> > >   _notrun "Cannot run tests with DAX on dmerror devices"
> > >  fi
> > >  
> > > -_dmerror_init()
> > > +_dmerror_setup()
> > >  {
> > >   local dm_backing_dev=$SCRATCH_DEV
> &g

Re: [xfstests PATCH v3 5/5] btrfs: allow it to use $SCRATCH_LOGDEV

2017-06-06 Thread Eryu Guan
On Wed, May 31, 2017 at 09:08:20AM -0400, Jeff Layton wrote:
> With btrfs, we can't really put the log on a separate device. What we
> can do however is mirror the metadata across two devices and make the
> data striped across all devices. When we turn on dmerror then the
> metadata can fall back to using the other mirror while the data errors
> out.
> 
> Note that the current incarnation of btrfs has a fixed 64k stripe
> width. If that ever changes or becomes settable, we may need to adjust
> the amount of data that the test program writes.
> 
> Signed-off-by: Jeff Layton 
> ---
>  common/rc | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index 83765aacfb06..078270451b53 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -830,6 +830,8 @@ _scratch_mkfs()
>   ;;
>   btrfs)
>   mkfs_cmd="$MKFS_BTRFS_PROG"
> + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \
> + mkfs_cmd="$mkfs_cmd -d raid0 -m raid1 $SCRATCH_LOGDEV"

I don't think this is the correct way to do it. If btrfs doesn't support
an external log device, then this test doesn't fit btrfs, or we need
other ways to test btrfs.

One of the problems with this hack is that raid1 requires all devices to
be the same size; we have a _require_scratch_dev_pool_equal_size() rule
to check for that, but this hack doesn't do the proper check, and the
test fails if SCRATCH_LOGDEV is smaller or bigger in size.

If btrfs "-d raid0 -m raid1" is capable of handling this writeback error
test, perhaps you can write a new btrfs test and mkfs with "-d raid0 -m
raid1" explicitly, e.g.

...
_require_scratch_dev_pool 2
_require_scratch_dev_pool_equal_size
...
_scratch_mkfs "-d raid0 -m raid1"
...

Thanks,
Eryu

>   mkfs_filter="cat"
>   ;;
>   ext3)
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [xfstests PATCH v3 4/5] ext3: allow it to put journal on a separate device when doing scratch_mkfs

2017-06-06 Thread Eryu Guan
On Wed, May 31, 2017 at 09:08:19AM -0400, Jeff Layton wrote:
> Signed-off-by: Jeff Layton 
> ---
>  common/rc | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/common/rc b/common/rc
> index 391d36f373cd..83765aacfb06 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -832,7 +832,16 @@ _scratch_mkfs()
>   mkfs_cmd="$MKFS_BTRFS_PROG"
>   mkfs_filter="cat"
>   ;;
> - ext2|ext3)
> + ext3)
> + mkfs_cmd="$MKFS_PROG -t $FSTYP -- -F"
> + mkfs_filter="grep -v -e ^Warning: -e \"^mke2fs \""
> +
> + # put journal on separate device?
> + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \
> + $mkfs_cmd -O journal_dev $SCRATCH_LOGDEV && \
> + mkfs_cmd="$mkfs_cmd -J device=$SCRATCH_LOGDEV"

Similar to the ext4 patch, we need $MKFS_OPTIONS when creating the
journal device.
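
i.e. something like (untested):

    $mkfs_cmd $MKFS_OPTIONS -O journal_dev $SCRATCH_LOGDEV && \
    mkfs_cmd="$mkfs_cmd -J device=$SCRATCH_LOGDEV"

so the journal device is created with the same options (e.g. block
size) as the fs itself.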

Thanks,
Eryu

> + ;;
> + ext2)
>   mkfs_cmd="$MKFS_PROG -t $FSTYP -- -F"
>   mkfs_filter="grep -v -e ^Warning: -e \"^mke2fs \""
>   ;;
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [xfstests PATCH v3 3/5] generic: test writeback error handling on dmerror devices

2017-06-06 Thread Eryu Guan
On Wed, May 31, 2017 at 09:08:18AM -0400, Jeff Layton wrote:
> Ensure that we get an error back on all fds when a block device is
> open by multiple writers and writeback fails.
> 
> Signed-off-by: Jeff Layton 
> ---
>  tests/generic/998 | 64 
> +++
>  tests/generic/998.out |  2 ++
>  tests/generic/group   |  1 +
>  3 files changed, 67 insertions(+)
>  create mode 100755 tests/generic/998
>  create mode 100644 tests/generic/998.out
> 
> diff --git a/tests/generic/998 b/tests/generic/998
> new file mode 100755
> index ..fbadb47507c2
> --- /dev/null
> +++ b/tests/generic/998
> @@ -0,0 +1,64 @@
> +#! /bin/bash
> +# FS QA Test No. 998
> +#
> +# Test writeback error handling when writing to block devices via pagecache.
> +# See src/fsync-err.c for details of what test actually does.
> +#
> +#---
> +# Copyright (c) 2017, Jeff Layton 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +cd /
> +rm -rf $tmp.* $testdir
> +_dmerror_cleanup
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmerror
> +
> +# real QA test starts here
> +_supported_os Linux
> +_require_scratch

_require_scratch_nocheck

Then we don't have to re-create a filesystem on SCRATCH_DEV before
exiting.
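
And with that the

    _scratch_mkfs > $seqres.full 2>&1

near the end of the test can probably be dropped too, since the scratch
device won't be checked after the test.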

> +_require_logdev
> +_require_dm_target error
> +_require_test_program fsync-err
> +
> +rm -f $seqres.full
> +
> +$XFS_IO_PROG -d -c "pwrite -S 0x7c -b 1048576 0 $((64 * 1048576))" 
> $SCRATCH_DEV >> $seqres.full

I don't see why this is needed; add some comments? Or remove it if it's
not needed?

> +_dmerror_init
> +
> +$here/src/fsync-err $DMERROR_DEV
> +
> +# success, all done
> +_dmerror_load_working_table
> +_dmerror_cleanup
> +_scratch_mkfs > $seqres.full 2>&1
> +status=0
> +exit
> diff --git a/tests/generic/998.out b/tests/generic/998.out
> new file mode 100644
> index ..658c438820e2
> --- /dev/null
> +++ b/tests/generic/998.out
> @@ -0,0 +1,2 @@
> +QA output created by 998
> +Test passed!
> diff --git a/tests/generic/group b/tests/generic/group
> index 39f7b14657f1..9fc384363ca7 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -440,4 +440,5 @@
>  435 auto encrypt
>  436 auto quick rw
>  437 auto quick
> +998 auto quick

This is a test for block devices, not filesystems, so I'd remove it from
the auto and quick groups and add it to the 'blockdev' group. That way
it won't be run if someone just wants to test filesystems.
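
i.e. the group entry would be something like:

    998 blockdev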

Thanks,
Eryu

>  999 auto quick
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [xfstests PATCH v3 2/5] ext4: allow ext4 to use $SCRATCH_LOGDEV

2017-06-06 Thread Eryu Guan
On Wed, May 31, 2017 at 09:08:17AM -0400, Jeff Layton wrote:
> The writeback error handling test requires that you put the journal on a
> separate device. This allows us to use dmerror to simulate data
> writeback failure, without affecting the journal.
> 
> xfs already has infrastructure for this (a'la $SCRATCH_LOGDEV), so wire
> up the ext4 code so that it can do the same thing when _scratch_mkfs is
> called.
> 
> Signed-off-by: Jeff Layton 
> Reviewed-by: Darrick J. Wong 
> ---
>  common/rc | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index 743df427c047..391d36f373cd 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -676,6 +676,9 @@ _scratch_mkfs_ext4()
>   local tmp=`mktemp`
>   local mkfs_status
>  
> + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \
> + $mkfs_cmd -O journal_dev $SCRATCH_LOGDEV && \
> + mkfs_cmd="$mkfs_cmd -J device=$SCRATCH_LOGDEV"

Need $MKFS_OPTIONS too when creating the journal device, otherwise mkfs
will fail when making a non-default block size ext4, i.e. the journal
device gets a 4k block size while ext4 has a 1k block size if we have
MKFS_OPTIONS="-b 1024".
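
e.g. something like (untested):

    $mkfs_cmd $MKFS_OPTIONS -O journal_dev $SCRATCH_LOGDEV && \
    mkfs_cmd="$mkfs_cmd -J device=$SCRATCH_LOGDEV"

so the journal device gets the same block size as the fs itself.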

Thanks,
Eryu

>  
>   _scratch_do_mkfs "$mkfs_cmd" "$mkfs_filter" $* 2>$tmp.mkfserr 
> 1>$tmp.mkfsstd
>   mkfs_status=$?
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [xfstests PATCH v3 1/5] generic: add a writeback error handling test

2017-06-06 Thread Eryu Guan
On Wed, May 31, 2017 at 09:08:16AM -0400, Jeff Layton wrote:
> I'm working on a set of kernel patches to change how writeback errors
> are handled and reported in the kernel. Instead of reporting a
> writeback error to only the first fsync caller on the file, I aim
> to make the kernel report them once on every file description.
> 
> This patch adds a test for the new behavior. Basically, open many fds
> to the same file, turn on dm_error, write to each of the fds, and then
> fsync them all to ensure that they all get an error back.
> 
> To do that, I'm adding a new tools/dmerror script that the C program
> can use to load the error table. For now, that's all it can do, but
> we can fill it out with other commands as necessary.
> 
> Signed-off-by: Jeff Layton 

Thanks for the new tests! And sorry for the late review..

It's testing a new behavior of error reporting on writeback, and I'm not
sure whether we can call it a new feature or a bug fix. It's more like a
behavior change, and I'm not sure how to categorize it.

Because if it's testing a new feature, we usually let the test do proper
detection of the current test environment (based on actual behavior, not
kernel version) and _notrun on filesystems that don't have this feature
yet, instead of failing the test; if it's testing a bug fix, we could
leave the test failing on unfixed filesystems, which also serves as a
reminder that there's a bug to fix.

I pulled your test kernel tree, and test passed on EXT4 but failed on
other local filesystems (XFS, btrfs). I assume that's expected.

Besides this kinda high-level question, some minor comments inline.

> ---
>  common/dmerror |  13 ++--
>  doc/auxiliary-programs.txt |   8 +++
>  src/Makefile   |   2 +-
>  src/fsync-err.c| 161 
> +

New binary needs an entry in .gitignore file.

>  tests/generic/999  |  76 +
>  tests/generic/999.out  |   3 +
>  tests/generic/group|   1 +
>  tools/dmerror  |  44 +

This file is used by the test, so it should be in the src/ directory and
be installed along with the other executable files on "make install",
because files under tools/ are not installed. Most people run tests from
the root dir of xfstests and this is not a problem, but there are still
cases where people do "make && make install" and run fstests from
/var/lib/xfstests (the default installation target).
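
e.g. move it to src/dmerror and add it to the script install line in
src/Makefile, something like:

    $(LTINSTALL) -m 755 dmerror $(PKG_LIB_DIR)/src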

>  8 files changed, 302 insertions(+), 6 deletions(-)
>  create mode 100644 src/fsync-err.c
>  create mode 100755 tests/generic/999
>  create mode 100644 tests/generic/999.out
>  create mode 100755 tools/dmerror
> 
> diff --git a/common/dmerror b/common/dmerror
> index d46c5d0b7266..238baa213b1f 100644
> --- a/common/dmerror
> +++ b/common/dmerror
> @@ -23,22 +23,25 @@ if [ $? -eq 0 ]; then
>   _notrun "Cannot run tests with DAX on dmerror devices"
>  fi
>  
> -_dmerror_init()
> +_dmerror_setup()
>  {
>   local dm_backing_dev=$SCRATCH_DEV
>  
> - $DMSETUP_PROG remove error-test > /dev/null 2>&1
> -
>   local blk_dev_size=`blockdev --getsz $dm_backing_dev`
>  
>   DMERROR_DEV='/dev/mapper/error-test'
>  
>   DMLINEAR_TABLE="0 $blk_dev_size linear $dm_backing_dev 0"
>  
> + DMERROR_TABLE="0 $blk_dev_size error $dm_backing_dev 0"
> +}
> +
> +_dmerror_init()
> +{
> + _dmerror_setup
> + $DMSETUP_PROG remove error-test > /dev/null 2>&1
>   $DMSETUP_PROG create error-test --table "$DMLINEAR_TABLE" || \
>   _fatal "failed to create dm linear device"
> -
> - DMERROR_TABLE="0 $blk_dev_size error $dm_backing_dev 0"
>  }
>  
>  _dmerror_mount()
> diff --git a/doc/auxiliary-programs.txt b/doc/auxiliary-programs.txt
> index 21ef118596b6..191ac0596511 100644
> --- a/doc/auxiliary-programs.txt
> +++ b/doc/auxiliary-programs.txt
> @@ -16,6 +16,7 @@ note the dependency with:
>  Contents:
>  
>   - af_unix   -- Create an AF_UNIX socket
> + - fsync-err -- tests fsync error reporting after failed writeback
>   - open_by_handle-- open_by_handle_at syscall exercise
>   - stat_test -- statx syscall exercise
>   - t_dir_type-- print directory entries and their file type
> @@ -30,6 +31,13 @@ af_unix
>  
>   The af_unix program creates an AF_UNIX socket at the given location.
>  
> +fsync-err
> + Specialized program for testing how the kernel reports errors that
> + occur during writeback. Works in conjunction with the dmerror script
> + in tools/ to write data to a device, and then force it to fail
> + writeback and test that errors are reported during fsync and cleared
> + afterward.
> +
>  open_by_handle
>  
>   The open_by_handle program exercises the open_by_handle_at() system
> diff --git a/src/Makefile b/src/Makefile
> index 4ec01975f8f7..b79c4d84d31b 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -13,7 +13,7 @@ TARGETS = dirstress fill fill2 getpagesize holes lstat64 \
>   

Re: [PATCH] clean_bdev_aliases: Prevent cleaning blocks that are not in block range

2017-01-02 Thread Eryu Guan
On Mon, Jan 02, 2017 at 04:58:03PM +0100, Jan Kara wrote:
> On Sun 25-12-16 19:01:03, Chandan Rajendra wrote:
> > The first block to be cleaned may start at a non-zero page offset. In
> > such a scenario clean_bdev_aliases() will end up cleaning blocks that
> > do not fall in the range of blocks to be cleaned. This commit fixes the
> > issue by skipping blocks that do not fall in valid block range.
> > 
> > Signed-off-by: Chandan Rajendra <chan...@linux.vnet.ibm.com>
> 
> Ah, very good catch! How did you spot this?

I failed to notice this patch, and I came up with the same patch today
myself; I'm still testing it.

I found this via xfstests: many tests (tens of them) failed the fsck
after the test when testing extN with blocksize < pagesize. E.g.
generic/013 could reproduce the fs corruption quite reliably for me.
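
For reference, a sub-page block size config is enough to hit it, e.g.
something like this on a host with 4k page size:

    MKFS_OPTIONS="-b 1024" ./check generic/013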

Reviewed-by: Eryu Guan <eg...@redhat.com>

> 
> The patch looks good. You can add:
> 
> Reviewed-by: Jan Kara <j...@suse.cz>
> 
> Jens, please merge this fix quickly as we may end up discarding changes to
> innocent metadata blocks due to this... Thanks!
> 
>   Honza
> > ---
> >  fs/buffer.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 1df2bd5..28484b3 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1660,7 +1660,7 @@ void clean_bdev_aliases(struct block_device *bdev, 
> > sector_t block, sector_t len)
> > head = page_buffers(page);
> > bh = head;
> > do {
> > -   if (!buffer_mapped(bh))
> > +   if (!buffer_mapped(bh) || (bh->b_blocknr < 
> > block))
> > goto next;
> > if (bh->b_blocknr >= block + len)
> > break;
> > -- 
> > 2.5.5
> > 
> -- 
> Jan Kara <j...@suse.com>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
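
As an aside, the off-by-range that this fixes is easy to see with a small
worked example (the numbers below are illustrative only, not taken from the
report above): with 1k blocks and 4k pages, a request to clean blocks 5..6
lands on page index 1, whose buffers cover blocks 4..7, so without the new
"bh->b_blocknr < block" test, block 4 (outside the requested range) would
be cleaned as well. A tiny stand-alone sketch of the arithmetic:

#include <stdio.h>

int main(void)
{
	unsigned long block = 5, len = 2;		/* range to clean: blocks 5..6 */
	unsigned long blkbits = 10, pgbits = 12;	/* 1k blocks, 4k pages */
	unsigned long index = block >> (pgbits - blkbits);
	unsigned long first = index << (pgbits - blkbits);

	/* walk the buffers of the page that contains the first block */
	for (unsigned long b = first; b < first + (1UL << (pgbits - blkbits)); b++)
		printf("blocknr %lu: %s\n", b,
		       (b >= block && b < block + len) ? "clean" : "skip");
	return 0;
}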


[PATCH v2] mm/filemap: don't allow partially uptodate page for pipes

2016-11-01 Thread Eryu Guan
Starting from 4.9-rc1 kernel, I started noticing some test failures
of sendfile(2) and splice(2) (sendfile0N and splice01 from LTP) when
testing on sub-page block size filesystems (tested both XFS and
ext4), these syscalls start to return EIO in the tests. e.g.

sendfile021  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return 
expected value, expected: 26, got: -1
sendfile022  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return 
expected value, expected: 24, got: -1
sendfile023  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return 
expected value, expected: 22, got: -1
sendfile024  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return 
expected value, expected: 20, got: -1

This is because in sub-page block size cases, we don't need the
whole page to be uptodate; it's OK if only the part we care about
is uptodate (if the fs has ->is_partially_uptodate defined). But
page_cache_pipe_buf_confirm() doesn't have the ability to check the
partially-uptodate case, it needs the whole page to be uptodate. So
it returns EIO in this case.

This is a regression introduced by commit 82c156f85384 ("switch
generic_file_splice_read() to use of ->read_iter()"). Prior to the
change, generic_file_splice_read() doesn't allow partially-uptodate
page either, so it worked fine.

Fix it by skipping the partially-uptodate check if we're working on
a pipe in do_generic_file_read(), so we read the whole page from
disk as long as the page is not uptodate.

Signed-off-by: Eryu Guan <guane...@gmail.com>
---

I think the other way to fix it is to add the ability to check & allow
partially-uptodate page to page_cache_pipe_buf_confirm(), but that is much
harder to do and seems gain little.

v2:
- Update summary a little bit
- Update commit log
- Add comment to the code
- Add more people/list to cc

v1: http://marc.info/?l=linux-mm&m=147756897431777&w=2

 mm/filemap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 849f459..670264d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1734,6 +1734,9 @@ static ssize_t do_generic_file_read(struct file *filp, 
loff_t *ppos,
if (inode->i_blkbits == PAGE_SHIFT ||
!mapping->a_ops->is_partially_uptodate)
goto page_not_up_to_date;
+   /* pipes can't handle partially uptodate pages */
+   if (unlikely(iter->type & ITER_PIPE))
+   goto page_not_up_to_date;
if (!trylock_page(page))
goto page_not_up_to_date;
/* Did it get truncated before we got the lock? */
-- 
2.7.4
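
As a minimal user-space sketch of the symptom (not one of the LTP tests;
the mount point below is hypothetical and the filesystem is assumed to use
a block size smaller than the page size, e.g. 1k ext4 on a 4k-page
machine), the splice(2) call here can fail with EIO on an affected kernel
when the page is only partially uptodate, and succeeds with the fix:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical file on a sub-page-blocksize filesystem */
	const char *path = "/mnt/1k/splice-file";
	static const char buf[] = "abcdefghijklmnopqrstuvwxyz";
	int fd, pfd[2];
	ssize_t ret;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0 || write(fd, buf, sizeof(buf) - 1) != sizeof(buf) - 1) {
		perror("setup");
		return 1;
	}
	if (pipe(pfd) < 0) {
		perror("pipe");
		return 1;
	}
	lseek(fd, 0, SEEK_SET);
	/* move the freshly written bytes into a pipe, as sendfile(2) would */
	ret = splice(fd, NULL, pfd[1], NULL, sizeof(buf) - 1, 0);
	if (ret < 0)
		perror("splice");	/* EIO on affected kernels */
	else
		printf("spliced %zd bytes\n", ret);
	return 0;
}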



Re: [PATCH 4.4 133/146] ovl: Copy up underlying inodes ->i_mode to overlay inode

2016-08-14 Thread Eryu Guan
On Sun, Aug 14, 2016 at 04:52:16PM +0200, Greg Kroah-Hartman wrote:
> 
> Did a fix for this ever get into Linus's tree?
> 
> thanks,
> 
> greg k-h

Yes, please see commit c1892c37769c ("vfs: fix deadlock in
file_remove_privs() on overlayfs"), which is tagged as "stable".

Thanks,
Eryu


[BUG] commit 444d13ff10f introduced boot failure on s390x

2016-08-10 Thread Eryu Guan
Hi,

I hit boot failure on s390x host starting from 4.8-rc1 kernel, 4.7
kernel works fine. And I bisected to this commit 444d13ff10fb

commit 444d13ff10fb13bc3e64859c3cf9ce43dcfeb075
Author: Jessica Yu 
Date:   Wed Jul 27 12:06:21 2016 +0930

modules: add ro_after_init support

Add ro_after_init support for modules by adding a new page-aligned 
section
in the module layout (after rodata) for ro_after_init data and enabling 
RO
protection for that section after module init runs.

Signed-off-by: Jessica Yu 
Acked-by: Kees Cook 
Signed-off-by: Rusty Russell 

and I've only hit this panic on s390x hosts. Console log is appended at
the end of email.

Thanks,
Eryu

[2.050197] device-mapper: uevent: version 1.0.3
[2.050370] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: dm-d
e...@redhat.com
[2.057615] Unable to handle kernel pointer dereference in virtual kernel add
ress space
[2.057619] Failing address: 03ff8001d000 TEID: 03ff8001d407
[2.057620] Fault in home space mode while using kernel ASCE.
[2.057622] AS:00a7c007 R3:7c974007 S:7cc24800 P:
0239b21d
[2.057665] Oops: 0004 ilc:3 [#1] SMP
[2.057667] Modules linked in: dm_mod
[2.057670] CPU: 0 PID: 399 Comm: modprobe Not tainted 4.7.0+ #7
[2.057672] Hardware name: IBM  2827 H43  400
  (z/VM)
[2.057673] task: 7cccd100 ti: 02324000 task.ti: 0232
4000
[2.057675] Krnl PSW : 0704c0018000 0043a5c8 (__list_add_rcu+0x50
/0xa8)
[2.057683]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:
0 EA:3
Krnl GPRS: 0006 0098b278 03ff80117208 0098b278
[2.057685]03ff8001df08 001c913c 0002 000
08001e55c
[2.057686]0560 02327e00 03ff80117218 000
003ff80117208
[2.057687]0098b278 03ff8001df08 02327cc0 000
002327c90
[2.057696] Krnl Code: 0043a5b6: e3d0b024stg %r13,0(%
r11)
   0043a5bc: e3c0b0080024   stg %r12,8(%r11)
  #0043a5c2: e3b0c024   stg %r11,0(%r12)
  >0043a5c8: e3b0d0080024   stg %r11,8(%r13)
   0043a5ce: e340f0b80004   lg  %r4,184(%r15)
   0043a5d4: ebbff0a4   lmg %r11,%r15,160(%r15)
   0043a5da: 07f4   bcr 15,%r4
   0043a5dc: e34040080004   lg  %r4,8(%r4)
[2.057706] Call Trace:
[2.057708] ([<02327cc0>] 0x2327cc0)
[2.057714] ([<001c98d0>] load_module+0x8e0/0x1870)
[2.057715] ([<001caa74>] SyS_finit_module+0xb4/0xf0)
[2.057720] ([<006678b6>] system_call+0xd6/0x264)
[2.057721] Last Breaking-Event-Address:
[2.057722]  [<001c98ca>] load_module+0x8da/0x1870
[2.057723]
[2.057724] Kernel panic - not syncing: Fatal exception: panic_on_oops


Re: [lkp] [fs] 6141b4d642: xfstests.generic.079.fail

2016-08-09 Thread Eryu Guan
On Tue, Aug 09, 2016 at 11:14:38PM +0800, kernel test robot wrote:
> 
> FYI, we noticed the following commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus
> commit 6141b4d64295ec08a1b48c7fcac8a566658cd64f ("fs: return EPERM on 
> immutable inode")
> 
> in testcase: xfstests
> with following parameters:
> 
>   disk: 4HDD
>   fs: btrfs
>   test: generic-quick1
> 
> 
> on test machine: 4 threads Sandy Bridge with 6G memory
> 
> caused below changes:

I think generic/079 needs update, because the expected errno was changed
after this commit. I've sent a patch to fste...@vger.kernel.org.

Thanks,
Eryu


Re: [PATCH 4.4 133/146] ovl: Copy up underlying inodes ->i_mode to overlay inode

2016-08-03 Thread Eryu Guan
On Wed, Aug 03, 2016 at 09:45:06AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Aug 03, 2016 at 03:36:44PM +0800, Eryu Guan wrote:
> > On Mon, Jul 25, 2016 at 01:56:28PM -0700, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > > 
> > > --
> > > 
> > > From: Vivek Goyal <vgo...@redhat.com>
> > > 
> > > commit 07a2daab49c549a37b5b744cbebb6e3f445f12bc upstream.
> > > 
> > > Right now when a new overlay inode is created, we initialize overlay
> > > inode's ->i_mode from underlying inode ->i_mode but we retain only
> > > file type bits (S_IFMT) and discard permission bits.
> > > 
> > > This patch changes it and retains permission bits too. This should allow
> > > overlay to do permission checks on overlay inode itself in task context.
> > > 
> > > [SzM] It also fixes clearing suid/sgid bits on write.
> > 
> > This patch introduced a hang when writing to suid file, fstests
> > generic/355 could reproduce the hang easily, it only failed the test
> > without this patch and didn't hang the kernel.
> > 
> > Should we skip it for now and wait for a further fix?
> 
> Does Linus's tree have the same problem?

Yes, 4.7 kernel hangs as well.

> 
> > (The 4.6-stable tree faces the same question)
> 
> Are we just missing a patch to be applied here?

AFAICT, no patch is available to fix it yet. But I've reported the bug
to Miklos (in RH bugzilla).

> 
> And this is already in the released stable kernels...

This patch is not in 4.7-rc7; it first appears in the 4.7 kernel. It seems
like the soak time was not long enough to let testing finish :) I finished
my 4.7 kernel testing and identified the culprit this Monday.

Thanks,
Eryu


Re: [PATCH 4.4 133/146] ovl: Copy up underlying inodes ->i_mode to overlay inode

2016-08-03 Thread Eryu Guan
On Mon, Jul 25, 2016 at 01:56:28PM -0700, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Vivek Goyal <vgo...@redhat.com>
> 
> commit 07a2daab49c549a37b5b744cbebb6e3f445f12bc upstream.
> 
> Right now when a new overlay inode is created, we initialize overlay
> inode's ->i_mode from underlying inode ->i_mode but we retain only
> file type bits (S_IFMT) and discard permission bits.
> 
> This patch changes it and retains permission bits too. This should allow
> overlay to do permission checks on overlay inode itself in task context.
> 
> [SzM] It also fixes clearing suid/sgid bits on write.

This patch introduced a hang when writing to suid file, fstests
generic/355 could reproduce the hang easily, it only failed the test
without this patch and didn't hang the kernel.

Should we skip it for now and wait for a further fix?

(The 4.6-stable tree faces the same question)

Thanks,
Eryu

> 
> Signed-off-by: Vivek Goyal <vgo...@redhat.com>
> Reported-by: Eryu Guan <eg...@redhat.com>
> Signed-off-by: Miklos Szeredi <mszer...@redhat.com>
> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and 
> f_inode to the underlay")
> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> 
> ---
>  fs/overlayfs/inode.c |3 +--
>  fs/overlayfs/overlayfs.h |1 +
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -412,12 +412,11 @@ struct inode *ovl_new_inode(struct super
>   if (!inode)
>   return NULL;
>  
> - mode &= S_IFMT;
> -
>   inode->i_ino = get_next_ino();
>   inode->i_mode = mode;
>   inode->i_flags |= S_NOATIME | S_NOCMTIME;
>  
> + mode &= S_IFMT;
>   switch (mode) {
>   case S_IFDIR:
>   inode->i_private = oe;
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -181,6 +181,7 @@ static inline void ovl_copyattr(struct i
>  {
>   to->i_uid = from->i_uid;
>   to->i_gid = from->i_gid;
> + to->i_mode = from->i_mode;
>  }
>  
>  /* dir.c */
> 
> 


[PATCH v2 RESEND] fs: return EPERM on immutable inode

2016-08-02 Thread Eryu Guan
In most cases, EPERM is returned on immutable inode, and there're only a
few places returning EACCES. I noticed this when running LTP on
overlayfs, setxattr03 failed due to unexpected EACCES on immutable
inode.

So converting all EACCES to EPERM on immutable inode.

Acked-by: Dave Chinner <dchin...@redhat.com>
Signed-off-by: Eryu Guan <guane...@gmail.com>
---
v2:
- update commit log to mention that it's found by running LTP

 fs/gfs2/inode.c| 2 +-
 fs/namei.c | 2 +-
 fs/utimes.c| 3 ++-
 fs/xfs/xfs_ioctl.c | 2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index e0621ca..e4da0ec 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1800,7 +1800,7 @@ int gfs2_permission(struct inode *inode, int mask)
}
 
if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
-   error = -EACCES;
+   error = -EPERM;
else
error = generic_permission(inode, mask);
if (gfs2_holder_initialized(&i_gh))
diff --git a/fs/namei.c b/fs/namei.c
index c386a32..adb0414 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -410,7 +410,7 @@ int __inode_permission(struct inode *inode, int mask)
 * Nobody gets write access to an immutable file.
 */
if (IS_IMMUTABLE(inode))
-   return -EACCES;
+   return -EPERM;
 
/*
 * Updating mtime will likely cause i_uid and i_gid to be
diff --git a/fs/utimes.c b/fs/utimes.c
index 85c40f4..794f5f5 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -92,10 +92,11 @@ static int utimes_common(struct path *path, struct timespec 
*times)
 * then we need to check permissions, because
 * inode_change_ok() won't do it.
 */
-   error = -EACCES;
+   error = -EPERM;
 if (IS_IMMUTABLE(inode))
goto mnt_drop_write_and_out;
 
+   error = -EACCES;
if (!inode_owner_or_capable(inode)) {
error = inode_permission(inode, MAY_WRITE);
if (error)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 9a7c878..3d6820f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -232,7 +232,7 @@ xfs_open_by_handle(
}
 
if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
-   error = -EACCES;
+   error = -EPERM;
goto out_dput;
}
 
-- 
2.7.4
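
To make the errno change concrete, a small hedged demonstration (not part
of the patch or of LTP's setxattr03; it must run as root, and the path is
made up): with the EACCES lines above, utimes(2) on an immutable file in
the times == NULL case failed with "Permission denied", and with this
patch the failure becomes EPERM ("Operation not permitted"), matching the
other immutable-inode checks:

#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/time.h>

int main(void)
{
	const char *path = "/mnt/test/immutable-file";	/* hypothetical */
	int fd, flags;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		perror("FS_IOC_GETFLAGS");
		return 1;
	}
	flags |= FS_IMMUTABLE_FL;
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0) {
		perror("FS_IOC_SETFLAGS");
		return 1;
	}

	/* times == NULL hits the IS_IMMUTABLE() check in utimes_common() */
	if (utimes(path, NULL) < 0)
		printf("utimes: %s\n", strerror(errno));

	/* drop the flag again so the file can be cleaned up */
	flags &= ~FS_IMMUTABLE_FL;
	ioctl(fd, FS_IOC_SETFLAGS, &flags);
	return 0;
}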



[PATCH v2] fs: return EPERM on immutable inode

2016-04-05 Thread Eryu Guan
In most cases, EPERM is returned on immutable inode, and there're only a
few places returning EACCES. I noticed this when running LTP on
overlayfs, setxattr03 failed due to unexpected EACCES on immutable
inode.

So converting all EACCES to EPERM on immutable inode.

Acked-by: Dave Chinner <dchin...@redhat.com>
Signed-off-by: Eryu Guan <guane...@gmail.com>
---
v2:
- update commit message to include the background on noticing this issue

 fs/gfs2/inode.c| 2 +-
 fs/namei.c | 2 +-
 fs/utimes.c| 3 ++-
 fs/xfs/xfs_ioctl.c | 2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index bb30f9a..4c68d91 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1757,7 +1757,7 @@ int gfs2_permission(struct inode *inode, int mask)
}
 
if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
-   error = -EACCES;
+   error = -EPERM;
else
error = generic_permission(inode, mask);
if (unlock)
diff --git a/fs/namei.c b/fs/namei.c
index 1d9ca2d..7f4a40a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -408,7 +408,7 @@ int __inode_permission(struct inode *inode, int mask)
 * Nobody gets write access to an immutable file.
 */
if (IS_IMMUTABLE(inode))
-   return -EACCES;
+   return -EPERM;
}
 
retval = do_inode_permission(inode, mask);
diff --git a/fs/utimes.c b/fs/utimes.c
index 85c40f4..794f5f5 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -92,10 +92,11 @@ static int utimes_common(struct path *path, struct timespec 
*times)
 * then we need to check permissions, because
 * inode_change_ok() won't do it.
 */
-   error = -EACCES;
+   error = -EPERM;
 if (IS_IMMUTABLE(inode))
goto mnt_drop_write_and_out;
 
+   error = -EACCES;
if (!inode_owner_or_capable(inode)) {
error = inode_permission(inode, MAY_WRITE);
if (error)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index bcb6c19..4c4c58f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -232,7 +232,7 @@ xfs_open_by_handle(
}
 
if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
-   error = -EACCES;
+   error = -EPERM;
goto out_dput;
}
 
-- 
2.5.5



[PATCH] fs: return EPERM on immutable inode

2016-04-05 Thread Eryu Guan
In most cases, EPERM is returned on immutable inode, and there're only
a few places returning EACCES. And EPERM looks more reasonable to me.

So converting all EACCES to EPERM on immutable inode.

Signed-off-by: Eryu Guan <guane...@gmail.com>
---

I noticed this when running LTP on overlayfs, setxattr03 failed due to
unexpected EACCES on immutable inode.

 fs/gfs2/inode.c| 2 +-
 fs/namei.c | 2 +-
 fs/utimes.c| 3 ++-
 fs/xfs/xfs_ioctl.c | 2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index bb30f9a..4c68d91 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1757,7 +1757,7 @@ int gfs2_permission(struct inode *inode, int mask)
}
 
if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
-   error = -EACCES;
+   error = -EPERM;
else
error = generic_permission(inode, mask);
if (unlock)
diff --git a/fs/namei.c b/fs/namei.c
index 1d9ca2d..7f4a40a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -408,7 +408,7 @@ int __inode_permission(struct inode *inode, int mask)
 * Nobody gets write access to an immutable file.
 */
if (IS_IMMUTABLE(inode))
-   return -EACCES;
+   return -EPERM;
}
 
retval = do_inode_permission(inode, mask);
diff --git a/fs/utimes.c b/fs/utimes.c
index 85c40f4..794f5f5 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -92,10 +92,11 @@ static int utimes_common(struct path *path, struct timespec 
*times)
 * then we need to check permissions, because
 * inode_change_ok() won't do it.
 */
-   error = -EACCES;
+   error = -EPERM;
 if (IS_IMMUTABLE(inode))
goto mnt_drop_write_and_out;
 
+   error = -EACCES;
if (!inode_owner_or_capable(inode)) {
error = inode_permission(inode, MAY_WRITE);
if (error)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index bcb6c19..4c4c58f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -232,7 +232,7 @@ xfs_open_by_handle(
}
 
if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
-   error = -EACCES;
+   error = -EPERM;
goto out_dput;
}
 
-- 
2.5.5



Re: [PATCH] list: kill list_force_poison()

2016-03-02 Thread Eryu Guan
On Tue, Mar 01, 2016 at 01:44:32PM -0800, Dan Williams wrote:
> Given we have uninitialized list_heads being passed to list_add() it
> will always be the case that those uninitialized values randomly trigger
> the poison value.  Especially since a list_add() operation will seed the
> stack with the poison value for later stack allocations to trip over.
> For example, see these two false positive reports:
> 
>  list_add attempted on force-poisoned entry
>  WARNING: at lib/list_debug.c:34
>  [..]
>  NIP [c043c390] __list_add+0xb0/0x150
>  LR [c043c38c] __list_add+0xac/0x150
>  Call Trace:
>  [c00fb5fc3320] [c043c38c] __list_add+0xac/0x150 (unreliable)
>  [c00fb5fc33a0] [c081b454] __down+0x4c/0xf8
>  [c00fb5fc3410] [c010b6f8] down+0x68/0x70
>  [c00fb5fc3450] [d000201ebf4c] xfs_buf_lock+0x4c/0x150 [xfs]
> 
>  list_add attempted on force-poisoned entry(0500),
>   new->next == d59ecdb0, new->prev == 0500
>  WARNING: at lib/list_debug.c:33
>  [..]
>  NIP [c042db78] __list_add+0xa8/0x140
>  LR [c042db74] __list_add+0xa4/0x140
>  Call Trace:
>  [c004c749f620] [c042db74] __list_add+0xa4/0x140 (unreliable)
>  [c004c749f6b0] [c08010ec] rwsem_down_read_failed+0x6c/0x1a0
>  [c004c749f760] [c0800828] down_read+0x58/0x60
>  [c004c749f7e0] [d5a1a6bc] xfs_log_commit_cil+0x7c/0x600 [xfs]
> 
> Reported-by: Eryu Guan <eg...@redhat.com>
> Cc: Ross Zwisler <ross.zwis...@linux.intel.com>
> Cc: <x...@oss.sgi.com>
> Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() 
> for dax-gup")
> Signed-off-by: Dan Williams <dan.j.willi...@intel.com>

With this patch applied, I don't see the warning after 1000 iterations
(of course, the warning has been removed :-)).

Thanks!
Eryu

P.S.
With the RFC patch posted earlier, warnings are still triggered.

> ---
>  include/linux/list.h |   11 ---
>  kernel/memremap.c|9 +++--
>  lib/list_debug.c |9 -
>  3 files changed, 7 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/list.h b/include/linux/list.h
> index 30cf4200ab40..5356f4d661a7 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -113,17 +113,6 @@ extern void __list_del_entry(struct list_head *entry);
>  extern void list_del(struct list_head *entry);
>  #endif
>  
> -#ifdef CONFIG_DEBUG_LIST
> -/*
> - * See devm_memremap_pages() which wants DEBUG_LIST=y to assert if one
> - * of the pages it allocates is ever passed to list_add()
> - */
> -extern void list_force_poison(struct list_head *entry);
> -#else
> -/* fallback to the less strict LIST_POISON* definitions */
> -#define list_force_poison list_del
> -#endif
> -
>  /**
>   * list_replace - replace old entry by new one
>   * @old : the element to be replaced
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index b981a7b023f0..778191e3e887 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -351,8 +351,13 @@ void *devm_memremap_pages(struct device *dev, struct 
> resource *res,
>   for_each_device_pfn(pfn, page_map) {
>   struct page *page = pfn_to_page(pfn);
>  
> - /* ZONE_DEVICE pages must never appear on a slab lru */
> - list_force_poison(&page->lru);
> + /*
> +  * ZONE_DEVICE pages union ->lru with a ->pgmap back
> +  * pointer.  It is a bug if a ZONE_DEVICE page is ever
> +  * freed or placed on a driver-private list.  Seed the
> +  * storage with LIST_POISON* values.
> +  */
> + list_del(&page->lru);
>   page->pgmap = pgmap;
>   }
>   devres_add(dev, page_map);
> diff --git a/lib/list_debug.c b/lib/list_debug.c
> index 3345a089ef7b..3859bf63561c 100644
> --- a/lib/list_debug.c
> +++ b/lib/list_debug.c
> @@ -12,13 +12,6 @@
>  #include 
>  #include 
>  
> -static struct list_head force_poison;
> -void list_force_poison(struct list_head *entry)
> -{
> - entry->next = &force_poison;
> - entry->prev = &force_poison;
> -}
> -
>  /*
>   * Insert a new entry between two known consecutive entries.
>   *
> @@ -30,8 +23,6 @@ void __list_add(struct list_head *new,
> struct list_head *prev,
> struct list_head *next)
>  {
> - WARN(new->next == &force_poison || new->prev == &force_poison,
> - "list_add attempted on force-poisoned entry\n");
>   WARN(next->prev != prev,
>   "list_add corruption. next->prev should be "
>   "prev (%p), but was %p. (next=%p).\n",
> 


Re: [PATCH 2/2] fsnotify: turn fsnotify reaper thread into a workqueue job

2016-02-14 Thread Eryu Guan
On Sun, Feb 14, 2016 at 07:15:23PM -0500, Jeff Layton wrote:
> We don't require a dedicated thread for fsnotify cleanup. Switch it over
> to a workqueue job instead that runs on the system_unbound_wq.
> 
> In the interest of not thrashing the queued job too often when there are
> a lot of marks being removed, we delay the reaper job slightly when
> queueing it, to allow several to gather on the list.
> 
> Cc: Jan Kara <j...@suse.com>
> Cc: Eric Paris <epa...@parisplace.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Eryu Guan <guane...@gmail.com>
> Signed-off-by: Jeff Layton <jeff.lay...@primarydata.com>

With these two patches applied on top of v4.5-rc3, the same test passed a
2-hour stress run, and also survived a stress test that forks 5 processes
running the same test program for 30 minutes.

Thanks for looking into this!

Eryu


Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

2015-08-24 Thread Eryu Guan
On Mon, Aug 24, 2015 at 04:24:25PM +1000, Dave Chinner wrote:
> On Mon, Aug 24, 2015 at 11:18:16AM +0800, Eryu Guan wrote:
> > On Mon, Aug 24, 2015 at 11:11:23AM +1000, Dave Chinner wrote:
> > > 
> > > Eryu, can you change the way you run the event trace to be:
> > > 
> > > $ sudo trace-cmd <options> -o <outfile location> ./check <test options>
> > > 
> > > rather than running the trace as a background operation elsewhere?
> > > Maybe that will give better results.
[snip]
> Anyway, Eryu, long and short of it is that you don't need to worry
> about testing all the different combinations - we now know that the
> completion events are occurring, so let's focus on whether the sync
> code is not waiting for them correctly. Can you trace the following
> events:
> 
>   xfs_log_force
>   xfs_setfilesize
>   writeback_queue
>   writeback_exec
>   writeback_start
>   writeback_queue_io
>   writeback_written
>   writeback_pages_written
> 
> basically I'm trying to see if we've got all the BDI events as we'd
> expect then to be queued and run for sync, and when the ->sync_fs
> call occurs during the sync process before shutdown and unmount...

I collected two versions of trace info with crc enabled.

http://128.199.137.77/writeback-crc/

This version traced the same events as previous runs.

http://128.199.137.77/writeback-crc-v2/

And this version only traced the events you listed above.


And the results of other tests to check(all done with v4 xfs, with no
tracepoints enabled):

> Other things to check (separately):
>   - change godown to godown -f

Passed 100 loops.

>   - add a "sleep 5" before running godown after sync

Failed; if you need the trace info, please let me know.

>   - add a "sleep 5; sync" before running godown

Passed 100 loops.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

2015-08-23 Thread Eryu Guan
On Mon, Aug 24, 2015 at 11:11:23AM +1000, Dave Chinner wrote:
> 
> Eryu, can you change the way you run the event trace to be:
> 
> $ sudo trace-cmd <options> -o <outfile location> ./check <test options>
> 
> rather than running the trace as a background operation elsewhere?
> Maybe that will give better results.

The results are here

http://128.199.137.77/writeback-v3/

> 
> Also, it would be informative to us if you can reproduce this with a
> v5 filesystem (i.e. mkfs.xfs -m crc=1) because it has much better
> on-disk information for sequence-of-event triage like this. If you
> can reproduce it with a v5 filesystem, can you post the trace and
> metadump?

This seems to be harder to reproduce with tracepoints enabled, but I'll
keep trying, and run the tests listed below.

Thanks,
Eryu

> 
> Other things to check (separately):
> 
>   - change godown to godown -f
>   - add a "sleep 5" before running godown after sync
>   - add a "sleep 5; sync" before running godown
> 
> i.e. I'm wondering if sync is not waiting for everything, and so we
> aren't capturing the IO completions because the filesystem is
> already shut down by the time they are delivered...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com
> 
> ___
> xfs mailing list
> x...@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

2015-08-21 Thread Eryu Guan
On Sat, Aug 22, 2015 at 10:30:25AM +1000, Dave Chinner wrote:
> On Fri, Aug 21, 2015 at 06:20:53PM +0800, Eryu Guan wrote:
[snip]
> 
> Eryu, can you try again, this time manually specifying the writeback
> tracepoints so you exclude the really noisy ones? You can also drop
> the xfs_file_buffered_write and xfs_file_fsync tracepoints as well,
> as we can see that the incoming side of the code is doing the right
> thing

I excluded the writeback tracepoints you mentioned

writeback_mark_inode_dirty
writeback_dirty_inode_start
writeback_dirty_inode
writeback_dirty_page
writeback_write_inode

and left all other writeback tracepoints enabled, also dropped
xfs_file_buffered_write and xfs_file_fsync.

This time I can reproduce generic/048 quickly; please download the
trace info from the link below

http://128.199.137.77/writeback-v2/

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

2015-08-21 Thread Eryu Guan
On Wed, Aug 19, 2015 at 07:56:11AM +1000, Dave Chinner wrote:
> On Tue, Aug 18, 2015 at 12:54:39PM -0700, Tejun Heo wrote:
[snip]
> 
> I'd suggest looking at some of the XFS tracepoints during the test:
> 
> tracepointtrigger
> xfs_file_buffered_write   once per write syscall
> xfs_file_sync once per fsync per inode
> xfs_vm_writepage  every ->writepage call
> xfs_setfilesize   every IO completion that updates inode 
> size
> 
> And it's probably best to also include all the writeback
> tracepoints, too, for context. That will tell you what inodes and
> what part of them are getting written back and when

I finally reproduced generic/048 with both xfs and writeback tracepoints
enabled; please download the trace.dat file and trace report file from

http://128.199.137.77/writeback/

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

2015-08-20 Thread Eryu Guan
On Thu, Aug 20, 2015 at 02:12:24PM +0800, Eryu Guan wrote:
> On Wed, Aug 19, 2015 at 07:56:11AM +1000, Dave Chinner wrote:
> > On Tue, Aug 18, 2015 at 12:54:39PM -0700, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Tue, Aug 18, 2015 at 10:47:18AM -0700, Tejun Heo wrote:
> > > > Hmm... the only possibility I can think of is tot_write_bandwidth
> > > > being zero when it shouldn't be.  I've been staring at the code for a
> > > > while now but nothing rings a bell.  Time for another debug patch, I
> > > > guess.
> > > 
> > > So, I can now reproduce the bug (it takes a lot of trials but lowering
> > > the number of tested files helps quite a bit) and instrumented all the
> > > early exit paths w/o the fix patch.  bdi_has_dirty_io() and
> > > wb_has_dirty_io() are never out of sync with the actual dirty / io
> > > lists even when the test 048 fails, so the bug at least is not caused
> > > by writeback skipping due to buggy bdi/wb_has_dirty_io() result.
> > > Whenever it skips, all the lists are actually empty (verified while
> > > holding list_lock).
> > > 
> > > One suspicion I have is that this could be a subtle timing issue which
> > > is being exposed by the new short-cut path.  Anything which adds delay
> > > seems to make the issue go away.  Dave, does anything ring a bell?
> > 
> > No, it doesn't. The data writeback mechanisms XFS uses are all
> > generic. It marks inodes I_DIRTY_PAGES and lets the generic code
> > take care of everything else. Yes, we do delayed allocation during
> > writeback, and we log the inode size updates during IO completion,
> > so if inode sizes are not getting updated, then Occam's Razor
> > suggests that writeback is not happening.
> > 
> > I'd suggest looking at some of the XFS tracepoints during the test:
> > 
> > tracepoint              trigger
> > xfs_file_buffered_write once per write syscall
> > xfs_file_sync           once per fsync per inode
> > xfs_vm_writepage        every ->writepage call
> > xfs_setfilesize         every IO completion that updates inode size
> 
> I gave the tracepoints a try, but my root fs is xfs so I got a lot of
> noise. I'll try to install a new vm with ext4 as the root fs, but I'm not
> sure whether the new vm can reproduce the failure; we'll see.

I installed a new vm with ext4 as root fs and got some trace info.

On the new vm, only generic/048 is reproducible; generic/049 always
passes. And I can only reproduce generic/048 when just the xfs tracepoints
are enabled; if the writeback tracepoints are enabled too, I can no longer
reproduce the failure.

All tests were done on a 4.2-rc7 kernel.

This is the trace-cmd I'm using:

cd /mnt/ext4
trace-cmd record -e xfs_file_buffered_write \
 -e xfs_file_fsync \
 -e xfs_writepage \
 -e xfs_setfilesize &
pushd /path/to/xfstests
./check generic/048
popd
kill -s 2 $!
trace-cmd report >trace_report.txt
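
An equivalent sketch with the recorder's PID captured explicitly; the SIGINT
is what makes trace-cmd record stop and write out trace.dat (the helper
variable name is made up here):

cd /mnt/ext4
trace-cmd record -e xfs_file_buffered_write -e xfs_file_fsync \
                 -e xfs_writepage -e xfs_setfilesize &
record_pid=$!                        # remember the recorder's PID right away
(cd /path/to/xfstests && ./check generic/048)
kill -s INT "$record_pid"            # SIGINT: stop recording and write trace.dat
wait "$record_pid"
trace-cmd report > trace_report.txt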

I attached three files:
1) xfs-trace-generic-048.txt.bz2[1] trace report result
2) xfs-trace-generic-048.diff   generic/048 failure diff output, which shows
which files have incorrect sizes
3) xfs-trace-generic-048.metadump.bz2   metadump of SCRATCH_DEV, which contains 
the test files
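
For anyone who wants to inspect the metadump locally, a rough sketch of
restoring it into an image (file names as above; the xfs_db commands are
only an example):

bunzip2 xfs-trace-generic-048.metadump.bz2
xfs_mdrestore xfs-trace-generic-048.metadump scratch.img   # rebuild a metadata-only image
xfs_db -r -c 'sb 0' -c print scratch.img                   # browse the restored metadata read-only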

If more info is needed please let me know.

Thanks,
Eryu

[1] I'll attach this file in a follow-up mail, to avoid the xfs list's 500k limit
--- tests/generic/048.out   2015-08-20 15:00:06.21000 +0800
+++ /root/xfstests/results//generic/048.out.bad 2015-08-20 20:52:58.84700 
+0800
@@ -1 +1,9 @@
 QA output created by 048
+file /mnt/testarea/scratch/982 has incorrect size - sync failed
+file /mnt/testarea/scratch/983 has incorrect size - sync failed
+file /mnt/testarea/scratch/984 has incorrect size - sync failed
+file /mnt/testarea/scratch/985 has incorrect size - sync failed
+file /mnt/testarea/scratch/987 has incorrect size - sync failed
+file /mnt/testarea/scratch/989 has incorrect size - sync failed
+file /mnt/testarea/scratch/991 has incorrect size - sync failed
+file /mnt/testarea/scratch/993 has incorrect size - sync failed


xfs-trace-generic-048.metadump.bz2
Description: BZip2 compressed data


Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

2015-08-20 Thread Eryu Guan
On Wed, Aug 19, 2015 at 07:56:11AM +1000, Dave Chinner wrote:
> On Tue, Aug 18, 2015 at 12:54:39PM -0700, Tejun Heo wrote:
> > Hello,
> > 
> > On Tue, Aug 18, 2015 at 10:47:18AM -0700, Tejun Heo wrote:
> > > Hmm... the only possibility I can think of is tot_write_bandwidth
> > > being zero when it shouldn't be.  I've been staring at the code for a
> > > while now but nothing rings a bell.  Time for another debug patch, I
> > > guess.
> > 
> > So, I can now reproduce the bug (it takes a lot of trials but lowering
> > the number of tested files helps quite a bit) and instrumented all the
> > early exit paths w/o the fix patch.  bdi_has_dirty_io() and
> > wb_has_dirty_io() are never out of sync with the actual dirty / io
> > lists even when the test 048 fails, so the bug at least is not caused
> > by writeback skipping due to buggy bdi/wb_has_dirty_io() result.
> > Whenever it skips, all the lists are actually empty (verified while
> > holding list_lock).
> > 
> > One suspicion I have is that this could be a subtle timing issue which
> > is being exposed by the new short-cut path.  Anything which adds delay
> > seems to make the issue go away.  Dave, does anything ring a bell?
> 
> No, it doesn't. The data writeback mechanisms XFS uses are all
> generic. It marks inodes I_DIRTY_PAGES and lets the generic code
> take care of everything else. Yes, we do delayed allocation during
> writeback, and we log the inode size updates during IO completion,
> so if inode sizes are not getting updated, then Occam's Razor
> suggests that writeback is not happening.
> 
> I'd suggest looking at some of the XFS tracepoints during the test:
> 
> tracepoint                trigger
> xfs_file_buffered_write   once per write syscall
> xfs_file_sync             once per fsync per inode
> xfs_vm_writepage          every ->writepage call
> xfs_setfilesize           every IO completion that updates inode size

I gave the tracepoints a try, but my root fs is xfs so I got a lot of
noise. I'll try to install a new vm with ext4 as the root fs, but I'm not
sure whether the new vm can reproduce the failure; we'll see.

BTW, I guess xfs_vm_writepage should be xfs_writepage, and xfs_file_sync
should be xfs_file_fsync?
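
A quick way to double-check the exact names is to list what the kernel
actually exports; a sketch, assuming debugfs is mounted at /sys/kernel/debug:

ls /sys/kernel/debug/tracing/events/xfs/ | grep -E 'writepage|fsync|setfilesize|buffered_write'
trace-cmd list -e | grep -E 'xfs_writepage|xfs_file_fsync'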

Thanks,
Eryu

