Re: [PATCH] tests/qemu-iotests: Rework the checks and spots using GNU sed

2022-02-16 Thread Eric Blake
On Wed, Feb 16, 2022 at 12:39:06PM +0100, Thomas Huth wrote:
> > > -$SED -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
> > > [0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'
> > > +gsed -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
> > > [0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'
> > 
> > GNU sed recommends spelling it 'sed -E', not 'sed -r', when using
> > extended regex.  Older POSIX did not support either spelling, but the
> > next version will require -E, as many implementations have it now:
> > https://www.austingroupbugs.net/view.php?id=528
> 
> Thanks for the pointer ... I originally checked "man 1p sed" on
> my system and did not see -r or -E in there, so I thought that
> this must be really something specific to GNU sed. But now that
> you've mentioned this, I just double-checked the build environments
> that we support with QEMU, and seems like -E should be supported
> everywhere:

Yay.

> 
> So I think it should be safe to change these spots that are
> using "-r" to "sed -E" (and not use gsed here).
> 
> > Other than the fact that this was easier to write with ERE, I'm not
> > seeing any other GNU-only extensions in use here; but I'd recommend
> > that as long as we're touching the line, we spell it 'gsed -Ee'
> > instead of -re (here, and in several other places).
> > 
> > >   _filter_qom_path()
> > >   {
> > > -$SED -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
> > > +gsed -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
> > 
> > Here, it is our use of \+ that is a GNU sed extension, although it is
> > fairly easy (but verbose) to translate that one to portable sed
> > (\+ is the same as *).  So gsed is correct.

Then again, since we claim 'sed -E' is portable, we can get the +
operator everywhere by using ERE instead of BRE (and with fewer
leaning toothpicks, another reason I like ERE better than BRE).

On the
> > other hand, the use of [[0-9]\+\] looks ugly - it probably does NOT
> > match what we meant (we have a bracket expression '[...]' that matches
> > the 11 characters [ and 0-9, then '\+' to match that bracket
> > expression 1 or more times, then '\]' which in its context is
> > identical to ']' to match a closing ], since only opening [ needs \
> > escaping for literal treatment).  What we probably meant is:
> > 
> > '/Attached to:/s/\device\[[0-9][0-9]*]/device[N]/g'
> > 
> > at which point normal sed would do.
> 
> Ok ... but I'd prefer to clean such spots rather in a second step,
> to make sure not to introduce bugs and to make the review easier.

Yeah, fixing bugs and cleaning up consistent use of sed/gsed/$SED are
worth separating.

> > >   _filter_qemu_io()
> > >   {
> > > -_filter_win32 | $SED -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
> > > [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
> > > YYY\/sec and XXX ops\/sec)/" \
> > > +_filter_win32 | gsed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
> > > [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
> > > YYY\/sec and XXX ops\/sec)/" \
> > >   -e "s/: line [0-9][0-9]*:  *[0-9][0-9]*\( Aborted\| 
> > > Killed\)/:\1/" \
> > >   -e "s/qemu-io> //g"
> > 
> > I'm not seeing anything specific to GNU sed in this (long) sed script;
> > can we relax this one to plain 'sed'?  Use of s#some/text## might be
> > easier than having to s/some\/text//, but that's not essential.
> 
> If I switch that to plain sed, I'm getting errors like this on NetBSD:
> 
> --- /home/qemu/qemu-test.is2SLq/src/tests/qemu-iotests/046.out
> +++ 11296-046.out.bad
> @@ -66,6 +66,7 @@
>  64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  wrote 65536/65536 bytes at offset 2031616
>  64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=6442450944 
> backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT

Huh; not sure what happened that I didn't see.  But I trust your tests
as a more canonical version of "it worked on this platform's sed" than
my "I don't see anything blantantly non-POSIX" ;)

> 
>  == Some concurrent requests touching the same cluster ==
> 
> So I'll keep gsed here for now - we need it for _filter_qemu_io
> anyway since it's calling _filter_win32 that currently needs
> gsed, too.

Yeah, I think your patch is big enough to prove there are places where
it really is easier to rely on gsed than to try and be portable.

> 
> > > @@ -142,7 +142,7 @@ _do_filter_img_create()
> > >   # precedes ", fmt=") and the options part ($options, which starts
> > >   # with "fmt=")
> > >   # (And just echo everything before the first "^Formatting")
> > > -readarray formatting_line < <($SED -e 's/, fmt=/\n/')
> > > +readarray formatting_line < <(gsed -e 's/, fmt=/\n/')
> > 
> > This one looks like it should work with plain 'sed'.
> 
> Using normal sed does not really work for me here. For example
> with NetBSD, I get errors like this:
> 
> --- 

Re: [PATCH] tests/qemu-iotests: Rework the checks and spots using GNU sed

2022-02-16 Thread Thomas Huth

On 15/02/2022 23.10, Eric Blake wrote:

On Tue, Feb 15, 2022 at 02:20:31PM +0100, Thomas Huth wrote:

Instead of failing the iotests if GNU sed is not available (or skipping
them completely in the check-block.sh script), it would be better to
simply skip the bash-based tests that rely on GNU sed, so that the other
tests could still be run. Thus we now explicitely use "gsed" (either as
direct program or as a wrapper around "sed" if it's the GNU version)
in the spots that rely on the GNU sed behavior. Then we also remove the
sed checks from the check-block.sh script, so that "make check-block"
can now be run on systems without GNU sed, too.

...

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 935217aa65..a3b1708adc 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -21,58 +21,58 @@
  
  _filter_date()

  {
-$SED -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
[0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'
+gsed -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
[0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'


GNU sed recommends spelling it 'sed -E', not 'sed -r', when using
extended regex.  Older POSIX did not support either spelling, but the
next version will require -E, as many implementations have it now:
https://www.austingroupbugs.net/view.php?id=528


Thanks for the pointer ... I originally checked "man 1p sed" on
my system and did not see -r or -E in there, so I thought that
this must be really something specific to GNU sed. But now that
you've mentioned this, I just double-checked the build environments
that we support with QEMU, and seems like -E should be supported
everywhere:

macOS 11:

$ sed --help
sed: illegal option -- -
usage: sed script [-Ealnru] [-i extension] [file ...]
sed [-Ealnu] [-i extension] [-e script] ... [-f script_file] ... [file 
...]


NetBSD 9.2:

$ sed --help
sed: unknown option -- -
Usage:  sed [-aElnru] command [file ...]
sed [-aElnru] [-e command] [-f command_file] [-I[extension]]
[-i[extension]] [file ...]


FreeBSD 12.3:

$ sed --help
sed: illegal option -- -
usage: sed script [-Ealnru] [-i extension] [file ...]
sed [-Ealnu] [-i extension] [-e script] ... [-f script_file] ... [file 
...]


OpenBSD 7.0:

$ sed --help
sed: unknown option -- -
usage: sed [-aEnru] [-i[extension]] command [file ...]
   sed [-aEnru] [-e command] [-f command_file] [-i[extension]] [file ...]


Illumos:

Has -E according to https://illumos.org/man/1/sed


Busybox:

Has -E according to https://www.commandlinux.com/man-page/man1/busybox.1.html


Haiku:

Seems to use GNU sed, so -E is available.


We likely never will run the iotests on Windows, so I did not check
those (but I assume MSYS and friends are using GNU sed, too).


So I think it should be safe to change these spots that are
using "-r" to "sed -E" (and not use gsed here).


Other than the fact that this was easier to write with ERE, I'm not
seeing any other GNU-only extensions in use here; but I'd recommend
that as long as we're touching the line, we spell it 'gsed -Ee'
instead of -re (here, and in several other places).


  _filter_qom_path()
  {
-$SED -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
+gsed -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'


Here, it is our use of \+ that is a GNU sed extension, although it is
fairly easy (but verbose) to translate that one to portable sed
(\+ is the same as *).  So gsed is correct.  On the
other hand, the use of [[0-9]\+\] looks ugly - it probably does NOT
match what we meant (we have a bracket expression '[...]' that matches
the 11 characters [ and 0-9, then '\+' to match that bracket
expression 1 or more times, then '\]' which in its context is
identical to ']' to match a closing ], since only opening [ needs \
escaping for literal treatment).  What we probably meant is:

'/Attached to:/s/\device\[[0-9][0-9]*]/device[N]/g'

at which point normal sed would do.


Ok ... but I'd prefer to clean such spots rather in a second step,
to make sure not to introduce bugs and to make the review easier.


  # Removes \r from messages
  _filter_win32()
  {
-$SED -e 's/\r//g'
+gsed -e 's/\r//g'


Yep, \r is another GNU sed extension.


  }
  
  # sanitize qemu-io output

  _filter_qemu_io()
  {
-_filter_win32 | $SED -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
[EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX YYY\/sec and XXX 
ops\/sec)/" \
+_filter_win32 | gsed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
[EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX YYY\/sec and XXX 
ops\/sec)/" \
  -e "s/: line [0-9][0-9]*:  *[0-9][0-9]*\( Aborted\| Killed\)/:\1/" \
  -e "s/qemu-io> //g"


I'm not seeing anything specific to GNU sed in this (long) sed script;
can we relax this one to plain 'sed'?  Use of s#some/text## might be
easier than having to s/some\/text//, but that's not essential.


If I switch that to plain 

Re: [PATCH] tests/qemu-iotests: Rework the checks and spots using GNU sed

2022-02-15 Thread Eric Blake
On Tue, Feb 15, 2022 at 02:20:31PM +0100, Thomas Huth wrote:
> Instead of failing the iotests if GNU sed is not available (or skipping
> them completely in the check-block.sh script), it would be better to
> simply skip the bash-based tests that rely on GNU sed, so that the other
> tests could still be run. Thus we now explicitely use "gsed" (either as
> direct program or as a wrapper around "sed" if it's the GNU version)
> in the spots that rely on the GNU sed behavior. Then we also remove the
> sed checks from the check-block.sh script, so that "make check-block"
> can now be run on systems without GNU sed, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  I've checked that this works fine with:
>  make vm-build-netbsd TARGET_LIST=x86_64-softmmu BUILD_TARGET=check-block
>  make vm-build-freebsd TARGET_LIST=x86_64-softmmu BUILD_TARGET=check-block
>  and with the macOS targets in our CI.
> 
>  tests/check-block.sh | 12 --
>  tests/qemu-iotests/271   |  2 +-
>  tests/qemu-iotests/common.filter | 74 
>  tests/qemu-iotests/common.rc | 45 +--
>  4 files changed, 61 insertions(+), 72 deletions(-)
> 

> +++ b/tests/qemu-iotests/271
> @@ -896,7 +896,7 @@ _make_test_img -o extended_l2=on 1M
>  # Second and third writes in _concurrent_io() are independent and may finish 
> in
>  # different order. So, filter offset out to match both possible variants.
>  _concurrent_io | $QEMU_IO | _filter_qemu_io | \
> -$SED -e 's/\(20480\|40960\)/OFFSET/'
> +sed -e 's/\(20480\|40960\)/OFFSET/'

Looks like a portable sed script, so 'sed' instead of 'gsed' here is fine.

>  _concurrent_verify | $QEMU_IO | _filter_qemu_io
>  
>  # success, all done
> diff --git a/tests/qemu-iotests/common.filter 
> b/tests/qemu-iotests/common.filter
> index 935217aa65..a3b1708adc 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -21,58 +21,58 @@
>  
>  _filter_date()
>  {
> -$SED -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
> [0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'
> +gsed -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
> [0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'

GNU sed recommends spelling it 'sed -E', not 'sed -r', when using
extended regex.  Older POSIX did not support either spelling, but the
next version will require -E, as many implementations have it now:
https://www.austingroupbugs.net/view.php?id=528

Other than the fact that this was easier to write with ERE, I'm not
seeing any other GNU-only extensions in use here; but I'd recommend
that as long as we're touching the line, we spell it 'gsed -Ee'
instead of -re (here, and in several other places).

>  _filter_qom_path()
>  {
> -$SED -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
> +gsed -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'

Here, it is our use of \+ that is a GNU sed extension, although it is
fairly easy (but verbose) to translate that one to portable sed
(\+ is the same as *).  So gsed is correct.  On the
other hand, the use of [[0-9]\+\] looks ugly - it probably does NOT
match what we meant (we have a bracket expression '[...]' that matches
the 11 characters [ and 0-9, then '\+' to match that bracket
expression 1 or more times, then '\]' which in its context is
identical to ']' to match a closing ], since only opening [ needs \
escaping for literal treatment).  What we probably meant is:

'/Attached to:/s/\device\[[0-9][0-9]*]/device[N]/g'

at which point normal sed would do.

>  }
>  
>  # replace occurrences of the actual TEST_DIR value with TEST_DIR
>  _filter_testdir()
>  {
> -$SED -e "s#$TEST_DIR/#TEST_DIR/#g" \
> - -e "s#$SOCK_DIR/#SOCK_DIR/#g" \
> - -e "s#SOCK_DIR/fuse-#TEST_DIR/#g"
> +sed -e "s#$TEST_DIR/#TEST_DIR/#g" \
> +-e "s#$SOCK_DIR/#SOCK_DIR/#g" \
> +-e "s#SOCK_DIR/fuse-#TEST_DIR/#g"

And this one indeed looks portable to POSIX (unless $TEST_DIR contains
weird stuff by accident).

>  # Removes \r from messages
>  _filter_win32()
>  {
> -$SED -e 's/\r//g'
> +gsed -e 's/\r//g'

Yep, \r is another GNU sed extension.

>  }
>  
>  # sanitize qemu-io output
>  _filter_qemu_io()
>  {
> -_filter_win32 | $SED -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
> [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
> YYY\/sec and XXX ops\/sec)/" \
> +_filter_win32 | gsed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
> [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
> YYY\/sec and XXX ops\/sec)/" \
>  -e "s/: line [0-9][0-9]*:  *[0-9][0-9]*\( Aborted\| Killed\)/:\1/" \
>  -e "s/qemu-io> //g"

I'm not seeing anything specific to GNU sed in this (long) sed script;
can we relax this one to plain 'sed'?  Use of s#some/text## might be
easier than having to s/some\/text//, but that's not essential.

>  }
> @@ -80,7 +80,7 @@ _filter_qemu_io()
>  # replace occurrences of QEMU_PROG with "qemu"
>  _filter_qemu()
>  {
> -