Re: [PATCH 2/5] iotests: allow test discovery before building

2023-03-03 Thread Thomas Huth

On 02/03/2023 19.46, Daniel P. Berrangé wrote:

The 'check' script can be invoked in "dry run" mode, in which case it
merely does test discovery and prints out all their names. Despite only
doing test discovery it still validates that the various QEMU binaries
can be found. This makes it impossible todo test discovery prior to
building QEMU. This is a desirable feature to support, because it will
let meson discover tests.

Fortunately the code in the TestEnv constructor is ordered in a way
that makes this fairly trivial to achieve. We can just short circuit
the constructor after the basic directory paths have been set.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qemu-iotests/check  | 1 +
  tests/qemu-iotests/testenv.py | 5 +
  2 files changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 806abc21d6..7e287a79a3 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -129,6 +129,7 @@ if __name__ == '__main__':
imgopts=args.imgopts, misalign=args.misalign,
debug=args.debug, valgrind=args.valgrind,
gdb=args.gdb, qprint=args.print,
+  dry_run=args.dry_run,
source_dir=args.source_dir,
build_dir=args.build_dir)
  
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py

index 9bf37cd381..952efa0e63 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -178,6 +178,7 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
   valgrind: bool = False,
   gdb: bool = False,
   qprint: bool = False,
+ dry_run: bool = False,
   source_dir: Optional[str] = None,
   build_dir: Optional[str] = None) -> None:
  self.imgfmt = imgfmt
@@ -232,6 +233,10 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: 
str,
  self.build_root = os.path.join(self.build_iotests, '..', '..')
  
  self.init_directories()

+
+if dry_run:
+return
+
  self.init_binaries()
  
  self.malloc_perturb_ = os.getenv('MALLOC_PERTURB_',


This gives me a feeling of Deja-Vu:

 https://lore.kernel.org/all/20220209101530.3442837-5-th...@redhat.com/

... to bad that I never got that merged ;-)

Anyway:
Reviewed-by: Thomas Huth 




Re: [PATCH 4/5] iotests: print TAP protocol version when reporting tests

2023-03-03 Thread Thomas Huth

On 02/03/2023 19.46, Daniel P. Berrangé wrote:

Recently meson started complaining that TAP test reports don't include
the TAP protocol version. While this warning is bogus and has since been
removed from Meson, it looks like good practice to include this header
going forward. The GLib library test harness has started unconditionally
printing the version, so this brings the I/O tests into line.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qemu-iotests/testrunner.py | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index 5a771da86e..e734800b3d 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -391,6 +391,7 @@ def run_tests(self, tests: List[str], jobs: int = 1) -> 
bool:
  casenotrun = []
  
  if self.tap:

+print('TAP version 13')
  self.env.print_env('# ')
  print('1..%d' % len(tests))
  else:


Reviewed-by: Thomas Huth 




Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Thomas Huth

On 02/03/2023 19.46, Daniel P. Berrangé wrote:

To just repeat the patch 5 description...

Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

  * You cannot ask 'meson test' to invoke a single I/O test
  * The meson test timeout can't be applied to the individual
tests
  * Meson only gets a pass/fail for the overall I/O test group
not individual tests
  * If a CI job gets killed by the GitLab timeout, we don't
get visibility into how far through the I/O tests
execution got.

This is not really specific to the I/O tests, the problem is common
to any case of us running a test which is in fact another test
harness which runs many tests. It would be nice to have meson have
the full view of all tests run. Adapting the I/O tests is as easy
win in this respect.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.


Great to see some movement in this area again!

Some questions/remarks:

1) Could you remove tests/check-block.sh now? See also:
   https://lore.kernel.org/all/20220209101530.3442837-9-th...@redhat.com/

2) With regards to parallel execution ... I think it should be
   possible nowadays - the "check" script is normally also run
   with the "-j" switch by the tests/check-block.sh script, so
   if you remove the possibility to run in parallel, it's a
   regression from the previous behavior!

3) When I tried this last year, I had a weird problem that
   the terminal sometimes gets messed up ... I wasn't able
   to track it down back then - could you check by running
   "make check-block" many times (>10 times) to see whether
   it happens with your series or not?

 Thomas




Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Daniel P . Berrangé
On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > To just repeat the patch 5 description...
> > 
> > Currently meson registers a single test that invokes an entire group of
> > I/O tests, hiding the test granularity from meson. There are various
> > downsides of doing this
> > 
> >   * You cannot ask 'meson test' to invoke a single I/O test
> >   * The meson test timeout can't be applied to the individual
> > tests
> >   * Meson only gets a pass/fail for the overall I/O test group
> > not individual tests
> >   * If a CI job gets killed by the GitLab timeout, we don't
> > get visibility into how far through the I/O tests
> > execution got.
> > 
> > This is not really specific to the I/O tests, the problem is common
> > to any case of us running a test which is in fact another test
> > harness which runs many tests. It would be nice to have meson have
> > the full view of all tests run. Adapting the I/O tests is as easy
> > win in this respect.
> > 
> > This switches meson to perform test discovery by invoking 'check' in
> > dry-run mode. It then registers one meson test case for each I/O
> > test. Parallel execution remains disabled since the I/O tests do not
> > use self contained execution environments and thus conflict with
> > each other.
> 
> Great to see some movement in this area again!
> 
> Some questions/remarks:
> 
> 1) Could you remove tests/check-block.sh now? See also:
>https://lore.kernel.org/all/20220209101530.3442837-9-th...@redhat.com/

Possibly, I wasn't sure if that was wanted as a general entry
point for humans, or was solely for meson ?

> 2) With regards to parallel execution ... I think it should be
>possible nowadays - the "check" script is normally also run
>with the "-j" switch by the tests/check-block.sh script, so
>if you remove the possibility to run in parallel, it's a
>regression from the previous behavior!

Hmmm, I got *masses* of test failures when running in parallel
but I'll check again to be sure.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] scsi: megasas: Internal cdbs have 16-byte length

2023-03-03 Thread Fiona Ebner
Am 28.02.23 um 18:11 schrieb Guenter Roeck:
> Host drivers do not necessarily set cdb_len in megasas io commands.
> With commits 6d1511cea0 ("scsi: Reject commands if the CDB length
> exceeds buf_len") and fe9d8927e2 ("scsi: Add buf_len parameter to
> scsi_req_new()"), this results in failures to boot Linux from affected
> SCSI drives because cdb_len is set to 0 by the host driver.
> Set the cdb length to its actual size to solve the problem.
> 

Tested-by: Fiona Ebner 

But I do have a question:

> Signed-off-by: Guenter Roeck 
> ---
>  hw/scsi/megasas.c | 14 ++
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
> index 9cbbb16121..d624866bb6 100644
> --- a/hw/scsi/megasas.c
> +++ b/hw/scsi/megasas.c
> @@ -1780,7 +1780,7 @@ static int megasas_handle_io(MegasasState *s, 
> MegasasCmd *cmd, int frame_cmd)
>  uint8_t cdb[16];
>  int len;
>  struct SCSIDevice *sdev = NULL;
> -int target_id, lun_id, cdb_len;
> +int target_id, lun_id;
>  
>  lba_count = le32_to_cpu(cmd->frame->io.header.data_len);
>  lba_start_lo = le32_to_cpu(cmd->frame->io.lba_lo);
> @@ -1789,7 +1789,6 @@ static int megasas_handle_io(MegasasState *s, 
> MegasasCmd *cmd, int frame_cmd)
>  
>  target_id = cmd->frame->header.target_id;
>  lun_id = cmd->frame->header.lun_id;
> -cdb_len = cmd->frame->header.cdb_len;
>  
>  if (target_id < MFI_MAX_LD && lun_id == 0) {
>  sdev = scsi_device_find(&s->bus, 0, target_id, lun_id);
> @@ -1804,15 +1803,6 @@ static int megasas_handle_io(MegasasState *s, 
> MegasasCmd *cmd, int frame_cmd)
>  return MFI_STAT_DEVICE_NOT_FOUND;
>  }
>  
> -if (cdb_len > 16) {
> -trace_megasas_scsi_invalid_cdb_len(
> -mfi_frame_desc(frame_cmd), 1, target_id, lun_id, cdb_len);
> -megasas_write_sense(cmd, SENSE_CODE(INVALID_OPCODE));
> -cmd->frame->header.scsi_status = CHECK_CONDITION;
> -s->event_count++;
> -return MFI_STAT_SCSI_DONE_WITH_ERROR;
> -}

Shouldn't we still fail when cmd->frame->header.cdb_len > 16? Or is the
consequence of

> Host drivers do not necessarily set cdb_len in megasas io commands.

that this can be uninitialized memory and we need to assume it was not
explicitly set?

Best Regards,
Fiona

> -
>  cmd->iov_size = lba_count * sdev->blocksize;
>  if (megasas_map_sgl(s, cmd, &cmd->frame->io.sgl)) {
>  megasas_write_sense(cmd, SENSE_CODE(TARGET_FAILURE));
> @@ -1823,7 +1813,7 @@ static int megasas_handle_io(MegasasState *s, 
> MegasasCmd *cmd, int frame_cmd)
>  
>  megasas_encode_lba(cdb, lba_start, lba_count, is_write);
>  cmd->req = scsi_req_new(sdev, cmd->index,
> -lun_id, cdb, cdb_len, cmd);
> +lun_id, cdb, sizeof(cdb), cmd);
>  if (!cmd->req) {
>  trace_megasas_scsi_req_alloc_failed(
>  mfi_frame_desc(frame_cmd), target_id, lun_id);




Re: [PATCH 5/5] iotests: register each I/O test separately with meson

2023-03-03 Thread Thomas Huth

On 02/03/2023 19.46, Daniel P. Berrangé wrote:

Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

  * You cannot ask 'meson test' to invoke a single I/O test
  * The meson test timeout can't be applied to the individual
tests
  * Meson only gets a pass/fail for the overall I/O test group
not individual tests
  * If a CI job gets killed by the GitLab timeout, we don't
get visibility into how far through the I/O tests
execution got.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qemu-iotests/meson.build | 33 +++--
  1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/meson.build b/tests/qemu-iotests/meson.build
index 323a4acb6a..48c82085af 100644
--- a/tests/qemu-iotests/meson.build
+++ b/tests/qemu-iotests/meson.build
@@ -32,16 +32,37 @@ foreach k, v : emulators
endif
  endforeach
  
+qemu_iotests_check_cmd = files('check')

+
  foreach format, speed: qemu_iotests_formats
if speed == 'quick'
  suites = 'block'
else
  suites = ['block-' + speed, speed]
endif
-  test('qemu-iotests ' + format, sh, args: [files('../check-block.sh'), 
format],
-   depends: qemu_iotests_binaries, env: qemu_iotests_env,
-   protocol: 'tap',
-   suite: suites,
-   timeout: 0,
-   is_parallel: false)
+
+  args = ['-tap', '-' + format]
+  if speed == 'quick'
+  args += ['-g', 'auto']
+  endif
+
+  rc = run_command(
+  [qemu_iotests_check_cmd] + args + ['-n'],
+  check: true,
+  )
+
+  foreach item: rc.stdout().strip().split()
+  message('Adding test qemu-iotests-' + format + '-' + item)


This message spoils the output during "configure" quite a bit, please remove 
that line.


Apart from that, patch looks fine to me!

 Thomas



+  args = ['-tap', '-' + format, item,
+  '--source-dir', meson.current_source_dir(),
+  '--build-dir', meson.current_build_dir()]
+  test('qemu-iotests-' + format + '-' + item,
+   qemu_iotests_check_cmd,
+   args: args,
+   is_parallel: false,
+   depends: qemu_iotests_binaries,
+   env: qemu_iotests_env,
+   protocol: 'tap',
+   suite: suites)
+  endforeach
  endforeach





Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Daniel P . Berrangé
On Fri, Mar 03, 2023 at 08:53:32AM +, Daniel P. Berrangé wrote:
> On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> > On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > > To just repeat the patch 5 description...
> > > 
> > > Currently meson registers a single test that invokes an entire group of
> > > I/O tests, hiding the test granularity from meson. There are various
> > > downsides of doing this
> > > 
> > >   * You cannot ask 'meson test' to invoke a single I/O test
> > >   * The meson test timeout can't be applied to the individual
> > > tests
> > >   * Meson only gets a pass/fail for the overall I/O test group
> > > not individual tests
> > >   * If a CI job gets killed by the GitLab timeout, we don't
> > > get visibility into how far through the I/O tests
> > > execution got.
> > > 
> > > This is not really specific to the I/O tests, the problem is common
> > > to any case of us running a test which is in fact another test
> > > harness which runs many tests. It would be nice to have meson have
> > > the full view of all tests run. Adapting the I/O tests is as easy
> > > win in this respect.
> > > 
> > > This switches meson to perform test discovery by invoking 'check' in
> > > dry-run mode. It then registers one meson test case for each I/O
> > > test. Parallel execution remains disabled since the I/O tests do not
> > > use self contained execution environments and thus conflict with
> > > each other.
> > 
> > Great to see some movement in this area again!
> > 
> > Some questions/remarks:
> > 
> > 1) Could you remove tests/check-block.sh now? See also:
> >https://lore.kernel.org/all/20220209101530.3442837-9-th...@redhat.com/
> 
> Possibly, I wasn't sure if that was wanted as a general entry
> point for humans, or was solely for meson ?
> 
> > 2) With regards to parallel execution ... I think it should be
> >possible nowadays - the "check" script is normally also run
> >with the "-j" switch by the tests/check-block.sh script, so
> >if you remove the possibility to run in parallel, it's a
> >regression from the previous behavior!
> 
> Hmmm, I got *masses* of test failures when running in parallel
> but I'll check again to be sure.

Ooooh, the 'check' script only uses a unique sub-dir for
tests when passing -j with a value >= 2. It also merely
appends the test name, so it is still broken for anyone
who runs multiple copies of 'check' in parallel for
different configurations, even if they passed -j. It
needs to just always use an isolated subdir no matter
what, with process level uniqueness, not merely test.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 5/5] iotests: register each I/O test separately with meson

2023-03-03 Thread Daniel P . Berrangé
On Fri, Mar 03, 2023 at 10:34:11AM +0100, Thomas Huth wrote:
> On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > Currently meson registers a single test that invokes an entire group of
> > I/O tests, hiding the test granularity from meson. There are various
> > downsides of doing this
> > 
> >   * You cannot ask 'meson test' to invoke a single I/O test
> >   * The meson test timeout can't be applied to the individual
> > tests
> >   * Meson only gets a pass/fail for the overall I/O test group
> > not individual tests
> >   * If a CI job gets killed by the GitLab timeout, we don't
> > get visibility into how far through the I/O tests
> > execution got.
> > 
> > This switches meson to perform test discovery by invoking 'check' in
> > dry-run mode. It then registers one meson test case for each I/O
> > test. Parallel execution remains disabled since the I/O tests do not
> > use self contained execution environments and thus conflict with
> > each other.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >   tests/qemu-iotests/meson.build | 33 +++--
> >   1 file changed, 27 insertions(+), 6 deletions(-)
> > 
> > diff --git a/tests/qemu-iotests/meson.build b/tests/qemu-iotests/meson.build
> > index 323a4acb6a..48c82085af 100644
> > --- a/tests/qemu-iotests/meson.build
> > +++ b/tests/qemu-iotests/meson.build
> > @@ -32,16 +32,37 @@ foreach k, v : emulators
> > endif
> >   endforeach
> > +qemu_iotests_check_cmd = files('check')
> > +
> >   foreach format, speed: qemu_iotests_formats
> > if speed == 'quick'
> >   suites = 'block'
> > else
> >   suites = ['block-' + speed, speed]
> > endif
> > -  test('qemu-iotests ' + format, sh, args: [files('../check-block.sh'), 
> > format],
> > -   depends: qemu_iotests_binaries, env: qemu_iotests_env,
> > -   protocol: 'tap',
> > -   suite: suites,
> > -   timeout: 0,
> > -   is_parallel: false)
> > +
> > +  args = ['-tap', '-' + format]
> > +  if speed == 'quick'
> > +  args += ['-g', 'auto']
> > +  endif
> > +
> > +  rc = run_command(
> > +  [qemu_iotests_check_cmd] + args + ['-n'],
> > +  check: true,
> > +  )
> > +
> > +  foreach item: rc.stdout().strip().split()
> > +  message('Adding test qemu-iotests-' + format + '-' + item)
> 
> This message spoils the output during "configure" quite a bit, please remove
> that line.

Yes, that wasn't supposed to be left it, it was me debugging :-)

> 
> Apart from that, patch looks fine to me!
> 
>  Thomas
> 
> 
> > +  args = ['-tap', '-' + format, item,
> > +  '--source-dir', meson.current_source_dir(),
> > +  '--build-dir', meson.current_build_dir()]
> > +  test('qemu-iotests-' + format + '-' + item,
> > +   qemu_iotests_check_cmd,
> > +   args: args,
> > +   is_parallel: false,
> > +   depends: qemu_iotests_binaries,
> > +   env: qemu_iotests_env,
> > +   protocol: 'tap',
> > +   suite: suites)
> > +  endforeach
> >   endforeach
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Thomas Huth

On 03/03/2023 09.53, Daniel P. Berrangé wrote:

On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:

On 02/03/2023 19.46, Daniel P. Berrangé wrote:

To just repeat the patch 5 description...

Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

   * You cannot ask 'meson test' to invoke a single I/O test
   * The meson test timeout can't be applied to the individual
 tests
   * Meson only gets a pass/fail for the overall I/O test group
 not individual tests
   * If a CI job gets killed by the GitLab timeout, we don't
 get visibility into how far through the I/O tests
 execution got.

This is not really specific to the I/O tests, the problem is common
to any case of us running a test which is in fact another test
harness which runs many tests. It would be nice to have meson have
the full view of all tests run. Adapting the I/O tests is as easy
win in this respect.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.


Great to see some movement in this area again!

Some questions/remarks:

1) Could you remove tests/check-block.sh now? See also:
https://lore.kernel.org/all/20220209101530.3442837-9-th...@redhat.com/


Possibly, I wasn't sure if that was wanted as a general entry
point for humans, or was solely for meson ?


I think this script was only ever used for "make check-block", I never heard 
of anybody really using this script directly in a regular fashion. Humans 
rather run the tests/qemu-iotests/check script directly. Also see its origins:


 https://gitlab.com/qemu-project/qemu/-/commit/b8c6f29eb84cd3ccbbf

 HTH,
  Thomas




Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Daniel P . Berrangé
On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> 3) When I tried this last year, I had a weird problem that
>the terminal sometimes gets messed up ... I wasn't able
>to track it down back then - could you check by running
>"make check-block" many times (>10 times) to see whether
>it happens with your series or not?

I've just seen this - echo got disabled on my terminal.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 1/5] iotests: explicitly pass source/build dir to 'check' command

2023-03-03 Thread Alex Bennée


Daniel P. Berrangé  writes:

> The 'check' script has some rather dubious logic whereby it assumes
> that if invoked as a symlink, then it is running from a separate
> source tree and build tree, otherwise it assumes the current working
> directory is a combined source and build tree.
>
> This doesn't work if you want to invoke the 'check' script using
> its full source tree path while still using a split source and build
> tree layout. This would be a typical situation with meson if you ask
> it to find the 'check' script path using files('check').
>
> Rather than trying to make the logic more magical, add support for
> explicitly passing the dirs using --source-dir and --build-dir. If
> either is omitted the current logic is maintained.
>
> Signed-off-by: Daniel P. Berrangé 
> ---
>  tests/qemu-iotests/check  |  8 +++-
>  tests/qemu-iotests/testenv.py | 17 +
>  2 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index 9bdda1394e..806abc21d6 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -113,6 +113,10 @@ def make_argparser() -> argparse.ArgumentParser:
> 'middle of the process.')
>  g_sel.add_argument('tests', metavar='TEST_FILES', nargs='*',
> help='tests to run, or "--" followed by a command')
> +g_sel.add_argument('--build-dir', default=None,
> +   help='Path to iotests build directory')
> +g_sel.add_argument('--source-dir', default=None,
> +   help='Path to iotests build directory')
>

I'd be tempted to push all the automagic stuff into the options so you
have something like:

--8<---cut here---start->8---
modified   tests/qemu-iotests/check
@@ -27,8 +27,23 @@ from testenv import TestEnv
 from testrunner import TestRunner
 
 
+def get_default_path(follow_link=False):
+"""
+Try to automagically figure out the path we are running from.
+"""
+# called from the build tree?
+if os.path.islink(sys.argv[0]):
+if follow_link:
+return os.path.dirname(os.readlink(sys.argv[0]))
+else:
+return os.path.dirname(os.path.abspath(sys.argv[0]))
+else:  # or source tree?
+return os.getcwd()
+
+
 def make_argparser() -> argparse.ArgumentParser:
-p = argparse.ArgumentParser(description="Test run options")
+p = argparse.ArgumentParser(description="Test run options",
+
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 
 p.add_argument('-n', '--dry-run', action='store_true',
help='show me, do not run tests')
@@ -113,9 +128,9 @@ def make_argparser() -> argparse.ArgumentParser:
'middle of the process.')
 g_sel.add_argument('tests', metavar='TEST_FILES', nargs='*',
help='tests to run, or "--" followed by a command')
-g_sel.add_argument('--build-dir', default=None,
+g_sel.add_argument('--build-dir', default=get_default_path(),
help='Path to iotests build directory')
-g_sel.add_argument('--source-dir', default=None,
+g_sel.add_argument('--source-dir', 
default=get_default_path(follow_link=True),
help='Path to iotests build directory')
 
 return p
modified   tests/qemu-iotests/testenv.py
@@ -213,23 +213,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: 
str,
 # Initialize generic paths: build_root, build_iotests, source_iotests,
 # which are needed to initialize some environment variables. They are
 # used by init_*() functions as well.
-
-if os.path.islink(sys.argv[0]):
-# called from the build tree
-self.source_iotests = os.path.dirname(
-os.readlink(sys.argv[0]))
-self.build_iotests = os.path.dirname(
-os.path.abspath(sys.argv[0]))
-else:
-# called from the source tree
-self.source_iotests = os.getcwd()
-self.build_iotests = os.getcwd()
-
-if source_dir is not None:
-self.source_iotests = source_dir
-if build_dir is not None:
-self.build_iotests = build_dir
-
+self.source_iotests = source_dir
+self.build_iotests = build_dir
 self.build_root = os.path.join(self.build_iotests, '..', '..')
 
 self.init_directories()
--8<---cut here---end--->8---


>  return p
>  
> @@ -124,7 +128,9 @@ if __name__ == '__main__':
>aiomode=args.aiomode, cachemode=args.cachemode,
>imgopts=args.imgopts, misalign=args.misalign,
>debug=args.debug, valgrind=args.valgrind,
> -  gdb=args.gdb, qprint=args.print)
> +  gdb=args.gdb, qprint=args.print,
> +  source_dir=args.sour

Re: [PATCH 2/5] iotests: allow test discovery before building

2023-03-03 Thread Alex Bennée


Daniel P. Berrangé  writes:

> The 'check' script can be invoked in "dry run" mode, in which case it
> merely does test discovery and prints out all their names. Despite only
> doing test discovery it still validates that the various QEMU binaries
> can be found. This makes it impossible todo test discovery prior to
> building QEMU. This is a desirable feature to support, because it will
> let meson discover tests.
>
> Fortunately the code in the TestEnv constructor is ordered in a way
> that makes this fairly trivial to achieve. We can just short circuit
> the constructor after the basic directory paths have been set.
>
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 3/5] iotests: strip subdir path when listing tests

2023-03-03 Thread Alex Bennée


Daniel P. Berrangé  writes:

> When asking 'check' to list individual tests by invoking it in dry run
> mode, it prints the paths to the tests relative to the base of the
> I/O test directory.
>
> When asking 'check' to run an individual test, however, it mandates that
> only the unqualified test name is given, without any path prefix. This
> inconsistency makes it harder to ask for a list of tests and then invoke
> each one.
>
> Thus the test listing code is change to flatten the test names, by
> printing only the base name, which can be directly invoked.
>
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 4/5] iotests: print TAP protocol version when reporting tests

2023-03-03 Thread Alex Bennée


Daniel P. Berrangé  writes:

> Recently meson started complaining that TAP test reports don't include
> the TAP protocol version. While this warning is bogus and has since been
> removed from Meson, it looks like good practice to include this header
> going forward. The GLib library test harness has started unconditionally
> printing the version, so this brings the I/O tests into line.
>
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 1/5] iotests: explicitly pass source/build dir to 'check' command

2023-03-03 Thread Daniel P . Berrangé
On Fri, Mar 03, 2023 at 12:55:26PM +, Alex Bennée wrote:
> 
> Daniel P. Berrangé  writes:
> 
> > The 'check' script has some rather dubious logic whereby it assumes
> > that if invoked as a symlink, then it is running from a separate
> > source tree and build tree, otherwise it assumes the current working
> > directory is a combined source and build tree.
> >
> > This doesn't work if you want to invoke the 'check' script using
> > its full source tree path while still using a split source and build
> > tree layout. This would be a typical situation with meson if you ask
> > it to find the 'check' script path using files('check').
> >
> > Rather than trying to make the logic more magical, add support for
> > explicitly passing the dirs using --source-dir and --build-dir. If
> > either is omitted the current logic is maintained.
> >
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  tests/qemu-iotests/check  |  8 +++-
> >  tests/qemu-iotests/testenv.py | 17 +
> >  2 files changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> > index 9bdda1394e..806abc21d6 100755
> > --- a/tests/qemu-iotests/check
> > +++ b/tests/qemu-iotests/check
> > @@ -113,6 +113,10 @@ def make_argparser() -> argparse.ArgumentParser:
> > 'middle of the process.')
> >  g_sel.add_argument('tests', metavar='TEST_FILES', nargs='*',
> > help='tests to run, or "--" followed by a command')
> > +g_sel.add_argument('--build-dir', default=None,
> > +   help='Path to iotests build directory')
> > +g_sel.add_argument('--source-dir', default=None,
> > +   help='Path to iotests build directory')
> >
> 
> I'd be tempted to push all the automagic stuff into the options so you
> have something like:

Hmm, yes, that's a nice idea, will give it a go as you suggest.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Daniel P . Berrangé
On Fri, Mar 03, 2023 at 10:45:40AM +, Daniel P. Berrangé wrote:
> On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> > On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > 3) When I tried this last year, I had a weird problem that
> >the terminal sometimes gets messed up ... I wasn't able
> >to track it down back then - could you check by running
> >"make check-block" many times (>10 times) to see whether
> >it happens with your series or not?
> 
> I've just seen this - echo got disabled on my terminal.

The problem is that testrunner.py script doing

# We want to save current tty settings during test run,
# since an aborting qemu call may leave things screwed up.
@contextmanager
def savetty() -> Iterator[None]:
isterm = sys.stdin.isatty()
if isterm:
fd = sys.stdin.fileno()
attr = termios.tcgetattr(fd)

try:
yield
finally:
if isterm:
termios.tcsetattr(fd, termios.TCSADRAIN, attr)


When invoking 'check' this wraps around execution of the entire
'check' script. IOW it saves/restores the terminal once.

When we invoke 'check' in parallel it will save/restore the same
terminal for each invokation.

If the 'termios.tcgetattr' call runs at the same time as there is
a QEMU running which has put the terminal in raw mode, then when
'check' exits it'll "restore" the terminal to raw mode.

IOW, this savetty() logic is fundamentally unsafe when invoking
'check' in parallel.

AFAICT, the more reliable thing todo would be to spawn a new PTY
when invoking each test, so there's no cleanup required. If QEMU
aborts leaving the TTY messed up it dosn't matter was it was a
temporary throwaway PTY.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] scsi: megasas: Internal cdbs have 16-byte length

2023-03-03 Thread Guenter Roeck

On 3/3/23 01:02, Fiona Ebner wrote:

Am 28.02.23 um 18:11 schrieb Guenter Roeck:

Host drivers do not necessarily set cdb_len in megasas io commands.
With commits 6d1511cea0 ("scsi: Reject commands if the CDB length
exceeds buf_len") and fe9d8927e2 ("scsi: Add buf_len parameter to
scsi_req_new()"), this results in failures to boot Linux from affected
SCSI drives because cdb_len is set to 0 by the host driver.
Set the cdb length to its actual size to solve the problem.



Tested-by: Fiona Ebner 

But I do have a question:


Signed-off-by: Guenter Roeck 
---
  hw/scsi/megasas.c | 14 ++
  1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 9cbbb16121..d624866bb6 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -1780,7 +1780,7 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd 
*cmd, int frame_cmd)
  uint8_t cdb[16];
  int len;
  struct SCSIDevice *sdev = NULL;
-int target_id, lun_id, cdb_len;
+int target_id, lun_id;
  
  lba_count = le32_to_cpu(cmd->frame->io.header.data_len);

  lba_start_lo = le32_to_cpu(cmd->frame->io.lba_lo);
@@ -1789,7 +1789,6 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd 
*cmd, int frame_cmd)
  
  target_id = cmd->frame->header.target_id;

  lun_id = cmd->frame->header.lun_id;
-cdb_len = cmd->frame->header.cdb_len;
  
  if (target_id < MFI_MAX_LD && lun_id == 0) {

  sdev = scsi_device_find(&s->bus, 0, target_id, lun_id);
@@ -1804,15 +1803,6 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd 
*cmd, int frame_cmd)
  return MFI_STAT_DEVICE_NOT_FOUND;
  }
  
-if (cdb_len > 16) {

-trace_megasas_scsi_invalid_cdb_len(
-mfi_frame_desc(frame_cmd), 1, target_id, lun_id, cdb_len);
-megasas_write_sense(cmd, SENSE_CODE(INVALID_OPCODE));
-cmd->frame->header.scsi_status = CHECK_CONDITION;
-s->event_count++;
-return MFI_STAT_SCSI_DONE_WITH_ERROR;
-}


Shouldn't we still fail when cmd->frame->header.cdb_len > 16? Or is the
consequence of


Host drivers do not necessarily set cdb_len in megasas io commands.


that this can be uninitialized memory and we need to assume it was not
explicitly set?



I doubt that real hardware uses or checks the field for the affected commands
because that would be pointless, but it is really up to you to decide how
you want to handle it.

Guenter


Best Regards,
Fiona


-
  cmd->iov_size = lba_count * sdev->blocksize;
  if (megasas_map_sgl(s, cmd, &cmd->frame->io.sgl)) {
  megasas_write_sense(cmd, SENSE_CODE(TARGET_FAILURE));
@@ -1823,7 +1813,7 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd 
*cmd, int frame_cmd)
  
  megasas_encode_lba(cdb, lba_start, lba_count, is_write);

  cmd->req = scsi_req_new(sdev, cmd->index,
-lun_id, cdb, cdb_len, cmd);
+lun_id, cdb, sizeof(cdb), cmd);
  if (!cmd->req) {
  trace_megasas_scsi_req_alloc_failed(
  mfi_frame_desc(frame_cmd), target_id, lun_id);







Re: [PATCH 1/3] block: make BlockBackend->quiesce_counter atomic

2023-03-03 Thread Hanna Czenczek

On 27.02.23 21:57, Stefan Hajnoczi wrote:

The main loop thread increments/decrements BlockBackend->quiesce_counter
when drained sections begin/end. The counter is read in the I/O code
path. Therefore this field is used to communicate between threads
without a lock.

Use qatomic_set()/qatomic_read() to make it clear that this field is
accessed by multiple threads.

Acquire/release are not necessary because the BlockBackend->in_flight
counter already uses sequentially consistent accesses and running I/O
requests hold that counter when blk_wait_while_drained() is called.

Signed-off-by: Stefan Hajnoczi 
---
  block/block-backend.c | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 278b04ce69..f00bf2ab35 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c


[...]


@@ -2568,7 +2568,9 @@ static void blk_root_drained_begin(BdrvChild *child)
  BlockBackend *blk = child->opaque;
  ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
  
-if (++blk->quiesce_counter == 1) {

+int new_counter = qatomic_read(&blk->quiesce_counter) + 1;
+qatomic_set(&blk->quiesce_counter, new_counter);
+if (new_counter == 1) {
  if (blk->dev_ops && blk->dev_ops->drained_begin) {
  blk->dev_ops->drained_begin(blk->dev_opaque);
  }


[...]


@@ -2597,12 +2599,14 @@ static bool blk_root_drained_poll(BdrvChild *child)


[...]


  assert(blk->public.throttle_group_member.io_limits_disabled);
  qatomic_dec(&blk->public.throttle_group_member.io_limits_disabled);
  
-if (--blk->quiesce_counter == 0) {

+int new_counter = qatomic_read(&blk->quiesce_counter) - 1;
+qatomic_set(&blk->quiesce_counter, new_counter);


I don’t quite understand why you decided not to use simple atomic 
increments/decrements with just SeqCst in these places.  Maybe it is 
fine this way, but it isn’t trivial to see.  As far as I understand, 
these aren’t hot paths, so I don’t think we’d lose performance by using 
fully atomic operations here.


Hanna




Re: [PATCH 3/3] block: protect BlockBackend->queued_requests with a lock

2023-03-03 Thread Hanna Czenczek

On 27.02.23 21:57, Stefan Hajnoczi wrote:

The CoQueue API offers thread-safety via the lock argument that
qemu_co_queue_wait() and qemu_co_enter_next() take. BlockBackend
currently does not make use of the lock argument. This means that
multiple threads submitting I/O requests can corrupt the CoQueue's
QSIMPLEQ.

Add a QemuMutex and pass it to CoQueue APIs so that the queue is
protected. While we're at it, also assert that the queue is empty when
the BlockBackend is deleted.

Signed-off-by: Stefan Hajnoczi 
---
  block/block-backend.c | 18 --
  1 file changed, 16 insertions(+), 2 deletions(-)


Reviewed-by: Hanna Czenczek 




Re: [PATCH 2/3] block: make BlockBackend->disable_request_queuing atomic

2023-03-03 Thread Hanna Czenczek

On 27.02.23 21:57, Stefan Hajnoczi wrote:

This field is accessed by multiple threads without a lock. Use explicit
qatomic_read()/qatomic_set() calls. There is no need for acquire/release
because blk_set_disable_request_queuing() doesn't provide any
guarantees (it helps that it's used at BlockBackend creation time and
not when there is I/O in flight).

Signed-off-by: Stefan Hajnoczi 
---
  block/block-backend.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)


Reviewed-by: Hanna Czenczek 




Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Thomas Huth

On 03/03/2023 14.06, Daniel P. Berrangé wrote:

On Fri, Mar 03, 2023 at 10:45:40AM +, Daniel P. Berrangé wrote:

On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:

On 02/03/2023 19.46, Daniel P. Berrangé wrote:
3) When I tried this last year, I had a weird problem that
the terminal sometimes gets messed up ... I wasn't able
to track it down back then - could you check by running
"make check-block" many times (>10 times) to see whether
it happens with your series or not?


I've just seen this - echo got disabled on my terminal.


The problem is that testrunner.py script doing

# We want to save current tty settings during test run,
# since an aborting qemu call may leave things screwed up.
@contextmanager
def savetty() -> Iterator[None]:
 isterm = sys.stdin.isatty()
 if isterm:
 fd = sys.stdin.fileno()
 attr = termios.tcgetattr(fd)

 try:
 yield
 finally:
 if isterm:
 termios.tcsetattr(fd, termios.TCSADRAIN, attr)


When invoking 'check' this wraps around execution of the entire
'check' script. IOW it saves/restores the terminal once.

When we invoke 'check' in parallel it will save/restore the same
terminal for each invokation.

If the 'termios.tcgetattr' call runs at the same time as there is
a QEMU running which has put the terminal in raw mode, then when
'check' exits it'll "restore" the terminal to raw mode.

IOW, this savetty() logic is fundamentally unsafe when invoking
'check' in parallel.


Hmm, couldn't we e.g. also simply skip this termios stuff when running with 
"--tap" ?


 Thomas




Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Daniel P . Berrangé
On Fri, Mar 03, 2023 at 04:49:36PM +0100, Thomas Huth wrote:
> On 03/03/2023 14.06, Daniel P. Berrangé wrote:
> > On Fri, Mar 03, 2023 at 10:45:40AM +, Daniel P. Berrangé wrote:
> > > On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> > > > On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > > > 3) When I tried this last year, I had a weird problem that
> > > > the terminal sometimes gets messed up ... I wasn't able
> > > > to track it down back then - could you check by running
> > > > "make check-block" many times (>10 times) to see whether
> > > > it happens with your series or not?
> > > 
> > > I've just seen this - echo got disabled on my terminal.
> > 
> > The problem is that testrunner.py script doing
> > 
> > # We want to save current tty settings during test run,
> > # since an aborting qemu call may leave things screwed up.
> > @contextmanager
> > def savetty() -> Iterator[None]:
> >  isterm = sys.stdin.isatty()
> >  if isterm:
> >  fd = sys.stdin.fileno()
> >  attr = termios.tcgetattr(fd)
> > 
> >  try:
> >  yield
> >  finally:
> >  if isterm:
> >  termios.tcsetattr(fd, termios.TCSADRAIN, attr)
> > 
> > 
> > When invoking 'check' this wraps around execution of the entire
> > 'check' script. IOW it saves/restores the terminal once.
> > 
> > When we invoke 'check' in parallel it will save/restore the same
> > terminal for each invokation.
> > 
> > If the 'termios.tcgetattr' call runs at the same time as there is
> > a QEMU running which has put the terminal in raw mode, then when
> > 'check' exits it'll "restore" the terminal to raw mode.
> > 
> > IOW, this savetty() logic is fundamentally unsafe when invoking
> > 'check' in parallel.
> 
> Hmm, couldn't we e.g. also simply skip this termios stuff when running with
> "--tap" ?

It actually turns out to be way simpler. We merely need to set
stdin=subprocess.DEVNULL. There is no valid reason for the test
cases to be reading for stdin, as that would block execution.
By setting stdin==/dev/null, the isatty(0) checks will fail and
thus QEMU won't mess with termios, and thus we don't need any
save/restore dance either.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v2 0/8] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Daniel P . Berrangé
To just repeat the patch 5 description...

Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

 * You cannot ask 'meson test' to invoke a single I/O test
 * The meson test timeout can't be applied to the individual
   tests
 * Meson only gets a pass/fail for the overall I/O test group
   not individual tests
 * Meson can't show the time of individual I/O tests, so we
   can't see why 4-5 are consuming the bulk of the time
   and ripe for optimization
 * If a CI job gets killed by the GitLab timeout, we don't
   get visibility into how far through the I/O tests
   execution got.

This is not really specific to the I/O tests, the problem is common
to any case of us running a test which is in fact another test
harness which runs many tests. It would be nice to have meson have
the full view of all tests run. Adapting the I/O tests is as easy
win in this respect.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.

Compare contrast output from a current job:

  https://gitlab.com/qemu-project/qemu/-/jobs/3863603546

[quote]
204/224 qemu:block / qemu-iotests qcow2   OK 329.94s   119 subtests passed
[/quote]

Vs what is seen with this series:

  https://gitlab.com/berrange/qemu/-/jobs/3865975463

[quote]
204/350 qemu:block / qemu-iotests-qcow2-001   OK2.16s   1 subtests passed
205/350 qemu:block / qemu-iotests-qcow2-002   OK2.77s   1 subtests passed

...snip...

329/350 qemu:block / qemu-iotests-qcow2-qemu-img-close-errors   OK6.19s 
  1 subtests passed
330/350 qemu:block / qemu-iotests-qcow2-qsd-jobs  OK0.55s   1 
subtests passed
[/quote]

A few tweaks were needed to the iotests runner because it had a few
assumptions about it always running in a tree that has already been
built, which is obviously not the case at the time meson does test
discovery.

In v2:

New example pipeline job

   https://gitlab.com/berrange/qemu/-/jobs/3871446106

 * Set build/source dir defaults in CLI option parser
   instead of testenv.py (Alex)
 * Fix messed up termios settings with parallel execution
   by connecting stdin to /dev/null (Thomas)
 * Remove the obsolete check-block.sh script (Thomas)
 * Use a unique sub-directory per test to allow parallelization (Thomas)
 * Enable parallel execution by meson (Thomas)
 * Remove leftover debugging message (Thomas)
 * Use a shorter meson test name 'io-qcow2-012' instead of
   'qemu-iotests-qcow2-012'

Daniel P. Berrangé (8):
  iotests: explicitly pass source/build dir to 'check' command
  iotests: allow test discovery before building
  iotests: strip subdir path when listing tests
  iotests: print TAP protocol version when reporting tests
  iotests: connect stdin to /dev/null when running tests
  iotests: always use a unique sub-directory per test
  iotests: register each I/O test separately with meson
  iotests: remove the check-block.sh script

 tests/check-block.sh | 43 
 tests/qemu-iotests/check | 30 +++---
 tests/qemu-iotests/meson.build   | 35 +-
 tests/qemu-iotests/testenv.py| 20 +++
 tests/qemu-iotests/testrunner.py | 43 ++--
 5 files changed, 78 insertions(+), 93 deletions(-)
 delete mode 100755 tests/check-block.sh

-- 
2.39.2




[PATCH v2 3/8] iotests: strip subdir path when listing tests

2023-03-03 Thread Daniel P . Berrangé
When asking 'check' to list individual tests by invoking it in dry run
mode, it prints the paths to the tests relative to the base of the
I/O test directory.

When asking 'check' to run an individual test, however, it mandates that
only the unqualified test name is given, without any path prefix. This
inconsistency makes it harder to ask for a list of tests and then invoke
each one.

Thus the test listing code is change to flatten the test names, by
printing only the base name, which can be directly invoked.

Reviewed-by: Alex Bennée 
Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/check | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index bb294ef556..f2e9d27dcf 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -184,7 +184,7 @@ if __name__ == '__main__':
 sys.exit(str(e))
 
 if args.dry_run:
-print('\n'.join(tests))
+print('\n'.join([os.path.basename(t) for t in tests]))
 else:
 with TestRunner(env, tap=args.tap,
 color=args.color) as tr:
-- 
2.39.2




[PATCH v2 6/8] iotests: always use a unique sub-directory per test

2023-03-03 Thread Daniel P . Berrangé
The current test runner is only safe against parallel execution within
a single instance of the 'check' process, and only if -j is given a
value greater than 2. This prevents running multiple copies of the
'check' process for different test scenarios.

This change switches the output / socket directories to always include
the test name, image format and image protocol. This should allow full
parallelism of all distinct test scenarios. eg running both qcow2 and
raw tests at the same time, or both file and nbd tests at the same
time.

It would be possible to allow for parallelism of the same test scenario
by including the pid, but that would potentially let many directories
accumulate over time on failures, so is not done.

Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/testrunner.py | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index 81519ed6e2..7b322272e9 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -228,13 +228,11 @@ def find_reference(self, test: str) -> str:
 
 return f'{test}.out'
 
-def do_run_test(self, test: str, mp: bool) -> TestResult:
+def do_run_test(self, test: str) -> TestResult:
 """
 Run one test
 
 :param test: test file path
-:param mp: if true, we are in a multiprocessing environment, use
-   personal subdirectories for test run
 
 Note: this method may be called from subprocess, so it does not
 change ``self`` object in any way!
@@ -257,12 +255,14 @@ def do_run_test(self, test: str, mp: bool) -> TestResult:
 
 args = [str(f_test.resolve())]
 env = self.env.prepare_subprocess(args)
-if mp:
-# Split test directories, so that tests running in parallel don't
-# break each other.
-for d in ['TEST_DIR', 'SOCK_DIR']:
-env[d] = os.path.join(env[d], f_test.name)
-Path(env[d]).mkdir(parents=True, exist_ok=True)
+
+# Split test directories, so that tests running in parallel don't
+# break each other.
+for d in ['TEST_DIR', 'SOCK_DIR']:
+env[d] = os.path.join(
+env[d],
+f"{self.env.imgfmt}-{self.env.imgproto}-{f_test.name}")
+Path(env[d]).mkdir(parents=True, exist_ok=True)
 
 test_dir = env['TEST_DIR']
 f_bad = Path(test_dir, f_test.name + '.out.bad')
@@ -347,7 +347,7 @@ def run_test(self, test: str,
 testname = os.path.basename(test)
 print(f'# running {self.env.imgfmt} {testname}')
 
-res = self.do_run_test(test, mp)
+res = self.do_run_test(test)
 
 end = datetime.datetime.now().strftime('%H:%M:%S')
 self.test_print_one_line(test=test,
-- 
2.39.2




[PATCH v2 2/8] iotests: allow test discovery before building

2023-03-03 Thread Daniel P . Berrangé
The 'check' script can be invoked in "dry run" mode, in which case it
merely does test discovery and prints out all their names. Despite only
doing test discovery it still validates that the various QEMU binaries
can be found. This makes it impossible todo test discovery prior to
building QEMU. This is a desirable feature to support, because it will
let meson discover tests.

Fortunately the code in the TestEnv constructor is ordered in a way
that makes this fairly trivial to achieve. We can just short circuit
the constructor after the basic directory paths have been set.

Reviewed-by: Thomas Huth 
Reviewed-by: Alex Bennée 
Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/check  | 3 ++-
 tests/qemu-iotests/testenv.py | 7 ++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index da7e8a87fe..bb294ef556 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -145,7 +145,8 @@ if __name__ == '__main__':
   aiomode=args.aiomode, cachemode=args.cachemode,
   imgopts=args.imgopts, misalign=args.misalign,
   debug=args.debug, valgrind=args.valgrind,
-  gdb=args.gdb, qprint=args.print)
+  gdb=args.gdb, qprint=args.print,
+  dry_run=args.dry_run)
 
 if len(sys.argv) > 1 and sys.argv[-len(args.tests)-1] == '--':
 if not args.tests:
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index aa9d735414..9a37ad9152 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -178,7 +178,8 @@ def __init__(self, source_dir: str, build_dir: str,
  debug: bool = False,
  valgrind: bool = False,
  gdb: bool = False,
- qprint: bool = False) -> None:
+ qprint: bool = False,
+ dry_run: bool = False) -> None:
 self.imgfmt = imgfmt
 self.imgproto = imgproto
 self.aiomode = aiomode
@@ -218,6 +219,10 @@ def __init__(self, source_dir: str, build_dir: str,
 self.build_root = os.path.join(self.build_iotests, '..', '..')
 
 self.init_directories()
+
+if dry_run:
+return
+
 self.init_binaries()
 
 self.malloc_perturb_ = os.getenv('MALLOC_PERTURB_',
-- 
2.39.2




[PATCH v2 1/8] iotests: explicitly pass source/build dir to 'check' command

2023-03-03 Thread Daniel P . Berrangé
The 'check' script has some rather dubious logic whereby it assumes
that if invoked as a symlink, then it is running from a separate
source tree and build tree, otherwise it assumes the current working
directory is a combined source and build tree.

This doesn't work if you want to invoke the 'check' script using
its full source tree path while still using a split source and build
tree layout. This would be a typical situation with meson if you ask
it to find the 'check' script path using files('check').

Rather than trying to make the logic more magical, add support for
explicitly passing the dirs using --source-dir and --build-dir. If
either is omitted the current logic is maintained.

Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/check  | 25 +++--
 tests/qemu-iotests/testenv.py | 13 -
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 9bdda1394e..da7e8a87fe 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -26,9 +26,23 @@ from findtests import TestFinder
 from testenv import TestEnv
 from testrunner import TestRunner
 
+def get_default_path(follow_link=False):
+"""
+Try to automagically figure out the path we are running from.
+"""
+# called from the build tree?
+if os.path.islink(sys.argv[0]):
+if follow_link:
+return os.path.dirname(os.readlink(sys.argv[0]))
+else:
+return os.path.dirname(os.path.abspath(sys.argv[0]))
+else:  # or source tree?
+return os.getcwd()
 
 def make_argparser() -> argparse.ArgumentParser:
-p = argparse.ArgumentParser(description="Test run options")
+p = argparse.ArgumentParser(
+description="Test run options",
+formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 
 p.add_argument('-n', '--dry-run', action='store_true',
help='show me, do not run tests')
@@ -113,6 +127,11 @@ def make_argparser() -> argparse.ArgumentParser:
'middle of the process.')
 g_sel.add_argument('tests', metavar='TEST_FILES', nargs='*',
help='tests to run, or "--" followed by a command')
+g_sel.add_argument('--build-dir', default=get_default_path(),
+   help='Path to iotests build directory')
+g_sel.add_argument('--source-dir',
+   default=get_default_path(follow_link=True),
+   help='Path to iotests build directory')
 
 return p
 
@@ -120,7 +139,9 @@ def make_argparser() -> argparse.ArgumentParser:
 if __name__ == '__main__':
 args = make_argparser().parse_args()
 
-env = TestEnv(imgfmt=args.imgfmt, imgproto=args.imgproto,
+env = TestEnv(source_dir=args.source_dir,
+  build_dir=args.build_dir,
+  imgfmt=args.imgfmt, imgproto=args.imgproto,
   aiomode=args.aiomode, cachemode=args.cachemode,
   imgopts=args.imgopts, misalign=args.misalign,
   debug=args.debug, valgrind=args.valgrind,
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index a864c74b12..aa9d735414 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -170,7 +170,8 @@ def root(*names: str) -> str:
 if not isxfile(b):
 sys.exit('Not executable: ' + b)
 
-def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
+def __init__(self, source_dir: str, build_dir: str,
+ imgfmt: str, imgproto: str, aiomode: str,
  cachemode: Optional[str] = None,
  imgopts: Optional[str] = None,
  misalign: bool = False,
@@ -211,14 +212,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: 
str,
 # which are needed to initialize some environment variables. They are
 # used by init_*() functions as well.
 
-if os.path.islink(sys.argv[0]):
-# called from the build tree
-self.source_iotests = os.path.dirname(os.readlink(sys.argv[0]))
-self.build_iotests = os.path.dirname(os.path.abspath(sys.argv[0]))
-else:
-# called from the source tree
-self.source_iotests = os.getcwd()
-self.build_iotests = self.source_iotests
+self.source_iotests = source_dir
+self.build_iotests = build_dir
 
 self.build_root = os.path.join(self.build_iotests, '..', '..')
 
-- 
2.39.2




[PATCH v2 8/8] iotests: remove the check-block.sh script

2023-03-03 Thread Daniel P . Berrangé
Now that meson directly invokes the individual I/O tests, the
check-block.sh wrapper script is no longer required.

Signed-off-by: Daniel P. Berrangé 
---
 tests/check-block.sh | 43 ---
 1 file changed, 43 deletions(-)
 delete mode 100755 tests/check-block.sh

diff --git a/tests/check-block.sh b/tests/check-block.sh
deleted file mode 100755
index 5de2c1ba0b..00
--- a/tests/check-block.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/bin/sh
-
-if [ "$#" -eq 0 ]; then
-echo "Usage: $0 fmt..." >&2
-exit 99
-fi
-
-# Honor the SPEED environment variable, just like we do it for "meson test"
-format_list="$@"
-if [ "$SPEED" = "slow" ] || [ "$SPEED" = "thorough" ]; then
-group=
-else
-group="-g auto"
-fi
-
-skip() {
-echo "1..0 #SKIP $*"
-exit 0
-}
-
-if [ -z "$(find . -name 'qemu-system-*' -print)" ]; then
-skip "No qemu-system binary available ==> Not running the qemu-iotests."
-fi
-
-cd tests/qemu-iotests
-
-# QEMU_CHECK_BLOCK_AUTO is used to disable some unstable sub-tests
-export QEMU_CHECK_BLOCK_AUTO=1
-export PYTHONUTF8=1
-# If make was called with -jN we want to call ./check with -j N. Extract the
-# flag from MAKEFLAGS, so that if it absent (or MAKEFLAGS is not defined), JOBS
-# would be an empty line otherwise JOBS is prepared string of flag with value:
-# "-j N"
-# Note, that the following works even if make was called with "-j N" or even
-# "--jobs N", as all these variants becomes simply "-jN" in MAKEFLAGS variable.
-JOBS=$(echo "$MAKEFLAGS" | sed -n 's/\(^\|.* \)-j\([0-9]\+\)\( .*\|$\)/-j 
\2/p')
-
-ret=0
-for fmt in $format_list ; do
-${PYTHON} ./check $JOBS -tap -$fmt $group || ret=1
-done
-
-exit $ret
-- 
2.39.2




[PATCH v2 4/8] iotests: print TAP protocol version when reporting tests

2023-03-03 Thread Daniel P . Berrangé
Recently meson started complaining that TAP test reports don't include
the TAP protocol version. While this warning is bogus and has since been
removed from Meson, it looks like good practice to include this header
going forward. The GLib library test harness has started unconditionally
printing the version, so this brings the I/O tests into line.

Reviewed-by: Thomas Huth 
Reviewed-by: Alex Bennée 
Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/testrunner.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index 5a771da86e..e734800b3d 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -391,6 +391,7 @@ def run_tests(self, tests: List[str], jobs: int = 1) -> 
bool:
 casenotrun = []
 
 if self.tap:
+print('TAP version 13')
 self.env.print_env('# ')
 print('1..%d' % len(tests))
 else:
-- 
2.39.2




[PATCH v2 7/8] iotests: register each I/O test separately with meson

2023-03-03 Thread Daniel P . Berrangé
Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

 * You cannot ask 'meson test' to invoke a single I/O test
 * The meson test timeout can't be applied to the individual
   tests
 * Meson only gets a pass/fail for the overall I/O test group
   not individual tests
 * If a CI job gets killed by the GitLab timeout, we don't
   get visibility into how far through the I/O tests
   execution got.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.

Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/meson.build | 35 --
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/meson.build b/tests/qemu-iotests/meson.build
index 323a4acb6a..a162f683ef 100644
--- a/tests/qemu-iotests/meson.build
+++ b/tests/qemu-iotests/meson.build
@@ -32,16 +32,39 @@ foreach k, v : emulators
   endif
 endforeach
 
+qemu_iotests_check_cmd = files('check')
+
 foreach format, speed: qemu_iotests_formats
   if speed == 'quick'
 suites = 'block'
   else
 suites = ['block-' + speed, speed]
   endif
-  test('qemu-iotests ' + format, sh, args: [files('../check-block.sh'), 
format],
-   depends: qemu_iotests_binaries, env: qemu_iotests_env,
-   protocol: 'tap',
-   suite: suites,
-   timeout: 0,
-   is_parallel: false)
+
+  args = ['-tap', '-' + format]
+  if speed == 'quick'
+  args += ['-g', 'auto']
+  endif
+
+  rc = run_command(
+  [qemu_iotests_check_cmd] + args + ['-n'],
+  check: true,
+  )
+
+  foreach item: rc.stdout().strip().split()
+  args = ['-tap', '-' + format, item,
+  '--source-dir', meson.current_source_dir(),
+  '--build-dir', meson.current_build_dir()]
+  # Some individual tests take as long as 45 seconds
+  # Bump the timeout to 3 minutes for some headroom
+  # on slow machines to minimize spurious failures
+  test('io-' + format + '-' + item,
+   qemu_iotests_check_cmd,
+   args: args,
+   depends: qemu_iotests_binaries,
+   env: qemu_iotests_env,
+   protocol: 'tap',
+   timeout: 180,
+   suite: suites)
+  endforeach
 endforeach
-- 
2.39.2




[PATCH v2 5/8] iotests: connect stdin to /dev/null when running tests

2023-03-03 Thread Daniel P . Berrangé
Currently the tests have their stdin inherited from the test harness,
meaning they are connected to a TTY. The QEMU processes spawned by
certain tests, however, modify TTY settings and if the test exits
abnormally the settings might not be restored.

The python test harness thus has some logic which will capture the
initial TTY settings and restore them once all tests are finished.

This does not, however, take into account the possibility of many
copies of the 'check' program running in parallel. With parallel
execution, a later invokation may save the TTY state that QEMU has
already modified, and thus restore bad state leaving the TTY
non-functional.

None of the I/O tests shnould actually be interactive requiring
user input and so they should not require a TTY at all. To avoid
this while TTY save/restore complexity we can connect the test
stdin to /dev/null instead.

Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/testrunner.py | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index e734800b3d..81519ed6e2 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -24,12 +24,10 @@
 import subprocess
 import contextlib
 import json
-import termios
 import shutil
 import sys
 from multiprocessing import Pool
-from contextlib import contextmanager
-from typing import List, Optional, Iterator, Any, Sequence, Dict, \
+from typing import List, Optional, Any, Sequence, Dict, \
 ContextManager
 
 from testenv import TestEnv
@@ -56,22 +54,6 @@ def file_diff(file1: str, file2: str) -> List[str]:
 return res
 
 
-# We want to save current tty settings during test run,
-# since an aborting qemu call may leave things screwed up.
-@contextmanager
-def savetty() -> Iterator[None]:
-isterm = sys.stdin.isatty()
-if isterm:
-fd = sys.stdin.fileno()
-attr = termios.tcgetattr(fd)
-
-try:
-yield
-finally:
-if isterm:
-termios.tcsetattr(fd, termios.TCSADRAIN, attr)
-
-
 class LastElapsedTime(ContextManager['LastElapsedTime']):
 """ Cache for elapsed time for tests, to show it during new test run
 
@@ -169,7 +151,6 @@ def __enter__(self) -> 'TestRunner':
 self._stack = contextlib.ExitStack()
 self._stack.enter_context(self.env)
 self._stack.enter_context(self.last_elapsed)
-self._stack.enter_context(savetty())
 return self
 
 def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any) -> None:
@@ -294,6 +275,7 @@ def do_run_test(self, test: str, mp: bool) -> TestResult:
 t0 = time.time()
 with f_bad.open('w', encoding="utf-8") as f:
 with subprocess.Popen(args, cwd=str(f_test.parent), env=env,
+  stdin=subprocess.DEVNULL,
   stdout=f, stderr=subprocess.STDOUT) as proc:
 try:
 proc.wait()
-- 
2.39.2




Re: [PATCH v2 5/8] iotests: connect stdin to /dev/null when running tests

2023-03-03 Thread Thomas Huth

On 03/03/2023 17.07, Daniel P. Berrangé wrote:

Currently the tests have their stdin inherited from the test harness,
meaning they are connected to a TTY. The QEMU processes spawned by
certain tests, however, modify TTY settings and if the test exits
abnormally the settings might not be restored.

The python test harness thus has some logic which will capture the
initial TTY settings and restore them once all tests are finished.

This does not, however, take into account the possibility of many
copies of the 'check' program running in parallel. With parallel
execution, a later invokation may save the TTY state that QEMU has


s/invokation/invocation/


already modified, and thus restore bad state leaving the TTY
non-functional.

None of the I/O tests shnould actually be interactive requiring


s/shnould/should/


user input and so they should not require a TTY at all. To avoid
this while TTY save/restore complexity we can connect the test
stdin to /dev/null instead.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qemu-iotests/testrunner.py | 22 ++
  1 file changed, 2 insertions(+), 20 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v2 6/8] iotests: always use a unique sub-directory per test

2023-03-03 Thread Thomas Huth

On 03/03/2023 17.07, Daniel P. Berrangé wrote:

The current test runner is only safe against parallel execution within
a single instance of the 'check' process, and only if -j is given a
value greater than 2. This prevents running multiple copies of the
'check' process for different test scenarios.

This change switches the output / socket directories to always include
the test name, image format and image protocol. This should allow full
parallelism of all distinct test scenarios. eg running both qcow2 and
raw tests at the same time, or both file and nbd tests at the same
time.

It would be possible to allow for parallelism of the same test scenario
by including the pid, but that would potentially let many directories
accumulate over time on failures, so is not done.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qemu-iotests/testrunner.py | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v2 7/8] iotests: register each I/O test separately with meson

2023-03-03 Thread Thomas Huth

On 03/03/2023 17.07, Daniel P. Berrangé wrote:

Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

  * You cannot ask 'meson test' to invoke a single I/O test
  * The meson test timeout can't be applied to the individual
tests
  * Meson only gets a pass/fail for the overall I/O test group
not individual tests
  * If a CI job gets killed by the GitLab timeout, we don't
get visibility into how far through the I/O tests
execution got.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qemu-iotests/meson.build | 35 --
  1 file changed, 29 insertions(+), 6 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v2 8/8] iotests: remove the check-block.sh script

2023-03-03 Thread Thomas Huth

On 03/03/2023 17.07, Daniel P. Berrangé wrote:

Now that meson directly invokes the individual I/O tests, the
check-block.sh wrapper script is no longer required.

Signed-off-by: Daniel P. Berrangé 
---
  tests/check-block.sh | 43 ---
  1 file changed, 43 deletions(-)
  delete mode 100755 tests/check-block.sh


Reviewed-by: Thomas Huth 




Re: [PATCH v2 1/8] iotests: explicitly pass source/build dir to 'check' command

2023-03-03 Thread Thomas Huth

On 03/03/2023 17.07, Daniel P. Berrangé wrote:

The 'check' script has some rather dubious logic whereby it assumes
that if invoked as a symlink, then it is running from a separate
source tree and build tree, otherwise it assumes the current working
directory is a combined source and build tree.

This doesn't work if you want to invoke the 'check' script using
its full source tree path while still using a split source and build
tree layout. This would be a typical situation with meson if you ask
it to find the 'check' script path using files('check').

Rather than trying to make the logic more magical, add support for
explicitly passing the dirs using --source-dir and --build-dir. If
either is omitted the current logic is maintained.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qemu-iotests/check  | 25 +++--
  tests/qemu-iotests/testenv.py | 13 -
  2 files changed, 27 insertions(+), 11 deletions(-)


Reviewed-by: Thomas Huth 





Re: [PATCH v2 0/8] iotests: make meson aware of individual I/O tests

2023-03-03 Thread Thomas Huth

On 03/03/2023 17.07, Daniel P. Berrangé wrote:

To just repeat the patch 5 description...

Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

  * You cannot ask 'meson test' to invoke a single I/O test
  * The meson test timeout can't be applied to the individual
tests
  * Meson only gets a pass/fail for the overall I/O test group
not individual tests
  * Meson can't show the time of individual I/O tests, so we
can't see why 4-5 are consuming the bulk of the time
and ripe for optimization
  * If a CI job gets killed by the GitLab timeout, we don't
get visibility into how far through the I/O tests
execution got.

This is not really specific to the I/O tests, the problem is common
to any case of us running a test which is in fact another test
harness which runs many tests. It would be nice to have meson have
the full view of all tests run. Adapting the I/O tests is as easy
win in this respect.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.


Series
Tested-by: Thomas Huth 




test-blockjob: intermittent CI failures in msys2-64bit job

2023-03-03 Thread Peter Maydell
I've noticed that test-blockjob seems to fail intermittently
on the msys2-64bit job:

https://gitlab.com/qemu-project/qemu/-/jobs/3872508803
https://gitlab.com/qemu-project/qemu/-/jobs/3871061024
https://gitlab.com/qemu-project/qemu/-/jobs/3865312440

Sample output:
| 53/89 ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby:
assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR
53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3

I haven't looked at whether these jobs all failed on the
same subtest or whether there is better info lurking in
some log file in the pipeline's artefacts.

thanks
-- PMM



[PATCH] docs: Prefer 'cookie' over 'handle'

2023-03-03 Thread Eric Blake
In libnbd, we quickly learned that distinguishing between 'handle'
(verb for acting on an object) and 'handle' (noun describing which
object to act on) could get confusing; we solved it by renaming the
latter to 'cookie'.  Copy that approach into the NBD spec, and make it
obvious that a cookie is opaque data from the point of view of the
server.  Makes no difference to implementations (other than older code
still using 'handle' may be slightly harder to tie back to the spec).

Suggested-by: Richard W.M. Jones 
Signed-off-by: Eric Blake 
---
 doc/proto.md | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 3a96703..abb23e8 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -301,11 +301,11 @@ may be handled by the server asynchronously), and 
structured reply
 chunks from one request may be interleaved with reply messages from
 other requests; however, there may be constraints that prevent
 arbitrary reordering of structured reply chunks within a given reply.
-Clients SHOULD use a handle that is distinct from all other currently
-pending transactions, but MAY reuse handles that are no longer in
-flight; handles need not be consecutive.  In each reply message
+Clients SHOULD use a cookie that is distinct from all other currently
+pending transactions, but MAY reuse cookies that are no longer in
+flight; cookies need not be consecutive.  In each reply message
 (whether simple or structured), the server MUST use the same value for
-handle as was sent by the client in the corresponding request.  In
+cookie as was sent by the client in the corresponding request.  In
 this way, the client can correlate which request is receiving a
 response.

@@ -349,7 +349,7 @@ The request message, sent by the client, looks as follows:
 C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)  
 C: 16 bits, command flags  
 C: 16 bits, type  
-C: 64 bits, handle  
+C: 64 bits, cookie  
 C: 64 bits, offset (unsigned)  
 C: 32 bits, length (unsigned)  
 C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  
@@ -366,7 +366,7 @@ follows:
 S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
`NBD_REPLY_MAGIC`)  
 S: 32 bits, error (MAY be zero)  
-S: 64 bits, handle  
+S: 64 bits, cookie  
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
 *error* is zero)  

@@ -381,7 +381,7 @@ server must initiate a hard disconnect).  Second, there is 
no way to
 efficiently skip over portions of a sparse file that are known to
 contain all zeroes.  Finally, it is not possible to reliably decode
 the server traffic without also having context of what pending read
-requests were sent by the client, to see which *handle* values will
+requests were sent by the client, to see which *cookie* values will
 have accompanying payload on success.  Therefore structured replies
 are also permitted if negotiated.

@@ -398,7 +398,7 @@ sending errors via a structured reply, as the error can 
then be
 accompanied by a string payload to present to a human user.

 A structured reply MAY occupy multiple structured chunk messages
-(all with the same value for "handle"), and the
+(all with the same value for "cookie"), and the
 `NBD_REPLY_FLAG_DONE` reply flag is used to identify the final
 chunk.  Unless further documented by individual requests below,
 the chunks MAY be sent in any order, except that the chunk with
@@ -418,7 +418,7 @@ A structured reply chunk message looks as follows:
 S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
 S: 16 bits, flags  
 S: 16 bits, type  
-S: 64 bits, handle  
+S: 64 bits, cookie  
 S: 32 bits, length of payload (unsigned)  
 S: *length* bytes of payload data (if *length* is nonzero)  

-- 
2.39.2




Re: [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

2023-03-03 Thread Eric Blake
On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11/15/22 01:46, Eric Blake wrote:
> > The spec was silent on how many extents a server could reply with.
> > However, both qemu and nbdkit (the two server implementations known to
> > have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
> > cap, and will truncate the amount of extents in a reply to avoid
> > sending a client a reply so large that the client would treat it as a
> > denial of service attack.  Clients currently have no way during
> > negotiation to request such a limit of the server, so it is easier to
> > just document this as a restriction on viable server implementations
> > than to add yet another round of handshaking.  Also, mentioning
> > amplification effects is worthwhile.
> > 
> > When qemu first implemented NBD_CMD_BLOCK_STATUS for the
> > base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
> > as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
> > responded with more than one extent.  Later, when adding its
> > qemu:dirty-bitmap:XYZ context extension (qemu commit 3d068aff16, Jun
> > 2018), it added a cap to 128k extents (1M+4 bytes), and that cap was
> > applied to base:allocation once qemu started sending multiple extents
> > for that context as well (qemu commit fb7afc797e, Jul 2018).  Qemu
> > extents are never smaller than 512 bytes (other than an exception at
> > the end of a file whose size is not aligned to 512), but even so, a
> > request for just under 4G of block status could produce 8M extents,
> > resulting in a reply of 64M if it were not capped smaller.
> > 
> > When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5,
> > Mar 2019), it did not impose any restriction on the number of extents
> > in the reply chunk.  But because it allows extents as small as one
> > byte, it is easy to write a server that can amplify a client's request
> > of status over 1M of the image into a reply over 8M in size, and it
> > was very easy to demonstrate that a hard cap was needed to avoid
> > crashing clients or otherwise killing the connection (a bad server
> > impacting the client negatively).  So nbdkit enforced a bound of 1M
> > extents (8M+4 bytes, nbdkit commit 6e0dc839ea, Jun 2019).  [Unrelated
> > to this patch, but worth noting for history: nbdkit's situation also
> > has to deal with the fact that it is designed for plugin server
> > implementations; and not capping the number of extents in a reply also
> > posed a problem to nbdkit as the server, where a plugin could exhaust
> > memory and kill the server, unrelated to any size constraints enforced
> > by a client.]
> > 
> > Since the limit chosen by these two implementations is different, and
> > since nbdkit has versions that were not limited, add this as a SHOULD
> > NOT instead of MUST NOT constraint on servers implementing block
> > status.  It does not matter that qemu picked a smaller limit that it
> > truncates to, since we have already documented that the server may
> > truncate for other reasons (such as it being inefficient to collect
> > that many extents in the first place).  But documenting the limit now
> > becomes even more important in the face of a future addition of 64-bit
> > requests, where a client's request is no longer bounded to 4G and
> > could thereby produce even more than 8M extents for the corner case
> > when every 512 bytes is a new extent, if it were not for this
> > recommendation.
> 
> s-o-b line missed.

I'm not sure if the NBD project has a strict policy on including one,
but I don't mind adding it.

> 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 2/6] spec: Tweak description of maximum block size

2023-03-03 Thread Eric Blake
On Fri, Dec 16, 2022 at 11:22:49PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11/15/22 01:46, Eric Blake wrote:
> > Commit 9f30fedb improved the spec to allow non-payload requests that
> > exceed any advertised maximum block size.  Take this one step further
> > by permitting the server to use NBD_EOVERFLOW as a hint to the client
> > when a request is oversize (while permitting NBD_EINVAL for
> > back-compat), and by rewording the text to explicitly call out that
> > what is being advertised is the maximum payload length, not maximum
> > block size.  This becomes more important when we add 64-bit
> > extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
> > to have both an effect length (how much of the image does the client
> > want status on - may be larger than 32 bits) and an optional payload
> > length (a way to filter the response to a subset of negotiated
> > metadata contexts).  In the shorter term, it means that a server may
> > (but not must) accept a read request larger than the maximum block
> > size if it can use structured replies to keep each chunk of the
> > response under the maximum payload limits.
> > ---
> >   doc/proto.md | 127 +--
> >   1 file changed, 73 insertions(+), 54 deletions(-)
> > 
> > diff --git a/doc/proto.md b/doc/proto.md
> > index 8f08583..53c334a 100644
> > --- a/doc/proto.md
> > +++ b/doc/proto.md
> > @@ -745,8 +745,8 @@ text unless the client insists on TLS.
> > 
> >   During transmission phase, several operations are constrained by the
> >   export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> > -as well as by three block size constraints defined here (minimum,
> > -preferred, and maximum).
> > +as well as by three block size constraints defined here (minimum
> > +block, preferred block, and maximum payload).
> > 
> >   If a client can honour server block size constraints (as set out below
> >   and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
> > @@ -772,15 +772,15 @@ learn the server's constraints without committing to 
> > them.
> > 
> >   If block size constraints have not been advertised or agreed on
> >   externally, then a server SHOULD support a default minimum block size
> > -of 1, a preferred block size of 2^12 (4,096), and a maximum block size
> > -that is effectively unlimited (0x, or the export size if that
> > -is smaller), while a client desiring maximum interoperability SHOULD
> > -constrain its requests to a minimum block size of 2^9 (512), and limit
> > -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
> > -2^25 (33,554,432).  A server that wants to enforce block sizes other
> > -than the defaults specified here MAY refuse to go into transmission
> > -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
> > -disconnect) or which uses `NBD_OPT_GO` without requesting
> > +of 1, a preferred block size of 2^12 (4,096), and a maximum block
> > +payload size that is at least 2^25 (33,554,432) (even if the export
> 
> I'm not sure about term "block payload size".. What's "block payload"? May be 
> "reply payload size" or something like this?
> 
> Or should we simply say about simple-reply / structured-reply-chunk total 
> size limit?

Yes, I could live with 'reply payload size limit'; a simple-reply
always has one payload, while a structured-reply can be divided across
multiple chunks where each chunk payload fits that limit.

> 
> > +size is smaller); while a client desiring maximum interoperability
> > +SHOULD constrain its requests to a minimum block size of 2^9 (512),
> > +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
> > +block size of 2^25 (33,554,432).  A server that wants to enforce block
> > +sizes other than the defaults specified here MAY refuse to go into
> > +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
> > +a hard disconnect) or which uses `NBD_OPT_GO` without requesting
> >   `NBD_INFO_BLOCK_SIZE` (via an error reply of
> >   `NBD_REP_ERR_BLOCK_SIZE_REQD`); but servers SHOULD NOT refuse clients
> >   that do not request sizing information when the server supports
> > @@ -818,17 +818,40 @@ the preferred block size for that export.  The server 
> > MAY advertise an
> >   export size that is not an integer multiple of the preferred block
> >   size.
> > 
> > -The maximum block size represents the maximum length that the server
> 
> The term "maximum block size" is not defined anymore (and removed from 
> NBD_INFO_BLOCK_SIZE)...
> 
> > -is willing to handle in one request.  If advertised, it MAY be
> > -something other than a power of 2, but MUST be either an integer
> > -multiple of the minimum block size or the value 0x for no
> > -inherent limit, MUST be at least as large as the smaller of the
> > +The maximum block payload size represents the maximum payload length
> > +that the server is willing to handle in one request.  If advertised,
> >

Re: [PATCH v2 2/6] spec: Tweak description of maximum block size

2023-03-03 Thread Eric Blake
On Tue, Feb 21, 2023 at 05:21:37PM +0200, Wouter Verhelst wrote:
> Hi Eric,
> 
> Busy days, busy times. Sorry about the insane delays here.

No problem; I've been tackling other things in the meantime too, so
this extension has taken far longer than I planned for more reasons
than just slow review time.

> 
> On Mon, Nov 14, 2022 at 04:46:51PM -0600, Eric Blake wrote:
> > Commit 9f30fedb improved the spec to allow non-payload requests that
> > exceed any advertised maximum block size.  Take this one step further
> > by permitting the server to use NBD_EOVERFLOW as a hint to the client
> > when a request is oversize (while permitting NBD_EINVAL for
> > back-compat), and by rewording the text to explicitly call out that
> > what is being advertised is the maximum payload length, not maximum
> > block size.  This becomes more important when we add 64-bit
> > extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
> > to have both an effect length (how much of the image does the client
> > want status on - may be larger than 32 bits) and an optional payload
> > length (a way to filter the response to a subset of negotiated
> > metadata contexts).  In the shorter term, it means that a server may
> > (but not must) accept a read request larger than the maximum block
> > size if it can use structured replies to keep each chunk of the
> > response under the maximum payload limits.
> > ---
> >  doc/proto.md | 127 +--
> >  1 file changed, 73 insertions(+), 54 deletions(-)
> > 
> > diff --git a/doc/proto.md b/doc/proto.md
> > index 8f08583..53c334a 100644
> > --- a/doc/proto.md
> > +++ b/doc/proto.md
> > @@ -745,8 +745,8 @@ text unless the client insists on TLS.
> > 
> >  During transmission phase, several operations are constrained by the
> >  export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> > -as well as by three block size constraints defined here (minimum,
> > -preferred, and maximum).
> > +as well as by three block size constraints defined here (minimum
> > +block, preferred block, and maximum payload).
> > 
> >  If a client can honour server block size constraints (as set out below
> >  and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
> > @@ -772,15 +772,15 @@ learn the server's constraints without committing to 
> > them.
> > 
> >  If block size constraints have not been advertised or agreed on
> >  externally, then a server SHOULD support a default minimum block size
> > -of 1, a preferred block size of 2^12 (4,096), and a maximum block size
> > -that is effectively unlimited (0x, or the export size if that
> > -is smaller), while a client desiring maximum interoperability SHOULD
> > -constrain its requests to a minimum block size of 2^9 (512), and limit
> > -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
> > -2^25 (33,554,432).  A server that wants to enforce block sizes other
> > -than the defaults specified here MAY refuse to go into transmission
> > -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
> > -disconnect) or which uses `NBD_OPT_GO` without requesting
> > +of 1, a preferred block size of 2^12 (4,096), and a maximum block
> > +payload size that is at least 2^25 (33,554,432) (even if the export
> > +size is smaller); while a client desiring maximum interoperability
> > +SHOULD constrain its requests to a minimum block size of 2^9 (512),
> > +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
> > +block size of 2^25 (33,554,432).  A server that wants to enforce block
> > +sizes other than the defaults specified here MAY refuse to go into
> > +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
> > +a hard disconnect) or which uses `NBD_OPT_GO` without requesting
> 
> This does more than what the commit message says: it also limits payload
> size from 0x to 2^25. We already have a "A server that desires
> maximum interoperability" clause that mentions the 2^25, so I'm not
> entirely sure why we need to restrict that for the default cause.
> 
> Remember, apart from specifying how something should be implemented, the
> spec also documents current and historic behavior. I am probably
> convinced that it makes more sense to steer people towards limiting to
> 2^25, but it should be done in such a way that servers which accept a
> 0x block size are not suddenly noncompliant. I don't think this
> does that.

I'll have to play more with the wording here.  A client shouldn't send
larger write requests to a server without knowing the server will
accept it (because traditional servers disconnect instead - the
alternative is to read the client's entire payload, and the larger the
payload is, the longer that takes - the client is starving the server
from serving other clients by making it chew through data).  But
sending larger read requests MAY be tolerated by a server that sends a
graceful error message ("you req

Re: [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS

2023-03-03 Thread Eric Blake
On Wed, Feb 22, 2023 at 11:49:18AM +0200, Wouter Verhelst wrote:
> On Mon, Nov 14, 2022 at 04:46:52PM -0600, Eric Blake wrote:
> [...]
> > @@ -1370,9 +1475,10 @@ of the newstyle negotiation.
> >  Return a list of `NBD_REP_META_CONTEXT` replies, one per context,
> >  followed by an `NBD_REP_ACK` or an error.
> > 
> > -This option SHOULD NOT be requested unless structured replies have
> > -been negotiated first. If a client attempts to do so, a server
> > -MAY send `NBD_REP_ERR_INVALID`.
> > +This option SHOULD NOT be requested unless structured replies or
> > +extended headers have been negotiated first. If a client attempts
> > +to do so, a server MAY send `NBD_REP_ERR_INVALID` or
> > +`NBD_REP_ERR_EXT_HEADER_REQD`.
> 
> Is it the intent that NBD_REP_ERR_EXT_HEADER_REQD means structured
> replies are not supported by this server? I think that could be
> clarified here.

My intent here was that a newer server that ONLY wants to talk to
clients that understand extended headers can use
NBD_REP_ERR_EXT_HEADER_REQD as its error message to clients that have
not requested extended headers yet.  Older clients may not know what
that error code means, but our spec already has reasonable constraints
that the client should at least recognize it as an error code.

That way, a server only has to implement extended headers, rather than
maintaining the structured header code in parallel.

> 
> (this occurs twice)
> 
> [...]
> > +* `NBD_OPT_EXTENDED_HEADERS` (11)
> > +
> > +The client wishes to use extended headers during the transmission
> > +phase.  The client MUST NOT send any additional data with the
> > +option, and the server SHOULD reject a request that includes data
> > +with `NBD_REP_ERR_INVALID`.
> > +
> > +When successful, this option takes precedence over structured
> > +replies.  A client MAY request structured replies first, although
> > +a server SHOULD support this option even if structured replies are
> > +not negotiated.
> > +
> > +It is envisioned that future extensions will add other new
> > +requests that support a data payload in the request or reply.  A
> > +server that supports such extensions SHOULD NOT advertise those
> > +extensions until the client has negotiated extended headers; and a
> > +client MUST NOT make use of those extensions without first
> > +enabling support for reply payloads.
> > +
> > +The server replies with the following, or with an error permitted
> > +elsewhere in this document:
> > +
> > +- `NBD_REP_ACK`: Extended headers have been negotiated; the client
> > +  MUST use the 32-byte extended request header, with proper use of
> > +  `NBD_CMD_FLAG_PAYLOAD_LEN` for all commands sending a payload;
> > +  and the server MUST use the 32-byte extended reply header.
> > +- For backwards compatibility, clients SHOULD be prepared to also
> > +  handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
> > +  transmission headers will be used.
> > +
> > +Note that a response of `NBD_REP_ERR_BLOCK_SIZE_REQD` does not
> > +make sense in response to this command, but a server MAY fail with
> > +that error for a later `NBD_OPT_GO` without a client request for
> > +`NBD_INFO_BLOCK_SIZE`, since the use of extended headers provides
> > +more incentive for a client to promise to obey block size
> > +constraints.
> > +
> > +If the client requests `NBD_OPT_STARTTLS` after this option, it
> > +MUST renegotiate extended headers.
> > +
> 
> Does it make sense here to also forbid use of NBD_OPT_EXPORT_NAME? I
> think the sooner we get rid of that, the better ;-)

I hadn't thought of that, but it does indeed sound desirable.  I can
touch up the spec to state that NBD_OPT_EXPORT_NAME MUST NOT be used
by a client that requested extended headers.

> 
> [...]
> > @@ -1746,13 +1914,15 @@ unrecognized flags.
> > 
> >   Structured reply types
> > 
> > -These values are used in the "type" field of a structured reply.
> > -Some chunk types can additionally be categorized by role, such as
> > -*error chunks* or *content chunks*.  Each type determines how to
> > -interpret the "length" bytes of payload.  If the client receives
> > -an unknown or unexpected type, other than an *error chunk*, it
> > -MUST initiate a hard disconnect.  A server MUST NOT send a chunk
> > -larger than any advertised maximum block payload size.
> > +These values are used in the "type" field of a structured reply.  Some
> > +chunk types can additionally be categorized by role, such as *error
> > +chunks*, *content chunks*, or *status chunks*.  Each type determines
> > +how to interpret the "length" bytes of payload.  If the client
> > +receives an unknown or unexpected type, other than an *error chunk*,
> > +it MAY initiate a hard disconnect on the grounds that the client is
> > +uncertain whether the server handled the request as desired.  A server
> > +MUST NOT send a chu

Re: [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD

2023-03-03 Thread Eric Blake
On Wed, Feb 22, 2023 at 12:05:44PM +0200, Wouter Verhelst wrote:
> On Mon, Nov 14, 2022 at 04:46:54PM -0600, Eric Blake wrote:
> >   Simple reply message
> > 
> > @@ -1232,6 +1235,19 @@ The field has the following format:
> >will be faster than a regular write). Clients MUST NOT set the
> >`NBD_CMD_FLAG_FAST_ZERO` request flag unless this transmission flag
> >is set.
> > +- bit 12, `NBD_FLAG_BLOCK_STATUS_PAYLOAD`: Indicates that the server
> > +  understands the use of the `NBD_CMD_FLAG_PAYLOAD_LEN` flag to
> > +  `NBD_CMD_BLOCK_STATUS` to allow the client to request that the
> > +  server filters its response to a specific subset of negotiated
> > +  metacontext ids passed in via a client payload, rather than the
> > +  default of replying to all metacontext ids. Servers MUST NOT
> > +  advertise this bit unless the client successfully negotiates
> > +  extended headers via `NBD_OPT_EXTENDED_HEADERS`, and SHOULD NOT
> > +  advertise this bit in response to `NBD_OPT_EXPORT_NAME` or
> > +  `NBD_OPT_GO` if the client does not negotiate metacontexts with
> > +  `NBD_OPT_SET_META_CONTEXT`; clients SHOULD NOT set the
> > +  `NBD_CMD_FLAG_PAYLOAD_LEN` flag for `NBD_CMD_BLOCK_STATUS` unless
> > +  this transmission flag is set.
> 
> Given that we are introducing this at the same time as the extended
> headers (and given that we're running out of available flags in this
> 16-bit field), isn't it better to make the ability to understand
> PAYLOAD_LEN be implied by extended headers? Or is there a use case for
> implementing extended headers but not a payload length on
> CMD_BLOCK_STATUS that I'm missing?

In my proof of implementation, this was a distinct feature addition on
top of 64-bit headers.

It is easy to write a server that does not want to deal with a client
payload on a block status request (for example, a server that only
knows about one metacontext, and therefore never needs to deal with a
client wanting a subset of negotiated metacontexts).  Forcing a server
to have to support a client-side filtering request on block status
commands, merely because the server now supports 64-bit lengths,
seemed like a stretch too far, which is why I decided to use a feature
bit for this one.


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH] docs: Prefer 'cookie' over 'handle'

2023-03-03 Thread Laszlo Ersek
On 3/3/23 23:15, Eric Blake wrote:
> In libnbd, we quickly learned that distinguishing between 'handle'
> (verb for acting on an object) and 'handle' (noun describing which
> object to act on) could get confusing; we solved it by renaming the
> latter to 'cookie'.  Copy that approach into the NBD spec, and make it
> obvious that a cookie is opaque data from the point of view of the
> server.  Makes no difference to implementations (other than older code
> still using 'handle' may be slightly harder to tie back to the spec).
> 
> Suggested-by: Richard W.M. Jones 
> Signed-off-by: Eric Blake 
> ---
>  doc/proto.md | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)

ugh, the subject prefix does not name the project this patch is for, and
seeing it through the libguestfs mailing list confused me...

Is this for
?

Anyway, from a superficial reading, this seems certainly sane. I've
known "cookie" from similar (but independent) contexts, and I agree it's
superior to "handle" (noun).

> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 3a96703..abb23e8 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -301,11 +301,11 @@ may be handled by the server asynchronously), and 
> structured reply
>  chunks from one request may be interleaved with reply messages from
>  other requests; however, there may be constraints that prevent
>  arbitrary reordering of structured reply chunks within a given reply.
> -Clients SHOULD use a handle that is distinct from all other currently
> -pending transactions, but MAY reuse handles that are no longer in
> -flight; handles need not be consecutive.  In each reply message
> +Clients SHOULD use a cookie that is distinct from all other currently
> +pending transactions, but MAY reuse cookies that are no longer in
> +flight; cookies need not be consecutive.  In each reply message
>  (whether simple or structured), the server MUST use the same value for
> -handle as was sent by the client in the corresponding request.  In
> +cookie as was sent by the client in the corresponding request.  In
>  this way, the client can correlate which request is receiving a
>  response.
> 
> @@ -349,7 +349,7 @@ The request message, sent by the client, looks as follows:
>  C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)  
>  C: 16 bits, command flags  
>  C: 16 bits, type  
> -C: 64 bits, handle  
> +C: 64 bits, cookie  
>  C: 64 bits, offset (unsigned)  
>  C: 32 bits, length (unsigned)  
>  C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  
> @@ -366,7 +366,7 @@ follows:
>  S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
> `NBD_REPLY_MAGIC`)  
>  S: 32 bits, error (MAY be zero)  
> -S: 64 bits, handle  
> +S: 64 bits, cookie  
>  S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
>  *error* is zero)  
> 
> @@ -381,7 +381,7 @@ server must initiate a hard disconnect).  Second, there 
> is no way to
>  efficiently skip over portions of a sparse file that are known to
>  contain all zeroes.  Finally, it is not possible to reliably decode
>  the server traffic without also having context of what pending read
> -requests were sent by the client, to see which *handle* values will
> +requests were sent by the client, to see which *cookie* values will
>  have accompanying payload on success.  Therefore structured replies
>  are also permitted if negotiated.
> 
> @@ -398,7 +398,7 @@ sending errors via a structured reply, as the error can 
> then be
>  accompanied by a string payload to present to a human user.
> 
>  A structured reply MAY occupy multiple structured chunk messages
> -(all with the same value for "handle"), and the
> +(all with the same value for "cookie"), and the
>  `NBD_REPLY_FLAG_DONE` reply flag is used to identify the final
>  chunk.  Unless further documented by individual requests below,
>  the chunks MAY be sent in any order, except that the chunk with
> @@ -418,7 +418,7 @@ A structured reply chunk message looks as follows:
>  S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
>  S: 16 bits, flags  
>  S: 16 bits, type  
> -S: 64 bits, handle  
> +S: 64 bits, cookie  
>  S: 32 bits, length of payload (unsigned)  
>  S: *length* bytes of payload data (if *length* is nonzero)  
> 

Acked-by: Laszlo Ersek 

Laszlo