Re: [PATCH 0/7] nsis: gitlab-ci: Improve QEMU Windows installer packaging

2022-10-30 Thread Thomas Huth

On 29/10/2022 15.45, Bin Meng wrote:

Hi Thomas,

On Wed, Sep 21, 2022 at 8:24 PM Thomas Huth  wrote:


On 21/09/2022 14.18, Bin Meng wrote:

Hi,

On Thu, Sep 8, 2022 at 9:28 PM Bin Meng  wrote:


At present packaging the required DLLs of QEMU executables is a
manual process, and error prone.

Improve scripts/nsis.py by adding logic to automatically package
required DLLs of QEMU executables.

'make installer' is tested in the cross-build on Linux in CI, but
not in the Windows native build. Update CI to test the installer
generation on Windows too.

During testing a 32-bit build issue was exposed in block/nfs.c and
the fix is included in this series.


Bin Meng (7):
scripts/nsis.py: Drop the unnecessary path separator
scripts/nsis.py: Fix destination directory name when invoked on
  Windows
scripts/nsis.py: Automatically package required DLLs of QEMU
  executables
.gitlab-ci.d/windows.yml: Drop the sed processing in the 64-bit build
block/nfs: Fix 32-bit Windows build
.gitlab-ci.d/windows.yml: Unify the prerequisite packages
.gitlab-ci.d/windows.yml: Test 'make installer' in the CI

   meson.build  |  1 +
   block/nfs.c  |  8 ++
   .gitlab-ci.d/windows.yml | 40 ---
   scripts/nsis.py  | 60 +---
   4 files changed, 89 insertions(+), 20 deletions(-)



I see Thomas only queued patch #4 (".gitlab-ci.d/windows.yml: Drop the
sed processing in the 64-bit build")

What about other patches?


I hope that Stefan Weil (our W32 maintainer) could have a look at these first...



Stefan has reviewed / tested patches 1-3. Not sure who is going to queue
these 3 patches?


If Stefan has time for a pull request, I think he would be the best fit. Stefan?

Otherwise, maybe Marc-André could take those patches, since he apparently 
wrote that nsis.py script?


(By the way, should we have an entry for that script in MAINTAINERS? ... 
likely in the W32/W64 section?)


 Thomas




Re: [PULL 00/58] Block layer patches

2022-10-30 Thread Stefan Hajnoczi
On Thu, 27 Oct 2022 at 14:33, Kevin Wolf  wrote:
>
> The following changes since commit 344744e148e6e865f5a57e745b02a87e5ea534ad:
>
>   Merge tag 'dump-pull-request' of https://gitlab.com/marcandre.lureau/qemu 
> into staging (2022-10-26 10:53:49 -0400)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream

Please update the URL to https. Unencrypted git:// has fallen out of
use and is slightly less secure (e.g. if someone quickly applies your
pull request for testing and forgets to verify the tag's GPG
signature).

Stefan



Re: [PATCH 3/3] qemu-img: Speed up checksum

2022-10-30 Thread Nir Soffer
On Sun, Oct 30, 2022 at 7:38 PM Nir Soffer  wrote:

> On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz  wrote:
>
>> On 01.09.22 16:32, Nir Soffer wrote:
>>
> [...]

> > +/* The current chunk. */
>> > +int64_t offset;
>> > +int64_t length;
>> > +bool zero;
>> > +
>> > +/* Always true for zero extent, false for data extent. Set to true
>> > + * when reading the chunk completes. */
>>
>> Qemu codestyle requires /* and */ to be on separate lines for multi-line
>> comments (see checkpatch.pl).
>>
>
> I'll change that. Do we have a good way to run checkpatch.pl when using
> git-publish?
>
> Maybe a way to run checkpatch.pl on all patches generated by git publish
> automatically?
>

I found
https://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html
and it seems to work well.
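
For reference, a minimal sketch of that kind of hook, written in Python
rather than shell. It assumes the approach is a git pre-commit hook that
pipes the staged diff into scripts/checkpatch.pl; the function names and
the script location are illustrative and not taken from the blog post.

```python
import subprocess

# Assumed location of the script, relative to the top of the QEMU tree
# (git runs pre-commit hooks from the top of the working tree).
CHECKPATCH = "scripts/checkpatch.pl"


def checkpatch_cmd(script=CHECKPATCH):
    """Build the checkpatch.pl command line used by the hook."""
    # --no-signoff: a staged diff carries no Signed-off-by line yet.
    # -q: report problems only; "-" makes checkpatch.pl read from stdin.
    return [script, "--no-signoff", "-q", "-"]


def check_staged_changes():
    """Return checkpatch.pl's exit status for the currently staged diff."""
    diff = subprocess.run(["git", "diff", "--cached"],
                          check=True, capture_output=True).stdout
    if not diff:
        return 0  # nothing staged, nothing to check
    return subprocess.run(checkpatch_cmd(), input=diff).returncode
```

Saved as an executable .git/hooks/pre-commit that ends with
`sys.exit(check_staged_changes())`, this would reject a commit whenever
checkpatch.pl exits with a non-zero status.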


Re: [PATCH 3/3] qemu-img: Speed up checksum

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > Add coroutine based loop inspired by `qemu-img convert` design.
> >
> > Changes compared to `qemu-img convert`:
> >
> > - State for the entire image is kept in ImgChecksumState
> >
> > - State for single worker coroutine is kept in ImgChecksumWorker.
> >
> > - "Writes" are always in-order, ensured using a queue.
> >
> > - Calling block status once per image extent, when the current extent is
> >consumed by the workers.
> >
> > - Using 1m buffer size - testing shows that this gives the best read
> >performance both with buffered and direct I/O.
>
> Why does patch 1 then choose to use 2 MB?
>

The first patch uses sync I/O, and in this case 2 MB is a little faster.


> > - Number of coroutines is not configurable. Testing does not show
> >improvement when using more than 8 coroutines.
> >
> > - Progress includes the entire image, not only the allocated state.
> >
> > Comparing to the simple read loop shows that this version is up to 4.67
> > times faster when computing a checksum for an image full of zeroes. For
> > real images it is 1.59 times faster with direct I/O, and with buffered
> > I/O there is no difference.
> >
> > Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:
> >
> > | image    | size | i/o       | before         | after          | change |
> > |----------|------|-----------|----------------|----------------|--------|
> > | zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
> > | zero     |   6g | direct    | 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
> > | real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
> > | real     |   6g | direct    | 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
> > | nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
> > | nbd      |   6g | direct    | 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |
> >
> > [1] raw image full of zeroes
> > [2] raw fedora 35 image with additional random data, 50% full
> > [3] image [2] exported by qemu-nbd via unix socket
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   qemu-img.c | 343 +
> >   1 file changed, 270 insertions(+), 73 deletions(-)
>
> Looks good!
>
> Just a couple of style comments below.
>
> > diff --git a/qemu-img.c b/qemu-img.c
> > index 7edcfe4bc8..bfa8e2862f 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -1613,48 +1613,288 @@ out:
> >   qemu_vfree(buf2);
> >   blk_unref(blk2);
> >   out2:
> >   blk_unref(blk1);
> >   out3:
> >   qemu_progress_end();
> >   return ret;
> >   }
> >
> >   #ifdef CONFIG_BLKHASH
> > +
> > +#define CHECKSUM_COROUTINES 8
> > +#define CHECKSUM_BUF_SIZE (1 * MiB)
> > +#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
> > +
> > +typedef struct ImgChecksumState ImgChecksumState;
> > +
> > +typedef struct ImgChecksumWorker {
> > +QTAILQ_ENTRY(ImgChecksumWorker) entry;
> > +ImgChecksumState *state;
> > +Coroutine *co;
> > +uint8_t *buf;
> > +
> > +/* The current chunk. */
> > +int64_t offset;
> > +int64_t length;
> > +bool zero;
> > +
> > +/* Always true for zero extent, false for data extent. Set to true
> > + * when reading the chunk completes. */
>
> Qemu codestyle requires /* and */ to be on separate lines for multi-line
> comments (see checkpatch.pl).
>

I'll change that. Do we have a good way to run checkpatch.pl when using
git-publish?

Maybe a way to run checkpatch.pl on all patches generated by git publish
automatically?


> > +bool ready;
> > +} ImgChecksumWorker;
> > +
> > +struct ImgChecksumState {
> > +const char *filename;
> > +BlockBackend *blk;
> > +BlockDriverState *bs;
> > +int64_t total_size;
> > +
> > +/* Current extent, modified in checksum_co_next. */
> > +int64_t offset;
> > +int64_t length;
> > +bool zero;
> > +
> > +int running_coroutines;
> > +CoMutex lock;
> > +ImgChecksumWorker workers[CHECKSUM_COROUTINES];
> > +
> > +/* Ensure in-order updates. Update are scheduled at the tail of the
> > + * queue and processed from the head of the queue when a worker is
> > + * ready. */
>
> Qemu codestyle requires /* and */ to be on separate lines for multi-line
> comments.
>
> > +QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
> > +
> > +struct blkhash *hash;
> > +int ret;
> > +};
>
> [...]
>
> > +static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
> > +{
> > +ImgChecksumState *s = w->state;
> > +
> > +qemu_co_mutex_lock(&s->lock);
> > +
> > +if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
> > +qemu_co_mutex_unlock(&s->lock);
> > +return false;
> > +}
> > +
> > +if (s->length == 0 && checksum_block_status(s)) {
>
> I’d prefer `checksum_block_status(s) < 0` so that it is clear that
> negative values indicate errors.  Otherwise, based on this code

Re: [PATCH 2/3] iotests: Test qemu-img checksum

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:31 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > Add simple tests creating an image with all kinds of extents, different
> > formats, different backing chain, different protocol, and different
> > image options. Since all images have the same guest visible content they
> > must have the same checksum.
> >
> > To help debugging in case of failures, the output includes a json map of
> > every test image.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   tests/qemu-iotests/tests/qemu-img-checksum| 149 ++
> >   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +
> >   2 files changed, 223 insertions(+)
> >   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> >   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> >
> > diff --git a/tests/qemu-iotests/tests/qemu-img-checksum
> b/tests/qemu-iotests/tests/qemu-img-checksum
> > new file mode 100755
> > index 00..3a85ba33f2
> > --- /dev/null
> > +++ b/tests/qemu-iotests/tests/qemu-img-checksum
> > @@ -0,0 +1,149 @@
> > +#!/usr/bin/env python3
> > +# group: rw auto quick
> > +#
> > +# Test cases for qemu-img checksum.
> > +#
> > +# Copyright (C) 2022 Red Hat, Inc.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 2 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see .
> > +
> > +import re
> > +
> > +import iotests
> > +
> > +from iotests import (
> > +filter_testfiles,
> > +qemu_img,
> > +qemu_img_log,
> > +qemu_io,
> > +qemu_nbd_popen,
> > +)
> > +
> > +
> > +def checksum_available():
> > +out = qemu_img("--help").stdout
> > +return re.search(r"\bchecksum .+ filename\b", out) is not None
> > +
> > +
> > +if not checksum_available():
> > +iotests.notrun("checksum command not available")
> > +
> > +iotests.script_initialize(
> > +supported_fmts=["raw", "qcow2"],
> > +supported_cache_modes=["none", "writeback"],
>
> It doesn’t work with writeback, though, because it uses -T none below.
>

Good point


>
> Which by the way is a heavy cost, because I usually run tests in tmpfs,
> where this won’t work.  Is there any way of not doing the -T none below?
>

Testing using tmpfs is problematic since you cannot test -T none. In oVirt
we always use /var/tmp, which usually uses something that supports direct I/O.

Do we have a way to specify cache mode in the tests, so we can use -T none
only when the option is set?
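
One possible shape for this, as a sketch only: build the qemu-img
arguments from the cache mode the test run was configured with, and pass
-T none only when that mode is "none". The helper name checksum_args is
hypothetical, and how the configured mode reaches the test (for example
through a value exported by the iotests module) is an assumption.

```python
def checksum_args(filename, cache_mode=None):
    """Build a `qemu-img checksum` argument list for one test image.

    cache_mode is the cache mode selected for the test run, if any.
    """
    args = ["checksum"]
    if cache_mode == "none":
        # Direct I/O only when explicitly requested; per the discussion
        # above, this does not work when the test directory is on tmpfs.
        args += ["-T", "none"]
    args.append(filename)
    return args
```

With this, a run configured for writeback would call
checksum_args(disk_raw, "writeback") and get no -T option at all, so the
same test could run on tmpfs.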


>
> > +supported_protocols=["file", "nbd"],
> > +required_fmts=["raw", "qcow2"],
> > +)
> > +
> > +print()
> > +print("=== Test images ===")
> > +print()
> > +
> > +disk_raw = iotests.file_path('raw')
> > +qemu_img("create", "-f", "raw", disk_raw, "10m")
> > +qemu_io("-f", "raw",
> > +"-c", "write -P 0x1 0 2m",  # data
> > +"-c", "write -P 0x0 2m 2m", # data with zeroes
> > +"-c", "write -z 4m 2m", # zero allocated
> > +"-c", "write -z -u 6m 2m",  # zero hole
> > +# unallocated
> > +disk_raw)
> > +print(filter_testfiles(disk_raw))
> > +qemu_img_log("map", "--output", "json", disk_raw)
> > +
> > +disk_qcow2 = iotests.file_path('qcow2')
> > +qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
> > +qemu_io("-f", "qcow2",
> > +"-c", "write -P 0x1 0 2m",  # data
> > +"-c", "write -P 0x0 2m 2m", # data with zeroes
> > +"-c", "write -z 4m 2m", # zero allocated
> > +"-c", "write -z -u 6m 2m",  # zero hole
> > +# unallocated
> > +disk_qcow2)
> > +print(filter_testfiles(disk_qcow2))
> > +qemu_img_log("map", "--output", "json", disk_qcow2)
>
> This isn’t how iotests work, generally.  When run with -qcow2 -file, it
> should only test qcow2 on file, not raw on file, not raw on nbd. Perhaps
> this way this test could even support other formats than qcow2 and raw.
>

For this type of test, running the same code with raw, qcow2 (and other
formats) and using file or nbd is the best way to test this feature - one
test verifies all the use cases.

I can change this to use the current format (raw, qcow2, ...), protocol
(file, nbd, ...) and cache value (none, writeback), but I'm not sure how
this can work with the out files. The output from nbd is different from
file. Maybe we need different out files for file and nbd
(qemu-img-checksum.file.out,
Re: [PATCH 1/3] qemu-img: Add checksum command

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > The checksum command computes a checksum for disk image content using the
> > blkhash library[1]. The blkhash library is not packaged yet, but it is
> > available via copr[2].
> >
> > Example run:
> >
> >  $ ./qemu-img checksum -p fedora-35.qcow2
> >  6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  fedora-35.qcow2
> >
> > The block checksum is constructed by splitting the image into fixed-size
> > blocks and computing a digest of every block. The image checksum is the
> > digest of all the block digests.
> >
> > The checksum uses the "sha256" algorithm internally, but it cannot be
> > compared with checksums created by other tools such as `sha256sum`.
> >
> > The blkhash library supports sparse images, zero detection, and
> > optimizes zero block hashing (they are practically free). The library
> > uses multiple threads to speed up the computation.
> >
> > Compared to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> > faster, depending on the amount of data in the image:
> >
> >  $ ./qemu-img info /scratch/50p.raw
> >  file format: raw
> >  virtual size: 6 GiB (6442450944 bytes)
> >  disk size: 2.91 GiB
> >
> >  $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
> >   "sha256sum /scratch/50p.raw"
> >  Benchmark 1: ./qemu-img checksum /scratch/50p.raw
> >    Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
> >    Range (min … max):    1.813 s …  1.908 s    5 runs
> >
> >  Benchmark 2: sha256sum /scratch/50p.raw
> >    Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
> >    Range (min … max):   14.501 s … 14.697 s    5 runs
> >
> >  Summary
> >'./qemu-img checksum /scratch/50p.raw' ran
> >  7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
> >
> > The new command is available only when `blkhash` is available during
> > build. To test the new command please install the `blkhash-devel`
> > package:
> >
> >  $ dnf copr enable nsoffer/blkhash
> >  $ sudo dnf install blkhash-devel
> >
> > [1] https://gitlab.com/nirs/blkhash
> > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
> >  sha256sum (estimate): 17,749s
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   docs/tools/qemu-img.rst |  22 +
> >   meson.build |  10 ++-
> >   meson_options.txt   |   2 +
> >   qemu-img-cmds.hx|   8 ++
> >   qemu-img.c  | 191 
> >   5 files changed, 232 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > index 85a6e05b35..8be9c45cbf 100644
> > --- a/docs/tools/qemu-img.rst
> > +++ b/docs/tools/qemu-img.rst
> > @@ -347,20 +347,42 @@ Command description:
> >   Check completed, image is corrupted
> > 3
> >   Check completed, image has leaked clusters, but is not corrupted
> > 63
> >   Checks are not supported by the image format
> >
> > If ``-r`` is specified, exit codes representing the image state
> refer to the
> > state after (the attempt at) repairing it. That is, a successful
> ``-r all``
> > will yield the exit code 0, independently of the image state before.
> >
> > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T
> SRC_CACHE] [-p] FILENAME
> > +
> > +  Print a checksum for image *FILENAME* guest visible content.
>
> Why not say which kind of checksum it is?
>

Do you mean the algorithm used? This may be confusing; for example, if we
write

   Print a sha256 checksum ...

users will expect to get the same result from "sha256sum disk.img". How about

   Print a blkhash checksum ...

And add a link to the blkhash project?
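
To illustrate why the two results differ, here is a minimal sketch of the
construction described in the commit message (a digest of per-block
digests) next to a plain sha256 of the same bytes. This is not the
blkhash implementation: the block size and the function name are
illustrative, and zero-block optimization and threading are left out.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # illustrative; not necessarily what blkhash uses


def block_checksum(data, block_size=BLOCK_SIZE):
    """Return the hex digest of the concatenated per-block sha256 digests."""
    outer = hashlib.sha256()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        outer.update(hashlib.sha256(block).digest())
    return outer.hexdigest()


data = b"\x01" * (3 * BLOCK_SIZE)
flat = hashlib.sha256(data).hexdigest()   # what sha256sum would print
nested = block_checksum(data)             # digest of the block digests
```

Both values are sha256 hex strings of the same length, but they are
computed over different inputs, so comparing `nested` with the output of
sha256sum is meaningless even though the underlying algorithm is the same.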


>
> >  Images
> with
> > +  different format or settings wil have the same checksum.
>
> s/wil/will/
>

Fixing


>
> > +
> > +  The format is probed unless you specify it by ``-f``.
> > +
> > +  The checksum is computed for guest visible content. Allocated areas
> full of
> > +  zeroes, zero clusters, and unallocated areas are read as zeros so
> they will
> > +  have the same checksum. Images with single or multiple files or
> backing files
> > +  will have the same checksums if the guest will see the same content
> when
> > +  reading the image.
> > +
> > +  Image metadata that is not visible to the guest such as dirty bitmaps
> does
> > +  not affect the checksum.
> > +
> > +  Computing a checksum requires a read-only image. You cannot compute a
> > +  checksum of an active image used by a guest,
>
> Makes me ask: Why not?  Other subcommands have the -U flag for this.
>

The text is not precise enough, the issue is not active image but having a
read only
imag

Re: [PATCH v4 6/7] hw/sd/sdhci: Implement Freescale eSDHC device model

2022-10-30 Thread Bernhard Beschow
On Sun, Oct 30, 2022 at 1:10 AM Philippe Mathieu-Daudé 
wrote:

> On 29/10/22 20:28, Bernhard Beschow wrote:
> > On 29 October 2022 13:04:00 UTC, Bernhard Beschow <shen...@gmail.com> wrote:
> >> On 29 October 2022 11:33:51 UTC, Bernhard Beschow <shen...@gmail.com> wrote:
> >>> On 27 October 2022 21:40:01 UTC, "Philippe Mathieu-Daudé" <phi...@linaro.org> wrote:
>  Hi Bernhard,
> 
>  On 18/10/22 23:01, Bernhard Beschow wrote:
> > Will allow e500 boards to access SD cards using just their own
> devices.
> >
> > Signed-off-by: Bernhard Beschow 
> > ---
> >hw/sd/sdhci.c | 120
> +-
> >include/hw/sd/sdhci.h |   3 ++
> >2 files changed, 122 insertions(+), 1 deletion(-)
>
>  So now, I'd create 1 UNIMP region for ESDHC_WML and map it
>  into SDHC_REGISTERS_MAP (s->iomem) with priority 1, and add
>  another UNIMP region of ESDHC_REGISTERS_MAP_SIZE -
> SDHC_REGISTERS_MAP_SIZE (= 0x310) and map it normally at offset
>  0x100 (SDHC_REGISTERS_MAP_SIZE). Look at create_unimp() in
>  hw/arm/bcm2835_peripherals.c.
> 
>  But the ESDHC_WML register has address 0x44 and fits inside the
>  SDHC_REGISTERS_MAP region, so likely belong there. 0x44 is the
>  upper part of the SDHC_CAPAB register. These bits are undefined
>  on the spec v2, which I see you are setting in esdhci_init().
>  So this register should already return 0, otherwise we have
>  a bug. Thus we don't need to handle this ESDHC_WML particularly.
> >>
> >> My idea here was to catch this unimplemented case in order to indicate
> this clearly to users. Perhaps it nudges somebody to provide a patch?
> >>
> 
>  And your model is reduced to handling create_unimp() in
> esdhci_realize().
> 
>  Am I missing something?
> >>>
> >>> The mmio ops are big endian and need to be aligned to a 4-byte
> boundary. It took me quite a while to debug this. So shall I just create an
> additional memory region for the region above SDHC_REGISTERS_MAP_SIZE for
> ESDHC_DMA_SYSCTL?
> >>
> >> All in all I currently don't have a better idea than keeping the custom
> i/o ops for the standard region and adding an additional unimplemented
> region for ESDHC_DMA_SYSCTL. I think I'd have to dynamically allocate
> memory for it where I still need to figure out how not to leak it.
> >
> > By simply reusing sdhci_{read,write} in eSDHC's io_ops struct I was able
> to remove the custom implementations while having big endian and the
> alignments proper. However, I don't see a way of adding two memory regions
> - with or without a container. With a container I'd have to somehow
> preserve the mmio attribute which is initialized by the parent class,
> re-initialize it with the container, and add the preserved memory region as
> child. This seems very fragile, esp. since the parent class has created an
> alias for mmio in sysbus. Without a container, one would have two memory
> regions that both have to be mapped separately by the caller, i.e. it
> burdens the caller with an implementation detail.
> >
> > Any suggestions?
>
> Can you share branch and how to test?
>

QEMU branch: https://github.com/shentok/qemu/tree/e500-flash

How to test:
1. `git clone -b e500 https://github.com/shentok/buildroot.git`
2. `cd buildroot`
3. `make qemu_ppc_e500mc_defconfig`
4. `make`
5. `cd output/images`
6. `dd if=/dev/zero of=root.img bs=1M count=64 && dd if=rootfs.ext2 of=root.img bs=1M conv=notrunc`
7. `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append "console=ttyS0 rootwait root=/dev/mmcblk0" -device sd-card,drive=mydrive -drive id=mydrive,if=none,file=root.img,format=raw`

Note that step 6 might be required every time before qemu-system-ppc is
started, otherwise this may cause an fsck. The output on the boot console
will look something along these lines (note that no errors are reported):

  Memory CAM mapping: 16/16/16/16/64/64/64 Mb, residual: 0Mb
  Activating Kernel Userspace Access Protection
  Activating Kernel Userspace Execution Prevention
  Linux version 5.17.7 (zcone-pisint@osoxes)
(powerpc-buildroot-linux-gnu-gcc.br_real (Buildroot
2022.08-655-gf8a0d1480a) 11.3.0, GNU ld (GNU Binutils) 2.38) #1 SMP Sat Oct
29 12:49:54 CEST 2022
  Using QEMU e500 machine description
  printk: bootconsole [udbg0] enabled
  CPU maps initialized for 1 thread per core
  -
  phys_mem_size = 0x1000
  dcache_bsize  = 0x40
  icache_bsize  = 0x40
  cpu_features  = 0x0194
possible= 0x0001039c
always  = 0x0100
  cpu_user_features = 0x8c008000 0x0800
  mmu_features  = 0x000a0010
  -
  qemu_e500_setup_arch()
  barrier-nospec: using isync; sync as speculation barrier
  Zone ranges:
Normal   [mem 0x-0x0fff]
HighMem  empty
  Mo

[RFC v4 3/3] virtio-blk: add some trace events for zoned emulation

2022-10-30 Thread Sam Li
Signed-off-by: Sam Li 
---
 hw/block/trace-events |  7 +++
 hw/block/virtio-blk.c | 12 
 2 files changed, 19 insertions(+)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index 2c45a62bd5..f47da6fcd4 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -44,9 +44,16 @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: 
unknown command 0x%02x"
 # virtio-blk.c
 virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p 
status %d"
 virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, 
int ret) "vdev %p req %p nr_zones %d ret %d"
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p 
ret %d"
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int 
ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
 virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t 
nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t 
nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, 
uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs 
%d offset %"PRIu64" size %zu is_write %d"
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned 
int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %d"
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, 
int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, 
int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p 
req %p, append sector 0x%" PRIx64 ""
 
 # hd-geometry.c
 hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS 
%d %d %d"
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4f3625840a..e7d85dc049 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -663,6 +663,7 @@ static void virtio_blk_zone_report_complete(void *opaque, 
int ret)
 int64_t nz = data->zone_report_data.nr_zones;
 int8_t err_status = VIRTIO_BLK_S_OK;
 
+trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
 if (ret) {
 err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
 goto out;
@@ -777,6 +778,8 @@ static int virtio_blk_handle_zone_report(VirtIOBlockReq 
*req)
 nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
 sizeof(struct virtio_blk_zone_report)) /
sizeof(struct virtio_blk_zone_descriptor);
+trace_virtio_blk_handle_zone_report(vdev, req,
+offset >> BDRV_SECTOR_BITS, nr_zones);
 
 zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
 data = g_malloc(sizeof(ZoneCmdData));
@@ -802,7 +805,9 @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int 
ret)
 ZoneCmdData *data = opaque;
 VirtIOBlockReq *req = data->req;
 VirtIOBlock *s = req->dev;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 int8_t err_status = VIRTIO_BLK_S_OK;
+trace_virtio_blk_zone_mgmt_complete(vdev, req,ret);
 
 if (ret) {
 err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
@@ -830,6 +835,8 @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, 
BlockZoneOp op)
 /* Entire drive capacity */
 offset = 0;
 len = capacity;
+trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
+   bs->total_sectors);
 } else {
 if (bs->bl.zone_size > capacity - offset) {
 /* The zoned device allows the last smaller zone. */
@@ -837,6 +844,9 @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, 
BlockZoneOp op)
 } else {
 len = bs->bl.zone_size;
 }
+trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
+  offset >> BDRV_SECTOR_BITS,
+  len >> BDRV_SECTOR_BITS);
 }
 
 if (!check_zoned_request(s, offset, 0, false, &err_status)) {
@@ -882,6 +892,7 @@ static void virtio_blk_zone_append_complete(void *opaque, 
int ret)
 err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
 goto out;
 }
+trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
 
 out:
 aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
@@ -901,6 +912,7 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq 
*req,
 int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
 int64_t len = iov_size(out_iov, niov);
 
+trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
 if (!check_zoned_request(s, offset, len, true, &err_status)) {
 

[RFC v4 1/3] include: update virtio_blk headers

2022-10-30 Thread Sam Li
Use scripts/update-linux-headers.sh to update virtio-blk headers
from Dmitry's "virtio-blk:add support for zoned block devices"
Linux patches.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Dmitry Fomichev 
---
 include/standard-headers/linux/virtio_blk.h | 158 ++--
 1 file changed, 142 insertions(+), 16 deletions(-)

diff --git a/include/standard-headers/linux/virtio_blk.h 
b/include/standard-headers/linux/virtio_blk.h
index 2dcc90826a..3744e4da1b 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -25,10 +25,10 @@
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE. */
-#include "standard-headers/linux/types.h"
-#include "standard-headers/linux/virtio_ids.h"
-#include "standard-headers/linux/virtio_config.h"
-#include "standard-headers/linux/virtio_types.h"
+#include 
+#include 
+#include 
+#include 
 
 /* Feature bits */
 #define VIRTIO_BLK_F_SIZE_MAX  1   /* Indicates maximum segment size */
@@ -40,6 +40,8 @@
 #define VIRTIO_BLK_F_MQ12  /* support more than one vq */
 #define VIRTIO_BLK_F_DISCARD   13  /* DISCARD is supported */
 #define VIRTIO_BLK_F_WRITE_ZEROES  14  /* WRITE ZEROES is supported */
+#define VIRTIO_BLK_F_SECURE_ERASE  16 /* Secure Erase is supported */
+#define VIRTIO_BLK_F_ZONED 17  /* Zoned block device */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -47,8 +49,10 @@
 #define VIRTIO_BLK_F_SCSI  7   /* Supports scsi command passthru */
 #define VIRTIO_BLK_F_FLUSH 9   /* Flush command supported */
 #define VIRTIO_BLK_F_CONFIG_WCE11  /* Writeback mode available in 
config */
+#ifndef __KERNEL__
 /* Old (deprecated) name for VIRTIO_BLK_F_FLUSH. */
 #define VIRTIO_BLK_F_WCE VIRTIO_BLK_F_FLUSH
+#endif
 #endif /* !VIRTIO_BLK_NO_LEGACY */
 
 #define VIRTIO_BLK_ID_BYTES20  /* ID string length */
@@ -63,8 +67,8 @@ struct virtio_blk_config {
/* geometry of the device (if VIRTIO_BLK_F_GEOMETRY) */
struct virtio_blk_geometry {
__virtio16 cylinders;
-   uint8_t heads;
-   uint8_t sectors;
+   __u8 heads;
+   __u8 sectors;
} geometry;
 
/* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */
@@ -72,17 +76,17 @@ struct virtio_blk_config {
 
/* the next 4 entries are guarded by VIRTIO_BLK_F_TOPOLOGY  */
/* exponent for physical block per logical block. */
-   uint8_t physical_block_exp;
+   __u8 physical_block_exp;
/* alignment offset in logical blocks. */
-   uint8_t alignment_offset;
+   __u8 alignment_offset;
/* minimum I/O size without performance penalty in logical blocks. */
__virtio16 min_io_size;
/* optimal sustained I/O size in logical blocks. */
__virtio32 opt_io_size;
 
/* writeback mode (if VIRTIO_BLK_F_CONFIG_WCE) */
-   uint8_t wce;
-   uint8_t unused;
+   __u8 wce;
+   __u8 unused;
 
/* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
__virtio16 num_queues;
@@ -116,10 +120,35 @@ struct virtio_blk_config {
 * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the
 * deallocation of one or more of the sectors.
 */
-   uint8_t write_zeroes_may_unmap;
+   __u8 write_zeroes_may_unmap;
 
-   uint8_t unused1[3];
-} QEMU_PACKED;
+   __u8 unused1[3];
+
+   /* the next 3 entries are guarded by VIRTIO_BLK_F_SECURE_ERASE */
+   /*
+* The maximum secure erase sectors (in 512-byte sectors) for
+* one segment.
+*/
+   __virtio32 max_secure_erase_sectors;
+   /*
+* The maximum number of secure erase segments in a
+* secure erase command.
+*/
+   __virtio32 max_secure_erase_seg;
+   /* Secure erase commands must be aligned to this number of sectors. */
+   __virtio32 secure_erase_sector_alignment;
+
+   /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
+   struct virtio_blk_zoned_characteristics {
+   __virtio32 zone_sectors;
+   __virtio32 max_open_zones;
+   __virtio32 max_active_zones;
+   __virtio32 max_append_sectors;
+   __virtio32 write_granularity;
+   __u8 model;
+   __u8 unused2[3];
+   } zoned;
+} __attribute__((packed));
 
 /*
  * Command types
@@ -153,6 +182,30 @@ struct virtio_blk_config {
 /* Write zeroes command */
 #define VIRTIO_BLK_T_WRITE_ZEROES  13
 
+/* Secure erase command */
+#define VIRTIO_BLK_T_SECURE_ERASE  14
+
+/* Zone append command */
+#define VIRTIO_BLK_T_ZONE_APPEND15
+
+/* Report zones command */
+#define VIRTIO_BLK_T_ZONE_REPORT16
+
+/* Open zone command */
+#define VIRTIO_BLK_T_ZONE_OPEN  18

[RFC v4 2/3] virtio-blk: add zoned storage emulation for zoned devices

2022-10-30 Thread Sam Li
This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone oparations (open,
close, finish, reset), and Append Zone.

The VIRTIO_BLK_F_ZONED feature bit is set only if the host supports
zoned block devices. It is not set for regular block devices
(conventional zones).

The guest OS can use blktests and fio to test those commands on zoned
devices. Testing zone append writes with zonefs is also supported.
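
As a rough illustration of the checks the emulation applies before issuing a
zone append (see check_zoned_request() in the diff below), here is a minimal
standalone sketch. ZonedLimits, AppendCheck and check_append() are
hypothetical names that stand in for the fields read from bs->bl and the
VIRTIO_BLK_S_* status codes; they are not part of the patch. The
conventional-zone check is left out because it needs the write pointer array.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the limits the patch reads from bs->bl. */
typedef struct ZonedLimits {
    int64_t total_bytes;         /* device capacity in bytes */
    uint32_t write_granularity;  /* bytes; assumed to be non-zero */
    uint32_t max_append_sectors; /* 512-byte sectors; 0 means no append */
} ZonedLimits;

/* Hypothetical result codes; comments name the status the patch returns. */
typedef enum AppendCheck {
    APPEND_OK,
    APPEND_INVALID,     /* VIRTIO_BLK_S_ZONE_INVALID_CMD */
    APPEND_UNALIGNED,   /* VIRTIO_BLK_S_ZONE_UNALIGNED_WP */
    APPEND_UNSUPPORTED, /* VIRTIO_BLK_S_UNSUPP */
} AppendCheck;

static AppendCheck check_append(const ZonedLimits *lim,
                                int64_t offset, int64_t len)
{
    /* Range check written so that offset + len is never computed. */
    if (offset < 0 || len < 0 || offset > lim->total_bytes - len) {
        return APPEND_INVALID;
    }
    /* An append must start on a write-granularity boundary. */
    if (offset % lim->write_granularity != 0) {
        return APPEND_UNALIGNED;
    }
    /* Too long: unsupported if the device allows no append at all. */
    if (len / 512 > lim->max_append_sectors) {
        return lim->max_append_sectors == 0 ? APPEND_UNSUPPORTED
                                            : APPEND_INVALID;
    }
    return APPEND_OK;
}
```

Comparing offset against total_bytes - len, rather than offset + len against
total_bytes, keeps the arithmetic from overflowing for large guest-supplied
values.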

Signed-off-by: Sam Li 
---
 hw/block/virtio-blk-common.c |   2 +
 hw/block/virtio-blk.c| 387 +++
 2 files changed, 389 insertions(+)

diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
index ac52d7c176..e2f8e2f6da 100644
--- a/hw/block/virtio-blk-common.c
+++ b/hw/block/virtio-blk-common.c
@@ -29,6 +29,8 @@ static const VirtIOFeature feature_sizes[] = {
  .end = endof(struct virtio_blk_config, discard_sector_alignment)},
 {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
  .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
+{.flags = 1ULL << VIRTIO_BLK_F_ZONED,
+ .end = endof(struct virtio_blk_config, zoned)},
 {}
 };
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8131ec2dbc..4f3625840a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -26,6 +26,9 @@
 #include "hw/virtio/virtio-blk.h"
 #include "dataplane/virtio-blk.h"
 #include "scsi/constants.h"
+#if defined(CONFIG_BLKZONED)
+#include 
+#endif
 #ifdef __linux__
 # include 
 #endif
@@ -592,6 +595,332 @@ err:
 return err_status;
 }
 
+typedef struct ZoneCmdData {
+VirtIOBlockReq *req;
+union {
+struct {
+unsigned int nr_zones;
+BlockZoneDescriptor *zones;
+} zone_report_data;
+struct {
+int64_t offset;
+} zone_append_data;
+};
+} ZoneCmdData;
+
+/*
+ * check zoned_request: error checking before issuing requests. If all checks
+ * passed, return true.
+ * append: true if only zone append requests issued.
+ */
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
+ bool append, uint8_t *status) {
+BlockDriverState *bs = blk_bs(s->blk);
+int index = offset / bs->bl.zone_size;
+
+if (offset < 0 || len < 0 || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+return false;
+}
+
+if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
+*status = VIRTIO_BLK_S_UNSUPP;
+return false;
+}
+
+if (append) {
+if ((offset % bs->bl.write_granularity) != 0) {
+*status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
+return false;
+}
+
+if (BDRV_ZT_IS_CONV(bs->bl.wps->wp[index])) {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+return false;
+}
+
+if (len / 512 > bs->bl.max_append_sectors) {
+if (bs->bl.max_append_sectors == 0) {
+*status = VIRTIO_BLK_S_UNSUPP;
+} else {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+}
+return false;
+}
+}
+return true;
+}
+
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
+{
+ZoneCmdData *data = opaque;
+VirtIOBlockReq *req = data->req;
+VirtIOBlock *s = req->dev;
+VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
+struct iovec *in_iov = req->elem.in_sg;
+unsigned in_num = req->elem.in_num;
+int64_t zrp_size, n, j = 0;
+int64_t nz = data->zone_report_data.nr_zones;
+int8_t err_status = VIRTIO_BLK_S_OK;
+
+if (ret) {
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
+}
+
+struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
+.nr_zones = cpu_to_le64(nz),
+};
+zrp_size = sizeof(struct virtio_blk_zone_report)
+   + sizeof(struct virtio_blk_zone_descriptor) * nz;
+n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
+if (n != sizeof(zrp_hdr)) {
+virtio_error(vdev, "Driver provided input buffer that is too small!");
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
+}
+
+for (size_t i = sizeof(zrp_hdr); i < zrp_size;
+i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
+struct virtio_blk_zone_descriptor desc =
+(struct virtio_blk_zone_descriptor) {
+.z_start = cpu_to_le64(data->zone_report_data.zones[j].start
+>> BDRV_SECTOR_BITS),
+.z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
+>> BDRV_SECTOR_BITS),
+.z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
+>> BDRV_SECTOR_BITS),
+};
+
+switch (data->zone

[RFC v4 0/3] Add zoned storage emulation to virtio-blk driver

2022-10-30 Thread Sam Li
Note: the virtio-blk headers aren't upstream in the kernel yet, so this
series is marked as an RFC. More information can be found here:
https://patchwork.kernel.org/project/linux-block/cover/20221030043545.974223-1-dmitry.fomic...@wdc.com/

v4:
- change how the zone append request result is written to the buffer
- change the zone state and zone type values of virtio_blk_zone_descriptor
- add trace events for new zone APIs

v3:
- use qemuio_from_buffer to write status bit [Stefan]
- avoid using req->elem directly [Stefan]
- fix error checking and a memory leak [Stefan]

v2:
- change the units of emulated zone ops to correspond to the block layer APIs
- modify error checking cases [Stefan, Damien]

v1:
- add zoned storage emulation
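
The unit change listed under v2 appears to be a conversion between byte
offsets on the block layer side and 512-byte sectors in the virtio-blk zone
structures, judging by the `>> BDRV_SECTOR_BITS` shifts in patch 2. A minimal
sketch of that conversion follows. SECTOR_BITS, bytes_to_sectors() and
sectors_to_bytes() are illustrative names, not helpers from this series.

```c
#include <stdint.h>

/* Same value as QEMU's BDRV_SECTOR_BITS: 512-byte sectors. */
#define SECTOR_BITS 9

/* Byte offset -> 512-byte sector number (rounds down). */
static inline uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes >> SECTOR_BITS;
}

/* 512-byte sector number -> byte offset. */
static inline uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors << SECTOR_BITS;
}
```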

Sam Li (3):
  include: update virtio_blk headers
  virtio-blk: add zoned storage emulation for zoned devices
  virtio-blk: add some trace events for zoned emulation

 hw/block/trace-events   |   7 +
 hw/block/virtio-blk-common.c|   2 +
 hw/block/virtio-blk.c   | 399 
 include/standard-headers/linux/virtio_blk.h | 158 +++-
 4 files changed, 550 insertions(+), 16 deletions(-)

-- 
2.38.1