Re: [PATCH v4 3/7] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
On Mon, Jul 25, 2022 at 10:48 AM Jason Wang wrote: > > > 在 2022/7/22 19:12, Eugenio Pérez 写道: > > So its generic enough to accept any out sg buffer and we can inject > > NIC state messages. > > > > Signed-off-by: Eugenio Pérez > > --- > > net/vhost-vdpa.c | 29 +++-- > > 1 file changed, 15 insertions(+), 14 deletions(-) > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > index 1b82ac2e07..bbe1830824 100644 > > --- a/net/vhost-vdpa.c > > +++ b/net/vhost-vdpa.c > > @@ -302,35 +302,36 @@ dma_map_err: > > } > > > > /** > > - * Copy the guest element into a dedicated buffer suitable to be sent to > > NIC > > + * Maps out sg and in buffer into dedicated buffers suitable to be sent to > > NIC > >* > > - * @iov: [0] is the out buffer, [1] is the in one > > + * @dev_iov: [0] is the out buffer, [1] is the in one > > > This still has assumption on the layout. I wonder if it's better to > simply use in_sg and out_sg. > Sure, I can resend that way. It complicates the code a little bit because of error paths. We currently send one out sg and one in sg always. But we can make it more generic for sure. Thanks! > Thanks > > > >*/ > > -static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s, > > -VirtQueueElement *elem, > > -struct iovec *iov) > > +static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s, > > + const struct iovec *out, size_t > > out_num, > > + struct iovec *dev_iov) > > { > > size_t in_copied; > > bool ok; > > > > -iov[0].iov_base = s->cvq_cmd_out_buffer; > > -ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, > > elem->out_num, > > -vhost_vdpa_net_cvq_cmd_len(), > > iov[0].iov_base, > > -&iov[0].iov_len, false); > > +dev_iov[0].iov_base = s->cvq_cmd_out_buffer; > > +ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num, > > +vhost_vdpa_net_cvq_cmd_len(), > > +dev_iov[0].iov_base, &dev_iov[0].iov_len, > > +false); > > if (unlikely(!ok)) { > > return false; > > } > > > > -iov[1].iov_base = s->cvq_cmd_in_buffer; > > +dev_iov[1].iov_base = s->cvq_cmd_in_buffer; > > ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0, > > -sizeof(virtio_net_ctrl_ack), > > iov[1].iov_base, > > -&in_copied, true); > > +sizeof(virtio_net_ctrl_ack), > > +dev_iov[1].iov_base, &in_copied, true); > > if (unlikely(!ok)) { > > vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer); > > return false; > > } > > > > -iov[1].iov_len = sizeof(virtio_net_ctrl_ack); > > +dev_iov[1].iov_len = sizeof(virtio_net_ctrl_ack); > > return true; > > } > > > > @@ -434,7 +435,7 @@ static int > > vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, > > }; > > bool ok; > > > > -ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers); > > +ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num, > > dev_buffers); > > if (unlikely(!ok)) { > > goto out; > > } >
[Bug 1180923] Re: unused memory filled with 0x00 instead of 0xFF
** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #1133 https://gitlab.com/qemu-project/qemu/-/issues/1133 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1180923 Title: unused memory filled with 0x00 instead of 0xFF Status in QEMU: Expired Bug description: Qemu, ever since it was made (so, since 2003), has this problem in DOS (either PC-DOS or MS-DOS and partly Windows 9x) not recognizing the memory available when the memory is filled with 0x00 but when it is filled with 0xFF it gets recognized properly, where should I patch qemu to solve this memory problem? To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1180923/+subscriptions
Re: [Bug 1180923] Re: unused memory filled with 0x00 instead of 0xFF
I create a issue at https://gitlab.com/qemu-project/qemu/-/issues/1133 to tracking this On Mon, Aug 1, 2022 at 5:06 AM David Glover <1180...@bugs.launchpad.net> wrote: > I know this is expired but it's still a problem in qemu 7.0.0. For > example, when running MS-DOS 6.22, "mem" reports 0K of upper memory, and > memmaker fails to run, complaining that it could not allocate any. I'd > love to know if there's a workaround. > > -- > You received this bug notification because you are a member of qemu- > devel-ml, which is subscribed to QEMU. > https://bugs.launchpad.net/bugs/1180923 > > Title: > unused memory filled with 0x00 instead of 0xFF > > Status in QEMU: > Expired > > Bug description: > Qemu, ever since it was made (so, since 2003), has this problem in DOS > (either PC-DOS or MS-DOS and partly Windows 9x) not recognizing the > memory available when the memory is filled with 0x00 but when it is > filled with 0xFF it gets recognized properly, where should I patch > qemu to solve this memory problem? > > To manage notifications about this bug go to: > https://bugs.launchpad.net/qemu/+bug/1180923/+subscriptions > > > -- 此致 礼 罗勇刚 Yours sincerely, Yonggang Luo
Re: [PATCH] ipmi:smbus: Add a check around a memcpy
On Sun, Jul 31, 2022 at 06:02:46PM -0500, miny...@acm.org wrote: > From: Corey Minyard > > In one case: > > memcpy(sid->inmsg + sid->inlen, buf, len); > > if len == 0 then sid->inmsg + sig->inlen can point to one past the inmsg > array if the array is full. We have to allow len == 0 due to some > vagueness in the spec, but we don't have to call memcpy. > > Found by Coverity. This is not a problem in practice, but the results > are technically (maybe) undefined. So make Coverity happy. > > Reported-by: Peter Maydell > Signed-off-by: Corey Minyard Acked-by: Michael S. Tsirkin > --- > hw/ipmi/smbus_ipmi.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > I think this should do it. > > diff --git a/hw/ipmi/smbus_ipmi.c b/hw/ipmi/smbus_ipmi.c > index 9ef9112dd5..d0991ab7f9 100644 > --- a/hw/ipmi/smbus_ipmi.c > +++ b/hw/ipmi/smbus_ipmi.c > @@ -281,7 +281,9 @@ static int ipmi_write_data(SMBusDevice *dev, uint8_t > *buf, uint8_t len) > */ > send = true; > } > -memcpy(sid->inmsg + sid->inlen, buf, len); > +if (len > 0) { > +memcpy(sid->inmsg + sid->inlen, buf, len); > +} > sid->inlen += len; > break; > } > -- > 2.25.1
RE: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly
> -Original Message- > From: Jason Wang > Sent: Monday, August 1, 2022 12:18 PM > To: Zhang, Chen ; Xu, Tao3 > Cc: qemu-devel@nongnu.org; Li Zhijian ; Peter > Maydell > Subject: Re: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet > is not parsed correctly > > > 在 2022/7/29 21:58, Peter Maydell 写道: > > On Wed, 20 Jul 2022 at 10:04, Jason Wang wrote: > >> From: Zhang Chen > >> > >> When COLO use only one vnet_hdr_support parameter between > >> filter-redirector and filter-mirror(or colo-compare), COLO will crash > >> with segmentation fault. Back track as follow: > >> > >> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation > fault. > >> 0x55cb200b in eth_get_l2_hdr_length (p=0x0) > >> at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 > >> 296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto); > >> (gdb) bt > >> 0 0x55cb200b in eth_get_l2_hdr_length (p=0x0) > >> at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 > >> 1 0x55cb22b4 in parse_packet_early (pkt=0x56a44840) at > >> net/colo.c:49 > >> 2 0x55cb2b91 in is_tcp_packet (pkt=0x56a44840) at > >> net/filter-rewriter.c:63 > >> > >> So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to > >> raise error and add trace-events to track vnet_hdr_len. > >> > >> Signed-off-by: Tao Xu > >> Signed-off-by: Zhang Chen > >> Reviewed-by: Li Zhijian > >> Signed-off-by: Jason Wang > > Hi -- Coverity points out that there's a problem with this fix (CID > > 1490786): > > > >> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt) > >> static const uint8_t vlan[] = {0x81, 0x00}; > >> uint8_t *data = pkt->data + pkt->vnet_hdr_len; > > data here is set to pkt->data + pkt->vnet_hdr_len. > > If pkt->data is NULL then this addition is C undefined behaviour, and > > if pkt->data is not NULL but the integer added is such that the > > pointer ends up not pointing within data, then that is also C > > undefined behaviour... > > > Right. And I don't see how pkt->data can be NULL, maybe a hint of bug > elsewhere. > > > > > >> uint16_t l3_proto; > >> -ssize_t l2hdr_len = eth_get_l2_hdr_length(data); > >> +ssize_t l2hdr_len; > >> + > >> +if (data == NULL) { > > ...so the compiler is allowed to assume that data cannot be NULL > > here, and this entire error checking clause is logically dead code. > > > > If you're trying to check whether vnet_hdr_len is valid, you > > need to do that as an integer arithmetic check before you start > > using it for pointer arithmetic. And you probably want to be > > checking against some kind of bound, not just for whether the > > result is going to be NULL. > > > Or we can let the caller to validate the Pkt before calling this function. > > > > > > Overall this looks kinda sketchy and could probably use more > > detailed code review. Where does the bogus vnet_hdr_len come from in > > the first place? Is this data from the guest, or from the network > > (ie uncontrolled)? > > > If I understand correctly the vnet_hdr_len is set during handshake > (net_fill_rstate()) which is sent from a network backend. > > Chen or Xu, please post a fix and explain what happens in the changelog. OK, we will send a patch to fix it. Thanks Chen > > Thanks > > > > > >> +trace_colo_proxy_main_vnet_info("This packet is not parsed > correctly, " > >> +"pkt->vnet_hdr_len", > >> pkt->vnet_hdr_len); > >> +return 1; > >> +} > > thanks > > -- PMM > >
Re: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly
在 2022/7/29 21:58, Peter Maydell 写道: On Wed, 20 Jul 2022 at 10:04, Jason Wang wrote: From: Zhang Chen When COLO use only one vnet_hdr_support parameter between filter-redirector and filter-mirror(or colo-compare), COLO will crash with segmentation fault. Back track as follow: Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x55cb200b in eth_get_l2_hdr_length (p=0x0) at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto); (gdb) bt 0 0x55cb200b in eth_get_l2_hdr_length (p=0x0) at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296 1 0x55cb22b4 in parse_packet_early (pkt=0x56a44840) at net/colo.c:49 2 0x55cb2b91 in is_tcp_packet (pkt=0x56a44840) at net/filter-rewriter.c:63 So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to raise error and add trace-events to track vnet_hdr_len. Signed-off-by: Tao Xu Signed-off-by: Zhang Chen Reviewed-by: Li Zhijian Signed-off-by: Jason Wang Hi -- Coverity points out that there's a problem with this fix (CID 1490786): @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt) static const uint8_t vlan[] = {0x81, 0x00}; uint8_t *data = pkt->data + pkt->vnet_hdr_len; data here is set to pkt->data + pkt->vnet_hdr_len. If pkt->data is NULL then this addition is C undefined behaviour, and if pkt->data is not NULL but the integer added is such that the pointer ends up not pointing within data, then that is also C undefined behaviour... Right. And I don't see how pkt->data can be NULL, maybe a hint of bug elsewhere. uint16_t l3_proto; -ssize_t l2hdr_len = eth_get_l2_hdr_length(data); +ssize_t l2hdr_len; + +if (data == NULL) { ...so the compiler is allowed to assume that data cannot be NULL here, and this entire error checking clause is logically dead code. If you're trying to check whether vnet_hdr_len is valid, you need to do that as an integer arithmetic check before you start using it for pointer arithmetic. And you probably want to be checking against some kind of bound, not just for whether the result is going to be NULL. Or we can let the caller to validate the Pkt before calling this function. Overall this looks kinda sketchy and could probably use more detailed code review. Where does the bogus vnet_hdr_len come from in the first place? Is this data from the guest, or from the network (ie uncontrolled)? If I understand correctly the vnet_hdr_len is set during handshake (net_fill_rstate()) which is sent from a network backend. Chen or Xu, please post a fix and explain what happens in the changelog. Thanks +trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, " +"pkt->vnet_hdr_len", pkt->vnet_hdr_len); +return 1; +} thanks -- PMM
Re: [PULL V2 19/25] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
在 2022/7/29 22:08, Peter Maydell 写道: On Wed, 20 Jul 2022 at 10:04, Jason Wang wrote: From: Eugenio Pérez To know the device features is needed for CVQ SVQ, so SVQ knows if it can handle all commands or not. Extract from vhost_vdpa_get_max_queue_pairs so we can reuse it. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang Reviewed-by: Michael S. Tsirkin Signed-off-by: Jason Wang Hi; this change introduces a resource leak in the new error-exit return path in net_init_vhost_vdpa(). Spotted by Coverity, CID 1490785. @@ -517,10 +521,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp) { const NetdevVhostVDPAOptions *opts; +uint64_t features; int vdpa_device_fd; g_autofree NetClientState **ncs = NULL; NetClientState *nc; -int queue_pairs, i, has_cvq = 0; +int queue_pairs, r, i, has_cvq = 0; assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA); opts = &netdev->u.vhost_vdpa; @@ -534,7 +539,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, return -errno; } -queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, +r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp); +if (unlikely(r < 0)) { +return r; At this point in the code we have allocated the file descriptor vdpa_device_fd, but this return path fails to close it. Exactly. +} + +queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features, &has_cvq, errp); if (queue_pairs < 0) { qemu_close(vdpa_device_fd); Compare this pre-existing error-exit path, which correctly calls qemu_close() on the fd. Related question: is this function supposed to return -1 on failure, or negative-errno ? Kind of either: if (net_client_init_fun[netdev->type](netdev, netdev->id, peer, errp) < 0) { /* FIXME drop when all init functions store an Error */ if (errp && !*errp) { error_setg(errp, "Device '%s' could not be initialized", NetClientDriver_str(netdev->type)); } return -1; } At the moment it has a mix of both. I think that the sole caller only really wants "<0 on error", in which case the error-exit code paths could probably be tidied up so that instead of explicitly calling qemu_close() and returning r, queue_pairs, or whatever they got back from the function they just called, they could just 'goto err_svq' which will do the "close the fd and return -1" work. Better still, by initializing 'i' to 0 at the top of the function (and naming it something clearer, ideally), all the code paths after the initial qemu_open() succeeds could be made to use 'goto err' for the error-exit case. Yes, having a consistent goto based error handling seems much better. Eugenio, please post patch to fix this. Thanks thanks -- PMM
RE: [PATCH] disas/nanomips: Convert nanoMIPS disassembler to C
On Fri, 29 Jul 2022 at 10:18, Peter Maydell wrote: > ... > Is it possible to break this down into smaller pieces so it isn't one single > enormous 5000 line patch ? > > I guess partial conversion is likely to run into compilation difficulties > mid-series; if so we could do "disable the disassembler; convert it; reenable > it". > > The rationale here is code review -- reviewing a single huge patch is > essentially impossible, so it needs to be broken down into coherent smaller > pieces to be reviewable. Hi Peter, Something like 90% of the patch is purely mechanical transformations of which I've excerpted two examples below. (There are 3-4 chunks of mechanical transformations, of which I've shown the most complicated type that accounts for ~60% of the changed lines.) Did you intend to review this by having a human inspect all of these? Would it be feasible instead to provide a sed script to effect the mechanical parts of the patch (the reviewer could review the script, apply it themselves to verify equivalence if desired, and spot check the result) together with a (much smaller) non-mechanical diff? -Vince @@ -2004,17 +1740,17 @@ std::string NMD::ADD_S(uint64 instruction) * rt - * rs - */ -std::string NMD::ADDIU_32_(uint64 instruction) +static char *ADDIU_32_(uint64 instruction) { uint64 rt_value = extract_rt_25_24_23_22_21(instruction); uint64 rs_value = extract_rs_20_19_18_17_16(instruction); uint64 u_value = extract_u_15_to_0(instruction); -std::string rt = GPR(copy(rt_value)); -std::string rs = GPR(copy(rs_value)); -std::string u = IMMEDIATE(copy(u_value)); +const char *rt = GPR(copy_ui(rt_value)); +const char *rs = GPR(copy_ui(rs_value)); +const char *u = IMMEDIATE_UI(copy_ui(u_value)); -return img::format("ADDIU %s, %s, %s", rt, rs, u); +return img_format("ADDIU %s, %s, %s", rt, rs, u); } @@ -2027,15 +1763,15 @@ std::string NMD::ADDIU_32_(uint64 instruction) * rt - * rs - */ -std::string NMD::ADDIU_48_(uint64 instruction) +static char *ADDIU_48_(uint64 instruction) { uint64 rt_value = extract_rt_41_40_39_38_37(instruction); int64 s_value = extract_s__se31_15_to_0_31_to_16(instruction); -std::string rt = GPR(copy(rt_value)); -std::string s = IMMEDIATE(copy(s_value)); +const char *rt = GPR(copy_ui(rt_value)); +const char *s = IMMEDIATE_I(copy_i(s_value)); -return img::format("ADDIU %s, %s", rt, s); +return img_format("ADDIU %s, %s", rt, s); }
Re: [RFC 0/3] Add Generic SPI GPIO model
On Sun, 31 Jul 2022, at 06:48, Cédric Le Goater wrote: > On 7/29/22 19:30, Peter Delevoryas wrote: >> Certainly we'd like to use IRQ's instead, but she ran into correctness >> problems. Maybe we can investigate that further and fix it. Yes, let's not work around problems that we have the ability to fix. > > So much is happening in that mode. Yes, though while there's no-doubt accidental complexity in terms of its implementation, the underlying hardware is also quite haphazard and we need an approach that scales to the large number of GPIOs it provides. So I'd argue there's a bunch of essential complexity involved as well. Do we start with a fresh implementation that allows us to get the expected behaviour and then build that out to replace the current implementation? Or, can we add more tests for the existing model to pin down where the bugs are? > We need more trace events in the Aspeed > GPIO model at least an extra in aspeed_gpio_update() We can always fall back to this but my hope is we can get better test coverage to iron out the bugs. Maybe that gets us a more refined and maintainable model implementation along the way. Cheers, Andrew
[RFC v5 08/11] virtio-blk: add zoned storage APIs for zoned devices
This patch extends virtio-blk emulation to handle zoned device commands by calling the new block layer APIs to perform zoned device I/O on behalf of the guest. It supports Report Zone, and four zone oparations (open, close, finish, reset). The virtio-blk zoned device command specifications is currently in the reviewing process. VIRTIO_BLK_F_ZONED will only be set if the host does support zoned block devices. The regular block device will not be set. The guest os having zoned device support can use blkzone(8) to test those commands. Signed-off-by: Sam Li --- block/block-backend.c | 92 hw/block/virtio-blk.c | 172 +- include/sysemu/block-backend-io.h | 6 ++ 3 files changed, 268 insertions(+), 2 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index ef6a1f33d5..8f2cfcbd9d 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo { void *iobuf; int ret; BdrvRequestFlags flags; +union { +struct { +unsigned int *nr_zones; +BlockZoneDescriptor *zones; +} zone_report; +struct { +BlockZoneOp op; +} zone_mgmt; +}; } BlkRwCo; int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags) @@ -1775,6 +1784,89 @@ int coroutine_fn blk_co_flush(BlockBackend *blk) return ret; } +static void blk_aio_zone_report_entry(void *opaque) { +BlkAioEmAIOCB *acb = opaque; +BlkRwCo *rwco = &acb->rwco; + +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset, + rwco->zone_report.nr_zones, + rwco->zone_report.zones); +blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, +unsigned int *nr_zones, +BlockZoneDescriptor *zones, +BlockCompletionFunc *cb, void *opaque) +{ +BlkAioEmAIOCB *acb; +Coroutine *co; + +blk_inc_in_flight(blk); +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); +acb->rwco = (BlkRwCo) { +.blk= blk, +.offset = offset, +.ret= NOT_DONE, +.zone_report = { +.zones = zones, +.nr_zones = nr_zones, +}, +}; +acb->has_returned = false; + +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); + +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} + +return &acb->common; +} + +static void blk_aio_zone_mgmt_entry(void *opaque) { +BlkAioEmAIOCB *acb = opaque; +BlkRwCo *rwco = &acb->rwco; + +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op, + rwco->offset, acb->bytes); +blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len, + BlockCompletionFunc *cb, void *opaque) { +BlkAioEmAIOCB *acb; +Coroutine *co; + +blk_inc_in_flight(blk); +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); +acb->rwco = (BlkRwCo) { +.blk= blk, +.offset = offset, +.ret= NOT_DONE, +.zone_mgmt = { +.op = op, +}, +}; +acb->bytes = len; +acb->has_returned = false; + +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); + +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} + +return &acb->common; +} + /* * Send a zone_report command. * offset is a byte offset from the start of the device. No alignment diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index e9ba752f6b..9722f447a2 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -37,6 +37,7 @@ /* Config size before the discard support (hide associated config fields) */ #define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \ max_discard_sectors) + /* * Starting from the discard feature, we can use this array to properly * set the config size depending on the features enabled. @@ -46,6 +47,8 @@ static const VirtIOFeature feature_sizes[] = { .end = endof(struct virtio_blk_config, discard_sector_alignment)}, {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, +{.flags = 1ULL << VIRTIO_BLK_F_ZONED, + .end = endo
[RFC v5 11/11] docs/zoned-storage: add zoned device documentation
Add the documentation about the zoned device support to virtio-blk emulation. Signed-off-by: Sam Li --- docs/devel/zoned-storage.rst | 68 ++ docs/system/qemu-block-drivers.rst.inc | 6 +++ 2 files changed, 74 insertions(+) create mode 100644 docs/devel/zoned-storage.rst diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst new file mode 100644 index 00..e62927dceb --- /dev/null +++ b/docs/devel/zoned-storage.rst @@ -0,0 +1,68 @@ += +zoned-storage += + +Zoned Block Devices (ZBDs) devide the LBA space to block regions called zones +that are larger than the LBA size. It can only allow sequential writes, which +reduces write amplification in SSD, leading to higher throughput and increased +capacity. More details about ZBDs can be found at: + +https://zonedstorage.io/docs/introduction/zoned-storage + +zone emulation +-- +In its current status, the virtio-blk device is not aware of ZBDs but the guest +sees host-managed drives as regular drive that will runs correctly under the +most common write workloads. + +The zoned device support aims to let guests (virtual machines) access zoned +storage devices on the host (hypervisor) through a virtio-blk device. This +involves extending QEMU's block layer and virtio-blk emulation code. + +If the host supports zoned block devices, it can set VIRTIO_BLK_F_ZONED. Then +in the guest side, it appears following situations: +1) If the guest virtio-blk driver sees the VIRTIO_BLK_F_ZONED bit set, then it +will assume that the zoned characteristics fields of the config space are valid. +2) If the guest virtio-blk driver sees a zoned model that is NONE, then it is +known that is a regular block device. +3) If the guest virtio-blk driver sees a zoned model that is HM(or HA), then it +is known that is a zoned block device and probes the other zone fields. + +On QEMU sides, +1) The DEFINE PROP BIT macro must be used to declare that the host supports +zones. +2) BlockDrivers can declare zoned device support once known the zoned model +for the block device is not NONE. + +zoned storage APIs +-- + +Zone emulation part extends the block layer APIs and virtio-blk emulation section +with the minimum set of zoned commands that are necessary to support zoned +devices. The commands are - Report Zones, four zone operations and Zone Append +(developing). + +testing +--- + +It can be tested on a null_blk device using qemu-io, qemu-iotests or blkzone(8) +command in the guest os. + +1. For example, the command line for zone report using qemu-io is: + +$ path/to/qemu-io --image-opts driver=zoned_host_device,filename=/dev/nullb0 -c +"zrp offset nr_zones" + +To enable zoned device in the guest os, the guest kernel must have the virtio-blk +driver with ZBDs support. The link to such patches for the kernel is: + +https://github.com/dmitry-fomichev/virtblk-zbd + +Then, add the following options to the QEMU command line: +-blockdev node-name=drive0,driver=zoned_host_device,filename=/dev/nullb0 + +After the guest os booting, use blkzone(8) to test zone operations: +blkzone report -o offset -c nr_zones /dev/vda + +2. We can also use the qemu-iotests in ./tests/qemu-iotests/tests/zoned.sh. + diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc index dfe5d2293d..2a761a4b80 100644 --- a/docs/system/qemu-block-drivers.rst.inc +++ b/docs/system/qemu-block-drivers.rst.inc @@ -430,6 +430,12 @@ Hard disks you may corrupt your host data (use the ``-snapshot`` command line option or modify the device permissions accordingly). +Zoned block devices + Zoned block devices can passed through to the guest if the emulated storage + controller supports zoned storage. Use ``--blockdev zoned_host_device, + node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0`` + as ``drive0``. + Windows ^^^ -- 2.37.1
[RFC v5 09/11] qemu-io: add zoned block device operations.
Add zoned storage commands of the device: zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), zone_finish(zf). For example, to test zone_report, use following command: $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0 -c "zrp offset nr_zones" Signed-off-by: Sam Li --- block/io.c | 24 ++--- qemu-io-cmds.c | 144 + 2 files changed, 148 insertions(+), 20 deletions(-) diff --git a/block/io.c b/block/io.c index a4625fb0e1..de9ec1d740 100644 --- a/block/io.c +++ b/block/io.c @@ -3209,19 +3209,11 @@ int bdrv_co_zone_report(BlockDriverState *bs, int64_t offset, IO_CODE(); bdrv_inc_in_flight(bs); -if (!drv || (!drv->bdrv_co_zone_report)) { +if (!drv || !drv->bdrv_co_zone_report) { co.ret = -ENOTSUP; goto out; } - -if (drv->bdrv_co_zone_report) { -co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones); -} else { -co.ret = -ENOTSUP; -goto out; -qemu_coroutine_yield(); -} - +co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones); out: bdrv_dec_in_flight(bs); return co.ret; @@ -3237,19 +3229,11 @@ int bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, IO_CODE(); bdrv_inc_in_flight(bs); -if (!drv || (!drv->bdrv_co_zone_mgmt)) { +if (!drv || !drv->bdrv_co_zone_mgmt) { co.ret = -ENOTSUP; goto out; } - -if (drv->bdrv_co_zone_mgmt) { -co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len); -} else { -co.ret = -ENOTSUP; -goto out; -qemu_coroutine_yield(); -} - +co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len); out: bdrv_dec_in_flight(bs); return co.ret; diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index 952dc940f1..5a215277c7 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -1712,6 +1712,145 @@ static const cmdinfo_t flush_cmd = { .oneline= "flush all in-core file state to disk", }; +static int zone_report_f(BlockBackend *blk, int argc, char **argv) +{ +int ret; +int64_t offset; +unsigned int nr_zones; + +++optind; +offset = cvtnum(argv[optind]); +++optind; +nr_zones = cvtnum(argv[optind]); + +g_autofree BlockZoneDescriptor *zones = NULL; +zones = g_new(BlockZoneDescriptor, nr_zones); +ret = blk_zone_report(blk, offset, &nr_zones, zones); +if (ret < 0) { +printf("zone report failed: %s\n", strerror(-ret)); +} else { +for (int i = 0; i < nr_zones; ++i) { +printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", " + "cap"" 0x%" PRIx64 ",wptr 0x%" PRIx64 ", " + "zcond:%u, [type: %u]\n", + zones[i].start, zones[i].length, zones[i].cap, zones[i].wp, + zones[i].cond, zones[i].type); +} +} +return ret; +} + +static const cmdinfo_t zone_report_cmd = { +.name = "zone_report", +.altname = "zrp", +.cfunc = zone_report_f, +.argmin = 2, +.argmax = 2, +.args = "offset number", +.oneline = "report zone information", +}; + +static int zone_open_f(BlockBackend *blk, int argc, char **argv) +{ +int ret; +int64_t offset, len; +++optind; +offset = cvtnum(argv[optind]); +++optind; +len = cvtnum(argv[optind]); +ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len); +if (ret < 0) { +printf("zone open failed: %s\n", strerror(-ret)); +} +return ret; +} + +static const cmdinfo_t zone_open_cmd = { +.name = "zone_open", +.altname = "zo", +.cfunc = zone_open_f, +.argmin = 2, +.argmax = 2, +.args = "offset len", +.oneline = "explicit open a range of zones in zone block device", +}; + +static int zone_close_f(BlockBackend *blk, int argc, char **argv) +{ +int ret; +int64_t offset, len; +++optind; +offset = cvtnum(argv[optind]); +++optind; +len = cvtnum(argv[optind]); +ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len); +if (ret < 0) { +printf("zone close failed: %s\n", strerror(-ret)); +} +return ret; +} + +static const cmdinfo_t zone_close_cmd = { +.name = "zone_close", +.altname = "zc", +.cfunc = zone_close_f, +.argmin = 2, +.argmax = 2, +.args = "offset len", +.oneline = "close a range of zones in zone block device", +}; + +static int zone_finish_f(BlockBackend *blk, int argc, char **argv) +{ +int ret; +int64_t offset, len; +++optind; +offset = cvtnum(argv[optind]); +++optind; +len = cvtnum(argv[optind]); +ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len); +if (ret < 0) { +printf("zone finish failed: %s\n", strerror(-ret)); +} +return ret; +} + +static const cmdinfo_t zone_finish_cmd = { +.name = "zone_finish", +.altname = "zf", +
[RFC v5 06/11] raw-format: add zone operations to pass through requests
raw-format driver usually sits on top of file-posix driver. It needs to pass through requests of zone commands. Signed-off-by: Sam Li --- block/raw-format.c | 13 + 1 file changed, 13 insertions(+) diff --git a/block/raw-format.c b/block/raw-format.c index 69fd650eaf..6b20bd22ef 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -314,6 +314,17 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs, return bdrv_co_pdiscard(bs->file, offset, bytes); } +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones) { +return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones); +} + +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, + int64_t offset, int64_t len) { +return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len); +} + static int64_t raw_getlength(BlockDriverState *bs) { int64_t len; @@ -614,6 +625,8 @@ BlockDriver bdrv_raw = { .bdrv_co_pwritev = &raw_co_pwritev, .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes, .bdrv_co_pdiscard = &raw_co_pdiscard, +.bdrv_co_zone_report = &raw_co_zone_report, +.bdrv_co_zone_mgmt = &raw_co_zone_mgmt, .bdrv_co_block_status = &raw_co_block_status, .bdrv_co_copy_range_from = &raw_co_copy_range_from, .bdrv_co_copy_range_to = &raw_co_copy_range_to, -- 2.37.1
[RFC v5 10/11] qemu-iotests: test new zone operations
We have added new block layer APIs of zoned block devices. Test it with: Create a null_blk device, run each zone operation on it and see whether reporting right zone information. Signed-off-by: Sam Li --- tests/qemu-iotests/tests/zoned.out | 53 ++ tests/qemu-iotests/tests/zoned.sh | 86 ++ 2 files changed, 139 insertions(+) create mode 100644 tests/qemu-iotests/tests/zoned.out create mode 100755 tests/qemu-iotests/tests/zoned.sh diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out new file mode 100644 index 00..d09be2ffcd --- /dev/null +++ b/tests/qemu-iotests/tests/zoned.out @@ -0,0 +1,53 @@ +QA output created by zoned.sh +Testing a null_blk device: +Simple cases: if the operations work +(1) report the first zone: +start: 0x0, len 0x8, cap 0x8,wptr 0x0, zcond:1, [type: 2] + +report the first 10 zones +start: 0x0, len 0x8, cap 0x8,wptr 0x0, zcond:1, [type: 2] +start: 0x8, len 0x8, cap 0x8,wptr 0x8, zcond:1, [type: 2] +start: 0x10, len 0x8, cap 0x8,wptr 0x10, zcond:1, [type: 2] +start: 0x18, len 0x8, cap 0x8,wptr 0x18, zcond:1, [type: 2] +start: 0x20, len 0x8, cap 0x8,wptr 0x20, zcond:1, [type: 2] +start: 0x28, len 0x8, cap 0x8,wptr 0x28, zcond:1, [type: 2] +start: 0x30, len 0x8, cap 0x8,wptr 0x30, zcond:1, [type: 2] +start: 0x38, len 0x8, cap 0x8,wptr 0x38, zcond:1, [type: 2] +start: 0x40, len 0x8, cap 0x8,wptr 0x40, zcond:1, [type: 2] +start: 0x48, len 0x8, cap 0x8,wptr 0x48, zcond:1, [type: 2] + +report the last zone: +start: 0x1f38, len 0x8, cap 0x8,wptr 0x1f38, zcond:1, [type: 2] + + +(2) opening the first zone +report after: +start: 0x0, len 0x8, cap 0x8,wptr 0x0, zcond:3, [type: 2] + +opening the second zone +report after: +start: 0x8, len 0x8, cap 0x8,wptr 0x8, zcond:3, [type: 2] + +opening the last zone +report after: +start: 0x1f38, len 0x8, cap 0x8,wptr 0x1f38, zcond:3, [type: 2] + + +(3) closing the first zone +report after: +start: 0x0, len 0x8, cap 0x8,wptr 0x0, zcond:1, [type: 2] + +closing the last zone +report after: +start: 0x1f38, len 0x8, cap 0x8,wptr 0x1f38, zcond:1, [type: 2] + + +(4) finishing the second zone +After finishing a zone: +start: 0x8, len 0x8, cap 0x8,wptr 0x10, zcond:14, [type: 2] + + +(5) resetting the second zone +After resetting a zone: +start: 0x8, len 0x8, cap 0x8,wptr 0x8, zcond:1, [type: 2] +*** done diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh new file mode 100755 index 00..db68aa88d4 --- /dev/null +++ b/tests/qemu-iotests/tests/zoned.sh @@ -0,0 +1,86 @@ +#!/usr/bin/env bash +# +# Test zone management operations. +# + +seq="$(basename $0)" +echo "QA output created by $seq" +status=1 # failure is the default! + +_cleanup() +{ + _cleanup_test_img + sudo rmmod null_blk +} +trap "_cleanup; exit \$status" 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter +. ./common.qemu + +# This test only runs on Linux hosts with raw image files. +_supported_fmt raw +_supported_proto file +_supported_os Linux + +QEMU_IO="build/qemu-io" +IMG="--image-opts driver=zoned_host_device,filename=/dev/nullb0" +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT + +echo "Testing a null_blk device:" +echo "Simple cases: if the operations work" +sudo modprobe null_blk nr_devices=1 zoned=1 + +echo "(1) report the first zone:" +sudo $QEMU_IO $IMG -c "zrp 0 1" +echo +echo "report the first 10 zones" +sudo $QEMU_IO $IMG -c "zrp 0 10" +echo +echo "report the last zone:" +sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2" +echo +echo +echo "(2) opening the first zone" +sudo $QEMU_IO $IMG -c "zo 0 0x8" +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 0 1" +echo +echo "opening the second zone" +sudo $QEMU_IO $IMG -c "zo 524288 0x8" # 524288 is the zone sector size +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 268435456 1" # 268435456 / 512 = 524288 +echo +echo "opening the last zone" +sudo $QEMU_IO $IMG -c "zo 0x1f38 0x8" +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2" +echo +echo +echo "(3) closing the first zone" +sudo $QEMU_IO $IMG -c "zc 0 0x8" +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 0 1" +echo +echo "closing the last zone" +sudo $QEMU_IO $IMG -c "zc 0x1f38 0x8" +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2" +echo +echo +echo "(4) finishing the second zone" +sudo $QEMU_IO $IMG -c "zf 524288 0x8" +echo "After finishing a zone:" +sudo $QEMU_IO $IMG -c "zrp 268435456 1" +echo +echo + +echo "(5) resetting the second zone" +sudo $QEMU_IO $IMG -c "zrs 524288 0x8" +echo "After resetting a zone:" +sudo $QEMU_IO $IMG -c "zrp 268435456 1" +# success, all done +echo "*** done" +rm -f $seq.full
[RFC v5 07/11] config: add check to block layer
Putting zoned/non-zoned BlockDrivers on top of each other is not allowed. Signed-off-by: Sam Li --- block.c | 13 + block/file-posix.c | 2 ++ block/raw-format.c | 1 + include/block/block_int-common.h | 10 ++ 4 files changed, 26 insertions(+) diff --git a/block.c b/block.c index bc85f46eed..8a259b158c 100644 --- a/block.c +++ b/block.c @@ -7947,6 +7947,19 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs, return; } +/* + * Non-zoned block drivers do not follow zoned storage constraints + * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned + * drivers in a graph. + */ +if (!parent_bs->drv->supports_zoned_children && child_bs->drv->is_zoned) { +error_setg(errp, "Cannot add a %s child to a %s parent", + child_bs->drv->is_zoned ? "zoned" : "non-zoned", + parent_bs->drv->supports_zoned_children ? + "support zoned children" : "not support zoned children"); +return; +} + if (!QLIST_EMPTY(&child_bs->parents)) { error_setg(errp, "The node %s already has a parent", child_bs->node_name); diff --git a/block/file-posix.c b/block/file-posix.c index 6c045eb6e8..8eb0b7bc9b 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -4023,6 +4023,8 @@ static BlockDriver bdrv_zoned_host_device = { .format_name = "zoned_host_device", .protocol_name = "zoned_host_device", .instance_size = sizeof(BDRVRawState), +.is_zoned = true, +.supports_zoned_children = true, .bdrv_needs_filename = true, .bdrv_probe_device = hdev_probe_device, .bdrv_parse_filename = zoned_host_device_parse_filename, diff --git a/block/raw-format.c b/block/raw-format.c index 6b20bd22ef..9441536819 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -614,6 +614,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c, BlockDriver bdrv_raw = { .format_name = "raw", .instance_size= sizeof(BDRVRawState), +.supports_zoned_children = true, .bdrv_probe = &raw_probe, .bdrv_reopen_prepare = &raw_reopen_prepare, .bdrv_reopen_commit = &raw_reopen_commit, diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index de44c7b6f4..0476cd0491 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -126,6 +126,16 @@ struct BlockDriver { */ bool is_format; +/* + * Set to true if the BlockDriver is a zoned block driver. + */ +bool is_zoned; + +/* + * Set to true if the BlockDriver supports zoned children. + */ +bool supports_zoned_children; + /* * Drivers not implementing bdrv_parse_filename nor bdrv_open should have * this field set to true, except ones that are defined only by their -- 2.37.1
[RFC v5 03/11] file-posix: introduce get_sysfs_long_val for the long sysfs attribute
Use sysfs attribute files to get the long value of zoned device information. Signed-off-by: Sam Li --- block/file-posix.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index 48cd096624..bcf898f0cb 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1210,15 +1210,19 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st) #endif } -static int hdev_get_max_segments(int fd, struct stat *st) -{ +/* + * Get zoned device information (chunk_sectors, zoned_append_max_bytes, + * max_open_zones, max_active_zones) through sysfs attribute files. + */ +static long get_sysfs_long_val(int fd, struct stat *st, + const char *attribute) { #ifdef CONFIG_LINUX char buf[32]; const char *end; char *sysfspath = NULL; int ret; int sysfd = -1; -long max_segments; +long val; if (S_ISCHR(st->st_mode)) { if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) { @@ -1231,8 +1235,9 @@ static int hdev_get_max_segments(int fd, struct stat *st) return -ENOTSUP; } -sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments", -major(st->st_rdev), minor(st->st_rdev)); +sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s", +major(st->st_rdev), minor(st->st_rdev), +attribute); sysfd = open(sysfspath, O_RDONLY); if (sysfd == -1) { ret = -errno; @@ -1250,9 +1255,9 @@ static int hdev_get_max_segments(int fd, struct stat *st) } buf[ret] = 0; /* The file is ended with '\n', pass 'end' to accept that. */ -ret = qemu_strtol(buf, &end, 10, &max_segments); +ret = qemu_strtol(buf, &end, 10, &val); if (ret == 0 && end && *end == '\n') { -ret = max_segments; +ret = val; } out: @@ -1266,6 +1271,10 @@ out: #endif } +static int hdev_get_max_segments(int fd, struct stat *st) { +return get_sysfs_long_val(fd, st, "max_segments"); +} + static void raw_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVRawState *s = bs->opaque; -- 2.37.1
[RFC v5 05/11] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
By adding zone management operations in BlockDriver, storage controller emulation can use the new block layer APIs including Report Zone and four zone management operations (open, close, finish, reset). BlockDriver can get zone information from null_blk device by refreshing BLockLimits. Signed-off-by: Sam Li --- block/block-backend.c| 47 ++ block/coroutines.h | 6 + block/file-posix.c | 272 ++- block/io.c | 57 +++ include/block/block-common.h | 1 - include/block/block-io.h | 13 ++ include/block/block_int-common.h | 22 ++- include/block/raw-aio.h | 6 +- meson.build | 1 + qapi/block-core.json | 7 +- 10 files changed, 426 insertions(+), 6 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index d4a5df2ac2..ef6a1f33d5 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1775,6 +1775,53 @@ int coroutine_fn blk_co_flush(BlockBackend *blk) return ret; } +/* + * Send a zone_report command. + * offset is a byte offset from the start of the device. No alignment + * required for offset. + * nr_zones represents IN maximum and OUT actual. + */ +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, +unsigned int *nr_zones, +BlockZoneDescriptor *zones) +{ +int ret; +IO_CODE(); + +blk_inc_in_flight(blk); /* increase before waiting */ +blk_wait_while_drained(blk); +if (!blk_is_available(blk)) { +return -ENOMEDIUM; +} +ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones); +blk_dec_in_flight(blk); +return ret; +} + +/* + * Send a zone_management command. + * offset is the starting zone specified as a sector offset. + * len is the maximum number of sectors the command should operate on. + */ +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op, +int64_t offset, int64_t len) +{ +int ret; +IO_CODE(); + +ret = blk_check_byte_request(blk, offset, len); +if (ret < 0) +return ret; +blk_inc_in_flight(blk); +blk_wait_while_drained(blk); +if (!blk_is_available(blk)) { +return -ENOMEDIUM; +} +ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len); +blk_dec_in_flight(blk); +return ret; +} + void blk_drain(BlockBackend *blk) { BlockDriverState *bs = blk_bs(blk); diff --git a/block/coroutines.h b/block/coroutines.h index 3a2bad564f..e3f62d94e5 100644 --- a/block/coroutines.h +++ b/block/coroutines.h @@ -63,6 +63,12 @@ nbd_co_do_establish_connection(BlockDriverState *bs, bool blocking, Error **errp); +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, +unsigned int *nr_zones, +BlockZoneDescriptor *zones); +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len); + /* * "I/O or GS" API functions. These functions can run without * the BQL, but only in one specific iothread/main loop. diff --git a/block/file-posix.c b/block/file-posix.c index 0d8b4acdc7..6c045eb6e8 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -67,6 +67,9 @@ #include #include #include +#if defined(CONFIG_BLKZONED) +#include +#endif #include #include #include @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData { PreallocMode prealloc; Error **errp; } truncate; +struct { +unsigned int *nr_zones; +BlockZoneDescriptor *zones; +} zone_report; +struct { +BlockZoneOp op; +} zone_mgmt; }; } RawPosixAIOData; @@ -1386,7 +1396,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) #endif if (bs->sg || S_ISBLK(st.st_mode)) { -int ret = hdev_get_max_hw_transfer(s->fd, &st); +ret = hdev_get_max_hw_transfer(s->fd, &st); if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) { bs->bl.max_hw_transfer = ret; @@ -1402,6 +1412,27 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) if (ret < 0) zoned = BLK_Z_NONE; bs->bl.zoned = zoned; +if (zoned != BLK_Z_NONE) { +ret = get_sysfs_long_val(s->fd, &st, "chunk_sectors"); +if (ret > 0) { +bs->bl.zone_sectors = ret; +} + +ret = get_sysfs_long_val(s->fd, &st, "zone_append_max_bytes"); +if (ret > 0) { +bs->bl.zone_append_max_bytes = ret; +} + +ret = get_sysfs_long_val(s->fd, &st, "max_open_zones"); +if (ret > 0) { +bs->bl.max_open_zones = ret; +} + +ret = get_sysfs_long_val(s->fd, &st, "max_active_zones"); +if (ret > 0) { +bs->bl.ma
[RFC v5 04/11] file-posix: introduce get_sysfs_str_val for device zoned model
Use sysfs attribute files to get the string value of device zoned model. Then get_sysfs_zoned_model can convert it to BlockZoneModel type in QEMU. Signed-off-by: Sam Li --- block/file-posix.c | 86 include/block/block_int-common.h | 3 ++ 2 files changed, 89 insertions(+) diff --git a/block/file-posix.c b/block/file-posix.c index bcf898f0cb..0d8b4acdc7 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1271,6 +1271,85 @@ out: #endif } +/* + * Convert the zoned attribute file in sysfs to internal value. + */ +static int get_sysfs_str_val(int fd, struct stat *st, + const char *attribute, + char **val) { +#ifdef CONFIG_LINUX +char buf[32]; +char *sysfspath = NULL; +int ret, offset; +int sysfd = -1; + +if (S_ISCHR(st->st_mode)) { +if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) { +return ret; +} +return -ENOTSUP; +} + +if (!S_ISBLK(st->st_mode)) { +return -ENOTSUP; +} + +sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s", +major(st->st_rdev), minor(st->st_rdev), +attribute); +sysfd = open(sysfspath, O_RDONLY); +if (sysfd == -1) { +ret = -errno; +goto out; +} +offset = 0; +do { +ret = read(sysfd, buf + offset, sizeof(buf) - 1 + offset); +if (ret > 0) { +offset += ret; +} +} while (ret == -1); +/* The file is ended with '\n' */ +if (buf[ret - 1] == '\n') { +buf[ret - 1] = '\0'; +} + +if (!strncpy(*val, buf, ret)) { +goto out; +} + +out: +if (sysfd != -1) { +close(sysfd); +} +g_free(sysfspath); +return ret; +#else +return -ENOTSUP; +#endif +} + +static int get_sysfs_zoned_model(int fd, struct stat *st, + BlockZoneModel *zoned) { +g_autofree char *val = NULL; +val = g_malloc(32); +get_sysfs_str_val(fd, st, "zoned", &val); +if (!val) { +return -ENOTSUP; +} + +if (strcmp(val, "host-managed") == 0) { +*zoned = BLK_Z_HM; +} else if (strcmp(val, "host-aware") == 0) { +*zoned = BLK_Z_HA; +} else if (strcmp(val, "none") == 0) { +*zoned = BLK_Z_NONE; +} else { +return -ENOTSUP; +} +return 0; +} + static int hdev_get_max_segments(int fd, struct stat *st) { return get_sysfs_long_val(fd, st, "max_segments"); } @@ -1279,6 +1358,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVRawState *s = bs->opaque; struct stat st; +int ret; +BlockZoneModel zoned; s->needs_alignment = raw_needs_alignment(bs); raw_probe_alignment(bs, s->fd, errp); @@ -1316,6 +1397,11 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) bs->bl.max_hw_iov = ret; } } + +ret = get_sysfs_zoned_model(s->fd, &st, &zoned); +if (ret < 0) +zoned = BLK_Z_NONE; +bs->bl.zoned = zoned; } static int check_for_dasd(int fd) diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index 8947abab76..7f7863cc9e 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -825,6 +825,9 @@ typedef struct BlockLimits { /* maximum number of iovec elements */ int max_iov; + +/* device zone model */ +BlockZoneModel zoned; } BlockLimits; typedef struct BdrvOpBlocker BdrvOpBlocker; -- 2.37.1
[RFC v5 02/11] include: import virtio_blk headers from linux with zoned storage support
Add file from Dmitry's "virtio-blk:add support for zoned block devices" linux patch using scripts/update-linux-headers.sh. There is a link for more information: https://github.com/dmitry-fomichev/virtblk-zbd Signed-off-by: Sam Li --- include/standard-headers/linux/virtio_blk.h | 118 1 file changed, 118 insertions(+) diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h index 2dcc90826a..5c6856aec3 100644 --- a/include/standard-headers/linux/virtio_blk.h +++ b/include/standard-headers/linux/virtio_blk.h @@ -40,6 +40,7 @@ #define VIRTIO_BLK_F_MQ12 /* support more than one vq */ #define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ #define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ +#define VIRTIO_BLK_F_ZONED 17 /* Zoned block device */ /* Legacy feature bits */ #ifndef VIRTIO_BLK_NO_LEGACY @@ -119,6 +120,20 @@ struct virtio_blk_config { uint8_t write_zeroes_may_unmap; uint8_t unused1[3]; + + /* Secure erase fields that are defined in the virtio spec */ + uint8_t sec_erase[12]; + + /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */ + struct virtio_blk_zoned_characteristics { + __virtio32 zone_sectors; + __virtio32 max_open_zones; + __virtio32 max_active_zones; + __virtio32 max_append_sectors; + __virtio32 write_granularity; + uint8_t model; + uint8_t unused2[3]; + } zoned; } QEMU_PACKED; /* @@ -153,6 +168,24 @@ struct virtio_blk_config { /* Write zeroes command */ #define VIRTIO_BLK_T_WRITE_ZEROES 13 +/* Zone append command */ +#define VIRTIO_BLK_T_ZONE_APPEND15 + +/* Report zones command */ +#define VIRTIO_BLK_T_ZONE_REPORT16 + +/* Open zone command */ +#define VIRTIO_BLK_T_ZONE_OPEN 18 + +/* Close zone command */ +#define VIRTIO_BLK_T_ZONE_CLOSE 20 + +/* Finish zone command */ +#define VIRTIO_BLK_T_ZONE_FINISH22 + +/* Reset zone command */ +#define VIRTIO_BLK_T_ZONE_RESET 24 + #ifndef VIRTIO_BLK_NO_LEGACY /* Barrier before this op. */ #define VIRTIO_BLK_T_BARRIER 0x8000 @@ -172,6 +205,84 @@ struct virtio_blk_outhdr { __virtio64 sector; }; +/* + * Supported zoned device models. + */ + +/* Regular block device */ +#define VIRTIO_BLK_Z_NONE 0 +/* Host-managed zoned device */ +#define VIRTIO_BLK_Z_HM1 +/* Host-aware zoned device */ +#define VIRTIO_BLK_Z_HA2 + +/* ZBD Management Out ALL flag */ +#define VIRTIO_BLK_ZONED_FLAG_ALL (1 << 0) + +/* + * Header for VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE, + * VIRTIO_BLK_T_ZONE_RESET, VIRTIO_BLK_T_ZONE_FINISH requests. + */ +struct virtio_blk_zone_mgmt_outhdr { + /* Zoned request flags */ + __virtio32 flags; +}; + +/* + * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply. + */ +struct virtio_blk_zone_descriptor { + /* Zone capacity */ + __virtio64 z_cap; + /* The starting sector of the zone */ + __virtio64 z_start; + /* Zone write pointer position in sectors */ + __virtio64 z_wp; + /* Zone type */ + uint8_t z_type; + /* Zone state */ + uint8_t z_state; + uint8_t reserved[38]; +}; + +struct virtio_blk_zone_report { + __virtio64 nr_zones; + uint8_t reserved[56]; + struct virtio_blk_zone_descriptor zones[]; +}; + +/* + * Supported zone types. + */ + +/* Conventional zone */ +#define VIRTIO_BLK_ZT_CONV 1 +/* Sequential Write Required zone */ +#define VIRTIO_BLK_ZT_SWR 2 +/* Sequential Write Preferred zone */ +#define VIRTIO_BLK_ZT_SWP 3 + +/* + * Zone states that are available for zones of all types. + */ + +/* Not a write pointer (conventional zones only) */ +#define VIRTIO_BLK_ZS_NOT_WP 0 +/* Empty */ +#define VIRTIO_BLK_ZS_EMPTY1 +/* Implicitly Open */ +#define VIRTIO_BLK_ZS_IOPEN2 +/* Explicitly Open */ +#define VIRTIO_BLK_ZS_EOPEN3 +/* Closed */ +#define VIRTIO_BLK_ZS_CLOSED 4 +/* Read-Only */ +#define VIRTIO_BLK_ZS_RDONLY 13 +/* Full */ +#define VIRTIO_BLK_ZS_FULL 14 +/* Offline */ +#define VIRTIO_BLK_ZS_OFFLINE 15 + /* Unmap this range (only valid for write zeroes command) */ #define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x0001 @@ -198,4 +309,11 @@ struct virtio_scsi_inhdr { #define VIRTIO_BLK_S_OK0 #define VIRTIO_BLK_S_IOERR 1 #define VIRTIO_BLK_S_UNSUPP2 + +/* Error codes that are specific to zoned block devices */ +#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3 +#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP4 +#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5 +#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6 + #endif /* _LINUX_VIRTIO_BLK_H */ -- 2.37.1
[RFC v5 01/11] include: add zoned device structs
Signed-off-by: Sam Li --- include/block/block-common.h | 43 1 file changed, 43 insertions(+) diff --git a/include/block/block-common.h b/include/block/block-common.h index fdb7306e78..c9d28b1c51 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -49,6 +49,49 @@ typedef struct BlockDriver BlockDriver; typedef struct BdrvChild BdrvChild; typedef struct BdrvChildClass BdrvChildClass; +typedef enum BlockZoneOp { +BLK_ZO_OPEN, +BLK_ZO_CLOSE, +BLK_ZO_FINISH, +BLK_ZO_RESET, +} BlockZoneOp; + +typedef enum BlockZoneModel { +BLK_Z_NONE = 0x0, /* Regular block device */ +BLK_Z_HM = 0x1, /* Host-aware zoned block device */ +BLK_Z_HA = 0x2, /* Host-managed zoned block device */ +} BlockZoneModel; + +typedef enum BlockZoneCondition { +BLK_ZS_NOT_WP = 0x0, +BLK_ZS_EMPTY = 0x1, +BLK_ZS_IOPEN = 0x2, +BLK_ZS_EOPEN = 0x3, +BLK_ZS_CLOSED = 0x4, +BLK_ZS_RDONLY = 0xD, +BLK_ZS_FULL = 0xE, +BLK_ZS_OFFLINE = 0xF, +} BlockZoneCondition; + +typedef enum BlockZoneType { +BLK_ZT_CONV = 0x1, +BLK_ZT_SWR = 0x2, +BLK_ZT_SWP = 0x3, +} BlockZoneType; + +/* + * Zone descriptor data structure. + * Provide information on a zone with all position and size values in bytes. + */ +typedef struct BlockZoneDescriptor { +uint64_t start; +uint64_t length; +uint64_t cap; +uint64_t wp; +BlockZoneType type; +BlockZoneCondition cond; +} BlockZoneDescriptor; + typedef struct BlockDriverInfo { /* in bytes, 0 if irrelevant */ int cluster_size; -- 2.37.1
[RFC v5 00/11] Add support for zoned device
Zoned Block Devices (ZBDs) devide the LBA space to block regions called zones that are larger than the LBA size. It can only allow sequential writes, which reduces write amplification in SSD, leading to higher throughput and increased capacity. More details about ZBDs can be found at: https://zonedstorage.io/docs/introduction/zoned-storage The zoned device support aims to let guests (virtual machines) access zoned storage devices on the host (hypervisor) through a virtio-blk device. This involves extending QEMU's block layer and virtio-blk emulation code. In its current status, the virtio-blk device is not aware of ZBDs but the guest sees host-managed drives as regular drive that will runs correctly under the most common write workloads. This patch series extend the block layer APIs and virtio-blk emulation section with the minimum set of zoned commands that are necessary to support zoned devices. The commands are - Report Zones, four zone operations and Zone Append (developing). It can be tested on a null_blk device using qemu-io, qemu-iotests or blkzone(8) command in the guest os. For example, the command line for zone report using qemu-io is: $ path/to/qemu-io --image-opts driver=zoned_host_device,filename=/dev/nullb0 -c "zrp offset nr_zones" To enable zoned device in the guest os, the guest kernel must have the virtio-blk driver with ZBDs support. The link to such patches for the kernel is: https://github.com/dmitry-fomichev/virtblk-zbd Then, add the following options to the QEMU command line: -blockdev node-name=drive0,driver=zoned_host_device,filename=/dev/nullb0 After the guest os booting, use blkzone(8) to test zone operations: blkzone report -o offset -c nr_zones /dev/vda v5: - add zoned storage emulation to virtio-blk device - add documentation for zoned storage - address review comments * fix qemu-iotests * fix check to block layer * modify interfaces of sysfs helper functions * rename zoned device structs according to QEMU styles * reorder patches v4: - add virtio-blk headers for zoned device - add configurations for zoned host device - add zone operations for raw-format - address review comments * fix memory leak bug in zone_report * add checks to block layers * fix qemu-iotests format * fix sysfs helper functions v3: - add helper functions to get sysfs attributes - address review comments * fix zone report bugs * fix the qemu-io code path * use thread pool to avoid blocking ioctl() calls v2: - add qemu-io sub-commands - address review comments * modify interfaces of APIs v1: - add block layer APIs resembling Linux ZoneBlockDevice ioctls Sam Li (11): include: add zoned device structs include: import virtio_blk headers from linux with zoned storage support file-posix: introduce get_sysfs_long_val for the long sysfs attribute file-posix: introduce get_sysfs_str_val for device zoned model block: add block layer APIs resembling Linux ZonedBlockDevice ioctls raw-format: add zone operations to pass through requests config: add check to block layer virtio-blk: add zoned storage APIs for zoned devices qemu-io: add zoned block device operations. qemu-iotests: test new zone operations docs/zoned-storage: add zoned device documentation block.c | 13 + block/block-backend.c | 139 +++ block/coroutines.h | 6 + block/file-posix.c | 383 +++- block/io.c | 41 +++ block/raw-format.c | 14 + docs/devel/zoned-storage.rst| 68 docs/system/qemu-block-drivers.rst.inc | 6 + hw/block/virtio-blk.c | 172 - include/block/block-common.h| 44 ++- include/block/block-io.h| 13 + include/block/block_int-common.h| 35 +- include/block/raw-aio.h | 6 +- include/standard-headers/linux/virtio_blk.h | 118 ++ include/sysemu/block-backend-io.h | 6 + meson.build | 1 + qapi/block-core.json| 7 +- qemu-io-cmds.c | 144 tests/qemu-iotests/tests/zoned.out | 53 +++ tests/qemu-iotests/tests/zoned.sh | 86 + 20 files changed, 1340 insertions(+), 15 deletions(-) create mode 100644 docs/devel/zoned-storage.rst create mode 100644 tests/qemu-iotests/tests/zoned.out create mode 100755 tests/qemu-iotests/tests/zoned.sh -- 2.37.1
[PULL 3/3] Hexagon (tests/tcg/hexagon) reference file for float_convd
The test is in tests/tcg/multiarch/float_convd.c Signed-off-by: Taylor Simpson Acked-by: Richard Henderson Message-Id: <20220718230320.2-4-tsimp...@quicinc.com> --- tests/tcg/hexagon/float_convd.ref | 988 ++ 1 file changed, 988 insertions(+) create mode 100644 tests/tcg/hexagon/float_convd.ref diff --git a/tests/tcg/hexagon/float_convd.ref b/tests/tcg/hexagon/float_convd.ref new file mode 100644 index 00..aba1e13e35 --- /dev/null +++ b/tests/tcg/hexagon/float_convd.ref @@ -0,0 +1,988 @@ +### Rounding to nearest +from double: f64(nan:0x007ff4) + to single: f32(-nan:0x) (INVALID) + to int32: -1 (INVALID) + to int64: -1 (INVALID) + to uint32: -1 (INVALID) + to uint64: -1 (INVALID) +from double: f64(-nan:0x00fff8) + to single: f32(-nan:0x) (OK) + to int32: -1 (INVALID) + to int64: -1 (INVALID) + to uint32: -1 (INVALID) + to uint64: -1 (INVALID) +from double: f64(-inf:0x00fff0) + to single: f32(-inf:0xff80) (OK) + to int32: -2147483648 (INVALID) + to int64: -9223372036854775808 (INVALID) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.f000p+1023:0x00ffef) + to single: f32(-inf:0xff80) (OVERFLOW INEXACT ) + to int32: -2147483648 (INVALID) + to int64: -9223372036854775808 (INVALID) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.fe00p+127:0x00c7efe000) + to single: f32(-0x1.fe00p+127:0xff7f) (OK) + to int32: -2147483648 (INVALID) + to int64: -9223372036854775808 (INVALID) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.fe00p+127:0x00c7efe000) + to single: f32(-0x1.fe00p+127:0xff7f) (OK) + to int32: -2147483648 (INVALID) + to int64: -9223372036854775808 (INVALID) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.1874b135ff654000p+103:0x00c661874b135ff654) + to single: f32(-0x1.1874b200p+103:0xf30c3a59) (INEXACT ) + to int32: -2147483648 (INVALID) + to int64: -9223372036854775808 (INVALID) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.c0bab523323b9000p+99:0x00c62c0bab523323b9) + to single: f32(-0x1.c0bab600p+99:0xf1605d5b) (INEXACT ) + to int32: -2147483648 (INVALID) + to int64: -9223372036854775808 (INVALID) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.p+1:0x00c000) + to single: f32(-0x1.p+1:0xc000) (OK) + to int32: -2 (OK) + to int64: -2 (OK) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.p+0:0x00bff0) + to single: f32(-0x1.p+0:0xbf80) (OK) + to int32: -1 (OK) + to int64: -1 (OK) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.p-1022:0x008010) + to single: f32(-0x0.p+0:0x8000) (UNDERFLOW INEXACT ) + to int32: 0 (INEXACT ) + to int64: 0 (INEXACT ) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(-0x1.p-126:0x00b810) + to single: f32(-0x1.p-126:0x8080) (OK) + to int32: 0 (INEXACT ) + to int64: 0 (INEXACT ) + to uint32: 0 (INVALID) + to uint64: 0 (INVALID) +from double: f64(0x0.p+0:) + to single: f32(0x0.p+0:00) (OK) + to int32: 0 (OK) + to int64: 0 (OK) + to uint32: 0 (OK) + to uint64: 0 (OK) +from double: f64(0x1.p-126:0x003810) + to single: f32(0x1.p-126:0x0080) (OK) + to int32: 0 (INEXACT ) + to int64: 0 (INEXACT ) + to uint32: 0 (INEXACT ) + to uint64: 0 (INEXACT ) +from double: f64(0x1.0001c5f68000p-25:0x003e60001c5f68) + to single: f32(0x1.p-25:0x3300) (INEXACT ) + to int32: 0 (INEXACT ) + to int64: 0 (INEXACT ) + to uint32: 0 (INEXACT ) + to uint64: 0 (INEXACT ) +from double: f64(0x1.e6cb2fa82000p-25:0x003e6e6cb2fa82) + to single: f32(0x1.e600p-25:0x3373) (INEXACT ) + to int32: 0 (INEXACT ) + to int64: 0 (INEXACT ) + to uint32: 0 (INEXACT ) + to uint64: 0 (INEXACT ) +from double: f64(0x1.ff801a9af58a1000p-15:0x003f0ff801a9af58a1) + to single: f32(0x1.ff801a00p-15:0x387fc00d) (INEXACT ) + to int32: 0 (INEXACT ) + to int64: 0 (INEXACT ) + to uint32: 0 (INEXACT ) + to uint64: 0 (INEXACT ) +from double: f64(0x1.0c06a1ef5000p-14:0x003f10c06a1ef5) + to single: f32(0x1.0c00p-14:0x3886) (INEXACT ) + to int32: 0 (INEXACT ) + to int64: 0 (INEXACT ) + to uint32: 0 (INEXACT ) + to uint64: 0 (INEXACT ) +from double: f64(0x1.p+0:0x003ff0) +
[PULL 2/3] Hexagon (tests/tcg/hexagon) Fix alignment in load_unpack.c
The increment used in :brev tests was causing unaligned addresses Change the increment and the relevant expected values Signed-off-by: Taylor Simpson Acked-by: Richard Henderson Message-Id: <20220718230320.2-3-tsimp...@quicinc.com> --- tests/tcg/hexagon/load_unpack.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tests/tcg/hexagon/load_unpack.c b/tests/tcg/hexagon/load_unpack.c index 3575a37a28..4aa26fc388 100644 --- a/tests/tcg/hexagon/load_unpack.c +++ b/tests/tcg/hexagon/load_unpack.c @@ -245,7 +245,7 @@ TEST_pr(loadbsw4_pr, long long, S, 4, 0xff00ff00LL, */ #define BxW_LOAD_pbr(SZ, RES, PTR) \ __asm__( \ -"r4 = #(1 << (16 - 3))\n\t" \ +"r4 = #(1 << (16 - 4))\n\t" \ "m0 = r4\n\t" \ "%0 = mem" #SZ "(%1++m0:brev)\n\t" \ : "=r"(RES), "+r"(PTR) \ @@ -273,15 +273,15 @@ void test_##NAME(void) \ } TEST_pbr(loadbzw2_pbr, int, Z, 0x, -0x00020081, 0x00060085, 0x00040083, 0x00080087) +0x00020081, 0x000a0089, 0x00060085, 0x000e008d) TEST_pbr(loadbsw2_pbr, int, S, 0xff00, -0x00020081, 0x00060085, 0x00040083, 0x00080087) +0x00020081, 0x000aff89, 0x0006ff85, 0x000eff8d) TEST_pbr(loadbzw4_pbr, long long, Z, 0xLL, -0x0004008300020081LL, 0x0008008700060085LL, -0x0006008500040083LL, 0x000a008900080087LL) +0x0004008300020081LL, 0x000c008b000a0089LL, +0x0008008700060085LL, 0x0010008f000e008dLL) TEST_pbr(loadbsw4_pbr, long long, S, 0xff00ff00LL, -0x0004008300020081LL, 0x0008008700060085LL, -0x0006008500040083LL, 0x000a008900080087LL) +0x0004008300020081LL, 0x000cff8b000aff89LL, +0x0008ff870006ff85LL, 0x0010ff8f000eff8dLL) /* -- 2.17.1
[PULL 1/3] Hexagon (target/hexagon) make VyV operands use a unique temp
VyV operand is only used in the vshuff and vdeal instructions. These instructions write to both VyV and VxV operands. In the case where both operands are the same register, we need a separate location for VyV. We use the existing vtmp field in CPUHexagonState. Test case added in tests/tcg/hexagon/hvx_misc.c Signed-off-by: Taylor Simpson Reviewed-by: Richard Henderson Message-Id: <20220718230320.2-2-tsimp...@quicinc.com> --- tests/tcg/hexagon/hvx_misc.c| 45 + target/hexagon/gen_tcg_funcs.py | 9 +++ 2 files changed, 49 insertions(+), 5 deletions(-) diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c index b896f5897e..6e2c9ab3cd 100644 --- a/tests/tcg/hexagon/hvx_misc.c +++ b/tests/tcg/hexagon/hvx_misc.c @@ -498,6 +498,49 @@ static void test_vsubuwsat_dv(void) check_output_w(__LINE__, 2); } +static void test_vshuff(void) +{ +/* Test that vshuff works when the two operands are the same register */ +const uint32_t splat = 0x089be55c; +const uint32_t shuff = 0x454fa926; +MMVector v0, v1; + +memset(expect, 0x12, sizeof(MMVector)); +memset(output, 0x34, sizeof(MMVector)); + +asm volatile("v25 = vsplat(%0)\n\t" + "vshuff(v25, v25, %1)\n\t" + "vmem(%2 + #0) = v25\n\t" + : /* no outputs */ + : "r"(splat), "r"(shuff), "r"(output) + : "v25", "memory"); + +/* + * The semantics of Hexagon are the operands are pass-by-value, so create + * two copies of the vsplat result. + */ +for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) { +v0.uw[i] = splat; +v1.uw[i] = splat; +} +/* Do the vshuff operation */ +for (int offset = 1; offset < MAX_VEC_SIZE_BYTES; offset <<= 1) { +if (shuff & offset) { +for (int k = 0; k < MAX_VEC_SIZE_BYTES; k++) { +if (!(k & offset)) { +uint8_t tmp = v0.ub[k]; +v0.ub[k] = v1.ub[k + offset]; +v1.ub[k + offset] = tmp; +} +} +} +} +/* Put the result in the expect buffer for verification */ +expect[0] = v1; + +check_output_b(__LINE__, 1); +} + int main() { init_buffers(); @@ -533,6 +576,8 @@ int main() test_vadduwsat(); test_vsubuwsat_dv(); +test_vshuff(); + puts(err ? "FAIL" : "PASS"); return err ? 1 : 0; } diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py index 1fd9de95d5..d72c689ad7 100755 --- a/target/hexagon/gen_tcg_funcs.py +++ b/target/hexagon/gen_tcg_funcs.py @@ -1,7 +1,7 @@ #!/usr/bin/env python3 ## -## Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved. +## Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights Reserved. ## ## This program is free software; you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by @@ -164,7 +164,9 @@ def genptr_decl(f, tag, regtype, regid, regno): (regtype, regid, regno)) f.write("const intptr_t %s%sV_off =\n" % \ (regtype, regid)) -if (hex_common.is_tmp_result(tag)): +if (regid == "y"): +f.write("offsetof(CPUHexagonState, vtmp);\n") +elif (hex_common.is_tmp_result(tag)): f.write("ctx_tmp_vreg_off(ctx, %s%sN, 1, true);\n" % \ (regtype, regid)) else: @@ -379,9 +381,6 @@ def genptr_src_read(f, tag, regtype, regid): f.write("vreg_src_off(ctx, %s%sN),\n" % \ (regtype, regid)) f.write("sizeof(MMVector), sizeof(MMVector));\n") -if (not hex_common.skip_qemu_helper(tag)): -f.write("tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \ - (regtype, regid, regtype, regid)) else: print("Bad register parse: ", regtype, regid) elif (regtype == "Q"): -- 2.17.1
[PULL 0/3] Hexagon bug fixes and test improvements
The following changes since commit 3916603e0c1d909e14e09d5ebcbdaa9c9e21adf3: Merge tag 'pull-la-20220729' of https://gitlab.com/rth7680/qemu into staging (2022-07-29 17:39:17 -0700) are available in the Git repository at: https://github.com/quic/qemu tags/pull-hex-20220731 for you to fetch changes up to 7eabb050ea77e529f549ea1ddaaa18e91ae01e34: Hexagon (tests/tcg/hexagon) reference file for float_convd (2022-07-31 16:22:09 -0700) Hexagon bug fixes and test improvements 1) Fixes a bug in qemu-hexagon 2) Fixes a bug in a test case 3) Adds reference file for float_convd test case Taylor Simpson (3): Hexagon (target/hexagon) make VyV operands use a unique temp Hexagon (tests/tcg/hexagon) Fix alignment in load_unpack.c Hexagon (tests/tcg/hexagon) reference file for float_convd tests/tcg/hexagon/hvx_misc.c | 45 ++ tests/tcg/hexagon/load_unpack.c | 14 +- target/hexagon/gen_tcg_funcs.py | 9 +- tests/tcg/hexagon/float_convd.ref | 988 ++ 4 files changed, 1044 insertions(+), 12 deletions(-) create mode 100644 tests/tcg/hexagon/float_convd.ref
[PATCH] ipmi:smbus: Add a check around a memcpy
From: Corey Minyard In one case: memcpy(sid->inmsg + sid->inlen, buf, len); if len == 0 then sid->inmsg + sig->inlen can point to one past the inmsg array if the array is full. We have to allow len == 0 due to some vagueness in the spec, but we don't have to call memcpy. Found by Coverity. This is not a problem in practice, but the results are technically (maybe) undefined. So make Coverity happy. Reported-by: Peter Maydell Signed-off-by: Corey Minyard --- hw/ipmi/smbus_ipmi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) I think this should do it. diff --git a/hw/ipmi/smbus_ipmi.c b/hw/ipmi/smbus_ipmi.c index 9ef9112dd5..d0991ab7f9 100644 --- a/hw/ipmi/smbus_ipmi.c +++ b/hw/ipmi/smbus_ipmi.c @@ -281,7 +281,9 @@ static int ipmi_write_data(SMBusDevice *dev, uint8_t *buf, uint8_t len) */ send = true; } -memcpy(sid->inmsg + sid->inlen, buf, len); +if (len > 0) { +memcpy(sid->inmsg + sid->inlen, buf, len); +} sid->inlen += len; break; } -- 2.25.1
[Bug 1180923] Re: unused memory filled with 0x00 instead of 0xFF
I know this is expired but it's still a problem in qemu 7.0.0. For example, when running MS-DOS 6.22, "mem" reports 0K of upper memory, and memmaker fails to run, complaining that it could not allocate any. I'd love to know if there's a workaround. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1180923 Title: unused memory filled with 0x00 instead of 0xFF Status in QEMU: Expired Bug description: Qemu, ever since it was made (so, since 2003), has this problem in DOS (either PC-DOS or MS-DOS and partly Windows 9x) not recognizing the memory available when the memory is filled with 0x00 but when it is filled with 0xFF it gets recognized properly, where should I patch qemu to solve this memory problem? To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1180923/+subscriptions
Re: virtio: why no full reset on virtio_set_status 0 ?
On 7/29/22 16:00, Claudio Fontana wrote: > On 7/29/22 15:21, Alex Bennée wrote: >> >> Claudio Fontana writes: >> >>> On 7/29/22 12:13, Michael S. Tsirkin wrote: On Fri, Jul 29, 2022 at 11:46:05AM +0200, Claudio Fontana wrote: >>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque) >>> VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev); >>> int i; >>> >>> -virtio_set_status(vdev, 0); >>> if (current_cpu) { >>> /* Guest initiated reset */ >>> vdev->device_endian = virtio_current_cpu_endian(); >>> -- >>> 2.26.2 >> >> As you say this is incomplete ... bout could you share a bit more >> of what issue does this address? >> > > Hi, the problem I am trying to address is a segfault in OVS/dpdk that > looks like this: Sorry I was not clear. What I mean is, you don't yet know why does removing virtio_set_status call here prevent the crash in ovs, do you? >>> >>> I have no idea. Trying to collect logs to figure things out, but as >>> mentioned the logs easily hide the issue. >>> Likely there is just more to study here. >> >> Given the OVS is going off on a NULL ptr deref could it just be it's not >> handling the disabling/reenabling of the virtqueues as you pause and >> restart properly? I could certainly imagine a backend jumping the gun to >> read a queue going very wrong if the current queue state is disabled. >> > > In this case both the ovs buf_addr and buf_iova are NULL, which is a nice > case as they are more detectable, > however I also have segfaults where the addresses are just garbage. > > I wonder whether it's possible that given the fact that the guest is going > away without notification (SIGKILL), > as the guest driver resets the device and communicates with QEMU, QEMU adapts > the state without notifying ovs, > so ovs happily tries to dequeue data from memory that isn't there. But I am > just guessing. > > I am still studying the qemu vhost user side and ovs/dpdk side to try to > understand how this whole thing works. > > Thanks, > > CLaudio > I am pursuing this as a DPDK library issue. It would be cool to have ovs, dpdk and vhost-user with the default test-pmd application somehow hooked up in a basic test in one of these projects.. Thanks, Claudio
Re: virtio: why no full reset on virtio_set_status 0 ?
On 7/28/22 12:24, Cornelia Huck wrote: > On Thu, Jul 28 2022, Claudio Fontana wrote: > >> On 7/28/22 09:43, Claudio Fontana wrote: >>> On 7/28/22 03:27, Jason Wang wrote: On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin wrote: > > On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote: >> Hi Michael and all, >> >> I have started researching a qemu / ovs / dpdk bug: >> >> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf38...@redhat.com/T/ >> >> that seems to be affecting multiple parties in the telco space, >> >> and during this process I noticed that qemu/hw/virtio/virtio.c does not >> do a full virtio reset >> in virtio_set_status, when receiving a status value of 0. >> >> It seems it has always been this way, so I am clearly missing / >> forgetting something basic, >> >> I checked the virtio spec at https://docs.oasis-open.org/ >> >> and from: >> >> " >> 4.1.4.3 Common configuration structure layout >> >> device_status >> The driver writes the device status here (see 2.1). Writing 0 into this >> field resets the device. >> >> " >> >> and >> >> " >> 2.4.1 Device Requirements: Device Reset >> A device MUST reinitialize device status to 0 after receiving a reset. >> " > > Side note: We can also have a reset without writing 0 to the device > status (RESET ccw on the virtio-ccw transport). > >> >> I would conclude that in virtio.c::virtio_set_status we should >> unconditionally do a full virtio_reset. >> >> Instead, we have just the check: >> >> if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) != >> (val & VIRTIO_CONFIG_S_DRIVER_OK)) { >> virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK); >> } >> >> which just sets the started field, >> >> and then we have the call to the virtio device class set_status >> (virtio_net...), >> but the VirtioDevice is not fully reset, as per the virtio_reset() call >> we are missing: >> >> " >> vdev->start_on_kick = false; >> vdev->started = false; >> vdev->broken = false; >> vdev->guest_features = 0; >> vdev->queue_sel = 0; >> vdev->status = 0; >> vdev->disabled = false; >> qatomic_set(&vdev->isr, 0); >> vdev->config_vector = VIRTIO_NO_VECTOR; >> virtio_notify_vector(vdev, vdev->config_vector); >> >> for(i = 0; i < VIRTIO_QUEUE_MAX; i++) { >> ... initialize vdev->vq[i] ... >> } >> " >> >> Doing a full reset seems to fix the problem for me, so I can send >> tentative patches if necessary, >> but what am I missing here? >> >> Thanks, >> >> Claudio >> >> -- >> Claudio Fontana >> Engineering Manager Virtualization, SUSE Labs Core >> >> SUSE Software Solutions Italy Srl > > > So for example for pci: > > case VIRTIO_PCI_STATUS: > > > > > if (vdev->status == 0) { > virtio_pci_reset(DEVICE(proxy)); > } > > FWIW, ccw ends up calling virtio_ccw_reset_virtio() when the driver > issues a reset command, or when it issues a write status 0 command, or > when the generic reset function is invoked. > > > which I suspect is a bug because: > > static void virtio_pci_reset(DeviceState *qdev) > { > VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev); > VirtioBusState *bus = VIRTIO_BUS(&proxy->bus); > PCIDevice *dev = PCI_DEVICE(qdev); > int i; > > virtio_bus_reset(bus); Note that we do virtio_reset() here. >>> >>> >>> Yes, thank you, I completely overlooked it, I noticed this in Michael's >>> response as well. >>> >>> However we end up with multiple calls to k->set_status, one from the >>> virtio_set_status call, >>> and one from the virtio_bus_reset(), which is probably something we don't >>> want. >>> >>> All in all it is not clear what the meaning of virtio_set_status is >>> supposed to be I think, >>> and I wonder what the assumptions are among all the callers. >>> If it is supposed to be an implementation of the virtio standard field as >>> described, I think we should do the reset right then and there, >>> but maybe the true meaning of the function is another one I couldn't >>> understand, since _some_ of the cases are processes there. > > Hm. Maybe there needs to be a distinction between "we're forwarding the > status setting by the driver to the core, take any appropriate action" > and "we've just reset the device, now we just need to zero out the > status field"? Right, and the reset function of virtio already sets ->status to 0 manually as part of the reset. Fundamentally, the only issue I am seeing in qemu is this semantic thing, and the fact that the virtio devic
[PATCH v2 3/5] tests/acpi: allow changes for core_count2 test
Signed-off-by: Julia Suvorova --- tests/qtest/bios-tables-test-allowed-diff.h | 3 +++ tests/data/acpi/q35/APIC.core-count2| 0 tests/data/acpi/q35/DSDT.core-count2| 0 tests/data/acpi/q35/FACP.core-count2| 0 4 files changed, 3 insertions(+) create mode 100644 tests/data/acpi/q35/APIC.core-count2 create mode 100644 tests/data/acpi/q35/DSDT.core-count2 create mode 100644 tests/data/acpi/q35/FACP.core-count2 diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h index dfb8523c8b..e81dc67a2e 100644 --- a/tests/qtest/bios-tables-test-allowed-diff.h +++ b/tests/qtest/bios-tables-test-allowed-diff.h @@ -1 +1,4 @@ /* List of comma-separated changed AML files to ignore */ +"tests/data/acpi/q35/APIC.core-count2", +"tests/data/acpi/q35/DSDT.core-count2", +"tests/data/acpi/q35/FACP.core-count2", diff --git a/tests/data/acpi/q35/APIC.core-count2 b/tests/data/acpi/q35/APIC.core-count2 new file mode 100644 index 00..e69de29bb2 diff --git a/tests/data/acpi/q35/DSDT.core-count2 b/tests/data/acpi/q35/DSDT.core-count2 new file mode 100644 index 00..e69de29bb2 diff --git a/tests/data/acpi/q35/FACP.core-count2 b/tests/data/acpi/q35/FACP.core-count2 new file mode 100644 index 00..e69de29bb2 -- 2.35.3
[PATCH v2 2/5] bios-tables-test: teach test to use smbios 3.0 tables
Introduce the 64-bit entry point. Since we no longer have a total number of structures, stop checking for the new ones at the EOF structure (type 127). Signed-off-by: Julia Suvorova --- tests/qtest/bios-tables-test.c | 95 +- 1 file changed, 71 insertions(+), 24 deletions(-) diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c index 359916c228..e352d5249f 100644 --- a/tests/qtest/bios-tables-test.c +++ b/tests/qtest/bios-tables-test.c @@ -88,8 +88,8 @@ typedef struct { uint64_t rsdp_addr; uint8_t rsdp_table[36 /* ACPI 2.0+ RSDP size */]; GArray *tables; -uint32_t smbios_ep_addr; -struct smbios_21_entry_point smbios_ep_table; +uint64_t smbios_ep_addr[SMBIOS_ENTRY_POINT_TYPE__MAX]; +SmbiosEntryPoint smbios_ep_table; uint16_t smbios_cpu_max_speed; uint16_t smbios_cpu_curr_speed; uint8_t *required_struct_types; @@ -533,10 +533,9 @@ static void test_acpi_asl(test_data *data) free_test_data(&exp_data); } -static bool smbios_ep_table_ok(test_data *data) +static bool smbios_ep2_table_ok(test_data *data, uint32_t addr) { -struct smbios_21_entry_point *ep_table = &data->smbios_ep_table; -uint32_t addr = data->smbios_ep_addr; +struct smbios_21_entry_point *ep_table = &data->smbios_ep_table.ep21; qtest_memread(data->qts, addr, ep_table, sizeof(*ep_table)); if (memcmp(ep_table->anchor_string, "_SM_", 4)) { @@ -559,13 +558,29 @@ static bool smbios_ep_table_ok(test_data *data) return true; } -static void test_smbios_entry_point(test_data *data) +static bool smbios_ep3_table_ok(test_data *data, uint64_t addr) +{ +struct smbios_30_entry_point *ep_table = &data->smbios_ep_table.ep30; + +qtest_memread(data->qts, addr, ep_table, sizeof(*ep_table)); +if (memcmp(ep_table->anchor_string, "_SM3_", 5)) { +return false; +} + +if (acpi_calc_checksum((uint8_t *)ep_table, sizeof *ep_table)) { +return false; +} + +return true; +} + +static SmbiosEntryPointType test_smbios_entry_point(test_data *data) { uint32_t off; /* find smbios entry point structure */ for (off = 0xf; off < 0x10; off += 0x10) { -uint8_t sig[] = "_SM_"; +uint8_t sig[] = "_SM_", sig3[] = "_SM3_"; int i; for (i = 0; i < sizeof sig - 1; ++i) { @@ -574,14 +589,30 @@ static void test_smbios_entry_point(test_data *data) if (!memcmp(sig, "_SM_", sizeof sig)) { /* signature match, but is this a valid entry point? */ -data->smbios_ep_addr = off; -if (smbios_ep_table_ok(data)) { +if (smbios_ep2_table_ok(data, off)) { +data->smbios_ep_addr[SMBIOS_ENTRY_POINT_TYPE_32] = off; +} +} + +for (i = 0; i < sizeof sig3 - 1; ++i) { +sig3[i] = qtest_readb(data->qts, off + i); +} + +if (!memcmp(sig3, "_SM3_", sizeof sig3)) { +if (smbios_ep3_table_ok(data, off)) { +data->smbios_ep_addr[SMBIOS_ENTRY_POINT_TYPE_64] = off; +/* found 64-bit entry point, no need to look for 32-bit one */ break; } } } -g_assert_cmphex(off, <, 0x10); +/* found at least one entry point */ +g_assert_true(data->smbios_ep_addr[SMBIOS_ENTRY_POINT_TYPE_32] || + data->smbios_ep_addr[SMBIOS_ENTRY_POINT_TYPE_64]); + +return data->smbios_ep_addr[SMBIOS_ENTRY_POINT_TYPE_64] ? + SMBIOS_ENTRY_POINT_TYPE_64 : SMBIOS_ENTRY_POINT_TYPE_32; } static inline bool smbios_single_instance(uint8_t type) @@ -625,16 +656,23 @@ static bool smbios_cpu_test(test_data *data, uint32_t addr) return true; } -static void test_smbios_structs(test_data *data) +static void test_smbios_structs(test_data *data, SmbiosEntryPointType ep_type) { DECLARE_BITMAP(struct_bitmap, SMBIOS_MAX_TYPE+1) = { 0 }; -struct smbios_21_entry_point *ep_table = &data->smbios_ep_table; -uint32_t addr = le32_to_cpu(ep_table->structure_table_address); -int i, len, max_len = 0; + +SmbiosEntryPoint *ep_table = &data->smbios_ep_table; +int i = 0, len, max_len = 0; uint8_t type, prv, crt; +uint64_t addr; + +if (ep_type == SMBIOS_ENTRY_POINT_TYPE_32) { +addr = le32_to_cpu(ep_table->ep21.structure_table_address); +} else { +addr = le64_to_cpu(ep_table->ep30.structure_table_address); +} /* walk the smbios tables */ -for (i = 0; i < le16_to_cpu(ep_table->number_of_structures); i++) { +do { /* grab type and formatted area length from struct header */ type = qtest_readb(data->qts, addr); @@ -660,19 +698,28 @@ static void test_smbios_structs(test_data *data) } /* keep track of max. struct size */ -if (max_len < len) { +if (ep_type == SMBIOS_ENTRY_POINT_TYPE_32 && max_len < len) { max_len = len; -g_assert_c
[PATCH v2 4/5] bios-tables-test: add test for number of cores > 255
The new test is run with a large number of cpus and checks if the core_count field in smbios_cpu_test (structure type 4) is correct. Choose q35 as it allows to run with -smp > 255. Signed-off-by: Julia Suvorova --- tests/qtest/bios-tables-test.c | 53 +- 1 file changed, 40 insertions(+), 13 deletions(-) diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c index e352d5249f..cebfa430ac 100644 --- a/tests/qtest/bios-tables-test.c +++ b/tests/qtest/bios-tables-test.c @@ -92,6 +92,8 @@ typedef struct { SmbiosEntryPoint smbios_ep_table; uint16_t smbios_cpu_max_speed; uint16_t smbios_cpu_curr_speed; +uint8_t smbios_core_count; +uint16_t smbios_core_count2; uint8_t *required_struct_types; int required_struct_types_len; QTestState *qts; @@ -631,29 +633,38 @@ static inline bool smbios_single_instance(uint8_t type) } } -static bool smbios_cpu_test(test_data *data, uint32_t addr) +static void smbios_cpu_test(test_data *data, uint32_t addr) { -uint16_t expect_speed[2]; -uint16_t real; +uint8_t core_count, expected_core_count = data->smbios_core_count; +uint16_t speed, core_count2, expected_core_count2 = data->smbios_core_count2; +uint16_t expected_speed[2]; int offset[2]; int i; /* Check CPU speed for backward compatibility */ offset[0] = offsetof(struct smbios_type_4, max_speed); offset[1] = offsetof(struct smbios_type_4, current_speed); -expect_speed[0] = data->smbios_cpu_max_speed ? : 2000; -expect_speed[1] = data->smbios_cpu_curr_speed ? : 2000; +expected_speed[0] = data->smbios_cpu_max_speed ? : 2000; +expected_speed[1] = data->smbios_cpu_curr_speed ? : 2000; for (i = 0; i < 2; i++) { -real = qtest_readw(data->qts, addr + offset[i]); -if (real != expect_speed[i]) { -fprintf(stderr, "Unexpected SMBIOS CPU speed: real %u expect %u\n", -real, expect_speed[i]); -return false; -} +speed = qtest_readw(data->qts, addr + offset[i]); +g_assert_cmpuint(speed, ==, expected_speed[i]); } -return true; +core_count = qtest_readb(data->qts, + addr + offsetof(struct smbios_type_4, core_count)); +core_count2 = qtest_readw(data->qts, + addr + offsetof(struct smbios_type_4, core_count2)); + +if (expected_core_count) { +g_assert_cmpuint(core_count, ==, expected_core_count); +} + +/* Core Count has reached its limit, checking Core Count 2 */ +if (expected_core_count == 0xFF && expected_core_count2) { +g_assert_cmpuint(core_count2, ==, expected_core_count2); +} } static void test_smbios_structs(test_data *data, SmbiosEntryPointType ep_type) @@ -686,7 +697,7 @@ static void test_smbios_structs(test_data *data, SmbiosEntryPointType ep_type) set_bit(type, struct_bitmap); if (type == 4) { -g_assert(smbios_cpu_test(data, addr)); +smbios_cpu_test(data, addr); } /* seek to end of unformatted string area of this struct ("\0\0") */ @@ -903,6 +914,21 @@ static void test_acpi_q35_tcg(void) free_test_data(&data); } +static void test_acpi_q35_tcg_core_count2(void) +{ +test_data data = { +.machine = MACHINE_Q35, +.variant = ".core-count2", +.required_struct_types = base_required_struct_types, +.required_struct_types_len = ARRAY_SIZE(base_required_struct_types), +.smbios_core_count = 0xFF, +.smbios_core_count2 = 275, +}; + +test_acpi_one("-machine smbios-entry-point-type=64 -smp 275", &data); +free_test_data(&data); +} + static void test_acpi_q35_tcg_bridge(void) { test_data data; @@ -1822,6 +1848,7 @@ int main(int argc, char *argv[]) qtest_add_func("acpi/piix4/pci-hotplug/off", test_acpi_piix4_no_acpi_pci_hotplug); qtest_add_func("acpi/q35", test_acpi_q35_tcg); +qtest_add_func("acpi/q35/core-count2", test_acpi_q35_tcg_core_count2); qtest_add_func("acpi/q35/bridge", test_acpi_q35_tcg_bridge); qtest_add_func("acpi/q35/multif-bridge", test_acpi_q35_multif_bridge); qtest_add_func("acpi/q35/mmio64", test_acpi_q35_tcg_mmio64); -- 2.35.3
[PATCH v2 5/5] tests/acpi: update tables for new core count test
Changes in the tables (for 275 cores): FACP: + Use APIC Cluster Model (V4) : 1 APIC: +[02Ch 0044 1]Subtable Type : 00 [Processor Local APIC] +[02Dh 0045 1] Length : 08 +[02Eh 0046 1] Processor ID : 00 +[02Fh 0047 1]Local Apic ID : 00 +[030h 0048 4]Flags (decoded below) : 0001 + Processor Enabled : 1 ... + +[81Ch 2076 1]Subtable Type : 00 [Processor Local APIC] +[81Dh 2077 1] Length : 08 +[81Eh 2078 1] Processor ID : FE +[81Fh 2079 1]Local Apic ID : FE +[820h 2080 4]Flags (decoded below) : 0001 + Processor Enabled : 1 + Runtime Online Capable : 0 + +[824h 2084 1]Subtable Type : 09 [Processor Local x2APIC] +[825h 2085 1] Length : 10 +[826h 2086 2] Reserved : +[828h 2088 4] Processor x2Apic ID : 00FF +[82Ch 2092 4]Flags (decoded below) : 0001 + Processor Enabled : 1 +[830h 2096 4]Processor UID : 00FF ... DSDT: +Processor (C001, 0x01, 0x, 0x00) +{ +Method (_STA, 0, Serialized) // _STA: Status +{ +Return (CSTA (One)) +} + +Name (_MAT, Buffer (0x08) // _MAT: Multiple APIC Table Entry +{ + 0x00, 0x08, 0x01, 0x01, 0x01, 0x00, 0x00, 0x00 // +}) +Method (_EJ0, 1, NotSerialized) // _EJx: Eject Device, x=0-9 +{ +CEJ0 (One) +} + +Method (_OST, 3, Serialized) // _OST: OSPM Status Indication +{ +COST (One, Arg0, Arg1, Arg2) +} +} ... +Processor (C0FE, 0xFE, 0x, 0x00) +{ +Method (_STA, 0, Serialized) // _STA: Status +{ +Return (CSTA (0xFE)) +} + +Name (_MAT, Buffer (0x08) // _MAT: Multiple APIC Table Entry +{ + 0x00, 0x08, 0xFE, 0xFE, 0x01, 0x00, 0x00, 0x00 // +}) +Method (_EJ0, 1, NotSerialized) // _EJx: Eject Device, x=0-9 +{ +CEJ0 (0xFE) +} + +Method (_OST, 3, Serialized) // _OST: OSPM Status Indication +{ +COST (0xFE, Arg0, Arg1, Arg2) +} +} + +Device (C0FF) +{ +Name (_HID, "ACPI0007" /* Processor Device */) // _HID: Hardware ID +Name (_UID, 0xFF) // _UID: Unique ID +Method (_STA, 0, Serialized) // _STA: Status +{ +Return (CSTA (0xFF)) +} + +Name (_MAT, Buffer (0x10) // _MAT: Multiple APIC Table Entry +{ +/* */ 0x09, 0x10, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00, // +/* 0008 */ 0x01, 0x00, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00 // +}) +Method (_EJ0, 1, NotSerialized) // _EJx: Eject Device, x=0-9 +{ +CEJ0 (0xFF) +} + +Method (_OST, 3, Serialized) // _OST: OSPM Status Indication +{ +COST (0xFF, Arg0, Arg1, Arg2) +} +} + ... Signed-off-by: Julia Suvorova --- tests/qtest/bios-tables-test-allowed-diff.h | 3 --- tests/data/acpi/q35/APIC.core-count2| Bin 0 -> 2478 bytes tests/data/acpi/q35/DSDT.core-count2| Bin 0 -> 32414 bytes tests/data/acpi/q35/FACP.core-count2| Bin 0 -> 244 bytes 4 files changed, 3 deletions(-) diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h index e81dc67a2e..dfb8523c8b 100644 --- a/tests/qtest/bios-tables-test-allowed-diff.h +++ b/tests/qtest/bios-tables-test-allowed-diff.h @@ -1,4 +1 @@ /* List of comma-separated changed AML files to ignore */ -"tests/data/acpi/q35/APIC.core-count2", -"tests/data/acpi/q35/DSDT.core-count2", -"tests/data/acpi/q35/FACP.core-count2", diff --git a/tests/data/acpi/q35/APIC.core-count2 b/tests/data/acpi/q35/APIC.core-count2 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..a255082ef5bc39f0d92d3e372b91f09dd6d0d9a1 100644 GIT binary patch literal 2478 zcmXZeWl$AS7=Youz=a#M-K}6ZwgLuNAQ;$~*xjvQcY@uCVs{~Sf`WpLVt2Rbe!ORA z_B`J^b9R56*&pj2=M2p(=#;b`y@>U1KQZ2 ztu5Nwq0xx;_UPb%CKH;?XtAKxijI!xf-7uzGc@Q3Gq% z#9Fnmc5SRv2fe+~#|M4+PE2*{()H?L{rcFT0s8r&zdtr?h>aRyuWjcwXs+qT%Q9ky?e9Xepgju;w>ojPIX&e)|3 zcI}GYx?%V37#4;-dSK6<*sB-z?u~u=VBfyjuOIgBj{^qaz=1eu5Dp%ULx$
[PATCH v2 0/5] hw/smbios: add core_count2 to smbios table type 4
The SMBIOS 3.0 specification provides the ability to reflect over 255 cores. The 64-bit entry point has been used for a while, but structure type 4 has not been updated before, so the dmidecode output looked like this (-smp 280): Handle 0x0400, DMI type 4, 42 bytes Processor Information ... Core Count: 24 Core Enabled: 24 Thread Count: 1 ... Big update in the bios-tables-test as it couldn't work with SMBIOS 3.0. v2: * generate tables type 4 of different sizes based on the selected smbios version * use SmbiosEntryPoint* types instead of creating new constants * refactor smbios_cpu_test [Igor, Ani] * clarify signature check [Igor] * add comments with specifications and clarification of the structure loop [Ani] Julia Suvorova (5): hw/smbios: add core_count2 to smbios table type 4 bios-tables-test: teach test to use smbios 3.0 tables tests/acpi: allow changes for core_count2 test bios-tables-test: add test for number of cores > 255 tests/acpi: update tables for new core count test hw/smbios/smbios_build.h | 9 +- include/hw/firmware/smbios.h | 11 ++ hw/smbios/smbios.c | 18 +++- tests/qtest/bios-tables-test.c | 148 --- tests/data/acpi/q35/APIC.core-count2 | Bin 0 -> 2478 bytes tests/data/acpi/q35/DSDT.core-count2 | Bin 0 -> 32414 bytes tests/data/acpi/q35/FACP.core-count2 | Bin 0 -> 244 bytes 7 files changed, 144 insertions(+), 42 deletions(-) create mode 100644 tests/data/acpi/q35/APIC.core-count2 create mode 100644 tests/data/acpi/q35/DSDT.core-count2 create mode 100644 tests/data/acpi/q35/FACP.core-count2 -- 2.35.3
[PATCH v2 1/5] hw/smbios: add core_count2 to smbios table type 4
In order to use the increased number of cpus, we need to bring smbios tables in line with the SMBIOS 3.0 specification. This allows us to introduce core_count2 which acts as a duplicate of core_count if we have fewer cores than 256, and contains the actual core number per socket if we have more. core_enabled2 and thread_count2 fields work the same way. Signed-off-by: Julia Suvorova --- hw/smbios/smbios_build.h | 9 +++-- include/hw/firmware/smbios.h | 11 +++ hw/smbios/smbios.c | 18 +++--- 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/hw/smbios/smbios_build.h b/hw/smbios/smbios_build.h index 56b5a1e3f3..351660024e 100644 --- a/hw/smbios/smbios_build.h +++ b/hw/smbios/smbios_build.h @@ -27,6 +27,11 @@ extern unsigned smbios_table_max; extern unsigned smbios_table_cnt; #define SMBIOS_BUILD_TABLE_PRE(tbl_type, tbl_handle, tbl_required)\ +SMBIOS_BUILD_TABLE_PRE_SIZE(tbl_type, tbl_handle, tbl_required, \ +sizeof(struct smbios_type_##tbl_type))\ + +#define SMBIOS_BUILD_TABLE_PRE_SIZE(tbl_type, tbl_handle, \ +tbl_required, tbl_len)\ struct smbios_type_##tbl_type *t; \ size_t t_off; /* table offset into smbios_tables */ \ int str_index = 0;\ @@ -39,12 +44,12 @@ extern unsigned smbios_table_cnt; /* use offset of table t within smbios_tables */ \ /* (pointer must be updated after each realloc) */\ t_off = smbios_tables_len;\ -smbios_tables_len += sizeof(*t); \ +smbios_tables_len += tbl_len; \ smbios_tables = g_realloc(smbios_tables, smbios_tables_len); \ t = (struct smbios_type_##tbl_type *)(smbios_tables + t_off); \ \ t->header.type = tbl_type;\ -t->header.length = sizeof(*t);\ +t->header.length = tbl_len; \ t->header.handle = cpu_to_le16(tbl_handle); \ } while (0) diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h index 4b7ad77a44..56f7bf0fea 100644 --- a/include/hw/firmware/smbios.h +++ b/include/hw/firmware/smbios.h @@ -18,6 +18,8 @@ #define SMBIOS_MAX_TYPE 127 +#define offsetofend(TYPE, MEMBER) \ + (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER)) /* memory area description, used by type 19 table */ struct smbios_phys_mem_area { @@ -187,8 +189,17 @@ struct smbios_type_4 { uint8_t thread_count; uint16_t processor_characteristics; uint16_t processor_family2; +/* SMBIOS spec 3.0.0, Table 21 */ +uint16_t core_count2; +uint16_t core_enabled2; +uint16_t thread_count2; } QEMU_PACKED; +typedef enum smbios_type_4_len_ver { +SMBIOS_TYPE_4_LEN_V28 = offsetofend(struct smbios_type_4, processor_family2), +SMBIOS_TYPE_4_LEN_V30 = offsetofend(struct smbios_type_4, thread_count2), +} smbios_type_4_len_ver; + /* SMBIOS type 11 - OEM strings */ struct smbios_type_11 { struct smbios_structure_header header; diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c index 60349ee402..657093e5f6 100644 --- a/hw/smbios/smbios.c +++ b/hw/smbios/smbios.c @@ -681,8 +681,13 @@ static void smbios_build_type_3_table(void) static void smbios_build_type_4_table(MachineState *ms, unsigned instance) { char sock_str[128]; +size_t tbl_len = SMBIOS_TYPE_4_LEN_V28; -SMBIOS_BUILD_TABLE_PRE(4, T4_BASE + instance, true); /* required */ +if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_64) { +tbl_len = SMBIOS_TYPE_4_LEN_V30; +} + +SMBIOS_BUILD_TABLE_PRE_SIZE(4, T4_BASE + instance, true, tbl_len); /* required */ snprintf(sock_str, sizeof(sock_str), "%s%2x", type4.sock_pfx, instance); SMBIOS_TABLE_SET_STR(4, socket_designation_str, sock_str); @@ -709,8 +714,15 @@ static void smbios_build_type_4_table(MachineState *ms, unsigned instance) SMBIOS_TABLE_SET_STR(4, serial_number_str, type4.serial); SMBIOS_TABLE_SET_STR(4, asset_tag_number_str, type4.asset); SMBIOS_TABLE_SET_STR(4, part_number_str, type4.part); -t->core_count = t->core_enabled = ms->smp.cores; -t->thread_count = ms->smp.threads; + +t->core_count = (ms->smp.cores > 255) ? 0xFF : ms->smp.cores; +t->core_enabled = t->core_count; + +t->core_count2 = t->core_enabled2 = cpu_to_le16(ms->smp.cores); + +t->thread_count = (ms->smp.threads > 255) ? 0xFF : ms->smp.threads; +t->thread_count2 = cpu_to_le16(ms->smp.threads); + t->processor_characteristics = cpu_to_le16(0
Re: [PATCH] linux-user: Implement faccessat2
On 7/29/22 13:14, Richard Henderson wrote: -ret = get_errno(access(path(p), arg2)); -unlock_user(p, arg1, 0); -return ret; +return do_faccessat2(AT_FDCWD, arg1, arg2, 0); Oops, dropped path(). Should perhaps be incorporated into the helper, because newer targets won't have or use plain access()... r~