Re: [PATCH 1/4] hw/acpi/aml-build: Only generate cluster node in PPTT when specified

2022-10-08 Thread wangyanan (Y)

Hi Yicong,

On 2022/9/22 21:11, Yicong Yang wrote:

From: Yicong Yang

Currently we always generate a cluster node, regardless of whether the
user has specified '-smp clusters=X' or not. Cluster is an optional level
and it's unnecessary to build it if the user doesn't need it. So only
generate it when the user specifies it explicitly.

Also update the test ACPI tables.

It would be much more helpful to explain the problem you
have met in practice without this patch (maybe add a
description or a link to the issue in the cover letter if we
need a v2).

In QEMU, which behaves like a firmware vendor for the VM,
the ACPI PPTT is built based on the topology info produced
by machine_parse_smp_config(). And machine_parse_smp_config()
will always calculate a complete topology hierarchy using its
algorithm if the user gives an incomplete -smp CLI.

I think there are two options for us to choose from:
1) the approach described in this patch
2) QEMU always generates a full topology hierarchy in the PPTT
with all the topology members it currently supports. Users then
need to decide whether to use an incomplete -smp or
a complete one according to their specific scenario, and
should be aware of the kernel behavior resulting from the
config.
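
To make the difference between the two options concrete, here is a
minimal sketch. The struct and function names are hypothetical,
simplified stand-ins and not QEMU's real types; the "missing clusters
defaults to 1" rule is an assumption about the parser's behavior.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of what the user typed on -smp. */
struct smp_request {
    bool has_clusters;       /* did the user write clusters=X? */
    unsigned int clusters;   /* value given, ignored if !has_clusters */
};

/* Hypothetical, simplified model of the parsed topology. */
struct topo_model {
    unsigned int clusters;   /* always filled in by the parser */
    bool build_cluster;      /* option 1 only: remembers the user's intent */
};

/* Parse step: an omitted clusters value becomes 1 (assumption). */
static struct topo_model parse_model(const struct smp_request *req)
{
    struct topo_model t;

    t.clusters = req->has_clusters ? req->clusters : 1;
    t.build_cluster = req->has_clusters;
    return t;
}

/* Option 1: emit a cluster node only if supported AND explicitly requested. */
static bool emit_cluster_option1(bool clusters_supported,
                                 const struct topo_model *t)
{
    return clusters_supported && t->build_cluster;
}

/* Option 2: emit every supported level, whatever the user typed. */
static bool emit_cluster_option2(bool clusters_supported,
                                 const struct topo_model *t)
{
    (void)t;
    return clusters_supported;
}
```

With an -smp line that omits clusters, option 1 skips the cluster node
while option 2 still emits one cluster level containing all cores.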

There is some documentation for users explaining how QEMU
parses a user-specified -smp in [1].
[1] https://www.mankier.com/1/qemu#Options

Thanks,
Yanan

Signed-off-by: Yicong Yang
---
  hw/acpi/aml-build.c   | 2 +-
  hw/core/machine-smp.c | 3 +++
  include/hw/boards.h   | 2 ++
  3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index e6bfac95c7..aab73af66d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2030,7 +2030,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
  0, socket_id, NULL, 0);
  }
  
-if (mc->smp_props.clusters_supported) {
+if (mc->smp_props.clusters_supported && ms->smp.build_cluster) {
  if (cpus->cpus[n].props.cluster_id != cluster_id) {
  assert(cpus->cpus[n].props.cluster_id > cluster_id);
  cluster_id = cpus->cpus[n].props.cluster_id;
diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index b39ed21e65..5d37e8d07a 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -158,6 +158,9 @@ void machine_parse_smp_config(MachineState *ms,
  ms->smp.threads = threads;
  ms->smp.max_cpus = maxcpus;
  
+if (config->has_clusters)
+ms->smp.build_cluster = true;
+
  /* sanity-check of the computed topology */
  if (sockets * dies * clusters * cores * threads != maxcpus) {
  g_autofree char *topo_msg = cpu_hierarchy_to_string(ms);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 7b416c9787..24aafc213d 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -305,6 +305,7 @@ typedef struct DeviceMemoryState {
   * @cores: the number of cores in one cluster
   * @threads: the number of threads in one core
   * @max_cpus: the maximum number of logical processors on the machine
+ * @build_cluster: build cluster topology or not
   */
  typedef struct CpuTopology {
  unsigned int cpus;
@@ -314,6 +315,7 @@ typedef struct CpuTopology {
  unsigned int cores;
  unsigned int threads;
  unsigned int max_cpus;
+bool build_cluster;
  } CpuTopology;
  
  /**





Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-10-08 Thread Ani Sinha
On Sun, Oct 9, 2022 at 10:51 AM Ani Sinha  wrote:
>
> On Wed, Sep 28, 2022 at 1:14 PM Thomas Huth  wrote:
> >
> >
> > > Do not do any of this stuff, it is irrelevant to QEMU's needs.
> > > A developer using Avocado with QEMU does nothing more than:
> > >
> > >  make check-avocado
> >
> > Right. And if you want to run individual tests, you can also do it like 
> > this:
> >
> >  make check-venv   # Only for the first time
> >  ./tests/venv/bin/avocado run tests/avocado/boot_linux.py
>
> Ok this seems to work after I did a pip3 install of avocado in the host.
>
>  ./tests/venv/bin/avocado run tests/avocado/version.py
> JOB ID : 8dd90b1cb5baf3780cc764ca4a1ae838374a0a5f
> JOB LOG: 
> /home/anisinha/avocado/job-results/job-2022-10-09T10.48-8dd90b1/job.log
>  (1/1) tests/avocado/version.py:Version.test_qmp_human_info_version:
> PASS (0.04 s)
> RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0
> | CANCEL 0
> JOB TIME   : 3.51 s
>
> I see that the output is not TAP-compliant like the qtest tests are.
> How do I choose TAP?
>
>  ./tests/venv/bin/avocado-runner-tap --help
> usage: avocado-runner-tap [-h]
> {capabilities,runnable-run,runnable-run-recipe,task-run,task-run-recipe}
> ...
>
> nrunner application for executable tests that produce TAP
>
> positional arguments:
>   {capabilities,runnable-run,runnable-run-recipe,task-run,task-run-recipe}
>     capabilities        Outputs capabilities, including runnables and commands
>     runnable-run        Runs a runnable definition from arguments
>     runnable-run-recipe
>                         Runs a runnable definition from a recipe
>     task-run            Runs a task from arguments
>     task-run-recipe     Runs a task from a recipe
>
> options:
>   -h, --help            show this help message and exit
>

Never mind

$ ./tests/venv/bin/avocado run tests/avocado/version.py --tap -
1..1
ok 1 tests/avocado/version.py:Version.test_qmp_human_info_version

from https://avocado-framework.readthedocs.io/en/52.0/ResultFormats.html .



[PATCH] linux-user: Implement faccessat2

2022-10-08 Thread WANG Xuerui
User space has been preferring this syscall for a while, due to its
closer match with C semantics, and newer platforms such as LoongArch
apparently have libc implementations that don't fall back to faccessat,
so normal access checks fail without the emulation in place.

Tested by successfully emerging several packages within a Gentoo loong
stage3 chroot, emulated on amd64 with help of static qemu-loongarch64.
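
For context on the "closer match with C semantics": the libc function
faccessat() takes a flags argument, which the older faccessat syscall
cannot carry and faccessat2 can. A minimal sketch of the libc-level
call follows; check_access() is a hypothetical helper name used only
for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>    /* AT_FDCWD, AT_EACCESS */
#include <unistd.h>   /* faccessat(), F_OK, R_OK */

/*
 * Hypothetical helper: check access to `path`, resolved relative to
 * the current working directory. `flags` is 0 or e.g. AT_EACCESS.
 * Returns 0 on success or a negative errno value on failure.
 */
static int check_access(const char *path, int mode, int flags)
{
    if (faccessat(AT_FDCWD, path, mode, flags) == 0) {
        return 0;
    }
    return -errno;
}
```

A caller would write check_access("/etc/passwd", R_OK, AT_EACCESS) to
test readability with the effective rather than the real IDs.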

Reported-by: Andreas K. Hüttel 
Signed-off-by: WANG Xuerui 
---
 linux-user/strace.list | 3 +++
 linux-user/syscall.c   | 9 +
 2 files changed, 12 insertions(+)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index a87415bf3d..3df2184580 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -178,6 +178,9 @@
 #ifdef TARGET_NR_faccessat
 { TARGET_NR_faccessat, "faccessat" , NULL, print_faccessat, NULL },
 #endif
+#ifdef TARGET_NR_faccessat2
+{ TARGET_NR_faccessat2, "faccessat2" , NULL, print_faccessat, NULL },
+#endif
 #ifdef TARGET_NR_fadvise64
 { TARGET_NR_fadvise64, "fadvise64" , NULL, NULL, NULL },
 #endif
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 2e954d8dbd..a81f0b65b9 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9110,6 +9110,15 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 unlock_user(p, arg2, 0);
 return ret;
 #endif
+#if defined(TARGET_NR_faccessat2) && defined(__NR_faccessat2)
+case TARGET_NR_faccessat2:
+if (!(p = lock_user_string(arg2))) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(faccessat(arg1, p, arg3, arg4));
+unlock_user(p, arg2, 0);
+return ret;
+#endif
 #ifdef TARGET_NR_nice /* not on alpha */
 case TARGET_NR_nice:
 return get_errno(nice(arg1));
-- 
2.38.0




[PATCH V3 4/4] intel-iommu: PASID support

2022-10-08 Thread Jason Wang
This patch introduces ECAP_PASID via "x-pasid-mode". Based on the
existing support for scalable mode, we need to implement the following
missing parts:

1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
   with PASID
2) tag IOTLB with PASID
3) PASID cache and its flush
4) PASID based IOTLB invalidation
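
As an illustration of item 2, tagging the IOTLB with PASID means the
lookup key becomes a struct and two keys match only when every field,
PASID included, is equal. The sketch below is standalone and does not
use GLib; the field names follow struct vtd_iotlb_key from this patch,
while the hash is a hypothetical FNV-1a style mix, not the patch's
shift-and-or scheme.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative lookup key; fields mirror the patch's struct vtd_iotlb_key. */
struct iotlb_key {
    uint16_t sid;     /* source ID */
    uint32_t pasid;   /* process address space ID */
    uint64_t gfn;     /* guest frame number */
    uint32_t level;   /* page-table level of the mapping */
};

/* Two keys match only if every field matches, PASID included. */
static bool iotlb_key_equal(const struct iotlb_key *a,
                            const struct iotlb_key *b)
{
    return a->sid == b->sid && a->pasid == b->pasid &&
           a->level == b->level && a->gfn == b->gfn;
}

/*
 * Hypothetical hash that mixes all four fields byte by byte (FNV-1a).
 * Collisions are acceptable: the equality function settles them.
 */
static uint32_t iotlb_key_hash(const struct iotlb_key *k)
{
    const uint64_t parts[4] = { k->sid, k->pasid, k->gfn, k->level };
    uint32_t h = 2166136261u;

    for (int i = 0; i < 4; i++) {
        for (int b = 0; b < 8; b++) {
            h ^= (uint8_t)(parts[i] >> (8 * b));
            h *= 16777619u;
        }
    }
    return h;
}
```

Two entries for the same sid, gfn and level but different PASIDs are
therefore distinct cache entries.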

For simplicity, the PASID cache is not implemented, so we can simply
implement the PASID cache flush as a no-op and leave it to be implemented
in the future. For PASID based IOTLB invalidation, since we don't have
L1 stage support yet, the PASID based IOTLB invalidation is not
implemented yet. PASID based device IOTLB invalidation requires
support in vhost, so for now we forbid enabling device IOTLB when
PASID is enabled. That work could be done in the future.

Note that although PASID based IOMMU translation is ready, no device
can issue PASID DMA right now. In this case, PCI_NO_PASID is used as
the PASID to identify an address without PASID. vtd_find_add_as() has
been extended to provision an address space with PASID, which could be
utilized by a future extension of the PCI core to allow device models to
use PASID based DMA translation.

This feature would be useful for:

1) prototyping PASID support for devices like virtio
2) future vPASID work
3) future PRS and vSVA work

Signed-off-by: Jason Wang 
---
Changes since V2:
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors
---
 hw/i386/intel_iommu.c  | 415 +
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |   7 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 338 insertions(+), 104 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0d534c9e93..ba45029ee4 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -58,6 +58,14 @@
 struct vtd_as_key {
 PCIBus *bus;
 uint8_t devfn;
+uint32_t pasid;
+};
+
+struct vtd_iotlb_key {
+uint16_t sid;
+uint32_t pasid;
+uint64_t gfn;
+uint32_t level;
 };
 
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
@@ -199,14 +207,24 @@ static inline gboolean 
vtd_as_has_map_notifier(VTDAddressSpace *as)
 }
 
 /* GHashTable functions */
-static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
+static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2)
 {
-return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+const struct vtd_iotlb_key *key1 = v1;
+const struct vtd_iotlb_key *key2 = v2;
+
+return key1->sid == key2->sid &&
+   key1->pasid == key2->pasid &&
+   key1->level == key2->level &&
+   key1->gfn == key2->gfn;
 }
 
-static guint vtd_uint64_hash(gconstpointer v)
+static guint vtd_iotlb_hash(gconstpointer v)
 {
-return (guint)*(const uint64_t *)v;
+const struct vtd_iotlb_key *key = v;
+
+return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
+   (key->level) << VTD_IOTLB_LVL_SHIFT |
+   (key->pasid) << VTD_IOTLB_PASID_SHIFT;
 }
 
 static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
@@ -214,7 +232,8 @@ static gboolean vtd_as_equal(gconstpointer v1, 
gconstpointer v2)
 const struct vtd_as_key *key1 = v1;
 const struct vtd_as_key *key2 = v2;
 
-return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn) &&
+   (key1->pasid == key2->pasid);
 }
 
 /*
@@ -302,13 +321,6 @@ static void vtd_reset_caches(IntelIOMMUState *s)
 vtd_iommu_unlock(s);
 }
 
-static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
-  uint32_t level)
-{
-return gfn | ((uint64_t)(source_id) << VTD_IOTLB_SID_SHIFT) |
-   ((uint64_t)(level) << VTD_IOTLB_LVL_SHIFT);
-}
-
 static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 {
 return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
@@ -316,15 +328,17 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t 
level)
 
 /* Must be called with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
-   hwaddr addr)
+   hwaddr addr, uint32_t pasid)
 {
+struct vtd_iotlb_key key;
 VTDIOTLBEntry *entry;
-uint64_t key;
 int level;
 
 for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
-key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
-source_id, level);
+key.gfn = vtd_get_iotlb_gfn(addr, level);
+key.level = level;
+key.sid = source_id;
+key.pasid = pasid;
 entry = g_hash_table_lookup(s->iotlb, &key);
 if (entry) {
 goto out;
@@ -338,10 +352,11 @@ out:
 /* Must be with IOMMU lock held */
 static void vtd_update_iotlb(IntelIOMMUState *s, uint16

[PATCH V3 2/4] intel-iommu: drop VTDBus

2022-10-08 Thread Jason Wang
We introduced the VTDBus structure as an intermediate step for searching
the address space. This works well with SID based matching/lookup. But
when we want to support SID plus PASID based address space lookup,
this intermediate step turns out to be a burden. So the patch simply
drops the VTDBus structure and uses the PCIBus and devfn as the key for
the g_hash_table(). This simplifies the code and the future PASID
extension.

To avoid slowing down past vtd_find_as_from_bus_num() callers, a
vtd_as cache indexed by the bus number is introduced to store the most
recent search result for a vtd_as that belongs to a specific bus.
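
The caching idea can be sketched roughly as below. This is a
standalone model under stated assumptions, not the patch's code: the
types, the fixed-size array standing in for the hash table, and the
name find_as_by_bdf() are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BUS 256
#define MAX_AS  8

/* Hypothetical stand-in for VTDAddressSpace. */
struct as_entry {
    uint8_t bus_num;   /* bus number the guest currently assigns */
    uint8_t devfn;
};

struct iommu_model {
    struct as_entry spaces[MAX_AS];    /* stands in for the hash table */
    size_t nr_spaces;
    struct as_entry *cache[MAX_BUS];   /* most recent hit per bus number */
    unsigned int slow_scans;           /* demo counter for fallback scans */
};

static struct as_entry *find_as_by_bdf(struct iommu_model *s,
                                       uint8_t bus_num, uint8_t devfn)
{
    struct as_entry *hit = s->cache[bus_num];

    /* Fast path: the cached entry still matches the request. */
    if (hit && hit->bus_num == bus_num && hit->devfn == devfn) {
        return hit;
    }

    /* Slow path: scan every address space, then remember the result. */
    s->slow_scans++;
    for (size_t i = 0; i < s->nr_spaces; i++) {
        struct as_entry *as = &s->spaces[i];

        if (as->bus_num == bus_num && as->devfn == devfn) {
            s->cache[bus_num] = as;
            return as;
        }
    }
    return NULL;
}
```

Repeated lookups for the same device hit the cache; a lookup for a
different devfn on the same bus falls back to the scan and replaces
the cached entry.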

Reviewed-by: Peter Xu 
Signed-off-by: Jason Wang 
---
Changes since V2:
- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
---
 hw/i386/intel_iommu.c | 234 +-
 include/hw/i386/intel_iommu.h |  11 +-
 2 files changed, 118 insertions(+), 127 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3455e5d907..3bf28e7f47 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -61,6 +61,16 @@
 } \
 }
 
+/*
+ * PCI bus number (or SID) is not reliable since the device is usaully
+ * initalized before guest can configure the PCI bridge
+ * (SECONDARY_BUS_NUMBER).
+ */
+struct vtd_as_key {
+PCIBus *bus;
+uint8_t devfn;
+};
+
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
@@ -210,6 +220,27 @@ static guint vtd_uint64_hash(gconstpointer v)
 return (guint)*(const uint64_t *)v;
 }
 
+static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
+{
+const struct vtd_as_key *key1 = v1;
+const struct vtd_as_key *key2 = v2;
+
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+}
+
+/*
+ * Note that we use pointer to PCIBus as the key, so hashing/shifting
+ * based on the pointer value is intended. Note that we deal with
+ * collisions through vtd_as_equal().
+ */
+static guint vtd_as_hash(gconstpointer v)
+{
+const struct vtd_as_key *key = v;
+guint value = (guint)(uintptr_t)key->bus;
+
+return (guint)(value << 8 | key->devfn);
+}
+
 static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
   gpointer user_data)
 {
@@ -248,22 +279,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, 
gpointer value,
 static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
 {
 VTDAddressSpace *vtd_as;
-VTDBus *vtd_bus;
-GHashTableIter bus_it;
-uint32_t devfn_it;
+GHashTableIter as_it;
 
 trace_vtd_context_cache_reset();
 
-g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
+g_hash_table_iter_init(&as_it, s->vtd_address_spaces);
 
-while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
-vtd_as = vtd_bus->dev_as[devfn_it];
-if (!vtd_as) {
-continue;
-}
-vtd_as->context_cache_entry.context_cache_gen = 0;
-}
+while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) {
+vtd_as->context_cache_entry.context_cache_gen = 0;
 }
 s->context_cache_gen = 1;
 }
@@ -993,32 +1016,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, 
uint32_t level)
 return slpte & rsvd_mask;
 }
 
-/* Find the VTD address space associated with a given bus number */
-static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
-{
-VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
-GHashTableIter iter;
-
-if (vtd_bus) {
-return vtd_bus;
-}
-
-/*
- * Iterate over the registered buses to find the one which
- * currently holds this bus number and update the bus_num
- * lookup table.
- */
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-if (pci_bus_num(vtd_bus->bus) == bus_num) {
-s->vtd_as_by_bus_num[bus_num] = vtd_bus;
-return vtd_bus;
-}
-}
-
-return NULL;
-}
-
 /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
@@ -1634,24 +1631,13 @@ static bool vtd_switch_address_space(VTDAddressSpace 
*as)
 
 static void vtd_switch_address_space_all(IntelIOMMUState *s)
 {
+VTDAddressSpace *vtd_as;
 GHashTableIter iter;
-VTDBus *vtd_bus;
-int i;
-
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-for (i = 0; i < PCI_DEVFN_MAX; i++) {
-if (!vtd_bus->dev_as[i]) {
- 

Re: [PATCH v10 00/17] qapi: net: add unix socket type support to netdev backend

2022-10-08 Thread Jason Wang
On Thu, Oct 6, 2022 at 7:21 PM Michael S. Tsirkin  wrote:
>
> On Wed, Oct 05, 2022 at 06:20:34PM +0200, Laurent Vivier wrote:
> > "-netdev socket" only supports inet sockets.
> >
> > It's not a complex task to add support for unix sockets, but
> > the socket netdev parameters are not well suited to describing
> > unix socket parameters.
>
> Looks good.
>
> Acked-by: Michael S. Tsirkin 
>
> Belongs in Jason's tree.

I've queued this series.

Thanks


>
> > As discussed in:
> >
> >   "socket.c added support for unix domain socket datagram transport"
> >   
> > https://lore.kernel.org/qemu-devel/1c0e1bc5-904f-46b0-8044-68e43e67b...@gmail.com/
> >
> > This series adds support of unix socket type using SocketAddress QAPI 
> > structure.
> >
> > Two new netdev backends, "stream" and "dgram", are added. They are
> > essentially a copy of the "socket" backend,
> > but they use the SocketAddress QAPI to provide socket parameters.
> > They also implement unix sockets (stream and datagram).
> >
> > Some examples of CLI syntax:
> >
> >   for TCP:
> >
> >   -netdev 
> > stream,id=socket0,addr.type=inet,addr.host=localhost,addr.port=1234
> >   -netdev 
> > stream,id=socket0,server=off,addr.type=inet,addr.host=localhost,addr.port=1234
> >
> >   -netdev dgram,id=socket0,\
> >   local.type=inet,local.host=localhost,local.port=1234,\
> >   remote.type=inet,remote.host=localhost,remote.port=1235
> >
> >   for UNIX:
> >
> >   -netdev stream,id=socket0,addr.type=unix,addr.path=/tmp/qemu0
> >   -netdev stream,id=socket0,server=off,addr.type=unix,addr.path=/tmp/qemu0
> >
> >   -netdev dgram,id=socket0,\
> >   local.type=unix,local.path=/tmp/qemu0,\
> >   remote.type=unix,remote.path=/tmp/qemu1
> >
> >   for FD:
> >
> >   -netdev stream,id=socket0,addr.type=fd,addr.str=4
> >   -netdev stream,id=socket0,server=off,addr.type=fd,addr.str=5
> >
> >   -netdev dgram,id=socket0,local.type=fd,local.str=4
> >
> > v10:
> >   - add Red Hat copyright
> >   - initialize dgram_dst to NULL in SOCKET_ADDRESS_TYPE_FD
> >   - remove redundant _stream / _dgram in function names
> >   - move net_dgram_init() into net_init_dgram()
> >   - address Thomas' comments on qtest
> >   - add a function qemu_set_info_str() to set info string
> >   - tested stream netdev with fd type using qrap/passt and
> > "-netdev stream,addr.type=fd,server=off,addr.str=5,id=netdev0"
> >
> > v9:
> >   - add events to report stream connection/disconnection
> >   - remove from net/dgram.c send_fn, listen_fd, net_dgram_accept()
> > net_dgram_connect() and net_dgram_send() that are only
> > needed by net/stream.c
> >   - remove from net/stream.c send_fn
> >   - add Red Hat copyright
> >   - add original net/socket.c Stefano's patch (EINVAL)
> >
> > v8:
> >   - test ipv4 and ipv6 parameters (stream inet)
> >   - test abstract parameter (stream unix)
> >   - add SocketAddressInet supported parameters in qemu-options.hx
> > (only stream, supported by the move to QIO)
> >   - with qio_channel_writev() replace (ret == -1 && errno == EAGAIN)
> > by (ret == QIO_CHANNEL_ERR_BLOCK)
> >
> > v7:
> >   - add qtests
> >   - update parameters table in net.json
> >   - update socket_uri() and socket_parse()
> >
> > v6:
> >   - s/netdev option/-netdev option/ PATCH 4
> >   - s/ / /
> >   - update @NetdevStreamOptions and @NetdevDgramOptions comments
> >   - update PATCH 4 description message
> >   - add missing return in error case for unix stream socket
> >   - split socket_uri() patch: move and rename, then change content
> >
> > v5:
> >   - remove RFC prefix
> >   - put the change of net_client_parse() into its own patch (exit() in the
> > function)
> >   - update comments regarding netdev_is_modern() and netdev_parse_modern()
> >   - update error case in net_stream_server_init()
> >   - update qemu-options.hx with unix type
> >   - fix HMP "info network" with unix protocol/server side.
> >
> > v4:
> >   - net_client_parse() fails with exit() rather than with return.
> >   - keep "{ 'name': 'vmnet-host', 'if': 'CONFIG_VMNET' }" on its
> > own line in qapi/net.json
> >   - add a comment in qapi/net.json about parameters usage
> >   - move netdev_is_modern() check to qemu_init()
> >   - in netdev_is_modern(), check for JSON and use qemu_opts_do_parse()
> > to parse parameters and detect type value.
> >   - add a blank line after copyright comment
> >
> > v3:
> >   - remove support of "-net" for dgram and stream. They are only
> > supported with "-netdev" option.
> >   - use &error_fatal directly in net_client_inits()
> >   - update qemu-options.hx
> >   - move to QIO for stream socket
> >
> > v2:
> >   - use "stream" and "dgram" rather than "socket-ng,mode=stream"
> > and "socket-ng,mode=dgram"
> >   - extract code to bypass qemu_opts_parse_noisily() to
> > a new patch
> >   - do not ignore EINVAL (Stefano)
> >   - fix "-net" option
> >
> > CC: Ralph Schmieder 
> > CC: Stefano Brivio 
> > CC: Daniel P. Berrangé 
> > CC: Markus A

[PATCH V3 3/4] intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function

2022-10-08 Thread Jason Wang
We used to have a macro for VTD_PE_GET_FPD_ERR() but it has an
internal goto which prevents it from being reused. This patch converts
that macro to a dedicated function and lets the caller decide what
to do (e.g. using goto or not). This makes sure it can be reused by
other functions that require fault reporting.
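
A minimal sketch of why this helps, with hypothetical names: a macro
containing "goto error" only compiles inside functions that define an
error label, whereas a plain function leaves control flow to each
caller. report_fault() below is a stand-in for the new helper, and the
counter merely stands in for fault logging.

```c
#include <stdbool.h>

static int faults_reported;   /* demo counter standing in for fault logging */

/* Hypothetical stand-in for the reporting helper: no control flow inside. */
static void report_fault(int err, bool suppress)
{
    (void)err;
    if (!suppress) {
        faults_reported++;
    }
}

/* A caller that bails out through a label, as the translate path does. */
static bool translate_with_goto(int step_result, bool suppress)
{
    if (step_result) {
        report_fault(-step_result, suppress);
        goto error;
    }
    return true;

error:
    return false;
}

/* A caller with no error label can reuse the same helper and just return. */
static int probe_without_goto(int step_result, bool suppress)
{
    if (step_result) {
        report_fault(-step_result, suppress);
    }
    return step_result;
}
```

Both callers share the reporting logic, and only the first one needs
a label to jump to.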

Signed-off-by: Jason Wang 
---
Changes since V2:
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
---
 hw/i386/intel_iommu.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3bf28e7f47..0d534c9e93 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -49,17 +49,6 @@
 /* pe operations */
 #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
 #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & 
VTD_SM_PASID_ENTRY_AW))
-#define VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write) {\
-if (ret_fr) { \
-ret_fr = -ret_fr; \
-if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {   \
-trace_vtd_fault_disabled();   \
-} else {  \
-vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);  \
-} \
-goto error;   \
-} \
-}
 
 /*
  * PCI bus number (or SID) is not reliable since the device is usaully
@@ -1718,6 +1707,19 @@ out:
 trace_vtd_pt_enable_fast_path(source_id, success);
 }
 
+static void vtd_report_qualify_fault(IntelIOMMUState *s,
+ int err, bool is_fpd_set,
+ uint16_t source_id,
+ hwaddr addr,
+ bool is_write)
+{
+if (is_fpd_set && vtd_is_qualified_fault(err)) {
+trace_vtd_fault_disabled();
+} else {
+vtd_report_dmar_fault(s, source_id, addr, err, is_write);
+}
+}
+
 /* Map dev to context-entry then do a paging-structures walk to do a iommu
  * translation.
  *
@@ -1778,7 +1780,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
 if (!is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, 
is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 }
 } else {
 ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
@@ -1786,7 +1792,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 if (!ret_fr && !is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
 }
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 /* Update context-cache */
 trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo,
   cc_entry->context_cache_gen,
@@ -1822,7 +1832,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 
 ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
&reads, &writes, s->aw_bits);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set, source_id,
+ addr, is_write);
+goto error;
+}
 
 page_mask = vtd_slpt_level_page_mask(level);
 access_flags = IOMMU_ACCESS_FLAG(reads, writes);
-- 
2.25.1




[PATCH V3 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry

2022-10-08 Thread Jason Wang
We used to warn on a wrong rid2pasid entry. But this error can be
triggered by the guest and can happen during initialization. So
let's not warn in this case.

Signed-off-by: Jason Wang 
---
 hw/i386/intel_iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 05d53a1aa9..3455e5d907 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1554,8 +1554,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, 
VTDContextEntry *ce)
 if (s->root_scalable) {
 ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
 if (ret) {
-error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
-  __func__, ret);
+/*
+ * This error is guest triggerable. We should assumt PT
+ * not enabled for safety.
+ */
 return false;
 }
 return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
-- 
2.25.1




[PATCH V3 0/4] PASID support for Intel IOMMU

2022-10-08 Thread Jason Wang
Hi All:

This series tries to introduce PASID support for Intel IOMMU. The work
is based on the previous scalable mode support by implementing
ECAP_PASID. A new "x-pasid-mode" is introduced to enable this
mode. All internal vIOMMU code was extended to support PASID instead
of the current RID2PASID method. The code is also capable of
provisioning address spaces with PASID. Note that no devices can issue
PASID DMA right now; this needs future work.

This will be used for prototyping PASID based devices like virtio or
future vPASID support for Intel IOMMU.

Testing has been done with a Linux guest with scalable mode enabled and
disabled. A virtio prototype[1][2] that can issue PASID based DMA
requests was also tested; different PASIDs were used for TX and RX in
those testing drivers.

Changes since V2:

- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors

Changes since V1:

- speed up IOMMU translation when RID2PASID is not used
- remove the unnecessary L1 PASID invalidation descriptor support
- add support for catching translation to the interrupt range in
  the case of PT and scalable mode
- refine the comments to explain the hash algorithm used in IOTLB
  lookups

Please review.

[1] https://github.com/jasowang/qemu.git virtio-pasid
[2] https://github.com/jasowang/linux.git virtio-pasid

Jason Wang (4):
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  intel-iommu: drop VTDBus
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: PASID support

 hw/i386/intel_iommu.c  | 685 ++---
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |  18 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 482 insertions(+), 241 deletions(-)

-- 
2.25.1




Re: [PATCH v2] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa

2022-10-08 Thread Jason Wang
On Sat, Oct 8, 2022 at 5:04 PM Si-Wei Liu  wrote:
>
> Similar to other vhost backends, vhostfd can be passed to vhost-vdpa
> backend as another parameter to instantiate vhost-vdpa net client.
> This would benefit the use case where only open file descriptors, as
> opposed to raw vhost-vdpa device paths, are accessible from the QEMU
> process.
>
> (qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1

Adding Cindy.

This has been discussed before: we already have
vhostdev=/dev/fdset/$fd, which should be functionally equivalent to what
is proposed here. (And this is how libvirt works, if I understand
correctly.)
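
For readers unfamiliar with the /dev/fdset/$fd form: the path is not a
real device node; the number after the prefix names a file descriptor
set that was handed to QEMU beforehand. A minimal sketch of recognizing
such a path follows. fdset_id_from_path() is a hypothetical helper for
illustration only, not QEMU's own parser.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if `path` has the form "/dev/fdset/<N>" with a
 * plain decimal N, return N; otherwise return -1.
 */
static long fdset_id_from_path(const char *path)
{
    static const char prefix[] = "/dev/fdset/";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    long id;

    if (strncmp(path, prefix, plen) != 0) {
        return -1;    /* an ordinary path, e.g. /dev/vhost-vdpa-0 */
    }
    if (path[plen] < '0' || path[plen] > '9') {
        return -1;    /* reject empty, signed, or whitespace-led IDs */
    }
    errno = 0;
    id = strtol(path + plen, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;    /* overflow or trailing junk */
    }
    return id;
}
```

With this shape, vhostdev=/dev/fdset/61 would resolve to set 61, while
vhostdev=/dev/vhost-vdpa-0 is treated as a normal path to open.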

Thanks

>
> Signed-off-by: Si-Wei Liu 
> Acked-by: Eugenio Pérez 
>
> ---
> v2:
>   - fixed typo in commit message
>   - s/fd's/file descriptors/
> ---
>  net/vhost-vdpa.c | 25 -
>  qapi/net.json|  3 +++
>  qemu-options.hx  |  6 --
>  3 files changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 182b3a1..366b070 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -683,14 +683,29 @@ int net_init_vhost_vdpa(const Netdev *netdev, const 
> char *name,
>
>  assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>  opts = &netdev->u.vhost_vdpa;
> -if (!opts->vhostdev) {
> -error_setg(errp, "vdpa character device not specified with 
> vhostdev");
> +if (!opts->has_vhostdev && !opts->has_vhostfd) {
> +error_setg(errp,
> +   "vhost-vdpa: neither vhostdev= nor vhostfd= was 
> specified");
>  return -1;
>  }
>
> -vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
> -if (vdpa_device_fd == -1) {
> -return -errno;
> +if (opts->has_vhostdev && opts->has_vhostfd) {
> +error_setg(errp,
> +   "vhost-vdpa: vhostdev= and vhostfd= are mutually 
> exclusive");
> +return -1;
> +}
> +
> +if (opts->has_vhostdev) {
> +vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
> +if (vdpa_device_fd == -1) {
> +return -errno;
> +}
> +} else if (opts->has_vhostfd) {
> +vdpa_device_fd = monitor_fd_param(monitor_cur(), opts->vhostfd, 
> errp);
> +if (vdpa_device_fd == -1) {
> +error_prepend(errp, "vhost-vdpa: unable to parse vhostfd: ");
> +return -1;
> +}
>  }
>
>  r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
> diff --git a/qapi/net.json b/qapi/net.json
> index dd088c0..926ecc8 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -442,6 +442,8 @@
>  # @vhostdev: path of vhost-vdpa device
>  #(default:'/dev/vhost-vdpa-0')
>  #
> +# @vhostfd: file descriptor of an already opened vhost vdpa device
> +#
>  # @queues: number of queues to be created for multiqueue vhost-vdpa
>  #  (default: 1)
>  #
> @@ -456,6 +458,7 @@
>  { 'struct': 'NetdevVhostVDPAOptions',
>'data': {
>  '*vhostdev': 'str',
> +'*vhostfd':  'str',
>  '*queues':   'int',
>  '*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 913c71e..c040f74 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2774,8 +2774,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
>  "configure a vhost-user network, backed by a chardev 
> 'dev'\n"
>  #endif
>  #ifdef __linux__
> -"-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
> +"-netdev vhost-vdpa,id=str[,vhostdev=/path/to/dev][,vhostfd=h]\n"
>  "configure a vhost-vdpa network,Establish a vhost-vdpa 
> netdev\n"
> +"use 'vhostdev=/path/to/dev' to open a vhost vdpa 
> device\n"
> +"use 'vhostfd=h' to connect to an already opened vhost 
> vdpa device\n"
>  #endif
>  #ifdef CONFIG_VMNET
>  "-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n"
> @@ -3280,7 +3282,7 @@ SRST
>   -netdev type=vhost-user,id=net0,chardev=chr0 \
>   -device virtio-net-pci,netdev=net0
>
> -``-netdev vhost-vdpa,vhostdev=/path/to/dev``
> +``-netdev vhost-vdpa[,vhostdev=/path/to/dev][,vhostfd=h]``
>  Establish a vhost-vdpa netdev.
>
>  vDPA device is a device that uses a datapath which complies with
> --
> 1.8.3.1
>




Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-10-08 Thread Ani Sinha
On Wed, Sep 28, 2022 at 1:14 PM Thomas Huth  wrote:
>
> On 28/09/2022 09.06, Daniel P. Berrangé wrote:
> > On Tue, Sep 27, 2022 at 06:09:22PM -0400, Michael S. Tsirkin wrote:
> >> On Tue, Sep 27, 2022 at 11:44:56PM +0200, Paolo Bonzini wrote:
> >>> I also second the idea of using avocado instead of pytest, by the way.
> >
> > snip
> >
> >> Problem is I don't think avocado is yet at the level where I can
> >> ask random developers to use it to check their ACPI patches.
> >>
> >> I just went ahead and rechecked and the situation isn't much better
> >> yet. I think the focus of avocado is system testing of full guests with
> >> KVM, not unit testing of ACPI.
> >>
> >> Let's start with installation on a clean box:
> >
> > ...snip...
> >
> > Do not do any of this stuff, it is irrelevant to QEMU's needs.
> > A developer using Avocado with QEMU does nothing more than:
> >
> >  make check-avocado
>
> Right. And if you want to run individual tests, you can also do it like this:
>
>  make check-venv   # Only for the first time
>  ./tests/venv/bin/avocado run tests/avocado/boot_linux.py

OK, this seems to work after I did a pip3 install of avocado on the host.

 ./tests/venv/bin/avocado run tests/avocado/version.py
JOB ID : 8dd90b1cb5baf3780cc764ca4a1ae838374a0a5f
JOB LOG: /home/anisinha/avocado/job-results/job-2022-10-09T10.48-8dd90b1/job.log
 (1/1) tests/avocado/version.py:Version.test_qmp_human_info_version: PASS (0.04 s)
RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 3.51 s

I see that the output is not TAP compliant like the qtest tests are.
How do I choose TAP?

 ./tests/venv/bin/avocado-runner-tap --help
usage: avocado-runner-tap [-h]
                          {capabilities,runnable-run,runnable-run-recipe,task-run,task-run-recipe}
                          ...

nrunner application for executable tests that produce TAP

positional arguments:
  {capabilities,runnable-run,runnable-run-recipe,task-run,task-run-recipe}
    capabilities        Outputs capabilities, including runnables and commands
    runnable-run        Runs a runnable definition from arguments
    runnable-run-recipe
                        Runs a runnable definition from a recipe
    task-run            Runs a task from arguments
    task-run-recipe     Runs a task from a recipe

options:
  -h, --help            show this help message and exit


>
> Or run tests via tags (very convenient for maintainers):
>
> ./tests/venv/bin/avocado run -t arch:s390x tests/avocado/
>
>   HTH,
>Thomas
>



Re: [PATCH v2 09/13] hw/ppc/e500: Implement pflash handling

2022-10-08 Thread Bin Meng
On Tue, Oct 4, 2022 at 5:40 AM Bernhard Beschow  wrote:
>
> Allows e500 boards to have their root file system reside on flash using
> only builtin devices located in the eLBC memory region.
>
> Note that the flash memory area is only created when a -pflash argument is
> given, and that the size is determined by the given file. The idea is to
> put users into control.
>
> Signed-off-by: Bernhard Beschow 
> ---
>  docs/system/ppc/ppce500.rst | 12 ++
>  hw/ppc/Kconfig  |  1 +
>  hw/ppc/e500.c   | 76 +
>  3 files changed, 89 insertions(+)
>
> diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst
> index ba6bcb7314..1ed6c36599 100644
> --- a/docs/system/ppc/ppce500.rst
> +++ b/docs/system/ppc/ppce500.rst
> @@ -119,6 +119,18 @@ To boot the 32-bit Linux kernel:
>-initrd /path/to/rootfs.cpio \
>-append "root=/dev/ram"
>
> +Rather than using a root file system on ram disk, it is possible to have it 
> on
> +emulated flash. Given an ext2 image whose size must be a power of two, it can
> +be used as follows:
> +
> +.. code-block:: bash
> +
> +  $ qemu-system-ppc{64|32} -M ppce500 -cpu e500mc -smp 4 -m 2G \
> +  -display none -serial stdio \
> +  -kernel vmlinux \
> +  -drive if=pflash,file=/path/to/rootfs.ext2,format=raw \
> +  -append "rootwait root=/dev/mtdblock0"

Could we add a separate sub-section "pflash" after the "networking"
part you did before?

> +
>  Running U-Boot
>  --
>
> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
> index 791fe78a50..769a1ead1c 100644
> --- a/hw/ppc/Kconfig
> +++ b/hw/ppc/Kconfig
> @@ -126,6 +126,7 @@ config E500
>  select ETSEC
>  select GPIO_MPC8XXX
>  select OPENPIC
> +select PFLASH_CFI01
>  select PLATFORM_BUS
>  select PPCE500_PCI
>  select SERIAL
> diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
> index 3e950ea3ba..2b1430fca4 100644
> --- a/hw/ppc/e500.c
> +++ b/hw/ppc/e500.c
> @@ -23,8 +23,10 @@
>  #include "e500-ccsr.h"
>  #include "net/net.h"
>  #include "qemu/config-file.h"
> +#include "hw/block/flash.h"
>  #include "hw/char/serial.h"
>  #include "hw/pci/pci.h"
> +#include "sysemu/block-backend-io.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/reset.h"
> @@ -267,6 +269,31 @@ static void sysbus_device_create_devtree(SysBusDevice 
> *sbdev, void *opaque)
>  }
>  }
>
> +static void create_devtree_flash(SysBusDevice *sbdev,
> + PlatformDevtreeData *data)
> +{
> +g_autofree char *name = NULL;
> +uint64_t num_blocks = object_property_get_uint(OBJECT(sbdev),
> +   "num-blocks",
> +   &error_fatal);
> +uint64_t sector_length = object_property_get_uint(OBJECT(sbdev),
> +  "sector-length",
> +  &error_fatal);
> +uint64_t bank_width = object_property_get_uint(OBJECT(sbdev),
> +   "width",
> +   &error_fatal);
> +hwaddr flashbase = 0;
> +hwaddr flashsize = num_blocks * sector_length;
> +void *fdt = data->fdt;
> +
> +name = g_strdup_printf("%s/nor@%" PRIx64, data->node, flashbase);
> +qemu_fdt_add_subnode(fdt, name);
> +qemu_fdt_setprop_string(fdt, name, "compatible", "cfi-flash");
> +qemu_fdt_setprop_sized_cells(fdt, name, "reg",
> + 1, flashbase, 1, flashsize);
> +qemu_fdt_setprop_cell(fdt, name, "bank-width", bank_width);
> +}
> +
>  static void platform_bus_create_devtree(PPCE500MachineState *pms,
>  void *fdt, const char *mpic)
>  {
> @@ -276,6 +303,8 @@ static void 
> platform_bus_create_devtree(PPCE500MachineState *pms,
>  uint64_t addr = pmc->platform_bus_base;
>  uint64_t size = pmc->platform_bus_size;
>  int irq_start = pmc->platform_bus_first_irq;
> +SysBusDevice *sbdev;
> +bool ambiguous;
>
>  /* Create a /platform node that we can put all devices into */
>
> @@ -302,6 +331,13 @@ static void 
> platform_bus_create_devtree(PPCE500MachineState *pms,
>  /* Loop through all dynamic sysbus devices and create nodes for them */
>  foreach_dynamic_sysbus_device(sysbus_device_create_devtree, &data);
>
> +sbdev = SYS_BUS_DEVICE(object_resolve_path_type("", TYPE_PFLASH_CFI01,
> +&ambiguous));
> +if (sbdev) {
> +assert(!ambiguous);
> +create_devtree_flash(sbdev, &data);
> +}
> +
>  g_free(node);
>  }
>
> @@ -856,6 +892,7 @@ void ppce500_init(MachineState *machine)
>  unsigned int pci_irq_nrs[PCI_NUM_PINS] = {1, 2, 3, 4};
>  IrqLines *irqs;
>  DeviceState *dev, *mpicdev;
> +DriveInfo *dinfo;
>  CPUPPCState *firstenv = NULL;
>  Me

Re: [PATCH v2 00/13] ppc/e500: Add support for two types of flash, cleanup

2022-10-08 Thread Bin Meng
On Sun, Oct 9, 2022 at 12:11 AM Bernhard Beschow  wrote:
>
> On 4 October 2022 12:43:35 UTC, Daniel Henrique Barboza wrote:
> >Hey,
> >
> >On 10/3/22 18:27, Philippe Mathieu-Daudé wrote:
> >> Hi Daniel,
> >>
> >> On 3/10/22 22:31, Bernhard Beschow wrote:
> >>> Cover letter:
> >>> ~
> >>>
> >>> This series adds support for -pflash and direct SD card access to the
> >>> PPC e500 boards. The idea is to increase compatibility with "real" 
> >>> firmware
> >>> images where only the bare minimum of drivers is compiled in.
> >>
> >>> Bernhard Beschow (13):
> >>>hw/ppc/meson: Allow e500 boards to be enabled separately
> >>>hw/gpio/meson: Introduce dedicated config switch for hw/gpio/mpc8xxx
> >>>docs/system/ppc/ppce500: Add heading for networking chapter
> >>>hw/ppc/e500: Reduce usage of sysbus API
> >>>hw/ppc/mpc8544ds: Rename wrongly named method
> >>>hw/ppc/mpc8544ds: Add platform bus
> >>>hw/ppc/e500: Remove if statement which is now always true
> >>
> >> This first part is mostly reviewed and can already go via your
> >> ppc-next queue.
> >
> >We're missing an ACK in patch 6/13:
> >
> >hw/ppc/mpc8544ds: Add platform bus
>
> Bin: Ping?
>

Sorry for the delay. I have provided the R-b to this patch.

Regards,
Bin



Re: [PATCH v2 06/13] hw/ppc/mpc8544ds: Add platform bus

2022-10-08 Thread Bin Meng
On Tue, Oct 4, 2022 at 5:22 AM Bernhard Beschow  wrote:
>
> Models the real device more closely.
>
> Address and size values are taken from mpc8544.dts from the linux-5.17.7
> tree. The IRQ range is taken from e500plat.c.
>
> Signed-off-by: Bernhard Beschow 
> ---
>  hw/ppc/mpc8544ds.c | 6 ++
>  1 file changed, 6 insertions(+)
>

Reviewed-by: Bin Meng 



Re: [PATCH v2 2/2] virtio-blk: add zoned storage emulation for zoned devices

2022-10-08 Thread Sam Li
Sam Li wrote on Sun, Oct 9, 2022 at 09:54:
>
> Stefan Hajnoczi wrote on Thu, Oct 6, 2022 at 23:04:
> >
> > On Thu, Sep 29, 2022 at 05:48:21PM +0800, Sam Li wrote:
> > > This patch extends virtio-blk emulation to handle zoned device commands
> > > by calling the new block layer APIs to perform zoned device I/O on
> > > behalf of the guest. It supports Report Zone, four zone operations (open,
> > > close, finish, reset), and Append Zone.
> > >
> > > The VIRTIO_BLK_F_ZONED feature bit is only set if the host supports
> > > zoned block devices. It is not set for regular block devices
> > > (conventional zones).
> > >
> > > A guest OS with zoned device support can use blkzone(8) to test those
> > > commands. Testing zone append writes with zonefs is also supported.
> > >
> > > Signed-off-by: Sam Li 
> > > ---
> > >  hw/block/virtio-blk.c | 393 ++
> > >  1 file changed, 393 insertions(+)
> > >
> > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > > index e9ba752f6b..1c2535bfeb 100644
> > > --- a/hw/block/virtio-blk.c
> > > +++ b/hw/block/virtio-blk.c
> > > @@ -26,6 +26,9 @@
> > >  #include "hw/virtio/virtio-blk.h"
> > >  #include "dataplane/virtio-blk.h"
> > >  #include "scsi/constants.h"
> > > +#if defined(CONFIG_BLKZONED)
> > > +#include 
> > > +#endif
> >
> > Why is this Linux-specific header file included? The virtio-blk
> > emulation code should only use QEMU block layer APIs, not Linux APIs.
> >
> > >  #ifdef __linux__
> > >  # include 
> > >  #endif
> > > @@ -46,6 +49,8 @@ static const VirtIOFeature feature_sizes[] = {
> > >   .end = endof(struct virtio_blk_config, discard_sector_alignment)},
> > >  {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
> > >   .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
> > > +{.flags = 1ULL << VIRTIO_BLK_F_ZONED,
> > > + .end = endof(struct virtio_blk_config, zoned)},
> > >  {}
> > >  };
> > >
> > > @@ -614,6 +619,340 @@ err:
> > >  return err_status;
> > >  }
> > >
> > > +typedef struct ZoneCmdData {
> > > +VirtIOBlockReq *req;
> > > +union {
> > > +struct {
> > > +unsigned int nr_zones;
> > > +BlockZoneDescriptor *zones;
> > > +} zone_report_data;
> > > +struct {
> > > +int64_t append_sector;
> > > +} zone_append_data;
> > > +};
> > > +} ZoneCmdData;
> > > +
> > > +/*
> > > + * check zoned_request: error checking before issuing requests. If all 
> > > checks
> > > + * passed, return true.
> > > + * append: true if only zone append requests issued.
> > > + */
> > > +static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t 
> > > len,
> > > + bool append, uint8_t *status) {
> > > +BlockDriverState *bs = blk_bs(s->blk);
> > > +int index = offset / bs->bl.zone_size;
> >
> > This function doesn't check that offset+len is in the same zone as
> > offset. Maybe that's correct because some request types allow [offset,
> > offset+len) to cross zones?
>
> Yes, zone_mgmt requests should allow that.
>
> >
> > > +
> > > +if (offset < 0 || offset + len > bs->bl.capacity) {
> >
> > Other cases that are not checked:
> > 1. len < 0
> > 2. offset >= bs->bl.capacity
> > 3. len > bs->bl.capacity - offset (catches integer overflow)
> >
> > It may be possible to combine these cases, but be careful about integer
> > overflow.
>
> Right. Combining above cases:
>
> if (offset < 0 || len < 0 || offset > cap - len)
>
> offset > cap - len covers cases #2 and #3: since len >= 0, any offset
> that is greater than cap is also greater than cap - len.
>
> >
> > > +*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> > > +return false;
> > > +}
> > > +
> > > +if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
> > > +*status = VIRTIO_BLK_S_UNSUPP;
> > > +return false;
> > > +}
> > > +
> > > +if (append) {
> > > +if ((offset % bs->bl.write_granularity) != 0) {
> > > +*status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
> > > +return false;
> > > +}
> > > +
> > > +if (!BDRV_ZT_IS_SWR(bs->bl.wps->wp[index])) {
> > > +*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> > > +return false;
> > > +}
> >
> > Where does the virtio-blk zone spec say that only SWR zones allow zone
> > append commands? Should it work for SWP zones too?
>
> The spec says it should not, as quoted below. But it should work for SWP
> zones too. I'll change this to check conventional zones instead.
>
> +If the zone specified by the VIRTIO_BLK_T_ZONE_APPEND request is not
> a SWR zone,
> +then the request SHALL be completed with VIRTIO_BLK_S_ZONE_INVALID_CMD
> +\field{status}.
>
> >
> > > +
> > > +if (len / 512 > bs->bl.max_append_sectors) {
> > > +if (bs->bl.max_append_sectors == 0) {
> > > +*status = VIRTIO_BLK_S_UNSUPP;
> > > +} else {
> > > +  

Re: [PATCH 04/11] hw/ppc/mpc8544ds: Add platform bus

2022-10-08 Thread Bin Meng
Hi Bernhard,

On Sat, Sep 17, 2022 at 1:19 AM Bernhard Beschow  wrote:
>
> On 16 September 2022 06:15:53 UTC, Bin Meng wrote:
> >On Thu, Sep 15, 2022 at 11:29 PM Bernhard Beschow  wrote:
> >>
> >> Models the real device more closely.
> >
> >Please describe the source (e.g.: I assume it's MPC8544DS board manual
> >or something like that?) that describe such memory map for the
> >platform bus.
> >
> >Is this the eLBC bus range that includes the NOR flash device?
>
> Good point. My numbers come from a different board. I'll fix them according 
> to the  mpc8544ds.dts in the Linux tree.
>
> This will leave an eLBC memory window of just 8MB while my proprietary load 
> needs 64MB. My proprietary load doesn't seem to have 64 bit physical memory 
> support so I can't use e500plat either. Any suggestions?
>

Currently QEMU does not model the eLBC registers so these memory
regions have to be hardcoded, unfortunately. Once we support eLBC
memory map completely I think we can remove such limitations by having
QEMU dynamically create the memory map per programmed values.

I guess you have to create another machine for your board at this point.

Regards,
Bin



Re: [PATCH v2 05/13] hw/ppc/mpc8544ds: Rename wrongly named method

2022-10-08 Thread Bin Meng
On Tue, Oct 4, 2022 at 5:15 AM Bernhard Beschow  wrote:
>
> Signed-off-by: Bernhard Beschow 
> ---
>  hw/ppc/mpc8544ds.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH v2 04/13] hw/ppc/e500: Reduce usage of sysbus API

2022-10-08 Thread Bin Meng
On Tue, Oct 4, 2022 at 5:24 AM Bernhard Beschow  wrote:
>
> PlatformBusDevice has an mmio attribute which gets aliased to
> SysBusDevice::mmio[0]. So PlatformbusDevice::mmio can be used directly,
> avoiding the sysbus API.
>
> Signed-off-by: Bernhard Beschow 
> ---
>  hw/ppc/e500.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>

Reviewed-by: Bin Meng 



Re: [PATCH v2 2/2] virtio-blk: add zoned storage emulation for zoned devices

2022-10-08 Thread Sam Li
Stefan Hajnoczi wrote on Thu, Oct 6, 2022 at 23:04:
>
> On Thu, Sep 29, 2022 at 05:48:21PM +0800, Sam Li wrote:
> > This patch extends virtio-blk emulation to handle zoned device commands
> > by calling the new block layer APIs to perform zoned device I/O on
> > behalf of the guest. It supports Report Zone, four zone operations (open,
> > close, finish, reset), and Append Zone.
> >
> > The VIRTIO_BLK_F_ZONED feature bit is only set if the host supports
> > zoned block devices. It is not set for regular block devices
> > (conventional zones).
> >
> > A guest OS with zoned device support can use blkzone(8) to test those
> > commands. Testing zone append writes with zonefs is also supported.
> >
> > Signed-off-by: Sam Li 
> > ---
> >  hw/block/virtio-blk.c | 393 ++
> >  1 file changed, 393 insertions(+)
> >
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index e9ba752f6b..1c2535bfeb 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -26,6 +26,9 @@
> >  #include "hw/virtio/virtio-blk.h"
> >  #include "dataplane/virtio-blk.h"
> >  #include "scsi/constants.h"
> > +#if defined(CONFIG_BLKZONED)
> > +#include 
> > +#endif
>
> Why is this Linux-specific header file included? The virtio-blk
> emulation code should only use QEMU block layer APIs, not Linux APIs.
>
> >  #ifdef __linux__
> >  # include 
> >  #endif
> > @@ -46,6 +49,8 @@ static const VirtIOFeature feature_sizes[] = {
> >   .end = endof(struct virtio_blk_config, discard_sector_alignment)},
> >  {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
> >   .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
> > +{.flags = 1ULL << VIRTIO_BLK_F_ZONED,
> > + .end = endof(struct virtio_blk_config, zoned)},
> >  {}
> >  };
> >
> > @@ -614,6 +619,340 @@ err:
> >  return err_status;
> >  }
> >
> > +typedef struct ZoneCmdData {
> > +VirtIOBlockReq *req;
> > +union {
> > +struct {
> > +unsigned int nr_zones;
> > +BlockZoneDescriptor *zones;
> > +} zone_report_data;
> > +struct {
> > +int64_t append_sector;
> > +} zone_append_data;
> > +};
> > +} ZoneCmdData;
> > +
> > +/*
> > + * check zoned_request: error checking before issuing requests. If all 
> > checks
> > + * passed, return true.
> > + * append: true if only zone append requests issued.
> > + */
> > +static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t 
> > len,
> > + bool append, uint8_t *status) {
> > +BlockDriverState *bs = blk_bs(s->blk);
> > +int index = offset / bs->bl.zone_size;
>
> This function doesn't check that offset+len is in the same zone as
> offset. Maybe that's correct because some request types allow [offset,
> offset+len) to cross zones?

Yes, zone_mgmt requests should allow that.

>
> > +
> > +if (offset < 0 || offset + len > bs->bl.capacity) {
>
> Other cases that are not checked:
> 1. len < 0
> 2. offset >= bs->bl.capacity
> 3. len > bs->bl.capacity - offset (catches integer overflow)
>
> It may be possible to combine these cases, but be careful about integer
> overflow.

Right. Combining above cases:

if (offset < 0 || len < 0 || offset > cap - len)

offset > cap - len covers cases #2 and #3: since len >= 0, any offset
that is greater than cap is also greater than cap - len.
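
For illustration, the combined check can be written as a small standalone
helper. This is only a sketch: the name zoned_range_is_valid and the
parameter cap (standing in for bs->bl.capacity) are made up for this
example and are not part of the patch. It assumes cap >= 0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: returns true if the byte
 * range [offset, offset + len) fits within a device of cap bytes.
 * Assumes cap >= 0.
 */
static bool zoned_range_is_valid(int64_t offset, int64_t len, int64_t cap)
{
    /* Reject negative values first so the subtraction below is safe. */
    if (offset < 0 || len < 0) {
        return false;
    }
    /*
     * With 0 <= len and 0 <= cap, "cap - len" cannot overflow,
     * whereas "offset + len" could wrap for large inputs.
     */
    if (offset > cap - len) {
        return false;
    }
    return true;
}
```

Note that offset == cap with len == 0 passes this check. If case #2
(offset >= cap) has to be rejected strictly, an additional offset >= cap
test would be needed.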

>
> > +*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> > +return false;
> > +}
> > +
> > +if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
> > +*status = VIRTIO_BLK_S_UNSUPP;
> > +return false;
> > +}
> > +
> > +if (append) {
> > +if ((offset % bs->bl.write_granularity) != 0) {
> > +*status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
> > +return false;
> > +}
> > +
> > +if (!BDRV_ZT_IS_SWR(bs->bl.wps->wp[index])) {
> > +*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> > +return false;
> > +}
>
> Where does the virtio-blk zone spec say that only SWR zones allow zone
> append commands? Should it work for SWP zones too?

The spec says it should not, as quoted below. But it should work for SWP
zones too. I'll change this to check conventional zones instead.

+If the zone specified by the VIRTIO_BLK_T_ZONE_APPEND request is not
a SWR zone,
+then the request SHALL be completed with VIRTIO_BLK_S_ZONE_INVALID_CMD
+\field{status}.

>
> > +
> > +if (len / 512 > bs->bl.max_append_sectors) {
> > +if (bs->bl.max_append_sectors == 0) {
> > +*status = VIRTIO_BLK_S_UNSUPP;
> > +} else {
> > +*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> > +}
> > +return false;
> > +}
> > +}
> > +return true;
> > +}
> > +
> > +static void virtio_blk_zone_report_complete(void *opaque, int ret)
> > +{
> > +ZoneCmdData *data = opaque;
> > +VirtIOBlockReq *r

Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory

2022-10-08 Thread Jarkko Sakkinen
On Sat, Oct 08, 2022 at 07:15:17PM +0300, Jarkko Sakkinen wrote:
> On Sat, Oct 08, 2022 at 12:54:32AM +0300, Jarkko Sakkinen wrote:
> > On Fri, Oct 07, 2022 at 02:58:54PM +, Sean Christopherson wrote:
> > > On Fri, Oct 07, 2022, Jarkko Sakkinen wrote:
> > > > On Thu, Oct 06, 2022 at 03:34:58PM +, Sean Christopherson wrote:
> > > > > On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
> > > > > > On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
> > > > > > > On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
> > > > > > > > This new extension, indicated by the new flag KVM_MEM_PRIVATE, 
> > > > > > > > adds two
> > > > > > > > additional KVM memslot fields private_fd/private_offset to allow
> > > > > > > > userspace to specify that guest private memory provided from the
> > > > > > > > private_fd and guest_phys_addr mapped at the private_offset of 
> > > > > > > > the
> > > > > > > > private_fd, spanning a range of memory_size.
> > > > > > > > 
> > > > > > > > The extended memslot can still have the userspace_addr(hva). 
> > > > > > > > When use, a
> > > > > > > > single memslot can maintain both private memory through private
> > > > > > > > fd(private_fd/private_offset) and shared memory through
> > > > > > > > hva(userspace_addr). Whether the private or shared part is 
> > > > > > > > visible to
> > > > > > > > guest is maintained by other KVM code.
> > > > > > > 
> > > > > > > What is anyway the appeal of private_offset field, instead of 
> > > > > > > having just
> > > > > > > 1:1 association between regions and files, i.e. one memfd per 
> > > > > > > region?
> > > > > 
> > > > > Modifying memslots is slow, both in KVM and in QEMU (not sure about 
> > > > > Google's VMM).
> > > > > E.g. if a vCPU converts a single page, it will be forced to wait 
> > > > > until all other
> > > > > vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM is 
> > > > > faulting in
> > > > > memory.  KVM's memslot updates also hold a mutex for the entire 
> > > > > duration of the
> > > > > update, i.e. conversions on different vCPUs would be fully 
> > > > > serialized, exacerbating
> > > > > the SRCU problem.
> > > > > 
> > > > > KVM also has historical baggage where it "needs" to zap _all_ SPTEs 
> > > > > when any
> > > > > memslot is deleted.
> > > > > 
> > > > > Taking both a private_fd and a shared userspace address allows 
> > > > > userspace to convert
> > > > > between private and shared without having to manipulate memslots.
> > > > 
> > > > Right, this was really good explanation, thank you.
> > > > 
> > > > Still wondering could this possibly work (or not):
> > > > 
> > > > 1. Union userspace_addr and private_fd.
> > > 
> > > No, because userspace needs to be able to provide both userspace_addr 
> > > (shared
> > > memory) and private_fd (private memory) for a single memslot.
> > 
> > Got it, thanks for clearing up my misunderstandings on this topic, and it
> > is quite obviously visible in 5/8 and 7/8. I.e. if I got it right, a
> > memblock can be partially private, and you dig the shared holes with
> > KVM_MEMORY_ENCRYPT_UNREG_REGION. We (in Enarx) ATM have a memblock
> > per host mmap; I was looking into this biased by that mindset, but it
> > definitely makes sense to support that.
> 
> For me, the most useful reference for this feature is the kvm_set_phys_mem()
> implementation in the privmem-v8 branch. It took a while to find because I
> do not have much experience with the QEMU code base. I'd even recommend
> mentioning that function in the cover letter, because it is a really good
> reference on how this feature is supposed to be used.

While learning the QEMU code, I also noticed a bunch of comparisons like this:

if (slot->flags | KVM_MEM_PRIVATE)

I guess those could just be replaced with unconditional fills, as that does
no harm if KVM_MEM_PRIVATE is not set.
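
As a side note, if the comparison really is written with "|" as quoted,
the condition is nonzero for any flags value, so the branch is always
taken. A minimal sketch of the difference between "|" and "&" follows.
The constant value and the helper names are made up for this example and
are not taken from the KVM or QEMU headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; the real KVM_MEM_PRIVATE comes from the series. */
#define EXAMPLE_MEM_PRIVATE (1u << 2)

/* OR-ing in a nonzero constant always yields a nonzero result. */
static bool flag_check_or(uint32_t flags)
{
    return (flags | EXAMPLE_MEM_PRIVATE) != 0;
}

/* AND-ing masks out every bit except the flag, so this is a real test. */
static bool flag_check_and(uint32_t flags)
{
    return (flags & EXAMPLE_MEM_PRIVATE) != 0;
}
```

With flags == 0, flag_check_or() still returns true while
flag_check_and() returns false.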

BR, Jarkko



Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory

2022-10-08 Thread Jarkko Sakkinen
On Sat, Oct 08, 2022 at 12:54:32AM +0300, Jarkko Sakkinen wrote:
> On Fri, Oct 07, 2022 at 02:58:54PM +, Sean Christopherson wrote:
> > On Fri, Oct 07, 2022, Jarkko Sakkinen wrote:
> > > On Thu, Oct 06, 2022 at 03:34:58PM +, Sean Christopherson wrote:
> > > > On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
> > > > > On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
> > > > > > On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
> > > > > > > This new extension, indicated by the new flag KVM_MEM_PRIVATE, 
> > > > > > > adds two
> > > > > > > additional KVM memslot fields private_fd/private_offset to allow
> > > > > > > userspace to specify that guest private memory provided from the
> > > > > > > private_fd and guest_phys_addr mapped at the private_offset of the
> > > > > > > private_fd, spanning a range of memory_size.
> > > > > > > 
> > > > > > > The extended memslot can still have the userspace_addr(hva). When 
> > > > > > > use, a
> > > > > > > single memslot can maintain both private memory through private
> > > > > > > fd(private_fd/private_offset) and shared memory through
> > > > > > > hva(userspace_addr). Whether the private or shared part is 
> > > > > > > visible to
> > > > > > > guest is maintained by other KVM code.
> > > > > > 
> > > > > > What is anyway the appeal of private_offset field, instead of 
> > > > > > having just
> > > > > > 1:1 association between regions and files, i.e. one memfd per 
> > > > > > region?
> > > > 
> > > > Modifying memslots is slow, both in KVM and in QEMU (not sure about 
> > > > Google's VMM).
> > > > E.g. if a vCPU converts a single page, it will be forced to wait until 
> > > > all other
> > > > vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM is 
> > > > faulting in
> > > > memory.  KVM's memslot updates also hold a mutex for the entire 
> > > > duration of the
> > > > update, i.e. conversions on different vCPUs would be fully serialized, 
> > > > exacerbating
> > > > the SRCU problem.
> > > > 
> > > > KVM also has historical baggage where it "needs" to zap _all_ SPTEs 
> > > > when any
> > > > memslot is deleted.
> > > > 
> > > > Taking both a private_fd and a shared userspace address allows 
> > > > userspace to convert
> > > > between private and shared without having to manipulate memslots.
> > > 
> > > Right, this was really good explanation, thank you.
> > > 
> > > Still wondering could this possibly work (or not):
> > > 
> > > 1. Union userspace_addr and private_fd.
> > 
> > No, because userspace needs to be able to provide both userspace_addr 
> > (shared
> > memory) and private_fd (private memory) for a single memslot.
> 
> Got it, thanks for clearing up my misunderstandings on this topic, and it
> is quite obviously visible in 5/8 and 7/8. I.e. if I got it right, a
> memblock can be partially private, and you dig the shared holes with
> KVM_MEMORY_ENCRYPT_UNREG_REGION. We (in Enarx) ATM have a memblock
> per host mmap; I was looking into this biased by that mindset, but it
> definitely makes sense to support that.

For me, the most useful reference for this feature is the kvm_set_phys_mem()
implementation in the privmem-v8 branch. It took a while to find because I
do not have much experience with the QEMU code base. I'd even recommend
mentioning that function in the cover letter, because it is a really good
reference on how this feature is supposed to be used.

BR, Jarkko



Re: [PATCH v2 00/13] ppc/e500: Add support for two types of flash, cleanup

2022-10-08 Thread Bernhard Beschow
On 4 October 2022 12:43:35 UTC, Daniel Henrique Barboza wrote:
>Hey,
>
>On 10/3/22 18:27, Philippe Mathieu-Daudé wrote:
>> Hi Daniel,
>> 
>> On 3/10/22 22:31, Bernhard Beschow wrote:
>>> Cover letter:
>>> ~
>>> 
>>> This series adds support for -pflash and direct SD card access to the
>>> PPC e500 boards. The idea is to increase compatibility with "real" firmware
>>> images where only the bare minimum of drivers is compiled in.
>> 
>>> Bernhard Beschow (13):
>>>    hw/ppc/meson: Allow e500 boards to be enabled separately
>>>    hw/gpio/meson: Introduce dedicated config switch for hw/gpio/mpc8xxx
>>>    docs/system/ppc/ppce500: Add heading for networking chapter
>>>    hw/ppc/e500: Reduce usage of sysbus API
>>>    hw/ppc/mpc8544ds: Rename wrongly named method
>>>    hw/ppc/mpc8544ds: Add platform bus
>>>    hw/ppc/e500: Remove if statement which is now always true
>> 
>> This first part is mostly reviewed and can already go via your
>> ppc-next queue.
>
>We're missing an ACK in patch 6/13:
>
>hw/ppc/mpc8544ds: Add platform bus

Bin: Ping?

Best regards,
Bernhard
>
>I'll need some time to understand what's being done there to provide my own
>R-b. Or you can toss an R-b there :D
>
>
>Thanks,
>
>
>Daniel
>
>
>
>> 
>>>    hw/block/pflash_cfi01: Error out if device length isn't a power of two
>>>    hw/ppc/e500: Implement pflash handling
>>>    hw/sd/sdhci-internal: Unexport ESDHC defines
>>>    hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*
>>>    hw/sd/sdhci: Implement Freescale eSDHC device model
>>>    hw/ppc/e500: Add Freescale eSDHC to e500 boards
>> 
>> This second part still needs work. I can take it via the sdmmc-next
>> queue.
>> 
>> Regards,
>> 
>> Phil.




Re: [PATCH] include/qemu/atomic128: Support 16-byte atomic read/write for Intel AVX

2022-10-08 Thread Richard Henderson

On 10/8/22 08:36, Richard Henderson wrote:

> Intel has now given guarantees about the atomicity of SSE read
> and write instructions on cpus supporting AVX.  We can use these
> instead of the much slower cmpxchg16b.
>
> Derived from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
>
> Signed-off-by: Richard Henderson
> ---
>
> Paolo, we probably ought to modify gen_ld[oy]_env_A0 to match,
> at least with CF_PARALLEL set.


Or, rather, just gen_ldo/sto.
Curiously, there are no guarantees at all for

  vmovdqa mem, %ymmN


r~



[PATCH] include/qemu/atomic128: Support 16-byte atomic read/write for Intel AVX

2022-10-08 Thread Richard Henderson
Intel has now given guarantees about the atomicity of SSE read
and write instructions on cpus supporting AVX.  We can use these
instead of the much slower cmpxchg16b.

Derived from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

Signed-off-by: Richard Henderson 
---

Paolo, we probably ought to modify gen_ld[oy]_env_A0 to match,
at least with CF_PARALLEL set.


r~

---
 include/qemu/atomic128.h | 44 ++
 util/atomic128.c | 67 
 util/meson.build |  1 +
 3 files changed, 112 insertions(+)
 create mode 100644 util/atomic128.c

diff --git a/include/qemu/atomic128.h b/include/qemu/atomic128.h
index adb9a1a260..d179c05ede 100644
--- a/include/qemu/atomic128.h
+++ b/include/qemu/atomic128.h
@@ -127,6 +127,50 @@ static inline void atomic16_set(Int128 *ptr, Int128 val)
 : [l] "r"(l), [h] "r"(h));
 }
 
+# define HAVE_ATOMIC128 1
+#elif !defined(CONFIG_USER_ONLY) && defined(__x86_64__)
+/*
+ * The latest Intel SDM has added:
+ * Processors that enumerate support for Intel® AVX (by setting
+ * the feature flag CPUID.01H:ECX.AVX[bit 28]) guarantee that the
+ * 16-byte memory operations performed by the following instructions
+ * will always be carried out atomically:
+ *  - MOVAPD, MOVAPS, and MOVDQA.
+ *  - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
+ *  - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded
+ *with EVEX.128 and k0 (masking disabled).
+ *Note that these instructions require the linear addresses of their
+ *memory operands to be 16-byte aligned.
+ *
+ * We do not yet have a similar guarantee from AMD, so we detect this
+ * at runtime rather than assuming the fact when __AVX__ is defined.
+ */
+extern bool have_atomic128;
+
+static inline Int128 atomic16_read(Int128 *ptr)
+{
+Int128 ret;
+if (have_atomic128) {
+asm("vmovdqa %1, %0" : "=x" (ret) : "m" (*ptr));
+} else {
+ret = atomic16_cmpxchg(ptr, 0, 0);
+}
+return ret;
+}
+
+static inline void atomic16_set(Int128 *ptr, Int128 val)
+{
+if (have_atomic128) {
+asm("vmovdqa %1, %0" : "=m" (*ptr) : "x" (val));
+} else {
+Int128 old = *ptr, cmp;
+do {
+cmp = old;
+old = atomic16_cmpxchg(ptr, cmp, val);
+} while (old != cmp);
+}
+}
+
 # define HAVE_ATOMIC128 1
 #elif !defined(CONFIG_USER_ONLY) && HAVE_CMPXCHG128
 static inline Int128 atomic16_read(Int128 *ptr)
diff --git a/util/atomic128.c b/util/atomic128.c
new file mode 100644
index 00..55863ce9bd
--- /dev/null
+++ b/util/atomic128.c
@@ -0,0 +1,67 @@
+/*
+ * Copyright (C) 2022, Linaro Ltd.
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/atomic128.h"
+
+#ifdef __x86_64__
+#include "qemu/cpuid.h"
+
+#ifndef signature_INTEL_ecx
+/* "Genu ineI ntel" */
+#define signature_INTEL_ebx 0x756e6547
+#define signature_INTEL_edx 0x49656e69
+#define signature_INTEL_ecx 0x6c65746e
+#endif
+
+/*
+ * The latest Intel SDM has added:
+ * Processors that enumerate support for Intel® AVX (by setting
+ * the feature flag CPUID.01H:ECX.AVX[bit 28]) guarantee that the
+ * 16-byte memory operations performed by the following instructions
+ * will always be carried out atomically:
+ *  - MOVAPD, MOVAPS, and MOVDQA.
+ *  - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
+ *  - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded
+ *with EVEX.128 and k0 (masking disabled).
+ *Note that these instructions require the linear addresses of their
+ *memory operands to be 16-byte aligned.
+ *
+ * We do not yet have a similar guarantee from AMD, so we detect this
+ * at runtime rather than assuming the fact when __AVX__ is defined.
+ */
+bool have_atomic128;
+
+static void __attribute__((constructor))
+init_have_atomic128(void)
+{
+unsigned int a, b, c, d, xcrl, xcrh;
+
+__cpuid(0, a, b, c, d);
+if (a < 1) {
+return; /* CPUID leaf 1 (feature flags) not present */
+}
+if (c != signature_INTEL_ecx) {
+return; /* Not an Intel product */
+}
+
+__cpuid(1, a, b, c, d);
+if ((c & (bit_AVX | bit_OSXSAVE)) != (bit_AVX | bit_OSXSAVE)) {
+return; /* AVX not present or XSAVE not enabled by OS */
+}
+
+/*
+ * The xgetbv instruction is not available to older versions of
+ * the assembler, so we encode the instruction manually.
+ */
+asm(".byte 0x0f, 0x01, 0xd0" : "=a" (xcrl), "=d" (xcrh) : "c" (0));
+if ((xcrl & 6) != 6) {
+return; /* AVX not enabled by OS */
+}
+
+have_atomic128 = true;
+}
+#endif /* __x86_64__ */
diff --git a/util/meson.build b/util/meson.build
index 5e282130df..4b29b719a8 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -2,6 +2,7 @@ util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
 if not config_h
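
The runtime check in init_have_atomic128() above boils down to four conditions on CPUID and XCR0 values. A minimal sketch of just that decision, assuming the register values have already been read by the caller (the `sketch_` names are hypothetical and not part of the patch; the bit positions follow the CPUID.01H:ECX layout quoted in the patch comment):

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SIG_INTEL_ECX 0x6c65746eu  /* "ntel", same value as in the patch */
#define SKETCH_BIT_OSXSAVE   (1u << 27)   /* CPUID.01H:ECX.OSXSAVE */
#define SKETCH_BIT_AVX       (1u << 28)   /* CPUID.01H:ECX.AVX */
#define SKETCH_XCR0_SSE_AVX  0x6u         /* XCR0 bit 1 (SSE) and bit 2 (AVX) */

/*
 * Hypothetical helper: decide whether 16-byte aligned vector moves may be
 * treated as atomic, given values the caller obtained from CPUID and XGETBV.
 * A real caller must only execute XGETBV after OSXSAVE is confirmed, and
 * can pass 0 for xcr0_low otherwise.
 */
static bool sketch_have_atomic128(uint32_t max_leaf, uint32_t vendor_ecx,
                                  uint32_t leaf1_ecx, uint32_t xcr0_low)
{
    const uint32_t need = SKETCH_BIT_AVX | SKETCH_BIT_OSXSAVE;

    if (max_leaf < 1) {
        return false;               /* no feature-flag leaf */
    }
    if (vendor_ecx != SKETCH_SIG_INTEL_ECX) {
        return false;               /* guarantee is only documented by Intel */
    }
    if ((leaf1_ecx & need) != need) {
        return false;               /* AVX missing or XSAVE not enabled by OS */
    }
    /* The OS must have enabled both SSE and AVX state in XCR0. */
    return (xcr0_low & SKETCH_XCR0_SSE_AVX) == SKETCH_XCR0_SSE_AVX;
}
```

Keeping the decision separate from the CPUID reads makes it possible to exercise every rejection path on any host, including non-x86 ones.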

Re: [PATCH 1/8] hw/avr: Add limited support for avr gpio registers

2022-10-08 Thread Hee-cheol Yang
Sorry for the inconvenience.
Something is wrong with my patchwork.

I'll upload them again and share the link soon.


Thanks again.
Best regards

Heecheol yang

From: Michael Rolnik 
Sent: Wednesday, September 28, 2022 2:52:48 AM
To: Heecheol Yang 
Cc: qemu-devel@nongnu.org ; Philippe Mathieu-Daudé 

Subject: Re: [PATCH 1/8] hw/avr: Add limited support for avr gpio registers

Hi all,

Is there any kind of web UI where I can review it?
I can't find this patch on https://patchew.org/ (there is only a two-year-old version:
https://patchew.org/search?q=project%3AQEMU+%22hw%2Favr%22).

Thank you,
Michael Rolnik

On Mon, Sep 12, 2022 at 2:21 PM Heecheol Yang <heecheol.y...@outlook.com> wrote:
Add some of these features for AVR GPIO:

  - GPIO I/O : PORTx registers
  - Data Direction : DDRx registers
  - DDRx toggling : PINx registers

Following things are not supported yet:
  - MCUR registers

Signed-off-by: Heecheol Yang <heecheol.y...@outlook.com>
Reviewed-by: Michael Rolnik <mrol...@gmail.com>
Message-Id: <dm6pr16mb247368dbd3447abecdd795d7e6...@dm6pr16mb2473.namprd16.prod.outlook.com>
[PMD: Use AVR_GPIO_COUNT]
Signed-off-by: Philippe Mathieu-Daudé <f4...@amsat.org>
Message-Id: <20210313165445.2113938-4-f4...@amsat.org>
---
 hw/avr/Kconfig |   1 +
 hw/avr/atmega.c|   7 +-
 hw/avr/atmega.h|   2 +
 hw/gpio/Kconfig|   3 +
 hw/gpio/avr_gpio.c | 138 +
 hw/gpio/meson.build|   1 +
 include/hw/gpio/avr_gpio.h |  53 ++
 7 files changed, 203 insertions(+), 2 deletions(-)
 create mode 100644 hw/gpio/avr_gpio.c
 create mode 100644 include/hw/gpio/avr_gpio.h

diff --git a/hw/avr/Kconfig b/hw/avr/Kconfig
index d31298c3cc..16a57ced11 100644
--- a/hw/avr/Kconfig
+++ b/hw/avr/Kconfig
@@ -3,6 +3,7 @@ config AVR_ATMEGA_MCU
 select AVR_TIMER16
 select AVR_USART
 select AVR_POWER
+select AVR_GPIO

 config ARDUINO
 select AVR_ATMEGA_MCU
diff --git a/hw/avr/atmega.c b/hw/avr/atmega.c
index a34803e642..f5fb3a5225 100644
--- a/hw/avr/atmega.c
+++ b/hw/avr/atmega.c
@@ -282,8 +282,11 @@ static void atmega_realize(DeviceState *dev, Error **errp)
 continue;
 }
 devname = g_strdup_printf("atmega-gpio-%c", 'a' + (char)i);
-create_unimplemented_device(devname,
-OFFSET_DATA + mc->dev[idx].addr, 3);
+object_initialize_child(OBJECT(dev), devname, &s->gpio[i],
+TYPE_AVR_GPIO);
+sysbus_realize(SYS_BUS_DEVICE(&s->gpio[i]), &error_abort);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio[i]), 0,
+OFFSET_DATA + mc->dev[idx].addr);
 g_free(devname);
 }

diff --git a/hw/avr/atmega.h b/hw/avr/atmega.h
index a99ee15c7e..e2289d5744 100644
--- a/hw/avr/atmega.h
+++ b/hw/avr/atmega.h
@@ -13,6 +13,7 @@

 #include "hw/char/avr_usart.h"
 #include "hw/timer/avr_timer16.h"
+#include "hw/gpio/avr_gpio.h"
 #include "hw/misc/avr_power.h"
 #include "target/avr/cpu.h"
 #include "qom/object.h"
@@ -44,6 +45,7 @@ struct AtmegaMcuState {
 DeviceState *io;
 AVRMaskState pwr[POWER_MAX];
 AVRUsartState usart[USART_MAX];
+AVRGPIOState gpio[GPIO_MAX];
 AVRTimer16State timer[TIMER_MAX];
 uint64_t xtal_freq_hz;
 };
diff --git a/hw/gpio/Kconfig b/hw/gpio/Kconfig
index f0e7405f6e..fde7019b2b 100644
--- a/hw/gpio/Kconfig
+++ b/hw/gpio/Kconfig
@@ -13,3 +13,6 @@ config GPIO_PWR

 config SIFIVE_GPIO
 bool
+
+config AVR_GPIO
+bool
diff --git a/hw/gpio/avr_gpio.c b/hw/gpio/avr_gpio.c
new file mode 100644
index 00..cdb574ef0d
--- /dev/null
+++ b/hw/gpio/avr_gpio.c
@@ -0,0 +1,138 @@
+/*
+ * AVR processors GPIO registers emulation.
+ *
+ * Copyright (C) 2020 Heecheol Yang <heecheol.y...@outlook.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * T

[PATCH] ui/gtk: Fix the implicit mouse ungrabbing logic

2022-10-08 Thread Akihiko Odaki
Although the grab menu item represents the tabbed displays, the old
implicit mouse ungrabbing logic changes the grab menu item even for
an untabbed display.

Leave the grab menu item when implicitly ungrabbing mouse for an
untabbed display. The new ungrabbing logic introduced in
gd_mouse_mode_change() strictly follows the corresponding grabbing
logic found in gd_button_event().

Signed-off-by: Akihiko Odaki 
---
 ui/gtk.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 1467b8c7d7..6fc2e23963 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -681,9 +681,13 @@ static void gd_mouse_mode_change(Notifier *notify, void *data)
 
 s = container_of(notify, GtkDisplayState, mouse_mode_notifier);
 /* release the grab at switching to absolute mode */
-if (qemu_input_is_absolute() && gd_is_grab_active(s)) {
-gtk_check_menu_item_set_active(GTK_CHECK_MENU_ITEM(s->grab_item),
-   FALSE);
+if (qemu_input_is_absolute() && s->ptr_owner) {
+if (!s->ptr_owner->window) {
+gtk_check_menu_item_set_active(GTK_CHECK_MENU_ITEM(s->grab_item),
+   FALSE);
+} else {
+gd_ungrab_pointer(s);
+}
 }
 for (i = 0; i < s->nb_vcs; i++) {
 VirtualConsole *vc = &s->vc[i];
-- 
2.37.3




[PATCH v3] linux-user: mprotect() should return 0 when len is 0.

2022-10-08 Thread Soichiro Isshiki
On Sat, Oct 8, 2022 at 12:41 AM Soichiro Isshiki wrote:
> A validation for wrap-around was added; I think it is necessary.

I noticed the validation for wrap-around is *not* necessary, because it is done
by guest_range_valid_untagged().

Signed-off-by: Soichiro Isshiki 
---
 linux-user/mmap.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 28f3bc85ed..6f23a1ac39 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -124,17 +124,17 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 if ((start & ~TARGET_PAGE_MASK) != 0) {
 return -TARGET_EINVAL;
 }
-page_flags = validate_prot_to_pageflags(&host_prot, target_prot);
-if (!page_flags) {
-return -TARGET_EINVAL;
+if (len == 0) {
+return 0;
 }
 len = TARGET_PAGE_ALIGN(len);
-end = start + len;
 if (!guest_range_valid_untagged(start, len)) {
 return -TARGET_ENOMEM;
 }
-if (len == 0) {
-return 0;
+end = start + len;
+page_flags = validate_prot_to_pageflags(&host_prot, target_prot);
+if (!page_flags) {
+return -TARGET_EINVAL;
 }
 
 mmap_lock();
-- 
2.25.1
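
The reordered checks in the hunk above can be read as a small decision sequence: alignment, then the zero-length early return, then range validation, and only then protection-flag validation. A minimal stand-alone sketch of that ordering follows; the page size, the address-space limit, and the `sketch_` helpers are assumptions made for illustration and only stand in for TARGET_PAGE_ALIGN(), guest_range_valid_untagged() and validate_prot_to_pageflags():

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_PAGE_MASK  (~(uint64_t)(SKETCH_PAGE_SIZE - 1))
#define SKETCH_ADDR_LIMIT ((uint64_t)1 << 32)  /* assumed guest address space size */

/* Stand-in range check: rejects both out-of-range and wrap-around requests. */
static bool sketch_range_valid(uint64_t start, uint64_t len)
{
    return len - 1 <= SKETCH_ADDR_LIMIT - 1 && start <= SKETCH_ADDR_LIMIT - len;
}

/* Same order of checks as the patched target_mprotect() prologue. */
static int sketch_mprotect_check(uint64_t start, uint64_t len, int prot)
{
    if (start & ~SKETCH_PAGE_MASK) {
        return -EINVAL;            /* start must be page aligned */
    }
    if (len == 0) {
        return 0;                  /* nothing to do, even if prot is bogus */
    }
    len = (len + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
    if (!sketch_range_valid(start, len)) {
        return -ENOMEM;            /* covers wrap-around as well */
    }
    if (prot & ~0x7) {
        return -EINVAL;            /* only READ/WRITE/EXEC bits are accepted here */
    }
    return 0;
}
```

With this ordering a zero-length request succeeds before the protection flags are inspected, which is the behaviour the subject line asks for.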




Re: [PATCH v8 00/16] QMP/HMP: introduce 'dumpdtb'

2022-10-08 Thread Daniel Henrique Barboza

Philippe,


I'm going to push the acked patches to ppc-next. If you send a r-b for patches
2 and 4 I can push them as well.

Alistair, I intend to push the acked RISC-V patches (patches 14 and 15) via the
ppc-next tree as well. Let me know if you'd rather pick them via the RISC-V
tree.


Thanks,

Daniel



On 9/26/22 14:38, Daniel Henrique Barboza wrote:

Hi,

This new version contains all changes proposed during the review process,
all of them done in the patch that introduces dumpdtb.

Other changes made:

- Patch 14/14, the one that introduces the command, is now patch 1. This
change lets the other machine patches that mention 'dumpdtb QMP/HMP'
reference a command that already exists.

- added two new patches based on Philippe's feedback: patch 2 and patch 4.

Mandatory patch pending review: patch 2
Optional machine patches pending review: 3, 4, 5, 7, 16

Changes from v7:
- patch 14: switched to start of the series, now patch 1
- patch 1:
   - changed hmp-commands.hx help to:
"dump the FDT in dtb format to 'filename'"

   - changed 'filename' to *filename*

   - changed filename description in machine.json to
 "name of the binary FDT file to be created"

   - changed 'size' to uint32_t
   - added a g_assert() for FDT size == zero
   - added a success message in hmp_dumpdtb()
- patch 2 (new):
   - free ms->fdt in machine_finalize()
- patch 4 (new):
   - assign ms->fdt in boston_mach_init()
- v7 link: https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg01350.html

Daniel Henrique Barboza (16):
   qmp/hmp, device_tree.c: introduce dumpdtb
   hw/core: free ms->fdt in machine_finalize()
   hw/arm: do not free machine->fdt in arm_load_dtb()
   hw/mips: set machine->fdt in boston_mach_init()
   hw/microblaze: set machine->fdt in microblaze_load_dtb()
   hw/nios2: set machine->fdt in nios2_load_dtb()
   hw/ppc: set machine->fdt in ppce500_load_device_tree()
   hw/ppc: set machine->fdt in bamboo_load_device_tree()
   hw/ppc: set machine->fdt in sam460ex_load_device_tree()
   hw/ppc: set machine->fdt in xilinx_load_device_tree()
   hw/ppc: set machine->fdt in pegasos2_machine_reset()
   hw/ppc: set machine->fdt in pnv_reset()
   hw/ppc: set machine->fdt in spapr machine
   hw/riscv: set machine->fdt in sifive_u_machine_init()
   hw/riscv: set machine->fdt in spike_board_init()
   hw/xtensa: set machine->fdt in xtfpga_init()

  hmp-commands.hx  | 15 +++
  hw/arm/boot.c|  3 ++-
  hw/core/machine.c|  1 +
  hw/microblaze/boot.c |  8 +++-
  hw/microblaze/meson.build|  2 +-
  hw/mips/boston.c |  5 -
  hw/nios2/boot.c  |  8 +++-
  hw/nios2/meson.build |  2 +-
  hw/ppc/e500.c| 13 -
  hw/ppc/pegasos2.c|  4 
  hw/ppc/pnv.c |  8 +++-
  hw/ppc/ppc440_bamboo.c   | 25 +---
  hw/ppc/sam460ex.c| 21 ++--
  hw/ppc/spapr.c   |  3 +++
  hw/ppc/spapr_hcall.c |  8 
  hw/ppc/virtex_ml507.c| 25 +---
  hw/riscv/sifive_u.c  |  3 +++
  hw/riscv/spike.c |  6 ++
  hw/xtensa/meson.build|  2 +-
  hw/xtensa/xtfpga.c   |  6 +-
  include/sysemu/device_tree.h |  1 +
  monitor/misc.c   |  1 +
  qapi/machine.json| 18 ++
  softmmu/device_tree.c| 37 
  24 files changed, 183 insertions(+), 42 deletions(-)





Re: [External] Re: [PATCH 0/4] Add a new backend for cryptodev

2022-10-08 Thread Lei He

On 2022/10/7 22:25, Michael S. Tsirkin wrote:

On Mon, Sep 19, 2022 at 11:53:16AM +0800, Lei He wrote:

This patch adds a new backend called LKCF to cryptodev; LKCF stands
for Linux Kernel Cryptography Framework. If a cryptographic
accelerator that supports LKCF is installed on the host (you can
see which algorithms are supported in the host's LKCF by executing
'cat /proc/crypto'), then RSA operations can be offloaded.
For more background, see https://lwn.net/Articles/895399/
('keyctl[5]' in the picture).

This patch:
1. Modified some interfaces of cryptodev and cryptodev-backend to
support asynchronous requests.
2. Extended the DER encoder in crypto, so that we can export the
RSA private key into PKCS#8 format and upload it to host kernel.
3. Added a new backend for cryptodev.

I tested the backend with a QAT card: the QPS of RSA-2048 decryption
is about 25k/s, and the main loop becomes the bottleneck. The QPS
using OpenSSL directly is about 6k/s (with 6 vCPUs). We will support
IO-thread for cryptodev in another series later.

Lei He (4):
   virtio-crypto: Support asynchronous mode
   crypto: Support DER encodings
   crypto: Support export akcipher to pkcs8
   cryptodev: Add a lkcf-backend for cryptodev


Seems to fail build for me - probably a conflict applying.
Could you please rebase and repost? Sorry about the noise.


I did a rebase and found no conflicts. The patch causes a compile
error when neither nettle nor gcrypt is enabled; I've fixed that and
reposted the series as v2.





  backends/cryptodev-builtin.c|  69 +++--
  backends/cryptodev-lkcf.c   | 620 
  backends/cryptodev-vhost-user.c |  51 +++-
  backends/cryptodev.c|  44 +--
  backends/meson.build|   3 +
  crypto/akcipher.c   |  17 ++
  crypto/der.c| 307 ++--
  crypto/der.h| 211 +-
  crypto/rsakey.c |  42 +++
  crypto/rsakey.h |  11 +-
  hw/virtio/virtio-crypto.c   | 324 -
  include/crypto/akcipher.h   |  21 ++
  include/sysemu/cryptodev.h  |  61 ++--
  qapi/qom.json   |   2 +
  tests/unit/test-crypto-der.c| 126 ++--
  15 files changed, 1649 insertions(+), 260 deletions(-)
  create mode 100644 backends/cryptodev-lkcf.c

--
2.11.0




Best regards,
Lei He
--
helei.si...@bytedance.com
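
Item 1 of the cover letter, the move to asynchronous requests, amounts to reporting the result through a completion callback instead of the return value. A minimal sketch of that calling convention, assuming a backend that happens to finish its work inline (all `Sketch`/`sketch_` names and status codes are hypothetical and are not the series' actual types):

```c
#include <stddef.h>

/* Hypothetical completion callback: receives the caller's context and a status. */
typedef void (*SketchCompletionFunc)(void *opaque, int status);

enum { SKETCH_OK = 0, SKETCH_ERR = 1 };

/*
 * A backend that completes immediately. The return value only says whether
 * the request was accepted; the outcome of the operation itself is delivered
 * through the callback, which an asynchronous backend could invoke later.
 */
static int sketch_create_session(int simulate_failure,
                                 SketchCompletionFunc cb, void *opaque)
{
    int status = simulate_failure ? -SKETCH_ERR : SKETCH_OK;

    if (cb) {
        cb(opaque, status);
    }
    return 0;
}

/* Example callback: stores the status into an int supplied by the caller. */
static void sketch_record_status(void *opaque, int status)
{
    *(int *)opaque = status;
}
```

A caller passes sketch_record_status together with a pointer to its own status variable; a threaded backend would keep the same signature and simply defer the callback.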




[PATCH v2] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa

2022-10-08 Thread Si-Wei Liu
Similar to other vhost backends, vhostfd can be passed to vhost-vdpa
backend as another parameter to instantiate vhost-vdpa net client.
This would benefit the use case where only open file descriptors, as
opposed to raw vhost-vdpa device paths, are accessible from the QEMU
process.

(qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1

Signed-off-by: Si-Wei Liu 
Acked-by: Eugenio Pérez 

---
v2:
  - fixed typo in commit message
  - s/fd's/file descriptors/
---
 net/vhost-vdpa.c | 25 -
 qapi/net.json|  3 +++
 qemu-options.hx  |  6 --
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 182b3a1..366b070 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -683,14 +683,29 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = &netdev->u.vhost_vdpa;
-if (!opts->vhostdev) {
-error_setg(errp, "vdpa character device not specified with vhostdev");
+if (!opts->has_vhostdev && !opts->has_vhostfd) {
+error_setg(errp,
+   "vhost-vdpa: neither vhostdev= nor vhostfd= was specified");
 return -1;
 }
 
-vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
-if (vdpa_device_fd == -1) {
-return -errno;
+if (opts->has_vhostdev && opts->has_vhostfd) {
+error_setg(errp,
+   "vhost-vdpa: vhostdev= and vhostfd= are mutually exclusive");
+return -1;
+}
+
+if (opts->has_vhostdev) {
+vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
+if (vdpa_device_fd == -1) {
+return -errno;
+}
+} else if (opts->has_vhostfd) {
+vdpa_device_fd = monitor_fd_param(monitor_cur(), opts->vhostfd, errp);
+if (vdpa_device_fd == -1) {
+error_prepend(errp, "vhost-vdpa: unable to parse vhostfd: ");
+return -1;
+}
 }
 
 r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
diff --git a/qapi/net.json b/qapi/net.json
index dd088c0..926ecc8 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -442,6 +442,8 @@
 # @vhostdev: path of vhost-vdpa device
 #(default:'/dev/vhost-vdpa-0')
 #
+# @vhostfd: file descriptor of an already opened vhost vdpa device
+#
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #  (default: 1)
 #
@@ -456,6 +458,7 @@
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
 '*vhostdev': 'str',
+'*vhostfd':  'str',
 '*queues':   'int',
 '*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 913c71e..c040f74 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2774,8 +2774,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 "configure a vhost-user network, backed by a chardev 'dev'\n"
 #endif
 #ifdef __linux__
-"-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
+"-netdev vhost-vdpa,id=str[,vhostdev=/path/to/dev][,vhostfd=h]\n"
 "configure a vhost-vdpa network,Establish a vhost-vdpa netdev\n"
+"use 'vhostdev=/path/to/dev' to open a vhost vdpa device\n"
+"use 'vhostfd=h' to connect to an already opened vhost vdpa device\n"
 #endif
 #ifdef CONFIG_VMNET
 "-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n"
@@ -3280,7 +3282,7 @@ SRST
  -netdev type=vhost-user,id=net0,chardev=chr0 \
  -device virtio-net-pci,netdev=net0
 
-``-netdev vhost-vdpa,vhostdev=/path/to/dev``
+``-netdev vhost-vdpa[,vhostdev=/path/to/dev][,vhostfd=h]``
 Establish a vhost-vdpa netdev.
 
 vDPA device is a device that uses a datapath which complies with
-- 
1.8.3.1
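
The option handling added to net_init_vhost_vdpa() is an exactly-one-of check on vhostdev= and vhostfd=. A minimal sketch of that selection logic in isolation, assuming the two options arrive as possibly-NULL strings (the enum and function names are hypothetical; the error texts mirror the ones in the patch):

```c
#include <stddef.h>

enum sketch_fd_source {
    SKETCH_SRC_ERROR,   /* invalid combination, *err is set */
    SKETCH_SRC_PATH,    /* open the device node named by vhostdev= */
    SKETCH_SRC_FD       /* use the already opened descriptor from vhostfd= */
};

/* Exactly one of vhostdev/vhostfd must be given. */
static enum sketch_fd_source
sketch_pick_source(const char *vhostdev, const char *vhostfd, const char **err)
{
    *err = NULL;
    if (!vhostdev && !vhostfd) {
        *err = "vhost-vdpa: neither vhostdev= nor vhostfd= was specified";
        return SKETCH_SRC_ERROR;
    }
    if (vhostdev && vhostfd) {
        *err = "vhost-vdpa: vhostdev= and vhostfd= are mutually exclusive";
        return SKETCH_SRC_ERROR;
    }
    return vhostdev ? SKETCH_SRC_PATH : SKETCH_SRC_FD;
}
```

Rejecting both invalid combinations up front means the code that opens or parses the descriptor only ever sees one well-defined source.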




[PATCH v2 3/4] crypto: Support export akcipher to pkcs8

2022-10-08 Thread Lei He
crypto: support exporting RSA private keys in PKCS#8 format,
so that users can upload the private key to the Linux kernel.

Signed-off-by: lei he 
---
 crypto/akcipher.c | 18 ++
 crypto/rsakey.c   | 42 ++
 crypto/rsakey.h   | 11 ++-
 include/crypto/akcipher.h | 21 +
 4 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/crypto/akcipher.c b/crypto/akcipher.c
index ad88379c1e..e4bbc6e5f1 100644
--- a/crypto/akcipher.c
+++ b/crypto/akcipher.c
@@ -22,6 +22,8 @@
 #include "qemu/osdep.h"
 #include "crypto/akcipher.h"
 #include "akcipherpriv.h"
+#include "der.h"
+#include "rsakey.h"
 
 #if defined(CONFIG_GCRYPT)
 #include "akcipher-gcrypt.c.inc"
@@ -106,3 +108,19 @@ void qcrypto_akcipher_free(QCryptoAkCipher *akcipher)
 
 drv->free(akcipher);
 }
+
+int qcrypto_akcipher_export_p8info(const QCryptoAkCipherOptions *opts,
+   uint8_t *key, size_t keylen,
+   uint8_t **dst, size_t *dst_len,
+   Error **errp)
+{
+switch (opts->alg) {
+case QCRYPTO_AKCIPHER_ALG_RSA:
+qcrypto_akcipher_rsakey_export_p8info(key, keylen, dst, dst_len);
+return 0;
+
+default:
+error_setg(errp, "Unsupported algorithm: %u", opts->alg);
+return -1;
+}
+}
diff --git a/crypto/rsakey.c b/crypto/rsakey.c
index cc40e072f0..7d6f273aef 100644
--- a/crypto/rsakey.c
+++ b/crypto/rsakey.c
@@ -19,6 +19,8 @@
  *
  */
 
+#include "qemu/osdep.h"
+#include "der.h"
 #include "rsakey.h"
 
 void qcrypto_akcipher_rsakey_free(QCryptoAkCipherRSAKey *rsa_key)
@@ -37,6 +39,46 @@ void qcrypto_akcipher_rsakey_free(QCryptoAkCipherRSAKey *rsa_key)
 g_free(rsa_key);
 }
 
+/**
+ * PKCS#8 private key info for RSA
+ *
+ * PrivateKeyInfo ::= SEQUENCE {
+ * version INTEGER,
+ * privateKeyAlgorithm PrivateKeyAlgorithmIdentifier,
+ * privateKey  OCTET STRING,
+ * attributes  [0] IMPLICIT Attributes OPTIONAL
+ * }
+ */
+void qcrypto_akcipher_rsakey_export_p8info(const uint8_t *key,
+   size_t keylen,
+   uint8_t **dst,
+   size_t *dlen)
+{
+QCryptoEncodeContext *ctx = qcrypto_der_encode_ctx_new();
+uint8_t version = 0;
+
+qcrypto_der_encode_seq_begin(ctx);
+
+/* version */
+qcrypto_der_encode_int(ctx, &version, sizeof(version));
+
+/* algorithm identifier */
+qcrypto_der_encode_seq_begin(ctx);
+qcrypto_der_encode_oid(ctx, (uint8_t *)QCRYPTO_OID_rsaEncryption,
+   sizeof(QCRYPTO_OID_rsaEncryption) - 1);
+qcrypto_der_encode_null(ctx);
+qcrypto_der_encode_seq_end(ctx);
+
+/* RSA private key */
+qcrypto_der_encode_octet_str(ctx, key, keylen);
+
+qcrypto_der_encode_seq_end(ctx);
+
+*dlen = qcrypto_der_encode_ctx_buffer_len(ctx);
+*dst = g_malloc(*dlen);
+qcrypto_der_encode_ctx_flush_and_free(ctx, *dst);
+}
+
 #if defined(CONFIG_NETTLE) && defined(CONFIG_HOGWEED)
 #include "rsakey-nettle.c.inc"
 #else
diff --git a/crypto/rsakey.h b/crypto/rsakey.h
index 974b76f659..00b3eccec7 100644
--- a/crypto/rsakey.h
+++ b/crypto/rsakey.h
@@ -22,7 +22,6 @@
 #ifndef QCRYPTO_RSAKEY_H
 #define QCRYPTO_RSAKEY_H
 
-#include "qemu/osdep.h"
 #include "qemu/host-utils.h"
 #include "crypto/akcipher.h"
 
@@ -84,6 +83,16 @@ QCryptoAkCipherRSAKey *qcrypto_akcipher_rsakey_parse(
 QCryptoAkCipherKeyType type,
 const uint8_t *key, size_t keylen, Error **errp);
 
+/**
+ * qcrypto_akcipher_rsakey_export_p8info:
+ *
+ * Export RSA private key to PKCS#8 private key info.
+ */
+void qcrypto_akcipher_rsakey_export_p8info(const uint8_t *key,
+   size_t keylen,
+   uint8_t **dst,
+   size_t *dlen);
+
 void qcrypto_akcipher_rsakey_free(QCryptoAkCipherRSAKey *key);
 
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoAkCipherRSAKey,
diff --git a/include/crypto/akcipher.h b/include/crypto/akcipher.h
index 51f5fa2774..214e58ca47 100644
--- a/include/crypto/akcipher.h
+++ b/include/crypto/akcipher.h
@@ -153,6 +153,27 @@ int qcrypto_akcipher_max_dgst_len(QCryptoAkCipher *akcipher);
  */
 void qcrypto_akcipher_free(QCryptoAkCipher *akcipher);
 
+/**
+ * qcrypto_akcipher_export_p8info:
+ * @opts: the options of the akcipher to be exported.
+ * @key: the original key of the akcipher to be exported.
+ * @keylen: length of the 'key'
+ * @dst: output parameter, if export succeed, *dst is set to the
+ * PKCS#8 encoded private key, caller MUST free this key with
+ * g_free after use.
+ * @dst_len: output parameter, indicates the length of PKCS#8 encoded
+ * key.
+ *
+ * Export the akcipher into DER encoded pkcs#8 private key info, expects
+ * |key| stores a valid asymmetric PRIVATE key.
+ *
+ * Returns: 0 for succeed, otherwise -1 is r

[PATCH v2 1/4] virtio-crypto: Support asynchronous mode

2022-10-08 Thread Lei He
virtio-crypto: Modify the current interface of virtio-crypto
device to support asynchronous mode.

Signed-off-by: lei he 
---
 backends/cryptodev-builtin.c|  69 ++---
 backends/cryptodev-vhost-user.c |  51 +--
 backends/cryptodev.c|  44 +++---
 hw/virtio/virtio-crypto.c   | 324 ++--
 include/sysemu/cryptodev.h  |  60 +---
 5 files changed, 336 insertions(+), 212 deletions(-)

diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c
index 125cbad1d3..cda6ca3b71 100644
--- a/backends/cryptodev-builtin.c
+++ b/backends/cryptodev-builtin.c
@@ -355,42 +355,62 @@ static int cryptodev_builtin_create_akcipher_session(
 return index;
 }
 
-static int64_t cryptodev_builtin_create_session(
+static int cryptodev_builtin_create_session(
CryptoDevBackend *backend,
CryptoDevBackendSessionInfo *sess_info,
-   uint32_t queue_index, Error **errp)
+   uint32_t queue_index,
+   CryptoDevCompletionFunc cb,
+   void *opaque)
 {
 CryptoDevBackendBuiltin *builtin =
   CRYPTODEV_BACKEND_BUILTIN(backend);
 CryptoDevBackendSymSessionInfo *sym_sess_info;
 CryptoDevBackendAsymSessionInfo *asym_sess_info;
+int ret, status;
+Error *local_error = NULL;
 
 switch (sess_info->op_code) {
 case VIRTIO_CRYPTO_CIPHER_CREATE_SESSION:
 sym_sess_info = &sess_info->u.sym_sess_info;
-return cryptodev_builtin_create_cipher_session(
-   builtin, sym_sess_info, errp);
+ret = cryptodev_builtin_create_cipher_session(
+builtin, sym_sess_info, &local_error);
+break;
 
 case VIRTIO_CRYPTO_AKCIPHER_CREATE_SESSION:
 asym_sess_info = &sess_info->u.asym_sess_info;
-return cryptodev_builtin_create_akcipher_session(
-   builtin, asym_sess_info, errp);
+ret = cryptodev_builtin_create_akcipher_session(
+   builtin, asym_sess_info, &local_error);
+break;
 
 case VIRTIO_CRYPTO_HASH_CREATE_SESSION:
 case VIRTIO_CRYPTO_MAC_CREATE_SESSION:
 default:
-error_setg(errp, "Unsupported opcode :%" PRIu32 "",
+error_setg(&local_error, "Unsupported opcode :%" PRIu32 "",
sess_info->op_code);
-return -1;
+return -VIRTIO_CRYPTO_NOTSUPP;
 }
 
-return -1;
+if (local_error) {
+error_report_err(local_error);
+}
+if (ret < 0) {
+status = -VIRTIO_CRYPTO_ERR;
+} else {
+sess_info->session_id = ret;
+status = VIRTIO_CRYPTO_OK;
+}
+if (cb) {
+cb(opaque, status);
+}
+return 0;
 }
 
 static int cryptodev_builtin_close_session(
CryptoDevBackend *backend,
uint64_t session_id,
-   uint32_t queue_index, Error **errp)
+   uint32_t queue_index,
+   CryptoDevCompletionFunc cb,
+   void *opaque)
 {
 CryptoDevBackendBuiltin *builtin =
   CRYPTODEV_BACKEND_BUILTIN(backend);
@@ -407,6 +427,9 @@ static int cryptodev_builtin_close_session(
 
 g_free(session);
 builtin->sessions[session_id] = NULL;
+if (cb) {
+cb(opaque, VIRTIO_CRYPTO_OK);
+}
 return 0;
 }
 
@@ -506,7 +529,9 @@ static int cryptodev_builtin_asym_operation(
 static int cryptodev_builtin_operation(
  CryptoDevBackend *backend,
  CryptoDevBackendOpInfo *op_info,
- uint32_t queue_index, Error **errp)
+ uint32_t queue_index,
+ CryptoDevCompletionFunc cb,
+ void *opaque)
 {
 CryptoDevBackendBuiltin *builtin =
   CRYPTODEV_BACKEND_BUILTIN(backend);
@@ -514,11 +539,12 @@ static int cryptodev_builtin_operation(
 CryptoDevBackendSymOpInfo *sym_op_info;
 CryptoDevBackendAsymOpInfo *asym_op_info;
 enum CryptoDevBackendAlgType algtype = op_info->algtype;
-int ret = -VIRTIO_CRYPTO_ERR;
+int status = -VIRTIO_CRYPTO_ERR;
+Error *local_error = NULL;
 
 if (op_info->session_id >= MAX_NUM_SESSIONS ||
   builtin->sessions[op_info->session_id] == NULL) {
-error_setg(errp, "Cannot find a valid session id: %" PRIu64 "",
+error_setg(&local_error, "Cannot find a valid session id: %" PRIu64 "",
op_info->session_id);
 return -VIRTIO_CRYPTO_INVSESS;
 }
@@ -526,14 +552,21 @@ static int cryptodev_builtin_operation(
 sess = builtin->sessions[op_info->session_id];
 if (algtype == CRYPTODEV_BACKEND_ALG_SYM) {
 sym_op_info = op_info->u.sym_op_info;
-ret = cryptodev_builtin_sym_operation(sess, sym_op_info, errp);
+status = cryptodev_builtin_sym_operation(sess, sym_op_info,
+ &local_error);
 } else if (algtype == CRYPTODEV_BACKEND_ALG_ASYM) {
 asym_op_info = op_info->u.asym_op_info;
-

[PATCH v2 4/4] cryptodev: Add a lkcf-backend for cryptodev

2022-10-08 Thread Lei He
cryptodev: Add a new type of backend named lkcf-backend for
cryptodev. This backend uploads asymmetric keys to the Linux kernel
and lets the kernel accelerate the operations where possible.
LKCF stands for Linux Kernel Cryptography Framework.

Signed-off-by: lei he 
---
 backends/cryptodev-lkcf.c  | 645 +
 backends/meson.build   |   3 +
 include/sysemu/cryptodev.h |   1 +
 qapi/qom.json  |   2 +
 4 files changed, 651 insertions(+)
 create mode 100644 backends/cryptodev-lkcf.c

diff --git a/backends/cryptodev-lkcf.c b/backends/cryptodev-lkcf.c
new file mode 100644
index 00..133bd706a4
--- /dev/null
+++ b/backends/cryptodev-lkcf.c
@@ -0,0 +1,645 @@
+/*
+ * QEMU Cryptodev backend for QEMU cipher APIs
+ *
+ * Copyright (c) 2022 Bytedance.Inc
+ *
+ * Authors:
+ *lei he 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "crypto/cipher.h"
+#include "crypto/akcipher.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qemu/thread.h"
+#include "qemu/error-report.h"
+#include "qemu/queue.h"
+#include "qom/object.h"
+#include "sysemu/cryptodev.h"
+#include "standard-headers/linux/virtio_crypto.h"
+
+#include 
+#include 
+
+/**
+ * @TYPE_CRYPTODEV_BACKEND_LKCF:
+ * name of backend that uses linux kernel crypto framework
+ */
+#define TYPE_CRYPTODEV_BACKEND_LKCF "cryptodev-backend-lkcf"
+
+OBJECT_DECLARE_SIMPLE_TYPE(CryptoDevBackendLKCF, CRYPTODEV_BACKEND_LKCF)
+
+#define INVALID_KEY_ID -1
+#define MAX_SESSIONS 256
+#define NR_WORKER_THREAD 64
+
+#define KCTL_KEY_TYPE_PKEY "asymmetric"
+/**
+ * Here the key is uploaded to the thread-keyring of worker thread, at least
+ * until linux-6.0:
+ * 1. process keyring seems to behave unexpectedly if main-thread does not
+ * create the keyring before creating any other thread.
+ * 2. at present, the guest kernel never performs multiple operations on a
+ * session.
+ * 3. it can reduce the load of the main-loop because the key passed by the
+ * guest kernel has been already checked.
+ */
+#define KCTL_KEY_RING KEY_SPEC_THREAD_KEYRING
+
+typedef struct CryptoDevBackendLKCFSession {
+uint8_t *key;
+size_t keylen;
+QCryptoAkCipherKeyType keytype;
+QCryptoAkCipherOptions akcipher_opts;
+} CryptoDevBackendLKCFSession;
+
+typedef struct CryptoDevBackendLKCF CryptoDevBackendLKCF;
+typedef struct CryptoDevLKCFTask CryptoDevLKCFTask;
+struct CryptoDevLKCFTask {
+CryptoDevBackendLKCFSession *sess;
+CryptoDevBackendOpInfo *op_info;
+CryptoDevCompletionFunc cb;
+void *opaque;
+int status;
+CryptoDevBackendLKCF *lkcf;
+QSIMPLEQ_ENTRY(CryptoDevLKCFTask) queue;
+};
+
+typedef struct CryptoDevBackendLKCF {
+CryptoDevBackend parent_obj;
+CryptoDevBackendLKCFSession *sess[MAX_SESSIONS];
+QSIMPLEQ_HEAD(, CryptoDevLKCFTask) requests;
+QSIMPLEQ_HEAD(, CryptoDevLKCFTask) responses;
+QemuMutex mutex;
+QemuCond cond;
+QemuMutex rsp_mutex;
+
+/**
+ * There is no async interface for asymmetric keys like AF_ALG sockets,
+ * we don't seem to have a better way than creating a lot of threads.
+ */
+QemuThread worker_threads[NR_WORKER_THREAD];
+bool running;
+int eventfd;
+} CryptoDevBackendLKCF;
+
+static void *cryptodev_lkcf_worker(void *arg);
+static int cryptodev_lkcf_close_session(CryptoDevBackend *backend,
+uint64_t session_id,
+uint32_t queue_index,
+CryptoDevCompletionFunc cb,
+void *opaque);
+
+static void cryptodev_lkcf_handle_response(void *opaque)
+{
+CryptoDevBackendLKCF *lkcf = (CryptoDevBackendLKCF *)opaque;
+QSIMPLEQ_HEAD(, CryptoDevLKCFTask) responses;
+CryptoDevLKCFTask *task, *next;
+eventfd_t nevent;
+
+QSIMPLEQ_INIT(&responses);
+eventfd_read(lkcf->eventfd, &nevent);
+
+qemu_mutex_lock(&lkcf->rsp_mutex);
+QSIMPLEQ_PREPEND(&responses, &lkcf->responses);
+qemu_mutex_unlock(&lkcf->rsp_mutex);
+
+QSIMPLEQ_FOREACH_SAFE(task, &responses, queue, next) {
+if (task->cb) {
+task->cb(task->opaque, task->status);
+}
+g_free(task);
+}
+}
+
+static int cryptodev_lkcf_set_op_desc(QCryptoAkCipherOptions *opts,
+

[PATCH v2 2/4] crypto: Support DER encodings

2022-10-08 Thread Lei He
Add encoding interfaces for DER encoding:
1. support decoding of 'bit string', 'octet string', 'object id'
and 'context specific tag' for DER encoder.
2. implement a simple DER encoder.
3. add more test suites for DER encoder.

Signed-off-by: lei he 
---
 crypto/der.c | 307 +++
 crypto/der.h | 211 -
 tests/unit/test-crypto-der.c | 126 ++
 3 files changed, 597 insertions(+), 47 deletions(-)

diff --git a/crypto/der.c b/crypto/der.c
index f877390bbb..dab3fe4f24 100644
--- a/crypto/der.c
+++ b/crypto/der.c
@@ -22,20 +22,93 @@
 #include "qemu/osdep.h"
 #include "crypto/der.h"
 
+typedef struct QCryptoDerEncodeNode {
+uint8_t tag;
+struct QCryptoDerEncodeNode *parent;
+struct QCryptoDerEncodeNode *next;
+/* for constructed type, data is null */
+const uint8_t *data;
+size_t dlen;
+} QCryptoDerEncodeNode;
+
+typedef struct QCryptoEncodeContext {
+QCryptoDerEncodeNode root;
+QCryptoDerEncodeNode *current_parent;
+QCryptoDerEncodeNode *tail;
+} QCryptoEncodeContext;
+
 enum QCryptoDERTypeTag {
 QCRYPTO_DER_TYPE_TAG_BOOL = 0x1,
 QCRYPTO_DER_TYPE_TAG_INT = 0x2,
 QCRYPTO_DER_TYPE_TAG_BIT_STR = 0x3,
 QCRYPTO_DER_TYPE_TAG_OCT_STR = 0x4,
-QCRYPTO_DER_TYPE_TAG_OCT_NULL = 0x5,
-QCRYPTO_DER_TYPE_TAG_OCT_OID = 0x6,
+QCRYPTO_DER_TYPE_TAG_NULL = 0x5,
+QCRYPTO_DER_TYPE_TAG_OID = 0x6,
 QCRYPTO_DER_TYPE_TAG_SEQ = 0x10,
 QCRYPTO_DER_TYPE_TAG_SET = 0x11,
 };
 
-#define QCRYPTO_DER_CONSTRUCTED_MASK 0x20
+enum QCryptoDERTagClass {
+QCRYPTO_DER_TAG_CLASS_UNIV = 0x0,
+QCRYPTO_DER_TAG_CLASS_APPL = 0x1,
+QCRYPTO_DER_TAG_CLASS_CONT = 0x2,
+QCRYPTO_DER_TAG_CLASS_PRIV = 0x3,
+};
+
+enum QCryptoDERTagEnc {
+QCRYPTO_DER_TAG_ENC_PRIM = 0x0,
+QCRYPTO_DER_TAG_ENC_CONS = 0x1,
+};
+
+#define QCRYPTO_DER_TAG_ENC_MASK 0x20
+#define QCRYPTO_DER_TAG_ENC_SHIFT 5
+
+#define QCRYPTO_DER_TAG_CLASS_MASK 0xc0
+#define QCRYPTO_DER_TAG_CLASS_SHIFT 6
+
+#define QCRYPTO_DER_TAG_VAL_MASK 0x1f
 #define QCRYPTO_DER_SHORT_LEN_MASK 0x80
 
+#define QCRYPTO_DER_TAG(class, enc, val)   \
+(((class) << QCRYPTO_DER_TAG_CLASS_SHIFT) |\
+ ((enc) << QCRYPTO_DER_TAG_ENC_SHIFT) | (val))
+
+/**
+ * qcrypto_der_encode_length:
+ * @src_len: the length of source data
+ * @dst: destination to save the encoded 'length'; if dst is NULL, only compute
+ * the expected buffer size in bytes.
+ * @dst_len: output parameter, the number of bytes written (or required, if
+ * dst is NULL).
+ *
+ * Encode the 'length' part of a TLV tuple.
+ */
+static void qcrypto_der_encode_length(size_t src_len,
+  uint8_t *dst, size_t *dst_len)
+{
+size_t max_length = 0xFF;
+uint8_t length_bytes = 0, header_byte;
+
+if (src_len < QCRYPTO_DER_SHORT_LEN_MASK) {
+header_byte = src_len;
+*dst_len = 1;
+} else {
+for (length_bytes = 1; max_length < src_len; length_bytes++) {
+max_length = (max_length << 8) | 0xFF;
+}
+header_byte = length_bytes;
+header_byte |= QCRYPTO_DER_SHORT_LEN_MASK;
+*dst_len = length_bytes + 1;
+}
+if (!dst) {
+return;
+}
+*dst++ = header_byte;
+/* Big-endian length bytes */
+for (; length_bytes > 0; length_bytes--) {
+*dst++ = ((src_len >> (length_bytes - 1) * 8) & 0xFF);
+}
+}
+
 static uint8_t qcrypto_der_peek_byte(const uint8_t **data, size_t *dlen)
 {
 return **data;
@@ -150,40 +223,230 @@ static int qcrypto_der_extract_data(const uint8_t 
**data, size_t *dlen,
 return qcrypto_der_extract_definite_data(data, dlen, cb, ctx, errp);
 }
 
-int qcrypto_der_decode_int(const uint8_t **data, size_t *dlen,
-   QCryptoDERDecodeCb cb, void *ctx, Error **errp)
+static int qcrypto_der_decode_tlv(const uint8_t expected_tag,
+  const uint8_t **data, size_t *dlen,
+  QCryptoDERDecodeCb cb,
+  void *ctx, Error **errp)
 {
+const uint8_t *saved_data = *data;
+size_t saved_dlen = *dlen;
 uint8_t tag;
+int data_length;
+
 if (*dlen < 1) {
 error_setg(errp, "Need more data");
 return -1;
 }
 tag = qcrypto_der_cut_byte(data, dlen);
+if (tag != expected_tag) {
+error_setg(errp, "Unexpected tag: expected: %u, actual: %u",
+   expected_tag, tag);
+goto error;
+}
 
-/* INTEGER must encoded in primitive-form */
-if (tag != QCRYPTO_DER_TYPE_TAG_INT) {
-error_setg(errp, "Invalid integer type tag: %u", tag);
-return -1;
+data_length = qcrypto_der_extract_data(data, dlen, cb, ctx, errp);
+if (data_length < 0) {
+goto error;
 }
+return data_length;
 
-return qcrypto_der_extract_data(data, dlen, cb, ctx, errp);
+error:
+*data = saved_data;
+*dlen = saved_dlen;
+return -1;
+}
+
+in

[PATCH v2 0/4] Add a new backend for cryptodev

2022-10-08 Thread Lei He
v1 --> v2:
- Fix compile errors when neither 'nettle' nor 'gcrypt' is enabled.
- Trivial changes to error codes when neither 'nettle' nor 'gcrypt' is
enabled.

This series adds a new cryptodev backend called LKCF, which stands
for Linux Kernel Cryptography Framework. If a cryptographic
accelerator that supports LKCF is installed on the host (you can
see which algorithms the host's LKCF supports by executing
'cat /proc/crypto'), then RSA operations can be offloaded.
For more background, see https://lwn.net/Articles/895399/
('keyctl[5]' in the picture).

This series:
1. Modified some interfaces of cryptodev and cryptodev-backend to
support asynchronous requests.
2. Extended the DER encoder in crypto, so that we can export the
RSA private key in PKCS#8 format and upload it to the host kernel.
3. Added a new backend for cryptodev.
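
As background for item 2: DER stores every element as a
tag-length-value tuple, and the length field has a short form (one
octet, for lengths below 128) and a long form (a count octet followed
by big-endian length octets). The following is a minimal standalone
sketch of that rule, assuming a caller-provided buffer; the helper
name der_put_length is hypothetical and this is not the function
added by the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the DER length octets for a content length of `len` into `dst`
 * and return how many octets were written. `dst` must have room for
 * 1 + sizeof(size_t) octets. Hypothetical helper for illustration only.
 */
static size_t der_put_length(size_t len, uint8_t *dst)
{
    size_t nbytes = 0, tmp = len, i;

    assert(dst != NULL);

    if (len < 0x80) {
        /* Short form: the length fits in the low 7 bits of one octet. */
        dst[0] = (uint8_t)len;
        return 1;
    }

    /* Long form: count the minimal number of octets needed for len. */
    while (tmp > 0) {
        nbytes++;
        tmp >>= 8;
    }

    /* First octet: high bit set, low bits give the number of octets. */
    dst[0] = (uint8_t)(0x80 | nbytes);

    /* Following octets: the length itself, most significant first. */
    for (i = 0; i < nbytes; i++) {
        dst[1 + i] = (uint8_t)(len >> ((nbytes - 1 - i) * 8));
    }
    return 1 + nbytes;
}
```

For example, a length of 127 encodes as the single octet 0x7f, 128
as 0x81 0x80, and 300 as 0x82 0x01 0x2c.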

I tested the backend with a QAT card: RSA-2048 decryption reaches
about 25k QPS, at which point the main loop becomes the bottleneck.
Using OpenSSL directly gives about 6k QPS (with 6 vCPUs). We will add
IO-thread support for cryptodev in a later series.


Lei He (4):
  virtio-crypto: Support asynchronous mode
  crypto: Support DER encodings
  crypto: Support export akcipher to pkcs8
  cryptodev: Add a lkcf-backend for cryptodev

 backends/cryptodev-builtin.c|  69 +++--
 backends/cryptodev-lkcf.c   | 645 
 backends/cryptodev-vhost-user.c |  51 +++-
 backends/cryptodev.c|  44 +--
 backends/meson.build|   3 +
 crypto/akcipher.c   |  18 ++
 crypto/der.c| 307 +--
 crypto/der.h| 211 -
 crypto/rsakey.c |  42 +++
 crypto/rsakey.h |  11 +-
 hw/virtio/virtio-crypto.c   | 324 +++-
 include/crypto/akcipher.h   |  21 ++
 include/sysemu/cryptodev.h  |  61 ++--
 qapi/qom.json   |   2 +
 tests/unit/test-crypto-der.c| 126 ++--
 15 files changed, 1675 insertions(+), 260 deletions(-)
 create mode 100644 backends/cryptodev-lkcf.c

-- 
2.11.0