Re: [PATCH v14 04/10] accel/tcg: add jit stats and time to TBStatistics

2023-05-31 Thread Wu, Fei
On 6/1/2023 9:08 AM, Richard Henderson wrote:
> On 5/30/23 01:35, Fei Wu wrote:
>> +static void collect_jit_profile_info(void *p, uint32_t hash, void
>> *userp)
>> +{
>> +    struct jit_profile_info *jpi = userp;
>> +    TBStatistics *tbs = p;
>> +
>> +    jpi->translations += tbs->translations.total;
>> +    jpi->ops += tbs->code.num_tcg_ops;
>> +    if (stat_per_translation(tbs, code.num_tcg_ops) > jpi->ops_max) {
>> +    jpi->ops_max = stat_per_translation(tbs, code.num_tcg_ops);
>> +    }
>> +    jpi->del_ops += tbs->code.deleted_ops;
>> +    jpi->temps += tbs->code.temps;
>> +    if (stat_per_translation(tbs, code.temps) > jpi->temps_max) {
>> +    jpi->temps_max = stat_per_translation(tbs, code.temps);
>> +    }
>> +    jpi->host += tbs->code.out_len;
>> +    jpi->guest += tbs->code.in_len;
>> +    jpi->search_data += tbs->code.search_out_len;
>> +
>> +    jpi->interm_time += stat_per_translation(tbs, gen_times.ir);
>> +    jpi->opt_time += stat_per_translation(tbs, gen_times.ir_opt);
>> +    jpi->la_time += stat_per_translation(tbs, gen_times.la);
>> +    jpi->code_time += stat_per_translation(tbs, gen_times.code);
>> +
>> +    /*
>> + * The restore time covers how long we have spent restoring state
>> + * from a given TB (e.g. recovering from a fault). It is therefor
>> + * not related to the number of translations we have done.
>> + */
>> +    jpi->restore_time += tbs->tb_restore_time;
>> +    jpi->restore_count += tbs->tb_restore_count;
>> +}
> 
> Why do sums of averages (stats_per_translation) instead of accumulating
> the complete total and dividing by jpi->translations?
> 
There has some inconsistency.

>> +void dump_jit_exec_time_info(uint64_t dev_time, GString *buf)
>> +{
>> +    static uint64_t last_cpu_exec_time;
>> +    uint64_t cpu_exec_time;
>> +    uint64_t delta;
>> +
>> +    cpu_exec_time = tcg_cpu_exec_time();
>> +    delta = cpu_exec_time - last_cpu_exec_time;
>> +
>> +    g_string_append_printf(buf, "async time  %" PRId64 " (%0.3f)\n",
>> +   dev_time, dev_time /
>> (double)NANOSECONDS_PER_SECOND);
>> +    g_string_append_printf(buf, "qemu time   %" PRId64 " (%0.3f)\n",
>> +   delta, delta /
>> (double)NANOSECONDS_PER_SECOND);
>> +    last_cpu_exec_time = cpu_exec_time;
>> +}
>> +
>> +/* dump JIT statisticis using TCGProfile and TBStats */
> 
> "statistics"
> 
ok.

>> diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
>> index 3973591508..749ad182f2 100644
>> --- a/accel/tcg/tcg-accel-ops.c
>> +++ b/accel/tcg/tcg-accel-ops.c
>> @@ -70,10 +70,17 @@ void tcg_cpus_destroy(CPUState *cpu)
>>   int tcg_cpus_exec(CPUState *cpu)
>>   {
>>   int ret;
>> +    uint64_t ti;
>> +
>>   assert(tcg_enabled());
>> +    ti = profile_getclock();
>> +
>>   cpu_exec_start(cpu);
>>   ret = cpu_exec(cpu);
>>   cpu_exec_end(cpu);
>> +
>> +    qatomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);
> 
> You can't qatomic_add a 64-bit value on a 32-bit host.

Right, I only changed the counters to size_t, didn't fix time part. Is
it possible to support it with some kind of locks on 32-bit as a generic
fallback solution? After all, 32-bit host seems not popular, it might be
sacrifice performance a little bit.

For me, the times 'dev_time' and 'cpu_exec_time' is only for the
developer to get a sense of how large part of time spent on such as "ir"
and "code". Alex mentioned in another thread that it's arguable if they
are useful. If not, we can remove it.

> Nor is tcg_ctx a good place to put this.
> 
> If you want to accumulate per-cpu data, put it on the cpu where there's
> no chance of racing with anyone.
> 
> Finally, I suspect that this isn't even where you want to accumulate
> time for execution as separate from translation.  You'd to that in
> cpu_exec_enter/cpu_exec_exit.
> 
>> @@ -296,6 +315,8 @@ static TBStatistics *tb_get_stats(tb_page_addr_t
>> phys_pc, target_ulong pc,
>>   new_stats->cs_base = cs_base;
>>   new_stats->flags = flags;
>>   new_stats->stats_enabled = get_default_tbstats_flag();
>> +    new_stats->tbs = g_ptr_array_sized_new(4);
> 
> Why default to 4?  Is that just a guess, or are we really recomputing
> tbs that frequently?  Is there a good reason not to use g_ptr_array_new()?
> 
Is this the same question in 2019 :) I cannot find the original link
right now. I will try to find it again later.

>> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
>> index 80ffbfb455..a60a92091b 100644
>> --- a/accel/tcg/translator.c
>> +++ b/accel/tcg/translator.c
>> @@ -19,7 +19,7 @@
>>   #include "exec/plugin-gen.h"
>>   #include "exec/replay-core.h"
>>   -static void gen_tb_exec_count(TranslationBlock *tb)
>> +static inline void gen_tb_exec_count(TranslationBlock *tb)
> 
> Why?
> 
will remove.

>>   {
>>   if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
>>   TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
>> @@ -147,6 +147,11 @@ void translator_loop(CPUState 

Re: [PULL 00/21] Migration 20230530 patches

2023-05-31 Thread Juan Quintela
Richard Henderson  wrote:
> On 5/31/23 14:03, Juan Quintela wrote:
>> Richard Henderson  wrote:
>>> On 5/30/23 11:25, Juan Quintela wrote:
 The following changes since commit 
 aa9bbd865502ed517624ab6fe7d4b5d89ca95e43:
 Merge tag 'pull-ppc-20230528' of https://gitlab.com/danielhb/qemu
 into staging (2023-05-29 14:31:52 -0700)
 are available in the Git repository at:
 https://gitlab.com/juan.quintela/qemu.git
 tags/migration-20230530-pull-request
 for you to fetch changes up to
 c63c544005e6b1375a9c038f0e0fb8dfb8b249f4:
 migration/rdma: Check sooner if we are in postcopy for
 save_page() (2023-05-30 19:23:50 +0200)
 
>> Added Markus and Daniel.
>> 
 Migration 20230530 Pull request (take 2)
 Hi
 Resend last PULL request, this time it compiles when CONFIG_RDMA is
 not configured in.
 [take 1]
 On this PULL request:
 - Set vmstate migration failure right (vladimir)
 - Migration QEMUFileHook removal (juan)
 - Migration Atomic counters (juan)
 Please apply.
 
 Juan Quintela (16):
 migration: Don't abuse qemu_file transferred for RDMA
 migration/RDMA: It is accounting for zero/normal pages in two places
 migration/rdma: Remove QEMUFile parameter when not used
 migration/rdma: Don't use imaginary transfers
 migration: Remove unused qemu_file_credit_transfer()
 migration/rdma: Simplify the function that saves a page
 migration: Create migrate_rdma()
 migration/rdma: Unfold ram_control_before_iterate()
 migration/rdma: Unfold ram_control_after_iterate()
 migration/rdma: Remove all uses of RAM_CONTROL_HOOK
 migration/rdma: Unfold hook_ram_load()
 migration/rdma: Create rdma_control_save_page()
 qemu-file: Remove QEMUFileHooks
 migration/rdma: Move rdma constants from qemu-file.h to rdma.h
 migration/rdma: Remove qemu_ prefix from exported functions
 migration/rdma: Check sooner if we are in postcopy for save_page()
 Vladimir Sementsov-Ogievskiy (5):
 runstate: add runstate_get()
 migration: never fail in global_state_store()
 runstate: drop unused runstate_store()
 migration: switch from .vm_was_running to .vm_old_state
 migration: restore vmstate on migration failure
>>>
>>> Appears to introduce multiple avocado failures:
>>>
>>> https://gitlab.com/qemu-project/qemu/-/jobs/4378066518#L286
>>>
>>> Test summary:
>>> tests/avocado/migration.py:X86_64.test_migration_with_exec: ERROR
>>> tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: ERROR
>>> tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
>>> make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142:
>>> check-avocado] Error 1
>>> https://gitlab.com/qemu-project/qemu/-/jobs/4378066523#L387
>>>
>>> Test summary:
>>> tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: ERROR
>>> tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
>>> make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142: 
>>> check-avocado] Error 1
>>>
>>> Also fails QTEST_QEMU_BINARY=./qemu-system-aarch64 
>>> ./tests/qtest/migration-test
>>>
>>> ../src/migration/rdma.c:408:QIO_CHANNEL_RDMA: Object 0xf7bba680 is
>>> not an instance of type qio-channel-rdma
>> I am looking at the other errors, but this one is weird.  It is
>> failing
>> here:
>> #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
>> OBJECT_DECLARE_SIMPLE_TYPE(QIOChannelRDMA, QIO_CHANNEL_RDMA)
>> In the OBJECT line.
>> I have no clue what problem are we having here with the object
>> system to
>> decide at declaration time that a variable is not of the type that we
>> are declaring.
>> I am missing something obvious here?
>
> This is where the inline function is declared, but you need to look at
> the backtrace, where you have applied QIO_CHANNEL_RDMA to an object
> that is *not* of that type.

Where is the stack trace?

Are you running aarch64 natively?  Here running qemu-system-aarch64 on
x86_64 works for me.  Neither avocado test nor migration-test fails with
my changes.

Cleber found the reason why I was having trouble running avocado
locally.  It appears that some change happened and there are several
tests that can't be run in parallel.  (Temporary) solution is to run it
as:

make -j1 check-avocado

Until they sort which tests are/aren't able to run in parallel.

Later, Juan.





[PATCH v2 4/4] intel_iommu: Optimize out some unnecessary UNMAP calls

2023-05-31 Thread Zhenzhong Duan
Commit 63b88968f1 ("intel-iommu: rework the page walk logic") adds IOVA
tree to cache mapped ranges so we only need to send MAP or UNMAP when
there are changes. But there is still a corner case of unnecessary UNMAP.

During invalidation, either domain or device selective, we only need to
unmap when there are recorded mapped IOVA ranges, presuming most of OSes
allocating IOVA range continuously, e.g. on x86, linux sets up mapping
from 0x downwards.

Strace shows UNMAP ioctl taking 0.14us and we have 28 such ioctl()
in one invalidation, as two notifiers in x86 are split into power of 2
pieces.

ioctl(48, VFIO_IOMMU_UNMAP_DMA, 0x7d5c42f0) = 0 <0.14>

The other purpose of this patch is to eliminate noisy error log when we
work with IOMMUFD. It looks the duplicate UNMAP call will fail with IOMMUFD
while always succeed with legacy container. This behavior difference leads
to below error log for IOMMUFD:

IOMMU_IOAS_UNMAP failed: No such file or directory
vfio_container_dma_unmap(0x562012d6b6d0, 0x0, 0x8000) = -2 (No such file or 
directory)
IOMMU_IOAS_UNMAP failed: No such file or directory
vfio_container_dma_unmap(0x562012d6b6d0, 0x8000, 0x4000) = -2 (No such 
file or directory)
...

Signed-off-by: Zhenzhong Duan 
---
 hw/i386/intel_iommu.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 061fcded0dfb..a5fd144aa246 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3743,6 +3743,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, 
IOMMUNotifier *n)
 hwaddr start = n->start;
 hwaddr end = n->end;
 IntelIOMMUState *s = as->iommu_state;
+IOMMUTLBEvent event;
 DMAMap map;
 
 /*
@@ -3762,22 +3763,25 @@ static void vtd_address_space_unmap(VTDAddressSpace 
*as, IOMMUNotifier *n)
 assert(start <= end);
 size = remain = end - start + 1;
 
+event.type = IOMMU_NOTIFIER_UNMAP;
+event.entry.target_as = &address_space_memory;
+event.entry.perm = IOMMU_NONE;
+/* This field is meaningless for unmap */
+event.entry.translated_addr = 0;
+
 while (remain >= VTD_PAGE_SIZE) {
-IOMMUTLBEvent event;
 uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
 uint64_t size = mask + 1;
 
 assert(size);
 
-event.type = IOMMU_NOTIFIER_UNMAP;
-event.entry.iova = start;
-event.entry.addr_mask = mask;
-event.entry.target_as = &address_space_memory;
-event.entry.perm = IOMMU_NONE;
-/* This field is meaningless for unmap */
-event.entry.translated_addr = 0;
-
-memory_region_notify_iommu_one(n, &event);
+map.iova = start;
+map.size = mask;
+if (iova_tree_find(as->iova_tree, &map)) {
+event.entry.iova = start;
+event.entry.addr_mask = mask;
+memory_region_notify_iommu_one(n, &event);
+}
 
 start += size;
 remain -= size;
@@ -3791,7 +3795,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, 
IOMMUNotifier *n)
  n->start, size);
 
 map.iova = n->start;
-map.size = size;
+map.size = size - 1; /* Inclusive */
 iova_tree_remove(as->iova_tree, map);
 }
 
-- 
2.34.1




[PATCH v2 0/4] Optimize UNMAP call and bug fix

2023-05-31 Thread Zhenzhong Duan
Hi All,

This is an extention to original patch [1] based on discuss in [2].

1. Fixed a potential VFIO migration issue Peter found, and optimize
   VFIO dirty page sync in intel_iommu incidentally.

2. Clarify the definition of replay() to match existent optimization
   in intel_iommu.

3. Optimize out some potential UNMAP calls

Tested net card passthrough, ping/ssh pass
Tested DSA vdev passthrough, start dmatest then do live migration, pass.
I checked the LM performance before and after patch, no explicit difference.
I think it may because dirty page sync isn't an important factor in LM
and it's small in number in my test, 4 or 5.

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg05549.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg06708.html

Thanks

Zhenzhong Duan (4):
  util: Add iova_tree_foreach_range_data
  intel_iommu: Fix a potential issue in VFIO dirty page sync
  memory: Document update on replay()
  intel_iommu: Optimize out some unnecessary UNMAP calls

 hw/i386/intel_iommu.c| 75 +---
 hw/vfio/common.c |  2 +-
 include/exec/memory.h| 19 --
 include/qemu/iova-tree.h | 17 +++--
 softmmu/memory.c |  4 +++
 util/iova-tree.c | 31 +
 6 files changed, 122 insertions(+), 26 deletions(-)

-- 
2.34.1




[PATCH v2 1/4] util: Add iova_tree_foreach_range_data

2023-05-31 Thread Zhenzhong Duan
This function is a variant of iova_tree_foreach and support tranversing
a range to trigger callback with a private data.

Signed-off-by: Zhenzhong Duan 
---
 include/qemu/iova-tree.h | 17 +++--
 util/iova-tree.c | 31 +++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 8528e5c98fbc..8bbf049dc3f7 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -39,6 +39,7 @@ typedef struct DMAMap {
 IOMMUAccessFlags perm;
 } QEMU_PACKED DMAMap;
 typedef gboolean (*iova_tree_iterator)(DMAMap *map);
+typedef gboolean (*iova_tree_iterator_2)(DMAMap *map, gpointer *private);
 
 /**
  * iova_tree_new:
@@ -131,11 +132,23 @@ const DMAMap *iova_tree_find_address(const IOVATree 
*tree, hwaddr iova);
  * @iterator: the interator for the mappings, return true to stop
  *
  * Iterate over the iova tree.
- *
- * Return: 1 if found any overlap, 0 if not, <0 if error.
  */
 void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
 
+/**
+ * iova_tree_foreach_range_data:
+ *
+ * @tree: the iova tree to iterate on
+ * @range: the iova range to iterate in
+ * @func: the iterator for the mappings, return true to stop
+ * @private: parameter passed to @func
+ *
+ * Iterate over an iova range in iova tree.
+ */
+void iova_tree_foreach_range_data(IOVATree *tree, DMAMap *range,
+  iova_tree_iterator_2 func,
+  gpointer *private);
+
 /**
  * iova_tree_alloc_map:
  *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 536789797e47..a3cbd5198410 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -42,6 +42,12 @@ typedef struct IOVATreeFindIOVAArgs {
 const DMAMap *result;
 } IOVATreeFindIOVAArgs;
 
+typedef struct IOVATreeIterator {
+DMAMap *range;
+iova_tree_iterator_2 func;
+gpointer *private;
+} IOVATreeIterator;
+
 /**
  * Iterate args to the next hole
  *
@@ -164,6 +170,31 @@ void iova_tree_foreach(IOVATree *tree, iova_tree_iterator 
iterator)
 g_tree_foreach(tree->tree, iova_tree_traverse, iterator);
 }
 
+static gboolean iova_tree_traverse_range(gpointer key, gpointer value,
+ gpointer data)
+{
+DMAMap *map = key;
+IOVATreeIterator *iterator = data;
+DMAMap *range = iterator->range;
+
+g_assert(key == value);
+
+if (iova_tree_compare(map, range, NULL)) {
+return false;
+}
+
+return iterator->func(map, iterator->private);
+}
+
+void iova_tree_foreach_range_data(IOVATree *tree, DMAMap *range,
+  iova_tree_iterator_2 func,
+  gpointer *private)
+{
+IOVATreeIterator iterator = {range, func, private};
+
+g_tree_foreach(tree->tree, iova_tree_traverse_range, &iterator);
+}
+
 void iova_tree_remove(IOVATree *tree, DMAMap map)
 {
 const DMAMap *overlap;
-- 
2.34.1




[PATCH v2 3/4] memory: Document update on replay()

2023-05-31 Thread Zhenzhong Duan
Currently replay() callback is declared to be exactly same semantics
as memory_region_iommu_replay().

Customed replay() may provide some extent of optimization depending on
notifier's type. E.g. intel_iommu, IOMMU_NOTIFIER_MAP is optimized to
provide only changed entries.

Clarify the semantics of replay() and provide more flexibility.

Signed-off-by: Zhenzhong Duan 
---
 include/exec/memory.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index eecc3eec6702..9a523ef62a94 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -441,9 +441,9 @@ struct IOMMUMemoryRegionClass {
  * call the IOMMU translate method for every page in the address space
  * with flag == IOMMU_NONE and then call the notifier if translate
  * returns a valid mapping. If this method is implemented then it
- * overrides the default behaviour, and must provide the full semantics
- * of memory_region_iommu_replay(), by calling @notifier for every
- * translation present in the IOMMU.
+ * overrides the default behavior, and must provide corresponding
+ * semantics depending on notifier's type, e.g. IOMMU_NOTIFIER_MAP,
+ * notify changed entries; IOMMU_NOTIFIER_FULL_MAP, notify full entries.
  *
  * Optional method -- an IOMMU only needs to provide this method
  * if the default is inefficient or produces undesirable side effects.
-- 
2.34.1




[PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync

2023-05-31 Thread Zhenzhong Duan
Peter Xu found a potential issue:

"The other thing is when I am looking at the new code I found that we
actually extended the replay() to be used also in dirty tracking of vfio,
in vfio_sync_dirty_bitmap().  For that maybe it's already broken if
unmap_all() because afaiu log_sync() can be called in migration thread
anytime during DMA so I think it means the device is prone to DMA with the
IOMMU pgtable quickly erased and rebuilt here, which means the DMA could
fail unexpectedly.  Copy Alex, Kirti and Neo."

To eliminate this small window with empty mapping, we should remove the
call to unmap_all(). Besides that, introduce a new notifier type called
IOMMU_NOTIFIER_FULL_MAP to get full mappings as intel_iommu only notifies
changed mappings while VFIO dirty page sync needs full mappings. Thanks
to current implementation of iova tree, we could pick mappings from iova
trees directly instead of walking through guest IOMMU page table.

IOMMU_NOTIFIER_MAP is still used to get changed mappings for optimization
purpose. As long as notification for IOMMU_NOTIFIER_MAP could ensure shadow
page table in sync, then it's OK.

Signed-off-by: Zhenzhong Duan 
---
 hw/i386/intel_iommu.c | 49 +++
 hw/vfio/common.c  |  2 +-
 include/exec/memory.h | 13 
 softmmu/memory.c  |  4 
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 94d52f4205d2..061fcded0dfb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3819,6 +3819,41 @@ static int vtd_replay_hook(IOMMUTLBEvent *event, void 
*private)
 return 0;
 }
 
+static gboolean vtd_replay_full_map(DMAMap *map, gpointer *private)
+{
+IOMMUTLBEvent event;
+
+event.type = IOMMU_NOTIFIER_MAP;
+event.entry.iova = map->iova;
+event.entry.addr_mask = map->size;
+event.entry.target_as = &address_space_memory;
+event.entry.perm = map->perm;
+event.entry.translated_addr = map->translated_addr;
+
+return vtd_replay_hook(&event, private);
+}
+
+/*
+ * This is a fast path to notify the full mappings falling in the scope
+ * of IOMMU notifier. The call site should ensure no iova tree update by
+ * taking necessary locks(e.x. BQL).
+ */
+static int vtd_page_walk_full_map_fast_path(IOVATree *iova_tree,
+IOMMUNotifier *n)
+{
+DMAMap map;
+
+map.iova = n->start;
+map.size = n->end - n->start;
+if (!iova_tree_find(iova_tree, &map)) {
+return 0;
+}
+
+iova_tree_foreach_range_data(iova_tree, &map, vtd_replay_full_map,
+ (gpointer *)n);
+return 0;
+}
+
 static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
 {
 VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
@@ -3826,13 +3861,6 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 uint8_t bus_n = pci_bus_num(vtd_as->bus);
 VTDContextEntry ce;
 
-/*
- * The replay can be triggered by either a invalidation or a newly
- * created entry. No matter what, we release existing mappings
- * (it means flushing caches for UNMAP-only registers).
- */
-vtd_address_space_unmap(vtd_as, n);
-
 if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
 trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :
   "legacy mode",
@@ -3850,8 +3878,11 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 .as = vtd_as,
 .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
 };
-
-vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
+if (n->notifier_flags & IOMMU_NOTIFIER_FULL_MAP) {
+vtd_page_walk_full_map_fast_path(vtd_as->iova_tree, n);
+} else {
+vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
+}
 }
 } else {
 trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 78358ede2764..5dae4502b908 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1890,7 +1890,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer 
*container,
 
 iommu_notifier_init(&gdn.n,
 vfio_iommu_map_dirty_notify,
-IOMMU_NOTIFIER_MAP,
+IOMMU_NOTIFIER_FULL_MAP,
 section->offset_within_region,
 int128_get64(llend),
 idx);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c3661b2276c7..eecc3eec6702 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
  *   events (e.g. VFIO). Both notifications must be accurate so that
  

Re: [PATCH 2/4] target/riscv: Remove check on mode for MPRV

2023-05-31 Thread Weiwei Li



On 2023/6/1 13:27, Alistair Francis wrote:

On Mon, May 29, 2023 at 10:19 PM Weiwei Li  wrote:

Normally, MPRV can be set to 1 only in M mode (It will be cleared
when returning to lower-privilege mode by MRET/SRET).

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/cpu_helper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index bd892c05d4..45baf95c77 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -44,7 +44,7 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
  if (!ifetch) {
  uint64_t status = env->mstatus;

-if (mode == PRV_M && get_field(status, MSTATUS_MPRV)) {
+if (get_field(status, MSTATUS_MPRV)) {

The original check is correct though, why remove it?


Yeah. As described in the commit message, I think MPRV can only be set 
to 1 in M mode normally


which means check on MPRV is enough in this case. So I remove the check 
on mode here.


Regards,

Weiwei Li



Alistair


  mode = get_field(env->mstatus, MSTATUS_MPP);
  virt = get_field(env->mstatus, MSTATUS_MPV) &&
 (mode != PRV_M);
--
2.25.1







[PATCH v2 0/1] hw/arm/sbsa-ref: add XHCI controller on PCIe

2023-05-31 Thread Yuquan Wang
Please review the change.
 - sbsa-ref: Add an XHCI on PCIe bus to provide an availiable
 usb controller.

Yuquan Wang (1):
  hw/arm/sbsa-ref: add XHCI controller on PCIe

 hw/arm/sbsa-ref.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.34.1




[PATCH v2 1/1] hw/arm/sbsa-ref: add XHCI controller on PCIe

2023-05-31 Thread Yuquan Wang
The current sbsa-ref cannot use EHCI controller which is only
able to do 32-bit DMA, since sbsa-ref doesn't have RAM below 4GB.
Hence, this add an XHCI on PCIe to provide a usb controller with 64-bit
DMA capablity.

Signed-off-by: Yuquan Wang 
---
 hw/arm/sbsa-ref.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index de21200ff9..a33cd80d69 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -668,6 +668,8 @@ static void create_pcie(SBSAMachineState *sms)
 
 pci_create_simple(pci->bus, -1, "bochs-display");
 
+pci_create_simple(pci->bus, -1, "qemu-xhci");
+
 create_smmu(sms, pci->bus);
 }
 
-- 
2.34.1




Re: [PATCH 7/7] hw: Simplify using sysbus_init_irqs() [manual]

2023-05-31 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> Audit the sysbus_init_irq() calls and manually convert
> to sysbus_init_irqs() when a loop is involved.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/intc/loongarch_extioi.c | 3 +--
>  hw/intc/omap_intc.c| 3 +--
>  hw/pci-host/gpex.c | 2 +-
>  hw/timer/renesas_tmr.c | 9 +++--
>  4 files changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/hw/intc/loongarch_extioi.c b/hw/intc/loongarch_extioi.c
> index db941de20e..c579636215 100644
> --- a/hw/intc/loongarch_extioi.c
> +++ b/hw/intc/loongarch_extioi.c
> @@ -275,8 +275,7 @@ static void loongarch_extioi_instance_init(Object *obj)
>  LoongArchExtIOI *s = LOONGARCH_EXTIOI(obj);
>  int cpu, pin;
>  
> -sysbus_init_irqs(SYS_BUS_DEVICE(dev), s->irq, EXTIOI_IRQS);
> -
> +sysbus_init_irqs(dev, s->irq, EXTIOI_IRQS);

Commit message claims "when a loop is involved".  No loop here.  That
work was actually done in the previous patch:

  diff --git a/hw/intc/loongarch_extioi.c b/hw/intc/loongarch_extioi.c
  index 0e7a3e32f3..db941de20e 100644
  --- a/hw/intc/loongarch_extioi.c
  +++ b/hw/intc/loongarch_extioi.c
  @@ -273,11 +273,9 @@ static void loongarch_extioi_instance_init(Object *obj)
   {
   SysBusDevice *dev = SYS_BUS_DEVICE(obj);
   LoongArchExtIOI *s = LOONGARCH_EXTIOI(obj);
  -int i, cpu, pin;
  +int cpu, pin;

  -for (i = 0; i < EXTIOI_IRQS; i++) {
  -sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq[i]);
  -}
  +sysbus_init_irqs(SYS_BUS_DEVICE(dev), s->irq, EXTIOI_IRQS);

   qdev_init_gpio_in(DEVICE(obj), extioi_setirq, EXTIOI_IRQS);

In this patch, you merely delete a superfluous type conversion that is
present even before your series.

There are more of them in this function.  Please delete them all, and in
a separate patch.

Actually, there are more elsewhere.  Coccinelle script

@@
typedef SysBusDevice;
SysBusDevice *dev;
@@
-SYS_BUS_DEVICE(dev)
+dev

finds some in hw/arm/xlnx-versal.c and hw/rx/rx62n.c, too.

Would be nice to do this for every QOM type, but I don't know how
without duplicating the semantic patch for each of them.  There are
almost 150 uses os OBJECT_DECLARE_TYPE()...

You might want to address this in a separate series, to not delay this
one.

>  qdev_init_gpio_in(DEVICE(obj), extioi_setirq, EXTIOI_IRQS);
>  
>  for (cpu = 0; cpu < EXTIOI_CPUS; cpu++) {
> diff --git a/hw/intc/omap_intc.c b/hw/intc/omap_intc.c
> index 647bf324a8..f324b640e3 100644
> --- a/hw/intc/omap_intc.c
> +++ b/hw/intc/omap_intc.c
> @@ -627,8 +627,7 @@ static void omap2_intc_init(Object *obj)
>  
>  s->level_only = 1;
>  s->nbanks = 3;
> -sysbus_init_irq(sbd, &s->parent_intr[0]);
> -sysbus_init_irq(sbd, &s->parent_intr[1]);
> +sysbus_init_irqs(sbd, s->parent_intr, ARRAY_SIZE(s->parent_intr));

Unrolled loop.  s->parent_intr[] indeed has 2 elements.  Okay.

>  qdev_init_gpio_in(dev, omap_set_intr_noedge, s->nbanks * 32);
>  memory_region_init_io(&s->mmio, obj, &omap2_inth_mem_ops, s,
>"omap2-intc", 0x1000);

[...]




Re: [PATCH v14 03/10] accel: collecting TB execution count

2023-05-31 Thread Wu, Fei
On 6/1/2023 8:05 AM, Richard Henderson wrote:
> On 5/30/23 01:35, Fei Wu wrote:
>> From: "Vanderson M. do Rosario" 
>>
>> If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
>> enabled, then we instrument the start code of this TB
>> to atomically count the number of times it is executed.
>> We count both the number of "normal" executions and atomic
>> executions of a TB.
>>
>> The execution count of the TB is stored in its respective
>> TBS.
>>
>> All TBStatistics are created by default with the flags from
>> default_tbstats_flag.
>>
>> [Richard Henderson created the inline gen_tb_exec_count]
>>
>> Signed-off-by: Vanderson M. do Rosario 
>> Message-Id: <20190829173437.5926-3-vanderson...@gmail.com>
>> [AJB: Fix author]
>> Signed-off-by: Alex Bennée 
>> Signed-off-by: Fei Wu 
>> ---
>>   accel/tcg/cpu-exec.c  |  6 ++
>>   accel/tcg/tb-stats.c  |  6 ++
>>   accel/tcg/tcg-runtime.c   |  1 +
>>   accel/tcg/translate-all.c |  7 +--
>>   accel/tcg/translator.c    | 25 +
>>   include/exec/gen-icount.h |  1 +
>>   include/exec/tb-stats-flags.h |  5 +
>>   include/exec/tb-stats.h   | 13 +
>>   8 files changed, 62 insertions(+), 2 deletions(-)
>>
>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> index 0e741960da..c0d8f26237 100644
>> --- a/accel/tcg/cpu-exec.c
>> +++ b/accel/tcg/cpu-exec.c
>> @@ -25,6 +25,7 @@
>>   #include "trace.h"
>>   #include "disas/disas.h"
>>   #include "exec/exec-all.h"
>> +#include "exec/tb-stats.h"
>>   #include "tcg/tcg.h"
>>   #include "qemu/atomic.h"
>>   #include "qemu/rcu.h"
>> @@ -562,7 +563,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
>>   mmap_unlock();
>>   }
>>   +    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
>> +    tb->tb_stats->executions.atomic++;
>> +    }
>> +
>>   cpu_exec_enter(cpu);
>> +
>>   /* execute the generated code */
>>   trace_exec_tb(tb, pc);
>>   cpu_tb_exec(cpu, tb, &tb_exit);
>> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
>> index f988bd8a31..143a52ef5c 100644
>> --- a/accel/tcg/tb-stats.c
>> +++ b/accel/tcg/tb-stats.c
>> @@ -22,6 +22,7 @@ enum TBStatsStatus {
>>   };
>>     static enum TBStatsStatus tcg_collect_tb_stats;
>> +static uint32_t default_tbstats_flag;
>>     void init_tb_stats_htable(void)
>>   {
>> @@ -56,3 +57,8 @@ bool tb_stats_collection_paused(void)
>>   {
>>   return tcg_collect_tb_stats == TB_STATS_PAUSED;
>>   }
>> +
>> +uint32_t get_default_tbstats_flag(void)
>> +{
>> +    return default_tbstats_flag;
>> +}
> 
> What is the purpose of this function, instead of a global variable?
> What is the meaning of 'default' in its name?
> 
tbs have their specific settings, e.g. after 'filter' cmd:
* the last_search tbs has their stats_enabled kept
* tbs not in the list sets their flag to TB_PAUSED

I guess 'default' means the flag for creating new tbstats.

>> @@ -295,6 +295,7 @@ static TBStatistics *tb_get_stats(tb_page_addr_t
>> phys_pc, target_ulong pc,
>>   new_stats->pc = pc;
>>   new_stats->cs_base = cs_base;
>>   new_stats->flags = flags;
>> +    new_stats->stats_enabled = get_default_tbstats_flag();
> 
> Is this merely to record how we have generated a given TB?
> What is the purpose of this flag over the global variable?
> 
see above.

>> diff --git a/include/exec/tb-stats-flags.h
>> b/include/exec/tb-stats-flags.h
>> index 87ee3d902e..fa71eb6f0c 100644
>> --- a/include/exec/tb-stats-flags.h
>> +++ b/include/exec/tb-stats-flags.h
>> @@ -11,6 +11,9 @@
>>   #ifndef TB_STATS_FLAGS
>>   #define TB_STATS_FLAGS
>>   +#define TB_NOTHING    (1 << 0)
>> +#define TB_EXEC_STATS (1 << 1)
> 
> Why is NOTHING a non-zero flag?
> 
yes, it might looks better. But there is no correctness issue either as
it checks if the specific bit is enabled during collecting stats.

>> --- a/include/exec/tb-stats.h
>> +++ b/include/exec/tb-stats.h
>> @@ -31,6 +31,9 @@
>>   #include "exec/tb-stats-flags.h"
>>   #include "tcg/tcg.h"
>>   +#define tb_stats_enabled(tb, JIT_STATS) \
>> +    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
> 
> Inline function, though again, why stats_enabled vs global variable?
> 
will inline it.

Thanks,
Fei.

> 
> r~




Re: [PATCH v14 02/10] accel/tcg: introduce TBStatistics structure

2023-05-31 Thread Wu, Fei
On 6/1/2023 12:16 PM, Richard Henderson wrote:
> On 5/31/23 20:19, Wu, Fei wrote:
>> On 6/1/2023 8:01 AM, Richard Henderson wrote:
>>> On 5/30/23 01:35, Fei Wu wrote:
 +/* TBStatistic collection controls */
 +enum TBStatsStatus {
 +    TB_STATS_DISABLED = 0,
 +    TB_STATS_RUNNING,
 +    TB_STATS_PAUSED,
 +    TB_STATS_STOPPED
 +};
>>>
>>> I don't see what PAUSED or STOPPED actually do.
>>> As far as I can see, stats are either being collected or not: a boolean.
>>>
>> If STOPPED, clean_tbstats() gets called, all the tbstats history is
>> destroyed, but it's not for PAUSED.
> 
> But it doesn't PAUSE everything either, since (1) code is built into
> each tb, and (2) each tb->tb_stats->stats_enabled neither changed. 
> Indeed, tb_stats_collection_enabled() is only checked in a fraction of
> the places stats are collected.
> 
tbs are always flushed at the end of hmp tb-stats cmd by tb_flush(),
stats_enabled can be changed e.g. during pause/filter cmd. It should be
a bug if stats are collected w/o tb_stats_collection_enabled().

Thanks,
Fei.
> 
> r~
> 




Re: [PATCH v7 2/3] riscv/virt: Support using pflash via -blockdev option

2023-05-31 Thread Alistair Francis
On Thu, Jun 1, 2023 at 3:01 PM Sunil V L  wrote:
>
> Currently, pflash devices can be configured only via -pflash
> or -drive options. This is the legacy way and the
> better way is to use -blockdev as in other architectures.
> libvirt also has moved to use -blockdev method.
>
> To support -blockdev option, pflash devices need to be
> created in instance_init itself. So, update the code to
> move the virt_flash_create() to instance_init. Also, use
> standard interfaces to detect whether pflash0 is
> configured or not.
>
> Signed-off-by: Sunil V L 
> Reported-by: Andrea Bolognani 
> Tested-by: Andrea Bolognani 
> Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/virt.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index 3e5dc649c3..76c7a3ba3b 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -1246,6 +1246,7 @@ static void virt_machine_done(Notifier *notifier, void 
> *data)
>  const char *firmware_name = riscv_default_firmware_name(&s->soc[0]);
>  uint32_t fdt_load_addr;
>  uint64_t kernel_entry = 0;
> +BlockBackend *pflash_blk0;
>
>  /*
>   * Only direct boot kernel is currently supported for KVM VM,
> @@ -1266,7 +1267,8 @@ static void virt_machine_done(Notifier *notifier, void 
> *data)
>  firmware_end_addr = riscv_find_and_load_firmware(machine, firmware_name,
>   start_addr, NULL);
>
> -if (drive_get(IF_PFLASH, 0, 0)) {
> +pflash_blk0 = pflash_cfi01_get_blk(s->flash[0]);
> +if (pflash_blk0) {
>  if (machine->firmware && !strcmp(machine->firmware, "none") &&
>  !kvm_enabled()) {
>  /*
> @@ -1499,8 +1501,6 @@ static void virt_machine_init(MachineState *machine)
>  sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
>  qdev_get_gpio_in(DEVICE(mmio_irqchip), RTC_IRQ));
>
> -virt_flash_create(s);
> -
>  for (i = 0; i < ARRAY_SIZE(s->flash); i++) {
>  /* Map legacy -drive if=pflash to machine properties */
>  pflash_cfi01_legacy_drive(s->flash[i],
> @@ -1527,6 +1527,8 @@ static void virt_machine_instance_init(Object *obj)
>  {
>  RISCVVirtState *s = RISCV_VIRT_MACHINE(obj);
>
> +virt_flash_create(s);
> +
>  s->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
>  s->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>  s->acpi = ON_OFF_AUTO_AUTO;
> --
> 2.34.1
>
>



Re: [PATCH v7 3/3] docs/system: riscv: Add pflash usage details

2023-05-31 Thread Alistair Francis
On Thu, Jun 1, 2023 at 3:01 PM Sunil V L  wrote:
>
> pflash devices can be used in virt machine for different
> purposes like for ROM code or S-mode FW payload. Add a
> section in the documentation on how to use pflash devices
> for different purposes.
>
> Signed-off-by: Sunil V L 
> Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  docs/system/riscv/virt.rst | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
> index 4b16e41d7f..b33f45e5b3 100644
> --- a/docs/system/riscv/virt.rst
> +++ b/docs/system/riscv/virt.rst
> @@ -53,6 +53,37 @@ with the default OpenSBI firmware image as the -bios. It 
> also supports
>  the recommended RISC-V bootflow: U-Boot SPL (M-mode) loads OpenSBI fw_dynamic
>  firmware and U-Boot proper (S-mode), using the standard -bios functionality.
>
> +Using flash devices
> +---
> +
> +By default, the first flash device (pflash0) is expected to contain
> +S-mode firmware code. It can be configured as read-only, with the
> +second flash device (pflash1) available to store configuration data.
> +
> +For example, booting edk2 looks like
> +
> +.. code-block:: bash
> +
> +  $ qemu-system-riscv64 \
> + -blockdev 
> node-name=pflash0,driver=file,read-only=on,filename= \
> + -blockdev node-name=pflash1,driver=file,filename= \
> + -M virt,pflash0=pflash0,pflash1=pflash1 \
> + ... other args 
> +
> +For TCG guests only, it is also possible to boot M-mode firmware from
> +the first flash device (pflash0) by additionally passing ``-bios
> +none``, as in
> +
> +.. code-block:: bash
> +
> +  $ qemu-system-riscv64 \
> + -bios none \
> + -blockdev 
> node-name=pflash0,driver=file,read-only=on,filename= \
> + -M virt,pflash0=pflash0 \
> + ... other args 
> +
> +Firmware images used for pflash must be exactly 32 MiB in size.
> +
>  Machine-specific options
>  
>
> --
> 2.34.1
>
>



Re: [PATCH 4/4] target/riscv: Remove redundant assignment to SXL

2023-05-31 Thread Alistair Francis
On Mon, May 29, 2023 at 10:18 PM Weiwei Li  wrote:
>
> SXL is initialized as env->misa_mxl which is also the mxl value.
> So we can just remain it unchanged to keep it read-only.
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/csr.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 6ac11d1f11..25345f3153 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -1321,10 +1321,6 @@ static RISCVException write_mstatus(CPURISCVState 
> *env, int csrno,
>
>  mstatus = (mstatus & ~mask) | (val & mask);
>
> -if (xl > MXL_RV32) {
> -/* SXL field is for now read only */
> -mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
> -}
>  env->mstatus = mstatus;
>
>  /*
> --
> 2.25.1
>
>



Re: [PATCH 3/4] target/riscv: Support MSTATUS.MPV/GVA only when RVH is enabled

2023-05-31 Thread Alistair Francis
On Mon, May 29, 2023 at 10:18 PM Weiwei Li  wrote:
>
> MPV and GVA bits are added by hypervisor extension to mstatus
> and mstatush (if MXLEN=32).
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/csr.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 58499b5afc..6ac11d1f11 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -1311,11 +1311,9 @@ static RISCVException write_mstatus(CPURISCVState 
> *env, int csrno,
>  }
>
>  if (xl != MXL_RV32 || env->debugger) {
> -/*
> - * RV32: MPV and GVA are not in mstatus. The current plan is to
> - * add them to mstatush. For now, we just don't support it.
> - */
> -mask |= MSTATUS_MPV | MSTATUS_GVA;
> +if (riscv_has_ext(env, RVH)) {
> +mask |= MSTATUS_MPV | MSTATUS_GVA;
> +}
>  if ((val & MSTATUS64_UXL) != 0) {
>  mask |= MSTATUS64_UXL;
>  }
> @@ -1351,7 +1349,7 @@ static RISCVException write_mstatush(CPURISCVState 
> *env, int csrno,
>   target_ulong val)
>  {
>  uint64_t valh = (uint64_t)val << 32;
> -uint64_t mask = MSTATUS_MPV | MSTATUS_GVA;
> +uint64_t mask = riscv_has_ext(env, RVH) ? MSTATUS_MPV | MSTATUS_GVA : 0;
>
>  env->mstatus = (env->mstatus & ~mask) | (valh & mask);
>
> --
> 2.25.1
>
>



Re: [PATCH 2/4] target/riscv: Remove check on mode for MPRV

2023-05-31 Thread Alistair Francis
On Mon, May 29, 2023 at 10:19 PM Weiwei Li  wrote:
>
> Normally, MPRV can be set to 1 only in M mode (It will be cleared
> when returning to lower-privilege mode by MRET/SRET).
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> ---
>  target/riscv/cpu_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index bd892c05d4..45baf95c77 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -44,7 +44,7 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
>  if (!ifetch) {
>  uint64_t status = env->mstatus;
>
> -if (mode == PRV_M && get_field(status, MSTATUS_MPRV)) {
> +if (get_field(status, MSTATUS_MPRV)) {

The original check is correct though, why remove it?

Alistair

>  mode = get_field(env->mstatus, MSTATUS_MPP);
>  virt = get_field(env->mstatus, MSTATUS_MPV) &&
> (mode != PRV_M);
> --
> 2.25.1
>
>



Re: [PATCH 1/4] target/riscv: Make MPV only work when MPP != PRV_M

2023-05-31 Thread Alistair Francis
On Mon, May 29, 2023 at 10:19 PM Weiwei Li  wrote:
>
> Upon MRET or explicit memory access with MPRV=1, MPV should be ignored
> when MPP=PRV_M.
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu_helper.c | 3 ++-
>  target/riscv/op_helper.c  | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 09ea227ceb..bd892c05d4 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -46,7 +46,8 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
>
>  if (mode == PRV_M && get_field(status, MSTATUS_MPRV)) {
>  mode = get_field(env->mstatus, MSTATUS_MPP);
> -virt = get_field(env->mstatus, MSTATUS_MPV);
> +virt = get_field(env->mstatus, MSTATUS_MPV) &&
> +   (mode != PRV_M);
>  if (virt) {
>  status = env->vsstatus;
>  }
> diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
> index f563dc3981..9cdb9cdd06 100644
> --- a/target/riscv/op_helper.c
> +++ b/target/riscv/op_helper.c
> @@ -335,7 +335,8 @@ target_ulong helper_mret(CPURISCVState *env)
>  riscv_raise_exception(env, RISCV_EXCP_INST_ACCESS_FAULT, GETPC());
>  }
>
> -target_ulong prev_virt = get_field(env->mstatus, MSTATUS_MPV);
> +target_ulong prev_virt = get_field(env->mstatus, MSTATUS_MPV) &&
> + (prev_priv != PRV_M);
>  mstatus = set_field(mstatus, MSTATUS_MIE,
>  get_field(mstatus, MSTATUS_MPIE));
>  mstatus = set_field(mstatus, MSTATUS_MPIE, 1);
> --
> 2.25.1
>
>



Re: [PATCH v3 0/7] target/riscv: Add support for PC-relative translation

2023-05-31 Thread Alistair Francis
On Fri, May 26, 2023 at 5:24 PM Weiwei Li  wrote:
>
> This patchset tries to add support for PC-relative translation.
>
> The existence of CF_PCREL can improve performance with the guest
> kernel's address space randomization.  Each guest process maps libc.so
> (et al) at a different virtual address, and this allows those
> translations to be shared.
>
> And support of PC-relative translation is the precondition to support
> pointer mask for instruction.
>
> The port is available here:
> https://github.com/plctlab/plct-qemu/tree/plct-pcrel-upstream-v3
>
> v3:
>  * rebase on riscv-to-apply.next
>
> v2:
>  * rebase on upstream and add pc-relative translation for Zc* instructions
>
> Weiwei Li (7):
>   target/riscv: Fix target address to update badaddr
>   target/riscv: Introduce cur_insn_len into DisasContext
>   target/riscv: Change gen_goto_tb to work on displacements
>   target/riscv: Change gen_set_pc_imm to gen_update_pc
>   target/riscv: Use true diff for gen_pc_plus_diff
>   target/riscv: Enable PC-relative translation
>   target/riscv: Remove pc_succ_insn from DisasContext

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  target/riscv/cpu.c| 31 --
>  .../riscv/insn_trans/trans_privileged.c.inc   |  2 +-
>  target/riscv/insn_trans/trans_rvi.c.inc   | 43 ++---
>  target/riscv/insn_trans/trans_rvv.c.inc   |  4 +-
>  target/riscv/insn_trans/trans_rvzawrs.c.inc   |  2 +-
>  target/riscv/insn_trans/trans_rvzce.c.inc | 10 +-
>  target/riscv/insn_trans/trans_xthead.c.inc|  2 +-
>  target/riscv/translate.c  | 94 ---
>  8 files changed, 123 insertions(+), 65 deletions(-)
>
> --
> 2.25.1
>
>



[PATCH v7 1/3] hw/riscv: virt: Assume M-mode FW in pflash0 only when "-bios none"

2023-05-31 Thread Sunil V L
Currently, virt machine supports two pflash instances each with
32MB size. However, the first pflash is always assumed to
contain M-mode firmware and reset vector is set to this if
enabled. Hence, for S-mode payloads like EDK2, only one pflash
instance is available for use. This means both code and NV variables
of EDK2 will need to use the same pflash.

The OS distros keep the EDK2 FW code as readonly. When non-volatile
variables also need to share the same pflash, it is not possible
to keep it as readonly since variables need write access.

To resolve this issue, the code and NV variables need to be separated.
But in that case we need an extra flash. Hence, modify the convention
for non-KVM guests such that, pflash0 will contain the M-mode FW
only when "-bios none" option is used. Otherwise, pflash0 will contain
the S-mode payload FW. This enables both pflash instances available
for EDK2 use.

When KVM is enabled, pflash0 is always assumed to contain the
S-mode payload firmware only.

Example usage:
1) pflash0 containing M-mode FW
qemu-system-riscv64 -bios none -pflash  -machine virt
or
qemu-system-riscv64 -bios none \
-drive file=,if=pflash,format=raw,unit=0 -machine virt

2) pflash0 containing S-mode payload like EDK2
qemu-system-riscv64 -pflash  -pflash  -machine  virt
or
qemu-system-riscv64 -bios  \
-pflash  \
-pflash  \
-machine  virt
or
qemu-system-riscv64 -bios  \
-drive file=,if=pflash,format=raw,unit=0,readonly=on \
-drive file=,if=pflash,format=raw,unit=1 \
-machine virt

Signed-off-by: Sunil V L 
Reported-by: Heinrich Schuchardt 
Tested-by: Andrea Bolognani 
Reviewed-by: Alistair Francis 
---
 hw/riscv/virt.c | 53 -
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 4e3efbee16..3e5dc649c3 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1245,7 +1245,7 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
 target_ulong firmware_end_addr, kernel_start_addr;
 const char *firmware_name = riscv_default_firmware_name(&s->soc[0]);
 uint32_t fdt_load_addr;
-uint64_t kernel_entry;
+uint64_t kernel_entry = 0;
 
 /*
  * Only direct boot kernel is currently supported for KVM VM,
@@ -1266,42 +1266,31 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
 firmware_end_addr = riscv_find_and_load_firmware(machine, firmware_name,
  start_addr, NULL);
 
-if (drive_get(IF_PFLASH, 0, 1)) {
-/*
- * S-mode FW like EDK2 will be kept in second plash (unit 1).
- * When both kernel, initrd and pflash options are provided in the
- * command line, the kernel and initrd will be copied to the fw_cfg
- * table and opensbi will jump to the flash address which is the
- * entry point of S-mode FW. It is the job of the S-mode FW to load
- * the kernel and initrd using fw_cfg table.
- *
- * If only pflash is given but not -kernel, then it is the job of
- * of the S-mode firmware to locate and load the kernel.
- * In either case, the next_addr for opensbi will be the flash address.
- */
-riscv_setup_firmware_boot(machine);
-kernel_entry = virt_memmap[VIRT_FLASH].base +
-   virt_memmap[VIRT_FLASH].size / 2;
-} else if (machine->kernel_filename) {
+if (drive_get(IF_PFLASH, 0, 0)) {
+if (machine->firmware && !strcmp(machine->firmware, "none") &&
+!kvm_enabled()) {
+/*
+ * Pflash was supplied but bios is none and not KVM guest,
+ * let's overwrite the address we jump to after reset to
+ * the base of the flash.
+ */
+start_addr = virt_memmap[VIRT_FLASH].base;
+} else {
+/*
+ * Pflash was supplied but either KVM guest or bios is not none.
+ * In this case, base of the flash would contain S-mode payload.
+ */
+riscv_setup_firmware_boot(machine);
+kernel_entry = virt_memmap[VIRT_FLASH].base;
+}
+}
+
+if (machine->kernel_filename && !kernel_entry) {
 kernel_start_addr = riscv_calc_kernel_start_addr(&s->soc[0],
  firmware_end_addr);
 
 kernel_entry = riscv_load_kernel(machine, &s->soc[0],
  kernel_start_addr, true, NULL);
-} else {
-   /*
-* If dynamic firmware is used, it doesn't know where is the next mode
-* if kernel argument is not set.
-*/
-kernel_entry = 0;
-}
-
-if (drive_get(IF_PFLASH, 0, 0)) {
-/*
- * Pflash was supplied, let's overwrite the address we jump to after
- * reset to the base of the flash.
- */
-start_addr = virt_memmap[VIRT_FLASH].base;
 }
 
 fdt_load_addr = riscv_compute_fdt

[PATCH v7 3/3] docs/system: riscv: Add pflash usage details

2023-05-31 Thread Sunil V L
pflash devices can be used in virt machine for different
purposes like for ROM code or S-mode FW payload. Add a
section in the documentation on how to use pflash devices
for different purposes.

Signed-off-by: Sunil V L 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/system/riscv/virt.rst | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
index 4b16e41d7f..b33f45e5b3 100644
--- a/docs/system/riscv/virt.rst
+++ b/docs/system/riscv/virt.rst
@@ -53,6 +53,37 @@ with the default OpenSBI firmware image as the -bios. It 
also supports
 the recommended RISC-V bootflow: U-Boot SPL (M-mode) loads OpenSBI fw_dynamic
 firmware and U-Boot proper (S-mode), using the standard -bios functionality.
 
+Using flash devices
+---
+
+By default, the first flash device (pflash0) is expected to contain
+S-mode firmware code. It can be configured as read-only, with the
+second flash device (pflash1) available to store configuration data.
+
+For example, booting edk2 looks like
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 \
+ -blockdev node-name=pflash0,driver=file,read-only=on,filename= 
\
+ -blockdev node-name=pflash1,driver=file,filename= \
+ -M virt,pflash0=pflash0,pflash1=pflash1 \
+ ... other args 
+
+For TCG guests only, it is also possible to boot M-mode firmware from
+the first flash device (pflash0) by additionally passing ``-bios
+none``, as in
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 \
+ -bios none \
+ -blockdev 
node-name=pflash0,driver=file,read-only=on,filename= \
+ -M virt,pflash0=pflash0 \
+ ... other args 
+
+Firmware images used for pflash must be exactly 32 MiB in size.
+
 Machine-specific options
 
 
-- 
2.34.1




[PATCH v7 0/3] hw/riscv/virt: pflash improvements

2023-05-31 Thread Sunil V L
This series improves the pflash usage in RISC-V virt machine with solutions to
below issues.

1) Currently the first pflash is reserved for ROM/M-mode firmware code. But 
S-mode
payload firmware like EDK2 need both pflash devices to have separate code and 
variable
store so that OS distros can keep the FW code as read-only. 

The issue is reported at
https://salsa.debian.org/qemu-team/edk2/-/commit/c345655a0149f64c5020bfc1e53c619ce60587f6

2) The latest way of using pflash devices in other architectures and libvirt
is by using -blockdev and machine options. However, currently this method is
not working in RISC-V.

With above issues fixed, added documentation on how to use pflash devices
in RISC-V virt machine.

This patch series is based on Alistair's riscv-to-apply.next branch.

Changes since v6:
1) Updated the documentation patch as per text provided by Andrea.

Changes since v5:
1) Added KVM use case as per feedback from Anup. Updated the 
documentation
   patch that only S-mode payload is supported for KVM guests. Tested 
with
   KVM enabled.
2) Updated tags.

Changes since v4:
1) Updated patch 2 to avoid accessing private field as per feedback 
from Philippe.
2) Updated documentation patch to add read-only for ROM usage.
3) Rebased to latest riscv-to-apply.next branch and updated tags.

Changes since v3:
1) Converted single patch to a series with a cover letter since there 
are
   multiple patches now.
2) Added a new patch to enable pflash usage via -blockdev option.
3) Separated the documentation change into new patch and updated the
   documentation to mention only -blockdev option which seems to be the
   recommended way of using pflash.

Changes since v2:
1) Reverted v2 changes and used v1 approach so that pflash0 can be used
   for code and pflash1 for variable store.
2) Rebased to latest riscv-to-apply.next branch.
3) Added documentation for pflash usage.

Changes since v1:
1) Simplified the fix such that it doesn't break current EDK2.

Sunil V L (3):
  hw/riscv: virt: Assume M-mode FW in pflash0 only when "-bios none"
  riscv/virt: Support using pflash via -blockdev option
  docs/system: riscv: Add pflash usage details

 docs/system/riscv/virt.rst | 31 
 hw/riscv/virt.c| 59 --
 2 files changed, 56 insertions(+), 34 deletions(-)

-- 
2.34.1




[PATCH v7 2/3] riscv/virt: Support using pflash via -blockdev option

2023-05-31 Thread Sunil V L
Currently, pflash devices can be configured only via -pflash
or -drive options. This is the legacy way and the
better way is to use -blockdev as in other architectures.
libvirt also has moved to use -blockdev method.

To support -blockdev option, pflash devices need to be
created in instance_init itself. So, update the code to
move the virt_flash_create() to instance_init. Also, use
standard interfaces to detect whether pflash0 is
configured or not.

Signed-off-by: Sunil V L 
Reported-by: Andrea Bolognani 
Tested-by: Andrea Bolognani 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/riscv/virt.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 3e5dc649c3..76c7a3ba3b 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1246,6 +1246,7 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
 const char *firmware_name = riscv_default_firmware_name(&s->soc[0]);
 uint32_t fdt_load_addr;
 uint64_t kernel_entry = 0;
+BlockBackend *pflash_blk0;
 
 /*
  * Only direct boot kernel is currently supported for KVM VM,
@@ -1266,7 +1267,8 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
 firmware_end_addr = riscv_find_and_load_firmware(machine, firmware_name,
  start_addr, NULL);
 
-if (drive_get(IF_PFLASH, 0, 0)) {
+pflash_blk0 = pflash_cfi01_get_blk(s->flash[0]);
+if (pflash_blk0) {
 if (machine->firmware && !strcmp(machine->firmware, "none") &&
 !kvm_enabled()) {
 /*
@@ -1499,8 +1501,6 @@ static void virt_machine_init(MachineState *machine)
 sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
 qdev_get_gpio_in(DEVICE(mmio_irqchip), RTC_IRQ));
 
-virt_flash_create(s);
-
 for (i = 0; i < ARRAY_SIZE(s->flash); i++) {
 /* Map legacy -drive if=pflash to machine properties */
 pflash_cfi01_legacy_drive(s->flash[i],
@@ -1527,6 +1527,8 @@ static void virt_machine_instance_init(Object *obj)
 {
 RISCVVirtState *s = RISCV_VIRT_MACHINE(obj);
 
+virt_flash_create(s);
+
 s->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
 s->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
 s->acpi = ON_OFF_AUTO_AUTO;
-- 
2.34.1




Re: Performance improvement and regression with 6d740fb01b9f0f5ea7a82f4d5e458d91940a19ee

2023-05-31 Thread Lukáš Doktor
Dne 31. 05. 23 v 19:14 Stefan Hajnoczi napsal(a):
> On Wed, 31 May 2023 at 12:50, Lukáš Doktor  wrote:
>>
>> Dne 26. 05. 23 v 12:56 Stefan Hajnoczi napsal(a):
>>> On Fri, 26 May 2023 at 04:07, Lukáš Doktor  wrote:

 Dne 25. 05. 23 v 17:21 Stefan Hajnoczi napsal(a):
> On Thu, 25 May 2023 at 06:18, Lukáš Doktor  wrote:
>> the perf-ci detected and bisected the 6d740fb - aio-posix: do not nest 
>> poll handlers - as a performance improvement when using multiple 
>> concurrent jobs and 4k (22%) as well as 1024k (63%) blocks on aarch64 
>> (on a slow rotational disk).
>>
>> 
>> https://ldoktor.github.io/tmp/RedHat-virtlab-arm09/v8.0.0/150-improvement.html
>>
>> Based on the commit message I guess it's expected so take this just as a 
>> record of an improvement.
>
> The commit was not intended to change performance and I'm not sure why
> it happens!
>

 It had and today the x86_64 pipeline finished which shows similar 
 improvement just not in read but rather in write instead and only for 4k 
 blocks (~40%). For 1024k blocks I can see it scoring a bit better (~1.5%). 
 Reads are too jittery to really tell anything on that machine. Anyway I 
 have not done any thorough testing, just a bisection with the most 
 significant setting.

 From around the same time I can see a NVMe regression in 4k writes, but 
 first bisection job showed nothing. I'll increase the range and try again 
 as each job since that day shows similar drop.
>>>
>>
>> Hello Stefan, folks,
>>
>> the regression proved to be there and stably reproducible. With NVMe 4k 
>> writes with jobs=10 and iodepth=4 I can see a 50% regression on my machine:
>>
>> 
>> https://ldoktor.github.io/tmp/RedHat-virtlab722/v8.0.0/150-regression.html
>>
>> The rest of the cases doesn't show any change at all. I can provide more 
>> data if someone is interested.
> 
> Which commit caused the regression?

Hello Stefan,

the same one that caused the improvement on rotational disks: 
6d740fb01b9f0f5ea7a82f4d5e458d91940a19ee

Lukáš

> 
> Stefan
> 

OpenPGP_0x26B362E47FCF22C1.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v14 02/10] accel/tcg: introduce TBStatistics structure

2023-05-31 Thread Richard Henderson

On 5/31/23 20:19, Wu, Fei wrote:

On 6/1/2023 8:01 AM, Richard Henderson wrote:

On 5/30/23 01:35, Fei Wu wrote:

+/* TBStatistic collection controls */
+enum TBStatsStatus {
+    TB_STATS_DISABLED = 0,
+    TB_STATS_RUNNING,
+    TB_STATS_PAUSED,
+    TB_STATS_STOPPED
+};


I don't see what PAUSED or STOPPED actually do.
As far as I can see, stats are either being collected or not: a boolean.


If STOPPED, clean_tbstats() gets called, all the tbstats history is
destroyed, but it's not for PAUSED.


But it doesn't PAUSE everything either, since (1) code is built into each tb, and (2) each 
tb->tb_stats->stats_enabled neither changed.  Indeed, tb_stats_collection_enabled() is 
only checked in a fraction of the places stats are collected.



r~




Re: [PATCH] decodetree: Add --output-null for meson testing

2023-05-31 Thread Richard Henderson

On 5/31/23 18:51, Thomas Huth wrote:

On 01/06/2023 01.25, Richard Henderson wrote:

Using "-o /dev/null" fails on Windows.  Rather that working
around this in meson, add a separate command-line option so
that we can use python's os.devnull.

Reported-by: Thomas Huth 
Fixes: 65dc7d1b ("tests/decode: Convert tests to meson")
Signed-off-by: Richard Henderson 
---
  scripts/decodetree.py    | 18 ++
  tests/decode/meson.build |  4 ++--
  2 files changed, 12 insertions(+), 10 deletions(-)


Reviewed-by: Thomas Huth 




Thanks.  Applied, with typo in the first line above fixed, to master in preparation for 
tomorrow's refresh of ci minutes.


r~



Re: [PATCH v2 0/2] Implement AES on ARM using x86 instructions and vv

2023-05-31 Thread Richard Henderson

On 5/31/23 10:08, Richard Henderson wrote:

On 5/31/23 09:47, Ard Biesheuvel wrote:

On Wed, 31 May 2023 at 18:33, Richard Henderson

Thanks.  I spent some time yesterday looking at this, with an encrypted disk 
test case and
could only measure 0.6% and 0.5% for total overhead of decrypt and encrypt 
respectively.



I don't understand what 'overhead' means in this context. Are you
saying you saw barely any improvement?


I saw, without changes, just over 1% of total system emulation time was devoted to aes, 
which gives an upper limit to the runtime improvement possible there.  But I'll have a 
look at tcrypt.


Using

# insmod /lib/modules/5.10.0-21-arm64/kernel/crypto/tcrypt.ko mode=600 sec=10

I see

  25.50%  qemu-system-aar  qemu-system-aarch64  [.] helper_crypto_aese
  25.36%  qemu-system-aar  qemu-system-aarch64  [.] helper_crypto_aesmc
   6.66%  qemu-system-aar  qemu-system-aarch64  [.] rebuild_hflags_a64
   3.25%  qemu-system-aar  qemu-system-aarch64  [.] tb_lookup
   2.52%  qemu-system-aar  qemu-system-aarch64  [.] fp_exception_el
   2.35%  qemu-system-aar  qemu-system-aarch64  [.] helper_lookup_tb_ptr

Obviously a crypto-heavy test, but 51% of runtime is certainly worth more work.


r~



Re: [PATCH v2 1/1] target/i386: add support for LAM in CPUID enumeration

2023-05-31 Thread Binbin Wu



On 5/31/2023 11:45 AM, Xiaoyao Li wrote:

On 5/31/2023 9:32 AM, Binbin Wu wrote:

From: Robert Hoo 

Linear Address Masking (LAM) is a new Intel CPU feature, which allows 
software

to use of the untranslated address bits for metadata.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[26]

Add CPUID definition for LAM.

More info can be found in Intel ISE Chapter "LINEAR ADDRESS MASKING 
(LAM)"

https://cdrdv2.intel.com/v1/dl/getContent/671368


LAM defines new bits in CR3 and CR4. I think it needs corresponding 
support in QEMU as well.


In QEMU, there are several callers call cpu_x86_update_{cr3,cr4}().

* target/i386/tcg
  If there is no objection, LAM feature will not be supported for TCG 
of target-i386.

  LAM CPUID bit will not be added to TCG_7_1_EAX_FEATURES.
  helper_write_crN() and helper_vmrun() check CR4 reserved bit before 
calling cpu_x86_update_cr4(),

  i.e. CR4 LAM bit is not allowed to be set in TCG.
  helper_write_crN() and helper_vmrun() check max physcial address bits 
before calling cpu_x86_update_cr3(),

  no change needed, i.e. CR3 LAM bits are not allowed to be set in TCG.
  About CR4 reserved bits, although QEMU code only uses 
cr4_reserved_bits() in target/i386/tcg,

  still want to do the following changes:
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index eb800ba2e2..3946fe5393 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -261,6 +261,7 @@ typedef enum X86Seg {
 #define CR4_SMAP_MASK   (1U << 21)
 #define CR4_PKE_MASK   (1U << 22)
 #define CR4_PKS_MASK   (1U << 24)
+#define CR4_LAM_SUP_MASK (1U << 28)

 #define CR4_RESERVED_MASK \
 (~(target_ulong)(CR4_VME_MASK | CR4_PVI_MASK | CR4_TSD_MASK \
@@ -269,7 +270,8 @@ typedef enum X86Seg {
 | CR4_OSFXSR_MASK | CR4_OSXMMEXCPT_MASK | CR4_UMIP_MASK \
 | CR4_LA57_MASK \
 | CR4_FSGSBASE_MASK | CR4_PCIDE_MASK | CR4_OSXSAVE_MASK \
-    | CR4_SMEP_MASK | CR4_SMAP_MASK | CR4_PKE_MASK | 
CR4_PKS_MASK))
+    | CR4_SMEP_MASK | CR4_SMAP_MASK | CR4_PKE_MASK | 
CR4_PKS_MASK \

+    | CR4_LAM_SUP_MASK))

 #define DR6_BD  (1 << 13)
 #define DR6_BS  (1 << 14)
@@ -2469,6 +2471,9 @@ static inline uint64_t 
cr4_reserved_bits(CPUX86State *env)

 if (!(env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS)) {
 reserved_bits |= CR4_PKS_MASK;
 }
+    if (!(env->features[FEAT_7_1_EAX] & CPUID_7_1_EAX_LAM)) {
+    reserved_bits |= CR4_LAM_SUP_MASK;
+    }
 return reserved_bits;
 }


* target/i386/gdbstub
  x86_cpu_gdb_write_register() will call cpu_x86_update_cr3() to update 
cr3/cr4.
  Allow gdb to set the LAM bit(s) to CR3/CR4, if vcpu doesn't support 
LAM, set sregs will fail in KVM.



* target/i386/cpu
  x86_cpu_reset_hold() will call cpu_x86_update_cr4() to reset cr4, it 
should be OK.






Signed-off-by: Robert Hoo 
Co-developed-by: Binbin Wu 
Signed-off-by: Binbin Wu 
---
  target/i386/cpu.c | 2 +-
  target/i386/cpu.h | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1242bd541a..f4436b3657 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -881,7 +881,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
  "fsrc", NULL, NULL, NULL,
  NULL, NULL, NULL, NULL,
  NULL, "amx-fp16", NULL, "avx-ifma",
-    NULL, NULL, NULL, NULL,
+    NULL, NULL, "lam", NULL,
  NULL, NULL, NULL, NULL,
  },
  .cpuid = {
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7201a71de8..eb800ba2e2 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -924,6 +924,8 @@ uint64_t 
x86_cpu_get_supported_feature_word(FeatureWord w,

  #define CPUID_7_1_EAX_AMX_FP16  (1U << 21)
  /* Support for VPMADD52[H,L]UQ */
  #define CPUID_7_1_EAX_AVX_IFMA  (1U << 23)
+/* Linear Address Masking */
+#define CPUID_7_1_EAX_LAM   (1U << 26)
    /* Support for VPDPB[SU,UU,SS]D[,S] */
  #define CPUID_7_1_EDX_AVX_VNNI_INT8 (1U << 4)







[PATCH v2 1/2] net: Provide MemReentrancyGuard * to qemu_new_nic()

2023-05-31 Thread Akihiko Odaki
Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to update it
when delivering a packet to a device.

In preparation for such a change, add MemReentrancyGuard * as a
parameter of qemu_new_nic().

Signed-off-by: Akihiko Odaki 
---
 include/net/net.h | 1 +
 hw/net/allwinner-sun8i-emac.c | 3 ++-
 hw/net/allwinner_emac.c   | 3 ++-
 hw/net/cadence_gem.c  | 3 ++-
 hw/net/dp8393x.c  | 3 ++-
 hw/net/e1000.c| 3 ++-
 hw/net/e1000e.c   | 2 +-
 hw/net/eepro100.c | 4 +++-
 hw/net/etraxfs_eth.c  | 3 ++-
 hw/net/fsl_etsec/etsec.c  | 3 ++-
 hw/net/ftgmac100.c| 3 ++-
 hw/net/i82596.c   | 2 +-
 hw/net/igb.c  | 2 +-
 hw/net/imx_fec.c  | 2 +-
 hw/net/lan9118.c  | 3 ++-
 hw/net/mcf_fec.c  | 3 ++-
 hw/net/mipsnet.c  | 3 ++-
 hw/net/msf2-emac.c| 3 ++-
 hw/net/mv88w8618_eth.c| 3 ++-
 hw/net/ne2000-isa.c   | 3 ++-
 hw/net/ne2000-pci.c   | 3 ++-
 hw/net/npcm7xx_emc.c  | 3 ++-
 hw/net/opencores_eth.c| 3 ++-
 hw/net/pcnet.c| 3 ++-
 hw/net/rocker/rocker_fp.c | 4 ++--
 hw/net/rtl8139.c  | 3 ++-
 hw/net/smc91c111.c| 3 ++-
 hw/net/spapr_llan.c   | 3 ++-
 hw/net/stellaris_enet.c   | 3 ++-
 hw/net/sungem.c   | 2 +-
 hw/net/sunhme.c   | 3 ++-
 hw/net/tulip.c| 3 ++-
 hw/net/virtio-net.c   | 6 --
 hw/net/vmxnet3.c  | 2 +-
 hw/net/xen_nic.c  | 4 ++--
 hw/net/xgmac.c| 3 ++-
 hw/net/xilinx_axienet.c   | 3 ++-
 hw/net/xilinx_ethlite.c   | 3 ++-
 hw/usb/dev-network.c  | 3 ++-
 net/net.c | 1 +
 40 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 1448d00afb..a7d8deaccb 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -157,6 +157,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
NICConf *conf,
const char *model,
const char *name,
+   MemReentrancyGuard *reentrancy_guard,
void *opaque);
 void qemu_del_nic(NICState *nic);
 NetClientState *qemu_get_subqueue(NICState *nic, int queue_index);
diff --git a/hw/net/allwinner-sun8i-emac.c b/hw/net/allwinner-sun8i-emac.c
index fac4405f45..cc350d40e5 100644
--- a/hw/net/allwinner-sun8i-emac.c
+++ b/hw/net/allwinner-sun8i-emac.c
@@ -824,7 +824,8 @@ static void allwinner_sun8i_emac_realize(DeviceState *dev, 
Error **errp)
 
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
 s->nic = qemu_new_nic(&net_allwinner_sun8i_emac_info, &s->conf,
-   object_get_typename(OBJECT(dev)), dev->id, s);
+  object_get_typename(OBJECT(dev)), dev->id,
+  &dev->mem_reentrancy_guard, s);
 qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/allwinner_emac.c b/hw/net/allwinner_emac.c
index 372e5b66da..e10965de14 100644
--- a/hw/net/allwinner_emac.c
+++ b/hw/net/allwinner_emac.c
@@ -453,7 +453,8 @@ static void aw_emac_realize(DeviceState *dev, Error **errp)
 
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
 s->nic = qemu_new_nic(&net_aw_emac_info, &s->conf,
-  object_get_typename(OBJECT(dev)), dev->id, s);
+  object_get_typename(OBJECT(dev)), dev->id,
+  &dev->mem_reentrancy_guard, s);
 qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
 fifo8_create(&s->rx_fifo, RX_FIFO_SIZE);
diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
index 42ea2411a2..a7bce1c120 100644
--- a/hw/net/cadence_gem.c
+++ b/hw/net/cadence_gem.c
@@ -1633,7 +1633,8 @@ static void gem_realize(DeviceState *dev, Error **errp)
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
 s->nic = qemu_new_nic(&net_gem_info, &s->conf,
-  object_get_typename(OBJECT(dev)), dev->id, s);
+  object_get_typename(OBJECT(dev)), dev->id,
+  &dev->mem_reentrancy_guard, s);
 
 if (s->jumbo_max_len > MAX_FRAME_SIZE) {
 error_setg(errp, "jumbo-max-len is greater than %d",
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index 45b954e46c..abfcc6f69f 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -943,7 +943,8 @@ static void dp8393x_realize(DeviceState *dev, Error **errp)
   "dp8393x-regs", SONIC_REG_COUNT << s->it_shift);
 
 s->nic = qemu_new_nic(&net_dp83932_info, &s->conf,
-  object_get_typename(OBJECT(dev)), dev->id, s);
+  object_get_typename(OBJECT(dev)), dev->id,
+  &dev->mem_reentrancy_guard,

[PATCH v2 2/2] net: Update MemReentrancyGuard for NIC

2023-05-31 Thread Akihiko Odaki
Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to update it
when delivering a packet to a device.

This implementation follows what bottom half does, but it does not add
a tracepoint for the case that the network device backend started
delivering a packet to a device which is already engaging in I/O. This
is because such reentrancy frequently happens for
qemu_flush_queued_packets() and is insignificant.

Fixes: CVE-2023-3019
Reported-by: Alexander Bulekov 
Signed-off-by: Akihiko Odaki 
---
 include/net/net.h |  1 +
 net/net.c | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index a7d8deaccb..685ec58318 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -124,6 +124,7 @@ typedef QTAILQ_HEAD(NetClientStateList, NetClientState) 
NetClientStateList;
 typedef struct NICState {
 NetClientState *ncs;
 NICConf *conf;
+MemReentrancyGuard *reentrancy_guard;
 void *opaque;
 bool peer_deleted;
 } NICState;
diff --git a/net/net.c b/net/net.c
index 982df2479f..3523cceafc 100644
--- a/net/net.c
+++ b/net/net.c
@@ -332,6 +332,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
 nic = g_malloc0(info->size + sizeof(NetClientState) * queues);
 nic->ncs = (void *)nic + info->size;
 nic->conf = conf;
+nic->reentrancy_guard = reentrancy_guard,
 nic->opaque = opaque;
 
 for (i = 0; i < queues; i++) {
@@ -805,6 +806,7 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
int iovcnt,
void *opaque)
 {
+MemReentrancyGuard *owned_reentrancy_guard;
 NetClientState *nc = opaque;
 int ret;
 
@@ -817,12 +819,24 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
 return 0;
 }
 
+if (nc->info->type != NET_CLIENT_DRIVER_NIC ||
+qemu_get_nic(nc)->reentrancy_guard->engaged_in_io) {
+owned_reentrancy_guard = NULL;
+} else {
+owned_reentrancy_guard = qemu_get_nic(nc)->reentrancy_guard;
+owned_reentrancy_guard->engaged_in_io = true;
+}
+
 if (nc->info->receive_iov && !(flags & QEMU_NET_PACKET_FLAG_RAW)) {
 ret = nc->info->receive_iov(nc, iov, iovcnt);
 } else {
 ret = nc_sendv_compat(nc, iov, iovcnt, flags);
 }
 
+if (owned_reentrancy_guard) {
+owned_reentrancy_guard->engaged_in_io = false;
+}
+
 if (ret == 0) {
 nc->receive_disabled = 1;
 }
-- 
2.40.1




Re: [PATCH v14 02/10] accel/tcg: introduce TBStatistics structure

2023-05-31 Thread Wu, Fei
On 6/1/2023 8:01 AM, Richard Henderson wrote:
> On 5/30/23 01:35, Fei Wu wrote:
>> +/* TBStatistic collection controls */
>> +enum TBStatsStatus {
>> +    TB_STATS_DISABLED = 0,
>> +    TB_STATS_RUNNING,
>> +    TB_STATS_PAUSED,
>> +    TB_STATS_STOPPED
>> +};
> 
> I don't see what PAUSED or STOPPED actually do.
> As far as I can see, stats are either being collected or not: a boolean.
> 
If STOPPED, clean_tbstats() gets called, all the tbstats history is
destroyed, but it's not for PAUSED.

Thanks,
Fei.

> 
> r~




[PATCH v2 0/2] net: Update MemReentrancyGuard for NIC

2023-05-31 Thread Akihiko Odaki
Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to update it
when delivering a packet to a device.

This implementation follows what bottom half does, but it does not add
a tracepoint for the case that the network device backend started
delivering a packet to a device which is already engaging in I/O. This
is because such reentrancy frequently happens for
qemu_flush_queued_packets() and is insignificant.

This series consists of two patches. The first patch makes a bulk change to
add a new parameter to qemu_new_nic() and does not contain behavioral changes.
The second patch actually implements MemReentrancyGuard update.

V1 -> V2: Added the 'Fixes: CVE-2023-3019' tag

Akihiko Odaki (2):
  net: Provide MemReentrancyGuard * to qemu_new_nic()
  net: Update MemReentrancyGuard for NIC

 include/net/net.h |  2 ++
 hw/net/allwinner-sun8i-emac.c |  3 ++-
 hw/net/allwinner_emac.c   |  3 ++-
 hw/net/cadence_gem.c  |  3 ++-
 hw/net/dp8393x.c  |  3 ++-
 hw/net/e1000.c|  3 ++-
 hw/net/e1000e.c   |  2 +-
 hw/net/eepro100.c |  4 +++-
 hw/net/etraxfs_eth.c  |  3 ++-
 hw/net/fsl_etsec/etsec.c  |  3 ++-
 hw/net/ftgmac100.c|  3 ++-
 hw/net/i82596.c   |  2 +-
 hw/net/igb.c  |  2 +-
 hw/net/imx_fec.c  |  2 +-
 hw/net/lan9118.c  |  3 ++-
 hw/net/mcf_fec.c  |  3 ++-
 hw/net/mipsnet.c  |  3 ++-
 hw/net/msf2-emac.c|  3 ++-
 hw/net/mv88w8618_eth.c|  3 ++-
 hw/net/ne2000-isa.c   |  3 ++-
 hw/net/ne2000-pci.c   |  3 ++-
 hw/net/npcm7xx_emc.c  |  3 ++-
 hw/net/opencores_eth.c|  3 ++-
 hw/net/pcnet.c|  3 ++-
 hw/net/rocker/rocker_fp.c |  4 ++--
 hw/net/rtl8139.c  |  3 ++-
 hw/net/smc91c111.c|  3 ++-
 hw/net/spapr_llan.c   |  3 ++-
 hw/net/stellaris_enet.c   |  3 ++-
 hw/net/sungem.c   |  2 +-
 hw/net/sunhme.c   |  3 ++-
 hw/net/tulip.c|  3 ++-
 hw/net/virtio-net.c   |  6 --
 hw/net/vmxnet3.c  |  2 +-
 hw/net/xen_nic.c  |  4 ++--
 hw/net/xgmac.c|  3 ++-
 hw/net/xilinx_axienet.c   |  3 ++-
 hw/net/xilinx_ethlite.c   |  3 ++-
 hw/usb/dev-network.c  |  3 ++-
 net/net.c | 15 +++
 40 files changed, 90 insertions(+), 41 deletions(-)

-- 
2.40.1




Re: [PATCH 7/7] hw: Simplify using sysbus_init_irqs() [manual]

2023-05-31 Thread Richard Henderson

On 5/31/23 15:33, Philippe Mathieu-Daudé wrote:

Audit the sysbus_init_irq() calls and manually convert
to sysbus_init_irqs() when a loop is involved.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/intc/loongarch_extioi.c | 3 +--
  hw/intc/omap_intc.c| 3 +--
  hw/pci-host/gpex.c | 2 +-
  hw/timer/renesas_tmr.c | 9 +++--
  4 files changed, 6 insertions(+), 11 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 6/7] hw: Simplify using sysbus_init_irqs() [automatic]

2023-05-31 Thread Richard Henderson

On 5/31/23 15:33, Philippe Mathieu-Daudé wrote:

Change created mechanically using the following coccinelle
semantic patch:

 @@
 expression array;
 identifier i;
 expression sbd, count;
 @@

 -for (i = 0; i < count; i++) {
 -sysbus_init_irq(sbd, &array[i]);
 -}
 +sysbus_init_irqs(sbd, array, count);

Signed-off-by: Philippe Mathieu-Daudé
---


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 5/7] hw/sysbus: Make SYSBUS_DEVICE_GPIO_IRQ API internal

2023-05-31 Thread Richard Henderson

On 5/31/23 15:33, Philippe Mathieu-Daudé wrote:

Since we don't have any use of the SYSBUS_DEVICE_GPIO_IRQ definition
outside of sysbus.c, we can reduce its scope, making it internal to
the API.

Signed-off-by: Philippe Mathieu-Daudé
---
  include/hw/sysbus.h | 2 --
  hw/core/sysbus.c| 2 ++
  2 files changed, 2 insertions(+), 2 deletions(-)


Yay!

Reviewed-by: Richard Henderson 


r~



Re: [PATCH 4/7] hw/usb/hcd-xhci: Use sysbus_init_irqs()

2023-05-31 Thread Richard Henderson

On 5/31/23 15:33, Philippe Mathieu-Daudé wrote:

The SYSBUS_DEVICE_GPIO_IRQ definition should be internal to
the SysBus API. Use the recently introduced sysbus_init_irqs()
method to avoid using this internal definition.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/usb/hcd-xhci-sysbus.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 3/7] hw/sysbus: Introduce sysbus_init_irqs()

2023-05-31 Thread Richard Henderson

On 5/31/23 15:33, Philippe Mathieu-Daudé wrote:

+void sysbus_init_irqs(SysBusDevice *dev, qemu_irq *p, unsigned count);


unsigned count does not match qdev_init_gpio_out_named int n.

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH 2/7] hw/usb/xlnx: Do not open-code sysbus_pass_irq()

2023-05-31 Thread Richard Henderson

On 5/31/23 15:33, Philippe Mathieu-Daudé wrote:

The SYSBUS_DEVICE_GPIO_IRQ definition should be internal to
the SysBus API. Here we simply open-coded sysbus_pass_irq().
Replace to use the proper API.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/usb/xlnx-usb-subsystem.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 1/7] hw/arm/xlnx-versal: Do not open-code sysbus_connect_irq()

2023-05-31 Thread Richard Henderson

On 5/31/23 15:33, Philippe Mathieu-Daudé wrote:

The SYSBUS_DEVICE_GPIO_IRQ definition should be internal to
the SysBus API. Here we simply open-coded sysbus_connect_irq().
Replace to use the proper API.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/arm/xlnx-versal.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH] hw/smbios: fix thead count field in type 4 table

2023-05-31 Thread bibo, mao




在 2023/6/1 10:26, Zhao Liu 写道:

On Thu, Jun 01, 2023 at 09:09:13AM +0800, bibo, mao wrote:

Date: Thu, 1 Jun 2023 09:09:13 +0800
From: "bibo, mao" 
Subject: Re: [PATCH] hw/smbios: fix thead count field in type 4 table



在 2023/5/31 16:42, Zhao Liu 写道:

On Tue, May 30, 2023 at 08:20:34PM +0800, Tianrui Zhao wrote:

Date: Tue, 30 May 2023 20:20:34 +0800
From: Tianrui Zhao 
Subject: [PATCH] hw/smbios: fix thead count field in type 4 table
X-Mailer: git-send-email 2.39.1

The thread_count value in smbios type_4 table should be
(ms->smp.cores * ms->smp.threads). As according to smbios spec 7.5
Processor Information (Type 4), the field "Thread Count" means the
"Number of threads per processor socket" rather than number of
threads per core.

When apply this patch, use "-smp 4,sockets=1,cores=2,threads=2" to
boot VM, the dmidecode -t 4 shows like:

Handle 0x0400, DMI type 4, 48 bytes
Processor Information
  Socket Designation: CPU 0
  ...
  Core Count: 2
  Core Enabled: 2
  Thread Count: 4
  Characteristics: None

Signed-off-by: Tianrui Zhao 
---
   hw/smbios/smbios.c | 7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index d2007e70fb..56aeaa069d 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -713,6 +713,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance)
   {
   char sock_str[128];
   size_t tbl_len = SMBIOS_TYPE_4_LEN_V28;
+int count;
   if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_64) {
   tbl_len = SMBIOS_TYPE_4_LEN_V30;
@@ -749,15 +750,15 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance)
   t->core_count = (ms->smp.cores > 255) ? 0xFF : ms->smp.cores;
   t->core_enabled = t->core_count;
-
-t->thread_count = (ms->smp.threads > 255) ? 0xFF : ms->smp.threads;
+count = ms->smp.cores * ms->smp.threads;


Hi Ani & Tianrui,

  From the comment of CpuTopology (include/hw/boards.h):

ms->cores means the "the number of cores in one cluster".
ms->threads means the "the number of threads in one core".

So ms->cores * ms->threads means the number of threads in one cluster
not one socket.

That's why I count the number of threads in a socket by "ms->smp.max_cpus
/ ms->smp.sockets" in [1].

The other correct way is:
ms->smp.cluster * ms->smp.cores * ms->smp.threads.

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg07229.html

ohh, we do not notice your patch. Your patch is better than us. one small
nit about core_count calcuation if cluster/die is support. core count should
be core number
  per package rather than per cluster or per die.

so it should be something like this?
 t->core_count = cpus_per_socket / ms->smp.threads
rather than ms->smp.cores


Yes, I also fixed this in the third patch of my patch series [2].

[2]: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg07231.html


Great, that sounds good to me.

Regards
Bibo, Mao



Regards,
Zhao




Regards
Bibo, mao


Thanks,
Zhao


+t->thread_count = (count > 255) ? 0xFF : count;
   t->processor_characteristics = cpu_to_le16(0x02); /* Unknown */
   t->processor_family2 = cpu_to_le16(0x01); /* Other */
   if (tbl_len == SMBIOS_TYPE_4_LEN_V30) {
   t->core_count2 = t->core_enabled2 = cpu_to_le16(ms->smp.cores);
-t->thread_count2 = cpu_to_le16(ms->smp.threads);
+t->thread_count2 = cpu_to_le16(count);
   }
   SMBIOS_BUILD_TABLE_POST;
--
2.39.1










Re: [PATCH] tap: introduce IFF_NAPI

2023-05-31 Thread Jason Wang
On Wed, May 31, 2023 at 9:46 PM Jon Kohler  wrote:
>
>
>
> > On May 31, 2023, at 1:27 AM, Jason Wang  wrote:
> >
> > On Wed, May 31, 2023 at 11:55 AM Jason Wang  wrote:
> >>
> >> On Wed, May 31, 2023 at 11:47 AM Jon Kohler  wrote:
> >>>
> >>>
> >>>
>  On May 30, 2023, at 11:35 PM, Jason Wang  wrote:
> 
>  On Wed, May 31, 2023 at 11:32 AM Jason Wang  wrote:
> >
> > On Wed, May 31, 2023 at 11:17 AM Jon Kohler  wrote:
> >>
> >> If kernel supports IFF_NAPI, lets use it, which is especially useful
> >> on kernels containing fb3f903769e8 ("tun: support NAPI for packets
> >> received from batched XDP buffs"), as IFF_NAPI allows the
> >> vhost_tx_batch path to use NAPI on XDP buffs.
> >>
> >> Benchmark w/ iperf -c (remote srv) -P (thread count) -l (stream size)
> >> from a guest running kernel 5.10.105 to remote bare metal running
> >> patched code on kernel 5.10.139. Guests were configured 1x virtio-net
> >> device with 4x queues, resulting in 4x vhost-worker threads. Hosts are
> >> identical with Intel ICX 4314 @ 2.4 GHz with Mellanox CX5 25GbE NIC ->
> >> Arista 25GbE switch. vhost-worker threads largely maxed out on CPU on
> >> "Before" and around ~50-60% utilization "After".
> >>
> >> Single Stream: iperf -P 1
> >> iperf -l size | Before  | After  | Increase
> >> 64B   | 593 Mbits/sec   | 712 Mbits/sec  | ~20%
> >> 128B  | 1.00 Gbits/sec  | 1.18 Gbits/sec | ~18%
> >> 4KB   | 17.6 Gbits/sec  | 22.7 Gbits/sec | ~29%
> >>
> >> Multiple Stream: iperf -P 12
> >> iperf -l size | Before  | After  | Increase
> >> 64B   | 6.35 Gbits/sec  | 7.78 Gbits/sec | ~23%
> >> 128B  | 10.8 Gbits/sec  | 14.2 Gbits/sec | ~31%
> >> 4KB   | 23.6 Gbits/sec  | 23.6 Gbits/sec | (line speed)
> >>
> >> Signed-off-by: Jon Kohler 
> >
> > Great, but I would suggest having an option.
> >
> > So we can:
> >
> > 1) ease the debug and compare
> > 2) enable this by default only for 8.1, disable it for pre 8.1
> >>>
> >>> Fair enough, one favor to ask though -
> >>> Would you be able to point me to an existing option like what you’re
> >>> proposing so I could make sure I’m on the same page?
> >>
> >> For example, the vhost option for tap. Maybe we can have an napi option.
>
> OK thanks, I’ll see what I can do there.
>
> >>
> >>>
> 
>  More thought, if the performance boost only after fb3f903769e8, we
>  probably need to disable it by default and let the mgmt layer to
>  enable it.
> 
> >>>
> >>> I focused my testing with that commit, but I could take it out and
> >>> we still should get benefit. Would you like me to profile that to 
> >>> validate?
> >>>
> >>
> >> One problem is that NAPI for TAP was originally used for kernel
> >> hardening. Looking at the history, it introduces a lot of bugs.
> >>
> >> Consider:
> >>
> >> 1) it has been merged for several years
> >> 2) tap has been widely used for a long time as well
> >>
> >> I think it would be still safe to keep the option off (at least for
> >> pre 8.1 machines).
> >>
> >>> Asking as NAPI support in tun.c has been there for a while, guessing
> >>> at first glance that there would be non-zero gains, with little downsides.
> >>> Looking at git blame, seems about ~5-6 years of support.
> >>
> >> Yes.
> >>
> >>>
> >>> Also for posterity, that commit has been in since 5.18, so a little over 
> >>> 1 year.
> >>
> >> Then I think we can make it enabled by default for 8.1 and see.
>
> OK, I’ll see what I can come up with.
>
> >
> > Btw, it would be better if we can have some PPS benchmark as well.
> >
> > Thanks
>
> Is there a set of specific benchmark(s) that you want to see? Certain packet
> sizes? TCP/UDP? Certain tool (netperf, iperf, etc)?

It could be like this:

1) Netperf TCP_STREAM with various packet size 64 to maximum
2) Pktgen test from guest to host

> The existing benchmarks
> in the commit msg have both single and multi stream and multiple payload
> sizes, all of which are a corollary to PPS, no?

The problem is that it's the PPS of the TCP, various layers in the
middle. For example, if TCP coalesce usersapce write to larger
packets, we will end up with a low PPS.

Using pktgen may help to know if TAP can deal with more packets per second.

Thanks

>
> Happy to do more profiling, just want to make sure I get you exactly what you
> want.
>
> >
> >>
> >> Thanks
> >>
> >>>
>  Thanks
> 
> >
> > Thanks
> >
> > Thanks
> >
> >> ---
> >> net/tap-linux.c | 4 
> >> net/tap-linux.h | 1 +
> >> 2 files changed, 5 insertions(+)
> >>
> >> diff --git a/net/tap-linux.c b/net/tap-linux.c
> >> index f54f308d359..fd94df166e0 100644
> >> --- a/net/tap-linux.c
> >> +++ b/net/tap-linux.c
> >> @@ -62,6 +62,10 @@ int tap_open(char *ifname, int ifname_size, int 
> >> *vnet_hdr,
> >

Re: [PATCH v14 02/10] accel/tcg: introduce TBStatistics structure

2023-05-31 Thread Richard Henderson

On 5/31/23 18:30, Wu, Fei wrote:

On 6/1/2023 7:59 AM, Richard Henderson wrote:

On 5/30/23 01:35, Fei Wu wrote:

+    /*
+ * We want to fetch the stats structure before we start code
+ * generation so we can count interesting things about this
+ * generation.
+ */
+    if (tb_stats_collection_enabled()) {
+    tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);


If cflags & CF_PCREL, then 'pc' should not be cached or matched.
Using 'phys_pc' will suffice, so pass 0 in that case.


OK, tb_get_stats(phys_pc, (cflags & CF_PCREL ? 0 : pc), cs_base, flags);

btw, is it possible to drop 'pc' even w/o CF_PCREL setting in cflags?
Two TBs with different 'pc' but same 'phys_pc', are they the same?


For the purpose of statistics, yes, plausibly.

Knowing how many different translations of the same bit of libc.so, for a guest that does 
not support CF_PCREL, could be revealing.  In any case, you can get back to the virtual 
addresses via tb_stats->tbs[i]->pc, so long as tb_stats->tb[i].cflags & CF_PCREL is not set.



r~



Re: [PATCH v14 08/10] Adding info [tb-list|tb] commands to HMP (WIP)

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

+static void do_dump_tbs_info(int total, int sort_by)
+{
+id = 1;
+GList *i;
+int count = total;
+
+g_list_free(last_search);
+last_search = NULL;
+
+qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
+
+last_search = g_list_sort_with_data(last_search, inverse_sort_tbs,
+&sort_by);
+


Why are you sorting on a list and not an array?
Intuitively, sorting a list of 1 million elements seems like the wrong choice.

Why did you put all the comparisons in one inverse_sort_tbs function, and require 
examining sort_by?  Better would be N sorting functions where sort_by is evaluated once 
before starting the sort.




+++ b/disas/disas.c
@@ -8,6 +8,8 @@
 #include "hw/core/cpu.h"
 #include "exec/memory.h"
 
+#include "qemu/log-for-trace.h"

+
 /* Filled in by elfload.c.  Simplistic, but will do for now. */
 struct syminfo *syminfos = NULL;
 
@@ -199,6 +201,24 @@ static void initialize_debug_host(CPUDebug *s)

 #endif
 }
 
+static int

+__attribute__((format(printf, 2, 3)))
+fprintf_log(FILE *a, const char *b, ...)
+{
+va_list ap;
+va_start(ap, b);
+
+if (!to_string) {
+vfprintf(a, b, ap);
+} else {
+qemu_vlog(b, ap);
+}
+
+va_end(ap);
+
+return 1;
+}
+


Not need on this either.  Global variable being checked on each callback, instead of 
selecting the proper callback earlier -- preferably without the global variable.


Did you really need something different than monitor_disas?  You almost certainly want to 
read physical memory and not virtual anyway.



+void qemu_log_to_monitor(bool enable)
+{
+to_monitor = enable;
+}
+
+void qemu_log_to_string(bool enable, GString *s)
+{
+to_string = enable;
+string = s;
+}


What are these for, and why do you need them?
Why would to_string ever be anything other than (string != NULL)?
Why are you using such very generic names for global variables?


r~



Re: [PATCH 1/1] hw/arm/sbsa-ref: use XHCI to replace EHCI

2023-05-31 Thread Chen Baozi
Hi Leif,

> On Jun 1, 2023, at 00:36, Leif Lindholm  wrote:
> 
> On 2023-05-31 16:27, Peter Maydell wrote:
>> On Wed, 31 May 2023 at 15:58, Graeme Gregory  wrote:
 The current sbsa-ref cannot use EHCI controller which is only
 able to do 32-bit DMA, since sbsa-ref doesn't have RAM above 4GB.
 Hence, this uses XHCI to provide a usb controller with 64-bit
 DMA capablity instead of EHCI.
>>> 
>>> Should this be below 4G?
>> It would be pretty disruptive to try to rearrange the memory
>> map to put RAM below 4GB at this point, though in theory it's
>> possible I guess. (I have a vague recollection that there was
>> some reason the RAM was all put above 4GB, but can't find
>> anything about that in my email archives. Perhaps Leif remembers?)
> 
> I think Graeme was just pointing out a typo in Marcin's email.
> 
> Yeah, we're not changing the DRAM base at this stage.
> I think the reason we put no RAM below 4GB was simply that we had several 
> real-world platforms where that was true, and given the intended use-case for 
> the platform, we explicitly wanted to trigger issues those platforms might 
> encounter.
> 
>>> Also has EHCI never worked, or has it worked in some modes and so this
>>> change should be versioned?
>> AIUI, EHCI has never worked and can never have worked, because
>> this board's RAM is all above 4G and the QEMU EHCI controller
>> implementation only allows DMA descriptors with 32-bit addresses.
>> Looking back at the archives, it seems we discussed XHCI vs
>> EHCI when the sbsa-ref board went in, and the conclusion was
>> that XHCI would be better. But there wasn't a sysbus XHCI device
>> at that point, so we ended up committing the sbsa-ref board
>> with EHCI and a plan to switch to XHCI when the sysbus-xhci
>> device was done, which we then forgot about:
>> https://mail.gnu.org/archive/html/qemu-arm/2018-11/msg00638.html
> 
> Ah, thanks! That explains why we did the thing that made no sense :)
> 
> To skip the migration hazard, my prefernece is we just leave the EHCI device 
> in for now, and add a separate XHCI on PCIe. We can drop the
> EHCI device at some point in the future.

What about introducing another SMMU for all the platform devices on sbsa-ref? I 
was thinking over this solution for some time. If that can be feasible, we then 
also have a prototype of IOMMU for platform device.

Regards,

Baozi.



Re: [PATCH] hw/smbios: fix thead count field in type 4 table

2023-05-31 Thread Zhao Liu
On Thu, Jun 01, 2023 at 09:09:13AM +0800, bibo, mao wrote:
> Date: Thu, 1 Jun 2023 09:09:13 +0800
> From: "bibo, mao" 
> Subject: Re: [PATCH] hw/smbios: fix thead count field in type 4 table
> 
> 
> 
> 在 2023/5/31 16:42, Zhao Liu 写道:
> > On Tue, May 30, 2023 at 08:20:34PM +0800, Tianrui Zhao wrote:
> > > Date: Tue, 30 May 2023 20:20:34 +0800
> > > From: Tianrui Zhao 
> > > Subject: [PATCH] hw/smbios: fix thead count field in type 4 table
> > > X-Mailer: git-send-email 2.39.1
> > > 
> > > The thread_count value in smbios type_4 table should be
> > > (ms->smp.cores * ms->smp.threads). As according to smbios spec 7.5
> > > Processor Information (Type 4), the field "Thread Count" means the
> > > "Number of threads per processor socket" rather than number of
> > > threads per core.
> > > 
> > > When apply this patch, use "-smp 4,sockets=1,cores=2,threads=2" to
> > > boot VM, the dmidecode -t 4 shows like:
> > > 
> > > Handle 0x0400, DMI type 4, 48 bytes
> > > Processor Information
> > >  Socket Designation: CPU 0
> > >  ...
> > >  Core Count: 2
> > >  Core Enabled: 2
> > >  Thread Count: 4
> > >  Characteristics: None
> > > 
> > > Signed-off-by: Tianrui Zhao 
> > > ---
> > >   hw/smbios/smbios.c | 7 ---
> > >   1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> > > index d2007e70fb..56aeaa069d 100644
> > > --- a/hw/smbios/smbios.c
> > > +++ b/hw/smbios/smbios.c
> > > @@ -713,6 +713,7 @@ static void smbios_build_type_4_table(MachineState 
> > > *ms, unsigned instance)
> > >   {
> > >   char sock_str[128];
> > >   size_t tbl_len = SMBIOS_TYPE_4_LEN_V28;
> > > +int count;
> > >   if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_64) {
> > >   tbl_len = SMBIOS_TYPE_4_LEN_V30;
> > > @@ -749,15 +750,15 @@ static void smbios_build_type_4_table(MachineState 
> > > *ms, unsigned instance)
> > >   t->core_count = (ms->smp.cores > 255) ? 0xFF : ms->smp.cores;
> > >   t->core_enabled = t->core_count;
> > > -
> > > -t->thread_count = (ms->smp.threads > 255) ? 0xFF : ms->smp.threads;
> > > +count = ms->smp.cores * ms->smp.threads;
> > 
> > Hi Ani & Tianrui,
> > 
> >  From the comment of CpuTopology (include/hw/boards.h):
> > 
> > ms->cores means the "the number of cores in one cluster".
> > ms->threads means the "the number of threads in one core".
> > 
> > So ms->cores * ms->threads means the number of threads in one cluster
> > not one socket.
> > 
> > That's why I count the number of threads in a socket by "ms->smp.max_cpus
> > / ms->smp.sockets" in [1].
> > 
> > The other correct way is:
> > ms->smp.cluster * ms->smp.cores * ms->smp.threads.
> > 
> > [1]: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg07229.html
> ohh, we do not notice your patch. Your patch is better than us. one small
> nit about core_count calcuation if cluster/die is support. core count should
> be core number
>  per package rather than per cluster or per die.
> 
> so it should be something like this?
> t->core_count = cpus_per_socket / ms->smp.threads
> rather than ms->smp.cores

Yes, I also fixed this in the third patch of my patch series [2].

[2]: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg07231.html

Regards,
Zhao

> 
> 
> Regards
> Bibo, mao
> > 
> > Thanks,
> > Zhao
> > 
> > > +t->thread_count = (count > 255) ? 0xFF : count;
> > >   t->processor_characteristics = cpu_to_le16(0x02); /* Unknown */
> > >   t->processor_family2 = cpu_to_le16(0x01); /* Other */
> > >   if (tbl_len == SMBIOS_TYPE_4_LEN_V30) {
> > >   t->core_count2 = t->core_enabled2 = cpu_to_le16(ms->smp.cores);
> > > -t->thread_count2 = cpu_to_le16(ms->smp.threads);
> > > +t->thread_count2 = cpu_to_le16(count);
> > >   }
> > >   SMBIOS_BUILD_TABLE_POST;
> > > -- 
> > > 2.39.1
> > > 
> > > 
> 
> 



Re: [PATCH 3/3] hw/smbios: Fix core count in type4

2023-05-31 Thread Zhao Liu
On Wed, May 31, 2023 at 05:46:48PM +0200, Igor Mammedov wrote:
> Date: Wed, 31 May 2023 17:46:48 +0200
> From: Igor Mammedov 
> Subject: Re: [PATCH 3/3] hw/smbios: Fix core count in type4
> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; x86_64-redhat-linux-gnu)
> 
> On Tue, 30 May 2023 00:43:43 +0800
> Zhao Liu  wrote:
> 
> > From: Zhao Liu 
> > 
> > From SMBIOS 3.0 specification, core count field means:
> > 
> > Core Count is the number of cores detected by the BIOS for this
> > processor socket. [1]
> > 
> > Before 003f230e37d7 ("machine: Tweak the order of topology members in
> > struct CpuTopology"), MachineState.smp.cores means "the number of cores
> > in one package", and it's correct to use smp.cores for core count.
> > 
> > But 003f230e37d7 changes the smp.cores' meaning to "the number of cores
> > in one die" and doesn't change the original smp.cores' use in smbios as
> > well, which makes core count in type4 go wrong.
> > 
> > Fix this issue with the correct "cores per socket" caculation.
> > 
> > [1] SMBIOS 3.0.0, section 7.5.6, Processor Information - Core Count
> > 
> > Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in 
> > struct CpuTopology")
> > Signed-off-by: Zhao Liu 
> > ---
> >  hw/smbios/smbios.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> > index f80a701cdfc1..32e26bffa2df 100644
> > --- a/hw/smbios/smbios.c
> > +++ b/hw/smbios/smbios.c
> > @@ -714,6 +714,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
> > unsigned instance)
> >  char sock_str[128];
> >  size_t tbl_len = SMBIOS_TYPE_4_LEN_V28;
> >  unsigned cpus_per_socket = ms->smp.max_cpus / ms->smp.sockets;
> > +unsigned cores_per_socket = cpus_per_socket / ms->smp.threads;
> wouldn't be
>smp.dies * smp.clusters * smp.cores
> cleaner and more self-describing
> 
> and then you can add sanity check 
>   g_assert(cores_per_socket != (cpus_per_socket / ms->smp.threads))
> so we won't miss change to CpuTopology in the future?

Thanks! Will use this way.

Regards,
Zhao

>  
> >  
> >  if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_64) {
> >  tbl_len = SMBIOS_TYPE_4_LEN_V30;
> > @@ -748,7 +749,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
> > unsigned instance)
> >  SMBIOS_TABLE_SET_STR(4, asset_tag_number_str, type4.asset);
> >  SMBIOS_TABLE_SET_STR(4, part_number_str, type4.part);
> >  
> > -t->core_count = (ms->smp.cores > 255) ? 0xFF : ms->smp.cores;
> > +t->core_count = (cores_per_socket > 255) ? 0xFF : cores_per_socket;
> >  t->core_enabled = t->core_count;
> >  
> >  t->thread_count = (cpus_per_socket > 255) ? 0xFF : cpus_per_socket;
> > @@ -757,7 +758,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
> > unsigned instance)
> >  t->processor_family2 = cpu_to_le16(0x01); /* Other */
> >  
> >  if (tbl_len == SMBIOS_TYPE_4_LEN_V30) {
> > -t->core_count2 = t->core_enabled2 = cpu_to_le16(ms->smp.cores);
> > +t->core_count2 = t->core_enabled2 = cpu_to_le16(cores_per_socket);
> >  t->thread_count2 = cpu_to_le16(cpus_per_socket);
> >  }
> >  
> 



Re: [PATCH] decodetree: Add --output-null for meson testing

2023-05-31 Thread Thomas Huth

On 01/06/2023 01.25, Richard Henderson wrote:

Using "-o /dev/null" fails on Windows.  Rather that working
around this in meson, add a separate command-line option so
that we can use python's os.devnull.

Reported-by: Thomas Huth 
Fixes: 65dc7d1b ("tests/decode: Convert tests to meson")
Signed-off-by: Richard Henderson 
---
  scripts/decodetree.py| 18 ++
  tests/decode/meson.build |  4 ++--
  2 files changed, 12 insertions(+), 10 deletions(-)


Reviewed-by: Thomas Huth 





Re: [PATCH] fpu: Add conversions between bfloat16 and [u]int8

2023-05-31 Thread LIU Zhiwei

On 2023/6/1 1:47, Richard Henderson wrote:

On 5/30/23 23:54, LIU Zhiwei wrote:

We missed these functions when upstreaming the bfloat16 support.

Signed-off-by: LIU Zhiwei 


They look ok, so far as it goes.  What will they be used for?


T-Head Xuantie CPUs custom extension need these interfaces. It uses a 
custom CSR(still not upstream) to switch between the fp16 and bfloat16. 
All fp16 instructions(Zfh) can process the bfloat16 types. In its custom 
matrix extension[1] or vector extension, this feature is also supported.


As a side note, the RISC-V port support for custom extension at least 
should have these aspects:


 *   ISA decoding (Ready, Philipp Tomsich)
 *   CSR (WIP, Andes?)
 *   Disassemble(Under review, Christopher)
 * Errata(Not start)
 * Split TB flags like ARM for custom(In the wild for the Xuantie CPUs)


1. 
https://github.com/T-head-Semi/riscv-matrix-extension-spec/releases/tag/v0.1.0





r~

Re: [PATCH v14 02/10] accel/tcg: introduce TBStatistics structure

2023-05-31 Thread Wu, Fei
On 6/1/2023 7:59 AM, Richard Henderson wrote:
> On 5/30/23 01:35, Fei Wu wrote:
>> +    /*
>> + * We want to fetch the stats structure before we start code
>> + * generation so we can count interesting things about this
>> + * generation.
>> + */
>> +    if (tb_stats_collection_enabled()) {
>> +    tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
> 
> If cflags & CF_PCREL, then 'pc' should not be cached or matched.
> Using 'phys_pc' will suffice, so pass 0 in that case.
> 
OK, tb_get_stats(phys_pc, (cflags & CF_PCREL ? 0 : pc), cs_base, flags);

btw, is it possible to drop 'pc' even w/o CF_PCREL setting in cflags?
Two TBs with different 'pc' but same 'phys_pc', are they the same?

Thanks,
Fei.

> 
> r~




Re: [PATCH v14 07/10] tb-stats: reset the tracked TBs on a tb_flush

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 805e1fc74d..139f049ffc 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -267,6 +267,25 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data 
icmd)
  g_free(cmdinfo);
  }
  
+/*

+ * We have to reset the tbs array on a tb_flush as those
+ * TranslationBlocks no longer exist and we no loner know if the
+ * current mapping is still valid.
+ */


The "and we no longer..." part is irrelevant: that's what phys_pc checks.
But the TranslationBlocks no longer exist, so that is sufficient.


r~



Re: [PATCH v14 06/10] monitor: adding tb_stats hmp command

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

From: "Vanderson M. do Rosario" 

Adding tb_stats [start|pause|stop|filter] command to hmp.
This allows controlling the collection of statistics.
It is also possible to set the level of collection:
all, jit, or exec.

tb_stats filter allow to only collect statistics for the TB
in the last_search list.

The goal of this command is to allow the dynamic exploration
of the TCG behavior and quality. Therefore, for now, a
corresponding QMP command is not worthwhile.

Acked-by: Dr. David Alan Gilbert 
Signed-off-by: Vanderson M. do Rosario 
Message-Id: <20190829173437.5926-8-vanderson...@gmail.com>
Message-Id: <20190829173437.5926-9-vanderson...@gmail.com>
[AJB: fix authorship]
Signed-off-by: Alex Bennée 
Signed-off-by: Fei Wu 
---



I still don't see what pause does.


diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 68ac7d3f73..805e1fc74d 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -16,18 +16,20 @@
  #include "qemu/timer.h"
  
  #include "exec/tb-stats.h"

+#include "exec/tb-flush.h"
  #include "tb-context.h"
  
  /* TBStatistic collection controls */

  enum TBStatsStatus {
  TB_STATS_DISABLED = 0,
  TB_STATS_RUNNING,
-TB_STATS_PAUSED,
-TB_STATS_STOPPED
+TB_STATS_PAUSED
  };


Why?

  
  static enum TBStatsStatus tcg_collect_tb_stats;

  static uint32_t default_tbstats_flag;
+/* only accessed in safe work */
+static GList *last_search;
  
  uint64_t dev_time;
  
@@ -170,6 +172,101 @@ void dump_jit_profile_info(TCGProfile *s, GString *buf)

  g_free(jpi);
  }
  
+static void free_tbstats(void *p, uint32_t hash, void *userp)

+{
+g_free(p);
+}
+
+static void clean_tbstats(void)
+{
+/* remove all tb_stats */
+qht_iter(&tb_ctx.tb_stats, free_tbstats, NULL);
+qht_destroy(&tb_ctx.tb_stats);
+}
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
+{
+struct TbstatsCommand *cmdinfo = icmd.host_ptr;
+int cmd = cmdinfo->cmd;
+uint32_t level = cmdinfo->level;
+
+switch (cmd) {
+case START:
+if (tb_stats_collection_enabled()) {
+qemu_printf("TB information already being recorded");
+return;
+} else if (tb_stats_collection_paused()) {
+set_tbstats_flags(level);
+} else {
+qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE,
+ QHT_MODE_AUTO_RESIZE);
+}
+
+set_default_tbstats_flag(level);
+enable_collect_tb_stats();
+tb_flush(cpu);
+break;
+case PAUSE:
+if (!tb_stats_collection_enabled()) {
+qemu_printf("TB information not being recorded");
+return;
+}
+
+/*
+ * Continue to create TBStatistic structures but stop collecting
+ * statistics
+ */
+pause_collect_tb_stats();
+set_default_tbstats_flag(TB_NOTHING);
+set_tbstats_flags(TB_PAUSED);
+tb_flush(cpu);
+break;
+case STOP:
+if (tb_stats_collection_disabled()) {
+qemu_printf("TB information not being recorded");
+return;
+}
+
+/* Dissalloc all TBStatistics structures and stop creating new ones */
+disable_collect_tb_stats();
+clean_tbstats();
+tb_flush(cpu);
+break;
+case FILTER:
+if (!tb_stats_collection_enabled()) {
+qemu_printf("TB information not being recorded");
+return;
+}
+if (!last_search) {
+qemu_printf(
+"no search on record! execute info tbs before filtering!");
+return;
+}
+
+set_default_tbstats_flag(TB_NOTHING);
+
+/*
+ * Set all tbstats as paused, then return only the ones from 
last_search
+ */
+pause_collect_tb_stats();
+set_tbstats_flags(TB_PAUSED);
+
+for (GList *iter = last_search; iter; iter = g_list_next(iter)) {
+TBStatistics *tbs = iter->data;
+tbs->stats_enabled = level;
+}
+
+tb_flush(cpu);
+
+break;
+default: /* INVALID */
+g_assert_not_reached();
+break;
+}
+
+g_free(cmdinfo);
+}


Why isn't all of this in monitor.c?
It's not used or usable from user-only mode.


+void set_tbstats_flags(uint32_t flag)
+{
+/* iterate over tbstats setting their flag as TB_NOTHING */
+qht_iter(&tb_ctx.tb_stats, reset_tbstats_flag, &flag);
+}


Again, I question why a global variable is not more appropriate.


r~



Re: [PATCH v14 05/10] debug: add -d tb_stats to control TBStatistics collection:

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

From: "Vanderson M. do Rosario" 

  -d tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=]]

"dump_limit" is used to limit the number of dumped TBStats in
linux-user mode.


Why is user-mode special?



[all+jit+exec+time] control the profilling level used
by the TBStats. Can be used as follow:

-d tb_stats
-d tb_stats,level=jit+time


Comma is already used to separate different -d options.
You should not overload that.

"level" doesn't make sense for things that aren't hierarchical.

What's wrong with tb-stats-{all,jit,time,exec}?


r~



Re: [PATCH] hw/smbios: fix thead count field in type 4 table

2023-05-31 Thread bibo, mao




在 2023/5/31 16:42, Zhao Liu 写道:

On Tue, May 30, 2023 at 08:20:34PM +0800, Tianrui Zhao wrote:

Date: Tue, 30 May 2023 20:20:34 +0800
From: Tianrui Zhao 
Subject: [PATCH] hw/smbios: fix thead count field in type 4 table
X-Mailer: git-send-email 2.39.1

The thread_count value in smbios type_4 table should be
(ms->smp.cores * ms->smp.threads). As according to smbios spec 7.5
Processor Information (Type 4), the field "Thread Count" means the
"Number of threads per processor socket" rather than number of
threads per core.

When apply this patch, use "-smp 4,sockets=1,cores=2,threads=2" to
boot VM, the dmidecode -t 4 shows like:

Handle 0x0400, DMI type 4, 48 bytes
Processor Information
 Socket Designation: CPU 0
 ...
 Core Count: 2
 Core Enabled: 2
 Thread Count: 4
 Characteristics: None

Signed-off-by: Tianrui Zhao 
---
  hw/smbios/smbios.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index d2007e70fb..56aeaa069d 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -713,6 +713,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance)
  {
  char sock_str[128];
  size_t tbl_len = SMBIOS_TYPE_4_LEN_V28;
+int count;
  
  if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_64) {

  tbl_len = SMBIOS_TYPE_4_LEN_V30;
@@ -749,15 +750,15 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance)
  
  t->core_count = (ms->smp.cores > 255) ? 0xFF : ms->smp.cores;

  t->core_enabled = t->core_count;
-
-t->thread_count = (ms->smp.threads > 255) ? 0xFF : ms->smp.threads;
+count = ms->smp.cores * ms->smp.threads;


Hi Ani & Tianrui,

 From the comment of CpuTopology (include/hw/boards.h):

ms->cores means the "the number of cores in one cluster".
ms->threads means the "the number of threads in one core".

So ms->cores * ms->threads means the number of threads in one cluster
not one socket.

That's why I count the number of threads in a socket by "ms->smp.max_cpus
/ ms->smp.sockets" in [1].

The other correct way is:
ms->smp.cluster * ms->smp.cores * ms->smp.threads.

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg07229.html
ohh, we do not notice your patch. Your patch is better than us. one 
small nit about core_count calcuation if cluster/die is support. core 
count should be core number

 per package rather than per cluster or per die.

so it should be something like this?
t->core_count = cpus_per_socket / ms->smp.threads
rather than ms->smp.cores


Regards
Bibo, mao


Thanks,
Zhao


+t->thread_count = (count > 255) ? 0xFF : count;
  
  t->processor_characteristics = cpu_to_le16(0x02); /* Unknown */

  t->processor_family2 = cpu_to_le16(0x01); /* Other */
  
  if (tbl_len == SMBIOS_TYPE_4_LEN_V30) {

  t->core_count2 = t->core_enabled2 = cpu_to_le16(ms->smp.cores);
-t->thread_count2 = cpu_to_le16(ms->smp.threads);
+t->thread_count2 = cpu_to_le16(count);
  }
  
  SMBIOS_BUILD_TABLE_POST;

--
2.39.1







Re: [PATCH v14 04/10] accel/tcg: add jit stats and time to TBStatistics

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

+static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
+{
+struct jit_profile_info *jpi = userp;
+TBStatistics *tbs = p;
+
+jpi->translations += tbs->translations.total;
+jpi->ops += tbs->code.num_tcg_ops;
+if (stat_per_translation(tbs, code.num_tcg_ops) > jpi->ops_max) {
+jpi->ops_max = stat_per_translation(tbs, code.num_tcg_ops);
+}
+jpi->del_ops += tbs->code.deleted_ops;
+jpi->temps += tbs->code.temps;
+if (stat_per_translation(tbs, code.temps) > jpi->temps_max) {
+jpi->temps_max = stat_per_translation(tbs, code.temps);
+}
+jpi->host += tbs->code.out_len;
+jpi->guest += tbs->code.in_len;
+jpi->search_data += tbs->code.search_out_len;
+
+jpi->interm_time += stat_per_translation(tbs, gen_times.ir);
+jpi->opt_time += stat_per_translation(tbs, gen_times.ir_opt);
+jpi->la_time += stat_per_translation(tbs, gen_times.la);
+jpi->code_time += stat_per_translation(tbs, gen_times.code);
+
+/*
+ * The restore time covers how long we have spent restoring state
+ * from a given TB (e.g. recovering from a fault). It is therefor
+ * not related to the number of translations we have done.
+ */
+jpi->restore_time += tbs->tb_restore_time;
+jpi->restore_count += tbs->tb_restore_count;
+}


Why do sums of averages (stats_per_translation) instead of accumulating the complete total 
and dividing by jpi->translations?



+void dump_jit_exec_time_info(uint64_t dev_time, GString *buf)
+{
+static uint64_t last_cpu_exec_time;
+uint64_t cpu_exec_time;
+uint64_t delta;
+
+cpu_exec_time = tcg_cpu_exec_time();
+delta = cpu_exec_time - last_cpu_exec_time;
+
+g_string_append_printf(buf, "async time  %" PRId64 " (%0.3f)\n",
+   dev_time, dev_time / 
(double)NANOSECONDS_PER_SECOND);
+g_string_append_printf(buf, "qemu time   %" PRId64 " (%0.3f)\n",
+   delta, delta / (double)NANOSECONDS_PER_SECOND);
+last_cpu_exec_time = cpu_exec_time;
+}
+
+/* dump JIT statisticis using TCGProfile and TBStats */


"statistics"


diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 3973591508..749ad182f2 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -70,10 +70,17 @@ void tcg_cpus_destroy(CPUState *cpu)
  int tcg_cpus_exec(CPUState *cpu)
  {
  int ret;
+uint64_t ti;
+
  assert(tcg_enabled());
+ti = profile_getclock();
+
  cpu_exec_start(cpu);
  ret = cpu_exec(cpu);
  cpu_exec_end(cpu);
+
+qatomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);


You can't qatomic_add a 64-bit value on a 32-bit host.
Nor is tcg_ctx a good place to put this.

If you want to accumulate per-cpu data, put it on the cpu where there's no chance of 
racing with anyone.


Finally, I suspect that this isn't even where you want to accumulate time for execution as 
separate from translation.  You'd to that in cpu_exec_enter/cpu_exec_exit.



@@ -296,6 +315,8 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, 
target_ulong pc,
  new_stats->cs_base = cs_base;
  new_stats->flags = flags;
  new_stats->stats_enabled = get_default_tbstats_flag();
+new_stats->tbs = g_ptr_array_sized_new(4);


Why default to 4?  Is that just a guess, or are we really recomputing tbs that frequently? 
 Is there a good reason not to use g_ptr_array_new()?



diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 80ffbfb455..a60a92091b 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -19,7 +19,7 @@
  #include "exec/plugin-gen.h"
  #include "exec/replay-core.h"
  
-static void gen_tb_exec_count(TranslationBlock *tb)

+static inline void gen_tb_exec_count(TranslationBlock *tb)


Why?


  {
  if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
  TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
@@ -147,6 +147,11 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
int *max_insns,
  tb->size = db->pc_next - db->pc_first;
  tb->icount = db->num_insns;
  
+/* Save number of guest instructions for TB_JIT_STATS */

+if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+tb->tb_stats->code.num_guest_inst += db->num_insns;
+}


Why do this here, as opposed to the block in tb_gen_code?


+#define stat_per_translation(stat, name) \
+(stat->translations.total ? stat->name / stat->translations.total : 0)


Not a fan of this macro, and as per above, the reason for doing the division here is 
questionable.



diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 9a91cb1248..ad0da18a5f 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -989,4 +989,10 @@ static inline int64_t cpu_get_host_ticks(void)
  }
  #endif
  
+static inline int64_t profile_getclock(void)

+{
+return get_clock();
+}


Why?  You use get_clock directly most places.


+/* Timestamps during translation  */
+typedef struct TCGTim

Re: [PATCH 1/3] hw/ufs: Initial commit for emulated Universal-Flash-Storage

2023-05-31 Thread Jeuk Kim
Hi Philippe,

>Hi Jeuk,
>
>[+Alistair]
>
>On 26/5/23 07:05, Jeuk Kim wrote:
>> Universal Flash Storage (UFS) is a high-performance mass storage device
>> with a serial interface. It is primarily used as a high-performance
>> data storage device for embedded applications.
>> 
>> This commit contains code for UFS device to be recognized
>> as a UFS PCI device.
>> Patches to handle UFS logical unit and Transfer Request will follow.
>> 
>> Signed-off-by: Jeuk Kim 
>> ---
>>   MAINTAINERS  |6 +
>>   hw/Kconfig   |1 +
>>   hw/meson.build   |1 +
>>   hw/ufs/Kconfig   |4 +
>>   hw/ufs/meson.build   |1 +
>>   hw/ufs/trace-events  |   33 +
>>   hw/ufs/trace.h   |1 +
>>   hw/ufs/ufs.c |  305 ++
>>   hw/ufs/ufs.h |   42 ++
>>   include/block/ufs.h  | 1251 ++
>>   include/hw/pci/pci.h |1 +
>>   include/hw/pci/pci_ids.h |1 +
>>   meson.build  |1 +
>>   13 files changed, 1648 insertions(+)
>>   create mode 100644 hw/ufs/Kconfig
>>   create mode 100644 hw/ufs/meson.build
>>   create mode 100644 hw/ufs/trace-events
>>   create mode 100644 hw/ufs/trace.h
>>   create mode 100644 hw/ufs/ufs.c
>>   create mode 100644 hw/ufs/ufs.h
>>   create mode 100644 include/block/ufs.h
>
>
>> diff --git a/include/block/ufs.h b/include/block/ufs.h
>> new file mode 100644
>> index 00..0ce3a19bc0
>> --- /dev/null
>> +++ b/include/block/ufs.h
>> @@ -0,0 +1,1251 @@
>> +#ifndef BLOCK_UFS_H
>> +#define BLOCK_UFS_H
>> +
>> +#include "hw/registerfields.h"
>
>Since you use the registerfields API, ...
>
>> +typedef struct QEMU_PACKED UfsReg {
>> +uint32_t cap;
>> +uint32_t rsvd0;
>> +uint32_t ver;
>> +uint32_t rsvd1;
>> +uint32_t hcpid;
>> +uint32_t hcmid;
>> +uint32_t ahit;
>> +uint32_t rsvd2;
>> +uint32_t is;
>> +uint32_t ie;
>> +uint32_t rsvd3[2];
>> +uint32_t hcs;
>> +uint32_t hce;
>> +uint32_t uecpa;
>> +uint32_t uecdl;
>> +uint32_t uecn;
>> +uint32_t uect;
>> +uint32_t uecdme;
>> +uint32_t utriacr;
>> +uint32_t utrlba;
>> +uint32_t utrlbau;
>> +uint32_t utrldbr;
>> +uint32_t utrlclr;
>> +uint32_t utrlrsr;
>> +uint32_t utrlcnr;
>> +uint32_t rsvd4[2];
>> +uint32_t utmrlba;
>> +uint32_t utmrlbau;
>> +uint32_t utmrldbr;
>> +uint32_t utmrlclr;
>> +uint32_t utmrlrsr;
>> +uint32_t rsvd5[3];
>> +uint32_t uiccmd;
>> +uint32_t ucmdarg1;
>> +uint32_t ucmdarg2;
>> +uint32_t ucmdarg3;
>> +uint32_t rsvd6[4];
>> +uint32_t rsvd7[4];
>> +uint32_t rsvd8[16];
>> +uint32_t ccap;
>> +} UfsReg;
>> +
>> +enum UfsRegOfs {
>> +UFS_REG_CAP = offsetof(UfsReg, cap),
>> +UFS_REG_VER = offsetof(UfsReg, ver),
>> +UFS_REG_HCPID = offsetof(UfsReg, hcpid),
>> +UFS_REG_HCMID = offsetof(UfsReg, hcmid),
>> +UFS_REG_AHIT = offsetof(UfsReg, ahit),
>> +UFS_REG_IS = offsetof(UfsReg, is),
>> +UFS_REG_IE = offsetof(UfsReg, ie),
>> +UFS_REG_HCS = offsetof(UfsReg, hcs),
>> +UFS_REG_HCE = offsetof(UfsReg, hce),
>> +UFS_REG_UECPA = offsetof(UfsReg, uecpa),
>> +UFS_REG_UECDL = offsetof(UfsReg, uecdl),
>> +UFS_REG_UECN = offsetof(UfsReg, uecn),
>> +UFS_REG_UECT = offsetof(UfsReg, uect),
>> +UFS_REG_UECDME = offsetof(UfsReg, uecdme),
>> +UFS_REG_UTRIACR = offsetof(UfsReg, utriacr),
>> +UFS_REG_UTRLBA = offsetof(UfsReg, utrlba),
>> +UFS_REG_UTRLBAU = offsetof(UfsReg, utrlbau),
>> +UFS_REG_UTRLDBR = offsetof(UfsReg, utrldbr),
>> +UFS_REG_UTRLCLR = offsetof(UfsReg, utrlclr),
>> +UFS_REG_UTRLRSR = offsetof(UfsReg, utrlrsr),
>> +UFS_REG_UTRLCNR = offsetof(UfsReg, utrlcnr),
>> +UFS_REG_UTMRLBA = offsetof(UfsReg, utmrlba),
>> +UFS_REG_UTMRLBAU = offsetof(UfsReg, utmrlbau),
>> +UFS_REG_UTMRLDBR = offsetof(UfsReg, utmrldbr),
>> +UFS_REG_UTMRLCLR = offsetof(UfsReg, utmrlclr),
>> +UFS_REG_UTMRLRSR = offsetof(UfsReg, utmrlrsr),
>> +UFS_REG_UICCMD = offsetof(UfsReg, uiccmd),
>> +UFS_REG_UCMDARG1 = offsetof(UfsReg, ucmdarg1),
>> +UFS_REG_UCMDARG2 = offsetof(UfsReg, ucmdarg2),
>> +UFS_REG_UCMDARG3 = offsetof(UfsReg, ucmdarg3),
>> +UFS_REG_CCAP = offsetof(UfsReg, ccap),
>> +};
>> +
>> +enum UfsCapShift {
>> +CAP_NUTRS_SHIFT = 0,
>> +CAP_RTT_SHIFT = 8,
>> +CAP_NUTMRS_SHIFT = 16,
>> +CAP_AUTOH8_SHIFT = 23,
>> +CAP_64AS_SHIFT = 24,
>> +CAP_OODDS_SHIFT = 25,
>> +CAP_UICDMETMS_SHIFT = 26,
>> +CAP_CS_SHIFT = 28,
>> +};
>> +
>> +enum UfsCapMask {
>> +CAP_NUTRS_MASK = 0x1f,
>> +CAP_RTT_MASK = 0xff,
>> +CAP_NUTMRS_MASK = 0x7,
>> +CAP_AUTOH8_MASK = 0x1,
>> +CAP_64AS_MASK = 0x1,
>> +CAP_OODDS_MASK = 0x1,
>> +CAP_UICDMETMS_MASK = 0x1,
>> +CAP_CS_MASK = 0x1,
>> +};
>> +
>> +#define UFS_CAP_NUTRS(cap) (((cap) >> CAP_NUTRS_SHIFT) & CAP_NUTRS_MASK)
>> +#define UFS_CAP_RTT(cap) (((cap) >> CAP_RTT_SHI

Re: [PATCH v14 03/10] accel: collecting TB execution count

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

From: "Vanderson M. do Rosario" 

If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
enabled, then we instrument the start code of this TB
to atomically count the number of times it is executed.
We count both the number of "normal" executions and atomic
executions of a TB.

The execution count of the TB is stored in its respective
TBS.

All TBStatistics are created by default with the flags from
default_tbstats_flag.

[Richard Henderson created the inline gen_tb_exec_count]

Signed-off-by: Vanderson M. do Rosario 
Message-Id: <20190829173437.5926-3-vanderson...@gmail.com>
[AJB: Fix author]
Signed-off-by: Alex Bennée 
Signed-off-by: Fei Wu 
---
  accel/tcg/cpu-exec.c  |  6 ++
  accel/tcg/tb-stats.c  |  6 ++
  accel/tcg/tcg-runtime.c   |  1 +
  accel/tcg/translate-all.c |  7 +--
  accel/tcg/translator.c| 25 +
  include/exec/gen-icount.h |  1 +
  include/exec/tb-stats-flags.h |  5 +
  include/exec/tb-stats.h   | 13 +
  8 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 0e741960da..c0d8f26237 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -25,6 +25,7 @@
  #include "trace.h"
  #include "disas/disas.h"
  #include "exec/exec-all.h"
+#include "exec/tb-stats.h"
  #include "tcg/tcg.h"
  #include "qemu/atomic.h"
  #include "qemu/rcu.h"
@@ -562,7 +563,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
  mmap_unlock();
  }
  
+if (tb_stats_enabled(tb, TB_EXEC_STATS)) {

+tb->tb_stats->executions.atomic++;
+}
+
  cpu_exec_enter(cpu);
+
  /* execute the generated code */
  trace_exec_tb(tb, pc);
  cpu_tb_exec(cpu, tb, &tb_exit);
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index f988bd8a31..143a52ef5c 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -22,6 +22,7 @@ enum TBStatsStatus {
  };
  
  static enum TBStatsStatus tcg_collect_tb_stats;

+static uint32_t default_tbstats_flag;
  
  void init_tb_stats_htable(void)

  {
@@ -56,3 +57,8 @@ bool tb_stats_collection_paused(void)
  {
  return tcg_collect_tb_stats == TB_STATS_PAUSED;
  }
+
+uint32_t get_default_tbstats_flag(void)
+{
+return default_tbstats_flag;
+}


What is the purpose of this function, instead of a global variable?
What is the meaning of 'default' in its name?


@@ -295,6 +295,7 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, 
target_ulong pc,
  new_stats->pc = pc;
  new_stats->cs_base = cs_base;
  new_stats->flags = flags;
+new_stats->stats_enabled = get_default_tbstats_flag();


Is this merely to record how we have generated a given TB?
What is the purpose of this flag over the global variable?


diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
index 87ee3d902e..fa71eb6f0c 100644
--- a/include/exec/tb-stats-flags.h
+++ b/include/exec/tb-stats-flags.h
@@ -11,6 +11,9 @@
  #ifndef TB_STATS_FLAGS
  #define TB_STATS_FLAGS
  
+#define TB_NOTHING(1 << 0)

+#define TB_EXEC_STATS (1 << 1)


Why is NOTHING a non-zero flag?


--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -31,6 +31,9 @@
  #include "exec/tb-stats-flags.h"
  #include "tcg/tcg.h"
  
+#define tb_stats_enabled(tb, JIT_STATS) \

+(tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))


Inline function, though again, why stats_enabled vs global variable?


r~



Re: [PATCH v14 02/10] accel/tcg: introduce TBStatistics structure

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

+/* TBStatistic collection controls */
+enum TBStatsStatus {
+TB_STATS_DISABLED = 0,
+TB_STATS_RUNNING,
+TB_STATS_PAUSED,
+TB_STATS_STOPPED
+};


I don't see what PAUSED or STOPPED actually do.
As far as I can see, stats are either being collected or not: a boolean.


r~



Re: [PATCH v14 02/10] accel/tcg: introduce TBStatistics structure

2023-05-31 Thread Richard Henderson

On 5/30/23 01:35, Fei Wu wrote:

+/*
+ * We want to fetch the stats structure before we start code
+ * generation so we can count interesting things about this
+ * generation.
+ */
+if (tb_stats_collection_enabled()) {
+tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);


If cflags & CF_PCREL, then 'pc' should not be cached or matched.
Using 'phys_pc' will suffice, so pass 0 in that case.


r~



[PATCH] decodetree: Add --output-null for meson testing

2023-05-31 Thread Richard Henderson
Using "-o /dev/null" fails on Windows.  Rather that working
around this in meson, add a separate command-line option so
that we can use python's os.devnull.

Reported-by: Thomas Huth 
Fixes: 65dc7d1b ("tests/decode: Convert tests to meson")
Signed-off-by: Richard Henderson 
---
 scripts/decodetree.py| 18 ++
 tests/decode/meson.build |  4 ++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 13db585d04..a8a6cb69cd 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -42,6 +42,7 @@
 input_file = ''
 output_file = None
 output_fd = None
+output_null = False
 insntype = 'uint32_t'
 decode_function = 'decode'
 
@@ -145,12 +146,7 @@ def error_with_file(file, lineno, *args):
 
 if output_file and output_fd:
 output_fd.close()
-# Do not try to remove e.g. -o /dev/null
-if not output_file.startswith("/dev"):
-try:
-os.remove(output_file)
-except PermissionError:
-pass
+os.remove(output_file)
 exit(0 if testforerror else 1)
 # end error_with_file
 
@@ -1501,6 +1497,7 @@ def main():
 global translate_prefix
 global output_fd
 global output_file
+global output_null
 global input_file
 global insnwidth
 global insntype
@@ -1514,7 +1511,8 @@ def main():
 decode_scope = 'static '
 
 long_opts = ['decode=', 'translate=', 'output=', 'insnwidth=',
- 'static-decode=', 'varinsnwidth=', 'test-for-error']
+ 'static-decode=', 'varinsnwidth=', 'test-for-error',
+ 'output-null']
 try:
 (opts, args) = getopt.gnu_getopt(sys.argv[1:], 'o:vw:', long_opts)
 except getopt.GetoptError as err:
@@ -1545,6 +1543,8 @@ def main():
 error(0, 'cannot handle insns of width', insnwidth)
 elif o == '--test-for-error':
 testforerror = True
+elif o == '--output-null':
+output_null = True
 else:
 assert False, 'unhandled option'
 
@@ -1574,7 +1574,9 @@ def main():
 stree = build_size_tree(toppat.pats, 8, 0, 0)
 prop_size(stree)
 
-if output_file:
+if output_null:
+output_fd = open(os.devnull, 'wt', encoding='utf-8', errors="ignore")
+elif output_file:
 output_fd = open(output_file, 'wt', encoding='utf-8')
 else:
 output_fd = io.TextIOWrapper(sys.stdout.buffer,
diff --git a/tests/decode/meson.build b/tests/decode/meson.build
index 38a0629d67..b13fada980 100644
--- a/tests/decode/meson.build
+++ b/tests/decode/meson.build
@@ -53,12 +53,12 @@ decodetree = find_program(meson.project_source_root() / 
'scripts/decodetree.py')
 
 foreach t: err_tests
 test(fs.replace_suffix(t, ''),
- decodetree, args: ['-o', '/dev/null', '--test-for-error', files(t)],
+ decodetree, args: ['--output-null', '--test-for-error', files(t)],
  suite: suite)
 endforeach
 
 foreach t: succ_tests
 test(fs.replace_suffix(t, ''),
- decodetree, args: ['-o', '/dev/null', files(t)],
+ decodetree, args: ['--output-null', files(t)],
  suite: suite)
 endforeach
-- 
2.34.1




Re: [PULL 0/5] Python patches

2023-05-31 Thread Richard Henderson

On 5/31/23 13:43, John Snow wrote:

The following changes since commit ab7252279727da51c01cdaf41c5fe563bbded3a6:

   gitlab: switch from 'stable' to 'latest' docker container tags (2023-05-31 
10:29:14 -0700)

are available in the Git repository at:

   https://gitlab.com/jsnow/qemu.git  tags/python-pull-request

for you to fetch changes up to c76e7652c786683edcc846ee0a7a65b587787792:

   Revert "python/qmp/protocol: add open_with_socket()" (2023-05-31 16:25:35 
-0400)


Python: synchronize python-qemu-qmp


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




[PATCH v2] pc: q35: Bump max_cpus to 1024

2023-05-31 Thread Suravee Suthikulpanit
Since KVM_MAX_VCPUS is currently defined to 1024 for x86 as shown in
arch/x86/include/asm/kvm_host.h, update QEMU limits to the same number.

In case KVM could not support the specified number of vcpus, QEMU would
return the following error message:

  qemu-system-x86_64: kvm_init_vcpu: kvm_get_vcpu failed (xxx): Invalid argument

Signed-off-by: Suravee Suthikulpanit 
---

Changes from V1:
(https://lore.kernel.org/all/ynkdgsii1vfvx...@redhat.com/T/)
 * Bump from 512 to KVM_MAX_VCPUS (per Igor's suggestion)

Note:
 From the last discussion, Daniel mentioned that SMBIO 2.1 tables might
 cause overflow at approx 720 CPUs, and it might require using the
 SMBIO 3.0 entry point. Also, we might need to change the default for
 the x86 machine type to SMBIO 3.0. However, I do not know the status
 of this.

Thank you,
Suravee

 hw/i386/pc_q35.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index f02919d92c..e7dc226bd5 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -370,7 +370,7 @@ static void pc_q35_machine_options(MachineClass *m)
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
-m->max_cpus = 288;
+m->max_cpus = 1024;
 }
 
 static void pc_q35_8_1_machine_options(MachineClass *m)
-- 
2.34.1




[PATCH 3/7] hw/sysbus: Introduce sysbus_init_irqs()

2023-05-31 Thread Philippe Mathieu-Daudé
The SysBus API currently only provides a method to initialize
a single IRQ: sysbus_init_irq(). When we want to initialize
multiple SysBus IRQs, we have to call this function multiple
times. In order to allow further simplifications, introduce
the sysbus_init_irqs() method.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/sysbus.h | 1 +
 hw/core/sysbus.c| 7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/hw/sysbus.h b/include/hw/sysbus.h
index 3564b7b6a2..bc174b2dc3 100644
--- a/include/hw/sysbus.h
+++ b/include/hw/sysbus.h
@@ -70,6 +70,7 @@ typedef void FindSysbusDeviceFunc(SysBusDevice *sbdev, void 
*opaque);
 void sysbus_init_mmio(SysBusDevice *dev, MemoryRegion *memory);
 MemoryRegion *sysbus_mmio_get_region(SysBusDevice *dev, int n);
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p);
+void sysbus_init_irqs(SysBusDevice *dev, qemu_irq *p, unsigned count);
 void sysbus_pass_irq(SysBusDevice *dev, SysBusDevice *target);
 void sysbus_init_ioports(SysBusDevice *dev, uint32_t ioport, uint32_t size);
 
diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
index 35f902b582..a1b4c362c9 100644
--- a/hw/core/sysbus.c
+++ b/hw/core/sysbus.c
@@ -175,10 +175,15 @@ void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, 
hwaddr addr,
 sysbus_mmio_map_common(dev, n, addr, true, priority);
 }
 
+void sysbus_init_irqs(SysBusDevice *dev, qemu_irq *p, unsigned count)
+{
+qdev_init_gpio_out_named(DEVICE(dev), p, SYSBUS_DEVICE_GPIO_IRQ, count);
+}
+
 /* Request an IRQ source.  The actual IRQ object may be populated later.  */
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p)
 {
-qdev_init_gpio_out_named(DEVICE(dev), p, SYSBUS_DEVICE_GPIO_IRQ, 1);
+sysbus_init_irqs(dev, p, 1);
 }
 
 /* Pass IRQs from a target device.  */
-- 
2.38.1




[PATCH 5/7] hw/sysbus: Make SYSBUS_DEVICE_GPIO_IRQ API internal

2023-05-31 Thread Philippe Mathieu-Daudé
Since we don't have any use of the SYSBUS_DEVICE_GPIO_IRQ definition
outside of sysbus.c, we can reduce its scope, making it internal to
the API.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/sysbus.h | 2 --
 hw/core/sysbus.c| 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/sysbus.h b/include/hw/sysbus.h
index bc174b2dc3..cdd83c555e 100644
--- a/include/hw/sysbus.h
+++ b/include/hw/sysbus.h
@@ -26,8 +26,6 @@ OBJECT_DECLARE_TYPE(SysBusDevice, SysBusDeviceClass,
  * classes overriding it are not required to invoke its implementation.
  */
 
-#define SYSBUS_DEVICE_GPIO_IRQ "sysbus-irq"
-
 struct SysBusDeviceClass {
 /*< private >*/
 DeviceClass parent_class;
diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
index a1b4c362c9..f0ba57dcbf 100644
--- a/hw/core/sysbus.c
+++ b/hw/core/sysbus.c
@@ -24,6 +24,8 @@
 #include "monitor/monitor.h"
 #include "exec/address-spaces.h"
 
+#define SYSBUS_DEVICE_GPIO_IRQ "sysbus-irq"
+
 static void sysbus_dev_print(Monitor *mon, DeviceState *dev, int indent);
 static char *sysbus_get_fw_dev_path(DeviceState *dev);
 
-- 
2.38.1




[PATCH 4/7] hw/usb/hcd-xhci: Use sysbus_init_irqs()

2023-05-31 Thread Philippe Mathieu-Daudé
The SYSBUS_DEVICE_GPIO_IRQ definition should be internal to
the SysBus API. Use the recently introduced sysbus_init_irqs()
method to avoid using this internal definition.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/usb/hcd-xhci-sysbus.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/usb/hcd-xhci-sysbus.c b/hw/usb/hcd-xhci-sysbus.c
index faf57b4797..e512849b34 100644
--- a/hw/usb/hcd-xhci-sysbus.c
+++ b/hw/usb/hcd-xhci-sysbus.c
@@ -40,9 +40,6 @@ static void xhci_sysbus_realize(DeviceState *dev, Error 
**errp)
 if (!qdev_realize(DEVICE(&s->xhci), NULL, errp)) {
 return;
 }
-s->irq = g_new0(qemu_irq, s->xhci.numintrs);
-qdev_init_gpio_out_named(dev, s->irq, SYSBUS_DEVICE_GPIO_IRQ,
- s->xhci.numintrs);
 if (s->xhci.dma_mr) {
 s->xhci.as =  g_malloc0(sizeof(AddressSpace));
 address_space_init(s->xhci.as, s->xhci.dma_mr, NULL);
@@ -50,6 +47,8 @@ static void xhci_sysbus_realize(DeviceState *dev, Error 
**errp)
 s->xhci.as = &address_space_memory;
 }
 
+s->irq = g_new0(qemu_irq, s->xhci.numintrs);
+sysbus_init_irqs(SYS_BUS_DEVICE(dev), s->irq, s->xhci.numintrs);
 sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->xhci.mem);
 }
 
-- 
2.38.1




[PATCH 6/7] hw: Simplify using sysbus_init_irqs() [automatic]

2023-05-31 Thread Philippe Mathieu-Daudé
Change created mechanically using the following coccinelle
semantic patch:

@@
expression array;
identifier i;
expression sbd, count;
@@

-for (i = 0; i < count; i++) {
-sysbus_init_irq(sbd, &array[i]);
-}
+sysbus_init_irqs(sbd, array, count);

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/smmuv3.c   |  6 +-
 hw/arm/stellaris.c|  5 +
 hw/arm/strongarm.c|  5 +
 hw/arm/versatilepb.c  |  5 +
 hw/char/pl011.c   |  5 +
 hw/char/renesas_sci.c |  5 +
 hw/core/platform-bus.c|  5 +
 hw/dma/pl330.c|  4 +---
 hw/dma/sifive_pdma.c  |  5 +
 hw/gpio/sifive_gpio.c |  5 +
 hw/i386/kvm/xen_evtchn.c  |  5 +
 hw/intc/arm_gic_common.c  | 20 +---
 hw/intc/arm_gicv2m.c  |  5 +
 hw/intc/exynos4210_combiner.c |  5 +
 hw/intc/loongarch_extioi.c|  6 ++
 hw/intc/loongson_liointc.c|  5 +
 hw/intc/openpic.c |  6 ++
 hw/intc/slavio_intctl.c   |  6 ++
 hw/misc/avr_power.c   |  4 +---
 hw/misc/macio/gpio.c  |  5 +
 hw/misc/stm32f4xx_exti.c  |  5 +
 hw/net/cadence_gem.c  |  5 +
 hw/net/mcf_fec.c  |  5 +
 hw/pci-host/designware.c  |  5 +
 hw/pci-host/ppce500.c |  4 +---
 hw/pci-host/raven.c   |  4 +---
 hw/pci-host/sh_pci.c  |  5 +
 hw/pci-host/versatile.c   |  4 +---
 hw/ppc/ppc405_uc.c| 10 ++
 hw/ppc/ppc440_uc.c|  6 ++
 hw/ppc/ppc4xx_devs.c  |  4 +---
 hw/ppc/ppc4xx_pci.c   |  5 +
 hw/ssi/ibex_spi_host.c|  5 +
 hw/ssi/imx_spi.c  |  5 +
 hw/ssi/sifive_spi.c   |  5 +
 hw/ssi/xilinx_spi.c   |  5 +
 hw/ssi/xilinx_spips.c |  4 +---
 hw/ssi/xlnx-versal-ospi.c |  4 +---
 hw/timer/allwinner-a10-pit.c  |  4 +---
 hw/timer/exynos4210_mct.c |  4 +---
 hw/timer/hpet.c   |  4 +---
 hw/timer/renesas_cmt.c|  5 +
 hw/timer/sifive_pwm.c |  5 +
 43 files changed, 52 insertions(+), 177 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 932f009697..f080d97d3f 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1692,11 +1692,7 @@ static const MemoryRegionOps smmu_mem_ops = {
 
 static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
 {
-int i;
-
-for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
-sysbus_init_irq(dev, &s->irq[i]);
-}
+sysbus_init_irqs(dev, s->irq, ARRAY_SIZE(s->irq));
 }
 
 static void smmu_reset_hold(Object *obj)
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index f7e99baf62..4bf9ef05c8 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -934,11 +934,8 @@ static void stellaris_adc_init(Object *obj)
 DeviceState *dev = DEVICE(obj);
 StellarisADCState *s = STELLARIS_ADC(obj);
 SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
-int n;
 
-for (n = 0; n < 4; n++) {
-sysbus_init_irq(sbd, &s->irq[n]);
-}
+sysbus_init_irqs(sbd, s->irq, 4);
 
 memory_region_init_io(&s->iomem, obj, &stellaris_adc_ops, s,
   "adc", 0x1000);
diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c
index cc73145053..f785dcf08e 100644
--- a/hw/arm/strongarm.c
+++ b/hw/arm/strongarm.c
@@ -658,7 +658,6 @@ static void strongarm_gpio_initfn(Object *obj)
 DeviceState *dev = DEVICE(obj);
 StrongARMGPIOInfo *s = STRONGARM_GPIO(obj);
 SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
-int i;
 
 qdev_init_gpio_in(dev, strongarm_gpio_set, 28);
 qdev_init_gpio_out(dev, s->handler, 28);
@@ -667,9 +666,7 @@ static void strongarm_gpio_initfn(Object *obj)
   "gpio", 0x1000);
 
 sysbus_init_mmio(sbd, &s->iomem);
-for (i = 0; i < 11; i++) {
-sysbus_init_irq(sbd, &s->irqs[i]);
-}
+sysbus_init_irqs(sbd, s->irqs, 11);
 sysbus_init_irq(sbd, &s->irqX);
 }
 
diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
index 05b9462a5b..6a5b1fc53e 100644
--- a/hw/arm/versatilepb.c
+++ b/hw/arm/versatilepb.c
@@ -160,12 +160,9 @@ static void vpb_sic_init(Object *obj)
 DeviceState *dev = DEVICE(obj);
 vpb_sic_state *s = VERSATILE_PB_SIC(obj);
 SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
-int i;
 
 qdev_init_gpio_in(dev, vpb_sic_set_irq, 32);
-for (i = 0; i < 32; i++) {
-sysbus_init_irq(sbd, &s->parent[i]);
-}
+sysbus_init_irqs(sbd, s->parent, 32);
 s->irq = 31;
 memory_region_init_io(&s->iomem, obj, &vpb_sic_ops, s,
   "vpb-sic", 0x1000);
diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 77bbc2a982..2056e32385 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -442,13 +442,10 @@ static void pl011_init(Object *obj)
 {
 SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 PL011State *s = PL011(obj);
-int i;
 
 memory_region_

[PATCH 7/7] hw: Simplify using sysbus_init_irqs() [manual]

2023-05-31 Thread Philippe Mathieu-Daudé
Audit the sysbus_init_irq() calls and manually convert
to sysbus_init_irqs() when a loop is involved.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/intc/loongarch_extioi.c | 3 +--
 hw/intc/omap_intc.c| 3 +--
 hw/pci-host/gpex.c | 2 +-
 hw/timer/renesas_tmr.c | 9 +++--
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/hw/intc/loongarch_extioi.c b/hw/intc/loongarch_extioi.c
index db941de20e..c579636215 100644
--- a/hw/intc/loongarch_extioi.c
+++ b/hw/intc/loongarch_extioi.c
@@ -275,8 +275,7 @@ static void loongarch_extioi_instance_init(Object *obj)
 LoongArchExtIOI *s = LOONGARCH_EXTIOI(obj);
 int cpu, pin;
 
-sysbus_init_irqs(SYS_BUS_DEVICE(dev), s->irq, EXTIOI_IRQS);
-
+sysbus_init_irqs(dev, s->irq, EXTIOI_IRQS);
 qdev_init_gpio_in(DEVICE(obj), extioi_setirq, EXTIOI_IRQS);
 
 for (cpu = 0; cpu < EXTIOI_CPUS; cpu++) {
diff --git a/hw/intc/omap_intc.c b/hw/intc/omap_intc.c
index 647bf324a8..f324b640e3 100644
--- a/hw/intc/omap_intc.c
+++ b/hw/intc/omap_intc.c
@@ -627,8 +627,7 @@ static void omap2_intc_init(Object *obj)
 
 s->level_only = 1;
 s->nbanks = 3;
-sysbus_init_irq(sbd, &s->parent_intr[0]);
-sysbus_init_irq(sbd, &s->parent_intr[1]);
+sysbus_init_irqs(sbd, s->parent_intr, ARRAY_SIZE(s->parent_intr));
 qdev_init_gpio_in(dev, omap_set_intr_noedge, s->nbanks * 32);
 memory_region_init_io(&s->mmio, obj, &omap2_inth_mem_ops, s,
   "omap2-intc", 0x1000);
diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
index a6752fac5e..7b46e3e36e 100644
--- a/hw/pci-host/gpex.c
+++ b/hw/pci-host/gpex.c
@@ -128,8 +128,8 @@ static void gpex_host_realize(DeviceState *dev, Error 
**errp)
 sysbus_init_mmio(sbd, &s->io_ioport);
 }
 
+sysbus_init_irqs(sbd, s->irq, GPEX_NUM_IRQS);
 for (i = 0; i < GPEX_NUM_IRQS; i++) {
-sysbus_init_irq(sbd, &s->irq[i]);
 s->irq_num[i] = -1;
 }
 
diff --git a/hw/timer/renesas_tmr.c b/hw/timer/renesas_tmr.c
index c15f654738..dd2929d6e7 100644
--- a/hw/timer/renesas_tmr.c
+++ b/hw/timer/renesas_tmr.c
@@ -428,17 +428,14 @@ static void rtmr_init(Object *obj)
 {
 SysBusDevice *d = SYS_BUS_DEVICE(obj);
 RTMRState *tmr = RTMR(obj);
-int i;
 
 memory_region_init_io(&tmr->memory, OBJECT(tmr), &tmr_ops,
   tmr, "renesas-tmr", 0x10);
 sysbus_init_mmio(d, &tmr->memory);
 
-for (i = 0; i < ARRAY_SIZE(tmr->ovi); i++) {
-sysbus_init_irq(d, &tmr->cmia[i]);
-sysbus_init_irq(d, &tmr->cmib[i]);
-sysbus_init_irq(d, &tmr->ovi[i]);
-}
+sysbus_init_irqs(d, tmr->cmia, ARRAY_SIZE(tmr->cmia));
+sysbus_init_irqs(d, tmr->cmib, ARRAY_SIZE(tmr->cmib));
+sysbus_init_irqs(d, tmr->ovi, ARRAY_SIZE(tmr->ovi));
 timer_init_ns(&tmr->timer[0], QEMU_CLOCK_VIRTUAL, timer_event0, tmr);
 timer_init_ns(&tmr->timer[1], QEMU_CLOCK_VIRTUAL, timer_event1, tmr);
 }
-- 
2.38.1




[PATCH 0/7] hw/sysbus: Add sysbus_init_irqs and reduce SYSBUS_DEVICE_GPIO_IRQ scope

2023-05-31 Thread Philippe Mathieu-Daudé
This series:

- Remove uses (out of sysbus.c) to the SYSBUS_DEVICE_GPIO_IRQ
  definition, using proper SysBus API methods,
- Reduce SYSBUS_DEVICE_GPIO_IRQ scope, making it SysBus API
  internal,
- Convert various for() loops iterating over sysbus_init_irq()
  by calling a single sysbus_init_irqs() instead. Mostly an
  automatic convertion except 4 files.

The resulting code looks nicer IMHO, and is possibly less
bug prone.

Philippe Mathieu-Daudé (7):
  hw/arm/xlnx-versal: Do not open-code sysbus_connect_irq()
  hw/usb/xlnx: Do not open-code sysbus_pass_irq()
  hw/sysbus: Introduce sysbus_init_irqs()
  hw/usb/hcd-xhci: Use sysbus_init_irqs()
  hw/sysbus: Make SYSBUS_DEVICE_GPIO_IRQ API internal
  hw: Simplify using sysbus_init_irqs() [automatic]
  hw: Simplify using sysbus_init_irqs() [manual]

 include/hw/sysbus.h   |  3 +--
 hw/arm/smmuv3.c   |  6 +-
 hw/arm/stellaris.c|  5 +
 hw/arm/strongarm.c|  5 +
 hw/arm/versatilepb.c  |  5 +
 hw/arm/xlnx-versal.c  |  4 +---
 hw/char/pl011.c   |  5 +
 hw/char/renesas_sci.c |  5 +
 hw/core/platform-bus.c|  5 +
 hw/core/sysbus.c  |  9 -
 hw/dma/pl330.c|  4 +---
 hw/dma/sifive_pdma.c  |  5 +
 hw/gpio/sifive_gpio.c |  5 +
 hw/i386/kvm/xen_evtchn.c  |  5 +
 hw/intc/arm_gic_common.c  | 20 +---
 hw/intc/arm_gicv2m.c  |  5 +
 hw/intc/exynos4210_combiner.c |  5 +
 hw/intc/loongarch_extioi.c|  7 ++-
 hw/intc/loongson_liointc.c|  5 +
 hw/intc/omap_intc.c   |  3 +--
 hw/intc/openpic.c |  6 ++
 hw/intc/slavio_intctl.c   |  6 ++
 hw/misc/avr_power.c   |  4 +---
 hw/misc/macio/gpio.c  |  5 +
 hw/misc/stm32f4xx_exti.c  |  5 +
 hw/net/cadence_gem.c  |  5 +
 hw/net/mcf_fec.c  |  5 +
 hw/pci-host/designware.c  |  5 +
 hw/pci-host/gpex.c|  2 +-
 hw/pci-host/ppce500.c |  4 +---
 hw/pci-host/raven.c   |  4 +---
 hw/pci-host/sh_pci.c  |  5 +
 hw/pci-host/versatile.c   |  4 +---
 hw/ppc/ppc405_uc.c| 10 ++
 hw/ppc/ppc440_uc.c|  6 ++
 hw/ppc/ppc4xx_devs.c  |  4 +---
 hw/ppc/ppc4xx_pci.c   |  5 +
 hw/ssi/ibex_spi_host.c|  5 +
 hw/ssi/imx_spi.c  |  5 +
 hw/ssi/sifive_spi.c   |  5 +
 hw/ssi/xilinx_spi.c   |  5 +
 hw/ssi/xilinx_spips.c |  4 +---
 hw/ssi/xlnx-versal-ospi.c |  4 +---
 hw/timer/allwinner-a10-pit.c  |  4 +---
 hw/timer/exynos4210_mct.c |  4 +---
 hw/timer/hpet.c   |  4 +---
 hw/timer/renesas_cmt.c|  5 +
 hw/timer/renesas_tmr.c|  9 +++--
 hw/timer/sifive_pwm.c |  5 +
 hw/usb/hcd-xhci-sysbus.c  |  5 ++---
 hw/usb/xlnx-usb-subsystem.c   |  2 +-
 51 files changed, 70 insertions(+), 197 deletions(-)

-- 
2.38.1




[PATCH 1/7] hw/arm/xlnx-versal: Do not open-code sysbus_connect_irq()

2023-05-31 Thread Philippe Mathieu-Daudé
The SYSBUS_DEVICE_GPIO_IRQ definition should be internal to
the SysBus API. Here we simply open-coded sysbus_connect_irq().
Replace to use the proper API.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/xlnx-versal.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index 69b1b99e93..de5af506f7 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -704,9 +704,7 @@ static void versal_unimp(Versal *s)
 gpio_in);
 
 gpio_in = qdev_get_gpio_in_named(DEVICE(s), "irq-parity-imr-dummy", 0);
-qdev_connect_gpio_out_named(DEVICE(&s->pmc.iou.slcr),
-SYSBUS_DEVICE_GPIO_IRQ, 0,
-gpio_in);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->pmc.iou.slcr), 0, gpio_in);
 }
 
 static void versal_realize(DeviceState *dev, Error **errp)
-- 
2.38.1




[PATCH 2/7] hw/usb/xlnx: Do not open-code sysbus_pass_irq()

2023-05-31 Thread Philippe Mathieu-Daudé
The SYSBUS_DEVICE_GPIO_IRQ definition should be internal to
the SysBus API. Here we simply open-coded sysbus_pass_irq().
Replace to use the proper API.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/usb/xlnx-usb-subsystem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/xlnx-usb-subsystem.c b/hw/usb/xlnx-usb-subsystem.c
index d8deeb6ced..462ce6c3ff 100644
--- a/hw/usb/xlnx-usb-subsystem.c
+++ b/hw/usb/xlnx-usb-subsystem.c
@@ -49,7 +49,7 @@ static void versal_usb2_realize(DeviceState *dev, Error 
**errp)
 }
 sysbus_init_mmio(sbd, &s->dwc3_mr);
 sysbus_init_mmio(sbd, &s->usb2Ctrl_mr);
-qdev_pass_gpios(DEVICE(&s->dwc3.sysbus_xhci), dev, SYSBUS_DEVICE_GPIO_IRQ);
+sysbus_pass_irq(sbd, SYS_BUS_DEVICE(&s->dwc3.sysbus_xhci));
 }
 
 static void versal_usb2_init(Object *obj)
-- 
2.38.1




Re: [RFC PATCH v2 0/4] vfio/pci: Atomic Ops completer support

2023-05-31 Thread Philippe Mathieu-Daudé

On 27/5/23 01:15, Alex Williamson wrote:

This RFC proposes to allow a vfio-pci device to manipulate the PCI
Express capability of an associated root port to enable Atomic Op
completer support as equivalent to host capabilities.  This would

[...]


While it's not exactly standard practice to modify root port device
capabilities runtime, it also does not seem to be precluded by the PCIe
Spec (6.0.1).  The Atomic Op completion bits of the DEVCAP2 register
are defined as Read-only:

7.4 Configuration Register Types
  Read-only - Register bits are read-only and cannot be altered by software.
  Where explicitly defined, these bits are used to reflect changing
  hardware state, and as a result bit values can be observed to
  change at run time. Register bit default values and bits that
  cannot change value at run time, are permitted to be hard-coded,
  initialized by system/device firmware, or initialized by hardware
  mechanisms such as pin strapping or nonvolatile storage.
  Initialization by system firmware is permitted only for system-
  integrated devices. If the optional feature that would Set the
  bits is not implemented, the bits must be hardwired to Zero.

Here "altered by software" is relative to guest writes to the config
space register, whereas in this implementation we're acting as hardware
and the bits are changing to reflect a change in runtime capabilities.
The spec does include a HwInit register type which would restrict the
value from changing at runtime outside of resets.  Therefore while it
would not be advised to update these bits arbitrarily, it does seem safe
and compatible with guest software to update the value on device attach
and detach.


From my previous (short) PCIe experience, this is also my understanding.



Re: [RFC PATCH v2 0/4] vfio/pci: Atomic Ops completer support

2023-05-31 Thread Philippe Mathieu-Daudé

On 1/6/23 00:24, Alex Williamson wrote:

On Wed, 31 May 2023 23:55:41 +0200
Robin Voetter  wrote:


Hi Alex,

Thanks for taking the time to implement support for Atomic Op completer
support properly :). I have tested out these patches and the kernel
patch, and apart from a minor issue with patch 2 everything works fine;


Yes, Cedric noted the extra semicolon as well, I'm about to post that
patch as non-RFC since it has no dependencies on anything else.




Policy decisions like that are generally left to management tools, so
there would always be a means to enable or disable the feature.  In
fact, that's specifically why I test that the Atomic Op completer bits
are unset in the root port before changing them so that this automatic
enablement could live alongside a command line option to statically
enable some bits.

That does however remind me that it is often good with these sorts of
"clever" automatic features to have an opt-out, so I'll likely add an
x-no-rp-atomics device option in the next version to provide that.


I was just going to suggest that, just in case :)




Re: [RFC PATCH v2 4/4] vfio/pci: Enable AtomicOps completers on root ports

2023-05-31 Thread Philippe Mathieu-Daudé

On 27/5/23 01:15, Alex Williamson wrote:

Dynamically enable Atomic Ops completer support around realize/exit of
vfio-pci devices reporting host support for these accesses and adhering
to a minimal configuration standard.  While the Atomic Ops completer
bits in the root port device capabilities2 register are read-only, the
PCIe spec does allow RO bits to change to reflect hardware state.  We
take advantage of that here around the realize and exit functions of
the vfio-pci device.

Signed-off-by: Alex Williamson 
---
  hw/vfio/pci.c | 78 +++
  hw/vfio/pci.h |  1 +
  2 files changed, 79 insertions(+)




+static void vfio_pci_enable_rp_atomics(VFIOPCIDevice *vdev)
+{
+struct vfio_device_info_cap_pci_atomic_comp *cap;
+g_autofree struct vfio_device_info *info = NULL;
+PCIBus *bus = pci_get_bus(&vdev->pdev);
+PCIDevice *parent = bus->parent_dev;
+struct vfio_info_cap_header *hdr;
+uint32_t mask = 0;
+uint8_t *pos;
+
+/*
+ * PCIe Atomic Ops completer support is only added automatically for single
+ * function devices downstream of a root port supporting DEVCAP2.  Support
+ * is added during realize and, if added, removed during device exit.  The
+ * single function requirement avoids conflicting requirements should a
+ * slot be composed of multiple devices with differing capabilities.
+ */
+if (pci_bus_is_root(bus) || !parent || !parent->exp.exp_cap ||
+pcie_cap_get_type(parent) != PCI_EXP_TYPE_ROOT_PORT ||
+pcie_cap_get_version(parent) != PCI_EXP_FLAGS_VER2 ||
+vdev->pdev.devfn ||
+vdev->pdev.cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+return;
+}
+
+pos = parent->config + parent->exp.exp_cap + PCI_EXP_DEVCAP2;
+
+/* Abort if there'a already an Atomic Ops configuration on the root port */


Optional here: trace event logging pci_get_long(pos).


+if (pci_get_long(pos) & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+ PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
+ PCI_EXP_DEVCAP2_ATOMIC_COMP128)) {
+return;
+}
+
+info = vfio_get_device_info(vdev->vbasedev.fd);
+if (!info) {
+return;
+}
+
+hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_PCI_ATOMIC_COMP);
+if (!hdr) {
+return;
+}
+
+cap = (void *)hdr;
+if (cap->flags & VFIO_PCI_ATOMIC_COMP32) {
+mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP32;
+}
+if (cap->flags & VFIO_PCI_ATOMIC_COMP64) {
+mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP64;
+}
+if (cap->flags & VFIO_PCI_ATOMIC_COMP128) {
+mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP128;
+}
+
+if (!mask) {
+return;
+}


Similarly optional, trace event logging (cap->flags, mask).


+
+pci_long_test_and_set_mask(pos, mask);
+vdev->clear_parent_atomics_on_exit = true;
+}
+
+static void vfio_pci_disable_rp_atomics(VFIOPCIDevice *vdev)
+{
+if (vdev->clear_parent_atomics_on_exit) {
+PCIDevice *parent = pci_get_bus(&vdev->pdev)->parent_dev;
+uint8_t *pos = parent->config + parent->exp.exp_cap + PCI_EXP_DEVCAP2;
+
+pci_long_test_and_clear_mask(pos, PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+  PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
+  PCI_EXP_DEVCAP2_ATOMIC_COMP128);
+}
+}


Reviewed-by: Philippe Mathieu-Daudé 




Re: [RFC PATCH v2 0/4] vfio/pci: Atomic Ops completer support

2023-05-31 Thread Alex Williamson
On Wed, 31 May 2023 23:55:41 +0200
Robin Voetter  wrote:

> Hi Alex,
> 
> Thanks for taking the time to implement support for Atomic Op completer
> support properly :). I have tested out these patches and the kernel
> patch, and apart from a minor issue with patch 2 everything works fine;

Yes, Cedric noted the extra semicolon as well, I'm about to post that
patch as non-RFC since it has no dependencies on anything else.

> ROCm programs that use device->host atomic operations work properly.

Great, thanks for testing!

> Something that I have been thinking about, are there any implications
> involved with enabling this feature automatically with no possibility of
> turning it off? I have no use case for that, though, and I cant really
> think of a reason other than preventing the guest from finding out
> hardware details about the host.

Not sure I follow, are you suggesting that Atomic Ops completion
support is proactively enabled in the VM to match the host, regardless
of attached assigned devices?  An obvious problem there is migration.
If I start a VM on a host with Atomic Ops support and want to migrate
to a host without Atomic Ops support, config space of the root ports is
now different and the migration fails.  QEMU would also require
elevated privileges to read config space to determine the host support,
and then what does it do if only some of the PCIe hierarchy supports
Atomic Ops.

Policy decisions like that are generally left to management tools, so
there would always be a means to enable or disable the feature.  In
fact, that's specifically why I test that the Atomic Op completer bits
are unset in the root port before changing them so that this automatic
enablement could live alongside a command line option to statically
enable some bits.

That does however remind me that it is often good with these sorts of
"clever" automatic features to have an opt-out, so I'll likely add an
x-no-rp-atomics device option in the next version to provide that.
Thanks,

Alex




Re: [PULL 00/27] tcg patch queue

2023-05-31 Thread Richard Henderson

On 5/31/23 11:07, Richard Henderson wrote:

On 5/31/23 09:12, Thomas Huth wrote:

On 31/05/2023 03.08, Richard Henderson wrote:

On 5/30/23 11:59, Richard Henderson wrote:

The following changes since commit 7fe6cb68117ac856e03c93d18aca09de015392b0:

   Merge tag 'pull-target-arm-20230530-1' 
ofhttps://git.linaro.org/people/pmaydell/qemu-arm  into staging (2023-05-30 08:02:05 
-0700)


are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git  tags/pull-tcg-20230530

for you to fetch changes up to 276d77de503e8f5f5cbd3f7d94302ca12d1d982e:

   tests/decode: Add tests for various named-field cases (2023-05-30 10:55:39 
-0700)


Improvements to 128-bit atomics:
   - Separate __int128_t type and arithmetic detection
   - Support 128-bit load/store in backend for i386, aarch64, ppc64, s390x
   - Accelerate atomics via host/include/
Decodetree:
   - Add named field syntax
   - Move tests to meson


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


Too bad that we've run out of CI minutes for the Windows jobs ... FYI, this is causing 
now failure in the msys2 jobs:


  https://gitlab.com/thuth/qemu/-/jobs/4385862382#L4821
  https://gitlab.com/thuth/qemu/-/jobs/4385862378#L4632


Grr.  It doesn't even say what failed, just wrong exit code.
I wonder if the -o /dev/null is getting in the way...


Whee.  From meson-logs/testlog.txt:

FileNotFoundError: [Errno 2] No such file or directory: '/dev/null'


r~



Re: [RFC PATCH v2 3/4] pcie: Add a PCIe capability version helper

2023-05-31 Thread Philippe Mathieu-Daudé

On 1/6/23 00:02, Robin Voetter wrote:

On 5/27/23 01:15, Alex Williamson wrote:

Report the PCIe capability version for a device

Signed-off-by: Alex Williamson 


Reviewed-by: Robin Voetter 
Tested-by: Robin Voetter 

Reviewed-by: Philippe Mathieu-Daudé 



Kind regards,

Robin Voetter






Re: [RFC PATCH v2 4/4] vfio/pci: Enable AtomicOps completers on root ports

2023-05-31 Thread Robin Voetter



On 5/27/23 01:15, Alex Williamson wrote:
> Dynamically enable Atomic Ops completer support around realize/exit of
> vfio-pci devices reporting host support for these accesses and adhering
> to a minimal configuration standard.  While the Atomic Ops completer
> bits in the root port device capabilities2 register are read-only, the
> PCIe spec does allow RO bits to change to reflect hardware state.  We
> take advantage of that here around the realize and exit functions of
> the vfio-pci device.
> 
> Signed-off-by: Alex Williamson 
Reviewed-by: Robin Voetter 
Tested-by: Robin Voetter 

Kind regards,

Robin Voetter



Re: [RFC PATCH v2 3/4] pcie: Add a PCIe capability version helper

2023-05-31 Thread Robin Voetter
On 5/27/23 01:15, Alex Williamson wrote:
> Report the PCIe capability version for a device
> 
> Signed-off-by: Alex Williamson Reviewed-by: Robin 
> Voetter 
Tested-by: Robin Voetter 

Kind regards,

Robin Voetter



Re: [RFC PATCH v2 0/4] vfio/pci: Atomic Ops completer support

2023-05-31 Thread Robin Voetter
Hi Alex,

Thanks for taking the time to implement support for Atomic Op completer
support properly :). I have tested out these patches and the kernel
patch, and apart from a minor issue with patch 2 everything works fine;
ROCm programs that use device->host atomic operations work properly.

Something that I have been thinking about, are there any implications
involved with enabling this feature automatically with no possibility of
turning it off? I have no use case for that, though, and I cant really
think of a reason other than preventing the guest from finding out
hardware details about the host.

Thanks,

Robin Voetter, Stream HPC



Re: [PULL 01/11] *-user: remove the guest_user_syscall tracepoints

2023-05-31 Thread Richard Henderson

On 5/31/23 12:48, Stefan Hajnoczi wrote:

--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -531,7 +531,6 @@ abi_long do_freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
  CPUState *cpu = env_cpu(cpu_env);
  abi_long ret;
  
-trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);

  if (do_strace) {
  print_freebsd_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
  }
@@ -541,7 +540,6 @@ abi_long do_freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
  if (do_strace) {
  print_freebsd_syscall_ret(num, ret);
  }
-trace_guest_user_syscall_ret(cpu, num, ret);
  
  return ret;

  }


Failed to remove the now unused cpu variable:

https://gitlab.com/qemu-project/qemu/-/jobs/4387911615#L6426

../bsd-user/freebsd/os-syscall.c:531:15: error: unused variable 'cpu' 
[-Werror,-Wunused-variable]

CPUState *cpu = env_cpu(cpu_env);
  ^
1 error generated.


r~



Re: [PULL 00/21] Migration 20230530 patches

2023-05-31 Thread Richard Henderson

On 5/31/23 14:03, Juan Quintela wrote:

Richard Henderson  wrote:

On 5/30/23 11:25, Juan Quintela wrote:

The following changes since commit aa9bbd865502ed517624ab6fe7d4b5d89ca95e43:
Merge tag 'pull-ppc-20230528' of https://gitlab.com/danielhb/qemu
into staging (2023-05-29 14:31:52 -0700)
are available in the Git repository at:
https://gitlab.com/juan.quintela/qemu.git
tags/migration-20230530-pull-request
for you to fetch changes up to
c63c544005e6b1375a9c038f0e0fb8dfb8b249f4:
migration/rdma: Check sooner if we are in postcopy for
save_page() (2023-05-30 19:23:50 +0200)



Added Markus and Daniel.


Migration 20230530 Pull request (take 2)
Hi
Resend last PULL request, this time it compiles when CONFIG_RDMA is
not configured in.
[take 1]
On this PULL request:
- Set vmstate migration failure right (vladimir)
- Migration QEMUFileHook removal (juan)
- Migration Atomic counters (juan)
Please apply.

Juan Quintela (16):
migration: Don't abuse qemu_file transferred for RDMA
migration/RDMA: It is accounting for zero/normal pages in two places
migration/rdma: Remove QEMUFile parameter when not used
migration/rdma: Don't use imaginary transfers
migration: Remove unused qemu_file_credit_transfer()
migration/rdma: Simplify the function that saves a page
migration: Create migrate_rdma()
migration/rdma: Unfold ram_control_before_iterate()
migration/rdma: Unfold ram_control_after_iterate()
migration/rdma: Remove all uses of RAM_CONTROL_HOOK
migration/rdma: Unfold hook_ram_load()
migration/rdma: Create rdma_control_save_page()
qemu-file: Remove QEMUFileHooks
migration/rdma: Move rdma constants from qemu-file.h to rdma.h
migration/rdma: Remove qemu_ prefix from exported functions
migration/rdma: Check sooner if we are in postcopy for save_page()
Vladimir Sementsov-Ogievskiy (5):
runstate: add runstate_get()
migration: never fail in global_state_store()
runstate: drop unused runstate_store()
migration: switch from .vm_was_running to .vm_old_state
migration: restore vmstate on migration failure


Appears to introduce multiple avocado failures:

https://gitlab.com/qemu-project/qemu/-/jobs/4378066518#L286

Test summary:
tests/avocado/migration.py:X86_64.test_migration_with_exec: ERROR
tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: ERROR
tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142: check-avocado] 
Error 1

https://gitlab.com/qemu-project/qemu/-/jobs/4378066523#L387

Test summary:
tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: ERROR
tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142: check-avocado] 
Error 1

Also fails QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/migration-test

../src/migration/rdma.c:408:QIO_CHANNEL_RDMA: Object 0xf7bba680 is
not an instance of type qio-channel-rdma


I am looking at the other errors, but this one is weird.  It is failing
here:

#define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
OBJECT_DECLARE_SIMPLE_TYPE(QIOChannelRDMA, QIO_CHANNEL_RDMA)

In the OBJECT line.

I have no clue what problem are we having here with the object system to
decide at declaration time that a variable is not of the type that we
are declaring.

I am missing something obvious here?


This is where the inline function is declared, but you need to look at the backtrace, 
where you have applied QIO_CHANNEL_RDMA to an object that is *not* of that type.



r~



Re: [PATCH v3 3/7] hw/isa/vt82c686: Remove via_isa_set_irq()

2023-05-31 Thread Philippe Mathieu-Daudé

On 31/5/23 23:10, Bernhard Beschow wrote:

Now that via_isa_set_irq() is unused it can be removed.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Mark Cave-Ayland 
---
  include/hw/isa/vt82c686.h | 2 --
  hw/isa/vt82c686.c | 6 --
  2 files changed, 8 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 6/7] hw/ide/pci: Replace some magic numbers by constants

2023-05-31 Thread Philippe Mathieu-Daudé

On 31/5/23 23:10, Bernhard Beschow wrote:

Signed-off-by: Bernhard Beschow 
---
  hw/ide/pci.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH v3 5/7] hw/ide: Extract bmdma_status_writeb()

2023-05-31 Thread Philippe Mathieu-Daudé

On 31/5/23 23:10, Bernhard Beschow wrote:

Every TYPE_PCI_IDE device performs the same not-so-trivial bit manipulation by
copy'n'paste code. Extract this into bmdma_status_writeb(), mirroring
bmdma_cmd_writeb().

Signed-off-by: Bernhard Beschow 
Reviewed-by: BALATON Zoltan 
---
  include/hw/ide/pci.h | 1 +
  hw/ide/cmd646.c  | 2 +-
  hw/ide/pci.c | 5 +
  hw/ide/piix.c| 2 +-
  hw/ide/sii3112.c | 6 ++
  hw/ide/via.c | 2 +-
  6 files changed, 11 insertions(+), 7 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v3 1/7] hw/ide/pci: Expose legacy interrupts as named GPIOs

2023-05-31 Thread Bernhard Beschow
Exposing the legacy IDE interrupts as GPIOs allows them to be connected in the
parent device through qdev_connect_gpio_out(), i.e. without accessing private
data of TYPE_PCI_IDE.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Mark Cave-Ayland 
---
 hw/ide/pci.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index fc9224bbc9..9a5a7089d4 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -522,10 +522,19 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState 
*d)
 bm->pci_dev = d;
 }
 
+static void pci_ide_init(Object *obj)
+{
+PCIIDEState *d = PCI_IDE(obj);
+
+qdev_init_gpio_out_named(DEVICE(d), d->isa_irq, "isa-irq",
+ ARRAY_SIZE(d->isa_irq));
+}
+
 static const TypeInfo pci_ide_type_info = {
 .name = TYPE_PCI_IDE,
 .parent = TYPE_PCI_DEVICE,
 .instance_size = sizeof(PCIIDEState),
+.instance_init = pci_ide_init,
 .abstract = true,
 .interfaces = (InterfaceInfo[]) {
 { INTERFACE_CONVENTIONAL_PCI_DEVICE },
-- 
2.40.1




[PATCH v3 4/7] hw/ide: Extract IDEBus assignment into bmdma_init()

2023-05-31 Thread Bernhard Beschow
Every invocation of bmdma_init() is followed by `d->bmdma[i].bus = &d->bus[i]`.
Resolve this redundancy by extracting it into bmdma_init().

Signed-off-by: Bernhard Beschow 
Reviewed-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Mark Cave-Ayland 
---
 hw/ide/cmd646.c  | 1 -
 hw/ide/pci.c | 1 +
 hw/ide/piix.c| 1 -
 hw/ide/sii3112.c | 1 -
 hw/ide/via.c | 1 -
 5 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index a68357c1c5..a094a6e12a 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -297,7 +297,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error 
**errp)
 ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
-d->bmdma[i].bus = &d->bus[i];
 ide_bus_register_restart_cb(&d->bus[i]);
 }
 }
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 9a5a7089d4..4cf1e9c679 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -519,6 +519,7 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d)
 bus->dma = &bm->dma;
 bm->irq = bus->irq;
 bus->irq = qemu_allocate_irq(bmdma_irq, bm, 0);
+bm->bus = bus;
 bm->pci_dev = d;
 }
 
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 41d60921e3..a32f7ccece 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -144,7 +144,6 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, 
Error **errp)
 ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, 
port_info[i].isairq));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
-d->bmdma[i].bus = &d->bus[i];
 ide_bus_register_restart_cb(&d->bus[i]);
 
 return true;
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index f9becdff8e..5dd3d03c29 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -287,7 +287,6 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
**errp)
 ide_bus_init_output_irq(&s->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&s->bus[i], &s->bmdma[i], s);
-s->bmdma[i].bus = &s->bus[i];
 ide_bus_register_restart_cb(&s->bus[i]);
 }
 }
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 0caae52276..91253fa4ef 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -196,7 +196,6 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
 ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
-d->bmdma[i].bus = &d->bus[i];
 ide_bus_register_restart_cb(&d->bus[i]);
 }
 }
-- 
2.40.1




[PATCH v3 3/7] hw/isa/vt82c686: Remove via_isa_set_irq()

2023-05-31 Thread Bernhard Beschow
Now that via_isa_set_irq() is unused it can be removed.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Mark Cave-Ayland 
---
 include/hw/isa/vt82c686.h | 2 --
 hw/isa/vt82c686.c | 6 --
 2 files changed, 8 deletions(-)

diff --git a/include/hw/isa/vt82c686.h b/include/hw/isa/vt82c686.h
index da1722daf2..b6e95b2851 100644
--- a/include/hw/isa/vt82c686.h
+++ b/include/hw/isa/vt82c686.h
@@ -34,6 +34,4 @@ struct ViaAC97State {
 uint32_t ac97_cmd;
 };
 
-void via_isa_set_irq(PCIDevice *d, int n, int level);
-
 #endif
diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 8016c71315..57bdfb4e78 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -592,12 +592,6 @@ static const TypeInfo via_isa_info = {
 },
 };
 
-void via_isa_set_irq(PCIDevice *d, int n, int level)
-{
-ViaISAState *s = VIA_ISA(d);
-qemu_set_irq(s->isa_irqs_in[n], level);
-}
-
 static void via_isa_request_i8259_irq(void *opaque, int irq, int level)
 {
 ViaISAState *s = opaque;
-- 
2.40.1




[PATCH v3 0/7] VIA and general PCI IDE cleanup

2023-05-31 Thread Bernhard Beschow
This series is split off from a more general PCI IDE refactoring aiming for a
common implementation of the PCI IDE controller specification for all
TYPE_PCI_IDE models [1].

The first three patches resolve a circular dependency between the VIA IDE
controller and its south bridge. The next three patches resolves redundant code
accross all TYPE_PCI_IDE models. The last patch modernizes VM state setup in
PIIX IDE.

Testing done:
* `make check`
* `make check-avocado`
* `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device \
   ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso \
   -bios pegasos2.rom`
   The machine booted successfully and a startup sound was hearable
* `qemu-system-ppc -machine sam460ex -rtc base=localtime -drive \
   if=none,id=cd,file=morphos-3.17.iso,format=raw -device \
   ide-cd,drive=cd,bus=ide.1`
   The machine booted successfully into graphical desktop environment

v3:
* Fix formatting (Mark) ... and split into two commits (Bernhard)

v2:
* Add missing Signed-off-by tag to last commit (Zoltan)

Changes since [1]:
* Turn legacy IRQs into named GPIOs (Mark)
* Don't make VIA IDE legacy IRQs routable; just wire up in host device (Zoltan)
* Rename extracted bmdma_clear_status() (Zoltan)
   ... to bmdma_status_writeb() (Mark)

[1] 
https://lore.kernel.org/qemu-devel/20230422150728.176512-1-shen...@gmail.com/

Bernhard Beschow (7):
  hw/ide/pci: Expose legacy interrupts as named GPIOs
  hw/ide/via: Wire up IDE legacy interrupts in host device
  hw/isa/vt82c686: Remove via_isa_set_irq()
  hw/ide: Extract IDEBus assignment into bmdma_init()
  hw/ide: Extract bmdma_status_writeb()
  hw/ide/pci: Replace some magic numbers by constants
  hw/ide/piix: Move registration of VMStateDescription to DeviceClass

 include/hw/ide/pci.h  |  1 +
 include/hw/isa/vt82c686.h |  2 --
 hw/ide/cmd646.c   |  3 +--
 hw/ide/pci.c  | 16 
 hw/ide/piix.c |  8 +++-
 hw/ide/sii3112.c  |  7 ++-
 hw/ide/via.c  |  9 +
 hw/isa/vt82c686.c | 11 +--
 8 files changed, 33 insertions(+), 24 deletions(-)

-- 
2.40.1




[PATCH v3 6/7] hw/ide/pci: Replace some magic numbers by constants

2023-05-31 Thread Bernhard Beschow
Signed-off-by: Bernhard Beschow 
---
 hw/ide/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 0b26a4ce9f..a25b352537 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -320,7 +320,8 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
 
 void bmdma_status_writeb(BMDMAState *bm, uint32_t val)
 {
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bm->status = (val & 0x60) | (bm->status & BM_STATUS_DMAING)
+ | (bm->status & ~val & (BM_STATUS_ERROR | BM_STATUS_INT));
 }
 
 static uint64_t bmdma_addr_read(void *opaque, hwaddr addr,
-- 
2.40.1




[PATCH v3 2/7] hw/ide/via: Wire up IDE legacy interrupts in host device

2023-05-31 Thread Bernhard Beschow
Resolves circular depencency between IDE function and south bridge.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Mark Cave-Ayland 
---
 hw/ide/via.c  | 6 --
 hw/isa/vt82c686.c | 5 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 177baea9a7..0caae52276 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -31,6 +31,7 @@
 #include "sysemu/dma.h"
 #include "hw/isa/vt82c686.h"
 #include "hw/ide/pci.h"
+#include "hw/irq.h"
 #include "trace.h"
 
 static uint64_t bmdma_read(void *opaque, hwaddr addr,
@@ -104,7 +105,8 @@ static void bmdma_setup_bar(PCIIDEState *d)
 
 static void via_ide_set_irq(void *opaque, int n, int level)
 {
-PCIDevice *d = PCI_DEVICE(opaque);
+PCIIDEState *s = opaque;
+PCIDevice *d = PCI_DEVICE(s);
 
 if (level) {
 d->config[0x70 + n * 8] |= 0x80;
@@ -112,7 +114,7 @@ static void via_ide_set_irq(void *opaque, int n, int level)
 d->config[0x70 + n * 8] &= ~0x80;
 }
 
-via_isa_set_irq(pci_get_function_0(d), 14 + n, level);
+qemu_set_irq(s->isa_irq[n], level);
 }
 
 static void via_ide_reset(DeviceState *dev)
diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index ca89119ce0..8016c71315 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -692,6 +692,10 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
 if (!qdev_realize(DEVICE(&s->ide), BUS(pci_bus), errp)) {
 return;
 }
+for (i = 0; i < 2; i++) {
+qdev_connect_gpio_out_named(DEVICE(&s->ide), "isa-irq", i,
+s->isa_irqs_in[14 + i]);
+}
 
 /* Functions 2-3: USB Ports */
 for (i = 0; i < ARRAY_SIZE(s->uhci); i++) {
@@ -814,6 +818,7 @@ static void vt8231_isa_reset(DeviceState *dev)
  PCI_COMMAND_MASTER | PCI_COMMAND_SPECIAL);
 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM);
 
+pci_conf[0x4c] = 0x04; /* IDE interrupt Routing */
 pci_conf[0x58] = 0x40; /* Miscellaneous Control 0 */
 pci_conf[0x67] = 0x08; /* Fast IR Config */
 pci_conf[0x6b] = 0x01; /* Fast IR I/O Base */
-- 
2.40.1




[PATCH v3 5/7] hw/ide: Extract bmdma_status_writeb()

2023-05-31 Thread Bernhard Beschow
Every TYPE_PCI_IDE device performs the same not-so-trivial bit manipulation by
copy'n'paste code. Extract this into bmdma_status_writeb(), mirroring
bmdma_cmd_writeb().

Signed-off-by: Bernhard Beschow 
Reviewed-by: BALATON Zoltan 
---
 include/hw/ide/pci.h | 1 +
 hw/ide/cmd646.c  | 2 +-
 hw/ide/pci.c | 5 +
 hw/ide/piix.c| 2 +-
 hw/ide/sii3112.c | 6 ++
 hw/ide/via.c | 2 +-
 6 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 74c127e32f..1ff469de87 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -58,6 +58,7 @@ struct PCIIDEState {
 
 void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
 void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
+void bmdma_status_writeb(BMDMAState *bm, uint32_t val);
 extern MemoryRegionOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev);
 
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index a094a6e12a..cabe9048b1 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -144,7 +144,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
 cmd646_update_irq(pci_dev);
 break;
 case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 
0x06);
+bmdma_status_writeb(bm, val);
 break;
 case 3:
 if (bm == &bm->pci_dev->bmdma[0]) {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 4cf1e9c679..0b26a4ce9f 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -318,6 +318,11 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
 bm->cmd = val & 0x09;
 }
 
+void bmdma_status_writeb(BMDMAState *bm, uint32_t val)
+{
+bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+}
+
 static uint64_t bmdma_addr_read(void *opaque, hwaddr addr,
 unsigned width)
 {
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a32f7ccece..47e0b474c3 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -76,7 +76,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
 bmdma_cmd_writeb(bm, val);
 break;
 case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 
0x06);
+bmdma_status_writeb(bm, val);
 break;
 }
 }
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 5dd3d03c29..63dc4a0494 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -149,8 +149,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
 break;
 case 0x02:
 case 0x12:
-d->i.bmdma[0].status = (val & 0x60) | (d->i.bmdma[0].status & 1) |
-   (d->i.bmdma[0].status & ~val & 6);
+bmdma_status_writeb(&d->i.bmdma[0], val);
 break;
 case 0x04 ... 0x07:
 bmdma_addr_ioport_ops.write(&d->i.bmdma[0], addr - 4, val, size);
@@ -165,8 +164,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
 break;
 case 0x0a:
 case 0x1a:
-d->i.bmdma[1].status = (val & 0x60) | (d->i.bmdma[1].status & 1) |
-   (d->i.bmdma[1].status & ~val & 6);
+bmdma_status_writeb(&d->i.bmdma[1], val);
 break;
 case 0x0c ... 0x0f:
 bmdma_addr_ioport_ops.write(&d->i.bmdma[1], addr - 12, val, size);
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 91253fa4ef..fff23803a6 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -75,7 +75,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
 bmdma_cmd_writeb(bm, val);
 break;
 case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 
0x06);
+bmdma_status_writeb(bm, val);
 break;
 default:;
 }
-- 
2.40.1




[PATCH v3 7/7] hw/ide/piix: Move registration of VMStateDescription to DeviceClass

2023-05-31 Thread Bernhard Beschow
The modern, declarative way to set up VM state handling is to assign to
DeviceClass::vmsd attribute.

There shouldn't be any change in behavior since dc->vmsd causes
vmstate_register_with_alias_id() to be called on the instance during
the instance init phase. vmstate_register() was also called during the
instance init phase which forwards to vmstate_register_with_alias_id()
internally. Checking the migration schema before and after this patch confirms:

before:
> qemu-system-x86_64 -S
> qemu > migrate -d exec:cat>before.mig

after:
> qemu-system-x86_64 -S
> qemu > migrate -d exec:cat>after.mig

> analyze-migration.py -d desc -f before.mig > before.json
> analyze-migration.py -d desc -f after.mig > after.json
> diff before.json after.json
-> empty

Signed-off-by: Bernhard Beschow 
---
 hw/ide/piix.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 47e0b474c3..151f206046 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -28,7 +28,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "migration/vmstate.h"
 #include "qapi/error.h"
 #include "hw/pci/pci.h"
 #include "hw/ide/piix.h"
@@ -159,8 +158,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
 bmdma_setup_bar(d);
 pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
 
-vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);
-
 for (unsigned i = 0; i < 2; i++) {
 if (!pci_piix_init_bus(d, i, errp)) {
 return;
@@ -186,6 +183,7 @@ static void piix3_ide_class_init(ObjectClass *klass, void 
*data)
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
 dc->reset = piix_ide_reset;
+dc->vmsd = &vmstate_ide_pci;
 k->realize = pci_piix_ide_realize;
 k->exit = pci_piix_ide_exitfn;
 k->vendor_id = PCI_VENDOR_ID_INTEL;
@@ -208,6 +206,7 @@ static void piix4_ide_class_init(ObjectClass *klass, void 
*data)
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
 dc->reset = piix_ide_reset;
+dc->vmsd = &vmstate_ide_pci;
 k->realize = pci_piix_ide_realize;
 k->exit = pci_piix_ide_exitfn;
 k->vendor_id = PCI_VENDOR_ID_INTEL;
-- 
2.40.1




Re: [PULL 00/21] Migration 20230530 patches

2023-05-31 Thread Juan Quintela
Richard Henderson  wrote:
> On 5/30/23 11:25, Juan Quintela wrote:
>> The following changes since commit aa9bbd865502ed517624ab6fe7d4b5d89ca95e43:
>>Merge tag 'pull-ppc-20230528' of https://gitlab.com/danielhb/qemu
>> into staging (2023-05-29 14:31:52 -0700)
>> are available in the Git repository at:
>>https://gitlab.com/juan.quintela/qemu.git
>> tags/migration-20230530-pull-request
>> for you to fetch changes up to
>> c63c544005e6b1375a9c038f0e0fb8dfb8b249f4:
>>migration/rdma: Check sooner if we are in postcopy for
>> save_page() (2023-05-30 19:23:50 +0200)
>> 

Added Markus and Daniel.

>> Migration 20230530 Pull request (take 2)
>> Hi
>> Resend last PULL request, this time it compiles when CONFIG_RDMA is
>> not configured in.
>> [take 1]
>> On this PULL request:
>> - Set vmstate migration failure right (vladimir)
>> - Migration QEMUFileHook removal (juan)
>> - Migration Atomic counters (juan)
>> Please apply.
>> 
>> Juan Quintela (16):
>>migration: Don't abuse qemu_file transferred for RDMA
>>migration/RDMA: It is accounting for zero/normal pages in two places
>>migration/rdma: Remove QEMUFile parameter when not used
>>migration/rdma: Don't use imaginary transfers
>>migration: Remove unused qemu_file_credit_transfer()
>>migration/rdma: Simplify the function that saves a page
>>migration: Create migrate_rdma()
>>migration/rdma: Unfold ram_control_before_iterate()
>>migration/rdma: Unfold ram_control_after_iterate()
>>migration/rdma: Remove all uses of RAM_CONTROL_HOOK
>>migration/rdma: Unfold hook_ram_load()
>>migration/rdma: Create rdma_control_save_page()
>>qemu-file: Remove QEMUFileHooks
>>migration/rdma: Move rdma constants from qemu-file.h to rdma.h
>>migration/rdma: Remove qemu_ prefix from exported functions
>>migration/rdma: Check sooner if we are in postcopy for save_page()
>> Vladimir Sementsov-Ogievskiy (5):
>>runstate: add runstate_get()
>>migration: never fail in global_state_store()
>>runstate: drop unused runstate_store()
>>migration: switch from .vm_was_running to .vm_old_state
>>migration: restore vmstate on migration failure
>
> Appears to introduce multiple avocado failures:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/4378066518#L286
>
> Test summary:
> tests/avocado/migration.py:X86_64.test_migration_with_exec: ERROR
> tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: ERROR
> tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
> make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142: 
> check-avocado] Error 1
>
> https://gitlab.com/qemu-project/qemu/-/jobs/4378066523#L387
>
> Test summary:
> tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: ERROR
> tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
> make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142: 
> check-avocado] Error 1
>
> Also fails QTEST_QEMU_BINARY=./qemu-system-aarch64 
> ./tests/qtest/migration-test
>
> ../src/migration/rdma.c:408:QIO_CHANNEL_RDMA: Object 0xf7bba680 is
> not an instance of type qio-channel-rdma

I am looking at the other errors, but this one is weird.  It is failing
here:

#define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
OBJECT_DECLARE_SIMPLE_TYPE(QIOChannelRDMA, QIO_CHANNEL_RDMA)

In the OBJECT line.

I have no clue what problem are we having here with the object system to
decide at declaration time that a variable is not of the type that we
are declaring.

I am missing something obvious here?

Later, Juan.



> qemu-system-aarch64: Not a migration stream
> qemu-system-aarch64: load of migration failed: Invalid argument


> Broken pipe
>
>
> r~




Re: [PATCH v2 00/10] hw/virtio: Build various target-agnostic objects just once

2023-05-31 Thread Michael S. Tsirkin
On Wed, May 31, 2023 at 10:39:48PM +0200, Philippe Mathieu-Daudé wrote:
> Hi Michael,
> 
> On 24/5/23 11:37, Philippe Mathieu-Daudé wrote:
> > All patches reviewed.
> 
> Could you take this series via your virtio tree?
> 
> Thanks!
> 
> Phil.

Will do, in next pull. Thanks!

> > Less controvertial than my first approach [*] which caches
> > the access_is_big_endian value in VirtIODevice state, this
> > series just remove a unnecessary / pointless dependency on
> > "virtio-access.h", allowing to build various virtio objects
> > once for all targets.
> > 
> > [*] 
> > https://lore.kernel.org/qemu-devel/20221212230517.28872-11-phi...@linaro.org/
> > 
> > Philippe Mathieu-Daudé (10):
> >softmmu: Introduce qemu_target_page_mask() helper
> >hw/scsi: Introduce VHOST_SCSI_COMMON symbol in Kconfig
> >hw/scsi: Rearrange meson.build
> >hw/scsi: Rename target-specific source set as
> >  'specific_virtio_scsi_ss'
> >hw/virtio: Introduce VHOST_VSOCK_COMMON symbol in Kconfig
> >hw/virtio/virtio-mem: Use qemu_ram_get_fd() helper
> >hw/virtio/vhost-vsock: Include missing 'virtio/virtio-bus.h' header
> >hw/virtio/virtio-iommu: Use target-agnostic qemu_target_page_mask()
> >hw/virtio: Remove unnecessary 'virtio-access.h' header
> >hw/virtio: Build various target-agnostic objects just once




Re: [PATCH v2 20/23] q800: don't access Nubus bus directly from the mac-nubus-bridge device

2023-05-31 Thread Laurent Vivier

Le 31/05/2023 à 14:53, Mark Cave-Ayland a écrit :

Instead use the qdev_get_child_bus() function which is intended for this exact
purpose.

Signed-off-by: Mark Cave-Ayland 
---
  hw/m68k/q800.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index d02a1a7a1f..946cb09e30 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -445,7 +445,7 @@ static void q800_machine_init(MachineState *machine)
qdev_get_gpio_in_named(DEVICE(&m->via2), 
"nubus-irq",
   VIA2_NUBUS_IRQ_9));
  
-nubus = &NUBUS_BRIDGE(dev)->bus;

+nubus = NUBUS_BUS(qdev_get_child_bus(dev, "nubus-bus.0"));
  
  /* framebuffer in nubus slot #9 */
  


Reviewed-by: Laurent Vivier 




[PULL 3/5] python/machine: use connect-based interface for existing sockets

2023-05-31 Thread John Snow
Instead of using accept() with sockets (which uses open_with_socket()),
use calls to connect() to utilize existing sockets instead. A benefit of
this is more robust error handling already present within the connect()
call that isn't present in open_with_socket().

Signed-off-by: John Snow 
Message-id: 20230517163406.2593480-4-js...@redhat.com
Signed-off-by: John Snow 
---
 python/qemu/machine/machine.py | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index e57c254484..cc636cb6bd 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -337,18 +337,17 @@ def _pre_launch(self) -> None:
 self._remove_files.append(self._console_address)
 
 if self._qmp_set:
-monitor_address = None
 sock = None
 if self._monitor_address is None:
 self._sock_pair = socket.socketpair()
 sock = self._sock_pair[1]
 if isinstance(self._monitor_address, str):
 self._remove_files.append(self._monitor_address)
-monitor_address = self._monitor_address
+
 self._qmp_connection = QEMUMonitorProtocol(
-address=monitor_address,
+address=self._monitor_address,
 sock=sock,
-server=True,
+server=bool(self._monitor_address),
 nickname=self._name
 )
 
@@ -370,7 +369,10 @@ def _post_launch(self) -> None:
 if self._sock_pair:
 self._sock_pair[0].close()
 if self._qmp_connection:
-self._qmp.accept(self._qmp_timer)
+if self._sock_pair:
+self._qmp.connect()
+else:
+self._qmp.accept(self._qmp_timer)
 
 def _close_qemu_log_file(self) -> None:
 if self._qemu_log_file is not None:
-- 
2.40.1




[PULL 5/5] Revert "python/qmp/protocol: add open_with_socket()"

2023-05-31 Thread John Snow
This reverts commit a3cfea92e2030926e00a2519d299384ea648e36e.

(It's being rolled back in favor of a different API, which brings the
in-tree and out-of-tree versions of qemu.qmp back in sync.)

Signed-off-by: John Snow 
Message-id: 20230517163406.2593480-6-js...@redhat.com
Signed-off-by: John Snow 
---
 python/qemu/qmp/protocol.py | 24 +---
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/python/qemu/qmp/protocol.py b/python/qemu/qmp/protocol.py
index d534db4631..753182131f 100644
--- a/python/qemu/qmp/protocol.py
+++ b/python/qemu/qmp/protocol.py
@@ -297,19 +297,6 @@ async def start_server_and_accept(
 await self.accept()
 assert self.runstate == Runstate.RUNNING
 
-@upper_half
-@require(Runstate.IDLE)
-async def open_with_socket(self, sock: socket.socket) -> None:
-"""
-Start connection with given socket.
-
-:param sock: A socket.
-
-:raise StateError: When the `Runstate` is not `IDLE`.
-"""
-self._reader, self._writer = await asyncio.open_connection(sock=sock)
-self._set_state(Runstate.CONNECTING)
-
 @upper_half
 @require(Runstate.IDLE)
 async def start_server(self, address: SocketAddrT,
@@ -357,12 +344,11 @@ async def accept(self) -> None:
 protocol-level failure occurs while establishing a new
 session, the wrapped error may also be an `QMPError`.
 """
-if not self._reader:
-if self._accepted is None:
-raise QMPError("Cannot call accept() before start_server().")
-await self._session_guard(
-self._do_accept(),
-'Failed to establish connection')
+if self._accepted is None:
+raise QMPError("Cannot call accept() before start_server().")
+await self._session_guard(
+self._do_accept(),
+'Failed to establish connection')
 await self._session_guard(
 self._establish_session(),
 'Failed to establish session')
-- 
2.40.1




[PULL 4/5] python/qmp/legacy: remove open_with_socket() calls

2023-05-31 Thread John Snow
Favor using connect() when passing a socket instead of
open_with_socket(). Simultaneously, update constructor calls to use the
combined address argument for QEMUMonitorProtocol().

Signed-off-by: John Snow 
Message-id: 20230517163406.2593480-5-js...@redhat.com
Signed-off-by: John Snow 
---
 python/qemu/machine/machine.py |  7 ---
 python/qemu/qmp/legacy.py  | 29 -
 2 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index cc636cb6bd..c16a0b6fed 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -337,16 +337,17 @@ def _pre_launch(self) -> None:
 self._remove_files.append(self._console_address)
 
 if self._qmp_set:
-sock = None
 if self._monitor_address is None:
 self._sock_pair = socket.socketpair()
 sock = self._sock_pair[1]
 if isinstance(self._monitor_address, str):
 self._remove_files.append(self._monitor_address)
 
+sock_or_addr = self._monitor_address or sock
+assert sock_or_addr is not None
+
 self._qmp_connection = QEMUMonitorProtocol(
-address=self._monitor_address,
-sock=sock,
+sock_or_addr,
 server=bool(self._monitor_address),
 nickname=self._name
 )
diff --git a/python/qemu/qmp/legacy.py b/python/qemu/qmp/legacy.py
index b1eb3f360f..e1e9383978 100644
--- a/python/qemu/qmp/legacy.py
+++ b/python/qemu/qmp/legacy.py
@@ -68,34 +68,31 @@ class QEMUMonitorProtocol:
 Provide an API to connect to QEMU via QEMU Monitor Protocol (QMP)
 and then allow to handle commands and events.
 
-:param address:  QEMU address, can be either a unix socket path (string)
- or a tuple in the form ( address, port ) for a TCP
- connection or None
-:param sock: a socket or None
+:param address:  QEMU address, can be a unix socket path (string), a tuple
+ in the form ( address, port ) for a TCP connection, or an
+ existing `socket.socket` object.
 :param server:   Act as the socket server. (See 'accept')
+ Not applicable when passing a socket directly.
 :param nickname: Optional nickname used for logging.
 """
 
 def __init__(self,
- address: Optional[SocketAddrT] = None,
- sock: Optional[socket.socket] = None,
+ address: Union[SocketAddrT, socket.socket],
  server: bool = False,
  nickname: Optional[str] = None):
 
-assert address or sock
+if server and isinstance(address, socket.socket):
+raise ValueError(
+"server argument should be False when passing a socket")
+
 self._qmp = QMPClient(nickname)
 self._aloop = asyncio.get_event_loop()
 self._address = address
-self._sock = sock
 self._timeout: Optional[float] = None
 
 if server:
-if sock:
-assert self._sock is not None
-self._sync(self._qmp.open_with_socket(self._sock))
-else:
-assert self._address is not None
-self._sync(self._qmp.start_server(self._address))
+assert not isinstance(self._address, socket.socket)
+self._sync(self._qmp.start_server(self._address))
 
 _T = TypeVar('_T')
 
@@ -150,13 +147,11 @@ def connect(self, negotiate: bool = True) -> 
Optional[QMPMessage]:
 :return: QMP greeting dict, or None if negotiate is false
 :raise ConnectError: on connection errors
 """
-addr_or_sock = self._address or self._sock
-assert addr_or_sock is not None
 self._qmp.await_greeting = negotiate
 self._qmp.negotiate = negotiate
 
 self._sync(
-self._qmp.connect(addr_or_sock)
+self._qmp.connect(self._address)
 )
 return self._get_greeting()
 
-- 
2.40.1




[PULL 1/5] python/qmp: allow sockets to be passed to connect()

2023-05-31 Thread John Snow
Allow existing sockets to be passed to connect(). The changes are pretty
minimal, and this allows for far greater flexibility in setting up
communications with an endpoint.

Signed-off-by: John Snow 
Message-id: 20230517163406.2593480-2-js...@redhat.com
Signed-off-by: John Snow 
---
 python/qemu/qmp/protocol.py | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/python/qemu/qmp/protocol.py b/python/qemu/qmp/protocol.py
index 22e60298d2..d534db4631 100644
--- a/python/qemu/qmp/protocol.py
+++ b/python/qemu/qmp/protocol.py
@@ -370,7 +370,7 @@ async def accept(self) -> None:
 
 @upper_half
 @require(Runstate.IDLE)
-async def connect(self, address: SocketAddrT,
+async def connect(self, address: Union[SocketAddrT, socket.socket],
   ssl: Optional[SSLContext] = None) -> None:
 """
 Connect to the server and begin processing message queues.
@@ -615,7 +615,7 @@ async def _do_accept(self) -> None:
 self.logger.debug("Connection accepted.")
 
 @upper_half
-async def _do_connect(self, address: SocketAddrT,
+async def _do_connect(self, address: Union[SocketAddrT, socket.socket],
   ssl: Optional[SSLContext] = None) -> None:
 """
 Acting as the transport client, initiate a connection to a server.
@@ -634,9 +634,17 @@ async def _do_connect(self, address: SocketAddrT,
 # otherwise yield.
 await asyncio.sleep(0)
 
-self.logger.debug("Connecting to %s ...", address)
-
-if isinstance(address, tuple):
+if isinstance(address, socket.socket):
+self.logger.debug("Connecting with existing socket: "
+  "fd=%d, family=%r, type=%r",
+  address.fileno(), address.family, address.type)
+connect = asyncio.open_connection(
+limit=self._limit,
+ssl=ssl,
+sock=address,
+)
+elif isinstance(address, tuple):
+self.logger.debug("Connecting to %s ...", address)
 connect = asyncio.open_connection(
 address[0],
 address[1],
@@ -644,13 +652,14 @@ async def _do_connect(self, address: SocketAddrT,
 limit=self._limit,
 )
 else:
+self.logger.debug("Connecting to file://%s ...", address)
 connect = asyncio.open_unix_connection(
 path=address,
 ssl=ssl,
 limit=self._limit,
 )
+
 self._reader, self._writer = await connect
-
 self.logger.debug("Connected.")
 
 @upper_half
-- 
2.40.1




[PULL 2/5] python/qmp/legacy: allow using sockets for connect()

2023-05-31 Thread John Snow
Instead of asserting that we have an address, allow the use of sockets
instead of addresses during a call to connect().

Signed-off-by: John Snow 
Message-id: 20230517163406.2593480-3-js...@redhat.com
Signed-off-by: John Snow 
---
 python/qemu/qmp/legacy.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/python/qemu/qmp/legacy.py b/python/qemu/qmp/legacy.py
index 8b09ee7dbb..b1eb3f360f 100644
--- a/python/qemu/qmp/legacy.py
+++ b/python/qemu/qmp/legacy.py
@@ -150,12 +150,13 @@ def connect(self, negotiate: bool = True) -> 
Optional[QMPMessage]:
 :return: QMP greeting dict, or None if negotiate is false
 :raise ConnectError: on connection errors
 """
-assert self._address is not None
+addr_or_sock = self._address or self._sock
+assert addr_or_sock is not None
 self._qmp.await_greeting = negotiate
 self._qmp.negotiate = negotiate
 
 self._sync(
-self._qmp.connect(self._address)
+self._qmp.connect(addr_or_sock)
 )
 return self._get_greeting()
 
-- 
2.40.1




  1   2   3   4   5   >