Re: [PATCH] vdpa: fix gcc cvq_isolated uninitialized variable warning

2023-09-14 Thread Jason Wang
On Tue, Sep 12, 2023 at 5:54 AM Stefan Hajnoczi  wrote:
>
> gcc 13.2.1 emits the following warning:
>
>   net/vhost-vdpa.c: In function ‘net_vhost_vdpa_init.constprop’:
>   net/vhost-vdpa.c:1394:25: error: ‘cvq_isolated’ may be used uninitialized 
> [-Werror=maybe-uninitialized]
>1394 | s->cvq_isolated = cvq_isolated;
> | ^~
>   net/vhost-vdpa.c:1355:9: note: ‘cvq_isolated’ was declared here
>1355 | int cvq_isolated;
> | ^~~~
>   cc1: all warnings being treated as errors
>
> Cc: Eugenio Pérez 
> Cc: Michael S. Tsirkin 
> Cc: Jason Wang 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Jason Wang 

Thanks

> ---
>  net/vhost-vdpa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 34202ca009..7eaee841aa 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1352,7 +1352,7 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>  VhostVDPAState *s;
>  int ret = 0;
>  assert(name);
> -int cvq_isolated;
> +int cvq_isolated = 0;
>
>  if (is_datapath) {
>  nc = qemu_new_net_client(_vhost_vdpa_info, peer, device,
> --
> 2.41.0
>




Re: [PATCH v3 14/14] RFC tcg/ppc: Disable TCG_REG_TB for Power9/Power10

2023-09-14 Thread Jordan Niethe
On Wed, Aug 16, 2023 at 5:57 AM Richard Henderson
 wrote:
>
> This may or may not improve performance.
> It appears to result in slightly larger code,
> but perhaps not enough to matter.

I have collected some power9 macro performance data for an smp compile workload:

Setup
-

- Power9 powernv host
- mttcg smp 8 guest

Method
--

- Warm up compile skiboot (https://github.com/open-power/skiboot)
- Average time taken for 5 trials compiling skiboot with -j `nproc`

Results
---


|Patch| Mean time (s) | stdev | Decrease (%) |
|-|---|---|--|
| tcg: Add tcg_out_tb_start...|161.77 |  2.39 |- |
| tcg/ppc: Enable direct branching... |145.81 |  1.71 |  9.9 |
| tcg/ppc: Use ADDPCIS... |146.44 |  1.28 |  9.5 |
| RFC tcg/ppc: Disable TCG_REG_TB...  |145.95 |  1.07 |  9.7 |


- Enabling direct branching is a performance gain, beyond that less conclusive.
- Using pcaddis for direct branching seems slightly better than bl +4
sequence for ISA v3.0.
- PC relative addressing seems slightly better than TOC relative addressing.

Any other suggestions for performance comparison?
I still have to try on a Power10.

>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/ppc/tcg-target.c.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 20aaa90af2..c1e0efb498 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -83,7 +83,7 @@
>  #define TCG_VEC_TMP2TCG_REG_V1
>
>  #define TCG_REG_TB TCG_REG_R31
> -#define USE_REG_TB (TCG_TARGET_REG_BITS == 64)
> +#define USE_REG_TB (TCG_TARGET_REG_BITS == 64 && !have_isa_3_00)
>
>  /* Shorthand for size of a pointer.  Avoid promotion to unsigned.  */
>  #define SZP  ((int)sizeof(void *))
> --
> 2.34.1
>



Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem

2023-09-14 Thread Xiaoyao Li

On 9/15/2023 10:04 AM, Wang, Lei wrote:

On 9/14/2023 11:50, Xiaoyao Li wrote:

From: Chao Peng

Add KVM gmem support to RAMBlock so both normal hva based memory
and kvm gmem fd based private memory can be associated in one RAMBlock.

Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
gmem for the RAMBlock when it's set.

Signed-off-by: Xiaoyao Li

Kindly reminding the author's Signed-off-by is missing.


I will fix it.
Thanks!





Re: [RFC PATCH v2 00/21] QEMU gmem implemention

2023-09-14 Thread Xiaoyao Li

On 9/14/2023 9:09 PM, David Hildenbrand wrote:

On 14.09.23 05:50, Xiaoyao Li wrote:

It's the v2 RFC of enabling KVM gmem[1] as the backend for private
memory.

For confidential-computing, KVM provides gmem/guest_mem interfaces for
userspace, like QEMU, to allocate user-unaccesible private memory. This
series aims to add gmem support in QEMU's RAMBlock so that each RAM can
have both hva-based shared memory and gmem_fd based private memory. QEMU
does the shared-private conversion on KVM_MEMORY_EXIT and discards the
memory.

It chooses the design that adds "private" property to hostmeory backend.
If "private" property is set, QEMU will allocate/create KVM gmem when
initialize the RAMbloch of the memory backend.

This sereis also introduces the first user of kvm gmem,
KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
can be created with

   $qemu -object sw-protected-vm,id=sp-vm0 \
-object memory-backend-ram,id=mem0,size=1G,private=on \
-machine 
q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \

...

Unfortunately this patch series fails the boot of OVMF at very early
stage due to triple fault, because KVM doesn't support emulating 
string IO

to private memory.


Is support being added? Or have we figured out what it would take to 
make it work?


Hi David,

I only reply the questions that werrn't covered by Sean's reply.

How does this interact with other features (memory ballooning, virtiofs, 
vfio/mdev/...)?


I need time to learn them before I can answer it.



This version still leave some opens to be discussed:
1. whether we need "private" propery to be user-settable?

    It seems unnecessary because vm-type is determined. If the VM is
    confidential-guest, then the RAM of the guest must be able to be
    mapped as private, i.e., have kvm gmem backend. So QEMU can
    determine the value of "private" property automatiacally based on vm
    type.

    This also aligns with the board internal MemoryRegion that needs to
    have kvm gmem backend, e.g., TDX requires OVMF to act as private
    memory so bios memory region needs to have kvm gmem fd associated.
    QEMU no doubt will do it internally automatically.


Would it make sense to have some regions without "pivate" semantics? 
Like NVDIMMs?


Of course it can have regions without "private" semantics.

Whether a region needs "private" backend depends on the definition of VM 
type. E.g., for TDX,
 - all the RAM needs to able to mapped as private. So it needs private 
gmem.

 - TDVF(OVMF) code must be mapped as private. So it needs private gmem.
 - MMIO region needs to be shared for TDX 1.0, and it doesn't need 
private gmem;




2. hugepage support.

    KVM gmem can be allocated from hugetlbfs. How does QEMU determine
    when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
    easiest solution is create KVM gmem with 
KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

    only when memory backend is HostMemoryBackendFile of hugetlbfs.


Good question.

Probably "if the memory backend uses huge pages, also use huge pages for 
the private gmem" makes sense.


... but it becomes a mess with preallocation ... which is what people 
should actually be using with hugetlb. Andeventual double 
memory-consumption ... but maybe that's all been taken care of already?


Probably it's best to leave hugetlb support as future work and start 
with something minimal.




As Sean replied, I had some misunderstanding of 
KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. If it's for THP, I think we can allow it 
for every gmem.


As for hugetlb, we can leave it as future work.




Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-14 Thread Zhijian Li (Fujitsu)


On 15/09/2023 04:20, “William Roche wrote:
> From: William Roche 
> 
> A memory page poisoned from the hypervisor level is no longer readable.
> Thus, it is now treated as a zero-page for the ram saving migration phase.
> 
> The migration of a VM will crash Qemu when it tries to read the
> memory address space and stumbles on the poisoned page with a similar
> stack trace:
> 
> Program terminated with signal SIGBUS, Bus error.
> #0  _mm256_loadu_si256
> #1  buffer_zero_avx2
> #2  select_accel_fn
> #3  buffer_is_zero
> #4  save_zero_page_to_file
> #5  save_zero_page
> #6  ram_save_target_page_legacy
> #7  ram_save_host_page
> #8  ram_find_and_save_block
> #9  ram_save_iterate
> #10 qemu_savevm_state_iterate
> #11 migration_iteration_run
> #12 migration_thread
> #13 qemu_thread_start
> 
> Fix it by considering poisoned pages as if they were zero-pages for
> the migration copy. This fix also works with underlying large pages,
> taking into account the RAMBlock segment "page-size".
> 
> Standard migration and compressed transfers are handled by this code.
> RDMA transfer isn't touched.
> 


I'm okay with "RDMA isn't touched".
BTW, could you share your reproducing program/hacking to poison the page, so 
that
i am able to take a look the RDMA part later when i'm free.

Not sure it's suitable to acknowledge a not touched part. Anyway
Acked-by: Li Zhijian  # RDMA


> Signed-off-by: William Roche 
> ---
>   accel/kvm/kvm-all.c  | 14 ++
>   accel/stubs/kvm-stub.c   |  5 +
>   include/sysemu/kvm.h | 10 ++
>   migration/ram-compress.c |  3 ++-
>   migration/ram.c  | 23 +--
>   migration/ram.h  |  2 ++
>   6 files changed, 54 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index ff1578bb32..7fb13c8a56 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1152,6 +1152,20 @@ static void kvm_unpoison_all(void *param)
>   }
>   }
>   
> +bool kvm_hwpoisoned_page(RAMBlock *block, void *offset)
> +{
> +HWPoisonPage *pg;
> +ram_addr_t ram_addr = (ram_addr_t) offset;
> +
> +QLIST_FOREACH(pg, _page_list, list) {
> +if ((ram_addr >= pg->ram_addr) &&
> +(ram_addr - pg->ram_addr < block->page_size)) {
> +return true;
> +}
> +}
> +return false;
> +}
> +
>   void kvm_hwpoison_page_add(ram_addr_t ram_addr)
>   {
>   HWPoisonPage *page;
> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> index 235dc661bc..c0a31611df 100644
> --- a/accel/stubs/kvm-stub.c
> +++ b/accel/stubs/kvm-stub.c
> @@ -133,3 +133,8 @@ uint32_t kvm_dirty_ring_size(void)
>   {
>   return 0;
>   }
> +
> +bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr)
> +{
> +return false;
> +}
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index ee9025f8e9..858688227a 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -570,4 +570,14 @@ bool kvm_arch_cpu_check_are_resettable(void);
>   bool kvm_dirty_ring_enabled(void);
>   
>   uint32_t kvm_dirty_ring_size(void);
> +
> +/**
> + * kvm_hwpoisoned_page - indicate if the given page is poisoned
> + * @block: memory block of the given page
> + * @ram_addr: offset of the page
> + *
> + * Returns: true: page is poisoned
> + *  false: page not yet poisoned
> + */
> +bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr);
>   #endif
> diff --git a/migration/ram-compress.c b/migration/ram-compress.c
> index 06254d8c69..1916ce709d 100644
> --- a/migration/ram-compress.c
> +++ b/migration/ram-compress.c
> @@ -34,6 +34,7 @@
>   #include "qemu/error-report.h"
>   #include "migration.h"
>   #include "options.h"
> +#include "ram.h"
>   #include "io/channel-null.h"
>   #include "exec/target_page.h"
>   #include "exec/ramblock.h"
> @@ -198,7 +199,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, 
> z_stream *stream,
>   
>   assert(qemu_file_buffer_empty(f));
>   
> -if (buffer_is_zero(p, page_size)) {
> +if (migration_buffer_is_zero(block, offset, page_size)) {
>   return RES_ZEROPAGE;
>   }
>   
> diff --git a/migration/ram.c b/migration/ram.c
> index 9040d66e61..fd337f7e65 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1129,6 +1129,26 @@ void ram_release_page(const char *rbname, uint64_t 
> offset)
>   ram_discard_range(rbname, offset, TARGET_PAGE_SIZE);
>   }
>   
> +/**
> + * migration_buffer_is_zero: indicate if the page at the given
> + * location is entirely filled with zero, or is a poisoned page.
> + *
> + * @block: block that contains the page
> + * @offset: offset inside the block for the page
> + * @len: size to consider
> + */
> +bool migration_buffer_is_zero(RAMBlock *block, ram_addr_t offset,
> + size_t len)
> +{
> +uint8_t *p = block->host + offset;
> +
> +if (kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) {
> +return true;
> +}
> +
> +return 

RE: [PATCH v1 02/22] Update linux-header to support iommufd cdev and hwpt alloc

2023-09-14 Thread Duan, Zhenzhong
Hi Eric,

>-Original Message-
>From: Eric Auger 
>Sent: Thursday, September 14, 2023 10:46 PM
>Subject: Re: [PATCH v1 02/22] Update linux-header to support iommufd cdev and
>hwpt alloc
>
>Hi Zhenzhong,
>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> From https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git
>> branch: for_next
>> commit id: eb501c2d96cfce6b42528e8321ea085ec605e790
>I see that in your branch you have now updated against v6.6-rc1. However
>you should run a full ./scripts/update-linux-headers.sh,
>ie. not only importing the changes in linux-headers/linux/iommufd.h as
>it seems to do but also import all changes brought with this linux version.

Found reason. The base is already against v6.6-rc1, [PATCH v1 01/22] added
Iommufd.h into script and this patch added it.
I agree the subject is confusing, need to be like "Update iommufd.h to 
linux-header"
I'll fix the subject in next version, thanks for point out.

BR.
Zhenzhong



Re: [PATCH v2] target/riscv: update checks on writing pmpcfg for Smepmp version 1.0

2023-09-14 Thread 張哲嘉
> On Fri, Sep 08, 2023 at 04:38:34PM +0800, Alvin Chang wrote:

> > Current checks on writing pmpcfg for Smepmp follows Smepmp version

> > 0.9.1. However, Smepmp specification has already been ratified, and

> > there are some differences between version 0.9.1 and 1.0. In this

> > commit we update the checks of writing pmpcfg to follow Smepmp version

> 1.0.

> >

> > When mseccfg.MML is set, the constraints to modify PMP rules are:

> > 1. Locked rules connot be removed or modified until a PMP reset, unless

> >mseccfg.RLB is set.

> > 2. From Smepmp specification version 1.0, chapter 2 section 4b:

> >Adding a rule with executable privileges that either is M-mode-only

> >or a locked Shared-Region is not possible and such pmpcfg writes are

> >ignored, leaving pmpcfg unchanged.

> >

> > The commit transfers the value of pmpcfg into the index of the Smepmp

> > truth table, and checks the rules by aforementioned specification

> > changes.

> >

> > Signed-off-by: Alvin Chang 

> > ---

> > Changes from v1: Convert ePMP over to Smepmp.

> >

> >  target/riscv/pmp.c | 51

> > ++

> >  1 file changed, 42 insertions(+), 9 deletions(-)

> >

> > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c index

> > 9d8db493e6..d1c3fc1e4f 100644

> > --- a/target/riscv/pmp.c

> > +++ b/target/riscv/pmp.c

> > @@ -98,16 +98,49 @@ static bool pmp_write_cfg(CPURISCVState *env,

> uint32_t pmp_index, uint8_t val)

> >  locked = false;

> >  }

> >

> > -/* mseccfg.MML is set */

> > -if (MSECCFG_MML_ISSET(env)) {

> > -/* not adding execute bit */

> > -if ((val & PMP_LOCK) != 0 && (val & PMP_EXEC) !=

> PMP_EXEC) {

> > -locked = false;

> > -}

> > -/* shared region and not adding X bit */

> > -if ((val & PMP_LOCK) != PMP_LOCK &&

> > -(val & 0x7) != (PMP_WRITE | PMP_EXEC)) {

> > +/*

> > + * mseccfg.MML is set. Locked rules cannot be removed or

> modified

> > + * until a PMP reset. Besides, from Smepmp specification

> version 1.0

> > + * , chapter 2 section 4b says:

> > + * Adding a rule with executable privileges that either is

> > + * M-mode-only or a locked Shared-Region is not possible

> and such

> > + * pmpcfg writes are ignored, leaving pmpcfg unchanged.

> > + */

> > +if (MSECCFG_MML_ISSET(env) && !pmp_is_locked(env,

> pmp_index)) {

> > +/*

> > + * Convert the PMP permissions to match the truth

> table in the

> > + * ePMP spec.

> > + */

> > +const uint8_t epmp_operation =

> > +((val & PMP_LOCK) >> 4) | ((val & PMP_READ) <<

> 2) |

> > +(val & PMP_WRITE) | ((val & PMP_EXEC) >> 2);

> > +

> > +switch (epmp_operation) {

> > +/* pmpcfg.L = 0. Neither M-mode-only nor locked

> Shared-Region */

> > +case 0:

> > +case 1:

> > +case 2:

> > +case 3:

> > +case 4:

> > +case 5:

> > +case 6:

> > +case 7:

> > +/* pmpcfg.L = 1 and pmpcfg.X = 0 (but case 10 is not

> allowed) */

> > +case 8:

>

> case 0 ... 8:

>


OK, will apply case ranges.


> > +case 12:

> > +case 14:

> > +/* pmpcfg.LRWX =  */

> > +case 15:  /* Read-only locked Shared-Region on all

> > + modes */

> >  locked = false;

> > +break;

> > +/* Other rules which add new code regions are not

> allowed */

> > +case 9:

> > +case 10:  /* Execute-only locked Shared-Region on all

> modes */

> > +case 11:

>

> case 9 ... 11:

>

> And why not put these cases in numerical order?

>


Agree, I will put them in numerical order.


> > +case 13:

> > +break;

> > +default:

> > +g_assert_not_reached();

> >  }

> >  }

> >  } else {

> > --

> > 2.34.1

> >

> >

>

> It looks like this patch has overlap with

>

> https://lore.kernel.org/all/20230907062440.1174224-1-mchitale@ventanamicr


> o.com/


>

> Maybe you and Mayuresh can work together on a final patch.

>


It seems Mayuresh's patch is to reset PMP entries and mseccfg when CPU
resets.

This patch is to check the valid setting of pmpcfg at runtime, when CPU
supports Smepmp.

I think they are two 

Re: [PATCH v2 04/10] Introduce the CPU address space destruction function

2023-09-14 Thread lixianglai

Hi David Hildenbrand:



Hi David Hildenbrand:

On 14.09.23 15:00, lixianglai wrote:

Hi David:


Hi!




On 12.09.23 04:11, xianglai li wrote:

Introduce new function to destroy CPU address space resources
for cpu hot-(un)plug.


How do other archs handle that? Or how are they able to get away
without destroying?

They do not remove the cpu address space, taking the X86 
architecture as

an example:

1.Start the x86 VM:

./qemu-system-x86_64 \
-machine q35  \
-cpu Broadwell-IBRS \
-smp 1,maxcpus=100,sockets=100,cores=1,threads=1 \
-m 4G \
-drive file=~/anolis-8.8.qcow2  \
-serial stdio   \
-monitor telnet:localhost:4498,server,nowait   \
-nographic

2.Connect the qemu monitor

telnet 127.0.0.1 4498

info mtree

address-space: cpu-memory-0
address-space: memory
    - (prio 0, i/o): system
      -7fff (prio 0, ram): alias 
ram-below-4g

@pc.ram -7fff
      - (prio -1, i/o): pci
    000a-000b (prio 1, i/o): vga-lowmem

3.Perform cpu hot swap int qemu monitor

device_add
Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1
device_del cpu1



Hm, doesn't seem to work for me on upstream QEMU for some reason: 
"Error: acpi: device unplug request for not supported device type: 
Broadwell-IBRS-x86_64-cpu"


First I use qemu tcg, and then the cpu needs to be removed after the 
operating system is booted.


Thanks,

xianglai.




What happens if you re-add that CPU? Will we reuse the previous 
address space?



Here is the memory layout where I inserted cpu1 again. It does not 
appear that the original address space was reused, and the address 
space is now duplicated


info mtree

address-space: cpu-memory-0
address-space: cpu-memory-1
address-space: cpu-memory-1
address-space: memory
  - (prio 0, i/o): system
    -7fff (prio 0, ram): alias 
ram-below-4g @pc.ram -7fff

    - (prio -1, i/o): pci
  000a-000a (prio 2, ram): alias 
vga.chain4 @vga.vram -

  000a-000b (prio 1, i/o): vga-lowmem
  000c-000d (prio 1, rom): pc.rom
  000e-000f (prio 1, rom): alias isa-bios 
@pc.bios 0002-0003

  fd00-fdff (prio 1, ram): vga.vram


In addition, I do not find the corresponding resource release action 
for cpu->cpu_ases requested in function cpu_address_space_init.


I wonder if there is a leak in the memory space requested here. Maybe 
qemu automatically reclaims memory space


or frees resources somewhere else I didn't find? I thought I'd try 
running the following valgrind to see if I could verify my suspicions.


void cpu_address_space_init(CPUState *cpu, int asidx,
    const char *prefix, MemoryRegion *mr)
{

...

    if (!cpu->cpu_ases) {
    cpu->cpu_ases = g_new0(CPUAddressSpace, cpu->num_ases);
    }

...

}




info mtree

address-space: cpu-memory-0
address-space: cpu-memory-1
address-space: memory
    - (prio 0, i/o): system
      -7fff (prio 0, ram): alias 
ram-below-4g

@pc.ram -7fff
      - (prio -1, i/o): pci
    000a-000b (prio 1, i/o): vga-lowmem


  From the above test, you can see whether the address space of cpu1 is
residual after a cpu hot swap, and whether it is reasonable?



Probably we should teach other archs to destroy that address space as 
well.


Can we do that from the core, instead of having to do that in each 
CPU unrealize function?


I think it can also be done in the public code flow. Since I refer to 
arm's scheme


(https://lore.kernel.org/all/20200613213629.21984-1-salil.me...@huawei.com/), 



and arm's patch will be issued soon, I will conduct rebase based on 
arm patch in the future.


Therefore, I would like to see if arm has any good suggestions. If 
there are no good suggestions at this stage,


I think we can shelve this problem for the first time, and I can 
consider not referencing this function for the first time,


and we can submit another patch to solve this problem.

Hi Salil Mehta:

Is the cpu_address_space_destroy function still present in the new 
patch version of arm?


Can we put this function on the public path of cpu destroy?


Thanks,

xianglai.






Re: [PATCH v2 04/10] Introduce the CPU address space destruction function

2023-09-14 Thread lixianglai



Hi David Hildenbrand:

On 14.09.23 15:00, lixianglai wrote:

Hi David:


Hi!




On 12.09.23 04:11, xianglai li wrote:

Introduce new function to destroy CPU address space resources
for cpu hot-(un)plug.


How do other archs handle that? Or how are they able to get away
without destroying?


They do not remove the cpu address space, taking the X86 architecture as
an example:

1.Start the x86 VM:

./qemu-system-x86_64 \
-machine q35  \
-cpu Broadwell-IBRS \
-smp 1,maxcpus=100,sockets=100,cores=1,threads=1 \
-m 4G \
-drive file=~/anolis-8.8.qcow2  \
-serial stdio   \
-monitor telnet:localhost:4498,server,nowait   \
-nographic

2.Connect the qemu monitor

telnet 127.0.0.1 4498

info mtree

address-space: cpu-memory-0
address-space: memory
    - (prio 0, i/o): system
      -7fff (prio 0, ram): alias 
ram-below-4g

@pc.ram -7fff
      - (prio -1, i/o): pci
    000a-000b (prio 1, i/o): vga-lowmem

3.Perform cpu hot swap int qemu monitor

device_add
Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1
device_del cpu1



Hm, doesn't seem to work for me on upstream QEMU for some reason: 
"Error: acpi: device unplug request for not supported device type: 
Broadwell-IBRS-x86_64-cpu"




What happens if you re-add that CPU? Will we reuse the previous 
address space?



Here is the memory layout where I inserted cpu1 again. It does not 
appear that the original address space was reused, and the address space 
is now duplicated


info mtree

address-space: cpu-memory-0
address-space: cpu-memory-1
address-space: cpu-memory-1
address-space: memory
  - (prio 0, i/o): system
    -7fff (prio 0, ram): alias ram-below-4g 
@pc.ram -7fff

    - (prio -1, i/o): pci
  000a-000a (prio 2, ram): alias vga.chain4 
@vga.vram -

  000a-000b (prio 1, i/o): vga-lowmem
  000c-000d (prio 1, rom): pc.rom
  000e-000f (prio 1, rom): alias isa-bios 
@pc.bios 0002-0003

  fd00-fdff (prio 1, ram): vga.vram


In addition, I do not find the corresponding resource release action for 
cpu->cpu_ases requested in function cpu_address_space_init.


I wonder if there is a leak in the memory space requested here. Maybe 
qemu automatically reclaims memory space


or frees resources somewhere else I didn't find? I thought I'd try 
running the following valgrind to see if I could verify my suspicions.


void cpu_address_space_init(CPUState *cpu, int asidx,
    const char *prefix, MemoryRegion *mr)
{

...

    if (!cpu->cpu_ases) {
    cpu->cpu_ases = g_new0(CPUAddressSpace, cpu->num_ases);
    }

...

}




info mtree

address-space: cpu-memory-0
address-space: cpu-memory-1
address-space: memory
    - (prio 0, i/o): system
      -7fff (prio 0, ram): alias 
ram-below-4g

@pc.ram -7fff
      - (prio -1, i/o): pci
    000a-000b (prio 1, i/o): vga-lowmem


  From the above test, you can see whether the address space of cpu1 is
residual after a cpu hot swap, and whether it is reasonable?



Probably we should teach other archs to destroy that address space as 
well.


Can we do that from the core, instead of having to do that in each CPU 
unrealize function?


I think it can also be done in the public code flow. Since I refer to 
arm's scheme


(https://lore.kernel.org/all/20200613213629.21984-1-salil.me...@huawei.com/), 



and arm's patch will be issued soon, I will conduct rebase based on arm 
patch in the future.


Therefore, I would like to see if arm has any good suggestions. If there 
are no good suggestions at this stage,


I think we can shelve this problem for the first time, and I can 
consider not referencing this function for the first time,


and we can submit another patch to solve this problem.

Hi Salil Mehta:

Is the cpu_address_space_destroy function still present in the new patch 
version of arm?


Can we put this function on the public path of cpu destroy?


Thanks,

xianglai.




[PATCH v1 0/4] vfio: report NUMA nodes for device memory

2023-09-14 Thread ankita
From: Ankit Agrawal 

For devices which allow CPU to cache coherently access their memory,
it is sensible to expose such memory as NUMA nodes separate from
the sysmem node. Qemu currently do not provide a mechanism for creation
of NUMA nodes associated with a vfio-pci device.

Implement a mechanism to create and associate a set of unique NUMA nodes
with a vfio-pci device.

NUMA node is created by inserting a series of the unique proximity
domains (PXM) in the VM SRAT ACPI table. The ACPI tables are read once
at the time of bootup by the kernel to determine the NUMA configuration
and is inflexible post that. Hence this feature is incompatible with
device hotplug. The added node range associated with the device is
communicated through ACPI DSD and can be fetched by the VM kernel or
kernel modules. QEMU's VM SRAT and DSD builder code is modified
accordingly.

New command line params are introduced for admin to have a control on
the NUMA node assignment.

It is expected for a vfio-pci driver to expose this feature through
sysfs. Presence of the feature is checked to enable these code changes.

Applied over v8.1.0-rc4.

Ankit Agrawal (4):
  vfio: new command line params for device memory NUMA nodes
  vfio: assign default values to node params
  hw/arm/virt-acpi-build: patch guest SRAT for NUMA nodes
  acpi/gpex: patch guest DSDT for dev mem information

 hw/arm/virt-acpi-build.c|  54 +
 hw/pci-host/gpex-acpi.c |  69 +
 hw/vfio/pci.c   | 146 
 hw/vfio/pci.h   |   2 +
 include/hw/pci/pci_device.h |   3 +
 5 files changed, 274 insertions(+)

-- 
2.17.1




[PATCH v1 1/4] vfio: new command line params for device memory NUMA nodes

2023-09-14 Thread ankita
From: Ankit Agrawal 

The CPU cache coherent device memory can be added as a set of
NUMA nodes distinct from the system memory nodes. The Qemu currently
do not provide a mechanism to support node creation for a vfio-pci
device.

Introduce new command line parameters to allow host admin provide
the desired starting NUMA node id (pxm-ns) and the number of such
nodes (pxm-nc) associated with the device. In this implementation,
a numerically consecutive nodes from pxm-ns to pxm-ns + pxm-nc
is created. Also validate the requested range of nodes to check
for conflict with other nodes and to ensure that the id do not cross
QEMU limit.

Since the QEMU's SRAT and DST builder code needs the proximity
domain (PXM) id range, expose PXM start and count as device object
properties.

The device driver module communicates support for such feature through
sysfs. Check the presence of the feature to activate the code.

E.g. the following argument adds 8 PXM nodes starting from id 0x10.
-device vfio-pci-nohotplug,host=,pxm-ns=0x10,pxm-nc=8

Signed-off-by: Ankit Agrawal 
---
 hw/vfio/pci.c   | 144 
 hw/vfio/pci.h   |   2 +
 include/hw/pci/pci_device.h |   3 +
 3 files changed, 149 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a205c6b113..cc0c516161 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -42,6 +42,8 @@
 #include "qapi/error.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
+#include "qapi/visitor.h"
+#include "include/hw/boards.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -2955,6 +2957,22 @@ static void vfio_register_req_notifier(VFIOPCIDevice 
*vdev)
 }
 }
 
+static void vfio_pci_get_dev_mem_pxm_start(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+{
+uint64_t pxm_start = (uintptr_t) opaque;
+visit_type_uint64(v, name, _start, errp);
+}
+
+static void vfio_pci_get_dev_mem_pxm_count(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+{
+uint64_t pxm_count = (uintptr_t) opaque;
+visit_type_uint64(v, name, _count, errp);
+}
+
 static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
 {
 Error *err = NULL;
@@ -2974,6 +2992,125 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
 vdev->req_enabled = false;
 }
 
+static int validate_dev_numa(uint32_t dev_node_start, uint32_t num_nodes)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+unsigned int i;
+
+if (num_nodes >= MAX_NODES) {
+return -EINVAL;
+}
+
+for (i = 0; i < num_nodes; i++) {
+if (ms->numa_state->nodes[dev_node_start + i].present) {
+return -EBUSY;
+}
+}
+
+return 0;
+}
+
+static int mark_dev_node_present(uint32_t dev_node_start, uint32_t num_nodes)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+unsigned int i;
+
+for (i = 0; i < num_nodes; i++) {
+ms->numa_state->nodes[dev_node_start + i].present = true;
+}
+
+return 0;
+}
+
+
+static bool vfio_pci_read_cohmem_support_sysfs(VFIODevice *vdev)
+{
+gchar *contents = NULL;
+gsize length;
+char *path;
+bool ret = false;
+uint32_t supported;
+
+path = g_strdup_printf("%s/coherent_mem", vdev->sysfsdev);
+if (g_file_get_contents(path, , , NULL) && length > 0) {
+if ((sscanf(contents, "%u", ) == 1) && supported) {
+ret = true;
+}
+}
+
+if (length) {
+g_free(contents);
+}
+g_free(path);
+
+return ret;
+}
+
+static int vfio_pci_dev_mem_probe(VFIOPCIDevice *vPciDev,
+ Error **errp)
+{
+Object *obj = NULL;
+VFIODevice *vdev = >vbasedev;
+MachineState *ms = MACHINE(qdev_get_machine());
+int ret = 0;
+uint32_t dev_node_start = vPciDev->dev_node_start;
+uint32_t dev_node_count = vPciDev->dev_nodes;
+
+if (!vdev->sysfsdev || !vfio_pci_read_cohmem_support_sysfs(vdev)) {
+ret = -ENODEV;
+goto done;
+}
+
+if (vdev->type == VFIO_DEVICE_TYPE_PCI) {
+obj = vfio_pci_get_object(vdev);
+}
+
+/* Since this device creates new NUMA node, hotplug is not supported. */
+if (!obj || DEVICE_CLASS(object_get_class(obj))->hotpluggable) {
+ret = -EINVAL;
+goto done;
+}
+
+/*
+ * This device has memory that is coherently accessible from the CPU.
+ * The memory can be represented seperate memory-only NUMA nodes.
+ */
+vPciDev->pdev.has_coherent_memory = true;
+
+/*
+ * The device can create several NUMA nodes with consecutive IDs
+ * from dev_node_start to dev_node_start + dev_node_count.
+ * Verify
+ * - whether any node ID is occupied in the desired range.
+ * - Node ID is not crossing MAX_NODE.
+ */
+ret 

[PATCH v1 4/4] acpi/gpex: patch guest DSDT for dev mem information

2023-09-14 Thread ankita
From: Ankit Agrawal 

To add the memory in the guest as NUMA nodes, it needs the PXM node index
and the total count of nodes associated with the memory. The range of
proximity domains are communicated to the VM as part of the guest ACPI
using the nvidia,gpu-mem-pxm-start and nvidia,gpu-mem-pxm-count DSD
properties. These value respectively represent the staring proximity
domain id and the count. Kernel modules can then fetch this information
and determine the numa node id using pxm_to_node().

Signed-off-by: Ankit Agrawal 
---
 hw/pci-host/gpex-acpi.c | 69 +
 1 file changed, 69 insertions(+)

diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index 7c7316bc96..0548feace1 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -49,6 +49,72 @@ static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t 
irq)
 }
 }
 
+static void acpi_dsdt_add_cohmem_device(Aml *dev, int32_t devfn,
+uint64_t dev_mem_pxm_start,
+uint64_t dev_mem_pxm_count)
+{
+Aml *memdev = aml_device("CMD%X", PCI_SLOT(devfn));
+Aml *pkg = aml_package(2);
+Aml *pkg1 = aml_package(2);
+Aml *pkg2 = aml_package(2);
+Aml *dev_pkg = aml_package(2);
+Aml *UUID;
+
+aml_append(memdev, aml_name_decl("_ADR", aml_int(PCI_SLOT(devfn) << 16)));
+
+aml_append(pkg1, aml_string("dev-mem-pxm-start"));
+aml_append(pkg1, aml_int(dev_mem_pxm_start));
+
+aml_append(pkg2, aml_string("dev-mem-pxm-count"));
+aml_append(pkg2, aml_int(dev_mem_pxm_count));
+
+aml_append(pkg, pkg1);
+aml_append(pkg, pkg2);
+
+UUID = aml_touuid("DAFFD814-6EBA-4D8C-8A91-BC9BBF4AA301");
+aml_append(dev_pkg, UUID);
+aml_append(dev_pkg, pkg);
+
+aml_append(memdev, aml_name_decl("_DSD", dev_pkg));
+aml_append(dev, memdev);
+}
+
+static void find_mem_device(PCIBus *bus, PCIDevice *pdev,
+void *opaque)
+{
+Aml *dev = (Aml *)opaque;
+
+if (bus == NULL) {
+return;
+}
+
+if (pdev->has_coherent_memory) {
+Object *po = OBJECT(pdev);
+
+if (po == NULL) {
+return;
+}
+
+uint64_t pxm_start
+   = object_property_get_uint(po, "dev_mem_pxm_start", NULL);
+uint64_t pxm_count
+   = object_property_get_uint(po, "dev_mem_pxm_count", NULL);
+
+acpi_dsdt_add_cohmem_device(dev, pdev->devfn, pxm_start, pxm_count);
+}
+}
+
+static void acpi_dsdt_find_and_add_cohmem_device(PCIBus *bus, Aml *dev)
+{
+if (bus == NULL) {
+return;
+}
+
+pci_for_each_device_reverse(bus, pci_bus_num(bus),
+find_mem_device, dev);
+
+}
+
 static void acpi_dsdt_add_pci_osc(Aml *dev)
 {
 Aml *method, *UUID, *ifctx, *ifctx1, *elsectx, *buf;
@@ -207,7 +273,10 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 
 acpi_dsdt_add_pci_route_table(dev, cfg->irq);
 
+acpi_dsdt_find_and_add_cohmem_device(cfg->bus, dev);
+
 method = aml_method("_CBA", 0, AML_NOTSERIALIZED);
+
 aml_append(method, aml_return(aml_int(cfg->ecam.base)));
 aml_append(dev, method);
 
-- 
2.17.1




[PATCH v1 3/4] hw/arm/virt-acpi-build: patch guest SRAT for NUMA nodes

2023-09-14 Thread ankita
From: Ankit Agrawal 

During bootup, Linux kernel parse the ACPI SRAT to determine the PXM ids.
This allows for the creation of NUMA nodes for each unique id.

Insert a series of the unique PXM ids in the VM SRAT ACPI table. The
range of nodes can be determined from the "dev_mem_pxm_start" and
"dev_mem_pxm_count" object properties associated with the device. These
nodes as made MEM_AFFINITY_HOTPLUGGABLE. This allows the kernel to create
memory-less NUMA nodes on bootup to which a subrange (or entire range) of
device memory can be added/removed.

Signed-off-by: Ankit Agrawal 
---
 hw/arm/virt-acpi-build.c | 54 
 1 file changed, 54 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 6b674231c2..6d1e3b6b8a 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -46,6 +46,7 @@
 #include "hw/acpi/hmat.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
+#include "hw/vfio/pci.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/gpex.h"
 #include "hw/arm/virt.h"
@@ -515,6 +516,57 @@ build_spcr(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 acpi_table_end(linker, );
 }
 
+static int devmem_device_list(Object *obj, void *opaque)
+{
+GSList **list = opaque;
+
+if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
+*list = g_slist_append(*list, DEVICE(obj));
+}
+
+object_child_foreach(obj, devmem_device_list, opaque);
+return 0;
+}
+
+static GSList *devmem_get_device_list(void)
+{
+GSList *list = NULL;
+
+object_child_foreach(qdev_get_machine(), devmem_device_list, );
+return list;
+}
+
+static void build_srat_devmem(GArray *table_data)
+{
+GSList *device_list, *list = devmem_get_device_list();
+
+for (device_list = list; device_list; device_list = device_list->next) {
+DeviceState *dev = device_list->data;
+Object *obj = OBJECT(dev);
+VFIOPCIDevice *pcidev
+= ((VFIOPCIDevice *)object_dynamic_cast(OBJECT(obj),
+   TYPE_VFIO_PCI));
+
+if (pcidev->pdev.has_coherent_memory) {
+uint64_t start_node = object_property_get_uint(obj,
+  "dev_mem_pxm_start", _abort);
+uint64_t node_count = object_property_get_uint(obj,
+  "dev_mem_pxm_count", _abort);
+uint64_t node_index;
+
+/*
+ * Add the node_count PXM domains starting from start_node as
+ * hot pluggable. The VM kernel parse the PXM domains and
+ * creates NUMA nodes.
+ */
+for (node_index = 0; node_index < node_count; node_index++)
+build_srat_memory(table_data, 0, 0, start_node + node_index,
+MEM_AFFINITY_ENABLED | MEM_AFFINITY_HOTPLUGGABLE);
+}
+}
+g_slist_free(list);
+}
+
 /*
  * ACPI spec, Revision 5.1
  * 5.2.16 System Resource Affinity Table (SRAT)
@@ -569,6 +621,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
 }
 
+build_srat_devmem(table_data);
+
 acpi_table_end(linker, );
 }
 
-- 
2.17.1




[PATCH v1 2/4] vfio: assign default values to node params

2023-09-14 Thread ankita
From: Ankit Agrawal 

It may be desirable for some deployments to have QEMU automatically
pick a range and create the NUMA nodes. So the admin need not care
about passing any additional params. Another advantage is that the
feature is not dependent on newer libvirt that support the new
parameters pxm-ns and pxm-nc.

Assign default values to pxm-ns (first available node) and pxm-nc (8).
This makes the new params optional and the feature will work on older
libvirt.

Signed-off-by: Ankit Agrawal 
---
 hw/vfio/pci.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index cc0c516161..0bba161172 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3053,8 +3053,10 @@ static int vfio_pci_dev_mem_probe(VFIOPCIDevice *vPciDev,
 VFIODevice *vdev = >vbasedev;
 MachineState *ms = MACHINE(qdev_get_machine());
 int ret = 0;
-uint32_t dev_node_start = vPciDev->dev_node_start;
-uint32_t dev_node_count = vPciDev->dev_nodes;
+uint32_t dev_node_start = vPciDev->dev_node_start ?
+  vPciDev->dev_node_start :
+  ms->numa_state->num_nodes;
+uint32_t dev_node_count = vPciDev->dev_nodes ? vPciDev->dev_nodes : 8;
 
 if (!vdev->sysfsdev || !vfio_pci_read_cohmem_support_sysfs(vdev)) {
 ret = -ENODEV;
-- 
2.17.1




Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem

2023-09-14 Thread Wang, Lei
On 9/14/2023 11:50, Xiaoyao Li wrote:
> From: Chao Peng 
> 
> Add KVM gmem support to RAMBlock so both normal hva based memory
> and kvm gmem fd based private memory can be associated in one RAMBlock.
> 
> Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
> gmem for the RAMBlock when it's set.
> 
> Signed-off-by: Xiaoyao Li 

Kindly reminding the author's Signed-off-by is missing.

> ---
>  accel/kvm/kvm-all.c | 17 +
>  include/exec/memory.h   |  3 +++
>  include/exec/ramblock.h |  1 +
>  include/sysemu/kvm.h|  2 ++
>  softmmu/physmem.c   | 18 +++---
>  5 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 60aacd925393..185ae16d9620 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -4225,3 +4225,20 @@ void query_stats_schemas_cb(StatsSchemaList **result, 
> Error **errp)
>  query_stats_schema_vcpu(first_cpu, _args);
>  }
>  }
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> +{
> +int fd;
> +struct kvm_create_guest_memfd gmem = {
> +.size = size,
> +/* TODO: to decide whether KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is 
> supported */
> +.flags = flags,
> +};
> +
> +fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, );
> +if (fd < 0) {
> +error_setg_errno(errp, errno, "%s: error creating kvm gmem\n", 
> __func__);
> +}
> +
> +return fd;
> +}
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 68284428f87c..227cb2578e95 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -235,6 +235,9 @@ typedef struct IOMMUTLBEvent {
>  /* RAM is an mmap-ed named file */
>  #define RAM_NAMED_FILE (1 << 9)
>  
> +/* RAM can be private that has kvm gmem backend */
> +#define RAM_KVM_GMEM(1 << 10)
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> IOMMUNotifierFlag flags,
> hwaddr start, hwaddr end,
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 69c6a5390293..0d158b3909c9 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -41,6 +41,7 @@ struct RAMBlock {
>  QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
>  int fd;
>  uint64_t fd_offset;
> +int gmem_fd;
>  size_t page_size;
>  /* dirty bitmap used during migration */
>  unsigned long *bmap;
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 115f0cca79d1..f5b74c8dd8c5 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -580,4 +580,6 @@ bool kvm_arch_cpu_check_are_resettable(void);
>  bool kvm_dirty_ring_enabled(void);
>  
>  uint32_t kvm_dirty_ring_size(void);
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
>  #endif
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3df73542e1fe..2d98a88f41f0 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1824,6 +1824,16 @@ static void ram_block_add(RAMBlock *new_block, Error 
> **errp)
>  }
>  }
>  
> +if (kvm_enabled() && new_block->flags & RAM_KVM_GMEM &&
> +new_block->gmem_fd < 0) {
> +new_block->gmem_fd = kvm_create_guest_memfd(new_block->max_length,
> +0, errp);
> +if (new_block->gmem_fd < 0) {
> +qemu_mutex_unlock_ramlist();
> +return;
> +}
> +}
> +
>  new_ram_size = MAX(old_ram_size,
>(new_block->offset + new_block->max_length) >> 
> TARGET_PAGE_BITS);
>  if (new_ram_size > old_ram_size) {
> @@ -1885,7 +1895,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
> MemoryRegion *mr,
>  
>  /* Just support these ram flags by now. */
>  assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
> -  RAM_PROTECTED | RAM_NAMED_FILE)) == 0);
> +  RAM_PROTECTED | RAM_NAMED_FILE | RAM_KVM_GMEM)) == 
> 0);
>  
>  if (xen_enabled()) {
>  error_setg(errp, "-mem-path not supported with Xen");
> @@ -1920,6 +1930,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
> MemoryRegion *mr,
>  new_block->used_length = size;
>  new_block->max_length = size;
>  new_block->flags = ram_flags;
> +new_block->gmem_fd = -1;
>  new_block->host = file_ram_alloc(new_block, size, fd, readonly,
>   !file_size, offset, errp);
>  if (!new_block->host) {
> @@ -1978,7 +1989,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, 
> ram_addr_t max_size,
>  Error *local_err = NULL;
>  
>  assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
> -  RAM_NORESERVE)) == 0);
> +  RAM_NORESERVE| RAM_KVM_GMEM)) == 0);
>  assert(!host ^ (ram_flags & RAM_PREALLOC));
>  
>

Re: QEMU migration-test CI intermittent failure

2023-09-14 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Sep 14, 2023 at 07:54:17PM -0300, Fabiano Rosas wrote:
>> Fabiano Rosas  writes:
>> 
>> > Peter Xu  writes:
>> >
>> >> On Thu, Sep 14, 2023 at 12:57:08PM -0300, Fabiano Rosas wrote:
>> >>> I managed to reproduce it. It's not the return path error. In hindsight
>> >>> that's obvious because that error happens in the 'recovery' test and this
>> >>> one in the 'plain' one. Sorry about the noise.
>> >>
>> >> No worry.  It's good to finally identify that.
>> >>
>> >>> 
>> >>> This one reproduced with just 4 iterations of preempt/plain. I'll
>> >>> investigate.
>> >
>> > It seems that we're getting a tcp disconnect (ECONNRESET) on when doing
>> > that shutdown() on postcopy_qemufile_src. The one from commit 6621883f93
>> > ("migration: Fix potential race on postcopy_qemufile_src").
>> >
>> > I'm trying to determine why that happens when other times it just
>> > returns 0 as expected.
>> >
>> > Could this mean that we're kicking the dest too soon while it is still
>> > receiving valid data?
>> 
>> Looking a bit more into this, what's happening is that
>> postcopy_ram_incoming_cleanup() is shutting the postcopy_qemufile_dst
>> while ram_load_postcopy() is still running.
>> 
>> The postcopy_ram_listen_thread() function waits for the
>> main_thread_load_event, but that only works when not using preempt. With
>> the preempt thread, the event is set right away and we proceed to do the
>> cleanup without waiting.
>> 
>> So the assumption of commit 6621883f93 that the incoming side knows when
>> it has finished migrating is wrong IMO. Without the EOS we're relying on
>> the chance that the shutdown() happens after the last recvmsg has
>> returned and not during it.
>> 
>> Peter, what do you think?
>
> That's a good point.
>
> One thing to verify that (sorry, I still cannot reproduce it myself, which
> is so weirdly... it seems loads won't really help reproducing this) is to
> let the main thread wait for all requested pages to arrive:
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 29aea9456d..df055c51ea 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -597,6 +597,12 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState 
> *mis)
>  trace_postcopy_ram_incoming_cleanup_entry();
>  
>  if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
> +/*
> + * NOTE!  it's possible that the preempt thread is still handling
> + * the last pages to arrive which were requested by faults.  Making
> + * sure nothing is left behind.
> + */
> +while (qatomic_read(>page_requested_count));
>  /* Notify the fast load thread to quit */
>  mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
>  if (mis->postcopy_qemufile_dst) {
>
> If that can work solidly, we can figure out a better way than a dead loop
> here.

Yep, 2000+ iterations so far and no error.

Should we add something that makes ram_load_postcopy return once it's
finished? Then this code could just set PREEMPT_THREAD_QUIT and join the
preempt thread.



Re: [PATCH 4/4] target/ppc: Add migration support for BHRB

2023-09-14 Thread Nicholas Piggin
On Wed Sep 13, 2023 at 6:25 AM AEST, Glenn Miles wrote:
> Adds migration support for Branch History Rolling
> Buffer (BHRB) internal state.
>
> Signed-off-by: Glenn Miles 
> ---
>  target/ppc/machine.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index b195fb4dc8..89146969c8 100644
> --- a/target/ppc/machine.c
> +++ b/target/ppc/machine.c
> @@ -314,6 +314,7 @@ static int cpu_post_load(void *opaque, int version_id)
>  
>  if (tcg_enabled()) {
>  pmu_mmcr01a_updated(env);
> +hreg_bhrb_filter_update(env);
>  }
>  
>  return 0;
> @@ -670,6 +671,27 @@ static const VMStateDescription vmstate_compat = {
>  }
>  };
>  
> +#ifdef TARGET_PPC64
> +static bool bhrb_needed(void *opaque)
> +{
> +PowerPCCPU *cpu = opaque;
> +return (cpu->env.flags & POWERPC_FLAG_BHRB) != 0;
> +}
> +
> +static const VMStateDescription vmstate_bhrb = {
> +.name = "cpu/bhrb",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = bhrb_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINTTL(env.bhrb_num_entries, PowerPCCPU),

Maybe don't need bhrb_num_entries since target machine should have the
same?

> +VMSTATE_UINTTL(env.bhrb_offset, PowerPCCPU),
> +VMSTATE_UINT64_ARRAY(env.bhrb, PowerPCCPU, BHRB_MAX_NUM_ENTRIES),

Is it possible to migrate only bhrb_num_entries items? Wants a VARRAY
AFAIKS but there is no VARRAY_UINT64?

Since all sizes are the same 32 now, would it be possible to turn it
into a VARRAY sometime later if supposing a new CPU changed to a
different size, and would the wire format for the VARRAY still be
compatible with this fixed size array, or does a VARRAY look different
I wonder?

Thanks,
Nick



Re: [PATCH 3/4] target/ppc: Add clrbhrb and mfbhrbe instructions

2023-09-14 Thread Nicholas Piggin
On Wed Sep 13, 2023 at 6:24 AM AEST, Glenn Miles wrote:
> Add support for the clrbhrb and mfbhrbe instructions.
>
> Since neither instruction is believed to be critical to
> performance, both instructions were implemented using helper
> functions.
>
> Access to both instructions is controlled by bits in the
> HFSCR (for privileged state) and MMCR0 (for problem state).
> A new function, helper_mmcr0_facility_check, was added for
> checking MMCR0[BHRBA] and raising a facility_unavailable exception
> if required.
>
> Signed-off-by: Glenn Miles 
> ---
>  target/ppc/cpu.h |  1 +
>  target/ppc/helper.h  |  4 
>  target/ppc/misc_helper.c | 43 
>  target/ppc/translate.c   | 13 
>  4 files changed, 61 insertions(+)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index bda1afb700..ee81ede4ee 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -541,6 +541,7 @@ FIELD(MSR, LE, MSR_LE, 1)
>  
>  /* HFSCR bits */
>  #define HFSCR_MSGP PPC_BIT(53) /* Privileged Message Send Facilities */
> +#define HFSCR_BHRB PPC_BIT(59) /* BHRB Instructions */
>  #define HFSCR_IC_MSGP  0xA
>  
>  #define DBCR0_ICMP (1 << 27)
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 1a3d9a7e57..bbc32ff114 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -816,3 +816,7 @@ DEF_HELPER_4(DSCLIQ, void, env, fprp, fprp, i32)
>  
>  DEF_HELPER_1(tbegin, void, env)
>  DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> +
> +DEF_HELPER_1(clrbhrb, void, env)
> +DEF_HELPER_FLAGS_2(mfbhrbe, TCG_CALL_NO_WG, i64, env, i32)
> +
> diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
> index 692d058665..45abe04f66 100644
> --- a/target/ppc/misc_helper.c
> +++ b/target/ppc/misc_helper.c
> @@ -139,6 +139,17 @@ void helper_fscr_facility_check(CPUPPCState *env, 
> uint32_t bit,
>  #endif
>  }
>  
> +static void helper_mmcr0_facility_check(CPUPPCState *env, uint32_t bit,
> + uint32_t sprn, uint32_t cause)
> +{
> +#ifdef TARGET_PPC64
> +if (FIELD_EX64(env->msr, MSR, PR) &&
> +!(env->spr[SPR_POWER_MMCR0] & (1ULL << bit))) {
> +raise_fu_exception(env, bit, sprn, cause, GETPC());
> +}
> +#endif
> +}
> +
>  void helper_msr_facility_check(CPUPPCState *env, uint32_t bit,
> uint32_t sprn, uint32_t cause)
>  {
> @@ -351,3 +362,35 @@ void helper_fixup_thrm(CPUPPCState *env)
>  env->spr[i] = v;
>  }
>  }
> +
> +void helper_clrbhrb(CPUPPCState *env)
> +{
> +helper_hfscr_facility_check(env, HFSCR_BHRB, "clrbhrb", FSCR_IC_BHRB);
> +
> +helper_mmcr0_facility_check(env, MMCR0_BHRBA, 0, FSCR_IC_BHRB);

Repeating the comment about MMCR0_BHRBA and PPC_BIT_NR discrepancy here
for posterity.

> +
> +memset(env->bhrb, 0, sizeof(env->bhrb));
> +}
> +
> +uint64_t helper_mfbhrbe(CPUPPCState *env, uint32_t bhrbe)
> +{
> +unsigned int index;
> +
> +helper_hfscr_facility_check(env, HFSCR_BHRB, "mfbhrbe", FSCR_IC_BHRB);
> +
> +helper_mmcr0_facility_check(env, MMCR0_BHRBA, 0, FSCR_IC_BHRB);
> +
> +if ((bhrbe >= env->bhrb_num_entries) ||
> +   (env->spr[SPR_POWER_MMCR0] & MMCR0_PMAE)) {

Nitpick, but multi line statment starts again inside the first
parenthesis after a keyword like this.

> +return 0;
> +}
> +
> +/*
> + * Note: bhrb_offset is the byte offset for writing the
> + * next entry (over the oldest entry), which is why we
> + * must offset bhrbe by 1 to get to the 0th entry.
> + */
> +index = ((env->bhrb_offset / sizeof(uint64_t)) - (bhrbe + 1)) %
> +env->bhrb_num_entries;
> +return env->bhrb[index];
> +}
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 7824475f54..b330871793 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6549,12 +6549,25 @@ static void gen_brh(DisasContext *ctx)
>  }
>  #endif
>  
> +static void gen_clrbhrb(DisasContext *ctx)
> +{
> +gen_helper_clrbhrb(cpu_env);
> +}
> +
> +static void gen_mfbhrbe(DisasContext *ctx)
> +{
> +TCGv_i32 bhrbe = tcg_constant_i32(_SPR(ctx->opcode));
> +gen_helper_mfbhrbe(cpu_gpr[rD(ctx->opcode)], cpu_env, bhrbe);
> +}
> +
>  static opcode_t opcodes[] = {
>  #if defined(TARGET_PPC64)
>  GEN_HANDLER_E(brd, 0x1F, 0x1B, 0x05, 0xF801, PPC_NONE, PPC2_ISA310),
>  GEN_HANDLER_E(brw, 0x1F, 0x1B, 0x04, 0xF801, PPC_NONE, PPC2_ISA310),
>  GEN_HANDLER_E(brh, 0x1F, 0x1B, 0x06, 0xF801, PPC_NONE, PPC2_ISA310),
>  #endif
> +GEN_HANDLER_E(clrbhrb, 0x1F, 0x0E, 0x0D, 0x3FFF801, PPC_NONE, PPC2_ISA207S),
> +GEN_HANDLER_E(mfbhrbe, 0x1F, 0x0E, 0x09, 0x001, PPC_NONE, PPC2_ISA207S),

How much of a pain would it be to add it as decodetree? If there is an
addition a family of existing instrutions here it makes sense to add it
here, for new family would be nice to use decodetree.

I think they're only supported in 64-bit ISA so it could be ifdef
TARGET_PPC64.

Thanks,

Re: [RFC PATCH v2 00/21] QEMU gmem implemention

2023-09-14 Thread Sean Christopherson
On Thu, Sep 14, 2023, David Hildenbrand wrote:
> On 14.09.23 05:50, Xiaoyao Li wrote:
> > It's the v2 RFC of enabling KVM gmem[1] as the backend for private
> > memory.
> > 
> > For confidential-computing, KVM provides gmem/guest_mem interfaces for
> > userspace, like QEMU, to allocate user-unaccesible private memory. This
> > series aims to add gmem support in QEMU's RAMBlock so that each RAM can
> > have both hva-based shared memory and gmem_fd based private memory. QEMU
> > does the shared-private conversion on KVM_MEMORY_EXIT and discards the
> > memory.
> > 
> > It chooses the design that adds "private" property to hostmeory backend.
> > If "private" property is set, QEMU will allocate/create KVM gmem when
> > initialize the RAMbloch of the memory backend.
> > 
> > This sereis also introduces the first user of kvm gmem,
> > KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
> > can be created with
> > 
> >$qemu -object sw-protected-vm,id=sp-vm0 \
> > -object memory-backend-ram,id=mem0,size=1G,private=on \
> > -machine 
> > q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0
> >  \
> > ...
> > 
> > Unfortunately this patch series fails the boot of OVMF at very early
> > stage due to triple fault, because KVM doesn't support emulating string IO
> > to private memory.
> 
> Is support being added? Or have we figured out what it would take to make it
> work?

Hrm, this isn't something I've thought deeply about.  The issue is that anything
that reaches any form of copy_{from,to}_user() will go kablooie because KVM will
always try to read/write the shared mappings.  The best case scenario is that 
the
shared mapping is invalid and the uaccess faults.  The worst case scenario is
that KVM read/writes the wrong memory and sends the guest into the weeds.  Eww.

And we (well, at least I) definitely want to support this so that gmem can be
used for "regular" VMs, i.e. for VMs where userspace is in the TCB, but for 
which
userspace doesn't have access to guest memory by default.

It shouldn't be too hard to support.  It's easy enough to wire up the hook
(thankfully that aren't _that_ many sites), and gmem only supports struct page 
at
the moment so we go straight to kmap.  E.g. something like this

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 54480655bcce..b500b0ce5ce3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3291,12 +3291,15 @@ static int next_segment(unsigned long len, int offset)
return len;
 }
 
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-void *data, int offset, int len)
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+gfn_t gfn, void *data, int offset, int len)
 {
int r;
unsigned long addr;
 
+   if (kvm_mem_is_private(kvm, gfn))
+   return kvm_gmem_read(slot, gfn, data, offset, len);
+
addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
if (kvm_is_error_hva(addr))
return -EFAULT;
@@ -3309,9 +3312,8 @@ static int __kvm_read_guest_page(struct kvm_memory_slot 
*slot, gfn_t gfn,
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int len)
 {
-   struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
-
-   return __kvm_read_guest_page(slot, gfn, data, offset, len);
+   return __kvm_read_guest_page(kvm, gfn_to_memslot(kvm, gfn), gfn, data,
+offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
 
@@ -3320,7 +3322,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t 
gfn, void *data,
 {
struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-   return __kvm_read_guest_page(slot, gfn, data, offset, len);
+   return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
 
> > 2. hugepage support.
> > 
> > KVM gmem can be allocated from hugetlbfs. How does QEMU determine

Not yet it can't.  gmem only supports THP, hugetlbfs is a future thing, if it's
ever supported.  I wouldn't be at all surprised if we end up going down a 
slightly
different route and don't use hugetlbfs directly.

> > when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
> > easiest solution is create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
> > only when memory backend is HostMemoryBackendFile of hugetlbfs.
> 
> Good question.
> 
> Probably "if the memory backend uses huge pages, also use huge pages for the
> private gmem" makes sense.
> 
> ... but it becomes a mess with preallocation ... which is what people should
> actually be using with hugetlb. Andeventual double memory-consumption ...
> but maybe that's all been taken care of already?
> 
> Probably it's best to leave hugetlb support as future work and start 

Re: [PATCH 2/4] target/ppc: Add recording of taken branches to BHRB

2023-09-14 Thread Nicholas Piggin
On Wed Sep 13, 2023 at 6:24 AM AEST, Glenn Miles wrote:
> This commit continues adding support for the Branch History
> Rolling Buffer (BHRB) as is provided starting with the P8
> processor and continuing with its successors.  This commit
> is limited to the recording and filtering of taken branches.
>
> The following changes were made:
>
>   - Added a BHRB buffer for storing branch instruction and
> target addresses for taken branches
>   - Renamed gen_update_cfar to gen_update_branch_history and
> added a 'target' parameter to hold the branch target
> address and 'inst_type' parameter to use for filtering
>   - Added a combination of jit-time and run-time checks to
> gen_update_branch_history for determining if a branch
> should be recorded
>   - Added TCG code to gen_update_branch_history that stores
> data to the BHRB and updates the BHRB offset.
>   - Added BHRB resource initialization and reset functions
>   - Enabled functionality for P8, P9 and P10 processors.
>
> Signed-off-by: Glenn Miles 
> ---
>  target/ppc/cpu.h   |  18 +++-
>  target/ppc/cpu_init.c  |  41 -
>  target/ppc/helper_regs.c   |  32 +++
>  target/ppc/helper_regs.h   |   1 +
>  target/ppc/power8-pmu.c|   2 +
>  target/ppc/power8-pmu.h|   7 ++
>  target/ppc/translate.c | 114 +++--
>  target/ppc/translate/branch-impl.c.inc |   2 +-
>  8 files changed, 205 insertions(+), 12 deletions(-)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 20ae1466a5..bda1afb700 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -454,8 +454,9 @@ FIELD(MSR, LE, MSR_LE, 1)
>  #define MMCR2_UREG_MASK (MMCR2_FC1P0 | MMCR2_FC2P0 | MMCR2_FC3P0 | \
>   MMCR2_FC4P0 | MMCR2_FC5P0 | MMCR2_FC6P0)
>  
> -#define MMCRA_BHRBRDPPC_BIT(26)/* BHRB Recording Disable */
> -
> +#define MMCRA_BHRBRDPPC_BIT(26) /* BHRB Recording Disable */

Fold this tidying into patch 1.

> +#define MMCRA_IFM_MASK  PPC_BITMASK(32, 33) /* BHRB Instruction Filtering */
> +#define MMCRA_IFM_SHIFT PPC_BIT_NR(33)
>  
>  #define MMCR1_EVT_SIZE 8
>  /* extract64() does a right shift before extracting */
> @@ -682,6 +683,8 @@ enum {
>  POWERPC_FLAG_SMT  = 0x0040,
>  /* Using "LPAR per core" mode  (as opposed to per-thread)
> */
>  POWERPC_FLAG_SMT_1LPAR = 0x0080,
> +/* Has BHRB */
> +POWERPC_FLAG_BHRB  = 0x0100,
>  };

Interesting question of which patch to add different flags. I'm
strongly in add when you add code that uses them like this one,
but it's a matter of taste and not always practical to be an
absolute rule. I don't mind too much what others do, but maybe
this and the pcc->flags init should go in patch 1 since that's adding
flags that aren't yet used?

>  
>  /*
> @@ -1110,6 +1113,9 @@ DEXCR_ASPECT(PHIE, 6)
>  #define PPC_CPU_OPCODES_LEN  0x40
>  #define PPC_CPU_INDIRECT_OPCODES_LEN 0x20
>  
> +#define BHRB_MAX_NUM_ENTRIES_LOG2 (5)
> +#define BHRB_MAX_NUM_ENTRIES  (1 << BHRB_MAX_NUM_ENTRIES_LOG2)
> +
>  struct CPUArchState {
>  /* Most commonly used resources during translated code execution first */
>  target_ulong gpr[32];  /* general purpose registers */
> @@ -1196,6 +1202,14 @@ struct CPUArchState {
>  int dcache_line_size;
>  int icache_line_size;
>  
> +/* Branch History Rolling Buffer (BHRB) resources */
> +target_ulong bhrb_num_entries;
> +target_ulong bhrb_base;
> +target_ulong bhrb_filter;
> +target_ulong bhrb_offset;
> +target_ulong bhrb_offset_mask;
> +uint64_t bhrb[BHRB_MAX_NUM_ENTRIES];

Put these under ifdef TARGET_PPC64?

> +
>  /* These resources are used during exception processing */
>  /* CPU model definition */
>  target_ulong msr_mask;
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 568f9c3b88..19d7505a73 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -6100,6 +6100,28 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
>  pcc->l1_icache_size = 0x8000;
>  }
>  
> +static void bhrb_init_state(CPUPPCState *env, target_long num_entries_log2)
> +{
> +if (env->flags & POWERPC_FLAG_BHRB) {
> +if (num_entries_log2 > BHRB_MAX_NUM_ENTRIES_LOG2) {
> +num_entries_log2 = BHRB_MAX_NUM_ENTRIES_LOG2;
> +}
> +env->bhrb_num_entries = 1 << num_entries_log2;
> +env->bhrb_base = (target_long)>bhrb[0];
> +env->bhrb_offset_mask = (env->bhrb_num_entries * sizeof(uint64_t)) - 
> 1;
> +}
> +}
> +
> +static void bhrb_reset_state(CPUPPCState *env)
> +{
> +if (env->flags & POWERPC_FLAG_BHRB) {
> +env->bhrb_offset = 0;
> +env->bhrb_filter = 0;
> +memset(env->bhrb, 0, sizeof(env->bhrb));
> +}
> +}
> +
> +#define POWER8_BHRB_ENTRIES_LOG2 5
>  static void init_proc_POWER8(CPUPPCState *env)
>  {
>  /* 

Re: [PATCH 1/4] target/ppc: Add new hflags to support BHRB

2023-09-14 Thread Nicholas Piggin
On Wed Sep 13, 2023 at 6:23 AM AEST, Glenn Miles wrote:
> This commit is preparatory to the addition of Branch History
> Rolling Buffer (BHRB) functionality, which is being provided
> today starting with the P8 processor.
>
> BHRB uses several SPR register fields to control whether or not
> a branch instruction's address (and sometimes target address)
> should be recorded.  Checking each of these fields with each
> branch instruction using jitted code would lead to a significant
> decrease in performance.
>
> Therefore, it was decided that BHRB configuration bits that are
> not expected to change frequently should have their state stored in
> hflags so that the amount of checking done by jitted code can
> be reduced.
>
> This commit contains the changes for storing the state of the
> following register fields as hflags:
>
>   MMCR0[FCP] - Determines if BHRB recording is frozen in the
>  problem state
>
>   MMCR0[FCPC] - A modifier for MMCR0[FCP]
>
>   MMCRA[BHRBRD] - Disables all BHRB recording for a thread
>
> Signed-off-by: Glenn Miles 
> ---
>  target/ppc/cpu.h |  9 +
>  target/ppc/cpu_init.c|  4 ++--
>  target/ppc/helper.h  |  1 +
>  target/ppc/helper_regs.c | 12 
>  target/ppc/machine.c |  2 +-
>  target/ppc/power8-pmu-regs.c.inc |  5 +
>  target/ppc/power8-pmu.c  | 15 +++
>  target/ppc/power8-pmu.h  |  4 ++--
>  target/ppc/spr_common.h  |  1 +
>  target/ppc/translate.c   |  6 ++
>  10 files changed, 50 insertions(+), 9 deletions(-)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 25fac9577a..20ae1466a5 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -439,6 +439,9 @@ FIELD(MSR, LE, MSR_LE, 1)
>  #define MMCR0_FC56   PPC_BIT(59) /* PMC Freeze Counters 5-6 bit */
>  #define MMCR0_PMC1CE PPC_BIT(48) /* MMCR0 PMC1 Condition Enabled */
>  #define MMCR0_PMCjCE PPC_BIT(49) /* MMCR0 PMCj Condition Enabled */
> +#define MMCR0_BHRBA  PPC_BIT_NR(42)  /* BHRB Available */

It's confusing to use NR for this. Either call it MMCR0_BHRBA_NR or have
the facility check in patch 3 take the bit value. I'd move it to patch 3
too.

> +#define MMCR0_FCPPPC_BIT(34) /* Freeze Counters/BHRB if PR=1 */
> +#define MMCR0_FCPC   PPC_BIT(51) /* Condition for FCP bit */
>  /* MMCR0 userspace r/w mask */
>  #define MMCR0_UREG_MASK (MMCR0_FC | MMCR0_PMAO | MMCR0_PMAE)
>  /* MMCR2 userspace r/w mask */
> @@ -451,6 +454,9 @@ FIELD(MSR, LE, MSR_LE, 1)
>  #define MMCR2_UREG_MASK (MMCR2_FC1P0 | MMCR2_FC2P0 | MMCR2_FC3P0 | \
>   MMCR2_FC4P0 | MMCR2_FC5P0 | MMCR2_FC6P0)
>  
> +#define MMCRA_BHRBRDPPC_BIT(26)/* BHRB Recording Disable */
> +
> +
>  #define MMCR1_EVT_SIZE 8
>  /* extract64() does a right shift before extracting */
>  #define MMCR1_PMC1SEL_START 32
> @@ -703,6 +709,9 @@ enum {
>  HFLAGS_PMCJCE = 17, /* MMCR0 PMCjCE bit */
>  HFLAGS_PMC_OTHER = 18, /* PMC other than PMC5-6 is enabled */
>  HFLAGS_INSN_CNT = 19, /* PMU instruction count enabled */
> +HFLAGS_FCPC = 20,   /* MMCR0 FCPC bit */
> +HFLAGS_FCP = 21,/* MMCR0 FCP bit */
> +HFLAGS_BHRBRD = 22, /* MMCRA BHRBRD bit */
>  HFLAGS_VSX = 23, /* MSR_VSX if cpu has VSX */
>  HFLAGS_VR = 25,  /* MSR_VR if cpu has VRE */

hflags are an interesting tradeoff. You can specialise some code but
at the cost of duplicating your jit footprint, which is often the
most costly thing. The ideal hflag is one where code is not shared
between flag set/clear like PR and HV. Rarely used features is another
good one, that BHRB falls into.

But, we do want flags that carry stronger or more direct semantics
wrt code generation because you want to avoid redundant hflags values
that result in the same code generation. I might have missed something
but AFAIKS BHRB_ENABLED could be a combination of this logic (from
later patch):

+/* ISA 3.1 adds the PMCRA[BRHBRD] and problem state checks */
+if ((ctx->insns_flags2 & PPC2_ISA310) && (ctx->mmcra_bhrbrd || !ctx->pr)) {
+return;
+}
+
+/* Check for BHRB "frozen" conditions */
+if (ctx->mmcr0_fcpc) {
+if (ctx->mmcr0_fcp) {
+if ((ctx->hv) && (ctx->pr)) {
+return;
+}
+} else if (!(ctx->hv) && (ctx->pr)) {
+return;
+}
+} else if ((ctx->mmcr0_fcp) && (ctx->pr)) {
+return;
+}

Otherwise the patch looks good to me.

Thanks,
Nick




Re: [RFC PATCH 0/8] i386/sev: Use C API of Rust SEV library

2023-09-14 Thread Tyler Fanelli

On 9/14/23 3:04 PM, Philippe Mathieu-Daudé wrote:

Hi Tyler,

On 14/9/23 19:58, Tyler Fanelli wrote:

These patches are submitted as an RFC mainly because I'm a relative
newcomer to QEMU with no knowledge of the community's views on
including Rust code, nor it's preference of using library APIs for
ioctls that were previously implemented in QEMU directly.

Recently, the Rust sev library [0] has introduced a C API to take
advantage of the library outside of Rust.

Should the inclusion of the library as a dependency be desired, it can
be extended further to include the firmware/platform ioctls, the
attestation report fetching, and more. This would result in much of
the AMD-SEV portion of QEMU being offloaded to the library.

This series looks to explore the possibility of using the library and
show a bit of what it would look like. I'm looking for comments
regarding if this feature is desired.

[0] https://github.com/virtee/sev

Tyler Fanelli (8):
   Add SEV Rust library as dependency with CONFIG_SEV
   i386/sev: Replace INIT and ES_INIT ioctls with sev library 
equivalents

   i386/sev: Replace LAUNCH_START ioctl with sev library equivalent
   i386/sev: Replace UPDATE_DATA ioctl with sev library equivalent
   i386/sev: Replace LAUNCH_UPDATE_VMSA ioctl with sev library 
equivalent

   i386/sev: Replace LAUNCH_MEASURE ioctl with sev library equivalent
   i386/sev: Replace LAUNCH_SECRET ioctl with sev library equivalent
   i386/sev: Replace LAUNCH_FINISH ioctl with sev library equivalent


There is still one ioctl use, GET_ATTESTATION_REPORT. No libsev
equivalent for this one yet?

There is an equivalent, however the machine that I'm using currently 
hangs when trying to fetch an attestation report (not a libsev issue, as 
it hangs when I try with latest qemu release as well). When I can either 
update its firmware or get access to another SEV machine, I can test and 
confirm it behaves as intended with the libsev API. Once this is done, I 
can add that API to the patch series.



Tyler




Re: [PATCH] tests/avocado: Fix console data loss

2023-09-14 Thread Nicholas Piggin
On Wed Sep 13, 2023 at 6:51 PM AEST, Alex Bennée wrote:
>
> Nicholas Piggin  writes:
>
> > Occasionally some avocado tests will fail waiting for console line
> > despite the machine running correctly. Console data goes missing, as can
> > be seen in the console log. This is due to _console_interaction calling
> > makefile() on the console socket each time it is invoked, which must be
> > losing old buffer contents when going out of scope.
> >
> > It is not enough to makefile() with buffered=0. That helps significantly
> > but data loss is still possible. My guess is that readline() has a line
> > buffer even when the file is in unbuffered mode, that can eat data.
> >
> > Fix this by providing a console file that persists for the life of the
> > console.
> >
> > Signed-off-by: Nicholas Piggin 
>
> Queued to testing/next, thanks.
>
> > ---
> >
> > For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
> > was flakey for me due to this bug. I don't know why that in particular,
> > 3 calls to wait_for_console_pattern probably helps.
> >
> > I didn't pinpoint when the bug was introduced because the original
> > was probably not buggy because it was only run once at the end of the
> > test. At some point after it was moved to common code, something would
> > have started to call it more than once which is where potential for bug
> > is introduced.
>
> There is a sprawling mass somewhere between:
>
>   - pythons buffering of IO
>   - device models dropping chars when blocked
>   - noisy tests with competing console output
>
> that adds up to unreliable tests that rely on seeing certain patterns on
> the console. 

Yeah it's a tricky bug and a difficult stack to diagnose. I started to
look at 40p machine firmware console at first since it was happening on
there.

It's actually not too bad now, I was irritating it by putting delays in
various avocado console socket reading, which can trigger it easily (my
guess is due to delay allowing file buffer to pull in more data than is
consumed). With patch the only check-avocado failures I was getting was
some OS watchdog timeouts in their console print code caused by back
pressure.

Thanks,
Nick



Re: QEMU migration-test CI intermittent failure

2023-09-14 Thread Peter Xu
On Thu, Sep 14, 2023 at 07:54:17PM -0300, Fabiano Rosas wrote:
> Fabiano Rosas  writes:
> 
> > Peter Xu  writes:
> >
> >> On Thu, Sep 14, 2023 at 12:57:08PM -0300, Fabiano Rosas wrote:
> >>> I managed to reproduce it. It's not the return path error. In hindsight
> >>> that's obvious because that error happens in the 'recovery' test and this
> >>> one in the 'plain' one. Sorry about the noise.
> >>
> >> No worry.  It's good to finally identify that.
> >>
> >>> 
> >>> This one reproduced with just 4 iterations of preempt/plain. I'll
> >>> investigate.
> >
> > It seems that we're getting a tcp disconnect (ECONNRESET) on when doing
> > that shutdown() on postcopy_qemufile_src. The one from commit 6621883f93
> > ("migration: Fix potential race on postcopy_qemufile_src").
> >
> > I'm trying to determine why that happens when other times it just
> > returns 0 as expected.
> >
> > Could this mean that we're kicking the dest too soon while it is still
> > receiving valid data?
> 
> Looking a bit more into this, what's happening is that
> postcopy_ram_incoming_cleanup() is shutting the postcopy_qemufile_dst
> while ram_load_postcopy() is still running.
> 
> The postcopy_ram_listen_thread() function waits for the
> main_thread_load_event, but that only works when not using preempt. With
> the preempt thread, the event is set right away and we proceed to do the
> cleanup without waiting.
> 
> So the assumption of commit 6621883f93 that the incoming side knows when
> it has finished migrating is wrong IMO. Without the EOS we're relying on
> the chance that the shutdown() happens after the last recvmsg has
> returned and not during it.
> 
> Peter, what do you think?

That's a good point.

One thing to verify that (sorry, I still cannot reproduce it myself, which
is so weirdly... it seems loads won't really help reproducing this) is to
let the main thread wait for all requested pages to arrive:

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..df055c51ea 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -597,6 +597,12 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState 
*mis)
 trace_postcopy_ram_incoming_cleanup_entry();
 
 if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
+/*
+ * NOTE!  it's possible that the preempt thread is still handling
+ * the last pages to arrive which were requested by faults.  Making
+ * sure nothing is left behind.
+ */
+while (qatomic_read(>page_requested_count));
 /* Notify the fast load thread to quit */
 mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
 if (mis->postcopy_qemufile_dst) {

If that can work solidly, we can figure out a better way than a dead loop
here.

Thanks,

-- 
Peter Xu




Re: QEMU migration-test CI intermittent failure

2023-09-14 Thread Fabiano Rosas
Fabiano Rosas  writes:

> Peter Xu  writes:
>
>> On Thu, Sep 14, 2023 at 12:57:08PM -0300, Fabiano Rosas wrote:
>>> I managed to reproduce it. It's not the return path error. In hindsight
>>> that's obvious because that error happens in the 'recovery' test and this
>>> one in the 'plain' one. Sorry about the noise.
>>
>> No worry.  It's good to finally identify that.
>>
>>> 
>>> This one reproduced with just 4 iterations of preempt/plain. I'll
>>> investigate.
>
> It seems that we're getting a tcp disconnect (ECONNRESET) on when doing
> that shutdown() on postcopy_qemufile_src. The one from commit 6621883f93
> ("migration: Fix potential race on postcopy_qemufile_src").
>
> I'm trying to determine why that happens when other times it just
> returns 0 as expected.
>
> Could this mean that we're kicking the dest too soon while it is still
> receiving valid data?

Looking a bit more into this, what's happening is that
postcopy_ram_incoming_cleanup() is shutting the postcopy_qemufile_dst
while ram_load_postcopy() is still running.

The postcopy_ram_listen_thread() function waits for the
main_thread_load_event, but that only works when not using preempt. With
the preempt thread, the event is set right away and we proceed to do the
cleanup without waiting.

So the assumption of commit 6621883f93 that the incoming side knows when
it has finished migrating is wrong IMO. Without the EOS we're relying on
the chance that the shutdown() happens after the last recvmsg has
returned and not during it.

Peter, what do you think?



[PATCH 0/1] hw/arm/sse-timer: Add CNTFRQ reset property

2023-09-14 Thread Joe Komlodi
Hi all,

This patch just adds an object property to initialize the reset value of
CNTFRQ.
We noticed that Linux would complain that CNTFRQ would have a mismatch
compared to an expected value, and this was because TF-A was assuming
that CNTFRQ was initialized to a different value out of reset.

Since it's valid for CNTFRQ to have a non-zero reset value, we just
added an object property so people can set it.

Thanks!
Joe

Joe Komlodi (1):
  hw/timer/sse-timer: Add CNTFRQ reset property

 hw/timer/sse-timer.c | 4 +++-
 include/hw/timer/sse-timer.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
2.42.0.459.ge4e396fd5e-goog




[PATCH 1/1] hw/timer/sse-timer: Add CNTFRQ reset property

2023-09-14 Thread Joe Komlodi
This can have a non-zero reset value, and cause the kernel to complain
about a CNTFRQ mismatch if TF-A (or firmware in general) does not
initialize it (because it expects the value to be non-zero out of
reset).

To fix this, we'll just add an object property that people can use to
initialize the CNTFRQ reset value.

Signed-off-by: Joe Komlodi 
---
 hw/timer/sse-timer.c | 4 +++-
 include/hw/timer/sse-timer.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/timer/sse-timer.c b/hw/timer/sse-timer.c
index e92e83747d..a727f05bac 100644
--- a/hw/timer/sse-timer.c
+++ b/hw/timer/sse-timer.c
@@ -376,7 +376,7 @@ static void sse_timer_reset(DeviceState *dev)
 trace_sse_timer_reset();
 
 timer_del(>timer);
-s->cntfrq = 0;
+s->cntfrq = s->cntfrq_reset;
 s->cntp_ctl = 0;
 s->cntp_cval = 0;
 s->cntp_aival = 0;
@@ -430,6 +430,7 @@ static const VMStateDescription sse_timer_vmstate = {
 .minimum_version_id = 1,
 .fields = (VMStateField[]) {
 VMSTATE_TIMER(timer, SSETimer),
+VMSTATE_UINT32(cntfrq_reset, SSETimer),
 VMSTATE_UINT32(cntfrq, SSETimer),
 VMSTATE_UINT32(cntp_ctl, SSETimer),
 VMSTATE_UINT64(cntp_cval, SSETimer),
@@ -442,6 +443,7 @@ static const VMStateDescription sse_timer_vmstate = {
 
 static Property sse_timer_properties[] = {
 DEFINE_PROP_LINK("counter", SSETimer, counter, TYPE_SSE_COUNTER, 
SSECounter *),
+DEFINE_PROP_UINT32("cntfrq-reset", SSETimer, cntfrq_reset, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/timer/sse-timer.h b/include/hw/timer/sse-timer.h
index 265ad32400..ad84c24940 100644
--- a/include/hw/timer/sse-timer.h
+++ b/include/hw/timer/sse-timer.h
@@ -43,6 +43,8 @@ struct SSETimer {
 QEMUTimer timer;
 Notifier counter_notifier;
 
+uint32_t cntfrq_reset;
+
 uint32_t cntfrq;
 uint32_t cntp_ctl;
 uint64_t cntp_cval;
-- 
2.42.0.459.ge4e396fd5e-goog




Re: [PULL 3/5] hw/ufs: Support for Query Transfer Requests

2023-09-14 Thread Jeuk Kim



On 23. 9. 14. 23:40, Peter Maydell wrote:

On Thu, 7 Sept 2023 at 19:17, Stefan Hajnoczi  wrote:

From: Jeuk Kim 

This commit makes the UFS device support query
and nop out transfer requests.

The next patch would be support for UFS logical
unit and scsi command transfer request.

Signed-off-by: Jeuk Kim 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
ff7a5f0fd26761936a553ffb89d3df0ba62844e9.1693980783.git.jeuk20@gmail.com
Signed-off-by: Stefan Hajnoczi 
---
  hw/ufs/ufs.h|  46 +++
  hw/ufs/ufs.c| 988 +++-
  hw/ufs/trace-events |   1 +
  3 files changed, 1033 insertions(+), 2 deletions(-)

Hi; Coverity isn't happy about the code in this function
(CID 1519050). The code isn't strictly wrong, but it's
probably possible to make it a bit more clearly correct.


+static void ufs_process_db(UfsHc *u, uint32_t val)
+{
+unsigned long doorbell;
+uint32_t slot;
+uint32_t nutrs = u->params.nutrs;
+UfsRequest *req;
+
+val &= ~u->reg.utrldbr;
+if (!val) {
+return;
+}
+
+doorbell = val;
+slot = find_first_bit(, nutrs);

Here we pass the address of a single 'unsigned long' to
find_first_bit(). That function operates on arrays, so
unless nutrs is guaranteed to be less than 32 this might
walk off the end of memory.

There is a check on params.nutrs in ufs_check_constraints(),
which checks for "> UFS_MAX_NUTRS" and that value is 32,
so this won't actually overflow, but Coverity can't
see that check and in any case what it really doesn't
like here is the passing of the address of a 'long'
variable to a function that is prototyped as taking
an array of longs.

You can probably make Coverity happy by defining
doorbell here as a 1 element array, and asserting
that nutrs is 32 or less. Alternatively, we have
ctz32() for working through bits in a uint32_t, though
that is a bit lower-level than find_first_bit/find_next_bit.


+
+while (slot < nutrs) {
+req = >req_list[slot];
+if (req->state == UFS_REQUEST_ERROR) {
+trace_ufs_err_utrl_slot_error(req->slot);
+return;
+}
+
+if (req->state != UFS_REQUEST_IDLE) {
+trace_ufs_err_utrl_slot_busy(req->slot);
+return;
+}
+
+trace_ufs_process_db(slot);
+req->state = UFS_REQUEST_READY;
+slot = find_next_bit(, nutrs, slot + 1);
+}
+
+qemu_bh_schedule(u->doorbell_bh);
+}

thanks
-- PMM



Thank you for letting me know about the coverity issue with a detailed 
description!


I have checked all the coverity issues related to ufs.
(cid 1519042, cid 1519043, cid 1519050, cid 1519051)

I will fix them with an additional patch as soon as possible.

Thank you!

Jeuk




Re: [PULL 4/5] hw/ufs: Support for UFS logical unit

2023-09-14 Thread Jeuk Kim



On 23. 9. 15. 02:31, Paolo Bonzini wrote:

On 9/7/23 20:16, Stefan Hajnoczi wrote:

From: Jeuk Kim

This commit adds support for ufs logical unit.
The LU handles processing for the SCSI command,
unit descriptor query request.

This commit enables the UFS device to process
IO requests.

Signed-off-by: Jeuk Kim
Reviewed-by: Stefan Hajnoczi
Message-id:beacc504376ab6a14b1a3830bb3c69382cf6aebc.1693980783.git.jeuk20@gmail.com 


Signed-off-by: Stefan Hajnoczi
---


Jeuk,

can you explain the differences between scsi-hd and ufs-lu, apart from 
the different bus type?  Ideally, the UFS controller would be in 
hw/scsi/ufs.c and there would be no need for ufs-lu at all.


Would it make sense to allow any SCSI device into a UFS bus without 
the need to have duplicate code?


Thanks!

Paolo




Hi Paolo,


While ufs does use the SCSI specification to communicate with the driver,

it doesn't behave exactly like a typical scsi device.


First, ufs-lu has a feature called "unit descriptor". This feature shows 
the status of the ufs-lu


and only works with UFS-specific "query request" commands, not SCSI 
commands.



UFS also has something called a well-known lu. Unlike typical SCSI 
devices, where each lu is independent,


UFS can control other lu's through the well-known lu.


Finally, UFS-LU will have features that SCSI-HD does not have, such as 
the zone block command.



In addition to this, I wanted some scsi commands to behave differently 
from scsi-hd, for example,


the Inquiry command should read "QEMU UFS" instead of "QEMU HARDDISK",

and the mode_sense_page command should have a different result.


For these reasons, I chose to generate the ufs-lu code separately.


Please let me know if you have any comments on this.


Thanks!

Jeuk




Re: [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error

2023-09-14 Thread Peter Xu
On Thu, Sep 14, 2023 at 08:20:53PM +, “William Roche wrote:
> From: William Roche 
> 
> A Qemu VM can survive a memory error, as qemu can relay the error to the
> VM kernel which could also deal with it -- poisoning/off-lining the impacted
> page.
> This situation creates a hole in the VM memory address space that the VM 
> kernel
> knows about (an unreadable page or set of pages).
> 
> But the migration of this VM (live migration through the network or
> pseudo-migration with the creation of a state file) will crash Qemu when
> it sequentially reads the memory address space and stumbles on the
> existing hole.
> 
> In order to correct this problem, I suggest to treat the poisoned pages as if
> they were zero-pages for the migration copy.
> This fix also works with underlying large pages, taking into account the
> RAMBlock segment "page-size".
> This fix is scripts/checkpatch.pl clean.
> 
> v2:
>   - adding compressed transfer handling of poisoned pages
>  
> Testing: I could verify that migration now works with a poisoned page
> through standard and compressed migration with 4k and large (2M) pages.
> 
> The RDMA transfer is not considered by this patch.
> 
> William Roche (1):
>   migration: skip poisoned memory pages on "ram saving" phase

If there's a new version, please consider adding a TODO above
control_save_page() that poison page is probably broken there, so we can
still remember.

Reviewed-by: Peter Xu 

Copy:

lizhij...@fujitsu.com, lidongc...@tencent.com

Thanks,

-- 
Peter Xu




Re: [PATCH v3 2/2] qemu-img: map: report compressed data blocks

2023-09-14 Thread Andrey Drobyshev
On 9/15/23 00:17, Eric Blake wrote:
> On Fri, Sep 08, 2023 at 12:02:26AM +0300, Andrey Drobyshev wrote:
>> Right now "qemu-img map" reports compressed blocks as containing data
>> but having no host offset.  This is not very informative.  Instead,
>> let's add another boolean field named "compressed" in case JSON output
>> mode is specified.  This is achieved by utilizing new allocation status
>> flag BDRV_BLOCK_COMPRESSED for bdrv_block_status().
>>
>> Also update the expected qemu-iotests outputs to contain the new field.
>>
>> Signed-off-by: Andrey Drobyshev 
>> ---
> 
>> +++ b/qapi/block-core.json
>> @@ -409,6 +409,9 @@
>>  #
>>  # @zero: whether the virtual blocks read as zeroes
>>  #
>> +# @compressed: true indicates that data is stored compressed.  Only valid
>> +# for the formats whith support compression (since 8.2)
> 
> s/whith/which/
> 
> "compressed":false seems universally valid for all other file formats,
> and the field is not marked as optional.  Do we really need the
> disclaimer?  Could we get by with the shorter 'Will be false for
> formats that do not support compression', or by omitting it
> altogether?
> 

You're right, this remark should've been removed as it only makes sense
in case of the field being optional.  Feel free to remove it altogether,
or I can send a follow-up if you prefer.

Andrey



Re: [PATCH] target/hppa: Optimize ldcw/ldcd instruction translation

2023-09-14 Thread Helge Deller

Hi Richard,

On 9/13/23 22:30, Richard Henderson wrote:

On 9/13/23 10:19, Helge Deller wrote:

On 9/13/23 18:55, Richard Henderson wrote:

On 9/13/23 07:47, Helge Deller wrote:

+    haddr = (uint32_t *)((uintptr_t)vaddr);
+    old = *haddr;


This is horribly incorrect, both for user-only and system mode.


Richard, thank you for the review!
But would you mind explaining why this is incorrect?
I thought the "vaddr = probe_access()" calculates the host address, so
shouldn't it be the right address?


The vaddr name is confusing (since it implies virtual address, which
the return from probe_access is not) as are the casts, which are
unnecessary.


Still, I think my code isn't as wrong as you said.

But I tend to agree with you on this:

Frankly, the current tcg_gen_atomic_xchg_reg is as optimized as
you'll be able to do.

tcg_gen_atomic_xchg_reg() seems to generate on x86-64:

00525160 :
  525160:   53  push   %rbx
  525161:   4c 8b 44 24 08  mov0x8(%rsp),%r8
  525166:   89 d3   mov%edx,%ebx
  525168:   89 ca   mov%ecx,%edx
  52516a:   b9 04 00 00 00  mov$0x4,%ecx
  52516f:   e8 1c a6 ff ff  call   51f790 
  525174:   48 89 c2mov%rax,%rdx
  525177:   89 d8   mov%ebx,%eax
  525179:   0f c8   bswap  %eax
  52517b:   87 02   xchg   %eax,(%rdx)
  52517d:   5b  pop%rbx
  52517e:   0f c8   bswap  %eax
  525180:   c3  ret

and atomic_mmu_lookup() is basically the same as probe_access(),
so there is probably no gain in my patch.

Please ignore my patch.

Thank you!
Helge



Re: [PATCH v3 2/2] qemu-img: map: report compressed data blocks

2023-09-14 Thread Eric Blake
On Fri, Sep 08, 2023 at 12:02:26AM +0300, Andrey Drobyshev wrote:
> Right now "qemu-img map" reports compressed blocks as containing data
> but having no host offset.  This is not very informative.  Instead,
> let's add another boolean field named "compressed" in case JSON output
> mode is specified.  This is achieved by utilizing new allocation status
> flag BDRV_BLOCK_COMPRESSED for bdrv_block_status().
> 
> Also update the expected qemu-iotests outputs to contain the new field.
> 
> Signed-off-by: Andrey Drobyshev 
> ---

> +++ b/qapi/block-core.json
> @@ -409,6 +409,9 @@
>  #
>  # @zero: whether the virtual blocks read as zeroes
>  #
> +# @compressed: true indicates that data is stored compressed.  Only valid
> +# for the formats whith support compression (since 8.2)

s/whith/which/

"compressed":false seems universally valid for all other file formats,
and the field is not marked as optional.  Do we really need the
disclaimer?  Could we get by with the shorter 'Will be false for
formats that do not support compression', or by omitting it
altogether?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: QEMU migration-test CI intermittent failure

2023-09-14 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Sep 14, 2023 at 12:57:08PM -0300, Fabiano Rosas wrote:
>> I managed to reproduce it. It's not the return path error. In hindsight
>> that's obvious because that error happens in the 'recovery' test and this
>> one in the 'plain' one. Sorry about the noise.
>
> No worry.  It's good to finally identify that.
>
>> 
>> This one reproduced with just 4 iterations of preempt/plain. I'll
>> investigate.

It seems that we're getting a tcp disconnect (ECONNRESET) on when doing
that shutdown() on postcopy_qemufile_src. The one from commit 6621883f93
("migration: Fix potential race on postcopy_qemufile_src").

I'm trying to determine why that happens when other times it just
returns 0 as expected.

Could this mean that we're kicking the dest too soon while it is still
receiving valid data?



Re: [PATCH v4 02/14] simpletrace: annotate magic constants from QEMU code

2023-09-14 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 10:54:17AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> It wasn't clear where the constants and structs came from, so I added
> comments to help.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v4 01/14] simpletrace: add __all__ to define public interface

2023-09-14 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 10:54:16AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> It was unclear what was the supported public interface. I.e. when
> refactoring the code, what functions/classes are important to retain.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v6 3/3] hw/nvme: add nvme management interface model

2023-09-14 Thread Corey Minyard
On Thu, Sep 14, 2023 at 11:53:43AM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Add the 'nmi-i2c' device that emulates an NVMe Management Interface
> controller.
> 
> Initial support is very basic (Read NMI DS, Configuration Get).
> 
> This is based on previously posted code by Padmakar Kalghatgi, Arun
> Kumar Agasar and Saurav Kumar.

This seems fine.

Acked-by: Corey Minyard 

One question, though.  You don't have any tests.  Did you test invalid
packets and such?  I think the logic is correct, but those are things
that are good to test.  Having tests in qemu would be even better.


> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/nvme/Kconfig  |   4 +
>  hw/nvme/meson.build  |   1 +
>  hw/nvme/nmi-i2c.c| 407 
> +++
>  hw/nvme/trace-events |   6 +
>  4 files changed, 418 insertions(+)
> 
> diff --git a/hw/nvme/Kconfig b/hw/nvme/Kconfig
> index cfa2ab0f9d5a..e1f6360c0f4b 100644
> --- a/hw/nvme/Kconfig
> +++ b/hw/nvme/Kconfig
> @@ -2,3 +2,7 @@ config NVME_PCI
>  bool
>  default y if PCI_DEVICES || PCIE_DEVICES
>  depends on PCI
> +
> +config NVME_NMI_I2C
> +bool
> +default y if I2C_MCTP
> diff --git a/hw/nvme/meson.build b/hw/nvme/meson.build
> index 1a6a2ca2f307..7bc85f31c149 100644
> --- a/hw/nvme/meson.build
> +++ b/hw/nvme/meson.build
> @@ -1 +1,2 @@
>  system_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', 
> 'ns.c', 'subsys.c'))
> +system_ss.add(when: 'CONFIG_NVME_NMI_I2C', if_true: files('nmi-i2c.c'))
> diff --git a/hw/nvme/nmi-i2c.c b/hw/nvme/nmi-i2c.c
> new file mode 100644
> index ..bf4648db0457
> --- /dev/null
> +++ b/hw/nvme/nmi-i2c.c
> @@ -0,0 +1,407 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * SPDX-FileCopyrightText: Copyright (c) 2023 Samsung Electronics Co., Ltd.
> + *
> + * SPDX-FileContributor: Padmakar Kalghatgi 
> + * SPDX-FileContributor: Arun Kumar Agasar 
> + * SPDX-FileContributor: Saurav Kumar 
> + * SPDX-FileContributor: Klaus Jensen 
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/crc32c.h"
> +#include "hw/registerfields.h"
> +#include "hw/i2c/i2c.h"
> +#include "hw/i2c/mctp.h"
> +#include "net/mctp.h"
> +#include "trace.h"
> +
> +/* NVM Express Management Interface 1.2c, Section 3.1 */
> +#define NMI_MAX_MESSAGE_LENGTH 4224
> +
> +#define TYPE_NMI_I2C_DEVICE "nmi-i2c"
> +OBJECT_DECLARE_SIMPLE_TYPE(NMIDevice, NMI_I2C_DEVICE)
> +
> +typedef struct NMIDevice {
> +MCTPI2CEndpoint mctp;
> +
> +uint8_t buffer[NMI_MAX_MESSAGE_LENGTH];
> +uint8_t scratch[NMI_MAX_MESSAGE_LENGTH];
> +
> +size_t  len;
> +int64_t pos;
> +} NMIDevice;
> +
> +FIELD(NMI_MCTPD, MT, 0, 7)
> +FIELD(NMI_MCTPD, IC, 7, 1)
> +
> +#define NMI_MCTPD_MT_NMI 0x4
> +#define NMI_MCTPD_IC_ENABLED 0x1
> +
> +FIELD(NMI_NMP, ROR, 7, 1)
> +FIELD(NMI_NMP, NMIMT, 3, 4)
> +
> +#define NMI_NMP_NMIMT_NVME_MI 0x1
> +#define NMI_NMP_NMIMT_NVME_ADMIN 0x2
> +
> +typedef struct NMIMessage {
> +uint8_t mctpd;
> +uint8_t nmp;
> +uint8_t rsvd2[2];
> +uint8_t payload[]; /* includes the Message Integrity Check */
> +} NMIMessage;
> +
> +typedef struct NMIRequest {
> +   uint8_t opc;
> +   uint8_t rsvd1[3];
> +   uint32_t dw0;
> +   uint32_t dw1;
> +   uint32_t mic;
> +} NMIRequest;
> +
> +FIELD(NMI_CMD_READ_NMI_DS_DW0, DTYP, 24, 8)
> +
> +typedef enum NMIReadDSType {
> +NMI_CMD_READ_NMI_DS_SUBSYSTEM   = 0x0,
> +NMI_CMD_READ_NMI_DS_PORTS   = 0x1,
> +NMI_CMD_READ_NMI_DS_CTRL_LIST   = 0x2,
> +NMI_CMD_READ_NMI_DS_CTRL_INFO   = 0x3,
> +NMI_CMD_READ_NMI_DS_OPT_CMD_SUPPORT = 0x4,
> +NMI_CMD_READ_NMI_DS_MEB_CMD_SUPPORT = 0x5,
> +} NMIReadDSType;
> +
> +#define NMI_STATUS_INVALID_PARAMETER 0x4
> +
> +static void nmi_scratch_append(NMIDevice *nmi, const void *buf, size_t count)
> +{
> +assert(nmi->pos + count <= NMI_MAX_MESSAGE_LENGTH);
> +
> +memcpy(nmi->scratch + nmi->pos, buf, count);
> +nmi->pos += count;
> +}
> +
> +static void nmi_set_parameter_error(NMIDevice *nmi, uint8_t bit, uint16_t 
> byte)
> +{
> +/* NVM Express Management Interface 1.2c, Figure 30 */
> +struct resp {
> +uint8_t  status;
> +uint8_t  bit;
> +uint16_t byte;
> +};
> +
> +struct resp buf = {
> +.status = NMI_STATUS_INVALID_PARAMETER,
> +.bit = bit & 0x3,
> +.byte = byte,
> +};
> +
> +nmi_scratch_append(nmi, , sizeof(buf));
> +}
> +
> +static void nmi_set_error(NMIDevice *nmi, uint8_t status)
> +{
> +const uint8_t buf[4] = {status,};
> +
> +nmi_scratch_append(nmi, buf, sizeof(buf));
> +}
> +
> +static void nmi_handle_mi_read_nmi_ds(NMIDevice *nmi, NMIRequest *request)
> +{
> +I2CSlave *i2c = I2C_SLAVE(nmi);
> +
> +uint32_t dw0 = le32_to_cpu(request->dw0);
> +uint8_t dtyp = FIELD_EX32(dw0, NMI_CMD_READ_NMI_DS_DW0, DTYP);
> +
> +trace_nmi_handle_mi_read_nmi_ds(dtyp);
> +
> +static const uint8_t nmi_ds_subsystem[36] = 

Re: [PATCH v6 2/3] hw/i2c: add mctp core

2023-09-14 Thread Corey Minyard
On Thu, Sep 14, 2023 at 11:53:42AM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Add an abstract MCTP over I2C endpoint model. This implements MCTP
> control message handling as well as handling the actual I2C transport
> (packetization).
> 
> Devices are intended to derive from this and implement the class
> methods.
> 
> Parts of this implementation is inspired by code[1] previously posted by
> Jonathan Cameron.

I've been kind of watching this, I guess I need to review.  I've been
over the logic and it all looks good, I think.  So I can do:

Acked-by: Corey Minyard 

Thanks to everyone that reviewed.

> 
> Squashed a fix[2] from Matt Johnston.
> 
>   [1]: 
> https://lore.kernel.org/qemu-devel/20220520170128.4436-1-jonathan.came...@huawei.com/
>   [2]: 
> https://lore.kernel.org/qemu-devel/20221121080445.ga29...@codeconstruct.com.au/
> 
> Tested-by: Jonathan Cameron 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Klaus Jensen 
> ---
>  MAINTAINERS   |   7 +
>  hw/arm/Kconfig|   1 +
>  hw/i2c/Kconfig|   4 +
>  hw/i2c/mctp.c | 432 
> ++
>  hw/i2c/meson.build|   1 +
>  hw/i2c/trace-events   |  13 ++
>  include/hw/i2c/mctp.h | 125 +++
>  include/net/mctp.h|  35 
>  8 files changed, 618 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 00562f924f7a..3208ebb1bcde 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3404,6 +3404,13 @@ F: tests/qtest/adm1272-test.c
>  F: tests/qtest/max34451-test.c
>  F: tests/qtest/isl_pmbus_vr-test.c
>  
> +MCTP I2C Transport
> +M: Klaus Jensen 
> +S: Maintained
> +F: hw/i2c/mctp.c
> +F: include/hw/i2c/mctp.h
> +F: include/net/mctp.h
> +
>  Firmware schema specifications
>  M: Philippe Mathieu-Daudé 
>  R: Daniel P. Berrange 
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 7e6834844051..5bcb1e0e8a6f 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -541,6 +541,7 @@ config ASPEED_SOC
>  select DS1338
>  select FTGMAC100
>  select I2C
> +select I2C_MCTP
>  select DPS310
>  select PCA9552
>  select SERIAL
> diff --git a/hw/i2c/Kconfig b/hw/i2c/Kconfig
> index 14886b35dac2..2b2a50b83d1e 100644
> --- a/hw/i2c/Kconfig
> +++ b/hw/i2c/Kconfig
> @@ -6,6 +6,10 @@ config I2C_DEVICES
>  # to any board's i2c bus
>  bool
>  
> +config I2C_MCTP
> +bool
> +select I2C
> +
>  config SMBUS
>  bool
>  select I2C
> diff --git a/hw/i2c/mctp.c b/hw/i2c/mctp.c
> new file mode 100644
> index ..8d8e74567745
> --- /dev/null
> +++ b/hw/i2c/mctp.c
> @@ -0,0 +1,432 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * SPDX-FileCopyrightText: Copyright (c) 2023 Samsung Electronics Co., Ltd.
> + * SPDX-FileContributor: Klaus Jensen 
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/main-loop.h"
> +
> +#include "hw/qdev-properties.h"
> +#include "hw/i2c/i2c.h"
> +#include "hw/i2c/smbus_master.h"
> +#include "hw/i2c/mctp.h"
> +#include "net/mctp.h"
> +
> +#include "trace.h"
> +
> +/* DSP0237 1.2.0, Figure 1 */
> +typedef struct MCTPI2CPacketHeader {
> +uint8_t dest;
> +#define MCTP_I2C_COMMAND_CODE 0xf
> +uint8_t command_code;
> +uint8_t byte_count;
> +uint8_t source;
> +} MCTPI2CPacketHeader;
> +
> +typedef struct MCTPI2CPacket {
> +MCTPI2CPacketHeader i2c;
> +MCTPPacket  mctp;
> +} MCTPI2CPacket;
> +
> +#define i2c_mctp_payload_offset offsetof(MCTPI2CPacket, mctp.payload)
> +#define i2c_mctp_payload(buf) (buf + i2c_mctp_payload_offset)
> +
> +/* DSP0236 1.3.0, Figure 20 */
> +typedef struct MCTPControlMessage {
> +#define MCTP_MESSAGE_TYPE_CONTROL 0x0
> +uint8_t type;
> +#define MCTP_CONTROL_FLAGS_RQ   (1 << 7)
> +#define MCTP_CONTROL_FLAGS_D(1 << 6)
> +uint8_t flags;
> +uint8_t command_code;
> +uint8_t data[];
> +} MCTPControlMessage;
> +
> +enum MCTPControlCommandCodes {
> +MCTP_CONTROL_SET_EID= 0x01,
> +MCTP_CONTROL_GET_EID= 0x02,
> +MCTP_CONTROL_GET_VERSION= 0x04,
> +MCTP_CONTROL_GET_MESSAGE_TYPE_SUPPORT   = 0x05,
> +};
> +
> +#define MCTP_CONTROL_ERROR_UNSUPPORTED_CMD 0x5
> +
> +#define i2c_mctp_control_data_offset \
> +(i2c_mctp_payload_offset + offsetof(MCTPControlMessage, data))
> +#define i2c_mctp_control_data(buf) (buf + i2c_mctp_control_data_offset)
> +
> +/**
> + * The byte count field in the SMBUS Block Write containers the number of 
> bytes
> + * *following* the field itself.
> + *
> + * This is at least 5.
> + *
> + * 1 byte for the MCTP/I2C piggy-backed I2C source address in addition to the
> + * size of the MCTP transport/packet header.
> + */
> +#define MCTP_I2C_BYTE_COUNT_OFFSET (sizeof(MCTPPacketHeader) + 1)
> +
> +void i2c_mctp_schedule_send(MCTPI2CEndpoint *mctp)
> +{
> +I2CBus *i2c = I2C_BUS(qdev_get_parent_bus(DEVICE(mctp)));
> +
> +mctp->tx.state = I2C_MCTP_STATE_TX_START_SEND;
> +
> +

Re: [PATCH v6 1/3] hw/i2c: add smbus pec utility function

2023-09-14 Thread Corey Minyard
On Thu, Sep 14, 2023 at 11:53:41AM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Add i2c_smbus_pec() to calculate the SMBus Packet Error Code for a
> message.

Seems fine.

Acked-by: Corey Minyard 

> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/i2c/smbus_master.c | 26 ++
>  include/hw/i2c/smbus_master.h |  2 ++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/hw/i2c/smbus_master.c b/hw/i2c/smbus_master.c
> index 6a53c34e70b7..01a8e4700222 100644
> --- a/hw/i2c/smbus_master.c
> +++ b/hw/i2c/smbus_master.c
> @@ -15,6 +15,32 @@
>  #include "hw/i2c/i2c.h"
>  #include "hw/i2c/smbus_master.h"
>  
> +static uint8_t crc8(uint16_t data)
> +{
> +int i;
> +
> +for (i = 0; i < 8; i++) {
> +if (data & 0x8000) {
> +data ^= 0x1070U << 3;
> +}
> +
> +data <<= 1;
> +}
> +
> +return (uint8_t)(data >> 8);
> +}
> +
> +uint8_t i2c_smbus_pec(uint8_t crc, uint8_t *buf, size_t len)
> +{
> +int i;
> +
> +for (i = 0; i < len; i++) {
> +crc = crc8((crc ^ buf[i]) << 8);
> +}
> +
> +return crc;
> +}
> +
>  /* Master device commands.  */
>  int smbus_quick_command(I2CBus *bus, uint8_t addr, int read)
>  {
> diff --git a/include/hw/i2c/smbus_master.h b/include/hw/i2c/smbus_master.h
> index bb13bc423c22..d90f81767d86 100644
> --- a/include/hw/i2c/smbus_master.h
> +++ b/include/hw/i2c/smbus_master.h
> @@ -27,6 +27,8 @@
>  
>  #include "hw/i2c/i2c.h"
>  
> +uint8_t i2c_smbus_pec(uint8_t crc, uint8_t *buf, size_t len);
> +
>  /* Master device commands.  */
>  int smbus_quick_command(I2CBus *bus, uint8_t addr, int read);
>  int smbus_receive_byte(I2CBus *bus, uint8_t addr);
> 
> -- 
> 2.42.0
> 
> 



[PATCH v5 11/23] bsd-user: Introduce bsd-mem.h to the source tree

2023-09-14 Thread Karim Taha
From: Stacey Son 

Preserve the copyright notice and help with the 'Author' info for
subsequent changes to the file.

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 64 +++
 bsd-user/freebsd/os-syscall.c |  1 +
 2 files changed, 65 insertions(+)
 create mode 100644 bsd-user/bsd-mem.h

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
new file mode 100644
index 00..d865e0807d
--- /dev/null
+++ b/bsd-user/bsd-mem.h
@@ -0,0 +1,64 @@
+/*
+ *  memory management system call shims and definitions
+ *
+ *  Copyright (c) 2013-15 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+/*
+ * Copyright (c) 1982, 1986, 1993
+ *  The Regents of the University of California.  All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#ifndef BSD_USER_BSD_MEM_H
+#define BSD_USER_BSD_MEM_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qemu-bsd.h"
+
+extern struct bsd_shm_regions bsd_shm_regions[];
+extern abi_ulong target_brk;
+extern abi_ulong initial_target_brk;
+
+#endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 4c99760a21..42cd52a406 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -35,6 +35,7 @@
 
 /* BSD independent syscall shims */
 #include "bsd-file.h"
+#include "bsd-mem.h"
 #include "bsd-proc.h"
 
 /* *BSD dependent syscall shims */
-- 
2.42.0




[PATCH v5 08/23] bsd-user: Implement target_set_brk function in bsd-mem.c instead of os-syscall.c

2023-09-14 Thread Karim Taha
From: Stacey Son 

The definitions and variables names matches the corresponding ones in
linux-user/syscall.c, for making later implementation of do_obreak easier

Co-authored-by: Mikaël Urankar 
Signed-off-by: Mikaël Urankar 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.c| 32 
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
index e69de29bb2..8834ab2e58 100644
--- a/bsd-user/bsd-mem.c
+++ b/bsd-user/bsd-mem.c
@@ -0,0 +1,32 @@
+/*
+ *  memory management system conversion routines
+ *
+ *  Copyright (c) 2013 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "qemu-bsd.h"
+
+struct bsd_shm_regions bsd_shm_regions[N_BSD_SHM_REGIONS];
+
+abi_ulong target_brk;
+abi_ulong initial_target_brk;
+
+void target_set_brk(abi_ulong new_brk)
+{
+target_brk = TARGET_PAGE_ALIGN(new_brk);
+initial_target_brk = target_brk;
+}
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index ae92a2314c..4c99760a21 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -60,10 +60,6 @@ safe_syscall3(ssize_t, writev, int, fd, const struct iovec 
*, iov, int, iovcnt);
 safe_syscall4(ssize_t, pwritev, int, fd, const struct iovec *, iov, int, 
iovcnt,
 off_t, offset);
 
-void target_set_brk(abi_ulong new_brk)
-{
-}
-
 /*
  * errno conversion.
  */
-- 
2.42.0




[PATCH v5 10/23] bsd-user: Implement shmid_ds conversion between host and target.

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.c | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
index 46cda8eb5c..2ab1334b70 100644
--- a/bsd-user/bsd-mem.c
+++ b/bsd-user/bsd-mem.c
@@ -43,6 +43,30 @@ void target_to_host_ipc_perm__locked(struct ipc_perm 
*host_ip,
 __get_user(host_ip->key,  _ip->key);
 }
 
+abi_long target_to_host_shmid_ds(struct shmid_ds *host_sd,
+ abi_ulong target_addr)
+{
+struct target_shmid_ds *target_sd;
+
+if (!lock_user_struct(VERIFY_READ, target_sd, target_addr, 1)) {
+return -TARGET_EFAULT;
+}
+
+target_to_host_ipc_perm__locked(&(host_sd->shm_perm),
+&(target_sd->shm_perm));
+
+__get_user(host_sd->shm_segsz,  _sd->shm_segsz);
+__get_user(host_sd->shm_lpid,   _sd->shm_lpid);
+__get_user(host_sd->shm_cpid,   _sd->shm_cpid);
+__get_user(host_sd->shm_nattch, _sd->shm_nattch);
+__get_user(host_sd->shm_atime,  _sd->shm_atime);
+__get_user(host_sd->shm_dtime,  _sd->shm_dtime);
+__get_user(host_sd->shm_ctime,  _sd->shm_ctime);
+unlock_user_struct(target_sd, target_addr, 0);
+
+return 0;
+}
+
 void host_to_target_ipc_perm__locked(struct target_ipc_perm *target_ip,
  struct ipc_perm *host_ip)
 {
@@ -55,3 +79,26 @@ void host_to_target_ipc_perm__locked(struct target_ipc_perm 
*target_ip,
 __put_user(host_ip->key,  _ip->key);
 }
 
+abi_long host_to_target_shmid_ds(abi_ulong target_addr,
+ struct shmid_ds *host_sd)
+{
+struct target_shmid_ds *target_sd;
+
+if (!lock_user_struct(VERIFY_WRITE, target_sd, target_addr, 0)) {
+return -TARGET_EFAULT;
+}
+
+host_to_target_ipc_perm__locked(&(target_sd->shm_perm),
+&(host_sd->shm_perm));
+
+__put_user(host_sd->shm_segsz,  _sd->shm_segsz);
+__put_user(host_sd->shm_lpid,   _sd->shm_lpid);
+__put_user(host_sd->shm_cpid,   _sd->shm_cpid);
+__put_user(host_sd->shm_nattch, _sd->shm_nattch);
+__put_user(host_sd->shm_atime,  _sd->shm_atime);
+__put_user(host_sd->shm_dtime,  _sd->shm_dtime);
+__put_user(host_sd->shm_ctime,  _sd->shm_ctime);
+unlock_user_struct(target_sd, target_addr, 1);
+
+return 0;
+}
-- 
2.42.0




[PATCH v5 20/23] bsd-user: Implement shm_unlink(2) and shmget(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 23 +++
 bsd-user/freebsd/os-syscall.c |  8 
 2 files changed, 31 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index f8dc943c23..c362cc07a3 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -282,4 +282,27 @@ static inline abi_long do_bsd_shm_open(abi_ulong arg1, 
abi_long arg2,
 return ret;
 }
 
+/* shm_unlink(2) */
+static inline abi_long do_bsd_shm_unlink(abi_ulong arg1)
+{
+int ret;
+void *p;
+
+p = lock_user_string(arg1);
+if (p == NULL) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(shm_unlink(p)); /* XXX path(p)? */
+unlock_user(p, arg1, 0);
+
+return ret;
+}
+
+/* shmget(2) */
+static inline abi_long do_bsd_shmget(abi_long arg1, abi_ulong arg2,
+abi_long arg3)
+{
+return get_errno(shmget(arg1, arg2, arg3));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index effa6dac54..f0ccd787e5 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -655,6 +655,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 break;
 #endif
 
+case TARGET_FREEBSD_NR_shm_unlink: /* shm_unlink(2) */
+ret = do_bsd_shm_unlink(arg1);
+break;
+
+case TARGET_FREEBSD_NR_shmget: /* shmget(2) */
+ret = do_bsd_shmget(arg1, arg2, arg3);
+break;
+
 /*
  * Misc
  */
-- 
2.42.0




[PATCH v5 12/23] bsd-user: Implement mmap(2) and munmap(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 20 
 bsd-user/freebsd/os-syscall.c |  9 +
 2 files changed, 29 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index d865e0807d..76b504f70c 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -61,4 +61,24 @@ extern struct bsd_shm_regions bsd_shm_regions[];
 extern abi_ulong target_brk;
 extern abi_ulong initial_target_brk;
 
+/* mmap(2) */
+static inline abi_long do_bsd_mmap(void *cpu_env, abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4, abi_long arg5, abi_long arg6, abi_long arg7,
+abi_long arg8)
+{
+if (regpairs_aligned(cpu_env) != 0) {
+arg6 = arg7;
+arg7 = arg8;
+}
+return get_errno(target_mmap(arg1, arg2, arg3,
+ target_to_host_bitmask(arg4, mmap_flags_tbl),
+ arg5, target_arg64(arg6, arg7)));
+}
+
+/* munmap(2) */
+static inline abi_long do_bsd_munmap(abi_long arg1, abi_long arg2)
+{
+return get_errno(target_munmap(arg1, arg2));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 42cd52a406..893881c179 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -594,6 +594,15 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 /*
  * Memory management system calls.
  */
+case TARGET_FREEBSD_NR_mmap: /* mmap(2) */
+ret = do_bsd_mmap(cpu_env, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
+  arg8);
+break;
+
+case TARGET_FREEBSD_NR_munmap: /* munmap(2) */
+ret = do_bsd_munmap(arg1, arg2);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v5 13/23] bsd-user: Implement mprotect(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/bsd-mem.h| 7 +++
 bsd-user/freebsd/os-syscall.c | 4 
 2 files changed, 11 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 76b504f70c..0f9e4a1d4b 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -81,4 +81,11 @@ static inline abi_long do_bsd_munmap(abi_long arg1, abi_long 
arg2)
 return get_errno(target_munmap(arg1, arg2));
 }
 
+/* mprotect(2) */
+static inline abi_long do_bsd_mprotect(abi_long arg1, abi_long arg2,
+abi_long arg3)
+{
+return get_errno(target_mprotect(arg1, arg2, arg3));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 893881c179..74c0624637 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -603,6 +603,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_munmap(arg1, arg2);
 break;
 
+case TARGET_FREEBSD_NR_mprotect: /* mprotect(2) */
+ret = do_bsd_mprotect(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v5 17/23] bsd-user: Implement mincore(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 23 +++
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 27 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index b00ab3aed8..0c8d96d9a4 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -189,4 +189,27 @@ static inline abi_long do_bsd_minherit(abi_long addr, 
abi_long len,
 return get_errno(minherit(g2h_untagged(addr), len, inherit));
 }
 
+/* mincore(2) */
+static inline abi_long do_bsd_mincore(abi_ulong target_addr, abi_ulong len,
+abi_ulong target_vec)
+{
+abi_long ret;
+void *p;
+abi_ulong vec_len = DIV_ROUND_UP(len, TARGET_PAGE_SIZE);
+
+if (!guest_range_valid_untagged(target_addr, len)
+|| !page_check_range(target_addr, len, PAGE_VALID)) {
+return -TARGET_EFAULT;
+}
+
+p = lock_user(VERIFY_WRITE, target_vec, vec_len, 0);
+if (p == NULL) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(mincore(g2h_untagged(target_addr), len, p));
+unlock_user(p, target_vec, vec_len);
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 600d048120..8ba5fcc6ca 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -635,6 +635,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_minherit(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_mincore: /* mincore(2) */
+ret = do_bsd_mincore(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v5 22/23] bsd-user: Implement shmat(2) and shmdt(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Use `WITH_MMAP_LOCK_GUARD` instead of mmap_lock() and mmap_unlock(),
to match linux-user implementation, according to the following commits:

69fa2708a216df715ba5102a0f98468b540a464e linux-user: Use WITH_MMAP_LOCK_GUARD 
in target_{shmat,shmdt}
ceda5688b650646248f269a992c06b11148c5759 linux-user: Fix shmdt

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
---
 bsd-user/bsd-mem.h| 87 +++
 bsd-user/freebsd/os-syscall.c |  8 
 bsd-user/mmap.c   |  2 +-
 bsd-user/qemu.h   |  1 +
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index b82f3eaa25..c512a4e375 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -344,4 +344,91 @@ static inline abi_long do_bsd_shmctl(abi_long shmid, 
abi_long cmd,
 return ret;
 }
 
+/* shmat(2) */
+static inline abi_long do_bsd_shmat(int shmid, abi_ulong shmaddr, int shmflg)
+{
+abi_ulong raddr;
+abi_long ret;
+struct shmid_ds shm_info;
+
+/* Find out the length of the shared memory segment. */
+ret = get_errno(shmctl(shmid, IPC_STAT, _info));
+if (is_error(ret)) {
+/* Can't get the length */
+return ret;
+}
+
+if (!guest_range_valid_untagged(shmaddr, shm_info.shm_segsz)) {
+return -TARGET_EINVAL;
+}
+
+WITH_MMAP_LOCK_GUARD() {
+void *host_raddr;
+
+if (shmaddr) {
+host_raddr = shmat(shmid, (void *)g2h_untagged(shmaddr), shmflg);
+} else {
+abi_ulong mmap_start;
+
+mmap_start = mmap_find_vma(0, shm_info.shm_segsz);
+
+if (mmap_start == -1) {
+return -TARGET_ENOMEM;
+}
+host_raddr = shmat(shmid, g2h_untagged(mmap_start),
+   shmflg | SHM_REMAP);
+}
+
+if (host_raddr == (void *)-1) {
+return get_errno(-1);
+}
+raddr = h2g(host_raddr);
+
+page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
+   PAGE_VALID | PAGE_RESET | PAGE_READ |
+   (shmflg & SHM_RDONLY ? 0 : PAGE_WRITE));
+
+for (int i = 0; i < N_BSD_SHM_REGIONS; i++) {
+if (bsd_shm_regions[i].start == 0) {
+bsd_shm_regions[i].start = raddr;
+bsd_shm_regions[i].size = shm_info.shm_segsz;
+break;
+}
+}
+}
+
+return raddr;
+}
+
+/* shmdt(2) */
+static inline abi_long do_bsd_shmdt(abi_ulong shmaddr)
+{
+abi_long ret;
+
+WITH_MMAP_LOCK_GUARD() {
+int i;
+
+for (i = 0; i < N_BSD_SHM_REGIONS; ++i) {
+if (bsd_shm_regions[i].start == shmaddr) {
+break;
+}
+}
+
+if (i == N_BSD_SHM_REGIONS) {
+return -TARGET_EINVAL;
+}
+
+ret = get_errno(shmdt(g2h_untagged(shmaddr)));
+if (ret == 0) {
+abi_ulong size = bsd_shm_regions[i].size;
+
+bsd_shm_regions[i].start = 0;
+page_set_flags(shmaddr, shmaddr + size - 1, 0);
+mmap_reserve(shmaddr, size);
+}
+}
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 664b8de104..6b32d4df68 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -667,6 +667,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_shmctl(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_shmat: /* shmat(2) */
+ret = do_bsd_shmat(arg1, arg2, arg3);
+break;
+
+case TARGET_FREEBSD_NR_shmdt: /* shmdt(2) */
+ret = do_bsd_shmdt(arg1);
+break;
+
 /*
  * Misc
  */
diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 8e148a2ea3..3ef11b2807 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -636,7 +636,7 @@ fail:
 return -1;
 }
 
-static void mmap_reserve(abi_ulong start, abi_ulong size)
+void mmap_reserve(abi_ulong start, abi_ulong size)
 {
 abi_ulong real_start;
 abi_ulong real_end;
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index d3158bc2ed..09a8f9aed4 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -232,6 +232,7 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong 
old_size,
 int target_msync(abi_ulong start, abi_ulong len, int flags);
 extern abi_ulong mmap_next_start;
 abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size);
+void mmap_reserve(abi_ulong start, abi_ulong size);
 void TSA_NO_TSA mmap_fork_start(void);
 void TSA_NO_TSA mmap_fork_end(int child);
 
-- 
2.42.0




[PATCH v5 19/23] bsd-user: Implement shm_open(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Co-authored-by: Kyle Evans 

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 25 +
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 29 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index b296c5c6f0..f8dc943c23 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -257,4 +257,29 @@ static inline abi_long do_obreak(abi_ulong brk_val)
 return target_brk;
 }
 
+/* shm_open(2) */
+static inline abi_long do_bsd_shm_open(abi_ulong arg1, abi_long arg2,
+abi_long arg3)
+{
+int ret;
+void *p;
+
+if (arg1 == (uintptr_t)SHM_ANON) {
+p = SHM_ANON;
+} else {
+p = lock_user_string(arg1);
+if (p == NULL) {
+return -TARGET_EFAULT;
+}
+}
+ret = get_errno(shm_open(p, target_to_host_bitmask(arg2, fcntl_flags_tbl),
+ arg3));
+
+if (p != SHM_ANON) {
+unlock_user(p, arg1, 0);
+}
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 5cd60fc272..effa6dac54 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -639,6 +639,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_mincore(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_freebsd12_shm_open: /* shm_open(2) */
+ret = do_bsd_shm_open(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v5 09/23] bsd-user: Implement ipc_perm conversion between host and target.

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
index 8834ab2e58..46cda8eb5c 100644
--- a/bsd-user/bsd-mem.c
+++ b/bsd-user/bsd-mem.c
@@ -30,3 +30,28 @@ void target_set_brk(abi_ulong new_brk)
 target_brk = TARGET_PAGE_ALIGN(new_brk);
 initial_target_brk = target_brk;
 }
+
+void target_to_host_ipc_perm__locked(struct ipc_perm *host_ip,
+ struct target_ipc_perm *target_ip)
+{
+__get_user(host_ip->cuid, _ip->cuid);
+__get_user(host_ip->cgid, _ip->cgid);
+__get_user(host_ip->uid,  _ip->uid);
+__get_user(host_ip->gid,  _ip->gid);
+__get_user(host_ip->mode, _ip->mode);
+__get_user(host_ip->seq,  _ip->seq);
+__get_user(host_ip->key,  _ip->key);
+}
+
+void host_to_target_ipc_perm__locked(struct target_ipc_perm *target_ip,
+ struct ipc_perm *host_ip)
+{
+__put_user(host_ip->cuid, _ip->cuid);
+__put_user(host_ip->cgid, _ip->cgid);
+__put_user(host_ip->uid,  _ip->uid);
+__put_user(host_ip->gid,  _ip->gid);
+__put_user(host_ip->mode, _ip->mode);
+__put_user(host_ip->seq,  _ip->seq);
+__put_user(host_ip->key,  _ip->key);
+}
+
-- 
2.42.0




[PATCH v5 14/23] bsd-user: Implement msync(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Co-authored-by: Kyle Evans 
Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 11 +++
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 15 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 0f9e4a1d4b..5e885823a7 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -88,4 +88,15 @@ static inline abi_long do_bsd_mprotect(abi_long arg1, 
abi_long arg2,
 return get_errno(target_mprotect(arg1, arg2, arg3));
 }
 
+/* msync(2) */
+static inline abi_long do_bsd_msync(abi_long addr, abi_long len, abi_long 
flags)
+{
+if (!guest_range_valid_untagged(addr, len)) {
+/* It seems odd, but POSIX wants this to be ENOMEM */
+return -TARGET_ENOMEM;
+}
+
+return get_errno(msync(g2h_untagged(addr), len, flags));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 74c0624637..5aebb18805 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -607,6 +607,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_mprotect(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_msync: /* msync(2) */
+ret = do_bsd_msync(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v5 23/23] bsd-user: Add stubs for vadvise(), sbrk() and sstk()

2023-09-14 Thread Karim Taha
From: Warner Losh 

The above system calls are not supported by qemu.

Signed-off-by: Warner Losh 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 18 ++
 bsd-user/freebsd/os-syscall.c | 12 
 2 files changed, 30 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index c512a4e375..c3e72e3b86 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -431,4 +431,22 @@ static inline abi_long do_bsd_shmdt(abi_ulong shmaddr)
 return ret;
 }
 
+static inline abi_long do_bsd_vadvise(void)
+{
+/* See sys_ovadvise() in vm_unix.c */
+return -TARGET_EINVAL;
+}
+
+static inline abi_long do_bsd_sbrk(void)
+{
+/* see sys_sbrk() in vm_mmap.c */
+return -TARGET_EOPNOTSUPP;
+}
+
+static inline abi_long do_bsd_sstk(void)
+{
+/* see sys_sstk() in vm_mmap.c */
+return -TARGET_EOPNOTSUPP;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 6b32d4df68..ce2a6bc29e 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -675,6 +675,18 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_shmdt(arg1);
 break;
 
+case TARGET_FREEBSD_NR_freebsd11_vadvise:
+ret = do_bsd_vadvise();
+break;
+
+case TARGET_FREEBSD_NR_sbrk:
+ret = do_bsd_sbrk();
+break;
+
+case TARGET_FREEBSD_NR_sstk:
+ret = do_bsd_sstk();
+break;
+
 /*
  * Misc
  */
-- 
2.42.0




[PATCH v5 16/23] bsd-user: Implment madvise(2) to match the linux-user implementation.

2023-09-14 Thread Karim Taha
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 53 +++
 bsd-user/freebsd/os-syscall.c |  4 +++
 bsd-user/syscall_defs.h   |  2 ++
 3 files changed, 59 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 16c22593bf..b00ab3aed8 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -129,6 +129,59 @@ static inline abi_long do_bsd_munlockall(void)
 return get_errno(munlockall());
 }
 
+/* madvise(2) */
+static inline abi_long do_bsd_madvise(abi_long arg1, abi_long arg2,
+abi_long arg3)
+{
+abi_ulong len;
+int ret = 0;
+abi_long start = arg1;
+abi_long len_in = arg2;
+abi_long advice = arg3;
+
+if (start & ~TARGET_PAGE_MASK) {
+return -TARGET_EINVAL;
+}
+if (len_in == 0) {
+return 0;
+}
+len = TARGET_PAGE_ALIGN(len_in);
+if (len == 0 || !guest_range_valid_untagged(start, len)) {
+return -TARGET_EINVAL;
+}
+
+/*
+ * Most advice values are hints, so ignoring and returning success is ok.
+ *
+ * However, some advice values such as MADV_DONTNEED, are not hints and
+ * need to be emulated.
+ *
+ * A straight passthrough for those may not be safe because qemu sometimes
+ * turns private file-backed mappings into anonymous mappings.
+ * If all guest pages have PAGE_PASSTHROUGH set, mappings have the
+ * same semantics for the host as for the guest.
+ *
+ * MADV_DONTNEED is passed through, if possible.
+ * If passthrough isn't possible, we nevertheless (wrongly!) return
+ * success, which is broken but some userspace programs fail to work
+ * otherwise. Completely implementing such emulation is quite complicated
+ * though.
+ */
+mmap_lock();
+switch (advice) {
+case MADV_DONTNEED:
+if (page_check_range(start, len, PAGE_PASSTHROUGH)) {
+ret = get_errno(madvise(g2h_untagged(start), len, advice));
+if (ret == 0) {
+page_reset_target_data(start, start + len - 1);
+}
+}
+}
+mmap_unlock();
+
+return ret;
+}
+
 /* minherit(2) */
 static inline abi_long do_bsd_minherit(abi_long addr, abi_long len,
 abi_long inherit)
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 553578708b..600d048120 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -627,6 +627,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_munlockall();
 break;
 
+case TARGET_FREEBSD_NR_madvise: /* madvise(2) */
+ret = do_bsd_madvise(arg1, arg2, arg3);
+break;
+
 case TARGET_FREEBSD_NR_minherit: /* minherit(2) */
 ret = do_bsd_minherit(arg1, arg2, arg3);
 break;
diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index f4a5ae2a12..929b155b10 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -95,6 +95,8 @@ struct bsd_shm_regions {
 /*
  *  sys/mman.h
  */
+#define TARGET_MADV_DONTNEED4   /* dont need these pages */
+
 #define TARGET_FREEBSD_MAP_RESERVED0080 0x0080  /* previously misimplemented */
 /* MAP_INHERIT */
 #define TARGET_FREEBSD_MAP_RESERVED0100 0x0100  /* previously unimplemented */
-- 
2.42.0




[PATCH v5 05/23] bsd-user: Implement shm_open2(2) system call

2023-09-14 Thread Karim Taha
Signed-off-by: Kyle Evans 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-misc.h| 42 +++
 bsd-user/freebsd/os-syscall.c | 10 +
 2 files changed, 52 insertions(+)

diff --git a/bsd-user/freebsd/os-misc.h b/bsd-user/freebsd/os-misc.h
index 8436ccb2f7..6b424b7078 100644
--- a/bsd-user/freebsd/os-misc.h
+++ b/bsd-user/freebsd/os-misc.h
@@ -24,5 +24,47 @@
 #include 
 #include 
 
+int shm_open2(const char *path, int flags, mode_t mode, int shmflags,
+const char *);
+
+#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
+/* shm_open2(2) */
+static inline abi_long do_freebsd_shm_open2(abi_ulong pathptr, abi_ulong flags,
+abi_long mode, abi_ulong shmflags, abi_ulong nameptr)
+{
+int ret;
+void *uname, *upath;
+
+if (pathptr == (uintptr_t)SHM_ANON) {
+upath = SHM_ANON;
+} else {
+upath = lock_user_string(pathptr);
+if (upath == NULL) {
+return -TARGET_EFAULT;
+}
+}
+
+uname = NULL;
+if (nameptr != 0) {
+uname = lock_user_string(nameptr);
+if (uname == NULL) {
+unlock_user(upath, pathptr, 0);
+return -TARGET_EFAULT;
+}
+}
+ret = get_errno(shm_open2(upath,
+target_to_host_bitmask(flags, fcntl_flags_tbl), mode,
+target_to_host_bitmask(shmflags, shmflag_flags_tbl), uname));
+
+if (upath != SHM_ANON) {
+unlock_user(upath, pathptr, 0);
+}
+if (uname != NULL) {
+unlock_user(uname, nameptr, 0);
+}
+return ret;
+}
+#endif /* __FreeBSD_version >= 1300048 */
+
 
 #endif /* OS_MISC_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index fa60df529e..74146d8c72 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -33,11 +33,13 @@
 #include "signal-common.h"
 #include "user/syscall-trace.h"
 
+/* BSD independent syscall shims */
 #include "bsd-file.h"
 #include "bsd-proc.h"
 
 /* *BSD dependent syscall shims */
 #include "os-stat.h"
+#include "os-misc.h"
 
 /* I/O */
 safe_syscall3(int, open, const char *, path, int, flags, mode_t, mode);
@@ -592,6 +594,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_freebsd_fcntl(arg1, arg2, arg3);
 break;
 
+/*
+ * Memory management system calls.
+ */
+#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
+case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
+ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
+break;
+#endif
 
 /*
  * sys{ctl, arch, call}
-- 
2.42.0




[PATCH v5 15/23] bsd-user: Implement mlock(2), munlock(2), mlockall(2), munlockall(2), minherit(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 37 +++
 bsd-user/freebsd/os-syscall.c | 20 +++
 2 files changed, 57 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 5e885823a7..16c22593bf 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -99,4 +99,41 @@ static inline abi_long do_bsd_msync(abi_long addr, abi_long 
len, abi_long flags)
 return get_errno(msync(g2h_untagged(addr), len, flags));
 }
 
+/* mlock(2) */
+static inline abi_long do_bsd_mlock(abi_long arg1, abi_long arg2)
+{
+if (!guest_range_valid_untagged(arg1, arg2)) {
+return -TARGET_EINVAL;
+}
+return get_errno(mlock(g2h_untagged(arg1), arg2));
+}
+
+/* munlock(2) */
+static inline abi_long do_bsd_munlock(abi_long arg1, abi_long arg2)
+{
+if (!guest_range_valid_untagged(arg1, arg2)) {
+return -TARGET_EINVAL;
+}
+return get_errno(munlock(g2h_untagged(arg1), arg2));
+}
+
+/* mlockall(2) */
+static inline abi_long do_bsd_mlockall(abi_long arg1)
+{
+return get_errno(mlockall(arg1));
+}
+
+/* munlockall(2) */
+static inline abi_long do_bsd_munlockall(void)
+{
+return get_errno(munlockall());
+}
+
+/* minherit(2) */
+static inline abi_long do_bsd_minherit(abi_long addr, abi_long len,
+abi_long inherit)
+{
+return get_errno(minherit(g2h_untagged(addr), len, inherit));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 5aebb18805..553578708b 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -611,6 +611,26 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_msync(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_mlock: /* mlock(2) */
+ret = do_bsd_mlock(arg1, arg2);
+break;
+
+case TARGET_FREEBSD_NR_munlock: /* munlock(2) */
+ret = do_bsd_munlock(arg1, arg2);
+break;
+
+case TARGET_FREEBSD_NR_mlockall: /* mlockall(2) */
+ret = do_bsd_mlockall(arg1);
+break;
+
+case TARGET_FREEBSD_NR_munlockall: /* munlockall(2) */
+ret = do_bsd_munlockall();
+break;
+
+case TARGET_FREEBSD_NR_minherit: /* minherit(2) */
+ret = do_bsd_minherit(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v5 04/23] bsd-user: Introduce freebsd/os-misc.h to the source tree

2023-09-14 Thread Karim Taha
From: Stacey Son 

To preserve the copyright notice and help with the 'Author' info for
subsequent changes to the file.

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/freebsd/os-misc.h | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 bsd-user/freebsd/os-misc.h

diff --git a/bsd-user/freebsd/os-misc.h b/bsd-user/freebsd/os-misc.h
new file mode 100644
index 00..8436ccb2f7
--- /dev/null
+++ b/bsd-user/freebsd/os-misc.h
@@ -0,0 +1,28 @@
+/*
+ *  miscellaneous FreeBSD system call shims
+ *
+ *  Copyright (c) 2013-14 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef OS_MISC_H
+#define OS_MISC_H
+
+#include 
+#include 
+#include 
+
+
+#endif /* OS_MISC_H */
-- 
2.42.0




[PATCH v5 03/23] bsd-user: Declarations for ipc_perm and shmid_ds conversion functions

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/qemu-bsd.h | 45 +
 1 file changed, 45 insertions(+)
 create mode 100644 bsd-user/qemu-bsd.h

diff --git a/bsd-user/qemu-bsd.h b/bsd-user/qemu-bsd.h
new file mode 100644
index 00..46572ece7d
--- /dev/null
+++ b/bsd-user/qemu-bsd.h
@@ -0,0 +1,45 @@
+/*
+ *  BSD conversion extern declarations
+ *
+ *  Copyright (c) 2013 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef QEMU_BSD_H
+#define QEMU_BSD_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* bsd-mem.c */
+void target_to_host_ipc_perm__locked(struct ipc_perm *host_ip,
+struct target_ipc_perm *target_ip);
+void host_to_target_ipc_perm__locked(struct target_ipc_perm *target_ip,
+struct ipc_perm *host_ip);
+abi_long target_to_host_shmid_ds(struct shmid_ds *host_sd,
+abi_ulong target_addr);
+abi_long host_to_target_shmid_ds(abi_ulong target_addr,
+struct shmid_ds *host_sd);
+
+#endif /* QEMU_BSD_H */
-- 
2.42.0




[PATCH v5 06/23] bsd-user: Implement shm_rename(2) system call

2023-09-14 Thread Karim Taha
From: Kyle Evans 

Signed-off-by: Kyle Evans 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/freebsd/os-misc.h| 24 
 bsd-user/freebsd/os-syscall.c |  6 ++
 2 files changed, 30 insertions(+)

diff --git a/bsd-user/freebsd/os-misc.h b/bsd-user/freebsd/os-misc.h
index 6b424b7078..67e450fe7c 100644
--- a/bsd-user/freebsd/os-misc.h
+++ b/bsd-user/freebsd/os-misc.h
@@ -66,5 +66,29 @@ static inline abi_long do_freebsd_shm_open2(abi_ulong 
pathptr, abi_ulong flags,
 }
 #endif /* __FreeBSD_version >= 1300048 */
 
+#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300049
+/* shm_rename(2) */
+static inline abi_long do_freebsd_shm_rename(abi_ulong fromptr, abi_ulong 
toptr,
+abi_ulong flags)
+{
+int ret;
+void *ufrom, *uto;
+
+ufrom = lock_user_string(fromptr);
+if (ufrom == NULL) {
+return -TARGET_EFAULT;
+}
+uto = lock_user_string(toptr);
+if (uto == NULL) {
+unlock_user(ufrom, fromptr, 0);
+return -TARGET_EFAULT;
+}
+ret = get_errno(shm_rename(ufrom, uto, flags));
+unlock_user(ufrom, fromptr, 0);
+unlock_user(uto, toptr, 0);
+
+return ret;
+}
+#endif /* __FreeBSD_version >= 1300049 */
 
 #endif /* OS_MISC_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 74146d8c72..ae92a2314c 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -603,6 +603,12 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 break;
 #endif
 
+#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300049
+case TARGET_FREEBSD_NR_shm_rename: /* shm_rename(2) */
+ret = do_freebsd_shm_rename(arg1, arg2, arg3);
+break;
+#endif
+
 /*
  * sys{ctl, arch, call}
  */
-- 
2.42.0




[PATCH v5 21/23] bsd-user: Implement shmctl(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 39 +++
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 43 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index c362cc07a3..b82f3eaa25 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -305,4 +305,43 @@ static inline abi_long do_bsd_shmget(abi_long arg1, 
abi_ulong arg2,
 return get_errno(shmget(arg1, arg2, arg3));
 }
 
+/* shmctl(2) */
+static inline abi_long do_bsd_shmctl(abi_long shmid, abi_long cmd,
+abi_ulong buff)
+{
+struct shmid_ds dsarg;
+abi_long ret = -TARGET_EINVAL;
+
+cmd &= 0xff;
+
+switch (cmd) {
+case IPC_STAT:
+if (target_to_host_shmid_ds(, buff)) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(shmctl(shmid, cmd, ));
+if (host_to_target_shmid_ds(buff, )) {
+return -TARGET_EFAULT;
+}
+break;
+
+case IPC_SET:
+if (target_to_host_shmid_ds(, buff)) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(shmctl(shmid, cmd, ));
+break;
+
+case IPC_RMID:
+ret = get_errno(shmctl(shmid, cmd, NULL));
+break;
+
+default:
+ret = -TARGET_EINVAL;
+break;
+}
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index f0ccd787e5..664b8de104 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -663,6 +663,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_shmget(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_shmctl: /* shmctl(2) */
+ret = do_bsd_shmctl(arg1, arg2, arg3);
+break;
+
 /*
  * Misc
  */
-- 
2.42.0




[PATCH v5 00/23] bsd-user: Implement mmap related system calls for FreeBSD.

2023-09-14 Thread Karim Taha
Upstream the implementation of the following mmap system calls, from the
qemu-bsd-user fork:
   mmap(2), munmap(2),
   mprotect(2),
   msync(2),
   mlock(2), munlock(2), mlockall(2), munlockall(2), mincore(2),
   madvise(2),
   minherit(2),
   shm_open(2),shm_open2(2), shm_rename2(2), shm_unlink(2), shmget(2), 
shmctl(2), shmat(2),
   shmdt(2)
   brk(2)

Karim Taha (3):
  bsd-user: Implement shm_open2(2) system call
  bsd-user: Add bsd-mem.c to meson.build
  bsd-user: Implment madvise(2) to match the linux-user implementation.

Kyle Evans (1):
  bsd-user: Implement shm_rename(2) system call

Stacey Son (18):
  bsd-user: Implement struct target_ipc_perm
  bsd-user: Implement struct target_shmid_ds
  bsd-user: Declarations for ipc_perm and shmid_ds conversion functions
  bsd-user: Introduce freebsd/os-misc.h to the source tree
  bsd-user: Implement target_set_brk function in bsd-mem.c instead of
os-syscall.c
  bsd-user: Implement ipc_perm conversion between host and target.
  bsd-user: Implement shmid_ds conversion between host and target.
  bsd-user: Introduce bsd-mem.h to the source tree
  bsd-user: Implement mmap(2) and munmap(2)
  bsd-user: Implement mprotect(2)
  bsd-user: Implement msync(2)
  bsd-user: Implement mlock(2), munlock(2), mlockall(2), munlockall(2),
minherit(2)
  bsd-user: Implement mincore(2)
  bsd-user: Implement do_obreak function
  bsd-user: Implement shm_open(2)
  bsd-user: Implement shm_unlink(2) and shmget(2)
  bsd-user: Implement shmctl(2)
  bsd-user: Implement shmat(2) and shmdt(2)

Warner Losh (1):
  bsd-user: Add stubs for vadvise(), sbrk() and sstk()

 bsd-user/bsd-mem.c| 104 
 bsd-user/bsd-mem.h| 452 ++
 bsd-user/freebsd/os-misc.h|  94 +++
 bsd-user/freebsd/os-syscall.c | 109 +++-
 bsd-user/meson.build  |   1 +
 bsd-user/mmap.c   |   2 +-
 bsd-user/qemu-bsd.h   |  45 
 bsd-user/qemu.h   |   1 +
 bsd-user/syscall_defs.h   |  39 +++
 9 files changed, 842 insertions(+), 5 deletions(-)
 create mode 100644 bsd-user/bsd-mem.c
 create mode 100644 bsd-user/bsd-mem.h
 create mode 100644 bsd-user/freebsd/os-misc.h
 create mode 100644 bsd-user/qemu-bsd.h

-- 
2.42.0




[PATCH v5 07/23] bsd-user: Add bsd-mem.c to meson.build

2023-09-14 Thread Karim Taha
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/bsd-mem.c   | 0
 bsd-user/meson.build | 1 +
 2 files changed, 1 insertion(+)
 create mode 100644 bsd-user/bsd-mem.c

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
new file mode 100644
index 00..e69de29bb2
diff --git a/bsd-user/meson.build b/bsd-user/meson.build
index 5243122fc5..6ee68fdfe7 100644
--- a/bsd-user/meson.build
+++ b/bsd-user/meson.build
@@ -7,6 +7,7 @@ bsd_user_ss = ss.source_set()
 common_user_inc += include_directories('include')
 
 bsd_user_ss.add(files(
+  'bsd-mem.c',
   'bsdload.c',
   'elfload.c',
   'main.c',
-- 
2.42.0




[PATCH v5 18/23] bsd-user: Implement do_obreak function

2023-09-14 Thread Karim Taha
From: Stacey Son 

Match linux-user, by manually applying the following commits, in order:

d28b3c90cfad1a7e211ae2bce36ecb9071086129   linux-user: Make sure initial brk(0) 
is page-aligned
15ad98536ad9410fb32ddf1ff09389b677643faa   linux-user: Fix qemu brk() to not 
zero bytes on current page
dfe49864afb06e7e452a4366051697bc4fcfc1a5   linux-user: Prohibit brk() to to 
shrink below initial heap address
eac78a4b0b7da4de2c0a297f4d528ca9cc6256a3   linux-user: Fix signed math overflow 
in brk() syscall
c6cc059eca18d9f6e4e26bb8b6d1135ddb35d81a   linux-user: Do not call get_errno() 
in do_brk()
e69e032d1a8ee8d754ca119009a3c2c997f8bb30   linux-user: Use MAP_FIXED_NOREPLACE 
for do_brk()
cb9d5d1fda0bc2312fc0c779b4ea1d7bf826f31f   linux-user: Do nothing if too small 
brk is specified
2aea137a425a87b930a33590177b04368fd7cc12   linux-user: Do not align brk with 
host page size

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 45 +++
 bsd-user/freebsd/os-syscall.c |  7 ++
 2 files changed, 52 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 0c8d96d9a4..b296c5c6f0 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -212,4 +212,49 @@ static inline abi_long do_bsd_mincore(abi_ulong 
target_addr, abi_ulong len,
 return ret;
 }
 
+/* do_brk() must return target values and target errnos. */
+static inline abi_long do_obreak(abi_ulong brk_val)
+{
+abi_long mapped_addr;
+abi_ulong new_brk;
+abi_ulong old_brk;
+
+/* brk pointers are always untagged */
+
+/* do not allow to shrink below initial brk value */
+if (brk_val < initial_target_brk) {
+return target_brk;
+}
+
+new_brk = TARGET_PAGE_ALIGN(brk_val);
+old_brk = TARGET_PAGE_ALIGN(target_brk);
+
+/* new and old target_brk might be on the same page */
+if (new_brk == old_brk) {
+target_brk = brk_val;
+return target_brk;
+}
+
+/* Release heap if necesary */
+if (new_brk < old_brk) {
+target_munmap(new_brk, old_brk - new_brk);
+
+target_brk = brk_val;
+return target_brk;
+}
+
+mapped_addr = target_mmap(old_brk, new_brk - old_brk,
+  PROT_READ | PROT_WRITE,
+  MAP_FIXED | MAP_EXCL | MAP_ANON | MAP_PRIVATE,
+  -1, 0);
+
+if (mapped_addr == old_brk) {
+target_brk = brk_val;
+return target_brk;
+}
+
+/* For everything else, return the previous break. */
+return target_brk;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 8ba5fcc6ca..5cd60fc272 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -651,6 +651,13 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 break;
 #endif
 
+/*
+ * Misc
+ */
+case TARGET_FREEBSD_NR_break:
+ret = do_obreak(arg1);
+break;
+
 /*
  * sys{ctl, arch, call}
  */
-- 
2.42.0




[PATCH v5 02/23] bsd-user: Implement struct target_shmid_ds

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/syscall_defs.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index 4deb4fed35..f4a5ae2a12 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -72,6 +72,26 @@ struct target_ipc_perm {
 #define TARGET_IPC_SET  1   /* set options */
 #define TARGET_IPC_STAT 2   /* get options */
 
+/*
+ * sys/shm.h
+ */
+struct target_shmid_ds {
+struct  target_ipc_perm shm_perm; /* peration permission structure */
+abi_ulong   shm_segsz;  /* size of segment in bytes */
+int32_t shm_lpid;   /* process ID of last shared memory op */
+int32_t shm_cpid;   /* process ID of creator */
+int32_t shm_nattch; /* number of current attaches */
+target_time_t shm_atime;  /* time of last shmat() */
+target_time_t shm_dtime;  /* time of last shmdt() */
+target_time_t shm_ctime;  /* time of last change by shmctl() */
+};
+
+#define N_BSD_SHM_REGIONS   32
+struct bsd_shm_regions {
+abi_long start;
+abi_long size;
+};
+
 /*
  *  sys/mman.h
  */
-- 
2.42.0




[PATCH v5 01/23] bsd-user: Implement struct target_ipc_perm

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/syscall_defs.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index 9c90616baa..4deb4fed35 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -55,6 +55,23 @@ struct target_iovec {
 abi_long iov_len;   /* Number of bytes */
 };
 
+/*
+ * sys/ipc.h
+ */
+struct target_ipc_perm {
+uint32_tcuid;   /* creator user id */
+uint32_tcgid;   /* creator group id */
+uint32_tuid;/* user id */
+uint32_tgid;/* group id */
+uint16_tmode;   /* r/w permission */
+uint16_tseq;/* sequence # */
+abi_longkey;/* user specified msg/sem/shm key */
+};
+
+#define TARGET_IPC_RMID 0   /* remove identifier */
+#define TARGET_IPC_SET  1   /* set options */
+#define TARGET_IPC_STAT 2   /* get options */
+
 /*
  *  sys/mman.h
  */
-- 
2.42.0




[PATCH] pc-bios/canyonlands.dts: Fix some DeviceTree warnings

2023-09-14 Thread Philippe Mathieu-Daudé
canyonlands.dts was imported in 2018, in commit 4b387f9ee1
("ppc: Add aCube Sam460ex board"). The file content is based
on Linux file arch/powerpc/boot/dts/canyonlands.dts from
commit 5edc2aae16bc. Then Linux added 2 commits on top:
- 86bc917d2ac1 ("powerpc/boot/dts: Fix dtc "pciex" warnings")
- eca213152a36 ("powerpc/4xx: Complete removal of MSI support")

Backport the same commits in order to fix some of the following
warnings which started to appear since commit 6e0dc9d2a8 ("meson:
compile bundled device trees"):

  [7831/8926] Generating pc-bios/canyonlands.dts with a custom command
  pc-bios/canyonlands.dts:47.9-50.4: Warning (unit_address_vs_reg): /memory: 
node has a reg or ranges property, but no unit name
  pc-bios/canyonlands.dts:210.13-429.5: Warning (unit_address_vs_reg): 
/plb/opb: node has a reg or ranges property, but no unit name
  pc-bios/canyonlands.dts:464.26-504.5: Warning (pci_bridge): 
/plb/pciex@d: node name is not "pci" or "pcie"
  pc-bios/canyonlands.dts:506.26-546.5: Warning (pci_bridge): 
/plb/pciex@d2000: node name is not "pci" or "pcie"
  pc-bios/canyonlands.dtb: Warning (unit_address_format): Failed prerequisite 
'pci_bridge'
  pc-bios/canyonlands.dtb: Warning (pci_device_reg): Failed prerequisite 
'pci_bridge'
  pc-bios/canyonlands.dtb: Warning (pci_device_bus_num): Failed prerequisite 
'pci_bridge'
  pc-bios/canyonlands.dts:268.14-289.7: Warning (avoid_unnecessary_addr_size): 
/plb/opb/ebc/ndfc@3,0: unnecessary #address-cells/#size-cells without "ranges" 
or child "reg" property

Signed-off-by: Philippe Mathieu-Daudé 
---
 pc-bios/canyonlands.dts | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/pc-bios/canyonlands.dts b/pc-bios/canyonlands.dts
index 0d6ac92d0f..5db1bff6b2 100644
--- a/pc-bios/canyonlands.dts
+++ b/pc-bios/canyonlands.dts
@@ -461,7 +461,7 @@
interrupt-map = < 0x0 0x0 0x0 0x0  0x0 0x8 >;
};
 
-   PCIE0: pciex@d {
+   PCIE0: pcie@d {
device_type = "pci";
#interrupt-cells = <1>;
#size-cells = <2>;
@@ -503,7 +503,7 @@
0x0 0x0 0x0 0x4  0xf 0x4 /* swizzled int D 
*/>;
};
 
-   PCIE1: pciex@d2000 {
+   PCIE1: pcie@d2000 {
device_type = "pci";
#interrupt-cells = <1>;
#size-cells = <2>;
@@ -544,23 +544,5 @@
0x0 0x0 0x0 0x3  0x12 0x4 /* swizzled int 
C */
0x0 0x0 0x0 0x4  0x13 0x4 /* swizzled int 
D */>;
};
-
-   MSI: ppc4xx-msi@C1000 {
-   compatible = "amcc,ppc4xx-msi", "ppc4xx-msi";
-   reg = < 0xC 0x1000 0x100>;
-   sdr-base = <0x36C>;
-   msi-data = <0x>;
-   msi-mask = <0x>;
-   interrupt-count = <3>;
-   interrupts = <0 1 2 3>;
-   interrupt-parent = <>;
-   #interrupt-cells = <1>;
-   #address-cells = <0>;
-   #size-cells = <0>;
-   interrupt-map = <0  0x18 1
-   1  0x19 1
-   2  0x1A 1
-   3  0x1B 1>;
-   };
};
 };
-- 
2.41.0




Re: [PATCH v3 5/5] vfio-user: Fix config space access byte order

2023-09-14 Thread Philippe Mathieu-Daudé

On 7/9/23 15:04, Mattias Nissler wrote:

PCI config space is little-endian, so on a big-endian host we need to
perform byte swaps for values as they are passed to and received from
the generic PCI config space access machinery.

Signed-off-by: Mattias Nissler 
---
  hw/remote/vfio-user-obj.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index cee5e615a9..d38b4700f3 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -281,7 +281,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, 
char * const buf,
  while (bytes > 0) {
  len = (bytes > pci_access_width) ? pci_access_width : bytes;
  if (is_write) {
-memcpy(, ptr, len);
+val = ldn_le_p(ptr, len);
  pci_host_config_write_common(o->pci_dev, offset,
   pci_config_size(o->pci_dev),
   val, len);
@@ -289,7 +289,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, 
char * const buf,
  } else {
  val = pci_host_config_read_common(o->pci_dev, offset,
pci_config_size(o->pci_dev), 
len);
-memcpy(ptr, , len);
+stn_le_p(ptr, len, val);
  trace_vfu_cfg_read(offset, val);
  }
  offset += len;


This makes sense,

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 5/5] vfio-user: Fix config space access byte order

2023-09-14 Thread Stefan Hajnoczi
On Thu, Sep 07, 2023 at 06:04:10AM -0700, Mattias Nissler wrote:
> PCI config space is little-endian, so on a big-endian host we need to
> perform byte swaps for values as they are passed to and received from
> the generic PCI config space access machinery.
> 
> Signed-off-by: Mattias Nissler 
> ---
>  hw/remote/vfio-user-obj.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

After some discussion about PCI Configuration Space endianness on IRC
with aw, mcayland, and f4bug I am now happy with this patch:

1. Configuration space can only be accessed in 1-, 2-, or 4-byte
   accesses.
2. If it's a 2- or 4-byte access then your patch adds the missing
   little-endian conversion.
3. If it's a 1-byte access then there is (effectively) no byteswap in
   the code path and the pci_dev->config[] array is already
   little-endian.

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


[PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error

2023-09-14 Thread “William Roche
From: William Roche 

A Qemu VM can survive a memory error, as qemu can relay the error to the
VM kernel which could also deal with it -- poisoning/off-lining the impacted
page.
This situation creates a hole in the VM memory address space that the VM kernel
knows about (an unreadable page or set of pages).

But the migration of this VM (live migration through the network or
pseudo-migration with the creation of a state file) will crash Qemu when
it sequentially reads the memory address space and stumbles on the
existing hole.

In order to correct this problem, I suggest to treat the poisoned pages as if
they were zero-pages for the migration copy.
This fix also works with underlying large pages, taking into account the
RAMBlock segment "page-size".
This fix is scripts/checkpatch.pl clean.

v2:
  - adding compressed transfer handling of poisoned pages
 
Testing: I could verify that migration now works with a poisoned page
through standard and compressed migration with 4k and large (2M) pages.

The RDMA transfer is not considered by this patch.

William Roche (1):
  migration: skip poisoned memory pages on "ram saving" phase

 accel/kvm/kvm-all.c  | 14 ++
 accel/stubs/kvm-stub.c   |  5 +
 include/sysemu/kvm.h | 10 ++
 migration/ram-compress.c |  3 ++-
 migration/ram.c  | 23 +--
 migration/ram.h  |  2 ++
 6 files changed, 54 insertions(+), 3 deletions(-)

-- 
2.39.3




[PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-14 Thread “William Roche
From: William Roche 

A memory page poisoned from the hypervisor level is no longer readable.
Thus, it is now treated as a zero-page for the ram saving migration phase.

The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page_to_file
#5  save_zero_page
#6  ram_save_target_page_legacy
#7  ram_save_host_page
#8  ram_find_and_save_block
#9  ram_save_iterate
#10 qemu_savevm_state_iterate
#11 migration_iteration_run
#12 migration_thread
#13 qemu_thread_start

Fix it by considering poisoned pages as if they were zero-pages for
the migration copy. This fix also works with underlying large pages,
taking into account the RAMBlock segment "page-size".

Standard migration and compressed transfers are handled by this code.
RDMA transfer isn't touched.

Signed-off-by: William Roche 
---
 accel/kvm/kvm-all.c  | 14 ++
 accel/stubs/kvm-stub.c   |  5 +
 include/sysemu/kvm.h | 10 ++
 migration/ram-compress.c |  3 ++-
 migration/ram.c  | 23 +--
 migration/ram.h  |  2 ++
 6 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ff1578bb32..7fb13c8a56 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1152,6 +1152,20 @@ static void kvm_unpoison_all(void *param)
 }
 }
 
+bool kvm_hwpoisoned_page(RAMBlock *block, void *offset)
+{
+HWPoisonPage *pg;
+ram_addr_t ram_addr = (ram_addr_t) offset;
+
+QLIST_FOREACH(pg, _page_list, list) {
+if ((ram_addr >= pg->ram_addr) &&
+(ram_addr - pg->ram_addr < block->page_size)) {
+return true;
+}
+}
+return false;
+}
+
 void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 {
 HWPoisonPage *page;
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 235dc661bc..c0a31611df 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -133,3 +133,8 @@ uint32_t kvm_dirty_ring_size(void)
 {
 return 0;
 }
+
+bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr)
+{
+return false;
+}
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index ee9025f8e9..858688227a 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -570,4 +570,14 @@ bool kvm_arch_cpu_check_are_resettable(void);
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
+
+/**
+ * kvm_hwpoisoned_page - indicate if the given page is poisoned
+ * @block: memory block of the given page
+ * @ram_addr: offset of the page
+ *
+ * Returns: true: page is poisoned
+ *  false: page not yet poisoned
+ */
+bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr);
 #endif
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 06254d8c69..1916ce709d 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -34,6 +34,7 @@
 #include "qemu/error-report.h"
 #include "migration.h"
 #include "options.h"
+#include "ram.h"
 #include "io/channel-null.h"
 #include "exec/target_page.h"
 #include "exec/ramblock.h"
@@ -198,7 +199,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, 
z_stream *stream,
 
 assert(qemu_file_buffer_empty(f));
 
-if (buffer_is_zero(p, page_size)) {
+if (migration_buffer_is_zero(block, offset, page_size)) {
 return RES_ZEROPAGE;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..fd337f7e65 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1129,6 +1129,26 @@ void ram_release_page(const char *rbname, uint64_t 
offset)
 ram_discard_range(rbname, offset, TARGET_PAGE_SIZE);
 }
 
+/**
+ * migration_buffer_is_zero: indicate if the page at the given
+ * location is entirely filled with zero, or is a poisoned page.
+ *
+ * @block: block that contains the page
+ * @offset: offset inside the block for the page
+ * @len: size to consider
+ */
+bool migration_buffer_is_zero(RAMBlock *block, ram_addr_t offset,
+ size_t len)
+{
+uint8_t *p = block->host + offset;
+
+if (kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) {
+return true;
+}
+
+return buffer_is_zero(p, len);
+}
+
 /**
  * save_zero_page_to_file: send the zero page to the file
  *
@@ -1142,10 +1162,9 @@ void ram_release_page(const char *rbname, uint64_t 
offset)
 static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
   RAMBlock *block, ram_addr_t offset)
 {
-uint8_t *p = block->host + offset;
 int len = 0;
 
-if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+if (migration_buffer_is_zero(block, offset, TARGET_PAGE_SIZE)) {
 len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
 qemu_put_byte(file, 0);
 len += 1;
diff --git 

[RFC PATCH 4/8] i386/sev: Replace UPDATE_DATA ioctl with sev library equivalent

2023-09-14 Thread Tyler Fanelli
UPDATE_DATA takes the VM's file descriptor, a guest memory region to
be encrypted, as well as the size of the aforementioned guest memory
region.

If this API ioctl call fails, fw_error will be set accordingly.

Signed-off-by: Tyler Fanelli 
---
 target/i386/sev.c | 31 ++-
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 49be072cbc..615021a1a3 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -715,29 +715,6 @@ sev_read_file_base64(const char *filename, guchar **data, 
gsize *len)
 return 0;
 }
 
-static int
-sev_launch_update_data(SevGuestState *sev, uint8_t *addr, uint64_t len)
-{
-int ret, fw_error;
-struct kvm_sev_launch_update_data update;
-
-if (!addr || !len) {
-return 1;
-}
-
-update.uaddr = (__u64)(unsigned long)addr;
-update.len = len;
-trace_kvm_sev_launch_update_data(addr, len);
-ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_UPDATE_DATA,
-, _error);
-if (ret) {
-error_report("%s: LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
-__func__, ret, fw_error, fw_error_to_str(fw_error));
-}
-
-return ret;
-}
-
 static int
 sev_launch_update_vmsa(SevGuestState *sev)
 {
@@ -1009,15 +986,19 @@ out:
 int
 sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp)
 {
+KVMState *s = kvm_state;
+int fw_error;
+
 if (!sev_guest) {
 return 0;
 }
 
 /* if SEV is in update state then encrypt the data else do nothing */
 if (sev_check_state(sev_guest, SEV_STATE_LAUNCH_UPDATE)) {
-int ret = sev_launch_update_data(sev_guest, ptr, len);
+int ret = sev_launch_update_data(s->vmfd, (__u64) ptr, len, _error);
 if (ret < 0) {
-error_setg(errp, "SEV: Failed to encrypt pflash rom");
+error_setg(errp, "SEV: Failed to encrypt pflash rom fw_err=%d",
+   fw_error);
 return ret;
 }
 }
-- 
2.40.1




[RFC PATCH 8/8] i386/sev: Replace LAUNCH_FINISH ioctl with sev library equivalent

2023-09-14 Thread Tyler Fanelli
The LAUNCH_FINISH ioctl finishes the guest launch flow and transitions
the guest into a state ready to be run.

If this API ioctl call fails, fw_error will be set accordingly.

Signed-off-by: Tyler Fanelli 
---
 target/i386/sev.c | 38 --
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index a4510b5437..e52dcc67c3 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -785,35 +785,29 @@ static Notifier sev_machine_done_notify = {
 .notify = sev_launch_get_measure,
 };
 
-static void
-sev_launch_finish(SevGuestState *sev)
-{
-int ret, error;
-
-trace_kvm_sev_launch_finish();
-ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_FINISH, 0, );
-if (ret) {
-error_report("%s: LAUNCH_FINISH ret=%d fw_error=%d '%s'",
- __func__, ret, error, fw_error_to_str(error));
-exit(1);
-}
-
-sev_set_guest_state(sev, SEV_STATE_RUNNING);
-
-/* add migration blocker */
-error_setg(_mig_blocker,
-   "SEV: Migration is not implemented");
-migrate_add_blocker(sev_mig_blocker, _fatal);
-}
-
 static void
 sev_vm_state_change(void *opaque, bool running, RunState state)
 {
 SevGuestState *sev = opaque;
+int ret, fw_error;
+KVMState *s = kvm_state;
 
 if (running) {
 if (!sev_check_state(sev, SEV_STATE_RUNNING)) {
-sev_launch_finish(sev);
+trace_kvm_sev_launch_finish();
+ret = sev_launch_finish(s->vmfd, _error);
+if (ret) {
+error_report("%s: LAUNCH_FINISH ret=%d fw_error=%d '%s'",
+ __func__, ret, fw_error,
+ fw_error_to_str(fw_error));
+exit(1);
+}
+
+sev_set_guest_state(sev, SEV_STATE_RUNNING);
+
+// add migration blocker.
+error_setg(_mig_blocker, "SEV: Migration is not implemented");
+migrate_add_blocker(sev_mig_blocker, _fatal);
 }
 }
 }
-- 
2.40.1




[RFC PATCH 7/8] i386/sev: Replace LAUNCH_SECRET ioctl with sev library equivalent

2023-09-14 Thread Tyler Fanelli
The LAUNCH_SECRET API can inject a secret into the VM once the
measurement has been retrieved.

If this API ioctl call fails, fw_error will be set accordingly.

Signed-off-by: Tyler Fanelli 
---
 target/i386/sev.c | 105 --
 target/i386/sev.h |   2 -
 2 files changed, 36 insertions(+), 71 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index f53ff140e3..a4510b5437 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -983,88 +983,44 @@ sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error 
**errp)
 return 0;
 }
 
-int sev_inject_launch_secret(const char *packet_hdr, const char *secret,
- uint64_t gpa, Error **errp)
-{
-struct kvm_sev_launch_secret input;
-g_autofree guchar *data = NULL, *hdr = NULL;
-int error, ret = 1;
-void *hva;
-gsize hdr_sz = 0, data_sz = 0;
-MemoryRegion *mr = NULL;
-
-if (!sev_guest) {
-error_setg(errp, "SEV not enabled for guest");
-return 1;
-}
-
-/* secret can be injected only in this state */
-if (!sev_check_state(sev_guest, SEV_STATE_LAUNCH_SECRET)) {
-error_setg(errp, "SEV: Not in correct state. (LSECRET) %x",
- sev_guest->state);
-return 1;
-}
-
-hdr = g_base64_decode(packet_hdr, _sz);
-if (!hdr || !hdr_sz) {
-error_setg(errp, "SEV: Failed to decode sequence header");
-return 1;
-}
-
-data = g_base64_decode(secret, _sz);
-if (!data || !data_sz) {
-error_setg(errp, "SEV: Failed to decode data");
-return 1;
-}
-
-hva = gpa2hva(, gpa, data_sz, errp);
-if (!hva) {
-error_prepend(errp, "SEV: Failed to calculate guest address: ");
-return 1;
-}
-
-input.hdr_uaddr = (uint64_t)(unsigned long)hdr;
-input.hdr_len = hdr_sz;
-
-input.trans_uaddr = (uint64_t)(unsigned long)data;
-input.trans_len = data_sz;
-
-input.guest_uaddr = (uint64_t)(unsigned long)hva;
-input.guest_len = data_sz;
-
-trace_kvm_sev_launch_secret(gpa, input.guest_uaddr,
-input.trans_uaddr, input.trans_len);
-
-ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_LAUNCH_SECRET,
-, );
-if (ret) {
-error_setg(errp, "SEV: failed to inject secret ret=%d fw_error=%d 
'%s'",
- ret, error, fw_error_to_str(error));
-return ret;
-}
-
-return 0;
-}
-
 #define SEV_SECRET_GUID "4c2eb361-7d9b-4cc3-8081-127c90d3d294"
 struct sev_secret_area {
 uint32_t base;
 uint32_t size;
 };
 
-void qmp_sev_inject_launch_secret(const char *packet_hdr,
-  const char *secret,
+void qmp_sev_inject_launch_secret(const char *hdr_b64,
+  const char *secret_b64,
   bool has_gpa, uint64_t gpa,
   Error **errp)
 {
+int ret, fw_error = 0;
+g_autofree guchar *hdr = NULL, *secret = NULL;
+uint8_t *data = NULL;
+KVMState *s = kvm_state;
+gsize hdr_sz = 0, secret_sz = 0;
+MemoryRegion *mr = NULL;
+void *hva;
+struct sev_secret_area *area = NULL;
+
 if (!sev_enabled()) {
 error_setg(errp, "SEV not enabled for guest");
 return;
 }
-if (!has_gpa) {
-uint8_t *data;
-struct sev_secret_area *area;
 
+hdr = g_base64_decode(hdr_b64, _sz);
+if (!hdr || !hdr_sz) {
+error_setg(errp, "SEV: Failed to decode sequence header");
+return;
+}
+
+secret = g_base64_decode(secret_b64, _sz);
+if (!secret || !secret_sz) {
+error_setg(errp, "SEV: Failed to decode secret");
+return;
+}
+
+if (!has_gpa) {
 if (!pc_system_ovmf_table_find(SEV_SECRET_GUID, , NULL)) {
 error_setg(errp, "SEV: no secret area found in OVMF,"
" gpa must be specified.");
@@ -1074,7 +1030,18 @@ void qmp_sev_inject_launch_secret(const char *packet_hdr,
 gpa = area->base;
 }
 
-sev_inject_launch_secret(packet_hdr, secret, gpa, errp);
+hva = gpa2hva(, gpa, secret_sz, errp);
+if (!hva) {
+error_prepend(errp, "SEV: Failed to calculate guest address: ");
+return;
+}
+
+ret = sev_inject_launch_secret(s->vmfd, hdr, secret, secret_sz,
+   hva, _error);
+if (ret < 0) {
+error_setg(errp, "%s: LAUNCH_SECRET ret=%d fw_error=%d '%s'", __func__,
+   ret, fw_error, fw_error_to_str(fw_error));
+}
 }
 
 static int
diff --git a/target/i386/sev.h b/target/i386/sev.h
index acb181358e..f1af28eca0 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -53,8 +53,6 @@ uint32_t sev_get_reduced_phys_bits(void);
 bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp);
 
 int sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp);
-int sev_inject_launch_secret(const char *hdr, const char *secret,
-  

[RFC PATCH 6/8] i386/sev: Replace LAUNCH_MEASURE ioctl with sev library equivalent

2023-09-14 Thread Tyler Fanelli
The LAUNCH_MEASURE API returns the measurement of the launched guest's
memory pages (and VMCB save areas if ES is enabled). The caller is
responsible for ensuring that the pointer (identified as the "data"
argument) is a valid pointer that can hold the guest's measurement (a
measurement in SEV is 48 bytes in size).

If this API ioctl call fails, fw_error will be set accordingly.

Signed-off-by: Tyler Fanelli 
---
 target/i386/sev.c | 24 ++--
 target/i386/sev.h |  2 ++
 2 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index adb35291e8..f53ff140e3 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -721,7 +721,6 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
 SevGuestState *sev = sev_guest;
 int ret, fw_error;
 g_autofree guchar *data = NULL;
-struct kvm_sev_launch_measure measurement = {};
 KVMState *s = kvm_state;
 
 if (!sev_check_state(sev, SEV_STATE_LAUNCH_UPDATE)) {
@@ -738,31 +737,20 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
 }
 }
 
-/* query the measurement blob length */
-ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_MEASURE,
-, _error);
-if (!measurement.len) {
-error_report("%s: LAUNCH_MEASURE ret=%d fw_error=%d '%s'",
- __func__, ret, fw_error, fw_error_to_str(fw_error));
-return;
-}
+data = g_malloc(SEV_MEASUREMENT_SIZE);
 
-data = g_new0(guchar, measurement.len);
-measurement.uaddr = (unsigned long)data;
-
-/* get the measurement blob */
-ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_MEASURE,
-, _error);
+ret = sev_launch_measure(s->vmfd, data, _error);
 if (ret) {
-error_report("%s: LAUNCH_MEASURE ret=%d fw_error=%d '%s'",
- __func__, ret, fw_error, fw_error_to_str(fw_error));
+error_report("%s: LAUNCH_MEASURE ret=%d fw_error=%d '%s'", __func__,
+   ret, fw_error, fw_error_to_str(fw_error));
+
 return;
 }
 
 sev_set_guest_state(sev, SEV_STATE_LAUNCH_SECRET);
 
 /* encode the measurement value and emit the event */
-sev->measurement = g_base64_encode(data, measurement.len);
+sev->measurement = g_base64_encode(data, SEV_MEASUREMENT_SIZE);
 trace_kvm_sev_launch_measurement(sev->measurement);
 }
 
diff --git a/target/i386/sev.h b/target/i386/sev.h
index e7499c95b1..acb181358e 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -38,6 +38,8 @@ typedef struct SevKernelLoaderContext {
 size_t cmdline_size;
 } SevKernelLoaderContext;
 
+#define SEV_MEASUREMENT_SIZE 48
+
 #ifdef CONFIG_SEV
 bool sev_enabled(void);
 bool sev_es_enabled(void);
-- 
2.40.1




[RFC PATCH 1/8] Add SEV Rust library as dependency with CONFIG_SEV

2023-09-14 Thread Tyler Fanelli
The Rust sev library provides a type-safe implementation of the AMD
Secure Encrypted Virtualization (SEV) APIs.

Signed-off-by: Tyler Fanelli 
---
 meson.build   | 7 +++
 meson_options.txt | 2 ++
 scripts/meson-buildoptions.sh | 3 +++
 target/i386/meson.build   | 2 +-
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 5150a74831..7114a4a2b9 100644
--- a/meson.build
+++ b/meson.build
@@ -1079,6 +1079,12 @@ if targetos == 'linux' and (have_system or have_tools)
method: 'pkg-config',
required: get_option('libudev'))
 endif
+sev = not_found
+if not get_option('sev').auto()
+  sev = dependency('sev', version: '1.2.1',
+  method: 'pkg-config',
+  required: get_option('sev'))
+endif
 
 mpathlibs = [libudev]
 mpathpersist = not_found
@@ -4283,6 +4289,7 @@ summary_info += {'PAM':   pam}
 summary_info += {'iconv support': iconv}
 summary_info += {'virgl support': virgl}
 summary_info += {'blkio support': blkio}
+summary_info += {'sev support':   sev}
 summary_info += {'curl support':  curl}
 summary_info += {'Multipath support': mpathpersist}
 summary_info += {'Linux AIO support': libaio}
diff --git a/meson_options.txt b/meson_options.txt
index f82d88b7c6..c57d542c0b 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -134,6 +134,8 @@ option('cap_ng', type : 'feature', value : 'auto',
description: 'cap_ng support')
 option('blkio', type : 'feature', value : 'auto',
description: 'libblkio block device driver')
+option('sev', type : 'feature', value : 'auto',
+description: 'SEV Rust library')
 option('bpf', type : 'feature', value : 'auto',
 description: 'eBPF support')
 option('cocoa', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index e1d178370c..d7deb50bda 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -83,6 +83,7 @@ meson_options_help() {
   printf "%s\n" '  avx512bwAVX512BW optimizations'
   printf "%s\n" '  avx512f AVX512F optimizations'
   printf "%s\n" '  blkio   libblkio block device driver'
+  printf "%s\n" '  sev SEV Rust library'
   printf "%s\n" '  bochs   bochs image format support'
   printf "%s\n" '  bpf eBPF support'
   printf "%s\n" '  brlapi  brlapi character device driver'
@@ -227,6 +228,8 @@ _meson_option_parse() {
 --disable-lto) printf "%s" -Db_lto=false ;;
 --enable-blkio) printf "%s" -Dblkio=enabled ;;
 --disable-blkio) printf "%s" -Dblkio=disabled ;;
+--enable-sev) printf "%s" -Dsev=enabled ;;
+--disable-sev) printf "%s" -Dsev=disabled ;;
 --block-drv-ro-whitelist=*) quote_sh "-Dblock_drv_ro_whitelist=$2" ;;
 --block-drv-rw-whitelist=*) quote_sh "-Dblock_drv_rw_whitelist=$2" ;;
 --enable-block-drv-whitelist-in-tools) printf "%s" 
-Dblock_drv_whitelist_in_tools=true ;;
diff --git a/target/i386/meson.build b/target/i386/meson.build
index 6f1036d469..18450dc134 100644
--- a/target/i386/meson.build
+++ b/target/i386/meson.build
@@ -6,7 +6,7 @@ i386_ss.add(files(
   'xsave_helper.c',
   'cpu-dump.c',
 ))
-i386_ss.add(when: 'CONFIG_SEV', if_true: files('host-cpu.c'))
+i386_ss.add(when: 'CONFIG_SEV', if_true: [sev, files('host-cpu.c')])
 
 # x86 cpu type
 i386_ss.add(when: 'CONFIG_KVM', if_true: files('host-cpu.c'))
-- 
2.40.1




[RFC PATCH 3/8] i386/sev: Replace LAUNCH_START ioctl with sev library equivalent

2023-09-14 Thread Tyler Fanelli
The sev library offers an equivalent API for SEV_LAUNCH_START. The
library contains some internal state for each VM it's currently running,
and organizes the internal state for each VM via it's file descriptor.
Therefore, the VM's file descriptor must be provided as input.

If this API ioctl call fails, fw_error will be set accordingly.

Signed-off-by: Tyler Fanelli 
---
 target/i386/sev.c | 80 ++-
 1 file changed, 30 insertions(+), 50 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index f0fd291e68..49be072cbc 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -715,51 +715,6 @@ sev_read_file_base64(const char *filename, guchar **data, 
gsize *len)
 return 0;
 }
 
-static int
-sev_launch_start(SevGuestState *sev)
-{
-gsize sz;
-int ret = 1;
-int fw_error, rc;
-struct kvm_sev_launch_start start = {
-.handle = sev->handle, .policy = sev->policy
-};
-guchar *session = NULL, *dh_cert = NULL;
-
-if (sev->session_file) {
-if (sev_read_file_base64(sev->session_file, , ) < 0) {
-goto out;
-}
-start.session_uaddr = (unsigned long)session;
-start.session_len = sz;
-}
-
-if (sev->dh_cert_file) {
-if (sev_read_file_base64(sev->dh_cert_file, _cert, ) < 0) {
-goto out;
-}
-start.dh_uaddr = (unsigned long)dh_cert;
-start.dh_len = sz;
-}
-
-trace_kvm_sev_launch_start(start.policy, session, dh_cert);
-rc = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_START, , _error);
-if (rc < 0) {
-error_report("%s: LAUNCH_START ret=%d fw_error=%d '%s'",
-__func__, ret, fw_error, fw_error_to_str(fw_error));
-goto out;
-}
-
-sev_set_guest_state(sev, SEV_STATE_LAUNCH_UPDATE);
-sev->handle = start.handle;
-ret = 0;
-
-out:
-g_free(session);
-g_free(dh_cert);
-return ret;
-}
-
 static int
 sev_launch_update_data(SevGuestState *sev, uint8_t *addr, uint64_t len)
 {
@@ -913,11 +868,13 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 {
 SevGuestState *sev
 = (SevGuestState *)object_dynamic_cast(OBJECT(cgs), TYPE_SEV_GUEST);
+gsize sz;
 char *devname;
-int ret, fw_error;
+int ret = -1, fw_error;
 uint32_t ebx;
 uint32_t host_cbitpos;
 struct sev_user_data_status status = {};
+guchar *session = NULL, *dh_cert = NULL;
 KVMState *s = kvm_state;
 
 if (!sev) {
@@ -1007,23 +964,46 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 goto err;
 }
 
-ret = sev_launch_start(sev);
+if (!sev->session_file || !sev->dh_cert_file) {
+goto err;
+}
+
+if (sev_read_file_base64(sev->session_file, , ) < 0) {
+goto err;
+}
+
+if (sev_read_file_base64(sev->dh_cert_file, _cert, ) < 0) {
+goto err;
+}
+
+ret = sev_launch_start(s->vmfd, sev->policy, (void *) dh_cert,
+   (void *) session, _error);
 if (ret) {
-error_setg(errp, "%s: failed to create encryption context", __func__);
+error_setg(errp, "%s: LAUNCH_START ret=%d fw_error=%d '%s'",
+   __func__, ret, fw_error, fw_error_to_str(fw_error));
 goto err;
 }
 
+sev_set_guest_state(sev, SEV_STATE_LAUNCH_UPDATE);
+
 ram_block_notifier_add(_ram_notifier);
 qemu_add_machine_init_done_notifier(_machine_done_notify);
 qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
 
 cgs->ready = true;
 
-return 0;
+ret = 0;
+goto out;
+
 err:
 sev_guest = NULL;
 ram_block_discard_disable(false);
-return -1;
+out:
+g_free(session);
+g_free(dh_cert);
+
+return ret;
+
 }
 
 int
-- 
2.40.1




[RFC PATCH 0/8] i386/sev: Use C API of Rust SEV library

2023-09-14 Thread Tyler Fanelli
These patches are submitted as an RFC mainly because I'm a relative
newcomer to QEMU with no knowledge of the community's views on
including Rust code, nor it's preference of using library APIs for
ioctls that were previously implemented in QEMU directly.

Recently, the Rust sev library [0] has introduced a C API to take
advantage of the library outside of Rust.

Should the inclusion of the library as a dependency be desired, it can
be extended further to include the firmware/platform ioctls, the
attestation report fetching, and more. This would result in much of
the AMD-SEV portion of QEMU being offloaded to the library.

This series looks to explore the possibility of using the library and
show a bit of what it would look like. I'm looking for comments
regarding if this feature is desired.

[0] https://github.com/virtee/sev

Tyler Fanelli (8):
  Add SEV Rust library as dependency with CONFIG_SEV
  i386/sev: Replace INIT and ES_INIT ioctls with sev library equivalents
  i386/sev: Replace LAUNCH_START ioctl with sev library equivalent
  i386/sev: Replace UPDATE_DATA ioctl with sev library equivalent
  i386/sev: Replace LAUNCH_UPDATE_VMSA ioctl with sev library equivalent
  i386/sev: Replace LAUNCH_MEASURE ioctl with sev library equivalent
  i386/sev: Replace LAUNCH_SECRET ioctl with sev library equivalent
  i386/sev: Replace LAUNCH_FINISH ioctl with sev library equivalent

 meson.build   |   7 +
 meson_options.txt |   2 +
 scripts/meson-buildoptions.sh |   3 +
 target/i386/meson.build   |   2 +-
 target/i386/sev.c | 311 --
 target/i386/sev.h |   4 +-
 target/i386/trace-events  |   1 +
 7 files changed, 123 insertions(+), 207 deletions(-)

-- 
2.40.1




[RFC PATCH 5/8] i386/sev: Replace LAUNCH_UPDATE_VMSA ioctl with sev library equivalent

2023-09-14 Thread Tyler Fanelli
The LAUNCH_UPDATE_VMSA API takes the VM's file descriptor, as well as a
field for any firmware errors as input.

If this API ioctl call fails, fw_error will be set accordingly.

Signed-off-by: Tyler Fanelli 
---
 target/i386/sev.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 615021a1a3..adb35291e8 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -715,27 +715,14 @@ sev_read_file_base64(const char *filename, guchar **data, 
gsize *len)
 return 0;
 }
 
-static int
-sev_launch_update_vmsa(SevGuestState *sev)
-{
-int ret, fw_error;
-
-ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_UPDATE_VMSA, NULL, _error);
-if (ret) {
-error_report("%s: LAUNCH_UPDATE_VMSA ret=%d fw_error=%d '%s'",
-__func__, ret, fw_error, fw_error_to_str(fw_error));
-}
-
-return ret;
-}
-
 static void
 sev_launch_get_measure(Notifier *notifier, void *unused)
 {
 SevGuestState *sev = sev_guest;
-int ret, error;
+int ret, fw_error;
 g_autofree guchar *data = NULL;
 struct kvm_sev_launch_measure measurement = {};
+KVMState *s = kvm_state;
 
 if (!sev_check_state(sev, SEV_STATE_LAUNCH_UPDATE)) {
 return;
@@ -743,18 +730,20 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
 
 if (sev_es_enabled()) {
 /* measure all the VM save areas before getting launch_measure */
-ret = sev_launch_update_vmsa(sev);
+ret = sev_launch_update_vmsa(s->vmfd, _error);
 if (ret) {
+error_report("%s: LAUNCH_UPDATE_VMSA ret=%d fw_error=%d '%s'",
+__func__, ret, fw_error, fw_error_to_str(fw_error));
 exit(1);
 }
 }
 
 /* query the measurement blob length */
 ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_MEASURE,
-, );
+, _error);
 if (!measurement.len) {
 error_report("%s: LAUNCH_MEASURE ret=%d fw_error=%d '%s'",
- __func__, ret, error, fw_error_to_str(errno));
+ __func__, ret, fw_error, fw_error_to_str(fw_error));
 return;
 }
 
@@ -763,10 +752,10 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
 
 /* get the measurement blob */
 ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_MEASURE,
-, );
+, _error);
 if (ret) {
 error_report("%s: LAUNCH_MEASURE ret=%d fw_error=%d '%s'",
- __func__, ret, error, fw_error_to_str(errno));
+ __func__, ret, fw_error, fw_error_to_str(fw_error));
 return;
 }
 
-- 
2.40.1




[RFC PATCH 2/8] i386/sev: Replace INIT and ES_INIT ioctls with sev library equivalents

2023-09-14 Thread Tyler Fanelli
The sev library offers APIs for SEV_INIT and SEV_ES_INIT, both taking
the file descriptors of the encrypting VM and /dev/sev as input.

If this API ioctl call fails, fw_error will be set accordingly.

Signed-off-by: Tyler Fanelli 
---
 target/i386/sev.c| 14 +-
 target/i386/trace-events |  1 +
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index fe2144c038..f0fd291e68 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -18,6 +18,8 @@
 
 #include 
 
+#include 
+
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
 #include "qemu/base64.h"
@@ -27,6 +29,7 @@
 #include "crypto/hash.h"
 #include "sysemu/kvm.h"
 #include "sev.h"
+#include "sysemu/kvm_int.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "trace.h"
@@ -911,10 +914,11 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 SevGuestState *sev
 = (SevGuestState *)object_dynamic_cast(OBJECT(cgs), TYPE_SEV_GUEST);
 char *devname;
-int ret, fw_error, cmd;
+int ret, fw_error;
 uint32_t ebx;
 uint32_t host_cbitpos;
 struct sev_user_data_status status = {};
+KVMState *s = kvm_state;
 
 if (!sev) {
 return 0;
@@ -990,13 +994,13 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
  __func__);
 goto err;
 }
-cmd = KVM_SEV_ES_INIT;
+trace_kvm_sev_es_init();
+ret = sev_es_init(s->vmfd, sev->sev_fd, _error);
 } else {
-cmd = KVM_SEV_INIT;
+trace_kvm_sev_init();
+ret = sev_init(s->vmfd, sev->sev_fd, _error);
 }
 
-trace_kvm_sev_init();
-ret = sev_ioctl(sev->sev_fd, cmd, NULL, _error);
 if (ret) {
 error_setg(errp, "%s: failed to initialize ret=%d fw_error=%d '%s'",
__func__, ret, fw_error, fw_error_to_str(fw_error));
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 2cd8726eeb..2dca4ee117 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -2,6 +2,7 @@
 
 # sev.c
 kvm_sev_init(void) ""
+kvm_sev_es_init(void) ""
 kvm_memcrypt_register_region(void *addr, size_t len) "addr %p len 0x%zx"
 kvm_memcrypt_unregister_region(void *addr, size_t len) "addr %p len 0x%zx"
 kvm_sev_change_state(const char *old, const char *new) "%s -> %s"
-- 
2.40.1




[PATCH v4 22/23] bsd-user: Implement shmat(2) and shmdt(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Use `WITH_MMAP_LOCK_GUARD` instead of mmap_lock() and mmap_unlock(),
to match linux-user implementation, according to the following commits:

69fa2708a216df715ba5102a0f98468b540a464e linux-user: Use WITH_MMAP_LOCK_GUARD 
in target_{shmat,shmdt}
ceda5688b650646248f269a992c06b11148c5759 linux-user: Fix shmdt

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
---
 bsd-user/bsd-mem.h| 87 +++
 bsd-user/freebsd/os-syscall.c |  8 
 bsd-user/mmap.c   |  2 +-
 bsd-user/qemu.h   |  1 +
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index b82f3eaa25..c512a4e375 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -344,4 +344,91 @@ static inline abi_long do_bsd_shmctl(abi_long shmid, 
abi_long cmd,
 return ret;
 }
 
+/* shmat(2) */
+static inline abi_long do_bsd_shmat(int shmid, abi_ulong shmaddr, int shmflg)
+{
+abi_ulong raddr;
+abi_long ret;
+struct shmid_ds shm_info;
+
+/* Find out the length of the shared memory segment. */
+ret = get_errno(shmctl(shmid, IPC_STAT, _info));
+if (is_error(ret)) {
+/* Can't get the length */
+return ret;
+}
+
+if (!guest_range_valid_untagged(shmaddr, shm_info.shm_segsz)) {
+return -TARGET_EINVAL;
+}
+
+WITH_MMAP_LOCK_GUARD() {
+void *host_raddr;
+
+if (shmaddr) {
+host_raddr = shmat(shmid, (void *)g2h_untagged(shmaddr), shmflg);
+} else {
+abi_ulong mmap_start;
+
+mmap_start = mmap_find_vma(0, shm_info.shm_segsz);
+
+if (mmap_start == -1) {
+return -TARGET_ENOMEM;
+}
+host_raddr = shmat(shmid, g2h_untagged(mmap_start),
+   shmflg | SHM_REMAP);
+}
+
+if (host_raddr == (void *)-1) {
+return get_errno(-1);
+}
+raddr = h2g(host_raddr);
+
+page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
+   PAGE_VALID | PAGE_RESET | PAGE_READ |
+   (shmflg & SHM_RDONLY ? 0 : PAGE_WRITE));
+
+for (int i = 0; i < N_BSD_SHM_REGIONS; i++) {
+if (bsd_shm_regions[i].start == 0) {
+bsd_shm_regions[i].start = raddr;
+bsd_shm_regions[i].size = shm_info.shm_segsz;
+break;
+}
+}
+}
+
+return raddr;
+}
+
+/* shmdt(2) */
+static inline abi_long do_bsd_shmdt(abi_ulong shmaddr)
+{
+abi_long ret;
+
+WITH_MMAP_LOCK_GUARD() {
+int i;
+
+for (i = 0; i < N_BSD_SHM_REGIONS; ++i) {
+if (bsd_shm_regions[i].start == shmaddr) {
+break;
+}
+}
+
+if (i == N_BSD_SHM_REGIONS) {
+return -TARGET_EINVAL;
+}
+
+ret = get_errno(shmdt(g2h_untagged(shmaddr)));
+if (ret == 0) {
+abi_ulong size = bsd_shm_regions[i].size;
+
+bsd_shm_regions[i].start = 0;
+page_set_flags(shmaddr, shmaddr + size - 1, 0);
+mmap_reserve(shmaddr, size);
+}
+}
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 35f94f51fc..fe0968773e 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -559,6 +559,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_shmctl(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_shmat: /* shmat(2) */
+ret = do_bsd_shmat(arg1, arg2, arg3);
+break;
+
+case TARGET_FREEBSD_NR_shmdt: /* shmdt(2) */
+ret = do_bsd_shmdt(arg1);
+break;
+
 /*
  * Misc
  */
diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 8e148a2ea3..3ef11b2807 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -636,7 +636,7 @@ fail:
 return -1;
 }
 
-static void mmap_reserve(abi_ulong start, abi_ulong size)
+void mmap_reserve(abi_ulong start, abi_ulong size)
 {
 abi_ulong real_start;
 abi_ulong real_end;
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 6724bb9f0a..e9499b8dac 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -234,6 +234,7 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong 
old_size,
 int target_msync(abi_ulong start, abi_ulong len, int flags);
 extern abi_ulong mmap_next_start;
 abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size);
+void mmap_reserve(abi_ulong start, abi_ulong size);
 void TSA_NO_TSA mmap_fork_start(void);
 void TSA_NO_TSA mmap_fork_end(int child);
 
-- 
2.42.0




[PATCH v4 14/23] bsd-user: Implement msync(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Co-authored-by: Kyle Evans 
Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 11 +++
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 15 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 0f9e4a1d4b..5e885823a7 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -88,4 +88,15 @@ static inline abi_long do_bsd_mprotect(abi_long arg1, 
abi_long arg2,
 return get_errno(target_mprotect(arg1, arg2, arg3));
 }
 
+/* msync(2) */
+static inline abi_long do_bsd_msync(abi_long addr, abi_long len, abi_long 
flags)
+{
+if (!guest_range_valid_untagged(addr, len)) {
+/* It seems odd, but POSIX wants this to be ENOMEM */
+return -TARGET_ENOMEM;
+}
+
+return get_errno(msync(g2h_untagged(addr), len, flags));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 127805e079..859492dee7 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -499,6 +499,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_mprotect(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_msync: /* msync(2) */
+ret = do_bsd_msync(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v4 15/23] bsd-user: Implement mlock(2), munlock(2), mlockall(2), munlockall(2), minherit(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 37 +++
 bsd-user/freebsd/os-syscall.c | 20 +++
 2 files changed, 57 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 5e885823a7..16c22593bf 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -99,4 +99,41 @@ static inline abi_long do_bsd_msync(abi_long addr, abi_long 
len, abi_long flags)
 return get_errno(msync(g2h_untagged(addr), len, flags));
 }
 
+/* mlock(2) */
+static inline abi_long do_bsd_mlock(abi_long arg1, abi_long arg2)
+{
+if (!guest_range_valid_untagged(arg1, arg2)) {
+return -TARGET_EINVAL;
+}
+return get_errno(mlock(g2h_untagged(arg1), arg2));
+}
+
+/* munlock(2) */
+static inline abi_long do_bsd_munlock(abi_long arg1, abi_long arg2)
+{
+if (!guest_range_valid_untagged(arg1, arg2)) {
+return -TARGET_EINVAL;
+}
+return get_errno(munlock(g2h_untagged(arg1), arg2));
+}
+
+/* mlockall(2) */
+static inline abi_long do_bsd_mlockall(abi_long arg1)
+{
+return get_errno(mlockall(arg1));
+}
+
+/* munlockall(2) */
+static inline abi_long do_bsd_munlockall(void)
+{
+return get_errno(munlockall());
+}
+
+/* minherit(2) */
+static inline abi_long do_bsd_minherit(abi_long addr, abi_long len,
+abi_long inherit)
+{
+return get_errno(minherit(g2h_untagged(addr), len, inherit));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 859492dee7..6eaa705cd3 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -503,6 +503,26 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_msync(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_mlock: /* mlock(2) */
+ret = do_bsd_mlock(arg1, arg2);
+break;
+
+case TARGET_FREEBSD_NR_munlock: /* munlock(2) */
+ret = do_bsd_munlock(arg1, arg2);
+break;
+
+case TARGET_FREEBSD_NR_mlockall: /* mlockall(2) */
+ret = do_bsd_mlockall(arg1);
+break;
+
+case TARGET_FREEBSD_NR_munlockall: /* munlockall(2) */
+ret = do_bsd_munlockall();
+break;
+
+case TARGET_FREEBSD_NR_minherit: /* minherit(2) */
+ret = do_bsd_minherit(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v4 04/23] bsd-user: Introduce freebsd/os-misc.h to the source tree

2023-09-14 Thread Karim Taha
From: Stacey Son 

To preserve the copyright notice and help with the 'Author' info for
subsequent changes to the file.

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/freebsd/os-misc.h | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 bsd-user/freebsd/os-misc.h

diff --git a/bsd-user/freebsd/os-misc.h b/bsd-user/freebsd/os-misc.h
new file mode 100644
index 00..8436ccb2f7
--- /dev/null
+++ b/bsd-user/freebsd/os-misc.h
@@ -0,0 +1,28 @@
+/*
+ *  miscellaneous FreeBSD system call shims
+ *
+ *  Copyright (c) 2013-14 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef OS_MISC_H
+#define OS_MISC_H
+
+#include 
+#include 
+#include 
+
+
+#endif /* OS_MISC_H */
-- 
2.42.0




[PATCH v4 23/23] bsd-user: Add stubs for vadvise(), sbrk() and sstk()

2023-09-14 Thread Karim Taha
From: Warner Losh 

The above system calls are not supported by qemu.

Signed-off-by: Warner Losh 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 18 ++
 bsd-user/freebsd/os-syscall.c | 12 
 2 files changed, 30 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index c512a4e375..c3e72e3b86 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -431,4 +431,22 @@ static inline abi_long do_bsd_shmdt(abi_ulong shmaddr)
 return ret;
 }
 
+static inline abi_long do_bsd_vadvise(void)
+{
+/* See sys_ovadvise() in vm_unix.c */
+return -TARGET_EINVAL;
+}
+
+static inline abi_long do_bsd_sbrk(void)
+{
+/* see sys_sbrk() in vm_mmap.c */
+return -TARGET_EOPNOTSUPP;
+}
+
+static inline abi_long do_bsd_sstk(void)
+{
+/* see sys_sstk() in vm_mmap.c */
+return -TARGET_EOPNOTSUPP;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index fe0968773e..9647249e90 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -567,6 +567,18 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_shmdt(arg1);
 break;
 
+case TARGET_FREEBSD_NR_freebsd11_vadvise:
+ret = do_bsd_vadvise();
+break;
+
+case TARGET_FREEBSD_NR_sbrk:
+ret = do_bsd_sbrk();
+break;
+
+case TARGET_FREEBSD_NR_sstk:
+ret = do_bsd_sstk();
+break;
+
 /*
  * Misc
  */
-- 
2.42.0




[PATCH v4 16/23] bsd-user: Implment madvise(2) to match the linux-user implementation.

2023-09-14 Thread Karim Taha
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 53 +++
 bsd-user/freebsd/os-syscall.c |  4 +++
 bsd-user/syscall_defs.h   |  2 ++
 3 files changed, 59 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 16c22593bf..b00ab3aed8 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -129,6 +129,59 @@ static inline abi_long do_bsd_munlockall(void)
 return get_errno(munlockall());
 }
 
+/* madvise(2) */
+static inline abi_long do_bsd_madvise(abi_long arg1, abi_long arg2,
+abi_long arg3)
+{
+abi_ulong len;
+int ret = 0;
+abi_long start = arg1;
+abi_long len_in = arg2;
+abi_long advice = arg3;
+
+if (start & ~TARGET_PAGE_MASK) {
+return -TARGET_EINVAL;
+}
+if (len_in == 0) {
+return 0;
+}
+len = TARGET_PAGE_ALIGN(len_in);
+if (len == 0 || !guest_range_valid_untagged(start, len)) {
+return -TARGET_EINVAL;
+}
+
+/*
+ * Most advice values are hints, so ignoring and returning success is ok.
+ *
+ * However, some advice values such as MADV_DONTNEED, are not hints and
+ * need to be emulated.
+ *
+ * A straight passthrough for those may not be safe because qemu sometimes
+ * turns private file-backed mappings into anonymous mappings.
+ * If all guest pages have PAGE_PASSTHROUGH set, mappings have the
+ * same semantics for the host as for the guest.
+ *
+ * MADV_DONTNEED is passed through, if possible.
+ * If passthrough isn't possible, we nevertheless (wrongly!) return
+ * success, which is broken but some userspace programs fail to work
+ * otherwise. Completely implementing such emulation is quite complicated
+ * though.
+ */
+mmap_lock();
+switch (advice) {
+case MADV_DONTNEED:
+if (page_check_range(start, len, PAGE_PASSTHROUGH)) {
+ret = get_errno(madvise(g2h_untagged(start), len, advice));
+if (ret == 0) {
+page_reset_target_data(start, start + len - 1);
+}
+}
+}
+mmap_unlock();
+
+return ret;
+}
+
 /* minherit(2) */
 static inline abi_long do_bsd_minherit(abi_long addr, abi_long len,
 abi_long inherit)
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 6eaa705cd3..f5d60cf902 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -519,6 +519,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_munlockall();
 break;
 
+case TARGET_FREEBSD_NR_madvise: /* madvise(2) */
+ret = do_bsd_madvise(arg1, arg2, arg3);
+break;
+
 case TARGET_FREEBSD_NR_minherit: /* minherit(2) */
 ret = do_bsd_minherit(arg1, arg2, arg3);
 break;
diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index 074df7bdd6..76f4856009 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -95,6 +95,8 @@ struct bsd_shm_regions {
 /*
  *  sys/mman.h
  */
+#define TARGET_MADV_DONTNEED4   /* dont need these pages */
+
 #define TARGET_FREEBSD_MAP_RESERVED0080 0x0080  /* previously misimplemented */
 /* MAP_INHERIT */
 #define TARGET_FREEBSD_MAP_RESERVED0100 0x0100  /* previously unimplemented */
-- 
2.42.0




[PATCH v4 17/23] bsd-user: Implement mincore(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 23 +++
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 27 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index b00ab3aed8..0c8d96d9a4 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -189,4 +189,27 @@ static inline abi_long do_bsd_minherit(abi_long addr, 
abi_long len,
 return get_errno(minherit(g2h_untagged(addr), len, inherit));
 }
 
+/* mincore(2) */
+static inline abi_long do_bsd_mincore(abi_ulong target_addr, abi_ulong len,
+abi_ulong target_vec)
+{
+abi_long ret;
+void *p;
+abi_ulong vec_len = DIV_ROUND_UP(len, TARGET_PAGE_SIZE);
+
+if (!guest_range_valid_untagged(target_addr, len)
+|| !page_check_range(target_addr, len, PAGE_VALID)) {
+return -TARGET_EFAULT;
+}
+
+p = lock_user(VERIFY_WRITE, target_vec, vec_len, 0);
+if (p == NULL) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(mincore(g2h_untagged(target_addr), len, p));
+unlock_user(p, target_vec, vec_len);
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index f5d60cf902..8d1cf3b35c 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -527,6 +527,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_minherit(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_mincore: /* mincore(2) */
+ret = do_bsd_mincore(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v4 13/23] bsd-user: Implement mprotect(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/bsd-mem.h| 7 +++
 bsd-user/freebsd/os-syscall.c | 4 
 2 files changed, 11 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 76b504f70c..0f9e4a1d4b 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -81,4 +81,11 @@ static inline abi_long do_bsd_munmap(abi_long arg1, abi_long 
arg2)
 return get_errno(target_munmap(arg1, arg2));
 }
 
+/* mprotect(2) */
+static inline abi_long do_bsd_mprotect(abi_long arg1, abi_long arg2,
+abi_long arg3)
+{
+return get_errno(target_mprotect(arg1, arg2, arg3));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index d88f62319b..127805e079 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -495,6 +495,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_munmap(arg1, arg2);
 break;
 
+case TARGET_FREEBSD_NR_mprotect: /* mprotect(2) */
+ret = do_bsd_mprotect(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v4 03/23] bsd-user: Declarations for ipc_perm and shmid_ds conversion functions

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/qemu-bsd.h | 45 +
 1 file changed, 45 insertions(+)
 create mode 100644 bsd-user/qemu-bsd.h

diff --git a/bsd-user/qemu-bsd.h b/bsd-user/qemu-bsd.h
new file mode 100644
index 00..46572ece7d
--- /dev/null
+++ b/bsd-user/qemu-bsd.h
@@ -0,0 +1,45 @@
+/*
+ *  BSD conversion extern declarations
+ *
+ *  Copyright (c) 2013 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef QEMU_BSD_H
+#define QEMU_BSD_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* bsd-mem.c */
+void target_to_host_ipc_perm__locked(struct ipc_perm *host_ip,
+struct target_ipc_perm *target_ip);
+void host_to_target_ipc_perm__locked(struct target_ipc_perm *target_ip,
+struct ipc_perm *host_ip);
+abi_long target_to_host_shmid_ds(struct shmid_ds *host_sd,
+abi_ulong target_addr);
+abi_long host_to_target_shmid_ds(abi_ulong target_addr,
+struct shmid_ds *host_sd);
+
+#endif /* QEMU_BSD_H */
-- 
2.42.0




[PATCH v4 21/23] bsd-user: Implement shmctl(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 39 +++
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 43 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index c362cc07a3..b82f3eaa25 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -305,4 +305,43 @@ static inline abi_long do_bsd_shmget(abi_long arg1, 
abi_ulong arg2,
 return get_errno(shmget(arg1, arg2, arg3));
 }
 
+/* shmctl(2) */
+static inline abi_long do_bsd_shmctl(abi_long shmid, abi_long cmd,
+abi_ulong buff)
+{
+struct shmid_ds dsarg;
+abi_long ret = -TARGET_EINVAL;
+
+cmd &= 0xff;
+
+switch (cmd) {
+case IPC_STAT:
+if (target_to_host_shmid_ds(, buff)) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(shmctl(shmid, cmd, ));
+if (host_to_target_shmid_ds(buff, )) {
+return -TARGET_EFAULT;
+}
+break;
+
+case IPC_SET:
+if (target_to_host_shmid_ds(, buff)) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(shmctl(shmid, cmd, ));
+break;
+
+case IPC_RMID:
+ret = get_errno(shmctl(shmid, cmd, NULL));
+break;
+
+default:
+ret = -TARGET_EINVAL;
+break;
+}
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 52cca2300f..35f94f51fc 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -555,6 +555,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_shmget(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_shmctl: /* shmctl(2) */
+ret = do_bsd_shmctl(arg1, arg2, arg3);
+break;
+
 /*
  * Misc
  */
-- 
2.42.0




[PATCH v4 12/23] bsd-user: Implement mmap(2) and munmap(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 20 
 bsd-user/freebsd/os-syscall.c |  9 +
 2 files changed, 29 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index d865e0807d..76b504f70c 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -61,4 +61,24 @@ extern struct bsd_shm_regions bsd_shm_regions[];
 extern abi_ulong target_brk;
 extern abi_ulong initial_target_brk;
 
+/* mmap(2) */
+static inline abi_long do_bsd_mmap(void *cpu_env, abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4, abi_long arg5, abi_long arg6, abi_long arg7,
+abi_long arg8)
+{
+if (regpairs_aligned(cpu_env) != 0) {
+arg6 = arg7;
+arg7 = arg8;
+}
+return get_errno(target_mmap(arg1, arg2, arg3,
+ target_to_host_bitmask(arg4, mmap_flags_tbl),
+ arg5, target_arg64(arg6, arg7)));
+}
+
+/* munmap(2) */
+static inline abi_long do_bsd_munmap(abi_long arg1, abi_long arg2)
+{
+return get_errno(target_munmap(arg1, arg2));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 7e2a395e0f..d88f62319b 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -486,6 +486,15 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 /*
  * Memory management system calls.
  */
+case TARGET_FREEBSD_NR_mmap: /* mmap(2) */
+ret = do_bsd_mmap(cpu_env, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
+  arg8);
+break;
+
+case TARGET_FREEBSD_NR_munmap: /* munmap(2) */
+ret = do_bsd_munmap(arg1, arg2);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v4 02/23] bsd-user: Implement struct target_shmid_ds

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/syscall_defs.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index 39a9bc8ed7..074df7bdd6 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -72,6 +72,26 @@ struct target_ipc_perm {
 #define TARGET_IPC_SET  1   /* set options */
 #define TARGET_IPC_STAT 2   /* get options */
 
+/*
+ * sys/shm.h
+ */
+struct target_shmid_ds {
+struct  target_ipc_perm shm_perm; /* peration permission structure */
+abi_ulong   shm_segsz;  /* size of segment in bytes */
+int32_t shm_lpid;   /* process ID of last shared memory op */
+int32_t shm_cpid;   /* process ID of creator */
+int32_t shm_nattch; /* number of current attaches */
+target_time_t shm_atime;  /* time of last shmat() */
+target_time_t shm_dtime;  /* time of last shmdt() */
+target_time_t shm_ctime;  /* time of last change by shmctl() */
+};
+
+#define N_BSD_SHM_REGIONS   32
+struct bsd_shm_regions {
+abi_long start;
+abi_long size;
+};
+
 /*
  *  sys/mman.h
  */
-- 
2.42.0




[PATCH v4 20/23] bsd-user: Implement shm_unlink(2) and shmget(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 23 +++
 bsd-user/freebsd/os-syscall.c |  8 
 2 files changed, 31 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index f8dc943c23..c362cc07a3 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -282,4 +282,27 @@ static inline abi_long do_bsd_shm_open(abi_ulong arg1, 
abi_long arg2,
 return ret;
 }
 
+/* shm_unlink(2) */
+static inline abi_long do_bsd_shm_unlink(abi_ulong arg1)
+{
+int ret;
+void *p;
+
+p = lock_user_string(arg1);
+if (p == NULL) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(shm_unlink(p)); /* XXX path(p)? */
+unlock_user(p, arg1, 0);
+
+return ret;
+}
+
+/* shmget(2) */
+static inline abi_long do_bsd_shmget(abi_long arg1, abi_ulong arg2,
+abi_long arg3)
+{
+return get_errno(shmget(arg1, arg2, arg3));
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 7404b0aa72..52cca2300f 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -547,6 +547,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 break;
 #endif
 
+case TARGET_FREEBSD_NR_shm_unlink: /* shm_unlink(2) */
+ret = do_bsd_shm_unlink(arg1);
+break;
+
+case TARGET_FREEBSD_NR_shmget: /* shmget(2) */
+ret = do_bsd_shmget(arg1, arg2, arg3);
+break;
+
 /*
  * Misc
  */
-- 
2.42.0




[PATCH v4 00/23] bsd-user: Implement mmap related system calls for FreeBSD.

2023-09-14 Thread Karim Taha
Upstream the implementation of the following mmap system calls, from the
qemu-bsd-user fork:
   mmap(2), munmap(2),
   mprotect(2),
   msync(2),
   mlock(2), munlock(2), mlockall(2), munlockall(2), mincore(2),
   madvise(2),
   minherit(2),
   shm_open(2),shm_open2(2), shm_rename2(2), shm_unlink(2), shmget(2), 
shmctl(2), shmat(2),
   shmdt(2)
   brk(2)

Karim Taha (2):
  bsd-user: Add bsd-mem.c to meson.build
  bsd-user: Implment madvise(2) to match the linux-user implementation.

Kyle Evans (2):
  bsd-user: Implement shm_open2(2) system call
  bsd-user: Implement shm_rename(2) system call

Stacey Son (18):
  bsd-user: Implement struct target_ipc_perm
  bsd-user: Implement struct target_shmid_ds
  bsd-user: Declarations for ipc_perm and shmid_ds conversion functions
  bsd-user: Introduce freebsd/os-misc.h to the source tree
  bsd-user: Implement target_set_brk function in bsd-mem.c instead of
os-syscall.c
  bsd-user: Implement ipc_perm conversion between host and target.
  bsd-user: Implement shmid_ds conversion between host and target.
  bsd-user: Introduce bsd-mem.h to the source tree
  bsd-user: Implement mmap(2) and munmap(2)
  bsd-user: Implement mprotect(2)
  bsd-user: Implement msync(2)
  bsd-user: Implement mlock(2), munlock(2), mlockall(2), munlockall(2),
minherit(2)
  bsd-user: Implement mincore(2)
  bsd-user: Implement do_obreak function
  bsd-user: Implement shm_open(2)
  bsd-user: Implement shm_unlink(2) and shmget(2)
  bsd-user: Implement shmctl(2)
  bsd-user: Implement shmat(2) and shmdt(2)

Warner Losh (1):
  bsd-user: Add stubs for vadvise(), sbrk() and sstk()

 bsd-user/bsd-mem.c| 104 
 bsd-user/bsd-mem.h| 452 ++
 bsd-user/freebsd/os-misc.h|  94 +++
 bsd-user/freebsd/os-syscall.c | 112 -
 bsd-user/meson.build  |   1 +
 bsd-user/mmap.c   |   2 +-
 bsd-user/qemu-bsd.h   |  45 
 bsd-user/qemu.h   |   1 +
 bsd-user/syscall_defs.h   |  39 +++
 9 files changed, 845 insertions(+), 5 deletions(-)
 create mode 100644 bsd-user/bsd-mem.c
 create mode 100644 bsd-user/bsd-mem.h
 create mode 100644 bsd-user/freebsd/os-misc.h
 create mode 100644 bsd-user/qemu-bsd.h

-- 
2.42.0




[PATCH v4 19/23] bsd-user: Implement shm_open(2)

2023-09-14 Thread Karim Taha
From: Stacey Son 

Co-authored-by: Kyle Evans 

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 25 +
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 29 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index b296c5c6f0..f8dc943c23 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -257,4 +257,29 @@ static inline abi_long do_obreak(abi_ulong brk_val)
 return target_brk;
 }
 
+/* shm_open(2) */
+static inline abi_long do_bsd_shm_open(abi_ulong arg1, abi_long arg2,
+abi_long arg3)
+{
+int ret;
+void *p;
+
+if (arg1 == (uintptr_t)SHM_ANON) {
+p = SHM_ANON;
+} else {
+p = lock_user_string(arg1);
+if (p == NULL) {
+return -TARGET_EFAULT;
+}
+}
+ret = get_errno(shm_open(p, target_to_host_bitmask(arg2, fcntl_flags_tbl),
+ arg3));
+
+if (p != SHM_ANON) {
+unlock_user(p, arg1, 0);
+}
+
+return ret;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 8dd29fddde..7404b0aa72 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -531,6 +531,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_mincore(arg1, arg2, arg3);
 break;
 
+case TARGET_FREEBSD_NR_freebsd12_shm_open: /* shm_open(2) */
+ret = do_bsd_shm_open(arg1, arg2, arg3);
+break;
+
 #if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
 case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
 ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
-- 
2.42.0




[PATCH v4 10/23] bsd-user: Implement shmid_ds conversion between host and target.

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.c | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
index 46cda8eb5c..2ab1334b70 100644
--- a/bsd-user/bsd-mem.c
+++ b/bsd-user/bsd-mem.c
@@ -43,6 +43,30 @@ void target_to_host_ipc_perm__locked(struct ipc_perm 
*host_ip,
 __get_user(host_ip->key,  _ip->key);
 }
 
+abi_long target_to_host_shmid_ds(struct shmid_ds *host_sd,
+ abi_ulong target_addr)
+{
+struct target_shmid_ds *target_sd;
+
+if (!lock_user_struct(VERIFY_READ, target_sd, target_addr, 1)) {
+return -TARGET_EFAULT;
+}
+
+target_to_host_ipc_perm__locked(&(host_sd->shm_perm),
+&(target_sd->shm_perm));
+
+__get_user(host_sd->shm_segsz,  _sd->shm_segsz);
+__get_user(host_sd->shm_lpid,   _sd->shm_lpid);
+__get_user(host_sd->shm_cpid,   _sd->shm_cpid);
+__get_user(host_sd->shm_nattch, _sd->shm_nattch);
+__get_user(host_sd->shm_atime,  _sd->shm_atime);
+__get_user(host_sd->shm_dtime,  _sd->shm_dtime);
+__get_user(host_sd->shm_ctime,  _sd->shm_ctime);
+unlock_user_struct(target_sd, target_addr, 0);
+
+return 0;
+}
+
 void host_to_target_ipc_perm__locked(struct target_ipc_perm *target_ip,
  struct ipc_perm *host_ip)
 {
@@ -55,3 +79,26 @@ void host_to_target_ipc_perm__locked(struct target_ipc_perm 
*target_ip,
 __put_user(host_ip->key,  _ip->key);
 }
 
+abi_long host_to_target_shmid_ds(abi_ulong target_addr,
+ struct shmid_ds *host_sd)
+{
+struct target_shmid_ds *target_sd;
+
+if (!lock_user_struct(VERIFY_WRITE, target_sd, target_addr, 0)) {
+return -TARGET_EFAULT;
+}
+
+host_to_target_ipc_perm__locked(&(target_sd->shm_perm),
+&(host_sd->shm_perm));
+
+__put_user(host_sd->shm_segsz,  _sd->shm_segsz);
+__put_user(host_sd->shm_lpid,   _sd->shm_lpid);
+__put_user(host_sd->shm_cpid,   _sd->shm_cpid);
+__put_user(host_sd->shm_nattch, _sd->shm_nattch);
+__put_user(host_sd->shm_atime,  _sd->shm_atime);
+__put_user(host_sd->shm_dtime,  _sd->shm_dtime);
+__put_user(host_sd->shm_ctime,  _sd->shm_ctime);
+unlock_user_struct(target_sd, target_addr, 1);
+
+return 0;
+}
-- 
2.42.0




[PATCH v4 08/23] bsd-user: Implement target_set_brk function in bsd-mem.c instead of os-syscall.c

2023-09-14 Thread Karim Taha
From: Stacey Son 

The definitions and variables names matches the corresponding ones in
linux-user/syscall.c, for making later implementation of do_obreak easier

Co-authored-by: Mikaël Urankar 
Signed-off-by: Mikaël Urankar 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.c| 32 
 bsd-user/freebsd/os-syscall.c |  4 
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
index e69de29bb2..8834ab2e58 100644
--- a/bsd-user/bsd-mem.c
+++ b/bsd-user/bsd-mem.c
@@ -0,0 +1,32 @@
+/*
+ *  memory management system conversion routines
+ *
+ *  Copyright (c) 2013 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "qemu-bsd.h"
+
+struct bsd_shm_regions bsd_shm_regions[N_BSD_SHM_REGIONS];
+
+abi_ulong target_brk;
+abi_ulong initial_target_brk;
+
+void target_set_brk(abi_ulong new_brk)
+{
+target_brk = TARGET_PAGE_ALIGN(new_brk);
+initial_target_brk = target_brk;
+}
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 2920370ad2..c0a22eb746 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -59,10 +59,6 @@ safe_syscall3(ssize_t, writev, int, fd, const struct iovec 
*, iov, int, iovcnt);
 safe_syscall4(ssize_t, pwritev, int, fd, const struct iovec *, iov, int, 
iovcnt,
 off_t, offset);
 
-void target_set_brk(abi_ulong new_brk)
-{
-}
-
 /*
  * errno conversion.
  */
-- 
2.42.0




[PATCH v4 01/23] bsd-user: Implement struct target_ipc_perm

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/syscall_defs.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index e4825f2662..39a9bc8ed7 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -55,6 +55,23 @@ struct target_iovec {
 abi_long iov_len;   /* Number of bytes */
 };
 
+/*
+ * sys/ipc.h
+ */
+struct target_ipc_perm {
+uint32_tcuid;   /* creator user id */
+uint32_tcgid;   /* creator group id */
+uint32_tuid;/* user id */
+uint32_tgid;/* group id */
+uint16_tmode;   /* r/w permission */
+uint16_tseq;/* sequence # */
+abi_longkey;/* user specified msg/sem/shm key */
+};
+
+#define TARGET_IPC_RMID 0   /* remove identifier */
+#define TARGET_IPC_SET  1   /* set options */
+#define TARGET_IPC_STAT 2   /* get options */
+
 /*
  *  sys/mman.h
  */
-- 
2.42.0




[PATCH v4 05/23] bsd-user: Implement shm_open2(2) system call

2023-09-14 Thread Karim Taha
From: Kyle Evans 

Signed-off-by: Kyle Evans 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-misc.h| 42 +++
 bsd-user/freebsd/os-syscall.c | 13 +++
 2 files changed, 55 insertions(+)

diff --git a/bsd-user/freebsd/os-misc.h b/bsd-user/freebsd/os-misc.h
index 8436ccb2f7..6b424b7078 100644
--- a/bsd-user/freebsd/os-misc.h
+++ b/bsd-user/freebsd/os-misc.h
@@ -24,5 +24,47 @@
 #include 
 #include 
 
+int shm_open2(const char *path, int flags, mode_t mode, int shmflags,
+const char *);
+
+#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
+/* shm_open2(2) */
+static inline abi_long do_freebsd_shm_open2(abi_ulong pathptr, abi_ulong flags,
+abi_long mode, abi_ulong shmflags, abi_ulong nameptr)
+{
+int ret;
+void *uname, *upath;
+
+if (pathptr == (uintptr_t)SHM_ANON) {
+upath = SHM_ANON;
+} else {
+upath = lock_user_string(pathptr);
+if (upath == NULL) {
+return -TARGET_EFAULT;
+}
+}
+
+uname = NULL;
+if (nameptr != 0) {
+uname = lock_user_string(nameptr);
+if (uname == NULL) {
+unlock_user(upath, pathptr, 0);
+return -TARGET_EFAULT;
+}
+}
+ret = get_errno(shm_open2(upath,
+target_to_host_bitmask(flags, fcntl_flags_tbl), mode,
+target_to_host_bitmask(shmflags, shmflag_flags_tbl), uname));
+
+if (upath != SHM_ANON) {
+unlock_user(upath, pathptr, 0);
+}
+if (uname != NULL) {
+unlock_user(uname, nameptr, 0);
+}
+return ret;
+}
+#endif /* __FreeBSD_version >= 1300048 */
+
 
 #endif /* OS_MISC_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 2224a280ea..b4311db578 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -33,9 +33,13 @@
 #include "signal-common.h"
 #include "user/syscall-trace.h"
 
+/* BSD independent syscall shims */
 #include "bsd-file.h"
 #include "bsd-proc.h"
 
+/* *BSD dependent syscall shims */
+#include "os-misc.h"
+
 /* I/O */
 safe_syscall3(int, open, const char *, path, int, flags, mode_t, mode);
 safe_syscall4(int, openat, int, fd, const char *, path, int, flags, mode_t,
@@ -482,6 +486,15 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 ret = do_bsd_undelete(arg1);
 break;
 
+/*
+ * Memory management system calls.
+ */
+#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300048
+case TARGET_FREEBSD_NR_shm_open2: /* shm_open2(2) */
+ret = do_freebsd_shm_open2(arg1, arg2, arg3, arg4, arg5);
+break;
+#endif
+
 /*
  * sys{ctl, arch, call}
  */
-- 
2.42.0




[PATCH v4 18/23] bsd-user: Implement do_obreak function

2023-09-14 Thread Karim Taha
From: Stacey Son 

Match linux-user, by manually applying the following commits, in order:

d28b3c90cfad1a7e211ae2bce36ecb9071086129   linux-user: Make sure initial brk(0) 
is page-aligned
15ad98536ad9410fb32ddf1ff09389b677643faa   linux-user: Fix qemu brk() to not 
zero bytes on current page
dfe49864afb06e7e452a4366051697bc4fcfc1a5   linux-user: Prohibit brk() to to 
shrink below initial heap address
eac78a4b0b7da4de2c0a297f4d528ca9cc6256a3   linux-user: Fix signed math overflow 
in brk() syscall
c6cc059eca18d9f6e4e26bb8b6d1135ddb35d81a   linux-user: Do not call get_errno() 
in do_brk()
e69e032d1a8ee8d754ca119009a3c2c997f8bb30   linux-user: Use MAP_FIXED_NOREPLACE 
for do_brk()
cb9d5d1fda0bc2312fc0c779b4ea1d7bf826f31f   linux-user: Do nothing if too small 
brk is specified
2aea137a425a87b930a33590177b04368fd7cc12   linux-user: Do not align brk with 
host page size

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 45 +++
 bsd-user/freebsd/os-syscall.c |  7 ++
 2 files changed, 52 insertions(+)

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
index 0c8d96d9a4..b296c5c6f0 100644
--- a/bsd-user/bsd-mem.h
+++ b/bsd-user/bsd-mem.h
@@ -212,4 +212,49 @@ static inline abi_long do_bsd_mincore(abi_ulong 
target_addr, abi_ulong len,
 return ret;
 }
 
+/* do_brk() must return target values and target errnos. */
+static inline abi_long do_obreak(abi_ulong brk_val)
+{
+abi_long mapped_addr;
+abi_ulong new_brk;
+abi_ulong old_brk;
+
+/* brk pointers are always untagged */
+
+/* do not allow to shrink below initial brk value */
+if (brk_val < initial_target_brk) {
+return target_brk;
+}
+
+new_brk = TARGET_PAGE_ALIGN(brk_val);
+old_brk = TARGET_PAGE_ALIGN(target_brk);
+
+/* new and old target_brk might be on the same page */
+if (new_brk == old_brk) {
+target_brk = brk_val;
+return target_brk;
+}
+
+/* Release heap if necesary */
+if (new_brk < old_brk) {
+target_munmap(new_brk, old_brk - new_brk);
+
+target_brk = brk_val;
+return target_brk;
+}
+
+mapped_addr = target_mmap(old_brk, new_brk - old_brk,
+  PROT_READ | PROT_WRITE,
+  MAP_FIXED | MAP_EXCL | MAP_ANON | MAP_PRIVATE,
+  -1, 0);
+
+if (mapped_addr == old_brk) {
+target_brk = brk_val;
+return target_brk;
+}
+
+/* For everything else, return the previous break. */
+return target_brk;
+}
+
 #endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 8d1cf3b35c..8dd29fddde 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -543,6 +543,13 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 break;
 #endif
 
+/*
+ * Misc
+ */
+case TARGET_FREEBSD_NR_break:
+ret = do_obreak(arg1);
+break;
+
 /*
  * sys{ctl, arch, call}
  */
-- 
2.42.0




[PATCH v4 11/23] bsd-user: Introduce bsd-mem.h to the source tree

2023-09-14 Thread Karim Taha
From: Stacey Son 

Preserve the copyright notice and help with the 'Author' info for
subsequent changes to the file.

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.h| 64 +++
 bsd-user/freebsd/os-syscall.c |  1 +
 2 files changed, 65 insertions(+)
 create mode 100644 bsd-user/bsd-mem.h

diff --git a/bsd-user/bsd-mem.h b/bsd-user/bsd-mem.h
new file mode 100644
index 00..d865e0807d
--- /dev/null
+++ b/bsd-user/bsd-mem.h
@@ -0,0 +1,64 @@
+/*
+ *  memory management system call shims and definitions
+ *
+ *  Copyright (c) 2013-15 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+/*
+ * Copyright (c) 1982, 1986, 1993
+ *  The Regents of the University of California.  All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#ifndef BSD_USER_BSD_MEM_H
+#define BSD_USER_BSD_MEM_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qemu-bsd.h"
+
+extern struct bsd_shm_regions bsd_shm_regions[];
+extern abi_ulong target_brk;
+extern abi_ulong initial_target_brk;
+
+#endif /* BSD_USER_BSD_MEM_H */
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index c0a22eb746..7e2a395e0f 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -35,6 +35,7 @@
 
 /* BSD independent syscall shims */
 #include "bsd-file.h"
+#include "bsd-mem.h"
 #include "bsd-proc.h"
 
 /* *BSD dependent syscall shims */
-- 
2.42.0




[PATCH v4 07/23] bsd-user: Add bsd-mem.c to meson.build

2023-09-14 Thread Karim Taha
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
Reviewed-by: Warner Losh 
---
 bsd-user/bsd-mem.c   | 0
 bsd-user/meson.build | 1 +
 2 files changed, 1 insertion(+)
 create mode 100644 bsd-user/bsd-mem.c

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
new file mode 100644
index 00..e69de29bb2
diff --git a/bsd-user/meson.build b/bsd-user/meson.build
index 5243122fc5..6ee68fdfe7 100644
--- a/bsd-user/meson.build
+++ b/bsd-user/meson.build
@@ -7,6 +7,7 @@ bsd_user_ss = ss.source_set()
 common_user_inc += include_directories('include')
 
 bsd_user_ss.add(files(
+  'bsd-mem.c',
   'bsdload.c',
   'elfload.c',
   'main.c',
-- 
2.42.0




[PATCH v4 09/23] bsd-user: Implement ipc_perm conversion between host and target.

2023-09-14 Thread Karim Taha
From: Stacey Son 

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
Reviewed-by: Richard Henderson 
---
 bsd-user/bsd-mem.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/bsd-user/bsd-mem.c b/bsd-user/bsd-mem.c
index 8834ab2e58..46cda8eb5c 100644
--- a/bsd-user/bsd-mem.c
+++ b/bsd-user/bsd-mem.c
@@ -30,3 +30,28 @@ void target_set_brk(abi_ulong new_brk)
 target_brk = TARGET_PAGE_ALIGN(new_brk);
 initial_target_brk = target_brk;
 }
+
+void target_to_host_ipc_perm__locked(struct ipc_perm *host_ip,
+ struct target_ipc_perm *target_ip)
+{
+__get_user(host_ip->cuid, _ip->cuid);
+__get_user(host_ip->cgid, _ip->cgid);
+__get_user(host_ip->uid,  _ip->uid);
+__get_user(host_ip->gid,  _ip->gid);
+__get_user(host_ip->mode, _ip->mode);
+__get_user(host_ip->seq,  _ip->seq);
+__get_user(host_ip->key,  _ip->key);
+}
+
+void host_to_target_ipc_perm__locked(struct target_ipc_perm *target_ip,
+ struct ipc_perm *host_ip)
+{
+__put_user(host_ip->cuid, _ip->cuid);
+__put_user(host_ip->cgid, _ip->cgid);
+__put_user(host_ip->uid,  _ip->uid);
+__put_user(host_ip->gid,  _ip->gid);
+__put_user(host_ip->mode, _ip->mode);
+__put_user(host_ip->seq,  _ip->seq);
+__put_user(host_ip->key,  _ip->key);
+}
+
-- 
2.42.0




  1   2   3   4   >