Re: [PATCH v5 10/23] xen/riscv: introduces acquire, release and full barriers

2024-03-04 Thread Jan Beulich
On 26.02.2024 18:38, Oleksii Kurochko wrote:
> Signed-off-by: Oleksii Kurochko 

Acked-by: Jan Beulich 
albeit ...

> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/fence.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _ASM_RISCV_FENCE_H
> +#define _ASM_RISCV_FENCE_H
> +
> +#define RISCV_ACQUIRE_BARRIER   "\tfence r , rw\n"
> +#define RISCV_RELEASE_BARRIER   "\tfence rw, w\n"
> +#define RISCV_FULL_BARRIER  "\tfence rw, rw\n"

... I'm not really happy with the \t and \n that are put here. My take
on this is that it is the responsibility of the use site to supply these
as and when necessary.

Jan
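
(Editorial note: a minimal sketch, purely illustrative, of the alternative
being suggested above, i.e. bare fence strings with a hypothetical use site
supplying its own "\t"/"\n"; the helper name below is made up for the
example:)

    #define RISCV_ACQUIRE_BARRIER   "fence r , rw"

    /* Hypothetical use site adding the formatting itself: */
    static inline void arch_acquire_barrier(void)
    {
        asm volatile ( "\t" RISCV_ACQUIRE_BARRIER "\n" ::: "memory" );
    }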



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Jan Beulich
On 04.03.2024 17:10, Andrew Cooper wrote:
> --- a/xen/include/xen/nospec.h
> +++ b/xen/include/xen/nospec.h
> @@ -9,6 +9,29 @@
>  
>  #include 
>  
> +/*
> + * Protect a conditional branch from bad speculation.  Architectures *must*
> + * provide arch_evaluate_nospec() for this to be effective.
> + */
> +static always_inline bool evaluate_nospec(bool cond)
> +{
> +#ifndef arch_evaluate_nospec
> +#define arch_evaluate_nospec(cond) cond

Hmm, noticed only while replying to patch 2: If the #define is to be kept
(see my reply there) it needs to be one of

#define arch_evaluate_nospec(cond) (cond)

or

#define arch_evaluate_nospec

Or it ought to be #undef-ed after use (thus preventing use in a context
where "cond" may expand to other than "cond").

Jan

> +#endif
> +return arch_evaluate_nospec(cond);
> +}
> +
> +/*
> + * Halt speculation unconditionally.  Architectures *must* provide
> + * arch_block_speculation() for this to be effective.
> + */
> +static always_inline void block_speculation(void)
> +{
> +#ifdef arch_block_speculation
> +arch_block_speculation();
> +#endif
> +}
> +
>  /**
>   * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 
> otherwise
>   * @index: array element index




Re: [PATCH 2/2] xen/nospec: Allow evaluate_nospec() to short circuit constant expressions

2024-03-04 Thread Jan Beulich
On 04.03.2024 17:10, Andrew Cooper wrote:
> --- a/xen/include/xen/nospec.h
> +++ b/xen/include/xen/nospec.h
> @@ -18,6 +18,15 @@ static always_inline bool evaluate_nospec(bool cond)
>  #ifndef arch_evaluate_nospec
>  #define arch_evaluate_nospec(cond) cond
>  #endif
> +
> +/*
> + * If the compiler can reduce the condition to a constant, then it won't
> + * be emitting a conditional branch, and there's nothing needing
> + * protecting.
> + */
> +if ( __builtin_constant_p(cond) )
> +return cond;
> +
>  return arch_evaluate_nospec(cond);
>  }

While for now, even after having some hours for considering, I can't point
out anything concrete that could potentially become a problem here, I
still have the gut feeling that this would better be left in the arch
logic. (There's the oddity of what the function actually expands to if the
#define in context actually takes effect, but that's merely cosmetic.)

The one thing I'm firmly unhappy with is "won't" in the comment: We can't
know what the compiler will do. I've certainly known of compilers which
didn't behave as you indicate here. That was nothing remotely recent, but
ancient DOS/Windows ones. Still, unlike with e.g. __{get,put}_user_bad(),
the compiler doing something unexpected would go entirely unnoticed here.

The other (minor) aspect I'm not entirely happy with is that you insert
between the fallback #define and its use. I think (if we need such a
#define in the first place) the two would better stay close together.

As to the need for the #define: To me

static always_inline bool evaluate_nospec(bool cond)
{
#ifdef arch_evaluate_nospec
return arch_evaluate_nospec(cond);
#else
return cond;
#endif
}

or even

static always_inline bool evaluate_nospec(bool cond)
{
#ifdef arch_evaluate_nospec
return arch_evaluate_nospec(cond);
#endif
return cond;
}

reads no worse, but perhaps slightly better, and is then consistent with
block_speculation(). At which point the question about "insertion point"
here would hopefully also disappear, as this addition is meaningful only
ahead of the #else.

Jan



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Jan Beulich
On 04.03.2024 17:10, Andrew Cooper wrote:
> It is daft to require all architectures to provide empty implementations of
> this functionality.
> 
> Provide evaluate_nospec() and block_speculation() unconditionally in
> xen/nospec.h with architectures able to opt in by providing suitable arch
> variants.
> 
> Rename x86's implementation to the arch_*() variants.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Upon further thinking and with Julien's recent reply in mind:
Acked-by: Jan Beulich 

Still I'd prefer if the arch_* were left out. They look to me to go this
half step too far (despite now having looked at patch 2 as well; I'll
reply there separately). And it is this which was why I decided that
going the other half step (moving these handful lines of code) wasn't
really worth it, when considering whether to make such a patch myself.

Jan



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Jan Beulich
On 04.03.2024 18:40, Julien Grall wrote:
> Hi Andrew,
> 
> On 04/03/2024 17:07, Andrew Cooper wrote:
>> On 04/03/2024 4:55 pm, Jan Beulich wrote:
>>> On 04.03.2024 17:46, Julien Grall wrote:
 On 04/03/2024 16:41, Jan Beulich wrote:
> On 04.03.2024 17:31, Julien Grall wrote:
>> On 04/03/2024 16:10, Andrew Cooper wrote:
>>> It is daft to require all architectures to provide empty 
>>> implementations of
>>> this functionality.
>> Oleksii recently sent a similar patch [1]. This was pushed back because
>> from naming, it sounds like the helpers ought to be non-empty on every
>> architecture.
>>
>> It would be best if asm-generic provides a safe version of the helpers.
>> So my preference is to not have this patch. This can of course change if
>> I see an explanation why it is empty on Arm (I believe it should contain
>> csdb) and other arch would want the same.
> Except that there's no new asm-generic/ header here (as opposed to how
> Oleksii had it). Imo avoiding the need for empty stubs is okay this way,
> when introducing an asm-generic/ header would not have been. Of course
> if Arm wants to put something there rather sooner than later, then
> perhaps the functions better wouldn't be removed from there, just to then
> be put back pretty soon.
 I am confused. I agree the patch is slightly different, but I thought
 the fundamental problem was the block_speculation() implementation may
 not be safe everywhere. And it was best to let each architecture decide
 how they want to implement (vs Xen decide for us the default).

 Reading the original thread, I thought you had agreed with that
 statement. Did I misinterpret?
>>> Yes and no. Whatever is put in asm-generic/ ought to be correct and safe
>>> by default, imo. The same doesn't apply to fallbacks put in place in
>>> headers in xen/: If an arch doesn't provide its own implementation, it
>>> indicates that the default (fallback) is good enough. Still I can easily
>>> see that other views are possible here ...
>>
>> With speculation, there's absolutely nothing we can possibly do in any
>> common code which will be safe generally.
>>
>> But we can make it less invasive until an architecture wants to
>> implement the primitives.
> 
> I understand the goal. However, I am unsure it is a good idea to provide
> unsafe defaults just to reduce the arch-specific header by a few lines. My
> concern is that new ports may not realize that block_speculation() needs to
> be implemented. This could lead to a preventable XSA in the future.
> 
> I guess the risk could be reduced if we had some documentation 
> explaining how to port Xen to a new architecture (I am not asking you to 
> write the doc).

But that's precisely the difference I'm trying to point out between having
a stub header in asm-generic/ vs having the fallback in xen/nospec.h: This
way an arch still has to supply asm/nospec.h, and hence they can be
expected to consider what needs putting there and what can be left to the
fallbacks (whether just "for the time being" is a separate question).
Whereas allowing to simply point at the asm-generic/ header is (imo) far
more likely to have only little thought applied ("oh, there is that
generic header, let's just use it").

Yet as said, the line between the two can certainly be viewed as blurred.

Jan



Re: [XEN PATCH 10/10] xen/keyhandler: address violations of MISRA C Rule 20.7

2024-03-04 Thread Jan Beulich
On 05.03.2024 03:03, Stefano Stabellini wrote:
> On Mon, 4 Mar 2024, Jan Beulich wrote:
>> On 02.03.2024 02:37, Stefano Stabellini wrote:
>>> On Fri, 1 Mar 2024, Jan Beulich wrote:
 On 29.02.2024 23:57, Stefano Stabellini wrote:
> On Thu, 29 Feb 2024, Nicola Vetrini wrote:
>> MISRA C Rule 20.7 states: "Expressions resulting from the expansion
>> of macro parameters shall be enclosed in parentheses". Therefore, some
>> macro definitions should gain additional parentheses to ensure that all
>> current and future users will be safe with respect to expansions that
>> can possibly alter the semantics of the passed-in macro parameter.
>>
>> No functional change.
>>
>> Signed-off-by: Nicola Vetrini 
>
> Reviewed-by: Stefano Stabellini 

 You did see the discussion on earlier patches, though? I don't think
 any of the parentheses here are needed or wanted.
>>>
>>> We need to align on this. Currently if we go by what's written in
>>> docs/misra/deviations.rst, then rhs should have parentheses.
>>
>> Quoting the actual patch again:
> 
> [...]
> 
>> What rhs are you talking about in light of this change? The only rhs I
>> can spot here is already parenthesized.
> 
> Yes you are right. I replied here as an overall comment about our
> approach to 20.7, although this patch is not a good example. My reply
> was meant in the context of https://marc.info/?l=xen-devel=170928051025701

I'm still confused: The rhs is being parenthesized there. It's the _lhs_
which isn't and ...

>>> Can we safely claim that rhs parentheses are never needed? If so, then
>>> great, let's add it to deviations.rst and skip them here and other
>>> places in this patch series (e.g. patch #8). When I say "never" I am
>>> taking for granted that the caller is not doing something completely
>>> unacceptably broken such as: 
>>>
>>>  WRITE_SYSREG64(var +, TTBR0_EL1)
>>
>> I'm afraid I can't associate this with the patch here either. Instead in
>> the context here a (respective) construct as you mention above would simply
>> fail to build.
> 
> Fair enough it will break the build. I was trying to clarify that when I
> wrote "the rhs parentheses are never needed" I meant "never" within
> reason. One can always find ways to break the system and I tried to make
> an example of something that for sure would break rhs or lhs without
> parentheses.
> 
> I meant to say, if we don't account for exceptionally broken cases, can
> we safely say we don't need parentheses for rhs?

... doesn't need to, unless - as you say - one contrives examples. Yet to
clarify here as well: I assume you mean "we don't need parentheses for lhs".

And note that even if your example used the first parameter as lhs of an
assignment, the build would still break. The + there would not magically
combine with the = into a += operator. Tokenization occurs ahead of
preprocessing, so the expanded macro would still have a + token followed by
a = one. The only way to alter tokens is by using the ## operator. Which in
turn precludes using parentheses.
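
(Editorial illustration of the tokenization point; the macro below is
hypothetical:)

    #define ASSIGN_ZERO(lhs) lhs = 0

    ASSIGN_ZERO(var +)
    /* Expands to the token sequence:  var + = 0
     * '+' and '=' remain separate tokens (they do not merge into '+='),
     * so this fails to build.  Only ## pastes tokens: */
    #define PASTE(a, b) a ## b   /* PASTE(+, =) yields the single token '+=' */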

Jan



Re: [PATCH v2 2/3] docs/misra/rules.rst: add rule 5.5

2024-03-04 Thread Jan Beulich
On 05.03.2024 02:49, Stefano Stabellini wrote:
> On Mon, 4 Mar 2024, Jan Beulich wrote:
>> On 04.03.2024 16:39, Federico Serafini wrote:
>>> On 04/03/24 15:17, Jan Beulich wrote:
 On 04.03.2024 14:31, Federico Serafini wrote:
> On 01/03/24 09:06, Jan Beulich wrote:
>> On 01.03.2024 00:28, Stefano Stabellini wrote:
>>> On Wed, 14 Feb 2024, Federico Serafini wrote:
 On 14/02/24 14:15, Jan Beulich wrote:
> On 14.02.2024 12:27, Federico Serafini wrote:
>> On 14/02/24 09:28, Jan Beulich wrote:
>>> On 13.02.2024 23:33, Stefano Stabellini wrote:
 Signed-off-by: Stefano Stabellini 
 ---
  docs/misra/rules.rst | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
 index c185366966..931158b354 100644
 --- a/docs/misra/rules.rst
 +++ b/docs/misra/rules.rst
 @@ -181,6 +181,12 @@ maintainers if you want to suggest a change.
 headers (xen/include/public/) are allowed to retain 
 longer
 identifiers for backward compatibility.
  +   * - `Rule 5.5
 `_
 + - Required
 + - Identifiers shall be distinct from macro names
 + - Clashes between function-like macros and non-callable 
 entities
 +   are allowed. The pattern #define x x is also allowed.
>>>
>>> Just for me to know what exactly is covered (hence also a question
>>> to Roberto as to [to be] implemented Eclair behavior): Even when
>>> the above would be sufficient (and imo better) people frequently
>>> write
>>>
>>> #define a(x, y) b(x, y)
>>>
>>> which, transformed to the specific case here, would then be
>>>
>>> #define a(x, y) a(x, y)
>>>
>>> I'd assume such ought to also be covered, but that's not clear
>>> from the spelling above.
>>
>> I list what happens in some different situations,
>> then we can find the right words for the documentation and/or
>> refine the configuration:
>>
>> If you
>> #define x x
>> and then use `x' as identifier,
>> the resulting violation is deviated (allowed pattern).
>>
>> If you
>> #define a(x, y) a(x, y)
>> and then use `a' as identifier for a non-callable entity,
>> the resulting violation is deviated (no clash with non-callable
>> entities).
>> If you use identifier `a' for a callable entity, the resulting violation
>> is reported: the allowed pattern covers only macros expanding to their
>> own name, in this case the macro name is considered to be
>> `a' only, not a(x, y).
>>
>> If you
>> #define a(x, y) b(x, y)
>> and then use `a' as identifier for a non-callable entity,
>> the resulting violation is deviated (no clash with non-callable
>> entities).
>
> I'm afraid I don't see what violation there is in this case, to
> deviate. As a result I'm also not sure I correctly understand the
> rest of your reply.

 #define a(x, y) b(x, y)

 int a; // Violation of Rule 5.5.

 The macro name `a' that exists before the preprocessing phase
 still exists after the preprocessing phase as an identifier for the
 integer variable, and this is a violation.

>> If you use `a' as identifier for a callable entity,
>> this is not a violation because after the preprocessing phase,
>> identifier `a' no longer exists.
 I correct myself:
 if you use `a' as identifier for a *function*,
 it is not a violation because after the preprocessing phase
 the identifier `a' no longer exists, for example:

 #define a(x, y) b(x, y)

 void a(int x, int y); // Ok.
>>>
>>> Federico, do you have a better wording suggestion for this rule?
>>>
>>> Jan, any further requests here? What would you like to see as next step?
>>
>> A more concise wording proposal would probably help.
>
> What do you think about:
>
> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
> index 1e134ccebc..a975b9a85f 100644
> --- a/docs/misra/rules.rst
> +++ b/docs/misra/rules.rst
> @@ -181,6 +181,13 @@ maintainers if you want to suggest a change.
>   headers (xen/include/public/) are allowed to retain longer
>   identifiers for backward compatibility.
>
> +   * - `Rule 5.5
> 

[xen-unstable test] 184905: tolerable FAIL - PUSHED

2024-03-04 Thread osstest service owner
flight 184905 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184905/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail blocked in 184833
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184833
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184833
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184833
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184833
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184833
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184833
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184833
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184833
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184833
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184833
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184833
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 xen  fc84b4a5a37b9250d87ef63983b48e1953bba6d1
baseline version:
 xen  efad36f1ba18946acecc030166b1a6bebeb88ea2

Last test of basis   184833  2024-03-01 14:11:30 Z3 days
Failing since184852  2024-03-02 16:36:22 Z2 days5 attempts
Testing same since   184905  2024-03-04 19:41:29 Z0 days1 attempts


People who 

[PATCH v4 bpf-next 1/2] mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.

2024-03-04 Thread Alexei Starovoitov
From: Alexei Starovoitov 

There are various users of get_vm_area() + ioremap_page_range() APIs.
Enforce that get_vm_area() was requested as VM_IOREMAP type and range
passed to ioremap_page_range() matches created vm_area to avoid
accidentally ioremap-ing into wrong address range.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Alexei Starovoitov 
---
 mm/vmalloc.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..f42f98a127d5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -307,8 +307,21 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end,
 int ioremap_page_range(unsigned long addr, unsigned long end,
                       phys_addr_t phys_addr, pgprot_t prot)
 {
+   struct vm_struct *area;
    int err;
 
+   area = find_vm_area((void *)addr);
+   if (!area || !(area->flags & VM_IOREMAP)) {
+       WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n", addr);
+       return -EINVAL;
+   }
+   if (addr != (unsigned long)area->addr ||
+       (void *)end != area->addr + get_vm_area_size(area)) {
+       WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
+                 addr, end, (long)area->addr,
+                 (long)area->addr + get_vm_area_size(area));
+       return -ERANGE;
+   }
    err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
                             ioremap_max_page_shift);
    flush_cache_vmap(addr, end);
-- 
2.43.0
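
(Editorial note: a hedged sketch of the calling pattern the new check
enforces; the caller below is illustrative, the size/phys_addr/prot names are
placeholders, and none of this is part of the patch:)

    struct vm_struct *area = get_vm_area(size, VM_IOREMAP);
    int err;

    if (!area)
        return -ENOMEM;

    /* The entire VM_IOREMAP area must be covered, otherwise -ERANGE. */
    err = ioremap_page_range((unsigned long)area->addr,
                             (unsigned long)area->addr + get_vm_area_size(area),
                             phys_addr, prot);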




[PATCH v4 bpf-next 2/2] mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().

2024-03-04 Thread Alexei Starovoitov
From: Alexei Starovoitov 

vmap/vmalloc APIs are used to map a set of pages into contiguous kernel
virtual space.

get_vm_area() with appropriate flag is used to request an area of kernel
address range. It's used for vmalloc, vmap, ioremap, xen use cases.
- vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag.
- the areas created by vmap() function should be tagged with VM_MAP.
- ioremap areas are tagged with VM_IOREMAP.

BPF would like to extend the vmap API to implement a lazily-populated
sparse, yet contiguous kernel virtual space. Introduce VM_SPARSE flag
and vm_area_map_pages(area, start_addr, count, pages) API to map a set
of pages within a given area.
It has the same sanity checks as vmap() does.
It also checks that get_vm_area() was created with VM_SPARSE flag
which identifies such areas in /proc/vmallocinfo
and returns zero pages on read through /proc/kcore.

The next commits will introduce bpf_arena which is a sparsely populated
shared memory region between bpf program and user space process. It will
map privately-managed pages into a sparse vm area with the following steps:

  // request virtual memory region during bpf prog verification
  area = get_vm_area(area_size, VM_SPARSE);

  // on demand
  vm_area_map_pages(area, kaddr, kend, pages);
  vm_area_unmap_pages(area, kaddr, kend);

  // after bpf program is detached and unloaded
  free_vm_area(area);

Signed-off-by: Alexei Starovoitov 
---
 include/linux/vmalloc.h |  5 
 mm/vmalloc.c| 59 +++--
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..0f72c85a377b 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -35,6 +35,7 @@ struct iov_iter;  /* in uio.h */
 #else
 #define VM_DEFER_KMEMLEAK  0
 #endif
+#define VM_SPARSE  0x1000  /* sparse vm_area. not all pages are present. */
 
 /* bits [20..32] reserved for arch specific ioremap internals */
 
@@ -232,6 +233,10 @@ static inline bool is_vm_area_hugepages(const void *addr)
 }
 
 #ifdef CONFIG_MMU
+int vm_area_map_pages(struct vm_struct *area, unsigned long start,
+ unsigned long end, struct page **pages);
+void vm_area_unmap_pages(struct vm_struct *area, unsigned long start,
+unsigned long end);
 void vunmap_range(unsigned long addr, unsigned long end);
 static inline void set_vm_flush_reset_perms(void *addr)
 {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f42f98a127d5..e5b8c70950bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -648,6 +648,58 @@ static int vmap_pages_range(unsigned long addr, unsigned long end,
return err;
 }
 
+static int check_sparse_vm_area(struct vm_struct *area, unsigned long start,
+   unsigned long end)
+{
+   might_sleep();
+   if (WARN_ON_ONCE(area->flags & VM_FLUSH_RESET_PERMS))
+   return -EINVAL;
+   if (WARN_ON_ONCE(area->flags & VM_NO_GUARD))
+   return -EINVAL;
+   if (WARN_ON_ONCE(!(area->flags & VM_SPARSE)))
+   return -EINVAL;
+   if ((end - start) >> PAGE_SHIFT > totalram_pages())
+   return -E2BIG;
+   if (start < (unsigned long)area->addr ||
+   (void *)end > area->addr + get_vm_area_size(area))
+   return -ERANGE;
+   return 0;
+}
+
+/**
+ * vm_area_map_pages - map pages inside given sparse vm_area
+ * @area: vm_area
+ * @start: start address inside vm_area
+ * @end: end address inside vm_area
+ * @pages: pages to map (always PAGE_SIZE pages)
+ */
+int vm_area_map_pages(struct vm_struct *area, unsigned long start,
+ unsigned long end, struct page **pages)
+{
+   int err;
+
+   err = check_sparse_vm_area(area, start, end);
+   if (err)
+   return err;
+
+   return vmap_pages_range(start, end, PAGE_KERNEL, pages, PAGE_SHIFT);
+}
+
+/**
+ * vm_area_unmap_pages - unmap pages inside given sparse vm_area
+ * @area: vm_area
+ * @start: start address inside vm_area
+ * @end: end address inside vm_area
+ */
+void vm_area_unmap_pages(struct vm_struct *area, unsigned long start,
+unsigned long end)
+{
+   if (check_sparse_vm_area(area, start, end))
+   return;
+
+   vunmap_range(start, end);
+}
+
 int is_vmalloc_or_module_addr(const void *x)
 {
/*
@@ -3822,9 +3874,9 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 
        if (flags & VMAP_RAM)
            copied = vmap_ram_vread_iter(iter, addr, n, flags);
-       else if (!(vm && (vm->flags & VM_IOREMAP)))
+       else if (!(vm && (vm->flags & (VM_IOREMAP | VM_SPARSE))))
            copied = aligned_vread_iter(iter, addr, n);
-       else /* IOREMAP area is treated as memory hole */
+       else /* IOREMAP | SPARSE area is treated as memory hole */

[PATCH v4 bpf-next 0/2] mm: Enforce ioremap address space and introduce sparse vm_area

2024-03-04 Thread Alexei Starovoitov
From: Alexei Starovoitov 

v3 -> v4
- dropped VM_XEN patch for now. It will be in the follow up.
- fixed constant as pointed out by Mike

v2 -> v3
- added Christoph's reviewed-by to patch 1
- cap commit log lines to 75 chars
- factored out common checks in patch 3 into helper
- made vm_area_unmap_pages() return void

There are various users of kernel virtual address space:
vmalloc, vmap, ioremap, xen.

- vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag
and these areas are treated differently by KASAN.

- the areas created by vmap() function should be tagged with VM_MAP
(as majority of the users do).

- ioremap areas are tagged with VM_IOREMAP and vm area start is aligned
to size of the area unlike vmalloc/vmap.

- there is also xen usage that is marked as VM_IOREMAP, but it doesn't
call ioremap_page_range() unlike all other VM_IOREMAP users.

To clean this up a bit, enforce that ioremap_page_range() checks the range
and VM_IOREMAP flag.

In addition BPF would like to reserve regions of kernel virtual address
space and populate it lazily, similar to xen use cases.
For that reason, introduce VM_SPARSE flag and vm_area_[un]map_pages()
helpers to populate this sparse area.

In the end the /proc/vmallocinfo will show
"vmalloc"
"vmap"
"ioremap"
"sparse"
categories for different kinds of address regions.

ioremap, sparse will return zero when dumped through /proc/kcore

Alexei Starovoitov (2):
  mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.
  mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().

 include/linux/vmalloc.h |  5 +++
 mm/vmalloc.c| 72 +++--
 2 files changed, 75 insertions(+), 2 deletions(-)

-- 
2.43.0




Re: [XEN PATCH 10/10] xen/keyhandler: address violations of MISRA C Rule 20.7

2024-03-04 Thread Stefano Stabellini
On Mon, 4 Mar 2024, Jan Beulich wrote:
> On 02.03.2024 02:37, Stefano Stabellini wrote:
> > On Fri, 1 Mar 2024, Jan Beulich wrote:
> >> On 29.02.2024 23:57, Stefano Stabellini wrote:
> >>> On Thu, 29 Feb 2024, Nicola Vetrini wrote:
>  MISRA C Rule 20.7 states: "Expressions resulting from the expansion
>  of macro parameters shall be enclosed in parentheses". Therefore, some
>  macro definitions should gain additional parentheses to ensure that all
>  current and future users will be safe with respect to expansions that
>  can possibly alter the semantics of the passed-in macro parameter.
> 
>  No functional change.
> 
>  Signed-off-by: Nicola Vetrini 
> >>>
> >>> Reviewed-by: Stefano Stabellini 
> >>
> >> You did see the discussion on earlier patches, though? I don't think
> >> any of the parentheses here are needed or wanted.
> > 
> > We need to align on this. Currently if we go by what's written in
> > docs/misra/deviations.rst, then rhs should have parentheses.
> 
> Quoting the actual patch again:

[...]

> What rhs are you talking about in light of this change? The only rhs I
> can spot here is already parenthesized.

Yes you are right. I replied here as an overall comment about our
approach to 20.7, although this patch is not a good example. My reply
was meant in the context of https://marc.info/?l=xen-devel=170928051025701


> > Can we safely claim that rhs parentheses are never needed? If so, then
> > great, let's add it to deviations.rst and skip them here and other
> > places in this patch series (e.g. patch #8). When I say "never" I am
> > taking for granted that the caller is not doing something completely
> > unacceptably broken such as: 
> > 
> >  WRITE_SYSREG64(var +, TTBR0_EL1)
> 
> I'm afraid I can't associate this with the patch here either. Instead in
> the context here a (respective) construct as you mention above would simply
> fail to build.

Fair enough it will break the build. I was trying to clarify that when I
wrote "the rhs parentheses are never needed" I meant "never" within
reason. One can always find ways to break the system and I tried to make
an example of something that for sure would break rhs or lhs without
parentheses.

I meant to say, if we don't account for exceptionally broken cases, can
we safely say we don't need parentheses for rhs?


 
> > If we cannot generically claim that rhs parentheses are never needed,
> > then I don't think we should make any exceptions. We should add them here
> > and everywhere else. It should be easy to write a macro or review a
> > patch with a macro from someone else, and making special exception makes
> > it more difficult for everyone.



Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.6

2024-03-04 Thread Stefano Stabellini
On Mon, 4 Mar 2024, Federico Serafini wrote:
> Update ECLAIR configuration to take into account the deviations
> agreed during MISRA meetings.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 




Re: [PATCH v2 2/3] docs/misra/rules.rst: add rule 5.5

2024-03-04 Thread Stefano Stabellini
On Mon, 4 Mar 2024, Jan Beulich wrote:
> On 04.03.2024 16:39, Federico Serafini wrote:
> > On 04/03/24 15:17, Jan Beulich wrote:
> >> On 04.03.2024 14:31, Federico Serafini wrote:
> >>> On 01/03/24 09:06, Jan Beulich wrote:
>  On 01.03.2024 00:28, Stefano Stabellini wrote:
> > On Wed, 14 Feb 2024, Federico Serafini wrote:
> >> On 14/02/24 14:15, Jan Beulich wrote:
> >>> On 14.02.2024 12:27, Federico Serafini wrote:
>  On 14/02/24 09:28, Jan Beulich wrote:
> > On 13.02.2024 23:33, Stefano Stabellini wrote:
> >> Signed-off-by: Stefano Stabellini 
> >> ---
> >>  docs/misra/rules.rst | 6 ++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
> >> index c185366966..931158b354 100644
> >> --- a/docs/misra/rules.rst
> >> +++ b/docs/misra/rules.rst
> >> @@ -181,6 +181,12 @@ maintainers if you want to suggest a change.
> >> headers (xen/include/public/) are allowed to retain 
> >> longer
> >> identifiers for backward compatibility.
> >>  +   * - `Rule 5.5
> >> `_
> >> + - Required
> >> + - Identifiers shall be distinct from macro names
> >> + - Clashes between function-like macros and non-callable 
> >> entities
> >> +   are allowed. The pattern #define x x is also allowed.
> >
> > Just for me to know what exactly is covered (hence also a question
> > to Roberto as to [to be] implemented Eclair behavior): Even when
> > the above would be sufficient (and imo better) people frequently
> > write
> >
> > #define a(x, y) b(x, y)
> >
> > which, transformed to the specific case here, would then be
> >
> > #define a(x, y) a(x, y)
> >
> > I'd assume such ought to also be covered, but that's not clear
> > from the spelling above.
> 
>  I list what happens in some different situations,
>  then we can find the right words for the documentation and/or
>  refine the configuration:
> 
>  If you
>  #define x x
>  and then use `x' as identifier,
>  the resulting violation is deviated (allowed pattern).
> 
>  If you
>  #define a(x, y) a(x, y)
>  and then use `a' as identifier for a non-callable entity,
>  the resulting violation is deviated (no clash with non-callable
>  entities).
>  If you use identifier `a' for a callable entity, the resulting violation
>  is reported: the allowed pattern covers only macros expanding to their
>  own name, in this case the macro name is considered to be
>  `a' only, not a(x, y).
> 
>  If you
>  #define a(x, y) b(x, y)
>  and then use `a' as identifier for a non-callable entity,
>  the resulting violation is deviated (no clash with non-callable
>  entities).
> >>>
> >>> I'm afraid I don't see what violation there is in this case, to
> >>> deviate. As a result I'm also not sure I correctly understand the
> >>> rest of your reply.
> >>
> >> #define a(x, y) b(x, y)
> >>
> >> int a; // Violation of Rule 5.5.
> >>
> >> The macro name `a' that exists before the preprocessing phase
> >> still exists after the preprocessing phase as an identifier for the
> >> integer variable, and this is a violation.
> >>
>  If you use `a' as identifier for a callable entity,
>  this is not a violation because after the preprocessing phase,
>  identifier `a' no longer exists.
> >> I correct myself:
> >> if you use `a' as identifier for a *function*,
> >> it is not a violation because after the preprocessing phase
> >> the identifier `a' no longer exists, for example:
> >>
> >> #define a(x, y) b(x, y)
> >>
> >> void a(int x, int y); // Ok.
> >
> > Federico, do you have a better wording suggestion for this rule?
> >
> > Jan, any further requests here? What would you like to see as next step?
> 
>  A more concise wording proposal would probably help.
> >>>
> >>> What do you think about:
> >>>
> >>> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
> >>> index 1e134ccebc..a975b9a85f 100644
> >>> --- a/docs/misra/rules.rst
> >>> +++ b/docs/misra/rules.rst
> >>> @@ -181,6 +181,13 @@ maintainers if you want to suggest a change.
> >>>   headers (xen/include/public/) are allowed to retain longer
> >>>   identifiers for backward compatibility.
> >>>
> >>> +   * - `Rule 5.5
> >>> `_

Re: [XEN PATCH 02/10] xen/arm: address some violations of MISRA C Rule 20.7

2024-03-04 Thread Stefano Stabellini
On Mon, 4 Mar 2024, Nicola Vetrini wrote:
> Hi,
> 
> as the maintainers of this subsystem, would you prefer Jan's version or the
> one in the patch?
> Both are fine w.r.t MISRA Rule 20.7 because the macro arguments themselves are
> parenthesized.

I prefer Jan's version. Thanks for asking Nicola.


> > > > --- a/xen/arch/arm/include/asm/vgic-emul.h
> > > > +++ b/xen/arch/arm/include/asm/vgic-emul.h
> > > > @@ -6,11 +6,11 @@
> > > >   * a range of registers
> > > >   */
> > > > 
> > > > -#define VREG32(reg) reg ... reg + 3
> > > > -#define VREG64(reg) reg ... reg + 7
> > > > +#define VREG32(reg) (reg) ... (reg) + 3
> > > > +#define VREG64(reg) (reg) ... (reg) + 7
> > > 
> > > #define VREG32(reg) (reg) ... ((reg) + 3)
> > > #define VREG64(reg) (reg) ... ((reg) + 7)
> > > 
> > > ?
> > > 
> > 
> > The outer parentheses are not required, but I can add them if the
> > maintainers share your view.




RE: [PATCH] MAINTAINERS: drop AMD SVM and Intel VT-x sections

2024-03-04 Thread Tian, Kevin
> From: Jan Beulich 
> Sent: Monday, March 4, 2024 5:28 PM
> 
> We'd like to thank the VT-x maintainers for their past contributions,
> while at the same time we'd like to reflect reality as it has been for
> quite some time. Have VT-x maintainership (and for symmetry also AMD
> SVM's) fall back to general x86.
> 
> Signed-off-by: Jan Beulich 
> 

Acked-by: Kevin Tian 


Re: [PATCH v2 bpf-next 2/3] mm, xen: Separate xen use cases from ioremap.

2024-03-04 Thread Alexei Starovoitov
On Sun, Mar 3, 2024 at 11:55 PM Mike Rapoport  wrote:
>
> On Fri, Feb 23, 2024 at 03:57:27PM -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov 
> >
> > xen grant table and xenbus ring are not ioremap the way arch specific
> > code is using it, so let's add VM_XEN flag to separate them from
> > VM_IOREMAP users. xen will not and should not be calling
> > ioremap_page_range() on that range. /proc/vmallocinfo will print such
> > region as "xen" instead of "ioremap" as well.
> >
> > Signed-off-by: Alexei Starovoitov 
> > ---
> >  arch/x86/xen/grant-table.c | 2 +-
> >  drivers/xen/xenbus/xenbus_client.c | 2 +-
> >  include/linux/vmalloc.h| 1 +
> >  mm/vmalloc.c   | 7 +--
> >  4 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> > index 1e681bf62561..b816db0349c4 100644
> > --- a/arch/x86/xen/grant-table.c
> > +++ b/arch/x86/xen/grant-table.c
> > @@ -104,7 +104,7 @@ static int arch_gnttab_valloc(struct gnttab_vm_area *area, unsigned nr_frames)
> >   area->ptes = kmalloc_array(nr_frames, sizeof(*area->ptes), GFP_KERNEL);
> >   if (area->ptes == NULL)
> >   return -ENOMEM;
> > - area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_IOREMAP);
> > + area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_XEN);
> >   if (!area->area)
> >   goto out_free_ptes;
> >   if (apply_to_page_range(_mm, (unsigned long)area->area->addr,
> > diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> > index 32835b4b9bc5..b9c81a2d578b 100644
> > --- a/drivers/xen/xenbus/xenbus_client.c
> > +++ b/drivers/xen/xenbus/xenbus_client.c
> > @@ -758,7 +758,7 @@ static int xenbus_map_ring_pv(struct xenbus_device *dev,
> >   bool leaked = false;
> >   int err = -ENOMEM;
> >
> > - area = get_vm_area(XEN_PAGE_SIZE * nr_grefs, VM_IOREMAP);
> > + area = get_vm_area(XEN_PAGE_SIZE * nr_grefs, VM_XEN);
> >   if (!area)
> >   return -ENOMEM;
> >   if (apply_to_page_range(_mm, (unsigned long)area->addr,
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index c720be70c8dd..223e51c243bc 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -28,6 +28,7 @@ struct iov_iter;/* in uio.h */
> >  #define VM_FLUSH_RESET_PERMS 0x0100  /* reset direct map and flush TLB on unmap, can't be freed in atomic context */
> >  #define VM_MAP_PUT_PAGES 0x0200  /* put pages and free array in vfree */
> >  #define VM_ALLOW_HUGE_VMAP   0x0400  /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
> > +#define VM_XEN   0x0800  /* xen use cases */
> >
> >  #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
> >   !defined(CONFIG_KASAN_VMALLOC)
>
> There's also VM_DEFER_KMEMLEAK a line below:

Ohh. Good catch. Will fix.

> I think it makes sense to use an enumeration for vm_flags, just as
> Suren did for GFP
> (https://lore.kernel.org/linux-mm/20240224015800.2569851-1-sur...@google.com/)

Hmm. I'm pretty sure Christoph hates BIT macro obfuscation.
I'm not a fan of it either, though we use it in bpf in a few places.
If mm folks prefer that style they can do such conversion later.
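
(Editorial sketch of the enumeration style being referred to; the names below
are hypothetical and not taken from Suren's series:)

    /* bit positions named in an enum ... */
    enum vm_flags_bits {
        VM_IOREMAP_BIT,
        VM_ALLOC_BIT,
        /* ... */
    };

    /* ... with the flag values derived via BIT(): */
    #define VM_IOREMAP  BIT(VM_IOREMAP_BIT)
    #define VM_ALLOC    BIT(VM_ALLOC_BIT)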



Re: AMD-Vi issue

2024-03-04 Thread Elliott Mitchell
On Mon, Mar 04, 2024 at 11:55:07PM +, Andrew Cooper wrote:
> On 25/01/2024 8:24 pm, Elliott Mitchell wrote:
> > The original observation features MD-RAID1 using a pair of Samsung
> > SATA-attached flash devices.  The main line shows up in `xl dmesg`:
> >
> > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> 
> You have a device which is issuing interrupts which have been blocked
> due to the IOMMU configuration.
> 
> But you've elided all details other than the fact it's assigned to
> dom0.  As such, there is nothing we can do to help.

I've provided the details I thought most likely to allow others to
reproduce.  I've pointed to a report from someone else with somewhat
similar hardware who also encountered this (and had enough hardware to
try several combinations).

Sorry about being reluctant to send exact hardware serial numbers to a
public mailing list.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Re: [PATCH] xen/common: Do not allocate magic pages 1:1 for direct mapped domains

2024-03-04 Thread Henry Wang

Hi Julien,

On 3/5/2024 2:38 AM, Julien Grall wrote:

On 01/03/2024 03:03, Henry Wang wrote:

Hi Julien,

Hi Henry,

On 2/28/2024 8:27 PM, Julien Grall wrote:

Hi Henry,
...here basically means we want to do the finding of the unused 
region in toolstack. Since currently what we care about is only a 
couple of pages instead of the whole memory map, could it be 
possible that we do the opposite: in alloc_xs_page(), we issue a 
domctl hypercall to Xen and do the finding work in Xen and return 
with the found gfn? Then the page can be mapped by 
populate_physmap() from alloc_xs_page() and used for XenStore.


I know that DOMCTL hypercalls are not stable. But I am not overly 
happy with creating an hypercall which is just "fetch the magic 
regions". I think it need to be a more general one that would expose 
all the regions.


Also, you can't really find the magic regions when the hypercall is 
done as we don't have the guest memory layout. This will need to be 
done in advance.


Overall, I think it would be better if we provide an hypercall 
listing the regions currently occupied (similar to e820). One will 
have the type "magic pages".


Would below design make sense to you:


This looks good in principle. 


Great! Actually Michal helped a lot to come up with the proposal (Thanks 
Michal!). I am really sorry that I forgot to mention this in my last 
email and didn't credit him properly.



Some comments below.
(1) Similarly as the e820, we can firstly introduce some data 
structures in struct arch_domain:


#define MEM_REGIONS_MAX 4

// For now we only care about the magic regions, but in the future this can 
be easily extended with other types such as RAM, MMIO etc.

enum mem_region_type {
 MEM_REGION_MAGIC,
};

struct mem_region {
 paddr_t start;
 paddr_t size;
 enum mem_region_type type;
};

struct domain_mem_map {
 unsigned int nr_mem_regions;
 struct mem_region region[MEM_REGIONS_MAX];
};


If you plan to expose the same interface externally, then you will 
need to replace paddr_t with uint64_t and avoid using an enum in the 
structure.


Yes you are correct. Maybe we can also try using xen_pfn_t and the 
number of pages to describe the range as an alternative. I will use the 
one that makes the implementation easier.


You will also want to expose a dynamic array (even if this is fixed in 
the hypervisor).


Ok.


struct arch_domain {
...
 struct domain_mem_map mem_map;
};

(2) Then in domain creation time, for normal and static 
non-directmapped Dom0less DomU, set d->arch.mem_map.region[0].start = 
GUEST_MAGIC_BASE and the size to 4 pages. For static and directmapped 
Dom0less DomU, find a free region of 4 pages for magic pages. The 
finding of a free region can reuse the approach for extended region 
and be called before make_hypervisor_node(), and when 
make_hypervisor_node() is called. We carve the magic pages out from 
the actual extended region.


(3) Add a new hypercall XENMEM_memory_map_domid (or a domctl). Input 
will be domid and output will be the memory map of the domain 
recorded in struct arch_domain.


XENMEM_* are supposed to be stable interface. I am not aware of any 
use in the guest yet, so I think it would be best to use a DOMCTL.


Sounds good to me. Thanks for the input!

Kind regards,
Henry



(4) In alloc_xs_page() and alloc_magic_pages, replace the hardcoded 
base address with the new address found by the hypercall.
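
(Editorial illustration only: one possible shape for the interface sketched
in (3) above; the structure and field names are hypothetical, not an agreed
design:)

    struct xen_mem_region {
        uint64_t start;
        uint64_t size;
        uint32_t type;          /* e.g. a hypothetical XEN_MEM_REGION_MAGIC */
        uint32_t pad;
    };

    struct xen_domctl_mem_map {
        /* IN: entries the buffer can hold; OUT: entries written. */
        uint32_t nr_mem_regions;
        uint32_t pad;
        XEN_GUEST_HANDLE_64(xen_mem_region_t) buffer;
    };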


Cheers,






Re: AMD-Vi issue

2024-03-04 Thread Andrew Cooper
On 25/01/2024 8:24 pm, Elliott Mitchell wrote:
> The original observation features MD-RAID1 using a pair of Samsung
> SATA-attached flash devices.  The main line shows up in `xl dmesg`:
>
> (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I

You have a device which is issuing interrupts which have been blocked
due to the IOMMU configuration.

But you've elided all details other than the fact it's assigned to
dom0.  As such, there is nothing we can do to help.

This isn't the first time you've been asked to provide a bare minimum
amount of details such that we might be able to help.  I'm sorry you are
having problems, but continuing to ping a question with no actionable
information is unfair to those of us who are providing support
to the community for free.

~Andrew



Re: Xen 4.18rc/ARM64 on Raspberry Pi 4B: Traffic in DomU crashing Dom0 when using VLANs

2024-03-04 Thread Julien Grall

Hi Paul,

On 01/03/2024 19:37, Paul Leiber wrote:
Stopping xen-watchdog prevents the reboot. However, when triggering 
traffic on the VLAN, Dom0 and DomU become completely unresponsive. No 
error or kernel message is printed in the serial console.


Thanks for providing some logs. See some comments below. How long did 
you wait before confirming dom0 is stuck?


IIRC, Linux may print some RCU stall logs after a few minutes.



Switching to Xen console works. Pressing '0' produces the following output:

(XEN) '0' pressed -> dumping Dom0's registers
(XEN) *** Dumping Dom0 vcpu#0 state: ***
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]
(XEN) CPU:    0
(XEN) PC: 88027e50
(XEN) LR: 88027e44
(XEN) SP_EL0: 89c78f80
(XEN) SP_EL1: 88003b60
(XEN) CPSR:   03c5 MODE:64-bit EL1h (Guest Kernel, handler)


[...]


(XEN) *** Dumping Dom0 vcpu#1 state: ***
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]
(XEN) CPU:    0
(XEN) PC: 88c5dc80
(XEN) LR: 88c5dc88
(XEN) SP_EL0: 42272080
(XEN) SP_EL1: 8800b0e0
(XEN) CPSR:   8305 MODE:64-bit EL1h (Guest Kernel, handler)


[...]


(XEN) *** Dumping Dom0 vcpu#2 state: ***
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]
(XEN) CPU:    0
(XEN) PC: 88027e50
(XEN) LR: 88027e44
(XEN) SP_EL0: 42271040
(XEN) SP_EL1: 89fcbf20
(XEN) CPSR:   03c5 MODE:64-bit EL1h (Guest Kernel, handler)


[...]


(XEN) *** Dumping Dom0 vcpu#3 state: ***
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]
(XEN) CPU:    0
(XEN) PC: 88027e50
(XEN) LR: 88027e44
(XEN) SP_EL0: 422730c0
(XEN) SP_EL1: 89fd3f20
(XEN) CPSR:   03c5 MODE:64-bit EL1h (Guest Kernel, handler)


All the PCs but one (vcpu#1) are the same.


(XEN) 'q' pressed -> dumping domain info (now = 727929105981)
(XEN) General information for domain 0:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=262144 xenheap_pages=2 dirty_cpus={} max_pages=262144
(XEN) handle=---- vm_assist=0020
(XEN) p2m mappings for domain 0 (vmid 1):
(XEN)   1G mappings: 0 (shattered 1)
(XEN)   2M mappings: 422 (shattered 90)
(XEN)   4K mappings: 45372
(XEN) Rangesets belonging to domain 0:
(XEN) Interrupts { 32-152, 154-255 }
(XEN) I/O Memory { 0-fe200, fe203-ff841, ff849- }
(XEN) NODE affinity for domain 0: [0]
(XEN) VCPU information and callbacks for domain 0:
(XEN)   UNIT0 affinities: hard={0-3} soft={0-3}
(XEN) VCPU0: CPU3 [has=F] poll=0 upcall_pend=01 upcall_mask=01
(XEN) pause_count=0 pause_flags=1


The vCPU is blocked. But...


(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)    VCPU_LR[0]=2a02
(XEN)    VCPU_LR[1]=1a1b
(XEN)    VCPU_LR[2]=1a01
(XEN)    VCPU_LR[3]=1a10


... it looks like multiple IRQs are in flight. LR0 (holding IRQ2) is 
active but the others are pending. This is the same for vCPU #2 and #3. 
vCPU #1 still seems to "work".


AFAICT, Linux is using IRQ2 for the IPI CPU_STOP. So it sounds like dom0 
may have panicked.


Looking at the initial logs you posted, I see some messages from Xen but 
no messages at all from dom0 (including boot). Can you check if you have 
console=hvc0 on the Linux command line?


If not, please add it and retry.
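
(For reference, an illustrative dom0 kernel command line; only console=hvc0
is the point here, the root device and remaining parameters are placeholders:)

    console=hvc0 root=/dev/mmcblk0p2 rootwait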

Cheers,

--
Julien Grall



Re: [PATCH 5.10] xen/events: close evtchn after mapping cleanup

2024-03-04 Thread Andrew Paniakin
On 04/03/2024, Andrew Panyakin wrote:
> On 04/03/2024, Greg KH wrote:
> > On Sat, Mar 02, 2024 at 08:03:57AM -0800, Andrew Panyakin wrote:
> > > From: Maximilian Heyne 
> > >
> > > Commit fa765c4b4aed2d64266b694520ecb025c862c5a9 upstream
> [...] 
> > Where is the 5.15.y version of this commit?  We have to have that before
> > we can take a 5.10.y version, right?
> 
> Remaining patches for 5.15 and 6.1 are ready; I will send them once I
> analyze the test results.

Remaining patches posted:
5.15: 
https://lore.kernel.org/all/20240304212139.ga3585...@uc3ecf78c6baf56.ant.amazon.com/
6.1: 
https://lore.kernel.org/all/20240304192006.ga3517...@uc3ecf78c6baf56.ant.amazon.com/



[PATCH 5.15] xen/events: close evtchn after mapping cleanup

2024-03-04 Thread Andrew Panyakin
From: Maximilian Heyne 

Commit fa765c4b4aed2d64266b694520ecb025c862c5a9 upstream

shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taking. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.

This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
  channel:

  CPU0CPU1
  shutdown_pirq {
xen_evtchn_close(e)
  __startup_pirq {
EVTCHNOP_bind_pirq
  -> returns just freed evtchn e
set_evtchn_to_irq(e, irq)
  }
xen_irq_info_cleanup() {
  set_evtchn_to_irq(e, -1)
}
  }

  Assume event channel e here refers to the same event channel
  number.
  After this race the evtchn_to_irq mapping for e is invalid (-1).

- __startup_pirq races with __unbind_from_irq in a similar way. Because
  __startup_pirq doesn't take irq_mapping_update_lock it can grab the
  evtchn that __unbind_from_irq is currently freeing and cleaning up. In
  this case even though the event channel is allocated, its mapping can
  be unset in evtchn_to_irq.

The fix is to first cleanup the mappings and then close the event
channel. In this way, when an event channel gets allocated its
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse order of the allocation where first the event
channel is allocated and then the mappings are setup.
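
(A simplified editorial sketch of the reordering described above, not the
literal patch hunks; the function names are the ones referenced in this
commit message, with arguments abbreviated:)

    /* before: close the event channel, then tear down the mapping */
    xen_evtchn_close(evtchn);
    xen_irq_info_cleanup(info);

    /* after: unset the evtchn_to_irq mapping first, only then free the
     * event channel, so a racing allocation cannot observe a stale mapping */
    xen_irq_info_cleanup(info);
    xen_evtchn_close(evtchn);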

On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many NVMe devices it's therefore likely to hit this race during
boot because there will be multiple calls to shutdown_pirq and
startup_pirq, potentially running in parallel.

  [ cut here ]
  blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; 
indirect descriptors: enabled; bounce buffer: enabled
  kernel BUG at drivers/xen/events/events_base.c:499!
  invalid opcode:  [#1] SMP PTI
  CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 
5.10.201-191.748.amzn2.x86_64 #1
  Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  Workqueue: nvme-reset-wq nvme_reset_work
  RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
  Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 
64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 
00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
  RSP: :c9000d533b08 EFLAGS: 00010046
  RAX:  RBX:  RCX: 0006
  RDX: 0028 RSI:  RDI: 
  RBP: 888107419680 R08:  R09: 82d72b00
  R10:  R11:  R12: 01ed
  R13:  R14:  R15: 0002
  FS:  () GS:88bc8b50() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2:  CR3: 02610001 CR4: 001706e0
  DR0:  DR1:  DR2: 
  DR3:  DR6: fffe0ff0 DR7: 0400
  Call Trace:
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? set_affinity_irq+0xdc/0x1c0
   ? __die_body.cold+0x8/0xd
   ? die+0x2b/0x50
   ? do_trap+0x90/0x110
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? do_error_trap+0x65/0x80
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? exc_invalid_op+0x4e/0x70
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? asm_exc_invalid_op+0x12/0x20
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? bind_evtchn_to_cpu+0xc5/0xf0
   set_affinity_irq+0xdc/0x1c0
   irq_do_set_affinity+0x1d7/0x1f0
   irq_setup_affinity+0xd6/0x1a0
   irq_startup+0x8a/0xf0
   __setup_irq+0x639/0x6d0
   ? nvme_suspend+0x150/0x150
   request_threaded_irq+0x10c/0x180
   ? nvme_suspend+0x150/0x150
   pci_request_irq+0xa8/0xf0
   ? __blk_mq_free_request+0x74/0xa0
   queue_request_irq+0x6f/0x80
   nvme_create_queue+0x1af/0x200
   nvme_create_io_queues+0xbd/0xf0
   nvme_setup_io_queues+0x246/0x320
   ? nvme_irq_check+0x30/0x30
   nvme_reset_work+0x1c8/0x400
   process_one_work+0x1b0/0x350
   worker_thread+0x49/0x310
   ? process_one_work+0x350/0x350
   kthread+0x11b/0x140
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  Modules linked in:
  ---[ end trace a11715de1eee1873 ]---

Fixes: d46a78b05c0e ("xen: implement pirq type event channels")
Cc: sta...@vger.kernel.org
Co-debugged-by: Andrew Panyakin 
Signed-off-by: Maximilian Heyne 
[apanyaki: backport to v5.15-stable]
Signed-off-by: Andrew Paniakin 
---
Compare to upstream patch this 

[ovmf test] 184903: all pass - PUSHED

2024-03-04 Thread osstest service owner
flight 184903 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184903/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 918288ab5a7c3abe9c58d576ccc0ae32e2c7dea0
baseline version:
 ovmf bff9815b616669f1cf743e412bcefe22dfb4

Last test of basis   184899  2024-03-04 16:43:32 Z0 days
Testing same since   184903  2024-03-04 18:41:12 Z0 days1 attempts


People who touched revisions under test:
  Michael Kubacki 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   bff9815b61..918288ab5a  918288ab5a7c3abe9c58d576ccc0ae32e2c7dea0 -> 
xen-tested-master



Re: Serious AMD-Vi issue

2024-03-04 Thread Elliott Mitchell
On Mon, Feb 12, 2024 at 03:23:00PM -0800, Elliott Mitchell wrote:
> On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> > Apparently this was first noticed with 4.14, but more recently I've been
> > able to reproduce the issue:
> > 
> > https://bugs.debian.org/988477
> > 
> > The original observation features MD-RAID1 using a pair of Samsung
> > SATA-attached flash devices.  The main line shows up in `xl dmesg`:
> > 
> > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> > 
> > Where the device points at the SATA controller.  I've ended up
> > reproducing this with some noticable differences.
> > 
> > A major goal of RAID is to have different devices fail at different
> > times.  Hence my initial run had a Samsung device plus a device from
> > another reputable flash manufacturer.
> > 
> > I initially noticed this due to messages in domain 0's dmesg about
> > errors from the SATA device.  Wasn't until rather later that I noticed
> > the IOMMU warnings in Xen's dmesg (perhaps post-domain 0 messages should
> > be duplicated into domain 0's dmesg?).
> > 
> > All of the failures consistently pointed at the Samsung device.  Due to
> > the expectation it would fail first (lower quality offering with
> > lesser guarantees), I proceeded to replace it with a NVMe device.
> > 
> > With some monitoring I discovered the NVMe device was now triggering
> > IOMMU errors, though not nearly as many as the Samsung SATA device did.
> > As such, AMD-Vi plus MD-RAID1 appears to be exposing some sort
> > of IOMMU issue with Xen.
> > 
> > 
> > All I can do is offer speculation about the underlying cause.  There
> > does seem to be a pattern of higher-performance flash storage devices
> > being more severely affected.
> > 
> > I was speculating about the issue being the MD-RAID1 driver abusing
> > Linux's DMA infrastructure in some fashion.
> > 
> > Upon further consideration, I'm wondering if this is perhaps a latency
> > issue.  I imagine there is some sort of flush after the IOMMU tables are
> > modified.  Perhaps the Samsung SATA (and all NVMe) devices were trying to
> > execute commands before reloading the IOMMU tables is complete.
> 
> Ping!
> 
> The recipe seems to be Linux MD RAID1, plus Samsung SATA or any NVMe.
> 
> To make it explicit, when I tried Crucial SATA + Samsung SATA.  IOMMU
> errors matched the Samsung SATA (a number of times the SATA driver
> complained).
> 
> As stated, I'm speculating that lower latency devices start to execute
> commands before IOMMU tables have finished reloading.  When originally
> implemented, fast flash devices were rare.

I guess I'm lucky I ended up with some slightly higher-latency hardware.
This is a very serious issue as data loss can occur.

AMD needs to fund their Xen engineers more, otherwise soon AMD hardware
may no longer be viable with Xen.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





[PATCH 6.1] xen/events: close evtchn after mapping cleanup

2024-03-04 Thread Andrew Panyakin
From: Maximilian Heyne 

Commit fa765c4b4aed2d64266b694520ecb025c862c5a9 upstream

shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taken. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.

This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
  channel:

  CPU0CPU1
  shutdown_pirq {
xen_evtchn_close(e)
  __startup_pirq {
EVTCHNOP_bind_pirq
  -> returns just freed evtchn e
set_evtchn_to_irq(e, irq)
  }
xen_irq_info_cleanup() {
  set_evtchn_to_irq(e, -1)
}
  }

  Assume here that event channel e refers to the same event channel
  number.
  After this race the evtchn_to_irq mapping for e is invalid (-1).

- __startup_pirq races with __unbind_from_irq in a similar way. Because
  __startup_pirq doesn't take irq_mapping_update_lock it can grab the
  evtchn that __unbind_from_irq is currently freeing and cleaning up. In
  this case even though the event channel is allocated, its mapping can
  be unset in evtchn_to_irq.

The fix is to first clean up the mappings and then close the event
channel. In this way, when an event channel gets allocated, its
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse of the allocation order, where first the event
channel is allocated and then the mappings are set up.
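
(Illustration only, not part of the patch: the prototypes below are
declared just to keep the sketch self-contained; they correspond to the
kernel-internal helpers named in the race diagram above.)

struct irq_info;
typedef unsigned int evtchn_port_t;
void xen_evtchn_close(evtchn_port_t port);         /* EVTCHNOP_close       */
void xen_irq_info_cleanup(struct irq_info *info);  /* clears evtchn_to_irq */

static void example_teardown(struct irq_info *info, evtchn_port_t evtchn)
{
    /*
     * Racy order (before the fix):
     *   xen_evtchn_close(evtchn);    - evtchn may be reallocated here ...
     *   xen_irq_info_cleanup(info);  - ... and its fresh mapping wiped.
     *
     * Fixed order: clear the mapping first, then free the channel,
     * i.e. the reverse of the allocation order.
     */
    xen_irq_info_cleanup(info);
    xen_evtchn_close(evtchn);
}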

On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many nvme devices it's therefore likely to hit this race during
boot because there will be multiple calls to shutdown_pirq and
startup_pirq running potentially in parallel.

  [ cut here ]
  blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; 
indirect descriptors: enabled; bounce buffer: enabled
  kernel BUG at drivers/xen/events/events_base.c:499!
  invalid opcode:  [#1] SMP PTI
  CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 
5.10.201-191.748.amzn2.x86_64 #1
  Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  Workqueue: nvme-reset-wq nvme_reset_work
  RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
  Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 
64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 
00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
  RSP: :c9000d533b08 EFLAGS: 00010046
  RAX:  RBX:  RCX: 0006
  RDX: 0028 RSI:  RDI: 
  RBP: 888107419680 R08:  R09: 82d72b00
  R10:  R11:  R12: 01ed
  R13:  R14:  R15: 0002
  FS:  () GS:88bc8b50() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2:  CR3: 02610001 CR4: 001706e0
  DR0:  DR1:  DR2: 
  DR3:  DR6: fffe0ff0 DR7: 0400
  Call Trace:
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? set_affinity_irq+0xdc/0x1c0
   ? __die_body.cold+0x8/0xd
   ? die+0x2b/0x50
   ? do_trap+0x90/0x110
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? do_error_trap+0x65/0x80
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? exc_invalid_op+0x4e/0x70
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? asm_exc_invalid_op+0x12/0x20
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? bind_evtchn_to_cpu+0xc5/0xf0
   set_affinity_irq+0xdc/0x1c0
   irq_do_set_affinity+0x1d7/0x1f0
   irq_setup_affinity+0xd6/0x1a0
   irq_startup+0x8a/0xf0
   __setup_irq+0x639/0x6d0
   ? nvme_suspend+0x150/0x150
   request_threaded_irq+0x10c/0x180
   ? nvme_suspend+0x150/0x150
   pci_request_irq+0xa8/0xf0
   ? __blk_mq_free_request+0x74/0xa0
   queue_request_irq+0x6f/0x80
   nvme_create_queue+0x1af/0x200
   nvme_create_io_queues+0xbd/0xf0
   nvme_setup_io_queues+0x246/0x320
   ? nvme_irq_check+0x30/0x30
   nvme_reset_work+0x1c8/0x400
   process_one_work+0x1b0/0x350
   worker_thread+0x49/0x310
   ? process_one_work+0x350/0x350
   kthread+0x11b/0x140
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  Modules linked in:
  ---[ end trace a11715de1eee1873 ]---

Fixes: d46a78b05c0e ("xen: implement pirq type event channels")
Cc: sta...@vger.kernel.org
Co-debugged-by: Andrew Panyakin 
Signed-off-by: Maximilian Heyne 
---
Compared to the upstream patch, this one does not have the close_evtchn flag
because there is no need to 

[xen-unstable test] 184890: trouble: broken/fail/pass

2024-03-04 Thread osstest service owner
flight 184890 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184890/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt broken

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt  5 host-install(5)  broken pass in 184882
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 
184882 pass in 184890
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail in 184882 pass in 
184890
 test-amd64-i386-libvirt-pair 28 guest-migrate/dst_host/src_host/debian.repeat 
fail pass in 184882

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail blocked in 
184833
 test-armhf-armhf-libvirt 16 saverestore-support-check fail in 184882 like 
184833
 test-armhf-armhf-libvirt15 migrate-support-check fail in 184882 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184833
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184833
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184833
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184833
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184833
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184833
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184833
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184833
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184833
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184833
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 

Re: [PATCH 5.10] xen/events: close evtchn after mapping cleanup

2024-03-04 Thread Andrew Panyakin
On 04/03/2024, Greg KH wrote:
> On Sat, Mar 02, 2024 at 08:03:57AM -0800, Andrew Panyakin wrote:
> > From: Maximilian Heyne 
> >
> > Commit fa765c4b4aed2d64266b694520ecb025c862c5a9 upstream
[...] 
> Where is the 5.15.y version of this commit?  We have to have that before
> we can take a 5.10.y version, right?

Remaining patches for 5.15 and 6.1 are ready; I will send them once I
analyze the test results.



Re: [PATCH] xen/common: Do not allocate magic pages 1:1 for direct mapped domains

2024-03-04 Thread Julien Grall




On 01/03/2024 03:03, Henry Wang wrote:

Hi Julien,


Hi Henry,


On 2/28/2024 8:27 PM, Julien Grall wrote:

Hi Henry,
...here basically means we want to do the finding of the unused 
region in the toolstack. Since currently what we care about is only a 
couple of pages instead of the whole memory map, could it be possible 
that we do the opposite: in alloc_xs_page(), we issue a domctl 
hypercall to Xen and do the finding work in Xen and return with the 
found gfn? Then the page can be mapped by populate_physmap() from 
alloc_xs_page() and used for XenStore.


I know that DOMCTL hypercalls are not stable. But I am not overly 
happy with creating a hypercall which is just "fetch the magic 
regions". I think it needs to be a more general one that would expose 
all the regions.


Also, you can't really find the magic regions when the hypercall is 
done as we don't have the guest memory layout. This will need to be 
done in advance.


Overall, I think it would be better if we provide a hypercall listing 
the regions currently occupied (similar to e820). One will have the 
type "magic pages".


Would the below design make sense to you:


This looks good in principle. Some comments below.



(1) Similarly as the e820, we can firstly introduce some data structures 
in struct arch_domain:


#define MEM_REGIONS_MAX 4

// For now we only care about the magic regions, but in the future this can be 
easily extended with other types such as RAM, MMIO etc.

enum mem_region_type {
     MEM_REGION_MAGIC,
};

struct mem_region {
     paddr_t start;
     paddr_t size;
     enum mem_region_type type;
};

struct domain_mem_map {
     unsigned int nr_mem_regions;
     struct mem_region region[MEM_REGIONS_MAX];
};


If you plan to expose the same interface externally, then you will need 
to replace paddr_t with uint64_t and avoid using an enum in the structure.


You will also want to expose a dynamic array (even if this is fixed in 
the hypervisor).
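
As a rough sketch of the above (hypothetical names, not an agreed
interface), an externally visible variant would use fixed-width types, a
plain integer for the type, and a guest address (or guest handle) for a
dynamic array:

/* Hypothetical layout, for illustration only. */
struct xen_mem_region {
    uint64_t start;
    uint64_t size;
    uint32_t type;      /* plain integer instead of an enum, e.g. 0 = magic */
    uint32_t pad;
};

struct xen_domain_mem_map {
    uint32_t nr_regions;    /* IN: slots in the array, OUT: entries filled */
    uint32_t pad;
    uint64_t regions;       /* guest address of an array of xen_mem_region */
};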




struct arch_domain {
...
     struct domain_mem_map mem_map;
};

(2) Then at domain creation time, for normal and static non-directmapped 
Dom0less DomU, set d->arch.mem_map.region[0].start = GUEST_MAGIC_BASE 
and the size to 4 pages. For static and directmapped Dom0less DomU, find 
a free region of 4 pages for magic pages. The finding of a free region 
can reuse the approach for extended regions and be called before 
make_hypervisor_node(), and when make_hypervisor_node() is called. We 
carve the magic pages out from the actual extended region.


(3) Add a new hypercall XENMEM_memory_map_domid (or a domctl). Input 
will be domid and output will be the memory map of the domain recorded 
in struct arch_domain.


XENMEM_* are supposed to be a stable interface. I am not aware of any use 
in the guest yet, so I think it would be best to use a DOMCTL.




(4) In alloc_xs_page() and alloc_magic_pages, replace the hardcoded base 
address with the new address found by the hypercall.


Cheers,

--
Julien Grall



[ovmf test] 184899: all pass - PUSHED

2024-03-04 Thread osstest service owner
flight 184899 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184899/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf bff9815b616669f1cf743e412bcefe22dfb4
baseline version:
 ovmf 1ae5bee967bffcd6dbbabca913ea3c65d8f09c76

Last test of basis   184892  2024-03-04 10:11:18 Z0 days
Testing same since   184899  2024-03-04 16:43:32 Z0 days1 attempts


People who touched revisions under test:
  Michael Kubacki 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   1ae5bee967..bff9815b61  bff9815b616669f1cf743e412bcefe22dfb4 -> 
xen-tested-master



Re: [XEN PATCH 02/10] xen/arm: address some violations of MISRA C Rule 20.7

2024-03-04 Thread Nicola Vetrini

Hi,

as the maintainers of this subsystem, would you prefer Jan's version or 
the one in the patch?
Both are fine w.r.t. MISRA Rule 20.7 because the macro arguments 
themselves are parenthesized.



--- a/xen/arch/arm/include/asm/vgic-emul.h
+++ b/xen/arch/arm/include/asm/vgic-emul.h
@@ -6,11 +6,11 @@
  * a range of registers
  */

-#define VREG32(reg) reg ... reg + 3
-#define VREG64(reg) reg ... reg + 7
+#define VREG32(reg) (reg) ... (reg) + 3
+#define VREG64(reg) (reg) ... (reg) + 7


#define VREG32(reg) (reg) ... ((reg) + 3)
#define VREG64(reg) (reg) ... ((reg) + 7)

?



The outer parentheses are not required, but I can add them if the 
maintainers share your view.
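
For context, a small illustration of how these macros end up being used
(GICD_CTLR is just an example offset here): the expansion only ever appears
as a GCC case range, so it is the inner parentheses around `reg' that
matter for Rule 20.7, e.g.

/* Illustrative only. */
static int example_handler(unsigned int reg)
{
    switch ( reg )
    {
    /* expands to: case (GICD_CTLR) ... (GICD_CTLR) + 3: */
    case VREG32(GICD_CTLR):
        return 1;
    default:
        return 0;
    }
}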


--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Andrew Cooper
On 04/03/2024 5:40 pm, Julien Grall wrote:
> Hi Andrew,
>
> On 04/03/2024 17:07, Andrew Cooper wrote:
>> On 04/03/2024 4:55 pm, Jan Beulich wrote:
>>> On 04.03.2024 17:46, Julien Grall wrote:
 On 04/03/2024 16:41, Jan Beulich wrote:
> On 04.03.2024 17:31, Julien Grall wrote:
>> On 04/03/2024 16:10, Andrew Cooper wrote:
>>> It is daft to require all architectures to provide empty
>>> implementations of
>>> this functionality.
>> Oleksii recently sent a similar patch [1]. This was pushed back
>> because
>> from naming, it sounds like the helpers ought to be non-empty on
>> every
>> architecture.
>>
>> It would be best if asm-generic provides a safe version of the
>> helpers.
>> So my preference is to not have this patch. This can of course
>> change if
>> I see an explanation why it is empty on Arm (I believe it should
>> contain
>> csdb) and other arch would want the same.
> Except that there's no new asm-generic/ header here (as opposed to
> how
> Oleksii had it). Imo avoiding the need for empty stubs is okay
> this way,
> when introducing an asm-generic/ header would not have been. Of
> course
> if Arm wants to put something there rather sooner than later, then
> perhaps the functions better wouldn't be removed from there, just
> to then
> be put back pretty soon.
 I am confused. I agree the patch is slightly different, but I thought
 the fundamental problem was the block_speculation() implementation may
 not be safe everywhere. And it was best to let each architecture
 decide
 how they want to implement (vs Xen decide for us the default).

 Reading the original thread, I thought you had agreed with that
 statement. Did I misinterpret?
>>> Yes and no. Whatever is put in asm-generic/ ought to be correct and
>>> safe
>>> by default, imo. The same doesn't apply to fallbacks put in place in
>>> headers in xen/: If an arch doesn't provide its own implementation, it
>>> indicates that the default (fallback) is good enough. Still I can
>>> easily
>>> see that other views are possible here ...
>>
>> With speculation, there's absolutely nothing we can possibly do in any
>> common code which will be safe generally.
>>
>> But we can make it less invasive until an architecture wants to
>> implement the primitives.
>
> I understand the goal. However, I am unsure it is a good idea to
> provide an unsafe default just to reduce the arch specific header by a few lines.

It doesn't matter if it's unsafe in the arch header or unsafe in the
common code.  It's still unsafe.

There is no change in risk here.  There's simply less blind copy/pasting
of the unsafe form into new architectures in order to get Xen to compile.

~Andrew



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Julien Grall

Hi Andrew,

On 04/03/2024 17:07, Andrew Cooper wrote:

On 04/03/2024 4:55 pm, Jan Beulich wrote:

On 04.03.2024 17:46, Julien Grall wrote:

On 04/03/2024 16:41, Jan Beulich wrote:

On 04.03.2024 17:31, Julien Grall wrote:

On 04/03/2024 16:10, Andrew Cooper wrote:

It is daft to require all architectures to provide empty implementations of
this functionality.

Oleksii recently sent a similar patch [1]. This was pushed back because
from naming, it sounds like the helpers ought to be non-empty on every
architecture.

It would be best if asm-generic provides a safe version of the helpers.
So my preference is to not have this patch. This can of course change if
I see an explanation why it is empty on Arm (I believe it should contain
csdb) and other arch would want the same.

Except that there's no new asm-generic/ header here (as opposed to how
Oleksii had it). Imo avoiding the need for empty stubs is okay this way,
when introducing an asm-generic/ header would not have been. Of course
if Arm wants to put something there rather sooner than later, then
perhaps the functions better wouldn't be removed from there, just to then
be put back pretty soon.

I am confused. I agree the patch is slightly different, but I thought
the fundamental problem was the block_speculation() implementation may
not be safe everywhere. And it was best to let each architecture decide
how they want to implement (vs Xen decide for us the default).

Reading the original thread, I thought you had agreed with that
statement. Did I misinterpret?

Yes and no. Whatever is put in asm-generic/ ought to be correct and safe
by default, imo. The same doesn't apply to fallbacks put in place in
headers in xen/: If an arch doesn't provide its own implementation, it
indicates that the default (fallback) is good enough. Still I can easily
see that other views are possible here ...


With speculation, there's absolutely nothing we can possibly do in any
common code which will be safe generally.

But we can make it less invasive until an architecture wants to
implement the primitives.


I understand the goal. However, I am unsure it is a good idea to provide 
an unsafe default just to reduce the arch specific header by a few lines. My 
concern is that new ports may not realize that block_speculation() needs to 
be implemented. This could lead to a preventable XSA in the future.


I guess the risk could be reduced if we had some documentation 
explaining how to port Xen to a new architecture (I am not asking you to 
write the doc).


Anyway, I understand this is a matter of perspective. I will not oppose 
this change if you can get others to agree with the risk.


Cheers,

--
Julien Grall



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Andrew Cooper
On 04/03/2024 4:55 pm, Jan Beulich wrote:
> On 04.03.2024 17:46, Julien Grall wrote:
>> On 04/03/2024 16:41, Jan Beulich wrote:
>>> On 04.03.2024 17:31, Julien Grall wrote:
 On 04/03/2024 16:10, Andrew Cooper wrote:
> It is daft to require all architectures to provide empty implementations 
> of
> this functionality.
 Oleksii recently sent a similar patch [1]. This was pushed back because
 from naming, it sounds like the helpers ought to be non-empty on every
 architecture.

 It would be best if asm-generic provides a safe version of the helpers.
 So my preference is to not have this patch. This can of course change if
 I see an explanation why it is empty on Arm (I believe it should contain
 csdb) and other arch would want the same.
>>> Except that there's no new asm-generic/ header here (as opposed to how
>>> Oleksii had it). Imo avoiding the need for empty stubs is okay this way,
>>> when introducing an asm-generic/ header would not have been. Of course
>>> if Arm wants to put something there rather sooner than later, then
>>> perhaps the functions better wouldn't be removed from there, just to then
>>> be put back pretty soon.
>> I am confused. I agree the patch is slightly different, but I thought 
>> the fundamental problem was the block_speculation() implementation may 
>> not be safe everywhere. And it was best to let each architecture decide 
>> how they want to implement (vs Xen decide for us the default).
>>
>> Reading the original thread, I thought you had agreed with that 
>> statement. Did I misinterpret?
> Yes and no. Whatever is put in asm-generic/ ought to be correct and safe
> by default, imo. The same doesn't apply to fallbacks put in place in
> headers in xen/: If an arch doesn't provide its own implementation, it
> indicates that the default (fallback) is good enough. Still I can easily
> see that other views are possible here ...

With speculation, there's absolutely nothing we can possibly do in any
common code which will be safe generally.

But we can make it less invasive until an architecture wants to
implement the primitives.

~Andrew



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Jan Beulich
On 04.03.2024 17:46, Julien Grall wrote:
> On 04/03/2024 16:41, Jan Beulich wrote:
>> On 04.03.2024 17:31, Julien Grall wrote:
>>> On 04/03/2024 16:10, Andrew Cooper wrote:
 It is daft to require all architectures to provide empty implementations of
 this functionality.
>>>
>>> Oleksii recently sent a similar patch [1]. This was pushed back because
>>> from naming, it sounds like the helpers ought to be non-empty on every
>>> architecture.
>>>
>>> It would be best if asm-generic provides a safe version of the helpers.
>>> So my preference is to not have this patch. This can of course change if
>>> I see an explanation why it is empty on Arm (I believe it should contain
>>> csdb) and other arch would want the same.
>>
>> Except that there's no new asm-generic/ header here (as opposed to how
>> Oleksii had it). Imo avoiding the need for empty stubs is okay this way,
>> when introducing an asm-generic/ header would not have been. Of course
>> if Arm wants to put something there rather sooner than later, then
>> perhaps the functions better wouldn't be removed from there, just to then
>> be put back pretty soon.
> 
> I am confused. I agree the patch is slightly different, but I thought 
> the fundamental problem was the block_speculation() implementation may 
> not be safe everywhere. And it was best to let each architecture decide 
> how they want to implement (vs Xen decide for us the default).
> 
> Reading the original thread, I thought you had agreed with that 
> statement. Did I misinterpret?
Yes and no. Whatever is put in asm-generic/ ought to be correct and safe
by default, imo. The same doesn't apply to fallbacks put in place in
headers in xen/: If an arch doesn't provide its own implementation, it
indicates that the default (fallback) is good enough. Still I can easily
see that other views are possible here ...

Jan



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Andrew Cooper
On 04/03/2024 4:41 pm, Jan Beulich wrote:
> On 04.03.2024 17:31, Julien Grall wrote:
>> On 04/03/2024 16:10, Andrew Cooper wrote:
>>> It is daft to require all architectures to provide empty implementations of
>>> this functionality.
>> Oleksii recently sent a similar patch [1]. This was pushed back because 
>> from naming, it sounds like the helpers ought to be non-empty on every 
>> architecture.
>>
>> It would be best if asm-generic provides a safe version of the helpers. 
>> So my preference is to not have this patch. This can of course change if 
>> I see an explanation why it is empty on Arm (I believe it should contain 
>> csdb) and other arch would want the same.
> Except that there's no new asm-generic/ header here (as opposed to how
> Oleksii had it). Imo avoiding the need for empty stubs is okay this way,
> when introducing an asm-generic/ header would not have been. Of course
> if Arm wants to put something there rather sooner than later, then
> perhaps the functions better wouldn't be removed from there, just to then
> be put back pretty soon.

If the ARM maintainers want Spectre-v1 safety then they can do the work
themselves, after this patch has gone in.

~Andrew



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Julien Grall

Hi Jan,

On 04/03/2024 16:41, Jan Beulich wrote:

On 04.03.2024 17:31, Julien Grall wrote:

On 04/03/2024 16:10, Andrew Cooper wrote:

It is daft to require all architectures to provide empty implementations of
this functionality.


Oleksii recently sent a similar patch [1]. This was pushed back because
from naming, it sounds like the helpers ought to be non-empty on every
architecture.

It would be best if asm-generic provides a safe version of the helpers.
So my preference is to not have this patch. This can of course change if
I see an explanation why it is empty on Arm (I believe it should contain
csdb) and other arch would want the same.


Except that there's no new asm-generic/ header here (as opposed to how
Oleksii had it). Imo avoiding the need for empty stubs is okay this way,
when introducing an asm-generic/ header would not have been. Of course
if Arm wants to put something there rather sooner than later, then
perhaps the functions better wouldn't be removed from there, just to then
be put back pretty soon.


I am confused. I agree the patch is slightly different, but I thought 
the fundamental problem was the block_speculation() implementation may 
not be safe everywhere. And it was best to let each architecture decide 
how they want to implement (vs Xen decide for us the default).


Reading the original thread, I thought you had agreed with that 
statement. Did I misinterpret?


Cheers,

--
Julien Grall



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Andrew Cooper
On 04/03/2024 4:45 pm, Jan Beulich wrote:
> On 04.03.2024 17:10, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/nospec.h
>> +++ b/xen/arch/x86/include/asm/nospec.h
>> @@ -23,20 +23,20 @@ static always_inline bool barrier_nospec_false(void)
>>  return false;
>>  }
>>  
>> -/* Allow to protect evaluation of conditionals with respect to speculation 
>> */
>> -static always_inline bool evaluate_nospec(bool condition)
>> +static always_inline bool arch_evaluate_nospec(bool condition)
>>  {
>>  if ( condition )
>>  return barrier_nospec_true();
>>  else
>>  return barrier_nospec_false();
>>  }
>> +#define arch_evaluate_nospec arch_evaluate_nospec
>>  
>> -/* Allow to block speculative execution in generic code */
>> -static always_inline void block_speculation(void)
>> +static always_inline void arch_block_speculation(void)
>>  {
>>  barrier_nospec_true();
>>  }
>> +#define arch_block_speculation arch_block_speculation
> I'm having some difficulty seeing the need for the renaming (adding
> of the arch_ prefixes): Why can't the original names be kept, and
> ...
>
>> --- a/xen/include/xen/nospec.h
>> +++ b/xen/include/xen/nospec.h
>> @@ -9,6 +9,29 @@
>>  
>>  #include 
>>  
>> +/*
>> + * Protect a conditional branch from bad speculation.  Architectures *must*
>> + * provide arch_evaluate_nospec() for this to be effective.
>> + */
>> +static always_inline bool evaluate_nospec(bool cond)
>> +{
>> +#ifndef arch_evaluate_nospec
>> +#define arch_evaluate_nospec(cond) cond
>> +#endif
>> +return arch_evaluate_nospec(cond);
>> +}
>> +
>> +/*
>> + * Halt speculation unconditionally.  Architectures *must* provide
>> + * arch_block_speculation() for this to be effective.
>> + */
>> +static always_inline void block_speculation(void)
>> +{
>> +#ifdef arch_block_speculation
>> +arch_block_speculation();
>> +#endif
>> +}
> ... stubs be introduced here if the original names aren't #define-d?
> IOW what does the extra layer gain us?

See the next patch.

~Andrew



Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Jan Beulich
On 04.03.2024 17:10, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/nospec.h
> +++ b/xen/arch/x86/include/asm/nospec.h
> @@ -23,20 +23,20 @@ static always_inline bool barrier_nospec_false(void)
>  return false;
>  }
>  
> -/* Allow to protect evaluation of conditionals with respect to speculation */
> -static always_inline bool evaluate_nospec(bool condition)
> +static always_inline bool arch_evaluate_nospec(bool condition)
>  {
>  if ( condition )
>  return barrier_nospec_true();
>  else
>  return barrier_nospec_false();
>  }
> +#define arch_evaluate_nospec arch_evaluate_nospec
>  
> -/* Allow to block speculative execution in generic code */
> -static always_inline void block_speculation(void)
> +static always_inline void arch_block_speculation(void)
>  {
>  barrier_nospec_true();
>  }
> +#define arch_block_speculation arch_block_speculation

I'm having some difficulty seeing the need for the renaming (adding
of the arch_ prefixes): Why can't the original names be kept, and
...

> --- a/xen/include/xen/nospec.h
> +++ b/xen/include/xen/nospec.h
> @@ -9,6 +9,29 @@
>  
>  #include 
>  
> +/*
> + * Protect a conditional branch from bad speculation.  Architectures *must*
> + * provide arch_evaluate_nospec() for this to be effective.
> + */
> +static always_inline bool evaluate_nospec(bool cond)
> +{
> +#ifndef arch_evaluate_nospec
> +#define arch_evaluate_nospec(cond) cond
> +#endif
> +return arch_evaluate_nospec(cond);
> +}
> +
> +/*
> + * Halt speculation unconditionally.  Architectures *must* provide
> + * arch_block_speculation() for this to be effective.
> + */
> +static always_inline void block_speculation(void)
> +{
> +#ifdef arch_block_speculation
> +arch_block_speculation();
> +#endif
> +}

... stubs be introduced here if the original names aren't #define-d?
IOW what does the extra layer gain us?

Jan



[linux-linus test] 184886: tolerable FAIL - PUSHED

2024-03-04 Thread osstest service owner
flight 184886 linux-linus real [real]
flight 184898 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184886/
http://logs.test-lab.xenproject.org/osstest/logs/184898/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl  10 host-ping-check-xen fail pass in 184898-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl 15 migrate-support-check fail in 184898 never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail in 184898 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184880
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184880
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184880
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184880
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184880
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184880
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184880
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184880
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass

version targeted for testing:
 linux90d35da658da8cff0d4ecbb5113f5fac9d00eb72
baseline version:
 linux58c806d867bf265c6fd16fc3bc62e2d3c156b5c9

Last test of basis   184880  2024-03-03 18:14:43 Z0 days
Testing same since   184886  2024-03-04 04:27:14 Z0 days1 attempts


People who touched revisions under test:
  Linus Torvalds 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386 

Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Jan Beulich
On 04.03.2024 17:31, Julien Grall wrote:
> On 04/03/2024 16:10, Andrew Cooper wrote:
>> It is daft to require all architectures to provide empty implementations of
>> this functionality.
> 
> Oleksii recently sent a similar patch [1]. This was pushed back because 
> from naming, it sounds like the helpers ought to be non-empty on every 
> architecture.
> 
> It would be best if asm-generic provides a safe version of the helpers. 
> So my preference is to not have this patch. This can of course change if 
> I see an explanation why it is empty on Arm (I believe it should contain 
> csdb) and other arch would want the same.

Except that there's no new asm-generic/ header here (as opposed to how
Oleksii had it). Imo avoiding the need for empty stubs is okay this way,
when introducing an asm-generic/ header would not have been. Of course
if Arm wants to put something there rather sooner than later, then
perhaps the functions better wouldn't be removed from there, just to then
be put back pretty soon.

Jan



Re: [PATCH v2 2/3] docs/misra/rules.rst: add rule 5.5

2024-03-04 Thread Jan Beulich
On 04.03.2024 16:39, Federico Serafini wrote:
> On 04/03/24 15:17, Jan Beulich wrote:
>> On 04.03.2024 14:31, Federico Serafini wrote:
>>> On 01/03/24 09:06, Jan Beulich wrote:
 On 01.03.2024 00:28, Stefano Stabellini wrote:
> On Wed, 14 Feb 2024, Federico Serafini wrote:
>> On 14/02/24 14:15, Jan Beulich wrote:
>>> On 14.02.2024 12:27, Federico Serafini wrote:
 On 14/02/24 09:28, Jan Beulich wrote:
> On 13.02.2024 23:33, Stefano Stabellini wrote:
>> Signed-off-by: Stefano Stabellini 
>> ---
>>  docs/misra/rules.rst | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
>> index c185366966..931158b354 100644
>> --- a/docs/misra/rules.rst
>> +++ b/docs/misra/rules.rst
>> @@ -181,6 +181,12 @@ maintainers if you want to suggest a change.
>> headers (xen/include/public/) are allowed to retain 
>> longer
>> identifiers for backward compatibility.
>>  +   * - `Rule 5.5
>> `_
>> + - Required
>> + - Identifiers shall be distinct from macro names
>> + - Clashes between function-like macros and non-callable 
>> entities
>> +   are allowed. The pattern #define x x is also allowed.
>
> Just for me to know what exactly is covered (hence also a question
> to Roberto as to [to be] implemented Eclair behavior): Even when
> the above would be sufficient (and imo better) people frequently
> write
>
> #define a(x, y) b(x, y)
>
> which, transformed to the specific case here, would then be
>
> #define a(x, y) a(x, y)
>
> I'd assume such ought to also be covered, but that's not clear
> from the spelling above.

 I list what happens in some different situations,
 then we can find the right words for the documentation and/or
 refine the configuration:

 If you
 #define x x
 and then use `x' as identifier,
 the resulting violation is deviated (allowed pattern).

 If you
 #define a(x, y) a(x, y)
 and then use `a' as identifier for a non-callable entity,
 the resulting violation is deviated (no clash with non-callable
 entities).
 If you use identifier `a' for a callable entity, the resulting 
 violation
 is reported: the allowed pattern covers only macros expanding to their
 own name, in this case the macro name is considered to be
 `a' only, not a(x, y).

 If you
 #define a(x, y) b(x, y)
 and then use `a' as identifier for a non-callable entity,
 the resulting violation is deviated (no clash with non-callable
 entities).
>>>
>>> I'm afraid I don't see what violation there is in this case, to
>>> deviate. As a result I'm also not sure I correctly understand the
>>> rest of your reply.
>>
>> #define a(x, y) b(x, y)
>>
>> int a; // Violation of Rule 5.5.
>>
>> The macro name `a' that exists before the preprocessing phase,
>> still exists after the preprocessing phase as identifier for the integer
>> variable and this is a violation.
>>
 If you use `a' as identifier for a callable entity,
 this is not a violation because after the preprocessing phase,
 identifier `a' no longer exists.
>> I correct myself:
>> if you use `a' as identifier for a *function*,
>> it is not a violation because after the preprocessing phase
>> the identifier `a' no longer exists, for example:
>>
>> #define a(x, y) b(x, y)
>>
>> void a(int x, int y); // Ok.
>
> Federico, do you have a better wording suggestion for this rule?
>
> Jan, any further requests here? What would you like to see as next step?

 A more concise wording proposal would probably help.
>>>
>>> What do you think about:
>>>
>>> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
>>> index 1e134ccebc..a975b9a85f 100644
>>> --- a/docs/misra/rules.rst
>>> +++ b/docs/misra/rules.rst
>>> @@ -181,6 +181,13 @@ maintainers if you want to suggest a change.
>>>   headers (xen/include/public/) are allowed to retain longer
>>>   identifiers for backward compatibility.
>>>
>>> +   * - `Rule 5.5
>>> `_
>>> + - Required
>>> + - Identifiers shall be distinct from macro names
>>> + - Macros expanding to their own name are allowed (e.g., #define x x).
>>> +   Clashes between names of function-like macros and identifiers of
>>> +   non-callable entities are allowed.
>>

Re: [PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Julien Grall

Hi Andrew,

On 04/03/2024 16:10, Andrew Cooper wrote:

It is daft to require all architectures to provide empty implementations of
this functionality.


Oleksii recently sent a similar patch [1]. This was pushed back because 
from naming, it sounds like the helpers ought to be non-empty on every 
architecture.


It would be best if asm-generic provides a safe version of the helpers. 
So my preference is to not have this patch. This can of course change if 
I see an explanation why it is empty on Arm (I believe it should contain 
csdb) and other arch would want the same.
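
Under the scheme proposed in this patch, a non-empty Arm variant would opt
in by defining the arch_* hook, e.g. (sketch only; whether CSDB is the
right or sufficient barrier here is exactly the open question, and older
assemblers may need a hint/.inst encoding instead of the mnemonic):

/* Sketch only -- not a reviewed implementation. */
static always_inline void arch_block_speculation(void)
{
    asm volatile ( "csdb" ::: "memory" );
}
#define arch_block_speculation arch_block_speculation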


Cheers,

[1] 
5889d7a5fa81722472f95cc1448af0be8f359a7d.1707146506.git.oleksii.kuroc...@gmail.com


--
Julien Grall



[PATCH 2/2] xen/nospec: Allow evaluate_nospec() to short circuit constant expressions

2024-03-04 Thread Andrew Cooper
When the compiler can reduce the condition to a constant, it can elide the
conditional and one of the basic blocks.  However, arch_evaluate_nospec() will
still insert speculation protection, despite there being nothing to protect.

Allow the speculation protection to be skipped entirely when the compiler is
removing the condition.

e.g. for x86, given:

  int foo(void)
  {
  if ( evaluate_nospec(1) )
  return 2;
  else
  return 42;
  }

then before, we get:

  :
  lfence
  mov$0x2,%eax
  retq

and afterwards, we get:

  :
  mov$0x2,%eax
  retq

which is correct.  With no conditional branch to protect, the lfence isn't
providing any relevant safety.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
---
 xen/include/xen/nospec.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index a4155af08770..56cf67a44176 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -18,6 +18,15 @@ static always_inline bool evaluate_nospec(bool cond)
 #ifndef arch_evaluate_nospec
 #define arch_evaluate_nospec(cond) cond
 #endif
+
+/*
+ * If the compiler can reduce the condition to a constant, then it won't
+ * be emitting a conditional branch, and there's nothing needing
+ * protecting.
+ */
+if ( __builtin_constant_p(cond) )
+return cond;
+
 return arch_evaluate_nospec(cond);
 }
 
-- 
2.30.2




[PATCH 0/2] xen/nospec: Improvements

2024-03-04 Thread Andrew Cooper
Andrew Cooper (2):
  xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation
  xen/nospec: Allow evaluate_nospec() to short circuit constant expressions

 xen/arch/arm/include/asm/nospec.h   |  9 
 xen/arch/ppc/include/asm/nospec.h   |  9 
 xen/arch/riscv/include/asm/nospec.h |  9 
 xen/arch/x86/include/asm/nospec.h   |  8 
 xen/include/xen/nospec.h| 32 +
 5 files changed, 36 insertions(+), 31 deletions(-)

-- 
2.30.2




[PATCH 1/2] xen/*/nospec: Provide common versions of evaluate_nospec/block_speculation

2024-03-04 Thread Andrew Cooper
It is daft to require all architectures to provide empty implementations of
this functionality.

Provide evaluate_nospec() and block_speculation() unconditionally in
xen/nospec.h with architectures able to opt in by providing suitable arch
variants.

Rename x86's implementation to the arch_*() variants.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
---
 xen/arch/arm/include/asm/nospec.h   |  9 -
 xen/arch/ppc/include/asm/nospec.h   |  9 -
 xen/arch/riscv/include/asm/nospec.h |  9 -
 xen/arch/x86/include/asm/nospec.h   |  8 
 xen/include/xen/nospec.h| 23 +++
 5 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/xen/arch/arm/include/asm/nospec.h 
b/xen/arch/arm/include/asm/nospec.h
index efac51fc03be..05df096faab0 100644
--- a/xen/arch/arm/include/asm/nospec.h
+++ b/xen/arch/arm/include/asm/nospec.h
@@ -12,15 +12,6 @@
 # error "unknown ARM variant"
 #endif
 
-static inline bool evaluate_nospec(bool condition)
-{
-return condition;
-}
-
-static inline void block_speculation(void)
-{
-}
-
 #endif /* _ASM_ARM_NOSPEC_H */
 
 /*
diff --git a/xen/arch/ppc/include/asm/nospec.h 
b/xen/arch/ppc/include/asm/nospec.h
index b97322e48d32..9b57a7e4b24d 100644
--- a/xen/arch/ppc/include/asm/nospec.h
+++ b/xen/arch/ppc/include/asm/nospec.h
@@ -3,13 +3,4 @@
 #ifndef __ASM_PPC_NOSPEC_H__
 #define __ASM_PPC_NOSPEC_H__
 
-static inline bool evaluate_nospec(bool condition)
-{
-return condition;
-}
-
-static inline void block_speculation(void)
-{
-}
-
 #endif /* __ASM_PPC_NOSPEC_H__ */
diff --git a/xen/arch/riscv/include/asm/nospec.h 
b/xen/arch/riscv/include/asm/nospec.h
index e30f0a781b68..b227fc61ed8b 100644
--- a/xen/arch/riscv/include/asm/nospec.h
+++ b/xen/arch/riscv/include/asm/nospec.h
@@ -4,15 +4,6 @@
 #ifndef _ASM_RISCV_NOSPEC_H
 #define _ASM_RISCV_NOSPEC_H
 
-static inline bool evaluate_nospec(bool condition)
-{
-return condition;
-}
-
-static inline void block_speculation(void)
-{
-}
-
 #endif /* _ASM_RISCV_NOSPEC_H */
 
 /*
diff --git a/xen/arch/x86/include/asm/nospec.h 
b/xen/arch/x86/include/asm/nospec.h
index 07606834c4c9..defc97707f03 100644
--- a/xen/arch/x86/include/asm/nospec.h
+++ b/xen/arch/x86/include/asm/nospec.h
@@ -23,20 +23,20 @@ static always_inline bool barrier_nospec_false(void)
 return false;
 }
 
-/* Allow to protect evaluation of conditionals with respect to speculation */
-static always_inline bool evaluate_nospec(bool condition)
+static always_inline bool arch_evaluate_nospec(bool condition)
 {
 if ( condition )
 return barrier_nospec_true();
 else
 return barrier_nospec_false();
 }
+#define arch_evaluate_nospec arch_evaluate_nospec
 
-/* Allow to block speculative execution in generic code */
-static always_inline void block_speculation(void)
+static always_inline void arch_block_speculation(void)
 {
 barrier_nospec_true();
 }
+#define arch_block_speculation arch_block_speculation
 
 /**
  * array_index_mask_nospec() - generate a mask that is ~0UL when the
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index 4c250ebbd663..a4155af08770 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -9,6 +9,29 @@
 
 #include 
 
+/*
+ * Protect a conditional branch from bad speculation.  Architectures *must*
+ * provide arch_evaluate_nospec() for this to be effective.
+ */
+static always_inline bool evaluate_nospec(bool cond)
+{
+#ifndef arch_evaluate_nospec
+#define arch_evaluate_nospec(cond) cond
+#endif
+return arch_evaluate_nospec(cond);
+}
+
+/*
+ * Halt speculation unconditionally.  Architectures *must* provide
+ * arch_block_speculation() for this to be effective.
+ */
+static always_inline void block_speculation(void)
+{
+#ifdef arch_block_speculation
+arch_block_speculation();
+#endif
+}
+
 /**
  * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 
otherwise
  * @index: array element index
-- 
2.30.2




Re: [PATCH v2 2/3] docs/misra/rules.rst: add rule 5.5

2024-03-04 Thread Federico Serafini

On 04/03/24 15:17, Jan Beulich wrote:

On 04.03.2024 14:31, Federico Serafini wrote:

On 01/03/24 09:06, Jan Beulich wrote:

On 01.03.2024 00:28, Stefano Stabellini wrote:

On Wed, 14 Feb 2024, Federico Serafini wrote:

On 14/02/24 14:15, Jan Beulich wrote:

On 14.02.2024 12:27, Federico Serafini wrote:

On 14/02/24 09:28, Jan Beulich wrote:

On 13.02.2024 23:33, Stefano Stabellini wrote:

Signed-off-by: Stefano Stabellini 
---
 docs/misra/rules.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index c185366966..931158b354 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -181,6 +181,12 @@ maintainers if you want to suggest a change.
headers (xen/include/public/) are allowed to retain longer
identifiers for backward compatibility.
 +   * - `Rule 5.5
`_
+ - Required
+ - Identifiers shall be distinct from macro names
+ - Clashes between function-like macros and non-callable entities
+   are allowed. The pattern #define x x is also allowed.


Just for me to know what exactly is covered (hence also a question
to Roberto as to [to be] implemented Eclair behavior): Even when
the above would be sufficient (and imo better) people frequently
write

#define a(x, y) b(x, y)

which, transformed to the specific case here, would then be

#define a(x, y) a(x, y)

I'd assume such ought to also be covered, but that's not clear
from the spelling above.


I list what happens in some different situations,
then we can find the right words for the documentation and/or
refine the configuration:

If you
#define x x
and then use `x' as identifier,
the resulting violation is deviated (allowed pattern).

If you
#define a(x, y) a(x, y)
and then use `a' as identifier for a non-callable entity,
the resulting violation is deviated (no clash with non-callable
entities).
If you use identifier `a' for a callable entity, the resulting violation
is reported: the allowed pattern covers only macros expanding to their
own name, in this case the macro name is considered to be
`a' only, not a(x, y).

If you
#define a(x, y) b(x, y)
and then use `a' as identifier for a non-callable entity,
the resulting violation is deviated (no clash with non-callable
entities).


I'm afraid I don't see what violation there is in this case, to
deviate. As a result I'm also not sure I correctly understand the
rest of your reply.


#define a(x, y) b(x, y)

int a; // Violation of Rule 5.5.

The macro name `a' that exists before the preprocessing phase,
still exists after the preprocessing phase as identifier for the integer
variable and this is a violation.


If you use `a' as identifier for a callable entity,
this is not a violation because after the preprocessing phase,
identifier `a' no longer exists.

I correct myself:
if you use `a' as identifier for a *function*,
it is not a violation because after the preprocessing phase
the identifier `a' no longer exists, for example:

#define a(x, y) b(x, y)

void a(int x, int y); // Ok.


Federico, do you have a better wording suggestion for this rule?

Jan, any further requests here? What would you like to see as next step?


A more concise wording proposal would probably help.


What do you think about:

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index 1e134ccebc..a975b9a85f 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -181,6 +181,13 @@ maintainers if you want to suggest a change.
  headers (xen/include/public/) are allowed to retain longer
  identifiers for backward compatibility.

+   * - `Rule 5.5
`_
+ - Required
+ - Identifiers shall be distinct from macro names
+ - Macros expanding to their own name are allowed (e.g., #define x x).
+   Clashes between names of function-like macros and identifiers of
+   non-callable entities are allowed.


Imo that still leaves open e.g. the

#define a(x, y) a(x, y)

case: Permitted ("own name") or not permitted ("a" pretty clearly is expected
to be a callable entity here, besides being a function-like macro)?


I would not consider your example as a macro that expands to its own
name; the macro name is considered to be `a' only.

Rather, your example can be used to trigger the "callable-noncallable"
part of the deviation, for example:

#define a(x, y) a(x, y)

void a(int x, int y); /* Not permitted (callable entity 'a'). */

void foo(int a);  /* Permitted (non-callable entity `a'). */
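
Putting the cases discussed in this thread side by side (sketch; the
callable case is left as a comment because it cannot coexist with `int a'
in the same translation unit):

/* Case 1: macro expanding to its own name -- permitted by the deviation. */
#define x x
int x;

/* Case 2: function-like macro name reused for a non-callable entity --
   permitted (no clash with non-callable entities). */
#define a(x, y) a(x, y)
int a;

/* Case 3: the same macro name reused for a callable entity -- reported.
   void a(int x, int y); */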

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



Re: [PATCH v2 2/3] docs/misra/rules.rst: add rule 5.5

2024-03-04 Thread Jan Beulich
On 04.03.2024 14:31, Federico Serafini wrote:
> On 01/03/24 09:06, Jan Beulich wrote:
>> On 01.03.2024 00:28, Stefano Stabellini wrote:
>>> On Wed, 14 Feb 2024, Federico Serafini wrote:
 On 14/02/24 14:15, Jan Beulich wrote:
> On 14.02.2024 12:27, Federico Serafini wrote:
>> On 14/02/24 09:28, Jan Beulich wrote:
>>> On 13.02.2024 23:33, Stefano Stabellini wrote:
 Signed-off-by: Stefano Stabellini 
 ---
 docs/misra/rules.rst | 6 ++
 1 file changed, 6 insertions(+)

 diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
 index c185366966..931158b354 100644
 --- a/docs/misra/rules.rst
 +++ b/docs/misra/rules.rst
 @@ -181,6 +181,12 @@ maintainers if you want to suggest a change.
headers (xen/include/public/) are allowed to retain longer
identifiers for backward compatibility.
 +   * - `Rule 5.5
 `_
 + - Required
 + - Identifiers shall be distinct from macro names
 + - Clashes between function-like macros and non-callable entities
 +   are allowed. The pattern #define x x is also allowed.
>>>
>>> Just for me to know what exactly is covered (hence also a question
>>> to Roberto as to [to be] implemented Eclair behavior): Even when
>>> the above would be sufficient (and imo better) people frequently
>>> write
>>>
>>> #define a(x, y) b(x, y)
>>>
>>> which, transformed to the specific case here, would then be
>>>
>>> #define a(x, y) a(x, y)
>>>
>>> I'd assume such ought to also be covered, but that's not clear
>>> from the spelling above.
>>
>> I list what happens in some different situations,
>> then we can find the right words for the documentation and/or
>> refine the configuration:
>>
>> If you
>> #define x x
>> and then use `x' as identifier,
>> the resulting violation is deviated (allowed pattern).
>>
>> If you
>> #define a(x, y) a(x, y)
>> and then use `a' as identifier for a non-callable entity,
>> the resulting violation is deviated (no clash with non-callable
>> entities).
>> If you use identifier `a' for a callable entity, the resulting violation
>> is reported: the allowed pattern covers only macros expanding to their
>> own name, in this case the macro name is considered to be
>> `a' only, not a(x, y).
>>
>> If you
>> #define a(x, y) b(x, y)
>> and then use `a' as identifier for a non-callable entity,
>> the resulting violation is deviated (no clash with non-callable
>> entities).
>
> I'm afraid I don't see what violation there is in this case, to
> deviate. As a result I'm also not sure I correctly understand the
> rest of your reply.

 #define a(x, y) b(x, y)

 int a; // Violation of Rule 5.5.

 The macro name `a' that exist before the preprocessing phase,
 still exists after the preprocessing phase as identifier for the integer
 variable and this is a violation.

>> If you use `a' as identifier for a callable entity,
>> this is not a violation because after the preprocessing phase,
>> identifier `a' no longer exists.
 I correct myself:
 if you use `a' as identifier for a *function*,
 it is not a violation because after the preprocessing phase
 the identifier `a' no longer exists, for example:

 #define a(x, y) b(x, y)

 void a(int x, int y); // Ok.
>>>
>>> Federico, do you have a better wording suggestion for this rule?
>>>
>>> Jan, any further requests here? What would you like to see as next step?
>>
>> A more concise wording proposal would probably help.
> 
> What do you think about:
> 
> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
> index 1e134ccebc..a975b9a85f 100644
> --- a/docs/misra/rules.rst
> +++ b/docs/misra/rules.rst
> @@ -181,6 +181,13 @@ maintainers if you want to suggest a change.
>  headers (xen/include/public/) are allowed to retain longer
>  identifiers for backward compatibility.
> 
> +   * - `Rule 5.5 
> `_
> + - Required
> + - Identifiers shall be distinct from macro names
> + - Macros expanding to their own name are allowed (e.g., #define x x).
> +   Clashes between names of function-like macros and identifiers of
> +   non-callable entities are allowed.

Imo that still leaves open e.g. the

#define a(x, y) a(x, y)

case: Permitted ("own name") or not permitted ("a" pretty clearly is expected
to be a callable entity here, besides being a function-like macro)?

Jan



Re: [PATCH v2 2/3] docs/misra/rules.rst: add rule 5.5

2024-03-04 Thread Federico Serafini

On 01/03/24 09:06, Jan Beulich wrote:

On 01.03.2024 00:28, Stefano Stabellini wrote:

On Wed, 14 Feb 2024, Federico Serafini wrote:

On 14/02/24 14:15, Jan Beulich wrote:

On 14.02.2024 12:27, Federico Serafini wrote:

On 14/02/24 09:28, Jan Beulich wrote:

On 13.02.2024 23:33, Stefano Stabellini wrote:

Signed-off-by: Stefano Stabellini 
---
docs/misra/rules.rst | 6 ++
1 file changed, 6 insertions(+)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index c185366966..931158b354 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -181,6 +181,12 @@ maintainers if you want to suggest a change.
   headers (xen/include/public/) are allowed to retain longer
   identifiers for backward compatibility.
+   * - `Rule 5.5
`_
+ - Required
+ - Identifiers shall be distinct from macro names
+ - Clashes between function-like macros and non-callable entities
+   are allowed. The pattern #define x x is also allowed.


Just for me to know what exactly is covered (hence also a question
to Roberto as to [to be] implemented Eclair behavior): Even when
the above would be sufficient (and imo better) people frequently
write

#define a(x, y) b(x, y)

which, transformed to the specific case here, would then be

#define a(x, y) a(x, y)

I'd assume such ought to also be covered, but that's not clear
from the spelling above.


I list what happens in some different situations,
then we can find the right words for the documentation and/or
refine the configuration:

If you
#define x x
and then use `x' as identifier,
the resulting violation is deviated (allowed pattern).

If you
#define a(x, y) a(x, y)
and then use `a' as identifier for a non-callable entity,
the resulting violation is deviated (no clash with non-callable
entities).
If you use identifier `a' for a callable entity, the resulting violation
is reported: the allowed pattern covers only macros expanding to their
own name, in this case the macro name is considered to be
`a' only, not a(x, y).

If you
#define a(x, y) b(x, y)
and then use `a' as identifier for a non-callable entity,
the resulting violation is deviated (no clash with non-callable
entities).


I'm afraid I don't see what violation there is in this case, to
deviate. As a result I'm also not sure I correctly understand the
rest of your reply.


#define a(x, y) b(x, y)

int a; // Violation of Rule 5.5.

The macro name `a' that exist before the preprocessing phase,
still exists after the preprocessing phase as identifier for the integer
variable and this is a violation.


If you use `a' as identifier for a callable entity,
this is not a violation because after the preprocessing phase,
identifier `a' no longer exists.

I correct myself:
if you use `a' as identifier for a *function*,
it is not a violation because after the preprocessing phase
the identifier `a' no longer exists, for example:

#define a(x, y) b(x, y)

void a(int x, int y); // Ok.


Federico, do you have a better wording suggestion for this rule?

Jan, any further requests here? What would you like to see as next step?


A more concise wording proposal would probably help.


What do you think about:

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index 1e134ccebc..a975b9a85f 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -181,6 +181,13 @@ maintainers if you want to suggest a change.
headers (xen/include/public/) are allowed to retain longer
identifiers for backward compatibility.

+   * - `Rule 5.5 
`_

+ - Required
+ - Identifiers shall be distinct from macro names
+ - Macros expanding to their own name are allowed (e.g., #define x x).
+   Clashes between names of function-like macros and identifiers of
+   non-callable entities are allowed.
+
* - `Rule 5.6 
`_

  - Required
  - A typedef name shall be a unique identifier

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



Re: [PATCH] x86/cpu-policy: Fix visibility of HTT/CMP_LEGACY in max policies

2024-03-04 Thread Andrew Cooper
On 04/03/2024 8:42 am, Jan Beulich wrote:
> On 01.03.2024 12:28, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -464,6 +464,16 @@ static void __init 
>> guest_common_max_feature_adjustments(uint32_t *fs)
>>   raw_cpu_policy.feat.clwb )
>>  __set_bit(X86_FEATURE_CLWB, fs);
>>  }
>> +
>> +/*
>> + * Topology information inside the guest is entirely at the toolstack's
>> + * disgression, and bears no relationship to the host we're running on.
>> + *
>> + * HTT identifies p->basic.lppp as valid
>> + * CMP_LEGACY identifies p->extd.nc as valid
>> + */
>> +__set_bit(X86_FEATURE_HTT, fs);
>> +__set_bit(X86_FEATURE_CMP_LEGACY, fs);
>>  }
>>  
>>  static void __init guest_common_default_feature_adjustments(uint32_t *fs)
>> @@ -514,6 +524,18 @@ static void __init 
>> guest_common_default_feature_adjustments(uint32_t *fs)
>>  __clear_bit(X86_FEATURE_CLWB, fs);
>>  }
>>  
>> +/*
>> + * Topology information is at the toolstack's discretion so these are
>> + * unconditionally set in max, but pick a default which matches the 
>> host.
>> + */
>> +__clear_bit(X86_FEATURE_HTT, fs);
>> +if ( cpu_has_htt )
>> +__set_bit(X86_FEATURE_HTT, fs);
>> +
>> +__clear_bit(X86_FEATURE_CMP_LEGACY, fs);
>> +if ( cpu_has_cmp_legacy )
>> +__set_bit(X86_FEATURE_CMP_LEGACY, fs);
> When running on a host with the respective bit clear, won't this break
> (at least older) Linux'es logic? Iirc the unconditional setting of at
> least HTT was tied to the unconditional multiplying by 2 of the vCPU ID
> to derive the (fake) APIC ID.

This patch doesn't change the default at all.

All it does is expose (properly) in the max policy what Xen would
tolerate the toolstack doing blindly.

If there are problems in certain configurations, then they continue to
be problems.

Although I'll note that the unconditional multiplying by 2 was never
about hyper-threading - Alejandro did some archaeology and found out
that it was an LAPIC vs IOAPIC error in vmxloader.  The connection to HT
has just been bad guesswork since.

~Andrew



[xen-unstable-smoke test] 184891: tolerable all pass - PUSHED

2024-03-04 Thread osstest service owner
flight 184891 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184891/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  fc84b4a5a37b9250d87ef63983b48e1953bba6d1
baseline version:
 xen  856664f774bd5b66301c5f9022126b61b8cb492d

Last test of basis   184839  2024-03-02 02:00:28 Z2 days
Testing same since   184891  2024-03-04 10:02:19 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Julien Grall 
  Nicola Vetrini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   856664f774..fc84b4a5a3  fc84b4a5a37b9250d87ef63983b48e1953bba6d1 -> smoke



Re: [PATCH 5.10] xen/events: close evtchn after mapping cleanup

2024-03-04 Thread Greg KH
On Sat, Mar 02, 2024 at 08:03:57AM -0800, Andrew Panyakin wrote:
> From: Maximilian Heyne 
> 
> Commit fa765c4b4aed2d64266b694520ecb025c862c5a9 upstream
> 
> shutdown_pirq and startup_pirq are not taking the
> irq_mapping_update_lock because they can't due to lock inversion. Both
> are called with the irq_desc->lock being taking. The lock order,
> however, is first irq_mapping_update_lock and then irq_desc->lock.
> 
> This opens multiple races:
> - shutdown_pirq can be interrupted by a function that allocates an event
>   channel:
> 
>   CPU0CPU1
>   shutdown_pirq {
> xen_evtchn_close(e)
>   __startup_pirq {
> EVTCHNOP_bind_pirq
>   -> returns just freed evtchn e
> set_evtchn_to_irq(e, irq)
>   }
> xen_irq_info_cleanup() {
>   set_evtchn_to_irq(e, -1)
> }
>   }
> 
>   Assume here event channel e refers here to the same event channel
>   number.
>   After this race the evtchn_to_irq mapping for e is invalid (-1).
> 
> - __startup_pirq races with __unbind_from_irq in a similar way. Because
>   __startup_pirq doesn't take irq_mapping_update_lock it can grab the
>   evtchn that __unbind_from_irq is currently freeing and cleaning up. In
>   this case even though the event channel is allocated, its mapping can
>   be unset in evtchn_to_irq.
> 
> The fix is to first cleanup the mappings and then close the event
> channel. In this way, when an event channel gets allocated its
> potential previous evtchn_to_irq mappings are guaranteed to be unset already.
> This is also the reverse order of the allocation where first the event
> channel is allocated and then the mappings are setup.
> 
> On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
> [un]bind interfaces"), we hit a BUG like the following during probing of NVMe
> devices. The issue is that during nvme_setup_io_queues, pci_free_irq
> is called for every device which results in a call to shutdown_pirq.
> With many nvme devices it's therefore likely to hit this race during
> boot because there will be multiple calls to shutdown_pirq and
> startup_pirq are running potentially in parallel.
> 
>   [ cut here ]
>   blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; 
> indirect descriptors: enabled; bounce buffer: enabled
>   kernel BUG at drivers/xen/events/events_base.c:499!
>   invalid opcode:  [#1] SMP PTI
>   CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 
> 5.10.201-191.748.amzn2.x86_64 #1
>   Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
>   Workqueue: nvme-reset-wq nvme_reset_work
>   RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
>   Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 
> 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 
> 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
>   RSP: :c9000d533b08 EFLAGS: 00010046
>   RAX:  RBX:  RCX: 0006
>   RDX: 0028 RSI:  RDI: 
>   RBP: 888107419680 R08:  R09: 82d72b00
>   R10:  R11:  R12: 01ed
>   R13:  R14:  R15: 0002
>   FS:  () GS:88bc8b50() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2:  CR3: 02610001 CR4: 001706e0
>   DR0:  DR1:  DR2: 
>   DR3:  DR6: fffe0ff0 DR7: 0400
>   Call Trace:
>? show_trace_log_lvl+0x1c1/0x2d9
>? show_trace_log_lvl+0x1c1/0x2d9
>? set_affinity_irq+0xdc/0x1c0
>? __die_body.cold+0x8/0xd
>? die+0x2b/0x50
>? do_trap+0x90/0x110
>? bind_evtchn_to_cpu+0xdf/0xf0
>? do_error_trap+0x65/0x80
>? bind_evtchn_to_cpu+0xdf/0xf0
>? exc_invalid_op+0x4e/0x70
>? bind_evtchn_to_cpu+0xdf/0xf0
>? asm_exc_invalid_op+0x12/0x20
>? bind_evtchn_to_cpu+0xdf/0xf0
>? bind_evtchn_to_cpu+0xc5/0xf0
>set_affinity_irq+0xdc/0x1c0
>irq_do_set_affinity+0x1d7/0x1f0
>irq_setup_affinity+0xd6/0x1a0
>irq_startup+0x8a/0xf0
>__setup_irq+0x639/0x6d0
>? nvme_suspend+0x150/0x150
>request_threaded_irq+0x10c/0x180
>? nvme_suspend+0x150/0x150
>pci_request_irq+0xa8/0xf0
>? __blk_mq_free_request+0x74/0xa0
>queue_request_irq+0x6f/0x80
>nvme_create_queue+0x1af/0x200
>nvme_create_io_queues+0xbd/0xf0
>nvme_setup_io_queues+0x246/0x320
>? nvme_irq_check+0x30/0x30
>nvme_reset_work+0x1c8/0x400
>process_one_work+0x1b0/0x350
>worker_thread+0x49/0x310
>? process_one_work+0x350/0x350
>kthread+0x11b/0x140
>? __kthread_bind_mask+0x60/0x60
>ret_from_fork+0x22/0x30
>   Modules linked in:
>   ---[ 

Re: [PATCH] hvmloader/PCI: skip huge BARs in certain calculations

2024-03-04 Thread Jan Beulich
On 04.03.2024 11:02, Roger Pau Monné wrote:
> On Mon, Mar 04, 2024 at 08:32:22AM +0100, Jan Beulich wrote:
>> BARs of size 2Gb and up can't possibly fit below 4Gb: Both the bottom of
>> the lower 2Gb range and the top of the higher 2Gb range have special
>> purpose. Don't even have them influence whether to (perhaps) relocate
>> low RAM.
> 
> Here you mention 2Gb BARs, yet the code below sets the maximum BAR
> size supported below 4Gb to 1Gb.

Hmm, I'm puzzled: There are no other BAR sizes between 1Gb and 2Gb.
Anything up to 1Gb continues to work as is, while everything 2Gb and
up has behavior changed.

>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -33,6 +33,13 @@ uint32_t pci_mem_start = HVM_BELOW_4G_MM
>>  const uint32_t pci_mem_end = RESERVED_MEMBASE;
>>  uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>>  
>> +/*
>> + * BARs larger than this value are put in 64-bit space unconditionally.  
>> That
>> + * is, such BARs also don't play into the determination of how big the 
>> lowmem
>> + * MMIO hole needs to be.
>> + */
>> +#define HUGE_BAR_THRESH GB(1)
> 
> I would be fine with defining this to an even lower number, like
> 256Mb, as to avoid as much as possible memory relocation in order to
> make the MMIO hole bigger.

As suggested in a post-commit-message remark, the main question then is
how to justify this.

>> @@ -367,7 +376,7 @@ void pci_setup(void)
>>  pci_mem_start = hvm_info->low_mem_pgend << PAGE_SHIFT;
>>  }
>>  
>> -if ( mmio_total > (pci_mem_end - pci_mem_start) )
>> +if ( mmio_total > (pci_mem_end - pci_mem_start) || bar64_relocate )
>>  {
>>  printf("Low MMIO hole not large enough for all devices,"
>> " relocating some BARs to 64-bit\n");
> 
> Is the above message now accurate?  Given the current code the low
> MMIO could be expanded up to 2Gb, yet BAR relocation will happen
> unconditionally once a 1Gb BAR is found.

Well, "all" may not be quite accurate anymore, yet would making it e.g.
"all applicable" really much more meaningful?

>> @@ -446,8 +455,9 @@ void pci_setup(void)
>>   *   the code here assumes it to be.)
>>   * Should either of those two conditions change, this code will 
>> break.
>>   */
>> -using_64bar = bars[i].is_64bar && bar64_relocate
>> -&& (mmio_total > (mem_resource.max - mem_resource.base));
>> +using_64bar = bars[i].is_64bar && bar64_relocate &&
>> +(mmio_total > (mem_resource.max - mem_resource.base) ||
>> + bar_sz > HUGE_BAR_THRESH);
>>  bar_data = pci_readl(devfn, bar_reg);
>>  
>>  if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
>> @@ -467,7 +477,8 @@ void pci_setup(void)
>>  resource = _resource;
>>  bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
>>  }
>> -mmio_total -= bar_sz;
>> +if ( bar_sz <= HUGE_BAR_THRESH )
>> +mmio_total -= bar_sz;
> 
> I'm missing the part where hvmloader notifies QEMU of the possibly
> expanded base and size memory PCI MMIO regions, so that those are
> reflected in the PCI root complex registers?

I don't understand this comment: I'm not changing the interaction
with qemu at all. Whatever the new calculation it'll be communicated
to qemu just as before.

> Overall I think we could simplify the code by having a hardcoded 1Gb
> PCI MMIO hole below 4Gb, fill it with all the 32bit BARs and
> (re)locate all 64bit BARs above 4Gb (not that I'm requesting you to do
> it here).

I'm afraid that would not work very well with OSes which aren't 64-bit
BAR / PA aware (first and foremost non-PAE 32-bit ones). Iirc that's
the reason why it wasn't done like you suggest back at the time.

Jan
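
As an aside for readers of the thread, a stand-alone sketch of the placement
rule being discussed; GB() and the 1Gb threshold mirror the patch, while the
function and parameter names are illustrative only:

#include <stdbool.h>
#include <stdint.h>

#define GB(n) ((uint64_t)(n) << 30)
#define HUGE_BAR_THRESH GB(1)

/* Simplified: a 64-bit BAR is placed above 4Gb when relocation is enabled
 * and either the BAR exceeds the threshold or the remaining low MMIO hole
 * is too small.  BARs above the threshold also never count towards the
 * sizing of the low hole (mmio_total). */
static bool place_bar_above_4g(bool is_64bar, bool bar64_relocate,
                               uint64_t bar_sz, uint64_t mmio_total,
                               uint64_t low_hole_size)
{
    return is_64bar && bar64_relocate &&
           (bar_sz > HUGE_BAR_THRESH || mmio_total > low_hole_size);
}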



[ovmf test] 184892: all pass - PUSHED

2024-03-04 Thread osstest service owner
flight 184892 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184892/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 1ae5bee967bffcd6dbbabca913ea3c65d8f09c76
baseline version:
 ovmf 3775122ede395d934198ffdb0c173875a5e94c00

Last test of basis   184889  2024-03-04 07:41:17 Z0 days
Testing same since   184892  2024-03-04 10:11:18 Z0 days1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Gua Guo 
  Himanshu Sharma 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   3775122ede..1ae5bee967  1ae5bee967bffcd6dbbabca913ea3c65d8f09c76 -> 
xen-tested-master



Re: [PATCH] x86/cpu-policy: Hide x2APIC from PV guests

2024-03-04 Thread Andrew Cooper
On 04/03/2024 8:33 am, Jan Beulich wrote:
> On 29.02.2024 23:14, Andrew Cooper wrote:
>> PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
>> access any of the x2APIC MSR range.  Therefore they mustn't see the x2APIC
>> CPUID bit saying that they can.
> This argumentation could then be used equally for the APIC bit. Why is it
> correct to hide x2APIC, but not APIC?

In an ideal world we'd hide APIC too.

But Linux pvops doesn't tolerate it, and I'm not sure classic dom0 did
either.

>> Right now, the host x2APIC flag filters into PV guests, meaning that PV 
>> guests
>> generally see x2APIC except on Zen1-and-older AMD systems.
>>
>> Linux works around this by explicitly hiding the bit itself, and filtering
>> EXTD out of MSR_APIC_BASE reads.  NetBSD behaves more in the spirit of PV
>> guests, and entirely ignores the APIC when built as a PV guest.
>>
>> Change the annotation from !A to !S.  This has a consequence of stripping it
>> out of both PV featuremasks.  However, as existing guests may have seen the
>> bit, set it back into the PV Max policy; a VM which saw the bit and is alive
>> enough to migrate will have ignored it one way or another.
>>
>> Hiding x2APIC does also change the contents of leaf 0xb, but as the
>> information is nonsense to begin with, this is likely an improvement on the
>> status quo.  The blind reporting of APIC_ID = vCPU_ID * 2 isn't interlinked
>> with the host's topology structure, and the APIC_IDs are useless without an
>> MADT to start with.  Dom0 is the only PV VM to get an MADT but it's the host
>> one, meaning the two sets of APIC_IDs are from different address spaces.
> Aiui the field will now be seen as zero on all CPUs. Is all CPUs having the
> same APIC ID there really better than them all having different ones?

Again - we're taking something that was conditionally like this, and
making it unconditionally like this.

When x2APIC is hidden, so is the precondition for the data in this leaf
being valid.

> Also while I agree that logically CPUID leaf 0xb EDX is tied to x2APIC being
> available as a feature, nothing is said in this regard in the documentation
> of that leaf. This in particular leaves open whether on a system with x2APIC
> disabled in/by firmware the value would actually be zero. Yet that case could
> be considered somewhat similar to the PV case here.

I'm not aware of there being a capability to disable x2APIC in firmware.

The only choices you got were which mode to default to, and
(occasionally) whether to set the x2APIC opt-out in DMAR.

~Andrew



[ANNOUNCE] Call for agenda items for Community Call

2024-03-04 Thread Kelly Choi
Hi all,

Please add your proposed agenda items below.

https://cryptpad.fr/pad/#/2/pad/edit/VlIkjPmILf0abDG3j1Wn0091/

If any action items are missing or have been resolved, please add/remove
them from the sheet.

*CALL LINK: https://meet.jit.si/XenProjectCommunityCall*

*DATE: Thursday 7th March 2024*

*TIME: 1600 UTC (4 pm UK time)*

*Note the following administrative conventions for the call:*

* To allow time to switch between meetings, we plan on starting the agenda
  at 16:05 UTC sharp. Aim to join by 16:03 UTC if possible to allocate time
  to sort out technical difficulties.
* If you want to be CC'ed please add or remove yourself from the sign-up-sheet
  at https://cryptpad.fr/pad/#/2/pad/edit/D9vGzihPxxAOe6RFPz0sRCf+/

== Dial-in Information ==

## Meeting time
16:00 - 17:00 British time

Further International meeting times:
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2024=3=7=16=0=0=1234=37=224=179

Thanks,
Kelly
Community Manager
Xen Project

## Dial in details
https://meet.jit.si/static/dialInInfo.html?room=XenProjectCommunityCall


Re: [PATCH] hvmloader/PCI: skip huge BARs in certain calculations

2024-03-04 Thread Alejandro Vallejo
Hi,

On 04/03/2024 07:32, Jan Beulich wrote:
> BARs of size 2Gb and up can't possibly fit below 4Gb: Both the bottom of
> the lower 2Gb range and the top of the higher 2Gb range have special
> purpose. Don't even have them influence whether to (perhaps) relocate
> low RAM.
>
> Reported-by: Neowutran 
> Signed-off-by: Jan Beulich 
> ---
> If we wanted to fit e.g. multiple 1Gb BARs, it would likely be prudent
> to similarly avoid low RAM relocation in the first place. Yet accounting
> for things differently depending on how many large BARs there are would
> require more intrusive code changes.
>
> That said, I'm open to further lowering of the threshold. That'll
> require different justification then, though.
>
GPUs without resizable BARs tend to expose 256MiB on the BAR regardless
of the amount of internal VRAM, AFAIK. So IMO 256 MiB is not a totally
unmotivated number. I don't know which other PCIe devices might expose
such big BARs for comparison.

Cheers,
Alejandro



Re: Link to the 2023 xen summit schedule is broken

2024-03-04 Thread Kelly Choi
Hi Marek,

Try this link, it should show the sessions for 2023.
https://xen2023.sched.com/

Many thanks,
Kelly Choi

Community Manager
Xen Project


On Sun, Mar 3, 2024 at 3:01 AM Marek Marczykowski-Górecki <
marma...@invisiblethingslab.com> wrote:

> Hi,
>
> The link to the last year schedule at
> https://events.linuxfoundation.org/archive/2023/xen-project-summit/ is
> broken, it opens a page for upcoming "ONE summit" event that doesn't
> look to be related to Xen Summit.
>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
>


XEN SUMMIT CFP DEADLINE EXTENDED BY 1 WEEK!

2024-03-04 Thread Kelly Choi
Hi everyone,

Good news, our CFP deadline for Xen Summit has been extended by one week to
Sunday 10th March 2024.

Please submit your talks before then!

Many thanks,
Kelly Choi

Community Manager
Xen Project


On Tue, Feb 27, 2024 at 10:22 AM Kelly Choi  wrote:

> Hi everyone,
>
> *Just a reminder that our CFP for Xen Summit 2024 is at the end of this
> week! *
>
> Please submit your talks before then:
> https://events.linuxfoundation.org/xen-project-summit/program/cfp/
>
> We look forward to seeing you.
>
> Many thanks,
> Kelly Choi
>
> Community Manager
> Xen Project
>


Re: [PATCH] MAINTAINERS: drop AMD SVM and Intel VT-x sections

2024-03-04 Thread Roger Pau Monné
On Mon, Mar 04, 2024 at 10:27:44AM +0100, Jan Beulich wrote:
> We'd like to thank the VT-x maintainers for their past contributions,
> while at the same time we'd like to reflect reality as it has been for
> quite some time. Have VT-x maintainership (and for symmetry also AMD
> SVM's) fall back to general x86.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

Thanks.



Re: [PATCH v2] tools/xentop: Add VBD3 support to xentop

2024-03-04 Thread Fouad Hilly
On Tue, Feb 27, 2024 at 6:32 PM Anthony PERARD  wrote:
>
> On Tue, Feb 27, 2024 at 01:26:28PM +, Fouad Hilly wrote:
> > diff --git a/tools/libs/stat/xenstat_linux.c 
> > b/tools/libs/stat/xenstat_linux.c
> > index cbba54aa83ee..6d82e204aad4 100644
> > --- a/tools/libs/stat/xenstat_linux.c
> > +++ b/tools/libs/stat/xenstat_linux.c
> > @@ -390,6 +390,38 @@ void xenstat_uninit_networks(xenstat_handle * handle)
> >   fclose(priv->procnetdev);
> >  }
> >
> > +static int read_attributes_vbd3(char *vbd3_path, xenstat_vbd *vbd)
>
> "const char *vbd3_path"?
>
>
> >  static int read_attributes_vbd(const char *vbd_directory, const char 
> > *what, char *ret, int cap)
> >  {
> >   static char file_name[80];
> > @@ -438,7 +470,7 @@ int xenstat_collect_vbds(xenstat_node * node)
> >   int ret;
> >   char buf[256];
> >
> > - ret = sscanf(dp->d_name, "%3s-%u-%u", buf, , );
> > + ret = sscanf(dp->d_name, "%255[^-]-%u-%u", buf, , 
> > );
>
> 255 is overly ambitious, but it matches the size of buf, so I guess it's
> kind of ok, even if unnecessary.
>
> >   if (ret != 3)
> >   continue;
> >   if (!(strstr(buf, "vbd")) && !(strstr(buf, "tap")))
> > @@ -448,6 +480,8 @@ int xenstat_collect_vbds(xenstat_node * node)
> >   vbd.back_type = 1;
> >   else if (strcmp(buf,"tap") == 0)
> >   vbd.back_type = 2;
> > + else if (strcmp(buf,"vbd3") == 0)
> > + vbd.back_type = 3;
>
> Yay for magic number... Do you think you could introduce an enum or
> define to replace this "3" by a meaningful? Maybe something like
> XENSTAT_VBD_TYPE_VBD3, (name derived from the existing function
> xenstat_vbd_type()).
>
> I'd like at least to replace the "3". But if you feel like having
> another patch to replace the "2" and "1", that would be a plus.
>
> >   else
> >   vbd.back_type = 0;
> >
> > @@ -479,6 +513,35 @@ int xenstat_collect_vbds(xenstat_node * node)
> >   vbd.error = 1;
> >   }
> >   }
> > + else if (vbd.back_type == 3)
> > + {
> > + char *td3_pid;
> > + char *path;
> > +
> > + vbd.back_type = 3;
>
> `back_type` should already be 3 ;-).
>
> > + vbd.error = 0;
> > +
> > + if (asprintf(, 
> > "/local/domain/0/backend/vbd3/%u/%u/kthread-pid", domid, vbd.dev) < 0)
> > + continue;
> > +
> > + td3_pid = xs_read(node->handle->xshandle, XBT_NULL, 
> > path, NULL);
> > +
> > + free(path);
> > +
> > + if (td3_pid == NULL)
> > + continue;
> > +
> > + if (asprintf(, "/dev/shm/td3-%s/vbd-%u-%u", 
> > td3_pid, domid, vbd.dev) < 0) {
> > + free(td3_pid);
> > + continue;
> > + }
> > +
> > + if (read_attributes_vbd3(path, ) < 0)
> > + vbd.error = 1;
>
> Why sometime we do "continue" and sometime we do "vbd.error=1"?
continue is used when the path to the domain statistics is being checked,
which is "dynamic"; however, if the path exists but reading the statistics
fails, that is when we set the error.

>
> > +
> > + free(td3_pid);
> > + free(path);
> > + }
> >   else
> >   {
> >   vbd.error = 1;
> > diff --git a/tools/libs/stat/xenstat_priv.h b/tools/libs/stat/xenstat_priv.h
> > index 4eb44a8ebb84..c3a9635240e9 100644
> > --- a/tools/libs/stat/xenstat_priv.h
> > +++ b/tools/libs/stat/xenstat_priv.h
> > @@ -98,6 +98,22 @@ struct xenstat_vbd {
> >   unsigned long long wr_sects;
> >  };
> >
> > +struct vbd3_stats {
> > + uint32_t version;
> > + uint32_t __pad;
> > + uint64_t oo_reqs;
> > + uint64_t read_reqs_submitted;
> > + uint64_t read_reqs_completed;
> > + uint64_t read_sectors;
> > + uint64_t read_total_ticks;
> > + uint64_t write_reqs_submitted;
> > + uint64_t write_reqs_completed;
> > + uint64_t write_sectors;
> > + uint64_t write_total_ticks;
> > + uint64_t io_errors;
> > + uint64_t flags;
> > +};
>
> Is that a binary interface for an external project? I'm asking because
> `__pad` would seems useless otherwise.
> If it is part of an interface, please add a comment about it, add a link
> to the project/source where this interface is described, and where is
> the canonical location? Also, is there an header we could maybe just
> use if it's in the system or an header we could import into the
> repository?
>
> Thanks,
>
> --
> Anthony PERARD
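
A sketch of the kind of named constants suggested above for the back_type
field; the enumerator names are illustrative only and not an agreed
interface:

enum xenstat_vbd_backend_type {
    XENSTAT_VBD_TYPE_UNKNOWN = 0,   /* current magic number 0 */
    XENSTAT_VBD_TYPE_VBD     = 1,   /* current magic number 1 */
    XENSTAT_VBD_TYPE_TAP     = 2,   /* current magic number 2 */
    XENSTAT_VBD_TYPE_VBD3    = 3,   /* current magic number 3 */
};

The assignments in xenstat_collect_vbds() and the later branch test would
then read e.g. vbd.back_type = XENSTAT_VBD_TYPE_VBD3 and
vbd.back_type == XENSTAT_VBD_TYPE_VBD3 instead of bare numbers.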



Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.6

2024-03-04 Thread Federico Serafini

On 04/03/24 11:28, Federico Serafini wrote:

Update ECLAIR configuration to take into account the deviations
agreed during MISRA meetings.

Signed-off-by: Federico Serafini 
---
Changes in v2:
- rephrased to make sure the deviation is not interpreted as a suggestion.
---
  automation/eclair_analysis/ECLAIR/deviations.ecl | 4 
  docs/misra/deviations.rst| 6 ++
  2 files changed, 10 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl 
b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 02eae39786..9ac3ee4dfd 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -378,6 +378,10 @@ explicit comment indicating the fallthrough intention is 
present."
  -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(^(?s).*/\\* 
[fF]all ?through.? \\*/.*$,0..1"}
  -doc_end
  
+-doc_begin="A switch statement with a single switch clause and no default label may be used in place of an equivalent if statement if it is considered to improve readability."

+-config=MC3R1.R16.6,switch_clauses+={deliberate, "default(0)"}
+-doc_end
+
  #
  # Series 18.
  #
diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index 123c78e20a..ce855ddae6 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -322,6 +322,12 @@ Deviations related to MISRA C:2012 Rules:
   - /\* Fallthrough \*/
   - /\* Fallthrough. \*/
  
+   * - R16.6

+ - A switch statement with a single switch clause and no default label may
+   be used in place of an equivalent if statement if it is considered to
+   improve readability.
+ - Tagged as `deliberate` for ECLAIR.
+
 * - R20.7
   - Code violating Rule 20.7 is safe when macro parameters are used:
 (1) as function arguments;


I forgot the "--reroll-count",
I'm sorry.

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



[XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.6

2024-03-04 Thread Federico Serafini
Update ECLAIR configuration to take into account the deviations
agreed during MISRA meetings.

Signed-off-by: Federico Serafini 
---
Changes in v2:
- rephrased to make sure the deviation is not interpreted as a suggestion.
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 4 
 docs/misra/deviations.rst| 6 ++
 2 files changed, 10 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl 
b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 02eae39786..9ac3ee4dfd 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -378,6 +378,10 @@ explicit comment indicating the fallthrough intention is 
present."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(^(?s).*/\\* 
[fF]all ?through.? \\*/.*$,0..1"}
 -doc_end
 
+-doc_begin="A switch statement with a single switch clause and no default 
label may be used in place of an equivalent if statement if it is considered to 
improve readability."
+-config=MC3R1.R16.6,switch_clauses+={deliberate, "default(0)"}
+-doc_end
+
 #
 # Series 18.
 #
diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index 123c78e20a..ce855ddae6 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -322,6 +322,12 @@ Deviations related to MISRA C:2012 Rules:
  - /\* Fallthrough \*/
  - /\* Fallthrough. \*/
 
+   * - R16.6
+ - A switch statement with a single switch clause and no default label may
+   be used in place of an equivalent if statement if it is considered to
+   improve readability.
+ - Tagged as `deliberate` for ECLAIR.
+
* - R20.7
  - Code violating Rule 20.7 is safe when macro parameters are used:
(1) as function arguments;
-- 
2.34.1
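
Purely as an illustration of the construct the deviation covers (hypothetical
code, not from the Xen tree): a switch statement with a single switch clause
and no default label, next to the equivalent if statement it may deliberately
replace.

static int handle_switch(int op)
{
    int handled = 0;

    switch ( op )
    {
    case 0:            /* single switch clause, no default label */
        handled = 1;
        break;
    }

    return handled;
}

static int handle_if(int op)
{
    int handled = 0;

    if ( op == 0 )     /* the equivalent if statement */
        handled = 1;

    return handled;
}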




Re: [PATCH] hvmloader/PCI: skip huge BARs in certain calculations

2024-03-04 Thread Roger Pau Monné
On Mon, Mar 04, 2024 at 08:32:22AM +0100, Jan Beulich wrote:
> BARs of size 2Gb and up can't possibly fit below 4Gb: Both the bottom of
> the lower 2Gb range and the top of the higher 2Gb range have special
> purpose. Don't even have them influence whether to (perhaps) relocate
> low RAM.

Here you mention 2Gb BARs, yet the code below sets the maximum BAR
size supported below 4Gb to 1Gb.

> Reported-by: Neowutran 
> Signed-off-by: Jan Beulich 
> ---
> If we wanted to fit e.g. multiple 1Gb BARs, it would likely be prudent
> to similarly avoid low RAM relocation in the first place. Yet accounting
> for things differently depending on how many large BARs there are would
> require more intrusive code changes.
> 
> That said, I'm open to further lowering of the threshold. That'll
> require different justification then, though.
> 
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -33,6 +33,13 @@ uint32_t pci_mem_start = HVM_BELOW_4G_MM
>  const uint32_t pci_mem_end = RESERVED_MEMBASE;
>  uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>  
> +/*
> + * BARs larger than this value are put in 64-bit space unconditionally.  That
> + * is, such BARs also don't play into the determination of how big the lowmem
> + * MMIO hole needs to be.
> + */
> +#define HUGE_BAR_THRESH GB(1)

I would be fine with defining this to an even lower number, like
256Mb, as to avoid as much as possible memory relocation in order to
make the MMIO hole bigger.

> +
>  enum virtual_vga virtual_vga = VGA_none;
>  unsigned long igd_opregion_pgbase = 0;
>  
> @@ -286,9 +293,11 @@ void pci_setup(void)
>  bars[i].bar_reg = bar_reg;
>  bars[i].bar_sz  = bar_sz;
>  
> -if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -  PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> - (bar_reg == PCI_ROM_ADDRESS) )
> +if ( is_64bar && bar_sz > HUGE_BAR_THRESH )
> +bar64_relocate = 1;
> +else if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> +   PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> +  (bar_reg == PCI_ROM_ADDRESS) )
>  mmio_total += bar_sz;
>  
>  nr_bars++;
> @@ -367,7 +376,7 @@ void pci_setup(void)
>  pci_mem_start = hvm_info->low_mem_pgend << PAGE_SHIFT;
>  }
>  
> -if ( mmio_total > (pci_mem_end - pci_mem_start) )
> +if ( mmio_total > (pci_mem_end - pci_mem_start) || bar64_relocate )
>  {
>  printf("Low MMIO hole not large enough for all devices,"
> " relocating some BARs to 64-bit\n");

Is the above message now accurate?  Given the current code the low
MMIO could be expanded up to 2Gb, yet BAR relocation will happen
unconditionally once a 1Gb BAR is found.

> @@ -446,8 +455,9 @@ void pci_setup(void)
>   *   the code here assumes it to be.)
>   * Should either of those two conditions change, this code will 
> break.
>   */
> -using_64bar = bars[i].is_64bar && bar64_relocate
> -&& (mmio_total > (mem_resource.max - mem_resource.base));
> +using_64bar = bars[i].is_64bar && bar64_relocate &&
> +(mmio_total > (mem_resource.max - mem_resource.base) ||
> + bar_sz > HUGE_BAR_THRESH);
>  bar_data = pci_readl(devfn, bar_reg);
>  
>  if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
> @@ -467,7 +477,8 @@ void pci_setup(void)
>  resource = _resource;
>  bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
>  }
> -mmio_total -= bar_sz;
> +if ( bar_sz <= HUGE_BAR_THRESH )
> +mmio_total -= bar_sz;

I'm missing the part where hvmloader notifies QEMU of the possibly
expanded base and size memory PCI MMIO regions, so that those are
reflected in the PCI root complex registers?

Overall I think we could simplify the code by having a hardcoded 1Gb
PCI MMIO hole below 4Gb, fill it with all the 32bit BARs and
(re)locate all 64bit BARs above 4Gb (not that I'm requesting you to do
it here).

Thanks, Roger.



[ovmf test] 184889: all pass - PUSHED

2024-03-04 Thread osstest service owner
flight 184889 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184889/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 3775122ede395d934198ffdb0c173875a5e94c00
baseline version:
 ovmf 275d0a39c42ad73a6e4929822f56f5d8c16ede96

Last test of basis   184836  2024-03-01 19:13:07 Z2 days
Testing same since   184889  2024-03-04 07:41:17 Z0 days1 attempts


People who touched revisions under test:
  Jason Lou 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   275d0a39c4..3775122ede  3775122ede395d934198ffdb0c173875a5e94c00 -> 
xen-tested-master



Re: [PATCH] MAINTAINERS: drop AMD SVM and Intel VT-x sections

2024-03-04 Thread Andrew Cooper
On 04/03/2024 9:27 am, Jan Beulich wrote:
> We'd like to thank the VT-x maintainers for their past contributions,
> while at the same time we'd like to reflect reality as it has been for
> quite some time. Have VT-x maintainership (and for symmetry also AMD
> SVM's) fall back to general x86.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 



Re: [PATCH 5/6] nestedsvm: Remove bogus debug message from nestedsvm_check_intercepts

2024-03-04 Thread George Dunlap
On Tue, Feb 27, 2024 at 12:47 AM Andrew Cooper
 wrote:
>
> On 06/02/2024 1:20 am, George Dunlap wrote:
> > Changeset ef3e8db8068 ("x86/hvm: Corrections and improvements to
> > unhandled vmexit logging") introduced a printk to the default path of
> > the switch statement in nestedsvm_check_intercepts(), complaining of
> > an unknown exit reason.
> >
> > Unfortunately, the "core" switch statement which is meant to handle
> > all vmexit reasons is in nsvm_vmcb_guest_intercepts_exitcode(); the
> > switch statement in nestedsvm_check_intercepts() is only meant to
> > superimpose on top of that some special-casing for how to interaction
> > between L1 and L0 vmexits.
> >
> > Remove the printk, and add a comment to prevent future confusion.
> >
> > Signed-off-by: George Dunlap 
>
> Erm...   The addition of this printk was very deliberate, to point out
> where security fixes are needed.
>
> It's not bogus in the slightest.  It is an error for exit reasons to not
> be inspected for safety in this path.

I'm a bit at a loss how to respond to this.  As I wrote above, exit
reasons are inspected for safety in this path -- in
nsvm_vmcb_guest_intercepts_exitcode().  If you want the patch
reverted, you'll have to explain why that's not sufficient.

 -George



Re: [XEN PATCH 08/10] xen/errno: address violations of MISRA C Rule 20.7

2024-03-04 Thread Jan Beulich
On 29.02.2024 16:28, Nicola Vetrini wrote:
> MISRA C Rule 20.7 states: "Expressions resulting from the expansion
> of macro parameters shall be enclosed in parentheses". Therefore, some
> macro definitions should gain additional parentheses to ensure that all
> current and future users will be safe with respect to expansions that
> can possibly alter the semantics of the passed-in macro parameter.
> 
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 

Acked-by: Jan Beulich 





[PATCH] MAINTAINERS: drop AMD SVM and Intel VT-x sections

2024-03-04 Thread Jan Beulich
We'd like to thank the VT-x maintainers for their past contributions,
while at the same time we'd like to reflect reality as it has been for
quite some time. Have VT-x maintainership (and for symmetry also AMD
SVM's) fall back to general x86.

Signed-off-by: Jan Beulich 

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -220,13 +220,6 @@ M: Andrew Cooper 
-M: Andrew Cooper 
-S: Supported
-F: xen/arch/x86/hvm/svm/
-F: xen/arch/x86/cpu/vpmu_amd.c
-
 ARGO
 M: Christopher Clark 
 S: Maintained
@@ -357,15 +350,6 @@ M: Kevin Tian 
 S: Supported
 F: xen/drivers/passthrough/vtd/
 
-INTEL(R) VT FOR X86 (VT-X)
-M: Jun Nakajima 
-M: Kevin Tian 
-S: Supported
-F: xen/arch/x86/cpu/vpmu_intel.c
-F: xen/arch/x86/hvm/vmx/
-F: xen/arch/x86/include/asm/hvm/vmx/
-F: xen/arch/x86/mm/p2m-ept.c
-
 IOMMU VENDOR INDEPENDENT CODE
 M: Jan Beulich 
 M: Paul Durrant 



[PATCH 3/3] x86: switch to eager fpu save / restore

2024-03-04 Thread Fouad Hilly
From: Wei Liu 

Previously FPU is lazily switched. Due to the fact that a malicious
guest can speculatively read the not yet switched out register states,
we need to eagerly switch FPU states when a domain is scheduled to
run.

In the new world, Xen will eagerly switch FPU context in the
scheduler. Xen itself won't set CR0.TS other than for the purpose of
servicing a PV guest.

The following things are done:

1. Xen will only set and clear CR0.TS on behalf of a PV guest. Any #NM
   received in Xen should only be delivered to the running PV guest.

2. Xen no longer causes vmexit for #NM for HVM guests when nested HVM
   is not in use.

3. When nested HVM is in use, Xen needs to trap #NM if specified by
   the L1 hypervisor, but all #NM handling is left to L1 hypervisor
   to deal with.

4. Xen saves and restores FPU states wherever it needs to. The
   following places are modified:
   1. Scheduling in and out a guest;
   2. Calling EFI runtime service;
   3. ACPI reset;
   4. x86 insn emulator fpu insn emulation.

5. Treat FPU as always initialised. Adjust following components:
   1. HVM vcpu context save / load code;
   2. arch_{get,set}_info_guest;
   3. VLAPIC code.

6. Delete lazy FPU handling code.

Strip XCR0 and IA32_XSS manipulation from __context_switch. We need to
be able to zero out previously used state components. Push everything
into fpu_xrstor as that's the most suitable place.

Tested on AMD with PKU disabled and Intel, no performance degradation.

Signed-off-by: Wei Liu 
Signed-off-by: Roger Pau Monné 
Signed-off-by: Fouad Hilly 
---
CC: Jan Beulich 
CC: Andrew Cooper 
CC: "Roger Pau Monné" 
CC: Wei Liu 
CC: George Dunlap 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Paul Durrant 
CC: Jun Nakajima 
CC: Kevin Tian 
---
 xen/arch/x86/cpu/common.c|   3 -
 xen/arch/x86/domain.c|  10 +-
 xen/arch/x86/domctl.c|   4 +-
 xen/arch/x86/hvm/emulate.c   |  38 +
 xen/arch/x86/hvm/hvm.c   |   8 +-
 xen/arch/x86/hvm/svm/nestedsvm.c |  62 +--
 xen/arch/x86/hvm/svm/svm.c   |  81 +
 xen/arch/x86/hvm/svm/vmcb.c  |   4 +-
 xen/arch/x86/hvm/vlapic.c|   4 -
 xen/arch/x86/hvm/vmx/vmcs.c  |   8 +-
 xen/arch/x86/hvm/vmx/vmx.c   |  70 +---
 xen/arch/x86/hvm/vmx/vvmx.c  |  15 +-
 xen/arch/x86/i387.c  | 164 ---
 xen/arch/x86/include/asm/domain.h|   3 -
 xen/arch/x86/include/asm/hvm/svm/nestedsvm.h |   3 -
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h  |   2 -
 xen/arch/x86/include/asm/i387.h  |   3 +-
 xen/arch/x86/include/asm/xstate.h|  17 +-
 xen/arch/x86/pv/misc-hypercalls.c|   3 +-
 xen/arch/x86/traps.c |  18 +-
 xen/arch/x86/xstate.c|  26 ---
 xen/common/domain.c  |   2 -
 xen/common/efi/runtime.c |   9 +-
 xen/include/xen/sched.h  |   4 -
 24 files changed, 72 insertions(+), 489 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 28d7f34c4dbe..9ea748f959af 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -967,9 +967,6 @@ void cpu_init(void)
/* Install correct page table. */
write_ptbase(current);
 
-   /* Ensure FPU gets initialised for each domain. */
-   stts();
-
/* Reset debug registers: */
write_debugreg(0, 0);
write_debugreg(1, 0);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bda853e3c92b..742d69dd93bb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1996,15 +1996,7 @@ static void __context_switch(void)
 if ( !is_idle_domain(nd) )
 {
 memcpy(stack_regs, >arch.user_regs, CTXT_SWITCH_STACK_BYTES);
-if ( cpu_has_xsave )
-{
-if ( !set_xcr0(n->arch.xcr0 ?: XSTATE_FP_SSE) )
-BUG();
-
-if ( cpu_has_xsaves && is_hvm_vcpu(n) )
-set_msr_xss(n->arch.msrs->xss.raw);
-}
-vcpu_restore_fpu_nonlazy(n, false);
+vcpu_restore_fpu(n);
 nd->arch.ctxt_switch->to(n);
 }
 
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9a72d57333e9..d838427feb52 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -967,7 +967,6 @@ long arch_do_domctl(
 
 v->arch.xcr0 = _xcr0;
 v->arch.xcr0_accum = _xcr0_accum;
-v->arch.nonlazy_xstate_used = _xcr0_accum & XSTATE_NONLAZY;
 compress_xsave_states(v, _xsave_area,
   evc->size - PV_XSAVE_HDR_SIZE);
 
@@ -1347,8 +1346,7 @@ void arch_get_info_guest(struct vcpu *v, 
vcpu_guest_context_u c)
 c(flags = v->arch.pv.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
 else

[PATCH 1/3] x86: i387.c cleanup

2024-03-04 Thread Fouad Hilly
From: Wei Liu 

No functional change

Signed-off-by: Wei Liu 
Signed-off-by: Fouad Hilly 
---
CC: Jan Beulich 
CC: Andrew Cooper 
CC: "Roger Pau Monné" 
CC: Wei Liu 
---
 xen/arch/x86/i387.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index d824f2bb5294..7a4297cc921e 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -26,7 +26,7 @@ static inline void fpu_xrstor(struct vcpu *v, uint64_t mask)
 
 ASSERT(v->arch.xsave_area);
 /*
- * XCR0 normally represents what guest OS set. In case of Xen itself, 
+ * XCR0 normally represents what guest OS set. In case of Xen itself,
  * we set the accumulated feature mask before doing save/restore.
  */
 ok = set_xcr0(v->arch.xcr0_accum | XSTATE_FP_SSE);
@@ -36,7 +36,7 @@ static inline void fpu_xrstor(struct vcpu *v, uint64_t mask)
 ASSERT(ok);
 }
 
-/* Restor x87 FPU, MMX, SSE and SSE2 state */
+/* Restore x87 FPU, MMX, SSE and SSE2 state */
 static inline void fpu_fxrstor(struct vcpu *v)
 {
 const typeof(v->arch.xsave_area->fpu_sse) *fpu_ctxt = v->arch.fpu_ctxt;
@@ -307,7 +307,7 @@ void save_fpu_enable(void)
 int vcpu_init_fpu(struct vcpu *v)
 {
 int rc;
-
+
 v->arch.fully_eager_fpu = opt_eager_fpu;
 
 if ( (rc = xstate_alloc_save_area(v)) != 0 )
-- 
2.42.0




[PATCH 0/3] X86/eager-fpu: Switch to eager fpu save/restore

2024-03-04 Thread Fouad Hilly
X86 Xen will only eagerly switch FPU context in the scheduler.
Xen itself won't set CR0.TS other than for the purpose of servicing
a PV guest.

Signed-off-by: Wei Liu 
Signed-off-by: Roger Pau Monné 
Signed-off-by: Fouad Hilly 
---
CC: Jan Beulich 
CC: Andrew Cooper 
CC: "Roger Pau Monné" 
CC: Wei Liu 
CC: George Dunlap 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Paul Durrant 
CC: Jun Nakajima 
CC: Kevin Tian 

Wei Liu (3):
  x86: i387.c cleanup
  x86: introduce xstate_zero
  x86: switch to eager fpu save / restore

 xen/arch/x86/cpu/common.c|   3 -
 xen/arch/x86/domain.c|  10 +-
 xen/arch/x86/domctl.c|   4 +-
 xen/arch/x86/hvm/emulate.c   |  38 +
 xen/arch/x86/hvm/hvm.c   |   8 +-
 xen/arch/x86/hvm/svm/nestedsvm.c |  62 +--
 xen/arch/x86/hvm/svm/svm.c   |  81 +
 xen/arch/x86/hvm/svm/vmcb.c  |   4 +-
 xen/arch/x86/hvm/vlapic.c|   4 -
 xen/arch/x86/hvm/vmx/vmcs.c  |   8 +-
 xen/arch/x86/hvm/vmx/vmx.c   |  70 +---
 xen/arch/x86/hvm/vmx/vvmx.c  |  15 +-
 xen/arch/x86/i387.c  | 170 +--
 xen/arch/x86/include/asm/domain.h|   3 -
 xen/arch/x86/include/asm/hvm/svm/nestedsvm.h |   3 -
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h  |   2 -
 xen/arch/x86/include/asm/i387.h  |   3 +-
 xen/arch/x86/include/asm/xstate.h|  18 +-
 xen/arch/x86/pv/misc-hypercalls.c|   3 +-
 xen/arch/x86/traps.c |  18 +-
 xen/arch/x86/xstate.c|  65 ---
 xen/common/domain.c  |   2 -
 xen/common/efi/runtime.c |   9 +-
 xen/include/xen/sched.h  |   4 -
 24 files changed, 108 insertions(+), 499 deletions(-)

-- 
2.42.0




[PATCH 2/3] x86: introduce xstate_zero

2024-03-04 Thread Fouad Hilly
From: Wei Liu 

Factor out xrstor__ and introduce xstate_zero, which zeros all the
state components specified in the mask.

Signed-off-by: Wei Liu 
Signed-off-by: Fouad Hilly 
---
CC: Jan Beulich 
CC: Andrew Cooper 
CC: "Roger Pau Monné" 
CC: Wei Liu 
---
 xen/arch/x86/include/asm/xstate.h |  1 +
 xen/arch/x86/xstate.c | 39 +--
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index c08c267884f0..bd767d9cd714 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -94,6 +94,7 @@ uint64_t get_msr_xss(void);
 uint64_t read_bndcfgu(void);
 void xsave(struct vcpu *v, uint64_t mask);
 void xrstor(struct vcpu *v, uint64_t mask);
+void xstate_zero(uint64_t mask);
 void xstate_set_init(uint64_t mask);
 bool xsave_enabled(const struct vcpu *v);
 int __must_check validate_xstate(const struct domain *d,
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index cf94761d0542..92a65bd8d52c 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -368,11 +368,12 @@ void xsave(struct vcpu *v, uint64_t mask)
 ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] = fip_width;
 }
 
-void xrstor(struct vcpu *v, uint64_t mask)
+/* True -> no error, false -> failed */
+static bool xrstor__(struct xsave_struct *ptr, uint64_t xcr0_accum,
+ uint64_t mask)
 {
 uint32_t hmask = mask >> 32;
 uint32_t lmask = mask;
-struct xsave_struct *ptr = v->arch.xsave_area;
 unsigned int faults, prev_faults;
 
 /*
@@ -412,7 +413,7 @@ void xrstor(struct vcpu *v, uint64_t mask)
  [ptr] "D" (ptr) )
 
 #define XRSTOR(pfx) \
-if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY ) \
+if ( xcr0_accum & XSTATE_XSAVES_ONLY ) \
 { \
 if ( unlikely(!(ptr->xsave_hdr.xcomp_bv & \
 XSTATE_COMPACTION_ENABLED)) ) \
@@ -461,7 +462,7 @@ void xrstor(struct vcpu *v, uint64_t mask)
   ((mask & X86_XCR0_YMM) &&
!(ptr->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED))) )
 ptr->fpu_sse.mxcsr &= mxcsr_mask;
-if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
+if ( xcr0_accum & XSTATE_XSAVES_ONLY )
 {
 ptr->xsave_hdr.xcomp_bv &= this_cpu(xcr0) | this_cpu(xss);
 ptr->xsave_hdr.xstate_bv &= ptr->xsave_hdr.xcomp_bv;
@@ -478,14 +479,35 @@ void xrstor(struct vcpu *v, uint64_t mask)
 case 2: /* Stage 2: Reset all state. */
 ptr->fpu_sse.mxcsr = MXCSR_DEFAULT;
 ptr->xsave_hdr.xstate_bv = 0;
-ptr->xsave_hdr.xcomp_bv = v->arch.xcr0_accum & XSTATE_XSAVES_ONLY
+ptr->xsave_hdr.xcomp_bv = xcr0_accum & XSTATE_XSAVES_ONLY
   ? XSTATE_COMPACTION_ENABLED : 0;
 continue;
 }
 
-domain_crash(current->domain);
-return;
+return false;
 }
+
+return true;
+}
+
+void xrstor(struct vcpu *v, uint64_t mask)
+{
+if ( !xrstor__(v->arch.xsave_area, v->arch.xcr0_accum, mask) )
+domain_crash(current->domain);
+}
+
+void xstate_zero(uint64_t mask)
+{
+bool ok;
+struct xsave_struct tmp;
+
+tmp.fpu_sse.mxcsr = MXCSR_DEFAULT;
+tmp.xsave_hdr.xstate_bv = 0;
+tmp.xsave_hdr.xcomp_bv = 0;
+memset(tmp.xsave_hdr.reserved, 0, sizeof(tmp.xsave_hdr.reserved));
+
+ok = xrstor__(, mask, mask);
+ASSERT(ok);
 }
 
 bool xsave_enabled(const struct vcpu *v)
@@ -731,6 +753,9 @@ int handle_xsetbv(u32 index, u64 new_bv)
 if ( (new_bv & ~xcr0_max) || !valid_xcr0(new_bv) )
 return -EINVAL;
 
+/* Zero state components before writing new XCR0 */
+xstate_zero(get_xcr0());
+
 /* By this point, new_bv really should be accepted by hardware. */
 if ( unlikely(!set_xcr0(new_bv)) )
 {
-- 
2.42.0




[PATCH v2 1/1] x86/fred: Fix init_task thread stack pointer initialization

2024-03-04 Thread Xin Li (Intel)
As TOP_OF_KERNEL_STACK_PADDING is defined as 0 on x86_64, no one noticed
it's missing in the calculation of the .sp field in INIT_THREAD until it
is defined to 16 with CONFIG_X86_FRED=y.

Subtract TOP_OF_KERNEL_STACK_PADDING from the .sp field of INIT_THREAD.

Fixes: 65c9cc9e2c14 ("x86/fred: Reserve space for the FRED stack frame")
Fixes: 3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-lkp/202402262159.183c2a37-...@intel.com
Signed-off-by: Xin Li (Intel) 
---

Change Since v1:
* Apply offset TOP_OF_KERNEL_STACK_PADDING to all uses of __end_init_task
  (Brian Gerst).
---
 arch/x86/include/asm/processor.h | 6 --
 arch/x86/kernel/head_64.S| 3 ++-
 arch/x86/xen/xen-head.S  | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 26620d7642a9..17fe81998ce4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -664,8 +664,10 @@ static __always_inline void prefetchw(const void *x)
 #else
 extern unsigned long __end_init_task[];
 
-#define INIT_THREAD {  \
-   .sp = (unsigned long)&__end_init_task - sizeof(struct pt_regs), \
+#define INIT_THREAD {  \
+   .sp = (unsigned long)&__end_init_task - \
+ TOP_OF_KERNEL_STACK_PADDING - \
+ sizeof(struct pt_regs),   \
 }
 
 extern unsigned long KSTK_ESP(struct task_struct *task);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d03efb4..c38e43589046 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -66,7 +67,7 @@ SYM_CODE_START_NOALIGN(startup_64)
mov %rsi, %r15
 
/* Set up the stack for verify_cpu() */
-   leaq(__end_init_task - PTREGS_SIZE)(%rip), %rsp
+   leaq(__end_init_task - TOP_OF_KERNEL_STACK_PADDING - 
PTREGS_SIZE)(%rip), %rsp
 
leaq_text(%rip), %rdi
 
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index a0ea285878db..04101b984f24 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -49,7 +49,7 @@ SYM_CODE_START(startup_xen)
ANNOTATE_NOENDBR
cld
 
-   leaq(__end_init_task - PTREGS_SIZE)(%rip), %rsp
+   leaq(__end_init_task - TOP_OF_KERNEL_STACK_PADDING - 
PTREGS_SIZE)(%rip), %rsp
 
/* Set up %gs.
 *

base-commit: e13841907b8fda0ae0ce1ec03684665f578416a8
-- 
2.44.0




Re: [PATCH] x86/mm: re-implement get_page_light() using an atomic increment

2024-03-04 Thread Jan Beulich
On 04.03.2024 09:50, Roger Pau Monné wrote:
> On Mon, Mar 04, 2024 at 08:54:34AM +0100, Jan Beulich wrote:
>> On 01.03.2024 13:42, Roger Pau Monne wrote:
>>> The current usage of a cmpxchg loop to increase the value of page count
>>> is not optimal on amd64, as there's already an instruction to do an
>>> atomic add to a 64bit integer.
>>>
>>> Switch the code in get_page_light() to use an atomic increment, as that
>>> avoids a loop construct.  This slightly changes the order of the checks,
>>> as current code will crash before modifying the page count_info if the
>>> conditions are not correct, while with the proposed change the crash
>>> will happen immediately after having carried out the counter increase.
>>> Since we are crashing anyway, I don't believe the re-ordering to have
>>> any meaningful impact.
>>
>> While I consider this argument fine for ...
>>
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -2580,16 +2580,10 @@ bool get_page(struct page_info *page, const struct domain *domain)
>>>   */
>>>  static void get_page_light(struct page_info *page)
>>>  {
>>> -unsigned long x, nx, y = page->count_info;
>>> +unsigned long old_pgc = arch_fetch_and_add(&page->count_info, 1);
>>>  
>>> -do {
>>> -x  = y;
>>> -nx = x + 1;
>>> -BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
>>
>> ... this check, I'm afraid ...
>>
>>> -BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
>>
>> ... this is a problem unless we discount the possibility of an overflow
>> happening in practice: If an overflow was detected only after the fact,
>> there would be a window in time where privilege escalation was still
>> possible from another CPU. IOW at the very least the description will
>> need extending further. Personally I wouldn't chance it and leave this
>> as a loop.
> 
> So you are worried because this could potentially turn a DoS into an
> information leak during the brief period of time where the page
> counter has overflowed into the PGC state.
> 
> My understanding is the BUG_ON() was a mere protection against bad code
> that could mess with the counter, but that the counter overflowing is
> not a real issue during normal operation.

With the present counter width it should be a merely theoretical concern.
I didn't do the older calculation again though taking LA57 into account,
so I'm not sure we're not moving onto thinner and thinner ice as hardware
(and our support for it) advances. As to "mere protection" - see how the
less wide counter was an active issue on 32-bit Xen, back at the time.
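
For reference, the two shapes being compared, reduced to a stand-alone
sketch using the GCC builtins rather than Xen's own wrappers (the BUG_ON
checks are only indicated by comments):

#include <stdbool.h>
#include <stdio.h>

/* Loop form: both checks can run before the new value is published. */
static void get_light_loop(unsigned long *count_info)
{
    unsigned long x, y = __atomic_load_n(count_info, __ATOMIC_RELAXED);

    do {
        x = y;
        /* pre-commit checks on x and x + 1 would live here */
    } while ( !__atomic_compare_exchange_n(count_info, &y, x + 1, false,
                                           __ATOMIC_SEQ_CST, __ATOMIC_RELAXED) );
}

/* fetch-and-add form: the increment becomes visible first and any checks
 * run afterwards, which is the window discussed above. */
static void get_light_add(unsigned long *count_info)
{
    unsigned long old = __atomic_fetch_add(count_info, 1, __ATOMIC_SEQ_CST);

    (void)old; /* post-commit checks on old and old + 1 would live here */
}

int main(void)
{
    unsigned long count = 1;

    get_light_loop(&count);
    get_light_add(&count);
    printf("count = %lu\n", count);
    return 0;
}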

Jan



Re: [PATCH] x86/mm: re-implement get_page_light() using an atomic increment

2024-03-04 Thread Roger Pau Monné
On Mon, Mar 04, 2024 at 08:54:34AM +0100, Jan Beulich wrote:
> On 01.03.2024 13:42, Roger Pau Monne wrote:
> > The current usage of a cmpxchg loop to increase the value of page count
> > is not optimal on amd64, as there's already an instruction to do an
> > atomic add to a 64bit integer.
> > 
> > Switch the code in get_page_light() to use an atomic increment, as that
> > avoids a loop construct.  This slightly changes the order of the checks,
> > as current code will crash before modifying the page count_info if the
> > conditions are not correct, while with the proposed change the crash
> > will happen immediately after having carried out the counter increase.
> > Since we are crashing anyway, I don't believe the re-ordering to have
> > any meaningful impact.
> 
> While I consider this argument fine for ...
> 
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2580,16 +2580,10 @@ bool get_page(struct page_info *page, const struct domain *domain)
> >   */
> >  static void get_page_light(struct page_info *page)
> >  {
> > -unsigned long x, nx, y = page->count_info;
> > +unsigned long old_pgc = arch_fetch_and_add(&page->count_info, 1);
> >  
> > -do {
> > -x  = y;
> > -nx = x + 1;
> > -BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
> 
> ... this check, I'm afraid ...
> 
> > -BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
> 
> ... this is a problem unless we discount the possibility of an overflow
> happening in practice: If an overflow was detected only after the fact,
> there would be a window in time where privilege escalation was still
> possible from another CPU. IOW at the very least the description will
> need extending further. Personally I wouldn't chance it and leave this
> as a loop.

So you are worried because this could potentially turn a DoS into an
information leak during the brief period of time where the page
counter has overflowed into the PGC state.

My understanding is the BUG_ON() was a mere protection against bad code
that could mess with the counter, but that the counter overflowing is
not a real issue during normal operation.

Thanks, Roger.



Re: [PATCH] x86/cpu-policy: Fix visibility of HTT/CMP_LEGACY in max policies

2024-03-04 Thread Jan Beulich
On 01.03.2024 12:28, Andrew Cooper wrote:
> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -464,6 +464,16 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
>   raw_cpu_policy.feat.clwb )
>  __set_bit(X86_FEATURE_CLWB, fs);
>  }
> +
> +/*
> + * Topology information inside the guest is entirely at the toolstack's
> + * discretion, and bears no relationship to the host we're running on.
> + *
> + * HTT identifies p->basic.lppp as valid
> + * CMP_LEGACY identifies p->extd.nc as valid
> + */
> +__set_bit(X86_FEATURE_HTT, fs);
> +__set_bit(X86_FEATURE_CMP_LEGACY, fs);
>  }
>  
>  static void __init guest_common_default_feature_adjustments(uint32_t *fs)
> @@ -514,6 +524,18 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
>  __clear_bit(X86_FEATURE_CLWB, fs);
>  }
>  
> +/*
> + * Topology information is at the toolstack's discretion so these are
> + * unconditionally set in max, but pick a default which matches the host.
> + */
> +__clear_bit(X86_FEATURE_HTT, fs);
> +if ( cpu_has_htt )
> +__set_bit(X86_FEATURE_HTT, fs);
> +
> +__clear_bit(X86_FEATURE_CMP_LEGACY, fs);
> +if ( cpu_has_cmp_legacy )
> +__set_bit(X86_FEATURE_CMP_LEGACY, fs);

When running on a host with the respective bit clear, won't this break
(at least older) Linux'es logic? Iirc the unconditional setting of at
least HTT was tied to the unconditional multiplying by 2 of the vCPU ID
to derive the (fake) APIC ID.
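
For reference, a minimal stand-alone sketch of the guest-side consumption
in question (illustrative only, not the actual Linux code):

#include <cpuid.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
        return 1;

    /* HTT (EDX bit 28) is what marks EBX[23:16] (lppp) as meaningful. */
    bool htt = edx & (1u << 28);
    unsigned int lppp = htt ? (ebx >> 16) & 0xff : 1;

    printf("HTT=%d, logical processors per package=%u\n", htt, lppp);
    return 0;
}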

Jan



Re: [PATCH] x86/cpu-policy: Hide x2APIC from PV guests

2024-03-04 Thread Jan Beulich
On 29.02.2024 23:14, Andrew Cooper wrote:
> PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
> access any of the x2APIC MSR range.  Therefore they mustn't see the x2APIC
> CPUID bit saying that they can.

This argumentation could then be used equally for the APIC bit. Why is it
correct to hide x2APIC, but not APIC?

> Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
> generally see x2APIC except on Zen1-and-older AMD systems.
> 
> Linux works around this by explicitly hiding the bit itself, and filtering
> EXTD out of MSR_APIC_BASE reads.  NetBSD behaves more in the spirit of PV
> guests, and entirely ignores the APIC when built as a PV guest.
> 
> Change the annotation from !A to !S.  This has a consequence of stripping it
> out of both PV featuremasks.  However, as existing guests may have seen the
> bit, set it back into the PV Max policy; a VM which saw the bit and is alive
> enough to migrate will have ignored it one way or another.
> 
> Hiding x2APIC does also change the contents of leaf 0xb, but as the
> information is nonsense to begin with, this is likely an improvement on the
> status quo.  The blind reporting of APIC_ID = vCPU_ID * 2 isn't interlinked
> with the host's topology structure, and the APIC_IDs are useless without an
> MADT to start with.  Dom0 is the only PV VM to get an MADT but it's the host
> one, meaning the two sets of APIC_IDs are from different address spaces.

Aiui the field will now be seen as zero on all CPUs. Is all CPUs having the
same APIC ID there really better than them all having different ones?

Also while I agree that logically CPUID leaf 0xb EDX is tied to x2APIC being
available as a feature, nothing is said in this regard in the documentation
of that leaf. This in particular leaves open whether on a system with x2APIC
disabled in/by firmware the value would actually be zero. Yet that case could
be considered somewhat similar to the PV case here.
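
For context, the sequence the CPUID bit advertises, as a minimal
stand-alone sketch (the MSR write is only described in a comment, since it
needs ring 0 and is exactly what a PV guest cannot do):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
        return 1;

    if ( ecx & (1u << 21) ) /* CPUID.1:ECX[21] - x2APIC advertised */
        /* A native OS would now set EXTD (bit 10) in MSR_APIC_BASE (0x1b)
         * and switch to the x2APIC MSR range. */
        puts("x2APIC advertised");
    else
        puts("x2APIC not advertised");

    return 0;
}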

Jan



Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.6

2024-03-04 Thread Federico Serafini

On 04/03/24 09:04, Jan Beulich wrote:

On 01.03.2024 16:04, Federico Serafini wrote:

--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -322,6 +322,12 @@ Deviations related to MISRA C:2012 Rules:
   - /\* Fallthrough \*/
   - /\* Fallthrough. \*/
  
+   * - R16.6
+ - A switch statement with a single switch clause and no default label is
+   deliberate and improves readability with respect to an equivalent if
+   statement.
+ - Tagged as `deliberate` for ECLAIR.


Imo this is another example of wording a deviation in too wide a manner.
It shouldn't be "is", but "may". Whether what is said here applies is
entirely down to every specific instance; otherwise I'm inclined to read
this as a suggestion to replace all if() by switch(), as if that always
improved readability. FTAOD things would be different if this was
explanatory text to a SAF comment - there the specific context is always
given (by where the comment actually appears).


Ok, I'll rephrase a v2.

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



Re: Xen 4.19 release status tracking list [ March ]

2024-03-04 Thread Jan Beulich
On 01.03.2024 18:21, Oleksii wrote:
>   * limit passing around of cpu_user_regs [
> https://lore.kernel.org/xen-devel/ebc330a9-eafa-4858-b5cf-5694c4da9...@suse.com/
> ]:
> new patch series version was sent.

This was committed already a while ago.

>   * [PATCH v2 0/5] xen/livepatch: fixes for the pre-apply / post-revert
> hooks [
> https://lore.kernel.org/xen-devel/20240227112528.4540-1-roger@citrix.com/T/#m620c827bd097522d9d87b7f6511bc1599f6a8c84
> ]

This went in last week, I think?

> Please reply with items you would like to see in 4.19 so that people
> know what is happening and prioritize accordingly.
> You're welcome to provide a description and use cases of the feature
> you're working on.

"x86/spec-ctrl: IBPB improvements"

Jan




Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.6

2024-03-04 Thread Jan Beulich
On 01.03.2024 16:04, Federico Serafini wrote:
> --- a/docs/misra/deviations.rst
> +++ b/docs/misra/deviations.rst
> @@ -322,6 +322,12 @@ Deviations related to MISRA C:2012 Rules:
>   - /\* Fallthrough \*/
>   - /\* Fallthrough. \*/
>  
> +   * - R16.6
> + - A switch statement with a single switch clause and no default label is
> +   deliberate and improves readability with respect to an equivalent if
> +   statement.
> + - Tagged as `deliberate` for ECLAIR.

Imo this is another example of wording a deviation in too wide a manner.
It shouldn't be "is", but "may". Whether what is said here applies is
entirely down to every specific instance; otherwise I'm inclined to read
this as a suggestion to replace all if() by switch(), as if that always
improved readability. FTAOD things would be different if this was
explanatory text to a SAF comment - there the specific context is always
given (by where the comment actually appears).
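
For reference, the construct the deviation is about, next to its if()
counterpart (a stand-alone illustration, not taken from the tree):

#include <stdio.h>

enum action { ACTION_NONE, ACTION_RESET };

static void do_reset(void) { puts("reset"); }

/* Single-clause switch, no default: the handled value stands out and
 * further cases can be added without restructuring. */
static void handle_switch(enum action a)
{
    switch ( a )
    {
    case ACTION_RESET:
        do_reset();
        break;
    }
}

/* The equivalent if statement. */
static void handle_if(enum action a)
{
    if ( a == ACTION_RESET )
        do_reset();
}

int main(void)
{
    handle_switch(ACTION_RESET);
    handle_if(ACTION_NONE);
    return 0;
}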

Jan



Re: [XEN PATCH 10/10] xen/keyhandler: address violations of MISRA C Rule 20.7

2024-03-04 Thread Jan Beulich
On 02.03.2024 02:37, Stefano Stabellini wrote:
> On Fri, 1 Mar 2024, Jan Beulich wrote:
>> On 29.02.2024 23:57, Stefano Stabellini wrote:
>>> On Thu, 29 Feb 2024, Nicola Vetrini wrote:
 MISRA C Rule 20.7 states: "Expressions resulting from the expansion
 of macro parameters shall be enclosed in parentheses". Therefore, some
 macro definitions should gain additional parentheses to ensure that all
 current and future users will be safe with respect to expansions that
 can possibly alter the semantics of the passed-in macro parameter.

 No functional change.

 Signed-off-by: Nicola Vetrini 
>>>
>>> Reviewed-by: Stefano Stabellini 
>>
>> You did see the discussion on earlier patches, though? I don't think
>> any of the parentheses here are needed or wanted.
> 
> We need to align on this. Currently if we go by what's written in
> docs/misra/deviations.rst, then rhs should have parentheses.

Quoting the actual patch again:

--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -42,10 +42,10 @@ static struct keyhandler {
 } key_table[128] __read_mostly =
 {
 #define KEYHANDLER(k, f, desc, diag)\
-[k] = { { .fn = (f) }, desc, 0, diag }
+[k] = { { .fn = (f) }, (desc), 0, (diag) }
 
 #define IRQ_KEYHANDLER(k, f, desc, diag)\
-[k] = { { .irq_fn = (f) }, desc, 1, diag }
+[k] = { { .irq_fn = (f) }, (desc), 1, (diag) }

What rhs are you talking about in light of this change? The only rhs I
can spot here is already parenthesized.

> Can we safely claim that rhs parentheses are never needed? If so, then
> great, let's add it to deviations.rst and skip them here and other
> places in this patch series (e.g. patch #8). When I say "never" I am
> taking for granted that the caller is not doing something completely
> unacceptably broken such as: 
> 
>  WRITE_SYSREG64(var +, TTBR0_EL1)

I'm afraid I can't associate this with the patch here either. Instead in
the context here a (respective) construct as you mention above would simply
fail to build.
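
To make that concrete, a stand-alone illustration (structure simplified
relative to the real key_table):

#include <stdio.h>

struct keyhandler {
    void (*fn)(void);
    const char *desc;
    int irq, diag;
};

static void dump(void) { puts("dump"); }

/* desc/diag left unparenthesised: each initialiser position is already
 * delimited by commas/braces, and an argument containing a top-level comma
 * could not be passed without parentheses at the call site anyway. */
#define KEYHANDLER(k, f, desc, diag) \
    [k] = { .fn = (f), .desc = desc, .irq = 0, .diag = diag }

static struct keyhandler key_table[128] = {
    KEYHANDLER('d', dump, "dump " "state", 1 ? 1 : 0),
};

int main(void)
{
    key_table['d'].fn();
    printf("%s %d\n", key_table['d'].desc, key_table['d'].diag);
    return 0;
}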

Jan

> If we cannot generically claim that rhs parentheses are never needed,
> then I don't think we should make any exceptions. We should add them here
> and everywhere else. It should be easy to write a macro or review a
> patch with a macro from someone else, and making special exceptions makes
> it more difficult for everyone.