Re: [PATCH v1 3/3] perf arm64: Support virtual CPU ID for kvm-stat

2022-11-08 Thread Leo Yan
On Mon, Nov 07, 2022 at 03:39:07PM +, Marc Zyngier wrote:

[...]

> > > Please educate me: how useful is it to filter on a vcpu number across
> > > all VMs? What sense does it even make?
> > 
> > Currently the "perf kvm" tool is not sophisticated, since it doesn't
> > capture the VMID and virtual CPU ID together.
> 
> VMID is not a relevant indicator anyway, as it can change at any
> point.

Thanks for the correction.  IIUC, the VMID is not fixed for a virtual
machine; it can be re-allocated after an overflow.

> But that's only to show that everybody has a different view on
> what they need to collect. At which point, we need to provide an
> infrastructure for data extraction, and not the data itself.

Totally agree.

[...]

> > Option 3: As you suggested, I can bind the KVM tracepoints with an eBPF
> > program that records perf events.
> > 
> > When I reviewed arm64's kvm_entry / kvm_exit trace events, I found they
> > don't have the vcpu context in their arguments; this means I would need
> > to add new trace events to access the "vcpu" context.
> 
> I'm not opposed to adding new trace{point,hook}s if you demonstrate
> that they are generic enough or will never need to evolve.
> 
> > 
> > Options 1 and 3 both need new trace events; option 1 is the more
> > straightforward solution, which is why it was chosen in the current
> > patch set.
> > 
> > I realized that I made a mistake: actually we can modify the trace
> > event definitions for kvm_entry / kvm_exit.  Note we only modify the
> > trace events' arguments; this changes the trace function's definition
> > but does not break the ABI (the format is exactly the same for user
> > space).  The changes below demonstrate my proposal:
> > 
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 94d33e296e10..16f6b61abfec 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -917,7 +917,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > 
> > /**
> >  * Enter the guest
> >  */
> > -   trace_kvm_entry(*vcpu_pc(vcpu));
> > +   trace_kvm_entry(vcpu);
> > guest_timing_enter_irqoff();
> >  
> > ret = kvm_arm_vcpu_enter_exit(vcpu);
> > diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
> > index 33e4e7dd2719..9df4fd30093c 100644
> > --- a/arch/arm64/kvm/trace_arm.h
> > +++ b/arch/arm64/kvm/trace_arm.h
> > @@ -12,15 +12,15 @@
> >   * Tracepoints for entry/exit to guest
> >   */
> >  TRACE_EVENT(kvm_entry,
> > -   TP_PROTO(unsigned long vcpu_pc),
> > -   TP_ARGS(vcpu_pc),
> > +   TP_PROTO(struct kvm_vcpu *vcpu),
> > +   TP_ARGS(vcpu),
> >  
> > TP_STRUCT__entry(
> > __field(unsigned long,  vcpu_pc )
> > ),
> >  
> > TP_fast_assign(
> > -   __entry->vcpu_pc= vcpu_pc;
> > +   __entry->vcpu_pc= *vcpu_pc(vcpu);
> > ),
> >  
> > TP_printk("PC: 0x%016lx", __entry->vcpu_pc)
> > 
> > Please let me know your opinion; if you don't object, I can move
> > forward with this approach.
> 
> I have no issue with this if this doesn't change anything else.

Thanks for the confirmation.

> And if you can make use of this with a BPF program and get to the same
> result as your initial patch, then please submit it for inclusion in
> the kernel as an example. We can then point people to it next time
> this crop up (probably before Xmas).

Will do.

Just a heads up: if everything goes well, I will place the eBPF
program in the folder tools/perf/examples/bpf/, which makes it easy to
integrate and release with the perf tool.

Thanks,
Leo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v1 3/3] perf arm64: Support virtual CPU ID for kvm-stat

2022-11-07 Thread Leo Yan
On Sat, Nov 05, 2022 at 01:28:40PM +, Marc Zyngier wrote:

[...]

> > Before:
> > 
> >   # perf kvm stat report --vcpu 27
> > 
> >   Analyze events for all VMs, VCPU 27:
> > 
> >        VM-EXIT  Samples  Samples%  Time%  Min Time  Max Time       Avg time
> > 
> >   Total Samples:0, Total events handled time:0.00us.
> >
> > After:
> > 
> >   # perf kvm stat report --vcpu 27
> > 
> >   Analyze events for all VMs, VCPU 27:
> > 
> >        VM-EXIT  Samples  Samples%  Time%  Min Time  Max Time       Avg time
> > 
> >          SYS64      808    98.54%  91.24%    0.00us  303.76us    3.46us ( +-  13.54% )
> >            WFx       10     1.22%   7.79%    0.00us   69.48us   23.91us ( +-  25.91% )
> >            IRQ        2     0.24%   0.97%    0.00us   22.64us   14.82us ( +-  52.77% )
> > 
> >   Total Samples:820, Total events handled time:3068.28us.
> 
> Please educate me: how useful is it to filter on a vcpu number across
> all VMs? What sense does it even make?

Currently the "perf kvm" tool is not sophisticated, since it doesn't
capture the VMID and virtual CPU ID together.

One use case is pinning a program to a specific virtual CPU with
taskset inside the VM; this way we can check whether any bottleneck is
caused by VM entry/exit.  But I have to admit it's inaccurate if we
only filter on the VCPU ID; we should consider tracing the VMID and
VCPU ID together in a later enhancement.

> Conversely, what would be the purpose of filtering on a 5th thread of
> any process irrespective of what the process does? To me, this is the
> same level of non-sense.

I agree.

> AFAICT, this is just piling more arbitrary data extraction for no
> particular reason other than "just because we can", and there is
> absolutely no guarantee that this is fit for anyone else's purpose.
> 
> I'd rather you have a generic tracepoint taking the vcpu as a context
> and a BPF program that spits out the information people actually need,
> keeping things out of the kernel. Or even a tracehook (like the
> scheduler does), and let people load a module to dump whatever
> information they please.

Actually I considered three options:

Option 1: Simply add new versions of the trace events to record more
info.  This is not flexible, and we even risk adding yet more versions
if we later find that more data should be traced.

On the other hand, this approach is straightforward and the
implementation is simple.  This is the main reason I finally chose to
add new trace events.

Option 2: Use kprobes to dynamically insert tracepoints.  But this
means the user must have the corresponding vmlinux file; otherwise,
the perf tool might inject a tracepoint at an incorrect address.  This
is the main reason I didn't use kprobes to add dynamic tracepoints.

Option 3: As you suggested, I can bind the KVM tracepoints with an
eBPF program that records perf events.

When I reviewed arm64's kvm_entry / kvm_exit trace events, I found
they don't have the vcpu context in their arguments; this means I
would need to add new trace events to access the "vcpu" context.

Options 1 and 3 both need new trace events; option 1 is the more
straightforward solution, which is why it was chosen in the current
patch set.

I realized that I made a mistake: actually we can modify the trace
event definitions for kvm_entry / kvm_exit.  Note we only modify the
trace events' arguments; this changes the trace function's definition
but does not break the ABI (the format is exactly the same for user
space).  The changes below demonstrate my proposal:

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..16f6b61abfec 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -917,7 +917,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
/**
 * Enter the guest
 */
-   trace_kvm_entry(*vcpu_pc(vcpu));
+   trace_kvm_entry(vcpu);
guest_timing_enter_irqoff();
 
ret = kvm_arm_vcpu_enter_exit(vcpu);
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..9df4fd30093c 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -12,15 +12,15 @@
  * Tracepoints for entry/exit to guest
  */
 TRACE_EVENT(kvm_entry,
-   TP_PROTO(unsigned long vcpu_pc),
-   TP_ARGS(vcpu_pc),
+   TP_PROTO(struct kvm_vcpu *vcpu),
+   TP_ARGS(vcpu),
 
TP_STRUCT__entry(
__field(unsigned long,  vcpu_pc )
),
 
TP_fast_assign(
-   __entry->vcpu_pc= vcpu_pc;
+   __entry->vcpu_pc= *vcpu_pc(vcpu);
),
 
TP_printk("PC: 0x%016lx", __entry->vcpu_pc)

Please let me know your opinion; if you don't object, I can move
forward with this approach.

> But randomly adding new 

[PATCH v1 3/3] perf arm64: Support virtual CPU ID for kvm-stat

2022-11-05 Thread Leo Yan
Since the two trace events kvm_entry_v2/kvm_exit_v2 have been added,
we can use the "vcpu_id" field in these events to learn the virtual
CPU ID.  To keep backward compatibility, we still need to rely on the
kvm_entry/kvm_exit trace events on old kernels.

This patch adds the arm64 functions setup_kvm_events_tp() and
arm64__setup_kvm_tp().  By detecting the nodes under the sysfs folder,
it dynamically registers the kvm_entry_v2/kvm_exit_v2 trace events
when the kernel provides them; otherwise, it falls back to the
kvm_entry/kvm_exit events for backward compatibility.

Let cpu_isa_init() invoke arm64__setup_kvm_tp(); this allows the
command "perf kvm stat report" to also set up the trace events
dynamically.
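The detection-and-fallback logic described above can be sketched as
below.  This is a minimal standalone model, assuming a caller-supplied
tp_exists() predicate in place of perf's is_valid_tracepoint() (which
checks for the event directory under sysfs); the names and predicates
are illustrative, not perf code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *kvm_tp_v2[] = { "kvm:kvm_entry_v2", "kvm:kvm_exit_v2", NULL };
static const char *kvm_tp_v1[] = { "kvm:kvm_entry", "kvm:kvm_exit", NULL };

/* Pick the v2 tracepoint set when the kernel exposes it, else fall
 * back to v1; fail if any event of the chosen set is missing. */
static const char **choose_kvm_events(bool (*tp_exists)(const char *))
{
	const char **events =
		tp_exists("kvm:kvm_entry_v2") ? kvm_tp_v2 : kvm_tp_v1;

	for (const char **p = events; *p; p++) {
		if (!tp_exists(*p))
			return NULL;
	}
	return events;
}

/* Example predicates standing in for the sysfs lookups (assumptions). */
static bool tp_all_present(const char *name) { (void)name; return true; }
static bool tp_v1_only(const char *name) { return strstr(name, "_v2") == NULL; }
```

On a kernel that exposes the v2 events the v2 set wins; on an old
kernel the same call degrades cleanly to the v1 set.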

Before:

  # perf kvm stat report --vcpu 27

  Analyze events for all VMs, VCPU 27:

       VM-EXIT  Samples  Samples%  Time%  Min Time  Max Time       Avg time

  Total Samples:0, Total events handled time:0.00us.

After:

  # perf kvm stat report --vcpu 27

  Analyze events for all VMs, VCPU 27:

       VM-EXIT  Samples  Samples%  Time%  Min Time  Max Time       Avg time

         SYS64      808    98.54%  91.24%    0.00us  303.76us    3.46us ( +-  13.54% )
           WFx       10     1.22%   7.79%    0.00us   69.48us   23.91us ( +-  25.91% )
           IRQ        2     0.24%   0.97%    0.00us   22.64us   14.82us ( +-  52.77% )

  Total Samples:820, Total events handled time:3068.28us.

Signed-off-by: Leo Yan 
---
 tools/perf/arch/arm64/util/kvm-stat.c | 54 ---
 1 file changed, 49 insertions(+), 5 deletions(-)

diff --git a/tools/perf/arch/arm64/util/kvm-stat.c b/tools/perf/arch/arm64/util/kvm-stat.c
index 73d18e0ed6f6..1ba54ce3d7d8 100644
--- a/tools/perf/arch/arm64/util/kvm-stat.c
+++ b/tools/perf/arch/arm64/util/kvm-stat.c
@@ -3,6 +3,7 @@
 #include 
 #include "../../../util/evsel.h"
 #include "../../../util/kvm-stat.h"
+#include "../../../util/tracepoint.h"
 #include "arm64_exception_types.h"
 #include "debug.h"
 
@@ -10,18 +11,28 @@ define_exit_reasons_table(arm64_exit_reasons, kvm_arm_exception_type);
 define_exit_reasons_table(arm64_trap_exit_reasons, kvm_arm_exception_class);
 
 const char *kvm_trap_exit_reason = "esr_ec";
-const char *vcpu_id_str = "id";
+const char *vcpu_id_str = "vcpu_id";
 const int decode_str_len = 20;
 const char *kvm_exit_reason = "ret";
-const char *kvm_entry_trace = "kvm:kvm_entry";
-const char *kvm_exit_trace = "kvm:kvm_exit";
+const char *kvm_entry_trace;
+const char *kvm_exit_trace;
 
-const char *kvm_events_tp[] = {
+#define NR_TPS 2
+
+static const char *kvm_events_tp_v1[NR_TPS + 1] = {
"kvm:kvm_entry",
"kvm:kvm_exit",
NULL,
 };
 
+static const char *kvm_events_tp_v2[NR_TPS + 1] = {
+   "kvm:kvm_entry_v2",
+   "kvm:kvm_exit_v2",
+   NULL,
+};
+
+const char *kvm_events_tp[NR_TPS + 1];
+
 static void event_get_key(struct evsel *evsel,
  struct perf_sample *sample,
  struct event_key *key)
@@ -78,8 +89,41 @@ const char * const kvm_skip_events[] = {
NULL,
 };
 
-int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused)
+static int arm64__setup_kvm_tp(struct perf_kvm_stat *kvm)
 {
+   const char **kvm_events, **events_ptr;
+   int i, nr_tp = 0;
+
+   if (is_valid_tracepoint("kvm:kvm_entry_v2")) {
+   kvm_events = kvm_events_tp_v2;
+   kvm_entry_trace = "kvm:kvm_entry_v2";
+   kvm_exit_trace = "kvm:kvm_exit_v2";
+   } else {
+   kvm_events = kvm_events_tp_v1;
+   kvm_entry_trace = "kvm:kvm_entry";
+   kvm_exit_trace = "kvm:kvm_exit";
+   }
+
+   for (events_ptr = kvm_events; *events_ptr; events_ptr++) {
+   if (!is_valid_tracepoint(*events_ptr))
+   return -1;
+   nr_tp++;
+   }
+
+   for (i = 0; i < nr_tp; i++)
+   kvm_events_tp[i] = kvm_events[i];
+   kvm_events_tp[i] = NULL;
+
kvm->exit_reasons_isa = "arm64";
return 0;
 }
+
+int setup_kvm_events_tp(struct perf_kvm_stat *kvm)
+{
+   return arm64__setup_kvm_tp(kvm);
+}
+
+int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused)
+{
+   return arm64__setup_kvm_tp(kvm);
+}
-- 
2.34.1



[PATCH v1 2/3] KVM: arm64: Add trace events with field 'vcpu_id'

2022-11-05 Thread Leo Yan
The existing trace events kvm_entry and kvm_exit don't contain the
virtual CPU ID, so the perf tool has no way to do per-virtual-CPU
statistics; and since the trace events are ABI, we cannot change them
without breakage.

For the above reasons, this patch adds two trace events, kvm_entry_v2
and kvm_exit_v2, with a new field 'vcpu_id'.  To support both the old
and new events, we use a tracepoint callback that checks whether each
event is enabled; if so, the callback invokes the corresponding trace
event.
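The single-callsite, multi-version dispatch can be modeled in
miniature as below.  The boolean flags stand in for the kernel's
trace_*_enabled() static-key checks, and the counters stand in for the
trace events themselves; all names here are illustrative, not kernel
code:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for trace_kvm_entry_enabled() / trace_kvm_entry_v2_enabled()
 * and for the trace events themselves (counters instead of records). */
static bool v1_enabled, v2_enabled;
static int v1_hits, v2_hits;

/* One call site in the run loop fans out to every enabled version. */
static void kvm_entry_tp_model(unsigned int vcpu_id)
{
	(void)vcpu_id;		/* only the v2 event would record this */
	if (v1_enabled)
		v1_hits++;	/* would be trace_kvm_entry(*vcpu_pc(vcpu)) */
	if (v2_enabled)
		v2_hits++;	/* would be trace_kvm_entry_v2(vcpu) */
}
```

Old tooling that enables only the v1 event and new tooling that
enables the v2 event both keep working, with one hook in the run loop.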

Signed-off-by: Leo Yan 
---
 arch/arm64/kvm/trace.c |  6 +
 arch/arm64/kvm/trace_arm.h | 45 ++
 2 files changed, 51 insertions(+)

diff --git a/arch/arm64/kvm/trace.c b/arch/arm64/kvm/trace.c
index d25a3db994e2..d9b2587c77c3 100644
--- a/arch/arm64/kvm/trace.c
+++ b/arch/arm64/kvm/trace.c
@@ -10,6 +10,9 @@ static void kvm_entry_tp(void *data, struct kvm_vcpu *vcpu)
 {
if (trace_kvm_entry_enabled())
trace_kvm_entry(*vcpu_pc(vcpu));
+
+   if (trace_kvm_entry_v2_enabled())
+   trace_kvm_entry_v2(vcpu);
 }
 
 static void kvm_exit_tp(void *data, int ret, struct kvm_vcpu *vcpu)
@@ -17,6 +20,9 @@ static void kvm_exit_tp(void *data, int ret, struct kvm_vcpu *vcpu)
if (trace_kvm_exit_enabled())
trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu),
   *vcpu_pc(vcpu));
+
+   if (trace_kvm_exit_v2_enabled())
+   trace_kvm_exit_v2(ret, vcpu);
 }
 
 static int __init kvm_tp_init(void)
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index ef02ae93b28b..932c9d0c36f3 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
@@ -30,6 +31,23 @@ TRACE_EVENT(kvm_entry,
TP_printk("PC: 0x%016lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_entry_v2,
+   TP_PROTO(struct kvm_vcpu *vcpu),
+   TP_ARGS(vcpu),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   vcpu_id )
+   __field(unsigned long,  vcpu_pc )
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id= vcpu->vcpu_id;
+   __entry->vcpu_pc= *vcpu_pc(vcpu);
+   ),
+
+   TP_printk("vcpu: %u PC: 0x%016lx", __entry->vcpu_id, __entry->vcpu_pc)
+);
+
 DECLARE_TRACE(kvm_exit_tp,
TP_PROTO(int ret, struct kvm_vcpu *vcpu),
TP_ARGS(ret, vcpu));
@@ -57,6 +75,33 @@ TRACE_EVENT(kvm_exit,
  __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_exit_v2,
+   TP_PROTO(int ret, struct kvm_vcpu *vcpu),
+   TP_ARGS(ret, vcpu),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   vcpu_id )
+   __field(int,ret )
+   __field(unsigned int,   esr_ec  )
+   __field(unsigned long,  vcpu_pc )
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id= vcpu->vcpu_id;
+   __entry->ret= ARM_EXCEPTION_CODE(ret);
+   __entry->esr_ec = ARM_EXCEPTION_IS_TRAP(ret) ?
+ kvm_vcpu_trap_get_class(vcpu) : 0;
+   __entry->vcpu_pc= *vcpu_pc(vcpu);
+   ),
+
+   TP_printk("%s: vcpu: %u HSR_EC: 0x%04x (%s), PC: 0x%016lx",
+ __print_symbolic(__entry->ret, kvm_arm_exception_type),
+ __entry->vcpu_id,
+ __entry->esr_ec,
+ __print_symbolic(__entry->esr_ec, kvm_arm_exception_class),
+ __entry->vcpu_pc)
+);
+
 TRACE_EVENT(kvm_guest_fault,
TP_PROTO(unsigned long vcpu_pc, unsigned long hsr,
 unsigned long hxfar,
-- 
2.34.1



[PATCH v1 1/3] KVM: arm64: Dynamically register callback for tracepoints

2022-11-05 Thread Leo Yan
This commit doesn't change any functionality; it is only refactoring.

It registers callbacks for tracepoints dynamically; this way, the
existing trace events (in this case kvm_entry and kvm_exit) are kept.
This is a preparation for adding new trace events in a later patch.

Signed-off-by: Leo Yan 
---
 arch/arm64/kvm/Makefile|  2 +-
 arch/arm64/kvm/arm.c   |  4 ++--
 arch/arm64/kvm/trace.c | 29 +
 arch/arm64/kvm/trace_arm.h |  8 
 4 files changed, 40 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/kvm/trace.c

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5e33c2d4645a..4e641d2df7ad 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 inject_fault.o va_layout.o handle_exit.o \
 guest.o debug.o reset.o sys_regs.o stacktrace.o \
 vgic-sys-reg-v3.o fpsimd.o pkvm.o \
-arch_timer.o trng.o vmid.o \
+arch_timer.o trng.o vmid.o trace.o \
 vgic/vgic.o vgic/vgic-init.o \
 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..03ed5f6c92bc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -917,7 +917,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
/**
 * Enter the guest
 */
-   trace_kvm_entry(*vcpu_pc(vcpu));
+   trace_kvm_entry_tp(vcpu);
guest_timing_enter_irqoff();
 
ret = kvm_arm_vcpu_enter_exit(vcpu);
@@ -974,7 +974,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
local_irq_enable();
 
-   trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+   trace_kvm_exit_tp(ret, vcpu);
 
/* Exit types that need handling before we can be preempted */
handle_exit_early(vcpu, ret);
diff --git a/arch/arm64/kvm/trace.c b/arch/arm64/kvm/trace.c
new file mode 100644
index ..d25a3db994e2
--- /dev/null
+++ b/arch/arm64/kvm/trace.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+
+#include 
+
+#include "trace_arm.h"
+
+static void kvm_entry_tp(void *data, struct kvm_vcpu *vcpu)
+{
+   if (trace_kvm_entry_enabled())
+   trace_kvm_entry(*vcpu_pc(vcpu));
+}
+
+static void kvm_exit_tp(void *data, int ret, struct kvm_vcpu *vcpu)
+{
+   if (trace_kvm_exit_enabled())
+   trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu),
+  *vcpu_pc(vcpu));
+}
+
+static int __init kvm_tp_init(void)
+{
+   register_trace_kvm_entry_tp(kvm_entry_tp, NULL);
+   register_trace_kvm_exit_tp(kvm_exit_tp, NULL);
+   return 0;
+}
+
+core_initcall(kvm_tp_init)
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..ef02ae93b28b 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -11,6 +11,10 @@
 /*
  * Tracepoints for entry/exit to guest
  */
+DECLARE_TRACE(kvm_entry_tp,
+   TP_PROTO(struct kvm_vcpu *vcpu),
+   TP_ARGS(vcpu));
+
 TRACE_EVENT(kvm_entry,
TP_PROTO(unsigned long vcpu_pc),
TP_ARGS(vcpu_pc),
@@ -26,6 +30,10 @@ TRACE_EVENT(kvm_entry,
TP_printk("PC: 0x%016lx", __entry->vcpu_pc)
 );
 
+DECLARE_TRACE(kvm_exit_tp,
+   TP_PROTO(int ret, struct kvm_vcpu *vcpu),
+   TP_ARGS(ret, vcpu));
+
 TRACE_EVENT(kvm_exit,
TP_PROTO(int ret, unsigned int esr_ec, unsigned long vcpu_pc),
TP_ARGS(ret, esr_ec, vcpu_pc),
-- 
2.34.1



[PATCH v1 0/3] KVM: arm64: Support tracing virtual CPU ID

2022-11-05 Thread Leo Yan
There have been previous efforts and discussions on supporting tracing
the virtual CPU ID in arm64 KVM; see [1][2].

The work was blocked by the main concern that we cannot change the
existing trace events without ABI breakage.  So the question is how to
add new trace events that trace the virtual CPU ID while keeping
backward compatibility.

This patch set restarts that work; it's inspired by Qais Yousef's work
on adding scheduler tracepoints [3].

The first patch changes the code to register tracepoint callbacks;
this allows us to support multiple trace events with a single call
site, as a preparation for adding new trace events.

The second patch adds two new trace events, kvm_entry_v2 and
kvm_exit_v2, which contain the field "vcpu_id" for the virtual CPU ID.

For a more complete view, the third patch is the change in the perf
tool.  It dynamically detects the trace nodes under sysfs and decides
whether to use the version 2 trace events or fall back to the original
events.

This patch set has been tested with mainline kernel on Arm64 Ampere
Altra platform.

Note: I used checkpatch.pl to validate the patch format and observed
that it reports errors for the second patch (adding trace events);
since trace event definitions use their own coding style, I just keep
them as they are.

[1] 
https://lore.kernel.org/lkml/1560330526-15468-2-git-send-email-yuzeng...@huawei.com/
[2] 
https://lore.kernel.org/lkml/20200917003645.689665-1-sergey.senozhat...@gmail.com/
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/sched/core.c?id=a056a5bed7fa67706574b00cf1122c38596b2be1


Leo Yan (3):
  KVM: arm64: Dynamically register callback for tracepoints
  KVM: arm64: Add trace events with field 'vcpu_id'
  perf arm64: Support virtual CPU ID for kvm-stat

 arch/arm64/kvm/Makefile   |  2 +-
 arch/arm64/kvm/arm.c  |  4 +-
 arch/arm64/kvm/trace.c| 35 +
 arch/arm64/kvm/trace_arm.h| 53 ++
 tools/perf/arch/arm64/util/kvm-stat.c | 54 ---
 5 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 arch/arm64/kvm/trace.c

-- 
2.34.1



Re: [PATCH v4 0/3] vfio-pci: Support INTx mode re-enabling

2019-04-24 Thread Leo Yan
Hi Will,

On Mon, Apr 08, 2019 at 09:27:16AM +0800, Leo Yan wrote:
> When enabling vfio-pci mode for the NIC driver on the Juno board, the
> IRQ fails to be forwarded properly from host to guest; this was finally
> root-caused to kvmtool being unable to re-enable INTx mode properly.
> 
> So the basic working flow to reproduce this issue is as below:
> 
> Host                        Guest
> ----                        -----
>                             INTx mode
>                             MSI enable failed in NIC driver
>                             MSI disable in NIC driver
>                             Switch back to INTx mode --> kvmtool doesn't support
> 
> So this patch set supports INTx mode re-enabling: patch 0001 is a minor
> fix-up for eventfd releasing; patch 0002 introduces a new function
> vfio_pci_init_intx() which performs the INTx one-time initialisation;
> patch 0003 is the core patch supporting INTx mode re-enabling: when
> kvmtool detects that MSI is disabled, it rolls back to INTx mode.
> 
> This patch set has been tested on Juno-r2 board.
> 
> == Changes for V4 ==
> * Removed the unnecessary comments in patch 0003 (Jean-Philippe).
> * Added Jean-Philippe's review tags.

Could you pick up these 3 patches and merge them into kvmtool?  All of
them have received review tags from Jean-Philippe.  Thanks.

> == Changes for V3 ==
> * Add new function vfio_pci_init_intx() for one-time initialisation.
> * Simplified INTx re-enabling (don't change irq_line anymore at
>   runtime).
> 
> 
> Leo Yan (3):
>   vfio-pci: Release INTx's unmask eventfd properly
>   vfio-pci: Add new function for INTx one-time initialisation
>   vfio-pci: Re-enable INTx mode when disable MSI/MSIX
> 
>  include/kvm/vfio.h |  1 +
>  vfio/pci.c | 95 ++
>  2 files changed, 64 insertions(+), 32 deletions(-)
> 
> -- 
> 2.19.1
> 


Re: [PATCH] vfio-pci: Fix MSI IRQ forwarding for without per-vector masking

2019-04-07 Thread Leo Yan
Hi Jean-Philippe,

On Fri, Mar 22, 2019 at 01:29:24PM +0800, Leo Yan wrote:
> On Fri, Mar 22, 2019 at 01:23:08PM +0800, Leo Yan wrote:
> > If MSI doesn't support the per-vector masking capability and
> > PCI_MSI_FLAGS_MASKBIT isn't set in the message control field, the function
> > vfio_pci_msi_vector_write() will directly bail out, and every vector's
> > 'virt_state' keeps the bit VFIO_PCI_MSI_STATE_MASKED set.
> > 
> > This results in the state maintained in 'virt_state' cannot really
> > reflect the MSI hardware state; finally it will mislead the function
> > vfio_pci_update_msi_entry() to skip IRQ forwarding with below flow:
> > 
> > vfio_pci_update_msi_entry() {
> > 
> >   [...]
> > 
> >   if (msi_is_masked(entry->virt_state) == msi_is_masked(entry->phys_state))
> >   return 0;  ==> skip IRQ forwarding
> > 
> >   [...]
> > }
> > 
> > To fix this issue, when we detect that PCI_MSI_FLAGS_MASKBIT is not set
> > in the message control field, this patch simply clears the bit
> > VFIO_PCI_MSI_STATE_MASKED in all vectors' 'virt_state'; in the end
> > vfio_pci_update_msi_entry() can forward the MSI IRQ successfully.
> 
> Just a reminder: this patch is for kvmtool, not for the kernel.  Sorry
> I forgot to add it in the subject.

Gentle ping.  I'm not sure whether this patch is reasonable; could you
help review it?

[...]

Thanks,
Leo Yan


[PATCH v4 0/3] vfio-pci: Support INTx mode re-enabling

2019-04-07 Thread Leo Yan
When enabling vfio-pci mode for the NIC driver on the Juno board, the
IRQ fails to be forwarded properly from host to guest; this was
finally root-caused to kvmtool being unable to re-enable INTx mode
properly.

So the basic working flow to reproduce this issue is as below:

Host                        Guest
----                        -----
                            INTx mode
                            MSI enable failed in NIC driver
                            MSI disable in NIC driver
                            Switch back to INTx mode --> kvmtool doesn't support

So this patch set supports INTx mode re-enabling: patch 0001 is a
minor fix-up for eventfd releasing; patch 0002 introduces a new
function vfio_pci_init_intx() which performs the INTx one-time
initialisation; patch 0003 is the core patch supporting INTx mode
re-enabling: when kvmtool detects that MSI is disabled, it rolls back
to INTx mode.

This patch set has been tested on Juno-r2 board.

== Changes for V4 ==
* Removed the unnecessary comments in patch 0003 (Jean-Philippe).
* Added Jean-Philippe's review tags.

== Changes for V3 ==
* Add new function vfio_pci_init_intx() for one-time initialisation.
* Simplified INTx re-enabling (don't change irq_line anymore at
  runtime).


Leo Yan (3):
  vfio-pci: Release INTx's unmask eventfd properly
  vfio-pci: Add new function for INTx one-time initialisation
  vfio-pci: Re-enable INTx mode when disable MSI/MSIX

 include/kvm/vfio.h |  1 +
 vfio/pci.c | 95 ++
 2 files changed, 64 insertions(+), 32 deletions(-)

-- 
2.19.1



[PATCH v4 2/3] vfio-pci: Add new function for INTx one-time initialisation

2019-04-07 Thread Leo Yan
To support enabling INTx multiple times, we first need to extract the
one-time initialisation and move the related code into a new function,
vfio_pci_init_intx(); if we later disable and re-enable INTx, we can
skip these one-time operations.

This patch moves the following three main operations for INTx one-time
initialisation from vfio_pci_enable_intx() into vfio_pci_init_intx():

- Reserve 2 FDs for INTx;
- Sanity check with ioctl VFIO_DEVICE_GET_IRQ_INFO;
- Setup pdev->intx_gsi.

Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Leo Yan 
Reviewed-by: Jean-Philippe Brucker 
---
 vfio/pci.c | 67 --
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 5224fee..3c39844 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -1018,30 +1018,7 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
struct vfio_irq_eventfd trigger;
struct vfio_irq_eventfd unmask;
struct vfio_pci_device *pdev = &vdev->pci;
-   int gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
-
-   struct vfio_irq_info irq_info = {
-   .argsz = sizeof(irq_info),
-   .index = VFIO_PCI_INTX_IRQ_INDEX,
-   };
-
-   vfio_pci_reserve_irq_fds(2);
-
-   ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
-   if (ret || irq_info.count == 0) {
-   vfio_dev_err(vdev, "no INTx reported by VFIO");
-   return -ENODEV;
-   }
-
-   if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
-   vfio_dev_err(vdev, "interrupt not eventfd capable");
-   return -EINVAL;
-   }
-
-   if (!(irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
-   vfio_dev_err(vdev, "INTx interrupt not AUTOMASKED");
-   return -EINVAL;
-   }
+   int gsi = pdev->intx_gsi;
 
/*
 * PCI IRQ is level-triggered, so we use two eventfds. trigger_fd
@@ -1097,8 +1074,6 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
 
pdev->intx_fd = trigger_fd;
pdev->unmask_fd = unmask_fd;
-   /* Guest is going to ovewrite our irq_line... */
-   pdev->intx_gsi = gsi;
 
return 0;
 
@@ -1117,6 +1092,39 @@ err_close:
return ret;
 }
 
+static int vfio_pci_init_intx(struct kvm *kvm, struct vfio_device *vdev)
+{
+   int ret;
+   struct vfio_pci_device *pdev = &vdev->pci;
+   struct vfio_irq_info irq_info = {
+   .argsz = sizeof(irq_info),
+   .index = VFIO_PCI_INTX_IRQ_INDEX,
+   };
+
+   vfio_pci_reserve_irq_fds(2);
+
+   ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+   if (ret || irq_info.count == 0) {
+   vfio_dev_err(vdev, "no INTx reported by VFIO");
+   return -ENODEV;
+   }
+
+   if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
+   vfio_dev_err(vdev, "interrupt not eventfd capable");
+   return -EINVAL;
+   }
+
+   if (!(irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
+   vfio_dev_err(vdev, "INTx interrupt not AUTOMASKED");
+   return -EINVAL;
+   }
+
+   /* Guest is going to ovewrite our irq_line... */
+   pdev->intx_gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
+
+   return 0;
+}
+
 static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device *vdev)
 {
int ret = 0;
@@ -1142,8 +1150,13 @@ static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device *vdev
return ret;
}
 
-   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
+   ret = vfio_pci_init_intx(kvm, vdev);
+   if (ret)
+   return ret;
+
ret = vfio_pci_enable_intx(kvm, vdev);
+   }
 
return ret;
 }
-- 
2.19.1



[PATCH v4 1/3] vfio-pci: Release INTx's unmask eventfd properly

2019-04-07 Thread Leo Yan
The PCI device's INTx uses the eventfd 'unmask_fd' to signal
deassertion of the line from guest to host, but this eventfd isn't
released properly when INTx is disabled.

This patch adds the field 'unmask_fd' to struct vfio_pci_device for
storing the unmask eventfd, and closes it when INTx is disabled.

Signed-off-by: Leo Yan 
Reviewed-by: Jean-Philippe Brucker 
---
 include/kvm/vfio.h | 1 +
 vfio/pci.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/kvm/vfio.h b/include/kvm/vfio.h
index 60e6c54..28223cf 100644
--- a/include/kvm/vfio.h
+++ b/include/kvm/vfio.h
@@ -74,6 +74,7 @@ struct vfio_pci_device {
 
unsigned long   irq_modes;
int intx_fd;
+   int unmask_fd;
unsigned intintx_gsi;
struct vfio_pci_msi_common  msi;
struct vfio_pci_msi_common  msix;
diff --git a/vfio/pci.c b/vfio/pci.c
index 03de3c1..5224fee 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -1008,6 +1008,7 @@ static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev)
irq__del_irqfd(kvm, gsi, pdev->intx_fd);
 
close(pdev->intx_fd);
+   close(pdev->unmask_fd);
 }
 
 static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
@@ -1095,6 +1096,7 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
}
 
pdev->intx_fd = trigger_fd;
+   pdev->unmask_fd = unmask_fd;
/* Guest is going to ovewrite our irq_line... */
pdev->intx_gsi = gsi;
 
-- 
2.19.1



[PATCH v4 3/3] vfio-pci: Re-enable INTx mode when disable MSI/MSIX

2019-04-07 Thread Leo Yan
Since PCI forbids enabling INTx, MSI or MSIX at the same time, INTx
mode is disabled by default when MSI/MSIX mode is enabled; but this
logic is easily broken if the guest PCI driver detects that MSI/MSIX
cannot work as expected and tries to roll back to INTx mode.  In this
case, INTx mode has already been disabled and there is no chance to
re-enable it, so neither INTx mode nor MSI/MSIX mode can work in vfio.

Below shows the detailed flow for introducing this issue:

  vfio_pci_configure_dev_irqs()
`-> vfio_pci_enable_intx()

  vfio_pci_enable_msis()
`-> vfio_pci_disable_intx()

  vfio_pci_disable_msis()   => Guest PCI driver disables MSI

To fix this issue, when disabling MSI/MSIX we need to check whether INTx
mode is available for this device; if the device supports INTx, then
re-enable it so that the device can fall back to it.

Since the vfio_pci_disable_intx() / vfio_pci_enable_intx() pair of
functions may be called multiple times, this patch uses 'intx_fd == -1'
to denote that INTx is disabled; the two functions bail out directly
when they detect that INTx has already been disabled or enabled,
respectively.

Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Leo Yan 
Reviewed-by: Jean-Philippe Brucker 
---
 vfio/pci.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 3c39844..f17498e 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
 
 static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev);
+static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
 
 static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
bool msix)
@@ -50,17 +51,14 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct 
vfio_device *vdev,
if (!msi_is_enabled(msis->virt_state))
return 0;
 
-   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
/*
 * PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
 * time. Since INTx has to be enabled from the start (we don't
-* have a reliable way to know when the user starts using it),
+* have a reliable way to know when the guest starts using it),
 * disable it now.
 */
vfio_pci_disable_intx(kvm, vdev);
-   /* Permanently disable INTx */
-   pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
-   }
 
eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
 
@@ -162,7 +160,16 @@ static int vfio_pci_disable_msis(struct kvm *kvm, struct 
vfio_device *vdev,
msi_set_enabled(msis->phys_state, false);
msi_set_empty(msis->phys_state, true);
 
-   return 0;
+   /*
+* When MSI or MSIX is disabled, this might be called when
+* PCI driver detects the MSI interrupt failure and wants to
+* rollback to INTx mode.  Thus enable INTx if the device
+* supports INTx mode in this case.
+*/
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
+   ret = vfio_pci_enable_intx(kvm, vdev);
+
+   return ret >= 0 ? 0 : ret;
 }
 
 static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device *vdev,
@@ -1002,6 +1009,9 @@ static void vfio_pci_disable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
.index  = VFIO_PCI_INTX_IRQ_INDEX,
};
 
+   if (pdev->intx_fd == -1)
+   return;
+
pr_debug("user requested MSI, disabling INTx %d", gsi);
 
ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
@@ -1009,6 +1019,7 @@ static void vfio_pci_disable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
 
close(pdev->intx_fd);
close(pdev->unmask_fd);
+   pdev->intx_fd = -1;
 }
 
 static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
@@ -1020,6 +1031,9 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
struct vfio_pci_device *pdev = &vdev->pci;
int gsi = pdev->intx_gsi;
 
+   if (pdev->intx_fd != -1)
+   return 0;
+
/*
 * PCI IRQ is level-triggered, so we use two eventfds. trigger_fd
 * signals an interrupt from host to guest, and unmask_fd signals the
@@ -1122,6 +1136,8 @@ static int vfio_pci_init_intx(struct kvm *kvm, struct 
vfio_device *vdev)
/* Guest is going to ovewrite our irq_line... */
pdev->intx_gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
 
+   pdev->intx_fd = -1;
+
return 0;
 }
 
-- 
2.19.1



Re: [PATCH kvmtool v3 3/3] vfio-pci: Re-enable INTx mode when disable MSI/MSIX

2019-04-04 Thread Leo Yan
Hi Jean-Philippe,

On Thu, Apr 04, 2019 at 12:06:51PM +0100, Jean-Philippe Brucker wrote:
> On 26/03/2019 07:41, Leo Yan wrote:
> > Since PCI forbids enabling INTx, MSI or MSIX at the same time, it's by
> > default to disable INTx mode when enable MSI/MSIX mode; but this logic is
> > easily broken if the guest PCI driver detects the MSI/MSIX cannot work as
> > expected and tries to rollback to use INTx mode.  In this case, the INTx
> > mode has been disabled and has no chance to re-enable it, thus both INTx
> > mode and MSI/MSIX mode cannot work in vfio.
> > 
> > Below shows the detailed flow for introducing this issue:
> > 
> >   vfio_pci_configure_dev_irqs()
> > `-> vfio_pci_enable_intx()
> > 
> >   vfio_pci_enable_msis()
> > `-> vfio_pci_disable_intx()
> > 
> >   vfio_pci_disable_msis()   => Guest PCI driver disables MSI
> > 
> > To fix this issue, when disable MSI/MSIX we need to check if INTx mode
> > is available for this device or not; if the device can support INTx then
> > re-enable it so that the device can fallback to use it.
> > 
> > Since vfio_pci_disable_intx() / vfio_pci_enable_intx() pair functions
> > may be called for multiple times, this patch uses 'intx_fd == -1' to
> > denote the INTx is disabled, the pair functions can directly bail out
> > when detect INTx has been disabled and enabled respectively.
> > 
> > Suggested-by: Jean-Philippe Brucker 
> > Signed-off-by: Leo Yan 
> > ---
> >  vfio/pci.c | 41 ++---
> >  1 file changed, 30 insertions(+), 11 deletions(-)
> > 
> > diff --git a/vfio/pci.c b/vfio/pci.c
> > index 3c39844..3b2b1e7 100644
> > --- a/vfio/pci.c
> > +++ b/vfio/pci.c
> > @@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
> > msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
> >  
> >  static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device 
> > *vdev);
> > +static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
> >  
> >  static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
> > bool msix)
> > @@ -50,17 +51,14 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct 
> > vfio_device *vdev,
> > if (!msi_is_enabled(msis->virt_state))
> > return 0;
> >  
> > -   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
> > -   /*
> > -* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
> > -* time. Since INTx has to be enabled from the start (we don't
> > -* have a reliable way to know when the user starts using it),
> > -* disable it now.
> > -*/
> > +   /*
> > +* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
> > +* time. Since INTx has to be enabled from the start (after enabling
> > +* 'pdev->intx_fd' will be assigned to an eventfd and doesn't equal
> > +* to the init value -1), disable it now.
> > +*/
> 
> I don't think the comment change is useful, we don't need that much
> detail. The text that you replaced was trying to explain why we enable
> INTx from the start, and would still apply (although it should have been
> s/user/guest/)
> 
> > +   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> > vfio_pci_disable_intx(kvm, vdev);
> > -   /* Permanently disable INTx */
> > -   pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
> > -   }
> >  
> > eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
> >  
> > @@ -162,7 +160,16 @@ static int vfio_pci_disable_msis(struct kvm *kvm, 
> > struct vfio_device *vdev,
> > msi_set_enabled(msis->phys_state, false);
> > msi_set_empty(msis->phys_state, true);
> >  
> > -   return 0;
> > +   /*
> > +* When MSI or MSIX is disabled, this might be called when
> > +* PCI driver detects the MSI interrupt failure and wants to
> > +* rollback to INTx mode.  Thus enable INTx if the device
> > +* supports INTx mode in this case.
> > +*/
> > +   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> > +   ret = vfio_pci_enable_intx(kvm, vdev);
> > +
> > +   return ret >= 0 ? 0 : ret;
> >  }
> >  
> >  static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device 
> > *vdev,
> > @@ -1002,6 +1009,10 @@ static void vfio_pci_disable_intx(struct kvm *kvm, 
> > struct vfio_device *vdev)
> > .index  = VFIO_PCI_INTX_IRQ_INDEX,
> > };
> >  
> > +   /* INTx mode has been disabled */
> 
> Here as well, the comments on intx_fd seem unnecessary. But these are
> only nits, the code is fine and I tested for regressions on my hardware, so:
> 
> Reviewed-by: Jean-Philippe Brucker 

Thanks for reviewing.  I will spin a new patch set to drop the
unnecessary comments and keep the changes as simple as possible.

Thanks,
Leo Yan


Re: [PATCH kvmtool v3 0/3] vfio-pci: Support INTx mode re-enabling

2019-04-03 Thread Leo Yan
Hi Will,

On Wed, Apr 03, 2019 at 01:43:31PM +0100, Will Deacon wrote:
> On Wed, Apr 03, 2019 at 11:58:17AM +0800, Leo Yan wrote:
> > On Tue, Apr 02, 2019 at 05:50:25PM +0100, Will Deacon wrote:
> > > Just checking, but is this series intended to supersede this one:
> > > 
> > > https://www.spinics.net/lists/kvm/msg183750.html
> > > 
> > > ?
> > 
> > Yes, this patch series is v3 and the link you pointed out is the old
> > patch series v1.  Sorry for confusion.
> 
> No, that's fine and what I was assuming. Just wanted to confirm due to the
> change in $subject, so I don't miss anything.

Yes.

Gently pinging Jean-Philippe: can I get a green light for this patch
set?  Otherwise, please let me know if I need to follow up on anything;
thanks for reviewing.

Thanks,
Leo Yan


Re: [PATCH kvmtool v3 0/3] vfio-pci: Support INTx mode re-enabling

2019-04-02 Thread Leo Yan
Hi Will,

On Tue, Apr 02, 2019 at 05:50:25PM +0100, Will Deacon wrote:
> Hi Leo,
> 
> On Tue, Mar 26, 2019 at 03:41:28PM +0800, Leo Yan wrote:
> > When enable vfio-pci mode for NIC driver on Juno board, the IRQ is
> > failed to forward properly from host to guest, finally root caused this
> > issue is related with kvmtool cannot re-enable INTx mode properly.
> > 
> > So the basic working flow to reproduce this issue is as below:
> > 
> > Host Guest
> > -  
> >   INTx mode
> >  MSI enable failed in NIC driver
> >  MSI disable in NIC driver
> >  Switch back to INTx mode --> kvmtool doesn't support
> > 
> > So this patch is to support INTx mode re-enabling; patch 0001 is one
> > minor fixing up for eventfd releasing; patch 0002 introduces a new
> > function vfio_pci_init_intx() which is used to finish INTx one-time
> > initialisation; patch 0003 is the core patch for support INTx mode
> > re-enabling, when kvmtool detects MSI is disabled it rollbacks to INTx
> > mode.
> > 
> > This patch set has been tested on Juno-r2 board.
> > 
> > == Changes for V3 ==
> > * Add new function vfio_pci_init_intx() for one-time initialisation.
> > * Simplized INTx re-enabling (don't change irq_line anymore at the
> >   runtime).
> 
> Just checking, but is this series intended to supersede this one:
> 
> https://www.spinics.net/lists/kvm/msg183750.html
> 
> ?

Yes, this patch series is v3 and the link you pointed out is the old
patch series v1.  Sorry for the confusion.

Thanks,
Leo Yan


[PATCH kvmtool v3 1/3] vfio-pci: Release INTx's unmask eventfd properly

2019-03-26 Thread Leo Yan
The PCI device INTx uses the eventfd 'unmask_fd' to signal the deassertion
of the line from guest to host; but this eventfd isn't released properly
when INTx is disabled.

This patch adds the field 'unmask_fd' in struct vfio_pci_device for
storing the unmask eventfd, and closes it when INTx is disabled.

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Leo Yan 
---
 include/kvm/vfio.h | 1 +
 vfio/pci.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/kvm/vfio.h b/include/kvm/vfio.h
index 60e6c54..28223cf 100644
--- a/include/kvm/vfio.h
+++ b/include/kvm/vfio.h
@@ -74,6 +74,7 @@ struct vfio_pci_device {
 
unsigned long   irq_modes;
int intx_fd;
+   int unmask_fd;
unsigned intintx_gsi;
struct vfio_pci_msi_common  msi;
struct vfio_pci_msi_common  msix;
diff --git a/vfio/pci.c b/vfio/pci.c
index 03de3c1..5224fee 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -1008,6 +1008,7 @@ static void vfio_pci_disable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
irq__del_irqfd(kvm, gsi, pdev->intx_fd);
 
close(pdev->intx_fd);
+   close(pdev->unmask_fd);
 }
 
 static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
@@ -1095,6 +1096,7 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
}
 
pdev->intx_fd = trigger_fd;
+   pdev->unmask_fd = unmask_fd;
/* Guest is going to ovewrite our irq_line... */
pdev->intx_gsi = gsi;
 
-- 
2.19.1



[PATCH kvmtool v3 0/3] vfio-pci: Support INTx mode re-enabling

2019-03-26 Thread Leo Yan
When enabling vfio-pci mode for the NIC driver on the Juno board, the
IRQ fails to be forwarded properly from host to guest; this issue was
finally root-caused to kvmtool not re-enabling INTx mode properly.

So the basic working flow to reproduce this issue is as below:

Host                            Guest
----                            -----
                                INTx mode
                                MSI enable failed in NIC driver
                                MSI disable in NIC driver
                                Switch back to INTx mode
        --> kvmtool doesn't support

So this patch set supports INTx mode re-enabling: patch 0001 is a minor
fix for eventfd releasing; patch 0002 introduces a new function
vfio_pci_init_intx() which performs INTx one-time initialisation; patch
0003 is the core patch for INTx mode re-enabling: when kvmtool detects
that MSI is disabled, it rolls back to INTx mode.

This patch set has been tested on Juno-r2 board.

== Changes for V3 ==
* Add new function vfio_pci_init_intx() for one-time initialisation.
* Simplified INTx re-enabling (don't change irq_line at runtime
  anymore).


Leo Yan (3):
  vfio-pci: Release INTx's unmask eventfd properly
  vfio-pci: Add new function for INTx one-time initialisation
  vfio-pci: Re-enable INTx mode when disable MSI/MSIX

 include/kvm/vfio.h |   1 +
 vfio/pci.c | 108 +
 2 files changed, 72 insertions(+), 37 deletions(-)

-- 
2.19.1



[PATCH kvmtool v3 3/3] vfio-pci: Re-enable INTx mode when disable MSI/MSIX

2019-03-26 Thread Leo Yan
Since PCI forbids enabling INTx, MSI or MSIX at the same time, INTx mode
is disabled by default when MSI/MSIX mode is enabled; but this logic is
easily broken if the guest PCI driver detects that MSI/MSIX cannot work
as expected and tries to roll back to INTx mode.  In this case, INTx
mode has already been disabled and there is no chance to re-enable it,
so neither INTx mode nor MSI/MSIX mode can work in vfio.

Below shows the detailed flow for introducing this issue:

  vfio_pci_configure_dev_irqs()
`-> vfio_pci_enable_intx()

  vfio_pci_enable_msis()
`-> vfio_pci_disable_intx()

  vfio_pci_disable_msis()   => Guest PCI driver disables MSI

To fix this issue, when disabling MSI/MSIX we need to check whether INTx
mode is available for this device; if the device supports INTx, then
re-enable it so that the device can fall back to it.

Since the vfio_pci_disable_intx() / vfio_pci_enable_intx() pair of
functions may be called multiple times, this patch uses 'intx_fd == -1'
to denote that INTx is disabled; the two functions bail out directly
when they detect that INTx has already been disabled or enabled,
respectively.

Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Leo Yan 
---
 vfio/pci.c | 41 ++---
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 3c39844..3b2b1e7 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
 
 static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev);
+static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
 
 static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
bool msix)
@@ -50,17 +51,14 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct 
vfio_device *vdev,
if (!msi_is_enabled(msis->virt_state))
return 0;
 
-   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
-   /*
-* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
-* time. Since INTx has to be enabled from the start (we don't
-* have a reliable way to know when the user starts using it),
-* disable it now.
-*/
+   /*
+* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
+* time. Since INTx has to be enabled from the start (after enabling
+* 'pdev->intx_fd' will be assigned to an eventfd and doesn't equal
+* to the init value -1), disable it now.
+*/
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
vfio_pci_disable_intx(kvm, vdev);
-   /* Permanently disable INTx */
-   pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
-   }
 
eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
 
@@ -162,7 +160,16 @@ static int vfio_pci_disable_msis(struct kvm *kvm, struct 
vfio_device *vdev,
msi_set_enabled(msis->phys_state, false);
msi_set_empty(msis->phys_state, true);
 
-   return 0;
+   /*
+* When MSI or MSIX is disabled, this might be called when
+* PCI driver detects the MSI interrupt failure and wants to
+* rollback to INTx mode.  Thus enable INTx if the device
+* supports INTx mode in this case.
+*/
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
+   ret = vfio_pci_enable_intx(kvm, vdev);
+
+   return ret >= 0 ? 0 : ret;
 }
 
 static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device *vdev,
@@ -1002,6 +1009,10 @@ static void vfio_pci_disable_intx(struct kvm *kvm, 
struct vfio_device *vdev)
.index  = VFIO_PCI_INTX_IRQ_INDEX,
};
 
+   /* INTx mode has been disabled */
+   if (pdev->intx_fd == -1)
+   return;
+
pr_debug("user requested MSI, disabling INTx %d", gsi);
 
ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
@@ -1009,6 +1020,7 @@ static void vfio_pci_disable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
 
close(pdev->intx_fd);
close(pdev->unmask_fd);
+   pdev->intx_fd = -1;
 }
 
 static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
@@ -1020,6 +1032,10 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
struct vfio_pci_device *pdev = &vdev->pci;
int gsi = pdev->intx_gsi;
 
+   /* INTx mode has been enabled */
+   if (pdev->intx_fd != -1)
+   return 0;
+
/*
 * PCI IRQ is level-triggered, so we use two eventfds. trigger_fd
 * signals an interrupt from host to guest, and unmask_fd signals the
@@ -1122,6 +1138,9 @@ static int vfio_pci_init_intx(struct kvm *kvm, struct 
vfio_device *vdev)
/* Guest is going to ovewrite our irq_line... */
pdev

[PATCH kvmtool v3 2/3] vfio-pci: Add new function for INTx one-time initialisation

2019-03-26 Thread Leo Yan
To support enabling INTx multiple times, we first need to extract the
one-time initialisation and move the related code into a new function,
vfio_pci_init_intx(); when INTx is later disabled and re-enabled, we can
skip these one-time operations.

This patch moves the three main operations below for INTx one-time
initialisation from function vfio_pci_enable_intx() into function
vfio_pci_init_intx():

- Reserve 2 FDs for INTx;
- Sanity check with ioctl VFIO_DEVICE_GET_IRQ_INFO;
- Setup pdev->intx_gsi.

Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Leo Yan 
---
 vfio/pci.c | 67 --
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 5224fee..3c39844 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -1018,30 +1018,7 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
struct vfio_irq_eventfd trigger;
struct vfio_irq_eventfd unmask;
struct vfio_pci_device *pdev = &vdev->pci;
-   int gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
-
-   struct vfio_irq_info irq_info = {
-   .argsz = sizeof(irq_info),
-   .index = VFIO_PCI_INTX_IRQ_INDEX,
-   };
-
-   vfio_pci_reserve_irq_fds(2);
-
-   ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
-   if (ret || irq_info.count == 0) {
-   vfio_dev_err(vdev, "no INTx reported by VFIO");
-   return -ENODEV;
-   }
-
-   if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
-   vfio_dev_err(vdev, "interrupt not eventfd capable");
-   return -EINVAL;
-   }
-
-   if (!(irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
-   vfio_dev_err(vdev, "INTx interrupt not AUTOMASKED");
-   return -EINVAL;
-   }
+   int gsi = pdev->intx_gsi;
 
/*
 * PCI IRQ is level-triggered, so we use two eventfds. trigger_fd
@@ -1097,8 +1074,6 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
 
pdev->intx_fd = trigger_fd;
pdev->unmask_fd = unmask_fd;
-   /* Guest is going to ovewrite our irq_line... */
-   pdev->intx_gsi = gsi;
 
return 0;
 
@@ -1117,6 +1092,39 @@ err_close:
return ret;
 }
 
+static int vfio_pci_init_intx(struct kvm *kvm, struct vfio_device *vdev)
+{
+   int ret;
+   struct vfio_pci_device *pdev = &vdev->pci;
+   struct vfio_irq_info irq_info = {
+   .argsz = sizeof(irq_info),
+   .index = VFIO_PCI_INTX_IRQ_INDEX,
+   };
+
+   vfio_pci_reserve_irq_fds(2);
+
+   ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+   if (ret || irq_info.count == 0) {
+   vfio_dev_err(vdev, "no INTx reported by VFIO");
+   return -ENODEV;
+   }
+
+   if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
+   vfio_dev_err(vdev, "interrupt not eventfd capable");
+   return -EINVAL;
+   }
+
+   if (!(irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
+   vfio_dev_err(vdev, "INTx interrupt not AUTOMASKED");
+   return -EINVAL;
+   }
+
+   /* Guest is going to ovewrite our irq_line... */
+   pdev->intx_gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
+
+   return 0;
+}
+
 static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device 
*vdev)
 {
int ret = 0;
@@ -1142,8 +1150,13 @@ static int vfio_pci_configure_dev_irqs(struct kvm *kvm, 
struct vfio_device *vdev
return ret;
}
 
-   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
+   ret = vfio_pci_init_intx(kvm, vdev);
+   if (ret)
+   return ret;
+
ret = vfio_pci_enable_intx(kvm, vdev);
+   }
 
return ret;
 }
-- 
2.19.1



Re: [PATCH kvmtool v2 3/3] vfio-pci: Re-enable INTx mode when disable MSI/MSIX

2019-03-25 Thread Leo Yan
Hi Jean-Philippe,

On Mon, Mar 25, 2019 at 06:28:40PM +, Jean-Philippe Brucker wrote:

[...]

> > @@ -162,7 +160,34 @@ static int vfio_pci_disable_msis(struct kvm *kvm, 
> > struct vfio_device *vdev,
> > msi_set_enabled(msis->phys_state, false);
> > msi_set_empty(msis->phys_state, true);
> >  
> > -   return 0;
> > +   /*
> > +* When MSI or MSIX is disabled, this might be called when
> > +* PCI driver detects the MSI interrupt failure and wants to
> > +* rollback to INTx mode.  Thus enable INTx if the device
> > +* supports INTx mode in this case.
> > +*/
> > +   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
> > +   /*
> > +* Struct pci_device_header is not only used for header,
> > +* it also is used for PCI configuration; and in the function
> > +* vfio_pci_cfg_write() it firstly writes configuration space
> > +* and then read back the configuration space data into the
> > +* header structure; thus 'irq_pin' and 'irq_line' in the
> > +* header will be overwritten.
> > +*
> > +* If want to enable INTx mode properly, firstly needs to
> > +* restore 'irq_pin' and 'irq_line' values; we can simply set 1
> > +* to 'irq_pin', and 'pdev->intx_gsi' keeps gsi value when
> > +* enable INTx mode previously so we can simply use it to
> > +* recover irq line number by adding offset KVM_IRQ_OFFSET.
> > +*/
> > +   pdev->hdr.irq_pin = 1;
> > +   pdev->hdr.irq_line = pdev->intx_gsi + KVM_IRQ_OFFSET;
> 
> That doesn't look right. We shouldn't change irq_line at runtime, it's
> reserved to the guest (and irq_pin is static). Maybe move the bits that
> should only be done once (intx_gsi setup, and also the
> VFIO_DEVICE_GET_IRQ_INFO sanity check) outside of the enable() function,
> like we do for msis?

Agreed with this and the other comment on patch 0002.  Will spin a new
patch set following these good suggestions.

Thanks a lot for reviewing!

Thanks,
Leo Yan


Re: [PATCH] vfio-pci: Fix MSI IRQ forwarding for without per-vector masking

2019-03-21 Thread Leo Yan
On Fri, Mar 22, 2019 at 01:23:08PM +0800, Leo Yan wrote:
> If MSI doesn't support per-vector masking capability and
> PCI_MSI_FLAGS_MASKBIT isn't set in message control field, the function
> vfio_pci_msi_vector_write() will directly bail out for this case and
> every vector's 'virt_state' keeps setting bit VFIO_PCI_MSI_STATE_MASKED.
> 
> This results in the state maintained in 'virt_state' cannot really
> reflect the MSI hardware state; finally it will mislead the function
> vfio_pci_update_msi_entry() to skip IRQ forwarding with below flow:
> 
> vfio_pci_update_msi_entry() {
> 
>   [...]
> 
>   if (msi_is_masked(entry->virt_state) == msi_is_masked(entry->phys_state))
>   return 0;  ==> skip IRQ forwarding
> 
>   [...]
> }
> 
> To fix this issue, when detect PCI_MSI_FLAGS_MASKBIT is not set in the
> message control field, this patch simply clears bit
> VFIO_PCI_MSI_STATE_MASKED for all vectors 'virt_state'; at the end
> vfio_pci_update_msi_entry() can forward MSI IRQ successfully.

Just a reminder: this patch is for kvmtool, not for the kernel.  Sorry
I forgot to add that in the subject.

Thanks,
Leo Yan

> Signed-off-by: Leo Yan 
> ---
>  vfio/pci.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index ba971eb..4fd24ac 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -363,8 +363,18 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, 
> struct vfio_device *vdev,
>   struct vfio_pci_device *pdev = &vdev->pci;
>   struct msi_cap_64 *msi_cap_64 = PCI_CAP(&pdev->hdr, pdev->msi.pos);
>  
> - if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT))
> + if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT)) {
> + /*
> +  * If MSI doesn't support per-vector masking capability,
> +  * simply unmask for all vectors.
> +  */
> + for (i = 0; i < pdev->msi.nr_entries; i++) {
> + entry = &pdev->msi.entries[i];
> + msi_set_masked(entry->virt_state, false);
> + }
> +
>   return 0;
> + }
>  
>   if (msi_cap_64->ctrl & PCI_MSI_FLAGS_64BIT)
>   mask_pos = PCI_MSI_MASK_64;
> -- 
> 2.19.1
> 


[PATCH] vfio-pci: Fix MSI IRQ forwarding for without per-vector masking

2019-03-21 Thread Leo Yan
If MSI doesn't support the per-vector masking capability and
PCI_MSI_FLAGS_MASKBIT isn't set in the message control field, the
function vfio_pci_msi_vector_write() directly bails out, and every
vector's 'virt_state' keeps the bit VFIO_PCI_MSI_STATE_MASKED set.

This means the state maintained in 'virt_state' cannot really reflect
the MSI hardware state; it eventually misleads the function
vfio_pci_update_msi_entry() into skipping IRQ forwarding, per the flow
below:

vfio_pci_update_msi_entry() {

  [...]

  if (msi_is_masked(entry->virt_state) == msi_is_masked(entry->phys_state))
  return 0;  ==> skip IRQ forwarding

  [...]
}

To fix this issue, when PCI_MSI_FLAGS_MASKBIT is detected as not set in
the message control field, this patch simply clears the bit
VFIO_PCI_MSI_STATE_MASKED in every vector's 'virt_state'; in the end
vfio_pci_update_msi_entry() can forward the MSI IRQ successfully.

Signed-off-by: Leo Yan 
---
 vfio/pci.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index ba971eb..4fd24ac 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -363,8 +363,18 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, 
struct vfio_device *vdev,
struct vfio_pci_device *pdev = &vdev->pci;
struct msi_cap_64 *msi_cap_64 = PCI_CAP(&pdev->hdr, pdev->msi.pos);
 
-   if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT))
+   if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT)) {
+   /*
+* If MSI doesn't support per-vector masking capability,
+* simply unmask for all vectors.
+*/
+   for (i = 0; i < pdev->msi.nr_entries; i++) {
+   entry = &pdev->msi.entries[i];
+   msi_set_masked(entry->virt_state, false);
+   }
+
return 0;
+   }
 
if (msi_cap_64->ctrl & PCI_MSI_FLAGS_64BIT)
mask_pos = PCI_MSI_MASK_64;
-- 
2.19.1



Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-20 Thread Leo Yan
On Tue, Mar 19, 2019 at 09:33:58AM +0800, Leo Yan wrote:

[...]

> So far, it still cannot work well if I only set "sky2.disable_msi=1"
> for host kernel command line, with this config it runs with below flow
> and which cannot forward interrupt properly from host to guest:
> 
> hostguest
> 
>   INTx mode   msi enable
>   msi disable
>   Switch back to INTx mode

Just a heads-up: for enabling vfio-pci with the NIC device on the Juno
board, I sent out two patch sets with the related changes.  With these
changes, INTx mode works well and the failure case is also handled.

  kvmtool changes:
  https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035186.html

  Juno DT binding for PCIe SMMU:
  
https://archive.armlinux.org.uk/lurker/message/20190320.083105.f002c91c.en.html

@Robin, if you find an issue with the Juno DT binding in my patch and
want to send your own patch for the fix, please feel free to let me
know.  I am happy to test it and drop mine.  Thanks!

Thanks,
Leo Yan


[PATCH kvmtool v2 1/3] vfio-pci: Release INTx's unmask eventfd properly

2019-03-20 Thread Leo Yan
The PCI device INTx uses the eventfd 'unmask_fd' to signal the deassertion
of the line from guest to host; but this eventfd isn't released properly
when INTx is disabled.

This patch adds the field 'unmask_fd' in struct vfio_pci_device for
storing the unmask eventfd, and closes it when INTx is disabled.

Signed-off-by: Leo Yan 
---
 include/kvm/vfio.h | 1 +
 vfio/pci.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/kvm/vfio.h b/include/kvm/vfio.h
index 60e6c54..28223cf 100644
--- a/include/kvm/vfio.h
+++ b/include/kvm/vfio.h
@@ -74,6 +74,7 @@ struct vfio_pci_device {
 
unsigned long   irq_modes;
int intx_fd;
+   int unmask_fd;
unsigned intintx_gsi;
struct vfio_pci_msi_common  msi;
struct vfio_pci_msi_common  msix;
diff --git a/vfio/pci.c b/vfio/pci.c
index 03de3c1..5224fee 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -1008,6 +1008,7 @@ static void vfio_pci_disable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
irq__del_irqfd(kvm, gsi, pdev->intx_fd);
 
close(pdev->intx_fd);
+   close(pdev->unmask_fd);
 }
 
 static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
@@ -1095,6 +1096,7 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct 
vfio_device *vdev)
}
 
pdev->intx_fd = trigger_fd;
+   pdev->unmask_fd = unmask_fd;
/* Guest is going to ovewrite our irq_line... */
pdev->intx_gsi = gsi;
 
-- 
2.19.1



[PATCH kvmtool v2 3/3] vfio-pci: Re-enable INTx mode when disable MSI/MSIX

2019-03-20 Thread Leo Yan
Since PCI forbids enabling INTx, MSI or MSIX at the same time, INTx mode
is disabled by default when MSI/MSIX mode is enabled; but this logic is
easily broken if the guest PCI driver detects that MSI/MSIX cannot work
as expected and tries to roll back to INTx mode.  INTx mode has been
disabled with no chance to be enabled again, so neither INTx mode nor
MSI/MSIX mode can be enabled in vfio for this case.

Below is the detailed flow that introduces this issue:

  vfio_pci_configure_dev_irqs()
`-> vfio_pci_enable_intx()

  vfio_pci_enable_msis()
`-> vfio_pci_disable_intx()

  vfio_pci_disable_msis()   => Guest PCI driver disables MSI

To fix this issue, when disabling MSI/MSIX we need to check whether INTx
mode is available for this device; if the device supports INTx, re-enable
it so the device can fall back to using it.

Two minor changes in this patch are worth noting:

- vfio_pci_disable_intx() may be called multiple times (each time the
  guest enables one MSI vector).  This patch uses 'intx_fd == -1' to
  denote that INTx is disabled; vfio_pci_disable_intx() and
  vfio_pci_enable_intx() bail out directly when they detect that INTx
  has already been disabled or enabled, respectively.

- pci_device_header is corrupted after PCI configuration and all
  irq-related info is lost.  Before re-enabling INTx mode, this patch
  restores the 'irq_pin' and 'irq_line' fields in struct
  pci_device_header.

Signed-off-by: Leo Yan 
---
 vfio/pci.c | 59 --
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index d025581..ba971eb 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
 
 static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev);
+static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
 
 static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
bool msix)
@@ -50,17 +51,14 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
if (!msi_is_enabled(msis->virt_state))
return 0;
 
-   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
-   /*
-* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
-* time. Since INTx has to be enabled from the start (we don't
-* have a reliable way to know when the user starts using it),
-* disable it now.
-*/
+   /*
+* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
+* time. Since INTx has to be enabled from the start (once enabled,
+* 'pdev->intx_fd' holds an eventfd and no longer equals the init
+* value -1), disable it now.
+*/
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
vfio_pci_disable_intx(kvm, vdev);
-   /* Permanently disable INTx */
-   pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
-   }
 
eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
 
@@ -162,7 +160,34 @@ static int vfio_pci_disable_msis(struct kvm *kvm, struct vfio_device *vdev,
msi_set_enabled(msis->phys_state, false);
msi_set_empty(msis->phys_state, true);
 
-   return 0;
+   /*
+* When MSI or MSIX is disabled, this might be called because the
+* PCI driver detected an MSI interrupt failure and wants to roll
+* back to INTx mode.  Thus enable INTx in this case if the
+* device supports it.
+*/
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
+   /*
+* Struct pci_device_header is not only used as the header, it
+* is also used for PCI configuration; in vfio_pci_cfg_write()
+* it first writes the configuration space and then reads the
+* data back into the header structure, so 'irq_pin' and
+* 'irq_line' in the header get overwritten.
+*
+* To re-enable INTx mode properly, we first need to restore
+* the 'irq_pin' and 'irq_line' values: we can simply set 1 to
+* 'irq_pin', and 'pdev->intx_gsi' keeps the gsi value from
+* when INTx mode was previously enabled, so the irq line
+* number can be recovered by adding the offset KVM_IRQ_OFFSET.
+*/
+   pdev->hdr.irq_pin = 1;
+   pdev->hdr.irq_line = pdev->intx_gsi + KVM_IRQ_OFFSET;
+
+   ret = vfio_pci_enable_intx(kvm, vdev);
+   }
+
+   return ret >= 0 ? 0 : ret;
 }
 
 static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_dev

[PATCH kvmtool v2 0/3] vfio-pci: Support INTx mode re-enabling

2019-03-20 Thread Leo Yan
When enabling vfio-pci mode for the NIC driver on the Juno board, the
IRQ failed to be forwarded properly from host to guest; this issue was
finally root caused to kvmtool not being able to re-enable INTx mode
properly.

The basic flow to reproduce this issue is as below:

  Host                       Guest
  ----                       -----
  INTx mode
                             MSI enable failed in NIC driver
                             MSI disable in NIC driver
                             Switch back to INTx mode --> kvmtool doesn't support

So this patch set supports INTx mode re-enabling: patches 0001/0002 are
minor fixes for eventfd releasing and for removing the useless FD
reservation for INTx; patch 0003 is the core patch supporting INTx mode
re-enabling: when kvmtool detects that MSI is disabled, it rolls back to
INTx mode.

This patch set has been tested on Juno-r2 board.

Leo Yan (3):
  vfio-pci: Release INTx's unmask eventfd properly
  vfio-pci: Remove useless FDs reservation in vfio_pci_enable_intx()
  vfio-pci: Re-enable INTx mode when disable MSI/MSIX

 include/kvm/vfio.h |  1 +
 vfio/pci.c | 61 +-
 2 files changed, 50 insertions(+), 12 deletions(-)

-- 
2.19.1



[PATCH kvmtool v2 2/3] vfio-pci: Remove useless FDs reservation in vfio_pci_enable_intx()

2019-03-20 Thread Leo Yan
Since INTx only uses 2 FDs, it's not particularly useful to reserve FDs
in vfio_pci_enable_intx(); so this patch removes the FD reservation from
this function.

Signed-off-by: Leo Yan 
---
 vfio/pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 5224fee..d025581 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -1025,8 +1025,6 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
.index = VFIO_PCI_INTX_IRQ_INDEX,
};
 
-   vfio_pci_reserve_irq_fds(2);
-
ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
if (ret || irq_info.count == 0) {
vfio_dev_err(vdev, "no INTx reported by VFIO");
-- 
2.19.1



Re: [PATCH kvmtool v1 2/2] vfio-pci: Fallback to INTx mode when disable MSI/MSIX

2019-03-19 Thread Leo Yan
Hi Jean-Philippe,

On Fri, Mar 15, 2019 at 02:20:07PM +, Jean-Philippe Brucker wrote:
> On 15/03/2019 08:33, Leo Yan wrote:
> > Since PCI forbids enabling INTx, MSI or MSIX at the same time, it's by
> > default to disable INTx mode when enable MSI/MSIX mode; but this logic is
> > easily broken if the guest PCI driver detects the MSI/MSIX cannot work as
> > expected and tries to rollback to use INTx mode.  The INTx mode has been
> > disabled and it has no chance to be enabled again, thus both INTx mode
> > and MSI/MSIX mode will not be enabled in vfio for this case.
> > 
> > Below shows the detailed flow for introducing this issue:
> > 
> >   vfio_pci_configure_dev_irqs()
> > `-> vfio_pci_enable_intx()
> > 
> >   vfio_pci_enable_msis()
> > `-> vfio_pci_disable_intx()
> > 
> >   vfio_pci_disable_msis()   => Guest PCI driver disables MSI
> > 
> > To fix this issue, when disable MSI/MSIX we need to check if INTx mode
> > is available for this device or not; if the device can support INTx then
> > we need to re-enable it so the device can fallback to use it.
> > 
> > Signed-off-by: Leo Yan 
> > ---
> >  vfio/pci.c | 17 -
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/vfio/pci.c b/vfio/pci.c
> > index c0683f6..44727bb 100644
> > --- a/vfio/pci.c
> > +++ b/vfio/pci.c
> > @@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
> > msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
> >  
> >  static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev);
> > +static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
> >  
> >  static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
> > bool msix)
> > @@ -50,7 +51,7 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
> > if (!msi_is_enabled(msis->virt_state))
> > return 0;
> >  
> > -   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
> > +   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> > /*
> >  * PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
> >  * time. Since INTx has to be enabled from the start (we don't
> > @@ -58,9 +59,6 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
> >  * disable it now.
> >  */
> > vfio_pci_disable_intx(kvm, vdev);
> > -   /* Permanently disable INTx */
> > -   pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
> 
> As a result vfio_pci_disable_intx() may be called multiple times (each
> time the guest enables one MSI vector). Could you make
> vfio_pci_disable_intx() safe against that (maybe use intx_fd == -1 to
> denote the INTx state)?

Will do this.

> > -   }
> >  
> > eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
> >  
> > @@ -162,7 +160,16 @@ static int vfio_pci_disable_msis(struct kvm *kvm, struct vfio_device *vdev,
> > msi_set_enabled(msis->phys_state, false);
> > msi_set_empty(msis->phys_state, true);
> >  
> > -   return 0;
> > +   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> > +   /*
> > +* When MSI or MSIX is disabled, this might be called when
> > +* PCI driver detects the MSI interrupt failure and wants to
> > +* rollback to INTx mode.  Thus enable INTx if the device
> > +* supports INTx mode in this case.
> > +*/
> > +   ret = vfio_pci_enable_intx(kvm, vdev);
> 
> Let's remove vfio_pci_reserve_irq_fds(2) from vfio_pci_enable_intx(), it
> should only called once per run, and isn't particularly useful here
> since INTx only uses 2 fds. It's used to bump the fd rlimit when a
> device needs ~2048 file descriptors for MSI-X.

Understand; will do it.

Thanks a lot for very detailed suggestions.

> > +
> > +   return ret >= 0 ? 0 : ret;
> >  }
> >  
> >  static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device *vdev,
> > 
> 


Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-18 Thread Leo Yan
Hi Robin,

On Mon, Mar 18, 2019 at 12:25:33PM +, Robin Murphy wrote:

[...]

> > diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> > index b9d486d..43f78b1 100644
> > --- a/arm/include/arm-common/kvm-arch.h
> > +++ b/arm/include/arm-common/kvm-arch.h
> > @@ -7,10 +7,10 @@
> > 
> >   #include "arm-common/gic.h"
> > 
> > -#define ARM_IOPORT_AREA_AC(0x, UL)
> > -#define ARM_MMIO_AREA  _AC(0x0001, UL)
> > -#define ARM_AXI_AREA   _AC(0x4000, UL)
> > -#define ARM_MEMORY_AREA_AC(0x8000, UL)
> > +#define ARM_IOPORT_AREA_AC(0x8000, UL)
> > +#define ARM_MMIO_AREA  _AC(0x8001, UL)
> > +#define ARM_AXI_AREA   _AC(0x8800, UL)
> > +#define ARM_MEMORY_AREA_AC(0x9000, UL)
> > 
> > Anyway, very appreciate for the suggestions; it's sufficent for me to
> > dig more for memory related information (e.g. PCIe configurations,
> > IOMMU, etc) and will keep posted if I make any progress.
> 
> None of those should need to change (all the MMIO emulation stuff is
> irrelevant to PCIe DMA anyway) - provided you don't give the guest more than
> 2GB of RAM, passthrough with legacy INTx ought to work out-of-the-box. For
> MSIs to get through, you'll further need to change the host kernel to place
> its software MSI region[2] within any of the host bridge windows as well.

From dumping the PCI configuration I can see that, after launching the
guest with kvmtool, the host receives the first interrupt (checked that
the function vfio_intx_handler() has been invoked once) and then PCI
sends a command with PCI_COMMAND_INTX_DISABLE to disable the interrupt
line.  So in this flow the interrupt is very likely not forwarded
properly and the guest doesn't receive it.

Luckily, I found that the below flow lets interrupts be forwarded from
host to guest after I set "sky2.disable_msi=1" on both kernel command
lines:

  host                 guest
  ----                 -----
  INTx mode            INTx mode

So far, it still cannot work well if I only set "sky2.disable_msi=1" on
the host kernel command line; with this config it runs the below flow,
which cannot forward interrupts properly from host to guest:

  host                 guest
  ----                 -----
  INTx mode            msi enable
                       msi disable
                       Switch back to INTx mode

I am so happy that now I can use pure INTx mode on the Juno board for
NIC enabling, and I pinged successfully from the guest OS to my router :)

I will look into the issue in the second scenario; and if I have more
time I will look into MSI mode as well (I confirmed MSI mode works with
the host OS but fails in the guest OS).

I very much appreciate your and Eric's help!

Thanks,
Leo Yan


Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-15 Thread Leo Yan
Hi Robin,

On Fri, Mar 15, 2019 at 12:54:10PM +, Robin Murphy wrote:
> Hi Leo,
> 
> Sorry for the delay - I'm on holiday this week, but since I've made the
> mistake of glancing at my inbox I should probably save you from wasting any
> more time...

Sorry for disturbing you on holiday, and I appreciate your help.
There's no rush to reply.

> On 2019-03-15 11:03 am, Auger Eric wrote:
> > Hi Leo,
> > 
> > + Jean-Philippe
> > 
> > On 3/15/19 10:37 AM, Leo Yan wrote:
> > > Hi Eric, Robin,
> > > 
> > > On Wed, Mar 13, 2019 at 11:24:25AM +0100, Auger Eric wrote:
> > > 
> > > [...]
> > > 
> > > > > If the NIC supports MSIs they logically are used. This can be easily
> > > > > checked on host by issuing "cat /proc/interrupts | grep vfio". Can you
> > > > > check whether the guest received any interrupt? I remember that Robin
> > > > > said in the past that on Juno, the MSI doorbell was in the PCI host
> > > > > bridge window and possibly transactions towards the doorbell could not
> > > > > reach it since considered as peer to peer.
> > > > 
> > > > I found back Robin's explanation. It was not related to MSI IOVA being
> > > > within the PCI host bridge window but RAM GPA colliding with host PCI
> > > > config space?
> > > > 
> > > > "MSI doorbells integral to PCIe root complexes (and thus untranslatable)
> > > > typically have a programmable address, so could be anywhere. In the more
> > > > general category of "special hardware addresses", QEMU's default ARM
> > > > guest memory map puts RAM starting at 0x4000; on the ARM Juno
> > > > platform, that happens to be where PCI config space starts; as Juno's
> > > > PCIe doesn't support ACS, peer-to-peer or anything clever, if you assign
> > > > the PCI bus to a guest (all of it, given the lack of ACS), the root
> > > > complex just sees the guest's attempts to DMA to "memory" as the device
> > > > attempting to access config space and aborts them."
> > > 
> > > Below is some following investigation at my side:
> > > 
> > > Firstly, must admit that I don't understand well for up paragraph; so
> > > based on the description I am wandering if can use INTx mode and if
> > > it's lucky to avoid this hardware pitfall.
> > 
> > The problem above is that during the assignment process, the virtualizer
> > maps the whole guest RAM though the IOMMU (+ the MSI doorbell on ARM) to
> > allow the device, programmed in GPA to access the whole guest RAM.
> > Unfortunately if the device emits a DMA request with 0x4000 IOVA
> > address, this IOVA is interpreted by the Juno RC as a transaction
> > towards the PCIe config space. So this DMA request will not go beyond
> > the RC, will never reach the IOMMU and will never reach the guest RAM.
> > So globally the device is not able to reach part of the guest RAM.
> > That's how I interpret the above statement. Then I don't know the
> > details of the collision, I don't have access to this HW. I don't know
> > either if this problem still exists on the r2 HW.

Thanks a lot for rephrasing, Eric :)

> The short answer is that if you want PCI passthrough to work on Juno, the
> guest memory map has to look like a Juno.
> 
> The PCIe root complex uses an internal lookup table to generate appropriate
> AXI attributes for outgoing PCIe transactions; unfortunately this has no
> notion of 'default' attributes, so addresses *must* match one of the
> programmed windows in order to be valid. From memory, EDK2 sets up a 2GB
> window covering the lower DRAM bank, an 8GB window covering the upper DRAM
> bank, and a 1MB (or thereabouts) window covering the GICv2m region with
> Device attributes.

I checked the kernel memory block info; it gives the result below:

root@debian:~# cat /sys/kernel/debug/memblock/memory
   0: 0x8000..0xfeff
   1: 0x00088000..0x0009

So I think the lower 2GB DRAM window is: [0x8000_..0xfeff_]
and the high DRAM window is [0x8_8000_..0x9__].

BTW, I am now using U-Boot rather than UEFI, so I am not sure whether
U-Boot has programmed memory windows for PCIe.  Could you point out
which registers should be set in UEFI, so that I can also check the
related configuration in U-Boot?

> Any PCIe transactions to addresses not within one of
> those windows will be aborted by the RC without ever going out to the AXI
> side where the SMMU lies (and I think anything matching the config space or
> I/O space windows or a r

Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-15 Thread Leo Yan
Hi Eric, Robin,

On Wed, Mar 13, 2019 at 11:24:25AM +0100, Auger Eric wrote:

[...]

> > If the NIC supports MSIs they logically are used. This can be easily
> > checked on host by issuing "cat /proc/interrupts | grep vfio". Can you
> > check whether the guest received any interrupt? I remember that Robin
> > said in the past that on Juno, the MSI doorbell was in the PCI host
> > bridge window and possibly transactions towards the doorbell could not
> > reach it since considered as peer to peer.
> 
> I found back Robin's explanation. It was not related to MSI IOVA being
> within the PCI host bridge window but RAM GPA colliding with host PCI
> config space?
> 
> "MSI doorbells integral to PCIe root complexes (and thus untranslatable)
> typically have a programmable address, so could be anywhere. In the more
> general category of "special hardware addresses", QEMU's default ARM
> guest memory map puts RAM starting at 0x4000; on the ARM Juno
> platform, that happens to be where PCI config space starts; as Juno's
> PCIe doesn't support ACS, peer-to-peer or anything clever, if you assign
> the PCI bus to a guest (all of it, given the lack of ACS), the root
> complex just sees the guest's attempts to DMA to "memory" as the device
> attempting to access config space and aborts them."

Below is some following investigation at my side:

Firstly, I must admit that I don't understand the above paragraph well;
based on the description I am wondering whether INTx mode can be used
and whether we are lucky enough to avoid this hardware pitfall.

But when I wanted to roll back to INTx mode, I found kvmtool has an
issue supporting INTx mode, which is why I wrote the patch [1] to fix
it.  Alternatively, we can also set the NIC driver module parameter
'sky2.disable_msi=1', which totally disables MSI and uses only INTx
mode.

Anyway, I finally got INTx mode enabled, and I can see the interrupt is
registered successfully on both host and guest:

Host side:

   CPU0   CPU1   CPU2   CPU3   CPU4   CPU5
 41:  0  0  0  0  0  0 
GICv2  54 Level arm-pmu
 42:  0  0  0  0  0  0 
GICv2  58 Level arm-pmu
 43:  0  0  0  0  0  0 
GICv2  62 Level arm-pmu
 45:772  0  0  0  0  0 
GICv2 171 Level vfio-intx(:08:00.0)

Guest side:

# cat /proc/interrupts
   CPU0   CPU1   CPU2   CPU3   CPU4   CPU5
 12:  0  0  0  0  0  0 
GIC-0  96 Level eth1

So you can see the host receives the interrupts, but these interrupts
are mainly triggered before binding the vfio-pci driver.  After
launching KVM I can see that a very small number of interrupts are
triggered in the host and that the guest kernel also receives the
virtual interrupts; e.g. if I run 'dhclient eth1' in the guest OS, the
command stalls for a long time (> 1 minute) before returning, and I can
see both the host OS and the guest OS receive 5~6 interrupts.  Based on
this, I guess the flow for interrupt forwarding has been enabled.  But
it seems the data packets are not really sent out; I used wireshark to
capture packets but cannot find any packet output from the NIC.

I did another test, shrinking the memory space/io/bus region to less
than 0x4000 so as to avoid putting guest memory IPA into the 0x4000
region.  But this doesn't work.

@Robin, could you help explain the hardware issue and review whether my
methods are feasible on the Juno board?  Thanks a lot for any
suggestions.

I will dig more into the memory mapping and post here.

Thanks,
Leo Yan

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035055.html


[PATCH kvmtool v1 1/2] vfio-pci: Release INTx's guest to host eventfd properly

2019-03-15 Thread Leo Yan
The PCI device INTx uses the eventfd 'unmask_fd' to signal the
deassertion of the line from guest to host; but this eventfd isn't
released properly when INTx is disabled.

When disabling INTx, this patch first unbinds the interrupt signal by
calling the ioctl VFIO_DEVICE_SET_IRQS, and then uses the newly added
field 'unmask_fd' in struct vfio_pci_device to close the eventfd.

Signed-off-by: Leo Yan 
---
 include/kvm/vfio.h |  1 +
 vfio/pci.c | 15 ---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/kvm/vfio.h b/include/kvm/vfio.h
index 60e6c54..28223cf 100644
--- a/include/kvm/vfio.h
+++ b/include/kvm/vfio.h
@@ -74,6 +74,7 @@ struct vfio_pci_device {
 
unsigned long   irq_modes;
int intx_fd;
+   int unmask_fd;
unsigned intintx_gsi;
struct vfio_pci_msi_common  msi;
struct vfio_pci_msi_common  msix;
diff --git a/vfio/pci.c b/vfio/pci.c
index 03de3c1..c0683f6 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -996,18 +996,26 @@ static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev)
 {
struct vfio_pci_device *pdev = &vdev->pci;
int gsi = pdev->intx_gsi;
-   struct vfio_irq_set irq_set = {
-   .argsz  = sizeof(irq_set),
+   struct vfio_irq_set trigger_irq = {
+   .argsz  = sizeof(trigger_irq),
.flags  = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
.index  = VFIO_PCI_INTX_IRQ_INDEX,
};
 
+   struct vfio_irq_set unmask_irq = {
+   .argsz  = sizeof(unmask_irq),
+   .flags  = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+   .index  = VFIO_PCI_INTX_IRQ_INDEX,
+   };
+
pr_debug("user requested MSI, disabling INTx %d", gsi);
 
-   ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+   ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &trigger_irq);
+   ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &unmask_irq);
irq__del_irqfd(kvm, gsi, pdev->intx_fd);
 
close(pdev->intx_fd);
+   close(pdev->unmask_fd);
 }
 
 static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
@@ -1095,6 +1103,7 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
}
 
pdev->intx_fd = trigger_fd;
+   pdev->unmask_fd = unmask_fd;
/* Guest is going to ovewrite our irq_line... */
pdev->intx_gsi = gsi;
 
-- 
2.19.1



[PATCH kvmtool v1 2/2] vfio-pci: Fallback to INTx mode when disable MSI/MSIX

2019-03-15 Thread Leo Yan
Since PCI forbids enabling INTx, MSI or MSIX at the same time, INTx mode
is disabled by default when MSI/MSIX mode is enabled; but this logic is
easily broken if the guest PCI driver detects that MSI/MSIX cannot work
as expected and tries to roll back to INTx mode.  At that point INTx mode
has already been disabled and has no chance to be enabled again, so
neither INTx mode nor MSI/MSIX mode is enabled in vfio for this case.

Below is the detailed flow that introduces this issue:

  vfio_pci_configure_dev_irqs()
`-> vfio_pci_enable_intx()

  vfio_pci_enable_msis()
`-> vfio_pci_disable_intx()

  vfio_pci_disable_msis()   => Guest PCI driver disables MSI

To fix this issue, when disabling MSI/MSIX we need to check whether INTx
mode is available for this device; if the device supports INTx, re-enable
it so the device can fall back to using it.

Signed-off-by: Leo Yan 
---
 vfio/pci.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index c0683f6..44727bb 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
 
 static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev);
+static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
 
 static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
bool msix)
@@ -50,7 +51,7 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
if (!msi_is_enabled(msis->virt_state))
return 0;
 
-   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
/*
 * PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
 * time. Since INTx has to be enabled from the start (we don't
@@ -58,9 +59,6 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
 * disable it now.
 */
vfio_pci_disable_intx(kvm, vdev);
-   /* Permanently disable INTx */
-   pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
-   }
 
eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
 
@@ -162,7 +160,16 @@ static int vfio_pci_disable_msis(struct kvm *kvm, struct vfio_device *vdev,
msi_set_enabled(msis->phys_state, false);
msi_set_empty(msis->phys_state, true);
 
-   return 0;
+   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
+   /*
+* When MSI or MSIX is disabled, this might be called because the
+* PCI driver detected an MSI interrupt failure and wants to roll
+* back to INTx mode.  Thus enable INTx in this case if the
+* device supports it.
+*/
+   ret = vfio_pci_enable_intx(kvm, vdev);
+
+   return ret >= 0 ? 0 : ret;
 }
 
 static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device *vdev,
-- 
2.19.1



Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-13 Thread Leo Yan
On Wed, Mar 13, 2019 at 11:24:25AM +0100, Auger Eric wrote:

[...]

> >> I am stucking at the network card cannot receive interrupts in guest
> >> OS.  So took time to look into the code and added some printed info to
> >> help me to understand the detailed flow, below are two main questions
> >> I am confused with them and need some guidance:
> >>
> >> - The first question is about the msi usage in network card driver;
> >>   when review the sky2 network card driver [1], it has function
> >>   sky2_test_msi() which is used to decide if can use msi or not.
> >>
> >>   The interesting thing is this function will firstly request irq for
> >>   the interrupt and based on the interrupt handler to read back
> >>   register and then can make decision if msi is avalible or not.
> >>
> >>   This can work well for host OS, but if we want to passthrough this
> >>   device to guest OS, since the KVM doesn't prepare the interrupt for
> >>   sky2 drivers (no injection or forwarding) thus at this point the
> >>   interrupt handle will not be invorked.  At the end the driver will
> >>   not set flag 'hw->flags |= SKY2_HW_USE_MSI' and this results to not
> >>   use msi in guest OS and rollback to INTx mode.
> >>
> >>   My first impression is if we passthrough the devices to guest OS in
> >>   KVM, the PCI-e device can directly use msi;  I tweaked a bit for the
> >>   code to check status value after timeout, so both host OS and guest
> >>   OS can set the flag for msi.
> >>
> >>   I want to confirm, if this is the recommended mode for
> >>   passthrough PCI-e device to use msi both in host OS and geust OS?
> >>   Or it's will be fine for host OS using msi and guest OS using
> >>   INTx mode?
> > 
> > If the NIC supports MSIs they logically are used. This can be easily
> > checked on host by issuing "cat /proc/interrupts | grep vfio". Can you
> > check whether the guest received any interrupt? I remember that Robin
> > said in the past that on Juno, the MSI doorbell was in the PCI host
> > bridge window and possibly transactions towards the doorbell could not
> > reach it since considered as peer to peer.
> 
> I found back Robin's explanation. It was not related to MSI IOVA being
> within the PCI host bridge window but RAM GPA colliding with host PCI
> config space?
> 
> "MSI doorbells integral to PCIe root complexes (and thus untranslatable)
> typically have a programmable address, so could be anywhere. In the more
> general category of "special hardware addresses", QEMU's default ARM
> guest memory map puts RAM starting at 0x4000; on the ARM Juno
> platform, that happens to be where PCI config space starts; as Juno's
> PCIe doesn't support ACS, peer-to-peer or anything clever, if you assign
> the PCI bus to a guest (all of it, given the lack of ACS), the root
> complex just sees the guest's attempts to DMA to "memory" as the device
> attempting to access config space and aborts them."

Thanks a lot for the info, Eric.

It seems to me this issue can be bypassed by using another memory
address rather than 0x4000 for the kernel IPA, thus avoiding a
collision with the host PCI config space.

Robin, just curious: have you tried changing the guest memory start
address to bypass this issue?  Or tried kvmtool on the Juno-r2 board
(e.g. kvmtool uses 0x400 for the AXI bus and 0x8000 for RAM; we could do
some quick shrinking so as not to touch the 0x4000 region)?

Thanks,
Leo Yan


Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-13 Thread Leo Yan
Hi Eric,

On Wed, Mar 13, 2019 at 11:01:33AM +0100, Auger Eric wrote:

[...]

> >   I want to confirm, if this is the recommended mode for
> >   passthrough PCI-e device to use msi both in host OS and geust OS?
> >   Or it's will be fine for host OS using msi and guest OS using
> >   INTx mode?
> 
> If the NIC supports MSIs they logically are used. This can be easily
> checked on host by issuing "cat /proc/interrupts | grep vfio". Can you
> check whether the guest received any interrupt? I remember that Robin
> said in the past that on Juno, the MSI doorbell was in the PCI host
> bridge window and possibly transactions towards the doorbell could not
> reach it since considered as peer to peer. Using GICv2M should not bring
> any performance issue. I tested that in the past with Seattle board.

I can see the info below on the host after launching KVM:

root@debian:~# cat /proc/interrupts | grep vfio
 46:  0  0  0  0  0  0   
MSI 4194304 Edge  vfio-msi[0](:08:00.0)

And below is interrupts in guest:

# cat /proc/interrupts
   CPU0   CPU1   CPU2   CPU3   CPU4   CPU5
  3:506400281403298330 
GIC-0  27 Level arch_timer
  5:768  0  0  0  0  0 
GIC-0 101 Edge  virtio0
  6:246  0  0  0  0  0 
GIC-0 102 Edge  virtio1
  7:  2  0  0  0  0  0 
GIC-0 103 Edge  virtio2
  8:210  0  0  0  0  0 
GIC-0  97 Level ttyS0
 13:  0  0  0  0  0  0   
MSI   0 Edge  eth1

> > - The second question is for GICv2m.  If I understand correctly, when
> >   passthrough PCI-e device to guest OS, in the guest OS we should
> >   create below data path for PCI-e devices:
> > ++
> >  -> | Memory |
> > +---++--++---+  /   ++
> > | Net card  | -> | PCI-e controller | -> | IOMMU | -
> > +---++--++---+  \   ++
> >  -> | MSI|
> > | frame  |
> > ++
> > 
> >   Since now the master is network card/PCI-e controller but not CPU,
> >   thus there have no 2 stages for memory accessing (VA->IPA->PA).  In
> >   this case, if we configure IOMMU (SMMU) for guest OS for address
> >   translation before switch from host to guest, right?  Or SMMU also
> >   have two stages memory mapping?
> 
> In your use case you don't have any virtual IOMMU. So the guest programs
> the assigned device with guest physical addresses, and the virtualizer uses
> the physical IOMMU to translate each GPA into the host physical address
> backing the guest RAM and the MSI frame. A single stage of the physical
> IOMMU is used (stage 1).

Thanks a lot for the explanation.

> >   Another thing that confuses me: I can see the MSI frame is mapped to
> >   the GIC's physical address in the host OS, thus the PCI-e device can
> >   send messages correctly to the MSI frame.  But for the guest OS, the
> >   MSI frame is mapped to an IPA memory region, and this region is used
> >   to emulate the GICv2 MSI frame rather than the hardware MSI frame;
> >   thus will any access from PCI-e to this region trap to the
> >   hypervisor on the CPU side so that the KVM hypervisor can help
> >   emulate (and inject) the interrupt for the guest OS?
> 
> when the device sends an MSI it uses a host allocated IOVA for the
> physical MSI doorbell. This gets translated by the physical IOMMU,
> reaches the physical doorbell. The physical GICv2m triggers the
> associated physical SPI -> kvm irqfd -> virtual IRQ
> With GICv2M we have direct GSI mapping on guest.

Just want to confirm: in your elaborated flow, the virtual IRQ will be
injected by qemu (or kvmtool) every time, but there is no need to
interfere with the IRQ's deactivation, right?

> >   Essentially, I want to check what's the expected behaviour for GICv2
> >   msi frame working mode when we want to passthrough one PCI-e device
> >   to guest OS and the PCI-e device has one static msi frame for it.
> 
> Your config was tested in the past with Seattle (not with sky2 NIC
> though). Adding Robin for the peer to peer potential concern.

I really appreciate your help.

Thanks,
Leo Yan
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-13 Thread Leo Yan
On Wed, Mar 13, 2019 at 04:00:48PM +0800, Leo Yan wrote:

[...]

> - The second question is for GICv2m.  If I understand correctly, when
>   we pass through a PCI-e device to the guest OS, we should create the
>   below data path for the PCI-e device:
>                                                     +--------+
>                                                  -> | Memory |
> +-----------+    +------------------+    +-------+  +--------+
> | Net card  | -> | PCI-e controller | -> | IOMMU |
> +-----------+    +------------------+    +-------+  +--------+
>                                                  -> | MSI    |
>                                                     | frame  |
>                                                     +--------+
> 
>   Since the master is now the network card/PCI-e controller rather
>   than the CPU, there are no two stages for memory accesses
>   (VA->IPA->PA).  In this case, do we configure the IOMMU (SMMU) for
>   the guest OS for address translation before switching from host to
>   guest?  Or does the SMMU also have two-stage memory mapping?
> 
>   Another thing that confuses me: I can see the MSI frame is mapped to
>   the GIC's physical address in the host OS, thus the PCI-e device can
>   send messages correctly to the MSI frame.  But for the guest OS, the
>   MSI frame is mapped to an IPA memory region, and this region is used
>   to emulate the GICv2 MSI frame rather than the hardware MSI frame;
>   thus will any access from PCI-e to this region trap to the
>   hypervisor on the CPU side so that the KVM hypervisor can help
>   emulate (and inject) the interrupt for the guest OS?
> 
>   Essentially, I want to check what's the expected behaviour for GICv2
>   msi frame working mode when we want to passthrough one PCI-e device
>   to guest OS and the PCI-e device has one static msi frame for it.

From the blog [1], there is the below explanation for my question about
mapping the IOVA and the hardware MSI address.  But I searched for the
flag VFIO_DMA_FLAG_MSI_RESERVED_IOVA and it isn't found in the mainline
kernel; I might have missed something, so I want to check whether the
related patches have been merged into the mainline kernel:

'We reuse the VFIO DMA MAP ioctl to pass this reserved IOVA region. A
new flag (VFIO_DMA_FLAG_MSI_RESERVED_IOVA) is introduced to
differentiate such reserved IOVA from RAM IOVA. Then the base/size of
the window is passed to the IOMMU driver through a new function
introduced in the IOMMU API.

The IOVA allocation within the supplied reserved IOVA window is
performed on-demand, when the MSI controller composes/writes the MSI
message in the PCIe device. Also the IOMMU mapping between the newly
allocated IOVA and the backdoor address page is done at that time. The
MSI controller uses a new function introduced in the IOMMU API to
allocate the IOVA and create an IOMMU mapping.
 
So there are adaptations needed at VFIO, IOMMU and MSI controller
level. The extension of the IOMMU API still is under discussion. Also
changes at MSI controller level need to be consolidated.'
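For what it's worth, I could not find VFIO_DMA_FLAG_MSI_RESERVED_IOVA in mainline either; as far as I can tell the merged design instead reserves a per-IOMMU-group software MSI window, which shows up in sysfs under reserved_regions. A sketch for checking it (the sysfs root is a parameter so it can be dry-run against a mock tree; the exact region values and formatting vary by kernel):

```shell
# Print the MSI reserved regions of every IOMMU group under a sysfs
# root (default: the real /sys).  On kernels with the software-MSI
# reservation, output lines look roughly like:
#   .../reserved_regions:0x08000000 0x080fffff msi
msi_windows() {
  root=${1:-}
  for f in "$root"/sys/kernel/iommu_groups/*/reserved_regions; do
    [ -e "$f" ] || continue
    grep -H msi "$f" || true   # a group may have no MSI region
  done
  return 0
}

msi_windows   # inspect the running host
```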

P.s. I also tried both tools, qemu and kvmtool; neither can deliver
interrupts for the network card in the guest OS.

Thanks,
Leo Yan

[1] https://www.linaro.org/blog/kvm-pciemsi-passthrough-armarm64/


Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-13 Thread Leo Yan
Hi Eric & all,

On Mon, Mar 11, 2019 at 10:35:01PM +0800, Leo Yan wrote:

[...]

> So now I have made some progress and can see the network card is
> passed through to the guest OS, though the network card reports errors
> now.  Below are the detailed steps and info:
> 
> - Bind devices in the same IOMMU group to the vfio driver:
> 
>   echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
>   echo 1095 3132 > /sys/bus/pci/drivers/vfio-pci/new_id
> 
>   echo 0000:08:00.0 > /sys/bus/pci/devices/0000\:08\:00.0/driver/unbind
>   echo 11ab 4380 > /sys/bus/pci/drivers/vfio-pci/new_id
> 
> - Enable 'allow_unsafe_interrupts=1' for module vfio_iommu_type1;
> 
> - Use qemu to launch guest OS:
> 
>   qemu-system-aarch64 \
> -cpu host -M virt,accel=kvm -m 4096 -nographic \
> -kernel /root/virt/Image -append root=/dev/vda2 \
> -net none -device vfio-pci,host=08:00.0 \
> -drive if=virtio,file=/root/virt/qemu/debian.img \
> -append 'loglevel=8 root=/dev/vda2 rw console=ttyAMA0 earlyprintk ip=dhcp'
> 
> - Host log:
> 
> [  188.329861] vfio-pci 0000:08:00.0: enabling device (0000 -> 0003)
> 
> - Below is the guest log; from the log the driver has been registered,
>   but it reports a PCI hardware failure and a timeout for the interrupt.
> 
>   So is this caused by very 'slow' interrupt forwarding?  The Juno
>   board uses GICv2 (I think it has the GICv2m extension).
> 
> [...]
> 
> [1.024483] sky2 0000:00:01.0 eth0: enabling interface
> [1.026822] sky2 0000:00:01.0: error interrupt status=0x8000
> [1.029155] sky2 0000:00:01.0: PCI hardware error (0x1010)
> [4.000699] sky2 0000:00:01.0 eth0: Link is up at 1000 Mbps, full duplex, flow control both
> [4.026116] Sending DHCP requests .
> [4.026201] sky2 0000:00:01.0: error interrupt status=0x8000
> [4.030043] sky2 0000:00:01.0: PCI hardware error (0x1010)
> [6.546111] ..
> [   14.118106] [ cut here ]
> [   14.120672] NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
> [   14.123555] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461 
> dev_watchdog+0x2b4/0x2c0
> [   14.127082] Modules linked in:
> [   14.128631] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 5.0.0-rc8-00061-ga98f9a047756-dirty #
> [   14.132800] Hardware name: linux,dummy-virt (DT)
> [   14.135082] pstate: 6005 (nZCv daif -PAN -UAO)
> [   14.137459] pc : dev_watchdog+0x2b4/0x2c0
> [   14.139457] lr : dev_watchdog+0x2b4/0x2c0
> [   14.141351] sp : 10003d70
> [   14.142924] x29: 10003d70 x28: 112f60c0
> [   14.145433] x27: 0140 x26: 8000fa6eb3b8
> [   14.147936] x25:  x24: 8000fa7a7c80
> [   14.150428] x23: 8000fa6eb39c x22: 8000fa6eafb8
> [   14.152934] x21: 8000fa6eb000 x20: 112f7000
> [   14.155437] x19:  x18: 
> [   14.157929] x17:  x16: 
> [   14.160432] x15: 112fd6c8 x14: 90003a97
> [   14.162927] x13: 10003aa5 x12: 11315878
> [   14.165428] x11: 11315000 x10: 05f5e0ff
> [   14.167935] x9 : ffd0 x8 : 64656d6974203020
> [   14.170430] x7 : 6575657571207469 x6 : 00e3
> [   14.172935] x5 :  x4 : 
> [   14.175443] x3 :  x2 : 113158a8
> [   14.177938] x1 : f2db9128b1f08600 x0 : 
> [   14.180443] Call trace:
> [   14.181625]  dev_watchdog+0x2b4/0x2c0
> [   14.183377]  call_timer_fn+0x20/0x78
> [   14.185078]  expire_timers+0xa4/0xb0
> [   14.186777]  run_timer_softirq+0xa0/0x190
> [   14.188687]  __do_softirq+0x108/0x234
> [   14.190428]  irq_exit+0xcc/0xd8
> [   14.191941]  __handle_domain_irq+0x60/0xb8
> [   14.193877]  gic_handle_irq+0x58/0xb0
> [   14.195630]  el1_irq+0xb0/0x128
> [   14.197132]  arch_cpu_idle+0x10/0x18
> [   14.198835]  do_idle+0x1cc/0x288
> [   14.200389]  cpu_startup_entry+0x24/0x28
> [   14.202251]  rest_init+0xd4/0xe0
> [   14.203804]  arch_call_rest_init+0xc/0x14
> [   14.205702]  start_kernel+0x3d8/0x404
> [   14.207449] ---[ end trace 65449acd5c054609 ]---
> [   14.209630] sky2 0000:00:01.0 eth0: tx timeout
> [   14.211655] sky2 0000:00:01.0 eth0: transmit ring 0 .. 3 report=0 done=0
> [   17.906956] sky2 0000:00:01.0 eth0: Link is up at 1000 Mbps, full duplex, flow control both

I am stuck at the point where the network card cannot receive interrupts
in the guest OS.  So I took time to look into the code and added some
debug prints to help me understand the detailed flow; below are the two
main questions that confuse me and for which I need some guidance:

- The first question is about the MSI usage

Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-11 Thread Leo Yan
Hi Auger,

On Mon, Mar 11, 2019 at 09:23:20AM +0100, Auger Eric wrote:

[...]

> > P.s. I also checked the sysfs node and found it doesn't contain the
> > node 'iommu_group':
> > 
> > # ls /sys/bus/pci/devices/0000\:08\:00.0/iommu_group
> > ls: cannot access '/sys/bus/pci/devices/0000:08:00.0/iommu_group': No
> > such file or directory
> 
> please can you give the output of the following command:
> find /sys/kernel/iommu_groups/

I get the below result on the Juno board:

root@debian:~# find /sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/1
/sys/kernel/iommu_groups/1/devices
/sys/kernel/iommu_groups/1/devices/2007.etr
/sys/kernel/iommu_groups/1/type
/sys/kernel/iommu_groups/1/reserved_regions
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/7ffb.ohci
/sys/kernel/iommu_groups/0/devices/7ffc.ehci
/sys/kernel/iommu_groups/0/type
/sys/kernel/iommu_groups/0/reserved_regions

So the 'iommu_group' is not created for the PCI-e devices, right?

I will debug into the DT binding and related code, and keep you posted here.
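To see at a glance which PCI devices got an IOMMU group and which didn't, a small sketch can help (the sysfs root is parameterized for a dry run; the layout assumed is the stock sysfs one):

```shell
# For every PCI device, print "<bdf> <group>" where <group> is the IOMMU
# group number, or "none" when the iommu_group link is missing (the
# failing case in this thread).
pci_groups() {
  root=${1:-}
  for d in "$root"/sys/bus/pci/devices/*; do
    [ -e "$d" ] || continue
    if [ -e "$d/iommu_group" ]; then
      g=$(basename "$(readlink "$d/iommu_group")")
    else
      g=none
    fi
    printf '%s %s\n' "$(basename "$d")" "$g"
  done
  return 0
}

pci_groups   # inspect the running host
```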

> when booting your host without noiommu=true
> 
> At first sight I would say you have trouble with your iommu groups.

Thanks a lot for the guidance.
Leo Yan


Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-11 Thread Leo Yan
On Mon, Mar 11, 2019 at 02:42:48PM +0800, Leo Yan wrote:
> Hi all,
> 
> I am trying to enable PCI-e device pass-through mode with KVM, since
> Juno-r2 board has PCI-e bus so I firstly try to use vfio to
> passthrough the network card on PCI-e bus.

Sorry for spamming; I just want to add info about the Linux kernel version.

I am using Linux mainline kernel 5.0-rc7 with the latest commit:

commit a215ce8f0e00c2d707080236f1aafec337371043 (origin/master, origin/HEAD)
Merge: 2d28e01dca1a cffaaf0c8162
Author: Linus Torvalds 
Date:   Fri Mar 1 09:13:04 2019 -0800

Merge tag 'iommu-fix-v5.0-rc8' of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fix from Joerg Roedel:
 "One important fix for a memory corruption issue in the Intel VT-d
  driver that triggers on hardware with deep PCI hierarchies"

* tag 'iommu-fix-v5.0-rc8' of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/dmar: Fix buffer overflow during PCI bus notification

[...]

Thanks,
Leo Yan


Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-11 Thread Leo Yan
Hi all,

I am trying to enable PCI-e device pass-through mode with KVM.  Since
the Juno-r2 board has a PCI-e bus, I first try to use vfio to pass
through the network card on the PCI-e bus.

According to the Juno-r2 board TRM [1], there is a CoreLink MMU-401
(SMMU) between the PCI-e devices and the CCI bus; IIUC, the PCI-e
device and the SMMU can be used by vfio for address isolation, and from
the hardware perspective this is sufficient to support pass-through mode.

I followed Eric's blog [2] for 'VFIO-PCI driver binding', so I
executed the below commands on the Juno-r2 board:

  echo vfio-pci > /sys/bus/pci/devices/0000\:08\:00.0/driver_override
  echo 0000:08:00.0 > /sys/bus/pci/drivers/sky2/unbind
  echo 0000:08:00.0 > /sys/bus/pci/drivers_probe
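The three rebind steps can be wrapped into a small helper; a sketch (device and driver names are the ones from this thread; the optional root argument points the writes at a mock tree for a dry run — omit it on the real board, running as root):

```shell
# Rebind one PCI device from its current driver to vfio-pci.
# $1 = device BDF (e.g. 0000:08:00.0), $2 = current driver (e.g. sky2),
# $3 = optional sysfs root prefix for a dry run.
to_vfio() {
  bdf=$1 drv=$2 root=${3:-}
  echo vfio-pci > "$root/sys/bus/pci/devices/$bdf/driver_override"
  echo "$bdf"   > "$root/sys/bus/pci/drivers/$drv/unbind"
  echo "$bdf"   > "$root/sys/bus/pci/drivers_probe"
}

# On the board: to_vfio 0000:08:00.0 sky2
```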

But the last command, for vfio probing, reports a failure as below:

[   21.553889] sky2 0000:08:00.0 enp8s0: disabling interface
[   21.616720] vfio-pci: probe of 0000:08:00.0 failed with error -22

I looked into the code: although 'dev->bus->iommu_ops' points to the
data structure 'arm_smmu_ops', 'dev->iommu_group' is NULL, thus the
probe function returns failure with the below flow:

  vfio_pci_probe()
    `-> vfio_iommu_group_get()
          `-> iommu_group_get()
                `-> return NULL;

Alternatively, if I enable the kconfig CONFIG_VFIO_NOIOMMU and set the
global variable 'noiommu' = true, the probe function still returns an
error: since iommu_present(dev->bus) returns true (dev->bus->iommu_ops
is 'arm_smmu_ops'), the code runs into the below logic:

vfio_iommu_group_get()
{
group = iommu_group_get(dev);

#ifdef CONFIG_VFIO_NOIOMMU

/*
 * With noiommu enabled, an IOMMU group will be created for a device
 * that doesn't already have one and doesn't have an iommu_ops on their
 * bus.  We set iommudata simply to be able to identify these groups
 * as special use and for reclamation later.
 */
        if (group || !noiommu || iommu_present(dev->bus))
                return group;   /* ==> returns 'group', which is NULL */

[...]
}

So whether using the SMMU or the kernel config CONFIG_VFIO_NOIOMMU,
neither can bind the vfio driver for the network card device on the
Juno-r2 board.

P.s. I also checked the sysfs node and found it doesn't contain the
node 'iommu_group':

# ls /sys/bus/pci/devices/0000\:08\:00.0/iommu_group
ls: cannot access '/sys/bus/pci/devices/0000:08:00.0/iommu_group': No
such file or directory

Could you give some suggestions so that I can proceed?  I would
appreciate any comment.

Thanks,
Leo Yan

[1] 
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0515f/DDI0515F_juno_arm_development_platform_soc_trm.pdf
[2] https://www.linaro.org/blog/kvm-pciemsi-passthrough-armarm64/


Re: [PATCH v1 2/4] KVM: arm/arm64: vgic: Improve comment on kvm_vgic_inject_irq

2019-02-26 Thread Leo Yan
On Fri, Feb 22, 2019 at 03:40:50PM +, Marc Zyngier wrote:

[...]

> > > The interrupt affinity is either defined by the distributor
> > > configuration (SPIs) or the ITS configuration (LPIs).
> > 
> > Given the above example, I am struggling to understand how you can set
> > the interrupt affinity for a virtio device.
> > 
> > Do you set the physical interrupt affinity to CPU0/1 in the host OS
> > and forward it to the guest OS?  Or set the interrupt affinity in the
> > guest OS (I tried on the Juno board to set the irq affinity in the
> > guest OS via '/proc/irq/xxx/smp_affinity' but failed)?  Or is this
> > accomplished by a user space tool (lkvm or qemu)?
> 
> virtio interrupts are purely virtual, and the host plays no part in
> their routing (nor does userspace). As for their affinity, that
> depends on the virtio driver. Some virtio devices allow their affinity
> to be changed, some don't. Here, this is a virtio-net device, which is
> perfectly happy to see its queue interrupts moved to a different vcpu.
> 
> I tend to run irqbalance in my guests so that it actually exercises
> the affinity setting in the background.
> 
> > Sorry if I am asking a stupid question :)
> 
> It's not stupid. You're simply confusing multiple independent layers.

Thanks a lot for the explanation, Marc.

Thanks,
Leo Yan


[PATCH v1 0/4] ARM64/KVM: Minor cleanup and refactoring

2019-02-22 Thread Leo Yan
While reading the KVM/ARM64 related code I could not find any issue or
anything that could be improved for performance, so this patch series is
only for minor cleanup and refactoring of the code.

I hope this is helpful and can be picked up into the mainline kernel;
otherwise it's also fine with me if the maintainer or main developers
could note these points and fix them at an appropriate time.


Leo Yan (4):
  KVM: arm64: Use macro to replace hard number
  KVM: arm/arm64: vgic: Improve comment on kvm_vgic_inject_irq
  KVM: arm/arm64: Define TCR_EL2_T0SZ_MASK as TCR_T0SZ_MASK
  KVM: arm/arm64: Fix comment on create_hyp_mappings()

 arch/arm64/include/asm/kvm_arm.h | 2 +-
 arch/arm64/kernel/head.S | 2 +-
 virt/kvm/arm/mmu.c   | 2 +-
 virt/kvm/arm/vgic/vgic.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.17.1



Re: [PATCH v1 2/4] KVM: arm/arm64: vgic: Improve comment on kvm_vgic_inject_irq

2019-02-22 Thread Leo Yan
Hi Marc,

On Fri, Feb 22, 2019 at 08:37:56AM +, Marc Zyngier wrote:
> On Fri, 22 Feb 2019 16:23:24 +0800
> Leo Yan  wrote:
> 
> > The function kvm_vgic_inject_irq() is not only used for PPIs but can
> > also be used to inject interrupts for SPIs; this patch improves the
> > comment for the argument @cpuid to reflect that SPIs are supported as
> > well.
> > 
> > Signed-off-by: Leo Yan 
> > ---
> >  virt/kvm/arm/vgic/vgic.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 7cfdfbc910e0..79fe64c15051 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -394,7 +394,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> >  /**
> >   * kvm_vgic_inject_irq - Inject an IRQ from a device to the vgic
> >   * @kvm: The VM structure pointer
> > - * @cpuid:   The CPU for PPIs
> > + * @cpuid:   The CPU for PPIs and SPIs
> >   * @intid:   The INTID to inject a new state to.
> >   * @level:   Edge-triggered:  true:  to trigger the interrupt
> >   *   false: to ignore the call
> 
> What does the CPU mean for SPIs? By definition, the routing of an SPI
> is defined by the distributor configuration.

In the code, KVM injects PPIs by specifying the CPU id, so that every
PPI is bound to a specific target CPU.  But for SPIs, it always passes
'0' for cpuid; from my understanding this means the VM sets the
interrupt affinity to VCPU0 by default.  In theory we could also set a
different cpuid for SPIs so that the SPIs can be handled by other
secondary VCPUs; this is why I think @cpuid can also be used by SPIs.

> And what about LPIs? SGIs?

TBH, I don't know LPIs and haven't used them before.

For SGIs, I read the code; KVM uses a different path to handle SGIs,
so kvm_vgic_inject_irq() is not used for them.

> I'm afraid you've misunderstood what cpuid is for.

Thanks for the guidance.

Thanks,
Leo Yan


[PATCH v1 1/4] KVM: arm64: Use macro to replace hard number

2019-02-22 Thread Leo Yan
Use a macro for the ID_AA64MMFR1_EL1.VH bit shift instead of the
hard-coded number 8.

Signed-off-by: Leo Yan 
---
 arch/arm64/kernel/head.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 4471f570a295..3ac377e9fd28 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -490,7 +490,7 @@ ENTRY(el2_setup)
 * kernel is intended to run at EL2.
 */
 	mrs	x2, id_aa64mmfr1_el1
-	ubfx	x2, x2, #8, #4
+	ubfx	x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
 #else
 	mov	x2, xzr
 #endif
-- 
2.17.1



[PATCH v1 4/4] KVM: arm/arm64: Fix comment on create_hyp_mappings()

2019-02-22 Thread Leo Yan
Since commit f7bec68d2fae ("arm/arm64: KVM: Prune unused #defines"), the
macro HYP_PAGE_OFFSET has been removed, but it's kept in the comment for
the function create_hyp_mappings().

This patch uses PAGE_OFFSET to replace HYP_PAGE_OFFSET, which is
consistent with what the function actually does.

Signed-off-by: Leo Yan 
---
 virt/kvm/arm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 5eca48bdb1a6..9634f47c2c8b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -734,7 +734,7 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
  * @prot:  The protection to be applied to this range
  *
  * The same virtual address as the kernel virtual address is also used
- * in Hyp-mode mapping (modulo HYP_PAGE_OFFSET) to the same underlying
+ * in Hyp-mode mapping (modulo PAGE_OFFSET) to the same underlying
  * physical pages.
  */
 int create_hyp_mappings(void *from, void *to, pgprot_t prot)
-- 
2.17.1



[PATCH v1 2/4] KVM: arm/arm64: vgic: Improve comment on kvm_vgic_inject_irq

2019-02-22 Thread Leo Yan
The function kvm_vgic_inject_irq() is not only used for PPIs but can
also be used to inject interrupts for SPIs; this patch improves the
comment for the argument @cpuid to reflect that SPIs are supported as
well.

Signed-off-by: Leo Yan 
---
 virt/kvm/arm/vgic/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 7cfdfbc910e0..79fe64c15051 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -394,7 +394,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
 /**
  * kvm_vgic_inject_irq - Inject an IRQ from a device to the vgic
  * @kvm: The VM structure pointer
- * @cpuid:   The CPU for PPIs
+ * @cpuid:   The CPU for PPIs and SPIs
  * @intid:   The INTID to inject a new state to.
  * @level:   Edge-triggered:  true:  to trigger the interrupt
  *   false: to ignore the call
-- 
2.17.1



[PATCH v1 3/4] KVM: arm/arm64: Define TCR_EL2_T0SZ_MASK as TCR_T0SZ_MASK

2019-02-22 Thread Leo Yan
Define the macro TCR_EL2_T0SZ_MASK as TCR_T0SZ_MASK, so we can remove
the hard-coded number 0x3f.

Signed-off-by: Leo Yan 
---
 arch/arm64/include/asm/kvm_arm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 6f602af5263c..d945a787f36e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -99,7 +99,7 @@
 #define TCR_EL2_SH0_MASK   TCR_SH0_MASK
 #define TCR_EL2_ORGN0_MASK TCR_ORGN0_MASK
 #define TCR_EL2_IRGN0_MASK TCR_IRGN0_MASK
-#define TCR_EL2_T0SZ_MASK  0x3f
+#define TCR_EL2_T0SZ_MASK  TCR_T0SZ_MASK
 #define TCR_EL2_MASK	(TCR_EL2_TG0_MASK | TCR_EL2_SH0_MASK | \
			 TCR_EL2_ORGN0_MASK | TCR_EL2_IRGN0_MASK | \
			 TCR_EL2_T0SZ_MASK)
 
-- 
2.17.1



Re: [PATCH v1 2/4] KVM: arm/arm64: vgic: Improve comment on kvm_vgic_inject_irq

2019-02-22 Thread Leo Yan
On Fri, Feb 22, 2019 at 09:39:23AM +, Marc Zyngier wrote:

[...]

> > > > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > > > index 7cfdfbc910e0..79fe64c15051 100644
> > > > --- a/virt/kvm/arm/vgic/vgic.c
> > > > +++ b/virt/kvm/arm/vgic/vgic.c
> > > > @@ -394,7 +394,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> > > >  /**
> > > >   * kvm_vgic_inject_irq - Inject an IRQ from a device to the vgic
> > > >   * @kvm: The VM structure pointer
> > > > - * @cpuid:   The CPU for PPIs
> > > > + * @cpuid:   The CPU for PPIs and SPIs
> > > >   * @intid:   The INTID to inject a new state to.
> > > >   * @level:   Edge-triggered:  true:  to trigger the interrupt
> > > >   *   false: to ignore the call  
> > > 
> > > What does the CPU mean for SPIs? By definition, the routing of an SPI
> > > is defined by the distributor configuration.  
> > 
> > In the code, KVM injects PPIs by specifying the CPU id, so that every
> > PPI is bound to a specific target CPU.  But for SPIs, it always passes
> > '0' for cpuid; from my understanding this means the VM sets the
> > interrupt affinity to VCPU0 by default.  In theory we could also set a
> > different cpuid for SPIs so that the SPIs can be handled by other
> > secondary VCPUs; this is why I think @cpuid can also be used by SPIs.
> 
> SPIs are not hardcoded to vcpu0. This would be a gross violation of the
> architecture. To convince yourself of this, just run a guest:
> 
> root@unassigned-hostname:~# cat /proc/interrupts 
>CPU0   CPU1   
>   2:   7315   7353 GIC-0  27 Level arch_timer
>   4:158  0 GIC-0  33 Level uart-pl011
>  42:  0  0 GIC-0  23 Level arm-pmu
>  43:  0  0 pl061   3 Edge  ACPI:Event
>  44:  0  0   MSI 32768 Edge  virtio1-config
>  45:  10476  0   MSI 32769 Edge  virtio1-req.0
>  46:  0  0   MSI 16384 Edge  virtio0-config
>  47:  3 10   MSI 16385 Edge  virtio0-input.0
> [...]
> 
> On this last line, you can see an SPI being routed to both of these
> vcpus.
> 
> I urge you to read the code further, and understand that for any other
> interrupt class, the cpuid parameter is *ignored*. Yes, we pass zero in
> that case. We could also pass an approximation of PI with the same
> effect.

I really appreciate the elaborated example; I will read the code
further.

> The interrupt affinity is either defined by the distributor
> configuration (SPIs) or the ITS configuration (LPIs).

Given the above example, I am struggling to understand how you can set
the interrupt affinity for a virtio device.

Do you set the physical interrupt affinity to CPU0/1 in the host OS and
forward it to the guest OS?  Or set the interrupt affinity in the guest
OS (I tried on the Juno board to set the irq affinity in the guest OS
via '/proc/irq/xxx/smp_affinity' but failed)?  Or is this accomplished
by a user space tool (lkvm or qemu)?
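For reference, the guest-side experiment I tried boils down to something like this sketch (the IRQ number is only an example; the proc root is parameterized so the helper can be exercised against a mock tree — on a real guest it needs root, and the write fails if the driver or GIC refuses the mask):

```shell
# Write a CPU affinity mask for one IRQ and read back what took effect.
# $1 = irq number, $2 = hex cpu mask, $3 = optional root (default /proc/irq).
set_irq_affinity() {
  irq=$1 mask=$2 root=${3:-/proc/irq}
  printf '%s\n' "$mask" > "$root/$irq/smp_affinity" || return 1
  cat "$root/$irq/smp_affinity"
}

# In the guest, e.g.: set_irq_affinity 47 2   # move IRQ 47 to vCPU1
```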

Sorry if I am asking a stupid question :)

Thanks,
Leo Yan