Re: [RFC PATCH v0 1/2] spapr: Add H_REG_SNS hcall
On Mon, Aug 09, 2021 at 01:49:54PM +1000, David Gibson wrote: > On Thu, Aug 05, 2021 at 01:02:27PM +0530, Bharata B Rao wrote: > > Add support for H_REG_SNS hcall so that asynchronous page > > fault mechanism can be supported on PowerKVM guests. > > > > This hcall essentially issues KVM_PPC_SET_SNS to let the > > host map and pin the memory containing the Subvention > > Notification Structure. It also claims SPAPR_IRQ_SNS to > > be used as subvention notification interrupt. > > > > Note: Updates to linux-headers/linux/kvm.h are temporary > > pending headers update. > > > > Signed-off-by: Bharata B Rao > > --- > > hw/ppc/spapr.c | 3 ++ > > hw/ppc/spapr_hcall.c| 56 + > > include/hw/ppc/spapr.h | 3 ++ > > include/hw/ppc/spapr_irq.h | 1 + > > linux-headers/asm-powerpc/kvm.h | 6 > > linux-headers/linux/kvm.h | 1 + > > target/ppc/kvm.c| 14 + > > target/ppc/kvm_ppc.h| 10 ++ > > 8 files changed, 94 insertions(+) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 81699d4f8b..5f1f75826d 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -2830,6 +2830,9 @@ static void spapr_machine_init(MachineState *machine) > > > > /* Enable H_PAGE_INIT */ > > kvmppc_enable_h_page_init(); > > + > > +/* Enable H_REG_SNS */ > > +kvmppc_enable_h_reg_sns(); > > } > > > > /* map RAM */ > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c > > index 0e9a5b2e40..957edecb13 100644 > > --- a/hw/ppc/spapr_hcall.c > > +++ b/hw/ppc/spapr_hcall.c > > @@ -1405,6 +1405,59 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, > > SpaprMachineState *spapr, > > return H_SUCCESS; > > } > > > > +static target_ulong deregister_sns(PowerPCCPU *cpu, SpaprMachineState > > *spapr) > > +{ > > +spapr->sns_addr = -1; > > +spapr->sns_len = 0; > > +spapr_irq_free(spapr, SPAPR_IRQ_SNS, 1); > > + > > +return H_SUCCESS; > > +} > > + > > +static target_ulong h_reg_sns(PowerPCCPU *cpu, SpaprMachineState *spapr, > > + target_ulong opcode, target_ulong *args) > > +{ > > +target_ulong addr = args[0]; > > +target_ulong len = args[1]; > > + > > +if (addr == -1) { > > +return deregister_sns(cpu, spapr); > > +} > > + > > +/* > > + * If SNS area is already registered, can't register again before > > + * deregistering it first. > > + */ > > +if (spapr->sns_addr == -1) { > > As Fabiano says, it looks like the logic is reversed here. Correct. > > > +return H_PARAMETER; > > Also, H_PARAMETER doesn't seem like the right error for this case. > H_BUSY, maybe? Yes we may return H_BUSY. > > > +} > > + > > +if (!QEMU_IS_ALIGNED(addr, 4096)) { > > What's the significance of 4096 here? Should this be one of the page > size defines instead? PAPR specifies this alignment. "If the Address parameter is not 4K aligned in the valid logical address space of the caller, then return H_Parameter." > > > +return H_PARAMETER; > > +} > > + > > +if (len < 256) { > > A defined constant (SPAPR_MIN_SNS_SIZE?) would be worthwhile here, I think. Yes. > > > +return H_P2; > > +} > > + > > +/* TODO: SNS area is not allowed to cross a page boundary */ > > + > > +/* KVM_PPC_SET_SNS ioctl */ > > +if (kvmppc_set_sns_reg(addr, len)) { > > What will happen if you attempt this on a TCG system? We should have a variant of kvmppc_set_sns_reg() for TCG that returns error, so that this hcall can fail. > > > +return H_PARAMETER; > > +} > > + > > +/* Record SNS addr and len */ > > +spapr->sns_addr = addr; > > +spapr->sns_len = len; > > + > > +/* Register irq source for sending ESN notification */ > > +spapr_irq_claim(spapr, SPAPR_IRQ_SNS, false, &error_fatal); > > I don't think &error_fatal can be right here. AFAICT this must be one > of two cases: >1) This should never fail, no matter what the guest does. If it > does fail, that indicates a qemu bug. In that case &error_abort is > more appropriate >2) This could fail for certain sequences of guest actions. If > that&
Re: [RFC PATCH v0 2/2] ppc,spapr: Handle KVM_EXIT_ESN
On Thu, Aug 05, 2021 at 09:48:04AM +0200, Laurent Vivier wrote: > On 05/08/2021 09:32, Bharata B Rao wrote: > > Handle KVM_EXIT_ESN exit by issuing subvention notification > > interrupt to the guest. Guests supporting async-pf feature > > will need this interrupt to wake up tasks that are waiting > > for the expropriated pages to be available. > > > > Note: Updates to linux-headers/linux/kvm.h are temporary > > pending headers update. > > > > Signed-off-by: Bharata B Rao > > --- > > linux-headers/linux/kvm.h | 1 + > > target/ppc/kvm.c | 16 > > 2 files changed, 17 insertions(+) > ... > > diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c > > index 330985c8a0..6bf3f06b88 100644 > > --- a/target/ppc/kvm.c > > +++ b/target/ppc/kvm.c > ... > > @@ -1657,6 +1658,16 @@ static int kvm_handle_debug(PowerPCCPU *cpu, struct > > kvm_run *run) > > return DEBUG_RETURN_GUEST; > > } > > > > +#if defined(TARGET_PPC64) > > +static void kvmppc_handle_esn(PowerPCCPU *cpu) > > +{ > > +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); > > + > > +fprintf(stderr, "%s: ESN exit\n", __func__); > > Do you keep this fprintf() on purpose? Not really, just that it survived the pre-post cleanup :-( Regards, Bharata.
[RFC PATCH v0 0/2] Support for H_REG_SNS hcall
Add support for H_REG_SNS hcall which will be used by the guest to make use of Expropriation/Subvention Notification option aka asynchronous page fault support. The kernel enablement patches are posted here: https://lore.kernel.org/linuxppc-dev/20210805072439.501481-1-bhar...@linux.ibm.com/T/#t Bharata B Rao (2): spapr: Add H_REG_SNS hcall ppc,spapr: Handle KVM_EXIT_ESN hw/ppc/spapr.c | 3 ++ hw/ppc/spapr_hcall.c| 56 + include/hw/ppc/spapr.h | 3 ++ include/hw/ppc/spapr_irq.h | 1 + linux-headers/asm-powerpc/kvm.h | 6 linux-headers/linux/kvm.h | 2 ++ target/ppc/kvm.c| 30 ++ target/ppc/kvm_ppc.h| 10 ++ 8 files changed, 111 insertions(+) -- 2.31.1
[RFC PATCH v0 1/2] spapr: Add H_REG_SNS hcall
Add support for H_REG_SNS hcall so that asynchronous page fault mechanism can be supported on PowerKVM guests. This hcall essentially issues KVM_PPC_SET_SNS to let the host map and pin the memory containing the Subvention Notification Structure. It also claims SPAPR_IRQ_SNS to be used as subvention notification interrupt. Note: Updates to linux-headers/linux/kvm.h are temporary pending headers update. Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 3 ++ hw/ppc/spapr_hcall.c| 56 + include/hw/ppc/spapr.h | 3 ++ include/hw/ppc/spapr_irq.h | 1 + linux-headers/asm-powerpc/kvm.h | 6 linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c| 14 + target/ppc/kvm_ppc.h| 10 ++ 8 files changed, 94 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 81699d4f8b..5f1f75826d 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2830,6 +2830,9 @@ static void spapr_machine_init(MachineState *machine) /* Enable H_PAGE_INIT */ kvmppc_enable_h_page_init(); + +/* Enable H_REG_SNS */ +kvmppc_enable_h_reg_sns(); } /* map RAM */ diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index 0e9a5b2e40..957edecb13 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -1405,6 +1405,59 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, SpaprMachineState *spapr, return H_SUCCESS; } +static target_ulong deregister_sns(PowerPCCPU *cpu, SpaprMachineState *spapr) +{ +spapr->sns_addr = -1; +spapr->sns_len = 0; +spapr_irq_free(spapr, SPAPR_IRQ_SNS, 1); + +return H_SUCCESS; +} + +static target_ulong h_reg_sns(PowerPCCPU *cpu, SpaprMachineState *spapr, + target_ulong opcode, target_ulong *args) +{ +target_ulong addr = args[0]; +target_ulong len = args[1]; + +if (addr == -1) { +return deregister_sns(cpu, spapr); +} + +/* + * If SNS area is already registered, can't register again before + * deregistering it first. + */ +if (spapr->sns_addr == -1) { +return H_PARAMETER; +} + +if (!QEMU_IS_ALIGNED(addr, 4096)) { +return H_PARAMETER; +} + +if (len < 256) { +return H_P2; +} + +/* TODO: SNS area is not allowed to cross a page boundary */ + +/* KVM_PPC_SET_SNS ioctl */ +if (kvmppc_set_sns_reg(addr, len)) { +return H_PARAMETER; +} + +/* Record SNS addr and len */ +spapr->sns_addr = addr; +spapr->sns_len = len; + +/* Register irq source for sending ESN notification */ +spapr_irq_claim(spapr, SPAPR_IRQ_SNS, false, &error_fatal); +args[1] = SPAPR_IRQ_SNS; /* irq no in R5 */ + +return H_SUCCESS; +} + static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1]; static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1]; static spapr_hcall_fn svm_hypercall_table[(SVM_HCALL_MAX - SVM_HCALL_BASE) / 4 + 1]; @@ -1545,6 +1598,9 @@ static void hypercall_register_types(void) spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support); spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt); + +/* SNS memory area registration */ +spapr_register_hypercall(H_REG_SNS, h_reg_sns); } type_init(hypercall_register_types) diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index 637652ad16..934f9e066e 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -252,6 +252,8 @@ struct SpaprMachineState { uint32_t numa_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE]; Error *fwnmi_migration_blocker; +uint64_t sns_addr; +uint64_t sns_len; }; #define H_SUCCESS 0 @@ -549,6 +551,7 @@ struct SpaprMachineState { #define H_SCM_UNBIND_MEM0x3F0 #define H_SCM_UNBIND_ALL0x3FC #define H_SCM_HEALTH0x400 +#define H_REG_SNS 0x41C #define H_RPT_INVALIDATE0x448 #define MAX_HCALL_OPCODEH_RPT_INVALIDATE diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h index c22a72c9e2..26c680f065 100644 --- a/include/hw/ppc/spapr_irq.h +++ b/include/hw/ppc/spapr_irq.h @@ -21,6 +21,7 @@ #define SPAPR_XIRQ_BASE XICS_IRQ_BASE /* 0x1000 */ #define SPAPR_IRQ_EPOW (SPAPR_XIRQ_BASE + 0x) #define SPAPR_IRQ_HOTPLUG(SPAPR_XIRQ_BASE + 0x0001) +#define SPAPR_IRQ_SNS(SPAPR_XIRQ_BASE + 0x0002) #define SPAPR_IRQ_VIO(SPAPR_XIRQ_BASE + 0x0100) /* 256 VIO devices */ #define SPAPR_IRQ_PCI_LSI(SPAPR_XIRQ_BASE + 0x0200) /* 32+ PHBs devices */ diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h index 9f18fa090f..d72739126a 100644 --- a/linux-headers/asm-powerpc/kvm.h +++ b/linux-headers/asm-powerpc/kvm.h @@ -470,6 +470,12 @@ struct kvm_ppc_cpu_char { #define KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_B
[RFC PATCH v0 2/2] ppc,spapr: Handle KVM_EXIT_ESN
Handle KVM_EXIT_ESN exit by issuing subvention notification interrupt to the guest. Guests supporting async-pf feature will need this interrupt to wake up tasks that are waiting for the expropriated pages to be available. Note: Updates to linux-headers/linux/kvm.h are temporary pending headers update. Signed-off-by: Bharata B Rao --- linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c | 16 2 files changed, 17 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index a76945fcbc..105c8b069a 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -269,6 +269,7 @@ struct kvm_xen_exit { #define KVM_EXIT_AP_RESET_HOLD32 #define KVM_EXIT_X86_BUS_LOCK 33 #define KVM_EXIT_XEN 34 +#define KVM_EXIT_ESN 35 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 330985c8a0..6bf3f06b88 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -38,6 +38,7 @@ #include "hw/ppc/spapr_cpu_core.h" #include "hw/hw.h" #include "hw/ppc/ppc.h" +#include "hw/irq.h" #include "migration/qemu-file-types.h" #include "sysemu/watchdog.h" #include "trace.h" @@ -1657,6 +1658,16 @@ static int kvm_handle_debug(PowerPCCPU *cpu, struct kvm_run *run) return DEBUG_RETURN_GUEST; } +#if defined(TARGET_PPC64) +static void kvmppc_handle_esn(PowerPCCPU *cpu) +{ +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); + +fprintf(stderr, "%s: ESN exit\n", __func__); +qemu_irq_pulse(spapr_qirq(spapr, SPAPR_IRQ_SNS)); +} +#endif + int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) { PowerPCCPU *cpu = POWERPC_CPU(cs); @@ -1687,6 +1698,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) run->papr_hcall.args); ret = 0; break; + +case KVM_EXIT_ESN: +kvmppc_handle_esn(cpu); +ret = 0; +break; #endif case KVM_EXIT_EPR: trace_kvm_handle_epr(); -- 2.31.1
[PATCH v1 2/2] target/ppc: Support for H_RPT_INVALIDATE hcall
If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then - indicate the availability of H_RPT_INVALIDATE hcall to the guest via ibm,hypertas-functions property. - Enable the hcall Both the above are done only if the new sPAPR machine capability cap-rpt-invalidate is set. Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 6 ++ hw/ppc/spapr_caps.c| 41 + include/hw/ppc/spapr.h | 8 ++-- target/ppc/kvm.c | 12 target/ppc/kvm_ppc.h | 12 5 files changed, 77 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 4dd90b75cc..2b96b03673 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -880,6 +880,10 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt) add_str(hypertas, "hcall-copy"); add_str(hypertas, "hcall-debug"); add_str(hypertas, "hcall-vphn"); +if (spapr_get_cap(spapr, SPAPR_CAP_RPT_INVALIDATE) == SPAPR_CAP_ON) { +add_str(hypertas, "hcall-rpt-invalidate"); +} + add_str(qemu_hypertas, "hcall-memop1"); if (!kvm_enabled() || kvmppc_spapr_use_multitce()) { @@ -2018,6 +2022,7 @@ static const VMStateDescription vmstate_spapr = { &vmstate_spapr_cap_ccf_assist, &vmstate_spapr_cap_fwnmi, &vmstate_spapr_fwnmi, +&vmstate_spapr_cap_rpt_invalidate, NULL } }; @@ -4573,6 +4578,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON; smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON; smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON; +smc->default_caps.caps[SPAPR_CAP_RPT_INVALIDATE] = SPAPR_CAP_OFF; spapr_caps_add_properties(smc); smc->irq = &spapr_irq_dual; smc->dr_phb_enabled = true; diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c index d0c419b392..ed7c077a0d 100644 --- a/hw/ppc/spapr_caps.c +++ b/hw/ppc/spapr_caps.c @@ -582,6 +582,37 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, uint8_t val, } } +static void cap_rpt_invalidate_apply(SpaprMachineState *spapr, + uint8_t val, Error **errp) +{ +ERRP_GUARD(); + +if (!val) { +/* capability disabled by default */ +return; +} + +if (tcg_enabled()) { +error_setg(errp, "No H_RPT_INVALIDATE support in TCG"); +error_append_hint(errp, + "Try appending -machine cap-rpt-invalidate=off\n"); +} else if (kvm_enabled()) { +if (!kvmppc_has_cap_mmu_radix()) { +error_setg(errp, "H_RPT_INVALIDATE only supported on Radix"); +return; +} + +if (!kvmppc_has_cap_rpt_invalidate()) { +error_setg(errp, + "KVM implementation does not support H_RPT_INVALIDATE"); +error_append_hint(errp, + "Try appending -machine cap-rpt-invalidate=off\n"); +} else { +kvmppc_enable_h_rpt_invalidate(); +} +} +} + SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = { [SPAPR_CAP_HTM] = { .name = "htm", @@ -690,6 +721,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = { .type = "bool", .apply = cap_fwnmi_apply, }, +[SPAPR_CAP_RPT_INVALIDATE] = { +.name = "rpt-invalidate", +.description = "Allow H_RPT_INVALIDATE", +.index = SPAPR_CAP_RPT_INVALIDATE, +.get = spapr_cap_get_bool, +.set = spapr_cap_set_bool, +.type = "bool", +.apply = cap_rpt_invalidate_apply, +}, }; static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr, @@ -830,6 +870,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV); SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER); SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST); SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI); +SPAPR_CAP_MIG_STATE(rpt_invalidate, SPAPR_CAP_RPT_INVALIDATE); void spapr_caps_init(SpaprMachineState *spapr) { diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index f05219f75e..b18d407c1a 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -74,8 +74,10 @@ typedef enum { #define SPAPR_CAP_CCF_ASSIST0x09 /* Implements PAPR FWNMI option */ #define SPAPR_CAP_FWNMI 0x0A +/* Support H_RPT_INVALIDATE */ +#define SPAPR_CAP_RPT_INVALIDATE0x0B /* Num Caps */ -#define SPAPR_CAP_NUM (SPAPR_CAP_FWNMI + 1) +#define SPAPR_CAP_NUM (SPAPR_CAP_RPT_INVALIDATE + 1) /* * Capability Values @@ -542,8 +544,9 @@ struct SpaprMachineState { #define H_SCM_UNBIND_MEM0x3F0 #define H_SCM_UNBIND_ALL
[PATCH v1 1/2] linux-headers: Update
Update to mainline commit: 79160a603bdb ("Merge tag 'usb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb" Signed-off-by: Bharata B Rao --- include/standard-headers/asm-x86/kvm_para.h | 13 +++ include/standard-headers/drm/drm_fourcc.h | 7 ++ include/standard-headers/linux/ethtool.h | 4 +- .../linux/input-event-codes.h | 1 + include/standard-headers/linux/virtio_ids.h | 2 +- include/standard-headers/linux/virtio_vsock.h | 9 ++ linux-headers/asm-arm64/kvm.h | 11 ++ linux-headers/asm-generic/mman-common.h | 3 + linux-headers/asm-generic/unistd.h| 4 +- linux-headers/asm-mips/mman.h | 3 + linux-headers/asm-mips/unistd_n32.h | 1 + linux-headers/asm-mips/unistd_n64.h | 1 + linux-headers/asm-mips/unistd_o32.h | 1 + linux-headers/asm-powerpc/unistd_32.h | 1 + linux-headers/asm-powerpc/unistd_64.h | 1 + linux-headers/asm-s390/unistd_32.h| 1 + linux-headers/asm-s390/unistd_64.h| 1 + linux-headers/asm-x86/kvm.h | 13 +++ linux-headers/asm-x86/unistd_32.h | 7 +- linux-headers/asm-x86/unistd_64.h | 7 +- linux-headers/asm-x86/unistd_x32.h| 7 +- linux-headers/linux/kvm.h | 105 ++ linux-headers/linux/userfaultfd.h | 11 +- 23 files changed, 197 insertions(+), 17 deletions(-) diff --git a/include/standard-headers/asm-x86/kvm_para.h b/include/standard-headers/asm-x86/kvm_para.h index 215d01b4ec..204cfb8640 100644 --- a/include/standard-headers/asm-x86/kvm_para.h +++ b/include/standard-headers/asm-x86/kvm_para.h @@ -33,6 +33,8 @@ #define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_FEATURE_ASYNC_PF_INT 14 #define KVM_FEATURE_MSI_EXT_DEST_ID15 +#define KVM_FEATURE_HC_MAP_GPA_RANGE 16 +#define KVM_FEATURE_MIGRATION_CONTROL 17 #define KVM_HINTS_REALTIME 0 @@ -54,6 +56,7 @@ #define MSR_KVM_POLL_CONTROL 0x4b564d05 #define MSR_KVM_ASYNC_PF_INT 0x4b564d06 #define MSR_KVM_ASYNC_PF_ACK 0x4b564d07 +#define MSR_KVM_MIGRATION_CONTROL 0x4b564d08 struct kvm_steal_time { uint64_t steal; @@ -90,6 +93,16 @@ struct kvm_clock_pairing { /* MSR_KVM_ASYNC_PF_INT */ #define KVM_ASYNC_PF_VEC_MASK GENMASK(7, 0) +/* MSR_KVM_MIGRATION_CONTROL */ +#define KVM_MIGRATION_READY(1 << 0) + +/* KVM_HC_MAP_GPA_RANGE */ +#define KVM_MAP_GPA_RANGE_PAGE_SZ_4K 0 +#define KVM_MAP_GPA_RANGE_PAGE_SZ_2M (1 << 0) +#define KVM_MAP_GPA_RANGE_PAGE_SZ_1G (1 << 1) +#define KVM_MAP_GPA_RANGE_ENC_STAT(n) (n << 4) +#define KVM_MAP_GPA_RANGE_ENCRYPTEDKVM_MAP_GPA_RANGE_ENC_STAT(1) +#define KVM_MAP_GPA_RANGE_DECRYPTEDKVM_MAP_GPA_RANGE_ENC_STAT(0) /* Operations for KVM_HC_MMU_OP */ #define KVM_MMU_OP_WRITE_PTE1 diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index a61ae520c2..352b51fd0a 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -167,6 +167,13 @@ extern "C" { #define DRM_FORMAT_RGBA1010102 fourcc_code('R', 'A', '3', '0') /* [31:0] R:G:B:A 10:10:10:2 little endian */ #define DRM_FORMAT_BGRA1010102 fourcc_code('B', 'A', '3', '0') /* [31:0] B:G:R:A 10:10:10:2 little endian */ +/* 64 bpp RGB */ +#define DRM_FORMAT_XRGB16161616fourcc_code('X', 'R', '4', '8') /* [63:0] x:R:G:B 16:16:16:16 little endian */ +#define DRM_FORMAT_XBGR16161616fourcc_code('X', 'B', '4', '8') /* [63:0] x:B:G:R 16:16:16:16 little endian */ + +#define DRM_FORMAT_ARGB16161616fourcc_code('A', 'R', '4', '8') /* [63:0] A:R:G:B 16:16:16:16 little endian */ +#define DRM_FORMAT_ABGR16161616fourcc_code('A', 'B', '4', '8') /* [63:0] A:B:G:R 16:16:16:16 little endian */ + /* * Floating point 64bpp RGB * IEEE 754-2008 binary16 half-precision float diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h index 218d944a17..053d3fafdf 100644 --- a/include/standard-headers/linux/ethtool.h +++ b/include/standard-headers/linux/ethtool.h @@ -233,7 +233,7 @@ enum tunable_id { ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */ /* * Add your fresh new tunable attribute above and remember to update -* tunable_strings[] in net/core/ethtool.c +* tunable_strings[] in net/ethtool/common.c */ __ETHTOOL_TUNABLE_COUNT, }; @@ -297,7 +297,7 @@ enum phy_tunable_id { ETHTOOL_PHY_EDPD, /* * Add your fresh new phy tunable attribu
[PATCH v1 0/2] Enable support for H_RPT_INVALIDATE hcall
Hi, This series enables the support for H_RPT_INVALIDATE hcall which was added to upstream kernel recently. It adds a new sPAPR machine capability cap-rpt-invalidate and if it is set by the user, checks for the availability of H_RPT_INVALIDATE hcall in the hypervisor (via KVM_CAP_RPT_INVALIDATE KVM capability) and enables the same. Headers update is needed for KVM_CAP_PPC_RPT_INVALIDATE. v0: https://lore.kernel.org/qemu-devel/20210106085910.2200795-1-bhar...@linux.ibm.com/ Bharata B Rao (2): linux-headers: Update target/ppc: Support for H_RPT_INVALIDATE hcall hw/ppc/spapr.c| 6 + hw/ppc/spapr_caps.c | 41 +++ include/hw/ppc/spapr.h| 8 +- include/standard-headers/asm-x86/kvm_para.h | 13 +++ include/standard-headers/drm/drm_fourcc.h | 7 ++ include/standard-headers/linux/ethtool.h | 4 +- .../linux/input-event-codes.h | 1 + include/standard-headers/linux/virtio_ids.h | 2 +- include/standard-headers/linux/virtio_vsock.h | 9 ++ linux-headers/asm-arm64/kvm.h | 11 ++ linux-headers/asm-generic/mman-common.h | 3 + linux-headers/asm-generic/unistd.h| 4 +- linux-headers/asm-mips/mman.h | 3 + linux-headers/asm-mips/unistd_n32.h | 1 + linux-headers/asm-mips/unistd_n64.h | 1 + linux-headers/asm-mips/unistd_o32.h | 1 + linux-headers/asm-powerpc/unistd_32.h | 1 + linux-headers/asm-powerpc/unistd_64.h | 1 + linux-headers/asm-s390/unistd_32.h| 1 + linux-headers/asm-s390/unistd_64.h| 1 + linux-headers/asm-x86/kvm.h | 13 +++ linux-headers/asm-x86/unistd_32.h | 7 +- linux-headers/asm-x86/unistd_64.h | 7 +- linux-headers/asm-x86/unistd_x32.h| 7 +- linux-headers/linux/kvm.h | 105 ++ linux-headers/linux/userfaultfd.h | 11 +- target/ppc/kvm.c | 12 ++ target/ppc/kvm_ppc.h | 12 ++ 28 files changed, 274 insertions(+), 19 deletions(-) -- 2.31.1
Re: [RFC PATCH v0 1/1] target/ppc: Support for H_RPT_INVALIDATE hcall
On Fri, Jan 15, 2021 at 06:30:05PM +0100, Greg Kurz wrote: > On Fri, 15 Jan 2021 14:01:28 +0530 > Bharata B Rao wrote: > > > On Wed, Jan 13, 2021 at 05:22:56PM +0100, Greg Kurz wrote: > > > Hi Bharata, > > > > > > On Wed, 6 Jan 2021 14:29:10 +0530 > > > Bharata B Rao wrote: > > > > > > > If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then > > > > > > > > - indicate the availability of H_RPT_INVALIDATE hcall to the guest via > > > > ibm,hypertas-functions property. > > > > - Enable the hcall > > > > > > > > Both the above are done only if the new sPAPR machine capability > > > > cap-rpt-invalidate is set. > > > > > > > > Note: The KVM implementation of the hcall has been posted for upstream > > > > review here: > > > > https://lore.kernel.org/linuxppc-dev/20210105090557.2150104-1-bhar...@linux.ibm.com/T/#t > > > > > > > > Update to linux-headers/linux/kvm.h here is temporary, will be > > > > done via header updates once the kernel change is accepted upstream. > > > > > > > > Signed-off-by: Bharata B Rao > > > > --- > > > > > > Patch looks mostly fine. A few remarks below. > > > > > > > hw/ppc/spapr.c| 7 ++ > > > > hw/ppc/spapr_caps.c | 49 +++ > > > > include/hw/ppc/spapr.h| 8 +-- > > > > linux-headers/linux/kvm.h | 1 + > > > > target/ppc/kvm.c | 12 ++ > > > > target/ppc/kvm_ppc.h | 11 + > > > > 6 files changed, 86 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > > index 489cefcb81..0228083800 100644 > > > > --- a/hw/ppc/spapr.c > > > > +++ b/hw/ppc/spapr.c > > > > @@ -890,6 +890,11 @@ static void spapr_dt_rtas(SpaprMachineState > > > > *spapr, void *fdt) > > > > add_str(hypertas, "hcall-copy"); > > > > add_str(hypertas, "hcall-debug"); > > > > add_str(hypertas, "hcall-vphn"); > > > > +if (kvm_enabled() && > > > > > > You shouldn't check KVM here. The capability is enough to decide if we > > > should expose "hcall-rpt-invalidate" or not. FWIW we won't even reach > > > this code when running with anything but KVM. > > > > Correct, the capability itself can be only for KVM case. > > > > > > > > > +(spapr_get_cap(spapr, SPAPR_CAP_RPT_INVALIDATE) == > > > > SPAPR_CAP_ON)) { > > > > +add_str(hypertas, "hcall-rpt-invalidate"); > > > > +} > > > > + > > > > add_str(qemu_hypertas, "hcall-memop1"); > > > > > > > > if (!kvm_enabled() || kvmppc_spapr_use_multitce()) { > > > > @@ -2021,6 +2026,7 @@ static const VMStateDescription vmstate_spapr = { > > > > &vmstate_spapr_cap_ccf_assist, > > > > &vmstate_spapr_cap_fwnmi, > > > > &vmstate_spapr_fwnmi, > > > > +&vmstate_spapr_cap_rpt_invalidate, > > > > NULL > > > > } > > > > }; > > > > @@ -4478,6 +4484,7 @@ static void spapr_machine_class_init(ObjectClass > > > > *oc, void *data) > > > > smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON; > > > > smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON; > > > > smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON; > > > > +smc->default_caps.caps[SPAPR_CAP_RPT_INVALIDATE] = SPAPR_CAP_OFF; > > > > > > Any reason for not enabling this for the default machine type and > > > disabling it for existing machine types only ? > > > > If this capability is enabled, then > > > > 1. First level guest (L1) can off-load the TLB invalidations to the > > new hcall if the platform has disabled LPCR[GTSE]. > > > > 2. Nested guest (L2) will switch to this new hcall rather than using > > the old H_TLB_INVALIDATE hcall. > > > > Case 2 is optional and case 1 makes sense only if LPCR[GTSE]=off. > > I don't think this is relevant, as the importance of each case can change, > e.g. nested is gaining momentum. > > > Hence I thought keeping it off by default and expecting the > > user to turn it on only if required would be correct. > > > > If the feature is an improvement, even for what is considered a corner > case now, and it doesn't do harm to setups that won't use it, then it > should be enabled IMHO. > > > Please note that turning this capability ON will result in the > > new hcall being exposed to the guest. I hope this is the right > > usage of spapr-caps? > > > > That's perfectly fine and this is why we should set it to ON > for the default machine type only. The property can be turned ON only when the hypervisor supports the hcall. So if it set to ON for default machine type, then it may fail if the host doesn't have this hcall. Hence I thought it should be OFF by default and turning ON should be left to the user. Regards, Bharata.
Re: [RFC PATCH v0 1/1] target/ppc: Support for H_RPT_INVALIDATE hcall
On Wed, Jan 13, 2021 at 05:22:56PM +0100, Greg Kurz wrote: > Hi Bharata, > > On Wed, 6 Jan 2021 14:29:10 +0530 > Bharata B Rao wrote: > > > If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then > > > > - indicate the availability of H_RPT_INVALIDATE hcall to the guest via > > ibm,hypertas-functions property. > > - Enable the hcall > > > > Both the above are done only if the new sPAPR machine capability > > cap-rpt-invalidate is set. > > > > Note: The KVM implementation of the hcall has been posted for upstream > > review here: > > https://lore.kernel.org/linuxppc-dev/20210105090557.2150104-1-bhar...@linux.ibm.com/T/#t > > > > Update to linux-headers/linux/kvm.h here is temporary, will be > > done via header updates once the kernel change is accepted upstream. > > > > Signed-off-by: Bharata B Rao > > --- > > Patch looks mostly fine. A few remarks below. > > > hw/ppc/spapr.c| 7 ++ > > hw/ppc/spapr_caps.c | 49 +++ > > include/hw/ppc/spapr.h| 8 +-- > > linux-headers/linux/kvm.h | 1 + > > target/ppc/kvm.c | 12 ++ > > target/ppc/kvm_ppc.h | 11 + > > 6 files changed, 86 insertions(+), 2 deletions(-) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 489cefcb81..0228083800 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -890,6 +890,11 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, > > void *fdt) > > add_str(hypertas, "hcall-copy"); > > add_str(hypertas, "hcall-debug"); > > add_str(hypertas, "hcall-vphn"); > > +if (kvm_enabled() && > > You shouldn't check KVM here. The capability is enough to decide if we > should expose "hcall-rpt-invalidate" or not. FWIW we won't even reach > this code when running with anything but KVM. Correct, the capability itself can be only for KVM case. > > > +(spapr_get_cap(spapr, SPAPR_CAP_RPT_INVALIDATE) == SPAPR_CAP_ON)) { > > +add_str(hypertas, "hcall-rpt-invalidate"); > > +} > > + > > add_str(qemu_hypertas, "hcall-memop1"); > > > > if (!kvm_enabled() || kvmppc_spapr_use_multitce()) { > > @@ -2021,6 +2026,7 @@ static const VMStateDescription vmstate_spapr = { > > &vmstate_spapr_cap_ccf_assist, > > &vmstate_spapr_cap_fwnmi, > > &vmstate_spapr_fwnmi, > > +&vmstate_spapr_cap_rpt_invalidate, > > NULL > > } > > }; > > @@ -4478,6 +4484,7 @@ static void spapr_machine_class_init(ObjectClass *oc, > > void *data) > > smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON; > > smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON; > > smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON; > > +smc->default_caps.caps[SPAPR_CAP_RPT_INVALIDATE] = SPAPR_CAP_OFF; > > Any reason for not enabling this for the default machine type and > disabling it for existing machine types only ? If this capability is enabled, then 1. First level guest (L1) can off-load the TLB invalidations to the new hcall if the platform has disabled LPCR[GTSE]. 2. Nested guest (L2) will switch to this new hcall rather than using the old H_TLB_INVALIDATE hcall. Case 2 is optional and case 1 makes sense only if LPCR[GTSE]=off. Hence I thought keeping it off by default and expecting the user to turn it on only if required would be correct. Please note that turning this capability ON will result in the new hcall being exposed to the guest. I hope this is the right usage of spapr-caps? > > diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h > > index 73ce2bc951..8e27f8421f 100644 > > --- a/target/ppc/kvm_ppc.h > > +++ b/target/ppc/kvm_ppc.h > > @@ -24,6 +24,7 @@ void kvmppc_enable_logical_ci_hcalls(void); > > void kvmppc_enable_set_mode_hcall(void); > > void kvmppc_enable_clear_ref_mod_hcalls(void); > > void kvmppc_enable_h_page_init(void); > > +void kvmppc_enable_h_rpt_invalidate(void); > > void kvmppc_set_papr(PowerPCCPU *cpu); > > int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr); > > void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy); > > @@ -72,6 +73,7 @@ bool kvmppc_has_cap_nested_kvm_hv(void); > > int kvmppc_set_cap_nested_kvm_hv(int enable); > > int kvmppc_get_cap_large_decr(void); > > int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable); > > +int kvmppc_has_cap_rpt_invalidate(void); > > int kvmppc_enable_hwrng(void); > > int kvmppc_put_books_sregs(PowerPCCPU *cpu); > > PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void); > > @@ -151,6 +153,10 @@ static inline void kvmppc_enable_h_page_init(void) > > { > > } > > > > +static inline void kvmppc_enable_h_rpt_invalidate(void) > > +{ > > g_assert_not_reached() ? Don't see many others doing that, is that a new preferred way? Regards, Bharata.
Re: [RFC PATCH v0 1/1] target/ppc: Support for H_RPT_INVALIDATE hcall
On Tue, Jan 12, 2021 at 10:16:30AM -0300, Daniel Henrique Barboza wrote: > > > On 1/6/21 5:59 AM, Bharata B Rao wrote: > > If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then > > > > - indicate the availability of H_RPT_INVALIDATE hcall to the guest via > >ibm,hypertas-functions property. > > - Enable the hcall > > > > Both the above are done only if the new sPAPR machine capability > > cap-rpt-invalidate is set. > > > > Note: The KVM implementation of the hcall has been posted for upstream > > review here: > > https://lore.kernel.org/linuxppc-dev/20210105090557.2150104-1-bhar...@linux.ibm.com/T/#t > > > > Update to linux-headers/linux/kvm.h here is temporary, will be > > done via header updates once the kernel change is accepted upstream. > > > > Signed-off-by: Bharata B Rao > > --- > > Code LGTM. > > Reviewed-by: Daniel Henrique Barboza > > > A few questions about the logic: > > - does it work only on Power 9 like you mentioned in the error message > down below? If it's supported on Power 10 as well then we would want the > error message to read "H_RPT_INVALIDATE only supported on POWER9 and newer" > to contemplate it. Making it conditional to Power 9 was an oversight, will remove in the next iteration. > > - Does it make sense to expose "rpt-invalidate" to Libvirt? I see that the > capability is turned off by default, which may indicate that even if kernel > and QEMU support is present the user might want to not enable it. Is there > some sort of drawback/compromise when activating this cap? I have added this to take care of migration compatibility between source and target when hcall is present in target and not present in source or vice versa. I wonder if there is any other preferred method than introducing a new machine capability like cap-rpt-invalidate. Regards, Bharata.
[RFC PATCH v0 1/1] target/ppc: Support for H_RPT_INVALIDATE hcall
If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then - indicate the availability of H_RPT_INVALIDATE hcall to the guest via ibm,hypertas-functions property. - Enable the hcall Both the above are done only if the new sPAPR machine capability cap-rpt-invalidate is set. Note: The KVM implementation of the hcall has been posted for upstream review here: https://lore.kernel.org/linuxppc-dev/20210105090557.2150104-1-bhar...@linux.ibm.com/T/#t Update to linux-headers/linux/kvm.h here is temporary, will be done via header updates once the kernel change is accepted upstream. Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c| 7 ++ hw/ppc/spapr_caps.c | 49 +++ include/hw/ppc/spapr.h| 8 +-- linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c | 12 ++ target/ppc/kvm_ppc.h | 11 + 6 files changed, 86 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 489cefcb81..0228083800 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -890,6 +890,11 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt) add_str(hypertas, "hcall-copy"); add_str(hypertas, "hcall-debug"); add_str(hypertas, "hcall-vphn"); +if (kvm_enabled() && +(spapr_get_cap(spapr, SPAPR_CAP_RPT_INVALIDATE) == SPAPR_CAP_ON)) { +add_str(hypertas, "hcall-rpt-invalidate"); +} + add_str(qemu_hypertas, "hcall-memop1"); if (!kvm_enabled() || kvmppc_spapr_use_multitce()) { @@ -2021,6 +2026,7 @@ static const VMStateDescription vmstate_spapr = { &vmstate_spapr_cap_ccf_assist, &vmstate_spapr_cap_fwnmi, &vmstate_spapr_fwnmi, +&vmstate_spapr_cap_rpt_invalidate, NULL } }; @@ -4478,6 +4484,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON; smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON; smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON; +smc->default_caps.caps[SPAPR_CAP_RPT_INVALIDATE] = SPAPR_CAP_OFF; spapr_caps_add_properties(smc); smc->irq = &spapr_irq_dual; smc->dr_phb_enabled = true; diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c index 9341e9782a..ebf81e3b23 100644 --- a/hw/ppc/spapr_caps.c +++ b/hw/ppc/spapr_caps.c @@ -524,6 +524,45 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, uint8_t val, } } +static void cap_rpt_invalidate_apply(SpaprMachineState *spapr, + uint8_t val, Error **errp) +{ +ERRP_GUARD(); +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); + +if (!val) { +/* capability disabled by default */ +return; +} + +if (tcg_enabled()) { +error_setg(errp, "No H_RPT_INVALIDATE support in TCG"); +error_append_hint(errp, "Try appending -machine cap-rpt-invalidate=off\n"); +} else if (kvm_enabled()) { +if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0, + spapr->max_compat_pvr)) { +error_setg(errp, "H_RPT_INVALIDATE only supported on POWER9"); +error_append_hint(errp, + "Try appending -machine max-cpu-compat=power9\n"); +return; +} + +if (!kvmppc_has_cap_mmu_radix()) { +error_setg(errp, "H_RPT_INVALIDATE only supported on Radix"); +return; +} + +if (!kvmppc_has_cap_rpt_invalidate()) { +error_setg(errp, + "KVM implementation does not support H_RPT_INVALIDATE"); +error_append_hint(errp, + "Try appending -machine cap-rpt-invalidate=off\n"); +} else { +kvmppc_enable_h_rpt_invalidate(); +} +} +} + SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = { [SPAPR_CAP_HTM] = { .name = "htm", @@ -632,6 +671,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = { .type = "bool", .apply = cap_fwnmi_apply, }, +[SPAPR_CAP_RPT_INVALIDATE] = { +.name = "rpt-invalidate", +.description = "Allow H_RPT_INVALIDATE", +.index = SPAPR_CAP_RPT_INVALIDATE, +.get = spapr_cap_get_bool, +.set = spapr_cap_set_bool, +.type = "bool", +.apply = cap_rpt_invalidate_apply, +}, }; static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr, @@ -772,6 +820,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV); SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER); SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST); SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI); +SPAPR_CAP_MIG_STATE(rp
Re: [RFC PATCH 1/2] spapr: drc: Add support for async hcalls at the drc level
On Thu, Nov 26, 2020 at 01:54:55AM -0600, Shivaprasad G Bhat wrote: > The patch adds support for async hcalls at the DRC level for the > spapr devices. To be used by spapr-scm devices in the patch/es to follow. > > Signed-off-by: Shivaprasad G Bhat > --- > hw/ppc/spapr_drc.c | 146 > > include/hw/ppc/spapr_drc.h | 25 > 2 files changed, 171 insertions(+) > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > index 77718cde1f..2cecccf701 100644 > --- a/hw/ppc/spapr_drc.c > +++ b/hw/ppc/spapr_drc.c > @@ -15,6 +15,7 @@ > #include "qapi/qmp/qnull.h" > #include "cpu.h" > #include "qemu/cutils.h" > +#include "qemu/guest-random.h" > #include "hw/ppc/spapr_drc.h" > #include "qom/object.h" > #include "migration/vmstate.h" > @@ -421,6 +422,145 @@ void spapr_drc_detach(SpaprDrc *drc) > spapr_drc_release(drc); > } > > + > +/* > + * @drc : device DRC targetting which the async hcalls to be made. > + * > + * All subsequent requests to run/query the status should use the > + * unique token returned here. > + */ > +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc) > +{ > +Error *err = NULL; > +uint64_t token; > +SpaprDrcDeviceAsyncHCallState *tmp, *next, *state; > + > +state = g_malloc0(sizeof(*state)); > +state->pending = true; > + > +qemu_mutex_lock(&drc->async_hcall_states_lock); > +retry: > +if (qemu_guest_getrandom(&token, sizeof(token), &err) < 0) { > +error_report_err(err); > +g_free(state); > +return 0; > +} Returning w/o releasing the lock. > + > +if (!token) /* Token should be non-zero */ > +goto retry; > + > +if (!QLIST_EMPTY(&drc->async_hcall_states)) { > +QLIST_FOREACH_SAFE(tmp, &drc->async_hcall_states, node, next) { > +if (tmp->continue_token == token) { > +/* If the token already in use, get a new one */ > +goto retry; > +} > +} > +} > + > +state->continue_token = token; > +QLIST_INSERT_HEAD(&drc->async_hcall_states, state, node); > + > +qemu_mutex_unlock(&drc->async_hcall_states_lock); > + > +return state->continue_token; > +} > + > +static void *spapr_drc_async_hcall_runner(void *opaque) > +{ > +int response = -1; > +SpaprDrcDeviceAsyncHCallState *state = opaque; > + > +/* > + * state is freed only after this thread finishes(after pthread_join()), > + * don't worry about it becoming NULL. > + */ > + > +response = state->func(state->data); > + > +state->hcall_ret = response; > +state->pending = 0; > + > +return NULL; > +} > + > +/* > + * @drc : device DRC targetting which the async hcalls to be made. > + * token : The continue token to be used for tracking as recived from > + * spapr_drc_get_new_async_hcall_token > + * @func() : the worker function which needs to be executed asynchronously > + * @data : data to be passed to the asynchronous function. Worker is supposed > + * to free/cleanup the data that is passed here > + */ > +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token, > + SpaprDrcAsyncHcallWorkerFunc *func, void > *data) > +{ > +SpaprDrcDeviceAsyncHCallState *state, *next; > + > +qemu_mutex_lock(&drc->async_hcall_states_lock); > +QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, next) { > +if (state->continue_token == token) { > +state->func = func; > +state->data = data; > +qemu_thread_create(&state->thread, "sPAPR Async HCALL", > + spapr_drc_async_hcall_runner, state, > + QEMU_THREAD_JOINABLE); > +break; > +} > +} Looks like QLIST_FOREACH should be enough here as you don't seem to be removing any list entry in this path. > +qemu_mutex_unlock(&drc->async_hcall_states_lock); > +} > + > +/* > + * spapr_drc_finish_async_hcalls > + * Waits for all pending async requests to complete > + * thier execution and free the states > + */ > +static void spapr_drc_finish_async_hcalls(SpaprDrc *drc) > +{ > +SpaprDrcDeviceAsyncHCallState *state, *next; > + > +if (QLIST_EMPTY(&drc->async_hcall_states)) { > +return; > +} > + > +QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, next) { > +qemu_thread_join(&state->thread); > +QLIST_REMOVE(state, node); > +g_free(state); > +} > +} Why is it safe to iterate the list here w/o the lock? Regards, Bharata.
Re: [Qemu-devel] [PATCH] target-ppc: Add quad precision muladd instructions
On Tue, Sep 22, 2020 at 10:04:53AM +0200, David Hildenbrand wrote: > Hi guys, > > I just stumbled over > > https://lore.kernel.org/qemu-devel/1487140636-19955-1-git-send-email-bhar...@linux.vnet.ibm.com/ > > while looking for the state of float128_muladd(), as I need that for > s390x as well. > > @Bharata, did you manage to implement a prototype? Looking into > float128_mul() makes my head spin. I did try but couldn't figure out a way to implement some of the dependent functions. Hence we still don't have quad-precision multiply-add/sub instructions in ppc. Regards, Bharata.
Re: [RFC PATCH 1/1] ppc/spapr: Add hotplugged flag on DIMM LMBs on drmem_v2
On Tue, Mar 10, 2020 at 8:24 AM David Gibson wrote: > > On Thu, Mar 05, 2020 at 09:22:02PM -0300, Leonardo Bras wrote: > > On reboot, all memory that was previously added using object_add and > > device_add is placed in this DIMM area. > > > > The new SPAPR_LMB_FLAGS_HOTPLUGGED flag helps Linux to put this memory in > > the correct memory zone, so no unmovable allocations are made there, > > allowing the object to be easily hot-removed by device_del and > > object_del. > > > > Signed-off-by: Leonardo Bras > > > > --- > > The new flag was already proposed on Power Architecture documentation, > > and it's waiting for approval. > > > > I would like to get your comments on this change, but it's still not > > ready for being merged. > > This looks reasonable to me - at the very least it doesn't look like > it could do much harm. Looks good to me, also tested with PowerKVM guests. Reviewed-by: Bharata B Rao Regards, Bharata. -- http://raobharata.wordpress.com/
[PATCH ppc-for-5.0 1/1] ppc/spapr: Don't call KVM_SVM_OFF ioctl on TCG
Invoking KVM_SVM_OFF ioctl for TCG guests will lead to a QEMU crash. Fix this by ensuring that we don't call KVM_SVM_OFF ioctl on TCG. Reported-by: Alexey Kardashevskiy Fixes: 4930c1966249 ("ppc/spapr: Support reboot of secure pseries guest") Signed-off-by: Bharata B Rao --- target/ppc/kvm.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index b64916dc37..ae2f3c57c0 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -2902,9 +2902,12 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset) void kvmppc_svm_off(Error **errp) { int rc; -KVMState *s = KVM_STATE(current_machine->accelerator); -rc = kvm_vm_ioctl(s, KVM_PPC_SVM_OFF); +if (!kvm_enabled()) { +return; +} + +rc = kvm_vm_ioctl(KVM_STATE(current_machine->accelerator), KVM_PPC_SVM_OFF); if (rc && rc != -ENOTTY) { error_setg_errno(errp, -rc, "KVM_PPC_SVM_OFF ioctl failed"); } -- 2.21.0
[PATCH v4 ppc-for-5.0 1/2] linux-headers: Update
Update to mainline commit: d1eef1c61974 ("Linux 5.5-rc2") Signed-off-by: Bharata B Rao --- include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + 7 files changed, 132 insertions(+), 4 deletions(-) diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index a6f7cf535e..072e2ed546 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -2,7 +2,7 @@ #ifndef _ASM_X86_BOOTPARAM_H #define _ASM_X86_BOOTPARAM_H -/* setup_data types */ +/* setup_data/setup_indirect types */ #define SETUP_NONE 0 #define SETUP_E820_EXT 1 #define SETUP_DTB 2 @@ -11,6 +11,11 @@ #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE6 +#define SETUP_INDIRECT (1<<31) + +/* SETUP_INDIRECT | max(SETUP_*) */ +#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE) + /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF #define RAMDISK_PROMPT_FLAG0x8000 diff --git a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h index d019872608..a5a1c8234e 100644 --- a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h +++ b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h @@ -58,7 +58,8 @@ #define PVRDMA_ROCEV1_VERSION 17 #define PVRDMA_ROCEV2_VERSION 18 #define PVRDMA_PPN64_VERSION 19 -#define PVRDMA_VERSION PVRDMA_PPN64_VERSION +#define PVRDMA_QPHANDLE_VERSION20 +#define PVRDMA_VERSION PVRDMA_QPHANDLE_VERSION #define PVRDMA_BOARD_ID1 #define PVRDMA_REV_ID 1 @@ -581,6 +582,17 @@ struct pvrdma_cmd_create_qp_resp { uint32_t max_inline_data; }; +struct pvrdma_cmd_create_qp_resp_v2 { + struct pvrdma_cmd_resp_hdr hdr; + uint32_t qpn; + uint32_t qp_handle; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + struct pvrdma_cmd_modify_qp { struct pvrdma_cmd_hdr hdr; uint32_t qp_handle; @@ -663,6 +675,7 @@ union pvrdma_cmd_resp { struct pvrdma_cmd_create_cq_resp create_cq_resp; struct pvrdma_cmd_resize_cq_resp resize_cq_resp; struct pvrdma_cmd_create_qp_resp create_qp_resp; + struct pvrdma_cmd_create_qp_resp_v2 create_qp_resp_v2; struct pvrdma_cmd_query_qp_resp query_qp_resp; struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp; struct pvrdma_cmd_create_srq_resp create_srq_resp; diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index a308c91b4f..46d279f515 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -68,7 +68,7 @@ extern "C" { #define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \ ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24)) -#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */ +#define DRM_FORMAT_BIG_ENDIAN (1U<<31) /* format is big endian instead of little endian */ /* Reserve 0 for the invalid format specifier */ #define DRM_FORMAT_INVALID 0 @@ -647,7 +647,21 @@ extern "C" { * Further information on the use of AFBC modifiers can be found in * Documentation/gpu/afbc.rst */ -#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) fourcc_mod_code(ARM, __afbc_mode) + +/* + * The top 4 bits (out of the 56 bits alloted for specifying vendor specific + * modifiers) denote the category for modifiers. Currently we have only two + * categories of modifiers ie AFBC and MISC. We can have a maximum of sixteen + * different categories. + */ +#define DRM_FORMAT_MOD_ARM_CODE(__type, __val) \ + fourcc_mod_code(ARM, ((uint64_t)(__type) << 52) | ((__val) & 0x000fULL)) + +#define DRM_FORMAT_MOD_ARM_TYPE_AFBC 0x00 +#define DRM_FORMAT_MOD_ARM_TYPE_MISC 0x01 + +#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) \ + DRM_FORMAT_MOD_ARM_CODE(DRM_FORMAT_MOD_ARM_TYPE_AFBC, __afbc_mode) /* * AFBC superblock size @@ -741,6 +755,16 @@ extern "C" { */ #define AFBC_FORMAT_MOD_BCH (1ULL << 11) +/* + * Arm 16x16 Block U-Interleaved modifier + * + * This is used by Ar
[PATCH v4 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
A pseries guest can be run as a secure guest on Ultravisor-enabled POWER platforms. When such a secure guest is reset, we need to release/reset a few resources both on ultravisor and hypervisor side. This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the machine reset path. As part of this ioctl, the secure guest is essentially transitioned back to normal mode so that it can reboot like a regular guest and become secure again. This ioctl has no effect when invoked for a normal guest. If this ioctl fails for a secure guest, the guest is terminated. Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 1 + target/ppc/kvm.c | 15 +++ target/ppc/kvm_ppc.h | 6 ++ 3 files changed, 22 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index f11422fc41..e62c89b3dd 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1597,6 +1597,7 @@ static void spapr_machine_reset(MachineState *machine) void *fdt; int rc; +kvmppc_svm_off(&error_fatal); spapr_caps_apply(spapr); first_ppc_cpu = POWERPC_CPU(first_cpu); diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 7406d18945..5e24ae701f 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -2900,3 +2900,18 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset) kvm_set_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &tb_offset); } } + +/* + * Don't set error if KVM_PPC_SVM_OFF ioctl is invoked on kernels + * that don't support this ioctl. + */ +void kvmppc_svm_off(Error **errp) +{ +int rc; +KVMState *s = KVM_STATE(current_machine->accelerator); + +rc = kvm_vm_ioctl(s, KVM_PPC_SVM_OFF); +if (rc && rc != -ENOTTY) { +error_setg_errno(errp, -rc, "KVM_PPC_SVM_OFF ioctl failed"); +} +} diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h index 47b08a4030..9a9bca1b72 100644 --- a/target/ppc/kvm_ppc.h +++ b/target/ppc/kvm_ppc.h @@ -37,6 +37,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu); target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, bool radix, bool gtse, uint64_t proc_tbl); +void kvmppc_svm_off(Error **errp); #ifndef CONFIG_USER_ONLY bool kvmppc_spapr_use_multitce(void); int kvmppc_spapr_enable_inkernel_multitce(void); @@ -201,6 +202,11 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, return 0; } +static inline void kvmppc_svm_off(Error **errp) +{ +return; +} + static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online) { -- 2.21.0
[PATCH v4 ppc-for-5.0 0/2] ppc/spapr: Support reboot of secure pseries guest
This patchset adds KVM_PPC_SVM_OFF ioctl which is required to support reset of secure guest. This includes linux-headers update so that we get the newly introduced ioctl. v3: https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg03685.html Changes in v4: - - s/error_setg/error_setg_errno (Greg Kurz) Bharata B Rao (2): linux-headers: Update ppc/spapr: Support reboot of secure pseries guest hw/ppc/spapr.c| 1 + include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c | 15 target/ppc/kvm_ppc.h | 6 ++ 10 files changed, 154 insertions(+), 4 deletions(-) -- 2.21.0
[PATCH v3 ppc-for-5.0 1/2] linux-headers: Update
Update to mainline commit: d1eef1c61974 ("Linux 5.5-rc2") Signed-off-by: Bharata B Rao --- include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + 7 files changed, 132 insertions(+), 4 deletions(-) diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index a6f7cf535e..072e2ed546 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -2,7 +2,7 @@ #ifndef _ASM_X86_BOOTPARAM_H #define _ASM_X86_BOOTPARAM_H -/* setup_data types */ +/* setup_data/setup_indirect types */ #define SETUP_NONE 0 #define SETUP_E820_EXT 1 #define SETUP_DTB 2 @@ -11,6 +11,11 @@ #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE6 +#define SETUP_INDIRECT (1<<31) + +/* SETUP_INDIRECT | max(SETUP_*) */ +#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE) + /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF #define RAMDISK_PROMPT_FLAG0x8000 diff --git a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h index d019872608..a5a1c8234e 100644 --- a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h +++ b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h @@ -58,7 +58,8 @@ #define PVRDMA_ROCEV1_VERSION 17 #define PVRDMA_ROCEV2_VERSION 18 #define PVRDMA_PPN64_VERSION 19 -#define PVRDMA_VERSION PVRDMA_PPN64_VERSION +#define PVRDMA_QPHANDLE_VERSION20 +#define PVRDMA_VERSION PVRDMA_QPHANDLE_VERSION #define PVRDMA_BOARD_ID1 #define PVRDMA_REV_ID 1 @@ -581,6 +582,17 @@ struct pvrdma_cmd_create_qp_resp { uint32_t max_inline_data; }; +struct pvrdma_cmd_create_qp_resp_v2 { + struct pvrdma_cmd_resp_hdr hdr; + uint32_t qpn; + uint32_t qp_handle; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + struct pvrdma_cmd_modify_qp { struct pvrdma_cmd_hdr hdr; uint32_t qp_handle; @@ -663,6 +675,7 @@ union pvrdma_cmd_resp { struct pvrdma_cmd_create_cq_resp create_cq_resp; struct pvrdma_cmd_resize_cq_resp resize_cq_resp; struct pvrdma_cmd_create_qp_resp create_qp_resp; + struct pvrdma_cmd_create_qp_resp_v2 create_qp_resp_v2; struct pvrdma_cmd_query_qp_resp query_qp_resp; struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp; struct pvrdma_cmd_create_srq_resp create_srq_resp; diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index a308c91b4f..46d279f515 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -68,7 +68,7 @@ extern "C" { #define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \ ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24)) -#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */ +#define DRM_FORMAT_BIG_ENDIAN (1U<<31) /* format is big endian instead of little endian */ /* Reserve 0 for the invalid format specifier */ #define DRM_FORMAT_INVALID 0 @@ -647,7 +647,21 @@ extern "C" { * Further information on the use of AFBC modifiers can be found in * Documentation/gpu/afbc.rst */ -#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) fourcc_mod_code(ARM, __afbc_mode) + +/* + * The top 4 bits (out of the 56 bits alloted for specifying vendor specific + * modifiers) denote the category for modifiers. Currently we have only two + * categories of modifiers ie AFBC and MISC. We can have a maximum of sixteen + * different categories. + */ +#define DRM_FORMAT_MOD_ARM_CODE(__type, __val) \ + fourcc_mod_code(ARM, ((uint64_t)(__type) << 52) | ((__val) & 0x000fULL)) + +#define DRM_FORMAT_MOD_ARM_TYPE_AFBC 0x00 +#define DRM_FORMAT_MOD_ARM_TYPE_MISC 0x01 + +#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) \ + DRM_FORMAT_MOD_ARM_CODE(DRM_FORMAT_MOD_ARM_TYPE_AFBC, __afbc_mode) /* * AFBC superblock size @@ -741,6 +755,16 @@ extern "C" { */ #define AFBC_FORMAT_MOD_BCH (1ULL << 11) +/* + * Arm 16x16 Block U-Interleaved modifier + * + * This is used by Ar
[PATCH v3 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
A pseries guest can be run as a secure guest on Ultravisor-enabled POWER platforms. When such a secure guest is reset, we need to release/reset a few resources both on ultravisor and hypervisor side. This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the machine reset path. As part of this ioctl, the secure guest is essentially transitioned back to normal mode so that it can reboot like a regular guest and become secure again. This ioctl has no effect when invoked for a normal guest. If this ioctl fails for a secure guest, the guest is terminated. Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 1 + target/ppc/kvm.c | 15 +++ target/ppc/kvm_ppc.h | 6 ++ 3 files changed, 22 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index f11422fc41..e62c89b3dd 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1597,6 +1597,7 @@ static void spapr_machine_reset(MachineState *machine) void *fdt; int rc; +kvmppc_svm_off(&error_fatal); spapr_caps_apply(spapr); first_ppc_cpu = POWERPC_CPU(first_cpu); diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 7406d18945..ae920ec310 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -2900,3 +2900,18 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset) kvm_set_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &tb_offset); } } + +/* + * Don't set error if KVM_PPC_SVM_OFF ioctl is invoked on kernels + * that don't support this ioctl. + */ +void kvmppc_svm_off(Error **errp) +{ +int rc; +KVMState *s = KVM_STATE(current_machine->accelerator); + +rc = kvm_vm_ioctl(s, KVM_PPC_SVM_OFF); +if (rc && rc != -ENOTTY) { +error_setg(errp, "KVM_PPC_SVM_OFF ioctl failed"); +} +} diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h index 47b08a4030..9a9bca1b72 100644 --- a/target/ppc/kvm_ppc.h +++ b/target/ppc/kvm_ppc.h @@ -37,6 +37,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu); target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, bool radix, bool gtse, uint64_t proc_tbl); +void kvmppc_svm_off(Error **errp); #ifndef CONFIG_USER_ONLY bool kvmppc_spapr_use_multitce(void); int kvmppc_spapr_enable_inkernel_multitce(void); @@ -201,6 +202,11 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, return 0; } +static inline void kvmppc_svm_off(Error **errp) +{ +return; +} + static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online) { -- 2.21.0
[PATCH v3 ppc-for-5.0 0/2] ppc/spapr: Support reboot of secure pseries guest
This patchset adds KVM_PPC_SVM_OFF ioctl which is required to support reset of secure guest. This includes linux-headers update so that we get the newly introduced ioctl. v2: https://lists.gnu.org/archive/html/qemu-ppc/2019-12/msg00162.html Changes in v3: - - Use of error_fatal as David Gibson suggested. - Updated linux-headers to 5.5.0-rc2 Bharata B Rao (2): linux-headers: Update ppc/spapr: Support reboot of secure pseries guest hw/ppc/spapr.c| 1 + include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c | 15 target/ppc/kvm_ppc.h | 6 ++ 10 files changed, 154 insertions(+), 4 deletions(-) -- 2.21.0
Re: [PATCH v2 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
On Thu, Dec 12, 2019 at 01:27:23PM +0100, Greg Kurz wrote: > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index f11422fc41..25e1a3446e 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -1597,6 +1597,21 @@ static void spapr_machine_reset(MachineState > > *machine) > > void *fdt; > > int rc; > > > > +/* > > + * KVM_PPC_SVM_OFF ioctl can fail for secure guests, check and > > + * exit in that case. However check for -ENOTTY explicitly > > + * to ensure that we don't terminate normal guests that are > > + * running on kernels which don't support this ioctl. > > + * > > + * Also, this ioctl returns 0 for normal guests on kernels where > > + * this ioctl is supported. > > + */ > > +rc = kvmppc_svm_off(); > > +if (rc && rc != -ENOTTY) { > > This ioctl can also return -EINVAL if the ultravisor actually failed to move > the guest back to non-secure mode or -EBUSY if a vCPU is still running. I > agree that the former deserve the VM to be terminated. What about the latter ? > Can this happen and if yes, why ? Should we try again as suggested by Alexey ? > Could this reveal a bug in QEMU, in which case we should maybe abort ? We are in machine reset path, so all vcpus are already paused. So we don't expect any vcpus to be running to handle -EBUSY here. Neither do I see any sane recovery path from here. As Alexey mentioned earlier, may be we can just stop the VM? Do vm_stop() with RUN_STATE_PAUSED or some such reason? Regards, Bharata.
Re: [PATCH v2 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
On Thu, Dec 12, 2019 at 08:34:57AM +0100, Cédric Le Goater wrote: > Hello Bharata, > > > On 12/12/2019 06:50, Bharata B Rao wrote: > > A pseries guest can be run as a secure guest on Ultravisor-enabled > > POWER platforms. When such a secure guest is reset, we need to > > release/reset a few resources both on ultravisor and hypervisor side. > > This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the > > machine reset path. > > > > As part of this ioctl, the secure guest is essentially transitioned > > back to normal mode so that it can reboot like a regular guest and > > become secure again. > > > > This ioctl has no effect when invoked for a normal guest. If this ioctl > > fails for a secure guest, the guest is terminated. > > This looks OK. > > > Signed-off-by: Bharata B Rao > > --- > > hw/ppc/spapr.c | 15 +++ > > target/ppc/kvm.c | 7 +++ > > target/ppc/kvm_ppc.h | 6 ++ > > 3 files changed, 28 insertions(+) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index f11422fc41..25e1a3446e 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -1597,6 +1597,21 @@ static void spapr_machine_reset(MachineState > > *machine) > > void *fdt; > > int rc; > > > > +/* > > + * KVM_PPC_SVM_OFF ioctl can fail for secure guests, check and > > + * exit in that case. However check for -ENOTTY explicitly > > + * to ensure that we don't terminate normal guests that are > > + * running on kernels which don't support this ioctl. > > + * > > + * Also, this ioctl returns 0 for normal guests on kernels where > > + * this ioctl is supported. > > + */ > > +rc = kvmppc_svm_off(); > > +if (rc && rc != -ENOTTY) { > > I would put these low level tests under kvmppc_svm_off(). Makes sense. > > > +error_report("Reset of secure guest failed, exiting..."); > > +exit(EXIT_FAILURE); > > The exit() could probably go under kvmppc_svm_off() also. May be not. Then error_report would have also have to go in. Doesn't make sense to print this error from there. Regards, Bharata.
[PATCH v2 ppc-for-5.0 1/2] linux-headers: Update
Update to mainline commit: e42617b825f8 ("Linux 5.5-rc1") Signed-off-by: Bharata B Rao --- include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + 7 files changed, 132 insertions(+), 4 deletions(-) diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index a6f7cf535e..072e2ed546 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -2,7 +2,7 @@ #ifndef _ASM_X86_BOOTPARAM_H #define _ASM_X86_BOOTPARAM_H -/* setup_data types */ +/* setup_data/setup_indirect types */ #define SETUP_NONE 0 #define SETUP_E820_EXT 1 #define SETUP_DTB 2 @@ -11,6 +11,11 @@ #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE6 +#define SETUP_INDIRECT (1<<31) + +/* SETUP_INDIRECT | max(SETUP_*) */ +#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE) + /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF #define RAMDISK_PROMPT_FLAG0x8000 diff --git a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h index d019872608..a5a1c8234e 100644 --- a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h +++ b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h @@ -58,7 +58,8 @@ #define PVRDMA_ROCEV1_VERSION 17 #define PVRDMA_ROCEV2_VERSION 18 #define PVRDMA_PPN64_VERSION 19 -#define PVRDMA_VERSION PVRDMA_PPN64_VERSION +#define PVRDMA_QPHANDLE_VERSION20 +#define PVRDMA_VERSION PVRDMA_QPHANDLE_VERSION #define PVRDMA_BOARD_ID1 #define PVRDMA_REV_ID 1 @@ -581,6 +582,17 @@ struct pvrdma_cmd_create_qp_resp { uint32_t max_inline_data; }; +struct pvrdma_cmd_create_qp_resp_v2 { + struct pvrdma_cmd_resp_hdr hdr; + uint32_t qpn; + uint32_t qp_handle; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + struct pvrdma_cmd_modify_qp { struct pvrdma_cmd_hdr hdr; uint32_t qp_handle; @@ -663,6 +675,7 @@ union pvrdma_cmd_resp { struct pvrdma_cmd_create_cq_resp create_cq_resp; struct pvrdma_cmd_resize_cq_resp resize_cq_resp; struct pvrdma_cmd_create_qp_resp create_qp_resp; + struct pvrdma_cmd_create_qp_resp_v2 create_qp_resp_v2; struct pvrdma_cmd_query_qp_resp query_qp_resp; struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp; struct pvrdma_cmd_create_srq_resp create_srq_resp; diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index a308c91b4f..46d279f515 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -68,7 +68,7 @@ extern "C" { #define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \ ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24)) -#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */ +#define DRM_FORMAT_BIG_ENDIAN (1U<<31) /* format is big endian instead of little endian */ /* Reserve 0 for the invalid format specifier */ #define DRM_FORMAT_INVALID 0 @@ -647,7 +647,21 @@ extern "C" { * Further information on the use of AFBC modifiers can be found in * Documentation/gpu/afbc.rst */ -#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) fourcc_mod_code(ARM, __afbc_mode) + +/* + * The top 4 bits (out of the 56 bits alloted for specifying vendor specific + * modifiers) denote the category for modifiers. Currently we have only two + * categories of modifiers ie AFBC and MISC. We can have a maximum of sixteen + * different categories. + */ +#define DRM_FORMAT_MOD_ARM_CODE(__type, __val) \ + fourcc_mod_code(ARM, ((uint64_t)(__type) << 52) | ((__val) & 0x000fULL)) + +#define DRM_FORMAT_MOD_ARM_TYPE_AFBC 0x00 +#define DRM_FORMAT_MOD_ARM_TYPE_MISC 0x01 + +#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) \ + DRM_FORMAT_MOD_ARM_CODE(DRM_FORMAT_MOD_ARM_TYPE_AFBC, __afbc_mode) /* * AFBC superblock size @@ -741,6 +755,16 @@ extern "C" { */ #define AFBC_FORMAT_MOD_BCH (1ULL << 11) +/* + * Arm 16x16 Block U-Interleaved modifier + * + * This is used by Ar
[PATCH v2 ppc-for-5.0 0/2] ppc/spapr: Support reboot of secure pseries guest
This patchset adds KVM_PPC_SVM_OFF ioctl which is required to support reset of secure guest. This includes linux-headers update so that we get the newly introduced ioctl. v1: https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg01489.html Bharata B Rao (2): linux-headers: Update ppc/spapr: Support reboot of secure pseries guest hw/ppc/spapr.c| 15 include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c | 7 ++ target/ppc/kvm_ppc.h | 6 ++ 10 files changed, 160 insertions(+), 4 deletions(-) -- 2.21.0
[PATCH v2 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
A pseries guest can be run as a secure guest on Ultravisor-enabled POWER platforms. When such a secure guest is reset, we need to release/reset a few resources both on ultravisor and hypervisor side. This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the machine reset path. As part of this ioctl, the secure guest is essentially transitioned back to normal mode so that it can reboot like a regular guest and become secure again. This ioctl has no effect when invoked for a normal guest. If this ioctl fails for a secure guest, the guest is terminated. Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 15 +++ target/ppc/kvm.c | 7 +++ target/ppc/kvm_ppc.h | 6 ++ 3 files changed, 28 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index f11422fc41..25e1a3446e 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1597,6 +1597,21 @@ static void spapr_machine_reset(MachineState *machine) void *fdt; int rc; +/* + * KVM_PPC_SVM_OFF ioctl can fail for secure guests, check and + * exit in that case. However check for -ENOTTY explicitly + * to ensure that we don't terminate normal guests that are + * running on kernels which don't support this ioctl. + * + * Also, this ioctl returns 0 for normal guests on kernels where + * this ioctl is supported. + */ +rc = kvmppc_svm_off(); +if (rc && rc != -ENOTTY) { +error_report("Reset of secure guest failed, exiting..."); +exit(EXIT_FAILURE); +} + spapr_caps_apply(spapr); first_ppc_cpu = POWERPC_CPU(first_cpu); diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 7406d18945..1a86fa4f0c 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -2900,3 +2900,10 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset) kvm_set_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &tb_offset); } } + +int kvmppc_svm_off(void) +{ +KVMState *s = KVM_STATE(current_machine->accelerator); + +return kvm_vm_ioctl(s, KVM_PPC_SVM_OFF); +} diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h index 47b08a4030..5cc812e486 100644 --- a/target/ppc/kvm_ppc.h +++ b/target/ppc/kvm_ppc.h @@ -37,6 +37,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu); target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, bool radix, bool gtse, uint64_t proc_tbl); +int kvmppc_svm_off(void); #ifndef CONFIG_USER_ONLY bool kvmppc_spapr_use_multitce(void); int kvmppc_spapr_enable_inkernel_multitce(void); @@ -201,6 +202,11 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, return 0; } +static inline int kvmppc_svm_off(void) +{ +return 0; +} + static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online) { -- 2.21.0
Re: [PATCH v1 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
On Wed, Dec 11, 2019 at 04:27:42PM +1100, David Gibson wrote: > Ah, right. We'll need to check for -ENOTTY specifically and ignore > it, then. We don't want this spewing warnings on every non-secure > guest. I am posting v2 with explicit check for -ENOTTY. > > > It looks like we may need a new KVM capability to advertise the presence > > of KVM_PPC_SVM_OFF ioctl (or more generally, to advertise host kernel's > > capability to support secure guests). > > Actually, that's probably a better idea still. If and when we decide to have this KVM capability and that goes upstream, we can update the QEMU accordingly? Regards, Bharata.
Re: [PATCH v1 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
On Wed, Dec 11, 2019 at 10:41:32AM +1100, David Gibson wrote: > On Tue, Dec 10, 2019 at 12:20:07PM +0530, Bharata B Rao wrote: > > On Tue, Dec 10, 2019 at 04:05:36PM +1100, David Gibson wrote: > > > On Tue, Dec 10, 2019 at 03:03:01PM +1100, Alexey Kardashevskiy wrote: > > > > > > > > > > > > On 10/12/2019 14:50, Bharata B Rao wrote: > > > > > On Tue, Dec 10, 2019 at 02:28:51PM +1100, David Gibson wrote: > > > > >> On Mon, Dec 09, 2019 at 12:30:12PM +0530, Bharata B Rao wrote: > > > > >>> A pseries guest can be run as a secure guest on Ultravisor-enabled > > > > >>> POWER platforms. When such a secure guest is reset, we need to > > > > >>> release/reset a few resources both on ultravisor and hypervisor > > > > >>> side. > > > > >>> This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the > > > > >>> machine reset path. > > > > >>> > > > > >>> As part of this ioctl, the secure guest is essentially transitioned > > > > >>> back to normal mode so that it can reboot like a regular guest and > > > > >>> become secure again. > > > > >>> > > > > >>> This ioctl has no effect when invoked for a normal guest. > > > > >>> > > > > >>> Signed-off-by: Bharata B Rao > > > > >>> --- > > > > >>> hw/ppc/spapr.c | 1 + > > > > >>> target/ppc/kvm.c | 7 +++ > > > > >>> target/ppc/kvm_ppc.h | 6 ++ > > > > >>> 3 files changed, 14 insertions(+) > > > > >>> > > > > >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > > >>> index f11422fc41..4c7ad3400d 100644 > > > > >>> --- a/hw/ppc/spapr.c > > > > >>> +++ b/hw/ppc/spapr.c > > > > >>> @@ -1597,6 +1597,7 @@ static void spapr_machine_reset(MachineState > > > > >>> *machine) > > > > >>> void *fdt; > > > > >>> int rc; > > > > >>> > > > > >>> +kvmppc_svm_off(); > > > > >> > > > > >> If you're going to have this return an error value, you should really > > > > >> check it here. > > > > > > > > > > I could, by spapr_machine_reset() and the callers don't propagate the > > > > > errors up. So may be I could print a warning instead when ioctl fails? > > > > > > > > An error here means you cannot restart the machine and should probably > > > > suspend, or try until it is not EBUSY (==all threads have stopped?). > > > > > > Right, if this fails, something has gone badly wrong. You should > > > absolutely print a message, and in fact it might be appropriate to > > > quit outright. IIUC the way PEF resets work, a failure here means you > > > won't be able to boot after the reset, since the guest memory will > > > still be inaccessible to the host. > > > > Correct. I will send next version with a message and abort() added in > > the ioctl failure path. > > abort() or assert() isn't right either - that's reserved for things > that are definitely caused by a qemu code bug. This should be an > exit(EXIT_FAILURE). Ok, but I see a problem with checking the return value of this ioctl from userspace. If this ioctl is run on older kernels that don't support this ioctl, we get -ENOTTY as return value. We shouldn't be exiting in that case. It looks like we may need a new KVM capability to advertise the presence of KVM_PPC_SVM_OFF ioctl (or more generally, to advertise host kernel's capability to support secure guests). Paul - Do you think we should add such a KVM capability? Here is the summary of the problem: 1. QEMU invokes KVM_PPC_SVM_OFF ioctl from machine reset path and currently we don't check for its return value. 2. On host kernels that support secure guests, 2a. this ioctl returns 0 for regular guests and has no effect. 2b. the ioctl can fail for secure guests and here we could check the return value and exit the guest right away. 3. On host kernels that don't support secure guests, ioctl returns -ENOTTY but we ignore the return value and continue the reset as this is for a non-secure guest. If we have such a KVM capability, we could invoke the ioctl only if it is supported and handle the return value appropriately. Regards, Bharata.
Re: [PATCH v1 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
On Tue, Dec 10, 2019 at 04:05:36PM +1100, David Gibson wrote: > On Tue, Dec 10, 2019 at 03:03:01PM +1100, Alexey Kardashevskiy wrote: > > > > > > On 10/12/2019 14:50, Bharata B Rao wrote: > > > On Tue, Dec 10, 2019 at 02:28:51PM +1100, David Gibson wrote: > > >> On Mon, Dec 09, 2019 at 12:30:12PM +0530, Bharata B Rao wrote: > > >>> A pseries guest can be run as a secure guest on Ultravisor-enabled > > >>> POWER platforms. When such a secure guest is reset, we need to > > >>> release/reset a few resources both on ultravisor and hypervisor side. > > >>> This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the > > >>> machine reset path. > > >>> > > >>> As part of this ioctl, the secure guest is essentially transitioned > > >>> back to normal mode so that it can reboot like a regular guest and > > >>> become secure again. > > >>> > > >>> This ioctl has no effect when invoked for a normal guest. > > >>> > > >>> Signed-off-by: Bharata B Rao > > >>> --- > > >>> hw/ppc/spapr.c | 1 + > > >>> target/ppc/kvm.c | 7 +++ > > >>> target/ppc/kvm_ppc.h | 6 ++ > > >>> 3 files changed, 14 insertions(+) > > >>> > > >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > >>> index f11422fc41..4c7ad3400d 100644 > > >>> --- a/hw/ppc/spapr.c > > >>> +++ b/hw/ppc/spapr.c > > >>> @@ -1597,6 +1597,7 @@ static void spapr_machine_reset(MachineState > > >>> *machine) > > >>> void *fdt; > > >>> int rc; > > >>> > > >>> +kvmppc_svm_off(); > > >> > > >> If you're going to have this return an error value, you should really > > >> check it here. > > > > > > I could, by spapr_machine_reset() and the callers don't propagate the > > > errors up. So may be I could print a warning instead when ioctl fails? > > > > An error here means you cannot restart the machine and should probably > > suspend, or try until it is not EBUSY (==all threads have stopped?). > > Right, if this fails, something has gone badly wrong. You should > absolutely print a message, and in fact it might be appropriate to > quit outright. IIUC the way PEF resets work, a failure here means you > won't be able to boot after the reset, since the guest memory will > still be inaccessible to the host. Correct. I will send next version with a message and abort() added in the ioctl failure path. Regards, Bharata.
Re: [PATCH v1 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
On Tue, Dec 10, 2019 at 02:28:51PM +1100, David Gibson wrote: > On Mon, Dec 09, 2019 at 12:30:12PM +0530, Bharata B Rao wrote: > > A pseries guest can be run as a secure guest on Ultravisor-enabled > > POWER platforms. When such a secure guest is reset, we need to > > release/reset a few resources both on ultravisor and hypervisor side. > > This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the > > machine reset path. > > > > As part of this ioctl, the secure guest is essentially transitioned > > back to normal mode so that it can reboot like a regular guest and > > become secure again. > > > > This ioctl has no effect when invoked for a normal guest. > > > > Signed-off-by: Bharata B Rao > > --- > > hw/ppc/spapr.c | 1 + > > target/ppc/kvm.c | 7 +++ > > target/ppc/kvm_ppc.h | 6 ++ > > 3 files changed, 14 insertions(+) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index f11422fc41..4c7ad3400d 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -1597,6 +1597,7 @@ static void spapr_machine_reset(MachineState *machine) > > void *fdt; > > int rc; > > > > +kvmppc_svm_off(); > > If you're going to have this return an error value, you should really > check it here. I could, by spapr_machine_reset() and the callers don't propagate the errors up. So may be I could print a warning instead when ioctl fails? Regards, Bharata.
[PATCH v1 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest
A pseries guest can be run as a secure guest on Ultravisor-enabled POWER platforms. When such a secure guest is reset, we need to release/reset a few resources both on ultravisor and hypervisor side. This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the machine reset path. As part of this ioctl, the secure guest is essentially transitioned back to normal mode so that it can reboot like a regular guest and become secure again. This ioctl has no effect when invoked for a normal guest. Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 1 + target/ppc/kvm.c | 7 +++ target/ppc/kvm_ppc.h | 6 ++ 3 files changed, 14 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index f11422fc41..4c7ad3400d 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1597,6 +1597,7 @@ static void spapr_machine_reset(MachineState *machine) void *fdt; int rc; +kvmppc_svm_off(); spapr_caps_apply(spapr); first_ppc_cpu = POWERPC_CPU(first_cpu); diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 7406d18945..1a86fa4f0c 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -2900,3 +2900,10 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset) kvm_set_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &tb_offset); } } + +int kvmppc_svm_off(void) +{ +KVMState *s = KVM_STATE(current_machine->accelerator); + +return kvm_vm_ioctl(s, KVM_PPC_SVM_OFF); +} diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h index 47b08a4030..5cc812e486 100644 --- a/target/ppc/kvm_ppc.h +++ b/target/ppc/kvm_ppc.h @@ -37,6 +37,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu); target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, bool radix, bool gtse, uint64_t proc_tbl); +int kvmppc_svm_off(void); #ifndef CONFIG_USER_ONLY bool kvmppc_spapr_use_multitce(void); int kvmppc_spapr_enable_inkernel_multitce(void); @@ -201,6 +202,11 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, return 0; } +static inline int kvmppc_svm_off(void) +{ +return 0; +} + static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online) { -- 2.21.0
[PATCH v1 ppc-for-5.0 0/2] ppc/spapr: Support reboot of secure pseries guest
This patchset adds KVM_PPC_SVM_OFF ioctl which is required to support reset of secure guest. This includes linux-headers update so that we get the newly introduced ioctl. v0: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02408.html Bharata B Rao (2): linux-headers: Update ppc/spapr: Support reboot of secure pseries guest hw/ppc/spapr.c| 1 + include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c | 7 ++ target/ppc/kvm_ppc.h | 6 ++ 10 files changed, 146 insertions(+), 4 deletions(-) -- 2.21.0
[PATCH v1 ppc-for-5.0 1/2] linux-headers: Update
Update to mainline commit: e42617b825f8 ("Linux 5.5-rc1") Signed-off-by: Bharata B Rao --- include/standard-headers/asm-x86/bootparam.h | 7 +- .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++- include/standard-headers/drm/drm_fourcc.h | 28 ++- .../linux/input-event-codes.h | 77 +++ include/standard-headers/linux/pci_regs.h | 3 + .../standard-headers/rdma/vmw_pvrdma-abi.h| 5 ++ linux-headers/linux/kvm.h | 1 + 7 files changed, 132 insertions(+), 4 deletions(-) diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index a6f7cf535e..072e2ed546 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -2,7 +2,7 @@ #ifndef _ASM_X86_BOOTPARAM_H #define _ASM_X86_BOOTPARAM_H -/* setup_data types */ +/* setup_data/setup_indirect types */ #define SETUP_NONE 0 #define SETUP_E820_EXT 1 #define SETUP_DTB 2 @@ -11,6 +11,11 @@ #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE6 +#define SETUP_INDIRECT (1<<31) + +/* SETUP_INDIRECT | max(SETUP_*) */ +#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE) + /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF #define RAMDISK_PROMPT_FLAG0x8000 diff --git a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h index d019872608..a5a1c8234e 100644 --- a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h +++ b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h @@ -58,7 +58,8 @@ #define PVRDMA_ROCEV1_VERSION 17 #define PVRDMA_ROCEV2_VERSION 18 #define PVRDMA_PPN64_VERSION 19 -#define PVRDMA_VERSION PVRDMA_PPN64_VERSION +#define PVRDMA_QPHANDLE_VERSION20 +#define PVRDMA_VERSION PVRDMA_QPHANDLE_VERSION #define PVRDMA_BOARD_ID1 #define PVRDMA_REV_ID 1 @@ -581,6 +582,17 @@ struct pvrdma_cmd_create_qp_resp { uint32_t max_inline_data; }; +struct pvrdma_cmd_create_qp_resp_v2 { + struct pvrdma_cmd_resp_hdr hdr; + uint32_t qpn; + uint32_t qp_handle; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + struct pvrdma_cmd_modify_qp { struct pvrdma_cmd_hdr hdr; uint32_t qp_handle; @@ -663,6 +675,7 @@ union pvrdma_cmd_resp { struct pvrdma_cmd_create_cq_resp create_cq_resp; struct pvrdma_cmd_resize_cq_resp resize_cq_resp; struct pvrdma_cmd_create_qp_resp create_qp_resp; + struct pvrdma_cmd_create_qp_resp_v2 create_qp_resp_v2; struct pvrdma_cmd_query_qp_resp query_qp_resp; struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp; struct pvrdma_cmd_create_srq_resp create_srq_resp; diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index a308c91b4f..46d279f515 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -68,7 +68,7 @@ extern "C" { #define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \ ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24)) -#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */ +#define DRM_FORMAT_BIG_ENDIAN (1U<<31) /* format is big endian instead of little endian */ /* Reserve 0 for the invalid format specifier */ #define DRM_FORMAT_INVALID 0 @@ -647,7 +647,21 @@ extern "C" { * Further information on the use of AFBC modifiers can be found in * Documentation/gpu/afbc.rst */ -#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) fourcc_mod_code(ARM, __afbc_mode) + +/* + * The top 4 bits (out of the 56 bits alloted for specifying vendor specific + * modifiers) denote the category for modifiers. Currently we have only two + * categories of modifiers ie AFBC and MISC. We can have a maximum of sixteen + * different categories. + */ +#define DRM_FORMAT_MOD_ARM_CODE(__type, __val) \ + fourcc_mod_code(ARM, ((uint64_t)(__type) << 52) | ((__val) & 0x000fULL)) + +#define DRM_FORMAT_MOD_ARM_TYPE_AFBC 0x00 +#define DRM_FORMAT_MOD_ARM_TYPE_MISC 0x01 + +#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) \ + DRM_FORMAT_MOD_ARM_CODE(DRM_FORMAT_MOD_ARM_TYPE_AFBC, __afbc_mode) /* * AFBC superblock size @@ -741,6 +755,16 @@ extern "C" { */ #define AFBC_FORMAT_MOD_BCH (1ULL << 11) +/* + * Arm 16x16 Block U-Interleaved modifier + * + * This is used by Ar
Re: [PATCH v3 2/3] spapr: Add NVDIMM device support
On Fri, Nov 22, 2019 at 10:42 AM David Gibson wrote: > > Ok. A number of queries about this. > > 1) The PAPR spec for ibm,dynamic-memory-v2 says that the first word in > each entry is the number of LMBs, but for NVDIMMs you use the > not-necessarily-equal scm_block_size instead. Does the NVDIMM > amendment for PAPR really specify to use different block sizes for > these cases? (In which case that's a really stupid spec decision, but > that wouldn't surprise me at this point). SCM block sizes can be different from LMB sizes, but here we enforce that SCM device size (excluding metadata) to multiple of LMB size so that we don't end up memory range that is not aligned to LMB size. > > 2) Similarly, the ibm,dynamic-memory-v2 description says that the > memory block described by the entry has a whole batch of contiguous > DRCs starting at the DRC index given and continuing for #LMBs DRCs. > For NVDIMMs it appears that you just have one DRC for the whole > NVDIMM. Is that right? One NVDIMM has one DRC, In our case, we need to mark the LMBs corresponding to that address range in ibm,dynamic-memory-v2 as reserved and invalid. > > 3) You're not setting *any* extra flags on the entry. How is the > guest supposed to know which are NVDIMM entries and which are regular > DIMM entries? AFAICT in this version the NVDIMM slots are > indistinguishable from the unassigned hotplug memory (which makes the > difference in LMB and DRC numbering even more troubling). For NVDIMM case, this patch should populate the LMB set in ibm,dynamic-memory-v2 something like below: elem = spapr_get_drconf_cell(size /lmb_size, addr, 0, -1, SPAPR_LMB_FLAGS_RESERVED | SPAPR_LMB_FLAGS_DRC_INVALID); This will ensure that the NVDIMM range will never be considered as valid memory range for memory hotplug. > > 4) AFAICT these are _present_ NVDIMMs, so why is > SPAPR_LMB_FLAGS_ASSIGNED not set for them? (and why is the node > forced to -1, regardless of di->node). > > > QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry); > > nr_entries++; > > cur_addr = addr + size; > > @@ -1261,6 +1273,85 @@ static void spapr_dt_hypervisor(SpaprMachineState > > *spapr, void *fdt) > > } > > } > > > > +static void spapr_create_nvdimm_dr_connectors(SpaprMachineState *spapr) > > +{ > > +MachineState *machine = MACHINE(spapr); > > +int i; > > + > > +for (i = 0; i < machine->ram_slots; i++) { > > +spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM, i); > > What happens if you try to plug an NVDIMM to one of these slots, but a > regular DIMM has already taken it? NVDIMM hotplug won't get that occupied slot. Regards, Bharata.
Re: [Qemu-devel] [PATCH v0] ppc/spapr: Support reboot of secure pseries guest
On Thu, Jul 11, 2019 at 11:31:00AM +1000, David Gibson wrote: > On Wed, Jul 10, 2019 at 11:36:21AM +0530, Bharata B Rao wrote: > > A pseries guest can be run as a secure guest on Ultravisor-enabled > > POWER platforms. When such a secure guest is reset, we need to > > release/reset a few resources both on ultravisor and hypervisor side. > > This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the > > machine reset path. > > > > As part of this ioctl, the secure guest is essentially transitioned > > back to normal mode so that it can reboot like a regular guest and > > become secure again. > > > > This ioctl has no effect when invoked for a normal guest. > > > > Signed-off-by: Bharata B Rao > > --- > > * The ioctl implementation in the kernel can be found as part of this > > patchset: > > https://www.spinics.net/lists/linux-mm/msg184366.html > > * Updated linux-headers/linux/kvm.h here for completeness as the > > definition of KVM_PPC_SVM_OFF isn't yet part of upstream kernel. > > The qemu change looks good to me. To actually merge this, the support > will need to go upstream in the kernel first, then we'll need an > update-kernel-headers as a separate patch. Sure, Thanks. Will post again once ioctl support is included in the kernel. Regards, Bharata.
[Qemu-devel] [PATCH v0] ppc/spapr: Support reboot of secure pseries guest
A pseries guest can be run as a secure guest on Ultravisor-enabled POWER platforms. When such a secure guest is reset, we need to release/reset a few resources both on ultravisor and hypervisor side. This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the machine reset path. As part of this ioctl, the secure guest is essentially transitioned back to normal mode so that it can reboot like a regular guest and become secure again. This ioctl has no effect when invoked for a normal guest. Signed-off-by: Bharata B Rao --- * The ioctl implementation in the kernel can be found as part of this patchset: https://www.spinics.net/lists/linux-mm/msg184366.html * Updated linux-headers/linux/kvm.h here for completeness as the definition of KVM_PPC_SVM_OFF isn't yet part of upstream kernel. hw/ppc/spapr.c| 1 + linux-headers/linux/kvm.h | 1 + target/ppc/kvm.c | 7 +++ target/ppc/kvm_ppc.h | 6 ++ 4 files changed, 15 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 821f0d4a49..6abf71f159 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1709,6 +1709,7 @@ static void spapr_machine_reset(MachineState *machine) void *fdt; int rc; +kvmppc_svm_off(); spapr_caps_apply(spapr); first_ppc_cpu = POWERPC_CPU(first_cpu); diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index c8423e760c..9603fef9bf 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -1327,6 +1327,7 @@ struct kvm_s390_ucas_mapping { #define KVM_PPC_GET_RMMU_INFO_IOW(KVMIO, 0xb0, struct kvm_ppc_rmmu_info) /* Available with KVM_CAP_PPC_GET_CPU_CHAR */ #define KVM_PPC_GET_CPU_CHAR _IOR(KVMIO, 0xb1, struct kvm_ppc_cpu_char) +#define KVM_PPC_SVM_OFF _IO(KVMIO, 0xb2) /* ioctl for vm fd */ #define KVM_CREATE_DEVICE_IOWR(KVMIO, 0xe0, struct kvm_create_device) diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 8a06d3171e..079d83ce6c 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -2953,3 +2953,10 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset) kvm_set_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &tb_offset); } } + +int kvmppc_svm_off(void) +{ +KVMState *s = KVM_STATE(current_machine->accelerator); + +return kvm_vm_ioctl(s, KVM_PPC_SVM_OFF); +} diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h index 98bd7d5da6..0fd80e1100 100644 --- a/target/ppc/kvm_ppc.h +++ b/target/ppc/kvm_ppc.h @@ -37,6 +37,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu); target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, bool radix, bool gtse, uint64_t proc_tbl); +int kvmppc_svm_off(void); #ifndef CONFIG_USER_ONLY bool kvmppc_spapr_use_multitce(void); int kvmppc_spapr_enable_inkernel_multitce(void); @@ -201,6 +202,11 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu, return 0; } +static inline int kvmppc_svm_off(void) +{ + return 0; +} + static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online) { -- 2.21.0
[Qemu-devel] [PATCH] spapr_cpu_core: vmstate_[un]register per-CPU data from (un)realizefn
VMStateDescription vmstate_spapr_cpu_state was added by commit b94020268e0b6 (spapr_cpu_core: migrate per-CPU data) to migrate per-CPU data with the required vmstate registration and unregistration calls. However the unregistration is being done only from vcpu creation error path and not from CPU delete path. This causes migration to fail with the following error if migration is attempted after a CPU unplug like this: Unknown savevm section or instance 'spapr_cpu' 16 Additionally this leaves the source VM unresponsive after migration failure. Fix this by ensuring the vmstate_unregister happens during CPU removal. Fixing this becomes easier when vmstate (un)registration calls are moved to vcpu (un)realize functions which is what this patch does. Fixes: https://bugs.launchpad.net/qemu/+bug/1785972 Reported-by: Satheesh Rajendran Signed-off-by: Bharata B Rao --- hw/ppc/spapr_cpu_core.c | 62 + 1 file changed, 32 insertions(+), 30 deletions(-) diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c index 993759db47..bb88a3ce4e 100644 --- a/hw/ppc/spapr_cpu_core.c +++ b/hw/ppc/spapr_cpu_core.c @@ -113,26 +113,6 @@ const char *spapr_get_cpu_core_type(const char *cpu_type) return object_class_get_name(oc); } -static void spapr_unrealize_vcpu(PowerPCCPU *cpu) -{ -qemu_unregister_reset(spapr_cpu_reset, cpu); -object_unparent(cpu->intc); -cpu_remove_sync(CPU(cpu)); -object_unparent(OBJECT(cpu)); -} - -static void spapr_cpu_core_unrealize(DeviceState *dev, Error **errp) -{ -sPAPRCPUCore *sc = SPAPR_CPU_CORE(OBJECT(dev)); -CPUCore *cc = CPU_CORE(dev); -int i; - -for (i = 0; i < cc->nr_threads; i++) { -spapr_unrealize_vcpu(sc->threads[i]); -} -g_free(sc->threads); -} - static bool slb_shadow_needed(void *opaque) { sPAPRCPUState *spapr_cpu = opaque; @@ -207,10 +187,34 @@ static const VMStateDescription vmstate_spapr_cpu_state = { } }; +static void spapr_unrealize_vcpu(PowerPCCPU *cpu, sPAPRCPUCore *sc) +{ +if (!sc->pre_3_0_migration) { +vmstate_unregister(NULL, &vmstate_spapr_cpu_state, cpu->machine_data); +} +qemu_unregister_reset(spapr_cpu_reset, cpu); +object_unparent(cpu->intc); +cpu_remove_sync(CPU(cpu)); +object_unparent(OBJECT(cpu)); +} + +static void spapr_cpu_core_unrealize(DeviceState *dev, Error **errp) +{ +sPAPRCPUCore *sc = SPAPR_CPU_CORE(OBJECT(dev)); +CPUCore *cc = CPU_CORE(dev); +int i; + +for (i = 0; i < cc->nr_threads; i++) { +spapr_unrealize_vcpu(sc->threads[i], sc); +} +g_free(sc->threads); +} + static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr, - Error **errp) + sPAPRCPUCore *sc, Error **errp) { CPUPPCState *env = &cpu->env; +CPUState *cs = CPU(cpu); Error *local_err = NULL; object_property_set_bool(OBJECT(cpu), true, "realized", &local_err); @@ -233,6 +237,11 @@ static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr, goto error_unregister; } +if (!sc->pre_3_0_migration) { +vmstate_register(NULL, cs->cpu_index, &vmstate_spapr_cpu_state, + cpu->machine_data); +} + return; error_unregister: @@ -272,10 +281,6 @@ static PowerPCCPU *spapr_create_vcpu(sPAPRCPUCore *sc, int i, Error **errp) } cpu->machine_data = g_new0(sPAPRCPUState, 1); -if (!sc->pre_3_0_migration) { -vmstate_register(NULL, cs->cpu_index, &vmstate_spapr_cpu_state, - cpu->machine_data); -} object_unref(obj); return cpu; @@ -290,9 +295,6 @@ static void spapr_delete_vcpu(PowerPCCPU *cpu, sPAPRCPUCore *sc) { sPAPRCPUState *spapr_cpu = spapr_cpu_state(cpu); -if (!sc->pre_3_0_migration) { -vmstate_unregister(NULL, &vmstate_spapr_cpu_state, cpu->machine_data); -} cpu->machine_data = NULL; g_free(spapr_cpu); object_unparent(OBJECT(cpu)); @@ -325,7 +327,7 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp) } for (j = 0; j < cc->nr_threads; j++) { -spapr_realize_vcpu(sc->threads[j], spapr, &local_err); +spapr_realize_vcpu(sc->threads[j], spapr, sc, &local_err); if (local_err) { goto err_unrealize; } @@ -334,7 +336,7 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp) err_unrealize: while (--j >= 0) { -spapr_unrealize_vcpu(sc->threads[j]); +spapr_unrealize_vcpu(sc->threads[j], sc); } err: while (--i >= 0) { -- 2.14.3
[Qemu-devel] [Bug 1785972] Re: v3.0.0-rc4: VM fails to start after vcpuhotunplug, managedsave sequence
The first commit that causes this issue is: b94020268e0b6659499e250d25346baaa9888fed (spapr_cpu_core: migrate per- CPU data) Simpler way to reproduce: 1. Hotplug a CPU 2. Hot unplug it 3. Migrate the VM (will fail) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1785972 Title: v3.0.0-rc4: VM fails to start after vcpuhotunplug, managedsave sequence Status in QEMU: New Bug description: VM fails to start after vcpu hot un-plug, managedsave sequence Host info: Kernel: 4.18.0-rc8-2-g1236568ee3cb qemu: commit 6ad90805383e6d04b3ff49681b8519a48c9f4410 (HEAD -> master, tag: v3.0.0-rc4) QEMU emulator version 2.12.94 (v3.0.0-rc4-dirty) libvirt: commit 087de2f5a3dffb27d2eeb0c50a86d5d6984e5a5e (HEAD -> master) libvirtd (libvirt) 4.6.0 Guest Kernel: 4.18.0-rc8-2-g1236568ee3cb Steps to reproduce: 1. Start a guest(VM) with 2 current , 4 max vcpus virsh start vm1 Domain vm1 started 2. Hotplug 2 vcpus virsh setvcpus vm1 4 --live 3. Hot unplug 2 vcpus virsh setvcpus vm1 2 --live 4. Managedsave the VM virsh managedsave vm1 Domain vm1 state saved by libvirt 5. Start the VM ---Fails to start virsh start vm1 error: Failed to start domain avocado-vt-vm1 error: internal error: qemu unexpectedly closed the monitor: 2018-08-08T06:27:53.853818Z qemu: Unknown savevm section or instance 'spapr_cpu' 2 2018-08-08T06:27:53.854949Z qemu: load of migration failed: Invalid argument Testcase: avocado run libvirt_vcpu_plug_unplug.positive_test.vcpu_set.live.vm_operate.managedsave_with_unplug --vt-type libvirt --vt-extra-params emulator_path=/usr/share/avocado-plugins-vt/bin/qemu create_vm_libvirt=yes kill_vm_libvirt=yes env_cleanup=yes smp=8 backup_image_before_testing=no libvirt_controller=virtio-scsi scsi_hba=virtio-scsi-pci drive_format=scsi-hd use_os_variant=no restore_image_after_testing=no vga=none display=nographic kernel=/home/kvmci/linux/vmlinux kernel_args='root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init initcall_debug' take_regular_screendumps=no --vt-guest-os JeOS.27.ppc64le JOB ID : 1f869477ad87e7d7e7ef631ae08965f41a74 JOB LOG: /root/avocado/job-results/job-2018-08-08T02.42-1f86947/job.log (1/1) type_specific.io-github-autotest-libvirt.libvirt_vcpu_plug_unplug.positive_test.vcpu_set.live.vm_operate.managedsave_with_unplug: ERROR (91.58 s) RESULTS: PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0 JOB TIME : 95.89 s To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1785972/+subscriptions
Re: [Qemu-devel] VCPU hotplug on KVM/ARM
On Tue, Jul 31, 2018 at 3:57 PM, Igor Mammedov wrote: > On Wed, 25 Jul 2018 14:07:12 +0100 > Marc Zyngier wrote: > > > On 25/07/18 13:28, Andrew Jones wrote: > > > On Wed, Jul 25, 2018 at 11:40:54AM +0100, Marc Zyngier wrote: > > >> On 24/07/18 19:35, Maran Wilson wrote: > > >>> It's been a few months since this email thread died off. Has anyone > > >>> started working on a potential solution that would allow VCPU > hotplug on > > >>> KVM/ARM ? Or is this a project that is still waiting for an owner > who > > >>> has the time and inclination to get started? > > >> > > >> This is typically a project for someone who would have this particular > > >> itch to scratch, and who has a demonstrable need for this > functionality. > > >> > > >> Work wise, it would have to include adding physical CPU hotplug > support > > >> to the arm64 kernel as a precondition, before worrying about doing it > in > > >> KVM. > > >> > > >> For KVM itself, particular area of interests would be: > > >> - Making GICv3 redistributors magically appear in the IPA space > > >> - Live resizing of GICv3 structures > > >> - Dynamic allocation of MPIDR, and mapping with vcpu_id > > > > > > I have CPU topology description patches on the QEMU list now[*]. A next > > > step for me is to this MPIDR work. I probably won't get to it until the > > > end of August though. > > > > > > [*] http://lists.nongnu.org/archive/html/qemu-devel/2018- > 07/msg01168.html > > > > > >> > > >> This should keep someone busy for a good couple of weeks (give or > take a > > >> few months). > > > > > > :-) > > > > > >> > > >> That being said, I'd rather see support in QEMU first, creating all > the > > >> vcpu/redistributors upfront, and signalling the hotplug event via the > > >> virtual firmware. And then post some numbers to show that creating all > > >> the vcpus upfront is not acceptable. > > > > > > I think the upfront allocation, allocating all possible cpus, but only > > > activating all present cpus, was the planned approach. What were the > > > concerns about that approach? Just vcpu memory overhead for too many > > > overly ambitious VM configs? > > > > I don't have any ARM-specific concern about that, and I think this is > > the right approach. It has the good property of not requiring much > > change in the kernel (other than actually supporting CPU hotplug). > for x86 we allocate VCPUs dynamically (both QEMU and KVM) > CCing ppc/s390 folks as I don't recall how it's implemented there. > > but we do not delete vcpus in KVM after they were created > (as it deemed to be too complicated), we are just deleting QEMU part > of it and keep kvm's vcpu for reuse with future hotplug. > Same with PPC, we too dynamically create vcpus and during unplug keep the KVM's vcpus for reuse. Regards, Bharata.
[Qemu-devel] [Bug 1780928] Re: v2.12.0-2321-gb34181056c: vcpu hotplug crashes qemu-kvm with segfault
Reverting the below comment makes CPU hotplug work again: commit a028dd423ee6dfd091a8c63028240832bf10f671 ppc/xics: introduce ICP DeviceRealize and DeviceReset handlers This changes the ICP realize and reset handlers in DeviceRealize and DeviceReset handlers. parent handlers are now called from the inheriting classes which is a cleaner object pattern. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1780928 Title: v2.12.0-2321-gb34181056c: vcpu hotplug crashes qemu-kvm with segfault Status in QEMU: Confirmed Bug description: vcpu hotplug crashes upstream qemu(v2.12.0-2321-gb34181056c), vcpu hotplug works fine in v2.12.0-rc4. Host: Power8, kernel: 4.18.0-rc2-00037-g6f0d349d922b Guest: Power8, kernel: 4.18.0-rc3-00183-gc42c12a90545 (base image: fedora27 ppc64le) /usr/share/avocado-plugins-vt/build/qemu/ppc64-softmmu/qemu-system-ppc64 -M pseries,accel=kvm,max-cpu-compat=power8 -m 8192 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=/var/lib/avocado/data/avocado-vt/images/jeos-27-ppc64le.qcow2,format=qcow2,if=none,id=drive1 -device scsi-hd,drive=drive1,bus=scsi0.0 -smp 1,cores=1,threads=1,sockets=1,maxcpus=8 -serial /dev/pts/0 -monitor stdio -vga none -nographic -kernel /home/kvmci/linux/vmlinux -append 'root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init initcall_debug' -nic user,model=virtio-net-pci QEMU 2.12.50 monitor - type 'help' for more information (qemu) device_add host-spapr-cpu-core,id=core1,core-id=1 Segmentation fault (core dumped) Guest initial cpu: # lscpu Architecture:ppc64le Byte Order: Little Endian CPU(s): 1 On-line CPU(s) list: 0 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 1 NUMA node(s):1 Model: 2.1 (pvr 004b 0201) Model name: POWER8 (architected), altivec supported Hypervisor vendor: KVM Virtualization type: para L1d cache: 64K L1i cache: 32K NUMA node0 CPU(s): 0 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1780928/+subscriptions
[Qemu-devel] [RFC PATCH v4] spapr: Support ibm, dynamic-memory-v2 property
The new property ibm,dynamic-memory-v2 allows memory to be represented in a more compact manner in device tree. Signed-off-by: Bharata B Rao --- v3: https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg02159.html Changes in v4: (Addresses all David Gibson's comments) - Use of QEMU_PACKED for struct sPAPRDrconfCellV2. - Use of sizeof() when creating ibm,dynamic-memory-v2 FDT buffer. - Keep the drconf_queue list local and let the helper return a populated DrconfCellQueue entry. - ibm,dynamic-memory-v2 property is no longer dependent of machine type version. docs/specs/ppc-spapr-hotplug.txt | 19 hw/ppc/spapr.c | 230 +++ include/hw/ppc/spapr_ovec.h | 1 + 3 files changed, 202 insertions(+), 48 deletions(-) diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt index f57e2a09c6..cc7833108e 100644 --- a/docs/specs/ppc-spapr-hotplug.txt +++ b/docs/specs/ppc-spapr-hotplug.txt @@ -387,4 +387,23 @@ Each LMB list entry consists of the following elements: - A 32bit flags word. The bit at bit position 0x0008 defines whether the LMB is assigned to the the partition as of boot time. +ibm,dynamic-memory-v2 + +This property describes the dynamically reconfigurable memory. This is +an alternate and newer way to describe dyanamically reconfigurable memory. +It is a property encoded array that has an integer N (the number of +LMB set entries) followed by N LMB set entries. There is an LMB set entry +for each sequential group of LMBs that share common attributes. + +Each LMB set entry consists of the following elements: + +- Number of sequential LMBs in the entry represented by a 32bit integer. +- Logical address of the first LMB in the set encoded as a 64bit integer. +- DRC index of the first LMB in the set. +- Associativity list index that is used as an index into + ibm,associativity-lookup-arrays property described earlier. This + is used to retrieve the right associativity list to be used for all + the LMBs in this set. +- A 32bit flags word that applies to all the LMBs in the set. + [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 7b2bc4e25d..7a41d7db5f 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -668,63 +668,137 @@ static uint32_t spapr_pc_dimm_node(MemoryDeviceInfoList *list, ram_addr_t addr) return -1; } -/* - * Adds ibm,dynamic-reconfiguration-memory node. - * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation - * of this device tree node. - */ -static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) +struct sPAPRDrconfCellV2 { + uint32_t seq_lmbs; + uint64_t base_addr; + uint32_t drc_index; + uint32_t aa_index; + uint32_t flags; +} QEMU_PACKED; + +typedef struct DrconfCellQueue { +struct sPAPRDrconfCellV2 cell; +QSIMPLEQ_ENTRY(DrconfCellQueue) entry; +} DrconfCellQueue; + +static DrconfCellQueue * +spapr_get_drconf_cell(uint32_t seq_lmbs, uint64_t base_addr, + uint32_t drc_index, uint32_t aa_index, + uint32_t flags) { -MachineState *machine = MACHINE(spapr); -int ret, i, offset; -uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; -uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)}; -uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size; -uint32_t nr_lmbs = (spapr->hotplug_memory.base + - memory_region_size(&spapr->hotplug_memory.mr)) / - lmb_size; -uint32_t *int_buf, *cur_index, buf_len; -int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; -MemoryDeviceInfoList *dimms = NULL; +DrconfCellQueue *elem; -/* - * Don't create the node if there is no hotpluggable memory - */ -if (machine->ram_size == machine->maxram_size) { -return 0; -} +elem = g_malloc0(sizeof(*elem)); +elem->cell.seq_lmbs = cpu_to_be32(seq_lmbs); +elem->cell.base_addr = cpu_to_be64(base_addr); +elem->cell.drc_index = cpu_to_be32(drc_index); +elem->cell.aa_index = cpu_to_be32(aa_index); +elem->cell.flags = cpu_to_be32(flags); -/* - * Allocate enough buffer size to fit in ibm,dynamic-memory - * or ibm,associativity-lookup-arrays - */ -buf_len = MAX(nr_lmbs * SPAPR_DR_LMB_LIST_ENTRY_SIZE + 1, nr_nodes * 4 + 2) - * sizeof(uint32_t); -cur_index = int_buf = g_malloc0(buf_len); +return elem; +} -offset = fdt_add_subnode(fdt, 0, "ibm,dynamic-reconfiguration-memory"); +/* ibm,dynamic-memory-v2 */ +static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt, + int offset, MemoryDeviceInfoList *dimms) +{ +uint8_t *int_buf, *cur_index, buf_len; +int ret; +uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; +uint64_t addr, cur_addr, size; +
Re: [Qemu-devel] [PATCH v3] spapr: Support ibm, dynamic-memory-v2 property
On Tue, Apr 17, 2018 at 11:14:27AM +1000, David Gibson wrote: > > static void spapr_machine_2_12_class_options(MachineClass *mc) > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > > index d60b7c6d7a..5e044c44af 100644 > > --- a/include/hw/ppc/spapr.h > > +++ b/include/hw/ppc/spapr.h > > @@ -149,6 +149,7 @@ struct sPAPRMachineState { > > sPAPROptionVector *ov5; /* QEMU-supported option vectors */ > > sPAPROptionVector *ov5_cas; /* negotiated (via CAS) option vectors > > */ > > uint32_t max_compat_pvr; > > +bool use_ibm_dynamic_memory_v2; > > TBH, I'm not really sure we even need to adjust this by machine type. There are other similar features controlled by ov5 bits that are also determined by machine type version: Memory hotplug support -- sPAPRMachineClass.dr_lmb_enabled Dedicated HP event support -- sPAPRMachineState.use_hotplug_event_source Are you saying that presence of ibm,dynamic-memory-v2 probably shouldn't be dependent on machine type ? Regards, Bharata.
Re: [Qemu-devel] [PATCH for 2.13 v3 1/2] spapr: Add ibm, max-associativity-domains property
On Mon, Apr 16, 2018 at 07:47:29PM +0300, Serhii Popovych wrote: > Bharata B Rao wrote: > > On Wed, Apr 11, 2018 at 02:41:59PM -0400, Serhii Popovych wrote: > >> Now recent kernels (i.e. since linux-stable commit a346137e9142 > >> ("powerpc/numa: Use ibm,max-associativity-domains to discover possible > >> nodes") > >> support this property to mark initially memory-less NUMA nodes as > >> "possible" > >> to allow further memory hot-add to them. > >> > >> Advertise this property for pSeries machines to let guest kernels detect > >> maximum supported node configuration and benefit from kernel side change > >> when hot-add memory to specific, possibly empty before, NUMA node. > >> > >> Signed-off-by: Serhii Popovych > >> --- > >> hw/ppc/spapr.c | 10 ++ > >> 1 file changed, 10 insertions(+) > >> > >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > >> index a81570e..c05bbad 100644 > >> --- a/hw/ppc/spapr.c > >> +++ b/hw/ppc/spapr.c > >> @@ -910,6 +910,13 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, > >> void *fdt) > >> 0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE), > >> cpu_to_be32(max_cpus / smp_threads), > >> }; > >> +uint32_t maxdomains[] = { > >> +cpu_to_be32(4), > >> +cpu_to_be32(0), > >> +cpu_to_be32(0), > >> +cpu_to_be32(0), > >> +cpu_to_be32(nb_numa_nodes - 1), > >> +}; > >> > >> _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas")); > >> > >> @@ -946,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, > >> void *fdt) > >> _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points", > >> refpoints, sizeof(refpoints))); > >> > >> +_FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains", > >> + maxdomains, sizeof(maxdomains))); > >> + > >> _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max", > >>RTAS_ERROR_LOG_MAX)); > >> _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate", > >> -- > >> 1.8.3.1 > > > > This commit causes hash guest with latest guest kernel to hang at early > > boot. > > I use v4.16 tag from stable and can't reproduce on P8 machine reported > issue. > > Could you please share more details about your setup, kernel commit id > you spot problem? I am on 4.16.0-rc7 (commit id: 0b412605ef5f) BTW this happens only for non-NUMA guest. Regards, Bharata.
[Qemu-devel] [PATCH v3] spapr: Support ibm, dynamic-memory-v2 property
The new property ibm,dynamic-memory-v2 allows memory to be represented in a more compact manner in device tree. Signed-off-by: Bharata B Rao --- v2 - https://lists.nongnu.org/archive/html/qemu-ppc/2018-04/msg00052.html Changes in v3: - Addressed David Gibson's review comments: use of CamelCase, separate helper for populating drconf_cell queue entries, removed the user configurability of drmem_v2 machine property. docs/specs/ppc-spapr-hotplug.txt | 19 hw/ppc/spapr.c | 233 +++ include/hw/ppc/spapr.h | 1 + include/hw/ppc/spapr_ovec.h | 1 + 4 files changed, 207 insertions(+), 47 deletions(-) diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt index f57e2a09c6..cc7833108e 100644 --- a/docs/specs/ppc-spapr-hotplug.txt +++ b/docs/specs/ppc-spapr-hotplug.txt @@ -387,4 +387,23 @@ Each LMB list entry consists of the following elements: - A 32bit flags word. The bit at bit position 0x0008 defines whether the LMB is assigned to the the partition as of boot time. +ibm,dynamic-memory-v2 + +This property describes the dynamically reconfigurable memory. This is +an alternate and newer way to describe dyanamically reconfigurable memory. +It is a property encoded array that has an integer N (the number of +LMB set entries) followed by N LMB set entries. There is an LMB set entry +for each sequential group of LMBs that share common attributes. + +Each LMB set entry consists of the following elements: + +- Number of sequential LMBs in the entry represented by a 32bit integer. +- Logical address of the first LMB in the set encoded as a 64bit integer. +- DRC index of the first LMB in the set. +- Associativity list index that is used as an index into + ibm,associativity-lookup-arrays property described earlier. This + is used to retrieve the right associativity list to be used for all + the LMBs in this set. +- A 32bit flags word that applies to all the LMBs in the set. + [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 3ffadd6ac7..fe327f3c1b 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -669,63 +669,136 @@ static uint32_t spapr_pc_dimm_node(MemoryDeviceInfoList *list, ram_addr_t addr) return -1; } -/* - * Adds ibm,dynamic-reconfiguration-memory node. - * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation - * of this device tree node. - */ -static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) +struct sPAPRDrconfCellV2 { + uint32_t seq_lmbs; + uint64_t base_addr; + uint32_t drc_index; + uint32_t aa_index; + uint32_t flags; +} __attribute__((packed)); + +/* 6 uint32_t entries in sPAPRDrconfCellV2 */ +#define SPAPR_NR_DRCONF_CELL_ENTRIES 6 + +typedef struct DrconfCellQueue { +struct sPAPRDrconfCellV2 cell; +QSIMPLEQ_ENTRY(DrconfCellQueue) entry; +} DrconfCellQueue; + +QSIMPLEQ_HEAD(, DrconfCellQueue) drconf_queue += QSIMPLEQ_HEAD_INITIALIZER(drconf_queue); + +static void spapr_add_drconf_cell(uint32_t seq_lmbs, uint64_t base_addr, + uint32_t drc_index, uint32_t aa_index, + uint32_t flags) +{ +DrconfCellQueue *elem; + +elem = g_malloc0(sizeof(*elem)); +elem->cell.seq_lmbs = cpu_to_be32(seq_lmbs); +elem->cell.base_addr = cpu_to_be64(base_addr); +elem->cell.drc_index = cpu_to_be32(drc_index); +elem->cell.aa_index = cpu_to_be32(aa_index); +elem->cell.flags = cpu_to_be32(flags); +QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry); +} + +/* ibm,dynamic-memory-v2 */ +static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt, + int offset, MemoryDeviceInfoList *dimms) { -MachineState *machine = MACHINE(spapr); -int ret, i, offset; -uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; -uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)}; -uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size; -uint32_t nr_lmbs = (spapr->hotplug_memory.base + - memory_region_size(&spapr->hotplug_memory.mr)) / - lmb_size; uint32_t *int_buf, *cur_index, buf_len; -int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; -MemoryDeviceInfoList *dimms = NULL; +int ret; +uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; +uint64_t addr, cur_addr, size; +uint32_t nr_boot_lmbs = (spapr->hotplug_memory.base / lmb_size); +uint64_t mem_end = spapr->hotplug_memory.base + + memory_region_size(&spapr->hotplug_memory.mr); +uint32_t node, nr_entries = 0; +sPAPRDRConnector *drc; +DrconfCellQueue *elem, *next; +MemoryDeviceInfoList *info; -/* - * Don't create the node if there is no hotpluggable memory - */ -if (machine
Re: [Qemu-devel] [PATCH for 2.13 v3 1/2] spapr: Add ibm, max-associativity-domains property
On Wed, Apr 11, 2018 at 02:41:59PM -0400, Serhii Popovych wrote: > Now recent kernels (i.e. since linux-stable commit a346137e9142 > ("powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes") > support this property to mark initially memory-less NUMA nodes as "possible" > to allow further memory hot-add to them. > > Advertise this property for pSeries machines to let guest kernels detect > maximum supported node configuration and benefit from kernel side change > when hot-add memory to specific, possibly empty before, NUMA node. > > Signed-off-by: Serhii Popovych > --- > hw/ppc/spapr.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index a81570e..c05bbad 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -910,6 +910,13 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void > *fdt) > 0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE), > cpu_to_be32(max_cpus / smp_threads), > }; > +uint32_t maxdomains[] = { > +cpu_to_be32(4), > +cpu_to_be32(0), > +cpu_to_be32(0), > +cpu_to_be32(0), > +cpu_to_be32(nb_numa_nodes - 1), > +}; > > _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas")); > > @@ -946,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void > *fdt) > _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points", > refpoints, sizeof(refpoints))); > > +_FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains", > + maxdomains, sizeof(maxdomains))); > + > _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max", >RTAS_ERROR_LOG_MAX)); > _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate", > -- > 1.8.3.1 This commit causes hash guest with latest guest kernel to hang at early boot. Quiescing Open Firmware ... Booting Linux via __start() @ 0x0200 ... [0.00] hash-mmu: Page sizes from device-tree: [0.00] hash-mmu: base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0 [0.00] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1 [0.00] Using 1TB segments [0.00] hash-mmu: Initializing hash mmu with SLB [0.00] Linux version 4.16.0-rc7+ (root@localhost.localdomain) (gcc version 7.1.1 20170622 (Red Hat 7.1.1-3) (GCC)) #60 SMP Wed Apr 11 10:36:22 IST 2018 [0.00] Found initrd at 0xc3c0:0xc4f9a34c [0.00] Using pSeries machine description [0.00] bootconsole [udbg0] enabled [0.00] Partition configured for 32 cpus. [0.00] CPU maps initialized for 1 thread per core [0.00] - [0.00] ppc64_pft_size= 0x1a [0.00] phys_mem_size = 0x2 [0.00] dcache_bsize = 0x80 [0.00] icache_bsize = 0x80 [0.00] cpu_features = 0x077c7a6c18500249 [0.00] possible= 0x18500649 [0.00] always = 0x18100040 [0.00] cpu_user_features = 0xdc0065c2 0xae00 [0.00] mmu_features = 0x78006001 [0.00] firmware_features = 0x0001415a445f [0.00] htab_hash_mask= 0x7 [0.00] - No progess after this.
Re: [Qemu-devel] [RFC PATCH v2] spapr: Support ibm, dynamic-memory-v2 property
On Wed, Apr 11, 2018 at 02:45:58PM +1000, David Gibson wrote: > On Mon, Apr 09, 2018 at 11:55:38AM +0530, Bharata B Rao wrote: > > The new property ibm,dynamic-memory-v2 allows memory to be represented > > in a more compact manner in device tree. > > > > Signed-off-by: Bharata B Rao > > --- > > v1: https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg01788.html > > Changes in v1: > > - Minor cleanups in the error paths > > - Rebased on top of ppc-for-2.13 > > > > docs/specs/ppc-spapr-hotplug.txt | 19 +++ > > hw/ppc/spapr.c | 257 > > --- > > include/hw/ppc/spapr.h | 1 + > > include/hw/ppc/spapr_ovec.h | 1 + > > 4 files changed, 233 insertions(+), 45 deletions(-) > > > > > diff --git a/docs/specs/ppc-spapr-hotplug.txt > > b/docs/specs/ppc-spapr-hotplug.txt > > index f57e2a09c6..cc7833108e 100644 > > --- a/docs/specs/ppc-spapr-hotplug.txt > > +++ b/docs/specs/ppc-spapr-hotplug.txt > > @@ -387,4 +387,23 @@ Each LMB list entry consists of the following elements: > > - A 32bit flags word. The bit at bit position 0x0008 defines whether > >the LMB is assigned to the the partition as of boot time. > > > > +ibm,dynamic-memory-v2 > > + > > +This property describes the dynamically reconfigurable memory. This is > > +an alternate and newer way to describe dyanamically reconfigurable memory. > > +It is a property encoded array that has an integer N (the number of > > +LMB set entries) followed by N LMB set entries. There is an LMB set entry > > +for each sequential group of LMBs that share common attributes. > > + > > +Each LMB set entry consists of the following elements: > > + > > +- Number of sequential LMBs in the entry represented by a 32bit integer. > > +- Logical address of the first LMB in the set encoded as a 64bit integer. > > +- DRC index of the first LMB in the set. > > +- Associativity list index that is used as an index into > > + ibm,associativity-lookup-arrays property described earlier. This > > + is used to retrieve the right associativity list to be used for all > > + the LMBs in this set. > > +- A 32bit flags word that applies to all the LMBs in the set. > > + > > [1] > > http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867 > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 3ffadd6ac7..4a24fac38c 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -669,63 +669,139 @@ static uint32_t > > spapr_pc_dimm_node(MemoryDeviceInfoList *list, ram_addr_t addr) > > return -1; > > } > > > > -/* > > - * Adds ibm,dynamic-reconfiguration-memory node. > > - * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation > > - * of this device tree node. > > - */ > > -static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void > > *fdt) > > +struct of_drconf_cell_v2 { > > qemu convention is to use CamelCase for types. Ok. > > > + uint32_t seq_lmbs; > > + uint64_t base_addr; > > + uint32_t drc_index; > > + uint32_t aa_index; > > + uint32_t flags; > > +} __attribute__((packed)); > > + > > +#define SPAPR_DRCONF_CELL_SIZE 6 > > Define this using a sizeof() for safety. Ok. > > > +/* ibm,dynamic-memory-v2 */ > > +static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt, > > + int offset, MemoryDeviceInfoList *dimms) > > { > > -MachineState *machine = MACHINE(spapr); > > -int ret, i, offset; > > -uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; > > -uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)}; > > -uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size; > > -uint32_t nr_lmbs = (spapr->hotplug_memory.base + > > - memory_region_size(&spapr->hotplug_memory.mr)) / > > - lmb_size; > > uint32_t *int_buf, *cur_index, buf_len; > > -int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; > > -MemoryDeviceInfoList *dimms = NULL; > > +int ret; > > +uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; > > +uint64_t addr, cur_addr, size; > > +uint32_t nr_boot_lmbs = (spapr->hotplug_memory.base / lmb_size); > > +uint64_t mem_end = spapr->hotplug_memory.base + > > + memory_region_size(&spapr->hotplug_memory.mr); > > +uint32_t node, nr_entries = 0;
Re: [Qemu-devel] [RFC PATCH v2] spapr: Support ibm, dynamic-memory-v2 property
On Tue, Apr 10, 2018 at 01:02:45PM +1000, David Gibson wrote: > On Mon, Apr 09, 2018 at 11:55:38AM +0530, Bharata B Rao wrote: > > The new property ibm,dynamic-memory-v2 allows memory to be represented > > in a more compact manner in device tree. > > I still need to look at this in more detail, but to start with: > what's the rationale for this new format? > > It's more compact, but why do we care? The embedded people always > whinge about the size of the deivce tree, but I didn't think that was > really a concern with PAPR. Here's a real example of how this has affected us earlier: SLOF's CAS FDT buffer size was initially 32K, was changed to 64k to support 1TB guest memory and again changed to 2MB to support 16TB guest memory. With ibm,dynamic-memory-v2 we are less likely to hit such scenarios. Also, theoretically it should be more efficient in the guest kernel to handle LMB-sets than individual LMBs. We aren't there yet, but I believe grouping of LMBs should eventually help us do memory hotplug at set (or DIMM) granularity than at individual LMB granularity (Again theoretical possibility) Regards, Bharata.
[Qemu-devel] [RFC PATCH v2] spapr: Support ibm, dynamic-memory-v2 property
The new property ibm,dynamic-memory-v2 allows memory to be represented in a more compact manner in device tree. Signed-off-by: Bharata B Rao --- v1: https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg01788.html Changes in v1: - Minor cleanups in the error paths - Rebased on top of ppc-for-2.13 docs/specs/ppc-spapr-hotplug.txt | 19 +++ hw/ppc/spapr.c | 257 --- include/hw/ppc/spapr.h | 1 + include/hw/ppc/spapr_ovec.h | 1 + 4 files changed, 233 insertions(+), 45 deletions(-) diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt index f57e2a09c6..cc7833108e 100644 --- a/docs/specs/ppc-spapr-hotplug.txt +++ b/docs/specs/ppc-spapr-hotplug.txt @@ -387,4 +387,23 @@ Each LMB list entry consists of the following elements: - A 32bit flags word. The bit at bit position 0x0008 defines whether the LMB is assigned to the the partition as of boot time. +ibm,dynamic-memory-v2 + +This property describes the dynamically reconfigurable memory. This is +an alternate and newer way to describe dyanamically reconfigurable memory. +It is a property encoded array that has an integer N (the number of +LMB set entries) followed by N LMB set entries. There is an LMB set entry +for each sequential group of LMBs that share common attributes. + +Each LMB set entry consists of the following elements: + +- Number of sequential LMBs in the entry represented by a 32bit integer. +- Logical address of the first LMB in the set encoded as a 64bit integer. +- DRC index of the first LMB in the set. +- Associativity list index that is used as an index into + ibm,associativity-lookup-arrays property described earlier. This + is used to retrieve the right associativity list to be used for all + the LMBs in this set. +- A 32bit flags word that applies to all the LMBs in the set. + [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 3ffadd6ac7..4a24fac38c 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -669,63 +669,139 @@ static uint32_t spapr_pc_dimm_node(MemoryDeviceInfoList *list, ram_addr_t addr) return -1; } -/* - * Adds ibm,dynamic-reconfiguration-memory node. - * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation - * of this device tree node. - */ -static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) +struct of_drconf_cell_v2 { + uint32_t seq_lmbs; + uint64_t base_addr; + uint32_t drc_index; + uint32_t aa_index; + uint32_t flags; +} __attribute__((packed)); + +#define SPAPR_DRCONF_CELL_SIZE 6 + +/* ibm,dynamic-memory-v2 */ +static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt, + int offset, MemoryDeviceInfoList *dimms) { -MachineState *machine = MACHINE(spapr); -int ret, i, offset; -uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; -uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)}; -uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size; -uint32_t nr_lmbs = (spapr->hotplug_memory.base + - memory_region_size(&spapr->hotplug_memory.mr)) / - lmb_size; uint32_t *int_buf, *cur_index, buf_len; -int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; -MemoryDeviceInfoList *dimms = NULL; +int ret; +uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; +uint64_t addr, cur_addr, size; +uint32_t nr_boot_lmbs = (spapr->hotplug_memory.base / lmb_size); +uint64_t mem_end = spapr->hotplug_memory.base + + memory_region_size(&spapr->hotplug_memory.mr); +uint32_t node, nr_entries = 0; +sPAPRDRConnector *drc; +typedef struct drconf_cell_queue { +struct of_drconf_cell_v2 cell; +QSIMPLEQ_ENTRY(drconf_cell_queue) entry; +} drconf_cell_queue; +QSIMPLEQ_HEAD(, drconf_cell_queue) drconf_queue += QSIMPLEQ_HEAD_INITIALIZER(drconf_queue); +drconf_cell_queue *elem, *next; +MemoryDeviceInfoList *info; -/* - * Don't create the node if there is no hotpluggable memory - */ -if (machine->ram_size == machine->maxram_size) { -return 0; -} +/* Entry to cover RAM and the gap area */ +elem = g_malloc0(sizeof(drconf_cell_queue)); +elem->cell.seq_lmbs = cpu_to_be32(nr_boot_lmbs); +elem->cell.base_addr = cpu_to_be64(0); +elem->cell.drc_index = cpu_to_be32(0); +elem->cell.aa_index = cpu_to_be32(-1); +elem->cell.flags = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED | + SPAPR_LMB_FLAGS_DRC_INVALID); +QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry); +nr_entries++; + +cur_addr = spapr->hotplug_memory.base; +for (info = dimms; info; info = info->next) { +PCDIMMDeviceInfo *di = info->value->u.dimm.data; + +
Re: [Qemu-devel] [PATCH for 2.13 1/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"
On Thu, Apr 05, 2018 at 10:35:22AM -0400, Serhii Popovych wrote: > This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793. > > Leave change @node type from uint32_t to to int from reverted commit > because node < 0 is always false. > > Signed-off-by: Serhii Popovych > --- > hw/ppc/spapr.c | 22 -- > 1 file changed, 22 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 2c0be8c..3ad4545 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -3477,28 +3477,6 @@ static void spapr_machine_device_plug(HotplugHandler > *hotplug_dev, > return; > } > > -/* > - * Currently PowerPC kernel doesn't allow hot-adding memory to > - * memory-less node, but instead will silently add the memory > - * to the first node that has some memory. This causes two > - * unexpected behaviours for the user. > - * > - * - Memory gets hotplugged to a different node than what the user > - * specified. > - * - Since pc-dimm subsystem in QEMU still thinks that memory belongs > - * to memory-less node, a reboot will set things accordingly > - * and the previously hotplugged memory now ends in the right node. > - * This appears as if some memory moved from one node to another. > - * > - * So until kernel starts supporting memory hotplug to memory-less > - * nodes, just prevent such attempts upfront in QEMU. > - */ > -if (nb_numa_nodes && !numa_info[node].node_mem) { > -error_setg(errp, "Can't hotplug memory to memory-less node %d", > - node); > -return; > -} > - If you remove this unconditionally, wouldn't it be a problem in case of newer QEMU with older guest kernels ? Regards, Bharata.
Re: [Qemu-devel] [PATCH v4 1/5] pc-dimm: make qmp_pc_dimm_device_list() sort devices by address
On Thu, Mar 08, 2018 at 10:33:33AM +0800, Haozhong Zhang wrote: > Make qmp_pc_dimm_device_list() return sorted by start address > list of devices so that it could be reused in places that > would need sorted list*. Reuse existing pc_dimm_built_list() > to get sorted list. > > While at it hide recursive callbacks from callers, so that: > > qmp_pc_dimm_device_list(qdev_get_machine(), &list); > > could be replaced with simpler: > > list = qmp_pc_dimm_device_list(); > > * follow up patch will use it in build_srat() > > Signed-off-by: Haozhong Zhang > Reviewed-by: Igor Mammedov > --- > hw/mem/pc-dimm.c | 83 > +--- > hw/ppc/spapr.c | 3 +- I have used this patch with SPAPR to implement the new device tree property ibm,dynamic-memory-v2 at http://patchwork.ozlabs.org/patch/882387/ Reviewed-by: Bharata B Rao
[Qemu-devel] [RFC PATCH v1] spapr: Support ibm, dynamic-memory-v2 property
The new property ibm,dynamic-memory-v2 allows memory to be represented in a more compact manner in device tree. Signed-off-by: Bharata B Rao --- v0: http://lists.gnu.org/archive/html/qemu-ppc/2018-02/msg00236.html Changes in v1: - Rebased on top of Haozhong Zhang's qmp_pc_dimm_device_list refactor patch. (http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00978.html) docs/specs/ppc-spapr-hotplug.txt | 19 +++ hw/ppc/spapr.c | 256 --- include/hw/ppc/spapr.h | 1 + include/hw/ppc/spapr_ovec.h | 1 + 4 files changed, 233 insertions(+), 44 deletions(-) diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt index f57e2a09c6..cc7833108e 100644 --- a/docs/specs/ppc-spapr-hotplug.txt +++ b/docs/specs/ppc-spapr-hotplug.txt @@ -387,4 +387,23 @@ Each LMB list entry consists of the following elements: - A 32bit flags word. The bit at bit position 0x0008 defines whether the LMB is assigned to the the partition as of boot time. +ibm,dynamic-memory-v2 + +This property describes the dynamically reconfigurable memory. This is +an alternate and newer way to describe dyanamically reconfigurable memory. +It is a property encoded array that has an integer N (the number of +LMB set entries) followed by N LMB set entries. There is an LMB set entry +for each sequential group of LMBs that share common attributes. + +Each LMB set entry consists of the following elements: + +- Number of sequential LMBs in the entry represented by a 32bit integer. +- Logical address of the first LMB in the set encoded as a 64bit integer. +- DRC index of the first LMB in the set. +- Associativity list index that is used as an index into + ibm,associativity-lookup-arrays property described earlier. This + is used to retrieve the right associativity list to be used for all + the LMBs in this set. +- A 32bit flags word that applies to all the LMBs in the set. + [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 44a0670d11..6361ec20c7 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -669,63 +669,138 @@ static uint32_t spapr_pc_dimm_node(MemoryDeviceInfoList *list, ram_addr_t addr) return -1; } -/* - * Adds ibm,dynamic-reconfiguration-memory node. - * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation - * of this device tree node. - */ -static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) +struct of_drconf_cell_v2 { + uint32_t seq_lmbs; + uint64_t base_addr; + uint32_t drc_index; + uint32_t aa_index; + uint32_t flags; +} __attribute__((packed)); + +#define SPAPR_DRCONF_CELL_SIZE 6 + +/* ibm,dynamic-memory-v2 */ +static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt, + int offset, MemoryDeviceInfoList *dimms) { -MachineState *machine = MACHINE(spapr); -int ret, i, offset; -uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; -uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)}; -uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size; -uint32_t nr_lmbs = (spapr->hotplug_memory.base + - memory_region_size(&spapr->hotplug_memory.mr)) / - lmb_size; uint32_t *int_buf, *cur_index, buf_len; -int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; -MemoryDeviceInfoList *dimms = NULL; +int ret; +uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; +uint64_t addr, cur_addr, size; +uint32_t nr_boot_lmbs = (spapr->hotplug_memory.base / lmb_size); +uint64_t mem_end = spapr->hotplug_memory.base + + memory_region_size(&spapr->hotplug_memory.mr); +uint32_t node, nr_entries = 0; +sPAPRDRConnector *drc; +typedef struct drconf_cell_queue { +struct of_drconf_cell_v2 cell; +QSIMPLEQ_ENTRY(drconf_cell_queue) entry; +} drconf_cell_queue; +QSIMPLEQ_HEAD(, drconf_cell_queue) drconf_queue += QSIMPLEQ_HEAD_INITIALIZER(drconf_queue); +drconf_cell_queue *elem, *next; +MemoryDeviceInfoList *info; -/* - * Don't create the node if there is no hotpluggable memory - */ -if (machine->ram_size == machine->maxram_size) { -return 0; -} +/* Entry to cover RAM and the gap area */ +elem = g_malloc0(sizeof(drconf_cell_queue)); +elem->cell.seq_lmbs = cpu_to_be32(nr_boot_lmbs); +elem->cell.base_addr = cpu_to_be64(0); +elem->cell.drc_index = cpu_to_be32(0); +elem->cell.aa_index = cpu_to_be32(-1); +elem->cell.flags = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED | + SPAPR_LMB_FLAGS_DRC_INVALID); +QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry); +nr_entries++; + +cur_addr = spapr->hotplug_memory.base; +for (info = dimms; info; info = info->next) { +
Re: [Qemu-devel] [RFC PATCH v0 1/2] pc-dimm: Make pc_dimm_built_list() global
On Tue, Feb 20, 2018 at 03:35:10PM +0100, Igor Mammedov wrote: > On Mon, 19 Feb 2018 12:12:53 +0530 > Bharata B Rao wrote: > > > Making pc_dimm_built_list() global allows other parts of QEMU code > > to build and walk through the DIMM list in address-sorted order. > > > > This is needed in the next patch for sPAPR code to create > > ibm,dynamic-memory-v2 device tree property that will have entries > > for populated DIMMs as well as available hotpluggable areas. > > > > CHECK: List of DIMMs is already available via qmp_pc_dimm_device_list(), > maybe make it sorted first and use it? > > (i.e. use pc_dimm_built_list in qmp_pc_dimm_device_list) and hide > recursive callback ugliness from external users. > > MemoryDeviceInfoList *qmp_pc_dimm_device_list(void) { > object_child_foreach(qdev_get_machine(), pc_dimm_built_list, &list); > ... > } Thanks will attempt this in my next version. Regards, Bharata.
[Qemu-devel] [RFC PATCH v0 1/2] pc-dimm: Make pc_dimm_built_list() global
Making pc_dimm_built_list() global allows other parts of QEMU code to build and walk through the DIMM list in address-sorted order. This is needed in the next patch for sPAPR code to create ibm,dynamic-memory-v2 device tree property that will have entries for populated DIMMs as well as available hotpluggable areas. CHECK: List of DIMMs is already available via qmp_pc_dimm_device_list(), but that doesn't provide a sorted list. Signed-off-by: Bharata B Rao --- hw/mem/pc-dimm.c | 2 +- include/hw/mem/pc-dimm.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c index 6e74b61..9bd61ca 100644 --- a/hw/mem/pc-dimm.c +++ b/hw/mem/pc-dimm.c @@ -261,7 +261,7 @@ static gint pc_dimm_addr_sort(gconstpointer a, gconstpointer b) return 0; } -static int pc_dimm_built_list(Object *obj, void *opaque) +int pc_dimm_built_list(Object *obj, void *opaque) { GSList **list = opaque; diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h index d83b957..d880f5e 100644 --- a/include/hw/mem/pc-dimm.h +++ b/include/hw/mem/pc-dimm.h @@ -100,4 +100,5 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms, MemoryRegion *mr, uint64_t align, Error **errp); void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms, MemoryRegion *mr); +int pc_dimm_built_list(Object *obj, void *opaque); #endif -- 2.7.4
[Qemu-devel] [RFC PATCH v0 2/2] spapr: Support ibm, dynamic-memory-v2 property
The new property ibm,dynamic-memory-v2 allows memory to be represented in a more compact manner in device tree. Signed-off-by: Bharata B Rao --- docs/specs/ppc-spapr-hotplug.txt | 19 +++ hw/ppc/spapr.c | 254 +-- include/hw/ppc/spapr.h | 1 + include/hw/ppc/spapr_ovec.h | 1 + 4 files changed, 235 insertions(+), 40 deletions(-) diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt index f57e2a0..cc78331 100644 --- a/docs/specs/ppc-spapr-hotplug.txt +++ b/docs/specs/ppc-spapr-hotplug.txt @@ -387,4 +387,23 @@ Each LMB list entry consists of the following elements: - A 32bit flags word. The bit at bit position 0x0008 defines whether the LMB is assigned to the the partition as of boot time. +ibm,dynamic-memory-v2 + +This property describes the dynamically reconfigurable memory. This is +an alternate and newer way to describe dyanamically reconfigurable memory. +It is a property encoded array that has an integer N (the number of +LMB set entries) followed by N LMB set entries. There is an LMB set entry +for each sequential group of LMBs that share common attributes. + +Each LMB set entry consists of the following elements: + +- Number of sequential LMBs in the entry represented by a 32bit integer. +- Logical address of the first LMB in the set encoded as a 64bit integer. +- DRC index of the first LMB in the set. +- Associativity list index that is used as an index into + ibm,associativity-lookup-arrays property described earlier. This + is used to retrieve the right associativity list to be used for all + the LMBs in this set. +- A 32bit flags word that applies to all the LMBs in the set. + [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 83c9d66..e45d127 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -678,64 +678,148 @@ static uint32_t spapr_pc_dimm_node(MemoryDeviceInfoList *list, ram_addr_t addr) return -1; } -/* - * Adds ibm,dynamic-reconfiguration-memory node. - * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation - * of this device tree node. - */ -static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) +struct of_drconf_cell_v2 { + uint32_t seq_lmbs; + uint64_t base_addr; + uint32_t drc_index; + uint32_t aa_index; + uint32_t flags; +} __attribute__((packed)); + +#define SPAPR_DRCONF_CELL_SIZE 6 + +/* ibm,dynamic-memory-v2 */ +static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt, + int offset) { -MachineState *machine = MACHINE(spapr); -int ret, i, offset; -uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; -uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)}; -uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size; -uint32_t nr_lmbs = (spapr->hotplug_memory.base + - memory_region_size(&spapr->hotplug_memory.mr)) / - lmb_size; uint32_t *int_buf, *cur_index, buf_len; -int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; -MemoryDeviceInfoList *dimms = NULL; +int ret; +uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; +uint64_t addr, cur_addr, size; +uint32_t nr_boot_lmbs = (spapr->hotplug_memory.base / lmb_size); +uint64_t mem_end = spapr->hotplug_memory.base + + memory_region_size(&spapr->hotplug_memory.mr); +uint32_t node, nr_entries = 0; +sPAPRDRConnector *drc; +GSList *list = NULL, *item; +typedef struct drconf_cell_queue { +struct of_drconf_cell_v2 cell; +QSIMPLEQ_ENTRY(drconf_cell_queue) entry; +} drconf_cell_queue; +QSIMPLEQ_HEAD(, drconf_cell_queue) drconf_queue += QSIMPLEQ_HEAD_INITIALIZER(drconf_queue); +drconf_cell_queue *elem, *next; + +/* Entry to cover RAM and the gap area */ +elem = g_malloc0(sizeof(drconf_cell_queue)); +elem->cell.seq_lmbs = cpu_to_be32(nr_boot_lmbs); +elem->cell.base_addr = cpu_to_be64(0); +elem->cell.drc_index = cpu_to_be32(0); +elem->cell.aa_index = cpu_to_be32(-1); +elem->cell.flags = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED | + SPAPR_LMB_FLAGS_DRC_INVALID); +QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry); +nr_entries++; + +object_child_foreach(qdev_get_machine(), pc_dimm_built_list, &list); +cur_addr = spapr->hotplug_memory.base; +for (item = list; item; item = g_slist_next(item)) { +PCDIMMDevice *dimm = item->data; + +addr = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP, +&error_abort); +size = object_property_get_uint(OBJECT(dimm), PC_DIMM_SIZE_PROP, +&error_abort); +
[Qemu-devel] [RFC PATCH v0 0/2] Support for ibm,dynamic-memory-v2
This is an RFC version of patchset to support the new ibm,dynamic-memory-v2 property which allows the LMB information to be represented in a more compact form by grouping LMBs into sets. I have tested it very lightly and observe guest crashes (during reboot) in a few scenarios. I am still trying to figure out the reason. Bharata B Rao (2): pc-dimm: Make pc_dimm_built_list() global spapr: Support ibm,dynamic-memory-v2 property docs/specs/ppc-spapr-hotplug.txt | 19 +++ hw/mem/pc-dimm.c | 2 +- hw/ppc/spapr.c | 254 +-- include/hw/mem/pc-dimm.h | 1 + include/hw/ppc/spapr.h | 1 + include/hw/ppc/spapr_ovec.h | 1 + 6 files changed, 237 insertions(+), 41 deletions(-) -- 2.7.4
Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] pseries: fix TCG migration
On Wed, Nov 29, 2017 at 10:58:29AM +1100, Suraj Jitindar Singh wrote: > On Tue, 2017-11-28 at 18:43 +0100, Laurent Vivier wrote: > > Migration of pseries is broken with TCG because > > QEMU tries to restore KVM MMU state unconditionally. > > > > The result is a SIGSEGV in kvm_vm_ioctl(): > > > > #0 kvm_vm_ioctl (s=0x0, type=-2146390353) > > at qemu/accel/kvm/kvm-all.c:2032 > > #1 0x0001003e3e2c in kvmppc_configure_v3_mmu (cpu= > out>, > > radix=, gtse=, > > proc_tbl=) > > at qemu/target/ppc/kvm.c:396 > > #2 0x0001002f8b88 in spapr_post_load (opaque=0x1019103c0, > > version_id=) at qemu/hw/ppc/spapr.c:1578 > > #3 0x00010059e4cc in vmstate_load_state (f=0x10623, > > vmsd=0x1009479e0 , opaque=0x1019103c0, > > version_id=) at qemu/migration/vmstate.c:165 > > #4 0x0001005987e0 in vmstate_load (f=, > > se=) > > at qemu/migration/savevm.c:748 > > > > This patch fixes the problem by not calling the KVM function with the > > TCG mode. > > > > Fixes: d39c90f5f3 ("spapr: Fix migration of Radix guests") > > Signed-off-by: Laurent Vivier > > Ah, guess I never hit this because I never tested tcg migration with a > qemu compiled on a kvm enabled system. Nice catch :) Same here. In fact I had the kvm_enabled() check in the initial verions but removed later as kvmppc_configure_v3_mmu() was defined separately as nop for !CONFIG_KVM. But obviously the above combination wasn't covered. Regards, Bharata.
Re: [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on
On Thu, Sep 14, 2017 at 10:59:05AM +0200, Igor Mammedov wrote: > On Thu, 14 Sep 2017 13:48:26 +0530 > Bharata B Rao wrote: > > > On Thu, Sep 14, 2017 at 10:00:11AM +0200, Igor Mammedov wrote: > > > On Thu, 14 Sep 2017 12:31:18 +0530 > > > Bharata B Rao wrote: > > > > > > > Hi, > > > > > > > > QEMU hits the below assert > > > > > > > > qemu-system-ppc64: used ring relocated for ring 2 > > > > qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion > > > > `r >= 0' failed. > > > > > > > > in the following scenario: > > > > > > > > 1. Boot guest with vhost=on > > > > -netdev > > > > tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on -device > > > > virtio-net-pci,netdev=mynet0 > > > > 2. Hot add a DIMM device > > > > 3. Reboot > > > >When the guest reboots, we can see > > > >vhost_virtqueue_start:vq->used_phys getting assigned an address that > > > >falls in the hotplugged memory range. > > > > 4. Remove the DIMM device > > > >Guest refuses the removal as the hotplugged memory is under use. > > > > 5. Reboot > > > > > > >QEMU forces the removal of the DIMM device during reset and that's > > > >when we hit the above assert. > > > I don't recall implementing forced removal om DIMM, > > > could you point out to the related code, pls? > > > > This is ppc specific. We have DR Connector objects for each LMB (multiple > > LMBs make up one DIMM device) and during reset we invoke the > > release routine for these LMBs which will further invoke > > pc_dimm_memory_unplug(). > > > > See hw/ppc/spapr_drc.c: spapr_drc_reset() > > hw/ppc/spapr.c: spapr_lmb_release() > > > > > > > > > Any pointers on why we are hitting this assert ? Shouldn't vhost be > > > > done with using the hotplugged memory when we hit reset ? > > > > > > >From another point of view, > > > DIMM shouldn't be removed unless guest explicitly ejects it > > > (at least that should be so in x86 case). > > > > While that is true for ppc also, shouldn't we start fresh from reset ? > we should. > > when it aborts vhost should print out error from vhost_verify_ring_mappings() > >if (r == -ENOMEM) { >error_report("Unable to map %s for ring %d", part_name[j], i); > >} else if (r == -EBUSY) { > >error_report("%s relocated for ring %d", part_name[j], i); > > that might give a clue where that memory stuck in. > > Michael might point out where to start look at, but he's on vacation > so ... Michael (or anyone else) - Any pointers on this problem ? Regards, Bharata.
[Qemu-devel] [FIX PATCH] monitor: Don't return CPU marked for unplug as monitor CPU
The following sequence of steps kill the QEMU: - Hotplug a CPU - Change the default CPU to the newly hotplugged cpu using "cpu" HMP command. - Hot unplug the CPU - Run "info cpus" Fix this by not letting monitor_get_cpu() to return a CPU which is marked for unplug. Reported-by: Satheesh Rajendran Signed-off-by: Bharata B Rao --- monitor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index fe0d1bd..8d60e57 100644 --- a/monitor.c +++ b/monitor.c @@ -1053,7 +1053,7 @@ int monitor_set_cpu(int cpu_index) CPUState *mon_get_cpu(void) { -if (!cur_mon->mon_cpu) { +if (!cur_mon->mon_cpu || cur_mon->mon_cpu->unplug) { if (!first_cpu) { return NULL; } -- 2.7.4
Re: [Qemu-devel] [PATCH v2] qdev: Mark devices as non-hotpluggable by default
On Fri, Sep 22, 2017 at 11:16:34AM +0200, Thomas Huth wrote: > Historically we've marked all devices as hotpluggable by default. However, > most devices are not hotpluggable, and you also need a HotplugHandler to > support these devices. So if the user tries to "device_add" or "device_del" > such a non-hotpluggable device during runtime, either nothing really usable > happens, or QEMU even crashes/aborts unexpectedly (see for example commit > 84ebd3e8c7d4fe955b - "Mark diag288 watchdog as non-hotpluggable"). > So let's change this dangerous default behaviour and mark the devices as > non-hotpluggable by default. Certain parent devices classes which are known > as hotpluggable (e.g. PCI, USB, etc.) are marked with "hotpluggable = true", > so that devices that are derived from these classes continue to work as > expected. I see that the discussion has moved on, but want to note here that CPU hotplug on pseries breaks with this patch. (qemu) device_add host-spapr-cpu-core,core-id=8,id=core8 Device 'host-powerpc64-cpu' does not support hotplugging (qemu) device_add POWER8E_v2.1-spapr-cpu-core,id=core8,core-id=8 Device 'POWER8E_v2.1-powerpc64-cpu' does not support hotplugging Hope I am not missing anything. Regards, Bharata.
Re: [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on
On Thu, Sep 14, 2017 at 10:59:05AM +0200, Igor Mammedov wrote: > On Thu, 14 Sep 2017 13:48:26 +0530 > Bharata B Rao wrote: > > > On Thu, Sep 14, 2017 at 10:00:11AM +0200, Igor Mammedov wrote: > > > On Thu, 14 Sep 2017 12:31:18 +0530 > > > Bharata B Rao wrote: > > > > > > > Hi, > > > > > > > > QEMU hits the below assert > > > > > > > > qemu-system-ppc64: used ring relocated for ring 2 > > > > qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion > > > > `r >= 0' failed. > > > > > > > > in the following scenario: > > > > > > > > 1. Boot guest with vhost=on > > > > -netdev > > > > tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on -device > > > > virtio-net-pci,netdev=mynet0 > > > > 2. Hot add a DIMM device > > > > 3. Reboot > > > >When the guest reboots, we can see > > > >vhost_virtqueue_start:vq->used_phys getting assigned an address that > > > >falls in the hotplugged memory range. > > > > 4. Remove the DIMM device > > > >Guest refuses the removal as the hotplugged memory is under use. > > > > 5. Reboot > > > > > > >QEMU forces the removal of the DIMM device during reset and that's > > > >when we hit the above assert. > > > I don't recall implementing forced removal om DIMM, > > > could you point out to the related code, pls? > > > > This is ppc specific. We have DR Connector objects for each LMB (multiple > > LMBs make up one DIMM device) and during reset we invoke the > > release routine for these LMBs which will further invoke > > pc_dimm_memory_unplug(). > > > > See hw/ppc/spapr_drc.c: spapr_drc_reset() > > hw/ppc/spapr.c: spapr_lmb_release() > > > > > > > > > Any pointers on why we are hitting this assert ? Shouldn't vhost be > > > > done with using the hotplugged memory when we hit reset ? > > > > > > >From another point of view, > > > DIMM shouldn't be removed unless guest explicitly ejects it > > > (at least that should be so in x86 case). > > > > While that is true for ppc also, shouldn't we start fresh from reset ? > we should. > > when it aborts vhost should print out error from vhost_verify_ring_mappings() > >if (r == -ENOMEM) { >error_report("Unable to map %s for ring %d", part_name[j], i); > >} else if (r == -EBUSY) { > >error_report("%s relocated for ring %d", part_name[j], i); > > that might give a clue where that memory stuck in. As I mentioned above, it fails here: > > > qemu-system-ppc64: used ring relocated for ring 2 Regards, Bharata.
Re: [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on
On Thu, Sep 14, 2017 at 10:00:11AM +0200, Igor Mammedov wrote: > On Thu, 14 Sep 2017 12:31:18 +0530 > Bharata B Rao wrote: > > > Hi, > > > > QEMU hits the below assert > > > > qemu-system-ppc64: used ring relocated for ring 2 > > qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r > > >= 0' failed. > > > > in the following scenario: > > > > 1. Boot guest with vhost=on > > -netdev tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on > > -device virtio-net-pci,netdev=mynet0 > > 2. Hot add a DIMM device > > 3. Reboot > >When the guest reboots, we can see > >vhost_virtqueue_start:vq->used_phys getting assigned an address that > >falls in the hotplugged memory range. > > 4. Remove the DIMM device > >Guest refuses the removal as the hotplugged memory is under use. > > 5. Reboot > > >QEMU forces the removal of the DIMM device during reset and that's > >when we hit the above assert. > I don't recall implementing forced removal om DIMM, > could you point out to the related code, pls? This is ppc specific. We have DR Connector objects for each LMB (multiple LMBs make up one DIMM device) and during reset we invoke the release routine for these LMBs which will further invoke pc_dimm_memory_unplug(). See hw/ppc/spapr_drc.c: spapr_drc_reset() hw/ppc/spapr.c: spapr_lmb_release() > > > Any pointers on why we are hitting this assert ? Shouldn't vhost be > > done with using the hotplugged memory when we hit reset ? > > >From another point of view, > DIMM shouldn't be removed unless guest explicitly ejects it > (at least that should be so in x86 case). While that is true for ppc also, shouldn't we start fresh from reset ? Related comment from hw/ppc/spapr_drc.c: spapr_drc_reset() /* immediately upon reset we can safely assume DRCs whose devices * are pending removal can be safely removed. */ if (drc->unplug_requested) { spapr_drc_release(drc); } So essentially in the scenario I listed, the unplug request is rejected by the guest, but during next reboot, QEMU excersies the above code and removes any devices (memory, CPU etc) that are marked for removal. Regards, Bharata.
[Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on
Hi, QEMU hits the below assert qemu-system-ppc64: used ring relocated for ring 2 qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed. in the following scenario: 1. Boot guest with vhost=on -netdev tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on -device virtio-net-pci,netdev=mynet0 2. Hot add a DIMM device 3. Reboot When the guest reboots, we can see vhost_virtqueue_start:vq->used_phys getting assigned an address that falls in the hotplugged memory range. 4. Remove the DIMM device Guest refuses the removal as the hotplugged memory is under use. 5. Reboot QEMU forces the removal of the DIMM device during reset and that's when we hit the above assert. Any pointers on why we are hitting this assert ? Shouldn't vhost be done with using the hotplugged memory when we hit reset ? Regards, Bharata.
Re: [Qemu-devel] [PATCH v3] qemu crashes when a negative number used for 'maxcpus'
Seeteena, On Mon, Aug 28, 2017 at 7:23 PM, Seeteena Thoufeek < s1see...@linux.vnet.ibm.com> wrote: > ---Steps to Reproduce--- > > When passed a negative number to 'maxcpus' parameter, Qemu aborts > with a core dump. > > Run the following command with maxcpus argument as negative number > > ppc64-softmmu/qemu-system-ppc64 --nographic -vga none -machine > pseries,accel=kvm,kvm-type=HV -m size=200g -device virtio-blk-pci, > drive=rootdisk -drive file=/home/images/pegas-1.0-ppc64le.qcow2, > if=none,cache=none,id=rootdisk,format=qcow2 -monitor telnet > :127.0.0.1:1234,server,nowait -net nic,model=virtio -net > user -redir tcp:2000::22 -device nec-usb-xhci -smp 8,cores=1, > threads=1,maxcpus=-12 > > (process:12149): GLib-ERROR **: gmem.c:130: failed to allocate > 18446744073709550568 bytes > > Trace/breakpoint trap > > Reported-by: R.Nageswara Sastry > Signed-off-by: Seeteena Thoufeek > Reviewed-by: Bharata B Rao > In the bugzilla, I was only suggesting to post the fix upstream and it doesn't mean a Reviewed-by. You might want to remove this in your next version. Regards, Bharata.
[Qemu-devel] [FIX PATCH v1] spapr: Allow configure-connector to be called multiple times
In case of in-kernel memory hot unplug, when the guest is not able to remove all the LMBs that are requested for removal, it will add back any LMBs that have been successfully removed. The DR Connectors of these LMBs wouldn't have been unconfigured and hence the addition of these LMBs will result in configure-connector call being issued on LMB DR connectors that are already in configured state. Such configure-connector calls will fail resulting in a DIMM which is partially unplugged. This however worked till recently before we overhauled the DRC implementation in QEMU. Commit 9d4c0f4f0a71e: "spapr: Consolidate DRC state variables" is the first commit where this problem shows up as per git bisect. Ideally guest shouldn't be issuing configure-connector call on an already configured DR connector. However for now, work around this in QEMU by allowing configure-connector to be called multiple times for all types of DR connectors. Signed-off-by: Bharata B Rao --- v0: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02942.html Changes in v1: - Allow configure-connector to be called multiple times for all types of DR connectors and not just LMB DRCs. (David Gibson) - Explicitly allow configure-connector to proceed only if the DRC is either in unisolated or in configured state. (David Gibson) hw/ppc/spapr_drc.c | 27 +-- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c index 5260b5d..40d1e99 100644 --- a/hw/ppc/spapr_drc.c +++ b/hw/ppc/spapr_drc.c @@ -446,8 +446,12 @@ void spapr_drc_reset(sPAPRDRConnector *drc) drc->state = drck->empty_state; } -drc->ccs_offset = -1; -drc->ccs_depth = -1; +/* + * Ensure that we are able to send the FDT fragment again + * via configure-connector call if the guest requests. + */ +drc->ccs_offset = drc->fdt_start_offset; +drc->ccs_depth = 0; } static void drc_reset(void *opaque) @@ -1071,8 +1075,14 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu, } if ((drc->state != SPAPR_DRC_STATE_LOGICAL_UNISOLATE) -&& (drc->state != SPAPR_DRC_STATE_PHYSICAL_UNISOLATE)) { -/* Need to unisolate the device before configuring */ +&& (drc->state != SPAPR_DRC_STATE_PHYSICAL_UNISOLATE) +&& (drc->state != SPAPR_DRC_STATE_LOGICAL_CONFIGURED) +&& (drc->state != SPAPR_DRC_STATE_PHYSICAL_CONFIGURED)) { +/* + * Need to unisolate the device before configuring + * or it should already be in configured state to + * allow configure-connector be called repeatedly. + */ rc = SPAPR_DR_CC_RESPONSE_NOT_CONFIGURABLE; goto out; } @@ -1108,8 +1118,13 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu, /* done sending the device tree, move to configured state */ trace_spapr_drc_set_configured(drc_index); drc->state = drck->ready_state; -drc->ccs_offset = -1; -drc->ccs_depth = -1; +/* + * Ensure that we are able to send the FDT fragment + * again via configure-connector call if the guest requests. + */ +drc->ccs_offset = drc->fdt_start_offset; +drc->ccs_depth = 0; +fdt_offset_next = drc->fdt_start_offset; resp = SPAPR_DR_CC_RESPONSE_SUCCESS; } else { resp = SPAPR_DR_CC_RESPONSE_PREV_PARENT; -- 2.7.4
[Qemu-devel] [FIX PATCH v0] spapr: Allow configure-connector to be called multiple times for LMBs
In case of in-kernel memory hot unplug, when the guest is not able to remove all the LMBs that are requested for removal, it will add back any LMBs that have been successfully removed. The DR Connectors of these LMBs wouldn't have been unconfigured and hence the addition of these LMBs will result in configure-connector call being issued on LMB DR connectors that are already in configured state. Such configure-connector calls will fail resulting in a DIMM which is partially unplugged. This however worked till recently before we overhauled the DRC implementation in QEMU. Commit 9d4c0f4f0a71e: "spapr: Consolidate DRC state variables" is the first commit where this problem shows up as per git bisect. Ideally guest shouldn't be issuing configure-connector call on an already configured DR connector. However for now, work around this in QEMU by allowing configure-connector to be called multiple times for LMBs. Signed-off-by: Bharata B Rao --- hw/ppc/spapr_drc.c | 37 +++-- 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c index 5260b5d..2dd9635 100644 --- a/hw/ppc/spapr_drc.c +++ b/hw/ppc/spapr_drc.c @@ -446,8 +446,17 @@ void spapr_drc_reset(sPAPRDRConnector *drc) drc->state = drck->empty_state; } -drc->ccs_offset = -1; -drc->ccs_depth = -1; +if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) { +/* + * Ensure that we are able to send the FDT fragment of the + * LMB again via configure-connector call if guest requests. + */ +drc->ccs_offset = drc->fdt_start_offset; +drc->ccs_depth = 0; +} else { +drc->ccs_offset = -1; +drc->ccs_depth = -1; +} } static void drc_reset(void *opaque) @@ -1071,8 +1080,14 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu, } if ((drc->state != SPAPR_DRC_STATE_LOGICAL_UNISOLATE) -&& (drc->state != SPAPR_DRC_STATE_PHYSICAL_UNISOLATE)) { -/* Need to unisolate the device before configuring */ +&& (drc->state != SPAPR_DRC_STATE_PHYSICAL_UNISOLATE) && +(spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_LMB)) { +/* + * Need to unisolate the device before configuring, however + * LMB DRCs are exempted from this check as guest can issue + * configure-connector calls for an already configured + * LMB DRC. + */ rc = SPAPR_DR_CC_RESPONSE_NOT_CONFIGURABLE; goto out; } @@ -1108,8 +1123,18 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu, /* done sending the device tree, move to configured state */ trace_spapr_drc_set_configured(drc_index); drc->state = drck->ready_state; -drc->ccs_offset = -1; -drc->ccs_depth = -1; +if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) { +/* + * Ensure that we are able to send the FDT fragment of the + * LMB again via configure-connector call if guest requests. + */ +drc->ccs_offset = drc->fdt_start_offset; +drc->ccs_depth = 0; +fdt_offset_next = drc->fdt_start_offset; +} else { +drc->ccs_offset = -1; +drc->ccs_depth = -1; +} resp = SPAPR_DR_CC_RESPONSE_SUCCESS; } else { resp = SPAPR_DR_CC_RESPONSE_PREV_PARENT; -- 2.7.4
[Qemu-devel] [FIX PATCH v2] spapr: Fix QEMU abort during memory unplug
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState) introduced a new way to track pending LMBs of DIMM device that is marked for removal. Since this commit we can hit the assert in spapr_pending_dimm_unplugs_add() in the following situation: - DIMM device removal fails as the guest doesn't allow the removal. - Subsequent attempt to remove the same DIMM would hit the assert as the corresponding sPAPRDIMMState is still part of the pending_dimm_unplugs list. Fix this by removing the assert and conditionally adding the sPAPRDIMMState to pending_dimm_unplugs list only when it is not already present. Fixes: 0cffce56ae3501c5783d779f97993ce478acf856 Signed-off-by: Bharata B Rao --- Changes in v2: - sPAPRDIMMState is now allocated within spapr_pending_dimm_unplugs_add() itself (David Gibson) - spapr_recover_pending_dimm_state() should never return a NULL sPAPRDIMMState, added an assert for the same. hw/ppc/spapr.c | 37 + 1 file changed, 21 insertions(+), 16 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 1cb09e7..2465b27 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2850,11 +2850,25 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s, return dimm_state; } -static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr, - sPAPRDIMMState *dimm_state) +static sPAPRDIMMState *spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr, + uint32_t nr_lmbs, + PCDIMMDevice *dimm) { -g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)); -QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); +sPAPRDIMMState *ds = NULL; + +/* + * If this request is for a DIMM whose removal had failed earlier + * (due to guest's refusal to remove the LMBs), we would have this + * dimm already in the pending_dimm_unplugs list. In that + * case don't add again. + */ +if (!spapr_pending_dimm_unplugs_find(spapr, dimm)) { +ds = g_malloc0(sizeof(sPAPRDIMMState)); +ds->nr_lmbs = nr_lmbs; +ds->dimm = dimm; +QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, ds, next); +} +return ds; } static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr, @@ -2875,7 +2889,6 @@ static sPAPRDIMMState *spapr_recover_pending_dimm_state(sPAPRMachineState *ms, uint32_t avail_lmbs = 0; uint64_t addr_start, addr; int i; -sPAPRDIMMState *ds; addr_start = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &error_abort); @@ -2891,11 +2904,7 @@ static sPAPRDIMMState *spapr_recover_pending_dimm_state(sPAPRMachineState *ms, addr += SPAPR_MEMORY_BLOCK_SIZE; } -ds = g_malloc0(sizeof(sPAPRDIMMState)); -ds->nr_lmbs = avail_lmbs; -ds->dimm = dimm; -spapr_pending_dimm_unplugs_add(ms, ds); -return ds; +return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm); } /* Callback to be called during DRC release. */ @@ -2911,6 +2920,7 @@ void spapr_lmb_release(DeviceState *dev) * during the unplug process. In this case recover it. */ if (ds == NULL) { ds = spapr_recover_pending_dimm_state(spapr, PC_DIMM(dev)); +g_assert(ds); /* The DRC being examined by the caller at least must be counted */ g_assert(ds->nr_lmbs); } @@ -2942,18 +2952,13 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev, uint64_t addr_start, addr; int i; sPAPRDRConnector *drc; -sPAPRDIMMState *ds; - addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err); if (local_err) { goto out; } -ds = g_malloc0(sizeof(sPAPRDIMMState)); -ds->nr_lmbs = nr_lmbs; -ds->dimm = dimm; -spapr_pending_dimm_unplugs_add(spapr, ds); +spapr_pending_dimm_unplugs_add(spapr, nr_lmbs, dimm); addr = addr_start; for (i = 0; i < nr_lmbs; i++) { -- 2.7.4
[Qemu-devel] [FIX PATCH v1] spapr: Fix QEMU abort during memory unplug
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState) introduced a new way to track pending LMBs of DIMM device that is marked for removal. Since this commit we can hit the assert in spapr_pending_dimm_unplugs_add() in the following situation: - DIMM device removal fails as the guest doesn't allow the removal. - Subsequent attempt to remove the same DIMM would hit the assert as the corresponding sPAPRDIMMState is still part of the pending_dimm_unplugs list. Fix this by removing the assert and conditionally adding the sPAPRDIMMState to pending_dimm_unplugs list only when it is not already present. Fixes: 0cffce56ae3501c5783d779f97993ce478acf856 Signed-off-by: Bharata B Rao --- Changes in v1: - Added comment (David Gibson) - Ensured we free sPAPRDIMMState when corresonding entry already exists (Daniel Henrique Barboza) Daniel had shown another alternative, we can switch over to that if preferred. hw/ppc/spapr.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 1cb09e7..c6091e2 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2853,8 +2853,17 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s, static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr, sPAPRDIMMState *dimm_state) { -g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)); -QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); +/* + * If this request is for a DIMM whose removal had failed earlier + * (due to guest's refusal to remove the LMBs), we would have this + * dimm_state already in the pending_dimm_unplugs list. In that + * case don't add again. + */ +if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) { +QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); +} else { +g_free(dimm_state); +} } static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr, -- 2.7.4
[Qemu-devel] [FIX PATCH] spapr: Fix QEMU abort during memory unplug
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState) introduced a new way to track pending LMBs of DIMM device that is marked for removal. Since this commit we can hit the assert in spapr_pending_dimm_unplugs_add() in the following situation: - DIMM device removal fails as the guest doesn't allow the removal. - Subsequent attempt to remove the same DIMM would hit the assert as the corresponding sPAPRDIMMState is still part of the pending_dimm_unplugs list. Fix this by removing the assert and conditionally adding the sPAPRDIMMState to pending_dimm_unplugs list only when it is not already present. Fixes: 0cffce56ae3501c5783d779f97993ce478acf856 Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 1cb09e7..990bb2d 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2853,8 +2853,9 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s, static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr, sPAPRDIMMState *dimm_state) { -g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)); -QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); +if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) { +QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); +} } static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr, -- 2.7.4
Re: [Qemu-devel] [PATCH] spapr: fix potential memory leak in spapr_core_plug()
On Wed, Jul 12, 2017 at 01:11:30PM +0200, Greg Kurz wrote: > On Wed, 12 Jul 2017 15:52:37 +0530 > Bharata B Rao wrote: > > > On Wed, Jul 12, 2017 at 11:48:39AM +0200, Greg Kurz wrote: > > > Since commit 5c1da81215c7 ("spapr: Remove unnecessary differences between > > > hotplug and coldplug paths"), the CPU DT for the DRC is always allocated. > > > This causes a memory leak for pseries-2.6 and older machine types, that > > > don't support CPU hotplug and don't allocate DRCs for CPUs. > > > > > > Reported-by: Bharata B Rao > > > Signed-off-by: Greg Kurz > > > --- > > > hw/ppc/spapr.c |9 + > > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > index 12b3f099d4c9..4a480e1c1dd9 100644 > > > --- a/hw/ppc/spapr.c > > > +++ b/hw/ppc/spapr.c > > > @@ -2993,8 +2993,6 @@ static void spapr_core_plug(HotplugHandler > > > *hotplug_dev, DeviceState *dev, > > > CPUState *cs = CPU(core->threads); > > > sPAPRDRConnector *drc; > > > Error *local_err = NULL; > > > -void *fdt = NULL; > > > -int fdt_offset = 0; > > > int smt = kvmppc_smt_threads(); > > > CPUArchId *core_slot; > > > int index; > > > @@ -3009,9 +3007,12 @@ static void spapr_core_plug(HotplugHandler > > > *hotplug_dev, DeviceState *dev, > > > > > > g_assert(drc || !mc->has_hotpluggable_cpus); > > > > > > -fdt = spapr_populate_hotplug_cpu_dt(cs, &fdt_offset, spapr); > > > - > > > if (drc) { > > > +void *fdt; > > > +int fdt_offset; > > > + > > > +fdt = spapr_populate_hotplug_cpu_dt(cs, &fdt_offset, spapr); > > > + > > > spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err); > > > if (local_err) { > > > g_free(fdt); > > > > You say this in the patch description already, but want to note explicitly > > that this prevents double allocation for pseries-2.6 and ealier types and > > not for newer machine types. > > > > This DT node doesn't have the exact same use as the one allocated in > spapr_populate_cpus_dt_node(). Here, it is needed by the DRC logic: What I mean to say is that even after this fix, for machines types > 2.6, we still allocate memory for FDT and go through the code that creates DT entries for CPUs twice. What we create via spapr_populate_cpus_dt_node() is used while the CPU DT entries created by spapr_populate_hotplug_cpu_dt() won't be used and corresponding drc->fdt is never freed, afaics for boot time and cold-plugged CPUs. May be this can be fixed during DT code reorg effort that David is planning. Regards, Bharata.
Re: [Qemu-devel] [PATCH] spapr: fix potential memory leak in spapr_core_plug()
On Wed, Jul 12, 2017 at 11:48:39AM +0200, Greg Kurz wrote: > Since commit 5c1da81215c7 ("spapr: Remove unnecessary differences between > hotplug and coldplug paths"), the CPU DT for the DRC is always allocated. > This causes a memory leak for pseries-2.6 and older machine types, that > don't support CPU hotplug and don't allocate DRCs for CPUs. > > Reported-by: Bharata B Rao > Signed-off-by: Greg Kurz > --- > hw/ppc/spapr.c |9 + > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 12b3f099d4c9..4a480e1c1dd9 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2993,8 +2993,6 @@ static void spapr_core_plug(HotplugHandler > *hotplug_dev, DeviceState *dev, > CPUState *cs = CPU(core->threads); > sPAPRDRConnector *drc; > Error *local_err = NULL; > -void *fdt = NULL; > -int fdt_offset = 0; > int smt = kvmppc_smt_threads(); > CPUArchId *core_slot; > int index; > @@ -3009,9 +3007,12 @@ static void spapr_core_plug(HotplugHandler > *hotplug_dev, DeviceState *dev, > > g_assert(drc || !mc->has_hotpluggable_cpus); > > -fdt = spapr_populate_hotplug_cpu_dt(cs, &fdt_offset, spapr); > - > if (drc) { > +void *fdt; > +int fdt_offset; > + > +fdt = spapr_populate_hotplug_cpu_dt(cs, &fdt_offset, spapr); > + > spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err); > if (local_err) { > g_free(fdt); You say this in the patch description already, but want to note explicitly that this prevents double allocation for pseries-2.6 and ealier types and not for newer machine types. Regards, Bharata.
Re: [Qemu-devel] [PULL 09/17] spapr: Remove unnecessary differences between hotplug and coldplug paths
On Tue, Jul 11, 2017 at 02:39:09PM +1000, David Gibson wrote: > spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into > configured state initially, instead of the usual ISOLATED/UNUSABLE state. > It turns out this is unnecessary: although coldplugged devices do need to > be in CONFIGURED state once the guest starts, that will already be > accomplished by the reset code which will move DRCs for already plugged > devices into a coldplug equivalent state. > > Signed-off-by: David Gibson > Reviewed-by: Laurent Vivier > Reviewed-by: Greg Kurz > --- > hw/ppc/spapr.c | 13 +++-- > hw/ppc/spapr_drc.c | 5 ++--- > hw/ppc/spapr_pci.c | 3 +-- > include/hw/ppc/spapr_drc.h | 2 +- > 4 files changed, 7 insertions(+), 16 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 4fa982d..70b3fd3 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2611,7 +2611,7 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t > addr_start, uint64_t size, > fdt_offset = spapr_populate_memory_node(fdt, node, addr, > SPAPR_MEMORY_BLOCK_SIZE); > > -spapr_drc_attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp); > +spapr_drc_attach(drc, dev, fdt, fdt_offset, errp); > addr += SPAPR_MEMORY_BLOCK_SIZE; > } > /* send hotplug notification to the > @@ -2956,17 +2956,10 @@ static void spapr_core_plug(HotplugHandler > *hotplug_dev, DeviceState *dev, > > g_assert(drc || !mc->has_hotpluggable_cpus); > > -/* > - * Setup CPU DT entries only for hotplugged CPUs. For boot time or > - * coldplugged CPUs DT entries are setup in spapr_build_fdt(). > - */ > -if (dev->hotplugged) { > -fdt = spapr_populate_hotplug_cpu_dt(cs, &fdt_offset, spapr); > -} > +fdt = spapr_populate_hotplug_cpu_dt(cs, &fdt_offset, spapr); A harmless (but still good to get rid of) side effect of this change is that we are now building the CPU device tree for boot time and cold-plugged CPUs twice: first from spapr_core_plug() -> spapr_populate_hotplug_cpu_dt() second from spapr_build_fdt() -> spapr_populate_cpus_dt_node() The first one is not used while the 2nd one acutally ends up building the CPUs DT entries for the boot time and cold-plugged CPUs. Regards, Bharata.
Re: [Qemu-devel] [PATCH] spapr: fix memory hotplug error path
On Tue, Jul 04, 2017 at 10:02:46AM +0200, Greg Kurz wrote: > > > There is some history to this. I was doing error recovery and propagation > > > here similarly during memory hotplug development phase until Igor > > > suggested that we shoudn't try to recover after we have done guest > > > visible changes. > > > > > > Refer to "changes in v6" section in this post: > > > https://lists.gnu.org/archive/html/qemu-ppc/2015-06/msg00296.html > > > > > > However at that time we were doing memory add by DRC index method > > > and hence would attach and online one LMB at a time. > > > In that method, if an intermediate attach fails we would end up with a few > > > LMBs being onlined by the guest already. However subsequently > > > we have switched (optionally, based on dedicated_hp_event_source) to > > > count-indexed method of hotplug where we do attach of all LMBs one by one > > > and then request the guest to hotplug all of them at once using > > > count-indexed > > > method. > > > > > > So it will be a bit tricky to abort for index based case and recover > > > correctly for count-indexed case. > > > > Looked at the code again and realized that though we started with > > index based LMB addition, we later switched to count based addition. Then > > we added support for count-indexed type subject to the presence > > of dedidated hotplug event source while still retaining the support > > for count based addition. > > > > So presently we do attach of all LMBs one by one and then do onlining > > (count based or count-indexed based) once. Hence error recovery > > for both cases would be similar now. So I guess you should take care of > > undoing pc_dimm_memory_plug() like Igor mentioned and also undo the > > effects of partial successful attaches. > > > > I've sent a v2 that adds rollback. oh ok, somehow v2 didn't reach me at all and I saw the v2 in archives only now. So just noting that my above replies were sent w/o being aware of v2 :) > > > > > > Regards, > > > Bharata.
Re: [Qemu-devel] [PATCH] spapr: fix memory hotplug error path
On Tue, Jul 04, 2017 at 09:01:43AM +0530, Bharata B Rao wrote: > On Mon, Jul 03, 2017 at 02:21:31PM +0200, Greg Kurz wrote: > > QEMU shouldn't abort if spapr_add_lmbs()->spapr_drc_attach() fails. > > Let's propagate the error instead, like it is done everywhere else > > where spapr_drc_attach() is called. > > > > Signed-off-by: Greg Kurz > > --- > > hw/ppc/spapr.c | 10 -- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 70b3fd374e2b..e103be500189 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -2601,6 +2601,7 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t > > addr_start, uint64_t size, > > int i, fdt_offset, fdt_size; > > void *fdt; > > uint64_t addr = addr_start; > > +Error *local_err = NULL; > > > > for (i = 0; i < nr_lmbs; i++) { > > drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, > > @@ -2611,7 +2612,12 @@ static void spapr_add_lmbs(DeviceState *dev, > > uint64_t addr_start, uint64_t size, > > fdt_offset = spapr_populate_memory_node(fdt, node, addr, > > SPAPR_MEMORY_BLOCK_SIZE); > > > > -spapr_drc_attach(drc, dev, fdt, fdt_offset, errp); > > +spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err); > > +if (local_err) { > > +g_free(fdt); > > +error_propagate(errp, local_err); > > +return; > > +} > > There is some history to this. I was doing error recovery and propagation > here similarly during memory hotplug development phase until Igor > suggested that we shoudn't try to recover after we have done guest > visible changes. > > Refer to "changes in v6" section in this post: > https://lists.gnu.org/archive/html/qemu-ppc/2015-06/msg00296.html > > However at that time we were doing memory add by DRC index method > and hence would attach and online one LMB at a time. > In that method, if an intermediate attach fails we would end up with a few > LMBs being onlined by the guest already. However subsequently > we have switched (optionally, based on dedicated_hp_event_source) to > count-indexed method of hotplug where we do attach of all LMBs one by one > and then request the guest to hotplug all of them at once using count-indexed > method. > > So it will be a bit tricky to abort for index based case and recover > correctly for count-indexed case. Looked at the code again and realized that though we started with index based LMB addition, we later switched to count based addition. Then we added support for count-indexed type subject to the presence of dedidated hotplug event source while still retaining the support for count based addition. So presently we do attach of all LMBs one by one and then do onlining (count based or count-indexed based) once. Hence error recovery for both cases would be similar now. So I guess you should take care of undoing pc_dimm_memory_plug() like Igor mentioned and also undo the effects of partial successful attaches. > > Regards, > Bharata.
Re: [Qemu-devel] [PATCH] spapr: fix memory hotplug error path
On Mon, Jul 03, 2017 at 02:21:31PM +0200, Greg Kurz wrote: > QEMU shouldn't abort if spapr_add_lmbs()->spapr_drc_attach() fails. > Let's propagate the error instead, like it is done everywhere else > where spapr_drc_attach() is called. > > Signed-off-by: Greg Kurz > --- > hw/ppc/spapr.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 70b3fd374e2b..e103be500189 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2601,6 +2601,7 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t > addr_start, uint64_t size, > int i, fdt_offset, fdt_size; > void *fdt; > uint64_t addr = addr_start; > +Error *local_err = NULL; > > for (i = 0; i < nr_lmbs; i++) { > drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, > @@ -2611,7 +2612,12 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t > addr_start, uint64_t size, > fdt_offset = spapr_populate_memory_node(fdt, node, addr, > SPAPR_MEMORY_BLOCK_SIZE); > > -spapr_drc_attach(drc, dev, fdt, fdt_offset, errp); > +spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err); > +if (local_err) { > +g_free(fdt); > +error_propagate(errp, local_err); > +return; > +} There is some history to this. I was doing error recovery and propagation here similarly during memory hotplug development phase until Igor suggested that we shoudn't try to recover after we have done guest visible changes. Refer to "changes in v6" section in this post: https://lists.gnu.org/archive/html/qemu-ppc/2015-06/msg00296.html However at that time we were doing memory add by DRC index method and hence would attach and online one LMB at a time. In that method, if an intermediate attach fails we would end up with a few LMBs being onlined by the guest already. However subsequently we have switched (optionally, based on dedicated_hp_event_source) to count-indexed method of hotplug where we do attach of all LMBs one by one and then request the guest to hotplug all of them at once using count-indexed method. So it will be a bit tricky to abort for index based case and recover correctly for count-indexed case. Regards, Bharata.
Re: [Qemu-devel] [PATCH 4/6] spapr: Make DRC reset force DRC into known state
On Thu, Jun 08, 2017 at 03:09:28PM +1000, David Gibson wrote: > The reset handler for DRCs attempts several state transitions which are > subject to various checks and restrictions. But at reset time we know > there is no guest, so we can ignore most of the usual sequencing rules and > just set the DRC back to a known state. In fact, it's safer to do so. > > The existing code also has several redundant checks for > drc->awaiting_release inside a block which has already tested that. This > patch removes those and sets the DRC to a fixed initial state based only > on whether a device is currently plugged or not. > > With DRCs correctly reset to a state based on device presence, we don't > need to force state transitions as cold plugged devices are processed. > This allows us to remove all the callers of the set_*_state() methods from > outside spapr_drc.c. > > Signed-off-by: David Gibson > --- > hw/ppc/spapr.c | 15 --- > hw/ppc/spapr_drc.c | 28 > 2 files changed, 8 insertions(+), 35 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 01dda9e..c988e38 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2545,12 +2545,6 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t > addr_start, uint64_t size, > > spapr_drc_attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp); > addr += SPAPR_MEMORY_BLOCK_SIZE; > -if (!dev->hotplugged) { > -sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > -/* guests expect coldplugged LMBs to be pre-allocated */ > -drck->set_allocation_state(drc, > SPAPR_DR_ALLOCATION_STATE_USABLE); > -drck->set_isolation_state(drc, > SPAPR_DR_ISOLATION_STATE_UNISOLATED); > -} > } > /* send hotplug notification to the > * guest only in case of hotplugged memory > @@ -2901,15 +2895,6 @@ static void spapr_core_plug(HotplugHandler > *hotplug_dev, DeviceState *dev, > * of hotplugged CPUs. > */ > spapr_hotplug_req_add_by_index(drc); > -} else { > -/* > - * Set the right DRC states for cold plugged CPU. > - */ > -if (drc) { > -sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > -drck->set_allocation_state(drc, > SPAPR_DR_ALLOCATION_STATE_USABLE); > -drck->set_isolation_state(drc, > SPAPR_DR_ISOLATION_STATE_UNISOLATED); So here you are removing the initial state setting for cold plugged CPUs and ... > -} > } > core_slot->cpu = OBJECT(dev); > } > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > index dc4ac77..7e17f34 100644 > --- a/hw/ppc/spapr_drc.c > +++ b/hw/ppc/spapr_drc.c > @@ -393,7 +393,6 @@ static bool release_pending(sPAPRDRConnector *drc) > static void reset(DeviceState *d) > { > sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d); > -sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > trace_spapr_drc_reset(spapr_drc_index(drc)); > > @@ -401,28 +400,17 @@ static void reset(DeviceState *d) > drc->ccs = NULL; > > /* immediately upon reset we can safely assume DRCs whose devices > - * are pending removal can be safely removed, and that they will > - * subsequently be left in an ISOLATED state. move the DRC to this > - * state in these cases (which will in turn complete any pending > - * device removals) > + * are pending removal can be safely removed. > */ > if (drc->awaiting_release) { > -drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_ISOLATED); > -/* generally this should also finalize the removal, but if the device > - * hasn't yet been configured we normally defer removal under the > - * assumption that this transition is taking place as part of device > - * configuration. so check if we're still waiting after this, and > - * force removal if we are > - */ > -if (drc->awaiting_release) { > -spapr_drc_detach(drc, DEVICE(drc->dev), NULL); > -} > +spapr_drc_release(drc); > +} > > -/* non-PCI devices may be awaiting a transition to UNUSABLE */ > -if (spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PCI && > -drc->awaiting_release) { > -drck->set_allocation_state(drc, > SPAPR_DR_ALLOCATION_STATE_UNUSABLE); > -} > +drc->isolation_state = drc->dev ? SPAPR_DR_ISOLATION_STATE_UNISOLATED > +: SPAPR_DR_ISOLATION_STATE_ISOLATED; > +if (spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PCI) { > +drc->allocation_state = drc->dev ? SPAPR_DR_ALLOCATION_STATE_USABLE > +: SPAPR_DR_ALLOCATION_STATE_UNUSABLE; > } ... adding it via reset. However you are setting drc->isolation_state and drc->allocation_state directly rather than going via drck->set_isolation_state() and drck->set_allocation_state() routines. This r
Re: [Qemu-devel] [FIX PATCH] spapr: prevent QEMU crash when CPU realization fails
On Thu, Jun 15, 2017 at 09:32:38AM +0200, Greg Kurz wrote: > On Thu, 15 Jun 2017 08:22:44 +0530 > Bharata B Rao wrote: > > > ICPState objects were being allocated before CPU thread realization. > > However commit 9ed656631d73 (xics: setup cpu at realize time) reversed it > > by allocating ICPState objects after CPU thread is realized. But it > > didn't take care to fix the error path because of which we observe > > a SIGSEGV when CPU thread realization fails during cold/hotplug. > > > > Fix this by ensuring that we do object_unparent() of ICPState object > > only in case when is was created earlier. > > > > Oops, my bad... my initial intent was to conditionally call object_unparent() > and I simply forgot to put the "if (obj) { }". But your patch is valid as well > of course. Maybe you can drop the initialization of obj to NULL on the way, > since it really doesn't make sense anymore. Here is the version w/o initializing obj to NULL. >From cb9cc946df0d1c430cccb1463d78fa4b41e9f0ee Mon Sep 17 00:00:00 2001 From: Bharata B Rao Date: Wed, 14 Jun 2017 19:24:43 +0530 Subject: [FIX PATCH v1] spapr: prevent QEMU crash when CPU realization fails ICPState objects were being allocated before CPU thread realization. However commit 9ed656631d73 (xics: setup cpu at realize time) reversed it by allocating ICPState objects after CPU thread is realized. But it didn't take care to fix the error path because of which we observe a SIGSEGV when CPU thread realization fails during cold/hotplug. Fix this by ensuring that we do object_unparent() of ICPState object only in case when is was created earlier. Signed-off-by: Bharata B Rao Reviewed-by: Greg Kurz --- hw/ppc/spapr_cpu_core.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c index d6719d5..ea278ce 100644 --- a/hw/ppc/spapr_cpu_core.c +++ b/hw/ppc/spapr_cpu_core.c @@ -178,7 +178,7 @@ static void spapr_cpu_core_realize_child(Object *child, Error **errp) sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); CPUState *cs = CPU(child); PowerPCCPU *cpu = POWERPC_CPU(cs); -Object *obj = NULL; +Object *obj; object_property_set_bool(child, true, "realized", &local_err); if (local_err) { @@ -198,13 +198,14 @@ static void spapr_cpu_core_realize_child(Object *child, Error **errp) object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort); object_property_set_bool(obj, true, "realized", &local_err); if (local_err) { -goto error; +goto free_icp; } return; -error: +free_icp: object_unparent(obj); +error: error_propagate(errp, local_err); } -- 2.7.4
[Qemu-devel] [FIX PATCH] target/ppc: Proper cleanup when ppc_cpu_realizefn fails
If ppc_cpu_realizefn() fails after cpu_exec_realizefn() has been called, we will have to undo whatever cpu_exec_realizefn() did by explicitly calling cpu_exec_unrealizeffn() which is currently missing. Failure to do this proper cleanup will result in CPU which was never fully realized to linger on the cpus list causing SIGSEGV later (for eg when running "info cpus"). Signed-off-by: Bharata B Rao --- target/ppc/translate_init.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c index e837cd2..53aff5a 100644 --- a/target/ppc/translate_init.c +++ b/target/ppc/translate_init.c @@ -9825,14 +9825,14 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error **errp) error_append_hint(errp, "Adjust the number of cpus to %d " "or try to raise the number of threads per core\n", cpu->cpu_dt_id * smp_threads / max_smt); -return; +goto unrealize; } #endif if (tcg_enabled()) { if (ppc_fixup_cpu(cpu) != 0) { error_setg(errp, "Unable to emulate selected CPU with TCG"); -return; +goto unrealize; } } @@ -9841,14 +9841,14 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error **errp) error_setg(errp, "CPU does not possess a BookE or 4xx MMU. " "Please use qemu-system-ppc or qemu-system-ppc64 instead " "or choose another CPU model."); -return; +goto unrealize; } #endif create_ppc_opcodes(cpu, &local_err); if (local_err != NULL) { error_propagate(errp, local_err); -return; +goto unrealize; } init_ppc_proc(cpu); @@ -10033,6 +10033,10 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error **errp) fflush(stdout); } #endif +return; + +unrealize: +cpu_exec_unrealizefn(cs); } static void ppc_cpu_unrealizefn(DeviceState *dev, Error **errp) -- 2.7.4
[Qemu-devel] [FIX PATCH] spapr: prevent QEMU crash when CPU realization fails
ICPState objects were being allocated before CPU thread realization. However commit 9ed656631d73 (xics: setup cpu at realize time) reversed it by allocating ICPState objects after CPU thread is realized. But it didn't take care to fix the error path because of which we observe a SIGSEGV when CPU thread realization fails during cold/hotplug. Fix this by ensuring that we do object_unparent() of ICPState object only in case when is was created earlier. Signed-off-by: Bharata B Rao --- NOTE: There is another SIGSEGV failure that I am investigating which happens when CPU realization fails. It appears that that the CPU object isn't getting fully cleaned up. hw/ppc/spapr_cpu_core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c index d6719d5..0d0e959 100644 --- a/hw/ppc/spapr_cpu_core.c +++ b/hw/ppc/spapr_cpu_core.c @@ -198,13 +198,14 @@ static void spapr_cpu_core_realize_child(Object *child, Error **errp) object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort); object_property_set_bool(obj, true, "realized", &local_err); if (local_err) { -goto error; +goto free_icp; } return; -error: +free_icp: object_unparent(obj); +error: error_propagate(errp, local_err); } -- 2.7.4
Re: [Qemu-devel] [PATCH v6 1/2] spapr: Add a "no HPT" encoding to HTAB migration stream
On Mon, Jun 12, 2017 at 05:10:44PM +0800, David Gibson wrote: > On Mon, Jun 12, 2017 at 11:02:34AM +0530, Bharata B Rao wrote: > > Add a "no HPT" encoding (using value -1) to the HTAB migration > > stream (in the place of HPT size) when the guest doesn't allocate HPT. > > This will help the target side to match target HPT with the source HPT > > and thus enable successful migration. > > > > Suggested-by: David Gibson > > Signed-off-by: Bharata B Rao > > --- > > hw/ppc/spapr.c | 29 + > > 1 file changed, 25 insertions(+), 4 deletions(-) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 8b541d9..c425499 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -1558,13 +1558,19 @@ static int htab_save_setup(QEMUFile *f, void > > *opaque) > > sPAPRMachineState *spapr = opaque; > > > > /* "Iteration" header */ > > -qemu_put_be32(f, spapr->htab_shift); > > +if (!spapr->htab_shift) { > > +qemu_put_be32(f, -1); > > +} else { > > +qemu_put_be32(f, spapr->htab_shift); > > +} > > > > if (spapr->htab) { > > spapr->htab_save_index = 0; > > spapr->htab_first_pass = true; > > } else { > > -assert(kvm_enabled()); > > +if (spapr->htab_shift) { > > +assert(kvm_enabled()); > > +} > > } > > > > > > @@ -1710,7 +1716,12 @@ static int htab_save_iterate(QEMUFile *f, void > > *opaque) > > int rc = 0; > > > > /* Iteration header */ > > -qemu_put_be32(f, 0); > > +if (!spapr->htab_shift) { > > +qemu_put_be32(f, -1); > > +return 0; > > +} else { > > +qemu_put_be32(f, 0); > > +} > > > > if (!spapr->htab) { > > assert(kvm_enabled()); > > @@ -1744,7 +1755,12 @@ static int htab_save_complete(QEMUFile *f, void > > *opaque) > > int fd; > > > > /* Iteration header */ > > -qemu_put_be32(f, 0); > > +if (!spapr->htab_shift) { > > +qemu_put_be32(f, -1); > > +return 0; > > +} else { > > +qemu_put_be32(f, 0); > > +} > > Do you actually need the modifications for _iterate and _complete? I > would have thought you just wouldn't need to send any more of the HPT > stream at all after sending the -1 header. _setup, _iterate, _complete handler routines for HTAB always get interspersed with similar routines for ram savevm handlers as per what I have seen. And moreover these are called by the core migration code and hence we they get called, we need these changes to ensure that we don't attempt to send HPT stream. > > We should also adjust the downtime estimation logic so we don't allow > for transferring an HPT that isn't there. I will have to check that out. > > > if (!spapr->htab) { > > int rc; > > @@ -1788,6 +1804,11 @@ static int htab_load(QEMUFile *f, void *opaque, int > > version_id) > > > > section_hdr = qemu_get_be32(f); > > > > +if (section_hdr == -1) { > > +spapr_free_hpt(spapr); > > +return 0; > > +} > > Strictly speaking we probably shouldn't just return here. We should > wait and see if there is more data in the stream. Because of the way the source sends data (with HPT data getting interspersed with ram data, I don't think we can say for sure if we got or will get any HPT data following the no-HPT indication. > Any actual content > (i.e. section_hdr == 0 sections) would be an error at this point. > However, a reset to an HPT guest would be represented by a new > non-zero section header, then more data. This isn't urgent, but it > would be nice to fix at some point. Sure. Regards, Bharata.
[Qemu-devel] [PATCH v6 0/2] ppc/spapr: Fix migration of radix guests
This patchset fixes the migration of sPAPR radix guests. Migration of hash and radix guests individually has been tested on TCG and KVM (P8 and P9 hosts). Changeover from HPT to RPT and vice versa via reboot during migration isn't tested yet since it is possible to test that currently only with TCG and TCG reboot is being fixed. Changes in v6: -- - Ensure any allocated HPT is free by the target when the source doesn't send HPT. v5: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01637.html Bharata B Rao (2): spapr: Add a "no HPT" encoding to HTAB migration stream spapr: Fix migration of Radix guests hw/ppc/spapr.c | 41 + 1 file changed, 37 insertions(+), 4 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH v6 1/2] spapr: Add a "no HPT" encoding to HTAB migration stream
Add a "no HPT" encoding (using value -1) to the HTAB migration stream (in the place of HPT size) when the guest doesn't allocate HPT. This will help the target side to match target HPT with the source HPT and thus enable successful migration. Suggested-by: David Gibson Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 29 + 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 8b541d9..c425499 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1558,13 +1558,19 @@ static int htab_save_setup(QEMUFile *f, void *opaque) sPAPRMachineState *spapr = opaque; /* "Iteration" header */ -qemu_put_be32(f, spapr->htab_shift); +if (!spapr->htab_shift) { +qemu_put_be32(f, -1); +} else { +qemu_put_be32(f, spapr->htab_shift); +} if (spapr->htab) { spapr->htab_save_index = 0; spapr->htab_first_pass = true; } else { -assert(kvm_enabled()); +if (spapr->htab_shift) { +assert(kvm_enabled()); +} } @@ -1710,7 +1716,12 @@ static int htab_save_iterate(QEMUFile *f, void *opaque) int rc = 0; /* Iteration header */ -qemu_put_be32(f, 0); +if (!spapr->htab_shift) { +qemu_put_be32(f, -1); +return 0; +} else { +qemu_put_be32(f, 0); +} if (!spapr->htab) { assert(kvm_enabled()); @@ -1744,7 +1755,12 @@ static int htab_save_complete(QEMUFile *f, void *opaque) int fd; /* Iteration header */ -qemu_put_be32(f, 0); +if (!spapr->htab_shift) { +qemu_put_be32(f, -1); +return 0; +} else { +qemu_put_be32(f, 0); +} if (!spapr->htab) { int rc; @@ -1788,6 +1804,11 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) section_hdr = qemu_get_be32(f); +if (section_hdr == -1) { +spapr_free_hpt(spapr); +return 0; +} + if (section_hdr) { Error *local_err = NULL; -- 2.7.4
[Qemu-devel] [PATCH v6 2/2] spapr: Fix migration of Radix guests
Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry Signed-off-by: Bharata B Rao Reviewed-by: Suraj Jitindar Singh --- hw/ppc/spapr.c | 12 1 file changed, 12 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index c425499..b2217f3 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1443,6 +1443,18 @@ static int spapr_post_load(void *opaque, int version_id) err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset); } +if (spapr->patb_entry) { +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); +bool radix = !!(spapr->patb_entry & PATBE1_GR); +bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE); + +err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry); +if (err) { +error_report("Process table config unsupported by the host"); +return -EINVAL; +} +} + return err; } -- 2.7.4
[Qemu-devel] [PATCH v5 1/2] spapr: Add a "no HPT" encoding to HTAB migration stream
Add a "no HPT" encoding (using value -1) to the HTAB migration stream (in the place of HPT size) when the guest doesn't allocate HPT. This will help the target side to match target HPT with the source HPT and thus enable successful migration. Suggested-by: David Gibson Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 28 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 86e6228..df27c5c 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1557,13 +1557,19 @@ static int htab_save_setup(QEMUFile *f, void *opaque) sPAPRMachineState *spapr = opaque; /* "Iteration" header */ -qemu_put_be32(f, spapr->htab_shift); +if (!spapr->htab_shift) { +qemu_put_be32(f, -1); +} else { +qemu_put_be32(f, spapr->htab_shift); +} if (spapr->htab) { spapr->htab_save_index = 0; spapr->htab_first_pass = true; } else { -assert(kvm_enabled()); +if (spapr->htab_shift) { +assert(kvm_enabled()); +} } @@ -1709,7 +1715,12 @@ static int htab_save_iterate(QEMUFile *f, void *opaque) int rc = 0; /* Iteration header */ -qemu_put_be32(f, 0); +if (!spapr->htab_shift) { +qemu_put_be32(f, -1); +return 0; +} else { +qemu_put_be32(f, 0); +} if (!spapr->htab) { assert(kvm_enabled()); @@ -1743,7 +1754,12 @@ static int htab_save_complete(QEMUFile *f, void *opaque) int fd; /* Iteration header */ -qemu_put_be32(f, 0); +if (!spapr->htab_shift) { +qemu_put_be32(f, -1); +return 0; +} else { +qemu_put_be32(f, 0); +} if (!spapr->htab) { int rc; @@ -1787,6 +1803,10 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) section_hdr = qemu_get_be32(f); +if (section_hdr == -1) { +return 0; +} + if (section_hdr) { Error *local_err = NULL; -- 2.7.4
[Qemu-devel] [PATCH v5 2/2] spapr: Fix migration of Radix guests
Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry Signed-off-by: Bharata B Rao Reviewed-by: Suraj Jitindar Singh --- hw/ppc/spapr.c | 12 1 file changed, 12 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index df27c5c..4a33c06 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1442,6 +1442,18 @@ static int spapr_post_load(void *opaque, int version_id) err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset); } +if (spapr->patb_entry) { +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); +bool radix = !!(spapr->patb_entry & PATBE1_GR); +bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE); + +err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry); +if (err) { +error_report("Process table config unsupported by the host"); +return -EINVAL; +} +} + return err; } -- 2.7.4
[Qemu-devel] [PATCH v5 0/2] ppc/spapr: Fix migration of radix guests
This patchset fixes the migration of sPAPR radix guests. Changes in v5 - - Ensured that assert(kvm_enabled()) isn't touched in any HTAB savevm handlers except in htab_save_setup() where it is made conditional on htab_shift. v4: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg07058.html Bharata B Rao (2): spapr: Add a "no HPT" encoding to HTAB migration stream spapr: Fix migration of Radix guests hw/ppc/spapr.c | 40 1 file changed, 36 insertions(+), 4 deletions(-) -- 2.7.4
Re: [Qemu-devel] [PATCH v4 1/2] spapr: Add a "no HPT" encoding to HTAB migration stream
On Thu, Jun 01, 2017 at 02:54:48PM +1000, David Gibson wrote: > On Wed, May 31, 2017 at 04:56:44PM +0530, Bharata B Rao wrote: > > Add a "no HPT" encoding (using value -1) to the HTAB migration > > stream (in the place of HPT size) when the guest doesn't allocate HPT. > > This will help the target side to match target HPT with the source HPT > > and thus enable successful migration. > > > > A few more fixes to enable TCG migration to work correctly are also > > included in this commit: > > > > - HTAB savevm handlers have a few asserts on kvm_enabled() when > > spapr->htab != 0. Convert these into conditional checks as it is now > > possible to have no HTAB with TCG radix guests. > > - htab_save_setup() asserts for kvm_enabled() when spapr->htab != 0. > > Remove this as we can't assert this for TCG radix guests. > > > > Suggested-by: David Gibson > > [no HPT encoding suggestion] > > Signed-off-by: Bharata B Rao > > Looks basically ok, but there are still some details to address. > > > --- > > hw/ppc/spapr.c | 31 +-- > > 1 file changed, 17 insertions(+), 14 deletions(-) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index ab3aab1..b589ed4 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -1559,17 +1559,18 @@ static int htab_save_setup(QEMUFile *f, void > > *opaque) > > { > > sPAPRMachineState *spapr = opaque; > > > > -/* "Iteration" header */ > > -qemu_put_be32(f, spapr->htab_shift); > > +/* "Iteration" header: no-HPT or HPT size encoding */ > > +if (!spapr->htab_shift) { > > +qemu_put_be32(f, -1); > > We're already using htab_shift == 0 to represent no HPT in the runtime > structure; we might as well do the same on the wire. As a bonus it > slightly simplifies the logic here. Non-zero value of iteration header (which is htab_shift) results in htab_load() at the target to reallocate HTAB. zero value of iteration header is used by htab_save_iterate() and htab_save_complete() to tell htab_load() not to freshly allocate HTAB at the target. Hence we can't use 0 value to mean no-HPT. I have addressed the rest of the comments on asserts by ensuring that those code paths are taken only when HPT is present. v5 has those changes. Regards, Bharata.
Re: [Qemu-devel] [PATCH 0/4] spapr:DRC cleanups (part I)
On Thu, Jun 01, 2017 at 11:52:14AM +1000, David Gibson wrote: > The code managing DRCs[0] has quite a few things that are more > complicated than they need to be. In particular the object > representing a DRC has a bunch of method pointers, despite the fact > that there are currently no subclasses, and even if there were the > method implementations would be unlikely to differ. So you are getting rid of a few methods. How about other methods ? Specially attach and detach which have incorporated all the logic needed to handle logical and physical DRs into their implementations ? Guess these will be follow in subequent parts ? Regards, Bharata.
[Qemu-devel] [PATCH v4 1/2] spapr: Add a "no HPT" encoding to HTAB migration stream
Add a "no HPT" encoding (using value -1) to the HTAB migration stream (in the place of HPT size) when the guest doesn't allocate HPT. This will help the target side to match target HPT with the source HPT and thus enable successful migration. A few more fixes to enable TCG migration to work correctly are also included in this commit: - HTAB savevm handlers have a few asserts on kvm_enabled() when spapr->htab != 0. Convert these into conditional checks as it is now possible to have no HTAB with TCG radix guests. - htab_save_setup() asserts for kvm_enabled() when spapr->htab != 0. Remove this as we can't assert this for TCG radix guests. Suggested-by: David Gibson [no HPT encoding suggestion] Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 31 +-- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index ab3aab1..b589ed4 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1559,17 +1559,18 @@ static int htab_save_setup(QEMUFile *f, void *opaque) { sPAPRMachineState *spapr = opaque; -/* "Iteration" header */ -qemu_put_be32(f, spapr->htab_shift); +/* "Iteration" header: no-HPT or HPT size encoding */ +if (!spapr->htab_shift) { +qemu_put_be32(f, -1); +} else { +qemu_put_be32(f, spapr->htab_shift); +} if (spapr->htab) { spapr->htab_save_index = 0; spapr->htab_first_pass = true; -} else { -assert(kvm_enabled()); } - return 0; } @@ -1714,9 +1715,7 @@ static int htab_save_iterate(QEMUFile *f, void *opaque) /* Iteration header */ qemu_put_be32(f, 0); -if (!spapr->htab) { -assert(kvm_enabled()); - +if (!spapr->htab && kvm_enabled()) { fd = get_htab_fd(spapr); if (fd < 0) { return fd; @@ -1748,7 +1747,7 @@ static int htab_save_complete(QEMUFile *f, void *opaque) /* Iteration header */ qemu_put_be32(f, 0); -if (!spapr->htab) { +if (!spapr->htab && kvm_enabled()) { int rc; assert(kvm_enabled()); @@ -1793,6 +1792,12 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) if (section_hdr) { Error *local_err = NULL; +if (section_hdr == -1) { +spapr_free_hpt(spapr); +unregister_savevm(NULL, "spapr/htab", spapr); +return 0; +} + /* First section gives the htab size */ spapr_reallocate_hpt(spapr, section_hdr, &local_err); if (local_err) { @@ -1802,9 +1807,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) return 0; } -if (!spapr->htab) { -assert(kvm_enabled()); - +if (!spapr->htab && kvm_enabled()) { fd = kvmppc_get_htab_fd(true); if (fd < 0) { error_report("Unable to open fd to restore KVM hash table: %s", @@ -1843,7 +1846,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) memset(HPTE(spapr->htab, index + n_valid), 0, HASH_PTE_SIZE_64 * n_invalid); } -} else { +} else if (kvm_enabled()) { int rc; assert(fd >= 0); @@ -1855,7 +1858,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) } } -if (!spapr->htab) { +if (!spapr->htab && kvm_enabled()) { assert(fd >= 0); close(fd); } -- 2.7.4
[Qemu-devel] [PATCH v4 2/2] spapr: Fix migration of Radix guests
Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry Signed-off-by: Bharata B Rao Reviewed-by: Suraj Jitindar Singh --- hw/ppc/spapr.c | 12 1 file changed, 12 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index b589ed4..3f66a93 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1445,6 +1445,18 @@ static int spapr_post_load(void *opaque, int version_id) err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset); } +if (spapr->patb_entry) { +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); +bool radix = !!(spapr->patb_entry & PATBE1_GR); +bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE); + +err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry); +if (err) { +error_report("Process table config unsupported by the host"); +return -EINVAL; +} +} + return err; } -- 2.7.4
[Qemu-devel] [PATCH v4 0/2] ppc/spapr: Fix migration of radix guests
This patchset fixes the migration of sPAPR radix guests. Changes in v4 - - The earlier approach of keeping the HTAB savevm handlers registered only for hash guests by explicitly deregistering does leave out a few corner cases which become ugly/difficult to fix. Hence based on David Gibson's suggestion, this new approach adds a new "no HPT" encoding into the migration stream when guest doesn't allocate HPT. - There are a few assumptions in HTAB savevm handlers that spapr->htab will not be set only for KVM guests. This is no longer true with radix TCG guests. Fix this assumption. These changes enable TCG migration to work. However guest kernel failures are observed post migration which need to be fixed separately. This changes could acutally go into a separate patch. v3: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05434.html Bharata B Rao (2): spapr: Add a "no HPT" encoding to HTAB migration stream spapr: Fix migration of Radix guests hw/ppc/spapr.c | 43 +-- 1 file changed, 29 insertions(+), 14 deletions(-) -- 2.7.4
Re: [Qemu-devel] [PATCH v3 2/3] spapr: Unregister HPT savevm handlers for radix guests
On Wed, May 24, 2017 at 10:23:33AM +0530, Bharata B Rao wrote: > HPT gets created by default for TCG guests and later when the guest turns > out to be a radix guest, the HPT is destroyed when guest does > H_REGISTER_PROC_TBL hcall. Let HTAB savevm handlers registration and > unregistration follow the same model so that we don't end up having > unrequired HTAB savevm handlers for radix guests. > > This also ensures that HTAB savevm handlers seemlessly get destroyed and > recreated like HTAB itself when hash guest reboots. > > HTAB savevm handlers registration/unregistration is now done from > spapr_reallocate_hpt() which itself is called from one of the > savevm_htab_handlers.htab_load(). To cater to this circular dependency > spapr_reallocate_hpt() is made global. > > Signed-off-by: Bharata B Rao > Reviewed-by: David Gibson With Laurent's patch that switches all callers to use register_savevm_live(), this patch will get replaced by the below updated patch which uses unregister_savevm() instead of unregister_savevm_live(). Regards, Bharata. >From 4c33af53b53b14c510ab745b69dcd52dadf65d4e Mon Sep 17 00:00:00 2001 From: Bharata B Rao Date: Tue, 16 May 2017 10:33:40 +0530 Subject: [PATCH v3.1 2/3] spapr: Unregister HPT savevm handlers for radix guests HPT gets created by default for TCG guests and later when the guest turns out to be a radix guest, the HPT is destroyed when guest does H_REGISTER_PROC_TBL hcall. Let HTAB savevm handlers registration and unregistration follow the same model so that we don't end up having unrequired HTAB savevm handlers for radix guests. This also ensures that HTAB savevm handlers seemlessly get destroyed and recreated like HTAB itself when hash guest reboots. HTAB savevm handlers registration/unregistration is now done from spapr_reallocate_hpt() which itself is called from one of the savevm_htab_handlers.htab_load(). To cater to this circular dependency spapr_reallocate_hpt() is made global. Signed-off-by: Bharata B Rao Reviewed-by: David Gibson --- hw/ppc/spapr.c | 99 +- include/hw/ppc/spapr.h | 2 + 2 files changed, 52 insertions(+), 49 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 91f7434..4cc3d8c 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1233,53 +1233,7 @@ void spapr_free_hpt(sPAPRMachineState *spapr) spapr->htab = NULL; spapr->htab_shift = 0; close_htab_fd(spapr); -} - -static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, - Error **errp) -{ -long rc; - -/* Clean up any HPT info from a previous boot */ -spapr_free_hpt(spapr); - -rc = kvmppc_reset_htab(shift); -if (rc < 0) { -/* kernel-side HPT needed, but couldn't allocate one */ -error_setg_errno(errp, errno, - "Failed to allocate KVM HPT of order %d (try smaller maxmem?)", - shift); -/* This is almost certainly fatal, but if the caller really - * wants to carry on with shift == 0, it's welcome to try */ -} else if (rc > 0) { -/* kernel-side HPT allocated */ -if (rc != shift) { -error_setg(errp, - "Requested order %d HPT, but kernel allocated order %ld (try smaller maxmem?)", - shift, rc); -} - -spapr->htab_shift = shift; -spapr->htab = NULL; -} else { -/* kernel-side HPT not needed, allocate in userspace instead */ -size_t size = 1ULL << shift; -int i; - -spapr->htab = qemu_memalign(size, size); -if (!spapr->htab) { -error_setg_errno(errp, errno, - "Could not allocate HPT of order %d", shift); -return; -} - -memset(spapr->htab, 0, size); -spapr->htab_shift = shift; - -for (i = 0; i < size / HASH_PTE_SIZE_64; i++) { -DIRTY_HPTE(HPTE(spapr->htab, i)); -} -} +unregister_savevm(NULL, "spapr/htab", spapr); } void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr) @@ -1879,6 +1833,55 @@ static SaveVMHandlers savevm_htab_handlers = { .load_state = htab_load, }; +void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, + Error **errp) +{ +long rc; + +/* Clean up any HPT info from a previous boot */ +spapr_free_hpt(spapr); + +rc = kvmppc_reset_htab(shift); +if (rc < 0) { +/* kernel-side HPT needed, but couldn't allocate one */ +error_setg_errno(errp, errno, + "Failed to allocate KVM HPT of order %d (try smaller maxmem?)", + shift); +/* This is almost certainly fatal, but if the
[Qemu-devel] [PATCH v3 3/3] spapr: Fix migration of Radix guests
Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 12 1 file changed, 12 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index daf335c..ea14bed 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int version_id) err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset); } +if (spapr->patb_entry) { +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); +bool radix = !!(spapr->patb_entry & PATBE1_GR); +bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE); + +err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry); +if (err) { +error_report("Process table config unsupported by the host"); +return -EINVAL; +} +} + return err; } -- 2.7.4
[Qemu-devel] [PATCH v3 1/3] migration: Introduce unregister_savevm_live()
Introduce a new function unregister_savevm_live() to unregister the vmstate handlers registered via register_savevm_live(). register_savevm() allocates SaveVMHandlers while register_savevm_live() gets passed with SaveVMHandlers. During unregistration, we want to free SaveVMHandlers in the former case but not free in the latter case. Hence this new API is needed to differentiate this. This new API will be needed by PowerPC to unregister the HTAB savevm handlers. Signed-off-by: Bharata B Rao Reviewed-by: David Gibson Cc: Juan Quintela Cc: Dr. David Alan Gilbert --- include/migration/vmstate.h | 1 + migration/savevm.c | 17 +++-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index 8489659..02a1bac 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -79,6 +79,7 @@ int register_savevm_live(DeviceState *dev, void *opaque); void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque); +void unregister_savevm_live(DeviceState *dev, const char *idstr, void *opaque); typedef struct VMStateInfo VMStateInfo; typedef struct VMStateDescription VMStateDescription; diff --git a/migration/savevm.c b/migration/savevm.c index f5e8194..4ef6fdc 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -630,7 +630,8 @@ int register_savevm(DeviceState *dev, ops, opaque); } -void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) +static void unregister_savevm_common(DeviceState *dev, const char *idstr, + void *opaque, bool free_savevmhandlers) { SaveStateEntry *se, *new_se; char id[256] = ""; @@ -649,12 +650,24 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) { QTAILQ_REMOVE(&savevm_state.handlers, se, entry); g_free(se->compat); -g_free(se->ops); +if (free_savevmhandlers) { +g_free(se->ops); +} g_free(se); } } } +void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) +{ +unregister_savevm_common(dev, idstr, opaque, true); +} + +void unregister_savevm_live(DeviceState *dev, const char *idstr, void *opaque) +{ +unregister_savevm_common(dev, idstr, opaque, false); +} + int vmstate_register_with_alias_id(DeviceState *dev, int instance_id, const VMStateDescription *vmsd, void *opaque, int alias_id, -- 2.7.4
[Qemu-devel] [PATCH v3 0/3] ppc/spapr: Fix migration of radix guests
This patchset fixes the migration of sPAPR radix guests. Changes in v3: -- - Dropped the patch that made h_register_process_table flags global as it is not required with the changed logic. - The post load logic is now same for both TCG as well as KVM radix guests. Also ensured that radix and future v3 hash migration post load is treated similarly. - Ensure TCG Radix guest migration isn't broken. v2: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04491.html Bharata B Rao (3): migration: Introduce unregister_savevm_live() spapr: Unregister HPT savevm handlers for radix guests spapr: Fix migration of Radix guests hw/ppc/spapr.c | 111 +--- include/hw/ppc/spapr.h | 2 + include/migration/vmstate.h | 1 + migration/savevm.c | 17 ++- 4 files changed, 80 insertions(+), 51 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH v3 2/3] spapr: Unregister HPT savevm handlers for radix guests
HPT gets created by default for TCG guests and later when the guest turns out to be a radix guest, the HPT is destroyed when guest does H_REGISTER_PROC_TBL hcall. Let HTAB savevm handlers registration and unregistration follow the same model so that we don't end up having unrequired HTAB savevm handlers for radix guests. This also ensures that HTAB savevm handlers seemlessly get destroyed and recreated like HTAB itself when hash guest reboots. HTAB savevm handlers registration/unregistration is now done from spapr_reallocate_hpt() which itself is called from one of the savevm_htab_handlers.htab_load(). To cater to this circular dependency spapr_reallocate_hpt() is made global. Signed-off-by: Bharata B Rao Reviewed-by: David Gibson --- hw/ppc/spapr.c | 99 +- include/hw/ppc/spapr.h | 2 + 2 files changed, 52 insertions(+), 49 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 91f7434..daf335c 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1233,53 +1233,7 @@ void spapr_free_hpt(sPAPRMachineState *spapr) spapr->htab = NULL; spapr->htab_shift = 0; close_htab_fd(spapr); -} - -static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, - Error **errp) -{ -long rc; - -/* Clean up any HPT info from a previous boot */ -spapr_free_hpt(spapr); - -rc = kvmppc_reset_htab(shift); -if (rc < 0) { -/* kernel-side HPT needed, but couldn't allocate one */ -error_setg_errno(errp, errno, - "Failed to allocate KVM HPT of order %d (try smaller maxmem?)", - shift); -/* This is almost certainly fatal, but if the caller really - * wants to carry on with shift == 0, it's welcome to try */ -} else if (rc > 0) { -/* kernel-side HPT allocated */ -if (rc != shift) { -error_setg(errp, - "Requested order %d HPT, but kernel allocated order %ld (try smaller maxmem?)", - shift, rc); -} - -spapr->htab_shift = shift; -spapr->htab = NULL; -} else { -/* kernel-side HPT not needed, allocate in userspace instead */ -size_t size = 1ULL << shift; -int i; - -spapr->htab = qemu_memalign(size, size); -if (!spapr->htab) { -error_setg_errno(errp, errno, - "Could not allocate HPT of order %d", shift); -return; -} - -memset(spapr->htab, 0, size); -spapr->htab_shift = shift; - -for (i = 0; i < size / HASH_PTE_SIZE_64; i++) { -DIRTY_HPTE(HPTE(spapr->htab, i)); -} -} +unregister_savevm_live(NULL, "spapr/htab", spapr); } void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr) @@ -1879,6 +1833,55 @@ static SaveVMHandlers savevm_htab_handlers = { .load_state = htab_load, }; +void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, + Error **errp) +{ +long rc; + +/* Clean up any HPT info from a previous boot */ +spapr_free_hpt(spapr); + +rc = kvmppc_reset_htab(shift); +if (rc < 0) { +/* kernel-side HPT needed, but couldn't allocate one */ +error_setg_errno(errp, errno, + "Failed to allocate KVM HPT of order %d (try smaller maxmem?)", + shift); +/* This is almost certainly fatal, but if the caller really + * wants to carry on with shift == 0, it's welcome to try */ +} else if (rc > 0) { +/* kernel-side HPT allocated */ +if (rc != shift) { +error_setg(errp, + "Requested order %d HPT, but kernel allocated order %ld (try smaller maxmem?)", + shift, rc); +} + +spapr->htab_shift = shift; +spapr->htab = NULL; +} else { +/* kernel-side HPT not needed, allocate in userspace instead */ +size_t size = 1ULL << shift; +int i; + +spapr->htab = qemu_memalign(size, size); +if (!spapr->htab) { +error_setg_errno(errp, errno, + "Could not allocate HPT of order %d", shift); +return; +} + +memset(spapr->htab, 0, size); +spapr->htab_shift = shift; + +for (i = 0; i < size / HASH_PTE_SIZE_64; i++) { +DIRTY_HPTE(HPTE(spapr->htab, i)); +} +} +register_savevm_live(NULL, "spapr/htab", -1, 1, + &savevm_htab_handlers, spapr); +} + static void spapr_boot_set(void *opaque, const char *boot_device, Error **errp) { @@ -2341,8 +2344,6 @@ static void ppc_spapr_init(Mac
Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH v2 4/4] spapr: Fix migration of Radix guests
On Tue, May 23, 2017 at 06:42:28PM +1000, Suraj Jitindar Singh wrote: > On Tue, 2017-05-23 at 10:18 +0530, Bharata B Rao wrote: > > On Mon, May 22, 2017 at 04:30:50PM +1000, Suraj Jitindar Singh wrote: > > > On Fri, 2017-05-19 at 11:10 +0530, Bharata B Rao wrote: > > > > Fix migration of radix guests by ensuring that we issue > > > > KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. > > > > > > > > Reported-by: Nageswara R Sastry > > > > Signed-off-by: Bharata B Rao > > > > --- > > > > hw/ppc/spapr.c | 12 > > > > 1 file changed, 12 insertions(+) > > > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > > index daf335c..8f20f14 100644 > > > > --- a/hw/ppc/spapr.c > > > > +++ b/hw/ppc/spapr.c > > > > @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, > > > > int > > > > version_id) > > > > err = spapr_rtc_import_offset(&spapr->rtc, spapr- > > > > > rtc_offset); > > > > > > > > } > > > > > > This will break migration for tcg radix guests. > > > > > > Given that there is essentially nothing special we need to do on > > > incoming tcg migration, I suggest we make it: > > > > > > if (spapr->patb_entry && kvm_enabled()) { > > > [snip] > > > } > > > > > > > > > > > +if (spapr->patb_entry) { > > > > +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); > > > > +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) { > > > > > > Why not make this a bit more generic? Something like: > > > > > > err = kvmppc_configure_v3_mmu(cpu, !!(spapr->patb_entry & > > > PATBE1_GR), > > > !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE), spapr->patb_entry); > > > if (err) { > > > error_report("Process table config unsupported by the host"); > > > return -EINVAL; > > > } > > > > > > return err; > > > > > > While I don't think it's possible currently, there is nothing > > > inherently incorrect about having a non-zero process table entry > > > for a > > > hash guest. And this saves a future update. > > > > How about having this logic in spapr_post_load() ? > > Looks a lot better :) > > > > > if (spapr->patb_entry) { > > /* Can be Hash(in future?) or Radix guest (current) */ > > PowerPCCPU *cpu = POWERPC_CPU(first_cpu); > > bool radix = !!(spapr->patb_entry & PATBE1_GR); > > bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE); > > > > Don't think we need this if statement though. When hash with patb entry > is possible it will still need to call kvmppc_configure_v3_mmu on > incoming migration, that isn't radix specific. Right. > > > if (radix) { > > /* Radix guest, configure MMU */ > > /* kvmppc_configure_v3_mmu() is NOP for TCG */ > > err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr- > > >patb_entry); > > if (err) { > > error_report("Process table config unsupported by the > > host"); > > return -EINVAL; > > } > > } else { > > /* Can be Hash guest (in future ?), nothing to do */ > > } > > } else { > > Don't need this else statement. Can just have the comment below if you > feel it's necessary. Was just being verbose with empty else blocks, will not have them in the actual patch. Thanks for the review. Regards, Bharata.
Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH v2 4/4] spapr: Fix migration of Radix guests
On Mon, May 22, 2017 at 04:30:50PM +1000, Suraj Jitindar Singh wrote: > On Fri, 2017-05-19 at 11:10 +0530, Bharata B Rao wrote: > > Fix migration of radix guests by ensuring that we issue > > KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. > > > > Reported-by: Nageswara R Sastry > > Signed-off-by: Bharata B Rao > > --- > > hw/ppc/spapr.c | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index daf335c..8f20f14 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int > > version_id) > > err = spapr_rtc_import_offset(&spapr->rtc, spapr- > > >rtc_offset); > > } > > This will break migration for tcg radix guests. > > Given that there is essentially nothing special we need to do on > incoming tcg migration, I suggest we make it: > > if (spapr->patb_entry && kvm_enabled()) { > [snip] > } > > > > > +if (spapr->patb_entry) { > > +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); > > +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) { > > Why not make this a bit more generic? Something like: > > err = kvmppc_configure_v3_mmu(cpu, !!(spapr->patb_entry & PATBE1_GR), > !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE), spapr->patb_entry); > if (err) { > error_report("Process table config unsupported by the host"); > return -EINVAL; > } > > return err; > > While I don't think it's possible currently, there is nothing > inherently incorrect about having a non-zero process table entry for a > hash guest. And this saves a future update. How about having this logic in spapr_post_load() ? if (spapr->patb_entry) { /* Can be Hash(in future?) or Radix guest (current) */ PowerPCCPU *cpu = POWERPC_CPU(first_cpu); bool radix = !!(spapr->patb_entry & PATBE1_GR); bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE); if (radix) { /* Radix guest, configure MMU */ /* kvmppc_configure_v3_mmu() is NOP for TCG */ err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry); if (err) { error_report("Process table config unsupported by the host"); return -EINVAL; } } else { /* Can be Hash guest (in future ?), nothing to do */ } } else { /* Hash guest (current), nothing to do */ } Regards, Bharata.
Re: [Qemu-devel] [RFC PATCH v2 4/4] spapr: Fix migration of Radix guests
On Mon, May 22, 2017 at 12:44:48PM +1000, David Gibson wrote: > On Fri, May 19, 2017 at 12:06:14PM +0530, Bharata B Rao wrote: > > On Fri, May 19, 2017 at 11:10:39AM +0530, Bharata B Rao wrote: > > > Fix migration of radix guests by ensuring that we issue > > > KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. > > > > > > Reported-by: Nageswara R Sastry > > > Signed-off-by: Bharata B Rao > > > --- > > > hw/ppc/spapr.c | 12 > > > 1 file changed, 12 insertions(+) > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > index daf335c..8f20f14 100644 > > > --- a/hw/ppc/spapr.c > > > +++ b/hw/ppc/spapr.c > > > @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int > > > version_id) > > > err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset); > > > } > > > > > > +if (spapr->patb_entry) { > > > +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); > > > +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) { > > > +err = kvmppc_configure_v3_mmu(cpu, SPAPR_PROC_TABLE_RADIX, > > > +((cpu->env.spr[SPR_LPCR] & LPCR_GTSE) ? > > > SPAPR_PROC_TABLE_GTSE : > > > +0), spapr->patb_entry); > > > > Better to use explicit 'true' and 'false' in the above call. Here is > > the updated patch: > > Or just !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE) and avoid the ?: > entirely. > > With this version you no longer need patch 3/4 AFAICT. Ah yes, will send the updated version next. Regards, Bharata.
Re: [Qemu-devel] [RFC PATCH v2 4/4] spapr: Fix migration of Radix guests
On Fri, May 19, 2017 at 11:10:39AM +0530, Bharata B Rao wrote: > Fix migration of radix guests by ensuring that we issue > KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. > > Reported-by: Nageswara R Sastry > Signed-off-by: Bharata B Rao > --- > hw/ppc/spapr.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index daf335c..8f20f14 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int > version_id) > err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset); > } > > +if (spapr->patb_entry) { > +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); > +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) { > +err = kvmppc_configure_v3_mmu(cpu, SPAPR_PROC_TABLE_RADIX, > +((cpu->env.spr[SPR_LPCR] & LPCR_GTSE) ? > SPAPR_PROC_TABLE_GTSE : > +0), spapr->patb_entry); Better to use explicit 'true' and 'false' in the above call. Here is the updated patch: >From 937c51cac73b4211ef153c1f5940215960383494 Mon Sep 17 00:00:00 2001 From: Bharata B Rao Date: Tue, 16 May 2017 12:19:54 +0530 Subject: [RFC PATCH v2.1 4/4] spapr: Fix migration of Radix guests Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry Signed-off-by: Bharata B Rao --- hw/ppc/spapr.c | 12 1 file changed, 12 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index daf335c..69e184b 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int version_id) err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset); } +if (spapr->patb_entry) { +PowerPCCPU *cpu = POWERPC_CPU(first_cpu); +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) { +err = kvmppc_configure_v3_mmu(cpu, true, +((cpu->env.spr[SPR_LPCR] & LPCR_GTSE) ? true : false), +spapr->patb_entry); +} else { +error_report("Radix guest is unsupported by the host"); +return -EINVAL; +} +} + return err; } -- 2.7.4